
ARP’s latest developments at the Dataverse Community Meeting

The Dataverse Community Meeting (DCM) 2026 was held in Barcelona from May 11 to 15, 2026. Following the successful introduction of ARP last year at DCM 2025 (Chapel Hill, NC, USA), Balázs Pataki and Norbert Finta from the HUN-REN SZTAKI Department of Distributed Systems (SZTAKI DSD) presented ARP’s latest developments to the DCM professional community.

Dataverse, the research repository system developed by Harvard, turned 20 this year. Fortunately, the system does not show its age: under Harvard’s leadership, the software is developed in collaboration with several research institutions, including HUN-REN SZTAKI, and new versions are released quarterly. The software’s popularity remains strong. Today, there are 147 Dataverse installations, some of which serve several hundred universities or research institutions at the same time, meaning that Dataverse has users at several thousand research sites and educational institutions.

Dataverse is of particular importance to us because this system also forms the basis of the HUN-REN Data Repository Platform (ARP). With a single Dataverse installation, ARP serves the entire HUN-REN research network and, from 2026, Hungarian universities as well. In ARP, each institution receives its own small Dataverse, or collection, which institutional data stewards can manage with institution-specific permissions.


About half of the 147 Dataverse installations, 73 in total, are located in Europe. Recognising this outstanding level of interest, Slava Tykhonov, CODATA’s Chief AI Technology Officer and the first official “Dataverse Ambassador”, launched the Dataverse.eu initiative. Its aim is to enable European users and developers to become more actively involved in Dataverse development than before, and to help their developments reach end users more quickly. The initiative is currently in the brainstorming phase, but at DCM 2026 we had the opportunity to hold in-person discussions and lay the foundations for a common European Dataverse strategy. Staff from ARP’s two partner institutions, HUN-REN SZTAKI and ELTE TK, are also participating in this work.

The conference this year focused on three main areas: creating AI solutions to enhance repository workflows, data quality and AI-ready data; improving interoperability to enable richer linkage and reuse across datasets, domains and platforms; and expanding support for sensitive and restricted data.

Balázs Pataki presented ARP’s latest development: VibeARP, an AI agent that assists data stewards by reducing the work of annotating data packages with metadata, freeing them to focus on more important issues related to the data content itself. Naturally, AI was discussed at the conference in many different ways. It was addressed through solutions similar to VibeARP which, on the one hand, make metadata creation easier and more reliable and, on the other, support better understanding and reuse of the data. Curation and support for research workflows are key topics in the field of research data repositories, and these will also become increasingly important for ARP in the future.

Balázs Pataki presenting at the DCM 2026 conference

During DCM 2026, our colleagues gained insight into the use cases and solutions vital to the Dataverse user community, and discussed future cooperation and potential projects. They talked to developers from Harvard about how the CEDAR metadata schema registry, developed within ARP, and its integration with Dataverse could become part of the core Dataverse software in the future. If this initiative is realised, it will enable not only the Hungarian research community but also the 146 other Dataverse installations worldwide to create, use and share metadata schemas in a simple and user-friendly way.

Participants of the DCM 2026 conference