RDM Infrastructure Design

Designing a modern Research Data Management (RDM) infrastructure means taking the FAIR principles seriously. FAIR is often presented as a set of guidelines for researchers, or as a description of what data should look like at the end of its life cycle. In practice, however, it places equally strong requirements on the systems that store, move, describe, and publish research outputs. A FAIR‑aligned platform is therefore not a single tool but a coordinated ecosystem of services that together make data Findable, Accessible, Interoperable, and Reusable.

What FAIR Means for Infrastructure

Findable (F1–F4)

To make data findable, infrastructure must assign globally unique, persistent identifiers (F1) such as DOIs or Handles, so that datasets remain citable and resolvable even as they move across storage systems. These identifiers must be accompanied by rich metadata (F2) that describes the dataset in a structured, machine‑readable way. The metadata must explicitly include the identifier of the dataset it describes (F3), and both data and metadata must be registered in searchable indexes (F4) that external services can harvest.

In practice, this calls for PID services, metadata registries, search indexes, and APIs that expose metadata for discovery and harvesting.
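F4 asks that metadata be harvestable by external services, and OAI‑PMH is one widely used protocol for this. The sketch below, using only the standard library, builds a `ListRecords` request and extracts identifiers and titles from an `oai_dc` response. The base URL and sample record are hypothetical, and a production harvester would also need error handling (for example the `noRecordsMatch` error) and rate limiting.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core (oai_dc) specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    A resumptionToken is exclusive: when present, it is the only
    argument sent alongside the verb.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_records(xml_text):
    """Return (identifier, title) pairs and the next resumptionToken, if any."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        dc = rec.find(".//oai_dc:dc", NS)
        if dc is None:  # deleted records carry a header but no metadata
            continue
        ident = dc.findtext("dc:identifier", default="", namespaces=NS)
        title = dc.findtext("dc:title", default="", namespaces=NS)
        records.append((ident, title))
    # An empty resumptionToken marks the last page of a list.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or None)


# Hypothetical single-record response, trimmed to the elements used above.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Soil moisture 2023</dc:title>
          <dc:identifier>https://doi.org/10.1234/example</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

A harvester would loop, fetching `list_records_url(base, resumption_token=token)` until `parse_records` returns `None` as the token.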

Accessible (A1–A2)

Accessibility requires that data and metadata be retrievable by their identifier through open, standardised protocols (A1) such as HTTPS, S3, WebDAV, or OAI‑PMH. These protocols must be open, free, and universally implementable (A1.1) and must support authentication and authorisation where needed (A1.2). Even if the data becomes restricted or is removed, its metadata must remain accessible (A2) so that the scholarly record is preserved.

This translates into federated identity, fine‑grained access control, and stable landing pages for metadata.
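Two of these requirements can be made concrete with little code. DOI content negotiation, which DataCite and Crossref both support at `https://doi.org/`, lets a client request machine‑readable metadata over plain HTTPS (A1). A tombstone‑style landing page keeps metadata visible after the data is withdrawn (A2). The sketch below only builds the request without sending it; the `landing_page` function and its record fields are hypothetical.

```python
import urllib.request

CSL_JSON = "application/vnd.citationstyles.csl+json"


def metadata_request(doi, accept=CSL_JSON):
    """Build (but do not send) a content-negotiation request for a DOI."""
    return urllib.request.Request("https://doi.org/" + doi, headers={"Accept": accept})


def landing_page(record):
    """Return (status, body) for a hypothetical landing page.

    Metadata is always served (A2); only the data link is withheld
    when the dataset is restricted or withdrawn.
    """
    body = {"doi": record["doi"], "title": record["title"]}
    status = record.get("status", "available")
    if status == "available":
        body["data_url"] = record["data_url"]
    else:
        body["data_url"] = None
        body["notice"] = f"Data {status}; metadata retained."
    return 200, body
```

Passing the request to `urllib.request.urlopen` would perform the lookup; access to restricted data itself would sit behind the federated AAI rather than the landing page.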

Interoperable (I1–I3)

Interoperability is achieved when infrastructure supports formal, shared languages (I1) such as RDF, JSON‑LD, or XML, enabling machines to understand and combine metadata from different sources. Metadata must rely on FAIR vocabularies and ontologies (I2), which include both generic, cross‑domain vocabularies such as Dublin Core, SKOS, or schema.org, and domain‑specific FAIR ontologies from communities like the OBO Foundry. These vocabularies provide stable identifiers and shared semantics, allowing metadata to be interpreted consistently across systems. Interoperability also requires qualified references (I3) that link datasets to related entities such as software, instruments, or grants.

Together, these elements require APIs, ontology support, and metadata models capable of expressing rich, typed relationships rather than flat fields.
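A minimal sketch of I1 to I3 in practice: a schema.org `Dataset` serialised as JSON‑LD, where the subject is an ontology IRI (here the NCBI Taxonomy term for *Arabidopsis thaliana* in the OBO PURL space) and related entities are typed objects rather than plain strings. The DOIs and funder name are placeholders, and `isBasedOn` is only one reasonable property for linking software; DataCite's relation types offer finer distinctions.

```python
import json


def dataset_jsonld(doi, name, license_url, subject_iri, software_doi, funder_name):
    """Build a schema.org Dataset description as a JSON-LD dict."""
    pid = f"https://doi.org/{doi}"
    return {
        "@context": "https://schema.org/",  # I1: shared, formal vocabulary
        "@type": "Dataset",
        "@id": pid,
        "identifier": pid,
        "name": name,
        "license": license_url,
        # I2: reference an ontology term by IRI instead of free text
        "about": {"@id": subject_iri},
        # I3: typed, qualified links to related entities
        "isBasedOn": {"@type": "SoftwareSourceCode", "@id": f"https://doi.org/{software_doi}"},
        "funder": {"@type": "Organization", "name": funder_name},
    }


doc = dataset_jsonld(
    "10.1234/example",
    "Leaf area measurements",
    "https://creativecommons.org/licenses/by/4.0/",
    "http://purl.obolibrary.org/obo/NCBITaxon_3702",
    "10.1234/code",
    "Example Funder",
)
serialised = json.dumps(doc, indent=2)
```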

Reusable (R1–R1.3)

Finally, reuse depends on rich descriptions (R1) with accurate and relevant attributes, including domain‑specific details. Metadata must carry a clear usage license (R1.1), record detailed provenance (R1.2), and meet domain‑relevant community standards (R1.3) such as ISA, MIAPPE, or DDI.

Infrastructure must therefore support schema versioning, provenance capture, workflow integration, and controlled vocabularies for licensing.
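The R1 requirements lend themselves to automated checks at deposit time. The sketch below assumes a flat record with hypothetical `license`, `provenance`, `schema`, and `schema_version` keys; the allow‑lists are illustrative, not a complete SPDX license list or schema registry.

```python
# Illustrative SPDX identifiers; a real deployment would load the full SPDX list.
ALLOWED_LICENSES = {"CC-BY-4.0", "CC0-1.0", "CC-BY-SA-4.0", "MIT"}

# Hypothetical registry of accepted community schemas and versions.
KNOWN_SCHEMAS = {"MIAPPE": {"1.1"}, "DDI-Codebook": {"2.5"}}

# Minimal provenance fields, loosely modelled on PROV (activity, agent, used).
REQUIRED_PROV_KEYS = {"activity", "agent", "used"}


def reuse_problems(record):
    """Return a list of human-readable R1.1-R1.3 violations (empty if none)."""
    problems = []
    lic = record.get("license")
    if lic not in ALLOWED_LICENSES:
        problems.append(f"R1.1: unknown or missing license {lic!r}")
    prov = record.get("provenance", [])
    if not prov:
        problems.append("R1.2: no provenance recorded")
    for i, step in enumerate(prov):
        missing = REQUIRED_PROV_KEYS - set(step)
        if missing:
            problems.append(f"R1.2: provenance step {i} lacks {sorted(missing)}")
    schema, version = record.get("schema"), record.get("schema_version")
    if version not in KNOWN_SCHEMAS.get(schema, set()):
        problems.append(f"R1.3: unsupported schema {schema} {version}")
    return problems
```

Returning a list rather than raising on the first error lets a data steward see every gap in one pass.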

What a FAIR‑Aligned RDM Infrastructure Looks Like

A FAIR platform is best understood as a stack of coordinated services, each addressing a different phase of the research data lifecycle. No single system can satisfy all FAIR requirements. Instead, FAIR emerges from the interplay between systems.

A typical FAIR‑aligned architecture includes:

Active research & collaboration

Where data is created, shared, and iterated on. E.g.:

Nextcloud / ResearchDrive for collaborative storage
MetaVox for early, schema‑based metadata

Metadata creation & FAIRification

Where metadata becomes structured, validated, and machine‑actionable. E.g.:

FAIR Data Station for domain‑specific metadata (ISA, RDF)
CKAN as a central metadata registry and search portal

Storage & orchestration

Where data is stored, moved, and governed. E.g.:

Object Store, dCache, HPC filesystems as storage backends
iRODS as the policy engine and metadata layer
Globus for high‑performance transfers

Publication & discovery

Where curated datasets become part of the scholarly record. E.g.:

Dataverse (e.g., DANS Data Stations)
Invenio‑based repositories (SURF Data Repository, Zenodo)
4TU.ResearchData (Djehuty)
Discovery portals combining repository search with metadata registries

Integration, identity & policy

Where the ecosystem becomes coherent. E.g.:

SURF Research Data Connector for publishing workflows
Federated identity (SURFconext, SRAM, eduGAIN)
APIs (REST, OAI‑PMH, SPARQL, S3) for machine access

Together, these layers support all FAIR principles:

Findable: PIDs, indexes, registries, APIs
Accessible: Open protocols, AAI, policy‑based access
Interoperable: RDF/JSON‑LD, ontologies, typed links
Reusable: Community schemas, provenance, licensing, versioning
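The mapping above can double as an architecture checklist. This sketch encodes it as data and reports which capabilities no layer provides; the assignment of capabilities to layers is a hypothetical example, not a prescription.

```python
# Capabilities from the FAIR mapping, grouped by principle.
FAIR_CAPABILITIES = {
    "Findable": {"PIDs", "indexes", "registries", "APIs"},
    "Accessible": {"open protocols", "AAI", "policy-based access"},
    "Interoperable": {"RDF/JSON-LD", "ontologies", "typed links"},
    "Reusable": {"community schemas", "provenance", "licensing", "versioning"},
}

# Hypothetical assignment of capabilities to layers; adapt to the real deployment.
LAYERS = {
    "Active research": {"versioning"},
    "Metadata & FAIRification": {"registries", "indexes", "RDF/JSON-LD", "ontologies", "community schemas"},
    "Storage & orchestration": {"policy-based access", "provenance"},
    "Publication & discovery": {"PIDs", "typed links", "licensing", "versioning"},
    "Integration, identity & policy": {"AAI", "APIs", "open protocols"},
}


def coverage_gaps(layers):
    """Map each principle to the capabilities no layer provides (omit if none)."""
    provided = set().union(*layers.values())
    return {p: sorted(caps - provided) for p, caps in FAIR_CAPABILITIES.items() if caps - provided}
```

Removing a layer and re‑running the check shows which principles depend on it, which makes the claim that FAIR emerges from the interplay between systems testable.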

A Layered FAIR Architecture

```mermaid
%%{init: {
  "look": "handDrawn",
  "flowchart": { "htmlLabels": false }
}}%%

flowchart TD

    %% Users
    subgraph Users["Users"]
        R["Researchers"]
        DS["Data Stewards"]
        M["Machines / Workflows"]
    end

    %% Layer 1
    subgraph L1["Active Research"]
        NC["Nextcloud / ResearchDrive"]
        MV["MetaVox"]
    end

    %% Layer 2
    subgraph L2["FAIR Metadata"]
        FDS["FAIR Data Station"]
        CKAN["CKAN Registry"]
    end

    %% Layer 3
    subgraph L3["Storage and Orchestration"]
        IRODS["iRODS"]
        OBJ["Object Store"]
        DC["dCache"]
        GLOB["Globus"]
    end

    %% Layer 4
    subgraph L4["Publication and Discovery"]
        DV["Dataverse"]
        INV["Invenio / SDR"]
        DJE["4TU / Djehuty"]
        PORTAL["Discovery Portal"]
    end

    %% Layer 5
    subgraph L5["AAI and Integration"]
        RDC["Research Data Connector"]
        AAI["Federated Identity"]
        API["APIs"]
    end

    %% Simplified flows
    R --> NC
    R --> FDS
    NC --> OBJ
    NC --> DC
    MV --> CKAN
    FDS --> CKAN
    CKAN --> PORTAL

    OBJ --> IRODS
    DC --> IRODS
    IRODS --> GLOB

    IRODS --> RDC
    RDC --> DV
    RDC --> INV
    RDC --> DJE

    DV --> API
    INV --> API
    DJE --> API
    API --> PORTAL
    M --> API

    R --> AAI
    DS --> AAI
    AAI --> NC
    AAI --> DV
    AAI --> IRODS

    %% Highlight key nodes
    style DV fill:#d8f3dc,stroke:#1b4332,color:#1b4332
    style FDS fill:#d8f3dc,stroke:#1b4332,color:#1b4332
    style IRODS fill:#d8f3dc,stroke:#1b4332,color:#1b4332
    style CKAN fill:#d8f3dc,stroke:#1b4332,color:#1b4332
    style NC fill:#d8f3dc,stroke:#1b4332,color:#1b4332
    style PORTAL fill:#d8f3dc,stroke:#1b4332,color:#1b4332
```

Integration and Policy: The Glue That Makes FAIR Work

A FAIR platform is only FAIR if its components work together. This requires well‑defined integrations and a policy engine that governs how data and metadata move across the lifecycle. Storage, metadata services, workflow engines, and repositories must interoperate through stable interfaces and shared standards so that data remains coherent as it transitions between systems.
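Metadata mappings are easiest to govern when they are explicit and checkable. A minimal sketch, assuming hypothetical internal field names, of a crosswalk to Dublin Core elements that also reports which fields the target schema cannot carry, so that information loss at a system boundary is visible rather than silent.

```python
# Hypothetical internal field names mapped to Dublin Core element names.
CROSSWALK = {
    "title": "title",
    "authors": "creator",
    "pid": "identifier",
    "license": "rights",
    "keywords": "subject",
    "published": "date",
}


def to_dublin_core(record):
    """Map an internal record to repeatable DC elements.

    Returns (dc, unmapped): dc maps element names to lists of strings;
    unmapped lists internal fields the crosswalk does not cover.
    """
    dc = {}
    for src, target in CROSSWALK.items():
        value = record.get(src)
        if value is None:
            continue
        values = value if isinstance(value, list) else [value]
        dc.setdefault(target, []).extend(str(v) for v in values)
    unmapped = sorted(set(record) - set(CROSSWALK))
    return dc, unmapped
```

An integration contract could then require `unmapped` to be empty, or explicitly acknowledged, before a record is pushed to a catalogue.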

A FAIR‑aligned infrastructure therefore needs:

Clear integration contracts (APIs, event streams, metadata mappings, PID handling)
Consistent metadata models across all layers
Automated workflows for validation, provenance, versioning, and access control
A central policy engine (e.g., iRODS rules, workflow orchestrators) to coordinate:
- ingest from active storage
- metadata enrichment
- transformations and quality checks
- publication to repositories
- synchronisation across catalogues and indexes
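The coordination steps above can be sketched as an ordered pipeline in which every step logs its outcome and a failed policy check halts the run before publication. This is a Python stand‑in for illustration, not iRODS rule language; the step functions, the `archive://` scheme, and the placeholder PID are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Dataset:
    path: str
    metadata: dict
    log: list = field(default_factory=list)  # (step, status, timestamp, detail)


class PolicyError(Exception):
    """Raised by a step when a policy check fails."""


def ingest(ds):
    ds.metadata["location"] = "archive://" + ds.path  # hypothetical URI scheme


def enrich(ds):
    ds.metadata.setdefault("license", "CC-BY-4.0")  # hypothetical default


def quality_check(ds):
    if not ds.metadata.get("title"):
        raise PolicyError("title is required before publication")


def publish(ds):
    # Placeholder: a real step would call a repository API and receive a PID.
    ds.metadata["pid"] = "hypothetical-pid:" + ds.path


def sync_catalogues(ds):
    ds.metadata["indexed_in"] = ["registry", "portal"]


PIPELINE = [ingest, enrich, quality_check, publish, sync_catalogues]


def run(ds, steps=PIPELINE):
    """Run steps in order; stop at the first policy failure. Returns success."""
    for step in steps:
        at = datetime.now(timezone.utc).isoformat()
        try:
            step(ds)
        except PolicyError as err:
            ds.log.append((step.__name__, "failed", at, str(err)))
            return False
        ds.log.append((step.__name__, "ok", at, ""))
    return True
```

In a real deployment the log would feed provenance capture (R1.2), and placing `publish` after the checks ensures no PID is minted for a dataset that failed validation.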

Without this orchestration layer, the platform becomes a loose collection of tools. With it, the platform becomes a coherent, predictable, auditable, FAIR‑by‑design ecosystem.