%%{init: {
"look": "handDrawn",
"flowchart": { "htmlLabels": false }
}}%%
flowchart TD
%% Users
subgraph Users["Users"]
R["Researchers"]
DS["Data Stewards"]
M["Machines / Workflows"]
end
%% Layer 1
subgraph L1["Active Research"]
NC["Nextcloud / ResearchDrive"]
MV["MetaVox"]
end
%% Layer 2
subgraph L2["FAIR Metadata"]
FDS["FAIR Data Station"]
CKAN["CKAN Registry"]
end
%% Layer 3
subgraph L3["Storage and Orchestration"]
IRODS["iRODS"]
OBJ["Object Store"]
DC["dCache"]
GLOB["Globus"]
end
%% Layer 4
subgraph L4["Publication and Discovery"]
DV["Dataverse"]
INV["Invenio / SDR"]
DJE["4TU / Djehuty"]
PORTAL["Discovery Portal"]
end
%% Layer 5
subgraph L5["AAI and Integration"]
RDC["Research Data Connector"]
AAI["Federated Identity"]
API["APIs"]
end
%% Simplified flows
R --> NC
R --> FDS
NC --> OBJ
NC --> DC
MV --> CKAN
FDS --> CKAN
CKAN --> PORTAL
OBJ --> IRODS
DC --> IRODS
IRODS --> GLOB
IRODS --> RDC
RDC --> DV
RDC --> INV
RDC --> DJE
DV --> API
INV --> API
DJE --> API
API --> PORTAL
R --> AAI
DS --> AAI
AAI --> NC
AAI --> DV
AAI --> IRODS
%% Highlight key nodes
style DV fill:#d8f3dc,stroke:#1b4332,color:#1b4332
style FDS fill:#d8f3dc,stroke:#1b4332,color:#1b4332
style IRODS fill:#d8f3dc,stroke:#1b4332,color:#1b4332
style CKAN fill:#d8f3dc,stroke:#1b4332,color:#1b4332
style NC fill:#d8f3dc,stroke:#1b4332,color:#1b4332
style PORTAL fill:#d8f3dc,stroke:#1b4332,color:#1b4332
RDM Infrastructure Design
Designing a modern Research Data Management (RDM) infrastructure means taking the FAIR principles seriously. FAIR is often described as a set of guidelines for researchers, or as a description of what data should look like at the end of its life cycle. In practice, however, it places equally strong requirements on the systems that store, move, describe, and publish research outputs. A FAIR‑aligned platform is therefore not a single tool but a coordinated ecosystem of services that together make data Findable, Accessible, Interoperable, and Reusable.
What FAIR Means for Infrastructure
Findable (F1–F4)
To make data findable, infrastructure must support persistent identifiers (F1) such as DOIs or Handles, ensuring that datasets remain citable and identifiable even as they move across storage systems. These identifiers must be accompanied by rich metadata (F2) that describes the dataset in a structured, machine‑readable way. Metadata must explicitly link back to the dataset it describes (F3), and both data and metadata must be registered in searchable indexes (F4) that can be harvested by external services.
In practice, this means PID services, metadata registries, search engines, and APIs that expose metadata for discovery.
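As a minimal sketch of what F1 to F3 ask of a metadata record, the Python below builds a schema.org-style JSON-LD description of a dataset. The function name `build_record`, the DOI `10.1234/example-dataset`, and the choice of fields are illustrative assumptions, not the output of any particular PID service or registry.

```python
import json


def build_record(pid: str, title: str, keywords: list) -> dict:
    """Build a minimal schema.org Dataset description as a JSON-LD dict.

    All values are placeholders chosen for illustration.
    """
    pid_url = f"https://doi.org/{pid}"
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        # F1: a globally unique, persistent identifier for the record
        "@id": pid_url,
        # F3: the metadata explicitly carries the identifier of the data it describes
        "identifier": pid_url,
        # F2: descriptive metadata in a structured, machine-readable form
        "name": title,
        "keywords": keywords,
    }


record = build_record(
    "10.1234/example-dataset",  # hypothetical DOI
    "Soil moisture measurements",
    ["soil", "moisture"],
)
print(json.dumps(record, indent=2))
```

F4 is then a matter of registering this record in an index that external services can harvest, which the sketch deliberately leaves out.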
Accessible (A1–A2)
Accessibility requires that data and metadata be retrievable through open, standardised protocols (A1), such as HTTPS, S3, WebDAV, or OAI‑PMH. These protocols must be free to implement (A1.1) and support authentication and authorisation where needed (A1.2). Even if data becomes restricted or removed, its metadata must remain accessible (A2), ensuring that the scholarly record is preserved.
This translates into federated identity, fine‑grained access control, and stable landing pages for metadata.
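The A2 requirement can be illustrated with a small sketch of a landing page that keeps serving metadata after the data itself has been withdrawn. The `Dataset` class, the `landing_page` function, and the `repository.example.org` URL are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    pid: str
    title: str
    data_available: bool  # False once the files are withdrawn or restricted


def landing_page(ds: Dataset) -> dict:
    """Return what a landing page would expose for a dataset.

    The metadata is always returned (A2); only the download link depends
    on whether the data is still available.
    """
    page = {"pid": ds.pid, "title": ds.title, "status": "available"}
    if ds.data_available:
        page["download"] = f"https://repository.example.org/files/{ds.pid}"
    else:
        # Tombstone: the description stays citable, the files do not.
        page["status"] = "withdrawn"
    return page


withdrawn = landing_page(
    Dataset("10.1234/example-dataset", "Soil moisture measurements", False)
)
print(withdrawn)
```

A real deployment would combine this with authentication and authorisation checks (A1.2), which are omitted here.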
Interoperable (I1–I3)
Interoperability is achieved when infrastructure supports formal, shared languages (I1) such as RDF, JSON‑LD, or XML, enabling machines to understand and combine metadata from different sources. Metadata must rely on FAIR vocabularies and ontologies (I2), which include both generic, cross‑domain vocabularies such as Dublin Core, SKOS, or schema.org, and domain‑specific FAIR ontologies from communities like the OBO Foundry. These vocabularies provide stable identifiers and shared semantics, allowing metadata to be interpreted consistently across systems. Interoperability also requires qualified references (I3) that link datasets to related entities such as software, instruments, or grants.
Together, these elements require APIs, ontology support, and metadata models capable of expressing rich, typed relationships rather than flat fields.
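The difference between flat fields and typed relationships can be sketched as follows. The relation names, identifiers, and URLs are illustrative placeholders rather than terms from a specific vocabulary; a production system would take them from a shared ontology.

```python
# Flat field: a human can read it, but a machine cannot tell what is
# referenced or how it relates to the dataset.
flat = {"related": "analysis scripts, grant 12345"}

# Qualified references (I3): every link states the relation and the type
# of the target. All names and URLs below are hypothetical.
qualified = [
    {"relation": "isProcessedBy", "target_type": "Software",
     "target": "https://doi.org/10.1234/example-software"},
    {"relation": "isFundedBy", "target_type": "Grant",
     "target": "https://funder.example.org/grants/12345"},
    {"relation": "isMeasuredWith", "target_type": "Instrument",
     "target": "https://instruments.example.org/id/spectrometer-1"},
]


def targets(links: list, target_type: str) -> list:
    """Return the identifiers of all linked entities of the given type."""
    return [link["target"] for link in links if link["target_type"] == target_type]


print(targets(qualified, "Software"))
```

With the flat field, the same question ("which software produced this dataset?") can only be answered by parsing free text.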
Reusable (R1–R1.3)
Finally, reuse depends on rich descriptions (R1) that capture domain‑specific details using community standards such as ISA, MIAPPE, or DDI. Metadata must include a clear usage license (R1.1) and detailed provenance (R1.2), and it must conform to the standards expected by the relevant community (R1.3).
Infrastructure must therefore support schema versioning, provenance capture, workflow integration, and controlled vocabularies for licensing.
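A minimal sketch of how a platform might check the R1.1 to R1.3 requirements before accepting a record is shown below. The field names `license`, `provenance`, and `schema`, and the function `missing_for_reuse`, are assumptions made for this example; real schemas will name and structure these elements differently.

```python
# Hypothetical mapping from required metadata fields to the FAIR
# sub-principle each one supports.
REQUIRED = {
    "license": "R1.1",
    "provenance": "R1.2",
    "schema": "R1.3",
}


def missing_for_reuse(metadata: dict) -> list:
    """List the required fields that are absent or empty in a record."""
    return [
        f"{field} ({principle})"
        for field, principle in REQUIRED.items()
        if not metadata.get(field)
    ]


incomplete = {"title": "Soil moisture measurements", "license": "CC-BY-4.0"}
print(missing_for_reuse(incomplete))
```

A presence check like this is only a first gate; validating that the license comes from a controlled vocabulary or that the provenance is complete needs schema-specific rules.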
What a FAIR‑Aligned RDM Infrastructure Looks Like
A FAIR platform is best understood as a stack of coordinated services, each addressing a different phase of the research data lifecycle. No single system can satisfy all FAIR requirements. Instead, FAIR emerges from the interplay between systems.
A typical FAIR‑aligned architecture includes:
Active research & collaboration
Where data is created, shared, and iterated on. E.g.:
- Nextcloud / ResearchDrive for collaborative storage
- MetaVox for early, schema‑based metadata
Metadata creation & FAIRification
Where metadata becomes structured, validated, and machine‑actionable. E.g.:
- FAIR Data Station for domain‑specific metadata (ISA, RDF)
- CKAN as a central metadata registry and search portal
Storage & orchestration
Where data is stored, moved, and governed. E.g.:
- Object Store, dCache, HPC filesystems as storage backends
- iRODS as the policy engine and metadata layer
- Globus for high‑performance transfers
Publication & discovery
Where curated datasets become part of the scholarly record. E.g.:
- Dataverse (e.g., DANS Data Stations)
- Invenio‑based repositories (SURF Data Repository, Zenodo)
- 4TU.ResearchData (Djehuty)
- Discovery portals combining repository search with metadata registries
Integration, identity & policy
Where the ecosystem becomes coherent. E.g.:
- SURF Research Data Connector for publishing workflows
- Federated identity (SURFconext, SRAM, eduGAIN)
- APIs (REST, OAI‑PMH, SPARQL, S3) for machine access
Together, these layers support all FAIR principles:
- Findable: PIDs, indexes, registries, APIs
- Accessible: Open protocols, AAI, policy‑based access
- Interoperable: RDF/JSON‑LD, ontologies, typed links
- Reusable: Community schemas, provenance, licensing, versioning
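To illustrate the machine-access side of these layers, the sketch below builds an OAI‑PMH harvesting request using only the Python standard library. The `ListRecords` verb and the `oai_dc` metadata prefix are part of the OAI‑PMH protocol; the base URL `https://repository.example.org/oai` and the function name `list_records_url` are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    from_date: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL.

    `from_date` limits harvesting to records changed on or after that date.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


url = list_records_url("https://repository.example.org/oai", from_date="2024-01-01")
print(url)
```

A discovery portal or registry could issue such requests periodically to keep its index in step with the repositories it aggregates.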
A Layered FAIR Architecture
Integration and Policy: The Glue That Makes FAIR Work
A FAIR platform is only FAIR if its components work together. This requires well‑defined integrations and a policy engine that governs how data and metadata move across the lifecycle. Storage, metadata services, workflow engines, and repositories must interoperate through stable interfaces and shared standards so that data remains coherent as it transitions between systems.
A FAIR‑aligned infrastructure therefore needs:
- Clear integration contracts (APIs, event streams, metadata mappings, PID handling)
- Consistent metadata models across all layers
- Automated workflows for validation, provenance, versioning, and access control
- A central policy engine (e.g., iRODS rules, workflow orchestrators) to coordinate:
  - ingest from active storage
  - metadata enrichment
  - transformations and quality checks
  - publication to repositories
  - synchronisation across catalogs and indexes
Without this orchestration layer, the platform becomes a loose collection of tools. With it, the platform becomes a coherent, predictable, auditable, FAIR‑by‑design ecosystem.
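The coordinating role of a policy engine can be sketched as an ordered pipeline with an audit trail. This is a plain-Python illustration under stated assumptions, not iRODS rule code: every function name and record field below is hypothetical, and each step stands in for a call to a real service.

```python
def ingest(ds: dict) -> dict:
    # Stand-in for copying data from active storage into managed storage.
    return {**ds, "ingested": True}


def enrich(ds: dict) -> dict:
    # Stand-in for metadata enrichment: normalise keywords to lower case.
    meta = dict(ds.get("metadata", {}))
    meta["keywords"] = [k.lower() for k in meta.get("keywords", [])]
    return {**ds, "metadata": meta}


def quality_check(ds: dict) -> dict:
    # Policy gate: refuse to continue when mandatory metadata is missing.
    if not ds.get("metadata", {}).get("title"):
        raise ValueError("title is required before publication")
    return ds


def publish(ds: dict) -> dict:
    # Stand-in for depositing the dataset in a repository.
    return {**ds, "published": True}


def sync_catalogs(ds: dict) -> dict:
    # Stand-in for updating catalogs and search indexes.
    return {**ds, "indexed": True}


PIPELINE = [ingest, enrich, quality_check, publish, sync_catalogs]


def run(ds: dict, steps=PIPELINE):
    """Apply each step in order and record which steps completed."""
    audit = []
    for step in steps:
        ds = step(ds)
        audit.append(step.__name__)
    return ds, audit


result, audit = run({"metadata": {"title": "Soil moisture", "keywords": ["Soil"]}})
print(audit)
```

Because the quality check raises before `publish` runs, a record without a title never reaches a repository, which is the kind of predictable, auditable behaviour the orchestration layer is meant to provide.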