The Digital Object

A digital (or data) object is a self‑contained, structured unit of information that exists in electronic form. It is more than just raw bits: it is a combination of content, context, and identity that allows computers and humans to store, interpret, preserve, and reuse information reliably.

At its core, a digital object consists of three interdependent components:

%%{init: {
  "look": "handDrawn",
  "flowchart": { "htmlLabels": false }
}}%%

flowchart TD
    A["Digital Object"]
    B["Files"]
    C["Metadata"]
    D["Persistent Identifier"]

    A --> B
    A --> C
    A --> D
Figure 4.1: The Digital Object

How These Components Work Together

These three parts form a single functional entity:

  • The files hold the content.
  • The metadata explains the content and enables interpretation.
  • The persistent identifier ensures the object can always be located and referenced.

If any one of these components is missing, the digital object is incomplete. Files without metadata are difficult or impossible to interpret; metadata without files is an empty description; and neither files nor metadata are useful if they cannot be reliably identified and located.
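
As an illustration only, the three components can be modelled as one small data structure with a completeness check. The class name `DigitalObject`, its field names, and the sample identifier are hypothetical and chosen for this sketch; they do not correspond to any particular repository software.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical container for the three components of a digital object."""
    pid: str = ""                                 # persistent identifier
    metadata: dict = field(default_factory=dict)  # context needed to interpret the files
    files: list = field(default_factory=list)     # paths or URLs of the content

    def missing_components(self) -> list:
        """Return the names of the components that are absent."""
        missing = []
        if not self.files:
            missing.append("files")
        if not self.metadata:
            missing.append("metadata")
        if not self.pid:
            missing.append("persistent identifier")
        return missing

    def is_complete(self) -> bool:
        """An object is complete only when all three components are present."""
        return not self.missing_components()


complete = DigitalObject(
    pid="example-prefix/abc123",  # made-up identifier, not a real PID
    metadata={"title": "Sea surface temperature, 2020"},
    files=["sst_2020.nc"],
)
orphaned = DigitalObject(files=["sst_2020.nc"])

print(complete.is_complete())         # True
print(orphaned.missing_components())  # ['metadata', 'persistent identifier']
```

The second object holds content but cannot be interpreted or cited, which is the incomplete state described above.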

The diagrams below illustrate two common patterns for connecting a persistent identifier (PID), the underlying files, and their metadata.

%%{init: {
  "look": "handDrawn",
  "flowchart": { "htmlLabels": false }
}}%%

flowchart LR

    F["Files"]
    M["Metadata"]
    P["Persistent Identifier (PID)"]

    P --> |"points to"| F
    F --> |"contains or points to"| M
    M --> |"contains"| P
Figure 4.2: PID points to a file which contains or points to the metadata.

In the first pattern, the PID resolves directly to a file, which then contains or links to the relevant metadata.

%%{init: {
  "look": "handDrawn",
  "flowchart": { "htmlLabels": false }
}}%%

flowchart LR

    F["Files"]
    M["Metadata"]
    P["Persistent Identifier (PID)"]

    M --> |"contains"| P
    P --> |"points to"| M
    M --> |"has access info to"| F
Figure 4.3: PID points to metadata or a landing page that contains information on how to access the file(s).

In the second pattern, the PID resolves to a metadata record or landing page, which provides the information needed to locate and access the file(s).
Both approaches are widely used; the choice depends on repository design and access requirements.
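
The two patterns can be contrasted with a small sketch that follows a PID to both the files and the metadata. The dictionaries below are in-memory stand-ins for a PID resolver and a repository; every identifier, path, and key name (`kind`, `metadata_link`, `file_locations`) is an assumption made for this example rather than part of any real resolver API.

```python
# Hypothetical resolver table: what each PID points to.
PID_TARGETS = {
    "pid:file-first": {"kind": "file", "location": "store/a.csv"},
    "pid:metadata-first": {"kind": "metadata", "location": "records/b.json"},
}

# Hypothetical file store: a file may carry a link to its metadata record.
FILES = {
    "store/a.csv": {"metadata_link": "records/a.json"},
    "store/b.csv": {},
}

# Hypothetical metadata records: each one contains the PID of its object.
METADATA = {
    "records/a.json": {"title": "Dataset A", "pid": "pid:file-first"},
    "records/b.json": {
        "title": "Dataset B",
        "pid": "pid:metadata-first",
        "file_locations": ["store/b.csv"],
    },
}


def resolve(pid: str) -> dict:
    """Return the files and metadata of an object, whichever the PID reaches first."""
    target = PID_TARGETS[pid]
    if target["kind"] == "file":
        # First pattern: PID -> file -> metadata
        file_location = target["location"]
        metadata = METADATA[FILES[file_location]["metadata_link"]]
        return {"files": [file_location], "metadata": metadata}
    # Second pattern: PID -> metadata record -> file access information
    metadata = METADATA[target["location"]]
    return {"files": metadata["file_locations"], "metadata": metadata}


print(resolve("pid:file-first")["metadata"]["title"])  # Dataset A
print(resolve("pid:metadata-first")["files"])          # ['store/b.csv']
```

In both cases the caller ends up with the same three pieces of information; only the order in which they are reached differs.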

Why Digital Objects Matter

Digital objects are the foundation of modern research data management systems. Their structured nature enables:

  • Preservation — keeping knowledge accessible as formats, platforms, and storage systems evolve.
    Files can be migrated, metadata enriched, and storage reorganized without breaking the object’s identity.

  • Interoperability — ensuring different tools and repositories can exchange and interpret the object consistently.
    Content, metadata, and identifiers travel together, preserving meaning across systems.

  • Reusability and sharing — supporting research, education, and collaboration.
    Clear metadata, stable identifiers, and standardized structures make digital objects easy to understand and integrate into new workflows.

  • Reliable citation and tracking — through persistent identifiers that remain valid even as files move or metadata changes.
    PIDs support accurate citation, versioning, provenance tracking, and automated linking.

  • Robust storage strategies — enabling multiple replicas across diverse storage environments while maintaining a single, consistent identity.
    Replicas of files on different storage systems improve resilience and performance without compromising citation or access.
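
The idea that storage can change while identity stays fixed can be sketched with a toy registry that maps one PID to several replica locations. The `PidRegistry` class, its method names, and the storage URLs are hypothetical; production PID systems expose their own interfaces.

```python
class PidRegistry:
    """Toy registry: one PID maps to any number of replica locations."""

    def __init__(self):
        self._locations = {}

    def register(self, pid: str, location: str) -> None:
        """Add a replica location for a PID."""
        self._locations.setdefault(pid, []).append(location)

    def migrate(self, pid: str, old: str, new: str) -> None:
        """Replace one replica location; the PID itself never changes."""
        replicas = self._locations[pid]
        replicas[replicas.index(old)] = new

    def resolve(self, pid: str) -> list:
        """Return a copy of all current replica locations for a PID."""
        return list(self._locations[pid])


registry = PidRegistry()
registry.register("pid:example-1", "disk://site-a/data.nc")
registry.register("pid:example-1", "tape://site-b/data.nc")

# Storage is reorganized, but citations using "pid:example-1" stay valid.
registry.migrate("pid:example-1", "disk://site-a/data.nc", "s3://bucket/data.nc")

print(registry.resolve("pid:example-1"))
# ['s3://bucket/data.nc', 'tape://site-b/data.nc']
```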

Additional lifecycle functions include:

  • Versioning — recording updates or corrections while keeping earlier states accessible and citable.
  • Fixity — verifying file integrity through checksums or hashes to detect corruption or tampering.
  • Provenance — documenting how data came to be, including transformations and dependencies.
    This includes relationships such as raw data → processed data → analysed data, making the object’s history transparent and reproducible.
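
The fixity check from the list above can be sketched with Python's standard `hashlib` module. SHA-256 is used here as one common choice, not a requirement, and the helper names `sha256_of` and `verify_fixity` are hypothetical. The example writes a temporary file, records its checksum, and then simulates corruption.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_checksum: str) -> bool:
    """Compare the current checksum with the one recorded earlier."""
    return sha256_of(path) == recorded_checksum


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "data.bin"
    data_file.write_bytes(b"original content")

    # At ingest, the checksum would typically be stored in the metadata.
    recorded = sha256_of(data_file)
    intact = verify_fixity(data_file, recorded)

    # Simulate silent corruption by changing a single character.
    data_file.write_bytes(b"original c0ntent")
    still_intact = verify_fixity(data_file, recorded)

print(intact, still_intact)  # True False
```
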

%%{init: {
  "look": "handDrawn",
  "flowchart": { "htmlLabels": false }
}}%%

flowchart RL

    subgraph RAW["Raw Data Object"]
        direction TB
        R_F["Files"]
        R_M["Metadata"]
        R_P["PID"]
    end

    subgraph PROC["Processed Data Object"]
        direction TB
        P_F["Files"]
        P_M["Metadata"]
        P_P["PID"]
    end
    
    subgraph ANAL["Analysed Data Object"]
        direction TB
        A_F["Files"]
        A_M["Metadata"]
        A_P["PID"]
    end

    P_M --> |"created from"| R_P
    A_M --> |"created from"| P_P
Figure 4.4: Data lineage represented as a chain of Digital Objects (raw → processed → analysed).
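
A lineage chain of this kind can be traversed by following the "created from" links from one object's metadata to the PID of its source. The sketch below assumes each metadata record stores at most one source PID under a hypothetical `created_from` key; real provenance models often allow several sources per object.

```python
# Hypothetical metadata records keyed by PID.
OBJECTS = {
    "pid:raw": {"title": "Raw data", "created_from": None},
    "pid:processed": {"title": "Processed data", "created_from": "pid:raw"},
    "pid:analysed": {"title": "Analysed data", "created_from": "pid:processed"},
}


def lineage(pid: str) -> list:
    """Return the chain of PIDs from the given object back to its origin."""
    chain = []
    seen = set()
    current = pid
    while current is not None:
        if current in seen:
            # Guard against malformed records that reference each other.
            raise ValueError(f"cycle in provenance at {current}")
        seen.add(current)
        chain.append(current)
        current = OBJECTS[current]["created_from"]
    return chain


print(lineage("pid:analysed"))
# ['pid:analysed', 'pid:processed', 'pid:raw']
```

Because the links point at PIDs rather than file locations, the chain stays intact when any object in it is moved or replicated.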

In Essence

A digital object is content + context + identity.
The files supply the content, the metadata provides the meaning needed to interpret and manage that content, and the persistent identifier anchors a stable, durable identity. Together, these components make digital information portable across systems, understandable over time, and resilient to changes in storage or technology. The chapters that follow examine each of these elements in depth, along with the infrastructure that enables them to function as a coherent whole.

References

  • https://www.lisedunetwork.com/what-is-a-digital-object/
  • https://wiki.dpconline.org/index.php?title=4.2.1.3.1_Representation_Information_Types
  • https://archive.rd-alliance.org/sites/default/files/2-kahn-do-talk.pdf
  • https://doi.org/10.1007/s00799-005-0128-x