%%{init: {"look": "handDrawn"}}%%
flowchart LR
A(PID) --> B{Resolver}
B --> C(URL)
Persistent Identifiers
Persistent Identifiers (PIDs) are long-lasting references used to uniquely identify digital objects such as research papers, datasets, software, or even researchers. Unlike regular web links, which may break or change over time, PIDs are designed to remain stable and to keep pointing to the correct resource, even if its location on the internet changes.
A PID works by assigning a unique identifier to an object and linking it to a record that stores important information (metadata) about that object, such as its title, creator, and most importantly its current location. You can think of this record, the so-called PID entry, as an ID card, or at least an address book entry, for the object. As with your own ID card, the metadata must be kept up to date. For example, if the object moves to a different website or repository, the metadata behind the PID needs to be updated so that the identifier continues to point to the correct place.
By providing stable, machine-readable references, PIDs help ensure that digital resources remain accessible, traceable, and properly attributed over time.
- Articles & Datasets: Digital Object Identifier (DOI)
- People: ORCID iD
- Institutions: Research Organization Registry (ROR)
Why not simply use Uniform Resource Locators (URLs)?
In the context of Persistent Identifiers (PIDs), the distinction between URLs and URIs becomes important because PIDs are designed to identify digital objects in a stable way, even if their location changes.
A URL (Uniform Resource Locator) is a specific kind of URI that points to the current location of a resource on the web. A URI (Uniform Resource Identifier) is the broader concept: it is any identifier used to uniquely identify a resource. Many PIDs are implemented as URIs because a URI can provide a stable identity for an object without tying it permanently to a specific location.
| Type | Example | What it does |
|---|---|---|
| URL (Uniform Resource Locator) | https://www.nature.com/articles/nphys1170 | The actual web address where the resource is currently located. |
| URI (Uniform Resource Identifier) | urn:isbn:9780131101630 | Identifies a resource (in this case a book by its ISBN) but does not necessarily tell you where to find it online. |
| PID (Persistent Identifier) | 10.1038/nphys1170 (a Digital Object Identifier (DOI)) | A long-lasting identifier for a digital object such as a research article. It stays the same even if the location of the object changes. |
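The three example strings in the table can be told apart mechanically. The following is a minimal sketch, not a complete validator: the function name `classify_identifier` and the DOI regular expression are assumptions made for illustration (the pattern is a common heuristic, not the official DOI syntax).

```python
import re
from urllib.parse import urlparse

# Heuristic pattern for a bare DOI ("10.<registrant>/<suffix>").
# Assumption for illustration; not the official DOI syntax definition.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def classify_identifier(text: str) -> str:
    """Return a rough label for the kinds of identifier strings used in the table."""
    scheme = urlparse(text).scheme
    if scheme in ("http", "https"):
        # A web address: a URI that also says where the resource is located.
        return "URL"
    if scheme:
        # Any other scheme (for example "urn"): identifies without locating.
        return "URI"
    if DOI_PATTERN.match(text):
        # No scheme at all, but shaped like a DOI.
        return "DOI"
    return "unknown"


for example in (
    "https://www.nature.com/articles/nphys1170",
    "urn:isbn:9780131101630",
    "10.1038/nphys1170",
):
    print(example, "->", classify_identifier(example))
```

Strictly speaking every URL is also a URI; the label "URI" above is used only for URIs that are not web locators.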
The persistent aspect of a PID is the identifier itself: it MUST NOT change. The location, and even the content, of the object it identifies can change over time. Even when the object is deleted, the PID continues to exist.
How does a PID work?
A PID typically works through a resolution system or resolver:
- The PID uniquely identifies the object.
- The PID resolves through an infrastructure service.
- The service redirects the user to the current URL where the object is located or, if the object itself is not reachable through HTTP, to a metadata entry that provides more information, such as the landing page of a data repository.
For example, a Digital Object Identifier (DOI) such as 10.1000/182 identifies a publication. When you access it via https://doi.org/10.1000/182, the DOI service resolves the identifier and redirects you to the current URL of the article on a publisher or repository website. In this example, the DOI is an identifier for The DOI Handbook. The DOI service resolves the identifier to the current URL which, as of writing this, is https://www.doi.org/the-identifier/resources/handbook/.
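As a rough sketch of the example above, the resolver link can be built from the bare DOI, and the redirect can be followed with Python's standard library. The function names `doi_to_resolver_url` and `resolve` are assumptions made for this illustration. `resolve` needs network access and is therefore only defined here, not called.

```python
import urllib.request
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_resolver_url(doi: str) -> str:
    """Build the HTTPS link that asks the DOI resolver to redirect to the object."""
    # Percent-encode characters that are unsafe in a URL path; "/" is kept
    # because it separates the DOI prefix from the suffix.
    return DOI_RESOLVER + quote(doi, safe="/")


def resolve(doi: str) -> str:
    """Follow the resolver's redirects and return the final URL (needs network)."""
    with urllib.request.urlopen(doi_to_resolver_url(doi)) as response:
        return response.geturl()


print(doi_to_resolver_url("10.1000/182"))
```

The URL returned by `resolve` depends on the current content of the PID record, so it may differ from the location quoted in the text.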
This separation is what makes PIDs persistent. If the resource moves to a different website, the URL can be updated in the PID record, while the identifier itself remains the same, ensuring that citations and references continue to work over time.
A special case of updating the location is the so-called tombstone. If the object an identifier points to is deleted, the identifier continues to exist and should be redirected to a tombstone page.
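The separation between identifier and location can be illustrated with a toy, in-memory resolver. This is a sketch under stated assumptions: the class `ToyResolver`, the identifier `10.9999/example-dataset` and all `example.org` URLs are hypothetical, and a real resolver is a distributed service rather than a dictionary.

```python
class ToyResolver:
    """In-memory stand-in for a PID resolution service (illustration only)."""

    def __init__(self) -> None:
        # Maps each PID to the URL currently stored in its record.
        self._locations: dict[str, str] = {}

    def register(self, pid: str, url: str) -> None:
        if pid in self._locations:
            raise ValueError(f"{pid} is already registered")
        self._locations[pid] = url

    def update_location(self, pid: str, new_url: str) -> None:
        if pid not in self._locations:
            raise KeyError(f"{pid} is not registered")
        self._locations[pid] = new_url

    def resolve(self, pid: str) -> str:
        return self._locations[pid]


resolver = ToyResolver()
pid = "10.9999/example-dataset"  # hypothetical identifier
resolver.register(pid, "https://old-repository.example.org/data/42")
first_target = resolver.resolve(pid)

# The resource moves: only the record changes, the identifier stays the same.
resolver.update_location(pid, "https://new-repository.example.org/records/42")
moved_target = resolver.resolve(pid)

# The resource is deleted: the PID is kept and now points to a tombstone page.
resolver.update_location(pid, "https://new-repository.example.org/tombstones/42")
tombstone_target = resolver.resolve(pid)

print(first_target, moved_target, tombstone_target, sep="\n")
```

Anyone who cited `pid` before the move still reaches a meaningful page afterwards, because only the record behind the identifier was edited.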
PIDs help to prevent Link Rot.
%%{init: {
"look": "handDrawn",
"flowchart": { "htmlLabels": false }
}}%%
flowchart LR
LINK["Create direct URL"]
USE["Users access URL"]
MOVE["Resource moves"]
BREAK["URL breaks 404"]
LINK --> USE --> MOVE --> BREAK
%% Style for the 404 node
style BREAK fill:#ffcccc,stroke:#cc0000,color:#000000
style MOVE fill:#e0f7e9,stroke:#2e8b57,color:#000000
%%{init: {
"look": "handDrawn",
"themeVariables": {
"fontSize": "22px",
"fontSizeSection": "28px"
},
"flowchart": { "htmlLabels": false }
}}%%
flowchart LR
MINT["Mint PID"]
STORE["Store Handle record"]
RESOLVE["Resolve PID"]
ACCESS["Access resource"]
MOVE_PID["Resource moves"]
UPDATE["Update record"]
MINT --> STORE --> RESOLVE --> ACCESS --> MOVE_PID --> UPDATE --> RESOLVE
%% Highlight the two nodes
style MOVE_PID fill:#e0f7e9,stroke:#2e8b57,color:#000000
style UPDATE fill:#e0f7e9,stroke:#2e8b57,color:#000000
We explain the resolving mechanism in detail in Resolution of Persistent Identifiers.
How do we ensure that PIDs are globally unique?
To reliably identify objects, the PID needs to be globally unique. Persistent Identifiers are designed with structured namespaces and controlled registration systems to guarantee this global uniqueness. A good example is the Digital Object Identifier (DOI).
1. Structured Identifier Format
A DOI has two main parts separated by a slash:
prefix/suffix
Example:
10.1038/nphys1170
- Prefix (10.1038) – identifies the organization that registered the DOI.
- Suffix (nphys1170) – chosen by that organization to uniquely identify a specific object.
This structure ensures uniqueness because different organizations control different prefixes.
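The prefix/suffix structure can be taken apart with a few lines of code. A minimal sketch, assuming a helper named `split_doi` (not part of any DOI library); it splits at the first slash only, since a suffix may itself contain slashes, and it checks for the leading `10.` seen in the examples here.

```python
def split_doi(doi: str) -> tuple[str, str]:
    """Split a bare DOI into (prefix, suffix) at the first slash."""
    prefix, separator, suffix = doi.partition("/")
    if not separator or not suffix or not prefix.startswith("10."):
        raise ValueError(f"not a DOI of the form prefix/suffix: {doi!r}")
    return prefix, suffix


prefix, suffix = split_doi("10.1038/nphys1170")
print("prefix:", prefix)
print("suffix:", suffix)
```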
3. Responsibility of the Prefix Holder
Once an organization receives a prefix, it is responsible for ensuring that the suffixes it creates are unique within its prefix. For example:
10.5281/zenodo.3238330
10.5281/zenodo.3257173
Both belong to the same prefix (10.5281, which is owned by Zenodo) but identify different objects.
4. Registry Validation
When a DOI is registered, the system checks the central registry to ensure that the full identifier has not already been assigned.
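Delegated prefixes and the registry check can be sketched together as follows. `ToyRegistry` and its method names are assumptions made for illustration, and the organization name "Other Repository" is hypothetical; real registration agencies expose their own interfaces, which this sketch does not attempt to reproduce.

```python
class ToyRegistry:
    """Illustrative central registry with delegated prefixes."""

    def __init__(self) -> None:
        self._prefix_owner: dict[str, str] = {}
        self._assigned: set[str] = set()

    def delegate_prefix(self, prefix: str, organization: str) -> None:
        # Each prefix is handed out once, to exactly one organization.
        if prefix in self._prefix_owner:
            raise ValueError(f"prefix {prefix} is already delegated")
        self._prefix_owner[prefix] = organization

    def register(self, organization: str, identifier: str) -> None:
        prefix = identifier.partition("/")[0]
        # Only the prefix holder may create identifiers under its prefix.
        if self._prefix_owner.get(prefix) != organization:
            raise PermissionError(f"{organization} does not hold prefix {prefix}")
        # Registry validation: the full identifier must not exist yet.
        if identifier in self._assigned:
            raise ValueError(f"{identifier} is already assigned")
        self._assigned.add(identifier)


registry = ToyRegistry()
registry.delegate_prefix("10.5281", "Zenodo")
registry.register("Zenodo", "10.5281/zenodo.3238330")
registry.register("Zenodo", "10.5281/zenodo.3257173")
print("registered two identifiers under prefix 10.5281")
```

Registering `10.5281/zenodo.3238330` a second time, or registering under `10.5281` as a different organization, raises an error in this sketch.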
Summary
Global uniqueness is ensured through:
- A structured identifier format (prefix + suffix)
- Central governance and registration agencies
- Delegated namespaces for organizations
- Registry validation when identifiers are created
This layered approach allows billions of objects to receive unique identifiers while keeping the system scalable and reliable.
Minting Persistent Identifiers (PIDs)
Minting a Persistent Identifier (PID) refers to the process of creating and registering a new, globally unique identifier for a digital object. This object can be a research article, dataset, software package, researcher, organization, or other scholarly resource.
Minting a PID ensures that the object can be consistently identified, cited, and discovered over time, regardless of where the object is stored.
How PID Minting Works
The process of minting a PID typically involves several steps:
1. Create the identifier. A unique identifier is generated according to the structure of the PID system. For example, in the Digital Object Identifier (DOI) system this means creating a prefix and suffix combination.
2. Register the PID with a registration authority. The new PID is registered with an official infrastructure provider or registration agency, such as DataCite or Crossref. This ensures the identifier is globally unique and recorded in a central registry.
3. Attach metadata. Metadata describing the object is submitted along with the identifier. This may include information such as the title, creator, publication year, and the URL where the object is located.
4. Link the PID to a resolvable location. The PID is associated with a URL where the object can be accessed. When someone uses the PID (for example through a resolver), they are redirected to this location.
Minting a PID is therefore not only about creating an identifier—it also involves registering and maintaining the information that allows the identifier to function over time.
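The minting steps can be walked through in a small, self-contained sketch. Everything named here is an assumption for illustration: `PidRecord`, `mint`, the prefix `10.9999`, the random UUID suffix and the `example.org` URL. Real minting goes through the interface of a registration agency such as DataCite or Crossref, which this sketch does not call.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class PidRecord:
    identifier: str
    url: str
    metadata: dict[str, str] = field(default_factory=dict)


def mint(
    registry: dict[str, PidRecord],
    prefix: str,
    url: str,
    metadata: dict[str, str],
) -> PidRecord:
    """Create, register, describe and link a new PID in a toy registry."""
    # Step 1: create the identifier (a random suffix is one possible scheme).
    identifier = f"{prefix}/{uuid.uuid4()}"
    # Step 2: register it, refusing identifiers that already exist.
    if identifier in registry:
        raise ValueError(f"{identifier} is already registered")
    # Steps 3 and 4: attach the metadata and link the resolvable location.
    record = PidRecord(identifier=identifier, url=url, metadata=dict(metadata))
    registry[identifier] = record
    return record


toy_registry: dict[str, PidRecord] = {}
record = mint(
    toy_registry,
    prefix="10.9999",  # hypothetical prefix
    url="https://repository.example.org/datasets/1",
    metadata={"title": "Example dataset", "creator": "Doe, Jane"},
)
print(record.identifier, "->", record.url)
```

The record stays in the registry after minting, which reflects the point above: the identifier only keeps working if its URL and metadata are maintained.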
Note: The metadata of a PID is always publicly accessible!
PID Profiles and PID Information Types
To make Persistent Identifiers truly useful in digital infrastructures, it is not enough to simply create the identifier itself. Systems also need a clear way to define what information is associated with a PID and how that information should be structured and interpreted. This is where PID profiles and PID information types play an important role.
PID Profiles
A PID profile defines how a specific type of PID should be implemented and used within a community or infrastructure. It describes the expected structure, metadata, and relationships associated with identifiers for a particular type of object.
PID profiles help ensure that identifiers are used consistently and interoperably across systems.
A PID profile typically specifies:
- The type of object the PID represents (e.g., dataset, person, organization, instrument)
- Required and optional metadata fields
- Relationships to other identifiers
- Rules for updating and maintaining the PID record
- Resolution expectations
For example, a dataset registered through DataCite using a Digital Object Identifier (DOI) follows a defined metadata profile that includes fields such as title, creator, publisher, publication year, and resource type.
PID profiles therefore provide guidance for implementers, ensuring that identifiers are not only unique but also meaningful and usable across systems.
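One way to picture a PID profile is as a set of required and optional fields against which a metadata record is checked. The sketch below is an assumption-laden illustration: `PidProfile`, its `problems` method and the snake_case field names are hypothetical and loosely mirror the fields listed above; they are not the property names of the actual DataCite schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PidProfile:
    object_type: str
    required: frozenset[str]
    optional: frozenset[str] = frozenset()

    def problems(self, metadata: dict[str, str]) -> list[str]:
        """List how a metadata record deviates from this profile."""
        present = set(metadata)
        missing = sorted(self.required - present)
        unknown = sorted(present - self.required - self.optional)
        return [f"missing required field: {name}" for name in missing] + [
            f"unknown field: {name}" for name in unknown
        ]


dataset_profile = PidProfile(
    object_type="dataset",
    required=frozenset(
        {"title", "creator", "publisher", "publication_year", "resource_type"}
    ),
    optional=frozenset({"related_identifier"}),
)

incomplete = {"title": "Example dataset", "creator": "Doe, Jane", "colour": "blue"}
for problem in dataset_profile.problems(incomplete):
    print(problem)
```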
PID Information Types
PID information types define the individual pieces of information stored in a PID record and how they are represented in the PID infrastructure. They describe the structure and semantics of the data associated with a PID.
In systems such as the Handle System (see PID services), a PID record consists of a set of typed values, where each value has a specific meaning.
Examples of information types include:
- Location information – the URL where the resource or its landing page can be accessed
- Administrative information – ownership or management of the PID
- Checksum or integrity information – verifying that the resource has not been altered or corrupted
- Metadata references – links to detailed metadata records
- Relationships – links to other identifiers such as people, organizations, or related datasets
Each information type has a defined structure and meaning, allowing machines and services to interpret the information consistently.
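A record made of typed values can be sketched as a list of (type, value) pairs that a client filters by type. This is an illustration only: `TypedValue`, `values_of_type`, the type names `CHECKSUM`, `METADATA_REF` and `RELATED_PID`, the identifiers and the `example.org` URLs are all hypothetical. Real infrastructures define and register their own information types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedValue:
    type: str   # which information type this entry belongs to
    value: str  # the content, interpreted according to that type


def values_of_type(record: list[TypedValue], wanted: str) -> list[str]:
    """Return all values in the record that carry the given type."""
    return [entry.value for entry in record if entry.type == wanted]


record = [
    TypedValue("URL", "https://repository.example.org/datasets/1"),
    TypedValue("CHECKSUM", "sha256:<digest of the file>"),
    TypedValue("METADATA_REF", "https://repository.example.org/datasets/1/metadata"),
    TypedValue("RELATED_PID", "10.9999/related-article"),
    TypedValue("RELATED_PID", "10.9999/related-software"),
]

print(values_of_type(record, "URL"))
print(values_of_type(record, "RELATED_PID"))
```

Because every entry names its type, a machine can pick out the location or the related identifiers without guessing at the meaning of a value.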
How PID Profiles and Information Types Work Together
PID profiles and PID information types complement each other:
- PID profiles define what information should exist for a particular type of object.
- PID information types define how that information is stored and structured within the PID record.
Together, they enable PID infrastructures to support reliable identification, rich metadata, and automated linking between research entities, forming the basis for interoperable research data ecosystems.
Together, they can be seen as a sort of metadata schema for PIDs.
Infrastructures that implement PIDs with PID profiles and PID information types are Coscine (Collaborative Scientific Integration Environment) and EUDAT (European Data Infrastructure). Coscine implements the PID kernel information as defined by the Research Data Alliance (RDA) in their Recommendations. EUDAT does not implement a standard for PID profiles and information types. PID profiles in EUDAT are customised to community needs and include timestamps, checksums and data classification.
Using Persistent Identifiers (PIDs) for Non-Resolvable or Sensitive Data
Persistent Identifiers are often accessed through resolvers (for example https://doi.org/...), which direct users to a resource that can be accessed through HTTP. However, some resources are not accessible through HTTP, or are sensitive and only accessible after successful authorisation. This includes offline data, restricted datasets, physical samples, or resources stored in systems without web access. In those cases the resolver cannot serve the resource itself.
Instead, it is advisable to create a landing page that tells users and machines how to access the data. In the chapter PID services we show how the PID entry itself can serve as a landing page.
The key principle is that a PID identifies the object, not necessarily its downloadable location.
1. Use the PID to Identify the Object
Even if the object itself cannot be accessed online, a PID still provides a stable reference. For example, a dataset stored in a secure environment or a physical specimen can still receive a Digital Object Identifier (DOI).
The PID ensures that the object can be:
- Cited in publications
- Referenced in metadata records
- Linked to related research outputs
2. Provide a Landing Page Instead of the Data
For non-resolvable or restricted data, the PID typically resolves to a metadata landing page rather than the data itself.
This landing page may include:
- Description of the resource
- Creator information
- Access conditions
- Instructions for requesting access
- Related identifiers
Many repositories using services such as DataCite follow this pattern.
Example:
%%{init: {
"look": "handDrawn",
"flowchart": { "htmlLabels": false }
}}%%
flowchart LR
A(PID) --> B(Landing page)
B --> C(Instructions for accessing the data)
This ensures the identifier remains useful even when the data cannot be openly retrieved.
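The landing-page pattern amounts to a simple rule for choosing the resolution target. A minimal sketch, assuming a hypothetical record type `AccessRecord` and function `resolution_target`; the identifiers and `example.org` URLs are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccessRecord:
    pid: str
    landing_page: str
    data_url: Optional[str] = None  # None for offline or physical objects
    restricted: bool = False        # True when access requires authorisation


def resolution_target(record: AccessRecord) -> str:
    """Pick where the PID should send a visitor."""
    if record.data_url is not None and not record.restricted:
        return record.data_url
    # Restricted, offline or physical: send visitors to the landing page,
    # which describes the object and explains how to request access.
    return record.landing_page


open_data = AccessRecord(
    pid="10.9999/open",
    landing_page="https://repository.example.org/records/1",
    data_url="https://repository.example.org/records/1/data.csv",
)
sensitive = AccessRecord(
    pid="10.9999/sensitive",
    landing_page="https://repository.example.org/records/2",
    data_url="https://secure.example.org/records/2/data.csv",
    restricted=True,
)
print(resolution_target(open_data))
print(resolution_target(sensitive))
```

Many repositories resolve to the landing page even for openly available data; the direct data link in the first case is shown only to contrast the two situations.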
3. Use PIDs for Physical or Offline Objects
PIDs are also commonly used for non-digital resources, such as:
- biological specimens
- museum collections
- laboratory samples
- instruments or facilities
In these cases, the PID identifies the object and resolves to documentation about the object.
4. Use Persistent Identifier Schemes That Do Not Depend on HTTP
Some identifier schemes are designed primarily for identification rather than web resolution, such as identifiers based on the Handle System or URN-based schemes. These identifiers can still be stored and exchanged in metadata systems even when direct web access is not possible. We will give some examples in the hands-on section PID services.
A PID does not require the underlying object to be directly accessible online. Instead, the PID provides a stable identifier and a metadata record, ensuring that the object can still be found, cited, and described, even if access requires special permissions or occurs outside the web.
Assigning a PID marks the moment when a dataset enters the scholarly record. From here, the responsibility shifts to the systems that keep that identifier usable over time. In the following chapter, we look at the PID services that mint identifiers, store their metadata, and ensure that they continue to resolve even as data and infrastructures evolve.
References
General
- https://projects.tib.eu/pid-service/en/persistent-identifiers/persistent-identifiers-pids/
- https://servicedesk.surf.nl/wiki/spaces/WIKI/pages/102827225/PIDs+Setting+up+your+PID+structure
PID profiles and Information types
- https://www.rd-alliance.org/wp-content/uploads/2015/01/PIT20final20report.pdf
- Weigel, T., Plale, B., Parsons, M., Zhou, G., Luo, Y., Schwardmann, U., Quick, R., Hellström, M., Kurakawa, K. (2018). RDA Recommendation on PID Kernel Information. Research Data Alliance. https://doi.org/10.15497/rda00031