In addition to extracting file contents, gocfl provides the capability to export the metadata stored within an OCFL object into a separate JSON file. This is particularly useful for further processing or indexing in external systems.
extractmeta Command
The extractmeta command reads the metadata of the object (including technical metadata from extensions) and writes it to the specified target.
gocfl extractmeta [options] [path to storage root or object]
To extract the metadata of our test object into a JSON file, we use:
gocfl --log-level DEBUG --config ./gocfl/config/gocfl.toml extractmeta ./gocfl/temp/test42 -i urn:nbn:de:gbv:42-test1 --output ./gocfl/temp/meta.json
Explanation:
--log-level DEBUG: Displays detailed information during the process.
--config ./gocfl/config/gocfl.toml: The gocfl configuration file to use.
extractmeta: The command for extracting metadata.
./gocfl/temp/test42: The path to the storage root.
-i urn:nbn:de:gbv:42-test1: The ID of the object whose metadata is to be extracted.
--output ./gocfl/temp/meta.json: Specifies the file path where the metadata will be written.
--obfuscate (optional, not used in the command above): Exports the metadata in an anonymized form. Path and file names are replaced with random UUIDs, while technical metadata such as file sizes, PRONOM IDs, and MIME types are preserved.
The extracted JSON file (meta.json) essentially contains three types of information:
[...] NNNN-indexer).
The meta.json provides an aggregated view of the OCFL object. While the standard inventory is more of a “flat” list of checksums and paths, extractmeta combines this information with the results from various extensions.
[...] meta.json:
[...] versions block in the inventory).
[...] v1/content/data/image.jpg).
Object Header:
{
  "ID": "urn:nbn:de:gbv:42-test1",
  "DigestAlgorithm": "sha512",
  "Head": "v1",
  "Versions": {
    "v1": {
      "Created": "2026-03-15T15:04:33Z",
      "Message": "initial commit",
      "Name": "User OCFL",
      "Address": "mailto:ocfl.user@unibas.ch"
    }
  },
  ...
}
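As a rough illustration of how this header could be consumed, the following Python sketch looks up the head version and its commit details. It assumes the field names shown in the example above; the helper name `head_version_info` and the inline sample are illustrative and not part of gocfl.

```python
import json
from datetime import datetime

# Header copied from the example above; the elided remainder is left out.
SAMPLE_HEADER = json.loads("""
{
  "ID": "urn:nbn:de:gbv:42-test1",
  "DigestAlgorithm": "sha512",
  "Head": "v1",
  "Versions": {
    "v1": {
      "Created": "2026-03-15T15:04:33Z",
      "Message": "initial commit",
      "Name": "User OCFL",
      "Address": "mailto:ocfl.user@unibas.ch"
    }
  }
}
""")


def head_version_info(meta: dict) -> dict:
    """Return the object ID, the head version name, and its commit details."""
    head = meta["Head"]
    version = meta["Versions"][head]
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    created = datetime.fromisoformat(version["Created"].replace("Z", "+00:00"))
    return {
        "id": meta["ID"],
        "head": head,
        "created": created,
        "message": version["Message"],
        "user": version["Name"],
    }


info = head_version_info(SAMPLE_HEADER)
print(info["id"], info["head"], info["created"].isoformat(), info["message"])
```

For a real export, the same function could be applied to the result of `json.load` on the file written via --output.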
File Entry with Extensions:
This shows how extension data (NNNN-filesystem, NNNN-indexer, NNNN-thumbnail) is mapped directly to the file:
"1082b5603213566c3...": {
  "InternalName": ["v1/content/data/image/IMG_6914.jpg"],
  "VersionName": {
    "v1": ["data/image/IMG_6914.jpg"]
  },
  "Extension": {
    "NNNN-filesystem": {
      "v1": [{
        "path": "data/image/IMG_6914.jpg",
        "meta": {
          "size": 3696602,
          "mTime": "2023-11-27T16:54:03+01:00"
        }
      }]
    },
    "NNNN-indexer": {
      "mimetype": "image/jpeg",
      "pronom": "fmt/43",
      "type": "image"
    },
    "NNNN-thumbnail": {
      "id": "internal",
      "filename": "metadata/thumbnails/v1/00002.png"
    }
  }
}
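The mapping of extension data to a single file can be sketched in Python as follows. This is a minimal example under the assumption that every entry follows the layout shown above; the function name `summarize_entry` is hypothetical, and extensions that are not configured are simply reported as `None`.

```python
# File entry copied from the example above. The digest key is elided in the
# source, so the entry is used here without its key.
SAMPLE_ENTRY = {
    "InternalName": ["v1/content/data/image/IMG_6914.jpg"],
    "VersionName": {"v1": ["data/image/IMG_6914.jpg"]},
    "Extension": {
        "NNNN-filesystem": {
            "v1": [{
                "path": "data/image/IMG_6914.jpg",
                "meta": {"size": 3696602, "mTime": "2023-11-27T16:54:03+01:00"},
            }]
        },
        "NNNN-indexer": {"mimetype": "image/jpeg", "pronom": "fmt/43", "type": "image"},
        "NNNN-thumbnail": {"id": "internal", "filename": "metadata/thumbnails/v1/00002.png"},
    },
}


def summarize_entry(entry: dict, version: str) -> dict:
    """Collect the technical metadata of one file entry for a given version."""
    ext = entry.get("Extension", {})
    indexer = ext.get("NNNN-indexer", {})
    # NNNN-filesystem is keyed by version and holds a list of path records.
    fs_records = ext.get("NNNN-filesystem", {}).get(version, [])
    size = fs_records[0]["meta"]["size"] if fs_records else None
    return {
        "paths": entry.get("VersionName", {}).get(version, []),
        "mimetype": indexer.get("mimetype"),
        "pronom": indexer.get("pronom"),
        "size": size,
        "thumbnail": ext.get("NNNN-thumbnail", {}).get("filename"),
    }


summary = summarize_entry(SAMPLE_ENTRY, "v1")
print(summary)
```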
Global Extensions Section:
At the end of the meta.json, there is a summary section for extensions that provide global information for the entire object:
{
  ...
  "Extension": {
    "NNNN-content-subpath": {
      "content": {
        "path": "data",
        "description": "Payload of archival object"
      },
      "metadata": {
        "path": "metadata",
        "description": "additional semantic metadata"
      }
    },
    "NNNN-metafile": {
      "title": "Some OCFL Testfiles (initial version)",
      "authors": ["Doe, John", "Doe, Jane"],
      "description": "Lorem ipsum dolor sit amet...",
      "created": "2023-10-31",
      "collection": "OCFL Demo"
    }
  }
}
The Inventory (inventory.json) is the “truth” of the OCFL standard. It is optimized for ensuring the integrity and structure of the object, but it is harder for humans or external search engines to consume directly, because information has to be merged across multiple blocks (manifest, versions, fixity).
The meta.json is a refined export view. It resolves the references of the inventory and enriches them with data from the configured extensions. This export serves as the basis for the display function of gocfl and is used to ingest data into the OCFL Native Archive.
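To illustrate why the resolved view is easier to feed into an external index, the sketch below flattens a parsed meta.json into one record per file, version, and logical path. It assumes a top-level Files block whose entries follow the layout of the file entry example; the function name `flatten_for_index`, the record field names, and the key "example-digest" are illustrative choices, not part of gocfl.

```python
# Minimal stand-in for a parsed meta.json. The key "example-digest" is a
# hypothetical placeholder for the key under which the entry is stored.
SAMPLE_META = {
    "ID": "urn:nbn:de:gbv:42-test1",
    "Head": "v1",
    "Files": {
        "example-digest": {
            "InternalName": ["v1/content/data/image/IMG_6914.jpg"],
            "VersionName": {"v1": ["data/image/IMG_6914.jpg"]},
            "Extension": {
                "NNNN-indexer": {"mimetype": "image/jpeg", "pronom": "fmt/43"}
            },
        }
    },
}


def flatten_for_index(meta: dict) -> list:
    """Build one flat record per (file, version, logical path)."""
    rows = []
    for key, entry in meta.get("Files", {}).items():
        indexer = entry.get("Extension", {}).get("NNNN-indexer", {})
        for version, paths in entry.get("VersionName", {}).items():
            for path in paths:
                rows.append({
                    "object_id": meta.get("ID"),
                    "version": version,
                    "path": path,
                    "file_key": key,
                    "mimetype": indexer.get("mimetype"),
                    "pronom": indexer.get("pronom"),
                })
    return rows


rows = flatten_for_index(SAMPLE_META)
for row in rows:
    print(row)
```

With the raw inventory, the same records would require joining the manifest and the version state by hand; here a single pass over Files is enough.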
The OCFL object thus contains all the metadata required for the long-term archiving of the object. The extractmeta function provides this in a processed format (meta.json) that an archive system can easily interpret. This ensures the discoverability of the object within the archive.
This makes it the ideal basis for:
Using the parameter --obfuscate, the metadata can be exported in an anonymized form. This is particularly useful when metadata needs to be shared for analysis or testing purposes without revealing sensitive information such as real file names or path structures.
[...] Files block, as well as the paths in InternalName and VersionName, are replaced with random UUIDs.
[...] size)
[...] mimetype)
[...] pronom): Used for Preservation Management. This allows for Format Watching even with anonymized data.
[...] Checksums block is cleared to prevent any conclusions about the original file.
{
  "Files": {
    "14aafd0e-4d07-47c2-91d0-b08d91ed5b53": {
      "Checksums": {},
      "InternalName": ["v1/content/c7537293-a62f-43f6-9e3d-e8c27bf0c245"],
      "VersionName": {
        "v1": ["c7537293-a62f-43f6-9e3d-e8c27bf0c245"]
      },
      "Extension": {
        "NNNN-indexer": {
          "mimetype": "application/pdf",
          "pronom": "fmt/20",
          "size": 547440,
          "metadata": {}
        }
      }
    }
  }
}
As shown in the example, the technical characteristics of the file (PDF, approx. 547 KB) are preserved for statistical evaluations, while the context (file name, path, hash) has been completely anonymized.
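A statistical evaluation of this kind might look like the following Python sketch, which counts files and sums sizes per PRONOM ID. It assumes the obfuscated layout shown in the example, with size stored under NNNN-indexer; the function name `format_statistics` is hypothetical, and entries without a PRONOM ID are grouped under "unknown" as an arbitrary choice.

```python
from collections import defaultdict

# Obfuscated export copied from the example above.
SAMPLE_OBFUSCATED = {
    "Files": {
        "14aafd0e-4d07-47c2-91d0-b08d91ed5b53": {
            "Checksums": {},
            "InternalName": ["v1/content/c7537293-a62f-43f6-9e3d-e8c27bf0c245"],
            "VersionName": {"v1": ["c7537293-a62f-43f6-9e3d-e8c27bf0c245"]},
            "Extension": {
                "NNNN-indexer": {
                    "mimetype": "application/pdf",
                    "pronom": "fmt/20",
                    "size": 547440,
                    "metadata": {},
                }
            },
        }
    }
}


def format_statistics(meta: dict) -> dict:
    """Count files and sum their sizes in bytes, grouped by PRONOM ID."""
    stats = defaultdict(lambda: {"count": 0, "bytes": 0})
    for entry in meta.get("Files", {}).values():
        indexer = entry.get("Extension", {}).get("NNNN-indexer", {})
        pronom = indexer.get("pronom", "unknown")
        stats[pronom]["count"] += 1
        stats[pronom]["bytes"] += indexer.get("size", 0)
    return dict(stats)


stats = format_statistics(SAMPLE_OBFUSCATED)
print(stats)
```

Because only the indexer fields are read, the evaluation never touches file names, paths, or checksums, so it works the same on obfuscated and regular exports that share this layout.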