je4.github.io

Initializing an OCFL Storage Root

Now that we are familiar with the basics and configuration of gocfl, the first practical step in an OCFL-based archive is to create a Storage Root.

1. What is a Storage Root?

The Storage Root is the top level of your archive. It is the container in which all OCFL objects are stored, together with the files that identify and describe the root itself.

2. Structure and Requirements (Mandatory vs. Optional)

According to the OCFL Specification on Root Structure, clear rules apply to a storage root to ensure long-term interpretability:

Mandatory Components

  1. OCFL Version Marker: A file named 0=ocfl_1.1 (the conformance declaration, named according to the NAMASTE convention) that identifies the directory as an OCFL Storage Root.
  2. OCFL Objects: The actual data organized in subdirectories.
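
As a rough illustration of the first requirement, a tool could recognize a storage root by looking for the version marker. The sketch below is not part of gocfl; the helper name is_storage_root is hypothetical, and it assumes that the marker for OCFL 1.1 is a file named 0=ocfl_1.1 whose content is the text ocfl_1.1 followed by a newline.

```python
from pathlib import Path

MARKER_NAME = "0=ocfl_1.1"       # marker file name for OCFL 1.1
MARKER_CONTENT = b"ocfl_1.1\n"   # assumed content of the marker file


def is_storage_root(path: str, check_content: bool = False) -> bool:
    """Hypothetical helper: report whether `path` carries an OCFL 1.1 root marker."""
    marker = Path(path) / MARKER_NAME
    if not marker.is_file():
        return False
    if check_content:
        # Stricter check: the marker must also hold the expected text.
        return marker.read_bytes() == MARKER_CONTENT
    return True
```

By default only the file name is checked; passing check_content=True additionally compares the file content.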

Optional Components

Important Structure Rules

3. The Command gocfl init

With gocfl, you initialize a new storage root via the init subcommand.

Syntax

gocfl init [path to storage root] [flags]

Important Flags for init

4. Practical Example

In this example, we initialize a storage root using a specific configuration file and a target directory:

gocfl --config ./gocfl/config/gocfl.toml init ./gocfl/temp/test42/

What Happens?

If you look into the directory after running the command, you will see a structure that goes beyond the minimal OCFL specification, because gocfl adds useful supplementary information and configuration:

test42/
├── 0=ocfl_1.1          # Storage root marker (OCFL Version 1.1)
├── ocfl_layout.json    # Definition of the path layout
├── ocfl_spec_1.1.md    # The OCFL specification as a reference
├── extensions/         # Configurations for enabled extensions
│   ├── 0004-hashed-n-tuple-storage-layout/
│   ├── initial/
│   └── NNNN-gocfl-extension-manager/
├── 00XX-*.md           # Documentation for standard extensions
├── NNNN-*.md           # Documentation for the used GOCFL extensions
└── initial.md          # Documentation of the initial configuration
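
To check which layout a root declares, ocfl_layout.json can be read with any JSON parser. The following is a minimal sketch, assuming the file is a JSON object with an extension key (and usually a description key) as in the OCFL 1.1 convention; the helper name read_layout is hypothetical and not part of gocfl.

```python
import json
from pathlib import Path


def read_layout(storage_root: str) -> str:
    """Hypothetical helper: return the layout extension named in ocfl_layout.json."""
    layout_file = Path(storage_root) / "ocfl_layout.json"
    with layout_file.open(encoding="utf-8") as fh:
        layout = json.load(fh)
    # "extension" names the storage layout extension used by this root.
    return layout["extension"]
```

For the root created above, read_layout("./gocfl/temp/test42/") should report 0004-hashed-n-tuple-storage-layout, provided the configuration selects that layout.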

Key Files in Detail

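  1. 0=ocfl_1.1: This is the conformance declaration required by the specification. gocfl creates it to declare the OCFL version of the storage root. It serves as a “name tag”: the file name itself carries the information, and according to the specification its content is simply the string ocfl_1.1 followed by a newline.
  2. ocfl_layout.json: This file declares which storage layout maps object IDs to directories, a choice that is crucial for scalability. In our example, 0004-hashed-n-tuple-storage-layout is used. This means that object IDs (like ark:/12345/bcd987) are hashed, and the digest is split into short segments that become nested subdirectories (e.g., a1b/2c3/d4e/...), so that no single directory holds too many folders.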
  3. ocfl_spec_1.1.md: This file contains the complete OCFL specification directly in the storage root. Thus, the root is not only self-contained (all data is present) but also self-describing, as the rules for access and interpretation are provided directly.
  4. extensions/: Here are the configuration files (config.json) for the extensions (see OCFL Storage Root Extensions).
    • 0004-... configures the layout mentioned above.
    • initial determines which extension is loaded first (in our case, the NNNN-gocfl-extension-manager).
    • NNNN-gocfl-extension-manager is a gocfl-specific extension responsible solely for initializing the storage root.
  5. The .md files (e.g., ocfl_spec_1.1.md, 0001-*.md and NNNN-*.md): During initialization, the Extension Manager copies both the complete OCFL specification and an extensive collection of descriptions for the available extensions (such as 0004-...md, NNNN-indexer.md or NNNN-migration.md) directly into the storage root. This consistently implements the principle of self-documentation: anyone accessing this medium in the future will find not only the general specification but also technical explanations for the functions used in the archive in the storage root itself, without depending on external websites.
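
The path mapping described for 0004-hashed-n-tuple-storage-layout can be sketched in a few lines. This is an illustration rather than gocfl's implementation: the function name is hypothetical, and the parameter defaults (SHA-256, three tuples of three characters, full digest as the final directory name) are assumptions taken from the defaults of the extension definition. The config.json in your own extensions/ directory may use different values.

```python
import hashlib


def hashed_n_tuple_path(object_id: str, tuple_size: int = 3,
                        number_of_tuples: int = 3,
                        short_object_root: bool = False) -> str:
    """Hypothetical sketch: map an object ID to its directory below the storage root."""
    # Hash the UTF-8 encoded ID and take the lowercase hex digest.
    digest = hashlib.sha256(object_id.encode("utf-8")).hexdigest()
    # Cut the start of the digest into equally sized segments (the "tuples").
    tuples = [digest[i * tuple_size:(i + 1) * tuple_size]
              for i in range(number_of_tuples)]
    # The last directory is either the full digest or only its unused remainder.
    leaf = digest[tuple_size * number_of_tuples:] if short_object_root else digest
    return "/".join(tuples + [leaf])


print(hashed_n_tuple_path("ark:/12345/bcd987"))
```

Because the digest is evenly distributed, objects spread across the subdirectories regardless of how similar their IDs are.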
