
3. Enhanced Service Level Requirements

3.1. METS

METS Profiles

CDL ingests content in the form of METS (Metadata Encoding and Transmission Standard) encoded digital objects. CDL depends upon METS Profiles to successfully process submitted objects.

METS profiles describe classes of METS digital objects that share common characteristics, such as content file formats (e.g., digital images, TEI texts) or metadata encoding formats (e.g., MODS or Dublin Core). Profiles should include enough details to enable METS creators and programmers to create and process METS-encoded digital objects conforming with a particular profile. A METS profile itself is an XML document that must adhere to the METS XML Profile Schema. For information about METS profiles, see the METS website.

METS files must conform to valid METS profiles, which must be declared during pre-submission discussions with CDL staff.

Metadata Encoding and Transmission Standard <mets> Element

The METS top-level <mets> element must have an OBJID attribute containing an ARK identifier for the digital object (see the OBJID attribute in the example below). For more information about ARKs, visit the Archival Resource Key (ARK) page.

Example:
<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:mods="http://www.loc.gov/mods/v3" xmlns:mix="http://www.loc.gov/mix/" xmlns:rts="http://cosimo.stanford.edu/sdr/metsrights/" xmlns:xlink="http://www.w3.org/TR/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-0.xsd http://www.loc.gov/mix/ http://www.loc.gov/standards/mix/mix.xsd http://cosimo.stanford.edu/sdr/metsrights/ http://cosimo.stanford.edu/sdr/metsrights.xsd" OBJID="ark:/13030/kt9g50158w" TYPE="still image" LABEL="[Pablo de la Guerra (1833-1874), son of José de la Guerra y Noriega]" PROFILE="http://www.loc.gov/mets/profiles/00000001.xml">
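
The OBJID requirement can be checked before submission. The following is a minimal sketch in Python, not a CDL tool: the function name get_objid and the loose ARK pattern are assumptions made for illustration, and the pattern does not implement the full ARK syntax.

```python
import re
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# Loose ARK shape: "ark:/" + naming authority + "/" + name.
# This pattern is an assumption for illustration, not the full ARK syntax.
ARK_PATTERN = re.compile(r"^ark:/[^/\s]+/\S+$")


def get_objid(mets_xml: str) -> str:
    """Return the OBJID of the top-level <mets> element, or raise ValueError."""
    root = ET.fromstring(mets_xml)
    if root.tag != f"{{{METS_NS}}}mets":
        raise ValueError("top-level element is not <mets>")
    objid = root.get("OBJID")
    if objid is None or not ARK_PATTERN.match(objid):
        raise ValueError("OBJID is missing or does not look like an ARK")
    return objid


# Shortened version of the <mets> start tag shown above.
sample = (
    '<mets:mets xmlns:mets="http://www.loc.gov/METS/" '
    'OBJID="ark:/13030/kt9g50158w" TYPE="still image"/>'
)
print(get_objid(sample))
```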

Content File Section <fileSec> Element

The METS Content File Section <fileSec> element must contain links to network-exposed (i.e., online) content files using File Location <FLocat> elements (see the example below). Each <FLocat> element must contain an XLINK:HREF attribute that identifies a link to its associated content file.

Example:
<mets:file ID="FID8" MIMETYPE="image/jpeg" SEQ="2" CREATED="1999-06-28T00:00:00" ADMID="ADM1A" GROUPID="GID2">
<mets:FLocat xlink:href="http://sunsite.berkeley.edu/moa2/images/bkm00002774a_c.jpg" LOCTYPE="URL" />
</mets:file>

The METS file and associated content files must be well formed and uncorrupted.
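
As a rough pre-submission check, the METS file can be parsed (which fails if it is not well formed) and its <FLocat> links listed. This sketch assumes the XLink namespace URI declared in the <mets> example earlier in this section; documents that declare the later W3C namespace (http://www.w3.org/1999/xlink) would need the constant changed. The function name list_content_links is hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
# Namespace URI as declared by xmlns:xlink in the <mets> example in this section.
XLINK_NS = "http://www.w3.org/TR/xlink"


def list_content_links(mets_xml: str) -> list[str]:
    """Parse the METS text and return the xlink:href of every <FLocat>.

    ET.fromstring raises ParseError if the document is not well formed.
    """
    root = ET.fromstring(mets_xml)
    links = []
    for flocat in root.iter(f"{{{METS_NS}}}FLocat"):
        href = flocat.get(f"{{{XLINK_NS}}}href")
        if not href:
            raise ValueError("FLocat without an xlink:href attribute")
        links.append(href)
    return links


sample = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                       xmlns:xlink="http://www.w3.org/TR/xlink">
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="FID8" MIMETYPE="image/jpeg">
        <mets:FLocat xlink:href="http://sunsite.berkeley.edu/moa2/images/bkm00002774a_c.jpg" LOCTYPE="URL"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""

for link in list_content_links(sample):
    print(link)
```

Parsing only confirms that the METS file is well formed; confirming that each linked content file is reachable and uncorrupted would be a separate step.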

File <file> Element and Checksum Values

To support the orderly transmission and ingest of digital objects, the CDL strongly recommends submission of checksum (MD5, SHA-1, or CRC32) and byte size values in the METS File <file> element.
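
The checksum and byte size values can be computed with standard library calls. This is a sketch only; the function name file_fixity and the dictionary keys are illustrative, and the METS attributes in which the values are recorded should be confirmed against the profile in use.

```python
import hashlib
import zlib


def file_fixity(data: bytes) -> dict[str, str]:
    """Return the byte size and the MD5, SHA-1, and CRC32 values for the given content."""
    return {
        "SIZE": str(len(data)),
        "MD5": hashlib.md5(data).hexdigest(),
        "SHA-1": hashlib.sha1(data).hexdigest(),
        # zlib.crc32 returns an unsigned integer; format it as 8 hex digits.
        "CRC32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
    }


# In practice the bytes would be read from the content file, for example:
#   with open(path, "rb") as handle: data = handle.read()
for name, value in file_fixity(b"example content").items():
    print(name, value)
```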

File <file> Element MIMETYPE Attribute

In addition to conforming to CDL-supported METS profiles, all digital objects must explicitly state the content file format as a MIME (Multipurpose Internet Mail Extensions) type in the MIMETYPE attribute of each File <file> element in the METS document (see the example below).

Example:
<mets:file ID="FID1" MIMETYPE="image/tiff" SEQ="1" CREATED="1999-06-17T00:00:00" ADMID="ADM1A" GROUPID="GID1">

For a list of MIME type content type and subtype values, see the MIME Media Types from the Internet Assigned Numbers Authority.
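
A simple check for the MIMETYPE requirement might look like the following sketch. It only tests that each <file> element carries a value of the form type/subtype; it does not verify the value against the IANA registry. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"


def files_missing_mimetype(mets_xml: str) -> list[str]:
    """Return the ID of every <file> element lacking a type/subtype MIMETYPE value."""
    root = ET.fromstring(mets_xml)
    missing = []
    for file_el in root.iter(f"{{{METS_NS}}}file"):
        mimetype = file_el.get("MIMETYPE", "")
        type_, _, subtype = mimetype.partition("/")
        if not (type_ and subtype):
            missing.append(file_el.get("ID", "(no ID)"))
    return missing


sample = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
    <mets:fileGrp>
      <mets:file ID="FID1" MIMETYPE="image/tiff" SEQ="1"/>
      <mets:file ID="FID2" SEQ="2"/>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""

print(files_missing_mimetype(sample))
```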

Institution/Repository Information: Specialized Use of the <mdRef> Metadata Reference Element

To allow the CDL to uniquely identify and manage digital objects by contributing institution, the CDL strongly recommends the use of an <mdRef> element with an MDTYPE attribute set to "other" and an OTHERMDTYPE attribute set to "contributing-institution-code". Additionally, use an XLINK:HREF attribute to reference the normalized version of the MARC Organization Code for the contributing institution. Append the code to the following URI string: "http://id.loc.gov/organizations/" (see the example below).

Example:
<mets:dmdSec>
<mets:mdRef LOCTYPE="URL" MDTYPE="other" OTHERMDTYPE="contributing-institution-code" xlink:href="http://id.loc.gov/organizations/cub" />
</mets:dmdSec>
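
The <mdRef> element in the example can also be generated programmatically. In this sketch the normalization rule (lowercase, with hyphens and spaces removed) is an assumption, as are the function names and the input code "CU-B", which is used only because it normalizes to the "cub" value shown in the example. Confirm the normalized form of a real code against the MARC Organization Code listings.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
# Namespace URI as declared by xmlns:xlink in the <mets> example in Section 3.1.
XLINK_NS = "http://www.w3.org/TR/xlink"
ORG_BASE = "http://id.loc.gov/organizations/"

ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def normalize_marc_code(code: str) -> str:
    """Assumed normalization: lowercase, with hyphens and spaces removed."""
    return code.lower().replace("-", "").replace(" ", "")


def institution_dmdsec(code: str) -> ET.Element:
    """Build a <dmdSec> holding the contributing-institution <mdRef>."""
    dmd = ET.Element(f"{{{METS_NS}}}dmdSec")
    ET.SubElement(dmd, f"{{{METS_NS}}}mdRef", {
        "LOCTYPE": "URL",
        "MDTYPE": "other",
        "OTHERMDTYPE": "contributing-institution-code",
        f"{{{XLINK_NS}}}href": ORG_BASE + normalize_marc_code(code),
    })
    return dmd


# "CU-B" is an illustrative input only.
print(ET.tostring(institution_dmdsec("CU-B"), encoding="unicode"))
```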

Linking from Digital Objects to Collection Descriptions: Specialized Use of the <mdRef> Metadata Reference Element

For guidelines on linking digital objects to associated, parent-level collection descriptions (represented either in the form of a MARC record or an EAD finding aid), see Appendix C.

3.2. Metadata

3.2.1 Using Metadata Schemas

Metadata mappings are provided for existing XML extension metadata schemas, such as MODS and qualified Dublin Core.

Encode metadata consistently based on the specific usage guidelines established for the schema. For example, if encoding in Dublin Core, follow the Dublin Core usage guidelines for each element.

Do not include HTML markup within metadata when the metadata schema does not support it.

Granularity

Whenever possible, provide the richest and most granular metadata available. For example, if encoding in Dublin Core, encode your metadata in qualified Dublin Core.

Repeatability of Elements and Data Values

Elements may be used repeatedly. Note that it may be necessary to supply multiple elements for the same piece of information, e.g., a general form of the date of creation of a resource ("January 1, 1999") in addition to an ISO8601 normalized form of that date ("1999-01-01").

However, avoid combining different kinds of data values or repeating the same type of data values within a single element; use separate elements for each data value. For example, avoid encoding multiple subject terms ("Municipal government; City Council members") in a single element. Instead, encode the two different terms within their own elements.
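
The two points above can be illustrated with a short sketch that emits one element per data value: two date elements for the general and ISO 8601 forms, and one subject element per term. It uses simple Dublin Core element names for brevity; the wrapper element "record" and the function names are hypothetical, and the month name relies on Python's default locale.

```python
import datetime
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def split_terms(combined: str) -> list[str]:
    """Split a semicolon-delimited string into separate, trimmed terms."""
    return [term.strip() for term in combined.split(";") if term.strip()]


def build_record(created: datetime.date, subjects: str) -> ET.Element:
    record = ET.Element("record")  # hypothetical wrapper element
    # General form of the date, e.g. "January 1, 1999" (month name is locale dependent).
    general = f"{created:%B} {created.day}, {created.year}"
    # The same date is supplied twice: general form, then ISO 8601 form.
    for value in (general, created.isoformat()):
        ET.SubElement(record, f"{{{DC_NS}}}date").text = value
    # Each subject term gets its own element.
    for term in split_terms(subjects):
        ET.SubElement(record, f"{{{DC_NS}}}subject").text = term
    return record


record = build_record(
    datetime.date(1999, 1, 1),
    "Municipal government; City Council members",
)
print(ET.tostring(record, encoding="unicode"))
```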

Character Encoding

Use UTF-8 or UTF-16 standard character sets or encodings. The CDL recommends using standardized forms of names for character sets, as documented by the Internet Assigned Numbers Authority (e.g., use "UTF-8" and not "UTF8").

If using the UTF-8 character set in particular, encode directly in Unicode or use Unicode decimal or hexadecimal character references. All decimal character references should begin with an ampersand and pound sign, and end with a semicolon (use the syntax "&#D;" where D is a decimal number). All hexadecimal character references should begin with an ampersand, pound sign, and lower- or uppercase "x", and end with a semicolon (use the syntax "&#xH;" or "&#XH;" where H is a hexadecimal number); see the Unicode Code Charts for hexadecimal character reference codes.

For more detailed information about UTF-8 Unicode, see the W3C/Unicode Consortium document Unicode in XML and other Markup Languages.

Example using UTF-8 Unicode hexadecimal character references to encode the letter "é" in the term "émigrés":

... The papers also document trends in high school and university education among Russian &#x00E9;migr&#x00E9;s...
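
Character references of this kind can be produced mechanically. The sketch below shows one way to generate hexadecimal references and, using Python's built-in "xmlcharrefreplace" error handler, decimal references. The function names are illustrative, and the four-digit zero padding simply mirrors the example above.

```python
def to_hex_references(text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal character reference."""
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):04X};"
        for ch in text
    )


def to_decimal_references(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


word = "\u00e9migr\u00e9s"  # "émigrés" with precomposed U+00E9
print(to_hex_references(word))
print(to_decimal_references(word))
```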

Characters reserved for XML markup delimiters (ampersand, left angle bracket, and right angle bracket) need to be replaced with the character entities in the following table.

Reserved Characters

Character   Character Name        Character Entity
&           Ampersand             &amp;
<           Left angle bracket    &lt;
>           Right angle bracket   &gt;
'           Single quote          &apos;
"           Double quote          &quot;

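The reserved-character replacements can be applied with the Python standard library, as in this minimal sketch. The escape function from xml.sax.saxutils handles the ampersand and angle brackets by default; the two quote characters are passed as additional entities. The wrapper name escape_reserved is an assumption for illustration.

```python
from xml.sax.saxutils import escape

# escape() always handles &, <, and >; the quote characters are added here.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_reserved(text: str) -> str:
    """Replace the five reserved characters with their character entities."""
    return escape(text, QUOTE_ENTITIES)


print(escape_reserved('Smith & Sons <"Ltd">'))
```

Apply this step to raw text before adding any character references; otherwise the ampersand that begins each reference would itself be replaced.
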
Headings, Labels, Punctuation, and Formatting

Do not include line breaks, list formatting, or any other formatting controls within the body of elements. Headings and labels should not appear within the body of elements (except for certain cases; see Section 3.2.3).

Some XML extension schemas (e.g., MODS) provide label attributes on particular elements. In these cases, institutions may encode data values (e.g., text comprising concise headings or descriptions) within those label attributes as permitted by those schemas.

Note that the CDL GDO supports the creation of digital objects that are largely independent of a particular online presentation. The encoding can be manipulated and repurposed through the application of customized style sheets to meet custom display needs and formatting preferences. This includes the special formatting of text, the ordering and positioning of text, the addition of headings and labels, and punctuation.

In order to provide a consistent user experience, CDL style sheets support a standard presentation that may not accommodate local preferences. Your institution may devise and implement local style sheets for presenting customized views of its digital objects.

3.2.2. Descriptive Metadata

Using Descriptive Metadata Schemas

The CDL holds that Dublin Core does not provide enough encoding granularity, and therefore prefers that descriptive metadata be encoded in a richer format, such as MODS. Institutions should use qualified Dublin Core only in cases where MODS is not locally supported.

Object Description

Descriptive metadata can be used to describe different expressions of a given resource. In the case of analog objects that have been digitized, the descriptive metadata may apply to the source analog object or the digital surrogate. For example, the "creator" of a resource may apply to an illustrator of a graphic book or the name of the technician responsible for scanning an image from that book. Likewise, the "date of creation" of a resource may apply to the date of printing for a graphic book or the date of scanning an image from that book. In the case of born-digital objects, the descriptive metadata pertains to the born-digital object itself.

Some descriptive metadata schemas do not allow encoders to clearly disambiguate between uses of a given element to apply to source analog objects versus digital surrogates. Therefore, when creating descriptive metadata for an analog object that has been digitized, we suggest that you consider the following two points:

  • Be consistent in your use of descriptive metadata elements: emphasize the description of either the source analog object or the digital surrogate.
  • Provide descriptive metadata that supports user access to and discovery of the digital object. Information about the source analog object may be more relevant to users.

Descriptive Metadata Guidelines (Summary)

[NOTE: See Appendix A for detailed descriptions of each element. Element names below are also linked to those descriptions]

Element                                     Status
Identifier                                  Required element
Title                                       Required element
Creator                                     Required element (NOTE: if no name can be supplied, provide a name in Contributor, Institution/Repository, and/or Publisher)
Date                                        Required element
Description                                 Recommended element
Language                                    Recommended element
Subject (Name)                              Recommended element
Subject (Title)                             Recommended element
Subject (Place)                             Recommended element
Subject (Topic, Function, or Occupation)    Recommended element
Genre                                       Recommended element
Type                                        Required element
Format/Physical Description                 Recommended element
Related Collection/Project                  Recommended element
Institution/Repository                      Required element
Contributor                                 Recommended element
Publisher                                   Recommended element
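
The required elements in the summary can be checked mechanically once a record has been mapped to the element names used in the table. The following sketch assumes a hypothetical in-memory representation (a dictionary from element name to a list of values) and treats the Creator note as a fallback rule; it does not parse MODS or Dublin Core.

```python
REQUIRED = (
    "Identifier", "Title", "Creator", "Date", "Type", "Institution/Repository",
)
# Per the note on Creator: a name in any of these may stand in for a missing Creator.
NAME_FALLBACKS = ("Contributor", "Institution/Repository", "Publisher")


def missing_required(record: dict[str, list[str]]) -> list[str]:
    """Return the required elements that have no non-empty value in the record."""

    def present(name: str) -> bool:
        return any(value.strip() for value in record.get(name, []))

    missing = []
    for name in REQUIRED:
        if present(name):
            continue
        if name == "Creator" and any(present(fb) for fb in NAME_FALLBACKS):
            continue
        missing.append(name)
    return missing


# Hypothetical record: no Creator, but Institution/Repository supplies a name.
record = {
    "Identifier": ["ark:/13030/kt9g50158w"],
    "Title": ["[Pablo de la Guerra (1833-1874), son of José de la Guerra y Noriega]"],
    "Date": ["1999-01-01"],
    "Type": ["still image"],
    "Institution/Repository": ["Example Library"],
}
print(missing_required(record))
```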

3.2.3. Rights Management Administrative Metadata

CDL's Rights Management Group (RMG) has developed a Rights Management Framework that may assist institutions contributing content to CDL preservation and access services in thinking about copyright and fair use issues for digital objects. The CDL strongly encourages contributors to provide rights information whenever possible, using one of the following methods:

  • Use rights-related elements in the schema chosen for supplying descriptive metadata (e.g., <dc:rights> in Dublin Core, <accessCondition> in MODS). Elements in these schemas are repeatable, so if more than one rights-related element is used, contributors should provide clarifying information about each piece of rights information either using a label attribute (MODS) or by providing a label as part of the element's content (Dublin Core).
  • Supply rights information using METSRights, an approved extension schema for METS.

Rights Management Administrative Metadata Guidelines (Summary)

[NOTE: See Appendix B for detailed descriptions of each element. Element names below are also linked to those descriptions]

Element                                Status
Copyright Status                       Recommended element
Copyright Statement                    Recommended element
Copyright Date                         Recommended element
Copyright Owner Name                   Recommended element
Copyright Owner Contact Information    Recommended element

3.2.4. Structural Metadata

Structural metadata must be encoded in the METS format: structural metadata is represented in the <structMap> Structural Map section of a METS document. This section defines a structure that allows users of the digital object to navigate through its hierarchical organization. Guidelines for preparing Structural Maps are documented in CDL-supported METS profiles.

3.2.5. Technical Metadata

The CDL generates the technical metadata required to support the orderly management of digital objects in its repositories. Currently, the CDL utilizes the JSTOR/Harvard Object Validation Environment (JHOVE) tool to derive technical metadata for accepted content file types.

You are encouraged to submit any additional technical metadata associated with a particular digital object (such as checksum [MD5, SHA-1, or CRC32] and byte size values in the METS <file> element, or information based on NISO's Data Dictionary: Technical Metadata for Still Images), but are not required to do so. CDL preservation services will store any supplied additional metadata with the object.

Note that all supplied technical metadata should be encoded using valid XML extension schemas as specified by CDL-supported METS profiles (such as in the NISO Metadata for Images in XML Schema (MIX) format). If a given set of metadata does not conform to a valid XML extension schema, then you should create a schema to embed the metadata and facilitate validation of the METS file. Otherwise, the metadata should be stored independently of the METS file and referred to using the METS <mdRef> Metadata Reference from within the METS file.

3.2.6. Other Metadata (Digital Provenance Administrative Metadata, Source Administrative Metadata, and Behaviors Metadata)

You may submit any additional metadata associated with a particular digital object, but are not required to do so. CDL preservation services will store any additional metadata with the object. CDL access services (OAC, Calisphere) will not necessarily display supplemental metadata to users.

Note that all supplied metadata should be encoded using valid XML extension schemas as specified by CDL-supported METS profiles. If a given set of metadata does not conform to a valid XML extension schema, then you should create a schema to embed the metadata and facilitate validation of the METS file. Otherwise, the metadata should be stored independently of the METS file and referred to using the METS <mdRef> Metadata Reference from within the METS file.

3.3. Content Files

The following content file types are currently supported by the CDL for the Enhanced Service Level.

Image files

Image files should comply with the CDL Guidelines for Digital Images.

Producers are strongly encouraged to submit at least one copy of a digital master file for each digital object. Producers must submit at least two derivative file types for each digital object:

  • An access image (a service or reference image for more detailed viewing).
  • A thumbnail image (for the fastest access during the search, browse, and retrieval process).

PDF files

All PDF file formats are supported. The CDL prefers PDF files with embedded text transcriptions.

Producers must submit one PDF file per digital object.

TEI files

TEI text files should comply with the CDL Structured Text Working Group TEI Encoding Guidelines.

Producers must submit one TEI file per digital object.
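
The per-object file requirements above can be expressed as a small check. This is a sketch under stated assumptions: the role labels ("master", "access", "thumbnail", "pdf", "tei") are hypothetical, and "one file per digital object" is read here as exactly one.

```python
def check_content_files(object_type: str, roles: list[str]) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for one digital object.

    roles holds a hypothetical role label for each submitted content file.
    """
    errors: list[str] = []
    warnings: list[str] = []
    if object_type == "image":
        # Both derivative types are required.
        for required in ("access", "thumbnail"):
            if required not in roles:
                errors.append(f"missing {required} image")
        # A digital master is encouraged but not required.
        if "master" not in roles:
            warnings.append("no digital master file (strongly encouraged)")
    elif object_type in ("pdf", "tei"):
        count = roles.count(object_type)
        if count != 1:
            errors.append(f"expected 1 {object_type} file, found {count}")
    else:
        errors.append(f"unsupported content file type: {object_type}")
    return errors, warnings


print(check_content_files("image", ["access", "thumbnail"]))
print(check_content_files("pdf", ["pdf", "pdf"]))
```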

Each content file should have a file name that is unique to your institution (i.e., not necessarily globally unique); often the unique identifier is used to name the content file itself.
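
File name uniqueness within a batch can be checked before submission, as in the sketch below. Comparing names case-insensitively is an assumption made here to avoid collisions on case-insensitive file systems; the function name and sample paths are hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath


def duplicate_file_names(paths: list[str]) -> list[str]:
    """Return file names (lowercased, directories ignored) that occur more than once."""
    counts = Counter(PurePosixPath(path).name.lower() for path in paths)
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical batch in which one name is reused in a second directory.
batch = [
    "masters/kt9g50158w_001.tif",
    "access/kt9g50158w_001.jpg",
    "rescans/KT9G50158W_001.tif",
]
print(duplicate_file_names(batch))
```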

