CDL ingests content in the form of METS (Metadata Encoding and Transmission Standard) encoded digital objects. CDL depends upon METS Profiles to successfully process submitted objects.
METS profiles describe classes of METS digital objects that share common characteristics, such as content file formats (e.g., digital images, TEI texts) or metadata encoding formats (e.g., MODS or Dublin Core). Profiles should include enough detail to enable METS creators and programmers to create and process METS-encoded digital objects that conform to a particular profile. A METS profile is itself an XML document that must adhere to the METS XML Profile Schema. For information about METS profiles, see the METS website.
METS files must conform to valid METS profiles, which must be declared during pre-submission discussions with CDL staff.
The top-level METS <mets> element must have an OBJID attribute containing an ARK identifier for the digital object (see the OBJID attribute in the example below). For more information about ARKs, see the Archival Resource Key (ARK) page.
Example:
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:mods="http://www.loc.gov/mods/v3"
    xmlns:mix="http://www.loc.gov/mix/"
    xmlns:rts="http://cosimo.stanford.edu/sdr/metsrights/"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd
        http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-0.xsd
        http://www.loc.gov/mix/ http://www.loc.gov/standards/mix/mix.xsd
        http://cosimo.stanford.edu/sdr/metsrights/ http://cosimo.stanford.edu/sdr/metsrights.xsd"
    OBJID="ark:/13030/kt9g50158w"
    TYPE="still image"
    LABEL="[Pablo de la Guerra (1833-1874), son of José de la Guerra y Noriega]"
    PROFILE="http://www.loc.gov/mets/profiles/00000001.xml">
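The OBJID requirement is easy to check before submission. The Python sketch below parses a METS document with the standard library, confirms that the root element is mets:mets, and tests OBJID against a deliberately loose ARK pattern ("ark:/", a numeric NAAN, "/", then a name). The pattern is our own assumption, not a full implementation of the ARK specification, and the function name check_objid is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# ASSUMPTION: a loose ARK shape, "ark:/" + numeric NAAN + "/" + a name with no
# whitespace. This is a sanity check only, not a full ARK syntax validator.
ARK_PATTERN = re.compile(r"^ark:/\d+/\S+$")


def check_objid(mets_xml: str) -> str:
    """Return the OBJID of the root <mets:mets> element, or raise ValueError."""
    root = ET.fromstring(mets_xml)
    if root.tag != f"{{{METS_NS}}}mets":
        raise ValueError(f"root element is {root.tag!r}, not mets:mets")
    objid = root.get("OBJID")  # OBJID is an unprefixed attribute
    if objid is None:
        raise ValueError("missing OBJID attribute on <mets:mets>")
    if not ARK_PATTERN.match(objid):
        raise ValueError(f"OBJID {objid!r} does not look like an ARK")
    return objid


sample = (
    '<mets:mets xmlns:mets="http://www.loc.gov/METS/" '
    'OBJID="ark:/13030/kt9g50158w" TYPE="still image"/>'
)
print(check_objid(sample))  # ark:/13030/kt9g50158w
```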
The METS Content File Section <fileSec> element must contain links to network-accessible (i.e., online) content files, using File Location <FLocat> elements (see the example below). Each <FLocat> element must have an xlink:href attribute that links to its associated content file.
Example:
<mets:file ID="FID8" MIMETYPE="image/jpeg" SEQ="2" CREATED="1999-06-28T00:00:00" ADMID="ADM1A" GROUPID="GID2">
<mets:FLocat xlink:href="http://sunsite.berkeley.edu/moa2/images/bkm00002774a_c.jpg" LOCTYPE="URL" />
</mets:file>
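To confirm that every <file> carries at least one usable link, the FLocat hrefs can be collected per file ID. This is a minimal Python sketch; treating "network-accessible" as "starts with http:// or https://" is our assumption, and flocat_hrefs is a hypothetical helper name. It uses the standard W3C XLink namespace URI, which is the one the METS schema expects for xlink:href.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",  # standard W3C XLink namespace
}
HREF = "{http://www.w3.org/1999/xlink}href"


def flocat_hrefs(mets_xml: str) -> dict[str, list[str]]:
    """Map each mets:file ID to the xlink:href values of its FLocat children."""
    root = ET.fromstring(mets_xml)
    result = {}
    for file_el in root.iter(f"{{{NS['mets']}}}file"):
        hrefs = []
        for flocat in file_el.findall("mets:FLocat", NS):
            href = flocat.get(HREF)
            # ASSUMPTION: "online" means an http(s) URL.
            if not href or not href.startswith(("http://", "https://")):
                raise ValueError(f"file {file_el.get('ID')} has no usable xlink:href")
            hrefs.append(href)
        if not hrefs:
            raise ValueError(f"file {file_el.get('ID')} has no FLocat element")
        result[file_el.get("ID")] = hrefs
    return result


sample = """<mets:fileSec xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileGrp>
    <mets:file ID="FID8" MIMETYPE="image/jpeg">
      <mets:FLocat LOCTYPE="URL"
        xlink:href="http://sunsite.berkeley.edu/moa2/images/bkm00002774a_c.jpg"/>
    </mets:file>
  </mets:fileGrp>
</mets:fileSec>"""
print(flocat_hrefs(sample))
```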
The METS file and its associated content files must be well-formed and uncorrupted.
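Well-formedness of the METS file can be checked with any conforming XML parser. The sketch below uses Python's standard library; check_well_formed is a hypothetical name. It is a syntax check only: it does not validate against the METS schema or a METS profile, and corruption of content files is better caught with the checksums described next.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_bytes: bytes) -> tuple[bool, str]:
    """Return (True, "") if xml_bytes parses as XML, else (False, parser message).

    Syntax check only: no METS schema or profile validation is performed.
    """
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


print(check_well_formed(b"<mets:mets xmlns:mets='http://www.loc.gov/METS/'/>"))
print(check_well_formed(b"<title>AT&T records</title>"))  # unescaped ampersand
```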
To support the orderly transmission and ingest of digital objects, the CDL strongly recommends including checksum (MD5, SHA-1, or CRC32) and byte-size values in each METS File <file> element.
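A minimal Python sketch for producing these values follows. It assumes the METS <file> attributes CHECKSUM, CHECKSUMTYPE (with the values "MD5", "SHA-1", and "CRC32") and SIZE in bytes, as we understand the METS schema; writing CRC32 as eight lowercase hex digits is our own assumption. The file is read in chunks so large master images need not fit in memory.

```python
import hashlib
import io
import zlib
from typing import BinaryIO


def fixity_attributes(stream: BinaryIO, checksum_type: str = "MD5",
                      chunk_size: int = 1 << 20) -> dict[str, str]:
    """Return CHECKSUM, CHECKSUMTYPE and SIZE attribute values for one file."""
    if checksum_type == "MD5":
        digest = hashlib.md5()
    elif checksum_type == "SHA-1":
        digest = hashlib.sha1()
    elif checksum_type == "CRC32":
        digest = None  # CRC32 is accumulated with zlib instead
    else:
        raise ValueError(f"unsupported checksum type: {checksum_type}")

    crc, size = 0, 0
    while chunk := stream.read(chunk_size):
        size += len(chunk)
        if digest is None:
            crc = zlib.crc32(chunk, crc)
        else:
            digest.update(chunk)

    # ASSUMPTION: CRC32 written as 8 lowercase hex digits.
    checksum = digest.hexdigest() if digest is not None else f"{crc:08x}"
    return {"CHECKSUM": checksum, "CHECKSUMTYPE": checksum_type, "SIZE": str(size)}


# Demo with in-memory bytes; for a real content file use open(path, "rb").
demo = fixity_attributes(io.BytesIO(b"hello"))
print(demo)
```

The returned dictionary can be merged into an ElementTree <file> element with file_el.attrib.update(...).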
In addition to conforming to CDL-supported METS profiles, all digital objects must explicitly state the MIME (Multipurpose Internet Mail Extensions) type of each content file in the MIMETYPE attribute of every METS <file> element (see the example below).
Example:
<mets:file ID="FID1" MIMETYPE="image/tiff" SEQ="1" CREATED="1999-06-17T00:00:00" ADMID="ADM1A" GROUPID="GID1">
For a list of MIME content type and subtype values, see the MIME Media Types registry maintained by the Internet Assigned Numbers Authority (IANA).
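When MIMETYPE values are generated in bulk, Python's mimetypes module can infer them from file extensions. This sketch is an illustration, not a CDL tool: inferring by extension is our assumption, and unknown extensions raise an error so the value can be set by hand. A fresh MimeTypes() instance is built from Python's built-in extension table rather than the host's mime.types files, which keeps results consistent across machines.

```python
import mimetypes

# Built-in extension table only (no system mime.types files are read).
_MIME_DB = mimetypes.MimeTypes()


def mimetype_for(filename: str) -> str:
    """Return a MIMETYPE value for a content file, based on its extension."""
    mime, _encoding = _MIME_DB.guess_type(filename, strict=True)
    if mime is None:
        raise ValueError(f"cannot infer a MIME type for {filename!r}; set it explicitly")
    return mime


for name in ["bkm00002774a_c.jpg", "master.tif", "letters.pdf"]:
    print(name, mimetype_for(name))
```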
To allow the CDL to uniquely identify and manage digital objects by contributing institution, the CDL strongly recommends using an <mdRef> element with an MDTYPE attribute set to "OTHER" and an OTHERMDTYPE attribute set to "contributing-institution-code". Additionally, use an xlink:href attribute to reference the normalized form of the contributing institution's MARC Organization Code, appended to the end of the URI string "http://id.loc.gov/organizations/" (see the example below).
Example:
<mets:dmdSec>
<mets:mdRef LOCTYPE="URL" MDTYPE="OTHER" OTHERMDTYPE="contributing-institution-code" xlink:href="http://id.loc.gov/organizations/cub" />
</mets:dmdSec>
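The contributing-institution reference can be generated with ElementTree, as in the sketch below. Two points are assumptions: the normalization rule (lowercase the MARC Organization Code and drop hyphens, so a code such as "CU-B" becomes "cub") should be checked against the Library of Congress listing, and the dmdSec ID value "DMD-INST" is hypothetical. The sketch writes MDTYPE="OTHER" because the METS schema enumerates MDTYPE values in uppercase; confirm the expected casing with CDL staff.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)

ORG_BASE_URI = "http://id.loc.gov/organizations/"


def normalize_marc_org_code(code: str) -> str:
    """ASSUMPTION: normalization is lowercase + remove hyphens ("CU-B" -> "cub")."""
    return code.strip().lower().replace("-", "")


def institution_dmdsec(marc_org_code: str, dmd_id: str = "DMD-INST") -> ET.Element:
    """Build a dmdSec whose mdRef points at the institution's organization URI."""
    dmd = ET.Element(f"{{{METS_NS}}}dmdSec", {"ID": dmd_id})  # hypothetical ID
    ET.SubElement(
        dmd,
        f"{{{METS_NS}}}mdRef",
        {
            "LOCTYPE": "URL",
            # The METS schema lists MDTYPE values in uppercase, hence "OTHER".
            "MDTYPE": "OTHER",
            "OTHERMDTYPE": "contributing-institution-code",
            f"{{{XLINK_NS}}}href": ORG_BASE_URI + normalize_marc_org_code(marc_org_code),
        },
    )
    return dmd


print(ET.tostring(institution_dmdsec("CU-B"), encoding="unicode"))
```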
For guidelines on linking digital objects to associated parent-level collection descriptions (represented either as a MARC record or as an EAD finding aid), see Appendix C.
Metadata mappings are provided for existing XML extension metadata schemas, such as MODS and qualified Dublin Core.
Encode metadata consistently based on the specific usage guidelines established for the schema. For example, if encoding in Dublin Core, follow the Dublin Core usage guidelines for each element.
Do not include HTML markup in metadata when the metadata schema does not support it.
Provide the most granular and richest metadata possible. For example, if encoding in Dublin Core, use qualified Dublin Core.
Elements may be repeated. It may be necessary to supply multiple elements for the same piece of information, e.g., a general form of a resource's date of creation ("January 1, 1999") in addition to an ISO 8601 normalized form of that date ("1999-01-01").
However, do not combine different kinds of data values, or repeat values of the same type, within a single element; use a separate element for each data value. For example, avoid encoding multiple subject terms ("Municipal government; City Council members") in a single element. Instead, encode each term in its own element.
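The two rules above (repeat an element to give both a display form and a normalized form; give each value its own element) can be sketched in MODS with Python's ElementTree. The MODS names used (originInfo, dateCreated with encoding="iso8601", subject/topic) come from the MODS schema; the assumption that display dates follow the English "Month D, YYYY" pattern, and the helper names, are ours.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def mods_tag(tag: str) -> str:
    return f"{{{MODS_NS}}}{tag}"


def build_dates_and_subjects(display_date: str, subject_string: str) -> ET.Element:
    mods = ET.Element(mods_tag("mods"))
    origin = ET.SubElement(mods, mods_tag("originInfo"))
    # Repeat dateCreated: once as written, once normalized to ISO 8601.
    ET.SubElement(origin, mods_tag("dateCreated")).text = display_date
    # ASSUMPTION: display dates look like "January 1, 1999".
    iso = datetime.strptime(display_date, "%B %d, %Y").date().isoformat()
    ET.SubElement(origin, mods_tag("dateCreated"), {"encoding": "iso8601"}).text = iso
    # One subject/topic per term instead of one semicolon-delimited string.
    for term in subject_string.split(";"):
        term = term.strip()
        if term:
            subject = ET.SubElement(mods, mods_tag("subject"))
            ET.SubElement(subject, mods_tag("topic")).text = term
    return mods


record = build_dates_and_subjects("January 1, 1999", "Municipal government; City Council members")
print(ET.tostring(record, encoding="unicode"))
```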
Use the UTF-8 or UTF-16 character encoding. The CDL recommends using the standardized character set names documented by the Internet Assigned Numbers Authority (e.g., use "UTF-8", not "UTF8").
If using UTF-8, either encode characters directly in Unicode or use Unicode decimal or hexadecimal character references. A decimal character reference begins with an ampersand and a pound sign and ends with a semicolon (syntax "&#D;", where D is a decimal number). A hexadecimal character reference begins with an ampersand, a pound sign, and a lowercase "x", and ends with a semicolon (syntax "&#xH;", where H is a hexadecimal number whose digits may be upper- or lowercase); XML does not permit an uppercase "X" in this position. See the Unicode Code Charts for hexadecimal character codes.
For more detailed information about UTF-8 Unicode, see the W3C/Unicode Consortium document Unicode in XML and other Markup Languages.
Example using UTF-8 Unicode hexadecimal character references to encode the letter "é" in the term "émigrés":
... The papers also document trends in high school and university education among Russian &#x00E9;migr&#x00E9;s...
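Character references can be produced mechanically from code points. The Python sketch below shows both syntaxes and a round trip through an XML parser. Padding hexadecimal references to four digits (as in the Unicode code charts) is a stylistic choice of ours, and encode_non_ascii does not escape the reserved characters &, < and >; it assumes plain text as input.

```python
import xml.etree.ElementTree as ET


def decimal_ref(ch: str) -> str:
    """Decimal character reference, e.g. "é" -> "&#233;"."""
    return f"&#{ord(ch)};"


def hex_ref(ch: str) -> str:
    """Hexadecimal reference with a lowercase x, padded to at least 4 digits."""
    return f"&#x{ord(ch):04X};"


def encode_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal reference."""
    return "".join(ch if ord(ch) < 128 else hex_ref(ch) for ch in text)


encoded = encode_non_ascii("émigrés")
print(encoded)  # &#x00E9;migr&#x00E9;s
# Round trip: an XML parser resolves the references back to the original text.
print(ET.fromstring(f"<p>{encoded}</p>").text)  # émigrés
```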
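Characters reserved as XML markup delimiters (the ampersand and the left and right angle brackets) must be replaced with the character entities in the following table. Single and double quotes must also be replaced when they appear inside an attribute value delimited by the same quote character.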
Reserved Characters

| Character | Character Name | Character Entity |
|---|---|---|
| & | Ampersand | &amp; |
| < | Left angle bracket | &lt; |
| > | Right angle bracket | &gt; |
| ' | Single quote | &apos; |
| " | Double quote | &quot; |
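In Python, the standard library's xml.sax.saxutils.escape performs these replacements. It always handles &, < and >, escaping the ampersand first so nothing is double-escaped; the optional mapping below adds the two quote entities from the table. The wrapper name escape_reserved is hypothetical.

```python
from xml.sax.saxutils import escape

# escape() already replaces &, < and >; this mapping adds the quote entities.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_reserved(text: str) -> str:
    """Replace all five reserved characters with their XML entities."""
    return escape(text, QUOTE_ENTITIES)


print(escape_reserved('Smith & Sons <"Ltd">'))
# Smith &amp; Sons &lt;&quot;Ltd&quot;&gt;
```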
Do not include line breaks, list formatting, or any other formatting controls within the body of elements. Headings and labels should not appear within the body of elements (except in certain cases; see Section 3.2.3).
Some XML extension schemas (e.g., MODS) provide label attributes on particular elements. In these cases, institutions may encode data values (e.g., text comprising concise headings or descriptions) within those label attributes as permitted by those schemas.
Note that the CDL GDO supports the creation of digital objects that are largely independent of a particular online presentation. The encoding can be manipulated and repurposed through the application of customized style sheets to meet custom display needs and formatting preferences. This includes the special formatting of text, the ordering and positioning of text, the addition of headings and labels, and punctuation.
To provide a consistent user experience, CDL style sheets support a standard presentation that may not accommodate local preferences. Your institution may devise and implement local style sheets to present customized views of its digital objects.
The CDL strongly agrees that Dublin Core does not provide enough encoding granularity, and therefore prefers that descriptive metadata be encoded in a richer format, such as MODS. Institutions should use qualified Dublin Core only when MODS is not supported locally.
Descriptive metadata can be used to describe different expressions of a given resource. In the case of analog objects that have been digitized, the descriptive metadata may apply to the source analog object or the digital surrogate. For example, the "creator" of a resource may apply to an illustrator of a graphic book or the name of the technician responsible for scanning an image from that book. Likewise, the "date of creation" of a resource may apply to the date of printing for a graphic book or the date of scanning an image from that book. In the case of born-digital objects, the descriptive metadata pertains to the born-digital object itself.
Some descriptive metadata schemas do not allow encoders to clearly disambiguate between uses of a given element to apply to source analog objects versus digital surrogates. Therefore, when creating descriptive metadata for an analog object that has been digitized, we suggest that you consider the following two points:
Descriptive Metadata Guidelines (Summary). NOTE: See Appendix A for detailed descriptions of each element. Element names below are also linked to those descriptions.

| Element | Status |
|---|---|
| Identifier | Required element |
| Title | Required element |
| Creator | Required element (NOTE: if no name can be supplied, provide a name in Contributor, Institution/Repository, and/or Publisher) |
| Date | Required element |
| Description | Recommended element |
| Language | Recommended element |
| Subject (Name) | Recommended element |
| Subject (Title) | Recommended element |
| Subject (Place) | Recommended element |
| Subject (Topic, Function, or Occupation) | Recommended element |
| Genre | Recommended element |
| Type | Required element |
| Format/Physical Description | Recommended element |
| Related Collection/Project | Recommended element |
| Institution/Repository | Required element |
| Contributor | Recommended element |
| Publisher | Recommended element |
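A pre-submission checklist for the required elements in this table might look like the Python sketch below. The flat record format (element name mapped to a list of string values) is a hypothetical stand-in for however metadata is held locally; the Creator fallback follows the table's note that a name in Contributor, Institution/Repository, or Publisher may substitute for a missing creator.

```python
# Required elements from the summary table (Creator is handled separately).
REQUIRED = ["Identifier", "Title", "Date", "Type", "Institution/Repository"]
CREATOR_FALLBACKS = ["Contributor", "Institution/Repository", "Publisher"]


def missing_required(record: dict[str, list[str]]) -> list[str]:
    """Return the required elements that have no non-blank value in record."""

    def present(name: str) -> bool:
        return any(value.strip() for value in record.get(name, []))

    missing = [name for name in REQUIRED if not present(name)]
    # Creator is required, but a name in a fallback element satisfies it.
    if not present("Creator") and not any(present(n) for n in CREATOR_FALLBACKS):
        missing.append("Creator")
    return missing


# Hypothetical record missing several required elements.
record = {
    "Identifier": ["ark:/13030/kt9g50158w"],
    "Title": ["[Pablo de la Guerra (1833-1874), son of José de la Guerra y Noriega]"],
    "Type": ["still image"],
}
print(missing_required(record))  # ['Date', 'Institution/Repository', 'Creator']
```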
CDL's Rights Management Group (RMG) has developed a Rights Management Framework that may assist institutions contributing content to CDL preservation and access services in thinking about copyright and fair use issues for digital objects. The CDL strongly encourages contributors to provide rights information whenever possible, using one of the following methods:
Rights Management Administrative Metadata Guidelines (Summary). NOTE: See Appendix B for detailed descriptions of each element. Element names below are also linked to those descriptions.

| Element | Status |
|---|---|
| Copyright Status | Recommended element |
| Copyright Statement | Recommended element |
| Copyright Date | Recommended element |
| Copyright Owner Name | Recommended element |
| Copyright Owner Contact Information | Recommended element |
Structural metadata must be encoded in METS, in the <structMap> (Structural Map) section of the METS document. This section defines a structure that lets users navigate the digital object's hierarchical organization. Guidelines for preparing Structural Maps are documented in CDL-supported METS profiles.
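As a rough illustration of what a Structural Map contains, the Python sketch below builds a two-level <structMap>: one object-level <div> with one child <div> per page, each pointing at that page's files through <fptr FILEID="...">. The element and attribute names (structMap, div, ORDER, LABEL, fptr, FILEID) are standard METS; the two-level layout, the labels, and the file IDs are hypothetical, and the applicable CDL-supported profile governs the real structure.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def q(tag: str) -> str:
    return f"{{{METS_NS}}}{tag}"


def simple_struct_map(label: str, pages: list[tuple[str, list[str]]]) -> ET.Element:
    """Build a structMap with one object-level div and one ordered div per page."""
    struct_map = ET.Element(q("structMap"))
    root_div = ET.SubElement(struct_map, q("div"), {"LABEL": label})
    for order, (page_label, file_ids) in enumerate(pages, start=1):
        page_div = ET.SubElement(
            root_div, q("div"), {"ORDER": str(order), "LABEL": page_label}
        )
        for file_id in file_ids:
            # Each fptr points at a mets:file ID declared in the fileSec.
            ET.SubElement(page_div, q("fptr"), {"FILEID": file_id})
    return struct_map


# Hypothetical file IDs: a master and a derivative for page 1, one file for page 2.
smap = simple_struct_map("Letter", [("Page 1", ["FID1", "FID2"]), ("Page 2", ["FID3"])])
print(ET.tostring(smap, encoding="unicode"))
```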
The CDL generates the technical metadata required to support the orderly management of digital objects in its repositories. Currently, the CDL utilizes the JSTOR/Harvard Object Validation Environment (JHOVE) tool to derive technical metadata for accepted content file types.
You are encouraged to submit any additional technical metadata associated with a particular digital object (such as checksum [MD5, SHA-1, or CRC32] and byte size values in the METS <file> element, or information based on NISO's Data Dictionary: Technical Metadata for Still Images), but are not required to do so. CDL preservation services will store any supplied additional metadata with the object.
Note that all supplied technical metadata should be encoded using valid XML extension schemas as specified by CDL-supported METS profiles (such as in the NISO Metadata for Images in XML Schema (MIX) format). If a given set of metadata does not conform to a valid XML extension schema, then you should create a schema to embed the metadata and facilitate validation of the METS file. Otherwise, the metadata should be stored independently of the METS file and referred to using the METS <mdRef> Metadata Reference from within the METS file.
You may submit any additional metadata associated with a particular digital object, but are not required to do so. CDL preservation services will store any additional metadata with the object. CDL access services (OAC, Calisphere) will not necessarily display supplemental metadata to users.
Note that all supplied metadata should be encoded using valid XML extension schemas as specified by CDL-supported METS profiles. If a given set of metadata does not conform to a valid XML extension schema, then you should create a schema to embed the metadata and facilitate validation of the METS file. Otherwise, the metadata should be stored independently of the METS file and referred to using the METS <mdRef> Metadata Reference from within the METS file.
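Both paragraphs above describe the same choice: embed metadata that has an XML schema, or store it separately and point to it. The Python sketch below builds either form. The METS elements (mdWrap, xmlData, mdRef) and the MDTYPE value "NISOIMG" for MIX are from the METS schema; the section IDs, the OTHERMDTYPE value "scan-log", and the example.org URL are hypothetical. The returned sections would still need to be placed inside an <amdSec>.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
MIX_NS = "http://www.loc.gov/mix/"
for prefix, uri in [("mets", METS_NS), ("xlink", XLINK_NS), ("mix", MIX_NS)]:
    ET.register_namespace(prefix, uri)


def q(tag: str) -> str:
    return f"{{{METS_NS}}}{tag}"


def embed_metadata(section: str, md_id: str, mdtype: str, payload: ET.Element) -> ET.Element:
    """Schema-backed metadata: wrap it inline in mdWrap/xmlData."""
    sec = ET.Element(q(section), {"ID": md_id})
    wrap = ET.SubElement(sec, q("mdWrap"), {"MDTYPE": mdtype})
    ET.SubElement(wrap, q("xmlData")).append(payload)
    return sec


def reference_metadata(section: str, md_id: str, href: str, other_type: str) -> ET.Element:
    """Metadata without a schema: store it separately and point to it with mdRef."""
    sec = ET.Element(q(section), {"ID": md_id})
    ET.SubElement(sec, q("mdRef"), {
        "LOCTYPE": "URL",
        "MDTYPE": "OTHER",
        "OTHERMDTYPE": other_type,
        f"{{{XLINK_NS}}}href": href,
    })
    return sec


mix = ET.Element(f"{{{MIX_NS}}}mix")  # empty placeholder MIX record
tech = embed_metadata("techMD", "TECH1", "NISOIMG", mix)
log = reference_metadata("digiprovMD", "PROV1", "http://example.org/metadata/scan-log.txt", "scan-log")
print(ET.tostring(tech, encoding="unicode"))
print(ET.tostring(log, encoding="unicode"))
```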
The following content file types are currently supported by the CDL for the Enhanced Service Level.
| Content File Type | Content File Guidelines |
|---|---|
| Image | Image files should comply with the CDL Guidelines for Digital Images. Producers are strongly encouraged to submit at least one copy of a digital master file for each digital object. Producers must submit at least two derivative file types for each digital object: |
| PDF | All PDF file formats are supported. The CDL prefers PDF files with embedded text transcriptions. Producers must submit one PDF file per digital object. |
| TEI text | TEI text files should comply with the CDL Structured Text Working Group TEI Encoding Guidelines. Producers must submit one TEI file per digital object. |
Each content file should have a file name that is unique within your institution (it need not be globally unique); the object's unique identifier is often used to name the content file itself.
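Duplicate names are easiest to catch before a batch is submitted. This Python sketch reports base file names that occur more than once across a list of paths; comparing only the final path component, case-sensitively, is our assumption about what "unique" should mean for a given institution.

```python
from collections import Counter
from pathlib import PurePosixPath


def duplicate_file_names(paths: list[str]) -> list[str]:
    """Return base file names that occur more than once in a submission batch.

    Only the final path component is compared, case-sensitively.
    """
    counts = Counter(PurePosixPath(p).name for p in paths)
    return sorted(name for name, n in counts.items() if n > 1)


batch = [
    "masters/bkm00002774a.tif",
    "access/bkm00002774a_c.jpg",
    "rescans/bkm00002774a.tif",  # same name as the first entry
]
print(duplicate_file_names(batch))  # ['bkm00002774a.tif']
```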