Metadata for You & Me - Defining Shareable Metadata: Conformance to Standards

Module Content

Module Text

1. Defining conformance for shareable records

Conformance to a variety of standards and expectations is key for shareable metadata records. Many categories of standards come into play, including:

  1. The sharing protocol. Protocols for sharing include OAI-PMH, Z39.50, and SRU. All the relevant protocol functions should be correctly implemented.
  2. Metadata structure standards. Metadata structure standards define the fields ("buckets" for information) that can be used in a metadata record, and include Dublin Core, MODS, MARC, CDWA Lite, and VRA Core. (Note: MARC is also a data communication standard - how confusing!) Ensure conformance to the field names, order, cardinality, etc. laid out in the standard.
  3. Controlled vocabularies and syntax encoding schemes. When using a controlled vocabulary list (such as LCSH) or a syntax encoding scheme specifying how to format a value (such as W3CDTF for dates), be sure the value is actually valid according to the standard it claims to conform to.
  4. Data content standards. A data content standard such as AACR2, Describing Archives: A Content Standard (DACS), or Cataloging Cultural Objects (CCO) should be applied consistently throughout the creation of records.
  5. Technical Standards:
    • Character encoding standards, such as UTF-8, define how characters should be represented. Proper conformance becomes vital when dealing with "special" characters.
    • Data encoding standards, such as XML, have their own sets of rules for structuring data.
    • Entity references. Within a data encoding standard, certain characters will have special meanings. In XML, characters such as ampersands and angle brackets must be represented in a certain way if they are to be interpreted as part of the metadata value rather than part of the encoding.
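To make item 3 concrete, here is a minimal Python sketch of checking a date value against the W3CDTF syntax encoding scheme. The function name `is_w3cdtf` and the regular expression are our own illustration, not part of any standard tool; the pattern covers the six W3CDTF granularities (year; year and month; full date; and date-times to the minute, second, or fraction of a second, each with a time zone designator), and the field-range checks are a simplification.

```python
import re
from datetime import date

# One pattern for the six W3CDTF granularities (a profile of ISO 8601):
#   YYYY | YYYY-MM | YYYY-MM-DD | YYYY-MM-DDThh:mmTZD
#   YYYY-MM-DDThh:mm:ssTZD | YYYY-MM-DDThh:mm:ss.sTZD
# where TZD is "Z" or an offset such as "+01:00" / "-05:00".
W3CDTF = re.compile(
    r"(?P<y>\d{4})"
    r"(?:-(?P<m>\d{2})"
    r"(?:-(?P<d>\d{2})"
    r"(?:T(?P<hh>\d{2}):(?P<mi>\d{2})(?::(?P<ss>\d{2})(?:\.\d+)?)?"
    r"(?:Z|[+-]\d{2}:\d{2}))?)?)?"
)


def is_w3cdtf(value: str) -> bool:
    """Return True if value matches W3CDTF and its fields are in range."""
    match = W3CDTF.fullmatch(value)  # the whole string must match
    if match is None:
        return False
    # Missing month/day default to 1 so date() can check whatever is present.
    year = int(match["y"])
    month = int(match["m"]) if match["m"] else 1
    day = int(match["d"]) if match["d"] else 1
    try:
        date(year, month, day)  # rejects e.g. month 13 or February 30
    except ValueError:
        return False
    if match["hh"] is not None:
        if int(match["hh"]) > 23 or int(match["mi"]) > 59:
            return False
        if match["ss"] is not None and int(match["ss"]) > 59:
            return False
    return True


for candidate in ["1857", "2006-02-21", "2006-02-21T10:30Z", "1857.", "2006-13-01"]:
    print(candidate, is_w3cdtf(candidate))
```

A value such as "2006-02-21" passes, while a transcribed value such as "1857." does not. That only matters when a record claims the value follows W3CDTF; a value that makes no such claim is not failing this standard.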

2. Putting it into practice

Ensuring conformance to the various standards in play can be a daunting task. Different types of standards require different types of review to determine whether records conform. Conformance to vocabulary and content standards can be assessed through regular metadata quality review processes. Conformance to technical standards, and to metadata structure standards expressed as XML-based languages, can be assessed by adding a step to your pre-sharing workflow that validates every XML document. Checking conformance to a sharing protocol involves using validation tools built for protocol implementers, such as the OAI Repository Explorer for the OAI-PMH sharing protocol.
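As a minimal sketch of such a pre-sharing step, the Python code below uses only the standard library to confirm that each record is well-formed XML. It rests on simplifying assumptions: the helper `check_well_formed` and the `<record>` wrapper element are hypothetical, and well-formedness is only part of the picture, since the standard library cannot validate a record against an XML Schema such as oai_dc.xsd. Full schema validation would need a schema-aware tool, for example `xmllint --schema` from libxml2 or the `XMLSchema` class in the third-party `lxml` package.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text: str) -> tuple[bool, str]:
    """Return (True, "ok") if xml_text parses, else (False, parser message).

    Checks well-formedness only; this does NOT validate against a schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, "ok"


# Hypothetical <record> wrapper holding a single Dublin Core element.
DC_NS = 'xmlns:dc="http://purl.org/dc/elements/1.1/"'
records = {
    "escaped": f"<record {DC_NS}><dc:publisher>Derby &amp; Jackson</dc:publisher></record>",
    "naked": f"<record {DC_NS}><dc:publisher>Derby & Jackson</dc:publisher></record>",
}

# Hold back any record that fails, rather than sharing it.
for name, xml_text in records.items():
    ok, message = check_well_formed(xml_text)
    print(name, "ready to share" if ok else f"held back: {message}")
```

The "naked" record is rejected because its unescaped ampersand breaks XML's rules, which is exactly the kind of technical-standard failure this step is meant to catch before harvesting.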

In the record that follows, review the first dc:publisher field. Note that the publisher name includes an ampersand. In XML, ampersands cannot appear as "naked" characters; they must instead be encoded with the entity reference &amp;. In this record, however, the routine to convert the original ampersand into the &amp; form has mistakenly been run twice, resulting in the "double-encoded" form &amp;amp;. The same issue has affected the dc:title field, turning what should be &apos; into &amp;apos;.

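<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
http://www.openarchives.org/OAI/2.0/oai_dc.xsd">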
<dc:title>Daisy&amp;apos;s necklace, and what came of it [electronic resource] :
(a literary episode) / by T.B. Aldrich.</dc:title>

<dc:creator>Aldrich, Thomas Bailey, 1836-1907.</dc:creator>
<dc:description> </dc:description>
<dc:publisher>New York : Derby &amp;amp; Jackson ;
Cincinnati : H.W. Derby,</dc:publisher>

<dc:publisher>[Bloomington, Ind.] : Indiana University Digital Library Program
for the Committee on Institution</dc:publisher>
<dc:contributor>Indiana University. Digital Library Program.</dc:contributor>
<dc:contributor>Committee on Institutional Cooperation.</dc:contributor>
<dc:date>1857.</dc:date>
<dc:date>[2001 or 2002.]</dc:date>
<dc:type>text</dc:type>
<dc:format>text/sgml</dc:format>
<dc:format>text/html</dc:format>
<dc:format>application/pdf</dc:format>
<dc:format>image/gif</dc:format>
<dc:identifier>http://purl.dlib.indiana.edu/iudl/wright2/wright2-0035</dc:identifier>
<dc:source>Wright, L. H. Amer. fiction, 1851-1875, 35</dc:source>
<dc:source>Amer. fiction, 1774-1910 (microfilm, 1970-1978 ed.), v. 2 (1851-1875),
reel A-6, no. 35</dc:source>
<dc:source>Digitized image of the microfilm version produced in Woodbridge, CT
by Research Publications (later called Primary Source Mi</dc:source>
<dc:language>English</dc:language>
<dc:relation>Issued also in print and on microfilm.</dc:relation>
<dc:rights>http://www.letrs.indiana.edu/web/w/wright2/copyright.html</dc:rights>
</oai_dc:dc>
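One way to catch the double-encoding problem in this record, sketched in Python under our own assumptions: escape a value exactly once when building the XML (here with the standard library's `xml.sax.saxutils.escape`), then, after parsing, flag any element whose text still contains something shaped like an entity reference. The helper `find_double_encoding`, its regular expression, and the `<dc>` wrapper element are hypothetical. A parser resolves each entity reference once, so correctly encoded values come back as plain "&" and "'" characters; a leftover "&amp;" or "&apos;" suggests the value was encoded twice.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# After parsing, correctly encoded values contain plain "&" and "'" characters.
# Text that still looks like an entity reference was probably encoded twice.
LEFTOVER_ENTITY = re.compile(r"&(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);")


def find_double_encoding(xml_text: str) -> list[tuple[str, str]]:
    """Return (tag, text) for each element whose parsed text holds an entity ref."""
    hits = []
    for elem in ET.fromstring(xml_text).iter():
        # Only element text is checked; simple DC records carry no mixed content.
        if elem.text and LEFTOVER_ENTITY.search(elem.text):
            hits.append((elem.tag, elem.text))
    return hits


publisher = "Derby & Jackson"
once = escape(publisher)           # "Derby &amp; Jackson" (correct)
twice = escape(escape(publisher))  # "Derby &amp;amp; Jackson" (double-encoded)
title = escape("Daisy's necklace", {"'": "&apos;"})  # apostrophe escaped once

# Hypothetical <dc> wrapper standing in for a full oai_dc record.
record = (
    '<dc xmlns:dc="http://purl.org/dc/elements/1.1/">'
    f"<dc:title>{escape(title)}</dc:title>"  # mistakenly escaped a second time
    f"<dc:publisher>{once}</dc:publisher>"
    f"<dc:publisher>{twice}</dc:publisher>"
    "</dc>"
)

for tag, text in find_double_encoding(record):
    print(tag, "->", text)
```

The check can produce false positives for values that legitimately contain entity-like text (for example, a description of HTML markup), so flagged records are best reviewed by a person rather than corrected automatically.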

Next, consider the following record. Note first the attempt to add the HTML <i> tag (indicating text should be rendered in italics) within the dc:title element. Including <i> as-is would result in an XML validation error, so the authors of this metadata record attempted to work around this by using entity references for the opening and closing angle brackets, i.e., &lt;i&gt;GOPHERUS POLYPHEMUS&lt;/i&gt;. While the record therefore passes validation, this practice violates the spirit of the Dublin Core metadata format, which does not provide for text formatting. In addition, given the unpredictable processing and display environments of metadata aggregators, it is unclear whether this practice will ultimately result in the desired behavior of displaying this text in italics; an aggregator may instead display the literal characters "<i>" to end users.

Next, consider the attribute xsi:type="dcterms:URI" on the dc:identifier element. This record declares conformance to the oai_dc XML Schema, which implements the simple Dublin Core metadata format and does not provide for qualifiers from qualified Dublin Core, such as the URI encoding scheme expressed here as an xsi:type attribute. Because the oai_dc Schema does not define the type dcterms:URI, the record fails validation against that Schema. Aggregators that validate harvested records, or whose processing routines assume schema-valid input, may therefore reject the record or mishandle it.

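<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
http://www.openarchives.org/OAI/2.0/oai_dc.xsd">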
<dc:title>&lt;i&gt;GOPHERUS POLYPHEMUS&lt;/i&gt; (Gopher Tortoise) COYOTE PREDATION</dc:title>
<dc:creator>Moore, Jon A.</dc:creator>
<dc:creator>Engeman, Richard M.</dc:creator>
<dc:creator>Smith, Henry T.</dc:creator>
<dc:creator>Woolard, John</dc:creator>
<dc:description>Gopherus polyphemus is listed as a species of special concern by the state of Florida
(Florida Wildlife Code Chap. 39 F.A.C.), and as a threatened species by the Florida Committee on Rare and
Endangered Plants and Animals (Moler 1992. Rare and Endangered Biota of Florida: Volume 111,
Reptiles and Amphibians. University Press of Florida, Gainesville, Florida. 291 pp.). Coyotes (Canis latrans) are
invasive to Florida with ranges that are expanding within the state (Schmitz and Brown 1994. An
Assessment of Invasive Non-Indigenous Species in Florida's Public Lands. Florida Dept.
Environmental Protection. Tallahassee, Florida. 283 pp.; Wooding and Hardinsky 1990. Florida Field Nat.
18:12-14), including the southeastern coast (Cunningham and Dunford 1970. Quart. J. Florida Acad.
Sci. 33:279-280; Brady 1983. Florida Field Nat. 11:40-41; Hill et al. 1987. Wildl. Soc. Bull.
15:521-524; Wooding and Hardinsky, op. cit.). We report here evidence of Coyote predation on Gopher
Tortoise hatchlings in southeastern coastal Florida.</dc:description>
<dc:date>2006-02-21</dc:date>
<dc:type>text</dc:type>
<dc:format>application/pdf</dc:format>
<dc:identifier xsi:type="dcterms:URI">http://digitalcommons.unl.edu/icwdm_usdanwrc/434</dc:identifier>
<dc:publisher>DigitalCommons@University of Nebraska - Lincoln</dc:publisher>
</oai_dc:dc>
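Both problems in this record can be flagged automatically before sharing. The Python sketch below is a hypothetical lint step (the function `lint_simple_dc` and its rules are our own assumptions, not an official oai_dc check): it reports Dublin Core values whose parsed text contains tag-like markup such as "<i>", and any attribute on a Dublin Core element other than xml:lang, which we assume is the only attribute simple Dublin Core elements accept. It complements, rather than replaces, validation against the oai_dc Schema.

```python
import re
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Parsed text containing something shaped like an HTML/XML tag, e.g. "<i>".
TAG_LIKE = re.compile(r"</?[A-Za-z][A-Za-z0-9]*\b[^<>]*>")


def lint_simple_dc(xml_text: str) -> list[str]:
    """Flag two problems in an oai_dc record (hypothetical rules):

    1. Markup smuggled into values as entity references (&lt;i&gt;), which
       the parser turns back into literal "<i>" text.
    2. Attributes on dc:* elements other than xml:lang (e.g. xsi:type).
    """
    problems = []
    for elem in ET.fromstring(xml_text).iter():
        if not elem.tag.startswith("{" + DC_NS + "}"):
            continue  # only inspect Dublin Core elements
        name = "dc:" + elem.tag.split("}", 1)[1]
        if elem.text and TAG_LIKE.search(elem.text):
            problems.append(f"{name}: markup in value: {elem.text!r}")
        for attr in elem.attrib:
            if attr != XML_LANG:
                problems.append(f"{name}: unexpected attribute {attr}")
    return problems


# Trimmed copy of the record above.
record = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:title>&lt;i&gt;GOPHERUS POLYPHEMUS&lt;/i&gt; (Gopher Tortoise) COYOTE PREDATION</dc:title>
  <dc:identifier xsi:type="dcterms:URI">http://digitalcommons.unl.edu/icwdm_usdanwrc/434</dc:identifier>
</oai_dc:dc>"""

for problem in lint_simple_dc(record):
    print(problem)
```

On this trimmed record, the sketch reports the escaped italics in dc:title and the xsi:type attribute on dc:identifier. The attribute appears in the report with its namespace URI (Clark notation), since that is how ElementTree names namespaced attributes.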

3. What to look for in specific element types

4. Introducing Activity 2

Look at this shared record. Spend some time with it on its own. What doesn't make sense?

After doing some analysis on the raw record, look at the record in its native environment. What would you have done differently to make this record more coherent and include the appropriate context?

In the next module, we'll talk through some issues related to the shareability of this record.

Next module: Defining Shareable Metadata: Analysis of Activity 2