Metadata for You & Me - Defining Shareable Metadata: Conformance to Standards
Module Text
1. Defining conformance for shareable records
Conformance to a variety of standards and expectations is key for shareable metadata records. Many categories of standards come into play in shareable metadata records, including:
- The sharing protocol. Protocols for sharing include OAI-PMH, Z39.50, and SRU. All the relevant protocol functions should be correctly implemented.
- Metadata structure standards. Metadata structure standards define the fields ("buckets" for information) that can be used in a metadata record, and include Dublin Core, MODS, MARC, CDWA Lite, and VRA Core. (Note: MARC is also a data communication standard - how confusing!) Ensure conformance to the field names, order, cardinality, etc. laid out in the standard.
- Controlled vocabularies and syntax encoding schemes. When using a controlled vocabulary list (such as LCSH) or a syntax encoding scheme specifying how to format a value (such as W3CDTF for dates), be sure the value is actually valid to the standard to which it claims to conform.
- Data Content Standards. A data content standard such as AACR2, Describing Archives: A Content Standard (DACS), or Cataloging Cultural Objects (CCO) should be applied to the overall creation of records.
- Technical Standards:
  - Character encoding standards, such as UTF-8, define how characters should be represented. Proper conformance becomes vital when dealing with "special" characters.
  - Data encoding standards, such as XML, have their own sets of rules for structuring data.
  - Entity references. Within a data encoding standard, certain characters will have special meanings. In XML, characters such as ampersands and angle brackets must be represented in a certain way if they are to be interpreted as part of the metadata value rather than part of the encoding.
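The syntax encoding scheme item in the list above lends itself to an automated check. The following is a minimal sketch, assuming Python and a hypothetical helper named is_w3cdtf. It accepts the date and time forms described in the W3C Date and Time Formats note and is meant as an illustration, not a complete validator.

```python
import re
from datetime import date

# Year, optionally followed by month, day, and a time with a required
# time zone designator ("Z" or +hh:mm / -hh:mm), per the W3CDTF profile.
W3CDTF = re.compile(
    r"(?P<year>\d{4})"
    r"(?:-(?P<month>\d{2})"
    r"(?:-(?P<day>\d{2})"
    r"(?:T(?P<hour>\d{2}):(?P<minute>\d{2})"
    r"(?::(?P<second>\d{2})(?:\.\d+)?)?"
    r"(?P<tz>Z|[+-]\d{2}:\d{2})"
    r")?)?)?"
)


def is_w3cdtf(value):
    """Return True if value looks like a valid W3CDTF date (hypothetical helper)."""
    match = W3CDTF.fullmatch(value)
    if match is None:
        return False
    # Let the standard library reject impossible dates such as 2006-02-30.
    try:
        date(int(match["year"]), int(match["month"] or 1), int(match["day"] or 1))
    except ValueError:
        return False
    if match["hour"] is not None:
        if int(match["hour"]) > 23 or int(match["minute"]) > 59:
            return False
        if match["second"] is not None and int(match["second"]) > 59:
            return False
    # Assumption: time zone offsets are not range-checked in this sketch.
    return True
```

Under this check a value such as 2006-02-21 passes, while free-text dates such as 1857. or [2001 or 2002.] do not, so values like these should not be labeled as W3CDTF.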
Ensuring conformance to the various standards in play can be a daunting task. Different types of standards require different types of review to determine if records conform. Conformance to vocabulary and content standards can be assessed through regular metadata quality review processes. Conformance to technical standards and metadata structure standards for XML-based languages can be assessed by including a step validating all XML documents as part of your pre-sharing workflow. Checking conformance to a sharing protocol would involve using validation tools built for protocol implementers, such as the OAI Repository Explorer for the OAI-PMH sharing protocol.
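As one illustration of a pre-sharing validation step, the sketch below checks only XML well-formedness using Python's standard library. The function name well_formedness_error and the two sample strings are assumptions made for this example. Validation against a metadata structure schema such as oai_dc would need a schema-aware tool and is not shown here.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Illustrative samples: the first escapes the ampersand, the second does not.
GOOD = ('<dc:publisher xmlns:dc="http://purl.org/dc/elements/1.1/">'
        'Derby &amp; Jackson</dc:publisher>')
BAD = ('<dc:publisher xmlns:dc="http://purl.org/dc/elements/1.1/">'
       'Derby & Jackson</dc:publisher>')

for label, sample in (("escaped", GOOD), ("naked ampersand", BAD)):
    error = well_formedness_error(sample)
    print(label, "->", "well-formed" if error is None else error)
```

A check like this catches only the most basic technical problems. A record can be well-formed and still fail schema validation or violate a content standard.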
In the record that follows, review the first dc:publisher field. Note that the publisher name includes an ampersand. In XML, ampersands cannot appear as "naked" characters; they must instead be encoded with the entity reference &amp;. In this record, however, the routine that converts the original ampersand into the &amp; form has mistakenly been run twice, resulting in a "double-encoding": &amp;amp;. The same issue has affected the dc:title field, turning what should be &apos; into &amp;apos;.
<oai_dc:dc
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Daisy&amp;apos;s necklace,
and what came of it [electronic resource] :
(a literary episode) / by T.B.
Aldrich.</dc:title>
<dc:creator>Aldrich,
Thomas Bailey, 1836-1907.</dc:creator>
<dc:description> </dc:description>
<dc:publisher>New York : Derby
&amp;amp; Jackson ;
Cincinnati : H.W.
Derby,</dc:publisher>
<dc:publisher>[Bloomington, Ind.] : Indiana University Digital Library
Program
for the Committee on Institution</dc:publisher>
<dc:contributor>Indiana University. Digital Library
Program.</dc:contributor>
<dc:contributor>Committee
on Institutional Cooperation.</dc:contributor>
<dc:date>1857.</dc:date>
<dc:date>[2001 or 2002.]</dc:date>
<dc:type>text</dc:type>
<dc:format>text/sgml</dc:format>
<dc:format>text/html</dc:format>
<dc:format>application/pdf</dc:format>
<dc:format>image/gif</dc:format>
<dc:identifier>http://purl.dlib.indiana.edu/iudl/wright2/wright2-0035</dc:identifier>
<dc:source>Wright, L. H. Amer. fiction, 1851-1875,
35</dc:source>
<dc:source>Amer. fiction, 1774-1910
(microfilm, 1970-1978 ed.), v. 2 (1851-1875),
reel A-6, no.
35</dc:source>
<dc:source>Digitized image of the
microfilm version produced in Woodbridge, CT
by Research Publications (later
called Primary Source Mi</dc:source>
<dc:language>English</dc:language>
<dc:relation>Issued also in print and on
microfilm.</dc:relation>
<dc:rights>http://www.letrs.indiana.edu/web/w/wright2/copyright.html</dc:rights>
</oai_dc:dc>
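Double-encoding of the kind described for this record can often be spotted in the raw XML before sharing. The following is a rough heuristic sketch, assuming Python; the pattern and the function name find_double_encoded are illustrative. It flags an encoded ampersand that is immediately followed by the rest of another entity or character reference.

```python
import re

# "&amp;" directly followed by an entity name or numeric reference and ";",
# for example "&amp;amp;", "&amp;apos;" or "&amp;#39;".
DOUBLE_ENCODED = re.compile(
    r"&amp;(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);"
)


def find_double_encoded(raw_xml):
    """Return every suspicious double-encoded reference found in raw_xml."""
    return DOUBLE_ENCODED.findall(raw_xml)


# Illustrative fragments modeled on the record above.
print(find_double_encoded("<dc:title>Daisy&amp;apos;s necklace</dc:title>"))
print(find_double_encoded("<dc:publisher>Derby &amp;amp; Jackson</dc:publisher>"))
print(find_double_encoded("<dc:publisher>Derby &amp; Jackson</dc:publisher>"))
```

Because some values may legitimately contain text that looks like an entity reference, matches are best treated as candidates for human review and not corrected automatically.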
Next, consider the following record. Note first the attempt to add the HTML <i> tag (indicating text should be rendered in italics) within the first dc:title element. Including <i> as-is would result in an XML validation error, so the authors of this metadata record attempted to overcome this by providing entity references for the open and close brackets, i.e., &lt;i&gt;GOPHERUS POLYPHEMUS&lt;/i&gt;. While the record therefore passes validation, this practice violates the spirit of the Dublin Core metadata format, which does not provide for text formatting. In addition, in the unpredictable processing and display environments of metadata aggregators, it is unclear whether this practice will ultimately result in the desired behavior of displaying this text in italics.
Next, consider the attribute xsi:type="dcterms:URI" on the dc:identifier element. This record is declared as conforming to the oai_dc XML Schema implementing the simple Dublin Core metadata format, which does not define the xsi namespace or provide for attributes on Dublin Core elements representing refinements from qualified Dublin Core. The presence of this attribute causes the record to fail validation against the oai_dc Schema, and would therefore likely cause problems when aggregators process the record. Such processing routines are likely to include reading the record with a validating XML parser, which will fail due to the validity problem.
<oai_dc:dc
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>&lt;i&gt;GOPHERUS
POLYPHEMUS&lt;/i&gt; (Gopher Tortoise) COYOTE
PREDATION</dc:title>
<dc:creator>Moore,
Jon A.</dc:creator>
<dc:creator>Engeman, Richard
M.</dc:creator>
<dc:creator>Smith, Henry
T.</dc:creator>
<dc:creator>Woolard,
John</dc:creator>
<dc:description>Gopherus
polyphemus is listed as a species of special concern by the state of Florida
(Florida Wildlife Code Chap. 39 F.A.C.), and as a threatened species by the
Florida Committee on Rare and
Endangered Plants and Animals (Moler 1992. Rare
and Endangered Biota of Florida: Volume 111,
Reptiles and Amphibians.
University Press of Florida, Gainesville, Florida. 291 pp.). Coyotes (Canis latrans)
are
invasive to Florida with ranges that are expanding within the state
(Schmitz and Brown 1994. An
Assessment of Invasive Non-Indigenous Species in
Florida's Public Lands. Florida Dept.
Environmental Protection. Tallahassee,
Florida. 283 pp.; Wooding and Hardinsky 1990. Florida Field Nat.
18:12-14),
including the southeastern coast (Cunningham and Dunford 1970. Quart. J. Florida
Acad.
Sci. 33:279-280; Brady 1983. Florida Field Nat. 11:40-41; Hill et al.
1987. Wildl. Soc. Bull.
15:521-524; Wooding and Hardinsky, op. cit.). We report
here evidence of Coyote predation on Gopher
Tortoise hatchlings in southeastern
coastal Florida.</dc:description>
<dc:date>2006-02-21</dc:date>
<dc:type>text</dc:type>
<dc:format>application/pdf</dc:format>
<dc:identifier
xsi:type="dcterms:URI">http://digitalcommons.unl.edu/icwdm_usdanwrc/434</dc:identifier>
<dc:publisher>DigitalCommons@University of Nebraska -
Lincoln</dc:publisher>
</oai_dc:dc>
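Both problems discussed for this record can be surfaced with a simple review script. The sketch below assumes Python and a hypothetical function named review_simple_dc. The sample is a shortened version of the record with namespace declarations added so that it parses on its own. The check is a heuristic, not a substitute for schema validation: it flags values that contain tag-like text once entities are resolved, and any attribute other than xml:lang on a dc element.

```python
import re
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Text that looks like an HTML/XML tag after entity references are resolved.
TAG_LIKE = re.compile(r"</?[A-Za-z][^<>]*>")


def review_simple_dc(xml_text):
    """Return a list of human-readable findings for a simple Dublin Core record."""
    findings = []
    for elem in ET.fromstring(xml_text).iter():
        if not elem.tag.startswith("{" + DC_NS + "}"):
            continue
        name = elem.tag.split("}", 1)[1]
        if elem.text and TAG_LIKE.search(elem.text):
            findings.append(f"dc:{name}: value contains escaped markup")
        for attr in elem.attrib:
            if attr != XML_LANG:
                findings.append(f"dc:{name}: unexpected attribute {attr}")
    return findings


# Shortened, illustrative version of the record above.
SAMPLE = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:title>&lt;i&gt;GOPHERUS POLYPHEMUS&lt;/i&gt; (Gopher Tortoise) COYOTE PREDATION</dc:title>
  <dc:identifier xsi:type="dcterms:URI">http://digitalcommons.unl.edu/icwdm_usdanwrc/434</dc:identifier>
</oai_dc:dc>"""

for finding in review_simple_dc(SAMPLE):
    print(finding)
```

For the sample, the script reports one finding for dc:title and one for dc:identifier, matching the two issues described above.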
- Names: Use of a content standard will guide you in how to select names for a metadata record. The content standard together with a controlled vocabulary for names will assist you in formatting names.
- Subjects: Use terms from controlled vocabularies whenever possible. When local headings are needed, it is best practice to maintain a local vocabulary to ensure consistent application of terms and provide a machine-readable indication in the metadata record that a local vocabulary is in use.
<mods:subject authority="lcsh">
<mods:topic>Funeral rites and ceremonies</mods:topic>
<mods:geographic>Louisiana</mods:geographic>
<mods:geographic>New Orleans</mods:geographic>
</mods:subject>
<mods:subject authority="local">
<mods:topic>Jazz funerals</mods:topic>
</mods:subject>
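The authority attribute in the MODS example above is what makes the vocabulary machine-readable. As a small sketch of how a downstream consumer might use it, the following Python groups subject terms by their declared authority. The function name subjects_by_authority and the wrapping mods:mods element with its namespace declaration are assumptions added so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

MODS_SUBJECT = "{http://www.loc.gov/mods/v3}subject"


def subjects_by_authority(xml_text):
    """Map each authority value to the list of subjects (term lists) using it."""
    grouped = {}
    for subject in ET.fromstring(xml_text).iter(MODS_SUBJECT):
        # Subjects without an authority attribute are grouped under "(none)".
        authority = subject.get("authority", "(none)")
        terms = [child.text.strip() for child in subject
                 if child.text and child.text.strip()]
        grouped.setdefault(authority, []).append(terms)
    return grouped


# The fragment above, wrapped in a root element for parsing.
SAMPLE = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:subject authority="lcsh">
    <mods:topic>Funeral rites and ceremonies</mods:topic>
    <mods:geographic>Louisiana</mods:geographic>
    <mods:geographic>New Orleans</mods:geographic>
  </mods:subject>
  <mods:subject authority="local">
    <mods:topic>Jazz funerals</mods:topic>
  </mods:subject>
</mods:mods>"""

for authority, subjects in subjects_by_authority(SAMPLE).items():
    print(authority, subjects)
```

An aggregator could use a grouping like this to treat LCSH headings and local headings differently, for instance when building browse indexes.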
Look at this shared record. Spend some time with it on its own. What doesn't make sense?
After doing some analysis on the raw record, look at the record in its native environment. What would you have done differently to make this record more coherent and include the appropriate context?
In the next module, we'll talk through some issues related to the shareability of this record.
Next module: Defining Shareable Metadata: Analysis of Activity 2