Metadata for You & Me - Defining Shareable Metadata: Coherence

Module Text

1. Defining coherence for shareable records

A shareable metadata record should make sense on its own, outside of the local institutional context and without access to the resource itself. The records we reviewed at the beginning of the course displayed some coherence problems.

2. Putting it into practice

Creating a coherent shareable metadata record is largely a matter of common sense. The key is to consider the metadata record on its own, without any other supporting information.

In the following record, the dc:subject element contains two subject headings rather than one. The semicolon separating them makes the two headings clear to a human reader, but a computer has a much harder time with it. Note also that one of the subject values itself contains a semicolon, as part of the entity reference &amp; that represents an ampersand. To build a subject browse, for example, the aggregator will have to split this field into individual subjects, and a simple split on semicolons would break that value. In shared records it is better practice to repeat the element once for each value. Also note the value "Postcards" in the second dc:description element: this resource is a postcard, so that value would be better placed in the more specific dc:type element.

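<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">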
<dc:title>Jefferson Monument, Louisville, Ky.</dc:title>
<dc:description>The Thomas Jefferson monument in Louisville, Kentucky. Jefferson stands upon a pedestal supported by four winged female figures; he holds a partially unrolled scroll. The pedestal, Jefferson's figure and the sky behind the statue are colorized; the base is grey. This monument was given to the city of Louisville by Isaac W. Bernheim. The verso bears a postmark of May 5, 1913.</dc:description>
<dc:subject>Monuments &amp; memorials; Jefferson, Thomas, 1743-1826--Monuments;</dc:subject>
<dc:coverage>Louisville (Ky.)</dc:coverage>
<dc:date>1913?</dc:date>
<dc:description>Postcards</dc:description>
<dc:date>2006-03-23</dc:date>
<dc:type>Still image</dc:type>
<dc:identifier>ULUA.008.007</dc:identifier>
<dc:language>eng</dc:language>
<dc:identifier>http://digital.library.louisville.edu/u?/ulua001,98</dc:identifier>
</oai_dc:dc>
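A revised version of this record, applying only the changes discussed above, might look like the sketch below. It is an illustration, not the University of Louisville's actual record: the namespace declarations are the standard ones for oai_dc, and every value is copied from the original record, with the elements kept in roughly their original order.

```xml
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:title>Jefferson Monument, Louisville, Ky.</dc:title>
  <dc:description>The Thomas Jefferson monument in Louisville, Kentucky. Jefferson stands upon a pedestal supported by four winged female figures; he holds a partially unrolled scroll. The pedestal, Jefferson's figure and the sky behind the statue are colorized; the base is grey. This monument was given to the city of Louisville by Isaac W. Bernheim. The verso bears a postmark of May 5, 1913.</dc:description>
  <!-- One heading per dc:subject; no semicolon delimiters to split on -->
  <dc:subject>Monuments &amp; memorials</dc:subject>
  <dc:subject>Jefferson, Thomas, 1743-1826--Monuments</dc:subject>
  <dc:coverage>Louisville (Ky.)</dc:coverage>
  <dc:date>1913?</dc:date>
  <dc:date>2006-03-23</dc:date>
  <dc:type>Still image</dc:type>
  <!-- "Postcards" moved here from dc:description -->
  <dc:type>Postcards</dc:type>
  <dc:identifier>ULUA.008.007</dc:identifier>
  <dc:language>eng</dc:language>
  <dc:identifier>http://digital.library.louisville.edu/u?/ulua001,98</dc:identifier>
</oai_dc:dc>
```

Because each heading now sits in its own element, an aggregator can index the subjects separately without splitting anything, so the semicolon that ends the &amp; entity reference can no longer be mistaken for a delimiter.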

In the following record, the term "Silver gelatin prints" appears in a dc:source element rather than in dc:type, where one would expect it. In dc:source, an aggregator is unlikely to identify it as a resource type. The dc:subject element packs in three subjects, separated by semicolons, which would be better recorded as three separate dc:subject elements. Note also the second and third dc:title elements in the record. These give the titles of collections in which the photograph resides, not the title of the photograph itself; this information is probably best recorded in dc:relation.

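<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">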
<dc:title>Washing &amp; ironing clothes.</dc:title>
<dc:title>Braceros in Oregon Photograph Collection.</dc:title>
<dc:creator/>
<dc:date>ca. 1942</dc:date>
<dc:description>Mexican workers washing and ironing clothes.</dc:description>
<dc:subject> Agricultural laborers--Mexican--Oregon; Agricultural laborers--Housing--Oregon; Laundry </dc:subject>
<dc:coverage/>
<dc:type>Image</dc:type>
<dc:source>Silver gelatin prints</dc:source>
<dc:title> Extension Bulletin Illustrations Photograph Collection (P20)</dc:title>
<dc:identifier>P20:1069</dc:identifier>
<dc:source>Copy negative.</dc:source>
<dc:rights> Permission to use must be obtained from OSU Archives.</dc:rights>
<dc:description> Master scanned with Epson 1640XL scanner at 600 or 800 dpi. Image manipulated with Adobe Photoshop ver. 7.0. </dc:description>
<dc:identifier>P020_1069.</dc:identifier>
<dc:identifier>http://digitalcollections.library.oregonstate.edu/u?/bracero,37 </dc:identifier>
</oai_dc:dc>
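The same approach applied to the Oregon State record might look like the following sketch. It addresses only the three issues discussed above (resource type, multiple subjects, collection titles); otherwise the values are copied from the original, apart from escaping the ampersand in the title and trimming stray spaces inside values. Putting both collection titles in dc:relation is one reasonable choice, not the only possible one, and this is not OSU's actual record.

```xml
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <!-- Only the photograph's own title stays in dc:title -->
  <dc:title>Washing &amp; ironing clothes.</dc:title>
  <dc:creator/>
  <dc:date>ca. 1942</dc:date>
  <dc:description>Mexican workers washing and ironing clothes.</dc:description>
  <!-- One heading per dc:subject instead of a semicolon-separated list -->
  <dc:subject>Agricultural laborers--Mexican--Oregon</dc:subject>
  <dc:subject>Agricultural laborers--Housing--Oregon</dc:subject>
  <dc:subject>Laundry</dc:subject>
  <dc:coverage/>
  <dc:type>Image</dc:type>
  <!-- Moved from dc:source so it can be recognized as a resource type -->
  <dc:type>Silver gelatin prints</dc:type>
  <!-- Collection titles moved from dc:title -->
  <dc:relation>Braceros in Oregon Photograph Collection.</dc:relation>
  <dc:relation>Extension Bulletin Illustrations Photograph Collection (P20)</dc:relation>
  <dc:identifier>P20:1069</dc:identifier>
  <dc:source>Copy negative.</dc:source>
  <dc:rights>Permission to use must be obtained from OSU Archives.</dc:rights>
  <dc:description>Master scanned with Epson 1640XL scanner at 600 or 800 dpi. Image manipulated with Adobe Photoshop ver. 7.0.</dc:description>
  <dc:identifier>P020_1069.</dc:identifier>
  <dc:identifier>http://digitalcollections.library.oregonstate.edu/u?/bracero,37</dc:identifier>
</oai_dc:dc>
```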

3. What to look for in specific element types

Here are a handful of things to think about in terms of coherence for specific elements. The caution above about packing multiple values into a single element applies to all of them.

4. Coherently linking to a resource

(Note: Much of the text below is adapted from the Best Practices for Shareable Metadata. Jenn and Sarah were involved in the writing and editing of these Best Practices.)

For the most part, shared metadata describes Internet-accessible resources; that is, the metadata contain URLs that lead a user to the digital resource. This is not always the case, however: some metadata providers share metadata describing analog resources that are not available digitally. Either way, appropriate links are an important part of a record's ultimate shareability. Virtually all brief record displays in metadata aggregators provide some sort of link to the resource, taken from the metadata record.

For metadata describing digital resources, the location to which users are sent after clicking a URL is critically important to end-users, but the link also affects the credibility of both metadata providers and aggregators. It is the URL that most often links the metadata record in the service provider's aggregation with the actual resource in the metadata provider's repository. Send users to the wrong place, or to an empty page, and they may leave both the metadata provider's and the aggregator's sites altogether. End-users may also become frustrated if they must wade through many layers of indirection (i.e., numerous mouse clicks) before viewing the resource itself.

Numerous links may be applicable to a resource. A metadata record may describe a single resource available as a single digital file, but this is less common than one might think. Resources may have multiple parts, such as multipage texts. A metadata record describing an analog or digital resource may have a URL pointing to the institution's homepage or a page describing the collection. Metadata records may also include URLs to pages describing conditions of use, copyright information for the resource, related resources, etc. As a result of this variability, the specific level of representation a URL should point to cannot easily be prescribed, but the following best practices should be followed whenever possible.

In all cases, it is a best practice to use some form of standard, persistent URL (such as a PURL, Handle, or a resolvable DOI) if possible. If this is not possible, it is a best practice to keep URLs up to date in the shared metadata records. Link rot is a significant problem for service providers, and one that is entirely out of their control except through reharvesting updated metadata records from the data provider.

Strictly speaking, a URL must be a valid URI. Implementers should refer to the current URI specification (RFC 3986) to determine which characters are reserved and unreserved and which may need to be escaped. URIs require that non-ASCII characters and some ASCII characters be escaped. In practice, this may require special treatment of certain characters in URLs, such as spaces.
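As an illustration, the fragment below uses a hypothetical URL (the example.org host and the file name are invented) to show percent-encoding in a dc:identifier value: a space becomes %20, and a non-ASCII character such as é is written as the percent-escaped bytes of its UTF-8 encoding, %C3%A9.

```xml
<!-- Hypothetical URL for illustration only; example.org and the file name are invented. -->
<!-- As typed, not a valid URI: http://www.example.org/images/café interior.jpg -->
<dc:identifier>http://www.example.org/images/caf%C3%A9%20interior.jpg</dc:identifier>
```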

It is a best practice to include the appropriate scheme prefix (e.g., http:// or ftp://) in links provided in metadata records.
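Using the Louisville URL from the first record above, the difference might look like this. Without the scheme, an aggregator may not recognize the value as a link at all, or may resolve it as a relative path.

```xml
<!-- Missing scheme: may be treated as plain text or a relative path -->
<dc:identifier>digital.library.louisville.edu/u?/ulua001,98</dc:identifier>

<!-- Scheme included: unambiguously an absolute http URL -->
<dc:identifier>http://digital.library.louisville.edu/u?/ulua001,98</dc:identifier>
```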

It is a best practice to provide either one primary URL or a machine-readable indication of which link in a record points to the resource in context, that is, together with its contextual material (e.g., metadata and navigation to the collection homepage).

<mods:location>
<mods:url usage="primary display" access="object in context">http://purl.dlib.indiana.edu/iudl/archives/cushman/P04995</mods:url>
<mods:url access="raw object">http://purl.dlib.indiana.edu/iudl/archives/cushman/screen/P04995.jpg</mods:url>
</mods:location>

In this MODS record, the <url> element within <location> carries a usage attribute set to "primary display" to indicate the primary URL. The <location><url> element can therefore be repeated, but only one instance should carry usage="primary display". In a Dublin Core record, a single <dc:identifier> element should contain the URL that points to the resource. The element should be repeated only for identifiers that are not URLs (for example, an ISBN).
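The two dc:identifier elements in the Louisville record shown earlier follow this pattern: one holds a local identifier and only one holds a URL. The commented-out third element below uses a hypothetical image-only URL (invented for illustration) to show what to avoid; simple Dublin Core offers no attribute for marking which of several URLs is primary.

```xml
<!-- Non-URL identifier: repeating dc:identifier for this is fine -->
<dc:identifier>ULUA.008.007</dc:identifier>
<!-- The single URL for the item -->
<dc:identifier>http://digital.library.louisville.edu/u?/ulua001,98</dc:identifier>
<!-- Avoid: a second URL (hypothetical, image only) leaves aggregators guessing which to display -->
<!-- <dc:identifier>http://www.example.org/images/ulua001_98.jpg</dc:identifier> -->
```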

Providing a single primary URL, or indicating which URL is primary, tells aggregators which URL to use as the link to the resource. For multipart resources or resources with multiple manifestations, this best practice means that metadata providers must decide where a user can most appropriately access the resource.

Pointing the URL to the resource in context is important because many aggregators do not display the full metadata record harvested from a metadata provider. If the primary link goes to a stand-alone version of the resource (such as a JPG image alone), an end-user will have no context beyond the metadata on the aggregator's site. This serves neither the end-user nor the metadata provider, since the end-user cannot easily navigate to other parts of the metadata provider's collection. At a minimum, the URL should point to a page that contains the resource and a navigation bar that allows users to reach the collection homepage. It is highly desirable that this page also include the descriptive metadata for the resource.

If it is not possible to provide a URL that links directly to the resource, the end-user should be able to reach the resource with minimal effort, in as few clicks as possible (at most two). For example, in the arXiv.org e-print repository, the URL provided takes the user to a page with basic metadata and a choice of formats (see http://arxiv.org/abs/chem-ph/9403001 as an example). The resource is only one click away. The same is true of DSpace repositories.

It is particularly bad practice to use the collection homepage as the primary URL for a resource, expecting the end-user to conduct what is probably an additional search to find the relevant resource. Shreeves and Kirkham (2004) describe end-users' frustration with this practice:

[The subjects] reported a significant slowing of their efforts when a pointer, or active link, within a record led them to another institution's Web site in which they had to execute an additional search. The subjects clearly believed that a live URL in a search result should immediately display the digital object of interest. One student described the interaction as being comparable to going to McDonald's "and upon walking up to the counter, the employee hands across directions to Burger King across town."

Next module: Defining Shareable Metadata: Context