Preparing files to Upload
Acceptable file types
Your data can be in any of the following file types:
- Comma-separated value (.csv or .txt)
- Tab delimited (.tab or .txt)
- Microsoft Excel (.xls or .xlsx)
- dBASE (.dbf)
Example of a comma-separated value (CSV) file
A CSV file contains the values in a table as a series of ASCII text lines organized so that each column value is separated by a comma from the next column's value and each row starts a new line. It can be opened in Microsoft Excel or a text editor as a either a .csv or a .txt file. When viewed in a text editor, the file looks like this:
Example of a tab delimited file
A tab delimited file is a special kind of plain text file with a tab between each column in the text. When imported into the desktop publishing application, the tabs allow the columns to line up neatly. When viewed in a text editor, the file looks like this:
Example of an Excel (.xls) file
An Excel file is displayed as a spreadsheet, with data organized into columns and rows. It can be exported as either a CSV file or a tab delimited file (see above). When using Excel (.xls or .xslx) file formats, please be aware that only the first worksheet in the file will be processed. When viewed in Microsoft Excel, the file looks like this:
Definition of Xbase Data File (.dbf) file
Xbase data file is the central table in an Xbase database. All other data files are related to this one file. The Data File is a mix of binary and ASCII data. Header contains binary data. The records are all in ASCII.
How to format your data
In order for your records to be interpreted correctly by the mapping tool, your data needs to meet certain formatting criteria. Make sure that:
- The first row of the file contains either field names or the first data record.
-
-
First row of a CSV file:
- First row of a tab delimited file:
Notice that in text files, the lines tend to wrap, so it can be hard to see where the row ends. The first three lines of text highlighted above represent the first row of the file – in this case, the first row contains field names. - First row of an Excel file:
When using Excel (.xls or .xslx) file formats, please be aware that only the first worksheet in the file will be processed.
-
First row of a CSV file:
- Each subsequent row contains one – and only one – record.
- Data appears in a single file or worksheet. Programs like Excel allow tables to be stored in separate worksheets. Make sure all your data is in a single file and in a single worksheet.
- The file is less than 15MB. The Metadata Mapping Tool will time out (your browser will stop uploading/processing) if a file takes longer than 5 minutes to upload.
Examples of how to format your data
Incorrect formatting
The mapping tool cannot interpret a file like the one below:
- Rows 1, 2, and 4 need to be removed. Rows 1 and 2 need to be removed because the first row can only contain field names or the first data record. Row 4 needs to be removed because subsequent rows can only contain data records.
- The field names currently in row 3 (Composer/Title/Publisher, etc.) should appear in the first row.
Corrected formatting
The mapping tool can use this data:
- The information in the first two fields was removed from the spreadsheet (and re-entered during the mapping process as additional static fields). Now the first row contains field names.
- "Volume" information has been transferred to a separate column (since rows 2 and up can only contain record data).
The example below, in which the first row contains the first record, also works with the mapping tool.
Preparing files from Microsoft Access
Exporting data from Access to Excel
Since the metadata mapping tool will not accept Microsoft Access database files in the .accd format (see acceptable file types), sheet music metadata stored in an Access database must be exported to a different format, such as Microsoft Excel.
If your Access table is formatted like the one below, then your data can be exported into Microsoft Excel. Notice that each line contains a separate, non-duplicated record. The first column contains the song ID, each of which only appears once.
To export the data above, right-click on the table name in the left-hand panel. From the context menu that pops up, first select Export, then select Excel.
In the box labeled "File name," type or browse to identify the location where you want to save your Excel spreadsheet.
Once your data is in an Excel spreadsheet, consult the beginning of this section, How to format your data, to ensure that it is ready for uploading.
Potential problems with Access database structure
Many Access databases store data in multiple tables. While this makes sense from the perspective of data storage, it can create problems in terms of having a flat file to upload to the mapping tool.
If you pull your data into a single table and it looks like the data below, you must reformat it before uploading it to the mapping tool. The problem comes from the repetition of records: the song with ID 20, "No one will ever know my heart" is listed once with composer "Foree, Mel" and again with composer "Rose, Fred." If this is uploaded into the metadata mapping tool, there will be no way of showing that these two lines are, in fact, a single record.
If your data looks like the data above, you must reformat it so that each line contains all the information for a given song. In the table below, additional fields have been added to capture Composer information. The metadata for the song "No one will ever know my heart" is located on a single line, with Composer 1 being "Foree, Mel" and Composer 2 being "Rose, Fred."
Now the table is ready to be exported into Excel and uploaded to the metadata mapping tool.
Tips for making your metadata work with the SMC site
The Sheet Music Consortium promotes the creation and sharing of useable metadata. Following SMC Cataloging Guidelines will help you to create MODS and Dublin Core records that can be used by other institutions and applications.
General practices
- Elements contain no extraneous data. For example:
- Use "1961," not "Published in 1961."
- Use "Nash, Gene," not "Composed by Gene Nash."
- Each label refers to precisely one kind of data. This makes it possible to generate a more accurate MODS record. For example:
- Use separate columns for "Composer" and "Lyricist," not one column for "Composer or Lyricist." Given the latter, the user will have to map the whole column to the generic "Name," which is less informative than either "Composer" or Lyricist."
- Each field contains precisely one data value. For example:
- Use "Subject 1: Flowers," "Subject 2: Candy," not "Subject: Flowers, Candy."
Elements important for discovery
Because they will be used for functions like browsing and grouping, it is critical that the elements below adhere to our guidelines.
- Titles
- Titles should be listed without initial articles ("The," "An," "La," etc.) so that users can browse your records by title. Initial articles should be stored in a separate data field.
- Names
- If a work is attributed to "unknown" or "anonymous," leave the name element blank; do not include these terms as name metadata.
- Enter names in inverted order, in the fullest form possible, with no period at the end.
- We highly recommend using the Library of Congress Name Authority File (LCNAF) to ensure name consistency across records.
- Dates
- Provide dates in four digit year format (YYYY). If a more precise date is known, use YYYY-MM or YYYY-MM-DD.
- When a date is cataloger-supplied, indicate this through the use of a date range (described below) rather than inserting characters such as "ca.", brackets or a question mark as part of the date string.
-
- When only a decade is known, enter a date range for the entire decade (e.g. for the 1930s, enter: 1930/1939).
- When only a century is known, enter a date range for the entire century (e.g. for the 19th century, enter: 1800/1899).
- If no date is known, leave the element blank.
- Subjects
- If a controlled vocabulary for subject terms is desired, take the form of the subject term from a standard subject thesaurus, such as the Library of Congress Thesaurus for Graphical Materials I: Subject Terms (TGM I).
- If creating terms, be as unambiguous as possible. If a homograph (multiple meanings for one spelling of a word) must be used, provide a parenthetical note for the domain to which the term belongs (e.g. mercury (metal), Mercury (planet), or Mercury (Roman deity)). If a singular object is depicted use the plural form of its term (e.g. picture of one cow, use: "Cows," or picture of a goose, use: "Geese").
- Collection
- Including this information will allow users to search or browse within your collection. Even if the collection name is not included in the original metadata records, static fields can be added easily during the mapping process.
- Prefer taking the collection name from a standard naming authority file, such as the Library of Congress Name Authority File (LCNAF). If a name does not appear in an authority file, establish it according to a content standard such as Anglo-American Cataloguing Rules (AACR2), Cataloging Cultural Objects (CCO), or Describing Archives: a Content Standard (DACS).