Showing posts with label Ead McTaggart. Show all posts
Showing posts with label Ead McTaggart. Show all posts

Friday, September 30, 2011

EAD and Microsoft Access

Our collection database, and database of record, KIDB, is an Access database. We also have most of our folder lists in an Access database. What this means is that virtually all the metadata needed to construct EAD guides for our collections exists in Access databases. So instead of having a collection cataloged in MARC, extracting EAD from the MARC record using Terry Reese’s MarcEdit, tagging the container list with Ead McTaggart, merging the two resultant documents, and finally doing a fair amount of editing. I wanted push-button EAD.

I am not including all the code for doing this here. There is just too much of it, in reality it is most of the KIDB database. I will include the code that actually writes the front matter portion of the EAD document. You can find the code for the container list in an earlier post about EAD McTaggart. First a little bit about how to get ready to push the EAD button.

When I started this KIDB already had many of the elements needed to create an EAD guide. The collection title, creator, collection number, extent, etc. Some of the other material is pretty much boilerplate: contact information, repository, restrictions, citation, etc. What was missing were the descriptive elements like biography, abstract, organizational history, related collections, subjects. Step one was to build into KIDB a way to hold those elements and keep track of them. I did this with four tables. One, tblFrontMatter, has fields for the collection ID number, abstract, scope and content note, biography, organizational history, related collections, and subjects. A second, tblEntities, has fields for contact information, user information, and repository. The third keeps track of who added what to tblFrontMatter for which collection and when. The fourth is used to record who made any changes to the data in tblFrontMatter and when it was done.

With all the necessary data now contained in one database it was simply a matter of writing the code to tag it properly. The entire tagged code for the front matter is compiled in one variable, which is then added to a table designed to hold it. Then Ead McTaggart is called and he adds the tagged container list to the table, one row for each folder. When that is done a report pulls out the data row by row and exports it as an xml file. The table is then emptied.

Here is the code for tagging the front matter:

    strText = "<" & Chr(63) & "xml version=" & Chr(34) & "1.0" & Chr(34) & " encoding=" & Chr(34) & "utf-8" & Chr(34) & Chr(63) & ">"
    strText = strText & "<" & Chr(63) & "xml-stylesheet type=" & Chr(34) & "text/xsl" & Chr(34) & " href=" & Chr(34) & "../styles/style.xsl" & Chr(34) & Chr(63) & ">" & Chr(10)
    strText = strText & "<" & Chr(33) & "DOCTYPE ead PUBLIC " & Chr(34) & Chr(43) & "//ISBN 1-931666-00-8//DTD ead.dtd "
    strText = strText & "(Encoded Archival Description (EAD) Version 2002)//EN" & Chr(34) & " " & Chr(34) & "../dtds/ead.dtd" & Chr(34) & ">" & Chr(10)
    strText = strText & "<ead>" & Chr(10) & "<eadheader repositoryencoding=" & Chr(34) & "iso15511"
    strText = strText & Chr(34) & " relatedencoding=" & Chr(34) & "MARC21" & Chr(34) & " countryencoding="
    strText = strText & Chr(34) & "iso3166-1" & Chr(34) & " scriptencoding=" & Chr(34) & "iso15924" & Chr(34)
    strText = strText & " dateencoding=" & Chr(34) & "iso8601" & Chr(34) & " langencoding=" & Chr(34) & "iso639-2b" & Chr(34) & ">" & Chr(10)
    strText = strText & "<eadid mainagencycode=" & Chr(34) & "nic" & Chr(34) & " countrycode="
    strText = strText & Chr(34) & "us" & Chr(34) & " publicid=" & Chr(34) & "-//Cornell University::"
    strText = strText & "Cornell University Library::Kheel Center for Labor-Management Documentation and Archives//"
    strText = strText & "TEXT(US::NIC::KCL0" & strCollNum & "::" & strCollTitle & ".)//EN" & Chr(34) & ">"
    strText = strText & "KCL0" & strPathNum & ".xml</eadid>" & Chr(10)
    strText = strText & "<filedesc>" & Chr(10)
    strText = strText & "<titlestmt>" & Chr(10)
    strText = strText & "<titleproper>Guide to " & strCollTitle & "<date> " & strDate & "</date></titleproper>" & Chr(10)
    strText = strText & "<titleproper type=" & Chr(34) & "sort" & Chr(34) & ">" & strCollTitle & "</titleproper>" & Chr(10)
    strText = strText & "<author>Compiled by  " & strProcessor & "</author>" & Chr(10)
    strText = strText & "</titlestmt>" & Chr(10)
    strText = strText & "<publicationstmt>" & Chr(10)
    strText = strText & "<publisher>Kheel Center for Labor-Management Documentation and Archives, Cornell University Library</publisher>" & Chr(10)
    strText = strText & "<date>" & Format(dteDate, "MMMM dd, yyyy") & "</date>" & Chr(10)
    strText = strText & "</publicationstmt>" & Chr(10)
    strText = strText & "<notestmt>" & Chr(10)
    strText = strText & "<note audience=" & Chr(34) & "internal" & Chr(34) & ">" & Chr(10)
    strText = strText & "<p><subject>Labor</subject></p>" & Chr(10)
    strText = strText & "</note>" & Chr(10)
    strText = strText & "</notestmt>" & Chr(10)
    strText = strText & "</filedesc>" & Chr(10)
    strText = strText & "<profiledesc>" & Chr(10)
    strText = strText & "<creation>Finding aid encoded by KIDB, Ead McTaggart, and " & strEncoder & ", <date>" & Format(Date, "MMMM dd, yyyy") & "</date></creation>" & Chr(10)
    strText = strText & "</profiledesc>" & Chr(10)
    strText = strText & "</eadheader>" & Chr(10)
    strText = strText & "<frontmatter>" & Chr(10) & "<titlepage>"
    strText = strText & "<titleproper>Guide to the " & strCollTitle & "<lb/></titleproper>" & Chr(10)
    strText = strText & "<num>Collection Number: " & strCollNum & "</num>" & Chr(10)
    strText = strText & strAddress
    strText = strText & "<defitem>"
    strText = strText & "<label>Compiled by:</label>"
    strText = strText & "<item>" & strProcessor & "</item>"
    strText = strText & "</defitem>"
    strText = strText & "<defitem>"
    strText = strText & "<label>EAD encoding:</label>"
    strText = strText & "<item>" & strEncoder & ", " & Format(Date, "MMMM dd, yyyy") & "</item>"
    strText = strText & "</defitem>"
    strText = strText & "</list>"
    strText = strText & "<date>© " & DatePart("yyyy", Date) & " Kheel Center for Labor-Management Documentation and Archives, Cornell University Library </date>" & Chr(10)
    strText = strText & "</titlepage>" & Chr(10) & "</frontmatter>" & Chr(10)
    strText = strText & "<archdesc level=" & Chr(34) & "collection" & Chr(34) & ">" & Chr(10) & Chr(9) & Chr(9) & "<did>" & Chr(10)
    strText = strText & "<head id=" & Chr(34) & "a1" & Chr(34) & ">DESCRIPTIVE SUMMARY</head>" & Chr(10)
    strText = strText & "<unittitle label=" & Chr(34) & "Title:" & Chr(34) & " encodinganalog=" & Chr(34) & "MARC 245$a" & Chr(34) & ">" & strCollTitle & "," & Chr(10)
    strText = strText & "<unitdate encodinganalog=" & Chr(34) & "MARC 245$f" & Chr(34) & ">" & strDate & "</unitdate>" & Chr(10)
    strText = strText & "</unittitle>" & Chr(10)
    strText = strText & "<unitid label=" & Chr(34) & "Collection Number:" & Chr(34) & ">" & strCollNum & "</unitid>" & Chr(10)
    strText = strText & "<origination label=" & Chr(34) & "Creator:" & Chr(34) & ">" & Chr(10)
    strText = strText & "<persname encodinganalog=" & Chr(34) & "MARC 100" & Chr(34) & " role=" & Chr(34) & "creator" & Chr(34) & ">" & strCollCreator & "</persname>" & Chr(10)
    strText = strText & "</origination>" & Chr(10)
    strText = strText & "<physdesc label=" & Chr(34) & "Quantity:" & Chr(34) & " encodinganalog=" & Chr(34) & "MARC 300" & Chr(34) & ">" & dblLinear & " linear ft.</physdesc>" & Chr(10)
    strText = strText & "<physdesc label=" & Chr(34) & "Forms of Material:" & Chr(34) & ">Articles, reprints, pamphlets, correspondence, photographs.</physdesc>" & Chr(10)
    strText = strText & strRepository & Chr(10)
    strText = strText & strAbstract
    strText = strText & "<langmaterial label=" & Chr(34) & "Language:" & Chr(34) & ">Collection material in <language encodinganalog=" & Chr(34) & "MARC 041" & Chr(34) & " langcode=" & Chr(34) & "eng" & Chr(34) & ">English</language>" & Chr(10)
    strText = strText & "</langmaterial>" & Chr(10)
    strText = strText & "</did>" & Chr(10)
    strText = strText & strTopOrgHist
    strText = strText & strBio
    strText = strText & strOrgHist
    strText = strText & strScope
    strText = strText & strSubjects
    strText = strText & "<descgrp><head id=" & Chr(34) & "a10" & Chr(34) & ">INFORMATION FOR USERS</head>"
    strText = strText & "<accessrestrict><head>Access Restrictions:</head>"
    strText = strText & "<p>Access to the collections in the Kheel Center is restricted. Please contact a reference archivist for access to these materials.</p>"
    strText = strText & "</accessrestrict><userestrict><head>Restrictions on Use:</head>"
    strText = strText & "<p>This collection must be used in keeping with the Kheel Center Information Sheet and Procedures for Document Use.</p>"
    strText = strText & "</userestrict><prefercite><head>Cite As:</head>"
    strText = strText & "<p>" & strCollTitle & " #" & strCollNum & ". Kheel Center for Labor-Management Documentation and Archives, Cornell University Library.</p>"
    strText = strText & "</prefercite></descgrp>"
    strText = strText & strRelated



A few notes on some of the variables. The collection number is in two variables: strCollNum has the collection number as it appears in KIDB (i.e. 5169/043 AV), strPathNum has the collection number formatted to work as a valid file name (i.e. 5619-043av). How you choose to store multiple value fields will determine how you populate variables like strCreator, strSubjects, strRelated. I list creators in one field, delimited with a semi-colon. It is the same with related collections. When I pull the data I use the split function to separate out the values, format and tag then for EAD, then reassemble them in a single variable. Subjects are stored with a line break between them, each one beginning with the MARC field code (600: for a person and so on). For subjects I split on the line break (chr(10)) and use the MARC field to set the tags (persname, corpname, etc.) and the attributes.

Friday, March 25, 2011

Re-Formatting Finding Aids for Conversion to EAD: Parsing the Date From the Title

Quite often our legacy finding aids have the date at the end of the title field, rather than as in a separate data field. For Ead McTaggart to properly format the EAD guides, I need the year, or range of years to be in a separate date field. This means I need a way to move the year from the title field to the date field without copying and pasting hundreds, often thousands, of times per finding aid. I automated this process using an if-then-else statement.

Fortunately the dates in most of our findings conform to a few variations of a few formats: they end in a single four-digit year (i.e. 1984), year range (i.e. 1920-1932 or 1920-32); or have the elements of the date separated by slashes (i.e. 3/25/1911 or 3/25/11).

If we assume the first title is in cell C2 then if the date ends in a four-digit year I just want the last four digits. To determine if this is the case the logic test of the if-then-else statement looks to see if the fifth character from the right is a space or a slash:

= IF((MID(C2,LEN(C2)-4,1)= " ")+(MID(C2,LEN(C2)-4,1)= "/"),RIGHT(C2,4),…)

If the date looks like 1920-1932 then the fifth character from the right is a dash and I want the last nine characters:

=IF(MID(C2,LEN(C2)-4,1)= "-",RIGHT(C2,9),…)

But if the date looks like 1920-32 then the third character from the right is a dash and I not only want the last seven characters, but I also want to insert “19” immediately after the dash:

=IF(MID(C2,LEN(C2)-2,1)="-",MID(C2,LEN(C2)-6,5)&"19"&RIGHT(C2,2),…)

That just leaves the dates separated by slashes with only the last two digits of the year. In this case the third character from the right is a slash and I want the last two characters with “19” inserted in front of them:

=IF(MID(C2,LEN(C2)-2,1)="/","19"&RIGHT(C2,2),…)


This last test will actually have to be the first test for those times when the date is just the month and the year (i.e. 5/14), otherwise the formula will see that the fifth character from the right is a space and pull the last four characters before it sees that the third character from the right is a slash.

And if none of the above conditions are met then there is no date at the end of the title and I want nothing in the title field so I set the final false value to a zero length string: "".

Here is the complete formula:

= IF(MID(C2,LEN(C2)-2,1)="/","19"&RIGHT(C2,2), IF((MID(C2,LEN(C2)-4,1)= " ")+(MID(C2,LEN(C2)-4,1)= "/"),RIGHT(C2,4),IF(MID(C2,LEN(C2)-4,1)= "-",RIGHT(C2,9),IF(MID(C2,LEN(C2)-2,1)="-",MID(C2,LEN(C2)-6,5)&"19"&RIGHT(C2,2), ""))))

There will, of course be some outliers: oddly formatted dates, and so on. Still this formula will work for the vast majority of the dates I see in our finding aids. Proof-reading picks up the outliers.

Monday, March 14, 2011

Re-Formatting Finding Aids for Conversion to EAD: Scope Content Notes

Many of our legacy finding aids here at the Kheel Center are either Microsoft Word documents or scans of hard copy documents. Since I use Ead McTaggart to tag container lists these need to be reformatted as Microsoft Excel documents. If a simple copy and paste does not convert the text to columns that can be easily done using the “Text to Columns” function in Excel. What is a little more difficult is when the title and scope/content notes are parse into separate lines rather than separate columns:


A B C
Box Folder Title
1 1 Folder One Title
Folder One Scope Content
1 2 Folder Two Title
Folder Two Scope Content


What I need to do in this case is move the scope/content notes from C3 to D2 and C5 to D4. You can drag and drop, but there is a quicker way: in cell D2 enter this formula:

=If(A3>0,””,C3)
This translates to “If the value in cell A3 is greater than zero (that is it contains a box number) set the value of cell D2 to a zero length string, if the value of cell A3 is not greater than zero (that is it has no box number) set the value of cell D2 equal to the value of cell C3.”

Moving cursor to the handle at the lower right corner of cell D2, changes it from a large white cross to a small black cross. Double clicking will fill the column down with the formula. The column will fill down until it reaches an empty cell in column C. This may, or may not, be the bottom of column C. Using “control, down arrow” to test this, the cell selection will move to the bottom of the column. If that is also the bottom of column C repeat the fill down process until the bottom of column C is reached.

The spreadsheet now looks this:


A B C D
Box Folder Title ScopeContent
1 1 Folder One Title Folder One Scope Content
Folder One Scope Content
1 2 Folder Two Title Folder Two Scope Content
Folder Two Scope Content

But, column D actually contains only formulas, not text. Highlight column D and copy, then right click the column and select “Paste Special”, then select “Values”. I now have text in column D.

At this point I insert a new column, called “Index”, between the Title and ScopeContent columns. Enter “1” into cell D2 and “2” into cell D3, then select both D2 and D3 and fill down. This gives me an index I can use to maintain or re-establish the proper order of the data.


A B C D E
Box Folder Title Index ScopeContent
1 1 Folder One Title 1 Folder One Scope Content


Folder One Scope Content 2
1 2 Folder Two Title 3 Folder Two Scope Content


Folder Two Scope Content 4

Now sort the table on column A. This will group all the lines with no box numbers together at the bottom of the spreadsheet. I can now delete those rows and resort on column D, Index:


A B C D E
Box Folder Title Index ScopeContent
1 1 Folder One Title 1 Folder One Scope Content
1 2 Folder Two Title 3 Folder Two Scope Content

At this point I re-index so that there are no missing numbers in the Index column. Not really necessary, but I do it anyway.

The finding is now formatted in Excel for conversion by Ead McTaggart. Except that the dates are probably part of the title.

Next week: Re-Formatting Finding Aids for Conversion to EAD: Parsing the Date From the Title