Friday, March 25, 2011

Re-Formatting Finding Aids for Conversion to EAD: Parsing the Date From the Title

Quite often our legacy finding aids have the date at the end of the title field, rather than as in a separate data field. For Ead McTaggart to properly format the EAD guides, I need the year, or range of years to be in a separate date field. This means I need a way to move the year from the title field to the date field without copying and pasting hundreds, often thousands, of times per finding aid. I automated this process using an if-then-else statement.

Fortunately the dates in most of our findings conform to a few variations of a few formats: they end in a single four-digit year (i.e. 1984), year range (i.e. 1920-1932 or 1920-32); or have the elements of the date separated by slashes (i.e. 3/25/1911 or 3/25/11).

If we assume the first title is in cell C2 then if the date ends in a four-digit year I just want the last four digits. To determine if this is the case the logic test of the if-then-else statement looks to see if the fifth character from the right is a space or a slash:

= IF((MID(C2,LEN(C2)-4,1)= " ")+(MID(C2,LEN(C2)-4,1)= "/"),RIGHT(C2,4),…)

If the date looks like 1920-1932 then the fifth character from the right is a dash and I want the last nine characters:

=IF(MID(C2,LEN(C2)-4,1)= "-",RIGHT(C2,9),…)

But if the date looks like 1920-32 then the third character from the right is a dash and I not only want the last seven characters, but I also want to insert “19” immediately after the dash:

=IF(MID(C2,LEN(C2)-2,1)="-",MID(C2,LEN(C2)-6,5)&"19"&RIGHT(C2,2),…)

That just leaves the dates separated by slashes with only the last two digits of the year. In this case the third character from the right is a slash and I want the last two characters with “19” inserted in front of them:

=IF(MID(C2,LEN(C2)-2,1)="/","19"&RIGHT(C2,2),…)


This last test will actually have to be the first test for those times when the date is just the month and the year (i.e. 5/14), otherwise the formula will see that the fifth character from the right is a space and pull the last four characters before it sees that the third character from the right is a slash.

And if none of the above conditions are met then there is no date at the end of the title and I want nothing in the title field so I set the final false value to a zero length string: "".

Here is the complete formula:

= IF(MID(C2,LEN(C2)-2,1)="/","19"&RIGHT(C2,2), IF((MID(C2,LEN(C2)-4,1)= " ")+(MID(C2,LEN(C2)-4,1)= "/"),RIGHT(C2,4),IF(MID(C2,LEN(C2)-4,1)= "-",RIGHT(C2,9),IF(MID(C2,LEN(C2)-2,1)="-",MID(C2,LEN(C2)-6,5)&"19"&RIGHT(C2,2), ""))))

There will, of course be some outliers: oddly formatted dates, and so on. Still this formula will work for the vast majority of the dates I see in our finding aids. Proof-reading picks up the outliers.

No comments:

Post a Comment