Thursday, March 31, 2011

Getting a Proper Title Sort Order in Microsoft Access

In our publications database I have my student workers enter the titles as they appear, which means that they often have an initial article. Unlike MARC Access does not have a built-in way to tell it to ignore the first four, or whatever, spaces before sorting, so all the titles beginning with “The” are grouped together and all the titles beginning with “An” are grouped together, as so on. I wanted the sort order to ignore any initial articles, since that is the way most users except a list of titles to be constructed. I had thought of adding an “non-indexing spaces” field and using that to control the sort order. This idea was rejected because it meant another data element that the student workers would have to enter, and they would have to think about it not just transcribe what was on the item. Instead, I decided to deal with the initial articles after the fact.

What I did was create a function which strips the initial articles from the titles as the list is being created and the output from the function is used to sort the list even though the full title is displayed. I put this function in a module I call BasicFunctionsRDM, which is where I park any function I might use in other databases, if I need it I just import the entire module into another database. You can place this code in any module you like.

Public Function Article(strTitle As String)
     'Debug.Print "Before: " & strTitle

     'First make sure there are no double spaces after the initial article
     'replace any double spaces with a single space.

     strTitle = Replace(strTitle, Chr(32) & Chr(32), Chr(32))

     'Strip off any initial double quotes

     If Mid(strTitle, 1, 1) Like Chr(34) Then
          strTitle = Mid(strTitle, 2)
     End If

     'Strip the initial article from the title by using the mid-string function.
     'Always include, and count, the space after the article, otherwise you will
     'end up stripping the first two characters from a title beginning "Always".

     If Mid(strTitle, 1, 4) Like "The " Then
          strTitle = Mid(strTitle, 5)
     ElseIf Mid(strTitle, 1, 2) Like "A " Then
          strTitle = Mid(strTitle, 3)
     ElseIf Mid(strTitle, 1, 3) Like "An " Then
          strTitle = Mid(strTitle, 4)
     'Strip off any initial spaces.
     ElseIf Mid(strTitle, 1, 1) Like Chr(32) Then
          strTitle = Mid(strTitle, 2)
     'Strip off any single quotes at the beginning of the title.
     ElseIf Mid(strTitle, 1, 1) Like Chr(33) Then
          strTitle = Mid(strTitle, 2)
     'Strip off any exclamation points at the beginning of the title.
     ElseIf Mid(strTitle, 1, 1) Like Chr(39) Then
          strTitle = Mid(strTitle, 2)
     ElseIf Mid(strTitle, 1, 3) Like "El " Then
          strTitle = Mid(strTitle, 4)
     'Use a single character wildcard to catch both "Le " and "La ".
     ElseIf Mid(strTitle, 1, 3) Like "L? " Then
          strTitle = Mid(strTitle, 4)
     ElseIf Mid(strTitle, 1, 3) Like "De " Then
          strTitle = Mid(strTitle, 4)
     ElseIf Mid(strTitle, 1, 3) Like "Di " Then
          strTitle = Mid(strTitle, 4)
     ElseIf Mid(strTitle, 1, 3) Like "Il " Then
          strTitle = Mid(strTitle, 4)
     ElseIf Mid(strTitle, 1, 3) Like "Un " Then
          strTitle = Mid(strTitle, 4)
     ElseIf Mid(strTitle, 1, 2) Like "l'" Then
          strTitle = Mid(strTitle, 3)
     'Use a single character wildcard to catch both "Uno " and "Una ".
     ElseIf Mid(strTitle, 1, 4) Like "Un? " Then
          strTitle = Mid(strTitle, 5)
     'Use a single character wildcard to catch "Les ", "Las ", and "Los ".
     ElseIf Mid(strTitle, 1, 4) Like "L?s " Then
          strTitle = Mid(strTitle, 5)
     ElseIf Mid(strTitle, 1, 4) Like "Gli " Then
          strTitle = Mid(strTitle, 5)
     'Use a single character wildcard to catch "Der ", "Dem ", "Den ", and "Des ".
     ElseIf Mid(strTitle, 1, 4) Like "De? " Then
          strTitle = Mid(strTitle, 5)
     ElseIf Mid(strTitle, 1, 6) Like "Einen " Then
          strTitle = Mid(strTitle, 7)
     ElseIf Mid(strTitle, 1, 5) Like "Eine " Then
          strTitle = Mid(strTitle, 6)
     ElseIf Mid(strTitle, 1, 4) Like "Ein " Then
          strTitle = Mid(strTitle, 5)
     ElseIf Mid(strTitle, 1, 4) Like "Die " Then
          strTitle = Mid(strTitle, 5)
     ElseIf Mid(strTitle, 1, 4) Like "Das " Then
          strTitle = Mid(strTitle, 5)
     End If

     'Strip off any double quotes left at the beginning of the title.
     If Mid(strTitle, 1, 1) Like Chr(34) Then
          Article = Mid(strTitle, 2)
     Else:
          Article = strTitle
     End If

     'Debug.Print "After: " & Article

End Function

It’s not perfect. I’m sure I’ve missed some articles in other languages and probably some for the languages I did try to cover, but it is very easy to add another article and this pretty well covers the ones most common in our database.

To use this function simply include it the “Order By” clause of your SQL query. So if you have a text box, txtKeyword, and a search button, cmdSearch, the code in the “On Click” event would look something like this:

Private Sub cmdKeyword_Click()
On Error GoTo Err_cmdKeyword_Click

     Dim strKey As String       'Variable for the keyword.
     Dim strSQL As String      'Variable for the SQL query to populate the results list.

     strKey = Nz(Me!txtKeyword, "")
     strKey = Authority(strKey)
     'Debug.Print strKey
     'Format the keyword with wildcards and single quotes.
     strKey = "'*" & strKey & "*'"
     'Debug.Print "strKey = " & strKey

     'This SQL query pulls the pamphlet ID, title, and date for any pamphlet with
     'the keyword anywhere in the title. It then sorts it by title, ignoring initial articles.

     strSQL = "SELECT tblPams.PamID, tblPams.Title, tblPams.Date FROM tblPams "
     strSQL = strSQL & "WHERE tblPams.Title LIKE " & strKey & " "
     strSQL = strSQL & " ORDER BY Article(tblPams.Title), tblPams.Date;"
     'Debug.Print strSQL

     'Display the search results in a list box.
     Me.lstPamList.RowSource = strSQL
     Forms!frmStartPage.Refresh

     'Display the search query in a text box, txtString, on the form.
     Me!txtString = "Results for Search: Keyword Anywhere LIKE " & strKey & ""

     'Clear the keyword text box.
     Me!txtKeyword = Null

     Exit_cmdKeyword_Click:
          Exit Sub

Err_cmdKeyword_Click:
     MsgBox Err.Description
     Resume Exit_cmdKeyword_Click

End Sub

Friday, March 25, 2011

Re-Formatting Finding Aids for Conversion to EAD: Parsing the Date From the Title

Quite often our legacy finding aids have the date at the end of the title field, rather than as in a separate data field. For Ead McTaggart to properly format the EAD guides, I need the year, or range of years to be in a separate date field. This means I need a way to move the year from the title field to the date field without copying and pasting hundreds, often thousands, of times per finding aid. I automated this process using an if-then-else statement.

Fortunately the dates in most of our findings conform to a few variations of a few formats: they end in a single four-digit year (i.e. 1984), year range (i.e. 1920-1932 or 1920-32); or have the elements of the date separated by slashes (i.e. 3/25/1911 or 3/25/11).

If we assume the first title is in cell C2 then if the date ends in a four-digit year I just want the last four digits. To determine if this is the case the logic test of the if-then-else statement looks to see if the fifth character from the right is a space or a slash:

= IF((MID(C2,LEN(C2)-4,1)= " ")+(MID(C2,LEN(C2)-4,1)= "/"),RIGHT(C2,4),…)

If the date looks like 1920-1932 then the fifth character from the right is a dash and I want the last nine characters:

=IF(MID(C2,LEN(C2)-4,1)= "-",RIGHT(C2,9),…)

But if the date looks like 1920-32 then the third character from the right is a dash and I not only want the last seven characters, but I also want to insert “19” immediately after the dash:

=IF(MID(C2,LEN(C2)-2,1)="-",MID(C2,LEN(C2)-6,5)&"19"&RIGHT(C2,2),…)

That just leaves the dates separated by slashes with only the last two digits of the year. In this case the third character from the right is a slash and I want the last two characters with “19” inserted in front of them:

=IF(MID(C2,LEN(C2)-2,1)="/","19"&RIGHT(C2,2),…)


This last test will actually have to be the first test for those times when the date is just the month and the year (i.e. 5/14), otherwise the formula will see that the fifth character from the right is a space and pull the last four characters before it sees that the third character from the right is a slash.

And if none of the above conditions are met then there is no date at the end of the title and I want nothing in the title field so I set the final false value to a zero length string: "".

Here is the complete formula:

= IF(MID(C2,LEN(C2)-2,1)="/","19"&RIGHT(C2,2), IF((MID(C2,LEN(C2)-4,1)= " ")+(MID(C2,LEN(C2)-4,1)= "/"),RIGHT(C2,4),IF(MID(C2,LEN(C2)-4,1)= "-",RIGHT(C2,9),IF(MID(C2,LEN(C2)-2,1)="-",MID(C2,LEN(C2)-6,5)&"19"&RIGHT(C2,2), ""))))

There will, of course be some outliers: oddly formatted dates, and so on. Still this formula will work for the vast majority of the dates I see in our finding aids. Proof-reading picks up the outliers.

Monday, March 14, 2011

Re-Formatting Finding Aids for Conversion to EAD: Scope Content Notes

Many of our legacy finding aids here at the Kheel Center are either Microsoft Word documents or scans of hard copy documents. Since I use Ead McTaggart to tag container lists these need to be reformatted as Microsoft Excel documents. If a simple copy and paste does not convert the text to columns that can be easily done using the “Text to Columns” function in Excel. What is a little more difficult is when the title and scope/content notes are parse into separate lines rather than separate columns:


A B C
Box Folder Title
1 1 Folder One Title
Folder One Scope Content
1 2 Folder Two Title
Folder Two Scope Content


What I need to do in this case is move the scope/content notes from C3 to D2 and C5 to D4. You can drag and drop, but there is a quicker way: in cell D2 enter this formula:

=If(A3>0,””,C3)
This translates to “If the value in cell A3 is greater than zero (that is it contains a box number) set the value of cell D2 to a zero length string, if the value of cell A3 is not greater than zero (that is it has no box number) set the value of cell D2 equal to the value of cell C3.”

Moving cursor to the handle at the lower right corner of cell D2, changes it from a large white cross to a small black cross. Double clicking will fill the column down with the formula. The column will fill down until it reaches an empty cell in column C. This may, or may not, be the bottom of column C. Using “control, down arrow” to test this, the cell selection will move to the bottom of the column. If that is also the bottom of column C repeat the fill down process until the bottom of column C is reached.

The spreadsheet now looks this:


A B C D
Box Folder Title ScopeContent
1 1 Folder One Title Folder One Scope Content
Folder One Scope Content
1 2 Folder Two Title Folder Two Scope Content
Folder Two Scope Content

But, column D actually contains only formulas, not text. Highlight column D and copy, then right click the column and select “Paste Special”, then select “Values”. I now have text in column D.

At this point I insert a new column, called “Index”, between the Title and ScopeContent columns. Enter “1” into cell D2 and “2” into cell D3, then select both D2 and D3 and fill down. This gives me an index I can use to maintain or re-establish the proper order of the data.


A B C D E
Box Folder Title Index ScopeContent
1 1 Folder One Title 1 Folder One Scope Content


Folder One Scope Content 2
1 2 Folder Two Title 3 Folder Two Scope Content


Folder Two Scope Content 4

Now sort the table on column A. This will group all the lines with no box numbers together at the bottom of the spreadsheet. I can now delete those rows and resort on column D, Index:


A B C D E
Box Folder Title Index ScopeContent
1 1 Folder One Title 1 Folder One Scope Content
1 2 Folder Two Title 3 Folder Two Scope Content

At this point I re-index so that there are no missing numbers in the Index column. Not really necessary, but I do it anyway.

The finding is now formatted in Excel for conversion by Ead McTaggart. Except that the dates are probably part of the title.

Next week: Re-Formatting Finding Aids for Conversion to EAD: Parsing the Date From the Title

Tuesday, March 8, 2011

Authority Control. Well, sort of.

In our publications database we have American publications with the word “labor” in the title. We also have British, Canadian, Australian, and other publications with the word “labour” in the title. The problem was if you included “labor” in your title search you did not get titles with “labour” in them. The same held true for “employees” and “employes.” What I needed was for the database to return both when a researcher entered either.

Some if-then-else logic would probably work so that when you search for “Blah Blah Labor Blah” in the title the search is actually for:
(Title = “Blah Blah Labor Blah”) OR (Title = “Blah Blah Labour Blah”)

The solution I came up with is simpler, though not perfect. I run the search string through a function which replaces both “labor” and “labour” with “labo*r”. The wildcard will then match both “labor” and “labour”.

Public Function Authority(strKey1 As String)

      strKey1 = Replace(strKey1, "labor", "labo*r")
      strKey1 = Replace(strKey1, "labour", "labo*r")
      strKey1 = Replace(strKey1, "employee", "employe*")
      strKey1 = Replace(strKey1, "employe", "employe*")
     Authority = strKey1

End Function