Showing posts with label Microsoft Excel. Show all posts
Showing posts with label Microsoft Excel. Show all posts

Thursday, September 13, 2012

Filling in folder numbers in an MS Excel container list.


Here at the Kheel Center we have quite a few legacy container lists in Excel spreadsheets that list the folder titles for each box, without giving the folder numbers.

A
B
Box
Folder Title
1
Folder 1
1
Folder 2
1
Folder 3
2
Folder 4
2
Folder 5
3
Folder 6
3
Folder 7


Before converting the container list to EAD I wanted to insert the folder numbers. I was able to do so using a fairly simple If-Then-Else statement. First insert a new column between the Box and Folder Title columns.

A
B
C
Box
Folder
Folder Title
1

Folder 1
1

Folder 2
1

Folder 3
2

Folder 4
2

Folder 5
3

Folder 6
3

Folder 7

In the first cell of the Folder column, cell B2, enter the following formula:

=IF(A2-A1=1,1,B1+1)

What this does is subtract A1 from B2, which will give you either a 1 or a 0. If the value is 1 then the current row is the first folder of the box. If the value is 0 then the current folder is the next folder in the box and should have a folder number that is one greater than the one above it. When you have the formula done double-click on the little handle at the lower right corner of the cell. This is the fill-down function and will run the formula down the entire column.

Note: this formula will give an error for all the folders in the first box: you are trying to subtract text (“Box”) from a number in the first cell.

A
B
C
Box
Folder
Folder Title
1
#VALUE!
Folder 1
1
#VALUE!
Folder 2
1
#VALUE!
Folder 3
2
1
Folder 4
2
2
Folder 5
3
1
Folder 6
3
2
Folder 7

To correct the error simply temporarily replace the “Box” in A1 with a zero.

A
B
C
0
Folder
Folder Title
1
1
Folder 1
1
2
Folder 2
1
3
Folder 3
2
1
Folder 4
2
2
Folder 5
3
1
Folder 6
3
2
Folder 7

Select column B and copy, then Paste Special=>Values right over it. This will convert the formulas in column B to the values shown. After that you can change cell A1 back to “Box”.

You now have a container list with both box and folder numbers.

Friday, February 10, 2012

Line breaks in FileMaker Pro exports

We have a FileMaker Pro database as our reference database that is about to be retired. I exported the data from the database in order to see what was there and how it could be handled in order to get it into another database. What I found when I opened it up in Microsoft Excel and Microsoft Access was that some of the fields contained line breaks that show up as little boxes with arrows. I needed to either hide or get rid of these visible line breaks. Neither Excel or Access would find these using the Find function or the Find and Replace function, even using “^l” as the search term. The question was: How to get rid of those line breaks.

My solution was to open the file in Excel, then save it as a csv file. The csv file was then opened in Microsoft Word, which will find the line breaks as “^l”. I did a Find and Replace for all the line breaks, replacing them with a unique character so that I could later parse the data, putting the data from the separate lines into separate fields. In searching the data I found that any character I could both use and later parse on was already being used. I finally settle on the tilde “~” since it was used in the data to signify an approximate date. First I replaced the tilde with “ca.” then I replaced the line breaks with tildes. The file was then saved as a text file and reopened in Excel. No line breaks and ready to parse.

Friday, March 25, 2011

Re-Formatting Finding Aids for Conversion to EAD: Parsing the Date From the Title

Quite often our legacy finding aids have the date at the end of the title field, rather than as in a separate data field. For Ead McTaggart to properly format the EAD guides, I need the year, or range of years to be in a separate date field. This means I need a way to move the year from the title field to the date field without copying and pasting hundreds, often thousands, of times per finding aid. I automated this process using an if-then-else statement.

Fortunately the dates in most of our findings conform to a few variations of a few formats: they end in a single four-digit year (i.e. 1984), year range (i.e. 1920-1932 or 1920-32); or have the elements of the date separated by slashes (i.e. 3/25/1911 or 3/25/11).

If we assume the first title is in cell C2 then if the date ends in a four-digit year I just want the last four digits. To determine if this is the case the logic test of the if-then-else statement looks to see if the fifth character from the right is a space or a slash:

= IF((MID(C2,LEN(C2)-4,1)= " ")+(MID(C2,LEN(C2)-4,1)= "/"),RIGHT(C2,4),…)

If the date looks like 1920-1932 then the fifth character from the right is a dash and I want the last nine characters:

=IF(MID(C2,LEN(C2)-4,1)= "-",RIGHT(C2,9),…)

But if the date looks like 1920-32 then the third character from the right is a dash and I not only want the last seven characters, but I also want to insert “19” immediately after the dash:

=IF(MID(C2,LEN(C2)-2,1)="-",MID(C2,LEN(C2)-6,5)&"19"&RIGHT(C2,2),…)

That just leaves the dates separated by slashes with only the last two digits of the year. In this case the third character from the right is a slash and I want the last two characters with “19” inserted in front of them:

=IF(MID(C2,LEN(C2)-2,1)="/","19"&RIGHT(C2,2),…)


This last test will actually have to be the first test for those times when the date is just the month and the year (i.e. 5/14), otherwise the formula will see that the fifth character from the right is a space and pull the last four characters before it sees that the third character from the right is a slash.

And if none of the above conditions are met then there is no date at the end of the title and I want nothing in the title field so I set the final false value to a zero length string: "".

Here is the complete formula:

= IF(MID(C2,LEN(C2)-2,1)="/","19"&RIGHT(C2,2), IF((MID(C2,LEN(C2)-4,1)= " ")+(MID(C2,LEN(C2)-4,1)= "/"),RIGHT(C2,4),IF(MID(C2,LEN(C2)-4,1)= "-",RIGHT(C2,9),IF(MID(C2,LEN(C2)-2,1)="-",MID(C2,LEN(C2)-6,5)&"19"&RIGHT(C2,2), ""))))

There will, of course be some outliers: oddly formatted dates, and so on. Still this formula will work for the vast majority of the dates I see in our finding aids. Proof-reading picks up the outliers.

Monday, March 14, 2011

Re-Formatting Finding Aids for Conversion to EAD: Scope Content Notes

Many of our legacy finding aids here at the Kheel Center are either Microsoft Word documents or scans of hard copy documents. Since I use Ead McTaggart to tag container lists these need to be reformatted as Microsoft Excel documents. If a simple copy and paste does not convert the text to columns that can be easily done using the “Text to Columns” function in Excel. What is a little more difficult is when the title and scope/content notes are parse into separate lines rather than separate columns:


A B C
Box Folder Title
1 1 Folder One Title
Folder One Scope Content
1 2 Folder Two Title
Folder Two Scope Content


What I need to do in this case is move the scope/content notes from C3 to D2 and C5 to D4. You can drag and drop, but there is a quicker way: in cell D2 enter this formula:

=If(A3>0,””,C3)
This translates to “If the value in cell A3 is greater than zero (that is it contains a box number) set the value of cell D2 to a zero length string, if the value of cell A3 is not greater than zero (that is it has no box number) set the value of cell D2 equal to the value of cell C3.”

Moving cursor to the handle at the lower right corner of cell D2, changes it from a large white cross to a small black cross. Double clicking will fill the column down with the formula. The column will fill down until it reaches an empty cell in column C. This may, or may not, be the bottom of column C. Using “control, down arrow” to test this, the cell selection will move to the bottom of the column. If that is also the bottom of column C repeat the fill down process until the bottom of column C is reached.

The spreadsheet now looks this:


A B C D
Box Folder Title ScopeContent
1 1 Folder One Title Folder One Scope Content
Folder One Scope Content
1 2 Folder Two Title Folder Two Scope Content
Folder Two Scope Content

But, column D actually contains only formulas, not text. Highlight column D and copy, then right click the column and select “Paste Special”, then select “Values”. I now have text in column D.

At this point I insert a new column, called “Index”, between the Title and ScopeContent columns. Enter “1” into cell D2 and “2” into cell D3, then select both D2 and D3 and fill down. This gives me an index I can use to maintain or re-establish the proper order of the data.


A B C D E
Box Folder Title Index ScopeContent
1 1 Folder One Title 1 Folder One Scope Content


Folder One Scope Content 2
1 2 Folder Two Title 3 Folder Two Scope Content


Folder Two Scope Content 4

Now sort the table on column A. This will group all the lines with no box numbers together at the bottom of the spreadsheet. I can now delete those rows and resort on column D, Index:


A B C D E
Box Folder Title Index ScopeContent
1 1 Folder One Title 1 Folder One Scope Content
1 2 Folder Two Title 3 Folder Two Scope Content

At this point I re-index so that there are no missing numbers in the Index column. Not really necessary, but I do it anyway.

The finding is now formatted in Excel for conversion by Ead McTaggart. Except that the dates are probably part of the title.

Next week: Re-Formatting Finding Aids for Conversion to EAD: Parsing the Date From the Title