We have a FileMaker Pro database as our reference database that is about to be retired. I exported the data from the database in order to see what was there and how it could be handled in order to get it into another database. What I found when I opened it up in Microsoft Excel and Microsoft Access was that some of the fields contained line breaks that show up as little boxes with arrows. I needed to either hide or get rid of these visible line breaks. Neither Excel or Access would find these using the Find function or the Find and Replace function, even using “^l” as the search term. The question was: How to get rid of those line breaks.
My solution was to open the file in Excel, then save it as a csv file. The csv file was then opened in Microsoft Word, which will find the line breaks as “^l”. I did a Find and Replace for all the line breaks, replacing them with a unique character so that I could later parse the data, putting the data from the separate lines into separate fields. In searching the data I found that any character I could both use and later parse on was already being used. I finally settle on the tilde “~” since it was used in the data to signify an approximate date. First I replaced the tilde with “ca.” then I replaced the line breaks with tildes. The file was then saved as a text file and reopened in Excel. No line breaks and ready to parse.
Tips, code, and information about Microsoft Access, VBA, Microsoft Excel, and EAD for archivists with limited programming experience.
Showing posts with label Microsoft Word. Show all posts
Showing posts with label Microsoft Word. Show all posts
Friday, February 10, 2012
Monday, March 14, 2011
Re-Formatting Finding Aids for Conversion to EAD: Scope Content Notes
Many of our legacy finding aids here at the Kheel Center are either Microsoft Word documents or scans of hard copy documents. Since I use Ead McTaggart to tag container lists these need to be reformatted as Microsoft Excel documents. If a simple copy and paste does not convert the text to columns that can be easily done using the “Text to Columns” function in Excel. What is a little more difficult is when the title and scope/content notes are parse into separate lines rather than separate columns:
What I need to do in this case is move the scope/content notes from C3 to D2 and C5 to D4. You can drag and drop, but there is a quicker way: in cell D2 enter this formula:
=If(A3>0,””,C3)
This translates to “If the value in cell A3 is greater than zero (that is it contains a box number) set the value of cell D2 to a zero length string, if the value of cell A3 is not greater than zero (that is it has no box number) set the value of cell D2 equal to the value of cell C3.”
Moving cursor to the handle at the lower right corner of cell D2, changes it from a large white cross to a small black cross. Double clicking will fill the column down with the formula. The column will fill down until it reaches an empty cell in column C. This may, or may not, be the bottom of column C. Using “control, down arrow” to test this, the cell selection will move to the bottom of the column. If that is also the bottom of column C repeat the fill down process until the bottom of column C is reached.
The spreadsheet now looks this:
But, column D actually contains only formulas, not text. Highlight column D and copy, then right click the column and select “Paste Special”, then select “Values”. I now have text in column D.
At this point I insert a new column, called “Index”, between the Title and ScopeContent columns. Enter “1” into cell D2 and “2” into cell D3, then select both D2 and D3 and fill down. This gives me an index I can use to maintain or re-establish the proper order of the data.
Now sort the table on column A. This will group all the lines with no box numbers together at the bottom of the spreadsheet. I can now delete those rows and resort on column D, Index:
At this point I re-index so that there are no missing numbers in the Index column. Not really necessary, but I do it anyway.
The finding is now formatted in Excel for conversion by Ead McTaggart. Except that the dates are probably part of the title.
Next week: Re-Formatting Finding Aids for Conversion to EAD: Parsing the Date From the Title
| A | B | C |
| Box | Folder | Title |
| 1 | 1 | Folder One Title |
| Folder One Scope Content | ||
| 1 | 2 | Folder Two Title |
| Folder Two Scope Content |
What I need to do in this case is move the scope/content notes from C3 to D2 and C5 to D4. You can drag and drop, but there is a quicker way: in cell D2 enter this formula:
=If(A3>0,””,C3)
This translates to “If the value in cell A3 is greater than zero (that is it contains a box number) set the value of cell D2 to a zero length string, if the value of cell A3 is not greater than zero (that is it has no box number) set the value of cell D2 equal to the value of cell C3.”
Moving cursor to the handle at the lower right corner of cell D2, changes it from a large white cross to a small black cross. Double clicking will fill the column down with the formula. The column will fill down until it reaches an empty cell in column C. This may, or may not, be the bottom of column C. Using “control, down arrow” to test this, the cell selection will move to the bottom of the column. If that is also the bottom of column C repeat the fill down process until the bottom of column C is reached.
The spreadsheet now looks this:
| A | B | C | D |
| Box | Folder | Title | ScopeContent |
| 1 | 1 | Folder One Title | Folder One Scope Content |
| Folder One Scope Content | |||
| 1 | 2 | Folder Two Title | Folder Two Scope Content |
| Folder Two Scope Content |
But, column D actually contains only formulas, not text. Highlight column D and copy, then right click the column and select “Paste Special”, then select “Values”. I now have text in column D.
At this point I insert a new column, called “Index”, between the Title and ScopeContent columns. Enter “1” into cell D2 and “2” into cell D3, then select both D2 and D3 and fill down. This gives me an index I can use to maintain or re-establish the proper order of the data.
| A | B | C | D | E |
| Box | Folder | Title | Index | ScopeContent |
| 1 | 1 | Folder One Title | 1 | Folder One Scope Content |
| Folder One Scope Content | 2 | |||
| 1 | 2 | Folder Two Title | 3 | Folder Two Scope Content |
| Folder Two Scope Content | 4 |
Now sort the table on column A. This will group all the lines with no box numbers together at the bottom of the spreadsheet. I can now delete those rows and resort on column D, Index:
| A | B | C | D | E |
| Box | Folder | Title | Index | ScopeContent |
| 1 | 1 | Folder One Title | 1 | Folder One Scope Content |
| 1 | 2 | Folder Two Title | 3 | Folder Two Scope Content |
At this point I re-index so that there are no missing numbers in the Index column. Not really necessary, but I do it anyway.
The finding is now formatted in Excel for conversion by Ead McTaggart. Except that the dates are probably part of the title.
Next week: Re-Formatting Finding Aids for Conversion to EAD: Parsing the Date From the Title
Subscribe to:
Posts (Atom)