Difference between revisions of "Export DataPerfect"
(additional information) |
(special characters) |
||
Line 50: | Line 50: | ||
* '''Down''' on a field which has references to entries in another table, that referenced entry is displayed | * '''Down''' on a field which has references to entries in another table, that referenced entry is displayed | ||
* '''Up''' The list of entries in this table where the content of the current field is displayed. Navigate though this list using the '''Up''' and '''Down''' keys. When typing characters this list the focus will jump to the entry whose unique key column is like the typed characters. For example in the table displayed in the image below, typing a number will focus on the entry with that id. The corresponding entry is automatically displayed. To edit this entry, press '''Enter'''. | * '''Up''' The list of entries in this table where the content of the current field is displayed. Navigate though this list using the '''Up''' and '''Down''' keys. When typing characters this list the focus will jump to the entry whose unique key column is like the typed characters. For example in the table displayed in the image below, typing a number will focus on the entry with that id. The corresponding entry is automatically displayed. To edit this entry, press '''Enter'''. | ||
− | * '''F7''' Goes to the next upper level in the hierarchy. For example if the list of entries is opened, it will jump back to the | + | * '''F7''' Goes to the next upper level in the hierarchy. For example if the list of entries is opened, it will jump back to the next upper level in the hierarchy. |
[[File:DPE10_DIAS_Table1_browse.png|frame|none|Browsing through the first table of DIAS]] | [[File:DPE10_DIAS_Table1_browse.png|frame|none|Browsing through the first table of DIAS]] | ||
=== Select Report === | === Select Report === | ||
+ | From the list of available reports select the entry at the top: '''Build-In Short Reports'''. | ||
[[File:DPE11_DIAS_Table1_Report_List.png|frame|none|The list of available reports.]] | [[File:DPE11_DIAS_Table1_Report_List.png|frame|none|The list of available reports.]] | ||
Line 72: | Line 73: | ||
=== Converting special characters === | === Converting special characters === | ||
+ | The export out of DataPerfect has problems with special characters which are not converted correctly in a proper character set. Though all of the occurrences of such special characters can be replaced automatically, each of the character to be replaced has to be defined manually one. The files as they are generated by DataPerfect use the ANSI Character Encoding. This needs to be converted to UTF8. Under Windows the free program Notepad++ is very suited for this. When selecting the menu item '''Encoding''' (''Kodierung'' in the image, because it is the German version of Notepad++), the entry ''ANSI'' should be marked as the current encoding. Now click on '''Convert to UTF-8''' and save the file. | ||
+ | [[File:DPE16 Notepad CharSet.png|frame|none|Converting the Character Set in Notepad++]] | ||
+ | The next step is to run the file through a program that converts the characters. The small Java program CharReplacer was written just to do that. There are two different files to that program: CharReplacer.class (the acutual program) and CharReplacer.settings (the settings file, which specifies, what characters to replace. The first line in ''CharReplacer.settings'' are all the characters which will not be replaced. All the following lines have the number value of the character which will be replaced, followed by a tab and the character it will be replaced with. | ||
+ | |||
+ | abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",.-; ()?\':/=&[]!`<>#+% | ||
+ | 8222 ä | ||
+ | 8221 ö | ||
+ | 129 ü | ||
+ | 225 ß | ||
+ | 381 Ä | ||
+ | 8482 Ö | ||
+ | 353 Ü | ||
+ | Excerpt from the ''CharReplacer.settings'' file. | ||
+ | |||
+ | To run the CharReplacer program, the Java Runtime Envirement (JRE) must be installed on a computer. It must be run via the command line interface (also known as console). | ||
+ | |||
+ | Under Windows the Command Prompt (cmd.exe) can be used, but it needs to be adjusted to show the UTF-8 characters. | ||
+ | |||
+ | ''(The following steps displayed in italics only need to be done if the Windows Command Prompt is used.)'' | ||
+ | * Copy the files CharReplacer.class and CharReplacer.settings in the folder in which the exported files are (in the example from above it is C:\DOS\DP26Y\DIAS\) | ||
+ | * Start the console. | ||
+ | * Change to the directory in which the files are. | ||
+ | * ''Set the Font for the Command Prompt to '''Lucida Console''''' | ||
+ | * ''Switch the Character Set for the console by typing '''chcp 65001''''' | ||
+ | * run the program by typing '''java -Dfile.encoding=UTF-8 CharReplacer''' followed by the name or names of the files you want to run the program on, e.g. ''java -Dfile.encoding=UTF-8 CharReplacer OUTPUT1.EXP'' and hit Enter. | ||
+ | |||
+ | {| | ||
+ | |[[File:DPE17 Console change Font.png|frame|none|Changing the Font in the Windows Command Prompt (German version)]] | ||
+ | |[[File:DPE18 Console start CharReplacer.png|frame|none|Running the CharReplacer]] | ||
+ | |} | ||
+ | |||
+ | The output of the program will be a file with the name of the input file, but with an additional '''.csv''' extension at the end. So '''OUTPUT1.EXP''' will become '''OUTPUT1.EXP.csv'''. | ||
+ | |||
+ | If the program find characters which are not in the list of allowed characters (the first line of the file) and for which no replacement rule exists, it will not replace this character, but it will show a message, informing the user that an unknown character was found, where it was found (file, line and column), what it looks like in UTF-8 and what its code is. If this occurs it is necessary to review the file at this position and create a new rule on how to handle that character. If the program prints out an unknown character, open the file and look at the given position. If the character is correct and represents precisely what was meant in the original dataset, then add this character at the end of the first line in the settings file. If it is a wrong character however, try to figure out what character it is meant to be. This could be derived from the context in which this unknown character appears. In the example below, the unknown character is displayed as '''†''' and appears in the work '''"R†dhusplassen"'''. People how are familiar with Norwegian might recognize the word as '''"Rådhusplassen"''' (the Norwegian word for "City Hall Place" or "Town Hall Place"). In other cased it could be not so clear. So it becomes necessary to look up the entry in the original DataPerfect table and see, what character was originally entered. In the third image, it can be seen that in this case the unknown character is indeed a '''å'''. So a new line can be added at the end of the settings file with the following content: | ||
+ | 8224 å | ||
+ | The 8224 is the code for the character, as shown in the message of the CharReplacer program in the first image. | ||
+ | |||
+ | To edit the ''CharReplacer.settings'' file, use a regular text editor (like Notepad++ under Windows). | ||
+ | |||
+ | {| | ||
+ | |[[File:DPE19 Console run CharReplacer.png|frame|none|some unknown characters were found]] | ||
+ | |[[File:DPE20_Notepad_Special_Characters.png|frame|none|Viewing the unconverted character in Notepad++]] | ||
+ | |[[File:DPE21 DataPerfect Special Characters.png|frame|none|Viewing the unconverted character in the original data entry in DataPerfect.]] | ||
+ | |} | ||
=== Documenting Foreign Key Relations === | === Documenting Foreign Key Relations === | ||
Line 80: | Line 125: | ||
=== Alternative Ways of exporting data from a DataPerfect database === | === Alternative Ways of exporting data from a DataPerfect database === | ||
+ | Another way of exporting DataPerfect files is the [http://dans-dp-lib.sourceforge.net/ DANS DataPerfect Library]. There is a reference implementation of a [http://dans-dp-lib.sourceforge.net/Dp2MySqlExport.java DP2MySQLConverter]. After testing it on small sample databases, it worked fine, but it ran intro problems exporting the DIAS database, causing huge and corrupt output files. For other DataPerfect files it could still be useful, especially since it already handles the special characters correctly. |
Revision as of 16:06, 7 March 2012
This article describes how to export data from a DataPerfect database, based on the Rohwer data set.
Data Perfect is a DOS based database system. It's latest release is 2.6Y from June 2008. It can be downloaded via http://dataperfect.nl/ .
Contents
- 1 Install and run DOSBox
- 2 Download DataPerfect
- 3 Copy your data files in the DataPerfect Folder
- 4 Start DOSBox
- 5 Mount the DataPerfect Folder
- 6 Start DataPerfect
- 7 Change to the folder of your data files
- 8 Open the data files
- 9 Select the table to export
- 10 Navigate the Table
- 11 Select Report
- 12 Set Export Properties
- 13 Run Export
- 14 Converting special characters
- 15 Documenting Foreign Key Relations
- 16 Importing into a modern relational database
- 17 Next Steps
- 18 Alternative Ways of exporting data from a DataPerfect database
Install and run DOSBox
- To run DataPerfect a DOS emulator is needed. The free emulator DOSBox works quite well. Go to http://www.dosbox.com/download.php?main=1 or http://sourceforge.net/projects/dosbox/, download the latest release and install it. For this articles DOSBox 0.74 will be used and installed under Windows 7.
Download DataPerfect
- Go to http://dataperfect.nl/ and download the DataPerfect. Unzip the files into a specific folder. The folder used for this article is C:/DOS/DP26Y.
Copy your data files in the DataPerfect Folder
- You can either copy them directly in the folder or create a subfolder for your data files and copy them in there. The data files for this article are located in the directory DIAS/ within the DataPerfect folder.
Start DOSBox
- When starting DOSBox a second console window is opened. When running several instances of DOSBox these additional windows can bloat the window bar quite a bit. To avoid this just start the version DOXBox-0.74/Extras/DOSBox 0.74 (noconsole) from the Program Menu.
Mount the DataPerfect Folder
- After starting DOSBox the directory of the DataPerfect files needs to be mounted as a virtual drive. Type mount c C:\DOS\DP27Y to mount the folder and C: to change to the drive.
Start DataPerfect
- Start DataPerfect by typing DP26YU and pressing Enter.
Change to the folder of your data files
- If the data files are not in the directory of DataPerfect you need to change the directory. Press 2 to change the directory and type in the path to the directory of your data.
Open the data files
- If you are in the correct directory you see the DataPerfect datasets within the folder. Use the cursor keys to select the correct data set and press enter.
Select the table to export
You can now see the tables within this database project. Select the table you want to export by using the up or down keys and pressing enter.
This these steps are not necessary for the export, but will be documented here anyway. Here are important keys for navigating thought a table:
- Tab the next field is highlighted
- Down on a field which has references to entries in another table, that referenced entry is displayed
- Up The list of entries in this table where the content of the current field is displayed. Navigate though this list using the Up and Down keys. When typing characters this list the focus will jump to the entry whose unique key column is like the typed characters. For example in the table displayed in the image below, typing a number will focus on the entry with that id. The corresponding entry is automatically displayed. To edit this entry, press Enter.
- F7 Goes to the next upper level in the hierarchy. For example if the list of entries is opened, it will jump back to the next upper level in the hierarchy.
Select Report
From the list of available reports select the entry at the top: Build-In Short Reports.
Set Export Properties
Set the export settings as shown in the first image below. Press the number of the property you want to change. For example, to change the file name of the output file, press 2 and the file options will appear, as can be seen in the second image. Press 1 to create a new file and enter the name of the file (image 3). The filename of the export file must not be longer than 8 characters without file extension and the file extension itself (without the dot) must not be longer than 3 characters.
Run Export
To start the export press Shift + F7 again. The screen now shows a counter of how many elements from the table have already been exported. DOSBox starts with a limited CPU speed for the programs running in it. The speed can be increased by pressing Ctrl + F12. Increasing the speed to much, however creates a big overhead which will also result in a slower export. For some reason when increasing the speed the CPU load will jump from around 5% of one core to 100% within just 2-3 increase steps. So the CPU load should be watched when increasing the speed of DOSBox. On a DualCore 3 GHz Processor a speed of 50000-55000 cycles (can be seen in the title bar of the DOSBox Window) appears to be a good export speed without overhead.
After the export is done, you will see the list of available reports again. To step to the next higher level of the hierarchy press F7. Repeat this until you see the list of tables within this database project again. Now repeat the export process for all the other tables you want to export.
Converting special characters
The export out of DataPerfect has problems with special characters which are not converted correctly in a proper character set. Though all of the occurrences of such special characters can be replaced automatically, each of the character to be replaced has to be defined manually one. The files as they are generated by DataPerfect use the ANSI Character Encoding. This needs to be converted to UTF8. Under Windows the free program Notepad++ is very suited for this. When selecting the menu item Encoding (Kodierung in the image, because it is the German version of Notepad++), the entry ANSI should be marked as the current encoding. Now click on Convert to UTF-8 and save the file.
The next step is to run the file through a program that converts the characters. The small Java program CharReplacer was written just to do that. There are two different files to that program: CharReplacer.class (the acutual program) and CharReplacer.settings (the settings file, which specifies, what characters to replace. The first line in CharReplacer.settings are all the characters which will not be replaced. All the following lines have the number value of the character which will be replaced, followed by a tab and the character it will be replaced with.
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",.-; ()?\':/=&[]!`<>#+% 8222 ä 8221 ö 129 ü 225 ß 381 Ä 8482 Ö 353 Ü
Excerpt from the CharReplacer.settings file.
To run the CharReplacer program, the Java Runtime Envirement (JRE) must be installed on a computer. It must be run via the command line interface (also known as console).
Under Windows the Command Prompt (cmd.exe) can be used, but it needs to be adjusted to show the UTF-8 characters.
(The following steps displayed in italics only need to be done if the Windows Command Prompt is used.)
- Copy the files CharReplacer.class and CharReplacer.settings in the folder in which the exported files are (in the example from above it is C:\DOS\DP26Y\DIAS\)
- Start the console.
- Change to the directory in which the files are.
- Set the Font for the Command Prompt to Lucida Console
- Switch the Character Set for the console by typing chcp 65001
- run the program by typing java -Dfile.encoding=UTF-8 CharReplacer followed by the name or names of the files you want to run the program on, e.g. java -Dfile.encoding=UTF-8 CharReplacer OUTPUT1.EXP and hit Enter.
The output of the program will be a file with the name of the input file, but with an additional .csv extension at the end. So OUTPUT1.EXP will become OUTPUT1.EXP.csv.
If the program find characters which are not in the list of allowed characters (the first line of the file) and for which no replacement rule exists, it will not replace this character, but it will show a message, informing the user that an unknown character was found, where it was found (file, line and column), what it looks like in UTF-8 and what its code is. If this occurs it is necessary to review the file at this position and create a new rule on how to handle that character. If the program prints out an unknown character, open the file and look at the given position. If the character is correct and represents precisely what was meant in the original dataset, then add this character at the end of the first line in the settings file. If it is a wrong character however, try to figure out what character it is meant to be. This could be derived from the context in which this unknown character appears. In the example below, the unknown character is displayed as † and appears in the work "R†dhusplassen". People how are familiar with Norwegian might recognize the word as "Rådhusplassen" (the Norwegian word for "City Hall Place" or "Town Hall Place"). In other cased it could be not so clear. So it becomes necessary to look up the entry in the original DataPerfect table and see, what character was originally entered. In the third image, it can be seen that in this case the unknown character is indeed a å. So a new line can be added at the end of the settings file with the following content:
8224 å
The 8224 is the code for the character, as shown in the message of the CharReplacer program in the first image.
To edit the CharReplacer.settings file, use a regular text editor (like Notepad++ under Windows).
Documenting Foreign Key Relations
Importing into a modern relational database
Next Steps
Alternative Ways of exporting data from a DataPerfect database
Another way of exporting DataPerfect files is the DANS DataPerfect Library. There is a reference implementation of a DP2MySQLConverter. After testing it on small sample databases, it worked fine, but it ran intro problems exporting the DIAS database, causing huge and corrupt output files. For other DataPerfect files it could still be useful, especially since it already handles the special characters correctly.