Difference between revisions of "Export DataPerfect"

From reBiND Documentation
Jump to: navigation, search
(additional information)
(special characters)
Line 50: Line 50:
 
* '''Down''' on a field which has references to entries in another table, that referenced entry is displayed
 
* '''Down''' on a field which has references to entries in another table, that referenced entry is displayed
 
* '''Up''' The list of entries in this table where the content of the current field is displayed. Navigate though this list using the '''Up''' and '''Down''' keys. When typing characters this list the focus will jump to the entry whose unique key column is like the typed characters. For example in the table displayed in the image below, typing a number will focus on the entry with that id. The corresponding entry is automatically displayed. To edit this entry, press '''Enter'''.  
 
* '''Up''' The list of entries in this table where the content of the current field is displayed. Navigate though this list using the '''Up''' and '''Down''' keys. When typing characters this list the focus will jump to the entry whose unique key column is like the typed characters. For example in the table displayed in the image below, typing a number will focus on the entry with that id. The corresponding entry is automatically displayed. To edit this entry, press '''Enter'''.  
* '''F7''' Goes to the next upper level in the hierarchy. For example if the list of entries is opened, it will jump back to the  
+
* '''F7''' Goes to the next upper level in the hierarchy. For example if the list of entries is opened, it will jump back to the next upper level in the hierarchy.
 
[[File:DPE10_DIAS_Table1_browse.png|frame|none|Browsing through the first table of DIAS]]
 
[[File:DPE10_DIAS_Table1_browse.png|frame|none|Browsing through the first table of DIAS]]
  
 
=== Select Report ===
 
=== Select Report ===
 +
From the list of available reports select the entry at the top: '''Build-In Short Reports'''.
 
[[File:DPE11_DIAS_Table1_Report_List.png|frame|none|The list of available reports.]]
 
[[File:DPE11_DIAS_Table1_Report_List.png|frame|none|The list of available reports.]]
  
Line 72: Line 73:
  
 
=== Converting special characters ===
 
=== Converting special characters ===
 +
The export out of DataPerfect has problems with special characters which are not converted correctly in a proper character set. Though all of the occurrences of such special characters can be replaced automatically, each of the character to be replaced has to be defined manually one. The files as they are generated by DataPerfect use the ANSI Character Encoding. This needs to be converted to UTF8. Under Windows the free program Notepad++ is very suited for this. When selecting the menu item '''Encoding''' (''Kodierung'' in the image, because it is the German version of Notepad++), the entry ''ANSI'' should be marked as the current encoding. Now click on '''Convert to UTF-8''' and save the file.
 +
[[File:DPE16 Notepad CharSet.png|frame|none|Converting the Character Set in Notepad++]]
 +
The next step is to run the file through a program that converts the characters. The small Java program CharReplacer was written just to do that. There are two different files to that program: CharReplacer.class (the acutual program) and CharReplacer.settings (the settings file, which specifies, what characters to replace. The first line in ''CharReplacer.settings'' are all the characters which will not be replaced. All the following lines have the number value of the character which will be replaced, followed by a tab and the character it will be replaced with.
 +
 +
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",.-; ()?\':/=&[]!`<>#+%
 +
8222 ä
 +
8221 ö
 +
129 ü
 +
225 ß
 +
381 Ä
 +
8482 Ö
 +
353 Ü
 +
Excerpt from the ''CharReplacer.settings'' file.
 +
 +
To run the CharReplacer program, the Java Runtime Envirement (JRE) must be installed on a computer. It must be run via the command line interface (also known as console).
 +
 +
Under Windows the Command Prompt (cmd.exe) can be used, but it needs to be adjusted to show the UTF-8 characters.
 +
 +
''(The following steps displayed in italics only need to be done if the Windows Command Prompt is used.)''
 +
* Copy the files CharReplacer.class and CharReplacer.settings in the folder in which the exported files are (in the example from above it is C:\DOS\DP26Y\DIAS\)
 +
* Start the console.
 +
* Change to the directory in which the files are.
 +
* ''Set the Font for the Command Prompt to '''Lucida Console'''''
 +
* ''Switch the Character Set for the console by typing '''chcp 65001'''''
 +
* run the program by typing '''java -Dfile.encoding=UTF-8 CharReplacer''' followed by the name or names of the files you want to run the program on, e.g. ''java -Dfile.encoding=UTF-8 CharReplacer OUTPUT1.EXP'' and hit Enter.
 +
 +
{|
 +
|[[File:DPE17 Console change Font.png|frame|none|Changing the Font in the Windows Command Prompt (German version)]]
 +
|[[File:DPE18 Console start CharReplacer.png|frame|none|Running the CharReplacer]]
 +
|}
 +
 +
The output of the program will be a file with the name of the input file, but with an additional '''.csv''' extension at the end. So '''OUTPUT1.EXP''' will become '''OUTPUT1.EXP.csv'''.
 +
 +
If the program find characters which are not in the list of allowed characters (the first line of the file) and for which no replacement rule exists, it will not replace this character, but it will show a message, informing the user that an unknown character was found, where it was found (file, line and column), what it looks like in UTF-8 and what its code is. If this occurs it is necessary to review the file at this position and create a new rule on how to handle that character. If the program prints out an unknown character, open the file and look at the given position. If the character is correct and represents precisely what was meant in the original dataset, then add this character at the end of the first line in the settings file. If it is a wrong character however, try to figure out what character it is meant to be. This could be derived from the context in which this unknown character appears. In the example below, the unknown character is displayed as '''†''' and appears in the work '''"R†dhusplassen"'''. People how are familiar with Norwegian might recognize the word as '''"Rådhusplassen"''' (the Norwegian word for "City Hall Place" or "Town Hall Place"). In other cased it could be not so clear. So it becomes necessary to look up the entry in the original DataPerfect table and see, what character was originally entered. In the third image, it can be seen that in this case the unknown character is indeed a '''å'''. So a new line can be added at the end of the settings file with the following content:
 +
8224 å
 +
The 8224 is the code for the character, as shown in the message of the CharReplacer program in the first image.
 +
 +
To edit the ''CharReplacer.settings'' file, use a regular text editor (like Notepad++ under Windows).
 +
 +
{|
 +
|[[File:DPE19 Console run CharReplacer.png‎|frame|none|some unknown characters were found]]
 +
|[[File:DPE20_Notepad_Special_Characters.png‎|frame|none|Viewing the unconverted character in Notepad++]]
 +
|[[File:DPE21 DataPerfect Special Characters.png|frame|none|Viewing the unconverted character in the original data entry in DataPerfect.]]
 +
|}
  
 
=== Documenting Foreign Key Relations ===
 
=== Documenting Foreign Key Relations ===
Line 80: Line 125:
  
 
=== Alternative Ways of exporting data from a DataPerfect database ===
 
=== Alternative Ways of exporting data from a DataPerfect database ===
 +
Another way of exporting DataPerfect files is the [http://dans-dp-lib.sourceforge.net/ DANS DataPerfect Library]. There is a reference implementation of a [http://dans-dp-lib.sourceforge.net/Dp2MySqlExport.java DP2MySQLConverter]. After testing it on small sample databases, it worked fine, but it ran intro problems exporting the DIAS database, causing huge and corrupt output files. For other DataPerfect files it could still be useful, especially since it already handles the special characters correctly.

Revision as of 16:06, 7 March 2012

This article describes how to export data from a DataPerfect database, based on the Rohwer data set.

Data Perfect is a DOS based database system. It's latest release is 2.6Y from June 2008. It can be downloaded via http://dataperfect.nl/ .

Install and run DOSBox

To run DataPerfect a DOS emulator is needed. The free emulator DOSBox works quite well. Go to http://www.dosbox.com/download.php?main=1 or http://sourceforge.net/projects/dosbox/, download the latest release and install it. For this articles DOSBox 0.74 will be used and installed under Windows 7.

Download DataPerfect

Go to http://dataperfect.nl/ and download the DataPerfect. Unzip the files into a specific folder. The folder used for this article is C:/DOS/DP26Y.

Copy your data files in the DataPerfect Folder

You can either copy them directly in the folder or create a subfolder for your data files and copy them in there. The data files for this article are located in the directory DIAS/ within the DataPerfect folder.

Start DOSBox

When starting DOSBox a second console window is opened. When running several instances of DOSBox these additional windows can bloat the window bar quite a bit. To avoid this just start the version DOXBox-0.74/Extras/DOSBox 0.74 (noconsole) from the Program Menu.
File:DPE01 DOSBox.png
The DOSBOX Start Window

Mount the DataPerfect Folder

After starting DOSBox the directory of the DataPerfect files needs to be mounted as a virtual drive. Type mount c C:\DOS\DP27Y to mount the folder and C: to change to the drive.
File:DPE02 DOSBox mount.png
Mounting the DataPerfect Folder

Start DataPerfect

Start DataPerfect by typing DP26YU and pressing Enter.
File:DPE03 DataPerfect start.png
Switch to C: and start DataPerfect
File:DPE04 DataPerfect welcome.png
The Welcome Screen of DataPerfect

Change to the folder of your data files

If the data files are not in the directory of DataPerfect you need to change the directory. Press 2 to change the directory and type in the path to the directory of your data.
File:DPE05 DataPerfect open.png
The initial screen of DataPerfect
File:DPE06 DataPerfect change directory.png
Change to the data directory

Open the data files

If you are in the correct directory you see the DataPerfect datasets within the folder. Use the cursor keys to select the correct data set and press enter.
File:DPE07 DataPerfect select DIAS.png
Selecting the DIAS data set.

Select the table to export

You can now see the tables within this database project. Select the table you want to export by using the up or down keys and pressing enter.

File:DPE08 DIAS Overview.png
The list of available tables.
File:DPE09 DIAS Table1.png
The first entry in the first table of DIAS

Navigate the Table

This these steps are not necessary for the export, but will be documented here anyway. Here are important keys for navigating thought a table:

  • Tab the next field is highlighted
  • Down on a field which has references to entries in another table, that referenced entry is displayed
  • Up The list of entries in this table where the content of the current field is displayed. Navigate though this list using the Up and Down keys. When typing characters this list the focus will jump to the entry whose unique key column is like the typed characters. For example in the table displayed in the image below, typing a number will focus on the entry with that id. The corresponding entry is automatically displayed. To edit this entry, press Enter.
  • F7 Goes to the next upper level in the hierarchy. For example if the list of entries is opened, it will jump back to the next upper level in the hierarchy.
File:DPE10 DIAS Table1 browse.png
Browsing through the first table of DIAS

Select Report

From the list of available reports select the entry at the top: Build-In Short Reports.

File:DPE11 DIAS Table1 Report List.png
The list of available reports.

Set Export Properties

Set the export settings as shown in the first image below. Press the number of the property you want to change. For example, to change the file name of the output file, press 2 and the file options will appear, as can be seen in the second image. Press 1 to create a new file and enter the name of the file (image 3). The filename of the export file must not be longer than 8 characters without file extension and the file extension itself (without the dot) must not be longer than 3 characters.

File:DPE12 DIAS Table1 Export 1.png
Export Settings Overview
File:DPE13 DIAS Table1 Export 2.png
Changing the File Options

Run Export

To start the export press Shift + F7 again. The screen now shows a counter of how many elements from the table have already been exported. DOSBox starts with a limited CPU speed for the programs running in it. The speed can be increased by pressing Ctrl + F12. Increasing the speed to much, however creates a big overhead which will also result in a slower export. For some reason when increasing the speed the CPU load will jump from around 5% of one core to 100% within just 2-3 increase steps. So the CPU load should be watched when increasing the speed of DOSBox. On a DualCore 3 GHz Processor a speed of 50000-55000 cycles (can be seen in the title bar of the DOSBox Window) appears to be a good export speed without overhead.

File:DPE15 DIAS Export.png
The export is running.

After the export is done, you will see the list of available reports again. To step to the next higher level of the hierarchy press F7. Repeat this until you see the list of tables within this database project again. Now repeat the export process for all the other tables you want to export.

Converting special characters

The export out of DataPerfect has problems with special characters which are not converted correctly in a proper character set. Though all of the occurrences of such special characters can be replaced automatically, each of the character to be replaced has to be defined manually one. The files as they are generated by DataPerfect use the ANSI Character Encoding. This needs to be converted to UTF8. Under Windows the free program Notepad++ is very suited for this. When selecting the menu item Encoding (Kodierung in the image, because it is the German version of Notepad++), the entry ANSI should be marked as the current encoding. Now click on Convert to UTF-8 and save the file.

File:DPE16 Notepad CharSet.png
Converting the Character Set in Notepad++

The next step is to run the file through a program that converts the characters. The small Java program CharReplacer was written just to do that. There are two different files to that program: CharReplacer.class (the acutual program) and CharReplacer.settings (the settings file, which specifies, what characters to replace. The first line in CharReplacer.settings are all the characters which will not be replaced. All the following lines have the number value of the character which will be replaced, followed by a tab and the character it will be replaced with.

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",.-; ()?\':/=&[]!`<>#+%
8222	ä
8221	ö
129	ü
225	ß
381	Ä
8482	Ö
353	Ü

Excerpt from the CharReplacer.settings file.

To run the CharReplacer program, the Java Runtime Envirement (JRE) must be installed on a computer. It must be run via the command line interface (also known as console).

Under Windows the Command Prompt (cmd.exe) can be used, but it needs to be adjusted to show the UTF-8 characters.

(The following steps displayed in italics only need to be done if the Windows Command Prompt is used.)

  • Copy the files CharReplacer.class and CharReplacer.settings in the folder in which the exported files are (in the example from above it is C:\DOS\DP26Y\DIAS\)
  • Start the console.
  • Change to the directory in which the files are.
  • Set the Font for the Command Prompt to Lucida Console
  • Switch the Character Set for the console by typing chcp 65001
  • run the program by typing java -Dfile.encoding=UTF-8 CharReplacer followed by the name or names of the files you want to run the program on, e.g. java -Dfile.encoding=UTF-8 CharReplacer OUTPUT1.EXP and hit Enter.
File:DPE17 Console change Font.png
Changing the Font in the Windows Command Prompt (German version)

The output of the program will be a file with the name of the input file, but with an additional .csv extension at the end. So OUTPUT1.EXP will become OUTPUT1.EXP.csv.

If the program find characters which are not in the list of allowed characters (the first line of the file) and for which no replacement rule exists, it will not replace this character, but it will show a message, informing the user that an unknown character was found, where it was found (file, line and column), what it looks like in UTF-8 and what its code is. If this occurs it is necessary to review the file at this position and create a new rule on how to handle that character. If the program prints out an unknown character, open the file and look at the given position. If the character is correct and represents precisely what was meant in the original dataset, then add this character at the end of the first line in the settings file. If it is a wrong character however, try to figure out what character it is meant to be. This could be derived from the context in which this unknown character appears. In the example below, the unknown character is displayed as and appears in the work "R†dhusplassen". People how are familiar with Norwegian might recognize the word as "Rådhusplassen" (the Norwegian word for "City Hall Place" or "Town Hall Place"). In other cased it could be not so clear. So it becomes necessary to look up the entry in the original DataPerfect table and see, what character was originally entered. In the third image, it can be seen that in this case the unknown character is indeed a å. So a new line can be added at the end of the settings file with the following content:

8224	å

The 8224 is the code for the character, as shown in the message of the CharReplacer program in the first image.

To edit the CharReplacer.settings file, use a regular text editor (like Notepad++ under Windows).

File:DPE19 Console run CharReplacer.png
some unknown characters were found
File:DPE20 Notepad Special Characters.png
Viewing the unconverted character in Notepad++
File:DPE21 DataPerfect Special Characters.png
Viewing the unconverted character in the original data entry in DataPerfect.

Documenting Foreign Key Relations

Importing into a modern relational database

Next Steps

Alternative Ways of exporting data from a DataPerfect database

Another way of exporting DataPerfect files is the DANS DataPerfect Library. There is a reference implementation of a DP2MySQLConverter. After testing it on small sample databases, it worked fine, but it ran intro problems exporting the DIAS database, causing huge and corrupt output files. For other DataPerfect files it could still be useful, especially since it already handles the special characters correctly.