Difference between revisions of "Export Paradox Data"
(complete script added) |
(description and images) |
||
Line 1: | Line 1: | ||
− | This article describes how to export data from Paradox Database files, based on the data set provided by the Museum | + | This article describes how to export data from Paradox Database files, based on the data set provided by the Museum Koenig. |
== Introduction == | == Introduction == | ||
+ | |||
+ | We received three dataset with data in the Paradox database format and were asked if we could extract the data and convert it to SQL. | ||
+ | Paradox is a file based database system. Each table has at least two (usually more) files with the same based name and different file ending. | ||
Further Reading | Further Reading | ||
Line 7: | Line 10: | ||
* [http://www.sundialservices.com/articles/paradox/understanding_filesystem_databases Understanding File-System Databases] | * [http://www.sundialservices.com/articles/paradox/understanding_filesystem_databases Understanding File-System Databases] | ||
− | == Viewing | + | == Viewing The Data == |
− | + | In order to view the data we first used the software [http://www.sportamok.com/development/delphi/8-paradox-dbase-reader Paradox dBase Reader]. Though all the structural information of the table was clearly visible, all the content was only displayed as garbage (Image 1). As it turns out, the files were password protected, the Paradox dBase Reader actually showed that they were protected in the general information page of the table (the checked checkbox at the bottom of the second image) | |
− | + | {| | |
− | Paradox 5.0 and 7.0 | + | |[[File:Paradox dBase Reader with Password Protected Data.png|frame|none|Viewing Password Encrypted Files in Paradox dbase Reader]] |
− | Paradox 4.0 | + | |[[File:Paradox dBase Reader with Password Protected Data 2.png|frame|none|Information about the file in Paradox dbase Reader]] |
+ | |} | ||
+ | |||
+ | We then used the software [http://www.scalabium.com/pdx/ Paradox Viewer]. It promoted a dialog box asking to input the password. All encrypted Paradox files have a default password, depending on the version of Paradox used. For Paradox 5.0 and 7.0 this is either <code>jIGGAe</code> or <code>cupcdvum</code> and for Paradox 4.0 it is <code>nx66ppx</code>. The files were saved by Paradox 7 (as can be see on the general information page of the Paradox dBase Reader in the image above). In our case <code>jIGGAe</code> was the correct password. Now the content was displayed correctly. However in the unregistered version Paradox Viewer only shows the first 100 entries and has no possibility to remove the password protection. | ||
== Removing Password Protection == | == Removing Password Protection == | ||
− | + | The [http://www.mitec.cz/pde.html Paradox Data Editor] offers the functionality of removing the password protection from a table. It also opens the encrypted tables without even prompting for a password. In properties panel on the left, it still shows that the file is protected. Above you can also see the entry "Code Page: 437" which will become important later on. To remove the password protection, click on the key icon in the toolbar. | |
− | + | [[File:Paradox Data Editor.png|frame|none|Data already decrypted in Paradox Data Editor]] | |
− | |||
== Exporting the data == | == Exporting the data == | ||
+ | To export the data from the Paradox files, we used the software [http://jan.kneschke.de/projects/pxtools/ PXTools]. It runs under Linux but first it has be compiled and installed. | ||
− | === Installing | + | === Installing PXTools === |
− | * | + | After downloading the *.tar.gz file from the homepage of PXTools, run the following commands to install the software. |
<syntaxhighlight lang="bash" enclose="pre"> | <syntaxhighlight lang="bash" enclose="pre"> | ||
Line 30: | Line 36: | ||
sudo make install | sudo make install | ||
sudo make clean | sudo make clean | ||
+ | </syntaxhighlight> | ||
+ | |||
+ | To see information about a table use the command | ||
+ | <syntaxhighlight lang="bash" enclose="pre"> | ||
pxinfo -f COLEOPTE_DATA.DB | pxinfo -f COLEOPTE_DATA.DB | ||
− | |||
− | |||
</syntaxhighlight> | </syntaxhighlight> | ||
− | + | === Using PXTools === | |
− | + | To export the data from one table into the mysql file use the command | |
− | |||
− | |||
− | |||
<syntaxhighlight lang="bash" enclose="pre"> | <syntaxhighlight lang="bash" enclose="pre"> | ||
− | + | pxsqldump -d mysql -f COLEOPTE_DATA.DB -b COLEOPTE_DATA.MB -d coleopte_data> COLEOPTE_DATA.sql | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | pxsqldump -d mysql -f | ||
</syntaxhighlight> | </syntaxhighlight> | ||
− | + | PXTools support export to CSV, XML, PostgreSQL and mySQL. In this example we will be exporting the data to mySQL. | |
− | |||
− | |||
− | |||
− | |||
+ | === Using PXTools With A Script === | ||
+ | Calling the tool as shown above, means that the command has to be adjusted and executed for each table individually. A more convenient way is use a script that will run for all of the tables automatically. The following code does just this as well as a few more things, which will be explained later on. | ||
<syntaxhighlight lang="bash" enclose="pre"> | <syntaxhighlight lang="bash" enclose="pre"> | ||
Line 111: | Line 106: | ||
mv $1.sql.tmp ../../$1.sql | mv $1.sql.tmp ../../$1.sql | ||
</syntaxhighlight> | </syntaxhighlight> | ||
+ | |||
+ | Save the script above in a file called 'px2sql-all.sh' and save it in the folder ~/bin/. Then run the following commands | ||
+ | |||
+ | PATH=$PATH:~/bin | ||
+ | chmod +x ~/bin/px2sql-all.sh | ||
+ | |||
+ | Here are the different processing steps the script does: | ||
+ | # creates a directory with the name 'sql' | ||
+ | # goes through all the Paradox table files in the current directory and export them to individual sql files in the ''sql'' directory | ||
+ | # creates a subdirectory within the directory called 'converted' | ||
+ | # goes through all the exported sql files and converts them to UTF-8. For this to work the script needs to know the Character Encoding of the original data. This is where the Code Page entry from the properties page in the Paradox Data Editor comes in play. In this example the Code Page was 437, so the character encoding is CP437. This will be used as a parameter when calling the script. ''See the man-page for the program iconv to see what other character encodings are supported. To learn more about character encoding, please read the article ''[http://www.joelonsoftware.com/articles/Unicode.html The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)]'' by Joel Spolsky.'' | ||
+ | # creates one sql file with code to create a database within the mySQL database engine. The content of all the individual sql files is then also added to this sql file. The resulting file is then copied in the directory in which the original Paradox files are. | ||
+ | |||
+ | To run the script go into the directory where the Paradox data files are located and call the command | ||
+ | |||
+ | px2sql-all.sh <database name> <original character encoding> | ||
+ | |||
+ | In this example the name of the database is 'ornithol' and the character encoding is CP437, so resulting command will be | ||
px2sql-all.sh ornithol CP437 | px2sql-all.sh ornithol CP437 | ||
+ | |||
+ | The resulting file will be ornithol.sql which can then be imported into a mySQL database. | ||
+ | |||
+ | |||
+ | == Additional Steps In Processing The Data == | ||
+ | |||
+ | === Documenting Table Relations === | ||
+ | PXTools is not able to export the relations between the different columns of different tables (Foreign Key Relations). With the Paradox Data Editor it is however possible to view those relations. When a table is opened, click on the Icon "Structure Information" in the toolbar (the forth icon from the right, marked with a red rectangle in the Paradox Data Editor image above). This will open an info window with the structure information of the table (This window will not open, if the name of the Paradox file or the path to it contains any special characters). Go to the tab "Referential Integrity" and you will see a list of all the relations from this table. With these information it is possible to remodel the relations in mySQL. | ||
+ | [[File:Paradox Data Editor Structure Information.png|frame|none|Structural Information about the References to other tables]] | ||
+ | |||
+ | === Further Fixing Character Encoding === | ||
+ | Though most of the data in the tables we received were encoded in CP437, some parts of it used different character encodings. This was fixed using the same method as described in [[Export_DataPerfect (Section 'Converting Characters Individually')||Export_DataPerfect#converting_characters_individually]]. | ||
+ | |||
+ | === Turning BLOB Entries Into TEXT Entries === | ||
+ | In our dataset a lot of large texts were stored in Paradox as BLOB objects. After exporting the data from Paradox these texts where in plain text in the mxSQL files, however the data type for the column was still BLOB. Though this works with mySQL we changed the data type for these columns to TEXT, so the text content is more easily editable and searchable within mySQL. | ||
+ | |||
+ | === Optimized For Import Speed === | ||
+ | The way PXTools exports the data, makes it slow to import it in mySQL database again. When trying to import it using phpMyAdmin, several timeout occured. In some cased phpMyAdmin saved the last position and it was easy to continue importing, on other cased, the files had to be adjusted manually in order to continue the import at the correct position. When importing the files through the command line interface of mySQL it worked without errors, however, it still took quite a while. A dataset with 76000 entries took almost 50 minutes to import. After the data was imported, it was exported again. This time the data was optimized for a reimport. The same dataset now only took 30 seconds to import. However during the initial import some data points where there was no data from Paradox but were required by mySQL were filled out using default values. This therefore altered the data and so the reexported data is not equal to the initially exported data. |
Revision as of 16:37, 20 July 2012
This article describes how to export data from Paradox Database files, based on the data set provided by the Museum Koenig.
Contents
Introduction
We received three dataset with data in the Paradox database format and were asked if we could extract the data and convert it to SQL. Paradox is a file based database system. Each table has at least two (usually more) files with the same based name and different file ending.
Further Reading
Viewing The Data
In order to view the data we first used the software Paradox dBase Reader. Though all the structural information of the table was clearly visible, all the content was only displayed as garbage (Image 1). As it turns out, the files were password protected, the Paradox dBase Reader actually showed that they were protected in the general information page of the table (the checked checkbox at the bottom of the second image)
We then used the software Paradox Viewer. It promoted a dialog box asking to input the password. All encrypted Paradox files have a default password, depending on the version of Paradox used. For Paradox 5.0 and 7.0 this is either jIGGAe
or cupcdvum
and for Paradox 4.0 it is nx66ppx
. The files were saved by Paradox 7 (as can be see on the general information page of the Paradox dBase Reader in the image above). In our case jIGGAe
was the correct password. Now the content was displayed correctly. However in the unregistered version Paradox Viewer only shows the first 100 entries and has no possibility to remove the password protection.
Removing Password Protection
The Paradox Data Editor offers the functionality of removing the password protection from a table. It also opens the encrypted tables without even prompting for a password. In properties panel on the left, it still shows that the file is protected. Above you can also see the entry "Code Page: 437" which will become important later on. To remove the password protection, click on the key icon in the toolbar.
Exporting the data
To export the data from the Paradox files, we used the software PXTools. It runs under Linux but first it has be compiled and installed.
Installing PXTools
After downloading the *.tar.gz file from the homepage of PXTools, run the following commands to install the software.
tar -zxvf pxtools-0.0.20.tar.gz
cd pxtools-0.0.20/
./configure
sudo make
sudo make install
sudo make clean
To see information about a table use the command
pxinfo -f COLEOPTE_DATA.DB
Using PXTools
To export the data from one table into the mysql file use the command
pxsqldump -d mysql -f COLEOPTE_DATA.DB -b COLEOPTE_DATA.MB -d coleopte_data> COLEOPTE_DATA.sql
PXTools support export to CSV, XML, PostgreSQL and mySQL. In this example we will be exporting the data to mySQL.
Using PXTools With A Script
Calling the tool as shown above, means that the command has to be adjusted and executed for each table individually. A more convenient way is use a script that will run for all of the tables automatically. The following code does just this as well as a few more things, which will be explained later on.
#!/bin/sh
# parameters: $1=database_name $2=Paradox_File_Encoding
if [ ! -d "sql" ]; then
echo "creating 'sql' directory"
mkdir sql
fi
DBFILES=`find -maxdepth 1 \( -name "*.DB" ! -name "BDS_*" \) `
for i in $DBFILES
do
echo
echo "#############################"
echo "# Exporting $i"
echo "#############################"
pxsqldump -d mysql -f $i -b $(echo "$i" | sed -e 's/\.DB$/\.MB/') -n $1.$(echo "$i" | tr '[:upper:]' '[:lower:]'| sed -e 's/\.db$//' | sed -e 's/.*\///') > sql/$(echo "$i" | tr '[:upper:]' '[:lower:]'| sed -e 's/\.db$//' | sed -e 's/.*\///').sql
done
cd sql
if [ ! -d "converted" ]; then
mkdir converted
fi
SQLFILES=`find -maxdepth 1 \( -name "*.sql" ! -name "BDS_*" \)`
for i in $SQLFILES
do
echo "Converting $i"
iconv -f $2 -t UTF-8 < $i > converted/$i
done
cd converted
echo
echo "Joining SQL Files"
echo "CREATE DATABASE IF NOT EXISTS $1 ;" > $1.sql.tmp
echo "USE $1 ;" >> $1.sql.tmp
echo >> $1.sql.tmp
echo >> $1.sql.tmp
SQLFILES=`find \( -name "*.sql" ! -name "BDS_*" ! -name "$1.sql" \)`
for i in $SQLFILES
do
cat $i >> $1.sql.tmp
echo >> $1.sql.tmp
echo >> $1.sql.tmp
echo $i
done
echo "" >> $1.sql.tmp
mv $1.sql.tmp ../../$1.sql
Save the script above in a file called 'px2sql-all.sh' and save it in the folder ~/bin/. Then run the following commands
PATH=$PATH:~/bin chmod +x ~/bin/px2sql-all.sh
Here are the different processing steps the script does:
- creates a directory with the name 'sql'
- goes through all the Paradox table files in the current directory and export them to individual sql files in the sql directory
- creates a subdirectory within the directory called 'converted'
- goes through all the exported sql files and converts them to UTF-8. For this to work the script needs to know the Character Encoding of the original data. This is where the Code Page entry from the properties page in the Paradox Data Editor comes in play. In this example the Code Page was 437, so the character encoding is CP437. This will be used as a parameter when calling the script. See the man-page for the program iconv to see what other character encodings are supported. To learn more about character encoding, please read the article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky.
- creates one sql file with code to create a database within the mySQL database engine. The content of all the individual sql files is then also added to this sql file. The resulting file is then copied in the directory in which the original Paradox files are.
To run the script go into the directory where the Paradox data files are located and call the command
px2sql-all.sh <database name> <original character encoding>
In this example the name of the database is 'ornithol' and the character encoding is CP437, so resulting command will be
px2sql-all.sh ornithol CP437
The resulting file will be ornithol.sql which can then be imported into a mySQL database.
Additional Steps In Processing The Data
Documenting Table Relations
PXTools is not able to export the relations between the different columns of different tables (Foreign Key Relations). With the Paradox Data Editor it is however possible to view those relations. When a table is opened, click on the Icon "Structure Information" in the toolbar (the forth icon from the right, marked with a red rectangle in the Paradox Data Editor image above). This will open an info window with the structure information of the table (This window will not open, if the name of the Paradox file or the path to it contains any special characters). Go to the tab "Referential Integrity" and you will see a list of all the relations from this table. With these information it is possible to remodel the relations in mySQL.
Further Fixing Character Encoding
Though most of the data in the tables we received were encoded in CP437, some parts of it used different character encodings. This was fixed using the same method as described in |Export_DataPerfect#converting_characters_individually.
Turning BLOB Entries Into TEXT Entries
In our dataset a lot of large texts were stored in Paradox as BLOB objects. After exporting the data from Paradox these texts where in plain text in the mxSQL files, however the data type for the column was still BLOB. Though this works with mySQL we changed the data type for these columns to TEXT, so the text content is more easily editable and searchable within mySQL.
Optimized For Import Speed
The way PXTools exports the data, makes it slow to import it in mySQL database again. When trying to import it using phpMyAdmin, several timeout occured. In some cased phpMyAdmin saved the last position and it was easy to continue importing, on other cased, the files had to be adjusted manually in order to continue the import at the correct position. When importing the files through the command line interface of mySQL it worked without errors, however, it still took quite a while. A dataset with 76000 entries took almost 50 minutes to import. After the data was imported, it was exported again. This time the data was optimized for a reimport. The same dataset now only took 30 seconds to import. However during the initial import some data points where there was no data from Paradox but were required by mySQL were filled out using default values. This therefore altered the data and so the reexported data is not equal to the initially exported data.