Difference between revisions of "Export Paradox Data"

From reBiND Documentation
Jump to: navigation, search
m (Further Fixing Character Encoding: Link Correktion)
m (5 revision)
 
(No difference)

Latest revision as of 17:59, 10 November 2014

This article describes how to export data from Paradox Database files, based on the data set provided by the Museum Koenig.

Introduction

We received three dataset with data in the Paradox database format and were asked if we could extract the data and convert it to SQL. Paradox is a file based database system. Each table has at least two (usually more) files with the same based name and different file ending.

Further Reading

Viewing The Data

In order to view the data we first used the software Paradox dBase Reader. Though all the structural information of the table was clearly visible, all the content was only displayed as garbage (Image 1). As it turns out, the files were password protected, the Paradox dBase Reader actually showed that they were protected in the general information page of the table (the checked checkbox at the bottom of the second image)

File:Paradox dBase Reader with Password Protected Data.png
Viewing Password Encrypted Files in Paradox dbase Reader
File:Paradox dBase Reader with Password Protected Data 2.png
Information about the file in Paradox dbase Reader

We then used the software Paradox Viewer. It promoted a dialog box asking to input the password. All encrypted Paradox files have a default password, depending on the version of Paradox used. For Paradox 5.0 and 7.0 this is either jIGGAe or cupcdvum and for Paradox 4.0 it is nx66ppx. The files were saved by Paradox 7 (as can be see on the general information page of the Paradox dBase Reader in the image above). In our case jIGGAe was the correct password. Now the content was displayed correctly. However in the unregistered version Paradox Viewer only shows the first 100 entries and has no possibility to remove the password protection.

Removing Password Protection

The Paradox Data Editor offers the functionality of removing the password protection from a table. It also opens the encrypted tables without even prompting for a password. In properties panel on the left, it still shows that the file is protected. Above you can also see the entry "Code Page: 437" which will become important later on. To remove the password protection, click on the key icon in the toolbar.

File:Paradox Data Editor.png
Data already decrypted in Paradox Data Editor

Exporting the data

To export the data from the Paradox files, we used the software PXTools. It runs under Linux but first it has be compiled and installed.

Installing PXTools

After downloading the *.tar.gz file from the homepage of PXTools, run the following commands to install the software.

 tar -zxvf pxtools-0.0.20.tar.gz
 cd pxtools-0.0.20/
 ./configure
 sudo make
 sudo make install
 sudo make clean

To see information about a table use the command

 pxinfo -f COLEOPTE_DATA.DB

Using PXTools

To export the data from one table into the mysql file use the command

 pxsqldump -d mysql -f COLEOPTE_DATA.DB -b COLEOPTE_DATA.MB -d coleopte_data> COLEOPTE_DATA.sql

PXTools support export to CSV, XML, PostgreSQL and mySQL. In this example we will be exporting the data to mySQL.


Using PXTools With A Script

Calling the tool as shown above, means that the command has to be adjusted and executed for each table individually. A more convenient way is use a script that will run for all of the tables automatically. The following code does just this as well as a few more things, which will be explained later on.

#!/bin/sh
# parameters: $1=database_name $2=Paradox_File_Encoding

if [ ! -d "sql" ]; then
	echo "creating 'sql' directory"
	mkdir sql
fi
DBFILES=`find -maxdepth 1 \( -name "*.DB" ! -name "BDS_*" \) `

for i in $DBFILES
do 
	echo 
	echo "#############################"
	echo "# Exporting $i"
	echo "#############################"
	pxsqldump -d mysql -f $i -b $(echo "$i" | sed -e 's/\.DB$/\.MB/') -n $1.$(echo "$i" | tr '[:upper:]' '[:lower:]'| sed -e 's/\.db$//' | sed -e 's/.*\///') > sql/$(echo "$i" | tr '[:upper:]' '[:lower:]'| sed -e 's/\.db$//' | sed -e 's/.*\///').sql
done 

cd sql
if [ ! -d "converted" ]; then
	mkdir converted
fi
SQLFILES=`find -maxdepth 1 \( -name "*.sql" ! -name "BDS_*" \)`

for i in $SQLFILES
do 
	echo "Converting $i"
	iconv -f $2 -t UTF-8 < $i > converted/$i
done

cd converted
echo 
echo "Joining SQL Files"
echo "CREATE DATABASE IF NOT EXISTS $1 ;" > $1.sql.tmp
echo "USE $1 ;" >> $1.sql.tmp
echo >> $1.sql.tmp 
echo >> $1.sql.tmp 

SQLFILES=`find \( -name "*.sql" ! -name "BDS_*" ! -name "$1.sql" \)`

for i in $SQLFILES
do 
	cat $i >> $1.sql.tmp 
	echo >> $1.sql.tmp 
	echo >> $1.sql.tmp 
	echo $i
done
echo "" >> $1.sql.tmp
mv $1.sql.tmp ../../$1.sql

Save the script above in a file called 'px2sql-all.sh' and save it in the folder ~/bin/. Then run the following commands

PATH=$PATH:~/bin
chmod +x ~/bin/px2sql-all.sh

Here are the different processing steps the script does:

  1. creates a directory with the name 'sql'
  2. goes through all the Paradox table files in the current directory and export them to individual sql files in the sql directory
  3. creates a subdirectory within the directory called 'converted'
  4. goes through all the exported sql files and converts them to UTF-8. For this to work the script needs to know the Character Encoding of the original data. This is where the Code Page entry from the properties page in the Paradox Data Editor comes in play. In this example the Code Page was 437, so the character encoding is CP437. This will be used as a parameter when calling the script. See the man-page for the program iconv to see what other character encodings are supported. To learn more about character encoding, please read the article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky.
  5. creates one sql file with code to create a database within the mySQL database engine. The content of all the individual sql files is then also added to this sql file. The resulting file is then copied in the directory in which the original Paradox files are.

To run the script go into the directory where the Paradox data files are located and call the command

px2sql-all.sh <database name> <original character encoding>

In this example the name of the database is 'ornithol' and the character encoding is CP437, so resulting command will be

px2sql-all.sh ornithol CP437

The resulting file will be ornithol.sql which can then be imported into a mySQL database.


Additional Steps In Processing The Data

Documenting Table Relations

PXTools is not able to export the relations between the different columns of different tables (Foreign Key Relations). With the Paradox Data Editor it is however possible to view those relations. When a table is opened, click on the Icon "Structure Information" in the toolbar (the forth icon from the right, marked with a red rectangle in the Paradox Data Editor image above). This will open an info window with the structure information of the table (This window will not open, if the name of the Paradox file or the path to it contains any special characters). Go to the tab "Referential Integrity" and you will see a list of all the relations from this table. With these information it is possible to remodel the relations in mySQL.

File:Paradox Data Editor Structure Information.png
Structural Information about the References to other tables

Further Fixing Character Encoding

Though most of the data in the tables we received were encoded in CP437, some parts of it used different character encodings. This was fixed using the same method as described in Export_DataPerfect (Section 'Converting Characters Individually').

Turning BLOB Entries Into TEXT Entries

In our dataset a lot of large texts were stored in Paradox as BLOB objects. After exporting the data from Paradox these texts where in plain text in the mxSQL files, however the data type for the column was still BLOB. Though this works with mySQL we changed the data type for these columns to TEXT, so the text content is more easily editable and searchable within mySQL.

Optimized For Import Speed

The way PXTools exports the data, makes it slow to import it in mySQL database again. When trying to import it using phpMyAdmin, several timeout occured. In some cased phpMyAdmin saved the last position and it was easy to continue importing, on other cased, the files had to be adjusted manually in order to continue the import at the correct position. When importing the files through the command line interface of mySQL it worked without errors, however, it still took quite a while. A dataset with 76000 entries took almost 50 minutes to import. After the data was imported, it was exported again. This time the data was optimized for a reimport. The same dataset now only took 30 seconds to import. However during the initial import some data points where there was no data from Paradox but were required by mySQL were filled out using default values. This therefore altered the data and so the reexported data is not equal to the initially exported data.