Difference between revisions of "Supporting data preparation software"

From reBiND Documentation

Revision as of 15:04, 20 October 2014

Software products to support data preparation

Several software tools were created to support data preparation through data cleaning, data substitution and other modifications. These are outlined below.

Data Splitter

A Java-based program was written to facilitate the preparation of data where a single field in a text file needs to be split into several fields. This ‘Data Splitter’ program requires the user to specify a regular expression that identifies where the data in a single field should be split.

The user interface is shown in the screenshot below.

Data splitter.PNG

For example, the user enters a regular expression in the Regex box, which atomises the data by splitting it at the specified character. In this case the character was a comma, because the input was a comma-separated values (CSV) file. In the example file the full locality information occurred in a single field, but we required it to be atomised. After clicking ‘Split Data’, the original data shown in column 1 is split into several numbered columns (1-7) to the right of it. Columns can be re-joined by highlighting the relevant cells in the split data and pressing Ctrl + J.
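
The splitting and re-joining described above can be illustrated with a minimal Java sketch. This is not the Data Splitter source code: the class name FieldSplitSketch, the methods split and join, the trimming of whitespace and the sample locality string are all assumptions made for illustration. It relies only on the standard java.util.regex.Pattern class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from the Data Splitter program.
final class FieldSplitSketch {

    // Split one field wherever the regular expression matches.
    // A limit of -1 keeps trailing empty parts, so "a,b," yields three columns.
    // Trimming surrounding whitespace is an assumption, not documented behaviour.
    static List<String> split(String field, String regex) {
        String[] parts = Pattern.compile(regex).split(field, -1);
        List<String> columns = new ArrayList<>();
        for (String part : parts) {
            columns.add(part.trim());
        }
        return columns;
    }

    // Re-join the columns from index 'from' to index 'to' (inclusive, zero-based)
    // into a single column, leaving the other columns unchanged.
    static List<String> join(List<String> columns, int from, int to, String separator) {
        List<String> result = new ArrayList<>(columns.subList(0, from));
        result.add(String.join(separator, columns.subList(from, to + 1)));
        result.addAll(columns.subList(to + 1, columns.size()));
        return result;
    }

    public static void main(String[] args) {
        // Invented sample value standing in for a locality field.
        String locality = "Germany, Berlin, Dahlem, Botanic Garden";
        List<String> columns = split(locality, ",");
        System.out.println(columns);
        // Merge the second and third columns back into one.
        System.out.println(join(columns, 1, 2, ", "));
    }
}
```

Here the comma plays the role of the expression typed into the Regex box, and join stands in for the Ctrl + J action on a selection of adjacent cells.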


  • Character Encoding Correcter
  • Stand-alone Correction Manager