Difference between revisions of "Data preparation"

From reBiND Documentation
Jump to: navigation, search
m
Line 1: Line 1:
 
*Prepare the content for ABCD mapping: resolve abbreviations, add units, concatenate atomised scientific name, additional information from data documentation, add metadata, denormalise
 
*Prepare the content for ABCD mapping: resolve abbreviations, add units, concatenate atomised scientific name, additional information from data documentation, add metadata, denormalise
 
*ABCD Mapping ([http://wiki.bgbm.org/bps/index.php/ABCD2Mapping see here for detailed documentation on mapping using the Biocase Provider software]) plus additional examples
 
*ABCD Mapping ([http://wiki.bgbm.org/bps/index.php/ABCD2Mapping see here for detailed documentation on mapping using the Biocase Provider software]) plus additional examples
 +
 +
 +
= Prepare Data for XML transfer and ABCD Mapping =
 +
 +
Before you start with data preparation as described in BioCASE documentation
 +
(See http://wiki.bgbm.org/bps/index.php/Preparation )
 +
it is required to prepare the content of the data set for following reasons:
 +
* Make content information comprehensible
 +
* Add fields mandatory to ABCD Mapping
 +
 +
Following '''preparation steps''' are recommended:
 +
 +
==1. Convert abbreviations into complete words.==
 +
 +
 +
 +
Figure 1: Abbreviations in original data on the left were transferred to complete words in the right: RLB - Rote Liste Berlin (= Red Lists Berlin), RLBb - Rote Liste Brandenburg (= Red Lists Brandenburg)
 +
 +
==2. Translate numbers or characters into text. ==
 +
In some cases numbers or characters don’t represent a value but qualitative statements. To make this information comprehensible it is recommended to convert it into written text.
 +
You can use “if functions” in Excel for conversion of numbers and characters.
 +
e.g. =IF(A3="3";"3-gefährdet";IF(A3="1";"1- vom Aussterben bedroht";A3;""))
 +
 +
* Example 1 conversion of numbers
 +
 +
 +
Figure 2: Numbers in original data were completed with the meaning of the numbers (1 – vom Aussterben bedroht =  critically endangered, 2. – stark gefährded = endangered, 3 – gefährdet = vulnerable 4- potentiell gefährdet = near threatended)
 +
 +
 +
 +
* Example 2 conversion of characters
 +
 +
 +
Figure 3: characters in original data are turned into text (different ecological types)
 +
 +
 +
 +
 +
==3. Convert Values into the ABCD essential form. ==
 +
* Example: Transform lat/lon coordinates into decimal form (required for ABCD)
 +
 +
 +
 +
 +
Figure 4: latitude in degree, minute and seconds is transformed into decimal degree
 +
 +
 +
 +
 +
==4. Enter columns with ‘unit of measurement’, if missing, according to the measurement.==
 +
Otherwise in ABCD the units cannot be allocated to the values correctly.
 +
Example: header of a column in Excel: tarsus (mm) – length, width, height. Add columns with header ‘Unit’ and enter ‘mm’ in every row.
 +
You can add columns with equal content by using following SQL statement:  SELECT  [unit_tarsus_length] AS mm
 +
 +
 +
 +
Figure 5: adding the unit of measurement in an extra column, derived from the column header in original data.
 +
 +
 +
 +
==5. Add core information from metadata to every unit, if missing in the unit table.==
 +
You can add this information by using the SQL statement above.
 +
 +
 +
==6. In case of repeatable elements in ABCD you need to denormalise information.==
 +
More information see:
 +
BioCASE documention Wiki: http://wiki.bgbm.org/bps/index.php/Preparation
 +
Here: “controlled denormalisation”

Revision as of 14:06, 4 November 2014


Prepare Data for XML transfer and ABCD Mapping

Before you start with data preparation as described in BioCASE documentation (See http://wiki.bgbm.org/bps/index.php/Preparation ) it is required to prepare the content of the data set for following reasons:

  • Make content information comprehensible
  • Add fields mandatory to ABCD Mapping

Following preparation steps are recommended:

1. Convert abbreviations into complete words.

Figure 1: Abbreviations in original data on the left were transferred to complete words in the right: RLB - Rote Liste Berlin (= Red Lists Berlin), RLBb - Rote Liste Brandenburg (= Red Lists Brandenburg)

2. Translate numbers or characters into text.

In some cases numbers or characters don’t represent a value but qualitative statements. To make this information comprehensible it is recommended to convert it into written text. You can use “if functions” in Excel for conversion of numbers and characters.

e.g. =IF(A3="3";"3-gefährdet";IF(A3="1";"1- vom Aussterben bedroht";A3;""))
  • Example 1 conversion of numbers


Figure 2: Numbers in original data were completed with the meaning of the numbers (1 – vom Aussterben bedroht = critically endangered, 2. – stark gefährded = endangered, 3 – gefährdet = vulnerable 4- potentiell gefährdet = near threatended)


  • Example 2 conversion of characters


Figure 3: characters in original data are turned into text (different ecological types)



3. Convert Values into the ABCD essential form.

  • Example: Transform lat/lon coordinates into decimal form (required for ABCD)



Figure 4: latitude in degree, minute and seconds is transformed into decimal degree



4. Enter columns with ‘unit of measurement’, if missing, according to the measurement.

Otherwise in ABCD the units cannot be allocated to the values correctly. Example: header of a column in Excel: tarsus (mm) – length, width, height. Add columns with header ‘Unit’ and enter ‘mm’ in every row.

You can add columns with equal content by using following SQL statement:  SELECT  [unit_tarsus_length] AS mm


Figure 5: adding the unit of measurement in an extra column, derived from the column header in original data.


5. Add core information from metadata to every unit, if missing in the unit table.

You can add this information by using the SQL statement above.


6. In case of repeatable elements in ABCD you need to denormalise information.

More information see: BioCASE documention Wiki: http://wiki.bgbm.org/bps/index.php/Preparation Here: “controlled denormalisation”