Integrated Rules

From Data Quality Toolkit
This page is a working document containing an evolving set of rules that will be continuously implemented in the data integrity service and quality toolkit. So far, only a few examples have been included. The numbering scheme will also be used to specify the set of rules to be applied when using the integrity service.
 
 
 
----
 
 
 
 
== Integrity Rules ==

1 Atomized Genus element
2 Collection date fields
3 Site coordinate latitude
4 Site coordinate longitude
5 Syntax of email elements
6 ISO country element
7 Scientific name (zoology)
8 Scientific name (botany)
9 Mime type for multimedia objects
10 Check whether multimedia object file is available
11 Check whether multimedia object has an associated copyright statement
12 Check whether rule 7 and rule 8 find the scientific name
13 Check whether the value for measurement and fact is a number
14 Check whether Record basis is mapped

* ABCD Schema (http://www.bgbm.org/TDWG/CODATA/Schema/ABCD_1.20/ABCD-1.20.html#complexType_ReferenceType_Link03B5BC30)
  
  
Latest revision as of 12:25, 7 November 2012



Atomized Genus elements should start with a single uppercase character followed by a non-empty sequence of lowercase characters

ABCD elements:

/DataSets/DataSet/Units/Unit/Identifications/Identification/Result/TaxonIdentified/ScientificName/NameAtomised/Zoological/GenusOrMonomial

/DataSets/DataSet/Units/Unit/Identifications/Identification/Result/TaxonIdentified/ScientificName/NameAtomised/Botanical/GenusOrMonomial

Regular expression:

[A-Z][a-z]+
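A minimal Python sketch of this check, assuming the pattern has to match the whole element value (the rule does not state the anchoring explicitly). The function name is hypothetical and not part of the toolkit.

```python
import re

# Pattern copied from the rule: one uppercase letter, then one or more lowercase letters.
GENUS_PATTERN = re.compile(r"[A-Z][a-z]+")


def is_valid_genus(value: str) -> bool:
    """Return True if the whole value matches the genus pattern."""
    # fullmatch anchors the pattern at both ends (an assumption about the rule's intent),
    # so a value such as "Bellis perennis" is rejected.
    return GENUS_PATTERN.fullmatch(value) is not None
```

Because the pattern is ASCII-only, genus values containing hyphens or diacritics would be flagged by this sketch.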


Check whether collection date fields conform to specification in ABCD 2.06

ABCD elements:

/DataSets/DataSet/Units/Unit/Gathering/DateTime/ISODateTimeBegin

/DataSets/DataSet/Units/Unit/Gathering/DateTime/ISODateTimeEnd

/DataSets/DataSet/Units/Unit/Identifications/Identification/Date/ISODateTimeBegin

/DataSets/DataSet/Units/Unit/Identifications/Identification/Date/ISODateTimeEnd

Regular expression:

\d\d\d\d(\-(0[1-9]|1[012])(\-((0[1-9])|1\d|2\d|3[01])(T(0\d|1\d|2[0-3])(:[0-5]\d){0,2})?)?)?|\-\-(0[1-9]|1[012])(\-(0[1-9]|1\d|2\d|3[01]))?|\-\-\-(0[1-9]|1\d|2\d|3[01])
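The expression can be applied as in the following Python sketch. The pattern is copied unchanged from the rule; the function name, the whole-value anchoring and the ASCII-only digit matching are assumptions.

```python
import re

# Pattern copied from the rule, split over several lines for readability only.
ISO_DATETIME_PATTERN = re.compile(
    r"\d\d\d\d(\-(0[1-9]|1[012])(\-((0[1-9])|1\d|2\d|3[01])"
    r"(T(0\d|1\d|2[0-3])(:[0-5]\d){0,2})?)?)?"
    r"|\-\-(0[1-9]|1[012])(\-(0[1-9]|1\d|2\d|3[01]))?"
    r"|\-\-\-(0[1-9]|1\d|2\d|3[01])",
    re.ASCII,  # assumption: \d should match ASCII digits only
)


def is_valid_iso_datetime(value: str) -> bool:
    """Return True if the whole value matches the date/time pattern."""
    return ISO_DATETIME_PATTERN.fullmatch(value) is not None
```

The pattern checks syntax only: a value such as 2012-02-31 passes, because the day range is not tied to the month.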


Check numeric ranges of site coordinates latitude value

ABCD elements:

/DataSets/DataSet/Units/Unit/Gathering/SiteCoordinateSets/SiteCoordinates/CoordinatesLatLong/LatitudeDecimal

Rule:

-90.0 <= lat <= 90.0


Check numeric ranges of site coordinates longitude value

ABCD elements:

/DataSets/DataSet/Units/Unit/Gathering/SiteCoordinateSets/SiteCoordinates/CoordinatesLatLong/LongitudeDecimal

Rule:

-180.0 <= lon <= 180.0
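The two range rules for latitude and longitude can be sketched in Python as follows. The function names are hypothetical; treating non-numeric values as failures is an assumption, since the rules only state the ranges.

```python
def in_range(value: str, low: float, high: float) -> bool:
    """Return True if value parses as a number within [low, high]."""
    try:
        number = float(value)
    except ValueError:
        # Assumption: a value that is not a number fails the range check.
        return False
    # NaN compares false against both bounds, so it is rejected as well.
    return low <= number <= high


def is_valid_latitude(value: str) -> bool:
    return in_range(value, -90.0, 90.0)


def is_valid_longitude(value: str) -> bool:
    return in_range(value, -180.0, 180.0)
```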


Check syntactical correctness of ABCD elements used for email addresses

ABCD elements:

All elements containing email addresses

Regular expression:

^([a-zA-Z0-9_\-\.\+]+)@([a-zA-Z0-9\-\.]+)$
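A short Python sketch applying this expression. The pattern is copied from the rule; the function name is hypothetical.

```python
import re

# Pattern copied from the rule: local part, "@", then a host made of letters, digits, "-" and ".".
EMAIL_PATTERN = re.compile(r"^([a-zA-Z0-9_\-\.\+]+)@([a-zA-Z0-9\-\.]+)$")


def is_valid_email_syntax(value: str) -> bool:
    """Return True if the whole value matches the email pattern."""
    # fullmatch also rejects a trailing newline, which "$" alone would tolerate.
    return EMAIL_PATTERN.fullmatch(value) is not None
```

The pattern is deliberately permissive: it accepts hosts without a dot, such as user@localhost, and it does not verify that the address exists.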


Check whether country element conforms with ISO3166

ABCD elements:

/DataSets/DataSet/Units/Unit/Gathering/Country/ISO3166Code

Rule:

Use 2- or 3-letter ISO country code (ISO3166-1).
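A possible shape for this check in Python. The code sets below are a tiny illustrative subset, not the ISO 3166-1 list; a real check would load the complete list from an authoritative source. The function name and the case-sensitive comparison are assumptions.

```python
# Illustrative subset only (assumption); replace with the full ISO 3166-1 code lists.
SAMPLE_ALPHA2 = frozenset({"BR", "DE", "FR", "US"})
SAMPLE_ALPHA3 = frozenset({"BRA", "DEU", "FRA", "USA"})


def is_iso3166_code(value: str, alpha2=SAMPLE_ALPHA2, alpha3=SAMPLE_ALPHA3) -> bool:
    """Return True if value is a known 2- or 3-letter country code."""
    code = value.strip()
    if len(code) == 2:
        return code in alpha2
    if len(code) == 3:
        return code in alpha3
    # Anything else, for example a full country name, fails the rule.
    return False
```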


Check whether scientific name is known by zoological name service

ABCD elements:

/DataSets/DataSet/Units/Unit/Identifications/Identification/Result/TaxonIdentified/ScientificName/FullScientificNameString

Rule:

Use Zoological Name Service


Check whether scientific name is known by botanical name service

ABCD elements:

/DataSets/DataSet/Units/Unit/Identifications/Identification/Result/TaxonIdentified/ScientificName/FullScientificNameString

Rule:

Use Botanical Name Service


Check whether field for multimedia object type uses mime types

ABCD elements:

/DataSets/DataSet/Units/Unit/MultiMediaObjects/MultiMediaObject/FileURI

/DataSets/DataSet/Units/Unit/MultiMediaObjects/MultiMediaObject/Format

Rule:

Values have to be MIME media types as defined in RFC 2046 (http://www.ietf.org/rfc/rfc2046.txt)
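A syntactic check of the Format value could look like the Python sketch below. The seven top-level types are the ones defined in RFC 2046; the subtype character set is a simplification, and the function name is hypothetical. The sketch does not verify that a subtype is actually registered.

```python
import re

# Top-level media types defined in RFC 2046 (five discrete, two composite).
TOP_LEVEL_TYPES = frozenset(
    {"text", "image", "audio", "video", "application", "multipart", "message"}
)

# Simplified subtype syntax (assumption): starts with a letter or digit,
# followed by letters, digits or a few punctuation characters.
SUBTYPE_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*")


def looks_like_mime_type(value: str) -> bool:
    """Return True if value has the form type/subtype with a known top-level type."""
    media_type, separator, subtype = value.strip().partition("/")
    if not separator:
        return False
    # Media type names are case-insensitive, so compare in lowercase.
    return (
        media_type.lower() in TOP_LEVEL_TYPES
        and SUBTYPE_PATTERN.fullmatch(subtype) is not None
    )
```

Values with parameters, such as "text/plain; charset=utf-8", are rejected by this sketch; whether they should pass is left open by the rule.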


Check whether multimedia object file is available

ABCD elements:

/DataSets/DataSet/Units/Unit/MultiMediaObjects/MultiMediaObject/FileURI

Rule:

Send an HTTP HEAD request to the file's URI and check that it succeeds
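A minimal sketch of such a request with the Python standard library. Counting only 2xx responses as "available", the timeout value and the injectable opener parameter (used here to make the function testable without network access) are assumptions, not part of the rule.

```python
import urllib.error
import urllib.request


def is_available(url: str, timeout: float = 10.0, opener=urllib.request.urlopen) -> bool:
    """Return True if a HEAD request to url succeeds with a 2xx status."""
    try:
        # HEAD asks for the headers only, so the file itself is not downloaded.
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, ValueError, OSError):
        # URLError covers HTTP error statuses and unreachable hosts,
        # ValueError covers malformed URLs, OSError covers timeouts.
        return False
```

Some servers do not answer HEAD requests correctly, so a failure here may need a follow-up GET before the record is flagged.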


Check whether multimedia object has an associated copyright statement

ABCD elements:

/DataSets/DataSet/Units/Unit/MultiMediaObjects/MultiMediaObject/IPR/Copyrights/Copyright/Text

Rule:

Copyright element has to be non-empty.
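The check can be sketched in Python with ElementTree, starting from a MultiMediaObject element. The namespace URI is assumed to be the ABCD 2.06 one, the function name is hypothetical, and treating whitespace-only text as empty is an assumption.

```python
import xml.etree.ElementTree as ET

# Assumption: documents use the ABCD 2.06 namespace.
ABCD_NS = {"abcd": "http://www.tdwg.org/schemas/abcd/2.06"}

# Path from the rule, relative to a MultiMediaObject element.
COPYRIGHT_TEXT_PATH = "abcd:IPR/abcd:Copyrights/abcd:Copyright/abcd:Text"


def has_copyright_statement(multimedia_object: ET.Element) -> bool:
    """Return True if at least one Copyright/Text element has non-blank content."""
    for text_element in multimedia_object.findall(COPYRIGHT_TEXT_PATH, ABCD_NS):
        if text_element.text and text_element.text.strip():
            return True
    return False
```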


Check whether rule 7 and rule 8 find the scientific name

ABCD elements:

/DataSets/DataSet/Units/Unit/Identifications/Identification/Result/TaxonIdentified/ScientificName/FullScientificNameString

Rule:

Use rule 7 and rule 8


Check whether the value for measurement and fact is a number

ABCD elements:

/DataSets/DataSet/Units/Unit/Gathering/Altitude/MeasurementOrFactAtomised/LowerValue

/DataSets/DataSet/Units/Unit/Gathering/Altitude/MeasurementOrFactAtomised/UpperValue

/DataSets/DataSet/Units/Unit/Gathering/Depth/MeasurementOrFactText

/DataSets/DataSet/Units/Unit/Gathering/Height/MeasurementOrFactText

Rule:

Values of measurement-or-fact (MaF) fields have to be numbers
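One way to test for a number in Python. The accepted syntax (optional sign, decimal point, optional exponent) is an assumption, since the rule does not define what counts as a number; the function name is hypothetical.

```python
import re

# Assumed number syntax: optional sign, digits with an optional decimal point,
# optional exponent. Decimal commas and unit suffixes are not accepted.
NUMBER_PATTERN = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?", re.ASCII)


def is_number(value: str) -> bool:
    """Return True if value, ignoring surrounding whitespace, is a plain number."""
    return NUMBER_PATTERN.fullmatch(value.strip()) is not None
```

A regular expression is used here instead of float() because float() also accepts values such as "nan", "inf" and "1_000".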

Check whether Record basis is mapped

ABCD elements:

/DataSets/DataSet/Units/Unit/RecordBasis

Rule:

Record basis field has to be mapped