Difference between revisions of "Integrated Rules"

From Data Quality Toolkit

Revision as of 11:59, 7 November 2012

This page is a working document containing an evolving set of rules that will be continuously implemented in the data integrity service and quality toolkit. So far, only a few examples have been included. The numbering scheme will also be used to specify the set of rules to be applied when using the integrity service.


Integrity Rules

1 Atomized Genus element
2 Collection date fields
3 Site coordinate latitude
4 Site coordinate longitude
5 Syntax of email elements
6 ISO country element
7 Scientific name (zoology)
8 Scientific name (botany)
9 Mime type for multimedia objects
10 Check whether multimedia object file is available
11 Check whether multimedia object has an associated copyright statement
12 Check whether rule 7 and rule 8 find the scientific name
13 Check whether the value for measurement and fact is a number
14 Check whether Record basis is mapped





1 Atomized Genus elements should start with a single uppercase character followed by a non-empty sequence of lowercase characters

ABCD elements:

/DataSets/DataSet/Units/Unit/Identifications/Identification/Result/TaxonIdentified/ScientificName/NameAtomised/Zoological/GenusOrMonomial
/DataSets/DataSet/Units/Unit/Identifications/Identification/Result/TaxonIdentified/ScientificName/NameAtomised/Botanical/GenusOrMonomial

Regular expression:

[A-Z][a-z]+
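
As an illustration of rule 1, the following minimal Python sketch applies the regular expression above to a single value. The function name is hypothetical, and the sketch assumes the pattern is meant to cover the whole element content, which is why it uses a full match instead of a search.

```python
import re

# Pattern taken verbatim from rule 1.
GENUS_PATTERN = re.compile(r"[A-Z][a-z]+")


def is_valid_genus(value: str) -> bool:
    """Return True if the whole value is one uppercase letter
    followed by at least one lowercase letter (hypothetical helper)."""
    # fullmatch anchors the pattern to the start and end of the string.
    return GENUS_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["Quercus", "quercus", "Q", "Quercus robur"]:
        print(candidate, is_valid_genus(candidate))
```

Note that the pattern only covers the ASCII letters A-Z and a-z, so names containing diacritics or hyphens would be rejected by this sketch.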


2 Check whether collection date fields conform to specification in ABCD 2.06

ABCD elements:

/DataSets/DataSet/Units/Unit/Gathering/DateTime/ISODateTimeBegin
/DataSets/DataSet/Units/Unit/Gathering/DateTime/ISODateTimeEnd
/DataSets/DataSet/Units/Unit/Identifications/Identification/Date/ISODateTimeBegin
/DataSets/DataSet/Units/Unit/Identifications/Identification/Date/ISODateTimeEnd

Regular expression:

\d\d\d\d(\-(0[1-9]|1[012])(\-((0[1-9])|1\d|2\d|3[01])(T(0\d|1\d|2[0-3])(:[0-5]\d){0,2})?)?)?|\-\-(0[1-9]|1[012])(\-(0[1-9]|1\d|2\d|3[01]))?|\-\-\-(0[1-9]|1\d|2\d|3[01])
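
The expression in rule 2 can be tried out with a short Python sketch. The function name is hypothetical, and the sketch assumes the pattern has to match the complete field value. The three alternatives of the expression (full or partial date with optional time, month with optional day, day only) are split across lines for readability but are otherwise unchanged.

```python
import re

# Pattern taken from rule 2, split at its top-level alternatives.
ABCD_DATE_PATTERN = re.compile(
    r"\d\d\d\d(\-(0[1-9]|1[012])(\-((0[1-9])|1\d|2\d|3[01])(T(0\d|1\d|2[0-3])(:[0-5]\d){0,2})?)?)?"
    r"|\-\-(0[1-9]|1[012])(\-(0[1-9]|1\d|2\d|3[01]))?"
    r"|\-\-\-(0[1-9]|1\d|2\d|3[01])"
)


def matches_abcd_date(value: str) -> bool:
    """Return True if the whole value matches the rule 2 pattern
    (hypothetical helper)."""
    return ABCD_DATE_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    samples = ["2012", "2012-11-07", "2012-11-07T11:59", "--11-07", "---07", "07.11.2012"]
    for sample in samples:
        print(sample, matches_abcd_date(sample))
```

The check is purely syntactic: a value such as 2012-02-30 passes because the pattern does not know how many days a month has.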


3 Check numeric ranges of site coordinates latitude value

ABCD elements:

/DataSets/DataSet/Units/Unit/Gathering/SiteCoordinateSets/SiteCoordinates/CoordinatesLatLong/LatitudeDecimal

Rule:

-90.0 <= lat <= 90.0


4 Check numeric ranges of site coordinates longitude value

ABCD elements:

/DataSets/DataSet/Units/Unit/Gathering/SiteCoordinateSets/SiteCoordinates/CoordinatesLatLong/LongitudeDecimal

Rule:

-180.0 <= lon <= 180.0
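
A minimal Python sketch of the range checks in rules 3 and 4 follows. The function names are hypothetical, and the sketch assumes the element content arrives as text that must first be parsed as a decimal number; values that cannot be parsed, or that are not finite, are treated as failing the rule.

```python
import math


def _in_range(value: str, lower: float, upper: float) -> bool:
    """Parse value as a float and test lower <= value <= upper."""
    try:
        number = float(value.strip())
    except ValueError:
        return False  # not a number at all
    # NaN and infinities are rejected explicitly.
    return math.isfinite(number) and lower <= number <= upper


def is_valid_latitude(value: str) -> bool:
    """Rule 3: -90.0 <= lat <= 90.0 (hypothetical helper)."""
    return _in_range(value, -90.0, 90.0)


def is_valid_longitude(value: str) -> bool:
    """Rule 4: -180.0 <= lon <= 180.0 (hypothetical helper)."""
    return _in_range(value, -180.0, 180.0)


if __name__ == "__main__":
    print(is_valid_latitude("52.52"), is_valid_longitude("13.40"))
    print(is_valid_latitude("-90.1"), is_valid_longitude("180.5"))
```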


5 Check syntactical correctness of ABCD elements used for email addresses

ABCD elements:

All elements containing email addresses

Regular expression:

^([a-zA-Z0-9_\-\.\+]+)@([a-zA-Z0-9\-\.]+)$
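
The expression in rule 5 can be exercised with the following Python sketch; the function name is hypothetical. It uses a full match so that a trailing line break does not slip through the `$` anchor.

```python
import re

# Pattern taken verbatim from rule 5.
EMAIL_PATTERN = re.compile(r"^([a-zA-Z0-9_\-\.\+]+)@([a-zA-Z0-9\-\.]+)$")


def is_valid_email_syntax(value: str) -> bool:
    """Return True if the value matches the rule 5 pattern
    (hypothetical helper)."""
    return EMAIL_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["curator@museum.example.org", "no-at-sign", "a b@example.org"]:
        print(candidate, is_valid_email_syntax(candidate))
```

The pattern is deliberately loose: it accepts a domain part without a dot (for example user@host) and does not verify that the address exists.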


6 Check whether country element conforms with ISO3166

ABCD elements:

/DataSets/DataSet/Units/Unit/Gathering/Country/ISO3166Code

Rule:

Use 2- or 3-letter ISO country code (ISO3166-1).
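
One way to sketch rule 6 in Python is a lookup against the code lists. The function name is hypothetical, and the two sets below are only a tiny illustrative subset; a real check would need the complete ISO 3166-1 alpha-2 and alpha-3 lists. The sketch also assumes that codes must be written in uppercase.

```python
# Illustrative subset only, NOT the full ISO 3166-1 code lists.
SAMPLE_ALPHA2 = {"BR", "DE", "FR", "JP", "US"}
SAMPLE_ALPHA3 = {"BRA", "DEU", "FRA", "JPN", "USA"}


def is_iso3166_code(value: str,
                    alpha2=frozenset(SAMPLE_ALPHA2),
                    alpha3=frozenset(SAMPLE_ALPHA3)) -> bool:
    """Return True if value is a known 2- or 3-letter code
    (hypothetical helper; uppercase is assumed to be required)."""
    code = value.strip()
    if len(code) == 2:
        return code in alpha2
    if len(code) == 3:
        return code in alpha3
    return False  # anything else cannot be a 2- or 3-letter code


if __name__ == "__main__":
    for candidate in ["DE", "DEU", "Germany", "XX"]:
        print(candidate, is_iso3166_code(candidate))
```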


7 Check whether scientific name is known by zoological name service

ABCD elements:

/DataSets/DataSet/Units/Unit/Identifications/Identification/Result/TaxonIdentified/ScientificName/FullScientificNameString

Rule:

Use Zoological Name Service


8 Check whether scientific name is known by botanical name service

ABCD elements:

/DataSets/DataSet/Units/Unit/Identifications/Identification/Result/TaxonIdentified/ScientificName/FullScientificNameString

Rule:

Use Botanical Name Service


9 Check whether field for multimedia object type uses mime types

ABCD elements:

/DataSets/DataSet/Units/Unit/MultiMediaObjects/MultiMediaObject/FileURI
/DataSets/DataSet/Units/Unit/MultiMediaObjects/MultiMediaObject/Format

Rule:

Use http://www.ietf.org/rfc/rfc2046.txt
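
A possible shape for the rule 9 check is sketched below in Python. It assumes the Format element should hold a bare `type/subtype` value and tests the type against the seven top-level media types defined in RFC 2046. The function name and the simplified token pattern are assumptions of this sketch, not part of the rule.

```python
import re

# Top-level media types defined in RFC 2046.
TOP_LEVEL_TYPES = {"text", "image", "audio", "video",
                   "application", "multipart", "message"}

# Simplified type/subtype syntax (an assumption of this sketch).
MIME_PATTERN = re.compile(
    r"([A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*)/([A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*)"
)


def is_mime_type(value: str) -> bool:
    """Return True if value looks like type/subtype with a known
    top-level type (hypothetical helper)."""
    match = MIME_PATTERN.fullmatch(value.strip())
    # Media type names are case-insensitive, so compare in lowercase.
    return match is not None and match.group(1).lower() in TOP_LEVEL_TYPES


if __name__ == "__main__":
    for candidate in ["image/jpeg", "JPEG", "picture/jpeg"]:
        print(candidate, is_mime_type(candidate))
```

Values with parameters, such as `text/plain; charset=utf-8`, are rejected by this sketch; whether they should be accepted is left open here.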


10 Check whether multimedia object file is available

ABCD elements:

/DataSets/DataSet/Units/Unit/MultiMediaObjects/MultiMediaObject/File

Rule:

HTTP HEAD request
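
The HEAD request of rule 10 could be issued with the Python standard library roughly as follows. The function names and the timeout default are hypothetical, and the sketch assumes that any 2xx response counts as "available"; the rule itself does not state which status codes should pass.

```python
import urllib.error
import urllib.request


def build_head_request(url: str) -> urllib.request.Request:
    """Create a request object that uses the HEAD method."""
    return urllib.request.Request(url, method="HEAD")


def is_available(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request succeeds with a 2xx status
    (hypothetical helper)."""
    try:
        request = build_head_request(url)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, ValueError, OSError):
        # Malformed URL, network failure, timeout, or HTTP error status.
        return False


if __name__ == "__main__":
    # Placeholder URL for illustration only.
    print(is_available("http://example.org/image.jpg"))
```

A HEAD request transfers only the response headers, so the media file itself is not downloaded during the check.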


11 Check whether multimedia object has an associated copyright statement

ABCD elements:

/DataSets/DataSet/Units/Unit/MultiMediaObjects/MultiMediaObject/IPR/Copyrights/Copyright/Text

Rule:

Copyright element has to be non-empty.
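
As a sketch of rule 11, the following Python code walks the multimedia objects of one Unit element and reports those without a non-empty copyright text. The function name is hypothetical, and the sample document deliberately omits XML namespaces for brevity; real ABCD documents are namespaced, so the paths would need to be adapted.

```python
import xml.etree.ElementTree as ET

# Namespace-free sample for illustration only.
SAMPLE_UNIT = """
<Unit>
  <MultiMediaObjects>
    <MultiMediaObject>
      <IPR><Copyrights><Copyright><Text>(c) Example Museum</Text></Copyright></Copyrights></IPR>
    </MultiMediaObject>
    <MultiMediaObject>
      <IPR><Copyrights><Copyright><Text>  </Text></Copyright></Copyrights></IPR>
    </MultiMediaObject>
  </MultiMediaObjects>
</Unit>
"""


def objects_missing_copyright(unit: ET.Element) -> list:
    """Return the positions of multimedia objects that have no
    non-empty copyright text (hypothetical helper)."""
    missing = []
    objects = unit.findall("MultiMediaObjects/MultiMediaObject")
    for index, obj in enumerate(objects):
        texts = obj.findall("IPR/Copyrights/Copyright/Text")
        # Whitespace-only content counts as empty.
        if not any((t.text or "").strip() for t in texts):
            missing.append(index)
    return missing


if __name__ == "__main__":
    print(objects_missing_copyright(ET.fromstring(SAMPLE_UNIT)))
```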


12 Check whether rule 7 and rule 8 find the scientific name

ABCD elements:

/DataSets/DataSet/Units/Unit/Identifications/Identification/Result/TaxonIdentified/ScientificName/FullScientificNameString

Rule:

Use rule 7 and rule 8


13 Check whether the value for measurement and fact is a number

ABCD elements:

/DataSets/DataSet/Units/Unit/Gathering/Altitude/MeasurementOrFactAtomised/LowerValue
/DataSets/DataSet/Units/Unit/Gathering/Altitude/MeasurementOrFactAtomised/UpperValue
/DataSets/DataSet/Units/Unit/Gathering/Depth/MeasurementOrFactText
/DataSets/DataSet/Units/Unit/Gathering/Height/MeasurementOrFactText

Rule:

Measurement-or-fact (MaF) field values have to be numbers
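
A minimal Python sketch of the rule 13 check is given below. The function name is hypothetical, and the sketch assumes that "number" means any finite value that Python's `float` can parse, including integers and scientific notation; the rule itself does not define the accepted notation.

```python
import math


def is_number(value: str) -> bool:
    """Return True if value parses as a finite number
    (hypothetical helper)."""
    try:
        number = float(value.strip())
    except ValueError:
        return False  # e.g. free text such as "ca. 100 m"
    # Reject NaN and infinities, which float() would otherwise accept.
    return math.isfinite(number)


if __name__ == "__main__":
    for candidate in ["120", "12.5", "-3e2", "ca. 100 m", ""]:
        print(repr(candidate), is_number(candidate))
```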


14 Check whether Record basis is mapped

ABCD elements:

/DataSets/DataSet/Units/Unit/RecordBasis

Rule:

Record basis field has to be mapped