Documentation

Revision as of 11:31, 7 November 2012

User Interface

The Data Quality Toolkit is available at http://services.bgbm.org/DataQualityToolkit.

It has a simple HTML user interface offering fields for i) specifying the BioCASE provider installation to be analyzed, ii) selecting the set of data quality rules to be applied, and iii) filtering the subset of unit records to be analyzed:

[Figure: preview of the Data Quality Toolkit user interface]

The Data Quality Toolkit contains a set of rules implemented in the data integrity service. The botanical and zoological name service is also available, backed by different databases. The set of rules integrated in the toolkit is not yet complete. However, the core functionality of the system (constructing queries, paging through ABCD records, applying quality rules, and compiling the response document) is fully implemented and usable in practice.




Implementation

The Data Quality Toolkit implementation is based on Node.js (www.nodejs.org), which runs on top of the Google V8 JavaScript engine. One consequence is that JavaScript is the only programming language needed for both the client and the server. Programming is done asynchronously, using callbacks and non-blocking I/O, which makes it possible to handle many concurrent operations efficiently. The individual software modules are:


  • HTTP server provides basic functions of an HTTP server and controls the workflow.
  • HTTP client communicates with the BioCASE providers and other servers.
  • XML parser builds data structures from XML data.
  • Validator uses these data structures to apply the data quality rules.
  • Rules module contains the definitions of the individual rules in the form of JSON objects.
  • Config module holds the parameters for rule mapping, paging, and annotations.
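
To make the callback-driven workflow more concrete, here is a minimal sketch of how such modules might be chained. It is not the toolkit's actual code: the function names (`fetchPage`, `parseUnits`, `validateUnits`, `runWorkflow`), the placeholder provider URL, the toy XML snippet, and the "UnitID contains whitespace" check are all assumptions made for illustration, and `setImmediate` stands in for real non-blocking network and parsing work.

```javascript
// Hypothetical stand-ins for the modules listed above; all names are illustrative.

// "HTTP client": pretends to fetch one page of ABCD XML without blocking.
function fetchPage(providerUrl, callback) {
  setImmediate(() => {
    callback(null, '<Unit><UnitID>B 10 0001</UnitID></Unit>');
  });
}

// "XML parser": toy extraction of a single UnitID.
// A real parser would build a complete data structure from the XML.
function parseUnits(xml, callback) {
  setImmediate(() => {
    const match = /<UnitID>([^<]*)<\/UnitID>/.exec(xml);
    if (!match) return callback(new Error('no UnitID found'));
    callback(null, [{ unitId: match[1] }]);
  });
}

// "Validator": applies one made-up rule and returns annotations.
function validateUnits(units, callback) {
  setImmediate(() => {
    const annotations = units
      .filter((unit) => /\s/.test(unit.unitId))
      .map((unit) => ({ unitId: unit.unitId, message: 'UnitID contains whitespace' }));
    callback(null, annotations);
  });
}

// Workflow control: each step hands its result to the next via a callback,
// following the Node.js convention of passing the error as first argument.
function runWorkflow(providerUrl, callback) {
  fetchPage(providerUrl, (fetchErr, xml) => {
    if (fetchErr) return callback(fetchErr);
    parseUnits(xml, (parseErr, units) => {
      if (parseErr) return callback(parseErr);
      validateUnits(units, callback);
    });
  });
}
```

A caller would invoke `runWorkflow('http://example.org/provider', (err, annotations) => { ... })`. Because no step blocks, the process remains free to serve other requests while one provider is being analyzed.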


Individual ABCD XML elements can receive multiple annotations from the application of several rules.
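
The following sketch illustrates how rules kept as JSON-style objects could lead to several annotations on one element. The rule format (`id`, `path`, `pattern`, `message`), the element path, and the `annotate` function are assumptions for illustration; the toolkit's real rule schema is not shown in this document.

```javascript
// Hypothetical rule definitions. They contain only strings, so they could be
// stored as JSON; the field names are assumptions, not the toolkit's schema.
const rules = [
  {
    id: 'R1',
    path: 'Gathering/Country/ISO3166Code',
    pattern: '^[A-Z]{2}$',
    message: 'Country code should be two uppercase letters'
  },
  {
    id: 'R2',
    path: 'Gathering/Country/ISO3166Code',
    pattern: '^\\S+$',
    message: 'Value should not contain whitespace'
  }
];

// Applies every rule to a record (a flat map from element path to value)
// and collects the annotations per element path.
function annotate(record, ruleList) {
  const annotations = {};
  for (const rule of ruleList) {
    const value = record[rule.path];
    if (value === undefined) continue; // element absent: rule does not apply
    if (!new RegExp(rule.pattern).test(value)) {
      if (!annotations[rule.path]) annotations[rule.path] = [];
      annotations[rule.path].push({ rule: rule.id, message: rule.message });
    }
  }
  return annotations;
}
```

With this sketch, the value `'d e'` violates both rules, so the same element receives two annotations, while `'DE'` receives none.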

The paging process for BioCASE provider installations is configurable with regard to the page size (the number of unit records per page).
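
As an illustration of paging with a configurable page size, the sketch below requests consecutive pages until a page comes back with fewer records than the page size. The names `fetchAllPages` and `fakeProvider`, the `config.pageSize` key, and the `start`/`limit` parameters are assumptions; the fake provider merely slices an in-memory array instead of querying a BioCASE installation.

```javascript
// Hypothetical configuration entry; the real config module may differ.
const config = { pageSize: 3 };

// Requests pages of `pageSize` unit records until a short page signals the end.
// `fetchPage(start, limit, callback)` is supplied by the caller.
function fetchAllPages(fetchPage, pageSize, callback) {
  const allUnits = [];
  function next(start) {
    fetchPage(start, pageSize, (err, units) => {
      if (err) return callback(err);
      allUnits.push(...units);
      // A page shorter than pageSize means there are no further records.
      if (units.length < pageSize) return callback(null, allUnits);
      next(start + pageSize);
    });
  }
  next(0);
}

// Stand-in for a provider: returns slices of an in-memory list asynchronously.
const fakeUnits = ['u1', 'u2', 'u3', 'u4', 'u5', 'u6', 'u7'];
function fakeProvider(start, limit, callback) {
  setImmediate(() => callback(null, fakeUnits.slice(start, start + limit)));
}
```

Calling `fetchAllPages(fakeProvider, config.pageSize, callback)` issues three requests here (3, 3, and 1 records). When the total is an exact multiple of the page size, this stop condition costs one extra request that returns an empty page.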