Documentation

From Data Quality Toolkit
Revision as of 10:43, 7 November 2012 by PeerSchwirtz

User Interface

The Data Quality Toolkit is available at http://services.bgbm.org/DataQualityToolkit.

It has a simple HTML user interface with fields for (i) specifying the BioCASE provider installation to be analyzed, (ii) selecting the set of data quality rules to be applied, and (iii) filtering the subset of unit records to be analyzed:

[Screenshot: DQT content final.png]

The Data Quality Toolkit contains a set of rules implemented in the data integrity service. The botanical and zoological name service is also available and can be used with different databases. The set of rules integrated in the toolkit is not yet complete. However, the core functionality of the system (construction of queries, paging through ABCD records, application of quality rules, compilation of the response document) is fully operational and usable in practice.




Implementation

The Data Quality Toolkit is implemented in Node.js, which runs on the Google V8 JavaScript engine. One consequence is that JavaScript is the only programming language needed for both the client and the server. Code is written asynchronously, using callbacks and non-blocking I/O, which makes it efficient to handle concurrent processes. The individual software modules are:

  • HTTP server provides basic functions of an HTTP server and controls the workflow.
  • HTTP client communicates with the BioCASE providers and other servers.
  • XML parser builds data structures from XML data.
  • Validator uses these data structures to apply the data quality rules.
  • Rules module contains the definitions of the individual rules in the form of JSON objects.
  • Config module holds the parameters for rule mapping, paging, and annotations.
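
To illustrate how the Rules module and the Validator might fit together, the following is a minimal sketch, not the toolkit's actual code. The rule fields (id, path, test, message), the helper names (checks, lookup, validate), and the simplified record layout are all assumptions made for the example; only the general idea of rules defined as JSON-like objects and applied to parsed data structures comes from the description above.

```javascript
// Hypothetical rule definitions as plain JSON-style objects.
// Field names and rule ids are illustrative assumptions.
const rules = [
  {
    id: 'name-present',
    path: 'Identification.FullScientificNameString',
    test: 'notEmpty',
    message: 'Scientific name is missing.'
  },
  {
    id: 'latitude-range',
    path: 'Gathering.LatitudeDecimal',
    test: 'range',
    min: -90,
    max: 90,
    message: 'Latitude must be between -90 and 90.'
  }
];

// Check functions referenced by the "test" field of a rule.
const checks = {
  notEmpty: (value) => typeof value === 'string' && value.trim() !== '',
  range: (value, rule) =>
    typeof value === 'number' && value >= rule.min && value <= rule.max
};

// Resolve a dotted path against a parsed record; returns undefined if absent.
function lookup(record, path) {
  return path
    .split('.')
    .reduce((node, key) => (node == null ? undefined : node[key]), record);
}

// Apply every rule to one record and collect an annotation per failed rule.
function validate(record, ruleSet) {
  const annotations = [];
  for (const rule of ruleSet) {
    const value = lookup(record, rule.path);
    if (!checks[rule.test](value, rule)) {
      annotations.push({ rule: rule.id, path: rule.path, message: rule.message });
    }
  }
  return annotations;
}

// Example record (simplified stand-in for a parsed ABCD unit).
const unit = {
  Identification: { FullScientificNameString: '' },
  Gathering: { LatitudeDecimal: 123.4 }
};
const annotations = validate(unit, rules);
```

Keeping the rules as data rather than code means that new rules can be added without touching the validator, provided the check they need already exists in the checks table.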

Individual ABCD XML elements can receive multiple annotations from the application of several rules. The paging process for BioCASE provider installations is configurable with regard to the page size (the number of unit records per page).
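
The two points above can be sketched as follows. This is an assumption-laden illustration rather than the toolkit's implementation: the names config.pageSize, pageWindows, and addAnnotation, the start/limit window fields, the element path string, and the rule ids are all hypothetical, and the sketch assumes the total number of unit records is known in advance.

```javascript
// Hypothetical configuration: number of unit records requested per page.
const config = { pageSize: 3 };

// Compute the request windows needed to page through totalRecords records.
function pageWindows(totalRecords, pageSize) {
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new RangeError('pageSize must be a positive integer');
  }
  const windows = [];
  for (let start = 0; start < totalRecords; start += pageSize) {
    // The last window may be shorter than pageSize.
    windows.push({ start, limit: Math.min(pageSize, totalRecords - start) });
  }
  return windows;
}

// Collect annotations per element so one element can carry several of them.
function addAnnotation(store, elementPath, annotation) {
  if (!store.has(elementPath)) {
    store.set(elementPath, []);
  }
  store.get(elementPath).push(annotation);
}

// Eight records with a page size of three need three requests.
const windows = pageWindows(8, config.pageSize);

// Two different rules annotate the same element.
const store = new Map();
const latitudePath = 'Unit[1]/Gathering/LatitudeDecimal';
addAnnotation(store, latitudePath, { rule: 'latitude-range' });
addAnnotation(store, latitudePath, { rule: 'coordinates-in-country' });
```

A smaller page size keeps each provider response small at the cost of more requests; the right value depends on the provider installation being analyzed.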