Overview rebind workflow

From reBiND Documentation
Revision as of 17:18, 8 October 2014 by LornaMorris (Overview of the reBiND workflow)

Overview of the reBiND workflow

The general structure of the reBiND Framework. Blue solid lines indicate how the document is transformed and processed; orange dashed lines indicate user interaction or input.

This figure shows the general structure of the reBiND processing architecture. It shows each step in the workflow, from the submission of a dataset through preparation and processing to its final publication.

Before the data can be uploaded into the reBiND data portal, several steps are required to prepare it and map it to an appropriate schema. We used the ABCD (Access to Biological Collections Data) schema, a common data specification for biological collection units, including living and preserved specimens and field observations. Most of the data we received from contributing scientists was in spreadsheet format, which can easily be imported into a relational database; the BioCASe Provider Software (BPS) supports many different SQL-based databases, and these databases offer import functions for a variety of file types. Once the data was in a relational database, we used the BPS to generate the XML files. To do this, the columns of the relational database have to be mapped to the corresponding ABCD concepts. At this point the expert knowledge of the contributing scientists is needed. At the end of the mapping process an ABCD XML document is generated.
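The preparation steps can be illustrated with a short Python sketch using only the standard library. It is not the BPS mapping itself, only a minimal stand-in under stated assumptions: spreadsheet rows (exported as CSV) are imported into an in-memory SQLite table, and hypothetical column names (`specimen_id`, `taxon`, `locality`, `collected`) are mapped to a handful of ABCD 2.06 element paths before being serialised as XML. The sample data, column names and mapping are invented for illustration, and a real ABCD document requires further mandatory elements (such as dataset metadata and source identifiers), so the output is only a schema-shaped fragment, not a valid ABCD file.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# ABCD 2.06 namespace; the element paths below are a small, simplified subset.
ABCD_NS = "http://www.tdwg.org/schemas/abcd/2.06"

# Hypothetical spreadsheet export (CSV); real submissions use their own columns.
CSV_DATA = """specimen_id,taxon,locality,collected
B 10 0001,Quercus robur L.,Berlin-Dahlem,1998-05-14
B 10 0002,Fagus sylvatica L.,Potsdam,2001-07-02
"""

# Hypothetical column-to-concept mapping, relative to an ABCD Unit element.
# Choosing these pairs is where the scientist's expert knowledge is needed.
COLUMN_TO_ABCD = {
    "specimen_id": "UnitID",
    "taxon": ("Identifications/Identification/Result/TaxonIdentified/"
              "ScientificName/FullScientificNameString"),
    "locality": "Gathering/LocalityText",
    "collected": "Gathering/DateTime/ISODateTimeBegin",
}


def load_csv_into_sqlite(text: str) -> sqlite3.Connection:
    """Import spreadsheet rows (as CSV) into an in-memory relational table."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys())
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE specimens ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO specimens VALUES ({placeholders})",
                     [tuple(row[c] for c in columns) for row in rows])
    return conn


def get_or_create(parent: ET.Element, path: str) -> ET.Element:
    """Walk a slash-separated element path, creating missing children."""
    node = parent
    for tag in path.split("/"):
        qname = f"{{{ABCD_NS}}}{tag}"
        child = node.find(qname)
        if child is None:
            child = ET.SubElement(node, qname)
        node = child
    return node


def rows_to_abcd(conn: sqlite3.Connection) -> ET.Element:
    """Map every table row to an ABCD Unit according to COLUMN_TO_ABCD."""
    root = ET.Element(f"{{{ABCD_NS}}}DataSets")
    dataset = ET.SubElement(root, f"{{{ABCD_NS}}}DataSet")
    units = ET.SubElement(dataset, f"{{{ABCD_NS}}}Units")
    columns = list(COLUMN_TO_ABCD)
    query = f"SELECT {', '.join(columns)} FROM specimens ORDER BY rowid"
    for row in conn.execute(query):
        unit = ET.SubElement(units, f"{{{ABCD_NS}}}Unit")
        for column, value in zip(columns, row):
            get_or_create(unit, COLUMN_TO_ABCD[column]).text = value
    return root


if __name__ == "__main__":
    ET.register_namespace("abcd", ABCD_NS)
    print(ET.tostring(rows_to_abcd(load_csv_into_sqlite(CSV_DATA)),
                      encoding="unicode"))
```

Keeping the mapping in a single dictionary mirrors the fact that the mapping, rather than the database import, is the step that needs domain knowledge: changing how a column is interpreted only means editing one entry.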

Once the data has been converted into XML, it can be uploaded to the reBiND web portal. After the XML document has been uploaded, the Content Administrator can start the correction process. The grey box in the figure highlights the steps from the upload of the data into the reBiND portal through validation, correction and review, prior to publication of the data. The Correction Manager runs several correction modules, each with a specific purpose. Whenever a module changes the document or encounters a problem, the issue is recorded in a separate document so that it can be reviewed later. When all modules have finished running, the corrected document is loaded back into the reBiND system. At this stage the document should be valid; if the correction modules were unable to fix every problem they encountered, the remaining validation errors are marked.
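The correction loop described above can be pictured with the following hypothetical Python sketch; it is not reBiND's actual Correction Manager. Each correction module is modelled as a function that receives the parsed document and a shared issue list, may modify the document in place, and records what it changed or could not fix. The `Issue` record and the two example modules (`strip_whitespace` and `check_unit_ids`) are invented for illustration, and the sample document uses un-namespaced elements for brevity.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Issue:
    module: str    # name of the correction module that reported the issue
    severity: str  # "information", "warning" or "error"
    message: str


# Hypothetical module signature: document root plus the shared issue list.
CorrectionModule = Callable[[ET.Element, List[Issue]], None]


def strip_whitespace(root: ET.Element, issues: List[Issue]) -> None:
    """Hypothetical module: trim leading/trailing whitespace in text values."""
    for elem in root.iter():
        text = elem.text
        if text and text.strip() and text != text.strip():
            elem.text = text.strip()
            issues.append(Issue("strip_whitespace", "information",
                                f"trimmed whitespace in <{elem.tag}>"))


def check_unit_ids(root: ET.Element, issues: List[Issue]) -> None:
    """Hypothetical module: report units whose UnitID is missing or empty."""
    for unit in root.iter("Unit"):
        unit_id = unit.find("UnitID")
        if unit_id is None or not (unit_id.text or "").strip():
            issues.append(Issue("check_unit_ids", "error",
                                "Unit without UnitID; cannot be fixed automatically"))


def run_correction_manager(xml_text: str,
                           modules: List[CorrectionModule]) -> Tuple[str, List[Issue]]:
    """Run each module in turn; return the corrected XML and the issue list."""
    root = ET.fromstring(xml_text)
    issues: List[Issue] = []
    for module in modules:
        module(root, issues)
    return ET.tostring(root, encoding="unicode"), issues


if __name__ == "__main__":
    doc = "<Units><Unit><UnitID> B 1 </UnitID></Unit><Unit><UnitID/></Unit></Units>"
    corrected, found = run_correction_manager(doc, [strip_whitespace, check_unit_ids])
    print(corrected)
    for issue in found:
        print(issue)
```

Because modules share one signature, a new correction module can be added to the list without touching the loop itself.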

The next step is the review. The issue list produced by the Correction Manager flags issues at three severity levels:

  • information (a change was made that is not expected to cause any problem)
  • warning (a change has been made, or a problem with the content has been detected that cannot be fixed automatically, but it has no consequence for the validity of the document)
  • error (a problem with the content has been detected that causes the document to be invalid and cannot be fixed automatically).
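The three severity levels map naturally onto an enumeration. The sketch below is an assumption about how an issue list might be summarised before review, not part of the documented reBiND tooling: `Severity` and `summarise` are hypothetical names. It counts issues per level and treats the document as still invalid while any `error` issue remains, which follows from the definition of that level in the list above.

```python
from collections import Counter
from enum import Enum
from typing import Iterable, Tuple


class Severity(Enum):
    INFORMATION = "information"  # change made; no problem expected
    WARNING = "warning"          # change made or problem found; validity unaffected
    ERROR = "error"              # document invalid; not fixable automatically


def summarise(issues: Iterable[Tuple[Severity, str]]) -> dict:
    """Count issues per severity and report whether the document is still invalid."""
    counts = Counter(severity for severity, _message in issues)
    return {
        "counts": {level.value: counts[level] for level in Severity},
        "still_invalid": counts[Severity.ERROR] > 0,
    }


if __name__ == "__main__":
    example = [
        (Severity.INFORMATION, "trimmed whitespace in <UnitID>"),
        (Severity.WARNING, "locality text could not be checked"),
    ]
    print(summarise(example))
```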

These issues should be reviewed. Some problems may result from technical issues and can be fixed by specifying new correction modules. Others may be caused by errors in the content, in which case discussion with the contributing scientist might be necessary in order to fix them.