An XML document can be uploaded to the eXist database even if it is not valid against the schema in use. However, the document must be well-formed; otherwise, errors will occur when trying to store it.
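Because well-formedness is the one hard requirement for storing a document, it can be useful to check it before an upload. The following is a minimal sketch using the standard JAXP parser that ships with the JDK. The class name WellFormedCheck and its method are illustrative only and are not part of reBiND or eXist.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative helper (assumption): not part of reBiND or eXist.
class WellFormedCheck {

    /** Returns true if the XML text parses without a fatal error. No schema validation is done. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler stays silent on warnings and rethrows fatal errors,
            // which is how well-formedness violations are reported.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document is not well-formed
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("XML parser not available", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<DataSets><DataSet/></DataSets>"));
        System.out.println(isWellFormed("<DataSets><DataSet></DataSets>"));
    }
}
```

The second call in main passes a document with an unclosed element, which is the kind of input that would be rejected when storing.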
Before documents can be stored, a closer look at the collection structure within eXist is needed. The reBiND Framework relies on two collections for managing the different data projects. One holds the unpublished data projects, which are still being corrected, reviewed or otherwise prepared for publication. The other holds the published projects, which can be publicly searched and accessed. Within this document they are referred to as unpublished and published. Both collections are located in the root collection of eXist, and each contains one collection per data project. Having two separate collections also keeps the security configuration simple, since the security settings of these two collections are automatically inherited by the data projects within them.
Instructions on how to create projects via the reBiND user interface have been described in the data archiving section.
The collection structure of eXist, including the unpublished and published project collections, is shown below:
/db/
├─ (eXist default)
├─ unpublished/
│  ├─ (data-project-name 1)/
│  │  ├─ data.xml
│  │  ├─ metadata.xml
│  │  ├─ original-data.xls
│  │  └─ images/
│  │     ├─ image_001.jpg
│  │     ├─ image_002.jpg
│  │     └─ ...
│  └─ (data-project-name 2)/
│     └─ ...
└─ published/
   └─ (data-project-name 3)/
      ├─ data.xml
      ├─ metadata.xml
      ├─ original-data.xls
      ├─ images/
      │  ├─ image_001.jpg
      │  ├─ image_002.jpg
      │  └─ ...
      └─ multimedia/
         ├─ movie1.mov
         ├─ movie2.avi
         └─ ...
The majority of database administration can be done through the eXist Java Web Start client. For example, the client can be used to add new user accounts, modify permissions for different database collections, and delete, edit or query the data.
Users can be created via the eXist Java admin client. For example, the screenshot below shows the creation of a new user for the contributing scientist.
In the example above, the 'contributing scientist' is assigned to the group 'users', so the /db/unpublished/ collection within the eXist database should be made accessible to the group 'users'.
Although all of the corrections and modifications to the data document could be made using XQuery and the XQuery Update Facility, it was decided not to run the corrections in XQuery directly. The Correction Manager is written in Java and is only loosely coupled to eXist in order to make the Framework more modular. Instead of the Correction Manager accessing a document directly within eXist, a custom XQuery function exports the document to a regular XML file on the file system of the server. This file is then handed over to the Correction Manager. The source code for the eXist module that interacts with the Correction Manager is available from our subversion repository.
The actual corrections are not done by the Correction Manager, but by individual Correction Modules which are managed by the Correction Manager. The source code for the Correction Manager and correction modules is available from our subversion repository.
Writing Custom Correction Modules
A Correction Module is a Java class that implements a specific Java interface. The Correction Manager can be extended by adding a new module, which should implement the Module interface.
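To illustrate the idea of a module as a class behind a common interface, here is a minimal sketch. The method signature of the framework's actual interface is not given in this document, so the sketch defines its own hypothetical CorrectionModule interface. The element name ISODateTimeBegin and the restriction to plain calendar dates (yyyy-MM-dd) are likewise assumptions made for the example.

```java
import java.io.StringReader;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical stand-in for the framework's module interface; the real one may differ. */
interface CorrectionModule {
    /** Inspects the document and returns one message per problem found. */
    List<String> run(Document document);
}

/** Example module: reports date elements whose text is not an ISO calendar date. */
class IsoDateCheckModule implements CorrectionModule {
    // Assumed element name, used here for illustration only.
    private static final String DATE_ELEMENT = "ISODateTimeBegin";

    @Override
    public List<String> run(Document document) {
        List<String> messages = new ArrayList<>();
        // "*" matches the element in any namespace (requires a namespace-aware parse).
        NodeList dates = document.getElementsByTagNameNS("*", DATE_ELEMENT);
        for (int i = 0; i < dates.getLength(); i++) {
            String text = dates.item(i).getTextContent().trim();
            try {
                LocalDate.parse(text); // accepts only yyyy-MM-dd
            } catch (DateTimeParseException e) {
                messages.add("Not an ISO date: " + text);
            }
        }
        return messages;
    }
}

class ModuleDemo {
    /** Parses XML text into a namespace-aware DOM document. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<Unit><ISODateTimeBegin>2011-05-17</ISODateTimeBegin>"
                + "<ISODateTimeBegin>17.05.2011</ISODateTimeBegin></Unit>");
        System.out.println(new IsoDateCheckModule().run(doc));
    }
}
```

A real module would more likely correct the value in place than only report it; the reporting variant is used here to keep the sketch short.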
A new module can be added to handle a problem specific to the XML documents of a particular reBiND instance by writing a new class that implements the interface and placing the compiled class in a specific folder. The eXist server should then be restarted, but it does not have to be rebuilt, which makes it very easy to add new correction modules. This was one of the reasons why the corrections are done in Java code and not in XQuery; another is that Java is much more widely used and understood than XQuery. Furthermore, the Correction Manager could be used in stand-alone mode, either through an independent GUI tool or by calling the appropriate correction functions from the command line. This would require specifying the ABCD data file and the configuration file (which lists the correction modules to be run) and then calling the startCorrection method in the CorrectionManager class.
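How the Correction Manager discovers and instantiates module classes is not described in this document. One common Java technique that allows classes to be added without rebuilding the host application is loading them by name via reflection. The sketch below shows that technique only. TextCorrection, StripBomModule, TrimModule and ModuleRunner are hypothetical names, and the modules work on plain strings to keep the example small.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical module contract for this sketch; not the framework's interface. */
interface TextCorrection {
    String apply(String text);
}

/** Removes a leading byte order mark (U+FEFF), if present. */
class StripBomModule implements TextCorrection {
    public String apply(String text) {
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }
}

/** Removes leading and trailing whitespace. */
class TrimModule implements TextCorrection {
    public String apply(String text) {
        return text.trim();
    }
}

class ModuleRunner {

    /** Instantiates each named class through its no-argument constructor. */
    static List<TextCorrection> load(List<String> classNames) {
        List<TextCorrection> modules = new ArrayList<>();
        for (String name : classNames) {
            try {
                Class<? extends TextCorrection> type =
                        Class.forName(name).asSubclass(TextCorrection.class);
                modules.add(type.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException | ClassCastException e) {
                throw new IllegalArgumentException("Cannot load module " + name, e);
            }
        }
        return modules;
    }

    /** Applies the modules in list order; each one receives the previous result. */
    static String runAll(List<TextCorrection> modules, String input) {
        String result = input;
        for (TextCorrection module : modules) {
            result = module.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                StripBomModule.class.getName(), TrimModule.class.getName());
        System.out.println("[" + runAll(load(names), "\uFEFF  <a/>  ") + "]");
    }
}
```

The order of the names matters: trimming first leaves the leading spaces in place, because they are hidden behind the byte order mark at that point.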
Modifying the correction configuration file
The correction configuration file specifies which correction modules are run and in what order. It is an XML file stored in a special collection within eXist; the default location is /db/rebind/correction and the default name is default-correction.xml. By creating different configuration files, the user can specify different checks. For example, the default configuration checks for all errors that have been seen in ABCD files so far, whereas another configuration might only check whether the ISO dates are formatted correctly. Details of how to modify the configuration file and an explanation of the function of the implemented modules are described here.
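As an illustration of how an ordered module list could be read from such a file, the sketch below parses a small XML configuration into a list of class names. The layout used here (a correction root element with module children carrying a class attribute) is invented for the example and is not the actual format of default-correction.xml. The class and method names are hypothetical as well.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical reader for an assumed configuration layout:
//   <correction>
//     <module class="org.example.IsoDateCheck"/>
//     <module class="org.example.WhitespaceTrim"/>
//   </correction>
class CorrectionConfig {

    /** Returns the module class names in the order they appear in the file. */
    static List<String> moduleClassNames(String configXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(configXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read configuration", e);
        }
        List<String> names = new ArrayList<>();
        // getElementsByTagName returns matches in document order,
        // which is the order in which the modules are meant to run.
        NodeList modules = doc.getElementsByTagName("module");
        for (int i = 0; i < modules.getLength(); i++) {
            names.add(((Element) modules.item(i)).getAttribute("class"));
        }
        return names;
    }

    public static void main(String[] args) {
        String config = "<correction>"
                + "<module class=\"org.example.IsoDateCheck\"/>"
                + "<module class=\"org.example.WhitespaceTrim\"/>"
                + "</correction>";
        System.out.println(moduleClassNames(config));
    }
}
```

A second configuration that lists only the date check would then run that single module, which matches the use case of a narrower check described above.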
Using a different metadata format
We chose the Ecological Metadata Language (EML) for creating additional metadata to associate with our published ABCD datasets. EML is used by researchers to document typical datasets in the ecological sciences. We use a subset of EML to describe the core features of our datasets. Before selecting EML as the most appropriate standard for our purpose, we also investigated several other metadata standards.