Difference between revisions of "Administration"

From reBiND Documentation
(14 intermediate revisions by 2 users not shown)

Latest revision as of 01:59, 19 November 2014

Administration

An XML document can be uploaded to the eXist database even if it is not valid according to the schema used. However, the document must be well-formed; otherwise errors will occur when trying to store it.
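
The well-formedness requirement can be checked before an upload is attempted. The following is a minimal sketch using only the JDK's built-in SAX parser; the class name `WellFormedCheck` is hypothetical and not part of reBiND or eXist. It parses the document without any schema validation, so an invalid but well-formed document still passes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of reBiND: checks well-formedness only.
final class WellFormedCheck {
    private WellFormedCheck() {}

    /** Returns true if the stream parses as well-formed (namespace-aware) XML. */
    static boolean isWellFormed(InputStream xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(false); // no DTD or schema validation
            // DefaultHandler ignores content and rethrows fatal parse errors.
            factory.newSAXParser().parse(xml, new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // the parser reported a well-formedness error
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("XML parser could not be configured", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience overload for XML held in a string. */
    static boolean isWellFormed(String xml) {
        return isWellFormed(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

A document with a mismatched closing tag is rejected by this check, whereas a document that merely violates the schema is accepted, which mirrors the behaviour described above.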

Before documents can be stored, a closer look at the collection structure within eXist is needed. The reBiND Framework depends on two collections for managing the different data projects. One collection holds the unpublished data projects, which are still being corrected, reviewed or otherwise prepared for publication. The other holds the published projects, which can be publicly searched and accessed. Within this document they are referred to as unpublished and published. Both collections are located in the root collection of eXist, and each contains one collection per individual data project. Keeping unpublished and published data projects in separate collections also simplifies the security configuration, since the security settings of the two collections are automatically inherited by the data projects within them.

Instructions on how to create projects via the reBiND user interface have been described in the data archiving section.

The collection structure of eXist, including the unpublished and published project collections, is shown below:

/db/
 ├─ (eXist default)
 ├─ unpublished/
 │   ├─ (data-project-name 1)/
 │   │   ├─ data.xml
 │   │   ├─ metadata.xml
 │   │   ├─ original-data.xls
 │   │   └─ images/
 │   │       ├─ image_001.jpg
 │   │       ├─ image_002.jpg
 │   │       └─ ...
 │   └─ (data-project-name 2)/
 │       └─ ...
 └─ published/
     └─ (data-project-name 3)/
         ├─ data.xml
         ├─ metadata.xml
         ├─ original-data.xls
         ├─ images/
         │   ├─ image_001.jpg
         │   ├─ image_002.jpg
         │   └─ ...
         └─ multimedia/
             ├─ movie1.mov
             ├─ movie2.avi
             └─ ...
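
The layout above implies a simple naming rule for locating a project's files. The following sketch builds the collection and document paths from that rule; the class `ProjectPaths` and its method names are hypothetical and only illustrate the structure shown in the tree.

```java
// Hypothetical helper, not part of reBiND: derives paths from the layout above.
final class ProjectPaths {
    private ProjectPaths() {}

    /** The two top-level project collections described in the text. */
    enum State { UNPUBLISHED, PUBLISHED }

    /** Returns the collection of a data project, e.g. /db/unpublished/my-project. */
    static String projectCollection(State state, String projectName) {
        if (projectName == null || projectName.isEmpty() || projectName.contains("/")) {
            throw new IllegalArgumentException("project name must be a single path segment");
        }
        String parent = (state == State.PUBLISHED) ? "published" : "unpublished";
        return "/db/" + parent + "/" + projectName;
    }

    /** Returns the path of the project's data document (data.xml in the tree above). */
    static String dataDocument(State state, String projectName) {
        return projectCollection(state, projectName) + "/data.xml";
    }
}
```

Because only the parent collection differs between the two states, publishing a project amounts to a change of parent collection, and the project then inherits the security settings of its new parent.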

Most database administration can be done through the eXist Java Web Start client. For example, the client can be used to add new user accounts, modify permissions for database collections, and delete, edit or query the data.

User management

Users can be created via the eXist Java admin client. For example, the screenshot below shows the creation of a new user for the contributing scientist.

[Screenshot: Create exist user.PNG]

In the above example the 'contributing scientist' is assigned to the group 'users', so the /db/unpublished/ collection within the eXist database should be made accessible to the group 'users'.

Correction Manager

Although all corrections and modifications to the data document could be done using XQuery and the XQuery Update Facility, we decided not to run the corrections in XQuery directly. The Correction Manager is written in Java and is only loosely coupled to eXist in order to make the Framework more modular. Instead of the Correction Manager accessing a document directly within eXist, a custom XQuery function exports the document to a regular XML file on the file system of the server. This file is then handed over to the Correction Manager. The source code for the eXist module that interacts with the Correction Manager is available from our subversion repository (http://ww2.biocase.org/svn/rebind/trunk/reBiND-eXist-Module/).

The actual corrections are not made by the Correction Manager itself, but by individual Correction Modules which it manages. The source code for the Correction Manager and the correction modules is available from our subversion repository (http://ww2.biocase.org/svn/rebind/trunk/reBiND-CorrectionManager/).

Writing Custom Correction Modules

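A Correction Module is a Java class which implements a specific Java interface. The Correction Manager can be extended by adding a new module, which should implement the Module interface (http://ww2.biocase.org/svn/rebind/trunk/reBiND-CorrectionManager/src/org/bgbm/rebind/correction/modules/).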

package org.bgbm.rebind.correction.modules;

import java.io.File;

public interface Module {
	public String process(File inputFile, File outputFile, String[][] settings);
}
The Java code of the Module Interface.


A new module can be added to take care of a specific problem in the XML documents of a particular reBiND instance by writing a new class that implements the interface and putting the compiled class into a specific folder. The eXist server should be restarted but does not have to be rebuilt, which makes it very easy to add new correction modules. This was one of the reasons why the corrections are done in Java code and not in XQuery; another is that Java is much more widely used and understood than XQuery.

Furthermore, the Correction Manager could be used in stand-alone mode, either through an independent GUI tool or by calling the appropriate correction functions from the command line. This would require specifying the ABCD data file and the configuration file (which lists the correction modules to be run) and then calling the startCorrection method in the CorrectionManager class.
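
To illustrate what implementing the interface involves, here is a minimal sketch of a custom module. The class `ControlCharacterModule` is hypothetical and is not one of the modules shipped with reBiND. The sketch assumes that the returned string is a human-readable status message and that the settings array can be ignored; neither assumption is confirmed by the interface itself. The `Module` interface is repeated as a nested type only so that the example compiles on its own.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

final class ModuleSketch {
    private ModuleSketch() {}

    /** Copy of the Module interface, nested here so the sketch is self-contained. */
    interface Module {
        String process(File inputFile, File outputFile, String[][] settings);
    }

    /** Removes characters below U+0020 other than tab, line feed and carriage return. */
    static String stripIllegalControlChars(String text) {
        StringBuilder cleaned = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean illegal = c < 0x20 && c != '\t' && c != '\n' && c != '\r';
            if (!illegal) {
                cleaned.append(c);
            }
        }
        return cleaned.toString();
    }

    /** Hypothetical module: drops control characters that XML 1.0 does not allow. */
    static final class ControlCharacterModule implements Module {
        @Override
        public String process(File inputFile, File outputFile, String[][] settings) {
            // Assumption: settings are not needed by this module and are ignored.
            try {
                String text = new String(
                        Files.readAllBytes(inputFile.toPath()), StandardCharsets.UTF_8);
                String cleaned = stripIllegalControlChars(text);
                Files.write(outputFile.toPath(), cleaned.getBytes(StandardCharsets.UTF_8));
                // Assumption: the return value is treated as a status message.
                int removed = text.length() - cleaned.length();
                return "Removed " + removed + " illegal control character(s)";
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

The module reads the input file, never modifies it, and writes its result to the output file, which matches the two-file signature of `process`. A real module would implement `org.bgbm.rebind.correction.modules.Module` directly instead of the nested copy.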

Modifying the correction configuration file

The correction configuration file specifies which correction modules are run and in what order. It is an XML file stored in a special collection within eXist; the default location is /db/rebind/correction and the default name is default-correction.xml. By creating different configuration files the user can specify different checks. For example, the default configuration checks for all errors that have been seen in ABCD files, whereas another configuration might only check whether the ISO dates are formatted correctly. Details of how to modify the configuration file and an explanation of the function of the implemented modules are given on the Correction Modules page.
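
Running modules in a configured order can be pictured as a chain in which each module reads the file written by the previous one. The sketch below shows that idea only; it is not the Correction Manager's actual implementation, and it does not parse the configuration file, whose format is not described here. The names `ModuleChainSketch`, `runInOrder` and `appending` are hypothetical, and the use of temporary files for intermediate results is an assumption.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

final class ModuleChainSketch {
    private ModuleChainSketch() {}

    /** Copy of the Module interface, nested here so the sketch is self-contained. */
    interface Module {
        String process(File inputFile, File outputFile, String[][] settings);
    }

    /**
     * Runs the modules in list order. Each module reads the previous module's
     * output; the last result is copied to the requested output file.
     * Returns the messages of all modules in the order they were run.
     */
    static List<String> runInOrder(List<Module> modules, File input, File output) {
        List<String> messages = new ArrayList<>();
        try {
            File current = input;
            for (Module module : modules) {
                // Assumption: intermediate results are kept in temporary files.
                File next = File.createTempFile("correction-", ".xml");
                next.deleteOnExit();
                messages.add(module.process(current, next, new String[0][0]));
                current = next;
            }
            Files.copy(current.toPath(), output.toPath(), StandardCopyOption.REPLACE_EXISTING);
            return messages;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Toy module for demonstration only: appends a suffix to the file content. */
    static Module appending(String suffix) {
        return (inputFile, outputFile, settings) -> {
            try {
                String text = new String(
                        Files.readAllBytes(inputFile.toPath()), StandardCharsets.UTF_8);
                Files.write(outputFile.toPath(),
                        (text + suffix).getBytes(StandardCharsets.UTF_8));
                return "appended " + suffix;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }
}
```

With the two toy modules `appending("A")` and `appending("B")`, the order of the list determines the result, which is why the order given in the configuration file matters when one correction depends on another.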

Using a different metadata format

We chose the Ecological Metadata Language (EML) for creating additional metadata to associate with our published ABCD datasets. EML is used by researchers to document a typical dataset in the ecological sciences. We use a subset of EML to describe the core features of our datasets. We also investigated several other metadata standards before selecting EML as the most appropriate for our purpose.