Difference between revisions of "TechnicalDocumentation"

From Annotationssystem für Biodiversitätsdaten
(Web Services)
(Web Services)
Line 991: Line 991:
  "size": 2,
  "annotations": [
    {
    "repositoryURI": "https://annosys.bgbm.fu-berlin.de/AnnoSysTest/services/annotations/BGBM/AnnoSys/1404736720822",
    "recordURIs": [
      "https://annosys.bgbm.fu-berlin.de/AnnoSysTest/services/records/IBMT/CCCryo/242-06/1404725446799/abcd2.06b"
    ],
    "annotator": "Wolf-Henning Kusber",
    "time": 1404737340416,
    "motivation": "Determination"
    },
    {
    "repositoryURI": "https://annosys.bgbm.fu-berlin.de/AnnoSysTest/services/annotations/BGBM/AnnoSys/1410859759463",
    "recordURIs": [
      "https://annosys.bgbm.fu-berlin.de/AnnoSysTest/services/records/STU/Staatliches+Museum+f%C3%BCr+Naturkunde+Stuttgart%2C+Herbarium/Main-1-8374/1410859752058/abcd2.06b"
    ],
    "annotator": "Okka Tschöpe",
    "time": 1410859852835,
    "motivation": "ScientificName"
    },
  ]
  </nowiki>
  }

Revision as of 17:06, 27 May 2015

Contents

Introduction

This document provides the technical documentation of the AnnoSys system. It is organised as follows. First, a reference to the basic workflow definitions is given. Then, an overview of the system architecture introduces the technical components implemented to realise these workflows. Next, the basic data model and its implementation by means of the W3C Open Annotation specification are described in detail. Finally, the technical interfaces and configuration details of each technical component are specified in the following sections wherever appropriate.

Basic Workflow

Our publication Annotating Biodiversity Data via the Internet[1] provides a detailed description of the basic workflows elaborated for AnnoSys.

System Architecture

AnnoSys' implementation aims to model the basic workflow and to integrate the further requirements identified in [1] as closely as possible. To this end, AnnoSys is composed of the following interacting system components:

  • Repository
  • User Interface
  • Security
  • Message System
  • Services
AnnoSys system architecture

The repository component archives annotations together with the related original record documents as they were presented to the user at the time the annotation was created. A web-based but desktop-oriented user interface helps annotators, curators and users to manage annotation workflows intuitively. The security component is responsible for secure authentication, authorisation and data privacy within the system. The message system component organises information flows, such as informing annotators, curators or other users about status changes in annotation workflows. Finally, the service component provides interfaces for data exchange with external clients or services interacting with the repository.

The following sections elaborate on AnnoSys' system components.

Repository

Within AnnoSys, the W3C Open Annotation Data Model (OA)[2] serves as the basic data model to store and exchange annotation data. While OA's data model is defined on the basis of the Resource Description Framework (RDF)[3], record documents are delivered in Extensible Markup Language (XML)[4] exchange formats such as ABCD[5]. Due to the close relationship between annotations and record documents, both have to be archived within the AnnoSys repository to ensure exact recovery of annotation data at any later point in time. Finally, the repository also comprises agent (user) profile data as well as agent authentication and authorisation information.

Annotations

Given the RDF oriented nature of OA based annotation data, AnnoSys uses so-called RDF stores (also known as triple stores) to store annotations. These stores can be regarded as databases specialised in storing and retrieving RDF data. Usually, information stored in RDF stores is retrieved by queries specified in the SPARQL[6] query language.

Some recent evaluation reports and RDF store benchmarking studies [7][8][9] evaluated the individual strengths and weaknesses of different RDF stores. Based on these evaluation results, AnnoSys decided to use Apache Jena TDB[10] for development and prototype implementations and Virtuoso Open-Source Edition[11] for production use.

Original Record documents

In the experience gained from the BioCASE annotation system [12], its file system based, hierarchical storage architecture has proven to be a suitable solution for efficiently storing and accessing XML based original record documents (e.g. ABCD, Darwin Core[13]) in a biodiversity context. Following that experience, the current repository implementation stores original record documents on the server's file system in a configurable directory location.

Agent Profiles

The term agent profiles aggregates the following information concerning system agents (users):

  • personal metadata
  • authentication & authorisation data
  • personal annotation repository
  • preferences configuration

While an agent's personal metadata comprises only the agent's real name, affiliation and email address, the authentication & authorisation data add login credentials (username, password), roles and a unique agent identifier to the profile. Details about the information stored for registered agents within the security database can be found in section Security.

Additionally, according to the best practices recommended by the W3C Open Annotation Data Model[14], the personal metadata is also stored, based on the FOAF vocabulary[15], together with the annotation data within AnnoSys' RDF store. For privacy reasons, agents' email addresses and other sensitive information are not publicly accessible in the RDF store.

Agent metadata properties
property      definition
rdf:type      agent type (foaf:Person, foaf:Organization or dcTypes:Software)
foaf:name     agent's name
foaf:member   optional list of members (for agent type foaf:Organization only)

AnnoSys maintains a personal annotation repository for each registered agent within the system. These repositories keep track of any annotation data created and edited by authenticated agents and are accessible neither publicly nor to other system users. On publication, annotation data is moved from the publishing agent's personal annotation repository to AnnoSys' publicly accessible RDF store. Annotations may also be removed manually by authenticated agents via the user interface.

The preferences configuration may also be managed by authenticated agents via the user interface. It holds agent-specific system settings, e.g. a list of the documents currently opened in the user interface, which is restored at the next login.

The security database is realised as an SQLite[16] database. The personal annotation repositories are named according to the agent's resourceId in AnnoSys. They are created as Apache Jena TDB[10] models in a directory location on the server, which also holds the personal agent profiles in subdirectories named according to the agent's resourceId.

User Interface

Providing and managing annotations may become a complex job for annotators and data curators alike. For instance, annotators have to gather specimen data, analyse a potentially large number of data objects and annotate or comment on specific data elements accurately. Data curators, in turn, have to analyse these annotations, accept or reject them (possibly along with individual comments) and reintegrate them into their data collections. Furthermore, all of these activities should be documented and communicated back to annotators and other subscribing members of the community.

As the main focus of both annotators and curators should be on preparing accurate information, a suitable user interface should give advice and guide users through all these workflows. Furthermore, it should allow a clear presentation of information, adaptation to individual user preferences and convenient guidance and support through the most common workflow activities. This requires modern web user interfaces that provide the functionality and look and feel of desktop applications. Fortunately, due to recent web toolkit developments based on the Asynchronous JavaScript and XML (Ajax)[17] technology, desktop-like applications can now be implemented for the web as well.

Here, AnnoSys decided to use Eclipse's Remote Application Platform (RAP)[18] for developing a desktop oriented web user interface. The main reason for that decision was that Eclipse RAP imposes only small requirements on the capabilities of client machines and does not transmit potentially large documents over the network. Thus, any complex functionality is executed on the server.

Security

The security component provides the following functionalities:

  • agent profile management
  • secure authentication
  • role based authorisation

Agent profile management combines the maintenance of the security database and personal profile stores for each registered agent (see Agent Profiles).

Agent profile management also facilitates the agent registration process. The agent registration consists of filling in a form with the required information. After completing the registration procedure successfully, agents can immediately log in to the system and start editing and publishing annotations. Details about the information to be entered can be found in section Secure Authentication.

Agent authentication is secured by individual agent credentials to be entered in the login dialog. Moreover, any data transmissions initiated through network connections are secured by the Hypertext Transfer Protocol Secure (HTTPS)[19].

Based on the implementation of role based authorisation, AnnoSys may restrict access to certain functionalities based on permissions. These permissions may either be compiled in and assigned to agents through role definitions (e.g. curator for a given collection) or be assigned individually by system administrators on institutional request. Details can be found in section Role Based Authorisation Management.

In AnnoSys, all security relevant implementations are based on the Apache Shiro[20] framework. The security database is held in an SQLite[16] database.

For security reasons and with respect to data privacy, the security database, personal profile stores and other system relevant configuration files are stored in a location on the server inaccessible for unauthorised AnnoSys agents or external services.

Message System

Via the message system, AnnoSys aims to inform registered agents and curators about annotation workflow status changes triggered by agent or system interactions. To this end, the message system forwards notifications related to annotation workflow activities initiated by agents, curators or the system itself. Notifications are dispatched via system internal message queues and can be managed individually by registered agents via the user interface. Additionally, notifications are also sent to agents by email.

In particular, the following kinds of annotation publications cause the message system to trigger notifications regarding annotation workflow status changes:

  • original record annotations published by annotators
  • curator decisions reported by the AnnoSys curator interface (not implemented yet)
  • version updates of collection records, either detected on repository import processing or reported by curators (not implemented yet)

On occasion of any annotation workflow status change described above, the message system issues notifications according to the following topics agents may subscribe to:

  • subsequent publication of annotations relating to records previously annotated by the given agent (automatically)
  • publication of annotations relating to records an agent actively subscribed to
  • publication of annotations relating to a group of records which meet specific criteria an agent previously defined

Furthermore, the message system automatically notifies curators whenever an annotation is published that relates to records within collections the curator has registered for.

Summarising the set of notifications specified above, AnnoSys may issue the following kinds of messages:

message type           definition
ANNOTATION_PUBLISHED   annotation issued by an annotator
ANNOTATION_CURATED     annotation curated by a collection manager

(Currently, messages are not distinguished by type, although they might have different content depending on the notification event.)

Technically, the message system builds on the Java Message Service (JMS)[21] standard developed by the Java Community Process and defined within specification JSR 914[22]. The JMS provider Apache ActiveMQ[23] is used in tandem with Apache Camel[24] to realise internal and external message transports.

Further technical details can be found in section MessageSystem.

Services

The service component enables external services to access public data from the AnnoSys repository. Therefore, AnnoSys provides the following services and interfaces:

  • RESTful web services
  • Linked Open Data
  • SPARQL Endpoint

Primarily, the RESTful web service interface is intended for collaborating data portals that display information about annotations or original record documents stored in the AnnoSys repository, either in general or with regard to a specific record identified by its tripleId. Thus, besides lists of all annotations or records available in the repository, external services may also request a list of the available annotations relating to a specific record based on its tripleId. These lists contain stable URIs referring to the related annotation or record stored in the repository.

All of these stable URIs are resolvable, so that the denoted annotation can be downloaded as an RDF/XML document, or the original record as an XML document, from the AnnoSys repository. In this way, the AnnoSys repository and its services follow the recommendations of the Linked Open Data paradigm.

Finally, external services may run self-defined SPARQL queries on the AnnoSys annotation repository via the AnnoSys SPARQL endpoint[6]. Any non-anonymous resource in the repository's RDF store is also based on a stable URI and thus follows the Linked Open Data paradigm. RESTful and Linked Open Data services are implemented using the Java API for RESTful Services (JAX-RS)[25].
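As an illustration of how a client might consume such a list, the following Python sketch parses a response shaped like the JSON fragment shown in the revision diff at the top of this page (fields size, annotations, repositoryURI, recordURIs, annotator, time, motivation). The function name summarise is hypothetical, and the interpretation of time as milliseconds since 01/01/1970 is an assumption derived from the conventions used elsewhere in this document. No request is sent; the response text is embedded in the example.

```python
import json
from datetime import datetime, timedelta, timezone

# Sample response text, modelled on the fragment shown in this document.
RESPONSE = """
{
  "size": 1,
  "annotations": [
    {
      "repositoryURI": "https://annosys.bgbm.fu-berlin.de/AnnoSysTest/services/annotations/BGBM/AnnoSys/1404736720822",
      "recordURIs": [
        "https://annosys.bgbm.fu-berlin.de/AnnoSysTest/services/records/IBMT/CCCryo/242-06/1404725446799/abcd2.06b"
      ],
      "annotator": "Wolf-Henning Kusber",
      "time": 1404737340416,
      "motivation": "Determination"
    }
  ]
}
"""


def summarise(response_text):
    """Return (annotator, motivation, publication time, record URIs) per annotation."""
    payload = json.loads(response_text)
    rows = []
    for annotation in payload["annotations"]:
        # Assumption: "time" holds milliseconds since 1970-01-01 (UTC).
        seconds, millis = divmod(annotation["time"], 1000)
        published = (datetime.fromtimestamp(seconds, tz=timezone.utc)
                     + timedelta(milliseconds=millis))
        rows.append((annotation["annotator"], annotation["motivation"],
                     published, annotation["recordURIs"]))
    return rows


rows = summarise(RESPONSE)
for annotator, motivation, published, record_uris in rows:
    print(annotator, motivation, published.isoformat(), len(record_uris))
```

Splitting the timestamp with divmod avoids floating point rounding when converting milliseconds to a datetime.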


Annotation Context Model

AnnoSys' annotation context modelling approach is founded on the presumption that annotations relate to any kind of metadata about physical specimens that are uniquely identifiable by Globally Unique Identifiers (GUIDs)[26][27]. Beyond that, it assumes that the record data are accessible in an XML document based standard such as ABCD[5]. In the biodiversity domain, several use cases, e.g. the annotation of georeferences, require a single annotation to address multiple record data elements simultaneously (e.g. longitude and latitude) in order to provide useful information. Therefore, the annotation context model combines general meta information, e.g. identifying the annotated record or the annotator's name, with an aggregated list of data element selectors. Each of these selectors expresses an expectation towards the curator of the collection the record belongs to. The expectation may state that the selected data element should be updated, added or removed within the record. Furthermore, the annotator may propose new or corrected values for the selected data elements and/or add natural language comments. Based on this information, the curator may decide whether or not to update the collection's database.

Therefore, in AnnoSys the annotation context will be defined by

  • A GUID for the specimen's metadata data object (tripleID)
  • An XML representation of the data record the GUID refers to
  • A selection of XML elements (XPath selectors) within the XML representation affected by the annotation

The following sections introduce AnnoSys' interpretation and implementation of the annotation context model elements listed above.

AnnoSysID's and Globally Unique Identifiers (GUIDs)

With regard to biodiversity collections, the term GUID initially refers to data records describing physical unit objects in museum collections. As such, a GUID is usually called a triple identifier (tripleId) and is specified by the following data triple:

  • institution identifier (institutionId)
  • collection identifier (collectionId)
  • unit identifier (unitId)

Thus, the tripleId refers to a specimen object (unit) belonging to a specific collection which is physically located at the given institution (e.g. a herbarium). Accordingly, TDWG standards such as ABCD[5] usually define data elements reflecting that tripleId.

Basically, any annotation workflow in AnnoSys starts by retrieving, accessing and analysing an XML record document. In general, data providers generate these documents on the fly from the data currently stored for the given tripleId in their collection databases, and no versioning mechanism can be presumed. Therefore, during the analysis, the system first attempts to detect the tripleId from the downloaded XML record document. Next, the most recent version of the XML record document available from the AnnoSys repository for that tripleId is compared with the downloaded one. If the comparison shows any difference between the documents, AnnoSys assumes that a new data record revision has been issued for that tripleId. The downloaded XML record document is then added to the repository as a new version, since any change within the XML record document potentially invalidates annotations created previously.
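The revision detection described above can be sketched as follows. This is a minimal illustration in Python, not the AnnoSys implementation: the in-memory dictionary standing in for the repository, the function name store_if_changed and the plain string comparison are all assumptions, since this document does not state how the two documents are compared.

```python
import time
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory stand-in for the record repository:
# tripleId -> list of (revision in ms, document text), oldest first.
Repository = Dict[str, List[Tuple[int, str]]]


def store_if_changed(repo: Repository, triple_id: str, document: str,
                     now_ms: Optional[int] = None) -> int:
    """Return the revision the downloaded document corresponds to.

    A new revision is only added when the document differs from the
    most recent one stored for the same tripleId.
    """
    revisions = repo.setdefault(triple_id, [])
    if revisions and revisions[-1][1] == document:
        # Unchanged: reuse the most recent revision.
        return revisions[-1][0]
    # Changed or first download: the revision is the download time
    # in milliseconds since 1970-01-01.
    revision = now_ms if now_ms is not None else int(time.time() * 1000)
    revisions.append((revision, document))
    return revision


repo: Repository = {}
first = store_if_changed(repo, "BGBM:Herbarium Berolinense:B 18 0014862",
                         "<record>v1</record>", now_ms=1379406965371)
again = store_if_changed(repo, "BGBM:Herbarium Berolinense:B 18 0014862",
                         "<record>v1</record>", now_ms=1379406999999)
print(first == again)
```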

Therefore, AnnoSys always connects annotations to a specific version of an XML record document. To this end, AnnoSys makes use of LSIDs[28][29], which extend the notion of tripleIds by the required version information. The use of the version information is optional (e.g. when referencing annotations).

As (different) data providers may supply the same record data in different data formats (e.g. Darwin Core[13]), annotations cannot be reproduced correctly (the XPath selectors differ) unless the standard of the corresponding XML record document has been recorded. To support this, AnnoSys further extends the LSID specification by adding a defined namespace prefix for each supported XML data standard. The use of the format information is optional (e.g. when referencing annotations or agents).

Together, this yields a persistent identifier called AnnoSysID, which uses the following schema to unambiguously identify XML record data within the repository:

  • institutionId:collectionId:unitId:[version]:[format]
Example:
BGBM:Herbarium+Berolinense:B+18+0014862:1379406965371:abcd2.06b (Record)
BGBM:AnnoSys:1379006965559 (Annotation or Agent)
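To make the schema concrete, the following Python sketch splits an AnnoSysID into its components. It is only an illustration: the class and function names are hypothetical, and it assumes that ':' never occurs inside a component and that '+' encodes a space, as the two examples above suggest.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote_plus


@dataclass(frozen=True)
class AnnoSysID:
    institution_id: str
    collection_id: str
    unit_id: str
    version: Optional[int] = None  # milliseconds since 1970-01-01
    format: Optional[str] = None   # namespace prefix, e.g. "abcd2.06b"


def parse_annosys_id(text: str) -> AnnoSysID:
    """Split institutionId:collectionId:unitId:[version]:[format]."""
    parts = text.split(":")
    if not 3 <= len(parts) <= 5:
        raise ValueError(f"expected 3 to 5 components, got {len(parts)}")
    # Assumption: components are URL form encoded ('+' stands for a space).
    institution, collection, unit = (unquote_plus(p) for p in parts[:3])
    version = int(parts[3]) if len(parts) > 3 and parts[3] else None
    fmt = parts[4] if len(parts) > 4 and parts[4] else None
    return AnnoSysID(institution, collection, unit, version, fmt)


record = parse_annosys_id(
    "BGBM:Herbarium+Berolinense:B+18+0014862:1379406965371:abcd2.06b")
annotation = parse_annosys_id("BGBM:AnnoSys:1379006965559")
print(record)
print(annotation)
```

Keeping version and format optional mirrors the schema, in which both parts may be omitted when referencing annotations or agents.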

XML Data Record representations

Currently, AnnoSys supports XML record documents in ABCD 2.06b[5] format only.

Through the generic context selection approach, AnnoSys can be adapted to support other XML standards or versions as well.

Annotation Context Selection

Using AnnoSys, annotators may identify several data elements within a record as worth annotating. Since AnnoSys is designed to support XML record documents, a mechanism to select these data elements had to be developed and implemented. During the evaluation phase, it turned out that using XML diff representations, or inserting annotated values and/or natural language comments as XML comments into XML record documents, results in cryptic result representations, low performance or conceptual difficulties. Therefore, AnnoSys developed the idea of a "recorder" functionality that captures the XML data elements selected by annotators.

For this purpose, XPath[30] expressions provide a suitable means to unambiguously identify either single data elements or sets of data elements within XML documents of any record format. In particular, XPath based element selectors can be used to uniquely identify (sets of) data elements such as

  • "all elements named "xxx" located below the element "yyy""
  • "all elements named "xxx" with a value "zzz""
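The two kinds of selectors listed above can be tried out with the limited XPath support of Python's standard library. The XML fragment and the element names xxx, yyy and zzz are placeholders taken from the list, not ABCD elements, and the sketch says nothing about how AnnoSys itself evaluates selectors.

```python
import xml.etree.ElementTree as ET

# Hypothetical record fragment; the element names are placeholders.
RECORD = """
<record>
  <yyy>
    <xxx>zzz</xxx>
    <group><xxx>other</xxx></group>
  </yyy>
  <xxx>zzz</xxx>
</record>
"""

root = ET.fromstring(RECORD)

# All elements named "xxx" located (at any depth) below an element "yyy".
below_yyy = root.findall(".//yyy//xxx")

# All elements named "xxx" whose text value is "zzz" (requires Python 3.7+).
with_value = root.findall(".//xxx[.='zzz']")

print([element.text for element in below_yyy])
print(len(with_value))
```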

Annotation Data Model

The data model outlined in the next sections is centred on the Annotation Context Model developed earlier in this document. Accordingly, the annotation data model is based on two interrelated parts:

  • Annotation Metadata Specification
  • Annotated Elements Specification

While the metadata specification provides general meta information (e.g. determining the annotated record or annotator's name), the "recorder" functionality is realised through a list of data elements capturing those data elements selected for annotation by the annotator. These "annotated elements" are "recorded" according to the XPath selector method outlined in section Annotation Context Selection. The following sections will provide more detailed information on both parts.

Annotation Data Model

Building on these base concepts, extended concepts have been added that permit curators to publish annotations about the curational processing of annotations. Also, a concept for mass (or multi) annotations has been developed in order to link together identical annotations addressing identical errors in a (large) number of distinct record documents. The latter concept was also introduced for curation annotations. Thus, the annotation data model reflects the following extended concepts as well:

  • Curation Annotation Specification
  • Curated Elements Specification
  • Mass (or Multi) Annotation Specification
  • Mass (or Multi) Curation Annotation Specification

Annotation Metadata Specification

The annotation metadata provides information regarding the following general aspects

  • annotation creation
  • annotated record
  • annotating agent

Valid annotations must include all data described for each aspect in the following sections.

Annotation creation

The annotation creation entails the generation of the following data elements:

Motivation
Usually, annotations are created for well-defined reasons. Within the extensions[31] proposed by the TDWG Annotations Interest Group[32], the term motivation was introduced to express the intention with which an annotator has made the annotation. Within this document, we use it synonymously with the term annotation type, which describes a precompiled set of data element selectors used in typical, domain specific annotation use cases. Currently, AnnoSys supports the following motivations (annotation types):
* Determination
* Duplicate Locality
* Gathering
* Nomenclatural type
* Other
* Sequence
* Record basis
* Scientific name
* Unit Mark
Datetime (of the creation)
The creation's datetime describes the point in time the annotation was published by the system. This time usually differs from the time at which the annotation was created within the system for user processing. The time's granularity is milliseconds.
AnnoSysID (of the annotation)
The AnnoSysID of the annotation uniquely identifies each annotation created. In accordance with section AnnoSysID's and Globally Unique Identifiers (GUIDs), the AnnoSysID is constructed from the following components:
institutionID
Organisation running the AnnoSys instance creating the annotation (e.g. "BGBM").
collectionID
Name of the annotation repository (e.g. "AnnoSys").
unitID
Time in milliseconds since 01/01/1970 at which the annotation was first instantiated by the system.
Since annotations are not intended to be modified after publication and the W3C Open Annotation Data Model (OA) is currently the only standard supported for annotation exchange, the fields version and format are currently not needed.

Annotated record

The term annotated record comprises the following information about the XML record document representation and the related annotated specimen unit:

record GUID
The record GUID identifies the physical specimen unit to which the annotated XML record document refers and is expected to be represented as a tripleId.
record document format
The record document format describes the standard format of the annotated XML record document. It is expected to be specified by the namespace URI of the corresponding version of the XML standard format (e.g. ABCD v2.06: http://www.tdwg.org/schemas/abcd/2.06).
record document version
The revision information reflects any kind of version information regarding the annotated record document. As we cannot expect any revision information from the XML record document itself, the time in milliseconds since 01/01/1970 at which the document was first downloaded into the AnnoSys repository is used.

Annotating agent

An annotating agent is either a person or a machine conducting the annotation. Conforming to the common practice considering physical specimen objects in botanical museums, only a minimum set of data is captured:

name
The name field includes first name(s) and last name(s) of the annotating person, or the name and version of the software or service used to create the annotation.
institution
The institution denotes the organisation the annotating agent belongs to. If the institution field is empty, the authorisation component will consider the agent as member of the "private taxonomist" group.
email
The email address of the annotating person, or the email address of the annotating machine's administrator. For privacy reasons, this information is not made public.

Annotated Elements Specification

According to the Annotation Context Model described in the previous chapter, the annotated elements represent a list of element selectors each linked with the following information in order to annotate the selected data element in the XML record document:

element selector
is realised according to section element selectors. Regarding the interpretation of selectors, the following special cases apply:
  1. A selector pointing at the XML document's root element ("/") is interpreted as to address the entire data record. In particular, this means is used to express general comments about the referenced data record.
  2. A selector pointing at a data element for which the related XML Schema does not allow values to be entered does not allow annotated values to be entered either.
annotated value (optional)
enables annotators to propose a new value for the selected data element. The proposed value must correspond to the data type specified in the document's relating XML Schema.
expectation
expresses the corrective action the curator is expected to take with regard to the annotated data record in the collection database. Within AnnoSys, the following expectations may be expressed:
update
the value of the selected data element should be updated by the annotated value in the annotated data record.
remove
the selected data element should be removed from the annotated data record
add
a new data element according to the element selector should be added to the annotated data record including the given annotated value and/or comment.
comment (optional)
permits annotators to annotate or to make natural language comments for the selected data element. Comments are always expected to be a text string.

Although both the annotated value and the comment are optional, at least one of them must be present in order to provide valuable information. That is, annotated elements that include neither an annotated value nor a comment are considered invalid.
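The structure of an annotated element and the validity rule stated above can be sketched in Python as follows. The class names, field names and the example selector are hypothetical; the sketch only mirrors the fields described in this section and is not the OA based representation used by AnnoSys.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Expectation(Enum):
    UPDATE = "update"
    REMOVE = "remove"
    ADD = "add"


@dataclass
class AnnotatedElement:
    selector: str                          # XPath element selector
    expectation: Expectation
    annotated_value: Optional[str] = None  # proposed new value (optional)
    comment: Optional[str] = None          # natural language comment (optional)

    def is_valid(self) -> bool:
        """At least one of annotated value and comment must be present."""
        return bool(self.annotated_value) or bool(self.comment)


# The selector below is an illustrative placeholder, not a real ABCD path.
with_value = AnnotatedElement("/record/latitude", Expectation.UPDATE,
                              annotated_value="52.4558")
general_comment = AnnotatedElement("/", Expectation.UPDATE,
                                   comment="Locality seems doubtful.")
empty = AnnotatedElement("/record/latitude", Expectation.REMOVE)

print(with_value.is_valid(), general_comment.is_valid(), empty.is_valid())
```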

Curation Annotation Specification

The intention of curation annotations is to provide a means for collection managers, who are registered with curation rights for given institutions, collections or units, to give curational feedback on the further processing of annotations within the affected collection database. As curation annotations refer to other annotations instead of records, the following sections describe only the modifications to the general Annotation Metadata Specification that are necessary to instantiate curation annotations. Additionally, the section Curated Elements Specification details how curation annotations refer to, and provide curational information about, the list of Annotated Elements that is usually part of record annotations.

Curation Annotation Data Model

Curation Annotation creation

Motivation
In order to distinguish curation annotations from record annotations, a further motivation type was introduced to express the curational intent of curation annotations:
* Curation

Curated annotation

The term curated annotation comprises the following information about the annotation that is the object of a curation annotation:

annotation GUID
The annotation GUID identifies the annotation to which the curation annotation refers and is expected to be represented as a tripleId.

Curating agent

The term curating agent can be used synonymously with the term annotating agent.

Curated Elements Specification

Instead of referring to selected elements within a record document, curation annotations refer to the annotated elements of an already existing annotation. Curated elements are used to describe how the annotated elements were processed in the collection database concerned. Optionally, the curator may enrich this curational processing information with natural language comments.

The following information is used to describe the curational processing:

scoped annotated element
a stable identifier referencing the annotated element that is the subject of a given curated element.
curational processing
assigns one of the following processings to the scoped annotated element:
accepted
the annotated element was accepted by the curator and updated within the collection database.
rejected
the annotated element was rejected by the curator and therefore not updated within the collection database.
updated
the annotated element had already been updated before for another reason.
undecided
the further processing of the annotated element within the collection database has not been decided yet.
comment
permits curators to justify or to provide further natural language comments regarding the processing of the scoped annotated element.
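As a compact summary of the fields above, a curated element could be modelled as in the following Python sketch. All names are hypothetical, the identifier value is a placeholder, and the helper is_decided is an illustrative addition rather than part of the specification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Processing(Enum):
    ACCEPTED = "accepted"    # accepted and updated in the collection database
    REJECTED = "rejected"    # rejected, database not updated
    UPDATED = "updated"      # already updated before for another reason
    UNDECIDED = "undecided"  # no decision taken yet


@dataclass(frozen=True)
class CuratedElement:
    scoped_annotated_element: str  # stable identifier of the annotated element
    processing: Processing
    comment: Optional[str] = None

    def is_decided(self) -> bool:
        """True once the curator has settled the processing."""
        return self.processing is not Processing.UNDECIDED


# Placeholder identifier; the real identifier format is not specified here.
curated = CuratedElement("annotated-element-1", Processing.REJECTED,
                         comment="Value in the database is correct.")
pending = CuratedElement("annotated-element-2", Processing.UNDECIDED)
print(curated.is_decided(), pending.is_decided())
```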

Mass (or Multi) Annotation Specification

By creating a mass annotation within the AnnoSys user interface, annotators can create identical annotations referring to multiple, distinct record documents in a single step. In this way, identical errors occurring in multiple records, or affecting an entire dataset, can be annotated efficiently using AnnoSys. Each annotation is stored in the annotation repository as a "regular" annotation. Additionally, the concept of mass annotation includes the automatic generation of an additional annotation linking all these annotations together. So, each annotation can stand for itself, but can also be easily identified as being part of a mass annotation.

Mass Annotation Data Model

Thus, just a single piece of information is required to realise the mass annotation concept:

annotation references
a list of references to annotations affected by this mass annotation.

Mass (or Multi) Curation Annotation Specification

Just as the former section introduced the concept of mass annotations, AnnoSys also provides a means for curators to respond to mass annotations in a single step. This concept is called mass curation annotation and corresponds to the concept above, but creates and links together curation annotations instead of regular annotations.

Mass Curation Annotation Data Model

Thus, the single piece of information required to realise the mass curation annotation concept is:

curation annotation references
a list of references to curation annotations affected by this mass curation annotation.

Repository

Records

Referring to the experience gained from the BioCASE annotation system [12], its file system based, hierarchical storage architecture has proven to be a suitable solution for efficiently storing and accessing XML based original record documents (e.g. ABCD[5], Darwin Core) in a biodiversity context. Following that experience, the current repository implementation stores original record documents on the server's file system in a configurable directory location. Using ABCD terms, the following file path creation scheme is used.

  • $RECORD_BASE_DIR/SourceInstitutionID/SourceID/format/UnitID.revision.xml

Thereby, the tripleID (SourceInstitutionID, SourceID and UnitID, see GUIDs in AnnoSys) is extracted from the individual original record document depending on its data standard (e.g. ABCD, Darwin Core). The format part within the file path creation scheme is determined by a system-wide, predefined mapping scheme assigning an XML namespace prefix to the XML namespace URI for any data standard supported by the system. Finally, the revision part within the file name describes the datetime at which an original record document was first stored within the repository. According to the respective data elements defined within the Data Model, the value of revision is specified as the time in milliseconds since 01/01/1970.

Whenever required for substantial reasons (e.g. query performance), later revisions of AnnoSys may optionally store original record documents in a dedicated XML database or may introduce search engines like Apache Solr[33].
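
The path creation scheme can be sketched as follows. This is an illustrative sketch, not the AnnoSys implementation: the function name and the base directory are hypothetical, and whether AnnoSys escapes special characters in the tripleID parts before using them as path components is not stated here, so the sketch uses them unchanged.

```python
from pathlib import PurePosixPath


def record_path(base_dir, institution_id, source_id, fmt, unit_id, revision_ms):
    """Build $RECORD_BASE_DIR/SourceInstitutionID/SourceID/format/UnitID.revision.xml.

    revision_ms is the time in milliseconds since 01/01/1970 at which the
    record document was first stored.
    """
    file_name = f"{unit_id}.{revision_ms}.xml"
    return PurePosixPath(base_dir) / institution_id / source_id / fmt / file_name


# "/var/annosys/records" is a made-up value for $RECORD_BASE_DIR.
path = record_path("/var/annosys/records", "IBMT", "CCCryo",
                   "abcd2.06b", "242-06", 1404725446799)
print(path)
```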

Annotations

The storage of annotations in AnnoSys is twofold. First, each user has a personal user profile, which also includes a manageable repository holding any unpublished annotations of the given user. Second, there is a public repository, to which annotations are moved whenever a user publishes an annotation.

The user profile repositories are realised using Apache Jena TDB[10] based triple stores, whereas the public annotation repository uses a Virtuoso Open-Source Edition[11] triple store. Nevertheless, both triple stores are accessed in AnnoSys using the Apache Jena[34] software APIs.

In both kinds of triple stores, annotations are stored according to the W3C Open Annotation Data Model[14] driven by the activities of the W3C Open Annotation Community Group[35]. Due to some constraints in the current release of the W3C Open Annotation Data Model, the current implementation of the AnnoSys Annotation Data Model by means of the W3C Open Annotation Data Model results from thorough discussions with the W3C Open Annotation Community Group. Therefore, the following sections detail the integration of AnnoSys' Annotation Data Model with the W3C Open Annotation Data Model.

W3C Open Annotation implementation of the AnnoSys Annotation Data Model

In principle, there are two ways to implement complex data models with W3C Open Annotation. First, just the basic OA features are used to identify a target (in this case a versioned record document), and annotations are realised by formalising a dedicated ontology to describe the annotations within the body. Second, all W3C OA features are used in order to express annotations completely by means of W3C OA. Within AnnoSys, we decided to use the second alternative in order to provide a solution as close as possible to the current standard, and thus to be interpretable in non-biodiversity domains as well.

Therefore, the W3C Open Annotation implementation of the AnnoSys Annotation Data Model introduces new motivations reflecting the annotation types specified in the Annotation Data Model. Also, it uses multiple targets and bodies to implement the annotated elements also specified in the Annotation Data Model. Finally, the oa:hasScope property relation is used to realise a weak, but adequate linking between body and target of an annotation element. Within the next paragraphs provenance, target and body parts of the AnnoSys implementation will be elucidated in more detail.

W3C Open Annotation implementation of the AnnoSys Annotation Data Model

The oa:motivatedBy property of the provenance part of the AnnoSys implementation extends the predefined class oa:editing to provide classes for the implementation of the AnnoSys motivation use cases (annotation types), as the basic expectation of annotators towards a collection curator is to edit or update the collection database based on the published annotation. The following classes have therefore been derived from oa:editing to reflect the defined annotation types:

  • annosys:Determination
  • annosys:DuplicateLocality
  • annosys:Gathering
  • annosys:NomenclaturalType
  • annosys:Other
  • annosys:Sequence
  • annosys:RecordBasis
  • annosys:ScientificName
  • annosys:UnitMark

The remaining properties are straightforward uses of the W3C OA Specification providing the following information:

annotated by
the annotating agent, which is usually a foaf:Person, but may also be represented as a prov:SoftwareAgent if (mass) annotations were created by a corresponding service.
annotated at
the datetime at which the annotation was published.
serialised by
the software agent that created the annotation on behalf of the annotator (e.g. the AnnoSys system).
serialised at
the datetime at which the annotation was first created within the system, before it was published by the annotator.

The annotated elements of an annotation are each realised as a tuple of target and body. Due to the open world assumption defined within the W3C OA Specification, any target and any body may be reused by any other annotation, and thus the statement expressed by these resources must always be true. Therefore, body and target of an annotated element cannot be "hardlinked", because this would contradict the open world assumption. Here, the oa:hasScope property comes into play, which provides a kind of weak linking and could be interpreted in the sense of: the target this body is scoped to was examined by the annotator when the annotation (body) was created. In fact, even though the relationship is not hardlinked, it expresses exactly what is required to model the annotated element. The body tracks the expectation, the annotated value and the comment created by an annotator when examining a selected data element within a given record identified within the target.

A specific target part of an annotated element is identified through an oa:hasSource, an oa:hasState and an oa:hasSelector property, which are described within the next paragraphs.

oa:hasSource
a target's source represents the tripleId of the physical specimen object in a museum (e.g. BGBM:Herbar:0001), which is somehow represented as a dataset.
oa:hasState
the state of the dataset represents the state of the record document related to the target source as presented to the annotator at annotation time.
oa:cachedSource
the cached source represents the record XML document within the AnnoSys repository as it was downloaded from a record data provider. The cached source is stored as stable http-URI, which can be used to retrieve the given document from the AnnoSys repository via the AnnoSys Linked Data services.
oa:when
as the oa:TimeState class is used, oa:when describes the datetime at which the record document referred to by the cached source property was retrieved and stored within the AnnoSys repository.
oa:hasSelector
oa:FragmentSelector is used here as the class type to describe the AnnoSys element selector. In particular, fragment selectors support the XPointer specification, which permits namespaces and XPath[30] selectors to be stated adequately in order to identify the elements in the XML record document specified by the oa:cachedSource property.
dcterms:conformsTo
namespaceURI of the XPointer specification
rdf:value
XPointer expression selecting an annotated element within the XML record document denoted by the oa:cachedSource property

A specific body part of an annotated element is represented by its scope (the annotated target) and a body holding the values of an annotated element.

oa:hasScope
refers to the specific target resource this specific body resource relates to
oa:hasSource
refers to the body resource holding the values of the annotated element
rdf:type
a class introduced by AnnoSys describing the expectation of the annotator towards the curator of the collection database. The following classes are currently supported:
  • annosys:Update
  • annosys:Remove
  • annosys:Add
dcterms:description
the natural language comment of the annotator with regard to the selected annotated element.
rdf:value
the annotated value proposed by the annotator with regard to the selected annotated element.

In conformance with the W3C Open Annotation specification, an oa:Composite element is used to group specific target or body resources if multiple specific targets and bodies have been used to model multiple annotated elements. If only a single annotated element has been modelled, oa:hasTarget or oa:hasBody refers to a specific target or body element respectively.

Curation Annotations

Curation annotations can only be created by registered collection managers and are designed to express a curator's acceptance decision about annotations related to original record data originating from their collections. So, these annotations represent a special annotation use case where annotations are about other annotations. Beyond that, curation annotations must provide curators with a means of accepting or rejecting annotated elements from referred annotations individually.

W3C Open Annotation implementation of AnnoSys Curation Annotations

The curation annotation's metadata part remains mostly unchanged. In particular, the following motivation class has been introduced and derived from oa:replying to reflect the nature of curation annotations:

  • annosys:Curation

As the specification defines that annotations must include at least one target, the target of a curation annotation refers to the annotation being curated.

The curation annotation's body is an oa:Composite compiling the curator's decisions with regard to the annotated elements of the curated annotation (curated elements) being accepted, rejected or undecided. For that, the body's type is derived from the Decision class of the decision-ontology[36]. Furthermore, the body part for each curated element is represented by its scope (the annotated element being curated) and a body holding the decision and a free text comment for each curated element.

oa:hasScope
refers to the specific target resource of the annotated element being curated by this specific body resource.
oa:hasSource
refers to the body resource holding the values of the annotated element.
rdf:type
the Decision class defined within the decision-ontology[36].
dcterms:description
the natural language comment of the curator with regard to the curated element.
decision:has_result
the result of the curator decision. For this purpose, the following AnnoSys classes are introduced by deriving them from the Option class of the decision-ontology[36] in order to express the curator decisions supported by AnnoSys:
  • annosys:accepted
  • annosys:rejected
  • annosys:undecided

Mass (or Multi) Annotations

Mass annotations are designed to link together a set of identical annotations referring to distinct data records identified by their triple ids.

W3C Open Annotation implementation of AnnoSys Mass Annotations

A mass annotation's metadata part remains mostly unchanged. In particular, the following motivation class has been introduced and derived from oa:linking to reflect the nature of mass annotations:

  • annosys:MassAnnotation

Usually, mass annotations have no body but refer to each enveloped annotation by including multiple targets referring to the related annotation resource URIs.

Mass (or Multi) Curation Annotations

Mass curation annotations are designed to link together a set of identical curation annotations replying to annotations which are referring themselves to distinct data records identified by their triple ids.

W3C Open Annotation implementation of AnnoSys Mass Curation Annotations

A mass curation annotation's metadata part remains mostly unchanged. In particular, the following motivation class has been introduced and derived from oa:linking to reflect the nature of mass curation annotations:

  • annosys:MassCurationAnnotation

Usually, mass curation annotations have no body but refer to each enveloped curation annotation by including multiple targets referring to the related curation annotation resource URIs.

User Interface

This section will neither focus on operating the user interface nor its elements. Please refer to the current user instructions document for any details.

Within this section, the configuration of annotation types, their relating data templates in terms of annotation use case specifications and the concomitant definition of record document format namespaces and element selectors as well as the current implementation details of specification based user interface support will be documented.

All user interface related configuration files are properties based. They are located in the subdirectory config of the system's configuration home directory (henceforth referred to as ${AnnoSys.home.dir}).

All of these properties files will be evaluated on system startup only! That is, any changes will require a system restart in order to be considered.

Annotation types

The configuration of annotation types takes place within the following property files:

  • namespace.properties
  • selector.properties
  • annotation.properties

The namespace.properties file is used to map namespace prefixes to their relating namespace URIs. Beyond that, it specifies which XML document formats in terms of namespace URIs are supported by the AnnoSys software. The file selector.properties defines reusable constants for XML element selectors based on the XPointer Framework[37]. Within the annotation.properties, element selectors defined above are finally organised to compose annotation types and to define user input value restrictions.

Note
Placeholders may be used in any .properties file (and are highly recommended to enhance readability) to reference any other property defined in any of the AnnoSys properties configuration files. A referenced placeholder is simply substituted by the evaluated value of the referenced property. Placeholders are stated as ${property name} in any expression.

Namespace Support

Within the file namespace.properties, every namespace supported ("known") by the AnnoSys system has to be configured. In particular, for each supported namespace a unique system-wide namespace prefix and the related namespace URI have to be specified. Optionally, a (machine readable) namespace specification document may be given. Based on the specification document, AnnoSys' user interface analyses the data standard provided therein in order to facilitate automatic user support functionalities like

  • input validation
  • input restriction by drop-down selections
  • provision of XML element specific documentation
  • ...

The architecture of AnnoSys' namespace support is designed to be generic in terms of data formats and data standards. Nevertheless, currently only XML data and ABCD[5] standard are implemented.

Namespace URI Mappings

In order to be correctly interpreted by the AnnoSys software, namespace URI mappings must be defined according to the following schema:

NamespaceURI.<namespace prefix> = expression(namespaceURI)

NamespaceURI
Prefix required to retrieve Namespace URI mapping definitions within an arbitrary list of properties by the system.
namespace prefix
The namespace prefix to be assigned with the namespace URI expression.

Example

NamespaceURI.abcd2.06 = http://www.tdwg.org/schemas/abcd/2.06

Maps the namespace prefix abcd2.06 to the namespace URI http://www.tdwg.org/schemas/abcd/2.06.

Namespace configuration

Any namespace to be supported or "known" by the AnnoSys software must be expressed in a property according to the specification described in this section. Essentially, a namespace configuration consists of the following information:

  • Namespace prefix mapping of the data standard specification
  • Link(s) to (machine readable) specification document(s)

Even though the configuration design is generic in terms of supporting undetermined specification types, currently only XML-Schema[38] based specification document support is implemented by AnnoSys.

Within the subsequent description, namespace configurations are identified by namespace prefixes and bound to a list of parameter expressions determining the relating namespace URI, and specification document.

Namespace.<namespace prefix> = expression-list(namespaceURI,specification type, local specification file path, specificationURI, default)

Namespace
Prefix required to retrieve namespace configuration definitions within an arbitrary list of properties by the system.
namespace prefix
The namespace prefix assigned to the namespace configuration.
namespaceURI (required)
The namespace URI this configuration applies to, usually given as a placeholder referencing a Namespace URI mapping.
specification type (optional)
Currently, only type xsd (XML-Schema) specification support is implemented.
local specification file path (optional)
Path to the namespace relating specification in the local file system.
specificationURI (optional)
URI of the namespace relating specification. If the local specification is neither configured nor accessible, AnnoSys attempts to resolve the specification document from the given URI.
default (required)
Indicates, if the software assigns this namespace configuration per default to the given namespace URI. Possible values are true or false, where true must be assigned only once to the given namespace URI.

Example:

Namespace.abcd2.06 = ${NamespaceURI.abcd2.06}, xsd, ${AnnoSys.home.dir}/schema/abcd/2.06/ABCD_2.06.xsd, http://rs.tdwg.org/abcd/2.06/ABCD_2.06.xsd, false
Namespace.abcd2.06a = ${NamespaceURI.abcd2.06}, xsd, ${AnnoSys.home.dir}/schema/abcd/2.06/a/ABCD_2.06a.xsd, http://rs.tdwg.org/abcd/2.06/a/ABCD_2.06a.xsd, false
Namespace.abcd2.06b = ${NamespaceURI.abcd2.06}, xsd, ${AnnoSys.home.dir}/schema/abcd/2.06/b/ABCD_2.06b.xsd, http://rs.tdwg.org/abcd/2.06/b/ABCD_2.06b.xsd, true
Namespace.biocase = ${NamespaceURI.biocase}, , , , true

The first three lines configure the namespace support for different versions of ABCD[5] documents. As all of these versions are defined on the same namespace URI, the most recent version is configured to be used by default.

The last line configures the namespace support for the BioCASE protocol. Thereby, no XML-Schema specification is provided, as it is not subjected to annotation.
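
The structure of a namespace configuration can be illustrated by a small parser. This is a sketch under stated assumptions, not the AnnoSys implementation: the names NamespaceConfig, parse_namespace and duplicate_defaults are hypothetical, placeholders are assumed to have been substituted already, the local path /opt/annosys and the URI urn:example:biocase are made-up values, and none of the parameter values may contain a comma.

```python
from collections import Counter
from typing import NamedTuple, Optional


class NamespaceConfig(NamedTuple):
    prefix: str
    namespace_uri: str
    spec_type: Optional[str]
    local_path: Optional[str]
    spec_uri: Optional[str]
    default: bool


def parse_namespace(prefix, expression):
    """Split 'namespaceURI, type, local path, specificationURI, default'."""
    parts = [p.strip() for p in expression.split(",")]
    if len(parts) != 5:
        raise ValueError(f"expected 5 parameters, got {len(parts)}")
    uri, spec_type, local_path, spec_uri, default = parts
    if not uri:
        raise ValueError("namespaceURI is required")
    if default not in ("true", "false"):
        raise ValueError("default must be true or false")
    # Empty optional parameters are represented as None.
    return NamespaceConfig(prefix, uri, spec_type or None,
                           local_path or None, spec_uri or None,
                           default == "true")


def duplicate_defaults(configs):
    """Return namespace URIs that have more than one default configuration."""
    counts = Counter(c.namespace_uri for c in configs if c.default)
    return [uri for uri, n in counts.items() if n > 1]


ABCD = "http://www.tdwg.org/schemas/abcd/2.06"
configs = [
    parse_namespace("abcd2.06", f"{ABCD}, xsd, /opt/annosys/schema/abcd/2.06/ABCD_2.06.xsd, http://rs.tdwg.org/abcd/2.06/ABCD_2.06.xsd, false"),
    parse_namespace("abcd2.06b", f"{ABCD}, xsd, /opt/annosys/schema/abcd/2.06/b/ABCD_2.06b.xsd, http://rs.tdwg.org/abcd/2.06/b/ABCD_2.06b.xsd, true"),
    parse_namespace("biocase", "urn:example:biocase, , , , true"),
]
print(duplicate_defaults(configs))
```

The second helper checks the rule that true may be assigned only once per namespace URI.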

Definition of element selectors

Element selectors are used within the AnnoSys software to reduce the complexity of selector usage through assignment of a system-internal unique name. As element selectors are represented by property names, the placeholder concept enables system-wide reusing of named element selectors.

In order to be correctly interpreted by the AnnoSys software, any selector constant must be defined according to the following schema:

ElementSelector.<selector type>.<selector name>.<namespace prefix> = expression(XPointer)

ElementSelector
This is a prefix required to retrieve ElementSelector definitions within an arbitrary list of properties by the system.
selector type
The selector type for marking selector expressions to be retrieved by the AnnoSys system must be xpointer.
Any other types may be used and will not be evaluated by the software. In particular, we use xpointerconst to define intermediary place holder expressions, which are substituted in the finally defined selector expression.
selector name
Defines the selector's name, which is used by the AnnoSys system to identify a given selector expression.
namespace prefix
Defines the XML document's namespace prefix to which the selector expression applies. That way, the same selector name may be defined with regard to several document standards. Currently, AnnoSys supports ABCD 2.06b[5] documents only.
expression
Here, a valid expression according to the XPointer Framework is expected by the evaluating software.

Example:

NamespaceURI.abcd2.06b = http://www.tdwg.org/schemas/abcd/2.06
ElementSelector.xpointer.xmlns.abcd2.06b = xmlns(abcd2.06b=${NamespaceURI.abcd2.06b})
ElementSelector.xpointerconst.DataSets.abcd2.06b = ${ElementSelector.xpointer.xmlns.abcd2.06b}xpointer(/abcd2.06b:DataSets
ElementSelector.xpointerconst.DataSet.abcd2.06b = ${ElementSelector.xpointerconst.DataSets.abcd2.06b}/abcd2.06b:DataSet
ElementSelector.xpointerconst.Unit.abcd2.06b = ${ElementSelector.xpointerconst.DataSet.abcd2.06b}/abcd2.06b:Units/abcd2.06b:Unit[1]
ElementSelector.xpointer.UnitID.abcd2.06b = ${ElementSelector.xpointerconst.Unit.abcd2.06b}/abcd2.06b:UnitID)

The selector expression evaluated for property ElementSelector.xpointer.UnitID.abcd2.06b is:

xmlns(abcd2.06b=http://www.tdwg.org/schemas/abcd/2.06)xpointer(/abcd2.06b:DataSets/abcd2.06b:DataSet/abcd2.06b:Units/abcd2.06b:Unit[1]/abcd2.06b:UnitID)

The element selector named UnitID of type xpointer will evaluate the first occurrence of a unit id element within a given ABCD document of namespace http://www.tdwg.org/schemas/abcd/2.06.
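
The placeholder substitution that leads to the evaluated selector expression can be reproduced with a few lines of code. The following is a minimal sketch, not the AnnoSys implementation: the function name resolve and the cycle check are assumptions, and the properties are given as a plain dictionary instead of being read from the configuration files.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(name, props, _seen=()):
    """Return the value of property `name` with all ${...} placeholders substituted."""
    if name in _seen:
        raise ValueError(f"circular placeholder reference: {name}")
    value = props[name]
    # Each placeholder is replaced by the recursively resolved referenced property.
    return PLACEHOLDER.sub(
        lambda m: resolve(m.group(1), props, _seen + (name,)), value)


props = {
    "NamespaceURI.abcd2.06b": "http://www.tdwg.org/schemas/abcd/2.06",
    "ElementSelector.xpointer.xmlns.abcd2.06b":
        "xmlns(abcd2.06b=${NamespaceURI.abcd2.06b})",
    "ElementSelector.xpointerconst.DataSets.abcd2.06b":
        "${ElementSelector.xpointer.xmlns.abcd2.06b}xpointer(/abcd2.06b:DataSets",
    "ElementSelector.xpointerconst.DataSet.abcd2.06b":
        "${ElementSelector.xpointerconst.DataSets.abcd2.06b}/abcd2.06b:DataSet",
    "ElementSelector.xpointerconst.Unit.abcd2.06b":
        "${ElementSelector.xpointerconst.DataSet.abcd2.06b}/abcd2.06b:Units/abcd2.06b:Unit[1]",
    "ElementSelector.xpointer.UnitID.abcd2.06b":
        "${ElementSelector.xpointerconst.Unit.abcd2.06b}/abcd2.06b:UnitID)",
}

print(resolve("ElementSelector.xpointer.UnitID.abcd2.06b", props))
```

Note that the intermediary xpointerconst expressions are deliberately incomplete (the opening parenthesis of xpointer( is closed only in the final xpointer property), which is why they are not evaluated as selectors on their own.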


Definition of Annotation Types

Annotation types define a subset of data elements, which may potentially be subject of annotation within a determined annotation workflow like Determination or Gathering. Before starting to create annotations, a predefined annotation type has to be selected by the annotator from a drop-down selection box. After having determined the annotation type that way, the standard view presents to the annotator only those data elements as defined in the corresponding template for the given annotation type. Thus, an annotation type definition consists of a list of element selectors determined in favour of a given annotation workflow.

AnnotationType.<namespace prefix>.<name> = expression-list(element selectors)

AnnotationType
This is a prefix required to retrieve AnnotationType definitions within an arbitrary list of properties by the system.
namespace prefix
Defines the XML document's namespace prefix to which the annotation type definition applies. The namespace prefix must be given as defined in section Namespace Support.
name
Defines the name of the annotation type, which is used by the AnnoSys system to internally identify a given annotation type definition.
element selectors
Specifies the list of element selectors defining the template (cmp. Definition of element selectors).

Example:

AnnotationType.abcd2.06b.Determination = \
${ElementSelector.xpointer.PreferredIdentifierRole.abcd2.06b}, \
${ElementSelector.xpointer.PreferredHigherTaxonName.abcd2.06b}, \
${ElementSelector.xpointer.PreferredHigherTaxonRank.abcd2.06b}, \
${ElementSelector.xpointer.PreferredFullScientificName.abcd2.06b}

The annotation type named Determination is defined for the namespace prefix abcd2.06b and consists of the element selectors named as PreferredIdentifierRole, PreferredHigherTaxonName, PreferredHigherTaxonRank and PreferredFullScientificName and defined for the same namespace.

Definition of Annotation Restrictions

In addition to element value restrictions potentially defined in a specification document and automatically evaluated by the AnnoSys user interface, individual restrictions may be configured using the term AnnotationRestriction. An annotation restriction specifies a list of input values permitted for a given element selector. That list of permitted input values is shown to the annotator within a drop-down selection box, which opens automatically within the related input field of the user interface. The names of annotation restrictions are expressed as XPath[30] selectors in which every '/' character is replaced by a '_' character. This is due to the fact that annotation restrictions also apply to the annotation editor's Expert View, where restrictions are only evaluated on an XPointer expression basis.

Note
This will be subject to modification in a later revision, where the XPath expression will be replaced by the name of an element selector.

AnnotationRestriction.<namespace prefix>.<XPath expression> = expression-list(permitted values)

AnnotationRestriction
This is a prefix required to retrieve AnnotationRestriction definitions within an arbitrary list of properties by the system.
namespace prefix
Defines the XML document's namespace prefix to which the annotation restriction definition applies. The namespace prefix must be given as defined in section Namespace Support.
XPath expression
Defines an XPath selector to the addressed element, where any '/' characters must be replaced by '_' characters.
permitted values
Specifies a list of permitted values as to be shown in a drop-down selection box within the related input field in the user interface.

Example:

AnnotationRestriction.abcd2.06b._DataSets_DataSet_Units_Unit_SpecimenUnit_NomenclaturalTypeDesignations_NomenclaturalTypeDesignation_TypeStatus = \
Isotype, \
Syntype, \
Isosyntype, \
Holotype, \
Lectotype, \
Neotype, \
Paratype, \
Epitype

The permitted input values (Isotype, Syntype, etc.) are specified for record documents within the defined namespace abcd2.06b for elements matching the XPath expression /DataSets/DataSet/Units/Unit/SpecimenUnit/NomenclaturalTypeDesignations/NomenclaturalTypeDesignation/TypeStatus.
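
The naming rule for annotation restrictions can be sketched as follows. The helper names restriction_key and permitted_values are hypothetical and only illustrate the '/' to '_' replacement and the splitting of the value list; how AnnoSys evaluates restrictions internally is not shown here.

```python
def restriction_key(namespace_prefix, xpath):
    """Derive the property name of an annotation restriction from an XPath selector."""
    flattened = xpath.replace("/", "_")
    return f"AnnotationRestriction.{namespace_prefix}.{flattened}"


def permitted_values(expression):
    """Split an expression-list of permitted values into a list of strings."""
    return [value.strip() for value in expression.split(",") if value.strip()]


key = restriction_key(
    "abcd2.06b",
    "/DataSets/DataSet/Units/Unit/SpecimenUnit/NomenclaturalTypeDesignations"
    "/NomenclaturalTypeDesignation/TypeStatus",
)
values = permitted_values("Isotype, Syntype, Isosyntype, Holotype")
print(key)
print("Holotype" in values)
```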

Synonyms list

The synonyms list contains mappings of taxonomic family names defined by the element selector PreferredHigherTaxonName, which may be used synonymously in search operations conducted via the user interface as well as in evaluating user-defined subscriptions. The synonyms list is configured in the property file specified within the AnnoSys system configuration file annosys.properties using the option AnnoSys.synonyms.file (default: /etc/AnnoSys/resources/synonyms.properties).

Mappings must be defined in both directions and family names must be given in lowercase. Any number of mappings is permitted.

Example:

asteraceae = compositae
compositae = asteraceae

Defines the synonymy between the family names Asteraceae and Compositae.
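
The two rules for the synonyms list (both directions, lowercase names) can be checked with a short script. This is an illustrative sketch only: the function names are hypothetical, the mappings are given as a dictionary instead of being read from synonyms.properties, and the way AnnoSys applies synonyms during searches may differ.

```python
def missing_reverse_mappings(synonyms):
    """Return (key, value) pairs that would have to be added to make the list symmetric."""
    return sorted((v, k) for k, v in synonyms.items() if synonyms.get(v) != k)


def expand_family(name, synonyms):
    """Return the set of lowercase family names to search for."""
    key = name.strip().lower()
    names = {key}
    if key in synonyms:
        names.add(synonyms[key])
    return names


synonyms = {"asteraceae": "compositae", "compositae": "asteraceae"}
print(sorted(expand_family("Asteraceae", synonyms)))
print(missing_reverse_mappings(synonyms))
```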

Security

In AnnoSys, all security relevant implementations are based on the Apache Shiro[20] framework. Furthermore, any security or privacy relevant information is stored in a file holding a SQLite[16] database. The following sections provide further details about the internal parts of the security component.

Security Database

Like any other system relevant configuration file, the security database is stored in a location on the server that is inaccessible to unauthorised AnnoSys agents or external services.

The figure Security Database Schema below shows an entity relationship diagram revealing data and table relationships as organised in the security database. Therein, the subjects table is used for storing user authentication data, whereas the tables roles and permissions provide the lists of roles and permissions registered within the system. Finally, the tables subjectRoles and subjectPermissions record the roles and permissions assigned to users, and the table rolePermissions records the permissions assigned to roles. That way, the database architecture provides a means to realise pure role based authorisation enhanced by individual user permission assignments. Internally, the software checks for permissions only. Also, a role management user interface will be accessible to system administrators exclusively.

Security Database Schema

The following sections provide some implementation details about authentication, role based authorisation and agent profiles in AnnoSys.

Secure Authentication

Agent authentication is secured by individual agent credentials to be entered at the login dialog. Moreover, any data transmissions initiated through network connections are secured by the Hypertext Transfer Protocol Secure (HTTPS)[19].

For user authentication, the following data is relevant and retrieved from the security database's subjects table.

uid
the user id, or the login name an agent enters in the login dialog.
credential
the agent's credential, or the password an agent enters in the login dialog. Credentials are stored as digest values according to the Apache SHA1 Password Format.
status
the current status of the agent. Currently, only active (1) and inactive (0) are supported values.
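
The credential digest can be illustrated as follows. The sketch assumes that "Apache SHA1 Password Format" refers to the {SHA} format documented for Apache HTTP Server password files, i.e. the prefix {SHA} followed by the Base64-encoded SHA-1 digest of the password; the function names are hypothetical and the code is not taken from AnnoSys, which delegates security functionality to Apache Shiro.

```python
import base64
import hashlib
import hmac


def apache_sha1(password):
    """Return '{SHA}' + Base64(SHA-1(password)) (assumed credential format)."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return "{SHA}" + base64.b64encode(digest).decode("ascii")


def verify(password, stored_credential):
    """Compare a login password with a stored digest in constant time."""
    candidate = apache_sha1(password).encode("ascii")
    return hmac.compare_digest(candidate, stored_credential.encode("utf-8"))


stored = apache_sha1("s3cret")  # value as it would be kept in the subjects table
print(verify("s3cret", stored), verify("wrong", stored))
```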

The relationship with agent data in the annotation repository is realised through the following data elements:

resourceid
the current resource URI of the agent representing the user's identity in the annotation repository.
type
the agent type (Person, Organisation or Software) according to the agent metadata properties defined in section Agent Profiles.
name
the agent's name
institution
the agent's institution

Additionally, the following information is held by the security database:

mailbox
the agent's email address used for communication via the AnnoSys Message System.
description
to be used by system administrators for additional natural language information about the agent.

Role Based Authorisation Management

Based on the implementation of role based authorisation, AnnoSys may restrict access to certain functionalities based on roles assigned to authenticated agents by system administrators. In particular, the AnnoSys role based access control (RBAC) mechanisms support resource oriented role definitions like curator of a given institution, collection or unit. Thus, AnnoSys role definitions follow the subsequent URN oriented naming principle:

roleName[:resourceDefinition]

roleName
general name assigned to the role (e.g. curator)
resourceDefinition
hierarchical definition scheme for resources identified by their tripleId (optional). If no resource definition is given, the role applies to any kind of resource.

Examples:

curator:BGBM (curator of any data at BGBM)
curator:BGBM:Herbarium Berolinense (curator of any data in the collection Herbarium Berolinense at BGBM)
curator:BGBM:Herbarium Berolinense:B 10 0356520 (curator of the specified unit in the collection Herbarium Berolinense at BGBM)

Based on the naming principle described previously, AnnoSys currently interprets the following role names:

none
applies to unauthenticated agents; enables examining annotations from the repository.
annotator
default role for authenticated agents; enables the execution of basic annotation workflow functionality, including the creation of annotations.
curator
role for curators; enables the execution of curation workflow functionality on annotations matching the given resource definition.
administrator
role for system administrators.

Similar to role definitions, permission definitions follow a URN oriented naming principle as well. Unlike role definitions, the URN scheme part specifies actions to be executed on the outlined unit resources directly, without matching higher resource definition levels like institutions or collections.

actionName:resourceDefinition

actionName
name expressing a specific action to be performed
resourceDefinition
hierarchical definition scheme for resources identified by their tripleId.

Examples:

curate:BGBM:Herbarium Berolinense:B 10 0356520 (curate the specified unit in the collection Herbarium Berolinense at BGBM)

The overall intention is that AnnoSys grants access to specific functionalities and resources based on permissions only. That is, the AnnoSys security component is used to check whether the currently authenticated agent is entitled to execute a certain action on a given resource. For instance, before the AnnoSys user interface enables curation functionalities on a given annotation, it instructs the security component to check whether the current agent is entitled to curate annotations created for the referenced original record, identified by its tripleId.
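
As a rough illustration of such a check, the following Java sketch keeps the granted permissions in a plain in-memory set and tests for an exact match on the unit's tripleId. The class PermissionCheck is a hypothetical stand-in for the security database and does not reflect the actual AnnoSys security component.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory stand-in for the security database; not AnnoSys code. */
final class PermissionCheck {
    private final Set<String> granted = new HashSet<>();

    /** Builds a permission string of the form "actionName:institution:collection:unit". */
    static String permission(String action, String institution, String collection, String unit) {
        return String.join(":", action, institution, collection, unit);
    }

    void grant(String permission) {
        granted.add(permission);
    }

    /** Permissions name a unit directly, so the check is an exact match. */
    boolean isPermitted(String action, String institution, String collection, String unit) {
        return granted.contains(permission(action, institution, collection, unit));
    }
}
```

Because permissions never match higher resource levels, every unit needs its own entry, which is why permissions have to be maintained whenever records are added.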

As a consequence, whenever new resources are imported or added to the repository, any role-permission relationships matching the original record's tripleId must be updated in the security database. E.g., when importing a new unit belonging to the collection Herbarium Berolinense at BGBM, the related permissions must be added AND any roles defined for the resources BGBM or BGBM:Herbarium Berolinense have to be updated in order to cover the newly added resources correctly. Ultimately, these updates must also reflect any other resources implemented or introduced within the system.

Agent Profile Management

Agent profile management combines the maintenance of the security database and the personal profile stores arranged for each registered agent (cf. Agent Profiles). Currently, the agent registration process consists of filling in a form with the required information as outlined in section Secure Authentication. After successfully completing the registration procedure, agents are assigned the annotator role automatically (cf. Role Based Authorisation Management). That way, registered agents may immediately log in to the system and start editing and publishing annotations. Also, a personal profile store, including a store capturing personal system preferences, will be instantiated on a per-agent basis. If required to counter system misuse (e.g. spam bot registrations), the registration procedure may be hardened by appropriate measures in the future.

Additional agent-to-role or agent-to-permission assignments can be made by system administrators only. This will be done on individual request and after careful consideration of the presented evidence.

Furthermore, system administrators may decide to deactivate or delete agents and their profiles from the system for reasons of misuse or security. This will also entail the deletion of any annotations published by those agents.

Message System

Via the message system, registered agents and curators are informed about annotation workflow status changes triggered by agent, curator or system interactions. For this purpose, notifications in the form of messages are dispatched through system-internal message queues. Registered agents can manage their message queues and subscribe to specific annotation workflow activities via the user interface. Beyond that, any message to a registered agent will also be sent to the agent's email address.

Technically, the Message System is based on a combination of two parts: the Apache Camel Framework[24], which uses the Java Message Service (JMS)[21] and Apache ActiveMQ[23] to delegate messages and to send emails to users via an SMTP server, and a subscription component, which manages the subscriptions and delegates messages to users when a new annotation matches the criteria of a subscription.

JMS & ActiveMQ

The Java Message Service (JMS)[21] standard was developed by the Java Community Process and defined within the specification JSR 914[22]. Among other things, JMS provides an application programming interface (API) for message communication between one or more software components or client applications.

The term messaging describes a loosely coupled form of reliable, asynchronous, distributed message exchange between software components. Senders do not need precise knowledge of their receivers and vice versa. Nevertheless, the communication partners must agree on the message format.

JMS provides two communication models:

  • point-to-point
  • publish/subscribe

Using point-to-point communication, the producer sends a message to a queue which is connected to a dedicated receiver. If the receiver is currently not available, the message will be stored by the message service and the receiver can fetch it when it reconnects.

The publish/subscribe model requires the publisher to create a message topic to which an arbitrary number of clients can subscribe. Messages sent to that topic must be actively consumed by the subscribers; otherwise they are lost. Optionally, subscribers may choose a durable subscription. In that case, messages are persistently stored by the message service and delivered to the subscribers when they reconnect.

In order to use JMS, a JMS provider implementation like Apache ActiveMQ is required, which provides the functionality to manage topics, queues and sessions.

Apache Camel

Apache Camel[24] is an enterprise integration framework. It implements the patterns from the Enterprise Integration Patterns book. For our purposes we use three patterns: Endpoints, Queues and Processors.

Endpoints in Camel specify a particular protocol for retrieving and sending messages. Many such protocols are already implemented in Camel, for example for sending mails via any SMTP server or specifically for managing mails in Googlemail. Other components deal with files or, as in our case, with JMS queues.

As mentioned above, Apache ActiveMQ implements the JMS specification. The most important part for us is the queue mechanism. In Camel, this is just an endpoint like any other. Like all queue data structures, JMS queues work on the FIFO principle.

In Camel, endpoints are specified via a URI:

activemq:queue:destinationName

On startup, the Message System creates individual endpoints for each registered user (see diagram Message System initialisation) and assigns a single dedicated message queue to them. The message queues are named after the agent's user id in the Security_Database. From these agent queues, messages will either be forwarded via a dedicated processor to other agent queues or to a configurable SMTP endpoint in order to relay notifications by email to agents in question. A processor enriches the message with required meta information like the recipient address, sender address or an HTML formatted email body.

Message System initialisation

The diagram depicts the following Camel routes (in Java DSL notation):

 from("amq:queue:" + id)
   .to("amq:queue:" + id + ".msgbox")
   .process(new AnnotationProcessor(user))
   .to("mail://my.smtp.com?username=myname&password=mypassword");
 

The "id" refers to the HTTP URI of the agent. The AnnotationProcessor adds "FROM", "TO" and "SUBJECT" information to the message, so that it can then be processed by an SMTP server.

An identical route is set up during the agent registration procedure with the AnnoSys user interface.

Subscriptions

Subscriptions are a means for agents to individually determine the kinds of annotation workflow activities they want to be notified about. They are based on a quite simple data model (see diagram Subscription database scheme) and are currently persisted in a SQLite database.

Subscription database scheme

Subscriptions mainly consist of a list of text-based criteria to determine matching annotation workflow activities. Agents may create an unlimited number of subscriptions. The following table shows the currently supported criteria:

Supported subscription criteria
name crit_key description
Institute subs_institute The institute the collection of the record belongs to
Collection subs_collection The collection the record belongs to
Object ID subs_objectid The object id of the record
Gathering Country subs_country The country where the specimen was collected
XML Selector subs_selector a generic field to specify an XML selector
Scientific name subs_species the scientific name of the specimen
Family subs_family the family of the specimen
Collector subs_gatherer the collector of the specimen

On publication of a new annotation, the criteria of all subscriptions are checked against the annotation's values. Each criterion refers either to an element's value in the original record's XML document or to an annotation element's value in the annotation being checked. If any criteria of a subscription match the corresponding information from an annotation, the agent who issued the subscription is notified through a corresponding message. In order to avoid sending redundant messages, the processing of any remaining subscriptions for this agent is skipped. The notification message is first sent to the agent's message queue by the Message System. Subsequently, Apache Camel forwards the message to the agent's registered email address via the SMTP server configured for the Apache Camel SMTP endpoint.
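
The matching procedure can be sketched in Java as follows. The sketch makes several assumptions that are not confirmed by the AnnoSys sources: a subscription is taken to match as soon as at least one of its criteria matches, values are compared as case-insensitive strings, and the values relevant for an annotation have already been extracted into a map keyed by crit_key. The classes SubscriptionMatcher and Subscription are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of subscription matching; not the AnnoSys SubscriptionSystem. */
final class SubscriptionMatcher {

    /** One subscription: the issuing agent plus its crit_key -> expected value pairs. */
    static final class Subscription {
        final String agentId;
        final Map<String, String> criteria;

        Subscription(String agentId, Map<String, String> criteria) {
            this.agentId = agentId;
            this.criteria = criteria;
        }
    }

    /** Assumption: one matching criterion is enough for the subscription to match. */
    static boolean matches(Subscription subscription, Map<String, String> annotationValues) {
        for (Map.Entry<String, String> criterion : subscription.criteria.entrySet()) {
            String actual = annotationValues.get(criterion.getKey());
            if (actual != null && actual.equalsIgnoreCase(criterion.getValue())) {
                return true;
            }
        }
        return false;
    }

    /** Returns each agent at most once, skipping an agent's remaining subscriptions after a match. */
    static Set<String> agentsToNotify(List<Subscription> subscriptions,
                                      Map<String, String> annotationValues) {
        Set<String> notified = new LinkedHashSet<>();
        for (Subscription subscription : subscriptions) {
            if (notified.contains(subscription.agentId)) {
                continue; // avoid redundant messages for this agent
            }
            if (matches(subscription, annotationValues)) {
                notified.add(subscription.agentId);
            }
        }
        return notified;
    }
}
```

Here an agent holding two matching subscriptions, e.g. one on subs_institute and one on subs_family, still appears only once in the result.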

Implementation

There are several actors involved in processing messages:

  • AnnotationStore
  • MessageSystem
  • SubscriptionSystem
  • AgentRouteBuilder
  • AnnotationProcessor
  • JadeAnnotationRenderer

First, the AnnotationStore is notified that a new annotation is about to be published. The store then infers what kind of annotation it is (Annotation, CurationAnnotation, MassAnnotation) and calls the MessageSystem to send messages to the correct recipients. The MessageSystem determines all recipients for the annotation. E.g., for Annotations and MassAnnotations, a message is sent first to the annotator, then to all curators (except the annotator, if he/she is also a curator) and finally to all subscribers, again excluding the annotator and the curators.

Sending a message means that a message with a header is sent to the Camel queue of the appropriate agent. The header indicates what kind of email is supposed to be sent, i.e. the kind of recipient (annotator, subscriber or curator) and the kind of annotation. This header is also used to determine which jade template should be used to render the email. Before an email can be sent, the AgentRouteBuilder has to route the message appropriately. The AgentRouteBuilder routes the message to the agent's message box, which is indicated by the suffix ".msgbox" in the queue name of the agent. Additionally, it processes the message through the AnnotationProcessor. As mentioned above, the AnnotationProcessor enriches the AnnotationMessage with email-related header fields and calls the JadeAnnotationRenderer to render the mail body. The JadeAnnotationRenderer extracts all required information from the annotation in order to display it in the mail body. Based on the header field "MESSAGE_TYPE", the renderer chooses the template to use and renders the mail body. If some values are not displayed correctly, this is the place to look in order to add additional data to the mail.
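
The recipient order described above (annotator first, then curators, then subscribers, with nobody notified twice) can be sketched in Java as follows. The class RecipientResolver and the use of plain agent ids as strings are assumptions made for illustration; the actual MessageSystem implementation may differ.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of recipient resolution; not the AnnoSys MessageSystem. */
final class RecipientResolver {

    enum Role { ANNOTATOR, CURATOR, SUBSCRIBER }

    /**
     * Returns the recipients in notification order. An agent who appears in
     * several groups keeps only the first role assigned to them.
     */
    static Map<String, Role> resolve(String annotator,
                                     List<String> curators,
                                     List<String> subscribers) {
        Map<String, Role> recipients = new LinkedHashMap<>();
        recipients.put(annotator, Role.ANNOTATOR);
        for (String curator : curators) {
            recipients.putIfAbsent(curator, Role.CURATOR);
        }
        for (String subscriber : subscribers) {
            recipients.putIfAbsent(subscriber, Role.SUBSCRIBER);
        }
        return recipients;
    }
}
```

The role stored for each recipient corresponds to the kind of email (annotator, curator or subscriber) that the message header would indicate.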

Jade

The templates are located in resource/templates. The language for the templates is jade. An overview of the syntax can be found at jade-lang.com and https://naltatis.github.io/jade-syntax-docs/ with interactive input boxes. All templates are suffixed with a language locale.

To insert data into the template, you pass a Map<String, Object> to the rendering engine. For instance,

   model.put("foo", "bar");

makes the value foo available in the template. In text sections, use #{...} to escape the passage and access variables or execute jade code (see the zebra-striped table tr element in annotation-de.jade).

   h2 Nice Heading #{foo}

If you pass another Map or Object, you can use dot notation to access its values or public fields, respectively:

   Map<String, Object> m = new HashMap<>(); // any Map implementation works
   m.put("a", 1);
   m.put("b", 1);
   model.put("foo", m);

In the template

 h2 Nice Heading #{foo.a}

Troubleshooting

  • When new fields are introduced to AnnotationMessage, this might break the deserialization of old messages in the message store. You probably have to delete all old messages.
  • Messages don't show up in the UI: check whether they are written to and read from {agent.urn}.messagebox correctly. (See AgentRouteBuilder and MessageSystem.listmessages.)
  • Mails aren't sent: there is probably an exception or rendering error in JadeAnnotationRenderer, or the mail configuration in annosys.properties is wrong. The URL of the mail endpoint is built in AgentRouteBuilder.
  • Parts of the mail don't show up (like AnnotationElements or Greeting): check the templates. Is the inheritance correct, and do they include all the necessary templates?

PDF Creation

Based on published annotations, curators might want to attach an additional label to the physical collection object showing the information from online annotations. Therefore, AnnoSys provides a basic annotation label export via PDF files, which can be printed out and attached to the physical object.

In order to export labels via PDF, AnnoSys uses the Apache FOP framework, which needs an XSL-FO stylesheet as input in order to render the desired PDF. The stylesheet is generated via xslt (Apache Xalan). The mapping of annotation data into the stylesheet is done by SimpleXML.

The configuration of stylesheets according to annotation types takes place via stylesheet files defined within the AnnoSys System Configuration file annosys.properties using the options AnnoSys.template.pdf.${annotationtype} (default: /etc/AnnoSys/resources/templates/pdf/${annotationtype}.xsl).

Subsequently, the currently supported annotation types including the values provided within the template are described:

Determination (${annotationtype} = determination)
scientificname
the newly determined scientific name (depends on result and is either the value from the annotation's or the original record's PreferredFullScientificName)
result
the determination result (det., rev., conf. taken from the annotation's PreferredIdentifierRole)
annotator
the name of the annotating agent
date
the annotation's publication date
institute
the name of the organisation the annotating agent belongs to


Services

Preliminaries

Within the services documentation, some placeholders are used to shorten the text and to adapt it easily to other or changing environments.

${AnnoSysURL} = https://annosys.bgbm.fu-berlin.de/AnnoSys/AnnoSys
Denotes the base URL of the stable and released AnnoSys system.
${ServicesURL} = https://annosys.bgbm.fu-berlin.de/AnnoSys/services
Denotes the base URL of the stable and released AnnoSys web services.

For testing purposes, the most recent release is available under the following URLs:

${AnnoSysTestURL} = https://annosys.bgbm.fu-berlin.de/AnnoSysTest/AnnoSys
Denotes the base URL of the AnnoSys Test system.
${ServicesTestURL} = https://annosys.bgbm.fu-berlin.de/AnnoSysTest/services
Denotes the base URL of the AnnoSys Test system web services.

Please use the ${AnnoSysTestURL} or ${ServicesTestURL} if you are testing the integration of AnnoSys with your application!

Currently, only ABCD2.06 based XML data records are supported (namespacePrefix "abcd2.06b")!

Annotations are stored and will be delivered on request in RDF according to our implementation of the W3C Open Annotation Data Model described here.

The AnnoSys - User Guide provides a basic introduction (Quick Start) as well as detailed information on how to use the AnnoSys user interface.

Please report errors to the AnnoSys Project Team.

Integrating AnnoSys with Data Portals

AnnoSys provides the following types of interfaces for integration with Data Portals or other applications

  • Invoking the AnnoSys user interface
  • Retrieving record or annotation related information from AnnoSys web services

User interface invocation

The user interface can be invoked either to redirect users to the AnnoSys search interface or to enable users to annotate a data record currently reviewed in the data portal. The search interface can be invoked by redirecting web browsers to ${AnnoSysURL}.

To enable users to annotate a data record, the relating XML document must be transferred to the AnnoSys repository. After successfully transferring the document, the user will be redirected to the AnnoSys user login/registration dialog first and subsequently to the AnnoSys Annotation Editor. This can be done either by providing a URL, where the document can be downloaded by AnnoSys directly, or by providing a set of parameters describing how AnnoSys can download the data record from a BioCASE provider.

Note
The values for any parameters described in the next sections MUST be URL encoded individually (!) in order to be correctly transmitted via the URL to AnnoSys!

Download via direct URL

The URL referring to the record data document is passed to AnnoSys via the parameter recordURL:

recordURL
The parameter should contain the URL of the document to be downloaded.
Example: https://annosys.bgbm.fu-berlin.de/AnnoSysTest/AnnoSys?recordURL=http://ww2.biocase.org/svn/annotation/original/03666bc0-f0f4-11d8-b22f-b8a03c50a862/abcd2.06/BGBM/Bridel%20Herbar/Bridel-1-12.xml

The URL may refer to the following types of documents

  • ABCD2.06 based XML data record(s)
  • BioCASE response documents containing ABCD2.06 based XML data record(s)
  • ABCD Archives containing either ABCD2.06 based XML data record(s) or BioCASE response documents

If a document holds more than a single data record, the AnnoSys user interface provides a record selection dialog listing the successfully transmitted data records. There, annotators may select records for further processing, either as a mass annotation or as multiple individual annotations.

Download via BioCASE Provider

The BioCASE provider and the data record to be retrieved by AnnoSys must be passed via the following parameter set

providerURL
The base URL of the BioCASE provider (e.g. http://ww3.bgbm.org/biocase/pywrapper.cgi?dsa=Herbar&).
protocolURI
The namespace URI of the protocol used by the BioCASE provider (e.g. http://www.biocase.org/schemas/protocol/1.3).
formatURI
The namespace URI of the document format to be retrieved (e.g. http://www.tdwg.org/schemas/abcd/2.06).
institution
The institution (lsid:authority) part of the tripleId describing the record (e.g. BGBM).
source
The source (lsid:namespace) part of the tripleId describing the record (e.g. Herbarium Berolinense).
unitID
The unitID (lsid:objectId) part of the tripleId describing the record (e.g. B 20 0145120).
Example: https://annosys.bgbm.fu-berlin.de/AnnoSysTest/AnnoSys?providerURL=http%3A%2F%2Fww3.bgbm.org%2Fbiocase%2Fpywrapper.cgi%3Fdsa%3DHerbar%26&protocolURI=http%3A%2F%2Fwww.biocase.org%2Fschemas%2Fprotocol%2F1.3&formatURI=http%3A%2F%2Fwww.tdwg.org%2Fschemas%2Fabcd%2F2.06&institution=BGBM&source=Herbarium%20Berolinense&unitID=B%2020%200145120
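
Encoding each parameter value individually can be sketched in Java as follows. The class AnnoSysInvocationUrl is hypothetical; the sketch uses the standard java.net.URLEncoder (the Charset overload requires Java 10 or later), which encodes spaces as "+" rather than "%20". Both forms are common in query strings, but whether AnnoSys treats them identically is an assumption here.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper for building AnnoSys invocation URLs; not part of AnnoSys. */
final class AnnoSysInvocationUrl {

    /** Appends the parameters to the base URL, URL encoding every value individually. */
    static String build(String baseUrl, Map<String, String> parameters) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> parameter : parameters.entrySet()) {
            query.add(parameter.getKey() + "="
                    + URLEncoder.encode(parameter.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "?" + query;
    }

    /** Rebuilds the BioCASE provider example using the test system base URL. */
    static String bioCaseExample() {
        Map<String, String> parameters = new LinkedHashMap<>();
        parameters.put("providerURL", "http://ww3.bgbm.org/biocase/pywrapper.cgi?dsa=Herbar&");
        parameters.put("protocolURI", "http://www.biocase.org/schemas/protocol/1.3");
        parameters.put("formatURI", "http://www.tdwg.org/schemas/abcd/2.06");
        parameters.put("institution", "BGBM");
        parameters.put("source", "Herbarium Berolinense");
        parameters.put("unitID", "B 20 0145120");
        return build("https://annosys.bgbm.fu-berlin.de/AnnoSysTest/AnnoSys", parameters);
    }
}
```

Encoding the values one by one matters for providerURL in particular: its own "?", "=" and "&" characters would otherwise be read as part of the AnnoSys query string.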

Opening Annotation View or Annotation Editor from http-reference URI

The http-reference URI corresponds to the resource id used to retrieve annotation data as RDF or XML record documents from the repository through AnnoSys' Linked Open Data Services described in section Web Services.

The URI referring to the repository data object can be passed to AnnoSys via the parameter repositoryURI:

repositoryURI
The parameter must contain a valid URI dereferencing an annotation or XML record in the AnnoSys repository.
Example: https://annosys.bgbm.fu-berlin.de/AnnoSysTest/AnnoSys?repositoryURI=https%3A%2F%2Fannosys.bgbm.fu-berlin.de%2FAnnoSysTest%2Fservices%2Fannotations%2FBGBM%2FAnnoSys%2F1405093916622

Information Retrieval

Information related to a given record triple id can be retrieved via AnnoSys RESTful web services. Currently, the following record related information can be retrieved:

  • Annotations

Returns whether the AnnoSys repository knows the record of the given triple id and, if there are annotations, the number of annotations for the given record. For more detailed information, see Request: GET ${ServicesURL}/records/<lsid:authority>/<lsid:namespace>/<lsid:objectId>/annotations.

Web Services

AnnoSys provides two kinds of web services.

  • Linked Open Data (LOD) services
  • RESTful services

The Linked Open Data services provide access to resources referring to data stored in the AnnoSys repository. Therewith, annotations can be retrieved as RDF data, and the relating record documents as XML documents.

The RESTful services provide access to other information related to annotations or records, like if there are annotations stored in the repository for a given record.

Additional services may be implemented on request.

The next sections will provide detailed information regarding the provided services.

Annotations

The general context path for annotations is ${ServicesURL}/annotations.

The annotation id is constructed analogously to tripleIds (or LSIDs) from our institution (lsid:authority), our source (lsid:namespace) and an annotation id (lsid:objectId) (e.g. /BGBM/AnnoSys/123456789).

All annotation requests return either an RDF graph containing the annotations as described in our annotation model or a JSON annotation object as described in section JSON Annotation Object.

The following sections will describe possible requests and answers.

Note
The values for any parameters described in the next sections MUST be URL encoded individually (!) in order to be correctly interpreted by AnnoSys Web Services!

JSON Annotation Object

A JSON Annotation Object contains the following information about an annotation, which is intended to be displayed in a data portal user interface:

repositoryURI
URI of the annotation within the AnnoSys repository
recordURIs
list of record URIs belonging to that annotation (>1 for batch (curation) annotations)
annotator
name of the annotating agent
time
annotation's publication time in milliseconds since 01 January 1970
motivation
the annotation's motivation or type
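
A data portal consuming these objects might hold them in a small value class like the following Java sketch. The class AnnotationSummary is hypothetical and JSON parsing is left out; the sketch only illustrates the field types and the conversion of the time field, given in milliseconds since 01 January 1970, into a java.time.Instant.

```java
import java.time.Instant;
import java.util.List;

/** Hypothetical client-side holder for a JSON Annotation Object; not part of AnnoSys. */
final class AnnotationSummary {
    final String repositoryURI;
    final List<String> recordURIs;
    final String annotator;
    final long time; // milliseconds since 01 January 1970 (UTC)
    final String motivation;

    AnnotationSummary(String repositoryURI, List<String> recordURIs,
                      String annotator, long time, String motivation) {
        this.repositoryURI = repositoryURI;
        this.recordURIs = recordURIs;
        this.annotator = annotator;
        this.time = time;
        this.motivation = motivation;
    }

    /** Converts the raw millisecond value into a point in time. */
    Instant publicationInstant() {
        return Instant.ofEpochMilli(time);
    }

    /** More than one record URI indicates a batch (curation) annotation. */
    boolean isBatchAnnotation() {
        return recordURIs.size() > 1;
    }
}
```

For a time value of 1404737340416, publicationInstant() corresponds to 2014-07-07T12:49:00.416Z.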

Request: GET ${ServicesURL}/annotations

Returns a JSON object listing all available annotations in the AnnoSys repository.

size
number of annotations
annotations
list of JSON Annotation Objects
Example: https://annosys.bgbm.fu-berlin.de/AnnoSysTest/services/annotations
{

   "size": 2,
   "annotations": [
    {
     "repositoryURI": "https://annosys.bgbm.fu-berlin.de/AnnoSysTest/services/annotations/BGBM/AnnoSys/1404736720822",
     "recordURIs": [
      "https://annosys.bgbm.fu-berlin.de/AnnoSysTest/services/records/IBMT/CCCryo/242-06/1404725446799/abcd2.06b"
     ],
     "annotator": "Wolf-Henning Kusber",
     "time": 1404737340416,
     "motivation": "Determination"
    },
    {
     "repositoryURI": "https://annosys.bgbm.fu-berlin.de/AnnoSysTest/services/annotations/BGBM/AnnoSys/1410859759463",
     "recordURIs": [
      "https://annosys.bgbm.fu-berlin.de/AnnoSysTest/services/records/STU/Staatliches+Museum+f%C3%BCr+Naturkunde+Stuttgart%2C+Herbarium/Main-1-8374/1410859752058/abcd2.06b"
     ],
     "annotator": "Okka Tschöpe",
     "time": 1410859852835,
     "motivation": "ScientificName"
    }
   ]
 
}

Request: GET ${ServicesURL}/annotations/BGBM/AnnoSys/<annotationId>

Returns the annotation with the given annotationId as RDF. If no such annotation exists, HTTP status 404 is returned.

Example: https://annosys.bgbm.fu-berlin.de/AnnoSysTest/services/annotations/BGBM/AnnoSys/1406620804850

Records

The general context path for records is ${ServicesURL}/records.

Records are identified within the AnnoSys repository by an extended LSID, including LSID data plus a prefix describing the document format. Thus, the path to be appended to the general records context path is built from the tripleId, a timestamp serving as lsid:version, and an AnnoSys predefined namespacePrefix identifying the record's document format. Currently, only abcd2.06b is supported.

Example: /BGBM/Herbarium Berolinense/B -W 00400 -00 0/3534524354/abcd2.06b

Request: GET ${ServicesURL}/records/<lsid:authority>/<lsid:namespace>/<lsid:objectId>/<lsid:version>/<annosys:formatPrefix>

Returns the record document for the given extended LSID identifier. If no such record exists, HTTP status 404 is returned.

Example: https://annosys.bgbm.fu-berlin.de/AnnoSysTest/services/records/BGBM/Herbarium+Berolinense/B+18+0017064/1409216573578/abcd2.06b
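
Assembling such a record path from its parts can be sketched in Java as follows. The class RecordPath is hypothetical; it assumes that every path segment is URL encoded individually with java.net.URLEncoder (Charset overload, Java 10 or later), which reproduces the "+" encoding of spaces seen in the example above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for building record paths; not part of AnnoSys. */
final class RecordPath {

    /** Builds "/<authority>/<namespace>/<objectId>/<version>/<formatPrefix>". */
    static String of(String authority, String namespace, String objectId,
                     long version, String formatPrefix) {
        return "/" + encode(authority)
                + "/" + encode(namespace)
                + "/" + encode(objectId)
                + "/" + version
                + "/" + encode(formatPrefix);
    }

    // Each segment is encoded on its own so that reserved characters stay inside the segment.
    private static String encode(String segment) {
        return URLEncoder.encode(segment, StandardCharsets.UTF_8);
    }
}
```

Appending the result of RecordPath.of("BGBM", "Herbarium Berolinense", "B 18 0017064", 1409216573578L, "abcd2.06b") to ${ServicesTestURL}/records gives the example URL shown above.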

Request: GET ${ServicesURL}/records/<lsid:authority>/<lsid:namespace>/<lsid:objectId>/annotations

Returns a JSON object containing information about all annotations referring to the record in the AnnoSys repository according to the given record tripleId. In particular, this includes annotations created for all record versions stored in the AnnoSys record repository.

If no such record exists, HTTP status 404 is returned.

record
URI of the most recent record for the given tripleId.
hasAnnotation
true or false.
size
number of annotations found related to the most recent record.
annotations
list of JSON Annotation Objects.
Example: https://annosys.bgbm.fu-berlin.de/AnnoSysTest/services/records/BGBM/Herbarium+Berolinense/B+18+0017064/annotations
{
  "record": "https://annosys.bgbm.fu-berlin.de/AnnoSysTest/services/records/BGBM/Herbarium+Berolinense/B+18+0017064/1409216573578/abcd2.06b",
  "hasAnnotation": true,
  "size": 1,
  "annotations": [
     {"repositoryURI":"https://annosys.bgbm.fu-berlin.de/AnnoSysTest/services/annotations/BGBM/AnnoSys/1409564247294","annotator":"Okka Tschöpe","time":1409575813120,"motivation":"NomenclaturalTyp"}
   ]
}

Request: GET ${ServicesURL}/records/<lsid:authority>/<lsid:namespace>/<lsid:objectId>/<lsid:version>/<annosys:formatPrefix>/annotations

Same as before, but retrieves all annotations referring to the given version and format of the record.

SPARQL Endpoint

AnnoSys also provides a SPARQL endpoint enabling external services or applications to run self-defined queries against the AnnoSys annotation repository. The SPARQL endpoint is driven by Virtuoso Open Source.

The URL of the AnnoSys SPARQL endpoint is: https://annosys.bgbm.fu-berlin.de/AnnoSys/sparql.

Setup

As an Eclipse RAP[18] application, AnnoSys is a Java web servlet instance and the provided AnnoSys.war file has to be deployed within a Java web servlet container like Eclipse Jetty[39]. The AnnoSys Release version is running with Jetty Version 8.1.3-4, which corresponds to the default installation of the Debian package jetty8.

The provided AnnoSys.war file expects its main configuration file on the server path /etc/AnnoSys/config/annosys.properties. Likewise, the Apache Shiro[20] configuration is expected on the server path /etc/AnnoSys/config/shiro.ini. A basic and empty configuration directory template is provided in the file AnnoSys.Template.zip and must simply be unpacked to the server's root directory (/).

If you would like to modify the default configuration, the war file has to be rebuilt from source code. AnnoSys is built with the Eclipse Kepler Edition "Eclipse for RCP and RAP Developers". The file AnnoSys.Project.zip contains the complete Eclipse project used for developing AnnoSys.

The following sections briefly describe the deployment and building of the AnnoSys.war file, and the basic configuration options to adapt the AnnoSys configuration if required.

Deploying AnnoSys.war file

Deploying the AnnoSys.war file with Jetty is easy. Simply copy the file into the webapps subdirectory of Jetty's home directory (/var/lib/jetty8/webapps on Debian) and restart Jetty (/etc/init.d/jetty8 restart). AnnoSys should then be reachable at the URL http://<servername>:8080/AnnoSys/AnnoSys.

Building AnnoSys.war file from source code

Unpack the file AnnoSys.Project.zip and import or open the project from the subdirectory org.bgbm.annosys with your Eclipse Kepler Edition "Eclipse for RCP and RAP Developers". Within the Package Explorer View on the left side, double-click on the file annosys.target to install the RAP v2.1 target platform for AnnoSys. The project should be compiled and built against the platform automatically.

Next, double-click on the file AnnoSys.warproduct, which should open the configuration dialog of the Eclipse WAR Product export wizard. Usually, just executing the export wizard should be fine. If something does not work as expected, first executing Add Required Plug-ins within the step 1 Configuration dialog should solve dependency problems. Finally, deploy your generated AnnoSys.war file as described above.

AnnoSys System Configuration

AnnoSys' system configuration is based on the Apache Commons configuration framework and is subdivided into the following configuration files:

  • annosys.properties
  • annotation.properties
  • selector.properties
  • schema.properties

Furthermore, the configuration directory is the default location for the configuration files of Apache Shiro[20] and the Apache log4j logging system.

  • shiro.ini
  • log4j.properties

If not configured otherwise in your self-built war file, the configuration files are expected in the directory /etc/AnnoSys/config/. The following sections briefly introduce the configuration options on a per-file basis.

annosys.properties

AnnoSys.run
Defines whether AnnoSys is running in test or release mode. Currently, this only determines whether curators denoted in annotated records are automatically notified by email when no curators are registered for the given record. Possible values: test or release (default: test).
AnnoSys.home.dir
The base directory, where AnnoSys stores any system relevant data and configuration files (default: etc/AnnoSys)


AnnoSys.servlet.context
The name of the servlet context which usually corresponds to the name of the deployed war file (default: /AnnoSys). This option must be set accordingly. Otherwise, AnnoSys will generate incorrect and unresolvable repository resourceURIs!
AnnoSys.servlet.path
The AnnoSys servlet path within the servlet context (default: /AnnoSys). This option must be set accordingly. Otherwise, AnnoSys will generate incorrect and unresolvable repository resourceURIs!


AnnoSys.repository.uri
The server URL hosting the AnnoSys application (default: https://annosys.bgbm.fu-berlin.de). This option must be set accordingly. Otherwise, AnnoSys will generate incorrect and unresolvable repository resourceURIs!
AnnoSys.repository.record
The base location of the record repository (default: ${AnnoSys.home.dir}/repository/record)
AnnoSys.repository.record.temp
The location for temporary files used by the record repository (default: ${AnnoSys.repository.record}/temp)
AnnoSys.repository.record.baseId
The base resource GUID for record repository resources. The format is <base_URI>, <base_LSID-URN>, <base-InstitutionId>, <base-CollectionId>, <base-UnitId>, <base-version>, <base-format> (default: ${AnnoSys.repository.record.uri}, ${AnnoSys.lsid.urn}, , , , ).
AnnoSys.repository.record.uri
The base URI for record resources to be accessed via LinkedOpenData or REST services (default: ${AnnoSys.repository.uri}${AnnoSys.servlet.context}/services/records).
AnnoSys.repository.model.baseId
The base resource GUID for annotation repository resources. The format is <base_URI>, <base_LSID-URN>, <base-InstitutionId>, <base-CollectionId>, <base-UnitId>, <base-version>, <base-format> (default: ${AnnoSys.repository.model.uri}, ${AnnoSys.lsid.urn}, BGBM, AnnoSys, , , ).
AnnoSys.repository.model.uri
The base URI for annotation resources to be accessed via LinkedOpenData or REST services (default: ${AnnoSys.repository.uri}${AnnoSys.servlet.context}/services/annotations).
AnnoSys.repository.message
The base location of the message repository (default: ${AnnoSys.home.dir}/repository/message ).


AnnoSys.repository.model.type
Type of the annotation RDF store. Possible values: tdb (Apache Jena TDB[10]) or virtuoso (Virtuoso Open-Source Edition[11]) (default: virtuoso).
Note: For performance reasons in the user interface, the current AnnoSys implementation uses virtuoso stores only as backup repository. However, the SPARQL Endpoint is served only from the virtuoso store. Therefore, the virtuoso store must be configured as backup RDF store.
AnnoSys.repository.model
Location or URI of the annotation RDF store. For tdb stores, a directory path is expected (e.g. ${AnnoSys.home.dir}/repository/model), whereas virtuoso stores expect a URI such as jdbc:virtuoso://localhost:1112.
AnnoSys.repository.model.user
Login id to access an RDF store. Usually, this is required for virtuoso stores.
AnnoSys.repository.model.password
Password to access an RDF store. Usually, this is required for virtuoso stores.
AnnoSys.repository.model.backup.type
Type of the annotation RDF backup store. Possible values: tdb (Apache Jena TDB[10]) or virtuoso (Virtuoso Open-Source Edition[11]) (default: virtuoso).
Note: For performance reasons in the user interface, the current AnnoSys implementation uses virtuoso stores only as a backup repository. However, the SPARQL endpoint is served only from the virtuoso store, so a virtuoso store must be configured as the backup RDF store.
AnnoSys.repository.model.backup
Location or URI of the annotation RDF backup store. For tdb stores, a directory path is expected (e.g. ${AnnoSys.home.dir}/repository/model), whereas virtuoso stores expect a URI such as jdbc:virtuoso://localhost:1112.
AnnoSys.repository.model.backup.user
Login id to access an RDF backup store. Usually, this is required for virtuoso stores.
AnnoSys.repository.model.backup.password
Password to access an RDF backup store. Usually, this is required for virtuoso stores.
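
The store options above pair a type (tdb or virtuoso) with a location whose form depends on that type. As a minimal sketch of a pre-deployment sanity check, the following Python function tests that pairing. The function name check_store and the path /opt/annosys are hypothetical, and AnnoSys itself may validate these settings differently or not at all.

```python
def check_store(store_type, location):
    """Return a list of problems found in a store type/location pair."""
    problems = []
    if store_type == "tdb":
        # TDB stores are addressed by a directory path.
        if "://" in location:
            problems.append("tdb expects a directory path, not a URI")
    elif store_type == "virtuoso":
        # Virtuoso stores are addressed by a JDBC-style URI.
        if not location.startswith("jdbc:virtuoso://"):
            problems.append("virtuoso expects a jdbc:virtuoso:// URI")
    else:
        problems.append("unknown store type: " + store_type)
    return problems


# Hypothetical installation directory, for illustration only.
print(check_store("tdb", "/opt/annosys/repository/model"))
print(check_store("virtuoso", "jdbc:virtuoso://localhost:1112"))
print(check_store("virtuoso", "/opt/annosys/repository/model"))
```

The first two calls report no problems; the third flags a directory path that was combined with the virtuoso type.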


AnnoSys.repository.agent
The base location of the Agent profiles (default: ${AnnoSys.home.dir}/repository/agent)
AnnoSys.repository.agent.baseId
The base resource GUID for agent repository resources. The format is <base_URI>, <base_LSID-URN>, <base-InstitutionId>, <base-CollectionId>, <base-UnitId>, <base-version>, <base-format> (default: ${AnnoSys.repository.record.uri}, ${AnnoSys.lsid.urn}, BGBM, AnnoSys, , , ).
AnnoSys.repository.agent.uri
The base URI for agent resources to be accessed via LinkedOpenData or REST services (default: ${AnnoSys.repository.model.uri}, same as for annotations).
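
The baseId options for records, annotations and agents all share the same comma-separated format of seven fields. The following Python sketch splits such a value into named fields. The function parse_base_id is hypothetical, and treating missing trailing fields as empty is an assumption of this example, not documented AnnoSys behaviour.

```python
# Field names in the order given by the documented baseId format.
FIELDS = (
    "base_URI",
    "base_LSID-URN",
    "base-InstitutionId",
    "base-CollectionId",
    "base-UnitId",
    "base-version",
    "base-format",
)


def parse_base_id(value):
    """Split a comma-separated baseId value into a dict of named fields."""
    parts = [part.strip() for part in value.split(",")]
    if len(parts) > len(FIELDS):
        raise ValueError("too many fields in baseId: %d" % len(parts))
    # Assumption: missing trailing fields are treated as empty.
    parts += [""] * (len(FIELDS) - len(parts))
    return dict(zip(FIELDS, parts))


default_model_base_id = (
    "${AnnoSys.repository.model.uri}, ${AnnoSys.lsid.urn}, BGBM, AnnoSys, , , "
)
base_id = parse_base_id(default_model_base_id)
print(base_id["base-InstitutionId"], base_id["base-CollectionId"])
```

For the default annotation baseId, this yields BGBM as institution and AnnoSys as collection, while unit id, version and format remain empty.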


AnnoSys.authDB
Location URI of the Security Database (default: jdbc:sqlite:${AnnoSys.home.dir}/authDB.sqlite).
AnnoSys.authStore.importRecordStoreOnStart
Option to be set when roles and permissions within the Security Database should be updated based on the records available from the record repository. As roles and permissions are updated on every successful record import, this should only be activated for system maintenance reasons. Possible values: true or false (default: false).
AnnoSys.authStore.admin.role
Role name for AnnoSys administrators (default: administrator:BGBM:AnnoSys). If neither this admin role nor the admin permission below is configured in the Security Database, the Administration tab is shown without login. Therefore, a user with that role should be created immediately after system installation!
AnnoSys.authStore.admin.permission
Permission name for AnnoSys administrators (default: admin:BGBM:AnnoSys:security). If neither this admin permission nor the admin role above is configured in the Security Database, the Administration tab is shown without login. Therefore, a user with the admin role or this permission should be created immediately after system installation!


AnnoSys.namespaceURI
AnnoSys namespace URI (default: http://annosys.bgbm.org#).
AnnoSys.lsid.urn
LSID URN prefix for internal use (default: urn:lsid).


AnnoSys.smtp.host
SMTP server for outgoing emails
AnnoSys.smtp.port
SMTP server port for outgoing emails (default: 25) (optional).
AnnoSys.smtp.protocol
SMTP protocol to be used for outgoing emails (optional).
AnnoSys.smtp.username
Login id for outgoing SMTP server (optional).
AnnoSys.smtp.password
Password for outgoing SMTP server (optional).
AnnoSys.mail.from
Email header From to be used for outgoing emails (default: AnnoSys Message System<annosys@bgbm.org>).


AnnoSys.message.broker.start
Option to be set for starting the Message System on system start. Possible values: true or false (default: true).
AnnoSys.message.broker.name
Name for the ActiveMQ message broker instance (default: AnnoSys).
AnnoSys.message.broker.url
URL of the ActiveMQ message broker instance (default: vm://${AnnoSys.message.broker.name}).
AnnoSys.message.templates
The base location of the templates used by the Message System for outgoing emails (default: ${AnnoSys.home.dir}/resources/templates).
AnnoSys.message.subscription.db
Location URI of the Message System's subscription database (default: jdbc:sqlite:${AnnoSys.home.dir}/repository/subscriptions/subscriptions.db).
AnnoSys.message.subscription.add.icon
Icon location for the subscription-add functionality (default: icons/add_att.gif)
AnnoSys.message.subscription.delete.icon
Icon location for the subscription-delete functionality (default: icons/application-exit-4-22x22.png)
AnnoSys.message.subscription.help.icon
Icon location for the subscription-help functionality (default: icons/system-help-3-22x22.ico)
AnnoSys.message.default.mailaddress
Default AnnoSys mail address for user replies (default: annosys@bgbm.org).


AnnoSys.synonyms.file
List of synonym expressions used in searches with respect to families (default: ${AnnoSys.home.dir}/resources/synonyms.properties)


AnnoSys.termsOfUse.url
URL of the terms of use document (default: https://annosys.bgbm.fu-berlin.de/node/54)
AnnoSys.help.pdfurl
URL of the user's manual (default: https://annosys.bgbm.fu-berlin.de/sites/default/files/Users%20Instructions.pdf)
AnnoSys.improve.url
URL of the AnnoSys ticket system for recording bug reports or other user proposals (default: https://annosys.bgbm.fu-berlin.de/trac/newticket)
AnnoSys.homepage.url
URL of the AnnoSys web home page (default: https://annosys.bgbm.fu-berlin.de/)


AnnoSys.template.pdf.determination
Location of the stylesheet for creating PDF-labels for annotations of type determination (default: ${AnnoSys.message.templates}/pdf/determination.xsl)


AnnoSys.log4j.configuration
Location of the log4j configuration file (default: ${AnnoSys.home.dir}/config/log4j.properties).


AnnoSys.toolbar.annosys.icon
Icon location for the AnnoSys web home page functionality in the main toolbar (default: icons/AnnoSys-Logo-89x24.png)
AnnoSys.toolbar.publish.icon
Icon location for the publish functionality in the main toolbar (default: icons/earth_upload-24x24.ico)
AnnoSys.toolbar.preferences.icon
Icon location for the preferences functionality in the main toolbar (default: icons/preferences-desktop-user-password-22x22.ico)
AnnoSys.toolbar.search.icon
Icon location for the search functionality in the main toolbar (default: icons/system-search-5-22x22.png)
AnnoSys.toolbar.login.icon
Icon location for the login functionality in the main toolbar (default: icons/preferences-system-login-22x22.ico)
AnnoSys.toolbar.logout.icon
Icon location for the logout functionality in the main toolbar (default: icons/application-exit-3-22x22.ico)
AnnoSys.toolbar.help.icon
Icon location for the help functionality in the main toolbar (default: icons/help-contents-5-22x22.ico)
AnnoSys.toolbar.improve.icon
Icon location for the improve functionality in the main toolbar (default: icons/trac-24x24.png)
AnnoSys.toolbar.subscriptions.icon
Icon location for the subscription functionality in the main toolbar (default: icons/hdd_web-24x24.ico)
AnnoSys.toolbar.administrator.icon
Icon location for the administrator functionality in the main toolbar (default: icons/user-group-properties-22x22.ico)
AnnoSys.imagebutton.undo.icon
Icon location for the undo functionality in the annotation editor (default: icons/rewind.png)
AnnoSys.imagebutton.remove.icon
Icon location for the remove functionality in the annotation editor (default: icons/gnome_edit_delete-16x16.ico)
AnnoSys.imagebutton.verify.icon
Icon location for the verify functionality in the annotation editor (default: icons/ok_st_obj.gif)
AnnoSys.imagebutton.annosys.icon
Icon location for a currently unused functionality in the annotation editor (default: icons/A-16x16.png)
AnnoSys.recordselectionview.information.icon
Icon location for the information functionality in the import record selection dialog (default: icons/mail-mark-important-2.ico)
AnnoSys.notificationDialog.addEmail.icon
Icon location for the add email functionality in the Message System's notification dialog (default: icons/icons/add_obj.gif)


AnnoSys.locales
List of supported system locales in preferred order (default: en-GB, de-DE)
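
AnnoSys.locales is an ordered, comma-separated list. How AnnoSys selects a locale for a given user is not described here; the following Python sketch only illustrates one plausible reading of "preferred order", in which the first configured locale serves as fallback. Both function names are hypothetical.

```python
def parse_locales(value):
    """Split a comma-separated locale list, preserving the configured order."""
    return [item.strip() for item in value.split(",") if item.strip()]


def pick_locale(supported, requested):
    """Pick a supported locale for a requested one.

    Tries an exact match first, then a match on the language part only,
    and finally falls back to the first (most preferred) supported locale.
    """
    by_lower = {locale.lower(): locale for locale in supported}
    if requested.lower() in by_lower:
        return by_lower[requested.lower()]
    language = requested.split("-")[0].lower()
    for locale in supported:
        if locale.split("-")[0].lower() == language:
            return locale
    return supported[0]


supported = parse_locales("en-GB, de-DE")
print(pick_locale(supported, "de-AT"))
print(pick_locale(supported, "fr-FR"))
```

Under these assumptions, a request for de-AT is served with de-DE, and an unsupported language such as fr-FR falls back to en-GB.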

shiro.ini

This is the main configuration file of the Apache Shiro[20] system used by the Security System. For more detailed information, please refer to the Apache Shiro documentation.

Section [main]

cacheManager
Defines the cache manager implementation class (default: org.apache.shiro.cache.MemoryConstrainedCacheManager).
securityManager.cacheManager
Assigns the cache manager to the security manager (default: $cacheManager)
passwordService
Defines the password service implementation class for the password matcher (default: org.bgbm.annosys.security.shiro.ApacheSHA1PasswordService).
passwordMatcher
Defines the password matcher for the AnnoSys authentication realm (default: org.apache.shiro.authc.credential.PasswordMatcher).
passwordMatcher.passwordService
Assigns the password service to the password matcher (default: $passwordService).
annosysRealm
Defines the AnnoSys authentication realm implementation class (default: org.bgbm.annosys.security.shiro.AnnoSysJdbcRealm).
annosysRealm.credentialsMatcher
Assigns the password matcher for the AnnoSys authentication realm (default: $passwordMatcher).
annosysRealm.authenticationCachingEnabled
Enables authentication caching for the AnnoSys authentication realm (default: true).
annosysRealm.permissionsLookupEnabled
Enables permission lookup for the AnnoSys authentication realm (default: true).
sessionManager
Defines the session manager implementation class (default: org.apache.shiro.web.session.mgt.DefaultWebSessionManager).
securityManager.sessionManager
Assigns the session manager to the security manager (default: $sessionManager).
securityManager.sessionManager.globalSessionTimeout
Sets the AnnoSys session timeout (default: 36000000 (10h)).
sessionListener
Defines the session listener implementation class (default: org.bgbm.annosys.security.shiro.ShiroSessionListener). This is used to clean up web sessions correctly within AnnoSys.
securityManager.sessionManager.sessionListeners
Assigns the session listener to the security manager (default: $sessionListener).
securityManager.sessionManager.sessionIdCookie.secure
Enables setting of secure session cookies for the session manager (default: true).
securityManager.sessionManager.sessionIdCookie.domain
Sets the cookie domain for session cookies (default: annosys.bgbm.fu-berlin.de)
logout.redirectUrl
Sets the redirection URL for session logout (default: /AnnoSys).

Section [urls]

/AnnoSys/logout = logout
Defines /AnnoSys/logout as logout URL.
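
To show how the documented defaults fit together, the following Python sketch assembles a subset of them into a shiro.ini text, parses it with the standard configparser module and converts the session timeout from milliseconds to hours. The INI text is reconstructed from the defaults listed above and is not a copy of the file shipped with AnnoSys; Apache Shiro itself reads the file with its own parser, so configparser is used here for illustration only.

```python
import configparser

# Reconstructed from the documented defaults; not the shipped file.
SHIRO_INI = """\
[main]
cacheManager = org.apache.shiro.cache.MemoryConstrainedCacheManager
securityManager.cacheManager = $cacheManager
passwordService = org.bgbm.annosys.security.shiro.ApacheSHA1PasswordService
passwordMatcher = org.apache.shiro.authc.credential.PasswordMatcher
passwordMatcher.passwordService = $passwordService
annosysRealm = org.bgbm.annosys.security.shiro.AnnoSysJdbcRealm
annosysRealm.credentialsMatcher = $passwordMatcher
sessionManager = org.apache.shiro.web.session.mgt.DefaultWebSessionManager
securityManager.sessionManager = $sessionManager
securityManager.sessionManager.globalSessionTimeout = 36000000
logout.redirectUrl = /AnnoSys

[urls]
/AnnoSys/logout = logout
"""

# Only "=" separates keys from values; "$name" references are kept literally.
parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
parser.optionxform = str  # keep option names case-sensitive
parser.read_string(SHIRO_INI)

timeout_ms = parser.getint(
    "main", "securityManager.sessionManager.globalSessionTimeout"
)
timeout_hours = timeout_ms / (60 * 60 * 1000)
print(timeout_hours)  # 10.0
```

Each $name value refers to an object defined earlier in the [main] section, so the order of the lines matters: an object must be defined before it is assigned to a property.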

Dependencies

The following table lists the frameworks and libraries used to implement the AnnoSys software. All libraries are included via the Apache Maven dependency mechanism, which automatically resolves hundreds of further dependencies. Therefore, only the dependencies defined within the Maven pom.xml file and their licences are listed here.

List of AnnoSys dependencies
group                  artifact               version    license
org.apache.activemq    activemq-all           5.8.0      Apache 2.0 licence
org.apache.activemq    activemq-pool          5.8.0      Apache 2.0 licence
org.apache.camel       camel-core             2.12.1     Apache 2.0 licence
org.apache.camel       camel-jms              2.12.1     Apache 2.0 licence
org.apache.camel       camel-activemq         2.12.1     Apache 2.0 licence
org.apache.camel       camel-jaxb             2.12.1     Apache 2.0 licence
org.apache.camel       camel-mail             2.12.1     Apache 2.0 licence
org.apache.camel       camel-core-osgi        2.12.1     Apache 2.0 licence
org.apache.axis        axis                   1.4        Apache 2.0 licence
org.apache.shiro       shiro-core             1.2.3      Apache 2.0 licence
org.apache.shiro       shiro-web              1.2.3      Apache 2.0 licence
org.apache.jena        apache-jena-libs       2.10.1     Apache 2.0 licence
org.xerial             sqlite-jdbc            3.7.15-M1  Public Domain
net.sf.opencsv         opencsv                2.3        Apache 2.0 licence
commons-configuration  commons-configuration  1.9        Apache 2.0 licence
org.jdom               jdom2                  2.0.5      Apache 2.0 style license
com.sun.xsom           xsom                   20110809   Dual license: CDDL 1.0 and GPL v2.0
com.google.code.gson   gson                   2.2.4      Apache 2.0 licence
de.neuland-bfi         jade4j                 0.4.0      MIT license
junit                  junit                  4.11       Common Public License Version 1.0
xmlunit                xmlunit                1.5        BSD License
org.agmip.thirdparty   ximpleware-vtd-xml     2.11       GPL v2.0

Additionally, the following libraries for the Apache Jena Virtuoso driver are included directly, as they are not available as Maven artifacts:

List of AnnoSys dependencies
library           license
virt_jena2.jar    GPL v2.0 style license
virtjdbc4.jar     GPL v2.0 style license

References

  1. Tschöpe, O., Macklin, J. A., Morris, R. A., Suhrbier, L. & Berendsohn, W. G. 2013. Annotating Biodiversity Data via the Internet. – Taxon 62(6): 1248-1258. Available online at http://www.ingentaconnect.com/content/iapt/tax/2013/00000062/00000006/art00011
  2. Sanderson, Robert; Ciccarese, Paolo; Van de Sompel, Herbert. W3C Open Annotation Data Model - Community Draft, 08 February 2013. Retrieved 01 August 2013, from http://www.openannotation.org/spec/core/20130208/index.html.
  3. W3C RDF Working Group, Resource Description Framework (RDF), http://www.w3.org/RDF/, 22 March 2013. Retrieved 16 December 2013
  4. W3C. Extensible Markup Language (XML), 29 October 2013. Retrieved 16 December 2013, from http://www.w3.org/XML/
  5. Holetschek, Jörg, ABCD - Access to Biological Collection Data, http://wiki.tdwg.org/twiki/bin/view/ABCD/, 02 March 2010. Retrieved 16 Dec 2013
  6. W3C SPARQL Working Group. SPARQL 1.1 Overview - W3C Recommendation 21 March 2013. Retrieved 16 December 2013, from http://www.w3.org/TR/sparql11-overview/
  7. Bernhard Haslhofer, Elaheh Momeni, et al. (2011). Europeana RDF store report.
  8. Chris Bizer and Andreas Schultz (2011). BSBM V3 Results (February 2011). Retrieved 02 August 2013 from http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/V6/index.html.
  9. Chris Bizer and Andreas Schultz (2009). Berlin SPARQL Benchmark Results. Retrieved 02 August 2013 from http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/index.html.
  10. The Apache Software Foundation (2013). Apache Jena - TDB. Retrieved 03 August 2013, from http://jena.apache.org/documentation/tdb/.
  11. OpenLink Software. Virtuoso Open-Source Edition. Retrieved 02 August 2013 from http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/.
  12. Anton Güntsch, Walter G. Berendsohn, Pepé Ciardelli, Andrea Hahn, Wolf-Henning Kusber & Jinling Li (2009): Adding content to content – a generic annotation system for biodiversity data. Studi Trent. Sci. Nat. 84: 123-128.
  13. Wieczorek, John, Döring, Markus, De Giovanni, Renato, Robertson, Tim, Vieglais, Dave, Darwin Core, http://rs.tdwg.org/dwc/index.htm (2009), accessed 05 Dec 2011
  14. Sanderson, Robert; Ciccarese, Paolo; Van de Sompel, Herbert (2013). W3C Open Annotation Data Model - Community Draft, 08 February 2013, http://www.openannotation.org/spec/core/, accessed 26 November 2013
  15. Brickley, Dan; Miller, Libby (2010). FOAF Vocabulary Specification 0.98 - Namespace Document 9 August 2010 - Marco Polo Edition, http://xmlns.com/foaf/spec/, accessed 26 November 2013
  16. Hipp, D. Richard. SQLite. Retrieved 16 December 2013 from http://www.sqlite.org/.
  17. Garrett, J. J. (2005, 18 February 2005). "Ajax: A New Approach to Web Applications." from http://www.adaptivepath.com/ideas/ajax-new-approach-web-applications.
  18. The Eclipse Foundation (2013). "Enabling modular business apps for desktop, browser and mobile." Retrieved 26 November, 2013, from http://eclipse.org/rap/.
  19. Rescorla, E. (2000). "HTTP Over TLS" Retrieved 13 December, 2013, from http://tools.ietf.org/html/rfc2818.
  20. The Apache Software Foundation (2013). "Welcome to Apache Shiro." Retrieved 13 December, 2013, from http://shiro.apache.org/.
  21. Oracle. "Java Message Service (JMS)." Retrieved 16 December, 2013, from http://www.oracle.com/technetwork/java/jms-136181.html.
  22. Rich Burridge, Mark Hapner, Rahul Sharma, Joseph Fialli, Kate Stout (2002) Java Message Service.
  23. Apache Software Foundation (2011). "ActiveMQ." Retrieved 16 December 2013, from http://activemq.apache.org/.
  24. Claus Ibsen and Jonathan Anstey (2010). "Camel in Action", Manning.
  25. Pericas-Geertsen, Santiago; Potociar, Marek (2013). "JAX-RS: Java™ API for RESTful Web Services - Version 2.0 Final Release May 22, 2013", Oracle Corporation. Retrieved 16 December 2013, from http://download.oracle.com/otn-pub/jcp/jaxrs-2_0-fr-eval-spec/jsr339-jaxrs-2.0-final-spec.pdf.
  26. TDWG, Welcome to the Globally Unique Identifiers (GUID) Wiki, http://wiki.tdwg.org/GUID (2009), accessed 22 Oct 2012
  27. Richards, Kevin; TDWG GUID Applicability Statement, 09 September 2009, http://www.tdwg.org/standards/150/download/, accessed 19 December 2011
  28. TDWG; LSID Authority Identifications, www.omg.org/cgi-bin/doc?dtc/04-05-01, ???, not accessible 19 December 2011
  29. Pereira, Ricardo; Richards, Kevin; Hobern, Donald; Hyam, Roger; Belbin, Lee; Blum, Stan; TDWG Life Sciences Identifiers (LSID) Applicability Statement, 03 September 2009, http://www.tdwg.org/standards/150/download/, accessed 19 December 2011
  30. Berglund, Anders, Boag, Scott, et al., XML Path Language (XPath) 2.0 (Second Edition) - W3C Recommendation 14 December 2010 (Link errors corrected 3 January 2011), http://www.w3.org/TR/xpath20/ (14 December 2010), accessed 05 Dec 2011
  31. Morris, Paul J., Proposed AOD Extensions to AO, http://wiki.tdwg.org/twiki/bin/view/AnnotationsIG/DataExtensionsToAO (06 July 2011), accessed 05 Dec 2011
  32. Morris, Paul J., TDWG Wiki AnnotationsIG, http://wiki.tdwg.org/AnnotationsIG (2011), accessed 05 Dec 2011
  33. Apache Software Foundation (2012). Apache Solr. Retrieved 26 November 2013 from http://lucene.apache.org/solr/.
  34. The Apache Software Foundation (2014). Apache Jena - A free and open source Java framework for building Semantic Web and Linked Data applications. Retrieved 27 January 2014, from http://jena.apache.org/.
  35. W3C, W3C Community and Business Groups - Open Annotation Community Group, http://www.w3.org/community/openannotation/ (2012), accessed 25 Oct 2012
  36. Nowara, Piotr. decision-ontology - An ontology for representing decisions and decision-making, https://code.google.com/p/decision-ontology/, accessed 27 Jun 2014
  37. Grosso, Paul; Maler, Eve; Marsh, Jonathan; Walsh, Norman. "XPointer Framework - W3C Recommendation 25 March 2003". Retrieved 23 May, 2014, from http://www.w3.org/TR/xptr-framework/.
  38. Fallside, David C.; Walmsley, Prescilla. "XML Schema Part 0: Primer Second Edition - W3C Recommendation 28 October 2004". Retrieved 28 May, 2014, from http://www.w3.org/TR/xmlschema-0/.
  39. The Eclipse Foundation, "eclipse - Jetty". Retrieved 03 September, 2014, from http://www.eclipse.org/jetty/.