Difference between revisions of "TechnicalDocumentation"

From Annotationssystem für Biodiversitätsdaten
Jump to: navigation, search
(Annotations)
(Annotations)
Line 290: Line 290:
 
The storing of annotations in AnnoSys is twofold. First, each user has a personal user profile, which includes a manageable repository holding any unpublished annotations of the given user. Second, there is a public repository, where annotations are moved to whenever a users initiates to publish an annotation.
 
The storing of annotations in AnnoSys is twofold. First, each user has a personal user profile, which includes a manageable repository holding any unpublished annotations of the given user. Second, there is a public repository, where annotations are moved to whenever a users initiates to publish an annotation.
  
Thereby, the user profile repositories are realised using [http://jena.apache.org/documentation/tdb/ Apache Jena TDB]<ref name="APACHE_JENA_TDB"></ref> based triple stores. Whereas, the public annotation repository uses an [http://www.openlinksw.com/dataspace/doc/dav/wiki/Main/ Virtuoso Open-Source Edition]<ref name="VIRTUOSO"></ref> triple store. Nevertheless, both triple stores are implemented in AnnoSys using the [http://jena.apache.org/ Apache Jena]<ref name="APACHE_JENA">The Apache Software Foundation (2014). ''Apache Jena - A free and open source Java framework for building Semantic Web and Linked Data applications.'' Retrieved 27 January 2014, from http://jena.apache.org/.<br/></ref> Software APIs.  
+
Thereby, the user profile repositories are realised using [http://jena.apache.org/documentation/tdb/ Apache Jena TDB]<ref name="APACHE_JENA_TDB"></ref> based triple stores, Whereas the public annotation repository uses an [http://www.openlinksw.com/dataspace/doc/dav/wiki/Main/ Virtuoso Open-Source Edition]<ref name="VIRTUOSO"></ref> triple store. Nevertheless, both triple stores are implemented in AnnoSys using the [http://jena.apache.org/ Apache Jena]<ref name="APACHE_JENA">The Apache Software Foundation (2014). ''Apache Jena - A free and open source Java framework for building Semantic Web and Linked Data applications.'' Retrieved 27 January 2014, from http://jena.apache.org/.<br/></ref> Software APIs.  
  
 
In both kinds of triple stores, annotations are stored according to the [http://www.openannotation.org/spec/core/ W3C Open Annotation Data Model]<ref name="OpenAnnotation_DataModel"></ref> driven by the activities of the [http://www.w3.org/community/openannotation/ W3C Open Annotation Community Group]<ref name="W3COpenAnnotationCommunityGroup">W3C, ''W3C Community and Business Groups - Open Annotation Community Group'', http://www.w3.org/community/openannotation/ (2012), accessed 25 Oct 2012<br/></ref>. Due to some constraints in the current release of the [http://www.openannotation.org/spec/core/ W3C Open Annotation Data Model], the current implementation of the [[#Annotation_Data_Model|Annotation Data Model]] specified for AnnoSys with the means of the [http://www.openannotation.org/spec/core/ W3C Open Annotation Data Model] results from thorough discussions with the [http://www.w3.org/community/openannotation/ W3C Open Annotation Community Group]. Therefore, the following sections details the integration of AnnoSys' [[#Annotation_Data_Model|Annotation Data Model]] with the [http://www.openannotation.org/spec/core/ W3C Open Annotation Data Model].
 
In both kinds of triple stores, annotations are stored according to the [http://www.openannotation.org/spec/core/ W3C Open Annotation Data Model]<ref name="OpenAnnotation_DataModel"></ref> driven by the activities of the [http://www.w3.org/community/openannotation/ W3C Open Annotation Community Group]<ref name="W3COpenAnnotationCommunityGroup">W3C, ''W3C Community and Business Groups - Open Annotation Community Group'', http://www.w3.org/community/openannotation/ (2012), accessed 25 Oct 2012<br/></ref>. Due to some constraints in the current release of the [http://www.openannotation.org/spec/core/ W3C Open Annotation Data Model], the current implementation of the [[#Annotation_Data_Model|Annotation Data Model]] specified for AnnoSys with the means of the [http://www.openannotation.org/spec/core/ W3C Open Annotation Data Model] results from thorough discussions with the [http://www.w3.org/community/openannotation/ W3C Open Annotation Community Group]. Therefore, the following sections details the integration of AnnoSys' [[#Annotation_Data_Model|Annotation Data Model]] with the [http://www.openannotation.org/spec/core/ W3C Open Annotation Data Model].

Revision as of 09:50, 25 March 2014

Contents

Introduction

This document provides the technical documentation of the AnnoSys system. Thereby, the following methodology is used: First, a reference to basic workflow definitions is given. Then, an overview of the system architecture introduces the technical components implemented to realise these workflows. Next, a detailed description of the basic data model and its implementation with the means of the W3C Open Annotation specification is provided. Finally, the specification of technical interfaces or configuration details are provided in the following section for each technical component whenever apropriate.

Basic Workflow

Our publication Annotating Biodiversity Data via the Internet[1] provides a detailed description of the basic workflows elaborated for AnnoSys.

System Architecture

AnnoSys' implementation aims to model the basic workflow and to integrate further requirements identified in [1] as close as possible. Therefore, AnnoSys is composed of the following interacting system components:

  • Repository
  • User Interface
  • Security
  • Message System
  • Services
AnnoSys system architecture

The repository component provides functionalities to archive annotations including the relating original record documents presented to the user at an annotation's creation time. A web based, but desktop oriented user interface supports annotators, curators and users to intuitively manage annotation workflows. The security component is responsible for secure authentication, authorisation and data privacy concerns within the system. The task of the message system component is to organise information flows such as informing annotators, curators or other users about status changes in annotation workflows. Finally, the service component provides interfaces to enable data exchange with external clients or services interacting with the repository.

The following sections elaborate on AnnoSys' system components.

Repository

Within AnnoSys, W3C Open Annotation Data Model (OA)[2] serves as a basis data model to store and exchange annotation data. While OA's data model is defined based on the Resource Description Framework (RDF)[3], record documents are delivered based on Extensible Markup Language (XML)[4] exchange formats like e.g. ABCD[5]. Due to the close relationship of annotations and record documents, both have to be archived within the AnnoSys repository to ensure exact recovery of annotation data at any later point in time. Finally, the repository also comprehends agent(user) profile data and agent authentication and authorisation information.

Annotations

In consideration of the RDF oriented nature of OA based annotation data, AnnoSys uses so called RDF stores (aka. triple stores) to store annotations. These stores can be regarded as databases primarily specializing in storing and retrieving of RDF data. Usually, information stored in RDF stores can be retrieved by queries specified using the SPARQL[6] query language.

Some recent evaluation reports and RDF store benchmarking studies [7][8][9] evaluated individual strengths and weaknesses of different RDF stores. Referring to these evaluation results, AnnoSys decided to use Apache Jena TDB[10] for development and prototype implementations and Virtuoso Open-Source Edition[11] for productional use.

Original Record documents

Referring to the experience gained from the BioCASE annotation system [12] its file system based, hierarchical storage architecture has proofed to be a suitable solution for efficiently storing and accessing XML based original record documents(e.g ABCD, Darwin Core[13]) in a biodiversity context. Following that experience, the current repository implementation stores original record documents on the server's file system in a configurable directory location.

Agent Profiles

The term agent profiles aggregates the following information concerning system agents (users):

  • personal metadata
  • authentication & authorisation data
  • personal annotation repository
  • preferences configuration

While an agent's personal metadata comprises his real name, affiliation and email address only, authentication & authorisation data add login credentials (username, password), roles and a unique agent identifier to the profile. For each registered agent, the following information is stored in the sytem's agent database:

Agent database entries
column description
uid agent's login id
credential agent's password
roles agent's roles
resourceid agent's stable URI
name agent's name
institution agent's institution
mailbox agent's mailbox

Additionally, according to the best practices recommended by the W3C Open Annotation Data Model[14], the personal metadata is also stored based on the FOAF vocabulary[15] together with annotation data within AnnoSys' RDF store. For privacy reasons, agents' email addresses or other sensible information are not publicly accessible in the RDF store.

Agent metadata properties
property definition
rdf:type agent type (foaf:Person, foaf:Organization or dcTypes:Software)
foaf:name agent's name
foaf:member optional list of members (for agent type foaf:Organization only)

AnnoSys maintains a personal annotation repository per registered agent within the system. These repositories keep track of any annotation data created and edited by authenticated agents and are neither publicly accessible nor by other system users. On publication, annotation data will be moved from the publishing agent's personal annotation repository to AnnoSys' publicly accessible RDF store. Annotations may also be manually removed by authenticated agents via the user interface.

The preferences configuration may also be managed by authenticated agents via the user interface. It holds agent individual system configurations like e.g. a list of currently opened documents in the user interface for restoring at the next login.

The agent database is realised as SQLite[16] database. The personal annotation repositories are named according to the agent's resourceId in AnnoSys. The private annotation repositories are created as Apache Jena TDB[10] model in a directory location on the server including any personal agent profiles in subdirectories named according the agent's system resourceid.

User Interface

Providing and managing annotations may become a complex job either to annotators or data curators. For instance, annotators have to gather specimen data, analyse a potentially large number of data objects and annotate or comment specific data elements accurately. Again, data curators have to analyse these annotations, accept or reject them possibly along with individual comments and have to reintegrate them back into their data collections. Further on, any of these activities should be documented and communicated back to annotators and other subscribing members of the community.

As the main focus of both annotators and curators should be on preparing accurate information, a suitable user interface should give advice and guide users through all these workflows. Furthermore, it should allow a clear presentation of information, adoption to individual user preferences and provide convenient guidance and support through the most common workflow activities. This requires modern web user interfaces providing functionality and look and feel of desktop applications. Fortunately, due to recent web toolkit developments based on the Asynchronous JavaScript and XML (Ajax)[17] technology, desktop alike applications can be implemented for the web now as well.

Here, AnnoSys decided to use Eclipse's Remote Application Platform (RAP)[18] for developing a desktop oriented web user interface. The main reason for that decision was that Eclipse RAP imposes only small requirements on client machine's capabilities and does not transmit potentially large documents over the network. Thus, any complex functionality is executed on the server.

Security

The security component provides the following functionalities:

  • agent profile management
  • secure authentication
  • role based authorisation

Agent profile management combines the maintenance of the agent database and personal profile stores for each registered agent(user) see Agent Profiles. Agent profile management also facilitates the agent registration process. The agent registration consists of filling in a form with the required information. After running the registration procedure successfully, agents can immediately login to the system and start creating annotations. The registration procedure may be extended in the future by appropriate measures, if system misuse e.g. by spam bot registrations gets out of control.

Agent authentication is secured by individual agent credentials to be entered at the login dialog. Moreover, any data transmissions initiated through network connections are secured by the Hypertext Transfer Protocol Secure(HTTPS)[19].

Based on the implementation of role based authorisation, AnnoSys may restrict access to certain functionalities based on the roles assigned to a given agent (e.g. curator for a given collection) by system administrators. In particular, AnnoSys defines the following roles:

role definition
none enables examining annotations from the repository
annotator default role for authenticated agents, enabling access to any annotation workflow function
administrator role for system administrators
curator:institutionID:collectionID role for curators of collection collectionID in institution institutionID

In AnnoSys, any security relevant implementations base on the Apache Shiro[20] framework. The agent database is held in a SQLite[16] database.

For security reasons and with respect to data privacy, the agent database, personal profile stores and other system relevant configuration files are stored in a location on the server inaccessible for unauthorised AnnoSys agents or external services.

Message System

Via the message system, AnnoSys aims to inform registered agents and curators about annotation workflow status changes triggered by agent or system interactions. Therefore, the message system forwards notifications related to annotation workflow activities initiated by agents, curators or the system itself. Notifications are dispatched via system internal message queues and are individually manageable by registered agents via the user interface. Additionally, notifications are also send to agents by email.

In particular, the following kinds of annotation publishings cause the message system to trigger notifications regarding annotation workflow status changes:

  • original record annotations published by annotators
  • curator decisions reported by the AnnoSys curator interface
  • version updates of collection records, either detected on repository import processing or reported by curators

On occasion of any annotation workflow status change decribed above, the message system issues notifications according to the following topics agents may subscribe to:

  • subsequent publication of annotations relating to records previously annotated by the given agent (automatically)
  • publication of annotations relating to records an agent actively subscribed to
  • publication of annotations relating to records within a collection an agent actively subscribed to

Furthermore, the message system automatically issues notifications to curators on occasion of any annotation published and related to those records within collections a curator has registered for. As the current implementation does not support notification filters, topic subscriptions include any kind of annotations published. In particular, this includes notifications initiated through curation activities.

Summarising the set of notifications specified above, AnnoSys may issue the following kinds of messages:

message type definition
ANNOTATION_PUBLISHED annotation issued by an annotator
ANNOTATION_ACCEPTED annotation accepted by collection manager
ANNOTATION_PARTLY_ACCEPTED annotation partly accepted by collection manager
ANNOTATION_REJECTED annotation rejected by collection manager

Technically, the message system builds up on the Java Message Service (JMS)[21] standard developed by the Java Community Process and defined within specification JSR 914[22]. Therefore, the JMS provider Apache ActiveMQ[23] is used in tandem with Apache Camel[24] to realise internal and external message transports.

Services

The service component enables external services to access public data from the AnnoSys repository. Therefore, AnnoSys provides the following services and interfaces:

  • RESTful web services
  • Linked Open Data
  • SPARQL Endpoint

Primarily, the RESTful web service interface is intended for collaborating data portals to display information about annotations or original record documents stored in the AnnoSys repository in general, or with regard to a specific record based on its tripleId. Thus, beside requesting lists of all annotations or records available in the repository, external services may also request a list of available annotations relating to a specific record based on its tripleId. These lists contain stable URIs referring to the related annotation or record stored in the repository.

Any of these stable URIs are resolvable, such as either the denoted annotation can be downloaded as RDF/XML document, or the original record as XML document from the AnnoSys repository. That way, the AnnoSys repository and any services are completely following the recommendations of the Linked Open Data paradigm.

Finally, external services may run self-defined SPARQL queries on the AnnoSys annotation repository via the AnnoSys SPARQL endpoint[6]. Any non-anonymous resource in the repository's RDF store is also based on stable URI and thus, follows the Linked Open Data paradigm. RESTful and Linked Open Data services are implemented using the Java API for RESTful Services (JAX-RS)[25]

Annotation Context Model

AnnoSys' annotation context modelling approach founds on the presumption that annotations will be related to any kind of metadata related to physical specimens uniquely identifiable by Globally Unique Identifiers (GUIDs)[26][27]). Beyond that, the accessibility of those record data in an XML document based standard like ABCD[5] is assumed. In the biodiversity domain, there are several use cases like e.g. the annotation of georeferences which require to simultaneously address multiple record data elements (e.g. longitude, latitude) in a single annotation for providing useful information. That is, the annotation context model combines general meta information determining e.g. the annotated record or the annotator's name with an aggregated list of data element selectors. With each of these selectors, an expectation towards the curator of the collection the record relates will be expressed. The expectation may either state to update, add or remove the selected data element within the record. Furthermore, the annotator may propose new or corrected values to the selected data elements and/or may make natural language comments. Based on these information, the curator may decide to either update a collection's database or not.

Therefore, in AnnoSys the annotation context will be defined by

  • A GUID for the specimen's metadata data object (tripleID)
  • An XML representation of the data record the GUID refers to
  • A selection of XML elements (XPath selectors) within the XML representation affected by the annotation

The following sections will introduce the AnnoSys' interpretation and implementation of the above mentioned annotation context model elements.

Globally Unique Identifiers (GUIDs) in AnnoSys

With regard to biodiversity collections, the term GUID initially reflects data records describing physical unit objects in museum collections. As such, a GUID is usually called a triple identifier(tripleId) and is specified by the following data triple:

  • institution identifier (institutionId)
  • collection identifier (collectionId)
  • unit identifier (unitId)

Thereby, the tripleId refers to a specimen object (unit) belonging to a determined collection which is physically located at the given instution (e.g. herbarium). Thus, any TDWG standard like ABCD[5] usually defines data elements reflecting that tripleId.

Basically, any annotation workflows in AnnoSys start by retrieving, accessing and analysing an XML record document. In general, data providers generate these documents on the fly from the data actually stored for the given tripleId in their collection databases and no versioning mechanisms can presumed. Therefore, within the analysing process, the system attempts to detect the tripleId from the downloaded XML record document first. Next, the most recent version of XML record document available from the AnnoSys repository for that tripleId will be compared with the downloaded one. If the comparison result shows any differences between the documents, AnnoSys assumes the issuance of a new data record revision for that tripleId. Thus, the downloaded XML record document has to be added as a new version to the repository, since any changes within the XML record document potentially invalidate annotations created previously.

Therefore, AnnoSys always connects annotations to a specific version of an XML record document. That is, AnnoSys makes use of LSIDs[28][29], which extends the notion of tripleIds by the requested version information.

As (different) data providers may supply the same record data in different data formats (e.g. Darwin Core[13]), annotations can not be correctly reproduced (different XPath selectors), if the standard of the corresponding XML record document has not been recorded. To support that, AnnoSys again extends the LSID specification by adding a defined namespace prefix for each supported XML data standard.

Finally, AnnoSys introduces the following persistent identifier schema to unambiguously distinguish XML record data within the repository:

  • institutionId:collectionId:unitId:version:format
Example:
BGBM:Herbarium+Berolinense:B+18+0014862:1379406965371:abcd2.06b

XML Data Record representations

Currently, AnnoSys supports XML record documents in ABCD 2.06b[5] format only.

Through the generic context selection approach context selection approach, AnnoSys can be adopted to support other XML standards or versions as well.

Annotation Context Selection

Using AnnoSys, annotators may identify several data elements within a record to be worth annotating. Because AnnoSys is designed to support XML record documents, a mechanism to select these data elements had to be developed and implemented. During the evalution phase, it turned out that using XML diff representations or inserting annotated values and/or natural language comments as XML comments into XML record documents results in either cryptic result representations, low performance or conceptual difficulties. Therefore, AnnoSys developed the idea of realising some kind of "recorder" functionality to capture those XML data elements selected by annotators.

For that, XPath XPath[30] expressions provide suitable means to unambiguously identify either single or sets of data elements within XML documents of any record format. In particular, XPath based element selectors can be used to uniquely identify (sets of) data elements like

  • "all elements named "xxx" located below the element "yyy""
  • "all elements named "xxx" with a value "zzz""

Annotation Data Model

The data model outlined within the next sections is centered on the Annotation Context Model developed earlier in this document. So, the annotation data model bases in two interrelated parts:

  • Annotation Metadata Specification
  • Annotated Elements Specification

While the metadata specification provides general meta information (e.g. determining the annotated record or annotator's name), the "recorder" functionality is realised through a list of data elements capturing those data elements selected for annotation by the annotator. These "annotated elements" are "recorded" according to the XPath selector method outlined in section Annotation Context Selection. The following sections will provide more detailed information on both parts.

Annotation Data Model

Annotation Metadata Specification

The annotation metadata provides information regarding the following general aspects

  • annotation creation
  • annotated record
  • annotating agent

Valid annotations must include any data described within the following sections for each aspect separately.

Annotation creation

The annotation creation entails the generation of following data elements:

Motivation
Usually, annotations will be instantiated for well determined reasons. Within extensions[31] proposed by the TDWG Annotations Interest Group[32] the term motivation was introduced to express the intension for which an annotator has made the annotation. Within this document, we will use it synonymously with the term annotation type describing a precompiled set of data element selectors used in typical, domain specific annotation use cases. Currently, AnnoSys supports the following motivations (annotation types):
* Determination
* Duplicate Locality
* Gathering
* Nomenclatural type
* Other
* Sequence
* Record basis
* Scientific name
* Unit Mark
Datetime (of the creation)
The creation's datetime describes the point in time the annotation was published by the system. This time is usually different to the time the annotation was created within the system for user processing. The time's granularity is given in milliseconds.
GUID (of the annotation)
The GUID of the annotation shall uniquely identify any annotation created. In accordance to the GUID interpretation in AnnoSys, the GUID is constructed based on the following components:
institutionID
Organisation running the AnnoSys instance creating the annotation(e.g. "BGBM").
collectionID
Name of the annotation repository (e.g. "AnnoSys").
unitID
Time in milliseconds since 01/01/1970 the annotation was first instantiated by the system
Since annotations are not intended to be modified after publication and W3C Open Annotation Data Model (OA) the only standard supported for annotation exchange, the fields version and format are currently unneeded.

Annotated record

The term annotated record compiles the following information with regard to the XML record document representation and the relating annotated specimen unit:

record GUID
The record GUID identifies the physical specimen unit to which the annotated XML record document refers and is expected to be represented as a tripleId.
record document format
The record document format describes the standard format of the annotated XML record document. This is expected to be specified by the namespace URI of the relating version of the XML standard format (e.g. ABCD v2.06: http://www.tdwg.org/schemas/abcd/2.06).
record document version
The revision information reflects any kind of version information regarding the annotated record document. As we can not expect any revision information from the XML record document, the time in milliseconds since 01/01/1970 when downloading the document first into the AnnoSys repository is used.

Annotating agent

An annotating agent is either a person or a machine conducting the annotation. Conforming to the common practice considering physical specimen objects in botanical museums, only a minimum set of data is captured:

name
The name field includes first name(s) and last name(s) of the annotating person, or the name and version of the software or service used to create the annotation.
institution
The institution denotes the organisation the annotating agent belongs to. If the institution field is empty, the authorisation component will consider the agent as member of the "private taxonomist" group.
email
The email address of the annotating person, or the email address of the annotating machine's administrator. For privacy reasons, this information is not made public.

Annotated Elements Specification

According to the Annotation Context Model described in the previous chapter, the annotated elements represent a list of element selectors each linked with the following information in order to annotate the selected data element in the XML record document:

element selector
is realised according to section element selectors. Regarding the interpretation of selectors, the following special cases apply:
  1. A selector pointing at the XML document's root element ("/") is interpreted as to address the entire data record. In particular, this means is used to express general comments about the referenced data record.
  2. A selector pointing at a data element, for which the relating XML Schema does not allow for values to be entered, does not allow for entering annotated values as well.
annotated value (optional)
enables annotators to propose a new value for the selected data element. The proposed value must correspond to the data type specified in the document's relating XML Schema.
expectation
expresses the expected corrective to be taken by the curator with regard to the annotated data record in the collection database. Within AnnoSys, the following expectations may be expressed:
update
the value of the selected data element should be updated by the annotated value in the annotated data record.
remove
the selected data element should be removed from the annotated data record
add
a new data element according to the element selector should be added to the annotated data record including the given annotated value and/or comment.
comment (optional)
permits annotators to annotate or to make natural language comments for the selected data element. Comments are always expected to be a text string.

While both, annotated value and comment are optional, one of both must be present in order to provide valuable information. That is, annotation elements neither including annotated value nor comment are considered as to be invalid.

Repository

Records

Referring to the experience gained from the BioCASE annotation system [12] its file system based, hierarchical storage architecture has proofed to be a suitable solution for efficiently storing and accessing XML based original record documents (e.g ABCD, Darwin Core) in a biodiversity context. Following that experience, the current repository implementation stores original record documents on the server's file system in a configurable directory location. Using ABCD terms, the following file path creation scheme is used:

  • $RECORD_BASE_DIR/SourceInstitutionID/SourceID/format/UnitID.revision.xml

Thereby, the tripleID (SourceInstitutionID, SourceID and UnitID, see GUIDs in AnnoSys) will be extracted from the individual original record document depending on its data standard (e.g ABCD, Darwin Core). The format part within the file path creation scheme will be determined by a system-wide, predefined mapping scheme assigning an XML namespace prefix to the XML namespace URI for any data standard supported by the system. Finally, the revision part within the file name describes the datetime an original record document was first stored within the repository. According to the respective data elements defined within the [#Data Model|Data Model], the value of revision is specified as the time in milliseconds since 01/01/1970.

Whenever required for some substantial reasons (e.g. query performance), later revisions of AnnoSys may optionally store original record documents in a dedicated XML database or may introduce search engines like Apache Solr[33] for optimising e.g. query performance.

Annotations

The storing of annotations in AnnoSys is twofold. First, each user has a personal user profile, which includes a manageable repository holding any unpublished annotations of the given user. Second, there is a public repository, where annotations are moved to whenever a users initiates to publish an annotation.

Thereby, the user profile repositories are realised using Apache Jena TDB[10] based triple stores, Whereas the public annotation repository uses an Virtuoso Open-Source Edition[11] triple store. Nevertheless, both triple stores are implemented in AnnoSys using the Apache Jena[34] Software APIs.

In both kinds of triple stores, annotations are stored according to the W3C Open Annotation Data Model[14] driven by the activities of the W3C Open Annotation Community Group[35]. Due to some constraints in the current release of the W3C Open Annotation Data Model, the current implementation of the Annotation Data Model specified for AnnoSys with the means of the W3C Open Annotation Data Model results from thorough discussions with the W3C Open Annotation Community Group. Therefore, the following sections details the integration of AnnoSys' Annotation Data Model with the W3C Open Annotation Data Model.

W3C Open Annotation implementation of the AnnoSys Annotation Data Model

Principally, there are two means to implement complex data models with W3C Open Annotation. First, just the basic OA features are used to identify a target (in this case a versioned record document) and to realise annotations by formalising an own ontology to describe the annotations within the body. Second, trying to make use of all W3C OA features in order to express annotations with the means of W3C OA completely. Within AnnoSys, we decided to use the second alternative in order to provide a solution as close as possible to the current standard, and thus to be interpretable in non-biodiversity domains as well.

Therefore, the W3C Open Annotation implementation of the AnnoSys Annotation Data Model introduces new motivations reflecting the annotation types specified in the Annotation Data Model. Also, it uses multiple targets and bodies to implement the annotated elements also specified in the Annotation Data Model. Finally, the oa:hasScope property relation is used to realise a weak, but adequate linking between body and target of an annotation element. Within the next paragraphs provenance, target and body parts of the AnnoSys implementation will be elucidated in more detail.

W3C Open Annotation implementation of the AnnoSys Annotation Data Model

The oa:motivatedBy properties of the provenance part of the AnnoSys implementation extends the predefined class oa:editing to provide classes for the implementation of the AnnoSys motivation use cases (annotation types). As the basic expectation of annotators towards an collection curator is to edit or update the collection database based on the published annotation. So, the following classes have been derived from oa:editing to reflect the defined annotation type:

  • annosys:Determination
  • annosys:DuplicateLocality
  • annosys:Gathering
  • annosys:NomenclaturalType
  • annosys:Other
  • annosys:Sequence
  • annosys:RecordBasis
  • annosys:ScientificName
  • annosys:UnitMark

The remaining properties are straight forward uses of the W3C OA Specification providing the following information:

annotated by
the annotating agent, which is usually a foaf:Person, but may also be represented as prov:Software agent if (mass) annotations were created by a corresponding service.
annotated at
datetime, the annotation was published.
serialised by
the software agent, which has created the annotation in favour of the annotator (e.g. AnnoSys system).
serialised at
datetime, the annotation was first created within the system before it has been published by the annotator.

The annotated elements of an annotation are realised as tuples of target and body each. Due to the Open World Assumption defined wihtin the W3C OA Specification, any target and any body may be reused by any other annotation. Thus, the statement expressed by these resources must always be true. Therefore, body and target of an annotated element can not be "hardlinked", because this contradicts to the Open World Assumption. Here, the oa:hasScope property comes in place, which provides a kind of weak linking and could be interpreted in the following sense:

The target this body is scoped to was examined by the annotator when the annotation (body) was created. In fact, even though the relationship is not hardlinked, it expresses exactly what we require to model the annotated element. The body tracks the expectation, annotated value and the comment created by an annotator when examining a selected data element within a given record identified within the target.

A specific target part of an annotated element is identified through a oa:hasSource, oa:hasState and oa:hasSelector property, which will be described within the next paragaphs.

oa:hasSource
a target's source represents the tripleId of the physical specimen object in a museum (e.g. BGBM:Herbar:0001) which is somehow represented as a dataset.
oa:hasState
the state as oa:TimeState class represents the state of record document related to the target source as presented to the annotator at annotation time.
oa:cachedSource
the cached source represents the record XML document within the AnnoSys repository as it was downloaded from a record data provider. The cached source is stored as stable http-URI, which can be used to retrieve the given document from the AnnoSys repository via the AnnoSys Linked Data services.
oa:when
describes the datetime, when the record document referred to by the cached source property was retrieved and stored within the AnnoSys repository.
oa:hasSelector
oa:FragmentSelector is used here as class type to describe the AnnoSys element selector. In particular, fragment selectors support the XPointer framework, which permits to adequately state namespaces and XPath[30] selectors to identify the elements in the XML record document specified by the oa:hasCachedSource property.
dcterms:conformsTo
namespaceURI of the XPointer specification
rdf:value
XPointer expression selecting an annotated element within the XML record document denoted in the oa:hasCachedSource property

A specific body part of an annotated element is represented by its scope (annotated target) and a body holding the values of an annotated element

oa:hasScope
refers to the specific target resource this specific body resource relates to
oa:hasSource
refers to the body resource holding the values of the annotated element
rdf:type
a class introduced by AnnoSys describing the expectation of the annotator towards the curator of the collection database. The following clases are currently supported:
  • annosys:Update
  • annosys:Remove
  • annosys:Add
dcterms:description
the natural language comment of the annotator with regard to the selected annotated element.
rdf:value
the annotated value proposed by the annotator with regard to the selected annotated element.

In comformance to the W3C Open Annotation Data Model, an oa:Composite element is used to group specific target or body resources, if multiple specific targets and bodies have been used to model multiple annotated elements. If only a single annotated element had been modeled, oa:hasTarget or oa:hasBody would refer to a specific target or body element respectively.

User Interface

TBD

Security

TBD

Message System

TBD

Services

Preliminaries

Within the services documentation, some place holders are used to shorten reading and to adopt it easily to other or changing environments.

${AnnoSysURL} = https://annosys.bgbm.fu-berlin.de/AnnoSys/AnnoSys
Denotes the base URL of the stable and released AnnoSys system.
${ServicesURL} = https://annosys.bgbm.fu-berlin.de/AnnoSys/services
Denotes the base URL of the the stable and released AnnoSys web services.

For testing purposes, the most recent release is available under the following URLs:

${AnnoSysTestURL} = https://annosys.bgbm.fu-berlin.de/AnnoSysTest/AnnoSys
Denotes the base URL of the AnnoSys Test system.
${ServicesTestURL} = https://annosys.bgbm.fu-berlin.de/AnnoSysTest/services
Denotes the base URL of the AnnoSys Test system web services.

Please use the ${AnnoSysTestURL} or ${ServicesTestURL} if you are testing the integration of AnnoSys with your application!

Currently, only ABCD2.06 based XML data records are supported ! (namespacePrefix "abcd2.06b")

Annotations are stored and will be delivered on request in RDF according to our implementation of the W3C Open Annotation Data Model described here.

The AnnoSys - Quick User Guide provides a basic introduction using the AnnoSys user interface.

Note: The AnnoSys software and services are currently beta releases and may have errors !. Please report errors to the AnnoSys Project Team.

Integrating AnnoSys with Data Portals

AnnoSys provides the following types of interfaces for integration with Data Portals or other applications

  • Invoking the AnnoSys user interface
  • Retrieving record or annotation related information from AnnoSys web services

User interface invocation

The user interface can be invoked either to redirect users to the AnnoSys search interface or to enable users to annotate a data record currently reviewed in the data portal. The search interface can be invoked by redirecting web browsers to ${AnnoSysURL}.

To enable users to annotate a data record, the relating XML document must be transferred to the AnnoSys repository. After successfully transferring the document, the user will be redirected to the AnnoSys user login/registration dialog first and subsequently to the AnnoSys Annotation Editor. This can be done either by providing a URL, where the document can be downloaded by AnnoSys directly, or by providing a set of parameters describing how AnnoSys can download the data record from a BioCASE provider.

Note
The values for any parameters described in the next sections MUST be URL encoded individually(!) in order to be correctly transmitted via the URL to AnnoSys !

Download via direct URL

The URL referring to the record data document can be passed to AnnoSys via the parameter recordURL, i.e.

recordURL
The parameter should contain the URL of the document to be downloaded.
Example: https://annosys.bgbm.fu-berlin.de/AnnoSys/AnnoSys?recordURL=http://ww2.biocase.org/svn/annotation/original/03666bc0-f0f4-11d8-b22f-b8a03c50a862/abcd2.06/BGBM/Bridel%20Herbar/Bridel-1-12.xml

Download via BioCASE Provider

The BioCASE provider and the data record to be retrieved by AnnoSys must be passed via the following parameter set

providerURL
The base URL of the BioCASE provider (e.g. http://ww3.bgbm.org/biocase/pywrapper.cgi?dsa=Herbar&).
protocolURI
The namespace URI of the protocol used by the BioCASE provider (e.g. http://www.biocase.org/schemas/protocol/1.3).
formatURI
The namespace URI of the document format to be retrieved (e.g. http://www.tdwg.org/schemas/abcd/2.06).
institution
The institution (lsid:authority) part of the tripleId describing the record (e.g. BGBM).
source
The source (lsid:namespace) part of the tripleId describing the record (e.g. Herbarium Berolinense).
unitID
The unitID (lsid:objectId) part of the tripleId describing the record (e.g. B 20 0145120).
Example: https://annosys.bgbm.fu-berlin.de/AnnoSys/AnnoSys?providerURL=http%3A%2F%2Fww3.bgbm.org%2Fbiocase%2Fpywrapper.cgi%3Fdsa%3DHerbar%26&protocolURI=http%3A%2F%2Fwww.biocase.org%2Fschemas%2Fprotocol%2F1.3&formatURI=http%3A%2F%2Fwww.tdwg.org%2Fschemas%2Fabcd%2F2.06&institution=BGBM&source=Herbarium%20Berolinense&unitID=B%2020%200145120

Opening Annotation View or Annotation Editor from http-reference URI

The http-reference URI corresponds to the resource id used to retrieve annotation data as RDF or XML record documents from the repository through AnnoSys' Linked Open Data Services described in section Web Services.

The URI referring repository data object can be passed to AnnoSys via the parameter repositoryURI, i.e.

recordURI
The parameter must contain a valied URI dereferencing an annotation or XML record in the AnnoSys repository.
Example: https://annosys.bgbm.fu-berlin.de/AnnoSys/AnnoSys?repositoryURI=https%3A%2F%2Fannosys.bgbm.fu-berlin.de%2FAnnoSys%2Fservices%2Fannotations%2FBGBM%2FAnnoSys%2F1378396890596

Information Retrieval

Information related to a given record triple id can be retrieved via AnnoSys RESTful web services. Currently, the following record related information can be retrieved:

  • Annotations

Returns, if the AnnoSys repository knows the record of the given triple id and, if there are annotations, also the number of annotations the given record. For more detailed information, see Request: GET ${ServicesURL}/records/<lsid:authority>/<lsid:namespace>/<lsid:objectId>/annotations.

Web Services

AnnoSys provides two kinds of web services.

  • Linked Open Data (LOD) services
  • RESTful services

The Linked Open Data services provide access to resources referring to data stored in the AnnoSys repository. Therewith, annotations can be retrieved as RDF data, and the relating record documents as XML documents.

The RESTful services provide access to other information related to annotations or records, like if there are annotations stored in the repository for a given record.

Additional services may be implemented on request.

The next sections will provide detailed information regarding the provided services.

Annotations

The general context path for annotations is ${ServicesURL}/annotations.

The annotation id is constructed analogously to tripleIds(or LSIDs) by our institution(lsid:authority), our source(lsid:namespace) and an annotation id(lsid:objectId) (e.g. /BGBM/AnnoSys/123456789).

All annotation requests return an either an rdf graph containing the annotations as described in our annotation model or a JSON annotation object as described in section JSON Annotation Object.

The following sections will describe possible requests and answers.

Note
The values for any parameters described in the next sections MUST be URL encoded individually(!) in order to be correctly interpreted by AnnoSys Web Services !

JSON Annotation Object

An JSON Annotation Object contains the following information about an annotation which is intended for being displayed at a data portal user interface:

repositoryURI
URI of the annotation within the AnnoSys repository
annotator
name of the annotating agent
time
annotation's publication time in milliseconds since 01 January 1970
motivation
the annotation's motivation or type

Request: GET ${ServicesURL}/annotations

Returns a JSON object containing a list of URLs referring to to all available annotations in the AnnoSys repository.

size
number of annotations
annotations
list of JSON Annotation Objects
Example: https://annosys.bgbm.fu-berlin.de/AnnoSys/services/annotations
{
  "size": 2,
  "annotations": [
     {"repositoryURI":"https://annosys.bgbm.fu-berlin.de/AnnoSys/services/annotations/BGBM/AnnoSys/1378396890596","annotator":"Lutz Suhrbier","time":1378396910419,"motivation":"ScientificName"},
     {"repositoryURI":"https://annosys.bgbm.fu-berlin.de/AnnoSys/services/annotations/BGBM/AnnoSys/1378396868015","annotator":"Lutz Suhrbier","time":1378396919770,"motivation":"Determination"}
   ]
}

Request GET ${ServicesURL}/annotations/BGBM/AnnoSys/<annotationId>

Returns the annotation with the given annotationId as RDF, if it exists. Otherwise, HTTP-Status 404 is returned if no such annotation exists.

Example: https://annosys.bgbm.fu-berlin.de/AnnoSys/services/annotations/BGBM/AnnoSys/1378396890596

Records

The general context path for records is ${ServicesURL}/records.

Records are identified within the AnnoSys repository by an extended LSID, including LSID data plus a prefix describing the document format. Thus, the path to be appended to the general records context path is build up on the tripleId, plus a timestamp stating lsid:version and an AnnoSys predefined namespacePrefix identifying the record's document format. Currently, only abcd2.06b is supported.

Example: /BGBM/Herbarium Berolinense/B -W 00400 -00 0/3534524354/abcd2.06b

Request: GET ${ServicesURL}/records/<lsid:authority>/<lsid:namespace>/<lsid:objectId>/<lsid:version>/<annosys:formatPrefix>

Returns the record document for the given extended LSID identifier. Otherwise, HTTP-Status 404 is returned if no such record exists.

Example: https://annosys.bgbm.fu-berlin.de/AnnoSys/services/records/BGBM/Herbarium+Berolinense/B+18+0014862/1379406965371/abcd2.06b

Request: GET ${ServicesURL}/records/<lsid:authority>/<lsid:namespace>/<lsid:objectId>/annotations

Returns a JSON object containing information about all annotations referring to the record in the AnnoSys repository according to the given record tripleId. In particular, this includes annotations created for all record versions stored in the AnnoSys record repository.

Otherwise, HTTP-Status 404 is returned if no such record exists.

record
URI of the most recent record for the given tripleId.
hasAnnotation
true or false.
size
number of annotations found related to the most recent record.
annotations
list of JSON Annotation Objects.
Example: https://annosys.bgbm.fu-berlin.de/AnnoSys/services/records/BGBM/Herbarium+Berolinense/B+18+0014862/annotations
{ 
  "record" : "https://annosys.bgbm.fu-berlin.de/AnnoSys/services/records/BGBM/Herbarium+Berolinense/B+18+0014862/1379406965371/abcd2.06b"
  "hasAnnotation: true,
  "size": 3,
  "annotations": [
     {"repositoryURI":"https://annosys.bgbm.fu-berlin.de/AnnoSys/services/annotations/BGBM/AnnoSys/1379917872836","annotator":"okka","time":1379918049923,"motivation":"Determination"},
     {"repositoryURI":"https://annosys.bgbm.fu-berlin.de/AnnoSys/services/annotations/BGBM/AnnoSys/1379918822364","annotator":"okka","time":1379918844336,"motivation":"NomenclaturalType"},
     {"repositoryURI":"https://annosys.bgbm.fu-berlin.de/AnnoSys/services/annotations/BGBM/AnnoSys/1379920327486","annotator":"okka","time":1379920373944,"motivation":"Gathering"}
   ]
}

Request: GET ${ServicesURL}/records/<lsid:authority>/<lsid:namespace>/<lsid:objectId>/<lsid:version>/<annosys:formatPrefix>/annotations

Same as before, but retrieves all annotations referring to the given version and format of the record.

SPARQL Endpoint

AnnoSys also provides a SPARQL endpoint enabling external services or applications to run self-defined queries against the AnnoSys annotation repository. The SPARQL endpoint is driven by Virtuoso Open Source.

The URL of the AnnoSys SPARQL endpoint is: https://annosys.bgbm.fu-berlin.de/AnnoSys/sparql.

References

  1. 1.0 1.1 Tschöpe, O., Macklin, J. A., Morris, R. A., Suhrbier, L. & Berendsohn, W. G. 2013. Annotating Biodiversity Data via the Internet. – Taxon 62(6): 1248-1258. Online available at http://www.ingentaconnect.com/content/iapt/tax/2013/00000062/00000006/art00011
  2. Sanderson, Robert; Ciccarese, Paolo; Van de Sompel, Herbert. W3C Open Annotation Data Model - Community Draft, 08 February 2013. Retrieved 01 August 2013, from http://www.openannotation.org/spec/core/20130208/index.html.
  3. W3C RDF Working Group, Resource Description Framework (RDF), http://www.w3.org/RDF/, 22 March 2013. Retrieved 16 December 2013
  4. W3C. Extensible Markup Language (XML), 29 October 2013. Retrieved 16 December 2013, from http://www.w3.org/XML/
  5. 5.0 5.1 5.2 5.3 Holetschek, Jörg, ABCD - Access to Biological Collection Data, http://wiki.tdwg.org/twiki/bin/view/ABCD/, 02 March 2010. Retrieved 16 Dec 2013
    Cite error: Invalid <ref> tag; name "TDWG_ABCD" defined multiple times with different content
  6. 6.0 6.1 W3C SPARQL Working Group. SPARQL 1.1 Overview - W3C Recommendation 21 March 2013. Retrieved 16 December 2013, from http://www.w3.org/TR/sparql11-overview/
  7. Bernhard Haslhofer, Elaheh Momeni, et al. (2011). Europeana RDF store report.
  8. Chris Bizer and Andreas Schultz (2011). BSBM V3 Results (February 2011). Retrieved 02 August 2013 from http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/V6/index.html.
  9. Chris Bizer and Andreas Schultz (2009). Berlin SPARQL Benchmark Results. Retrieved 02 August 2013 from http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/index.html.
  10. 10.0 10.1 10.2 The Apache Software Foundation (2013). Apache Jena - TDB. Retrieved 03 August 2013, from http://jena.apache.org/documentation/tdb/.
  11. 11.0 11.1 OpenLink Software. Virtuoso Open-Source Edition. Retrieved 02 August 2013 from http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/.
  12. 12.0 12.1 Anton Güntsch, Walter G. Berendsohn, Pepé Ciardelli, Andrea Hahn, Wolf-Henning Kusber & Jinling Li (2009): Adding content to content – a generic annotation system for biodiversity data. Studi Trent. Sci. Nat. 84: 123-128.
  13. 13.0 13.1 Wieczorek, John, Döring, Markus, De Giovanni, Renato, Robertson, Tim, Vieglais, Dave, Darwin Core, http://rs.tdwg.org/dwc/index.htm (2009), accessed 05 Dec 2011
  14. 14.0 14.1 Sanderson, Robert; Ciccarese, Paolo; Van de Sompel, Herbert (2013). W3C Open Annotation Data Model - Community Draft, 08 February 2013, http://www.openannotation.org/spec/core/, accessed 26 November 2013
  15. Brickley, Dan; Miller, Libby (2010). FOAF Vocabulary Specification 0.98 - Namespace Document 9 August 2010 - Marco Polo Edition, http://xmlns.com/foaf/spec/, accessed 26 November 2013
  16. 16.0 16.1 Hipp, D. Richard. SQLite. Retrieved 16 December 2013 from [http://www.sqlite.org/.
  17. Garrett, J. J. (2005, 18 February 2005). "Ajax: A New Approach to Web Applications." from http://www.adaptivepath.com/ideas/ajax-new-approach-web-applications.
  18. The Eclipse Foundation (2013). "Enabling modular business apps for desktop, browser and mobile." Retrieved 26 November, 2013, from http://eclipse.org/rap/.
  19. Rescorla, E. (2000). "HTTP Over TLS" Retrieved 13 December, 2013, from http://tools.ietf.org/html/rfc2818.
  20. The Apache Software Foundation (2013). "Welcome to Apache Shiro." Retrieved 13 December, 2013, from http://shiro.apache.org/.
  21. Oracle. "Java Message Service (JMS)." Retrieved 16 December, 2013, from http://www.oracle.com/technetwork/java/jms-136181.html.
  22. Rich Burridge Mark Hapner, Rahul Sharma, Joseph Fialli, Kate Stout (2002) Java Message Service.
  23. Apache Software Foundation (2011). "ActiveMQ." Retrieved 16 December 2013, from http://activemq.apache.org/.
  24. Claus Ibsen and Jonathan Anstey (2010). "Camel in Action", Manning.
  25. Pericas-Geertsen, Santiago; Potociar, Marek (2013). "AX-RS: Java™ API for RESTful Web Services - Version 2.0 Final Release May 22, 2013", Oracle Corporation. Retrieved 16 December 2013, from http://download.oracle.com/otn-pub/jcp/jaxrs-2_0-fr-eval-spec/jsr339-jaxrs-2.0-final-spec.pdf.
  26. TDWG, Welcome to the Globally Unique Identifiers (GUID) Wiki, http://wiki.tdwg.org/GUID (2009), accessed 22 Oct 2012
  27. Richards, Kevin; TDWG GUID Applicability Statement, 09 September 2009, http://www.tdwg.org/standards/150/download/, accessed 19 December 2011
  28. TDWG; LSID Authority Identifications, www.omg.org/cgi-bin/doc?dtc/04-05-01, ???, not accessible 19 December 2011
  29. Pereira, Ricardo; Richards, Kevin; Hobern, Donald, Hyam, Roger; Belbin, Lee; Blum,Stan;TDWG Life Sciences Identifiers (LSID) Applicability Statement, 03 September 2009, http://www.tdwg.org/standards/150/download/, accessed 19 December 2011
  30. 30.0 30.1 Berglund, Anders, Boag, Scott, et al., XML Path Language (XPath) 2.0 (Second Edition) - W3C Recommendation 14 December 2010 (Link errors corrected 3 January 2011), http://www.w3.org/TR/xpath20/ (14 December 2010), accessed 05 Dec 2011
  31. Morris, Paul J., Proposed AOD Extensions to AO, http://wiki.tdwg.org/twiki/bin/view/AnnotationsIG/DataExtensionsToAO (06 July 2011), accessed 05 Dec 2011
  32. Morris, Paul J., TDWG Wiki AnnotationsIG, http://wiki.tdwg.org/AnnotationsIG (2011), accessed 05 Dec 2011
  33. Apache Software Foundation (2012). Apache Solr. Retrieved 26 November 2013 from http://lucene.apache.org/solr/.
  34. The Apache Software Foundation (2014). Apache Jena - A free and open source Java framework for building Semantic Web and Linked Data applications. Retrieved 27 January 2014, from http://jena.apache.org/.
  35. W3C, W3C Community and Business Groups - Open Annotation Community Group, http://www.w3.org/community/openannotation/ (2012), accessed 25 Oct 2012