Difference between revisions of "Projektrelevante domänenspezifische Infrastrukturen - BGBM"

=== Workflow Environments ===
 
=== Summary ===
 
 
Several workflow management systems target the scientific community, most notably Taverna [Oinn2002] and Kepler [Altintas2004]. Both are open source and have been under development for several years. They are oriented toward data-driven computations and provide specialized primitives for data-flow control and data transformation that facilitate data exchange between computation tasks [Yildiz2009] [Lin2014]. While these specialized features may be convenient for data-driven applications, they offer no particular advantage for fulfilling the StanDAP-Herb requirements. Additionally, no standard notation is supported across multiple systems.
 
Another interesting project is Argo [rak:2012], described as “a workbench for analyzing (primarily annotating) textual data”. Argo relies on the UIMA standard [Ferrucci2004] to support interoperability between processing components. Users who develop UIMA-based components can deposit them on the system, and it is also possible to develop Argo clients that interact with the system through web services. Argo's main focus is the curation of biomedical literature, and it is currently in beta. Argo is suited to creating pipelines of text analysis tasks, but it lacks primitives to express the complex workflows required to cover the whole digitization process.
 
Since the aforementioned systems offer no clear advantage for supporting the digitization process, the more generic BPMN notation [WhiteBPMN], recognized as an ISO standard [ISO/IEC 19510:2013], was chosen to design the StanDAP-Herb workflow for processing herbarium specimens. A number of BPMN engines, including open source projects such as Activiti [activiti2015], provide the run-time support to execute and manage workflow instances.
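To illustrate what a workflow in BPMN notation looks like in its XML form, the following minimal sketch generates a linear BPMN 2.0 process with Python's standard library. The process id, the task names and the target namespace are purely hypothetical placeholders and do not reflect the actual StanDAP-Herb workflow; only the BPMN model namespace and the element names (definitions, process, startEvent, task, sequenceFlow, endEvent) are taken from the standard.

```python
import xml.etree.ElementTree as ET

# BPMN 2.0 model namespace as defined by the standard.
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"


def build_process(process_id, task_names):
    """Return a BPMN 2.0 <definitions> element holding one linear process."""
    ET.register_namespace("bpmn", BPMN_NS)
    definitions = ET.Element(
        f"{{{BPMN_NS}}}definitions",
        {"targetNamespace": "http://example.org/herbarium"},  # placeholder
    )
    process = ET.SubElement(
        definitions,
        f"{{{BPMN_NS}}}process",
        {"id": process_id, "isExecutable": "false"},
    )
    # Node ids in execution order: start event, tasks, end event.
    task_ids = [f"task_{i}" for i in range(1, len(task_names) + 1)]
    node_ids = ["start"] + task_ids + ["end"]
    ET.SubElement(process, f"{{{BPMN_NS}}}startEvent", {"id": "start"})
    for task_id, name in zip(task_ids, task_names):
        ET.SubElement(process, f"{{{BPMN_NS}}}task", {"id": task_id, "name": name})
    ET.SubElement(process, f"{{{BPMN_NS}}}endEvent", {"id": "end"})
    # One sequence flow connects each node to its successor.
    for i, (source, target) in enumerate(zip(node_ids, node_ids[1:]), start=1):
        ET.SubElement(
            process,
            f"{{{BPMN_NS}}}sequenceFlow",
            {"id": f"flow_{i}", "sourceRef": source, "targetRef": target},
        )
    return definitions


# Hypothetical task names, for illustration only.
definitions = build_process(
    "specimenProcessing",
    ["Capture image", "Run OCR", "Extract label data", "Curate record"],
)
bpmn_xml = ET.tostring(definitions, encoding="unicode")
print(bpmn_xml)
```

A real engine such as Activiti would additionally need executable task types (for example service or user tasks with an implementation) before such a definition could be deployed.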
 
 
*(Altintas2004) Altintas, I., Jaeger, E., Lin, K., Ludaescher, B. and Memon, A., A Web Service Composition and Deployment Framework for Scientific Workflows, IEEE International Conference on Web Services (ICWS), IEEE Computer Society, 2004, pp. 814
 
 
*(Ferrucci2004) Ferrucci, D. and Lally, A., UIMA: an architectural approach to unstructured information processing in the corporate research environment, Natural Language Engineering, Cambridge Univ Press, 2004, Vol. 10(3-4), pp. 327-348
 
 
*(Lin2014) Lin, Y., Mougenot, I. and Libourel, T., Method and components for creating scientific workflow, 2014 IEEE 30th International Conference on Data Engineering Workshops (ICDEW), 2014, pp. 147-153
 
 
*(Oinn2002) Oinn, T., Greenwood, M., Addis, M. J., Alpdemir, M. N., Ferris, J., Glover, K., Goble, C., Goderis, A., Hull, D., Marvin, D. and others, Taverna: lessons in creating a workflow environment for the life sciences, Journal of Concurrency and Computation: Practice and experience, John Wiley & Sons Ltd, 2002
 
 
*(rak:2012) Rak, R., Rowley, A., Black, W. and Ananiadou, S., Argo: an integrative, interactive, text mining-based workbench supporting curation, Database: The Journal of Biological Databases and Curation, 2012, Vol. 2012
 
 
*(Yildiz2009) Yildiz, U., Guabtni, A. and Ngu, A., Business versus Scientific Workflows: A Comparative Study, 2009 World Conference on Services – I, 2009, pp. 340-343
 
 
*(WhiteBPMN) White, S. A., Introduction to BPMN, http://www.omg.org/bpmn/Documents/Introduction_to_BPMN.pdf
 
 
*(activiti2015) Activiti BPM Platform, http://activiti.org/
 
 
 
  
 
''Taverna''

Revision as of 14:47, 3 September 2015

Project-relevant infrastructures (biodiversity informatics)

Biodiversity networks, infrastructures

GBIF - Global Biodiversity Information Facility

URL: http://www.gbif.org/

Description: GBIF is an international network that provides free access to biodiversity data over the Internet. More: http://www.gbif.org/whatisgbif [fuzzy matching integrated]


BioCASe (European collection network)

URL: http://www.biocase.org/

Description: The Biological Collection Access Service for Europe, BioCASE, is a transnational network of biological collections of all kinds. BioCASE provides access to distributed, heterogeneous European collection and observation databases and consistently uses platform-independent open-source software as well as open data standards and exchange protocols.

BioCASe Portal: http://search.biocase.org/europe/

BioCASe technology: The term "BioCASe" is often used to refer to the technologies developed within the BioCASE project, in particular the BioCASe protocol and the BioCASe Provider Software. These technologies make it possible to connect a database of any structure to primary biodiversity networks such as BioCASE or GBIF.

BioCASe Provider Software Wiki: http://wiki.bgbm.org/bps/index.php/Main_Page


ALA - Atlas of Living Australia

URL: http://www.ala.org.au/about-the-atlas/

Description: The Atlas was initiated by a group of 14 (now 17) organisations, our partners. The intent was to create a national database of all of Australia’s flora and fauna that could be accessed through a single, easy to use web site. [fuzzy matching integrated]

ALA Downloadable tools: http://www.ala.org.au/about-the-atlas/downloadable-tools/

ALA Webservices: http://api.ala.org.au/


Canadensys

URL: http://www.canadensys.net/

Description: Canadensys makes biodiversity information freely and openly available to everyone. We are a network of researchers, collectors, curators, information technologists, students, and educators that shares data on the occurrence and identity of plant, animal, and fungal species in Canada. Member of GBIF.

Web service page: http://data.canadensys.net/vascan/api


BHL - Biodiversity Heritage Library

URL Developer Tools and API: http://biodivlib.wikispaces.com/Developer+Tools+and+API

Description: The Biodiversity Heritage Library (BHL) is a consortium of natural history and botanical libraries that cooperate to digitize the legacy literature of biodiversity held in their collections and to make that literature available for open access and responsible use as a part of a global “biodiversity commons.” The BHL consortium works with the international taxonomic community, rights holders, and other interested parties to ensure that this biodiversity heritage is made available to a global audience through open access principles.




Domain-specific knowledge / dictionaries

Scientific names

International Plant Names Index – IPNI

URL: http://www.ipni.org/

Description: Database of the names and associated basic bibliographical details of seed plants, ferns and lycophytes. Continuously updated. - Plants

Contact: ipnieditors@ipni.org (for downloads of more than 5,000 records)


Web Service for Matching IPNI names

URL: (beta version) http://data1.kew.org/reconciliation/

URL: (biodiversity catalogue) https://www.biodiversitycatalogue.org/services/84

Description (Matthew Blisset, Kew): The software is flexible and can use any kind of data; the first service we have released is for matching to IPNI names. It is built using a sequence of transformations applied to both the list of authoritative names and the query, for each part of the name. These transformations should provide better matches than some other services which just use Levenshtein distance etc. For example, it ignores double letters or a changed Latin ending.

The service is exposed in three ways:

  • An OpenRefine (Google Refine) Reconciliation Service
  • A custom API which is a bit simpler to use
  • A batch upload of a CSV file

I've concentrated on the OpenRefine method.

I've also implemented a few bits of the "Metaweb Query Language" API on ThePlantList, which allows an OpenRefine extension to query ThePlantList using an IPNI id, and retrieve information held by TPL for that name.
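The transformation-based matching described above for the IPNI service can be sketched as follows. This is not Kew's implementation: the two transformations (collapsing doubled letters, dropping a Latin ending), the list of endings and the example names are assumptions chosen only to illustrate the idea of normalising both the authoritative list and the query before comparing them.

```python
import re

# Hypothetical, deliberately small set of interchangeable Latin endings.
LATIN_ENDINGS = ("us", "um", "is", "e", "a")


def normalise_part(word):
    """Lower-case a name part, collapse doubled letters, drop a Latin ending."""
    word = word.lower()
    word = re.sub(r"(.)\1+", r"\1", word)  # "littoralis" -> "litoralis"
    for ending in LATIN_ENDINGS:
        # Only strip the ending if a stem of at least three letters remains.
        if word.endswith(ending) and len(word) - len(ending) >= 3:
            return word[: -len(ending)]
    return word


def normalise_name(name):
    """Apply the transformations to each part of the name."""
    return tuple(normalise_part(part) for part in name.split())


def match(query, authoritative_names):
    """Return all authoritative names whose normalised form equals the query's."""
    key = normalise_name(query)
    return [name for name in authoritative_names if normalise_name(name) == key]


# Illustrative authority list, not an extract from IPNI.
names = ["Carex littoralis", "Bellis perennis", "Carex flava"]
print(match("Carex litorale", names))
print(match("Carex flavus", names))
```

In contrast to a plain edit-distance threshold, such rules only tolerate the kinds of variation that are typical for scientific names, which is the advantage the description claims for this approach.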



TROPICOS

URL: http://www.tropicos.org/Home.aspx

Description: ‘Tropicos® was originally created for internal research but has since been made available to the world’s scientific community. All of the nomenclatural, bibliographic, and specimen data accumulated in MBG’s electronic databases during the past 25 years are publicly available here. This system has over 1.2 million scientific names and 4.0 million specimen records.’ Search portal for the databases of the Missouri Botanical Garden, USA. Searchable by, among others, scientific names, persons and places. - Plants

TROPICOS Web Services: http://services.tropicos.org/

Contact: Missouri Botanical Garden, http://www.tropicos.org/Feedback.aspx?feedbackoption=4


WCSP Kew's World Checklist

URL: http://apps.kew.org/wcsp/home.do

Description: WCSP is an international collaborative programme that provides the latest peer reviewed and published opinions on the accepted scientific names and synonyms of selected plant families. It allows you to search for all the scientific names of a particular plant, or the areas of the world in which it grows (distribution). The checklist includes 173 Seed Plant families. Different families are in different stages of review as indicated in the family list. There are currently more than 155 contributors from 22 countries. - Plants

Contact: Rafaël Govaerts, email: R.Govaerts@kew.org


Catalogue of Life Service

URL: https://www.biodiversitycatalogue.org/services/17

Description: This web service endpoint serves as a search engine for scientific name-related taxonomic information.


Euro+Med Plantbase

URL: http://www.emplantbase.org/home.html

Description: The Euro+Med PlantBase provides an on-line database and information system for the vascular plants of Europe and the Mediterranean region, against an up-to-date and critically evaluated consensus taxonomic core of the species concerned. The Euro+Med PlantBase is part of the Pan-European Species directories Infrastructure (PESI). - Plants

Contact: http://www.emplantbase.org/contacts.html


Australian Plant Name Index – APNI

URL: http://www.anbg.gov.au/apni/index.html

Description: APNI is a tool for the botanical community that deals with plant names and their usage in the scientific literature. Maintained by the Australian National Botanic Gardens as part of its larger IBIS database, in collaboration with the Centre for Australian National Biodiversity Research and the Australian Biological Resources Study. - Plants


GRIN – Taxonomy of Plants

URL: http://www.ars-grin.gov/cgi-bin/npgs/html/index.pl?language=en

Description: Taxonomic data in GRIN provide the structure and nomenclature for the accessions of the National Plant Germplasm System (NPGS), part of the National Genetic Resources Program (NGRP) of the Agricultural Research Service (ARS) of the United States Department of Agriculture (USDA). All families and genera of plants and 52,095 species from throughout the world, especially economic plants and their relatives, are represented in GRIN Taxonomy for Plants. Records include scientific and common names, classification, distribution, references and information on economic uses. - Plants


PESI

URL: http://www.eu-nomen.eu/pesi/

Description: PESI provides standardised and authoritative taxonomic information by integrating and securing Europe’s taxonomically authoritative species name registers and nomenclators (name databases) and associated expert(ise) networks that underpin the management of biodiversity in Europe.

Web service page: http://www.eu-nomen.eu/portal/webservices.php [fuzzy matching integrated]


Species 2000 / Catalogue of Life

URL: http://www.sp2000.org/sp2kwebsite/index.php?option=com_content&task=view&id=40&Itemid=49

Contact: sp2000@sp2000.org

Description: Online database of the world's known species of animals, plants, fungi and micro-organisms. CoL (Catalogue of Life): starting point for the GBIF taxonomic backbone.

Web service page: http://webservice.catalogueoflife.org/col/webservice


GNI - Global Names Index

URL: http://gni.globalnames.org/

Description: Index of name catalogues, including uBio Name Bank, ITIS, EOL, GBIF and IPNI. 'GNI is a collection of strings (combinations of characters) that have been used as names for organisms. GNI contains many examples of names spelled in slightly different ways. In order to be able to link all of the information about any taxon, a query beginning with one string will find data associated with any of the alternative strings. This is done by linking the alternative names for the same species - a process called 'reconciliation'. GNI is a component of the Global Names Architecture, an effort that is building a names-based cyberinfrastructure for biology (Patterson, D. J., Cooper, J., Kirk, P. M., Pyle, R. L. and Remsen, D. P. 2010. Names are key to the big new biology. TREE 25: 686-691).'

Global Names Index API: http://www.biodiversitycatalogue.org/services/61


GBIF taxonomic backbone

URL: http://www.gbif.org/dataset/d7dddbf4-2cf0-4f39-9b2a-bb099caae36c

Description: The GBIF Backbone Taxonomy, often called the Nub taxonomy, is a single synthetic management classification with the goal of covering all names GBIF is dealing with. It's the taxonomic backbone that allows GBIF to integrate name based information from different resources, no matter if these are occurrence datasets, species pages, names from nomenclators or external sources like EOL, Genbank or IUCN. This backbone allows taxonomic search, browse and reporting operations across all those resources in a consistent way and to provide means to crosswalk names from one source to another. It is updated regularly through an automated process in which the Catalogue of Life acts as a starting point also providing the complete higher classification above families.
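As a hedged illustration of how a name could be checked against the backbone programmatically, the sketch below builds a request for GBIF's public species-match web service. The endpoint URL and the parameters `name` and `strict` are not mentioned on this page; they are assumptions based on the GBIF REST API and should be verified against the current API documentation before use. The function `match_name` needs network access and is therefore only defined here, not called.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint of the GBIF species-match service.
GBIF_MATCH_ENDPOINT = "https://api.gbif.org/v1/species/match"


def build_match_url(name, strict=False):
    """Build the request URL for matching one scientific name."""
    query = urllib.parse.urlencode({"name": name, "strict": str(strict).lower()})
    return f"{GBIF_MATCH_ENDPOINT}?{query}"


def match_name(name, timeout=10):
    """Fetch the match result as a dict (requires network access)."""
    with urllib.request.urlopen(build_match_url(name), timeout=timeout) as response:
        return json.load(response)


url = build_match_url("Bellis perennis")
print(url)
```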


ITIS – Integrated Taxonomic Information System

URL: http://www.itis.gov/

Description: ITIS, the Integrated Taxonomic Information System, provides authoritative taxonomic information on plants, animals, fungi, and microbes of North America and the world. ITIS is a partnership of U.S., Canadian, and Mexican agencies (ITIS-North America); other organizations; and taxonomic specialists. ITIS is also a partner of Species 2000 and the Global Biodiversity Information Facility (GBIF). The ITIS and Species 2000 Catalogue of Life (CoL) partnership provides the taxonomic backbone to the Encyclopedia of Life (EOL).


Common names

OpenUp common names webservice

Description: Service developed at the University of Vienna that enriches scientific names with common names in different languages. The service is already used by Europeana. Further information in the newsletter: http://open-up.eu/sites/open-up.eu/files/Newletter4_PRNT_0.pdf


Australian Common Name Database

URL: http://www.anbg.gov.au/common.names/

Description: Database of common names of Australian plants

Maintained by: Integrated Botanical Information System (IBIS), Australian National Herbarium

Language: English


Standardliste der Farn- und Blütenpflanzen Deutschlands

Description: German names as given in: Rolf Wisskirchen, Henning Haeupler: Standardliste der Farn- und Blütenpflanzen Deutschlands. Mit Chromosomenatlas. Published by the Bundesamt für Naturschutz (= Die Farn- und Blütenpflanzen Deutschlands. Band 1). Eugen Ulmer, Stuttgart (Hohenheim) 1998, ISBN 3-8001-3360-1.

Language: German

Persons (collectors, authors)

Index Herbariorum

URL: http://sciweb.nybg.org/science2/IndexHerbariorum.asp

Description: List of collector names and specialists

Contact: http://sciweb.nybg.org/science2/Contacts.asp.html


International Plant Names Index – IPNI

URL: http://www.ipni.org/

Description: Author names

Contact: ipnieditors@ipni.org (for downloads of more than 5,000 records)


Index of Botanists (Collectors)

Description: Harvard Index of Botanists (search filter ‘Collector’)

URL: http://kiki.huh.harvard.edu/databases/botanist_index.html

Contact: http://huh.harvard.edu/pages/contact


TROPICOS

URL: http://www.tropicos.org/Home.aspx

Description: ‘Tropicos® was originally created for internal research but has since been made available to the world’s scientific community. All of the nomenclatural, bibliographic, and specimen data accumulated in MBG’s electronic databases during the past 25 years are publicly available here. This system has over 1.2 million scientific names and 4.0 million specimen records.’ Search portal for the databases of the Missouri Botanical Garden, USA. Searchable by, among others, scientific names, persons and places.

TROPICOS Web Services: http://services.tropicos.org/

Contact: Missouri Botanical Garden, http://www.tropicos.org/Feedback.aspx?feedbackoption=4


Australian Plant Collectors and Illustrators 1780s-1980s

URL: http://www.anbg.gov.au/bot-biog/index.html

Description: Australian plant collectors and illustrators

Source: This web site is based on the list published by J.H. Willis, D. Pearson, M.T. Davis, and J.W. Green, Western Australian Herbarium Research Notes Number 12, August 1986. That original list has been supplemented by additional entries and some updates of dates, especially where people have died since that publication. It has been further supplemented with information from Alex George's 2009 publication Australian Botanist's Companion, published by Four Gables Press, WA.


Cyclopedia of Malesian Collectors

URL: http://www.nationaalherbarium.nl/fmcollectors/Home.htm


Botanical Collectors: Africa (Natural History Museum, London)

URL: http://www.plantcollectors.co.uk/


Botanical Collectors: Latin America (Natural History Museum, London)

URL: http://www.plantcollectors.co.uk/LAPI.asp?


List of herbarium institutions

Index Herbariorum

URL: http://sciweb.nybg.org/science2/IndexHerbariorum.asp

Description: List of herbaria and their official abbreviations - unique institution IDs

Contact: http://sciweb.nybg.org/science2/Contacts.asp.html


Geographic information, gazetteers

Getty Thesaurus

URL: http://www.getty.edu/research/tools/vocabularies/tgn/index.html


Geonames, the United States Board on Geographic Names

URL: http://geonames.usgs.gov/


JRC fuzzy gazetteer

URL: http://dma.jrc.it/services/fuzzyg


Further vocabularies, glossaries, abbreviations

Abkürzungen und Symbole in der biologischen Nomenklatur

Description: Abbreviations and phrases from the nomenclature and taxonomy of zoology, botany, cultivated plants, viruses and bacteria are listed alphabetically. Most are explained by means of examples.

Publication: Wolfgang Granzow (2000): Abkürzungen und Symbole in der biologischen Nomenklatur. Senckenbergiana lethaea 80 (2) 355 – 370.

Download pdf: http://ashipunov.info/jurassic/j/Granzow,%202000_Nomenklatur.pdf


Terms Used in Bionomenclature

Description: This text is a comprehensive glossary of over 2,100 terms used in biological nomenclature - the naming of whole organisms of all kinds. It is accompanied by a web application that enables the glossary to facilitate semantic linking on the web.

Download pdf: http://www.gbif.org/resources/2647

Integration of thesauri, terminology servers

GfBio Terminology Server

Description: The Terminology Server links external and internal vocabularies (controlled vocabularies, glossaries, thesauri, ontologies) on the topic of biodiversity. It was developed within the GfBio project (http://www.gfbio.org/).

URL: http://terminologies.gfbio.org/



TOQE – Thesaurus optimized query expander

Description: Integration of thesauri via a service interface

Documentation: http://search.biocase.org/toqe/

Publication: http://journals.ku.edu/index.php/jbi/article/view/1631/3472
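The general idea of thesaurus-based query expansion can be sketched as follows. This is not TOQE's actual interface: the in-memory thesaurus, its relation names (`synonyms`, `narrower`) and the function `expand_query` are hypothetical and only show how a single search term can be widened to related terms before a query is sent to a data provider.

```python
# Hypothetical in-memory thesaurus: term -> related terms.
THESAURUS = {
    "Asteraceae": {"synonyms": ["Compositae"], "narrower": ["Bellis", "Taraxacum"]},
    "Bellis": {"synonyms": [], "narrower": ["Bellis perennis"]},
}


def expand_query(term, thesaurus, include_narrower=True):
    """Return the term plus its synonyms and, transitively, narrower terms."""
    expanded = []
    queue = [term]
    while queue:
        current = queue.pop(0)
        if current in expanded:
            continue  # guard against cycles and duplicates
        expanded.append(current)
        entry = thesaurus.get(current, {})
        queue.extend(entry.get("synonyms", []))
        if include_narrower:
            queue.extend(entry.get("narrower", []))
    return expanded


terms = expand_query("Asteraceae", THESAURUS)
print(terms)
```

A search for the family name would then also retrieve records filed under the older family name or under any of the listed genera and species.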


Handwriting collections (digitized)

Handwriting collections of well-known authors available online


Chirographicum historicum

URL: http://harvest.cals.ncsu.edu/chiro/about.html

Contact: http://harvest.cals.ncsu.edu/chiro/contact.html, North Carolina State University


Auxilium ad Botanicorum Graphicem

URL: http://www.ville-ge.ch/musinfo/bd/cjb/auxilium/index.php

Contact: http://www.ville-ge.ch/musinfo/bd/cjb/auxilium/contactus.php, Geneva Herbarium


Handwriting Linnean Herbarium

URL: http://linnaeus.nrm.se/botany/fbo/hand/welcome.html.en

Contact: Swedish Natural History Museum, Linnean Herbarium


Global Plants Initiative Designation Identifier

URL: http://gpi.myspecies.info/digitising-resources/designation-identifier

Contact: Global Plants Initiative


CALIGRAFÍAS del Herbario MA, Madrid

URL: http://www.floraiberica.es/caligrafia/index.php



Digitization

Literature

A very good overview of developments in mass digitization and digitization workflows: Vladimir Blagoderov & Vincent Smith (2012): No specimen left behind: mass digitization of natural history collections

URL: http://www.pensoft.net/journals/zookeys/issue/209/

This issue includes, among others:

  • Description of a modular digitization workflow in Edinburgh:

Haston & al. (2012): Developing integrated workflows for the digitisation of herbarium specimens using a modular and scalable approach. ZooKeys 209: 93–102, doi: 10.3897/zookeys.209.3121


  • Workflow for carrying out mass digitization at the Natural History Museum, London:

Blagoderov & al. (2012): No specimen left behind: industrial scale digitization of natural history collections. ZooKeys 209: 133–146, doi: 10.3897/zookeys.209.3178


  • Mass digitization at Naturalis Biodiversity Center, Leiden, Netherlands:

van den Oever & Gofferjé (2012): ‘From Pilot to production’: Large Scale Digitisation project at Naturalis Biodiversity Center. ZooKeys 209: 87–92, doi: 10.3897/zookeys.209.3609


  • On the digitization centre of the natural history museum in Joensuu, Finland (Digitarium):

Tegelberg & al. (2012): The development of a digitising service centre for natural history collections. ZooKeys 209: 75–86, doi: 10.3897/zookeys.209.3119


  • Methods for increasing the efficiency of herbarium digitization at the herbarium of the New York Botanical Garden, in particular ‘Strategy 3: semi-automated approach’, which uses tools for semi-automatic information extraction from labels such as ‘SALIX3’ and ‘Apiary’ (see also information extraction) and duplicate detection with ‘Specify’ (see also collection management software):

Tulig & al. (2012): Increasing the efficiency of digitization workflows for herbarium specimens. ZooKeys 209: 103–113, doi: 10.3897/zookeys.209.3125


  • Documentation and comparison of digitization workflow components and protocols of 28 programs in 10 US museums and academic institutions, including a description of information extraction using OCR software tools and software for retrospective georeferencing. Conclusion: “There is significant interest in natural language processing (NLP), which is designed to parse OCR text into fields, as well as intelligent character recognition (ICR) or handwriting analysis, but effective systems for using these technologies to extract data from biological specimens were not observed.”:

Nelson & al. (2012): Five task clusters that enable efficient and effective digitization of biological collections. ZooKeys 209: 19–45, doi: 10.3897/zookeys.209.3135


  • Mobilization and linking of biological multimedia data in the Europeana portal (OpenUp! project), in particular ‘Data quality control’ and ‘Semantic enrichment’, i.e. linking scientific names with common names (vernacular names provided by the Natural History Museum Vienna):

Berendsohn & Güntsch (2012): OpenUp! Creating a cross-domain pipeline for natural history data. ZooKeys 209: 47–54, doi: 10.3897/zookeys.209.3179


  • Testing of digitization workflows at the Royal Botanic Garden Edinburgh:

Robyn E. Drinkwater, Robert W. N. Cubey, Elspeth M. Haston (2014): The use of Optical Character Recognition (OCR) in the digitisation of herbarium specimen labels. PhytoKeys 38: 15–30, doi: 10.3897/phytokeys.38.7168

Address: Royal Botanic Garden Edinburgh, 20a Inverleith Row, Edinburgh, EH3 5LR, UK

Contact: Elspeth M. Haston (e.haston@rbge.org.uk)
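Several of the papers listed above discuss parsing OCR text from specimen labels into database fields. The following minimal sketch shows the simplest, rule-based form of that step using regular expressions. It is not taken from any of the cited tools; the field names, the patterns and the sample label (including the collector name) are invented for illustration, and real labels are far more variable than these rules assume.

```python
import re

# Hypothetical patterns for three label fields; real labels need many more rules.
FIELD_PATTERNS = {
    "collector": re.compile(r"(?:leg\.|coll\.)\s*(?P<value>[^\n,;]+)", re.IGNORECASE),
    "date": re.compile(r"(?P<value>\b\d{1,2}\.\s?\d{1,2}\.\s?\d{4}\b)"),
    "number": re.compile(r"(?:No\.|Nr\.)\s*(?P<value>\d+)", re.IGNORECASE),
}


def parse_label(ocr_text):
    """Extract the fields whose pattern matches; unmatched fields are omitted."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        found = pattern.search(ocr_text)
        if found:
            fields[name] = found.group("value").strip()
    return fields


# Invented sample label text, as it might come out of an OCR engine.
label = """Flora von Brandenburg
Bellis perennis L.
Wiese bei Potsdam
leg. A. Beispiel, 12.5.1903  No. 417"""

record = parse_label(label)
print(record)
```

Rules of this kind break down quickly on handwritten or unusually formatted labels, which matches the conclusion quoted above that effective automatic extraction was not yet observed in practice.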


Collection management software

Overview and some examples


Overview

Overview of collection management systems in the GfBio biowikifarm: http://gfbio.biowikifarm.net/wiki/Technical_documentation_of_collection_management_systems_at_the_GFBio_collection_archives


JACQ

Documentation: http://jacq.nhm-wien.ac.at/dokuwiki/doku.php?id=export_documentation

Code: http://sourceforge.net/p/jacq/legacy/

Web interface: http://herbarium.univie.ac.at/database/search.php


BG-BASE

URL: http://www.bg-base.com/users.htm

Description: BG-BASE is a PC-based database application written primarily to handle the information management needs of institutions and individuals holding living and/or preserved collections of biological material, including botanic gardens, arboreta, zoos, herbaria, museums, libraries, university campuses, horticultural societies and private collections.

Support: Proprietary product. BG-BASE is developed and supported from two international centers, one in the US and the other in the UK.

Contact: http://www.bg-base.com/contact.htm


Specify

URL: http://specifysoftware.org/

Contact: email: specify@ku.edu, University of Kansas

Description: Specify is a database software application for museum and herbarium research data. It manages species and specimen information for computerizing collections, tracking museum specimen transactions, linking images to specimen records, and publishing catalog data to the Internet. Specify is written in Java. More: http://specifysoftware.org/

Download: http://specifysoftware.org/download/

Source code: http://specifysoftware.org/sourceforge/


BRAHMS

URL: http://herbaria.plants.ox.ac.uk/bol/

Contact: http://herbaria.plants.ox.ac.uk/bol/brahms/Home/Contact

Description: Management system working with and integrating data and images from specimens, botanical surveys, field observations, living collections, seed banks and literature. More: http://herbaria.plants.ox.ac.uk/bol/brahms/Software


Digitization projects / initiatives

Integrated Digitized Biocollections – iDigBio

Description: National Resource for Advancing Digitization of Biodiversity Collections (ADBC) funded by the National Science Foundation (USA). Through ADBC, data and images for millions of biological specimens are being made available in electronic format for the research community, government agencies, students, educators, and the general public.

The website hosts a wiki with many useful pointers, in particular ‘iDigBio Data Ingestion’ and ‘Digitisation Resources’, e.g. Workflows and Protocols, Georeferencing Resources. Very interesting material can be found under ‘Working Groups / aOCR (Augmenting OCR)’, see ‘OCR related Materials’.

URL: http://www.idigbio.org/ and http://www.idigbio.org/wiki/index.php/Digitization_Resources


Digitarium, Finland

URL: http://www.digitarium.fi/content/statistics/

Description: Digitization centre of the Finnish Museum of Natural History; mass digitization with an industrial approach


Picturae / Naturalis, Netherlands

Description: Picturae is a Dutch company that digitized the collection specimens of the Naturalis natural history museum in Leiden, Netherlands. For this purpose a workflow was developed that enables the mass digitization of herbarium specimens, among other things through the use of “Digistreets”.

URL: http://picturae.com/digitising/herbarium-sheets

URL digitization at Naturalis, Leiden: http://science.naturalis.nl/en/collection/digitization/digitization/


MNHN Paris, France

URL: http://collections.mnhn.fr/wiki/attach/Visit_October2012/Paris-Herbarium-Digitization_2012-07-12.pdf (PowerPoint slides)

Description: Mass digitization of the entire Paris herbarium using industrial workflows


Next Generation Phenomics for the Tree of Life, USA

URL: http://avatol.org/ngp/

Description: The Next Generation Phenomics project seeks to develop and adapt tools to assemble large phenomic datasets in a rapid and automated way. This project consists of computer vision, natural language processing, and crowdsourcing components. The Computer Vision (CV) team is developing methods that automate the extraction and annotation of phenomic characters from digital images using computer learning approaches. The new CV algorithms can discern the presence/absence of features and assess their spatial relationships and appearance. The Natural Language Processing (NLP) group is developing software to transform digitized taxonomical descriptions into taxon/character matrices for phylogenetic analyses. Also, because microbial descriptions often differ radically from those of other organisms, the NLP group is developing supervised learning strategies to extract phenomic characters from microbial descriptions. Finally, the crowdsourcing team has developed software, The Evolution Project, that works with MorphoBank to present images of character states to crowds for scoring.


Annotation systems

AnnoSys

URL: http://annosys.bgbm.fu-berlin.de/

URL technical documentation: http://wiki.bgbm.org/annosys/index.php?title=TechnicalDocumentation#Services_2

Description: The project's objective is to develop, as an exemplar, a specification for an annotation data repository for networked and highly complex biodiversity data. AnnoSys is based on a prototype developed in the context of SYNTHESYS and uses the Open Annotation Data Model and an RDF database for storing the information. AnnoSys is implemented using the example of collection and observation data in the botanic domain provided by the GBIF/BioCASE system (currently over 50.8 million records, including 15 million records from natural history collection objects).
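To give an impression of what an annotation in the Open Annotation Data Model looks like as RDF, the sketch below assembles the core triples of one annotation and prints them in N-Triples syntax. It does not reproduce the AnnoSys schema: the three example IRIs are hypothetical, and only the vocabulary terms `oa:Annotation`, `oa:hasTarget`, `oa:hasBody`, `oa:motivatedBy` and the motivation `oa:editing` are taken from the Open Annotation vocabulary.

```python
OA = "http://www.w3.org/ns/oa#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def annotation_triples(annotation_iri, target_iri, body_iri, motivation="editing"):
    """Return the core triples linking an annotation to its target and body."""
    return [
        (annotation_iri, RDF_TYPE, OA + "Annotation"),
        (annotation_iri, OA + "hasTarget", target_iri),
        (annotation_iri, OA + "hasBody", body_iri),
        (annotation_iri, OA + "motivatedBy", OA + motivation),
    ]


def to_ntriples(triples):
    """Serialise IRI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical IRIs for an annotation, the annotated specimen record and the
# resource holding the proposed correction.
triples = annotation_triples(
    "http://example.org/annotation/1",
    "http://example.org/specimen/B-0000001",
    "http://example.org/annotation/1/body",
)
ntriples = to_ntriples(triples)
print(ntriples)
```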


Filtered Push

URL: http://wiki.filteredpush.org/wiki/

Description: We are designing and implementing a network, which we term Filtered Push, to connect remote sites where annotations can be generated with the authoritative databases of the collections holding the vouchers to which those annotations apply.

Standards and codes of practice

Standards

TDWG - Biodiversity Information Standards

TDWG develops biodiversity standards and forms working groups on various topics.

URL: http://www.tdwg.org/

Description: Biodiversity Information Standards (TDWG), also known as the Taxonomic Databases Working Group, is a not for profit scientific and educational association that is affiliated with the International Union of Biological Sciences. TDWG was formed to establish international collaboration among biological database projects. TDWG promotes the wider and more effective dissemination of information about the World's heritage of biological organisms for the benefit of the world at large. Biodiversity Information Standards (TDWG) now focuses on the development of standards for the exchange of biological/biodiversity data.


ABCD - Access to Biological Collections Data (TDWG Standard)

URL: http://www.tdwg.org/standards/115/

Description: The Access to Biological Collections Data (ABCD) Schema is an evolving comprehensive standard for the access to and exchange of data about specimens and observations (a.k.a. primary biodiversity data). The ABCD Schema attempts to be comprehensive and highly structured, supporting data from a wide variety of databases. It is compatible with several existing data standards. Parallel structures exist so that either (or both) atomised data and free-text can be accommodated. Version 1.2 is currently in use with the GBIF (Global Biodiversity Information Facility) and BioCASE (Biological Collection Access Service for Europe) networks. Apart from the GBIF and BioCASE networks, the potential for the application of ABCD extends to internal networks, or in-house legacy data access (e.g. datasets from external sources that shall not be converted and integrated into an institution's own data, but be kept separately, though easily accessible). By defining relations between terms, ABCD is a step towards an ontology for biological collections.

ABCD concepts: http://wiki.tdwg.org/twiki/bin/view/ABCD/AbcdConcepts

ABCD 2.06 Schema: http://www.bgbm.org/TDWG/CODATA/Schema/ABCD_2.06/HTML/ABCD_2.06.html
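The idea of parallel structures for atomised data and free text, as mentioned in the ABCD description above, can be illustrated with a small data structure. The class and field names below are hypothetical and are not ABCD element names; the sketch only shows how one piece of information (here a collecting date) can be held verbatim, in atomised form, or both.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GatheringDate:
    """A collecting date kept as free text, as atomised parts, or both."""

    text: Optional[str] = None   # verbatim label text, e.g. "Mai 1903"
    year: Optional[int] = None   # atomised parts, filled where known
    month: Optional[int] = None
    day: Optional[int] = None

    def is_atomised(self):
        return self.year is not None

    def display(self):
        """Prefer an ISO-style date from atomised parts, else the free text."""
        if self.year is None:
            return self.text or ""
        parts = [f"{self.year:04d}"]
        if self.month is not None:
            parts.append(f"{self.month:02d}")
            if self.day is not None:
                parts.append(f"{self.day:02d}")
        return "-".join(parts)


verbatim_only = GatheringDate(text="Mai 1903")
both = GatheringDate(text="12.5.1903", year=1903, month=5, day=12)
print(verbatim_only.display())
print(both.display())
```

Keeping the verbatim text next to the atomised values means that a provider can publish records before every field has been interpreted, without losing what is written on the label.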


Darwin Core (TDWG Standard)

URL: http://www.tdwg.org/standards/450/

Description: The Darwin Core is a body of standards. It includes a glossary of terms (in other contexts these might be called properties, elements, fields, columns, attributes, or concepts) intended to facilitate the sharing of information about biological diversity by providing reference definitions, examples, and commentaries. The Darwin Core is primarily based on taxa, their occurrence in nature as documented by observations, specimens, and samples, and related information. Included are documents describing how these terms are managed, how the set of terms can be extended for new purposes, and how the terms can be used.

Used by: GBIF


TaxonX - systematics literature markup standard

URL: http://www.tdwg.org/biodiv-projects/projects-database/view-project/512/

Description: TaxonX is an XML schema for encoding taxonomic treatments to: create open, persistent full digital surrogates of treatments; identify treatments and their major structures; identify textual data such as names, localities, characters, and citations.


Audubon Core - Audubon Core Multimedia Resources Metadata Standard

URL: http://terms.tdwg.org/wiki/Audubon_Core

Description: The Audubon Core metadata schema ("AC") is a representation-neutral metadata vocabulary for describing biodiversity-related multimedia resources and collections.


EML - Ecological Metadata Language

URL: http://knb.ecoinformatics.org/#external//emlparser/docs/index.html

Description: Ecological Metadata Language (EML) is a metadata specification developed by the ecology discipline and for the ecology discipline. It is based on prior work done by the Ecological Society of America and associated efforts (Michener et al., 1997, Ecological Applications). EML is implemented as a series of XML document types that can be used in a modular and extensible manner to document ecological data. Each EML module is designed to describe one logical part of the total metadata that should be included with any ecological dataset.

Specification: http://knb.ecoinformatics.org/#external//emlparser/docs/eml-2.1.1/index.html


Guidelines, Rules of Practice, Directives

Metamorfoze Preservation Imaging Guidelines (with three quality levels), Netherlands

URL: http://www.metamorfoze.nl/sites/metamorfoze.nl/files/publicatie_documenten/Metamorfoze_Preservation_Imaging_Guidelines_1.0.pdf


DFG Praxisregeln "Digitalisierung" (DFG Practical Guidelines on Digitisation)

URL: http://www.dfg.de/formulare/12_151/index.jsp


GBIF - Best Practice Guide for Georeferencing

URL: http://www.gbif.org/resources/2809


MaNIS Georeferencing Guidelines

URL: http://manisnet.org/GeorefGuide.html


Tools for Information Extraction / Text Mining

Software that supports automatic information extraction

General

National Centre for Text Mining, Manchester UK

URL: http://www.nactem.ac.uk/DID-MIBIO/

Description: The National Centre for Text Mining (NaCTeM) is the first publicly-funded text mining centre in the world. We provide text mining services in response to the requirements of the UK academic community. NaCTeM is operated by the University of Manchester.


Argo Project - A web-based text mining workbench

Description: Argo is a workbench for building and running text-analysis solutions. It facilitates the development of custom workflows from a selection of elementary analytics.

URL: http://argo.nactem.ac.uk/




Taxon Names

GBIF Name Finder

Code URL: https://github.com/silverbiology/gbif-namefinder

Description: Name Finder is a Java implementation, compliant with the GNA Name Finding API and based on Lucene, for finding scientific names in arbitrary text documents.


GBIF Name Parser

URL: http://tools.gbif.org/nameparser/

API documentation URL: http://tools.gbif.org/nameparser/api.do

Description: The parser is written in Java and uses regular expressions to dissect name strings into their components. It keeps only the name parts required to reconstruct a full three-part name with an optional subgenus, and ignores additional infraspecific parts such as the subspecies given for varieties.
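
The regular-expression approach described above can be illustrated with a minimal sketch. The pattern, the function name parse_name and the returned field names are assumptions made for this illustration; they are not taken from the GBIF Name Parser, which is written in Java and handles far more cases (authorships, hybrids, multiple infraspecific ranks).

```python
import re
from typing import Dict, Optional

# Hypothetical pattern for illustration only; not the GBIF Name Parser's regex.
NAME_PATTERN = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)"                  # capitalised genus
    r"(?:\s+\((?P<subgenus>[A-Z][a-z]+)\))?"    # optional subgenus in parentheses
    r"(?:\s+(?P<species>[a-z-]+))?"             # optional species epithet
    r"(?:\s+(?:(?P<rank>subsp\.|var\.|f\.)\s+)?(?P<infraspecies>[a-z-]+))?"
)


def parse_name(name: str) -> Optional[Dict[str, str]]:
    """Split a scientific name string into its parts, or return None."""
    match = NAME_PATTERN.match(name.strip())
    if match is None:
        return None
    # Keep only the parts that were actually present in the string.
    return {key: value for key, value in match.groupdict().items() if value}


if __name__ == "__main__":
    print(parse_name("Abies alba subsp. apennina Brullo"))
    print(parse_name("Bombus (Pyrobombus) pratorum"))
```

This sketch keeps at most one infraspecific part and ignores anything after it, such as the authorship. It would misread a lowercase author particle (for example "de") as an epithet, which is one reason a production parser needs a much larger rule set.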


Global Names

URL: http://www.silverbiology.com/projects/opensource/

URL: http://gnrd.globalnames.org/

Description: Global Names recognition and discovery tools (GNRD) and services make it easy to find scientific names on web pages, PDFs, Microsoft Office documents, images, or in freeform text. Encrypted or image-based PDFs and image files first pass through an OCR routine using Tesseract prior to using the excellent TaxonFinder and NetiNeti names discovery engines. The language of incoming content is determined using unsupervised language detection. If the language is other than English, TaxonFinder is preferred. Found names can be optionally resolved against a number of resources.


TaxonFinder

URL: http://taxonfinder.org/about

Description: TaxonFinder detects scientific names in plain text. Given a string, it scans through the contents and uses a dictionary-based approach to identify which words and strings are Latin scientific organism names. It detects names at all ranks, including species, genera, subspecies and more.

More info: https://github.com/pleary/node-taxonfinder


GBIF Checklist Bank

Code URL: https://github.com/silverbiology/gbif-checklistbank

Description: Checklist Bank serves as a dynamic archive of "checklists," which are summarized lists of taxa or taxon names. Checklist Bank stores checklist data as it was provided by the data publisher. In addition, Checklist Bank attempts to collate different published checklist resources by tying the atomic elements of checklists, taxon names, to a common names dictionary.


SPECIES

URL: http://species.jensenlab.org/

Description: A standalone command line application capable of identifying taxonomic mentions in documents and mapping them to corresponding NCBI Taxonomy database entries.

Publication: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0065390


EU BON Taxonomic Backbone

URL: http://cybertaxonomy.eu/eu-bon/utis/

Description: The Unified Taxonomic Information Service (UTIS) is the taxonomic backbone for the EU BON project. The EU BON taxonomic backbone allows running a federated search on multiple European checklists and returns a unified result set of the individual responses of the various checklists. The current implementation of the UTIS is still a prototype, which means that the API and data model may change until the final release. It connects the web services of the Pan-European Species directories Infrastructure (PESI), the Catalogue of Life (CoL) and the World Register of Marine Species (WoRMS). In future it will connect more data providers, such as EUNIS and Natura2000, in order to be compliant with the INSPIRE directive. Currently it is possible to search for taxa and synonyms by a scientific name or vernacular name string. In the case of matching synonyms, the corresponding accepted taxon is resolved. The search results always include information on the classification and on related taxa, insofar as this data is delivered by the connected checklist providers.

Documentation: http://cybertaxonomy.eu/eu-bon/utis/doc.html


Taxamatch - fuzzy matching algorithm for genus and species scientific names

URL: Taxamatch Developers' Wiki: https://wiki.csiro.au/display/taxamatch/Home

Description: TAXAMATCH employs both phonetic and non-phonetic matching (to detect errors of either type, or both) along with a set of heuristic rules that are incorporated into pre- and post-filters at both genus and species epithet level. In the main, the pre-filters maximise algorithm efficiency by ensuring that only a subset of available names have to be tested, while the post-filters apply heuristic reasoning to distinguish likely "true" from "false" near matches, although they may have the same calculated similarity.
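
The combination of pre-filter, phonetic and non-phonetic comparison, and post-filter can be sketched roughly as follows. This is not the TAXAMATCH algorithm: the plain Levenshtein distance, the crude phonetic_key normalisation, the length-based pre-filter and the first-letter post-filter are all simplified stand-ins chosen for this illustration, and every function name is hypothetical.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (a stand-in for TAXAMATCH's own measure)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def phonetic_key(word: str) -> str:
    """Crude phonetic normalisation: a few spelling variants, no doubled letters."""
    key = (word.lower().replace("ae", "e").replace("oe", "e")
           .replace("ph", "f").replace("y", "i"))
    collapsed: List[str] = []
    for char in key:
        if not collapsed or collapsed[-1] != char:
            collapsed.append(char)
    return "".join(collapsed)


def near_matches(query: str, candidates: List[str],
                 max_distance: int = 2) -> List[Tuple[str, int, bool]]:
    """Return (name, edit distance, phonetic match?) for plausible near matches."""
    results = []
    query_key = phonetic_key(query)
    for candidate in candidates:
        # Pre-filter: a large length difference rules out a close match cheaply.
        if abs(len(candidate) - len(query)) > max_distance:
            continue
        distance = edit_distance(query.lower(), candidate.lower())
        phonetic = phonetic_key(candidate) == query_key
        if not phonetic and distance > max_distance:
            continue
        # Post-filter (assumed heuristic): non-phonetic matches must share
        # the first letter with the query.
        if not phonetic and candidate[:1].lower() != query[:1].lower():
            continue
        results.append((candidate, distance, phonetic))
    # Phonetic matches first, then by increasing edit distance.
    return sorted(results, key=lambda item: (not item[2], item[1]))


if __name__ == "__main__":
    print(near_matches("Quercus", ["Fagus", "Quercas", "Quercus"]))
    print(near_matches("Phylostachys", ["Phyllostachys"]))
```

The pre-filter runs before the comparatively expensive distance calculation, which mirrors the efficiency argument in the description. The post-filter shows how two candidates with the same distance can still be treated differently.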



Geographic Information

Tools for (semi-)automatic georeferencing of locality information


Geolocate (Silver Biology)

URL: http://www.silverbiology.com/projects/opensource/

Description: Geolocate is a platform for georeferencing natural history collections data.


BioGeoMancer

Description: The Java BioGeomancer Core API is used for georeferencing localities (example: 5 Miles West of Berkeley). No longer maintained.

Code: http://code.google.com/p/biogeomancer-core/wiki/WebServicesScope

License: Apache 2.0


GeoLoc-Cria

URL: http://www.cria.org.br/eventos/iaed/amarino_pre.html

Description: Allows a user to determine the geocode and associated error for a locality that is at a fixed distance and direction from a known locality. The database has approximately 110 thousand names of Brazilian geocoded localities.
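
The underlying calculation, finding the point at a fixed distance and direction from a known locality, can be sketched with the standard destination-point formula on a sphere. This is an illustration under stated assumptions, not the GeoLoc implementation: the spherical Earth radius, the eight-point compass table and the function name offset_point are choices made here, and the associated error that the tool also reports is not estimated.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

# Compass directions mapped to bearings in degrees (clockwise from north).
BEARINGS = {"N": 0.0, "NE": 45.0, "E": 90.0, "SE": 135.0,
            "S": 180.0, "SW": 225.0, "W": 270.0, "NW": 315.0}


def offset_point(lat: float, lon: float, distance_km: float,
                 direction: str) -> Tuple[float, float]:
    """Return (lat, lon) in degrees of the point distance_km away from (lat, lon)."""
    bearing = math.radians(BEARINGS[direction.upper()])
    angular = distance_km / EARTH_RADIUS_KM  # angular distance in radians
    lat1, lon1 = math.radians(lat), math.radians(lon)

    lat2 = math.asin(math.sin(lat1) * math.cos(angular)
                     + math.cos(lat1) * math.sin(angular) * math.cos(bearing))
    lon2 = lon1 + math.atan2(
        math.sin(bearing) * math.sin(angular) * math.cos(lat1),
        math.cos(angular) - math.sin(lat1) * math.sin(lat2))

    # Normalise the longitude to the range [-180, 180).
    lon_deg = (math.degrees(lon2) + 540.0) % 360.0 - 180.0
    return math.degrees(lat2), lon_deg


if __name__ == "__main__":
    # "5 miles west of Berkeley", using approximate coordinates for Berkeley.
    print(offset_point(37.87, -122.27, 5 * 1.609344, "W"))
```

A real georeferencing tool would additionally derive an uncertainty radius from the imprecision of the distance, the direction and the extent of the named place.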

Label Information Extraction

Apiary Project (High-Throughput Workflow for Computer-Assisted Human Parsing of Biological Specimen Label Data)

Description: The Texas Center for Digital Knowledge (TxCDK) at the University of North Texas and the Botanical Research Institute of Texas (BRIT) are conducting fundamental research with the goal of identifying how human intelligence can be combined with machine processes for effective and efficient transformation of textual museum specimen label information into high-quality machine-processible parsed data. This two-year project, which we call Apiary, will advance understanding of the workflow and processes best able to increase access to and use of digitized biological collection metadata within the stakeholder communities comprised of biologists, natural history museum collections managers, biodiversity standards groups, and the library and information science community.

URL: http://www.apiaryproject.org/high-throughput-workflow-computer-assisted-human-parsing-biological-specimen-label-data

Contact: Jason Best, URL: http://www.brit.org/StaffDirectory/Best, E-mail: jbest@brit.org


SALIX, the Semi-automatic Label Information Extraction system

Description: Uses OCR and additional software to capture label data from collection objects; developed at Arizona State University (ASU), USA. More: http://daryllafferty.com/salix/

URL: http://nhc.asu.edu/vpherbarium/canotia/SALIX3.pdf

Download: http://daryllafferty.com/salix/


Darwin Score

Source URL: http://github.com/jbest/darwin-score

Description: A method to evaluate text representing a natural history bio-collection object. Scores calculated in this process can help provide a rough evaluation of the quality of the text, whether it be generated by human transcription or OCR. Initially, this will focus on labels and annotations applied to herbarium collections. This process will check the text for words including taxonomic names, collector names, annotator names, location names, common abbreviations and other expected words (all stored in purpose-built dictionaries) and patterns such as dates, numbers, and geocoordinates. The text is given a score based on the number of matches found. Includes dictionaries and simple regex patterns.
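
The scoring idea described above, counting dictionary words and pattern matches in a transcribed label, can be sketched as follows. The tiny dictionaries, the two regular expressions and the function name score_text are assumptions made for this illustration; they are not taken from the darwin-score repository, whose dictionaries and patterns are far more extensive.

```python
import re

# Hypothetical miniature dictionaries; real ones would hold many thousands of terms.
DICTIONARIES = {
    "taxon": {"quercus", "robur", "abies", "alba"},
    "collector": {"humboldt", "bonpland"},
    "abbreviation": {"leg.", "det.", "coll.", "herb."},
}

# Hypothetical patterns for dates (e.g. 12.05.1801) and degree/minute coordinates.
PATTERNS = {
    "date": re.compile(r"\b\d{1,2}[./-]\d{1,2}[./-]\d{2,4}\b"),
    "coordinate": re.compile(r"\d{1,3}°\s*\d{1,2}['′]"),
}


def score_text(text: str) -> int:
    """Count dictionary words and pattern matches found in a label text."""
    matches = 0
    for token in text.lower().split():
        word = token.strip(",;:()")  # drop surrounding punctuation, keep dots
        if any(word in words for words in DICTIONARIES.values()):
            matches += 1
    for pattern in PATTERNS.values():
        matches += len(pattern.findall(text))
    return matches


if __name__ == "__main__":
    print(score_text("Quercus robur L., leg. Humboldt 12.05.1801"))
    print(score_text("Qverc%s r0bur 1z.o5"))
```

A clean transcription scores higher than a garbled OCR result of the same label, which is the rough quality signal the description refers to. Normalising the count by the number of tokens would make labels of different length comparable.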

Contact: Jason Best, Director of Biodiversity Informatics, Botanical Research Institute of Texas (BRIT), URL: http://www.brit.org/, E-mail: jbest@brit.org


ScioTR

URL: http://www.sciotr.com/

Description: ScioTR is a new touch-enabled Windows 8 app which integrates Optical Character Recognition (OCR), Consensus Strategy, and Machine Learning (ML) to provide an efficient workflow for digitizing images into custom data fields. Label types are not specific to specimens: office labels, small forms, food labels, product labels, music collection, travel receipts, business cards, library card catalogs.

Data Quality Tools

BioVeL - Data Refinement Workflow

URL: https://wiki.biovel.eu/display/doc/Data+Refinement+Workflow+v16;jsessionid=51B1E9D7187555A7D07EF6DBEFF4FDB6

Description: The Taxonomic Data Refinement Workflow provides an environment for preparing observational and specimen data sets for use in scientific analyses such as: species distribution analysis, species richness and diversity studies, species occurrence studies, historical analysis, and other spatio-temporal analyses.


BiNHum - Data quality tools

BiNHum project URL: http://wiki.binhum.net/web/Hauptseite

Description: Quality testing in several steps, all relating to geographic information:
1. Translation of country names into English, using the country names in various languages from Wikipedia.
2. Reassignment of subordinate geographic units, e.g. federal states, that were mistakenly assigned to the ABCD element 'Country'.
3. If the element 'Country' is empty, search in other ABCD elements with a geographic reference.
4. Check whether an ISO country code entry exists (yes/no). If yes, compare the ISO code entry with the country name. The result is either a warning or a further check: if coordinates are available, test them with an open source GIS application.
5. Search in the GeoNames server (number of searches is limited) and in Google Maps.
6. Development of an ocean thesaurus.


Data Quality Hub

URL: http://www.gbif.es/BDQ.php

Contact: GBIF Spain

Description: Provides an overview of various data quality tools: Detection tools, Validation tools, Thesauri Checklists, Thesauri IsoCodes, Procedures and Best Practices


Workflow Environments

Taverna

URL: http://www.taverna.org.uk/

Description: Taverna is an open source and domain-independent Workflow Management System – a suite of tools used to design and execute scientific workflows and aid in silico experimentation. Taverna has been created by the myGrid team and is currently funded through FP7 projects BioVeL, SCAPE and Wf4Ever. Taverna workflows can be shared through the http://www.myexperiment.org/ site and be executed on HPC platforms, such as http://onlinehpc.com. Taverna is oriented towards the processing of potentially large volumes of data that require multiple steps. When compared to BPMN, Taverna's workflow notation is much simpler, as it doesn't try to support messages, events and other concepts present in the former notation.


Kepler

URL: http://kepler-project.org/

Description: The Kepler Project is dedicated to furthering and supporting the capabilities, use, and awareness of the free and open source, scientific workflow application, Kepler. Kepler is designed to help scientists, analysts, and computer programmers create, execute, and share models and analyses across a broad range of scientific and engineering disciplines.

Download Link: http://kepler-project.org/users/downloads


Argo Project - A web-based text mining workbench

Description: Argo is a workbench for building and running text-analysis solutions. It facilitates the development of custom workflows from a selection of elementary analytics.

URL: http://argo.nactem.ac.uk/

Web Service Registries

Biodiversity Catalogue

URL: https://www.biodiversitycatalogue.org/

Description: The BiodiversityCatalogue is a centralised registry of curated biodiversity Web services. It allows you to easily discover, register, annotate, monitor and use Web services.


Stable Identifiers

Best Practices for Stable URIs

URL: http://wiki.pro-ibiosphere.eu/wiki/Best_practices_for_stable_URIs

Source Code and Example Elements: http://sourceforge.net/projects/stablecollectionidentifiers/

Background: "In a "stable identifier hackathon" in June 2013 in Edinburgh, five CETAF institutions (Royal Botanical Garden Edinburgh, Museum für Naturkunde in Berlin, Royal Botanic Garden Kew, National Museum of National History Paris and Botanical Museum Berlin-Dahlem) committed to a rapid pilot implementation of the system. Naturalis Biodiversity Center in the Netherlands also plans to join this effort." (from: http://www.pro-ibiosphere.eu/news/4296_stable%20identifiers%20for%20specimens%20%E2%80%93%20a%20cetaf%20istc%20initiative%20supported%20by%20pro-ibiosphere%20/)