Project-relevant domain-specific infrastructures - BGBM

From StandAPHerb

Project-relevant infrastructures (biodiversity informatics)

Biodiversity networks, infrastructures

GBIF - Global Biodiversity Information Facility


Description: GBIF is an international network that provides free access to biodiversity data via the Internet. - more: [fuzzy matching integrated]

BioCASe (European collection network)


Description: The Biological Collection Access Service for Europe, BioCASE, is a transnational network of biological collections of all kinds. BioCASE provides access to distributed, heterogeneous European collection and observation databases and consistently uses platform-independent open-source software as well as open data standards and exchange protocols.

BioCASe Portal:

BioCASe technology: The term "BioCASe" is often used to refer to the technologies developed within the BioCASE project, in particular the BioCASe protocol and the BioCASe Provider Software. These technologies make it possible to connect a database of arbitrary structure to primary biodiversity networks such as BioCASE or GBIF.

BioCASe Provider Software Wiki:

ALA - Atlas of Living Australia


Description: The Atlas was initiated by a group of 14 (now 17) organisations, our partners. The intent was to create a national database of all of Australia's flora and fauna that could be accessed through a single, easy-to-use web site. [fuzzy matching integrated]

ALA Downloadable tools:

ALA Webservices:



Canadensys

Description: Canadensys makes biodiversity information freely and openly available to everyone. We are a network of researchers, collectors, curators, information technologists, students, and educators that shares data on the occurrence and identity of plant, animal, and fungal species in Canada. Member of GBIF.

Web service page:

BHL - Biodiversity Heritage Library

URL Developer Tools and API:

Description: The Biodiversity Heritage Library (BHL) is a consortium of natural history and botanical libraries that cooperate to digitize the legacy literature of biodiversity held in their collections and to make that literature available for open access and responsible use as a part of a global “biodiversity commons.” The BHL consortium works with the international taxonomic community, rights holders, and other interested parties to ensure that this biodiversity heritage is made available to a global audience through open access principles.

Virtual herbaria

German Virtual Herbarium (Virtuelles Herbarium Deutschland)


Description: VH/de, the German Virtual Herbarium, is an online resource that provides direct access to digitized collection information from German herbaria.

Virtual Herbarium JACQ


Description: Quick access to botanical collections from the herbaria B, BAK, BRNU, CHER, ERE, FT, GAT, GJO, GZU, HAL, HERZ, JE, KFTA, KUFS, LAGU, LECB, LW, LWKS, LWS, LZ, MJG, OLD, PRC, TBI, TGU, TMRC, W and WU. It is our main goal to provide a unified and jointly administered specimen management system for the participating herbaria. Special attention is paid to providing images for all material online and especially high-resolution images for type collections.

Further virtual herbaria (examples)

Australia's Virtual Herbarium:

Utah Valley University Virtual Herbarium

Fairchild Virtual Herbarium:

Subject-specific knowledge / dictionaries

Scientific names

International Plant Names Index – IPNI


Description: Database of the names and associated basic bibliographical details of seed plants, ferns and lycophytes. Continuously updated. - Plants

Contact: (for downloads of more than 5,000 records)

Web Service for Matching IPNI names

URL: (beta version)

URL: (biodiversity catalogue)

Description (Matthew Blisset, Kew): The software is flexible and can use any kind of data; the first service we have released is for matching to IPNI names. This is built using a sequence of transformations applied to both the list of authoritative names and the query, for each part of the name. These transformations should provide better matches than some other services which just use Levenshtein distance etc. For example, it ignores double letters or a changed Latin ending.
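The two transformations named in the example (ignoring double letters and a changed Latin ending) can be illustrated with a small Python sketch. The concrete rules below, in particular the list of interchangeable endings, are assumptions made for illustration and are not the rules actually used by the Kew service.

```python
import re

# Hypothetical set of interchangeable Latin endings (assumption for illustration,
# not the Kew service's rule set). Longer endings are tried first.
LATIN_ENDINGS = ("ae", "um", "us", "is", "a", "i", "e")


def normalise_part(part: str) -> str:
    """Reduce one part of a name (genus or epithet) to a comparison key."""
    key = re.sub(r"[^a-z]", "", part.lower())  # keep plain ASCII letters only
    key = re.sub(r"(.)\1+", r"\1", key)         # ignore double letters: "hirsutta" -> "hirsuta"
    for ending in LATIN_ENDINGS:                 # ignore a changed Latin ending
        if key.endswith(ending) and len(key) > len(ending) + 2:
            return key[: -len(ending)]
    return key


def name_key(name: str) -> tuple:
    """Apply the same transformations to every part of the name."""
    return tuple(normalise_part(p) for p in name.split())


def match_names(query: str, authoritative: list) -> list:
    """Return the authoritative names whose transformed key equals the query's."""
    key = name_key(query)
    return [name for name in authoritative if name_key(name) == key]


print(match_names("Sillene albus", ["Silene alba", "Silene vulgaris"]))
# ['Silene alba']
```

Because the query and the authoritative list pass through identical transformations, a misspelt query still lands on the same key as the correct name, which a plain Levenshtein threshold would only approximate.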

The service is exposed in three ways:

  • An OpenRefine (Google Refine) Reconciliation Service
  • A custom API which is a bit simpler to use
  • A batch upload of a CSV file

I've concentrated on the OpenRefine method.

I've also implemented a few bits of the "Metaweb Query Language" API on ThePlantList, which allows an OpenRefine extension to query ThePlantList using an IPNI id, and retrieve information held by TPL for that name.
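For the OpenRefine method mentioned above: a reconciliation client sends a batch of queries as JSON, keyed q0, q1, ..., and receives a list of ranked candidates per key, each of which may be flagged as a match. A minimal Python sketch of building such a batch and reading back the flagged candidates; the service URL is not reproduced here, and the candidate ids used in the example are invented.

```python
import json
from urllib.parse import urlencode


def build_queries(names, limit=3):
    """One query object per name, keyed q0, q1, ... as in the reconciliation API."""
    return {f"q{i}": {"query": name, "limit": limit} for i, name in enumerate(names)}


def encode_request(names, limit=3):
    """Form-encoded request body: a single 'queries' parameter holding JSON."""
    return urlencode({"queries": json.dumps(build_queries(names, limit))})


def pick_matches(response, names):
    """Map each queried name to the id of the candidate flagged as a match, or None."""
    chosen = {}
    for i, name in enumerate(names):
        candidates = response.get(f"q{i}", {}).get("result", [])
        hit = next((c for c in candidates if c.get("match")), None)
        chosen[name] = hit["id"] if hit else None
    return chosen
```

The body returned by `encode_request` would be POSTed to the service endpoint; the parsed JSON answer is then passed to `pick_matches` together with the original list of names.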



Tropicos

Description: ‘Tropicos® was originally created for internal research but has since been made available to the world’s scientific community. All of the nomenclatural, bibliographic, and specimen data accumulated in MBG’s electronic databases during the past 25 years are publicly available here. This system has over 1.2 million scientific names and 4.0 million specimen records.’ Search portal for the databases of the Missouri Botanical Garden, USA. Searches are possible by, among others, scientific names, persons and places. - Plants

TROPICOS Web Services:

Contact: Missouri Botanical Garden

WCSP - Kew's World Checklist of Selected Plant Families


Description: WCSP is an international collaborative programme that provides the latest peer-reviewed and published opinions on the accepted scientific names and synonyms of selected plant families. It allows you to search for all the scientific names of a particular plant, or the areas of the world in which it grows (distribution). The checklist includes 173 seed plant families (view list of included families). Different families are in different stages of review, as indicated in the family list. There are currently more than 155 contributors from 22 countries. - Plants

Contact: Rafaël Govaerts

Catalogue of Life Service


Description: This web service endpoint serves as a search engine for scientific name-related taxonomic information.

Euro+Med Plantbase


Description: The Euro+Med PlantBase provides an online database and information system for the vascular plants of Europe and the Mediterranean region, against an up-to-date and critically evaluated consensus taxonomic core of the species concerned. The Euro+Med PlantBase is part of the Pan-European Species directories Infrastructure (PESI). - Plants


Australian Plant Name Index – APNI


Description: APNI is a tool for the botanical community that deals with plant names and their usage in the scientific literature. Maintained by the Australian National Botanic Gardens as part of its larger IBIS database, in collaboration with the Centre for Australian National Biodiversity Research and the Australian Biological Resources Study. - Plants

GRIN – Taxonomy of Plants


Description: Taxonomic data in GRIN determine the structure and nomenclature of the accessions in the National Plant Germplasm System (NPGS), part of the National Genetic Resources Program (NGRP) of the Agricultural Research Service (ARS) of the United States Department of Agriculture (USDA). All families and genera of plants and 52,095 species from around the world, especially economic plants and their relatives, are represented in GRIN Taxonomy for Plants. The data include scientific and common names, classification, distribution, references and information on economic uses. - Plants



PESI - Pan-European Species directories Infrastructure

Description: PESI provides standardised and authoritative taxonomic information by integrating and securing Europe's taxonomically authoritative species name registers and nomenclators (name databases) and the associated expertise networks that underpin the management of biodiversity in Europe.

Web service page: [fuzzy matching integrated]

Species 2000 / Catalogue of Life



Description: Online database of the world's known species of animals, plants, fungi and micro-organisms. CoL (Catalogue of Life) is the starting point for the GBIF taxonomic backbone.

Web service page:

GNI - Global Names Index


Description: Index of name catalogues, including uBio Name Bank, ITIS, EOL, GBIF and IPNI. 'GNI is a collection of strings (combinations of characters) that have been used as names for organisms. GNI contains many examples of names spelled in slightly different ways. In order to be able to link all of the information about any taxon, a query beginning with one string will find data associated with any of the alternative strings. This is done by linking the alternative names for the same species - a process called 'reconciliation'. GNI is a component of the Global Names Architecture, an effort that is building a names-based cyberinfrastructure for biology (Patterson, D. J., Cooper, J., Kirk, P. M., Pyle, R. L. and Remsen, D. P. 2010. Names are key to the big new biology. TREE 25: 686-691).'
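GNI's actual reconciliation is considerably more elaborate; as a much simpler illustration of the idea, variant strings can be grouped under a shared canonical form with diacritics and authorship stripped. Python sketch under that assumption; infraspecific ranks (e.g. "var.") and lowercase author particles are not handled.

```python
import unicodedata
from collections import defaultdict


def canonical_form(name: str) -> str:
    """Crude canonical form: genus plus following lowercase epithets, no authorship."""
    # Decompose accented letters and drop the combining marks: "Abiës" -> "Abies".
    plain = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    tokens = plain.split()
    if not tokens:
        return ""
    kept = [tokens[0].capitalize()]
    for token in tokens[1:]:
        # Stop at the first token that is not a plain lowercase word,
        # e.g. an author abbreviation such as "Mill." or "(L.)".
        if token.isalpha() and token.islower():
            kept.append(token)
        else:
            break
    return " ".join(kept)


def reconcile(strings):
    """Group name strings that share the same canonical form."""
    groups = defaultdict(list)
    for s in strings:
        groups[canonical_form(s)].append(s)
    return dict(groups)
```

For example, "Abies alba Mill.", "Abies alba Miller" and "ABIES alba" all end up in the group "Abies alba", so a lookup starting from any one of them reaches the others.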

Global Names Index API:

GBIF taxonomic backbone


Description: The GBIF Backbone Taxonomy, often called the Nub taxonomy, is a single synthetic management classification with the goal of covering all names GBIF is dealing with. It is the taxonomic backbone that allows GBIF to integrate name-based information from different resources, no matter whether these are occurrence datasets, species pages, names from nomenclators or external sources like EOL, GenBank or IUCN. This backbone allows taxonomic search, browse and reporting operations across all those resources in a consistent way and provides means to crosswalk names from one source to another. It is updated regularly through an automated process in which the Catalogue of Life acts as a starting point, also providing the complete higher classification above families.
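GBIF exposes the backbone through its public Species API; a name string can be crosswalked to a backbone key with the `/v1/species/match` endpoint. Python sketch, assuming the response fields `matchType`, `usageKey` and `acceptedUsageKey` as we understand the v1 API; check the current GBIF API documentation before relying on them.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

MATCH_ENDPOINT = "https://api.gbif.org/v1/species/match"


def match_url(name: str, kingdom: Optional[str] = None, strict: bool = False) -> str:
    """Build a species-match request against the GBIF backbone."""
    params = {"name": name, "strict": str(strict).lower()}
    if kingdom:
        params["kingdom"] = kingdom  # helps to separate homonyms across kingdoms
    return f"{MATCH_ENDPOINT}?{urlencode(params)}"


def fetch_match(name: str, kingdom: Optional[str] = None) -> dict:
    """Perform the request and parse the JSON answer (network access required)."""
    with urlopen(match_url(name, kingdom), timeout=30) as resp:
        return json.load(resp)


def accepted_key(match: dict) -> Optional[int]:
    """Backbone key of the accepted taxon, following a synonym if necessary."""
    if match.get("matchType", "NONE") == "NONE":
        return None
    return match.get("acceptedUsageKey", match.get("usageKey"))
```

Typical use: `accepted_key(fetch_match("Abies alba", "Plantae"))`, which links a name from any source to the same backbone taxon.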

ITIS – Integrated Taxonomic Information System


Description: ITIS, the Integrated Taxonomic Information System, provides authoritative taxonomic information on plants, animals, fungi, and microbes of North America and the world. ITIS is a partnership of U.S., Canadian, and Mexican agencies (ITIS-North America), other organizations, and taxonomic specialists. ITIS is also a partner of Species 2000 and the Global Biodiversity Information Facility (GBIF). The ITIS and Species 2000 Catalogue of Life (CoL) partnership provides the taxonomic backbone to the Encyclopedia of Life (EOL).


IndExs - Index of Exsiccatae


Description: "IndExs" comprises information on titles, abbreviations and bibliography of exsiccatae. Exsiccatae are defined here as "published, uniform, numbered sets of preserved specimens distributed with printed labels" (Pfister 1985). Please note that there are two similar Latin terms: "exsiccata, ae" is feminine and used for a set of dried specimens as defined above, whereas the term "exsiccatum, i" is neuter and used for dried specimens in general. You may search "IndExs" using title, part of the title, editor and group of organisms, alone or combined.


OpenUp! common names web service

Description: Developed at the University of Vienna; a service for enriching scientific names with common names in different languages. The service is already used by Europeana. Further information in the newsletter:

Australian Common Name Database


Description: Database of common names of Australian plants

Maintained by: Integrated Botanical Information System (IBIS), Australian National Herbarium

Language: English

Standardliste der Farn- und Blütenpflanzen Deutschlands

Description: German common names in: Rolf Wisskirchen, Henning Haeupler: Standardliste der Farn- und Blütenpflanzen Deutschlands. Mit Chromosomenatlas. Published by the Bundesamt für Naturschutz (= Die Farn- und Blütenpflanzen Deutschlands. Vol. 1). Eugen Ulmer, Stuttgart (Hohenheim) 1998, ISBN 3-8001-3360-1.

Language: German

Persons (collectors, authors)

Index Herbariorum


Description: List of collector names and specialists


International Plant Names Index – IPNI


Description: Author names

Contact: (for downloads of more than 5,000 records)

Index of Botanists (Collectors)

Description: Harvard Index of Botanists (search filter 'Collector')





Tropicos

Description: ‘Tropicos® was originally created for internal research but has since been made available to the world’s scientific community. All of the nomenclatural, bibliographic, and specimen data accumulated in MBG’s electronic databases during the past 25 years are publicly available here. This system has over 1.2 million scientific names and 4.0 million specimen records.’ Search portal for the databases of the Missouri Botanical Garden, USA. Searches are possible by, among others, scientific names, persons and places.

TROPICOS Web Services:

Contact: Missouri Botanical Garden

Australian Plant Collectors and Illustrators 1780s-1980s


Description: Australian plant collectors and illustrators

Source: This web site is based on the list published by J.H. Willis, D. Pearson, M.T. Davis, and J.W. Green, Western Australian Herbarium Research Notes Number 12, August 1986. That original list has been supplemented by additional entries and some updates of dates, especially where people have died since that publication. It has been further supplemented with information from Alex George's 2009 publication Australian Botanist's Companion, published by Four Gables Press, WA.

Cyclopedia of Malesian Collectors


Botanical Collectors: Africa (Natural History Museum, London)


Botanical Collectors: Latin America (Natural History Museum, London)


List of herbarium institutions

Index Herbariorum


Description: List of herbaria and their official abbreviations (unique institution IDs)


Geographic information, gazetteers

Getty Thesaurus


Geonames, the United States Board on Geographic Names


JRC fuzzy gazetteer


Further vocabularies, glossaries, abbreviations

Abkürzungen und Symbole in der biologischen Nomenklatur (Abbreviations and symbols in biological nomenclature)

Description: Abbreviations and phrases from the nomenclature and taxonomy of zoology, botany, cultivated plants, virology and bacteriology are listed alphabetically. Explanations are mostly given by means of examples.

Publication: Wolfgang Granzow (2000): Abkürzungen und Symbole in der biologischen Nomenklatur. Senckenbergiana lethaea 80 (2): 355–370.

Download PDF:

Terms Used in Bionomenclature

Description: This text is a comprehensive glossary of over 2,100 terms used in biological nomenclature, the naming of whole organisms of all kinds. It is accompanied by a web application that enables the glossary to facilitate semantic linking on the web.

Download pdf:

Integration of thesauri, terminology servers

GfBio Terminology Server

Description: The Terminology Server links external and internal vocabularies (controlled vocabularies, glossaries, thesauri, ontologies) on biodiversity; developed within the GfBio project.


TOQE – Thesaurus optimized query expander

Description: Integration of thesauri via a service interface



Handwriting collections (digitized)

Handwriting collections of well-known authors available online

Chirographicum historicum


Contact: North Carolina University

Auxilium ad Botanicorum Graphicem


Contact: Herbarium Geneva

Handwriting Linnean Herbarium


Contact: Swedish Museum of Natural History, Linnean Herbarium

Global Plants Initiative Designation Identifier


Contact: Global Plants Initiative

CALIGRAFÍAS del Herbario MA, Madrid




A very good overview of developments in mass digitization and digitization workflows: Vladimir Blagoderov & Vincent Smith (2012): No specimen left behind: mass digitization of natural history collections


This includes, among others:

  • Description of a modular digitization workflow in Edinburgh:

Haston & al. (2012): Developing integrated workflows for the digitisation of herbarium specimens using a modular and scalable approach. ZooKeys 209: 93–102, doi: 10.3897/zookeys.209.3121

  • Workflow for mass digitization at the Natural History Museum, London:

Blagoderov & al. (2012): No specimen left behind: industrial scale digitization of natural history collections. ZooKeys 209: 133–146, doi: 10.3897/zookeys.209.3178

  • Mass digitization at Naturalis Biodiversity Center, Leiden, Netherlands:

van den Oever & Gofferjé (2012): ‘From Pilot to production’: Large Scale Digitisation project at Naturalis Biodiversity Center. ZooKeys 209: 87–92, doi: 10.3897/zookeys.209.3609

  • On the digitization centre of the natural history museum in Joensuu, Finland (Digitarium):

Tegelberg & al. (2012): The development of a digitising service centre for natural history collections. ZooKeys 209: 75–86, doi: 10.3897/zookeys.209.3119

  • Methods for increasing efficiency in herbarium digitization at the herbarium of the New York Botanical Garden, especially 'Strategy 3: semi-automated approach', using tools for semi-automated information extraction from labels such as 'SALIX3' and 'Apiary' (see also Information extraction), and duplicate detection with 'Specify' (see also Collection management software):

Tulig & al. (2012): Increasing the efficiency of digitization workflows for herbarium specimens. ZooKeys 209: 103–113, doi: 10.3897/zookeys.209.3125

  • Documentation and comparison of digitization workflow components and protocols from 28 programs at 10 US museums and academic institutions, including a description of information extraction with OCR software tools and of software for retrospective georeferencing. Conclusion: “There is significant interest in natural language processing (NLP), which is designed to parse OCR text into fields, as well as intelligent character recognition (ICR) or handwriting analysis, but effective systems for using these technologies to extract data from biological specimens were not observed.”:

Nelson & al. (2012): Five task clusters that enable efficient and effective digitization of biological collections. ZooKeys 209: 19–45, doi: 10.3897/zookeys.209.3135

  • Mobilization and linking of biological multimedia data in the EUROPEANA portal (OpenUp! project), especially 'Data quality control' and 'Semantic enrichment', i.e. linking scientific names to common names (vernacular names provided by the Natural History Museum Vienna):

Berendsohn & Güntsch (2012): OpenUp! Creating a cross-domain pipeline for natural history data. ZooKeys 209: 47–54, doi: 10.3897/zookeys.209.3179

  • Testing digitization workflows at the Royal Botanic Garden Edinburgh:

Robyn E. Drinkwater, Robert W. N. Cubey, Elspeth M. Haston (2014): The use of Optical Character Recognition (OCR) in the digitisation of herbarium specimen labels. PhytoKeys 38: 15–30, doi: 10.3897/phytokeys.38.7168

Address: Royal Botanic Garden Edinburgh, 20a Inverleith Row, Edinburgh, EH3 5LR, UK

Contact: Elspeth M. Haston

Collection management software

Overview and some examples


Overview of collection management systems in the GfBio biowikifarm:







DiversityCollection

Description: DiversityCollection is focused on the management of specimens in scientific collections and the handling of observation data. In this context it is designed to document any action concerning the collection, storage, exchange and treatment of specimens in a collection, and it is also appropriate for storing observation data with added analyses. The Diversity GIS Editor is integrated. DiversityCollection is distinguished from other collection management systems by its focus on biological relations between organisms linked together as one or more specimens or observations (e.g. host, parasite, hyperparasite, symbionts). DiversityCollection keeps only data connected with the handling of collection specimens, parts of specimens and observations. Data from other domains, e.g. taxonomy and references as well as scientific term systems and sampling plots, are handled in separate modules. For an overview of the available DWB components see the Diversity Workbench Main Page.



BG-BASE

Description: BG-BASE is a PC-based database application written primarily to handle the information management needs of institutions and individuals holding living and/or preserved collections of biological material, including botanic gardens, arboreta, zoos, herbaria, museums, libraries, university campuses, horticultural societies and private collections.

Support: Proprietary product. BG-BASE is developed and supported from two international centres, one in the US and the other in the UK.




Specify

Contact: University of Kansas

Description: Specify is a database software application for museum and herbarium research data. It manages species and specimen information for computerizing collections, tracking museum specimen transactions, linking images to specimen records, and publishing catalog data to the Internet. Specify is written in Java. More:


Source code:




Description: Management system working with and integrating data and images from specimens, botanical surveys, field observations, living collections, seed banks and literature. More:

Digitization projects / initiatives

Integrated Digitized Biocollections – iDigBio

Description: National Resource for Advancing Digitization of Biodiversity Collections (ADBC), funded by the National Science Foundation (USA). Through ADBC, data and images for millions of biological specimens are being made available in electronic format for the research community, government agencies, students, educators, and the general public.

The website hosts a wiki with many resources, in particular 'iDigBio Data Ingestion' and 'Digitisation Resources', e.g. Workflows and Protocols, Georeferencing Resources. Very useful material under 'Working Groups / aOCR (Augmenting OCR)', here: 'OCR related Materials'.

URL: and

Digitarium, Finland


Description: Digitization centre of the Finnish Museum of Natural History; mass digitization with an industrial approach

Picturae / Naturalis, Netherlands

Description: Picturae is a Dutch company that carried out the digitization of the collection specimens for the Naturalis natural history museum in Leiden, Netherlands. For this purpose a workflow was developed that enables the mass digitization of herbarium specimens, among other things through the use of "Digitstreets".


URL digitization at Naturalis, Leiden:

MNHN Paris, France

URL: (PowerPoint slides)

Description: Mass digitization of the entire Paris herbarium using industrial workflows

Next Generation Phenomics for the Tree of Life, USA

Description: The Next Generation Phenomics project seeks to develop and adapt tools to assemble large phenomic datasets in a rapid and automated way. This project consists of computer vision, natural language processing, and crowdsourcing components. The Computer Vision (CV) team is developing methods that automate the extraction and annotation of phenomic characters from digital images using machine learning approaches. The new CV algorithms can discern the presence/absence of features and assess their spatial relationships and appearance. The Natural Language Processing (NLP) group is developing software to transform digitized taxonomic descriptions into taxon/character matrices for phylogenetic analyses. Also, because microbial descriptions often differ radically from those of other organisms, the NLP group is developing supervised learning strategies to extract phenomic characters from microbial descriptions. Finally, the crowdsourcing team has developed software, The Evolution Project, that works with MorphoBank to present images of character states to crowds for scoring.




AnnoSys

URL technical documentation:

Description: The project's objective is to develop, as an exemplar, a specification for an annotation data repository for networked and highly complex biodiversity data. AnnoSys is based on a prototype developed in the context of SYNTHESYS and uses the Open Annotation Data Model and an RDF database for storing the information. AnnoSys is implemented using the example of collection and observation data in the botanical domain provided by the GBIF/BioCASE system (currently over 50.8 million records, including 15 million records from natural history collection objects).

Filtered Push


Description: We are designing and implementing a network, which we term Filtered Push, to connect remote sites where annotations can be generated with the authoritative databases of the collections holding the vouchers to which those annotations apply.

Standards and best practices


TDWG - Biodiversity Information Standards

TDWG develops biodiversity information standards and forms working groups on various topics.


Description: Biodiversity Information Standards (TDWG), also known as the Taxonomic Databases Working Group, is a not-for-profit scientific and educational association that is affiliated with the International Union of Biological Sciences. TDWG was formed to establish international collaboration among biological database projects. TDWG promoted the wider and more effective dissemination of information about the World's heritage of biological organisms for the benefit of the world at large. Biodiversity Information Standards (TDWG) now focuses on the development of standards for the exchange of biological/biodiversity data.

ABCD Access to Biological Collection Data (TDWG Standard)


Description: The Access to Biological Collections Data (ABCD) Schema is an evolving comprehensive standard for the access to and exchange of data about specimens and observations (a.k.a. primary biodiversity data). The ABCD Schema attempts to be comprehensive and highly structured, supporting data from a wide variety of databases. It is compatible with several existing data standards. Parallel structures exist so that either (or both) atomised data and free text can be accommodated. Version 1.2 is currently in use with the GBIF (Global Biodiversity Information Facility) and BioCASE (Biological Collection Access Service for Europe) networks. Apart from the GBIF and BioCASE networks, the potential for the application of ABCD extends to internal networks or in-house legacy data access (e.g. datasets from external sources that shall not be converted and integrated into an institution's own data, but be kept separately, though easily accessible). By defining relations between terms, ABCD is a step towards an ontology for biological collections. ABCD concepts: ABCD 2.06 Schema:

Darwin Core (TDWG Standard)


Description: The Darwin Core is a body of standards. It includes a glossary of terms (in other contexts these might be called properties, elements, fields, columns, attributes, or concepts) intended to facilitate the sharing of information about biological diversity by providing reference definitions, examples, and commentaries. The Darwin Core is primarily based on taxa, their occurrence in nature as documented by observations, specimens, and samples, and related information. Included are documents describing how these terms are managed, how the set of terms can be extended for new purposes, and how the terms can be used. Used by: GBIF
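To illustrate how Darwin Core terms serve as column names when occurrence data are shared as a flat table (as in the core file of a Darwin Core Archive), a minimal Python sketch that writes one specimen record as CSV. The column names are Darwin Core terms; the record values and identifiers are invented.

```python
import csv
import io

# A small selection of Darwin Core terms for a preserved herbarium specimen.
DWC_TERMS = [
    "occurrenceID", "basisOfRecord", "institutionCode", "catalogNumber",
    "scientificName", "recordedBy", "eventDate", "country",
    "decimalLatitude", "decimalLongitude",
]


def to_dwc_csv(records):
    """Serialise records as CSV whose header row consists of Darwin Core terms."""
    buf = io.StringIO()
    # DictWriter raises ValueError for keys that are not in DWC_TERMS,
    # and leaves missing terms empty.
    writer = csv.DictWriter(buf, fieldnames=DWC_TERMS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Hypothetical specimen; identifiers and collecting data are invented.
specimen = {
    "occurrenceID": "urn:example:B:100000001",
    "basisOfRecord": "PreservedSpecimen",
    "institutionCode": "B",
    "catalogNumber": "B 10 0000001",
    "scientificName": "Abies alba Mill.",
    "recordedBy": "J. Smith",
    "eventDate": "1998-06-12",
    "country": "Germany",
    "decimalLatitude": "48.1",
    "decimalLongitude": "11.5",
}

print(to_dwc_csv([specimen]))
```

Using the shared term names as headers is what lets a portal such as GBIF interpret columns from different providers in the same way.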

TaxonX - systematics literature markup standard


Description: TaxonX is an XML schema for encoding taxonomic treatments to: create open, persistent full digital surrogates of treatments; identify treatments and their major structures; identify textual data such as names, localities, characters, and citations.

Audubon Core - Audubon Core Multimedia Resources Metadata Standard


Description: The Audubon Core metadata schema ("AC") is a representation-neutral metadata vocabulary for describing biodiversity-related multimedia resources and collections.

EML - Ecological Metadata Language


Description: Ecological Metadata Language (EML) is a metadata specification developed by the ecology discipline and for the ecology discipline. It is based on prior work done by the Ecological Society of America and associated efforts (Michener et al., 1997, Ecological Applications). EML is implemented as a series of XML document types that can be used in a modular and extensible manner to document ecological data. Each EML module is designed to describe one logical part of the total metadata that should be included with any ecological dataset.


Guidelines, best practices, directives

Metamorfoze Preservation Imaging Directives (with three quality levels), Netherlands


DFG Practical Guidelines on Digitisation (DFG-Praxisregeln „Digitalisierung“)


GBIF - Best practice guide for georeferencing


MaNIS georeferencing guidelines


Tools for information extraction / text mining

Software developments that support automatic information extraction


National Centre for Text Mining, Manchester UK


Description: The National Centre for Text Mining (NaCTeM) is the first publicly-funded text mining centre in the world. We provide text mining services in response to the requirements of the UK academic community. NaCTeM is operated by the University of Manchester.

Argo project - a web-based text mining workbench

Description: Argo is a workbench for building and running text-analysis solutions. It facilitates the development of custom workflows from a selection of elementary analytics.


Taxon names

GBIF Name Finder

Code URL:

Description: A name finder: a Java implementation, compliant with the GNA Name Finding API and based on Lucene, for finding scientific names in arbitrary text documents.

GBIF Name Parser

URL: URL API documentation:

Description: The parser is written in Java and uses regular expressions to dissect name strings into their components. It only keeps the name parts required to reconstruct a full three-part name with an optional subgenus, and ignores additional infraspecific parts such as the subspecies given for varieties.
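A toy illustration of regex-based dissection of a name string into genus, optional subgenus, species epithet, one infraspecific rank and authorship. This Python sketch is not the GBIF Name Parser, whose rules are far more extensive; hybrids, cultivars and multiple infraspecific levels are deliberately ignored.

```python
import re
from typing import Optional

NAME = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)"                              # Genus
    r"(?:\s+\((?P<subgenus>[A-Z][a-z]+)\))?"                # optional (Subgenus)
    r"(?:\s+(?P<species>[a-z][a-z-]+))?"                    # species epithet
    r"(?:\s+(?P<rank>subsp\.|ssp\.|var\.|f\.)"              # one rank marker ...
    r"\s+(?P<infraspecific>[a-z][a-z-]+))?"                 # ... and its epithet
    r"(?:\s+(?P<authorship>\S.*))?$"                        # everything else
)


def parse_name(name: str) -> Optional[dict]:
    """Split a scientific name into its parts, or return None if it does not fit."""
    m = NAME.match(name.strip())
    return m.groupdict() if m else None


print(parse_name("Poa pratensis subsp. angustifolia (L.) Dumort."))
```

Parts that are absent come back as `None`, so a caller can rebuild the canonical name from genus, species and infraspecific epithet without the authorship.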

Global Names


Description: Global Names recognition and discovery tools (gnrd) and services make it easy to find scientific names on web pages, in PDFs, Microsoft Office documents, images, or in free-form text. Encrypted or image-based PDFs and image files first pass through an OCR routine using Tesseract before the TaxonFinder and NetiNeti name discovery engines are applied. The language of incoming content is determined using unsupervised language detection; if the language is other than English, TaxonFinder is preferred. Found names can optionally be resolved against a number of resources.



TaxonFinder

Description: TaxonFinder detects scientific names in plain text. Given a string, it scans through the contents and uses a dictionary-based approach to identify which words and strings are Latin scientific organism names. It detects names at all ranks, including species, genera, subspecies and more. More info:
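The dictionary-based idea can be reduced to a few lines: look up capitalised words in a genus list and, if the following word is a known epithet, report a binomial. Python sketch with hypothetical mini-dictionaries; it is not TaxonFinder's actual algorithm, which also has to deal with abbreviated genera, ranks and words that are both names and ordinary vocabulary.

```python
import re

WORD = re.compile(r"[A-Za-z][A-Za-z-]*")


def find_names(text, genera, epithets):
    """Report 'Genus epithet' pairs, or a bare genus, found by dictionary lookup."""
    words = WORD.findall(text)  # punctuation between words is ignored in this sketch
    found = []
    i = 0
    while i < len(words):
        word = words[i]
        if word in genera:
            following = words[i + 1] if i + 1 < len(words) else ""
            if following in epithets:
                found.append(f"{word} {following}")
                i += 2  # skip the epithet that was just consumed
                continue
            found.append(word)
        i += 1
    return found


print(find_names("Collected Abies alba near a stand of Picea on limestone.",
                 {"Abies", "Picea"}, {"alba", "abies"}))
# ['Abies alba', 'Picea']
```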

GBIF Checklist Bank

Code URL:

Description: Checklist Bank serves as a dynamic archive of "checklists", which are summarized lists of taxa or taxon names. Checklist Bank stores checklist data as it was provided by the data publisher. In addition, Checklist Bank attempts to collate different published checklist resources by tying the atomic elements of checklists, taxon names, to a common names dictionary.



Description: A standalone command-line application capable of identifying taxonomic mentions in documents and mapping them to the corresponding NCBI Taxonomy database entries.


EU BON Taxonomic Backbone


Description: The Unified Taxonomic Information Service (UTIS) is the taxonomic backbone of the EU BON project. It allows a federated search across multiple European checklists and returns a unified result set combining the individual responses of the various checklists. The current implementation of UTIS is still a prototype, which means that the API and data model may change until the final release. It connects the web services of the Pan-European Species directories Infrastructure (PESI), the Catalogue of Life (CoL) and the World Register of Marine Species (WoRMS). In future it will connect further data providers such as EUNIS and Natura2000 in order to be compliant with the INSPIRE directive. Currently it is possible to search for taxa and synonyms by a scientific or vernacular name string. If a synonym matches, the corresponding accepted taxon is resolved. The search results always include information on the classification and on related taxa, as far as these data are delivered by the connected checklist providers.
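The federated-search pattern (query several checklists, resolve synonyms to their accepted taxon, return one merged result) can be sketched in Python. The providers below are hypothetical in-memory stand-ins, not calls to the real UTIS, PESI, CoL or WoRMS web services, and the synonym data are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Hit:
    name: str                # name string as returned by the provider
    status: str              # "accepted" or "synonym"
    accepted: Optional[str]  # accepted name if status == "synonym"


Provider = Callable[[str], List[Hit]]


def table_provider(rows: List[Hit]) -> Provider:
    """Hypothetical stand-in for a checklist web service: exact lookup in a list."""
    def search(query: str) -> List[Hit]:
        return [row for row in rows if row.name == query]
    return search


def federated_search(query: str, providers: Dict[str, Provider]) -> Dict[str, List[str]]:
    """Query every provider and group the answers under the accepted name."""
    merged: Dict[str, List[str]] = {}
    for label, search in providers.items():
        for hit in search(query):
            key = hit.accepted if hit.status == "synonym" and hit.accepted else hit.name
            merged.setdefault(key, []).append(label)
    return merged
```

In a real service each provider would be a web-service call, typically issued in parallel; the merge step keyed by accepted name stays the same.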


Taxamatch - fuzzy matching algorithm for genus and species scientific names

URL: Taxamatch Developers' Wiki:

Description: TAXAMATCH employs both phonetic and non-phonetic matching (to detect errors of either type, or both) together with a set of heuristic rules incorporated into pre- and post-filters at both the genus and the species epithet level. In general, the pre-filters maximise algorithm efficiency by ensuring that only a subset of the available names has to be tested, while the post-filters apply heuristic reasoning to distinguish likely "true" from "false" near matches, even when both have the same calculated similarity.
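The general pattern of a pre-filter, a phonetic comparison and an edit-distance comparison can be sketched for a single word (e.g. a genus). This is not the published TAXAMATCH algorithm: the phonetic rules in `PHONETIC_RULES`, the length-difference pre-filter and the length-dependent distance limit are hypothetical simplifications, and the distance used is the optimal-string-alignment variant of Damerau-Levenshtein.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal-string-alignment (restricted Damerau-Levenshtein) distance."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Adjacent transposition counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


# Hypothetical, heavily simplified phonetic rules (not the published TAXAMATCH set).
PHONETIC_RULES = [("ph", "f"), ("ae", "e"), ("oe", "e"), ("y", "i"), ("k", "c")]


def phonetic_key(word: str) -> str:
    w = word.lower()
    for old, new in PHONETIC_RULES:
        w = w.replace(old, new)
    return w


def near_matches(query: str, candidates: list[str], max_len_diff: int = 2) -> list[str]:
    matches = []
    for cand in candidates:
        # Pre-filter: skip candidates whose length is too different to be a near match.
        if abs(len(cand) - len(query)) > max_len_diff:
            continue
        # Post-filter: accept a phonetic match, or a small edit distance scaled to word length.
        limit = 1 if len(query) < 6 else 2
        if (phonetic_key(cand) == phonetic_key(query)
                or osa_distance(cand.lower(), query.lower()) <= limit):
            matches.append(cand)
    return matches
```

Here the typo "Quercsu" is caught by the edit distance (one transposition), while "Phagus" for "Fagus" is caught by the phonetic key.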

Geographic Information

Tools for (semi-)automatic georeferencing of locality information



Description: GEOLocate is a platform for georeferencing natural history collections data. The GEOLocate project develops software and services for translating textual locality descriptions associated with biodiversity collections data into geographic coordinates.


Description: The Java BioGeomancer Core API is used for georeferencing localities (example: "5 Miles West of Berkeley"). No longer maintained.


License: Apache 2.0



Description: Allows a user to determine the geocode and associated error for a locality that lies at a fixed distance and direction from a known locality. The database contains approximately 110,000 names of geocoded Brazilian localities.
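The core computation behind offset localities such as "5 Miles West of Berkeley" can be sketched as parsing the phrase and then computing the great-circle destination point from the known locality. Assumptions: a spherical Earth of radius 6371 km, only full compass words as directions, and the hypothetical helpers `parse_offset` and `destination`; looking up the coordinates of the named place in a gazetteer and estimating the error, which the tools above also provide, are not covered.

```python
import math
import re

EARTH_RADIUS_KM = 6371.0   # spherical-Earth approximation
MILE_KM = 1.609344

BEARINGS = {"north": 0.0, "northeast": 45.0, "east": 90.0, "southeast": 135.0,
            "south": 180.0, "southwest": 225.0, "west": 270.0, "northwest": 315.0}

# Hypothetical pattern covering only "<number> <unit> <direction> of <place>".
OFFSET = re.compile(r"(\d+(?:\.\d+)?)\s*(miles?|mi|km)\s+([a-z]+)\s+of\s+(.+)", re.IGNORECASE)


def parse_offset(text: str):
    """Split '5 Miles West of Berkeley' into (distance_km, bearing_deg, place) or None."""
    m = OFFSET.fullmatch(text.strip())
    if not m:
        return None
    value, unit, direction, place = m.groups()
    bearing = BEARINGS.get(direction.lower())
    if bearing is None:
        return None
    km = float(value) * (MILE_KM if unit.lower().startswith("mi") else 1.0)
    return km, bearing, place


def destination(lat: float, lon: float, distance_km: float, bearing_deg: float) -> tuple[float, float]:
    """Great-circle destination point from a start point, distance and initial bearing."""
    phi1, lam1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_km / EARTH_RADIUS_KM  # angular distance in radians
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    lon2 = (math.degrees(lam2) + 540.0) % 360.0 - 180.0  # normalize to [-180, 180)
    return math.degrees(phi2), lon2
```

In practice the start coordinates would come from a gazetteer entry for the named place (e.g. Berkeley), and the result would be reported together with an uncertainty radius.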

Label Information Extraction

Apiary Project (High-Throughput Workflow for Computer-Assisted Human Parsing of Biological Specimen Label Data)

Description: The Texas Center for Digital Knowledge (TxCDK) at the University of North Texas and the Botanical Research Institute of Texas (BRIT) are conducting fundamental research to identify how human intelligence can be combined with machine processes to transform textual museum specimen label information effectively and efficiently into high-quality, machine-processable parsed data. This two-year project, which we call Apiary, will advance understanding of the workflows and processes best suited to increasing access to and use of digitized biological collection metadata within the stakeholder communities: biologists, natural history museum collections managers, biodiversity standards groups, and the library and information science community.


Contact: Jason Best, URL:, E-mail:

SALIX, the Semi-automatic Label Information Extraction system

Description: Uses OCR and other software to extract label data from collection objects; developed at Arizona State University (ASU), USA. More:



Darwin Score

Source URL:

Description: A method to evaluate text representing a natural history bio-collection object. The scores calculated in this process provide a rough evaluation of the quality of the text, whether it was generated by human transcription or by OCR. Initially, the method focuses on labels and annotations applied to herbarium specimens. It checks the text for words, including taxonomic names, collector names, annotator names, location names, common abbreviations and other expected words (all stored in purpose-built dictionaries), and for patterns such as dates, numbers and geocoordinates. The text is given a score based on the number of matches found. Includes dictionaries and simple regex patterns.
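A scoring scheme of this kind can be sketched with a few dictionaries and regular expressions. This is a hypothetical illustration, not Darwin Score's code: the dictionary contents, the two patterns, the function name `label_score` and the normalization (matches per whitespace token, capped at 1.0) are all assumptions.

```python
import re

# Hypothetical miniature dictionaries; Darwin Score uses purpose-built, much larger ones.
DICTIONARIES = {
    "taxon": {"quercus", "robur", "fagaceae"},
    "collector": {"best"},
    "location": {"texas", "tarrant", "county"},
    "abbreviation": {"leg", "det", "coll"},
}

# Simple patterns for dates (e.g. 12.05.1998) and decimal-degree coordinates (e.g. 32.7° N).
PATTERNS = [
    re.compile(r"\b\d{1,2}[./-]\d{1,2}[./-]\d{2,4}\b"),
    re.compile(r"\b\d{1,3}(?:\.\d+)?\s*°\s*[NSEW]\b"),
]


def label_score(text: str) -> float:
    """Rough quality score in [0, 1]: expected words and patterns per whitespace token."""
    tokens = text.split()
    if not tokens:
        return 0.0
    words = re.findall(r"[a-z]+", text.lower())
    word_hits = sum(1 for w in words if any(w in d for d in DICTIONARIES.values()))
    pattern_hits = sum(len(p.findall(text)) for p in PATTERNS)
    return min(1.0, (word_hits + pattern_hits) / len(tokens))
```

A clean transcription such as "Quercus robur leg. Best 12.05.1998 Texas" scores high, while garbled OCR output like "Qu3rc#s r0bur xx" matches nothing and scores 0.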

Contact: Jason Best, Director of Biodiversity Informatics, Botanical Research Institute of Texas (BRIT), URL:, E-mail:



Description: ScioTR is a new touch-enabled Windows 8 app that integrates optical character recognition (OCR), a consensus strategy and machine learning (ML) to provide an efficient workflow for digitizing images into custom data fields. It is not specialized for any particular label type; examples include office labels, small forms, food labels, product labels, music collections, travel receipts, business cards and library card catalogs.

Data Quality Tools

BioVeL - Data Refinement Workflow


Description: The Taxonomic Data Refinement Workflow provides an environment for preparing observational and specimen data sets for use in scientific analyses such as species distribution analysis, species richness and diversity studies, species occurrence studies, historical analysis and other spatio-temporal analyses.

BiNHum - Data quality tools

BiNHum project URL:

Description: Quality testing in several steps, all relating to geographic information:
1. Translation of country names into English, using the country names in various languages from Wikipedia.
2. Mapping of subordinate geographic units, e.g. federal states, that were mistakenly assigned to the ABCD element 'Country'.
3. If the element 'Country' is empty, search other ABCD elements with a geographic reference.
4. Check whether an ISO country code is present (yes/no); if yes, compare the ISO code with the country name. The result is either a warning or a further check: if coordinates are present, they are tested with an open-source GIS application.
5. Lookup in the GeoNames server (limited number of queries) and Google Maps.
6. Development of an ocean thesaurus.
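Steps 1 to 4 of this procedure can be approximated in a short sketch. It is not BiNHum's implementation: the lookup tables are tiny hypothetical excerpts, `check_country` is an illustrative name, and the GIS coordinate test, the GeoNames/Google Maps lookups (step 5) and the ocean thesaurus (step 6) are not modelled.

```python
from typing import Optional

# Hypothetical excerpts of lookup tables; BiNHum derives multilingual country names from Wikipedia.
COUNTRY_NAMES = {"deutschland": "Germany", "allemagne": "Germany", "germany": "Germany",
                 "österreich": "Austria", "autriche": "Austria", "austria": "Austria"}
SUBDIVISIONS = {"bayern": "Germany", "bavaria": "Germany", "tirol": "Austria"}
ISO_CODES = {"DE": "Germany", "AT": "Austria"}


def check_country(country: Optional[str], iso_code: Optional[str] = None,
                  other_geo_fields: tuple[str, ...] = ()) -> tuple[Optional[str], list[str]]:
    """Return (English country name or None, warnings), loosely following steps 1-4."""
    warnings: list[str] = []
    # Step 3: if 'Country' is empty, fall back to other geographic elements.
    candidates = [country] if country and country.strip() else list(other_geo_fields)
    resolved = None
    for raw in candidates:
        key = raw.strip().lower()
        if key in COUNTRY_NAMES:  # step 1: translate the country name into English
            resolved = COUNTRY_NAMES[key]
            break
        if key in SUBDIVISIONS:  # step 2: subordinate unit placed in 'Country'
            resolved = SUBDIVISIONS[key]
            warnings.append(f"'{raw.strip()}' is a subdivision, not a country")
            break
    if resolved is None:
        warnings.append("country could not be resolved")
    if iso_code:  # step 4: compare the ISO code with the country name
        expected = ISO_CODES.get(iso_code.strip().upper())
        if expected is None:
            warnings.append(f"unknown ISO code '{iso_code}'")
        elif resolved is not None and expected != resolved:
            warnings.append(f"ISO code {iso_code} does not match {resolved}")
    return resolved, warnings
```

For instance, a record with an empty 'Country' element, the ISO code "AT" and "Tirol" in another geographic element resolves to Austria with a warning about the subdivision.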

Data Quality Hub


Contact: GBIF Spain

Description: Provides an overview of various data quality tools: detection tools, validation tools, thesauri (checklists), thesauri (ISO codes), procedures and best practices.

Workflow Environments



Description: Taverna is an open-source, domain-independent workflow management system: a suite of tools used to design and execute scientific workflows and to aid in silico experimentation. Taverna was created by the myGrid team and is currently funded through the FP7 projects BioVeL, SCAPE and Wf4Ever. Taverna workflows can be shared through the site and executed on HPC platforms. Taverna is oriented towards processing potentially large volumes of data that require multiple steps. Compared to BPMN, Taverna's workflow notation is much simpler, as it does not try to support messages, events and other concepts present in BPMN.



Description: The Kepler Project is dedicated to furthering and supporting the capabilities, use and awareness of Kepler, a free and open-source scientific workflow application. Kepler is designed to help scientists, analysts and computer programmers create, execute and share models and analyses across a broad range of scientific and engineering disciplines.

Download Link:

Argo Project - A web-based text mining workbench

Description: Argo is a workbench for building and running text-analysis solutions. It facilitates the development of custom workflows from a selection of elementary analytics.


Web Service Registries

Biodiversity Catalogue


Description: The BiodiversityCatalogue is a centralised registry of curated biodiversity web services. It allows you to discover, register, annotate, monitor and use web services.

Stable Identifiers

Best Practices for Stable URIs


Source Code and Example Elements:

Background: "In a 'stable identifier hackathon' in June 2013 in Edinburgh, five CETAF institutions (Royal Botanical Garden Edinburgh, Museum für Naturkunde in Berlin, Royal Botanic Garden Kew, National Museum of Natural History Paris and Botanical Museum Berlin-Dahlem) committed to a rapid pilot implementation of the system. Naturalis Biodiversity Center in the Netherlands also plans to join this effort." (from: