Project-Relevant Domain-Specific Infrastructures - BGBM
Contents
- 1 Project-relevant infrastructures (biodiversity informatics)
- 1.1 Biodiversity networks, infrastructures
- 1.2 Topic-specific knowledge / dictionaries
- 1.3 Integration of thesauri, terminology servers
- 1.4 Handwriting collections (digitized)
- 1.5 Digitization
- 1.6 Annotation systems
- 1.7 Standards and rules of practice
- 1.8 Tools for information extraction / text mining
- 1.9 Data quality tools
- 1.10 Workflow environments
- 1.11 Web service registries
- 1.12 Stable identifiers
Project-relevant infrastructures (biodiversity informatics)
Biodiversity networks, infrastructures
GBIF - Global Biodiversity Information Facility
URL: http://www.gbif.org/
Description: GBIF is an international network that enables free access to biodiversity data over the Internet. More: http://www.gbif.org/whatisgbif [fuzzy matching integrated]
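To illustrate what integrated fuzzy matching can look like from a client's point of view, here is a minimal sketch that builds a request for GBIF's public species-match web service and inspects a decoded response. The endpoint `https://api.gbif.org/v1/species/match` and the `matchType` response field are not mentioned on this page; they are assumptions taken from GBIF's public REST API and should be checked against the current API documentation. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: GBIF's public REST endpoint for name matching (not named on this page).
GBIF_MATCH_ENDPOINT = "https://api.gbif.org/v1/species/match"


def build_match_url(name: str) -> str:
    """Return the request URL for matching one scientific name."""
    return f"{GBIF_MATCH_ENDPOINT}?{urlencode({'name': name})}"


def is_fuzzy_match(response: dict) -> bool:
    """True if a decoded JSON response reports a fuzzy (non-exact) match."""
    return response.get("matchType") == "FUZZY"


if __name__ == "__main__":
    # Deliberately misspelled name ("perenis" instead of "perennis").
    print(build_match_url("Bellis perenis"))
```

The sketch only constructs the URL; sending the request (for example with `urllib.request.urlopen`) and decoding the JSON is left to the caller.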
BioCASe (European collection network)
Description: The Biological Collection Access Service for Europe, BioCASE, is a transnational network of biological collections of all kinds. BioCASE provides access to distributed, heterogeneous European collection and observation databases and consistently uses platform-independent open-source software as well as open data standards and exchange protocols.
BioCASe portal: http://search.biocase.org/europe/
BioCASe technology: The term "BioCASe" is often used to refer to the technologies developed within the BioCASE project, in particular the BioCASe protocol and the BioCASe Provider Software. These technologies make it possible to connect an arbitrarily structured database to primary biodiversity networks such as BioCASE or GBIF.
BioCASe Provider Software Wiki: http://wiki.bgbm.org/bps/index.php/Main_Page
ALA - Atlas of Living Australia
URL: http://www.ala.org.au/about-the-atlas/
Description: The Atlas was initiated by a group of 14 (now 17) organisations, our partners. The intent was to create a national database of all of Australia’s flora and fauna that could be accessed through a single, easy to use web site. [fuzzy matching integrated]
ALA downloadable tools: http://www.ala.org.au/about-the-atlas/downloadable-tools/
ALA Webservices: http://api.ala.org.au/
Canadensys
URL: http://www.canadensys.net/
Description: Canadensys makes biodiversity information freely and openly available to everyone. We are a network of researchers, collectors, curators, information technologists, students, and educators that shares data on the occurrence and identity of plant, animal, and fungal species in Canada. Member of GBIF.
Web service page: http://data.canadensys.net/vascan/api
BHL - Biodiversity Heritage Library
URL Developer Tools and API: http://biodivlib.wikispaces.com/Developer+Tools+and+API
Description: The Biodiversity Heritage Library (BHL) is a consortium of natural history and botanical libraries that cooperate to digitize the legacy literature of biodiversity held in their collections and to make that literature available for open access and responsible use as a part of a global “biodiversity commons.” The BHL consortium works with the international taxonomic community, rights holders, and other interested parties to ensure that this biodiversity heritage is made available to a global audience through open access principles.
Topic-specific knowledge / dictionaries
Scientific names
International Plant Names Index – IPNI
URL: http://www.ipni.org/
Description: Database of the names and associated basic bibliographical details of seed plants, ferns and lycophytes. Continuously updated. - Plants
Contact: ipnieditors@ipni.org (for downloads of more than 5,000 records)
Web Service for Matching IPNI names
URL: (beta version) http://data1.kew.org/reconciliation/
URL: (biodiversity catalogue) https://www.biodiversitycatalogue.org/services/84
Description (Matthew Blisset, Kew): The software is flexible and can use any kind of data, the first service we have released is for matching to IPNI names. This is built using a sequence of transformations to both the list of authoritative names and the query, for each part of the name. These transformations should provide better matches than some other services which just use Levenshtein distance etc. For example, it ignores double letters or a changed Latin ending.
The service is exposed in three ways:
- An OpenRefine (Google Refine) Reconciliation Service
- A custom API which is a bit simpler to use
- A batch upload of a CSV file
I've concentrated on the OpenRefine method.
I've also implemented a few bits of the "Metaweb Query Language" API on ThePlantList, which allows an OpenRefine extension to query ThePlantList using an IPNI id, and retrieve information held by TPL for that name.
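The transformation-based matching described above can be sketched roughly as follows. This is not the Kew service's implementation; its actual rules are not given here. The list of Latin endings and the helper names `normalise_part`, `match_key` and `lookup` are hypothetical and chosen only to show the idea of applying the same transformations to the authoritative names and to the query.

```python
import re

# Hypothetical, deliberately crude list of Latin endings treated as interchangeable.
LATIN_ENDINGS = ("us", "um", "is", "a", "e", "i")


def normalise_part(word: str) -> str:
    """Transform one part of a name: lowercase, collapse doubled letters, drop ending."""
    word = re.sub(r"(.)\1+", r"\1", word.lower())
    # Try longer endings first; keep at least a three-letter stem.
    for ending in sorted(LATIN_ENDINGS, key=len, reverse=True):
        if word.endswith(ending) and len(word) > len(ending) + 2:
            return word[: -len(ending)]
    return word


def match_key(name: str) -> str:
    """Apply the same transformation to each part of a name."""
    return " ".join(normalise_part(part) for part in name.split())


def lookup(query: str, index: dict):
    """Return the authoritative name whose key equals the query's key, or None."""
    return index.get(match_key(query))


authoritative = ["Bellis perennis", "Quercus robur"]
index = {match_key(name): name for name in authoritative}

if __name__ == "__main__":
    # Dropped double letters and a changed ending still resolve to the same name.
    print(lookup("Belis perenne", index))
```

Because both sides are reduced to the same key, the comparison is an exact dictionary lookup rather than an edit-distance search, which is the design difference the description hints at.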
TROPICOS
URL: http://www.tropicos.org/Home.aspx
Description: ‘Tropicos® was originally created for internal research but has since been made available to the world’s scientific community. All of the nomenclatural, bibliographic, and specimen data accumulated in MBG’s electronic databases during the past 25 years are publicly available here. This system has over 1.2 million scientific names and 4.0 million specimen records.’ Search portal for the databases of the Missouri Botanical Garden, USA. Searchable by, among other things, scientific names, persons, and places. - Plants
TROPICOS Web Services: http://services.tropicos.org/
Contact: Missouri Botanical Garden, http://www.tropicos.org/Feedback.aspx?feedbackoption=4
WCSP Kew's World Checklist
URL: http://apps.kew.org/wcsp/home.do
Description: WCSP is an international collaborative programme that provides the latest peer reviewed and published opinions on the accepted scientific names and synonyms of selected plant families. It allows you to search for all the scientific names of a particular plant, or the areas of the world in which it grows (distribution). The checklist includes 173 Seed Plant families (View list of included families). Different families are in different stages of review as indicated in the family list. There are currently more than 155 contributors from 22 countries. - Plants
Contact: Rafaël Govaerts, email: R.Govaerts@kew.org
Catalogue of Life Service
URL: https://www.biodiversitycatalogue.org/services/17
Description: This web service endpoint serves as a search engine for scientific name-related taxonomic information.
Euro+Med PlantBase
URL: http://www.emplantbase.org/home.html
Description: The Euro+Med PlantBase provides an on-line database and information system for the vascular plants of Europe and the Mediterranean region, against an up-to-date and critically evaluated consensus taxonomic core of the species concerned. The Euro+Med PlantBase is part of the Pan-European Species directories Infrastructure (PESI). - Plants
Contact: http://www.emplantbase.org/contacts.html
Australian Plant Name Index – APNI
URL: http://www.anbg.gov.au/apni/index.html
Description: APNI is a tool for the botanical community that deals with plant names and their usage in the scientific literature. Maintained by the Australian National Botanic Gardens as part of its larger IBIS database, in collaboration with the Centre for Australian National Biodiversity Research and the Australian Biological Resources Study. - Plants
GRIN – Taxonomy of Plants
URL: http://www.ars-grin.gov/cgi-bin/npgs/html/index.pl?language=en
Description: Taxonomic data in GRIN provide the structure and nomenclature for the accessions of the National Plant Germplasm System (NPGS), part of the National Genetic Resources Program (NGRP) of the Agricultural Research Service (ARS) of the United States Department of Agriculture (USDA). All families and genera of plants and 52,095 species from around the world, in particular economic plants and their relatives, are represented in GRIN Taxonomy of Plants. Records include scientific and common names, classification, distribution, references, and information on economic uses. - Plants
PESI
URL: http://www.eu-nomen.eu/pesi/
Description: PESI provides standardised and authoritative taxonomic information by integrating and securing Europe’s taxonomically authoritative species name registers and nomenclators (name databases) and associated expert(ise) networks that underpin the management of biodiversity in Europe.
Web service page: http://www.eu-nomen.eu/portal/webservices.php [fuzzy matching integrated]
Species 2000 /Catalogue of Life
URL: http://www.sp2000.org/sp2kwebsite/index.php?option=com_content&task=view&id=40&Itemid=49
Contact: sp2000@sp2000.org
Description: Online database of the world's known species of animals, plants, fungi and micro-organisms. The Catalogue of Life (CoL) is the starting point for the GBIF taxonomic backbone.
Web service page: http://webservice.catalogueoflife.org/col/webservice
GNI - Global Names Index
URL: http://gni.globalnames.org/
Description: Index of name catalogues, including uBio Name Bank, ITIS, EOL, GBIF, and IPNI. 'GNI is a collection of strings (combinations of characters) that have been used as names for organisms. GNI contains many examples of names spelled in slightly different ways. In order to be able to link all of the information about any taxon, a query beginning with one string will find data associated with any of the alternative strings. This is done by linking the alternative names for the same species - a process called 'reconciliation'. GNI is a component of the Global Names Architecture, an effort that is building a names-based cyberinfrastructure for biology (Patterson, D. J., Cooper, J., Kirk, P. M., Pyle, R. L. and Remsen, D. P. 2010. Names are key to the big new biology. TREE 25: 686-691).'
Global Names Index API: http://www.biodiversitycatalogue.org/services/61
GBIF taxonomic backbone
URL: http://www.gbif.org/dataset/d7dddbf4-2cf0-4f39-9b2a-bb099caae36c
Description: The GBIF Backbone Taxonomy, often called the Nub taxonomy, is a single synthetic management classification with the goal of covering all names GBIF is dealing with. It's the taxonomic backbone that allows GBIF to integrate name based information from different resources, no matter if these are occurrence datasets, species pages, names from nomenclators or external sources like EOL, Genbank or IUCN. This backbone allows taxonomic search, browse and reporting operations across all those resources in a consistent way and to provide means to crosswalk names from one source to another. It is updated regularly through an automated process in which the Catalogue of Life acts as a starting point also providing the complete higher classification above families.
ITIS – Integrated Taxonomic Information System
URL: http://www.itis.gov/
Description: Integrated Taxonomic Information System. 'Here you will find authoritative taxonomic information on plants, animals, fungi, and microbes of North America and the world. We are a partnership of U.S., Canadian, and Mexican agencies (ITIS-North America); other organizations; and taxonomic specialists. ITIS is also a partner of Species 2000 and the Global Biodiversity Information Facility (GBIF). The ITIS and Species 2000 Catalogue of Life (CoL) partnership is proud to provide the taxonomic backbone to the Encyclopedia of Life (EOL).'
Exsiccatae
IndExs - Index of Exsiccatae
URL: http://indexs.botanischestaatssammlung.de/
Description: "IndExs" comprises information on titles, abbreviations and bibliography of exsiccatae. Exsiccatae are defined here as "published, uniform, numbered sets of preserved specimens distributed with printed labels" (Pfister 1985). Please note that there are two similar Latin terms: "exsiccata, ae" is feminine and used for a set of dried specimens as defined above, whereas the term "exsiccatum, i" is neuter and used for dried specimens in general. You may search "IndExs" using title, part of the title, editor and group of organisms alone or combined.
Common names
OpenUp common names webservice
Description: Service developed at the University of Vienna for enriching scientific names with common names in various languages. The service is already used by Europeana. Further information in the newsletter: http://open-up.eu/sites/open-up.eu/files/Newletter4_PRNT_0.pdf
Australian Common Name Database
URL: http://www.anbg.gov.au/common.names/
Description: Database of common names of Australian plants
Responsible: Integrated Botanical Information System (IBIS), Australian National Herbarium
Language: English
Standardliste der Farn- und Blütenpflanzen Deutschlands (Standard List of the Ferns and Flowering Plants of Germany)
Description: German names in: Rolf Wisskirchen, Henning Haeupler: Standardliste der Farn- und Blütenpflanzen Deutschlands. Mit Chromosomenatlas. Published by the Bundesamt für Naturschutz (= Die Farn- und Blütenpflanzen Deutschlands. Band 1). Eugen Ulmer, Stuttgart (Hohenheim) 1998, ISBN 3-8001-3360-1.
Language: German
Persons (collectors, authors)
Index Herbariorum
URL: http://sciweb.nybg.org/science2/IndexHerbariorum.asp
Description: List of collector names and specialists
Contact: http://sciweb.nybg.org/science2/Contacts.asp.html
International Plant Names Index – IPNI
URL: http://www.ipni.org/
Description: Author names
Contact: ipnieditors@ipni.org (for downloads of more than 5,000 records)
Index of Botanists (Collectors)
Description: Harvard Index of Botanists (search filter ‘Collector’)
URL: http://kiki.huh.harvard.edu/databases/botanist_index.html
Contact: http://huh.harvard.edu/pages/contact
TROPICOS
URL: http://www.tropicos.org/Home.aspx
Description: ‘Tropicos® was originally created for internal research but has since been made available to the world’s scientific community. All of the nomenclatural, bibliographic, and specimen data accumulated in MBG’s electronic databases during the past 25 years are publicly available here. This system has over 1.2 million scientific names and 4.0 million specimen records.’ Search portal for the databases of the Missouri Botanical Garden, USA. Searchable by, among other things, scientific names, persons, and places.
TROPICOS Web Services: http://services.tropicos.org/
Contact: Missouri Botanical Garden, http://www.tropicos.org/Feedback.aspx?feedbackoption=4
Australian Plant Collectors and Illustrators 1780s-1980s
URL: http://www.anbg.gov.au/bot-biog/index.html
Description: Australian plant collectors and illustrators
Responsible: This web site is based on the list published by J.H. Willis, D. Pearson, M.T. Davis, and J.W. Green, Western Australian Herbarium Research Notes Number 12, August 1986. That original list has been supplemented by additional entries and some updates of dates, especially where people have died since that publication. It has been further supplemented with information from Alex George's 2009 publication Australian Botanist's Companion, published by Four Gables Press, WA.
Cyclopedia of Malesian Collectors
URL: http://www.nationaalherbarium.nl/fmcollectors/Home.htm
Botanical Collectors: Africa (Natural History Museum, London)
URL: http://www.plantcollectors.co.uk/
Botanical Collectors: Latin America (Natural History Museum, London)
URL: http://www.plantcollectors.co.uk/LAPI.asp?
List of herbarium institutions
Index Herbariorum
URL: http://sciweb.nybg.org/science2/IndexHerbariorum.asp
Description: List of herbaria and their official abbreviations - unique institution IDs
Contact: http://sciweb.nybg.org/science2/Contacts.asp.html
Geographic information, gazetteers
Getty Thesaurus
URL: http://www.getty.edu/research/tools/vocabularies/tgn/index.html
Geonames, the United States Board on Geographic Names
URL: http://geonames.usgs.gov/
JRC fuzzy gazetteer
URL: http://dma.jrc.it/services/fuzzyg
Further vocabularies, glossaries, abbreviations
Abkürzungen und Symbole in der biologischen Nomenklatur (Abbreviations and Symbols in Biological Nomenclature)
Description: Abbreviations and phrases from the nomenclature and taxonomy of zoology, botany, cultivated plants, virology and bacteria are listed alphabetically. They are mostly explained by means of examples.
Publication: Wolfgang Granzow (2000): Abkürzungen und Symbole in der biologischen Nomenklatur. Senckenbergiana lethaea 80 (2) 355 – 370.
Download pdf: http://ashipunov.info/jurassic/j/Granzow,%202000_Nomenklatur.pdf
Terms Used in Bionomenclature
Description: This text is a comprehensive glossary of over 2,100 terms used in biological nomenclature - the naming of whole organisms of all kinds. It is accompanied by a web application that enables the glossary to facilitate semantic linking on the web.
Download pdf: http://www.gbif.org/resources/2647
Integration of thesauri, terminology servers
GfBio Terminology Server
Description: The Terminology Server links external and internal vocabularies (controlled vocabularies, glossaries, thesauri, ontologies) on the subject of biodiversity; developed within the GfBio project (http://www.gfbio.org/)
URL: http://terminologies.gfbio.org/
TOQE – Thesaurus optimized query expander
Description: Integration of thesauri via a service interface
Documentation: http://search.biocase.org/toqe/
Publication: http://journals.ku.edu/index.php/jbi/article/view/1631/3472
Handwriting collections (digitized)
Handwriting collections of well-known authors available online
Chirographicum historicum
URL: http://harvest.cals.ncsu.edu/chiro/about.html
Contact: http://harvest.cals.ncsu.edu/chiro/contact.html, North Carolina State University
Auxilium ad Botanicorum Graphicem
URL: http://www.ville-ge.ch/musinfo/bd/cjb/auxilium/index.php
Contact: http://www.ville-ge.ch/musinfo/bd/cjb/auxilium/contactus.php, Geneva herbarium
Handwriting Linnean Herbarium
URL: http://linnaeus.nrm.se/botany/fbo/hand/welcome.html.en
Contact: Swedish Natural History Museum, Linnean Herbarium
Global Plants Initiative Designation Identifier
URL: http://gpi.myspecies.info/digitising-resources/designation-identifier
Contact: Global Plants Initiative
CALIGRAFÍAS del Herbario MA, Madrid
URL: http://www.floraiberica.es/caligrafia/index.php
Digitization
Literature references
A very good overview of developments in mass digitization and digitization workflows: Vladimir Blagoderov & Vincent Smith (2012): No specimen left behind: mass digitization of natural history collections
URL: http://www.pensoft.net/journals/zookeys/issue/209/
Including, among others:
- Description of a modular digitization workflow in Edinburgh:
Haston & al. (2012): Developing integrated workflows for the digitisation of herbarium specimens using a modular and scalable approach. ZooKeys 209: 93–102, doi: 10.3897/zookeys.209.3121
- Workflow for carrying out mass digitization at the Natural History Museum London:
Blagoderov & al. (2012): No specimen left behind: industrial scale digitization of natural history collections. ZooKeys 209: 133–146, doi: 10.3897/zookeys.209.3178
- Mass digitization at the Naturalis Biodiversity Center, Leiden, Netherlands:
van den Oever & Gofferjé (2012): ‘From Pilot to production’: Large Scale Digitisation project at Naturalis Biodiversity Center. ZooKeys 209: 87–92, doi: 10.3897/zookeys.209.3609
- On the digitization centre of the natural history museum in Joensuu, Finland (Digitarium):
Tegelberg & al. (2012): The development of a digitising service centre for natural history collections. ZooKeys 209: 75–86, doi: 10.3897/zookeys.209.3119
- Methods for increasing efficiency in herbarium digitization at the herbarium of the New York Botanical Garden, in particular ‘Strategy 3: semi-automated approach’, which uses tools for semi-automatic information extraction from labels such as ‘SALIX3’ and ‘Apiary’ (see also information extraction) and duplicate detection with ‘Specify’ (see also collection management software):
Tulig & al. (2012): Increasing the efficiency of digitization workflows for herbarium specimens. ZooKeys 209: 103–113, doi: 10.3897/zookeys.209.3125
- Documentation and comparison of digitization workflow components and protocols of 28 programmes at 10 US museums and academic institutions, including a description of information extraction using OCR software tools and of software for retrospective georeferencing. Conclusion: “There is significant interest in natural language processing (NLP), which is designed to parse OCR text into fields, as well as intelligent character recognition (ICR) or handwriting analysis, but effective systems for using these technologies to extract data from biological specimens were not observed.”:
Nelson & al. (2012): Five task clusters that enable efficient and effective digitization of biological collections. ZooKeys 209: 19–45, doi: 10.3897/zookeys.209.3135
- Mobilization and linking of biological multimedia data in the EUROPEANA portal (OpenUp! project), in particular ‘Data quality control’ and ‘Semantic enrichment’, i.e. linking scientific names with common names (vernacular names provided by the Natural History Museum Vienna):
Berendsohn & Güntsch (2012): OpenUp! Creating a cross-domain pipeline for natural history data. ZooKeys 209: 47–54, doi: 10.3897/zookeys.209.3179
- Test of digitization workflows at the Royal Botanic Garden Edinburgh:
Robyn E. Drinkwater, Robert W. N. Cubey, Elspeth M. Haston (2014): The use of Optical Character Recognition (OCR) in the digitisation of herbarium specimen labels. PhytoKeys 38: 15–30, doi: 10.3897/phytokeys.38.7168
Address: Royal Botanic Garden Edinburgh, 20a Inverleith Row, Edinburgh, EH3 5LR, UK
Contact: Elspeth M. Haston (e.haston@rbge.org.uk)
Collection database software / collection management software
Overview and some examples
Overview
Overview of collection management systems in the GfBio biowikifarm: http://gfbio.biowikifarm.net/wiki/Technical_documentation_of_collection_management_systems_at_the_GFBio_collection_archives
JACQ
Documentation: http://jacq.nhm-wien.ac.at/dokuwiki/doku.php?id=export_documentation
Code: http://sourceforge.net/p/jacq/legacy/
Web output: http://herbarium.univie.ac.at/database/search.php
DiversityCollection
URL: http://diversityworkbench.net/Portal/DiversityCollection
Description: DiversityCollection is focused on the management of specimens in scientific collections and the handling of observation data. In this context it is designed to document any action concerning the collection, storage, exchange and treatment of specimens in a collection and is also appropriate to store observation data with analyses added. The Diversity GIS Editor is integrated. DiversityCollection is distinguished from other collection management systems by its focus on biological relations between organisms linked together as one or more specimens or observations (e. g., host, parasite, hyperparasite, symbionts, etc.). DiversityCollection keeps only data connected with the handling of collection specimens, parts of specimens and observations. Data of other realms like, e. g., taxonomy and references as well as scientific term systems and sampling plots are handled in separate modules. For an overview of the available DWB components see the Diversity Workbench Main Page.
BGBase
URL: http://www.bg-base.com/users.htm
Description: BG-BASE is a PC-based database application written primarily to handle the information management needs of institutions and individuals holding living and/or preserved collections of biological material, including botanic gardens, arboreta, zoos, herbaria, museums, libraries, university campuses, horticultural societies and private collections.
Support: Proprietary product. BG-BASE is developed and supported from two international centers, one in the US and the other in the UK.
Contact: http://www.bg-base.com/contact.htm
Specify
URL: http://specifysoftware.org/
Contact: email: specify@ku.edu, University of Kansas
Description: Specify is a database software application for museum and herbarium research data. It manages species and specimen information for computerizing collections, tracking museum specimen transactions, linking images to specimen records, and publishing catalog data to the Internet. Specify is written in Java. More: http://specifysoftware.org/
Download: http://specifysoftware.org/download/
Source code: http://specifysoftware.org/sourceforge/
BRAHMS
URL: http://herbaria.plants.ox.ac.uk/bol/
Contact: http://herbaria.plants.ox.ac.uk/bol/brahms/Home/Contact
Description: Management system working with and integrating data and images from specimens, botanical surveys, field observations, living collections, seed banks and literature. More: http://herbaria.plants.ox.ac.uk/bol/brahms/Software
Digitization projects / initiatives
Integrated Digitized Biocollections – iDigBio
Description: National Resource for Advancing Digitization of Biodiversity Collections (ADBC), funded by the National Science Foundation (USA). Through ADBC, data and images for millions of biological specimens are being made available in electronic format for the research community, government agencies, students, educators, and the general public.
The website hosts a wiki with many pointers, in particular ‘iDigBio Data Ingestion’ and ‘Digitisation Resources’, e.g. workflows and protocols and georeferencing resources. Very interesting pointers can be found under 'Working Groups / aOCR (Augmenting OCR)', there: 'OCR related Materials'.
URL: http://www.idigbio.org/ and http://www.idigbio.org/wiki/index.php/Digitization_Resources
Digitarium, Finland
URL: http://www.digitarium.fi/content/statistics/
Description: Digitization centre of the Finnish Museum of Natural History; mass digitization, industrial approach
Picturae / Naturalis, Netherlands
Description: Picturae is a Dutch company that digitized the collection specimens of the Naturalis natural history museum in Leiden, Netherlands. For this purpose a workflow was developed that enables the mass digitization of herbarium sheets, among other things through the use of “Digitstreets”.
URL: http://picturae.com/digitising/herbarium-sheets
URL digitization at Naturalis, Leiden: http://science.naturalis.nl/en/collection/digitization/digitization/
MNHN Paris, France
URL: http://collections.mnhn.fr/wiki/attach/Visit_October2012/Paris-Herbarium-Digitization_2012-07-12.pdf (PowerPoint slides)
Description: Mass digitization of the entire Paris herbarium in industrial workflows
Next Generation Phenomics for the Tree of Life, USA
Description: The Next Generation Phenomics project seeks to develop and adapt tools to assemble large phenomic datasets in a rapid and automated way. This project consists of computer vision, natural language processing, and crowdsourcing components. The Computer Vision (CV) team is developing methods that automate the extraction and annotation of phenomic characters from digital images using computer learning approaches. The new CV algorithms can discern the presence/absence of features and assess their spatial relationships and appearance. The Natural Language Processing (NLP) group is developing software to transform digitized taxonomical descriptions into taxon/character matrices for phylogenetic analyses. Also, because microbial descriptions often differ radically from those of other organisms, the NLP group is developing supervised learning strategies to extract phenomic characters from microbial descriptions. Finally, the crowdsourcing team has developed software, The Evolution Project, that works with MorphoBank to present images of character states to crowds for scoring.
Annotation systems
AnnoSys
URL: http://annosys.bgbm.fu-berlin.de/
URL technical documentation: http://wiki.bgbm.org/annosys/index.php?title=TechnicalDocumentation#Services_2
Description: The project's objective is to develop, by way of example, a specification for an annotation data repository for networked and highly complex biodiversity data. AnnoSys is based on a prototype developed in the context of SYNTHESYS and uses the Open Annotation Data Model and an RDF database for storing the information. AnnoSys is implemented using the example of collection and observation data in the botanic domain provided by the GBIF/BioCASE system (currently over 50.8 million records, including 15 million records from natural history collection objects).
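As a rough illustration of what an annotation looks like when expressed with the Open Annotation Data Model and stored as RDF, the following sketch builds the triples for one annotation and serialises them as N-Triples using only the standard library. It is not AnnoSys's actual schema or code. The `oa:` and `cnt:` terms come from the Open Annotation and W3C Content-in-RDF vocabularies; all `example.org` IRIs, the comment text, and the helper names are hypothetical.

```python
# Vocabulary namespaces (Open Annotation, RDF, W3C Content in RDF).
OA = "http://www.w3.org/ns/oa#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CNT = "http://www.w3.org/2011/content#"


class Literal(str):
    """Marks an object value as a plain text literal rather than an IRI."""


def annotation_triples(anno_iri, target_iri, body_iri, comment):
    """Return the RDF triples linking an annotation to its target and text body."""
    return [
        (anno_iri, RDF_TYPE, OA + "Annotation"),
        (anno_iri, OA + "hasTarget", target_iri),
        (anno_iri, OA + "hasBody", body_iri),
        (body_iri, RDF_TYPE, CNT + "ContentAsText"),
        (body_iri, CNT + "chars", Literal(comment)),
    ]


def to_ntriples(triples):
    """Serialise triples as N-Triples, quoting literals and bracketing IRIs."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical IRIs for an annotation on a specimen record.
    triples = annotation_triples(
        "http://example.org/annotation/1",
        "http://example.org/specimen/0001",
        "http://example.org/annotation/1/body",
        "Determination should be Bellis perennis L.",
    )
    print(to_ntriples(triples))
```

In practice an RDF library and a triple store would replace the hand-written serialiser; the sketch only shows how target (the annotated record) and body (the annotator's statement) are kept as separate resources.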
Filtered Push
URL: http://wiki.filteredpush.org/wiki/
Description: We are designing and implementing a network, which we term Filtered Push, to connect remote sites where annotations can be generated with the authoritative databases of the collections holding the vouchers to which those annotations apply.
Standards and rules of practice
Standards
TDWG - Biodiversity Information Standards
TDWG formulates biodiversity standards and forms working groups on various topics.
URL: http://www.tdwg.org/
Description: Biodiversity Information Standards (TDWG), also known as the Taxonomic Databases Working Group, is a not for profit scientific and educational association that is affiliated with the International Union of Biological Sciences. TDWG was formed to establish international collaboration among biological database projects. TDWG promoted the wider and more effective dissemination of information about the World's heritage of biological organisms for the benefit of the world at large. Biodiversity Information Standards (TDWG) now focuses on the development of standards for the exchange of biological/biodiversity data.
ABCD Access to Biological Collection Data (TDWG Standard)
URL: http://www.tdwg.org/standards/115/
Description: The Access to Biological Collections Data (ABCD) Schema is an evolving comprehensive standard for the access to and exchange of data about specimens and observations (a.k.a. primary biodiversity data). The ABCD Schema attempts to be comprehensive and highly structured, supporting data from a wide variety of databases. It is compatible with several existing data standards. Parallel structures exist so that either (or both) atomised data and free-text can be accommodated. Version 1.2 is currently in use with the GBIF (Global Biodiversity Information Facility) and BioCASE (Biological Collection Access Service for Europe) networks. Apart from the GBIF and BioCASE networks, the potential for the application of ABCD extends to internal networks, or in-house legacy data access (e.g. datasets from external sources that shall not be converted and integrated into an institution's own data, but be kept separately, though easily accessible). By defining relations between terms, ABCD is a step towards an ontology for biological collections. ABCD concepts: http://wiki.tdwg.org/twiki/bin/view/ABCD/AbcdConcepts ABCD 2.06 Schema: http://www.bgbm.org/TDWG/CODATA/Schema/ABCD_2.06/HTML/ABCD_2.06.html
Darwin Core (TDWG Standard)
URL: http://www.tdwg.org/standards/450/
Description: The Darwin Core is a body of standards. It includes a glossary of terms (in other contexts these might be called properties, elements, fields, columns, attributes, or concepts) intended to facilitate the sharing of information about biological diversity by providing reference definitions, examples, and commentaries. The Darwin Core is primarily based on taxa, their occurrence in nature as documented by observations, specimens, and samples, and related information. Included are documents describing how these terms are managed, how the set of terms can be extended for new purposes, and how the terms can be used. Used by: GBIF
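To make the idea of Darwin Core terms as shared column names concrete, here is a minimal sketch that writes one specimen record as a CSV table whose headers are Darwin Core terms. The selection of terms is only a small subset of the standard, and every value in the record (identifier, institution code, collector, coordinates) is invented for illustration.

```python
import csv
import io

# A small subset of Darwin Core terms used as column headers.
DWC_TERMS = [
    "occurrenceID", "basisOfRecord", "institutionCode", "catalogNumber",
    "scientificName", "recordedBy", "eventDate", "country",
    "decimalLatitude", "decimalLongitude",
]

# Invented example record; none of these values refer to a real specimen.
record = {
    "occurrenceID": "urn:example:specimen:0001",
    "basisOfRecord": "PreservedSpecimen",
    "institutionCode": "EX",
    "catalogNumber": "0001",
    "scientificName": "Bellis perennis L.",
    "recordedBy": "A. Collector",
    "eventDate": "1901-05-17",
    "country": "Germany",
    "decimalLatitude": "52.45",
    "decimalLongitude": "13.30",
}


def to_dwc_csv(records):
    """Write records as CSV text with Darwin Core terms as the header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=DWC_TERMS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_dwc_csv([record]))
```

Because every provider uses the same term names, tables like this can be merged or indexed without a per-source field mapping.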
TaxonX - systematics literature mark up standard
URL: http://www.tdwg.org/biodiv-projects/projects-database/view-project/512/
Description: TaxonX is an XML schema for encoding taxonomic treatments in order to: create open, persistent, full digital surrogates of treatments; identify treatments and their major structures; identify textual data such as names, localities, characters, and citations.
Audubon Core - Audubon Core Multimedia Resources Metadata Standard
URL: http://terms.tdwg.org/wiki/Audubon_Core
Description: The Audubon Core metadata schema ("AC") is a representation-neutral metadata vocabulary for describing biodiversity-related multimedia resources and collections.
EML - Ecological Metadata Language
URL: http://knb.ecoinformatics.org/#external//emlparser/docs/index.html
Description: Ecological Metadata Language (EML) is a metadata specification developed by the ecology discipline and for the ecology discipline. It is based on prior work done by the Ecological Society of America and associated efforts (Michener et al., 1997, Ecological Applications). EML is implemented as a series of XML document types that can be used in a modular and extensible manner to document ecological data. Each EML module is designed to describe one logical part of the total metadata that should be included with any ecological dataset.
Specification: http://knb.ecoinformatics.org/#external//emlparser/docs/eml-2.1.1/index.html
Guidelines, Rules of Practice, Directives
Metamorfoze Preservation Imaging Directives (with three quality levels), Netherlands
DFG Praxisregeln Digitalisierung (DFG Practical Guidelines on Digitisation)
URL: http://www.dfg.de/formulare/12_151/index.jsp
GBIF - Best Practice Guide for Georeferencing
URL: http://www.gbif.org/resources/2809
MANIS Georeferencing Guidelines
URL: http://manisnet.org/GeorefGuide.html
Tools for Information Extraction / Text Mining
Software developments that support automatic information extraction
General
National Centre for Text Mining, Manchester UK
URL: http://www.nactem.ac.uk/DID-MIBIO/
Description: The National Centre for Text Mining (NaCTeM) is the first publicly-funded text mining centre in the world. We provide text mining services in response to the requirements of the UK academic community. NaCTeM is operated by the University of Manchester.
Argo Project - A web-based text mining workbench
Description: Argo is a workbench for building and running text-analysis solutions. It facilitates the development of custom workflows from a selection of elementary analytics.
URL: http://argo.nactem.ac.uk/
Taxon Names
GBIF Name Finder
Code URL: https://github.com/silverbiology/gbif-namefinder
Description: Name Finder is a Java implementation, compliant with the GNA Name Finding API and based on Lucene, for finding scientific names in arbitrary text documents.
GBIF Name Parser
URL: http://tools.gbif.org/nameparser/
API documentation URL: http://tools.gbif.org/nameparser/api.do
Description: The parser is written in Java and uses regular expressions to dissect name strings into their components. It keeps only the name parts required to reconstruct a full three-part name with an optional subgenus, and ignores additional infraspecific parts such as the subspecies given for varieties.
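The regular-expression approach can be sketched roughly as follows. This is a much-simplified illustration in Python, not the GBIF parser's actual pattern or API; the pattern `NAME_PATTERN` and the function `parse_name` are hypothetical, and the sketch will misread cases such as authorships that start with a lower-case particle.

```python
import re

# Hypothetical, simplified pattern: genus, optional subgenus in parentheses,
# optional species epithet, optional infraspecific epithet, optional authorship.
NAME_PATTERN = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)"
    r"(?:\s+\((?P<subgenus>[A-Z][a-z]+)\))?"
    r"(?:\s+(?P<species>[a-z][a-z-]+))?"
    r"(?:\s+(?:(?:subsp|ssp|var|f)\.\s+)?(?P<infraspecies>[a-z][a-z-]+))?"
    r"(?:\s+(?P<authorship>.+))?$"
)


def parse_name(name):
    """Split a name string into its parts; return None if it does not match."""
    match = NAME_PATTERN.match(name.strip())
    if match is None:
        return None
    # Keep only the parts that were actually present in the input.
    return {part: value for part, value in match.groupdict().items() if value is not None}


print(parse_name("Lepus (Lepus) europaeus Pallas, 1778"))
print(parse_name("Abies alba Mill."))
```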
Global Names
URL: http://www.silverbiology.com/projects/opensource/
URL: http://gnrd.globalnames.org/
Description: Global Names recognition and discovery tools (GNRD) and services make it easy to find scientific names on web pages, PDFs, Microsoft Office documents, images, or in freeform text. Encrypted or image-based PDFs and image files first pass through an OCR routine using Tesseract prior to using the excellent TaxonFinder and NetiNeti names discovery engines. The language of incoming content is determined using unsupervised language detection. If the language is other than English, TaxonFinder is preferred. Found names can be optionally resolved against a number of resources.
TaxonFinder
URL: http://taxonfinder.org/about
Description: TaxonFinder detects scientific names in plain text. Given a string, it will scan through the contents and use a dictionary-based approach to identify which words and strings are Latin scientific organism names. It detects names at all ranks including species, genera, subspecies and more. More info: https://github.com/pleary/node-taxonfinder
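A dictionary-based scan of this kind can be sketched in a few lines. This is only a toy illustration of the general idea, not TaxonFinder's implementation; the word lists `GENERA` and `EPITHETS` and the function `find_names` are hypothetical, and a real tool would load very large dictionaries and apply many more rules.

```python
import re

# Tiny illustrative dictionaries (assumptions for this sketch).
GENERA = {"Quercus", "Abies", "Homo"}
EPITHETS = {"robur", "alba", "sapiens"}


def find_names(text):
    """Return names found by dictionary lookup, in order of appearance."""
    tokens = re.findall(r"[A-Za-z]+", text)
    names = []
    i = 0
    while i < len(tokens):
        if tokens[i] in GENERA:
            # A known genus followed by a known epithet forms a binomial.
            if i + 1 < len(tokens) and tokens[i + 1] in EPITHETS:
                names.append(f"{tokens[i]} {tokens[i + 1]}")
                i += 2
                continue
            # Otherwise report the genus on its own.
            names.append(tokens[i])
        i += 1
    return names


print(find_names("We collected Quercus robur near a stand of Abies."))
```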
GBIF Checklist Bank
Code URL: https://github.com/silverbiology/gbif-checklistbank
Description: Checklist Bank serves as a dynamic archive of "checklists," which are summarized lists of taxa or taxon names. Checklist Bank stores checklist data as it was provided by the data publisher. In addition, Checklist Bank attempts to collate different published checklist resources by tying the atomic elements of checklists, taxon names, to a common names dictionary.
SPECIES
URL: http://species.jensenlab.org/
Description: A standalone command line application capable of identifying taxonomic mentions in documents and mapping them to corresponding NCBI Taxonomy database entries.
Publication: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0065390
EU BON Taxonomic Backbone
URL: http://cybertaxonomy.eu/eu-bon/utis/
Description: The Unified Taxonomic Information Service (UTIS) is the taxonomic backbone for the EU BON project. The EU BON Taxonomic backbone allows running a federated search on multiple European checklists and returns a unified result set of the individual responses of the various checklists. The current implementation of the UTIS is still a prototype, which means that the API and data model may be changed until final release. It connects the web services of the Pan-European Species directories Infrastructure (PESI), of the Catalogue of Life (CoL) and of the World Register of Marine Species (WoRMS). In future it will connect more data providers like EUNIS and Natura2000 in order to be compliant with the INSPIRE directive. Currently it is possible to search for taxa and synonyms by a scientific name or vernacular name string. In case of matching synonyms the corresponding accepted taxon is resolved. The search results always include information on the classification and on related taxa insofar as this data is delivered by the connected checklist providers.
Documentation: http://cybertaxonomy.eu/eu-bon/utis/doc.html
Taxamatch - fuzzy matching algorithm for genus and species scientific names
URL: Taxamatch Developers' Wiki: https://wiki.csiro.au/display/taxamatch/Home
Description: TAXAMATCH employs both phonetic and non-phonetic matching (to detect errors of either type, or both) along with a set of heuristic rules that are incorporated into pre- and post-filters at both genus and species epithet level. In the main, the pre-filters maximise algorithm efficiency by ensuring that only a subset of available names have to be tested, while the post-filters apply heuristic reasoning to distinguish likely "true" from "false" near matches, although they may have the same calculated similarity.
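The combination of a cheap pre-filter with phonetic and non-phonetic comparison can be sketched as below. This is not the TAXAMATCH algorithm: the phonetic normalisation in `phonetic_key` is a crude stand-in invented for this example, plain Levenshtein distance is swapped in for the non-phonetic part, and the post-filter heuristics are left out.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def phonetic_key(word):
    """Crude phonetic normalisation (an assumption, not TAXAMATCH's rule set)."""
    key = word.lower()
    for old, new in (("ph", "f"), ("ae", "e"), ("oe", "e"), ("y", "i"), ("k", "c")):
        key = key.replace(old, new)
    collapsed = []
    for char in key:
        # Collapse doubled letters such as "ll" to a single letter.
        if not collapsed or collapsed[-1] != char:
            collapsed.append(char)
    return "".join(collapsed)


def near_matches(query, candidates, max_distance=2):
    """Return (name, match_type) pairs for candidates close to the query."""
    results = []
    for name in candidates:
        # Pre-filter: a large length difference rules out a near match cheaply.
        if abs(len(name) - len(query)) > max_distance:
            continue
        if phonetic_key(name) == phonetic_key(query):
            results.append((name, "phonetic"))
        elif edit_distance(name.lower(), query.lower()) <= max_distance:
            results.append((name, "edit"))
    return results


print(near_matches("Quercus rober", ["Quercus robur", "Abies alba", "Quercus rubra"]))
```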
Geographic Information
Tools for (semi-)automatic georeferencing of locality information
Geolocate
URL: http://www.museum.tulane.edu/geolocate/
Description: GEOLocate is a platform for georeferencing natural history collections data. The GEOLocate project is an effort to develop software and services for translating textual locality descriptions associated with biodiversity collections data into geographic coordinates.
BioGeoMancer
Description: The Java BioGeomancer Core API is used for georeferencing localities (example: 5 Miles West of Berkeley). No longer maintained.
Code: http://code.google.com/p/biogeomancer-core/wiki/WebServicesScope
License: Apache 2.0
GeoLoc-Cria
URL: http://www.cria.org.br/eventos/iaed/amarino_pre.html
Description: Allows a user to determine the geocode and associated error for a locality that is at a fixed distance and direction from a known locality. The database has approximately 110 thousand names of Brazilian geocoded localities.
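The core calculation behind "a fixed distance and direction from a known locality" can be sketched with the standard great-circle destination formula. This is a generic illustration assuming a spherical Earth, not code from any of the tools listed here; the function name `offset_point` is hypothetical and the sketch does not estimate the associated error.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def offset_point(lat_deg, lon_deg, distance_km, bearing_deg):
    """Return (lat, lon) reached from a start point after distance_km along bearing_deg."""
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    bearing = math.radians(bearing_deg)  # 0 = north, 90 = east
    angular = distance_km / EARTH_RADIUS_KM  # distance as an angle in radians
    lat2 = math.asin(
        math.sin(lat1) * math.cos(angular)
        + math.cos(lat1) * math.sin(angular) * math.cos(bearing)
    )
    lon2 = lon1 + math.atan2(
        math.sin(bearing) * math.sin(angular) * math.cos(lat1),
        math.cos(angular) - math.sin(lat1) * math.sin(lat2),
    )
    return math.degrees(lat2), math.degrees(lon2)


# "5 miles west of Berkeley", using approximate coordinates for Berkeley.
print(offset_point(37.87, -122.27, 5 * 1.609344, 270.0))
```

A production georeferencing tool would also have to model the uncertainty of the start point, the distance and the direction, which this sketch ignores.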
Label Information Extraction
Apiary Project (High-Throughput Workflow for Computer-Assisted Human Parsing of Biological Specimen Label Data)
Description: The Texas Center for Digital Knowledge (TxCDK) at the University of North Texas and the Botanical Research Institute of Texas (BRIT) are conducting fundamental research with the goal of identifying how human intelligence can be combined with machine processes for effective and efficient transformation of textual museum specimen label information into high-quality machine-processable parsed data. This two-year project, which we call Apiary, will advance understanding of the workflow and processes best able to increase access to and use of digitized biological collection metadata within the stakeholder communities comprising biologists, natural history museum collections managers, biodiversity standards groups, and the library and information science community.
Contact: Jason Best, URL: http://www.brit.org/StaffDirectory/Best, E-mail: jbest@brit.org
SALIX, the Semi-automatic Label Information Extraction system
Description: Uses OCR and additional software to capture label data from collection objects; developed at Arizona State University (ASU), USA. More: http://daryllafferty.com/salix/
URL: http://nhc.asu.edu/vpherbarium/canotia/SALIX3.pdf
Download: http://daryllafferty.com/salix/
Darwin Score
Source URL: http://github.com/jbest/darwin-score
Description: A method to evaluate text representing a natural history bio-collection object. Scores calculated in this process can help provide a rough evaluation of the quality of the text, whether it be generated by human transcription or OCR. Initially, this will focus on labels and annotations applied to herbarium collections. This process will check the text for words including taxonomic names, collector names, annotator names, location names, common abbreviations and other expected words (all stored in purpose-built dictionaries) and patterns such as dates, numbers, and geocoordinates. The text is given a score based on the number of matches found. Includes dictionaries and simple regex patterns.
Contact: Jason Best, Director of Biodiversity Informatics, Botanical Research Institute of Texas (BRIT), URL: http://www.brit.org/, E-mail: jbest@brit.org
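The scoring idea (count dictionary words and pattern matches in a label text) can be sketched as follows. This is an independent toy illustration, not the Darwin Score code; the dictionary `DICTIONARY`, the patterns in `PATTERNS` and the function `score_text` are all assumptions made for this example.

```python
import re

# Hypothetical mini-dictionary of expected words (all lower case).
DICTIONARY = {"herbarium", "quercus", "robur", "leg", "det", "coll"}

# Hypothetical patterns for dates (DD.MM.YYYY) and simple degree coordinates.
PATTERNS = {
    "date": re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b"),
    "coordinate": re.compile(r"\b\d{1,3}°\s?[NSEW]\b"),
}


def score_text(text):
    """Score a label text by the number of dictionary and pattern matches."""
    words = re.findall(r"[A-Za-z]+", text.lower())
    dictionary_hits = sum(1 for word in words if word in DICTIONARY)
    pattern_hits = sum(len(pattern.findall(text)) for pattern in PATTERNS.values())
    return dictionary_hits + pattern_hits


clean = "Quercus robur L., leg. Schmidt 12.05.1898, 52°N"
garbled = "Qu3rcus r0bur 1eg"
print(score_text(clean), score_text(garbled))
```

A clean transcription scores higher than a garbled OCR result of the same label, which is what makes the score usable as a rough quality indicator.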
ScioTR
Description: ScioTR is a new touch-enabled Windows 8 app which integrates Optical Character Recognition (OCR), Consensus Strategy, and Machine Learning (ML) to provide an efficient workflow for digitizing images into custom data fields. Label type unspecified: office labels, small forms, food labels, product labels, music collection, travel receipts, business cards, library card catalogs.
Data Quality Tools
BioVeL - Data Refinement Workflow
Description: The Taxonomic Data Refinement Workflow provides an environment for preparing observational and specimen data sets for use in scientific analyses such as: species distribution analysis, species richness and diversity studies, species occurrence studies, historical analysis, and other spatio-temporal analyses.
BiNHum - Data quality tools
BiNHum project URL: http://wiki.binhum.net/web/Hauptseite
Description: Quality testing in several steps, all relating to geographic information:
1. Translation of country names into English, using the country names in various languages from Wikipedia.
2. Assignment of subordinate geographic units, e.g. federal states, that were mistakenly entered in the ABCD element 'Country'.
3. If the element 'Country' is empty, search in other ABCD elements with a geographic reference.
4. Check whether an ISO country code is present (yes/no); if yes, compare the ISO code with the country name. The result is either a warning or a further check: if coordinates are present, test with an open-source GIS application.
5. Search in the Geonames server (number of searches limited) and Google Maps.
6. Development of an ocean thesaurus.
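Steps 1 to 4 of this procedure can be sketched roughly as follows. This is only an illustration of the checking logic, not BiNHum code; the lookup tables and the function `check_country` are hypothetical, and a real workflow would use complete reference lists rather than these few entries.

```python
# Hypothetical lookup tables (assumptions for this sketch).
TRANSLATIONS = {"deutschland": "Germany", "allemagne": "Germany", "españa": "Spain"}
SUBUNITS = {"bayern": "Germany", "bavaria": "Germany"}
ISO_CODES = {"DE": "Germany", "ES": "Spain"}
KNOWN_COUNTRIES = set(ISO_CODES.values())


def check_country(country, iso_code=None, other_fields=()):
    """Return a cleaned country name and a list of notes about what was done."""
    notes = []
    value = (country or "").strip()
    if not value:
        # Step 3: the country element is empty, so look in other geographic fields.
        for field in other_fields:
            candidate = field.strip()
            if candidate.lower() in TRANSLATIONS or candidate in KNOWN_COUNTRIES:
                value = candidate
                notes.append("country taken from another field")
                break
    key = value.lower()
    if key in TRANSLATIONS:
        # Step 1: translate the country name into English.
        value = TRANSLATIONS[key]
        notes.append("translated to English")
    elif key in SUBUNITS:
        # Step 2: a sub-national unit was entered instead of a country.
        value = SUBUNITS[key]
        notes.append("sub-national unit mapped to country")
    if iso_code and ISO_CODES.get(iso_code.upper()) != value:
        # Step 4: the ISO code and the country name do not agree.
        notes.append("warning: ISO code and country name disagree")
    return value, notes


print(check_country("Bayern", "ES"))
```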
Data Quality Hub
URL: http://www.gbif.es/BDQ.php
Contact: GBIF Spain
Description: Provides an overview of various data quality tools: detection tools, validation tools, thesauri checklists, thesauri ISO codes, procedures and best practices.
Workflow Environments
Taverna
URL: http://www.taverna.org.uk/
Description: Taverna is an open source and domain-independent Workflow Management System – a suite of tools used to design and execute scientific workflows and aid in silico experimentation. Taverna has been created by the myGrid team and is currently funded through FP7 projects BioVeL, SCAPE and Wf4Ever. Taverna workflows can be shared through the http://www.myexperiment.org/ site and be executed on HPC platforms, such as http://onlinehpc.com. Taverna is oriented towards the processing of potentially large volumes of data that require multiple steps. When compared to BPMN, Taverna's workflow notation is much simpler, as it doesn't try to support messages, events and other concepts present in the former notation.
Kepler
URL: http://kepler-project.org/
Description: The Kepler Project is dedicated to furthering and supporting the capabilities, use, and awareness of the free and open source scientific workflow application, Kepler. Kepler is designed to help scientists, analysts, and computer programmers create, execute, and share models and analyses across a broad range of scientific and engineering disciplines.
Download Link: http://kepler-project.org/users/downloads
Argo Project - A web-based text mining workbench
Description: Argo is a workbench for building and running text-analysis solutions. It facilitates the development of custom workflows from a selection of elementary analytics.
URL: http://argo.nactem.ac.uk/
Web Service Registries
Biodiversity Catalogue
URL: https://www.biodiversitycatalogue.org/
Description: The BiodiversityCatalogue is a centralised registry of curated biodiversity Web services. It allows you to easily discover, register, annotate, monitor and use Web services.
Stable Identifiers
Best Practices for Stable URIs
URL: http://wiki.pro-ibiosphere.eu/wiki/Best_practices_for_stable_URIs
Source Code and Example Elements: http://sourceforge.net/projects/stablecollectionidentifiers/
Background: "In a "stable identifier hackathon" in June 2013 in Edinburgh, five CETAF institutions (Royal Botanical Garden Edinburgh, Museum für Naturkunde in Berlin, Royal Botanic Garden Kew, National Museum of National History Paris and Botanical Museum Berlin-Dahlem) committed to a rapid pilot implementation of the system. Naturalis Biodiversity Center in the Netherlands also plans to join this effort." (from: http://www.pro-ibiosphere.eu/news/4296_stable%20identifiers%20for%20specimens%20%E2%80%93%20a%20cetaf%20istc%20initiative%20supported%20by%20pro-ibiosphere%20/)