BeginnersGuide

From Berlin Harvesting and Indexing Toolkit
Latest revision as of 16:14, 3 February 2016

With the rapidly growing number of data publishers, the process of harvesting and indexing information to offer advanced search and discovery becomes a critical bottleneck in globally distributed primary biodiversity data infrastructures. The Global Biodiversity Information Facility (GBIF) implemented a Harvesting and Indexing Toolkit (HIT), which largely automates data harvesting activities for hundreds of collection and observational data providers. The team of the Botanic Garden and Botanical Museum Berlin-Dahlem has extended this well-established system with a range of additional functions, including improved processing of multiple taxon identifications, the ability to represent associations between specimen and observation units, new data quality control and new reporting capabilities. The open source software B-HIT can be freely installed and used for setting up thematic networks serving the demands of particular user groups.

B-HIT harvests biodiversity data and indexes them in a MySQL database, using established pipelines such as BioCASe and IPT. After harvesting, B-HIT performs several data cleaning steps. The original provider data are stored in parallel with the cleaned data.
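The idea of keeping original provider data next to the cleaned data can be illustrated with a minimal sketch. It is not B-HIT's actual schema or cleaning logic: the table name `unit`, its columns, the `clean_name` rule (whitespace normalization only) and the sample record are all assumptions made for illustration, and an in-memory SQLite database stands in for MySQL so the example runs without a server.

```python
import sqlite3


def clean_name(raw: str) -> str:
    """Hypothetical cleaning rule: trim the ends and collapse runs of whitespace."""
    return " ".join(raw.split())


def store_unit(conn: sqlite3.Connection, unit_id: str, raw_name: str) -> None:
    """Store the provider's original value and the cleaned value side by side."""
    conn.execute(
        "INSERT INTO unit (unit_id, raw_name, clean_name) VALUES (?, ?, ?)",
        (unit_id, raw_name, clean_name(raw_name)),
    )


# In-memory SQLite stands in for the MySQL index database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE unit ("
    "unit_id TEXT PRIMARY KEY, "
    "raw_name TEXT NOT NULL, "    # value exactly as harvested from the provider
    "clean_name TEXT NOT NULL)"   # value after the cleaning step
)

# Made-up sample record, not taken from any real provider.
store_unit(conn, "example-0001", "  Abies   alba Mill. ")

row = conn.execute(
    "SELECT raw_name, clean_name FROM unit WHERE unit_id = ?",
    ("example-0001",),
).fetchone()
```

Keeping both columns means a portal can search on the cleaned value while the harvested value remains available for comparison against the provider's own records.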

B-HIT is currently used by BiNHum, OpenUp!, World Flora Online and GGBN. We recommend setting up a SOLR instance in addition to the MySQL database to speed up queries in a data portal. See the wiki of the GGBN Portal Software (http://wiki.bgbm.org/gps) for further details about the SOLR instance and an open-source portal software optimized for use with B-HIT.

Supported schemata and protocols

B-HIT supports all established collection data exchange standards. In addition, extensions developed within special interest networks (GGBN, EFG (Geosciences)) are supported.

ABCD: 2.06, 2.1, EFG, GGBN, GGBN Enviro, ABCD - Archive

DwC: DwC 1.0, 1.4, 1.4-Geospatial, 1.4-Curatorial, MaNIS 1.0, MaNIS 1.21, DwC Archive, DwC GGBN
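
When planning a harvest, it can help to check a provider's declared schema against the two lists above. The following sketch simply encodes those lists as a lookup table. The names `SUPPORTED_SCHEMATA` and `is_supported` are hypothetical, and this is not B-HIT's configuration format; the variant labels are copied verbatim from the lists.

```python
# Schema families and variants as listed on this page (labels copied verbatim).
SUPPORTED_SCHEMATA = {
    "ABCD": ["2.06", "2.1", "EFG", "GGBN", "GGBN Enviro", "ABCD - Archive"],
    "DwC": [
        "DwC 1.0", "1.4", "1.4-Geospatial", "1.4-Curatorial",
        "MaNIS 1.0", "MaNIS 1.21", "DwC Archive", "DwC GGBN",
    ],
}


def is_supported(family: str, variant: str) -> bool:
    """Return True if the family/variant pair appears in the lists above."""
    return variant in SUPPORTED_SCHEMATA.get(family, [])


# Example checks against the table.
abcd_206_ok = is_supported("ABCD", "2.06")
unknown_ok = is_supported("ABCD", "3.0")
```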