From Berlin Harvesting and Indexing Toolkit

The default database is MySQL. To use another database system, you will have to change the configuration and add the corresponding library.

Data is saved twice:

  • first, the original data delivered by the provider is saved in the raw* tables (rawidentification, rawcoordinates, rawoccurrence, rawpreservationtype, rawhigher)
  • after running the quality tests, improved data is saved in tables without the raw prefix (identification, coordinates, occurrence, preservationtype, higher)
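
The two-stage storage can be sketched as follows. This is only an illustration, not the toolkit's own code: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the column names (occurrence_id, latitude, longitude), the helper functions and the range check are all hypothetical. Only the table names rawcoordinates and coordinates come from the description above.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL database.
conn = sqlite3.connect(":memory:")

# Hypothetical, simplified columns; the real tables are likely to differ.
conn.executescript("""
    CREATE TABLE rawcoordinates (occurrence_id INTEGER, latitude TEXT, longitude TEXT);
    CREATE TABLE coordinates (occurrence_id INTEGER, latitude REAL, longitude REAL);
""")


def save_raw(occurrence_id, lat, lon):
    """Store the values exactly as the provider delivered them."""
    conn.execute(
        "INSERT INTO rawcoordinates VALUES (?, ?, ?)", (occurrence_id, lat, lon)
    )


def run_quality_test(occurrence_id):
    """Copy a raw record into the improved table if it passes a range check."""
    row = conn.execute(
        "SELECT latitude, longitude FROM rawcoordinates WHERE occurrence_id = ?",
        (occurrence_id,),
    ).fetchone()
    if row is None:
        return False
    try:
        # Illustrative clean-up step: accept a decimal comma as well as a point.
        lat = float(row[0].replace(",", "."))
        lon = float(row[1].replace(",", "."))
    except ValueError:
        return False
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return False
    conn.execute(
        "INSERT INTO coordinates VALUES (?, ?, ?)", (occurrence_id, lat, lon)
    )
    return True


save_raw(1, "52,52", "13.40")   # decimal comma, can be repaired
save_raw(2, "952.0", "13.40")   # latitude out of range, stays raw only
results = [run_quality_test(1), run_quality_test(2)]
```

In this sketch both records remain available in rawcoordinates, while only the record that passes the check appears in coordinates, so the provider's original values are never overwritten.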

There are two central tables:

  • the bio_datasource table, which stores every datasource (access point, name, number of records, standard and protocol).
  • the tripleidstore table, which stores every triple ID (unitID, collectionCode, institutionCode) encountered during the processing of the harvested records.
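
As a rough illustration of how a triple ID store can be used, the sketch below records each (unitID, collectionCode, institutionCode) combination once and reports whether it was already known. It again uses sqlite3 in place of MySQL. The composite primary key, the column types and the register_triple helper are assumptions made for this example, not the toolkit's actual schema or API.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL database.
conn = sqlite3.connect(":memory:")

# Assumed layout: the three parts of the triple ID form the primary key.
conn.execute("""
    CREATE TABLE tripleidstore (
        unitID TEXT,
        collectionCode TEXT,
        institutionCode TEXT,
        PRIMARY KEY (unitID, collectionCode, institutionCode)
    )
""")


def register_triple(unit_id, collection_code, institution_code):
    """Record a triple ID; return True if it was new, False if already stored."""
    # INSERT OR IGNORE is SQLite syntax; MySQL spells this INSERT IGNORE.
    cur = conn.execute(
        "INSERT OR IGNORE INTO tripleidstore VALUES (?, ?, ?)",
        (unit_id, collection_code, institution_code),
    )
    return cur.rowcount == 1


# Hypothetical sample values.
first = register_triple("B 10 0000001", "Herbarium", "BGBM")
again = register_triple("B 10 0000001", "Herbarium", "BGBM")
other = register_triple("B 10 0000002", "Herbarium", "BGBM")
```

Here the second call with the same three values does not create a new row, which is one way to recognise a record that has been harvested before.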