Database

From Berlin Harvesting and Indexing Toolkit

Revision as of 14:52, 16 November 2015

The default database is MySQL. To use a different database system, change the database settings in the configuration file (application.properties) and add the corresponding database driver library.
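This page does not list the configuration keys. The sketch below assumes Spring Boot style `spring.datasource.url` and `spring.datasource.driver-class-name` keys, but the `application.properties` file name is the only hint for that, so the toolkit's real key names may differ. The database name `bhit` and the PostgreSQL switch are also hypothetical. The Python code parses such a file and reports which database system the JDBC URL selects.

```python
# Hypothetical sketch: the spring.datasource.* keys and the database name
# "bhit" are assumptions, not confirmed settings of the toolkit.

def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines (no line continuations or escapes)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comment lines
        key, sep, value = line.partition("=")
        if not sep:
            continue  # this simplified parser ignores lines without '='
        props[key.strip()] = value.strip()
    return props


def jdbc_vendor(url: str) -> str:
    """Return the JDBC sub-protocol, e.g. 'mysql' for 'jdbc:mysql://host/db'."""
    prefix, _, rest = url.partition(":")
    if prefix != "jdbc" or not rest:
        raise ValueError(f"not a JDBC URL: {url!r}")
    return rest.split(":", 1)[0]


DEFAULT_CONFIG = """
# default: MySQL (assumed key names)
spring.datasource.url=jdbc:mysql://localhost:3306/bhit
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
"""

POSTGRES_CONFIG = """
# hypothetical switch to PostgreSQL; its JDBC driver library must be added too
spring.datasource.url=jdbc:postgresql://localhost:5432/bhit
spring.datasource.driver-class-name=org.postgresql.Driver
"""

for cfg in (DEFAULT_CONFIG, POSTGRES_CONFIG):
    props = parse_properties(cfg)
    print(jdbc_vendor(props["spring.datasource.url"]),
          props["spring.datasource.driver-class-name"])
```

Both the URL and the driver class change together. The driver class only loads if the matching driver library is present.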


Data is saved twice:

  • first, the original data delivered by the provider is saved in the raw* tables (rawidentification, rawcoordinates, rawoccurrence, rawpreservationtype, rawhigher)
  • then, after the quality tests have run, the improved data is saved in the tables without the raw prefix (identification, coordinates, occurrence, preservationtype, higher)
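The two-step storage of a single table pair can be sketched as follows. This is a minimal illustration under stated assumptions. An in-memory SQLite database stands in for MySQL. The column names `record_id`, `latitude` and `longitude` are hypothetical. The "quality test" is a toy rule that rejects unparsable or out-of-range coordinates, not the toolkit's actual tests.

```python
import sqlite3

# SQLite stands in for MySQL; columns and the quality rule are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rawcoordinates (record_id INTEGER PRIMARY KEY, latitude TEXT, longitude TEXT);
CREATE TABLE coordinates (record_id INTEGER PRIMARY KEY, latitude REAL, longitude REAL);
""")

def to_coordinate(value, limit):
    """Toy quality test: parse a number, reject it if outside [-limit, limit]."""
    try:
        number = float(value.strip())
    except (AttributeError, ValueError):
        return None  # unparsable or missing value
    return number if -limit <= number <= limit else None

# Step 1: store the provider's values exactly as delivered (made-up rows).
raw_rows = [(1, " 52.45 ", "13.30"), (2, "95.0", "13.30"), (3, "n/a", "")]
conn.executemany("INSERT INTO rawcoordinates VALUES (?, ?, ?)", raw_rows)

# Step 2: run the quality test and store the improved values in a separate table.
for record_id, lat, lon in conn.execute("SELECT * FROM rawcoordinates").fetchall():
    conn.execute(
        "INSERT INTO coordinates VALUES (?, ?, ?)",
        (record_id, to_coordinate(lat, 90), to_coordinate(lon, 180)),
    )
conn.commit()

print(conn.execute("SELECT * FROM coordinates ORDER BY record_id").fetchall())
```

The raw row is never modified, so the delivered value and the improved value remain available side by side.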


There are two central tables:

  • the bio_datasource table, which stores every data source (access point, name, number of records, standard and protocol).
  • the tripleidstore table, which stores every triple ID (unitID, collectionCode, institutionCode) encountered while processing the harvested records.
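A possible shape for the two central tables is sketched below in SQLite. The column names and types are hypothetical and derived only from the fields listed above. The assumption that each triple ID is stored once, however often it is encountered, is modelled with a UNIQUE constraint. All inserted values are placeholders.

```python
import sqlite3

# Hypothetical SQLite sketch; the toolkit's real columns and types may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bio_datasource (
    id INTEGER PRIMARY KEY,
    access_point TEXT NOT NULL,
    name TEXT,
    record_count INTEGER,
    standard TEXT,
    protocol TEXT
);
CREATE TABLE tripleidstore (
    id INTEGER PRIMARY KEY,
    unit_id TEXT NOT NULL,
    collection_code TEXT NOT NULL,
    institution_code TEXT NOT NULL,
    -- assumption: each triple ID is stored only once
    UNIQUE (unit_id, collection_code, institution_code)
);
""")

# Placeholder data source entry.
conn.execute(
    "INSERT INTO bio_datasource (access_point, name, record_count, standard, protocol) "
    "VALUES (?, ?, ?, ?, ?)",
    ("https://example.org/provider/endpoint", "Example dataset", 2,
     "example-standard", "example-protocol"),
)

def remember_triple(unit_id, collection_code, institution_code):
    """Record a triple ID met while processing a harvested record; repeats are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO tripleidstore (unit_id, collection_code, institution_code) "
        "VALUES (?, ?, ?)",
        (unit_id, collection_code, institution_code),
    )

# The third triple repeats the first one, so only two rows are stored.
for triple in [("unit-001", "COLL", "INST"),
               ("unit-002", "COLL", "INST"),
               ("unit-001", "COLL", "INST")]:
    remember_triple(*triple)

print(conn.execute("SELECT COUNT(*) FROM tripleidstore").fetchone()[0])
```

`INSERT OR IGNORE` is SQLite syntax. In MySQL the closest equivalent is `INSERT IGNORE`.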