Mandatory and recommended fields for sharing data with GGBN
Introduction
ABCD and Darwin Core (DwC) both contain hundreds of elements, some of which are mandatory. The curator of the DNA bank first has to define the metadata of the DNA bank collection; after that the actual mapping can begin. You can use BioCASe or IPT to provide data to GGBN, but you have to make sure you use the extensions for the GGBN Data Standard. Darwin Core and ABCD have many terms in common but use different names for them. The following sections are therefore split into ABCD and DwC. GGBN supports both standards; both have pros and cons and work well for GGBN. Which one fits better depends on your data and on whether your institution is already providing data to GBIF. Please contact us if you need help.
Asterisk (*) means mandatory for GGBN, all others are recommended.
Separate mappings for voucher specimens (mapping 1) and tissue and DNA samples (mapping 2) are preferred. This enables third parties such as GBIF or GGBN to harvest only those data that are relevant.
All metadata should be in English! Please map as many elements as possible: the more visible and searchable your data are in GBIF and GGBN, the better for other scientists and, in the end, for yourself!
Mapping principles
You can provide your data to GGBN in different ways. Which one fits best depends on your other collections and on how you already provide data to GBIF. GGBN can handle all of the examples below, but the choice might make a difference for your occurrences at GBIF. Most important is that you provide the references between the records in a standardized way. DNA collections comprise many different use cases, and you will not always have a physical tissue or a physical specimen. Please contact us if you need help.
Mapping with Specimens
A specimen in terms of GGBN and GBIF is a preserved specimen of wild origin or a living specimen (cultivated plant, captive animal, cell culture). If your collection or certain DNA samples have no corresponding tissue sample, you should refer to the "mother" specimen directly.
Mapping without Specimens
Often tissue samples are collected without collecting/killing the whole organism. There are two ways to handle this.
Environmental Samples
Environmental samples, and environmental DNA even more so, can currently be provided to GGBN and GBIF only in a very basic way. We are working on a proposal for mapping this important sample information in the way it deserves. Have a look at our Environmental DNA use case.
ABCD
Metadata of your DNA and tissue bank
Metadata are very important. They will be displayed with every single DNA record in the portal. Be careful with the IPR statements!
Group | Element | Remarks | Example |
---|---|---|---|
ContentContact | *Address | Complete Address of the responsible person | Botanic Garden and Botanical Museum Berlin-Dahlem, Freie Universität Berlin, Koenigin-Luise-Str. 6-8, 14195 Berlin, Germany |
ContentContact | Email | valid email address; will be used by the web portal for sending you annotations etc. | please use @ instead of masquerades like "[at]" |
ContentContact | *Name | Person or team responsible for curation of your DNA Bank | Gabi Droege |
Description/Representation | *Details | Short text to describe the focus and number of samples in your DNA collection. It should include the following phrase: The DNA bank is part of the Global Genome Biodiversity Network (GGBN) which was founded in 2011. The network provides a technically optimized DNA collection service facility for all biological research accessible via one central web portal. The network promotes deposition of well documented reference DNA samples after project completion or data publication from scientists of other universities and institutions. | The DNA bank of the Botanic Garden and Botanical Museum Berlin-Dahlem holds currently a collection of 20000 plant DNA and tissue samples growing constantly. Its core collection focuses on the flora of Germany but it comprises botanical samples collected worldwide. |
Description/Representation | *@language | language of description | should be "en" as literal |
Description/Representation | *Title | Short title that describes your DNA collection | DNA Bank of the Herbarium Berolinense |
IPRStatements/Citation | *Text | | Droege, G. (Ed.) 2008 - (continuously updated): DNA samples of the DNA bank at the BGBM (Botanic Garden and Botanical Museum Berlin-Dahlem). |
IPRStatements/Citation | *@language | language of citation | should be "en" as literal |
IPRStatements/Copyright | *Text | | The copyright for any material created by the DNA bank of the BGBM is reserved. The duplication or use of information and data such as texts or images is only permitted with the indication of the source or with prior approval by the BGBM. |
IPRStatements/Copyright | *@language | language of copyright | should be "en" as literal |
IPRStatements/TermsOfUse | *Text | | The use of the data is allowed only for non-profit scientific use and for non-profit nature conservation purpose. The data bases or part of it may only be used or copied by the written permission from the legal owner. |
IPRStatements/TermsOfUse | *@language | language of the terms of use | should be "en" as literal |
Metadata | IconURI | complete URL path to the logo of your institution | |
Owner | *Address | Complete Address of the institution that owns the DNA bank samples and data | |
Owner/Representation | *Text | name of your institution | Botanic Garden and Botanical Museum Berlin-Dahlem (Freie Universität Berlin) |
Owner | *URL | path to website of your institution | http://www.bgbm.org |
RevisionData | *DateModified | date of last modification of your data | |
TechnicalContact | *Address | Complete Address of the responsible person | Botanic Garden and Botanical Museum Berlin-Dahlem, Freie Universität Berlin, Koenigin-Luise-Str. 6-8, 14195 Berlin, Germany |
TechnicalContact | Email | valid email address; will be used by the web portal for sending you annotations etc. | please use @ instead of masquerades like "[at]" |
TechnicalContact | *Name | Person or team responsible for technical issues of your database | Gabriele Droege |
Unit | *RecordBasis | this value must be part of the ABCD vocabulary | must be "MaterialSample" |
Unit | *KindOfUnit | the type of sample | please use our recommended vocabulary: "DNA", "RNA", "Protein", "tissue", "culture", "specimen", "environmental sample" |
Sample data
The sample identifiers
We use identifiers the same way GBIF does. In addition to the traditional triple ID (see below) you can also provide the GUID. Please note that UnitID must be unique within one database! In most cases the extraction number is used for it.
Group | Element | Remarks | Example |
---|---|---|---|
Unit | *SourceID | short description of relevant collection | should be "DNA Bank" or "tissue collection" |
Unit | *SourceInstitutionID | short name/abbreviation of relevant institution | BGBM |
Unit | *UnitID | DNA extraction number or tissue number | DNA 123 |
Unit | UnitGUID | sample record GUID | f6c7fd5a-2f04-4dcb-ba38-0230bf196d30 |
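
The uniqueness requirement for UnitID can be checked before the data are published. The following is a minimal sketch, assuming the sample records have been exported as a list of dictionaries keyed by the element names from the table above; the function name `find_duplicate_unit_ids` and the sample records are hypothetical and not part of BioCASe or GGBN.

```python
from collections import Counter


def find_duplicate_unit_ids(records):
    """Return the UnitID values that occur more than once, sorted."""
    counts = Counter(record["UnitID"] for record in records)
    return sorted(unit_id for unit_id, n in counts.items() if n > 1)


# Hypothetical export of three sample records (illustrative values only)
records = [
    {"SourceInstitutionID": "BGBM", "SourceID": "DNA Bank", "UnitID": "DNA 123"},
    {"SourceInstitutionID": "BGBM", "SourceID": "DNA Bank", "UnitID": "DNA 124"},
    {"SourceInstitutionID": "BGBM", "SourceID": "DNA Bank", "UnitID": "DNA 123"},
]

duplicates = find_duplicate_unit_ids(records)
print(duplicates)
```

Any value reported here would need to be resolved in the source database before mapping.
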
GGBN terms
The following fields are highly recommended, some are mandatory (marked with *), but feel free to map more!
Group | Element | Remarks | Example |
---|---|---|---|
Unit | *RecordBasis | The specific nature of the data record, controlled vocabulary! | "PreservedSpecimen", "MaterialSample", "FossilSpecimen", "LivingSpecimen", "HumanObservation", "MachineObservation"; for Tissue or DNA samples use "MaterialSample" |
GGBN | *materialSampleType | Classification of kind of physical sample in addition to BasisOfRecord/RecordBasis and Preparation Type | "tissue", "DNA", "specimen" |
SpecimenUnit | Disposition | The current state of a specimen with respect to the collection identified in collectionCode or collectionID | in collection, missing, voucher elsewhere, duplicates elsewhere, consumed, deaccessioned, dead |
SpecimenUnit/Preparation | *preparationType | description of type of material, free text | for DNA: gDNA, eDNA, aDNA; for tissues/specimens: leaf, muscle, leg, blood |
SpecimenUnit/Preparation | preparationDate | date of DNA extraction | if unknown, enter "unknown" |
SpecimenUnit/Preparation | preparationMaterials | extraction kit or protocol | if unknown, enter "unknown" |
SpecimenUnit/Preparation | preparedBy | extraction staff | if unknown, enter "unknown" |
SpecimenUnit/Preservation | preservation | preservation of the tissue or DNA | if unknown, enter "unknown" |
SpecimenUnit/Loan/Permit | *permitStatus | Information about the presence, absence or other basic status of permits associated with the sample(s), controlled vocabulary! | Permit available, Permit not required, Permit not available, Unknown. Note: material collected after 2014-10-12 cannot have "Unknown" permit status! |
SpecimenUnit/Loan/Permit | *permitStatusQualifier | Description of why a certain permit was not required or why Permit Status is unknown | "no national requirement for a permit at date of access", "officially authorized illegal holder", "collected on private land", "pre-Nagoya" |
SpecimenUnit/Loan/Permit | *permitType | A permit is a document that allows someone to take an action that otherwise would not be allowed, controlled vocabulary! | Collecting Permit, Import Permit, Export Permit, Intellectual Property Rights, Copyright, Patent, Data use, Phytosanitary, Salvage, Exemption Permit, Material Transfer Agreement, Internationally Recognized Certificate of Compliance, Contract, Memorandum of Understanding, Memorandum of Cooperation, Veterinary Certificate, Human Pathogens, Genetically Modified Organism, Other |
SpecimenUnit/Loan/Permit | *permitText | The text of a permit related to the gathering/shipping or further details | |
GGBN/Amplification | amplificationDate | date of amplification; if it is unknown or generally empty you don't have to map it | should be ISO format yyyy-mm-dd |
GGBN/Amplification | marker | | COX1 |
GGBN/Amplification | geneticAccessionNumber | the accession number of NCBI/EMBL/DDBJ or the process ID of BOLD; this is a repeatable element, you can provide as many as you want | e.g. AJ45567 |
GGBN/Amplification | genBankNumber-URI | complete link to the accession number of NCBI/EMBL/DDBJ or the process ID of BOLD | |
SpecimenUnit | blockedUntil | in case the sample is blocked until a specific date it's nevertheless searchable but customers cannot order it | ISO format |
SpecimenUnit | blocked | sample is blocked (e.g. because it is consumed), but data are still available | Yes/No |
GGBN | concentration | concentration of the DNA | 65,34 |
GGBN | @unit | unit of the concentration | µg/ml |
GGBN | ratioOfAbsorbance260_230 | map only if filled with content | 1,2 |
GGBN | ratioOfAbsorbance260_280 | map only if filled with content | 1,8 |
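
The permitStatus rule in the table above (controlled vocabulary, plus no "Unknown" status for material collected after 2014-10-12) lends itself to an automated check. The following is a minimal sketch, assuming the collection date is available as a Python `date`; the function name `check_permit_status` is hypothetical and the check is not an official GGBN validator.

```python
from datetime import date

# Controlled vocabulary for permitStatus, copied from the table above
PERMIT_STATUS = {
    "Permit available",
    "Permit not required",
    "Permit not available",
    "Unknown",
}

# Cut-off date taken from the permitStatus row above
CUTOFF = date(2014, 10, 12)


def check_permit_status(status, collected_on):
    """Return a list of problems; an empty list means the value is acceptable."""
    problems = []
    if status not in PERMIT_STATUS:
        problems.append(f"'{status}' is not in the permitStatus vocabulary")
    if status == "Unknown" and collected_on > CUTOFF:
        problems.append("material collected after 2014-10-12 cannot be 'Unknown'")
    return problems


print(check_permit_status("Unknown", date(2015, 3, 1)))
```
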
Related Specimen Data
The voucher identifiers
Group | Element | Remarks | Example |
---|---|---|---|
Associations/UnitAssociation | *UnitID | the UnitID or CatalogueNumber used for GBIF | e.g. the barcode number of your specimens |
Associations/UnitAssociation | *SourceInstitutionID | the SourceInstitutionID or InstitutionCode used for GBIF | e.g. the acronym of your institution |
Associations/UnitAssociation | *SourceName | the SourceID or CollectionCode used for GBIF | e.g. the name of the collection the specimen belongs to, such as "Birds" |
Associations/UnitAssociation | *AssociationType | the Relation between the DNA and the voucher | e.g. "DNA and voucher from same individual" |
Associations/UnitAssociation | *DatasetAccessPoint | the wrapper url of the voucher record | e.g. "http://ww3.bgbm.org/biocase/pywrapper.cgi?dsa=Herbar" |
Gathering event of the voucher
All elements marked with * will be indexed and must be mapped! These gathering facts have to be mapped twice (once for the specimen database and once for the DNA mapping) because they are needed for indexing and later searches.
Group | Element | Remarks | Example |
---|---|---|---|
Unit | *CollectorsFieldNumber | the number the collector gave to the specimen in the field, often used in Botany but not in Zoology; map it if you have content | e.g. 765/10 |
Gathering/Agents/GatheringAgent | *FullName | the Collector or Collector Team | e.g. Scholz & Sipman |
Gathering/Altitude | *LowerValue | if you have the lower and upper values in different columns, map both fields; if not, map LowerValue only | e.g. 100 |
Gathering/Altitude | UpperValue | | e.g. 200 |
Gathering/Altitude | *Unit | | e.g. m |
Gathering/Country | *ISO3166Code | ISO code of the country where the voucher was collected | e.g. US |
Gathering/Country | *Name | English name of the country | e.g. United States of America |
Gathering/DateTime | *DateText | date when the voucher was collected; if you have content you can also use ISO format | e.g. 21. April 1951 |
Gathering/Locality | *LocalityText | | e.g. 5km NO Berlin |
Gathering/NamedArea | AreaName | name of continent | e.g. Europe |
Gathering/NamedArea | @language | language of the name of continent | e.g. "en" |
Gathering/SiteCoordinates | *LatitudeDecimal | | e.g. -15,88876 |
Gathering/SiteCoordinates | *LongitudeDecimal | | e.g. 72,88876 |
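
The coordinate examples above are written with a decimal comma. If your tools expect a decimal point, the values can be normalized and range-checked before mapping. The following is a minimal sketch under that assumption; the function name `parse_decimal_degrees` is hypothetical, and whether normalization is needed at all depends on your database and wrapper configuration.

```python
def parse_decimal_degrees(text, limit):
    """Parse a coordinate string and check that it lies within [-limit, limit].

    Accepts either a decimal comma or a decimal point.
    """
    value = float(text.strip().replace(",", "."))
    if not -limit <= value <= limit:
        raise ValueError(f"{text!r} is outside the range -{limit}..{limit}")
    return value


# Latitude is limited to +/-90 degrees, longitude to +/-180 degrees
latitude = parse_decimal_degrees("-15,88876", 90)
longitude = parse_decimal_degrees("72,88876", 180)
print(latitude, longitude)
```
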
Identification history of the voucher
Most specimen databases record the complete determination or identification history of a single specimen. For GGBN we try to get all available information into the portal.
Group | Element | Remarks | Example |
---|---|---|---|
Identification | *PreferredFlag | mark the presently preferred Identification | e.g. true, false, 0, 1 |
Identification/HigherTaxon | *HigherTaxonName | the name of the higher taxon, please have a look at the BioCASE Wiki for how to prepare your database for the repeatable elements | e.g. Asteraceae, Animalia |
Identification/HigherTaxon | *HigherTaxonRank | the rank of the taxon in English or Latin | e.g. familia, regnum, phylum |
Identification/ScientificName | *FullScientificName | the complete name of the taxon including Authors (and years for animals) | e.g. Aaronsohnia factorovskyi Warb. & Eig. var. factorovskyi |
Identification/ScientificName/NameAtomised | *FirstEpithet | Please note: ABCD has several containers for NameAtomised; it depends on your samples which one to choose (Botanical or Zoology etc.) | e.g. factorovskyi |
Identification/ScientificName/NameAtomised | *GenusOrMonomial | | e.g. Aaronsohnia |
Identification/ScientificName/NameAtomised | *InfraspecificEpithet | Please note: ABCD has several containers for NameAtomised; it depends on your samples which one to choose (Botanical or Zoology etc.) | e.g. factorovskyi |
Identification/ScientificName/NameAtomised | *Rank | Please note: ABCD has several containers for NameAtomised; it depends on your samples which one to choose (Botanical or Zoology etc.) | e.g. var. |
Multimedia items of the voucher
These should be mapped in the specimen mapping, not the DNA mapping.
Darwin Core
We recommend using IPT for providing data as Darwin Core-Archive.
Mandatory for GGBN: Select occurrence as core and add GGBN Material Sample and Darwin Core Resource Relationship as extensions.
Metadata of your DNA and tissue bank
Please follow the example in the IPT documentation: https://github.com/gbif/ipt/wiki/IPT2ManualNotes.wiki#basic-metadata
Note: The description of your dataset should contain the following phrase: The DNA bank is part of the Global Genome Biodiversity Network (GGBN) which was founded in 2011. The network provides a technically optimized DNA collection service facility for all biological research accessible via one central web portal. The network promotes deposition of well documented reference DNA samples after project completion or data publication from scientists of other universities and institutions.
Sample data
The sample identifiers
We use identifiers the same way GBIF does. In addition to the traditional triple ID (see below) you can also provide the GUID. Please note that catalogNumber must be unique within one database! In most cases the extraction number is used for it.
Group | Element | Remarks | Example |
---|---|---|---|
Occurrence | *institutionCode | short name/abbreviation of relevant institution | NMNH |
Occurrence | *collectionCode | short description of relevant collection | should be "DNA Bank" or "tissue collection" |
Occurrence | *catalogNumber | DNA extraction number or tissue number | DNA 123 |
Occurrence | occurrenceID | sample record GUID | http://n2t.net/ark:/65665/304ed89be-b1ed-4e71-b210-dbbddfadb776 |
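
Before uploading to IPT it can help to confirm that every row carries the three mandatory parts of the triple ID. The following is a minimal sketch, assuming each row of the occurrence file has been read into a dictionary keyed by the Darwin Core term names above; the function name `missing_identifiers` and the sample rows are hypothetical.

```python
# Mandatory identifier terms, as marked with * in the table above
MANDATORY_IDENTIFIERS = ("institutionCode", "collectionCode", "catalogNumber")


def missing_identifiers(row):
    """Return the mandatory identifier terms that are absent or blank in a row."""
    return [
        term
        for term in MANDATORY_IDENTIFIERS
        if not str(row.get(term) or "").strip()
    ]


# Hypothetical rows (illustrative values only)
complete_row = {
    "institutionCode": "NMNH",
    "collectionCode": "DNA Bank",
    "catalogNumber": "DNA 123",
}
incomplete_row = {"institutionCode": "NMNH", "catalogNumber": "  "}

print(missing_identifiers(complete_row))
print(missing_identifiers(incomplete_row))
```
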
GGBN terms
The following fields are highly recommended, some are mandatory (marked with *), but feel free to map more!
Group | Element | Remarks | Example |
---|---|---|---|
Occurrence Core | *basisOfRecord | The specific nature of the data record, controlled vocabulary! | "PreservedSpecimen", "MaterialSample", "FossilSpecimen", "LivingSpecimen", "HumanObservation", "MachineObservation"; for Tissue or DNA samples use "MaterialSample" |
Occurrence Core | disposition | The current state of a specimen with respect to the collection identified in collectionCode or collectionID, use e.g. if the sample/specimen is consumed or deaccessioned | in collection, missing, voucher elsewhere, duplicates elsewhere, consumed, deaccessioned, dead |
Material Sample Extension | *materialSampleType | Classification of kind of physical sample in addition to BasisOfRecord/RecordBasis and Preparation Type | "tissue", "DNA", "specimen" |
Preparation Extension | *preparationType | description of type of material, free text | for DNA: gDNA, eDNA, aDNA; for tissues/specimens: leaf, muscle, leg, blood |
Preparation Extension | preparationDate | date of DNA extraction | if unknown, enter "unknown" |
Preparation Extension | preparationMaterials | extraction kit or protocol | if unknown, enter "unknown" |
Preparation Extension | preparationStaff | extraction staff | if unknown, enter "unknown" |
Preservation Extension | preservation | preservation of the tissue or DNA | if unknown, enter "unknown" |
Permit Extension | *permitStatus | Information about the presence, absence or other basic status of permits associated with the sample(s), controlled vocabulary! | Permit available, Permit not required, Permit not available, Unknown. Note: material collected after 2014-10-12 cannot have "Unknown" permit status! |
Permit Extension | *permitStatusQualifier | Description of why a certain permit was not required or why Permit Status is unknown | "no national requirement for a permit at date of access", "officially authorized illegal holder", "collected on private land", "pre-Nagoya" |
Permit Extension | *permitType | A permit is a document that allows someone to take an action that otherwise would not be allowed, controlled vocabulary! | Collecting Permit, Import Permit, Export Permit, Intellectual Property Rights, Copyright, Patent, Data use, Phytosanitary, Salvage, Exemption Permit, Material Transfer Agreement, Internationally Recognized Certificate of Compliance, Contract, Memorandum of Understanding, Memorandum of Cooperation, Veterinary Certificate, Human Pathogens, Genetically Modified Organism, Other |
Permit Extension | *permitText | The text of a permit related to the gathering/shipping or further details | |
Amplification Extension | amplificationDate | date of amplification; if it is unknown or generally empty you don't have to map it | should be ISO format yyyy-mm-dd |
Amplification Extension | marker | | COX1 |
Amplification Extension | geneticAccessionNumber | the accession number of NCBI/EMBL/DDBJ or the process ID of BOLD; this is a repeatable element, you can provide as many as you want | e.g. AJ45567 |
Amplification Extension | genBankNumber-URI | complete link to the accession number of NCBI/EMBL/DDBJ or the process ID of BOLD | |
Loan Extension | blockedUntil | in case the sample is blocked until a specific date it's nevertheless searchable but customers cannot order it | ISO format |
Loan Extension | blocked | sample is blocked (e.g. because it is consumed), but data are still available | Yes/No |
Material Sample Extension | concentration | concentration of the DNA | 65,34 |
Material Sample Extension | @unit | unit of the concentration | µg/ml |
Material Sample Extension | ratioOfAbsorbance260_230 | map only if filled with content | 1,2 |
Material Sample Extension | ratioOfAbsorbance260_280 | map only if filled with content | 1,8 |
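
Fields such as amplificationDate and blockedUntil are expected in ISO format (yyyy-mm-dd). The following is a minimal sketch of a format check, assuming the dates are available as text; the function name `is_iso_date` is hypothetical and the check is stricter than some ISO 8601 parsers, since it accepts only the full ten-character form.

```python
import re
from datetime import datetime

# Exactly four, two and two ASCII digits separated by hyphens
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_iso_date(text):
    """Return True if text is a real calendar date written as yyyy-mm-dd."""
    if not ISO_DATE.fullmatch(text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        # Right shape but not a valid date, e.g. 30 February
        return False
    return True


print(is_iso_date("2014-10-12"))
print(is_iso_date("21. April 1951"))
```
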
Related Specimen Data
The voucher identifiers
Please use the Darwin Core Resource Relationship Class
Group | Element | Remarks | Example |
---|---|---|---|
Related Resource | *relatedResourceID | concatenated string with the triple id used for GBIF plus the accesspoint of the IPT archive providing this record and the guid (the latter one is not mandatory, but recommended) | e.g. catalogNumber=11718653&collectionCode=Botany&institutionCode=US&guid=http://n2t.net/ark:/65665/303e7eecb-4e8d-4fce-a251-6c8fa3f2863d&accesspoint=http://collections.mnh.si.edu/ipt/archive.do?r=nmnhdwca |
Resource Relationship | *relationshipOfResource | the Relation between the DNA and the voucher or the tissue and the voucher | e.g. "same individual", "same population", "same ex situ individual" |
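
The concatenated relatedResourceID string can be assembled from its parts. The following is a minimal sketch that reproduces the key order and separators of the example in the table above; the function name `build_related_resource_id` is hypothetical. The values are joined as-is, without percent-encoding, because the example shows them unencoded; whether encoding is ever required is not stated here.

```python
def build_related_resource_id(catalog_number, collection_code, institution_code,
                              access_point, guid=None):
    """Concatenate the triple ID, optional GUID and access point into one string."""
    parts = [
        f"catalogNumber={catalog_number}",
        f"collectionCode={collection_code}",
        f"institutionCode={institution_code}",
    ]
    if guid:  # the GUID is recommended but not mandatory
        parts.append(f"guid={guid}")
    parts.append(f"accesspoint={access_point}")
    return "&".join(parts)


# Values taken from the example in the table above
related_id = build_related_resource_id(
    catalog_number="11718653",
    collection_code="Botany",
    institution_code="US",
    access_point="http://collections.mnh.si.edu/ipt/archive.do?r=nmnhdwca",
    guid="http://n2t.net/ark:/65665/303e7eecb-4e8d-4fce-a251-6c8fa3f2863d",
)
print(related_id)
```
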
Gathering event of the voucher
All elements marked with * will be indexed and must be mapped! These gathering facts have to be mapped twice (once for the specimen database and once for the DNA mapping) because they are needed for indexing and later searches.
Group | Element | Remarks | Example |
---|---|---|---|
Occurrence Core | *recordNumber | the number the collector gave to the specimen in the field, often used in Botany but not in Zoology; map it if you have content | e.g. 765/10 |
Occurrence Core | *recordedBy | the Collector or Collector Team | e.g. Scholz & Sipman |
Occurrence Core | *minimumElevationInMeters | if you have the lower and upper values in different columns, map both fields; if not, map minimumElevationInMeters only | e.g. 100 |
Occurrence Core | maximumElevationInMeters | | e.g. 200 |
Occurrence Core | *countryCode | ISO code of the country where the voucher was collected | e.g. US |
Occurrence Core | *country | English name of the country | e.g. United States of America |
Occurrence Core | *eventDate | date when the voucher was collected; if you have content you can also use ISO format | e.g. 21. April 1951 |
Occurrence Core | *locality | | e.g. 5km NO Berlin |
Occurrence Core | continent | name of continent | e.g. Europe |
Occurrence Core | *decimalLatitude | | e.g. -15,88876 |
Occurrence Core | *decimalLongitude | | e.g. 72,88876 |
Scientific Name of the voucher
Group | Element | Remarks | Example |
---|---|---|---|
Occurrence Core | *family | Please provide at least one higher taxon, usually family | e.g. Asteraceae, Paridae |
Occurrence Core | *scientificName | the complete name of the taxon including Authors (and years for animals) | e.g. Aaronsohnia factorovskyi Warb. & Eig. var. factorovskyi |
Occurrence Core | *specificEpithet | | e.g. factorovskyi |
Occurrence Core | *genus | | e.g. Aaronsohnia |
Occurrence Core | *infraspecificEpithet | | e.g. factorovskyi |
Occurrence Core | *taxonRank | | e.g. var. |
Multimedia items of the voucher
These should be mapped in the specimen mapping, not the DNA/tissue mapping. If you have histological images you should map them here.