Revision as of 09:47, 27 September 2018

Introduction

ABCD and Darwin Core (DwC) both contain hundreds of elements, some of which are mandatory. The curator of the DNA bank first has to define the metadata of the DNA bank collection; after that the actual mapping can begin. You can use BioCASe or IPT to provide data to GGBN, but you have to make sure you use the extensions for the GGBN Data Standard. Darwin Core and ABCD have many terms in common but use different names for them. The following sections are therefore split into ABCD and DwC. GGBN supports both standards; both have pros and cons and work well for GGBN. Which one fits better depends on your data and on whether your institution is already providing data to GBIF. Please contact us if you need help.

Asterisk (*) means mandatory for GGBN, all others are recommended.

Separate mappings for voucher specimens (mapping 1) and tissue and DNA samples (mapping 2) are preferred. This enables third parties such as GBIF or GGBN to harvest only those data that are relevant.

All metadata information should be in English! Please map as many elements as possible: the more visible and searchable your data are in GBIF and GGBN, the better for other scientists and, in the end, for yourself!

Mapping principles

You can provide your data to GGBN in different ways. Which one fits best depends on your other collections and on how you already provide your data to GBIF. GGBN can handle all of the examples below, but the choice might make a difference for your occurrences at GBIF. Most important is that you provide the references between the records in a standardized way. DNA collections comprise many different use cases, and you will not always have a physical tissue or a physical specimen. Please contact us if you need help.

Mapping with Specimens

A specimen in terms of GGBN and GBIF is a preserved specimen of wild origin or a living specimen (cultivated plant, captive animal, cell culture). If your collection or certain DNA samples have no corresponding tissue sample, you should refer to the "mother" specimen directly.

Specimens are often deposited in external collections. If you provide the proper identifiers the GGBN Data Portal can aggregate this information! We know that it is impossible to track this information back for all legacy data. But if you keep this in mind in your daily workflows for all new records and samples the quality and value of your data and collection will increase enormously.

Mapping all at once	Mapping two sources	Mapping three sources
Pros: - only one mapping needed Cons: - duplicate occurrences at GBIF, since GBIF can't handle associations	Pros: - only the underlying specimen records are provided, no duplicate entries at GBIF Cons: - two mappings needed (but can be easily copied)	Pros: - only the underlying specimen records are provided, no duplicate entries at GBIF - clear distinction between collections, some managers prefer this mapping type Cons: - three mappings needed (but can be easily copied)

Mapping without Specimens

Often tissue samples are collected without collecting/killing the whole organism. There are two ways to handle this.

Mapping eVouchers	Mapping tissues only
In case you want to track an individual, sampled multiple times (e.g. vertebrates caught multiple times and marked individually, trees) you should add an Observation record as top level. This could also be accompanied by images, then called eVoucher.	In case you don't want to track individuals, just provide your tissue records to GBIF as top level occurrence.

Environmental Samples

Environmental samples, and environmental DNA even more so, can currently be provided to GGBN and GBIF only in a very basic way. We are working on a proposal for mapping this important sample information in the way it deserves. Have a look at our Environmental DNA use case.

ABCD

Metadata of your DNA and tissue bank

Metadata are very important. They will be displayed with every single DNA record in the portal. Be careful with the IPR statements!

Group	Element	Remarks	Example
ContentContact	*Address	Complete Address of the responsible person	Botanic Garden and Botanical Museum Berlin-Dahlem, Freie Universität Berlin, Koenigin-Luise-Str. 6-8, 14195 Berlin, Germany
ContentContact	*Email	valid email address; will be used by the web portal for sending you annotations etc.	please use @ instead of masquerades like "[at]"
ContentContact	*Name	Person or team responsible for curation of your DNA Bank	Gabi Droege
Description/Representation	*Details	Short text to describe the focus and number of samples in your DNA collection. It should include the following phrase: The DNA bank is part of the Global Genome Biodiversity Network (GGBN) which was founded in 2011. The network provides a technically optimized DNA collection service facility for all biological research accessible via one central web portal. The network promotes deposition of well documented reference DNA samples after project completion or data publication from scientists of other universities and institutions.	The DNA bank of the Botanic Garden and Botanical Museum Berlin-Dahlem holds currently a collection of 20000 plant DNA and tissue samples growing constantly. Its core collection focuses on the flora of Germany but it comprises botanical samples collected worldwide.
Description/Representation	*@language	language of description	should be "en" as literal
Description/Representation	*Title	Short title that describes your DNA collection	DNA Bank of the Herbarium Berolinense
IPRStatements/Citation	*Text		Droege, G. (Ed.) 2008 - (continuously updated): DNA samples of the DNA bank at the BGBM (Botanic Garden and Botanical Museum Berlin-Dahlem).
IPRStatements/Citation	*@language	language of citation	should be "en" as literal
IPRStatements/Copyright	*Text		The copyright for any material created by the DNA bank of the BGBM is reserved. The duplication or use of information and data such as texts or images is only permitted with the indication of the source or with prior approval by the BGBM.
IPRStatements/Copyright	*@language	language of copyright	should be "en" as literal
IPRStatements/TermsOfUse	*Text		The use of the data is allowed only for non-profit scientific use and for non-profit nature conservation purpose. The data bases or part of it may only be used or copied by the written permission from the legal owner.
IPRStatements/TermsOfUse	*@language	language of the terms of use	should be "en" as literal
	IconURI	complete url path to the logo of your institution
Owner	*Address	Complete Address of the institution that owns the DNA bank samples and data
Owner/Representation	*Text	name of your institution	Botanic Garden and Botanical Museum Berlin-Dahlem (Freie Universität Berlin)
Owner	*URL	path to website of your institution	http://www.bgbm.org
RevisionData	*DateModified	date of last modification of your data
TechnicalContact	*Address	Complete Address of the responsible person	Botanic Garden and Botanical Museum Berlin-Dahlem, Freie Universität Berlin, Koenigin-Luise-Str. 6-8, 14195 Berlin, Germany
TechnicalContact	*Email	valid email address; will be used by the web portal for sending you annotations etc.	please use @ instead of masquerades like "[at]"
TechnicalContact	*Name	Person or team responsible for technical issues of your database	Gabriele Droege
Unit	*RecordBasis	this value must be part of the ABCD vocabulary	must be "MaterialSample"
Unit	*KindOfUnit	the type of sample	e.g. DNA

Sample data

The sample identifiers

We use identifiers the same way as GBIF does. In addition to the traditional triple ID (see below) you can also provide the GUID. Please note that UnitID should be unique within one database! In most cases the extraction number is used for it.

Group	Element	Remarks	Example
Unit	*SourceID	short description of relevant collection	should be "DNA Bank" or "tissue collection"
Unit	*SourceInstitutionID	short name/abbreviation of relevant institution	BGBM
Unit	*UnitID	DNA extraction number or tissue number	DNA 123
Unit	UnitGUID	sample record GUID	f6c7fd5a-2f04-4dcb-ba38-0230bf196d30
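
The uniqueness requirement for UnitID can be checked before the data are published. The following is a minimal sketch, assuming the sample records have already been exported as Python dictionaries keyed by the ABCD element names from the table above. The function names `triple_id` and `duplicate_unit_ids` are hypothetical and not part of BioCASe or any GGBN software.

```python
from collections import Counter

def triple_id(record):
    """Return the triple ID as (SourceInstitutionID, SourceID, UnitID)."""
    return (record["SourceInstitutionID"], record["SourceID"], record["UnitID"])

def duplicate_unit_ids(records):
    """Return the UnitID values that occur more than once, sorted."""
    counts = Counter(r["UnitID"] for r in records)
    return sorted(uid for uid, n in counts.items() if n > 1)

# Illustrative records only; the third one repeats a UnitID on purpose.
records = [
    {"SourceInstitutionID": "BGBM", "SourceID": "DNA Bank", "UnitID": "DNA 123"},
    {"SourceInstitutionID": "BGBM", "SourceID": "DNA Bank", "UnitID": "DNA 124"},
    {"SourceInstitutionID": "BGBM", "SourceID": "DNA Bank", "UnitID": "DNA 123"},
]

print(triple_id(records[0]))
print(duplicate_unit_ids(records))
```

Any UnitID reported by the second function should be resolved in the source database before the mapping is published.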

GGBN terms

The following fields are highly recommended, some are mandatory (marked with *), but feel free to map more!

Please have a look at the GGBN Data Standard documentation, where all GGBN terms are described in detail. The table below lists only the most important ones. Since the Nagoya Protocol has come into force we highly recommend mapping the Permit fields. They will become mandatory for providing data to GGBN at the end of 2020. Until then all core members must have implemented the required fields in their databases.

Group	Element	Remarks	Example
	*RecordBasis	The specific nature of the data record, controlled vocabulary!	"PreservedSpecimen", "MaterialSample", "FossilSpecimen", "LivingSpecimen", "HumanObservation", "MachineObservation"; for Tissue or DNA samples use "MaterialSample"
	*materialSampleType	Classification of kind of physical sample in addition to BasisOfRecord/RecordBasis and Preparation Type	"tissue", "DNA", "specimen"
SpecimenUnit/Preparation	*preparationType	description of type of material, free text	for DNA: gDNA, eDNA, aDNA; for tissues/specimens: leaf, muscle, leg, blood
SpecimenUnit/Preparation	*preparationDate	date of DNA extraction	if unknown type "unknown"
SpecimenUnit/Preparation	*preparationMaterials	extraction kit or protocol	if unknown type "unknown"
SpecimenUnit/Preparation	*preparedBy	extraction staff	if unknown type "unknown"
SpecimenUnit/Preservation	preservation	preservation of the tissue or DNA	if unknown type "unknown"
SpecimenUnit/Loan/Permit	*permitStatus	Information about the presence, absence or other basic status of permits associated with the sample(s), controlled vocabulary!	Permit available, Permit not required, Permit not available, Unknown. Material collected after 2014-10-12 cannot be in "Unknown" permit status!
SpecimenUnit/Loan/Permit	*permitStatusQualifier	Description of why a certain permit was not required or why Permit Status is unknown	"no national requirement for a permit at date of access", "officially authorized illegal holder", "collected on private land", "pre-Nagoya"
SpecimenUnit/Loan/Permit	*permitType	A permit is a document that allows someone to take an action that otherwise would not be allowed, controlled vocabulary!	Collecting Permit, Import Permit, Export Permit, Intellectual Property Rights, Copyright, Patent, Data use, Phytosanitary, Salvage, Exemption Permit, Material Transfer Agreement, Internationally Recognized Certificate of Compliance, Contract, Memorandum of Understanding, Memorandum of Cooperation, Veterinary Certificate, Human Pathogens, Genetically Modified Organism, Other
SpecimenUnit/Loan/Permit	*permitText	The text of a permit related to the gathering/shipping or further details
GGBN/Amplification	amplificationDate	date of amplification; if it is unknown or the field is generally empty you don't have to map it	should be ISO format yyyy-mm-dd
GGBN/Amplification	marker		COX1
GGBN/Amplification	geneticAccessionNumber	the accession number of NCBI/EMBL/DDBJ or the process ID of BOLD; this is a repeatable element, you can provide as many as you want	e.g. AJ45567
GGBN/Amplification	genBankNumber-URI	complete link to the accession number of NCBI/EMBL/DDBJ or the process ID of BOLD
SpecimenUnit	blockedUntil	in case the sample is blocked until a specific date it's nevertheless searchable but customers cannot order it	ISO format
SpecimenUnit	blocked	sample is blocked (e.g. because it is consumed), but data are still available	Yes/No
GGBN	concentration	concentration of the DNA	65,34
GGBN	@unit	unit of the concentration	µg/ml
GGBN	ratioOfAbsorbance260_230	map only if filled with content	1,2
GGBN	ratioOfAbsorbance260_280	map only if filled with content	1,8
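
The permitStatus rule in the table can be checked automatically before publishing. The sketch below assumes the four vocabulary values exactly as listed above and takes the collection date as a `datetime.date`; the function name `permit_status_problems` and the message texts are hypothetical, and the authoritative vocabulary is the one in the GGBN Data Standard documentation.

```python
from datetime import date

# Values copied from the permitStatus row of the table above.
PERMIT_STATUS = {
    "Permit available",
    "Permit not required",
    "Permit not available",
    "Unknown",
}

# Material collected after this date cannot have status "Unknown".
CUTOFF = date(2014, 10, 12)

def permit_status_problems(status, collected=None):
    """Return a list of problems found for one sample (empty if none)."""
    problems = []
    if status not in PERMIT_STATUS:
        problems.append("permitStatus is not in the controlled vocabulary")
    if status == "Unknown" and collected is not None and collected > CUTOFF:
        problems.append("permitStatus 'Unknown' used for material collected after 2014-10-12")
    return problems

print(permit_status_problems("Unknown", date(2015, 1, 1)))
```

If the collection date is not known, the sketch skips the date rule rather than guessing.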

Related Specimen Data

All specimen voucher information must be available via a GBIF-compliant database! Both Darwin Core Archive and BioCASe specimen providers are possible! To provide the underlying specimen data you can use your existing GBIF dataset mapping.

File:Relationships.jpg

In the database and in the mapping, the child record should refer to its mother record, in contrast to the real-world example.

The voucher identifiers

Group	Element	Remarks	Example
Associations/UnitAssociation	*UnitID	the UnitID or CatalogueNumber used for GBIF	e.g. the barcode number of your specimens
Associations/UnitAssociation	*SourceInstitutionID	the SourceInstitutionID or InstitutionCode used for GBIF	e.g. the acronym of your institution
Associations/UnitAssociation	*SourceName	the SourceID or CollectionCode used for GBIF	e.g. the name of the collection where the specimen belongs to, e.g. "Birds"
Associations/UnitAssociation	*AssociationType	the Relation between the DNA and the voucher	e.g. "DNA and voucher from same individual"
Associations/UnitAssociation	*DatasetAccessPoint	the wrapper url of the voucher record	e.g. "http://ww3.bgbm.org/biocase/pywrapper.cgi?dsa=Herbar"

Gathering event of the voucher

All elements marked with * will be indexed and must be mapped! These gathering facts have to be mapped twice (once in the specimen mapping and once in the DNA mapping) because they are needed for indexing and later for searching.

Group	Element	Remarks	Example
	*CollectorsFieldNumber	the number the collector gave to the specimen in the field, often used in Botany but not in Zoology; map it if you have content	e.g. 765/10
Gathering/Agents/GatheringAgent	*FullName	the Collector or Collector Team	e.g. Scholz & Sipman
Gathering/Altitude	*LowerValue	if you have the lower and upper value in different columns map both fields, if not map LowerValue only	e.g. 100
Gathering/Altitude	UpperValue		e.g. 200
Gathering/Altitude	*Unit		e.g. m
Gathering/Country	*ISO3166Code	ISO code of the country where the voucher was collected	e.g. US
Gathering/Country	*Name	English name of the country	e.g. United States of America
Gathering/DateTime	*DateText	date when voucher was collected, if you have content you can also use ISO format	e.g. 21. April 1951
Gathering/Locality	*LocalityText		e.g. 5km NO Berlin
Gathering/NamedArea	AreaName	name of continent	e.g. Europe
Gathering/NamedArea	@language	language of the name of continent	e.g. "en"
Gathering/SiteCoordinates	*LatitudeDecimal		e.g. -15,88876
Gathering/SiteCoordinates	*LongitudeDecimal		e.g. 72,88876

Identification history of the voucher

Most specimen databases record the complete determination or identification history of a single specimen. For GGBN we try to get all available information into the portal.

Group	Element	Remarks	Example
Identification	*PreferredFlag	mark the presently preferred Identification	e.g. true, false, 0, 1
Identification/HigherTaxon	*HigherTaxonName	the name of the higher taxon, please have a look at the BioCASE Wiki for how to prepare your database for the repeatable elements	e.g. Asteraceae, Animalia
Identification/HigherTaxon	*HigherTaxonRank	the rank of the taxon in English or Latin	e.g. familia, regnum, phylum
Identification/ScientificName	*FullScientificName	the complete name of the taxon including Authors (and years for animals)	e.g. Aaronsohnia factorovskyi Warb. & Eig. var. factorovskyi
Identification/ScientificName/NameAtomised	*FirstEpithet	Please note: ABCD has several containers for NameAtomised, it depends on your samples which one to choose (Botanical or Zoological etc.)	e.g. factorovskyi
Identification/ScientificName/NameAtomised	*GenusOrMonomial		e.g. Aaronsohnia
Identification/ScientificName/NameAtomised	*InfraspecificEpithet	Please note: ABCD has several containers for NameAtomised, it depends on your samples which one to choose (Botanical or Zoological etc.)	e.g. factorovskyi
Identification/ScientificName/NameAtomised	*Rank	Please note: ABCD has several containers for NameAtomised, it depends on your samples which one to choose (Botanical or Zoological etc.)	e.g. var.

Multimedia items of the voucher

These should be mapped in the specimen mapping, not the DNA mapping.

Darwin Core

We recommend using IPT for providing data as Darwin Core-Archive.

Mandatory for GGBN: Select occurrence as core and add GGBN Material Sample and Darwin Core Resource Relationship as extensions.

Metadata of your DNA and tissue bank

Please follow the example in the IPT documentation: https://github.com/gbif/ipt/wiki/IPT2ManualNotes.wiki#basic-metadata

Note: The description of your dataset should contain the following phrase: The DNA bank is part of the Global Genome Biodiversity Network (GGBN) which was founded in 2011. The network provides a technically optimized DNA collection service facility for all biological research accessible via one central web portal. The network promotes deposition of well documented reference DNA samples after project completion or data publication from scientists of other universities and institutions.

Sample data

The sample identifiers

We use identifiers the same way as GBIF does. In addition to the traditional triple ID (see below) you can also provide the GUID. Please note that catalogNumber should be unique within one database! In most cases the extraction number is used for it.

Group	Element	Remarks	Example
Occurrence	*institutionCode	short name/abbreviation of relevant institution	NMNH
Occurrence	*collectionCode	short description of relevant collection	should be "DNA Bank" or "tissue collection"
Occurrence	*catalogNumber	DNA extraction number or tissue number	DNA 123
Occurrence	occurrenceID	sample record GUID	http://n2t.net/ark:/65665/304ed89be-b1ed-4e71-b210-dbbddfadb776

GGBN terms

The following fields are highly recommended, some are mandatory (marked with *), but feel free to map more!

Please have a look at the GGBN Data Standard documentation, where all GGBN terms are described in detail. The table below lists only the most important ones. Since the Nagoya Protocol has come into force we highly recommend mapping the Permit fields. They will become mandatory for providing data to GGBN at the end of 2020. Until then all core members must have implemented the required fields in their databases.

Group	Element	Remarks	Example
Occurrence Core	*basisOfRecord	The specific nature of the data record, controlled vocabulary!	"PreservedSpecimen", "MaterialSample", "FossilSpecimen", "LivingSpecimen", "HumanObservation", "MachineObservation"; for Tissue or DNA samples use "MaterialSample"
Material Sample Extension	*materialSampleType	Classification of kind of physical sample in addition to BasisOfRecord/RecordBasis and Preparation Type	"tissue", "DNA", "specimen"
Preparation Extension	*preparationType	description of type of material, free text	for DNA: gDNA, eDNA, aDNA; for tissues/specimens: leaf, muscle, leg, blood
Preparation Extension	*preparationDate	date of DNA extraction	if unknown type "unknown"
Preparation Extension	*preparationMaterials	extraction kit or protocol	if unknown type "unknown"
Preparation Extension	*preparationStaff	extraction staff	if unknown type "unknown"
Preservation Extension	preservation	preservation of the tissue or DNA	if unknown type "unknown"
Permit Extension	*permitStatus	Information about the presence, absence or other basic status of permits associated with the sample(s), controlled vocabulary!	Permit available, Permit not required, Permit not available, Unknown. Material collected after 2014-10-12 cannot be in "Unknown" permit status!
Permit Extension	*permitStatusQualifier	Description of why a certain permit was not required or why Permit Status is unknown	"no national requirement for a permit at date of access", "officially authorized illegal holder", "collected on private land", "pre-Nagoya"
Permit Extension	*permitType	A permit is a document that allows someone to take an action that otherwise would not be allowed, controlled vocabulary!	Collecting Permit, Import Permit, Export Permit, Intellectual Property Rights, Copyright, Patent, Data use, Phytosanitary, Salvage, Exemption Permit, Material Transfer Agreement, Internationally Recognized Certificate of Compliance, Contract, Memorandum of Understanding, Memorandum of Cooperation, Veterinary Certificate, Human Pathogens, Genetically Modified Organism, Other
Permit Extension	*permitText	The text of a permit related to the gathering/shipping or further details
Amplification Extension	amplificationDate	date of amplification; if it is unknown or the field is generally empty you don't have to map it	should be ISO format yyyy-mm-dd
Amplification Extension	marker		COX1
Amplification Extension	geneticAccessionNumber	the accession number of NCBI/EMBL/DDBJ or the process ID of BOLD; this is a repeatable element, you can provide as many as you want	e.g. AJ45567
Amplification Extension	genBankNumber-URI	complete link to the accession number of NCBI/EMBL/DDBJ or the process ID of BOLD
Loan Extension	blockedUntil	in case the sample is blocked until a specific date it's nevertheless searchable but customers cannot order it	ISO format
Loan Extension	blocked	sample is blocked (e.g. because it is consumed), but data are still available	Yes/No
Material Sample Extension	concentration	concentration of the DNA	65,34
Material Sample Extension	@unit	unit of the concentration	µg/ml
Material Sample Extension	ratioOfAbsorbance260_230	map only if filled with content	1,2
Material Sample Extension	ratioOfAbsorbance260_280	map only if filled with content	1,8
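
Several fields in the table (amplificationDate, blockedUntil) expect ISO dates in the form yyyy-mm-dd. A simple check can catch free-text dates before the archive is published. This is a minimal sketch; the function name `is_iso_date` is hypothetical, and it accepts only full calendar dates, not partial dates or date ranges.

```python
import re
from datetime import date

# Exactly four digits, two digits, two digits, separated by hyphens.
ISO_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def is_iso_date(value):
    """Return True if value is a valid calendar date written as yyyy-mm-dd."""
    match = ISO_DATE.fullmatch(value)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)  # rejects impossible dates such as 30 February
    except ValueError:
        return False
    return True

for candidate in ["2018-09-27", "21. April 1951", "2018-02-30"]:
    print(candidate, is_iso_date(candidate))
```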

Related Specimen Data

All specimen voucher information must be available via a GBIF-compliant database! Both Darwin Core Archive and BioCASe specimen providers are possible! To provide the underlying specimen data you can use your existing GBIF dataset mapping.

File:Relationships.jpg

In the database and in the mapping, the child record should refer to its mother record, in contrast to the real-world example.

The voucher identifiers

Please use the Darwin Core Resource Relationship Class

Group	Element	Remarks	Example
Resource Relationship	*resourceRelationshipID	concatenated string with the triple id used for GBIF plus the accesspoint of the IPT archive providing this record and the guid (the latter one is not mandatory, but recommended)	e.g. catalogNumber=11718653&collectionCode=Botany&institutionCode=US&accesspoint=http://collections.mnh.si.edu/ipt/archive.do?r=nmnhdwca&guid=http://n2t.net/ark:/65665/303e7eecb-4e8d-4fce-a251-6c8fa3f2863d
Resource Relationship	*relationshipOfResource	the Relation between the DNA and the voucher or the tissue and the voucher	e.g. "same individual", "same population", "same ex situ individual"
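
The concatenated resourceRelationshipID can be generated from the individual columns of your database. The sketch below follows the key names, key order and "&" separator of the example in the table above; the function name `resource_relationship_id` is hypothetical, and whether GGBN accepts a different key order is not stated here, so the sketch keeps the order of the example. Values are joined as they are, without URL encoding, as in the example.

```python
def resource_relationship_id(catalog_number, collection_code, institution_code,
                             accesspoint, guid=None):
    """Build the concatenated ID; the guid is recommended but optional."""
    parts = [
        f"catalogNumber={catalog_number}",
        f"collectionCode={collection_code}",
        f"institutionCode={institution_code}",
        f"accesspoint={accesspoint}",
    ]
    if guid:
        parts.append(f"guid={guid}")
    return "&".join(parts)

# Values taken from the example in the table above.
print(resource_relationship_id(
    "11718653",
    "Botany",
    "US",
    "http://collections.mnh.si.edu/ipt/archive.do?r=nmnhdwca",
    "http://n2t.net/ark:/65665/303e7eecb-4e8d-4fce-a251-6c8fa3f2863d",
))
```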

Gathering event of the voucher

All elements marked with * will be indexed and must be mapped! These gathering facts have to be mapped twice (once in the specimen mapping and once in the DNA mapping) because they are needed for indexing and later for searching.

Group	Element	Remarks	Example
Occurrence Core	*recordNumber	the number the collector gave to the specimen in the field, often used in Botany but not in Zoology; map it if you have content	e.g. 765/10
Occurrence Core	*recordedBy	the Collector or Collector Team	e.g. Scholz & Sipman
Occurrence Core	*minimumElevationInMeters	if you have the lower and upper value in different columns map both fields, if not map minimumElevationInMeters only	e.g. 100
Occurrence Core	maximumElevationInMeters		e.g. 200
Occurrence Core	*countryCode	ISO code of the country where the voucher was collected	e.g. US
Occurrence Core	*country	English name of the country	e.g. United States of America
Occurrence Core	*eventDate	date when voucher was collected, if you have content you can also use ISO format	e.g. 21. April 1951
Occurrence Core	*locality		e.g. 5km NO Berlin
Occurrence Core	continent	name of continent	e.g. Europe
Occurrence Core	*decimalLatitude		e.g. -15,88876
Occurrence Core	*decimalLongitude		e.g. 72,88876

Scientific Name of the voucher

Group	Element	Remarks	Example
Occurrence Core	*family	Please provide at least one higher taxon, usually family	e.g. Asteraceae, Paridae
Occurrence Core	*scientificName	the complete name of the taxon including Authors (and years for animals)	e.g. Aaronsohnia factorovskyi Warb. & Eig. var. factorovskyi
Occurrence Core	*specificEpithet		e.g. factorovskyi
Occurrence Core	*genus		e.g. Aaronsohnia
Occurrence Core	*infraspecificEpithet		e.g. factorovskyi
Occurrence Core	*taxonRank		e.g. var.

Multimedia items of the voucher

These should be mapped in the specimen mapping, not the DNA/tissue mapping. If you have histological images you should map them here.
