Metadata

From reBiND Documentation
Revision as of 02:18, 19 November 2014 by LornaMorris (talk | contribs) (42 revisions)

Metadata Documentation

Metadata can describe single items such as objects (physical or digital), but also groups of items. This documentation covers only metadata relating to groups of items, because the reBiND project focuses on biodiversity data collections. It gives an overview of the metadata standards that are relevant for managing biodiversity data. Object-level data and metadata are covered by the ABCD standard.


Definitions and Functions

metadata

  • structured data describing information resources
  • The National Information Standards Organization (NISO) defines metadata as "structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource."
  • The World Wide Web Consortium (W3C) defines metadata as "machine understandable information for the web."
  • The Federal Geographic Data Committee (FGDC) defines metadata as describing, "the content, quality, condition, and other characteristics of data."
  • Put simply, metadata are data about data. They provide context for research findings, ideally in a machine-readable format. Once published, metadata can enable discovery of data via electronic interfaces and enable correct use and attribution of your findings.

(from: http://marinemetadata.org/guides/mdataintro/mdatadefined)

metadata from and for research data

  • external metadata: basis for unambiguous citation, comparable to classical catalogue data (libraries).
    • ID
    • technical data (Technische Daten)
    • description of content (Beschreibung des Inhalts)
    • people and rights (Personen und Rechte)
    • networking (Vernetzung)
    • life cycle (Lebenszyklus)
  • internal metadata: metadata on subject-specific level, necessary for subject-specific understanding.
    • technical and method-specific: makes the data record comprehensible from a technical and content point of view, e.g. data file name, file format, file size, hash value, information about software
    • subject-specific, here biodiversity-specific: is biodiversity-specific metadata available for subject-specific retrieval? Is ABCD sufficient for all biodiversity primary data?
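
The technical part of the internal metadata (file name, file format, file size, hash value) can usually be derived from the file itself. The following is a minimal sketch of that idea, not a reBiND implementation: the function name `technical_metadata`, the dictionary keys, the choice of SHA-256 and the use of the file extension as a stand-in for the format are all assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def technical_metadata(path):
    """Collect basic technical metadata for one data file (hypothetical helper)."""
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    name = os.path.basename(path)
    return {
        "file_name": name,
        # Assumption: the file extension is a rough stand-in for the file format.
        "file_format": os.path.splitext(name)[1].lstrip(".").lower(),
        "file_size_bytes": os.path.getsize(path),
        "sha256": digest,
    }


# Demonstration on a temporary file with made-up content.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.xml")
    with open(sample_path, "wb") as handle:
        handle.write(b"<DataSets/>")
    record = technical_metadata(sample_path)

print(record["file_name"], record["file_format"], record["file_size_bytes"])
```

Information about the software that produced the file cannot be derived this way and would have to be supplied separately.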

metadata for long term storage

  • structural metadata: relation of one object to other objects in an archive (standard, e.g. METS)
  • administrative metadata: administration of archived objects, evidence of origin and use, access control, provenance information
  • preservation metadata: history of an object, e.g. provenance, measures for long-term accessibility, authenticity, rights information regarding applicable processes (standards: PREMIS, LMER)

recommendations from the German Research Foundation (DFG)

"The data are described by metadata. The metadata (at least according to Dublin Core) record, first, the bibliographic facts: the name of the researcher who collected the data, the name of the data set, the place and year of publication, and technical data (format etc.). The content-related metadata describe the primary data comprehensively. They state the conditions under which the data were collected or measured. Here the author also describes the research question under which the data were produced. All information required for reusing the data under other research questions should be available here. The criteria of Information Life Cycle Management should be taken into account." (translated from German)

(from: Deutsche Forschungsgemeinschaft, Ausschuss für Wissenschaftliche Bibliotheken und Informationssysteme, Unterausschuss für Informationsmanagement, Empfehlungen zur gesicherten Aufbewahrung und Bereitstellung digitaler Forschungsprimärdaten, Januar 2009)
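
As an illustration of the bibliographic minimum named in the recommendation, a record using the Dublin Core element set could be serialised roughly as follows. This is only a sketch: the wrapper element `record`, the function name `dublin_core_record` and all example values are invented, and the choice of `publisher` and `date` to carry "place and year of publication" is a loose assumption. Only the namespace URI and the element names come from Dublin Core itself.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialise a dict of Dublin Core element names and values as XML text."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Made-up example values.
xml_text = dublin_core_record({
    "creator": "Example, Researcher",
    "title": "Example data set",
    "publisher": "Example Institute",
    "date": "2014",
    "format": "text/xml",
})
print(xml_text)
```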


Metadata Standards

catalogue standards

standards to mix vocabularies

defines metadata records, provides semantic interoperability, is generic for designing metadata records, terms are based on RDF

DCMI Abstract Model (DCAM): application profiles promote the sharing and linking of data within and between communities

Dublin Core Description Set Profile (DCSP): language for application profiles

A Dublin Core Application Profile (DCAP) specifies and describes the metadata used in a particular application. It consists of:

  • functional requirements
  • Domain Model: types of metadata and their relationships
  • Description Set Profile and Usage Guidelines
  • Syntax Guidelines and Data Formats

DCMI-SF (the Singapore Framework for Dublin Core Application Profiles) illustrates how the standards fit together

schemas for external metadata

  • STD-DOI: 25 descriptive input fields, based on ISO 690-2 for the citation of electronic resources, with fields from DC and the International DOI Foundation. Required: identifier, creators, titles, publisher, publication year. Optional: subjects, contributors, dates, language, resourceType, alternateIdentifiers, relatedIdentifiers, formats, version, rights, descriptions (http://www.iso.org/iso/catalogue_detail.htm?csnumber=25921)
  • DataCite: draft, based on STD-DOI
  • Altman & King: based mainly on Dublin Core (http://www.dlib.org/dlib/march07/altman/03altman.html)
  • OECD publisher: based on Altman and King, 27 elements
  • DANS (Data Archiving and Networked Services): 15 DC terms elements, which roughly correspond to a refinement of the core elements.
  • ANDS (Australian National Data Service): separate metadata schema with four groups (collection, service, party, and activity) in different relations
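
The split into required and optional fields listed for STD-DOI lends itself to a simple completeness check. The sketch below is illustrative only: the field lists are taken from the bullet above, but the key spelling `publicationYear`, the function name `check_record` and the example record are assumptions, and no official schema validation is implied.

```python
# Field names as listed for STD-DOI above; "publicationYear" is an assumed spelling.
REQUIRED = ("identifier", "creators", "titles", "publisher", "publicationYear")
OPTIONAL = ("subjects", "contributors", "dates", "language", "resourceType",
            "alternateIdentifiers", "relatedIdentifiers", "formats", "version",
            "rights", "descriptions")


def check_record(record):
    """Return (missing required fields, fields not known to the schema)."""
    missing = [name for name in REQUIRED if not record.get(name)]
    unknown = sorted(set(record) - set(REQUIRED) - set(OPTIONAL))
    return missing, unknown


example = {
    "identifier": "10.1234/example",  # made-up DOI
    "creators": ["Example, Researcher"],
    "titles": ["Example data set"],
    "publisher": "Example Institute",
    "colour": "red",  # deliberately not part of the schema
}
missing, unknown = check_record(example)
print(missing, unknown)
```

A record passes when both returned lists are empty.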

(from: Konzeptstudie Forschungsdaten Chemie, www.fiz-chemie.de/fileadmin/user_upload/PDF_DE/Konzeptstudie_Forschungsdaten_Chemie.pdf)

further relevant metadata standards

  • DIF - Directory Interchange Format (http://gcmd.nasa.gov/User/difguide/difman.html)
    • 1988 formally approved and adopted, NASA Master Directory (NMD)
    • 1990 NMD renamed to Global Change Master Directory (GCMD), the GCMD serves as NASA's FGDC Clearinghouse node for geospatial metadata.
    • 2004 the ISO 19115/TC211 geospatial metadata standard was adopted
    • The DIF does not compete with other metadata standards. It is simply the "container" for the metadata elements.
    • Eight fields are required in the DIF
  • ISO 19115 - Geographic Information Metadata (http://www.gdi-de.org/thema2009/uebersetzungiso)
    • ISO 19115 defines how to describe geographical information and associated services, including contents, spatial-temporal extent, data quality, access and rights to use.
    • The standard defines more than 400 metadata elements
    • 20 core elements.
  • INSPIRE - Infrastructure for Spatial Information in the European Community
  • EML - Ecological Metadata Language (http://knb.ecoinformatics.org/software/eml/eml-2.1.0/index.html)
    • EML is implemented as a series of XML document types that can be used in a modular and extensible manner to document ecological data.
    • features: modular, detailed structure, compatible, strong typing - meeting the criteria of XML Schema, distinct content model and syntactic implementation
    • was designed with the following standards in mind: Dublin Core Metadata Initiative, the Content Standard for Digital Geospatial Metadata (CSDGM, from the US Geological Survey's Federal Geographic Data Committee (FGDC)), the Biological Profile of the CSDGM (from the National Biological Information Infrastructure), the International Standards Organization's Geographic Information Standard (ISO 19115), the ISO 8601 Date and Time Standard, the OpenGIS Consortium's Geography Markup Language (GML), the Scientific, Technical, and Medical Markup Language (STMML), and the Extensible Scientific Interchange Language (XSIL).

metadata standards for long term storage

  • METS (Metadata Encoding & Transmission Standard): information about digitised objects, XML format, representation of inner object structure, metadata container (http://www.loc.gov/standards/mets/mets-schemadocs.html)
  • PREMIS (PREservation Metadata: Implementation Strategies) Entities: Intellectual, Object, Event, Rights, Agent, exact description by semantic units (http://www.loc.gov/standards/premis/)
  • LMER (Deutsche Bibliothek, 2003) based on Preservation Metadata Schema of the National Library New Zealand, exchange format in cooperative archive systems, technical information, history of an object, object, file, process, modification, LMER data mapping to PREMIS data (http://www.d-nb.de/standards/lmer/lmer.htm)

Metadata Management Software

(from: Konzeptstudie Forschungsdaten Chemie)

Interfaces

Creation of Metadata

Metadata can be produced automatically or manually (intellectually). A computer-aided combination of both is also possible.
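
A computer-aided combination can be as simple as merging automatically extracted values with manually entered ones. The sketch below shows one possible policy, in which manual entries override automatic ones and empty manual fields are ignored. The function name `merge_metadata`, the field names and the precedence rule are assumptions for illustration, not a documented reBiND procedure.

```python
def merge_metadata(automatic, manual):
    """Combine both sources; non-empty manual values take precedence."""
    merged = dict(automatic)
    merged.update({key: value for key, value in manual.items() if value})
    return merged


# Values a program could derive from the file itself (made-up example).
automatic = {"format": "text/xml", "file_size_bytes": 2048, "title": "sample.xml"}
# Values entered by a person; the empty "rights" field was left blank.
manual = {"title": "Example data set", "creator": "Example, Researcher", "rights": ""}

combined = merge_metadata(automatic, manual)
print(combined)
```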

Mapping Metadata Standards

ISO 19115

  • red = core element

DIF

  • red = required
  • blue = highly recommended
  • green = recommended

Dublin Core

  • red = DC elements
| Category | ISO 19115 Metadata Standard | DIF | Dublin Core Metadata | ABCD Metadata | Notes |
| Identifier | fileIdentifier (unique identifier for this metadata file) | Entry_ID (unique document identifier of the metadata record; may be the same as Data_Set_ID) | (Resource) Identifier | none | |
| | language (languages used for documenting metadata, languageCode ISO 639) | not in DIF | none | none | |
| | characterSet (full name of the character coding standard, ISO 10646-2) | not in DIF | none | none | |
| Technical data | presentationForm (mode in which the data is represented) | Data_Set_Citation: Data_Presentation_Form (the mode in which the data are represented, e.g. atlas, image, profile, text, etc.) | (Resource) Type (nature or genre of the resource; recommended: controlled vocabulary such as the DCMI Type Vocabulary) | none | |
| People and rights | datasetPointofContact (point of contact) | Personnel (defines the point of contact for more information about the data set or the metadata; may be repeated): Role (Investigator, Technical Contact, DIF Author; may be repeated), First_Name, Middle_Name, Last_Name, Email, Fax, Phone, Contact_Address | Contributor | ContentMetadata/RevisionData/Contributor (source for Dublin Core standard element Contributor) | |
| Description of content | geographicDescription (documented in ISO 19112 - Location), SI_LocationInstance, geographicIdentifier, spatialResolution (optional) | Location: Location_Category (keyword: continent, ocean, geographic region, solid earth, space, vertical location), Location_Type (keyword), Location_Subregion_1, _2, _3 (keywords), Detailed_Location (text) | Coverage (the spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant) | ContentMetadata/Description/Representation/Coverage (source for the Dublin Core standard element Coverage) | coverage: free text form describing geographic, taxonomic and other aspects of terminology and descriptions |
| Description of content | geographicBox (geographic areal domain of the dataset) | Spatial_Coverage | | | |
| Description of content | EX_GeographicBoundingBox (geographic area of the entire dataset) | | | | |
| Description of content | westBoundLongitude (western-most coordinate of the limit), eastBoundLongitude, southBoundLatitude, northBoundLatitude (referenced to WGS 84) | Temporal_Coverage (start and stop dates during which the data was collected; may be repeated): Start_Date (may not be repeated within Temporal_Coverage), Stop_Date (not valid without start date) | Coverage (the spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant) | ContentMetadata/Description/Representation/Coverage (source for the Dublin Core standard element Coverage) | |
| Description of content | not in ISO | Paleo_Temporal_Coverage (length of time represented by the data collected; data spans time frames earlier than yyyy-mm-dd = 0001-01-01): Paleo_Start_Date, Paleo_Stop_Date, Chronostratigraphic_Unit (eon, era, period, epoch, stage) | Coverage (the spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant) | ContentMetadata/Description/Representation/Coverage (source for the Dublin Core standard element Coverage) | |
| People and rights | originator (party who created the resource), (CI_RoleCode) | Data_Set_Citation (allows the author to properly cite the data set producer): Dataset_Creator (the name of the organization(s) or individual(s) with primary intellectual responsibility for the data set's development) | Creator (an entity primarily responsible for making the resource) | ContentMetadata/Description/Representation/Creator (source for the Dublin Core standard element Creator) | |
| People and rights | principalInvestigator (key party responsible for gathering information and conducting research), (CI_RoleCode) | Personnel: Role: Investigator (the person who headed the investigation or experiment that resulted in the acquisition of the data described, i.e. Principal Investigator, Experiment Team Leader): Last_Name, First_Name, Middle_Name | Creator (an entity primarily responsible for making the resource) | ContentMetadata/Description/Representation/Creator (source for the Dublin Core standard element Creator) | |
| Life cycle | editionDate (date of the edition) | Data_Set_Citation: Dataset_Release_Date | Date (DateIssued) | ContentMetadata/Version/DateIssued (source for Dublin Core standard element DateIssued) | |
| Description of content | abstract (brief narrative summary of the content), purpose (summary of the intentions with which the dataset was developed) | Summary (brief description of the data set along with the purpose of the data): Abstract (brief description of the data set), Purpose (purpose of the data set) | Description (may include but is not limited to: an abstract, a table of contents, a graphical representation, or a free-text account of the resource) | ContentMetadata/Description/Representation/Details (source for Dublin Core standard element Description) | |
| Technical data | MD_DigitalTransferOptions (means and media by which the dataset is obtained): name (name of the media on which the dataset can be received), transferSize (estimated size of the transferred dataset), distributionFormat (provides information about the format in which the dataset may be obtained), fees (fees and terms for retrieving the dataset) | Distribution (media options, size, data format, and fees involved in distributing the data set): Distribution_Media, Distribution_Size, Distribution_Format, Fees | Format (the file format, physical medium, or dimensions of the resource) | | |
| Technical data | language (languages used within the dataset, languageCode ISO 639) | Data_Set_Language (language used in the preparation, storage, and description of the data) | Language (a language of the resource; recommended best practice is to use a controlled vocabulary such as RFC 4646 [RFC4646]) | ContentMetadata/Description/Representation/@language | |
| People and rights | MD_Distribution (distributor and options): distributorContact, onLine (information about online sources), distributorContact (party from whom the dataset may be obtained) | Data_Center (data center, organization, or institution responsible for distributing the data): Data_Center_Name, Data_Center_URL, Data Center Contact (Last_Name, First_Name, Middle_Name) | | | |
| | CI_Citation: title (name by which the cited resource is known), alternateTitle (short name or other language name by which the cited information is known; example: "DCW" as an alternative title for "Digital Chart of the World") | Data_Set_Citation (allows the author to properly cite the data set producer; two functions: a) to indicate how this data set should be cited in professional scientific literature, and b) if this data set is a compilation of other data sets, to document and credit the data sets that were used in producing this compilation; citation for the data set itself, not articles related to the research results; can be repeated, subfields cannot be repeated): Dataset_Title | | | |
| People and rights | publisher (party who published the resource) | Data_Set_Citation: Dataset_Publisher (the name of the individual or organization that made the data set available for release) | Publisher (an entity responsible for making the resource available) | | |
| Networking ? | sourceCitation (recommended reference to be used for the source data) | Reference (describes key bibliographic citations pertaining to the data set): Author, Publication_Date, Title, Series, Edition, Volume, Issue, Report_Number, Publication_Place, Publisher, Pages, ISBN, DOI, Online_Resource, Other_Reference_Details | Relation (a related resource) | Reference (published reference): TitleCitation, CitationDetail, URI (in ABCD per unit/scientific name) | |
| Networking | linkage (information about on-line sources from which the dataset, specification, or community profile name and extended metadata elements can be obtained) | Data_Set_Citation: Online_Resource | Relation (a related resource) | | |
| Networking | funcCode (function performed by the resource, CI_OnLineFunctionCode <<CodeList>>: download, information, offlineAccess, order, search) | Related_URL (specifies links to Internet sites that contain information related to the data, as well as related Internet sites such as project home pages, related data archives/servers, metadata extensions, online software packages, web mapping services, and calibration/validation data): URL_Content_Type (type, subtype), URL, Description | Relation (a related resource) | | |
| Identifier ?? | funcCode (function performed by the resource, CI_OnLineFunctionCode <<CodeList>>) | Related_URL (specifies links to Internet sites that contain information related to the data, as well as related Internet sites such as project home pages, related data archives/servers, metadata extensions, online software packages, web mapping services, and calibration/validation data): URL_Content_Type (type, subtype), URL, Description | ?? Resource Identifier (an unambiguous reference to the resource within a given context) | | |
| | (still to be looked up in ISO) | Data_Set_Citation: Online_Resource | Resource Identifier (an unambiguous reference to the resource within a given context) | | |
| People and rights | useConstraints (constraints applied to assure the protection of privacy or intellectual property, and any special restrictions or limitations or warnings on using the resource or metadata) (MD_LegalConstraints) | Use_Constraints (describes how the data may or may not be used after access is granted, to assure the protection of privacy or intellectual property) | Rights Management (information about rights held in and over the resource) | ContentMetadata/IPRStatements: IPRDeclarations, Copyrights, Licences, TermsOfUseStatements, Disclaimers, Acknowledgements, Citations | |
| People and rights | accessConstraints (access constraints applied to assure the protection of privacy or intellectual property, and any special restrictions or limitations on obtaining the resource or metadata) (MD_LegalConstraints) | Access_Constraints (information about any constraints for accessing the data set) | Rights Management (information about rights held in and over the resource) | | |
| People and rights | otherConstraints (other restrictions and legal prerequisites for accessing and using the resource or metadata) (MD_LegalConstraints) | not in DIF | | | |
| Networking | CI_OnlineResource (information about on-line sources from which the dataset, specification, or community profile name and extended metadata elements can be obtained; inherited from the parent object) | Related_URL (specifies links to Internet sites that contain information related to the data, as well as related Internet sites such as project home pages, related data archives/servers, metadata extensions, online software packages, web mapping services, and calibration/validation data): URL_Content_Type (type, subtype), URL, Description | Source (a related resource from which the described resource is derived) | ContentMetadata/Description/Representation/URI (URI pointing to an online source, related to the current project, which may or may not serve an updated version of the description data) | |
| | DS_Platform (vehicle or other support base holding the sensor) | Platform (or Source_Name; platform used to acquire the data, 11 categories of platforms): Source_Name (repeatable), Short_Name, Long_Name (from controlled platform keywords when using the GCMD metadata authoring tools) | Source (a related resource from which the described resource is derived) | | |
| Description of content | keyword (commonly used word(s) or phrase(s) used to describe the subject) | Keyword (ancillary keyword; provide any words or phrases needed to further describe the data set) | Subject and Keywords | | |
| Description of content | category (keywords describing the dataset) | Parameters: Category (default: EARTH SCIENCE), Topic, Variable: Level 1-3, Detailed_Variable | Subject and Keywords (topic of the resource, represented using keywords, key phrases, or classification codes; recommended: controlled vocabulary) | | |
| Description of content | DS_Sensor (device or piece of equipment which detects and records information) | Instrument (name of the instrument used to acquire the data; may be repeated: Earth Remote Sensing Instruments, In Situ/Laboratory Instruments, Solar/Space Observing Instruments): Sensor_Name: Short_Name, Long_Name | Subject and Keywords (topic of the resource, represented using keywords, key phrases, or classification codes; recommended: controlled vocabulary) | | |
| Description of content | not in ISO | Project (name of the scientific program, field campaign, or project from which the data were collected): Short_Name, Long_Name | Subject and Keywords (topic of the resource, represented using keywords, key phrases, or classification codes; recommended: controlled vocabulary) | | |
| Description of content | not repeated in ISO | Entry_Title (should be descriptive enough so that when a user is presented with a list of titles the general content of the data set can be determined) | Title (name given to the resource) | ContentMetadata/Description/Representation/Title (source for the Dublin Core standard element Title) | |
| People and rights ? | citedResponsibleParty (name and position information for an individual or organization that is responsible for the resource) | Originating_Center (data center or data producer who originally generated the dataset) | | ContentMetadata/Owners: Organisation, Person, Roles, Addresses, TelephoneNumbers, EmailAddresses, URIs, LogoURI (entities having legal possession of the data collection content; here defined for the entire data collection, not for individual units; if an owner statement is present on the unit level, it should override this dataset-level statement) | |
| | | | Date (Date.Created): date of creation of the resource | ContentMetadata/RevisionData/DateCreated (source for Dublin Core standard element DateCreated) | |
| | | | Date (Date.Modified) | ContentMetadata/RevisionData/DateModified (source for Dublin Core standard element DateModified) | |
| | | | | ContentMetadata/Scope: GeoecologicalTerms, TaxonomicTerms, IconURI | |
| | | | | ContentMetadata/Version (number and date of current version): Major, Minor, Modifier | |
| | | | | Datasets/Dataset/ContentContacts/ | |
| | | | | Datasets/Dataset/DatasetGUID | |
| | | | | Datasets/Dataset/OtherProviders | |
| | | | | Datasets/Dataset/TechnicalContact/ | |
| | | | | Datasets/Dataset/Units/Unit/SourceID | |
| | | | | Datasets/Dataset/Units/Unit/SourceInstitutionID | |
| Identifier | | Entry_ID (unique document identifier of the metadata record) = Parent_DIF | | | |
| | MD_TopicCategoryCode (high-level geographic data thematic classification to assist in the grouping and search of available geographic data sets; can be used to group keywords as well; listed examples are not exhaustive. NOTE: It is understood there are overlaps between general categories and the user is encouraged to select the one most appropriate.) | ISO_Topic_Category (identifies the keywords in the ISO 19115 - Geographic Information Metadata): Farming, Biota, Boundaries, Climatology/Meteorology/Atmosphere, Economy, Elevation, Environment, Geoscientific Information, Health, Imagery/Base Maps/Earth Cover, Intelligence/Military, Inland Waters, Location, Oceans, Planning Cadastre, Society, Structure, Transportation, Utilities/Communications | | | |
| | metadataStandardName (name of the metadata standard, including profile name, used) | Metadata_Name (current DIF standard name) | | | |
| | metadataStandardVersion (version of the metadata standard, i.e. version of the profile, used) | Metadata_Version (current DIF metadata standard) | | | |
| | | Data_Set_Progress (production status of the data set regarding its completeness): planned, in work, complete | | | |
| | | Data_Resolution (resolution of the data, which is the difference between two adjacent geographic, vertical, or temporal values) | | | |
| | | Quality (information about the quality of the data or any quality assurance procedures followed in producing the data) | | | |
| | | DIF_Revision_History (list of changes made to the DIF over time) | | | |
| | | Multimedia_Sample (provides information that will enable the display of a sample image, movie or sound clip within the DIF): File, URL, Format, Caption, Description | | | |
| | | Parent_DIF (allows generalized aggregated metadata records (parents) to be related to metadata records with highly specific information (children)) | | | |
| | | IDN_Node (the Internal Directory Name (IDN) Node field is used internally to identify association, responsibility and/or ownership of the dataset, service or supplemental information; not displayed to the user) | | | |
| | dateStamp (date that the metadata was created) | DIF_Creation_Date (date the metadata record was created) | | | |
| | | Last_DIF_Revision_Date (date the metadata record was last revised) | | | |
| | | Future_DIF_Revision_Date (allows for the specification of a future date at which the DIF should be reviewed for accuracy of scientific or technical content) | | | |
| | | Private (restricts the data set description from being publicly available): True or False (default; makes the description publicly available) | | | |
| | locale (provides information about an alternatively used localised character string for a linguistic extension; the combination of language, country and character set in which the dataset is available) | | | | |
| | Role name: spatialRepresentationInfo (digital representation of spatial information in the dataset) | | | | |
| | Role name: referenceSystemInfo (description of the spatial and temporal reference systems used in the dataset) | | | | |
| | Role name: metadataExtensionInfo (basic information about the resources to which the metadata applies) | | | | |
| | Role name: contentInfo (provides information about the feature catalogue and describes the coverage and image data characteristics) | | | | |
| | Role name: distributionInfo (provides information about the distributor of and options for obtaining the resource(s)) | | | | |
| | dataQualityInfo (provides overall assessment of quality of a resource(s)) | | | | |
| | Role name: portrayalCatalogueInfo (provides information about the catalogue of rules defined for the portrayal of a resource(s)) | | | | |
| | Role name: metadataConstraints (provides restrictions on the access and use of metadata) | | | | |
| | Role name: applicationSchemaInfo (provides information about the conceptual schema of a dataset) | | | | |
| | Role name: metadataMaintenance (provides information about the frequency of metadata updates, and the scope of those updates) | | | | |
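
A mapping like the one tabulated above can be applied mechanically as a crosswalk. The following sketch covers only a handful of DIF-to-Dublin-Core pairs read off the table and is an illustration under assumptions, not a complete or official crosswalk: the function name `dif_to_dublin_core`, the flat record layout (real DIF nests fields such as Dataset_Creator under Data_Set_Citation), the lower-case Dublin Core element names and the example values are all invented for the example.

```python
# Subset of the table: DIF field -> Dublin Core element (lower-case element names).
DIF_TO_DC = {
    "Entry_ID": "identifier",
    "Entry_Title": "title",
    "Summary": "description",
    "Data_Set_Language": "language",
    "Dataset_Creator": "creator",
    "Dataset_Publisher": "publisher",
    "Dataset_Release_Date": "date",
    "Use_Constraints": "rights",
    "Keyword": "subject",
}


def dif_to_dublin_core(dif_record):
    """Translate a flat DIF-like dict; return (Dublin Core dict, unmapped fields)."""
    dc_record, unmapped = {}, []
    for field, value in dif_record.items():
        target = DIF_TO_DC.get(field)
        if target is None:
            unmapped.append(field)  # no Dublin Core counterpart in this subset
        else:
            # Dublin Core elements are repeatable, so values are collected in lists.
            dc_record.setdefault(target, []).append(value)
    return dc_record, unmapped


# Made-up, simplified DIF record.
dif_example = {
    "Entry_ID": "EXAMPLE_0001",
    "Entry_Title": "Example data set",
    "Dataset_Creator": "Example, Researcher",
    "Data_Set_Progress": "complete",
}
dc_example, not_mapped = dif_to_dublin_core(dif_example)
print(dc_example, not_mapped)
```

Fields without a counterpart, such as Data_Set_Progress, are reported rather than silently dropped, which mirrors the empty cells in the table.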

Sources: http://gcmd.gsfc.nasa.gov/Aboutus/standards/difiso.html
http://gcmd.gsfc.nasa.gov/Aboutus/standards/dublin_to_dif.html
http://rs.tdwg.org/dwc/terms/history/dwctoabcd/index.htm