https://wiki.bgbm.org/rebind_documentation/api.php?action=feedcontributions&user=LornaMorris&feedformat=atomreBiND Documentation - User contributions [en]2024-03-28T21:55:43ZUser contributionsMediaWiki 1.28.2https://wiki.bgbm.org/rebind_documentation/index.php?title=Best_Practice_Handbook&diff=650Best Practice Handbook2014-11-20T11:33:45Z<p>LornaMorris: </p>
<hr />
<div>=reBiND Best Practice Handbook=<br />
<br />
==Introduction==<br />
* [[About reBiND]]<br />
* [[Introduction|How to use this manual]]<br />
<br />
==The reBiND workflow==<br />
* [[Overview_rebind_workflow#Overview_of_the_reBiND_workflow|An overview of the reBiND pipeline]]<br />
<br />
==System Installation and Administration (Technical information for administrators and IT specialists)==<br />
* [[Installation]]<br />
* [[Administration]]<br />
* [[Data Rescue - outdated software and hardware]]<br />
<br />
==Data Archiving (Information for content administrators and contributing scientists) ==<br />
<br />
* [[Data_preparation|Data Preparation]]<br />
* [[Data_upload_to_rebind_framework|Creating a new project and uploading data to the reBiND data portal]]<br />
* [[Validation_and_Corrections |Validation and Corrections]]<br />
* [[Manual_review_of_data |Manual review of the data file]]<br />
* [[Entering_metadata |Entering metadata]]<br />
* [[Publishing_and_searching_the_data |Publishing and searching the data]]<br />
<br />
==Case Studies==<br />
* [[Case studies with single data sets from different providers ]]<br />
<br />
== Supporting data preparation ==<br />
* [[Supporting_data_preparation_software|Software Products]]<br />
* [[Supporting procedures|Supporting data preparation procedures]]<br />
<br />
==Technical Background==<br />
*[[eXist_and_xquery|eXist and XQuery]]<br />
*[[Ecologial Metadata Language|Ecological Metadata Language]]<br />
*[[ABCD Access to Biological Collection Data, Standard|ABCD (Access to Biological Collection Data) Standard]]<br />
<br />
==Glossary==<br />
<br />
* [[Glossary|A description of some of the terms used in this manual]]<br />
<br />
==References==<br />
<br />
* Güntsch, A., Fichtmüller, D., Kirchhoff, A. & Berendsohn, W.G.: Efficient rescue of threatened biodiversity data using reBiND-workflows. Plant Biosystems 146(4) (2012), pp. 752-755. DOI:10.1080/11263504.2012.740086<br />
<br />
* The BioCASE Provider Software Documentation: http://wiki.bgbm.org/bps/index.php/Main_Page <br />
<br />
* BioCASE Biological Collection Access Service: http://www.biocase.org/<br />
<br />
* ABCD Schema (Access to Biological Collection Data); ABCD 2.0 Concepts: http://wiki.tdwg.org/twiki/bin/view/ABCD/AbcdConcepts<br />
<br />
* GBIF (Global Biodiversity Information Facility): http://www.gbif.org/</div>LornaMorrishttps://wiki.bgbm.org/rebind_documentation/index.php?title=Template:CodeExample&diff=648Template:CodeExample2014-11-19T01:26:24Z<p>LornaMorris: Created page with "<div style="margin-left:30px; float:left; padding:3px; background-color: #F9F9F9; border: 1px solid #CCCCCC;"> <div style="padding:10px; border: 1px solid gray"> {{#tag:syntax..."</p>
<hr />
<div><div style="margin-left:30px; float:left; padding:3px; background-color: #F9F9F9; border: 1px solid #CCCCCC;"><br />
<div style="padding:10px; border: 1px solid gray"><br />
{{#tag:syntaxhighlight|{{{1}}}|lang="{{{lang<noinclude>|xml</noinclude>}}}" }}<br />
</div>{{#if:{{{description|}}}|<div style="font-size: 94%;line-height: 1.4em;padding: 3px">{{{description}}}</div>}}</div><br />
<br style="clear:both"/></div>LornaMorrishttps://wiki.bgbm.org/rebind_documentation/index.php?title=Metadata&diff=636Metadata2014-11-19T01:18:44Z<p>LornaMorris: 42 revisions</p>
<hr />
<div>Metadata Documentation<br />
<br />
Metadata can be used to describe single items such as objects (physical or digital), but can also be used to describe groups of items. This documentation deals with metadata for groups of items only, as the reBiND project focuses on biodiversity data collections. It aims to give an overview of the metadata standards relevant to the management of biodiversity data. Object-level data and metadata are covered by the ABCD standard.<br />
<br />
<br />
==Definitions and Functions==<br />
===metadata===<br />
*'''structured data describing information resources'''<br />
*The National Information Standards Organization (NISO) defines metadata as "structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource." <br />
*The World Wide Web Consortium (W3C) defines metadata as "machine understandable information for the web." <br />
*The Federal Geographic Data Committee (FGDC) defines metadata as describing, "the content, quality, condition, and other characteristics of data." <br />
*Put simply, metadata are data about data. They provide context for research findings, ideally in a machine-readable format. Once published, metadata can enable discovery of data via electronic interfaces and enable correct use and attribution of your findings.<br />
(from: http://marinemetadata.org/guides/mdataintro/mdatadefined)<br />
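As a minimal illustration of "data about data", the bibliographic facts of a data set can be expressed as a Dublin Core record. The following sketch uses the standard DCMES namespace; the enclosing record element and all values are invented for illustration:

```xml
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- who, what, when: minimal context for a published data set -->
  <dc:title>Vegetation survey data set, 2003</dc:title>
  <dc:creator>Example, A.</dc:creator>
  <dc:date>2004</dc:date>
  <dc:format>text/xml</dc:format>
  <!-- a persistent identifier enables discovery and correct attribution -->
  <dc:identifier>doi:10.1234/example-dataset</dc:identifier>
</record>
```

Because such a record is machine-readable XML, it can be harvested and indexed by electronic interfaces (e.g. OAI-PMH services, see below under Interfaces).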
<br />
===metadata from and for research data===<br />
*external metadata: basis for unambiguous citation, comparable to classical catalogue data (libraries).<br />
** ID<br />
** technical data (Technische Daten)<br />
** description of content (Beschreibung des Inhalts)<br />
** people and rights (Personen und Rechte)<br />
** networking (Vernetzung)<br />
** life cycle (Lebenszyklus)<br />
<br />
*internal metadata: metadata at the subject-specific level, necessary for subject-specific understanding.<br />
** technical and method-specific: make the data record comprehensible from a technical and content point of view; data file name, file format, file size, hash value, information about software<br />
** subject-specific, here biodiversity-specific: are biodiversity-specific metadata available for subject-specific retrieval? Is ABCD sufficient for all biodiversity primary data?<br />
<br />
===metadata for long term storage===<br />
*structural metadata: relation of one object to other objects in an archive (standards, e.g. METS)<br />
*administrative metadata: administration of archived objects, originator and usage records, access control, provenance information<br />
*preservation metadata: history of an object, e.g. provenance, measures for long-term accessibility, authenticity, rights information regarding applicable processes (standards: PREMIS, LMER)<br />
<br />
===recommendations from the German Research Foundation (DFG)===<br />
"The data are described by metadata.<br />
The metadata (at least Dublin Core) record, on the one hand, the bibliographic<br />
facts: the name of the researcher who collected the data, the title of the data set,<br />
the place and year of publication, and technical details (format etc.). The content-related<br />
metadata describe the primary data comprehensively. They contain the information on the<br />
conditions under which the data were collected or measured. Here the author also describes<br />
the research question for which the data were produced. All information required for<br />
reusing the data in other research contexts should be available here. The criteria of<br />
Information Life Cycle Management should be taken into account."<br />
<br />
(from: Deutsche Forschungsgemeinschaft, Ausschuss für Wissenschaftliche Bibliotheken und Informationssysteme, Unterausschuss für Informationsmanagement, Empfehlungen zur gesicherten Aufbewahrung und Bereitstellung digitaler Forschungsprimärdaten, Januar 2009)<br />
<br />
<br />
==Metadata Standards==<br />
<br />
===catalogue standards===<br />
<br />
* PICA - Project of Integrated Catalogue Automation<br />
* MARC – Machine Readable Cataloging (http://www.loc.gov/standards/marcxml/)<br />
* Dublin Core (http://dublincore.org/documents/dces/)<br />
<br />
===standards for mixing vocabularies===<br />
* Dublin Core Application Profiles (DCAP): allow terms from different vocabularies to be mixed and matched (http://dublincore.org/documents/profile-guidelines/); a DCAP defines metadata records, provides semantic interoperability, is a generic approach to designing metadata records, and its terms are based on RDF<br />
<br />
* DCMI Abstract Model (DCAM): application profiles promote the sharing and linking of data within and between communities<br />
* Dublin Core Description Set Profile (DCSP): a language for application profiles<br />
<br />
A DCAP specifies and describes metadata in a particular application:<br />
* functional requirements<br />
* Domain Model: types of metadata and their relationships<br />
* Description Set Profile and Usage Guidelines<br />
* Syntax Guidelines and Data Formats<br />
<br />
DCMI-SF illustrates how the standards fit together.<br />
<br />
===schemas for external metadata===<br />
*STD-DOI: 25 descriptive input fields, oriented on ISO 690-2 for the citation of electronic resources; fields from DC and the International DOI Foundation; required: identifier, creators, titles, publisher, publication year; optional: subjects, contributors, dates, language, resourceType, alternateIdentifiers, relatedIdentifiers, formats, version, rights, descriptions (http://www.iso.org/iso/catalogue_detail.htm?csnumber=25921)<br />
*DataCite: draft, based on STD-DOI<br />
*Altman & King: based mainly on Dublin Core (http://www.dlib.org/dlib/march07/altman/03altman.html)<br />
*OECD publisher: based on Altman and King, 27 elements<br />
*DANS (Data Archiving and Networked Services): 15 DC Terms elements, which roughly correspond to a refinement of the core elements. <br />
*ANDS (Australian National Data Service): separate metadata schema; four groups: collection, service, party, and activity, in different relations<br />
<br />
(from: Konzeptstudie Forschungsdaten Chemie, www.fiz-chemie.de/fileadmin/user_upload/PDF_DE/Konzeptstudie_Forschungsdaten_Chemie.pdf)<br />
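The required STD-DOI/DataCite fields listed above (identifier, creators, titles, publisher, publication year) can be sketched as a generic XML record. Element names and values are illustrative only and do not reproduce the official schema:

```xml
<resource>
  <!-- required: a resolvable identifier for citation -->
  <identifier identifierType="DOI">10.1234/example-dataset</identifier>
  <creators>
    <creator>Example, A.</creator>
  </creators>
  <titles>
    <title>Example biodiversity data set</title>
  </titles>
  <publisher>Example Data Centre</publisher>
  <publicationYear>2012</publicationYear>
</resource>
```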
<br />
=== further relevant metadata standards===<br />
*DIF - Directory Interchange Format (http://gcmd.nasa.gov/User/difguide/difman.html)<br />
**1988 formally approved and adopted, NASA Master Directory (NMD)<br />
**1990 NMD renamed to Global Change Master Directory (GCMD), the GCMD serves as NASA's FGDC Clearinghouse node for geospatial metadata. <br />
**2004 the ISO 19115/TC211 geospatial metadata standard was adopted<br />
**The DIF does not compete with other metadata standards. It is simply the "container" for the metadata elements.<br />
**Eight fields are required in the DIF<br />
*ISO 19115 - Geographic Information Metadata (http://www.gdi-de.org/thema2009/uebersetzungiso)<br />
**ISO 19115 defines how to describe geographical information and associated services, including content, spatial and temporal references, data quality, and access and usage rights. <br />
**The standard defines more than 400 metadata elements, 20 of which are core elements.<br />
*INSPIRE - Infrastructure for Spatial Information in the European Community<br />
**INSPIRE Metadata Regulation document (http://inspire.jrc.ec.europa.eu/index.cfm/pageid/101)<br />
**INSPIRE Metadata Implementing Rules document (http://inspire.jrc.ec.europa.eu/index.cfm/pageid/101)<br />
*EML - Ecological Metadata Language (http://knb.ecoinformatics.org/software/eml/eml-2.1.0/index.html)<br />
**EML is implemented as a series of XML document types that can be used in a modular and extensible manner to document ecological data.<br />
**features: modular, detailed structure, compatible, strong typing - meeting the criteria of XML Schema, distinct content model and syntactic implementation <br />
**was designed with the following standards in mind: the Dublin Core Metadata Initiative, the Content Standard for Digital Geospatial Metadata (CSDGM, from the US Geological Survey's Federal Geographic Data Committee (FGDC)), the Biological Profile of the CSDGM (from the National Biological Information Infrastructure), the International Standards Organization's Geographic Information Standard (ISO 19115), the ISO 8601 Date and Time Standard, the OpenGIS Consortium's Geography Markup Language (GML), the Scientific, Technical, and Medical Markup Language (STMML), and the Extensible Scientific Interchange Language (XSIL).<br />
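A minimal EML document reflecting this modular structure might look like the following sketch. The packageId, system, and all content values are invented; the EML 2.1.0 schema defines the exact required elements:

```xml
<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.0"
         packageId="example.1.1" system="example">
  <dataset>
    <title>Example vegetation survey</title>
    <!-- creator and contact are among the minimal dataset elements -->
    <creator>
      <individualName>
        <surName>Example</surName>
      </individualName>
    </creator>
    <contact>
      <individualName>
        <surName>Example</surName>
      </individualName>
    </contact>
  </dataset>
</eml:eml>
```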
<br />
=== metadata standards for long term storage===<br />
*METS (Metadata Encoding & Transmission Standard): information about digitised objects, XML format, representation of the inner object structure, metadata container (http://www.loc.gov/standards/mets/mets-schemadocs.html)<br />
*PREMIS (PREservation Metadata: Implementation Strategies) entities: Intellectual Entity, Object, Event, Rights, Agent; exact description by semantic units (http://www.loc.gov/standards/premis/) <br />
*LMER (Deutsche Bibliothek, 2003): based on the Preservation Metadata Schema of the National Library of New Zealand; exchange format in cooperative archive systems; technical information, history of an object; object, file, process, modification; LMER data can be mapped to PREMIS data (http://www.d-nb.de/standards/lmer/lmer.htm)<br />
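To illustrate METS as a metadata container, the following skeleton wraps a descriptive metadata section (here a Dublin Core placeholder) and the structural map that represents the inner object structure. This is a sketch only; the METS schema documentation defines the complete element set:

```xml
<mets xmlns="http://www.loc.gov/METS/">
  <!-- descriptive metadata section, wrapping an embedded DC record -->
  <dmdSec ID="dmd1">
    <mdWrap MDTYPE="DC">
      <xmlData>
        <!-- Dublin Core record for the digitised object -->
      </xmlData>
    </mdWrap>
  </dmdSec>
  <!-- structural map: relation of the object's parts -->
  <structMap TYPE="physical">
    <div DMDID="dmd1"/>
  </structMap>
</mets>
```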
<br />
==Metadata Management Software==<br />
*Metacat: metadata catalogue and repository for science data (ecology, environmental research), XML syntax, open source (http://knb.ecoinformatics.org/software/metacat/01-intro.html)<br />
*Morpho: software for metadata input; stores EML-conformant files; information about people, locations, research methods, data attributes (http://knb.ecoinformatics.org/morphoportal.jsp)<br />
*MERMAid (Metadata Enterprise Resource Management Aid): tool for the development, validation, management, and publication of metadata (https://www.dataone.org/node/204)<br />
*MATT (Metadata Authoring Tool): runs in a web browser, gives instructions for composing metadata, data converted to XML (https://www.dataone.org/node/188)<br />
*CatMDEdit: metadata editor tool focused on the description of geographic information resources; conforms to DC and ISO 19115 (http://catmdedit.sourceforge.net/)<br />
*Archivematica: digital preservation system, free, open source; data processing from ingest to access according to the ISO OAIS model (http://archivematica.org/wiki/index.php?title=Main_Page)<br />
<br />
(from: Konzeptstudie Forschungsdaten Chemie)<br />
<br />
==Interfaces==<br />
*external interfaces: DOI registry with metadata mapping (https://mds.datacite.org/static/apidoc)<br />
*interface provision: OAI-PMH, metadata mapping for OAI-PMH export<br />
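An OAI-PMH interface exposes the mapped metadata through simple HTTP requests, e.g. a GetRecord request of the form http://example.org/oai?verb=GetRecord&identifier=oai:example.org:rec1&metadataPrefix=oai_dc (endpoint and identifier invented). The response wraps the Dublin Core record in the OAI-PMH envelope, roughly as follows (responseDate and request elements omitted):

```xml
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>oai:example.org:rec1</identifier>
        <datestamp>2014-11-19</datestamp>
      </header>
      <metadata>
        <!-- the harvested record, mapped to unqualified Dublin Core -->
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example biodiversity data set</dc:title>
          <dc:creator>Example, A.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
```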
<br />
==Creation of Metadata==<br />
Metadata can be produced automatically, manually/intellectually, or by a computer-aided combination of both.<br />
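Technical metadata in particular (file name, format, size, hash value; see internal metadata above) lend themselves to automatic production, since every field can be derived from the file itself. A hypothetical fragment with invented element names:

```xml
<technicalMetadata>
  <fileName>survey2003.xml</fileName>
  <fileFormat>text/xml</fileFormat>
  <fileSize unit="byte">48213</fileSize>
  <!-- the digest value would be computed automatically, e.g. with SHA-256 -->
  <checksum algorithm="SHA-256"><!-- hex digest --></checksum>
</technicalMetadata>
```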
<br />
==Mapping Metadata Standards==<br />
<br />
ISO 19115<br />
*<span style="color:red">red </span>= core element<br />
<br />
DIF<br />
*<span style= "color:red">red </span>= required<br />
*<span style= "color:blue">blue </span>= highly recommended<br />
*<span style="color:green">green </span>= recommended<br />
<br />
Dublin Core<br />
*<span style="color:red">red </span>= DC elements<br />
<br />
{| class="wikitable sortable"<br />
| '''Category'''|| '''ISO 19115 Metadata Standard'''|| '''DIF'''|| '''Dublin Core Metadata ''' || '''ABCD Metadata''' || '''Notes''' <br />
|-<br />
| Identifier|| <span style="color:red">fileIdentifier </span>(unique identifier for this metadata file)|| <span style="color:red">Entry_ID </span>is the unique document identifier of the metadata record (may be the same as Data_Set_ID)|| (Resource) <span style="color:red">Identifier</span> || ''none'' || <br />
|-<br />
| || <span style="color:red">language</span>(languages used for documenting metadata, languageCode ISO 639)|| not in DIF|| ''none'' || ''none'' || <br />
|-<br />
| || <span style="color:red">characterSet </span>(full name of the character coding standard, ISO 10646-2)|| not in DIF|| ''none'' || ''none'' || <br />
|-<br />
| Technische Daten || presentationForm (Mode in which the data is represented)|| <span style="color:blue">Data_Set_Citation:</span>Data_Presentation_Form (The mode in which the data are represented, e.g. atlas, image, profile, text, etc.) || (Resource)<span style="color:red">Type</span> (nature or genre of the resource, recommended: controlled vocabulary such as the DCMI Type Vocabulary) || ''none'' || <br />
|-<br />
| Personen und Rechte || <span style="color:red">datasetPointofContact</span> (Point of Contact)|| <span style="color:blue">Personnel</span> (defines the point of contact for more information about the data set or the metadata, may be repeated): Role (Investigator, Technical Contact, DIF Author) (may be repeated): First_Name, Middle_Name, Last_Name/Email/Fax/Phone/Contact_Address || <span style="color:red">Contributor </span> || ContentMetadata/RevisionData/Contributor (source for Dublin Core standard element Contributor) || <br />
|-<br />
| Beschreibung des Inhalts || <span style="color:red">geographicDescription</span> (documented in ISO 19112 - Location), SI_LocationInstance, geographicIdentifier, spatialResolution (optional)|| <span style="color:blue">Location: Location_Category</span> (keyword: continent, ocean, geographic region, solid earth, space, vertical location), Location_Type (keyword), Location_Subregion_1, _2, _3 (keywords), Detailed_Location (text) || <span style="color:red">Coverage </span> (The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant.)|| ContentMetadata/ Description/Representation/'''Coverage''' (source for the Dublin Core standard element Coverage) || ''coverage: free text form describing geographic, taxonomic and other aspects of terminology and descriptions''<br />
|-<br />
| Beschreibung des Inhalts || <span style="color:red">geographicBox</span> (geographic areal domain of the dataset)|| <span style="color:blue">Spatial_Coverage</span> || || || <br />
|-<br />
| Beschreibung des Inhalts || <span style="color:red">EX_GeographicBoundingBox</span> (Geographic area of the entire dataset)|| || || || <br />
|-<br />
| Beschreibung des Inhalts || <span style="color:red">westBoundLongitude</span> (western-most coordinate of the limit), <span style="color:red">eastBoundLongitude, southBoundLatitude, northBoundLatitude </span> (referenced to WGS 84) || <span style="color:blue">Temporal_Coverage</span> (start and stop dates during which the data was collected, may be repeated): Start_Date (may not be repeated within Temporal_Coverage), Stop_Date (not valid without start date) || <span style="color:red">Coverage </span> (The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant.)|| ContentMetadata/ Description/Representation/'''Coverage''' (source for the Dublin Core standard element Coverage) || <br />
|-<br />
| Beschreibung des Inhalts || <span style="color:orange">not in ISO</span> || <span style="color:blue">Paleo_Temporal_Coverage </span> (length of time represented by the data collected; data spans time frames earlier than yyyy-mm-dd = 0001-01-01): Paleo_Start_Date, Paleo_Stop_Date, Chronostratigraphic_Unit (eon, era, period, epoch, stage) || <span style="color:red">Coverage </span> (The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant.)|| ContentMetadata/ Description/Representation/'''Coverage''' (source for the Dublin Core standard element Coverage) || <br />
|-<br />
| Personen und Rechte || originator (party who created the resource), (CI_RoleCode) || <span style="color:blue">Data_Set_Citation </span> (allows the author to properly cite the data set producer): Dataset_Creator (The name of the organization(s) or individual(s) with primary intellectual responsibility for the data set's development.)|| <span style="color:red">Creator </span> (An entity primarily responsible for making the resource.)|| ContentMetadata/ Description/Representation/'''Creator''' (source for the Dublin Core standard element Creator) || <br />
|-<br />
| Personen und Rechte || principalInvestigator (key party responsible for gathering information and conducting research), (CI_RoleCode) || <span style="color:blue">Personnel: Role: Investigator </span> (The person who headed the investigation or experiment that resulted in the acquisition of the data described (i.e., Principal Investigator, Experiment Team Leader)): Last_Name, First_Name, Middle_Name|| <span style="color:red">Creator </span> (An entity primarily responsible for making the resource.)|| ContentMetadata/ Description/Representation/'''Creator''' (source for the Dublin Core standard element Creator) || <br />
|-<br />
| Lebenszyklus || editionDate (date of the edition) || <span style="color:blue">Data_Set_Citation: </span> Dataset_Release_Date|| <span style="color:red">Date </span> (DateIssued)|| ContentMetadata/Version/'''DateIssued''' (source for Dublin Core standard element DateIssued) || <br />
|-<br />
| Beschreibung des Inhalts || <span style="color:red">abstract</span> (brief narrative summary of the content), purpose (summary of the intentions with which the dataset was developed) || <span style="color:red">Summary </span> (brief description of the data set along with the purpose of the data): Abstract (brief description of the data set), Purpose (purpose of the data set) || <span style="color:red">Description </span> (Description may include but is not limited to: an abstract, a table of contents, a graphical representation, or a free-text account of the resource)|| ContentMetadata/Description/Representation/Details (source for Dublin Core standard element Description) || <br />
|-<br />
| Technische Daten || MD_DigitalTransferOptions (means and media by which the dataset is obtained): <span style="color:red">name</span> (name of the media on which the dataset can be received), transferSize (estimated size of the transferred dataset), <span style="color:red">distributionFormat </span> (provides information about the format in which the dataset may be obtained), fees (fees and terms for retrieving the dataset) || <span style="color:blue">Distribution</span> (media options, size, data format, and fees involved in distributing the data set): Distribution_Media, Distribution_Size, Distribution_Format, Fees || <span style="color:red">Format </span> (The file format, physical medium, or dimensions of the resource.)|| || <br />
|-<br />
| Technische Daten || <span style="color:red">language</span> (languages used within the dataset, languageCode ISO 639) || <span style="color:blue">Data_Set_Language</span> (language used in the preparation, storage, and description of the data) || <span style="color:red">Language </span> (A language of the resource. Recommended best practice is to use a controlled vocabulary such as RFC 4646.)|| ContentMetadata/Description/Representation/@language || <br />
|-<br />
| Personen und Rechte ||MD_Distribution (distributor and options): distributorContact, onLine (Information about online sources), distributorContact (Party from whom the dataset may be obtained) || <span style="color:red">Data_Center</span> (data center, organization, or institution responsible for distributing the data) Data_Center_Name, Data_Center_URL, Data Center Contact Last_Name, - First_Name, - Middle_Name || || || <br />
|-<br />
| ||CI_Citation: title (name by which the cited resource is known), alternateTitle (short name or other language name by which the cited information is known. Example: “DCW” as an alternative title for “Digital Chart of the World”) || <span style="color:blue">Data_Set_Citation</span> (allows the author to properly cite the data set producer. Two functions: a) to indicate how this data set should be cited in professional scientific literature, and b) if this data set is a compilation of other data sets, to document and credit the data sets that were used in producing this compilation; citation for the data set itself, not articles related to the research results; can be repeated, subfields cannot be repeated): Dataset_Title || || || <br />
|-<br />
|Personen und Rechte ||publisher (party who published the resource)|| <span style="color:blue">Data_Set_Citation:</span> Dataset_Publisher (The name of the individual or organization that made the data set available for release), ||<span style="color:red">Publisher</span> (An entity responsible for making the resource available) || || <br />
|-<br />
|Vernetzung ||? sourceCitation (recommended reference to be used for the source data)|| <span style="color:green">Reference</span> (describes key bibliographic citations pertaining to the data set): Author, Publication_Date, Title, Series, Edition, Volume, Issue, Report_Number, Publication_Place, Publisher, Pages, ISBN, DOI, Online_Resource, Other_Reference_Details ||<span style="color:red">Relation </span> (A related resource) || Reference (published reference): TitleCitation, CitationDetail, URI (in ABCD per Unit/scientific name)|| <br />
|-<br />
|Vernetzung ||linkage (information about on-line sources from which the dataset, specification, or community profile name and extended metadata elements can be obtained)|| <span style="color:blue">Data_Set_Citation: </span> Online_Resource||<span style="color:red">Relation </span> (A related resource) || || <br />
|-<br />
|Vernetzung ||funcCode ( Function performed by the resource, Cl_Online Function <<CodeList>>, download, information, offlineAccess, order, search,)|| <span style="color:blue">Related_URL </span> (specifies links to Internet sites that contain information related to the data, as well as related Internet sites such as project home pages, related data archives/servers, metadata extensions, online software packages, web mapping services, and calibration/validation data): <span style="color:blue">URL_Content_Type (type</span> , subtype), <span style="color:blue">URL,</span> Description||<span style="color:red">Relation </span> (A related resource) || || <br />
|-<br />
|Identifier||?? funcCode ( Function performed by the resource, Cl_Online Function <<CodeList>>)|| <span style="color:blue">Related_URL </span> (specifies links to Internet sites that contain information related to the data, as well as related Internet sites such as project home pages, related data archives/servers, metadata extensions, online software packages, web mapping services, and calibration/validation data): <span style="color:blue">URL_Content_Type (type</span> , subtype), <span style="color:blue">URL,</span> Description||<span style="color:red">?? Resource Identifier </span> (An unambiguous reference to the resource within a given context.) || || <br />
|-<br />
| || <span style="color:orange">to be located in ISO</span>|| Data_Set_Citation: Online_Resource||<span style="color:red">Resource Identifier </span> (An unambiguous reference to the resource within a given context.) || || <br />
|-<br />
|Personen und Rechte|| useConstraints (constraints applied to assure the protection of privacy or intellectual property, and any special restrictions or limitations or warnings on using the resource or metadata) (MD_legalConstraints)|| <span style="color:blue">Use_Constraints</span>(information about any constraints for accessing the data set) ||<span style="color:red">Rights Management </span>(Information about rights held in and over the resource) || ContentMetadata/<b>IPR</b> statements: IPRDeclarations, Copyrights, Licences, TermsofUseStatements, Disclaimers, Acknowledgements, Citations,|| <br />
|-<br />
|Personen und Rechte|| AccessConstraints (access constraints applied to assure the protection of privacy or intellectual property, and any special restrictions or limitations on obtaining the resource or metadata) (MD_legalConstraints)|| <span style="color:blue">Access_Constraints </span>(describe how the data may or may not be used after access is granted to assure the protection of privacy or intellectual property) ||<span style="color:red">Rights Management </span>(Information about rights held in and over the resource) || || <br />
|-<br />
|Personen und Rechte|| otherConstraints (other restrictions and legal prerequisites for accessing and using the resource or metadata ) (MD_legalConstraints)|| <span style="color:orange">not in DIF </span> || || || <br />
|-<br />
|Vernetzung|| CI_OnlineResource (information about on-line sources from which the dataset, specification, or community profile name and extended metadata elements can be obtained; inherited from the parent object)|| <span style="color:blue">Related_URL </span> (specifies links to Internet sites that contain information related to the data, as well as related Internet sites such as project home pages, related data archives/servers, metadata extensions, online software packages, web mapping services, and calibration/validation data): <span style="color:blue">URL_Content_Type (type</span>, subtype), <span style="color:blue">URL,</span> Description || <span style="color:red">Source</span> (A related resource from which the described resource is derived)|| ContentMetadata/Description/Representation/<b>URI</b> (URI pointing to an online source, related to the current project, which may or may not serve an updated version of the description data) || <br />
|-<br />
| || OS_Platform (vehicle or other support base holding the sensor)|| <span style="color:blue">Platform</span> (or Source_Name - platform used to acquire the data, 11 categories of platforms): Source_Name (repeatable), Short_Name, Long_Name (from controlled platform keywords when using the GCMD metadata authoring tools) ||<span style="color:red">Source</span> (A related resource from which the described resource is derived) || || <br />
|-<br />
|-<br />
| Content description || keyword (common-use word(s) or phrase(s) used)|| <span style="color:green">Keyword </span>(ancillary keyword; provide any words or phrases needed to further describe the data set) ||Subject and Keywords|| || <br />
|-<br />
|-<br />
| Content description || <span style="color:red">category</span> (keywords describing the dataset)|| <span style="color:red">Parameters:</span> Category (default: EARTH SCIENCE), Topic, Variable: Level 1-3, Detailed_Variable ||<span style="color:red">Subject</span> and Keywords (topic of the resource, represented using keywords, key phrases, or classification codes; recommended controlled vocabulary)|| || <br />
|-<br />
|-<br />
| Content description || DS_Sensor (device or piece of equipment which detects and records information)|| <span style="color:blue">Instrument</span> (name of the instrument used to acquire the data, may be repeated: Earth Remote Sensing Instruments, In Situ/Laboratory Instruments, Solar/Space Observing Instruments): Sensor_Name – short_name, long_name ||<span style="color:red">Subject</span> and Keywords (topic of the resource, represented using keywords, key phrases, or classification codes; recommended controlled vocabulary)|| || <br />
|-<br />
|-<br />
|-<br />
| Content description || <span style="color:orange">not in ISO</span> || <span style="color:blue">Project </span> (name of the scientific program, field campaign, or project from which the data were collected): short name, long name ||<span style="color:red">Subject</span> and Keywords (topic of the resource, represented using keywords, key phrases, or classification codes; recommended controlled vocabulary)|| || <br />
|-<br />
|-<br />
| Content description || <span style="color:orange">not repeated in ISO</span> || <span style="color:blue">Entry_Title</span> (should be descriptive enough so that when a user is presented with a list of titles the general content of the data set can be determined) ||<span style="color:red">Title</span> (name given to the resource)||ContentMetadata/Description/Representation/<b>Title</b> (source for the Dublin Core standard element Title) || <br />
|-<br />
|-<br />
| Persons and rights || ? citedResponsibleParty (name and position information for an individual or organization that is responsible for the resource) || <span style="color:green">Originating Center</span> (data center or data producer who originally generated the dataset) || || ContentMetadata/<b>Owners</b>: Organisation, Person, Roles, Addresses, TelephoneNumbers, EmailAddresses, URIs, LogoURI (entities having legal possession of the data collection content; here defined for the entire data collection, not for individual units. If an owner statement is present on the unit level, it should override this dataset-level statement.) || <br />
|-<br />
| || || ||Date (Date.Created) (date of creation of the resource) || ContentMetadata/RevisionData/<b>DateCreated</b> (source for Dublin Core standard element DateCreated) || <br />
|-<br />
|-<br />
| || || ||Date (Date.Modified) || ContentMetadata/RevisionData/<b>DateModified</b> (source for Dublin Core standard element DateModified) || <br />
|-<br />
|-<br />
| || || || || ContentMetadata/<b>Scope</b>: GeoecologicalTerms, TaxonomicTerms, IconURI || <br />
|-<br />
|-<br />
| || || || || ContentMetadata/<b>Version</b>(number and date of current version): Major, Minor, Modifier || <br />
|-<br />
|-<br />
| || || || || Datasets/Dataset/ContentContacts/ || <br />
|-<br />
|-<br />
| || || || || Datasets/Dataset/DatasetGUID || <br />
|-<br />
|-<br />
| || || || || Datasets/Dataset/OtherProviders || <br />
|-<br />
|-<br />
| || || || || Datasets/Dataset/TechnicalContact/ || <br />
|-<br />
|-<br />
| || || || || Datasets/Dataset/Units/Unit/SourceID || <br />
|-<br />
|-<br />
| || || || || Datasets/Dataset/Units/Unit/SourceInstitutionID || <br />
|-<br />
|-<br />
| Identifier || || <span style="color:red"> Entry_ID </span> (unique document identifier of the metadata record) = Parent DIF || || || <br />
|-<br />
|-<br />
| || MD_topicCategoryCode (high-level geographic data thematic classification to assist in the grouping and search of available geographic data sets. Can be used to group keywords as well. Listed examples are not exhaustive. NOTE It is understood there are overlaps between general categories and the user is encouraged to select the one most appropriate.) || <span style="color:red"> ISO Topic category</span> (identify the keywords in the ISO 19115 - Geographic Information Metadata ) (Farming, Biota, Boundaries, Climatology/Meteorology/Atmosphere, Economy, Elevation, Environment, Geoscientific Information, Health, Imagery/Base Maps/Earth Cover, Intelligence/Military, Inland Waters, Location, Oceans, Planning Cadastre, Society, Structure, Transportation, Utilities/Communications) || || || <br />
|-<br />
|-<br />
| || <span style="color:red">metadataStandardName</span> (name of the metadata standard (including profile name) used) ||<span style="color:red">Metadata_Name</span> (current DIF standard name)|| || || <br />
|-<br />
|-<br />
| || <span style="color:red">metadataStandardVersion</span> (version of the metadata standard (version of the profile) used) || <span style="color:red">Metadata_Version</span> (current DIF Metadata standard) || || || <br />
|-<br />
|-<br />
| || || <span style="color:blue">Data_Set_Progress</span> (production status of the data set regarding its completeness): planned, in work, complete || || || <br />
|-<br />
|-<br />
| || || <span style="color:blue">Data_resolution</span> (resolution of the data, which is the difference between two adjacent geographic, vertical, or temporal values) || || || <br />
|-<br />
|-<br />
| || || <span style="color:blue">Quality</span> (information about the quality of the data or any quality assurance procedures followed in producing the data) || || || <br />
|-<br />
|-<br />
| || || <span style="color:blue">DIF revision history </span> (list of changes made to the DIF over time) || || || <br />
|-<br />
|-<br />
| || || <span style="color:green">Multimedia_Sample</span> (provide information that will enable the display of a sample image, movie or sound clip within the DIF): File, URL, Format, Caption, Description, || || || <br />
|-<br />
|-<br />
| || ||<span style="color:green"> Parent_DIF </span> (allows the capability to relate generalized aggregated metadata records (parents) to metadata records with highly specific information (children)) || || || <br />
|-<br />
|-<br />
| || || <span style="color:green">IDN_Node</span> (the International Directory Network (IDN) Node field is used internally to identify association, responsibility and/or ownership of the dataset, service or supplemental information; not displayed to the user) || || || <br />
|-<br />
|-<br />
| || <span style="color:red">dateStamp </span>(date that the metadata was created) || <span style="color:green"> DIF_Creation_Date </span> (date the metadata record was created) || || || <br />
|-<br />
|-<br />
| || || <span style="color:green">Last_DIF_Revision_Date </span>(date the metadata record was last revised) || || || <br />
|-<br />
|-<br />
| || || <span style="color:green">Future_DIF_Revision_Date </span>(allows for the specification of a future date at which the DIF should be reviewed for accuracy of scientific or technical content) || || || <br />
|-<br />
|-<br />
| || || <span style="color:green"> Private </span> (restricts the data set description from being publicly available): True or False (the default, False, makes the description publicly available) || || || <br />
|-<br />
|-<br />
| || <span style="color:red">locale </span> (provides information about an alternatively used localised character string for a linguistic extension) (locale: the combination of language, country and character set in which the dataset is provided) || || || || <br />
|-<br />
|-<br />
| || <span style="color:red">Role name: spatialRepresentationInfo </span> (digital representation of spatial information in the dataset) || || || || <br />
|-<br />
|-<br />
| || <span style="color:red">Role name: referenceSystemInfo </span> (description of the spatial and temporal reference systems used in the dataset) || || || || <br />
|-<br />
|-<br />
| || <span style="color:red"> Role name: metadataExtensionInfo </span> (basic information about the resources to which the metadata applies) || || || || <br />
|-<br />
|-<br />
| || <span style="color:red"> Role name: contentInfo </span> (provides information about the feature catalogue and describes the coverage and image data characteristics) || || || || <br />
|-<br />
|-<br />
| || <span style="color:red">Role name: distributionInfo </span> (provides information about the distributor of and options for obtaining the resource(s)) || || || || <br />
|-<br />
|-<br />
| || <span style="color:red">dataQualityInfo </span> (provides overall assessment of quality of a resource(s)) || || || || <br />
|-<br />
|-<br />
| || <span style="color:red">Role name: portrayalCatalogueInfo </span> (provides information about the catalogue of rules defined for the portrayal of a resource(s)) || || || || <br />
|-<br />
|-<br />
| || <span style="color:red"> Role name: metadataConstraints </span> (provides restrictions on the access and use of metadata) || || || || <br />
|-<br />
|-<br />
| || <span style="color:red">Role name: applicationSchemaInfo </span> (provides information about the conceptual schema of a dataset) || || || || <br />
|-<br />
|-<br />
| || <span style="color:red"> Role name: metadataMaintenance </span> (provides information about the frequency of metadata updates, and the scope of those updates) || || || || <br />
|-<br />
|-<br />
|}<br />
Sources:<br />
http://gcmd.gsfc.nasa.gov/Aboutus/standards/difiso.html<br><br />
http://gcmd.gsfc.nasa.gov/Aboutus/standards/dublin_to_dif.html<br><br />
http://rs.tdwg.org/dwc/terms/history/dwctoabcd/index.htm</div>LornaMorrishttps://wiki.bgbm.org/rebind_documentation/index.php?title=Correction_Modules&diff=642Correction Modules2014-11-19T01:18:44Z<p>LornaMorris: 5 revisions</p>
<hr />
<div>= Modules for the XML Correction Manager =<br />
When the user starts the correction of an XML file, a specific correction configuration has to be selected. The web interface of the Correction Manager offers a list of all correction config files in the correction config directory. A correction config file is an XML file which is valid against a specific schema. It specifies which correction modules will be called, in which order and with which parameters. The structure of such a file is relatively simple: <br />
<syntaxhighlight><br />
<modules xmlns="http://rebind.bgbm.org/modules" name="configuration-name"><br />
<module name="module-name" description="module-description"><br />
<setting name="module-setting-name" value="module-setting-value"/><br />
</module><br />
<setting name="general-setting-name" value="general-setting-value"/><br />
</modules><br />
</syntaxhighlight><br />
<br />
The root element is <code>modules</code>. The <code>name</code> attribute is optional. It is the name under which the configuration will be displayed in the web interface of the Correction Manager. <br />
<br />
There can be several <code>module</code> elements within the element <code>modules</code>. <br />
<br />
Each <code>module</code> element must have a <code>name</code> attribute which specifies the name of the module to be loaded. This should either be the complete name of a Java class which implements the <code>Module</code> interface or the name specified within the method <code>getName()</code> of that class. <br />
<br />
The <code>description</code> is optional and is used to distinguish different instances of the same module which are run with different settings. <br />
<br />
Each <code>module</code> element can have any number of <code>setting</code> elements. Each <code>setting</code> element has the mandatory attributes <code>name</code> and <code>value</code>. Which settings are used by each module is specified in the module descriptions below.<br />
<br />
The <code>modules</code> element can also have <code>setting</code> elements. These general settings are also accessible to the modules and are overridden if a setting with the same name is specified in the <code>module</code> element. <br />
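For illustration, here is a hypothetical configuration (the module names and the <code>encoding</code> setting are invented for this sketch) in which the general setting is overridden for the second module only: <br />
<syntaxhighlight><br />
<modules xmlns="http://rebind.bgbm.org/modules" name="override-example"><br />
<module name="module-one" description="sees the general value 'UTF-8'"/><br />
<module name="module-two" description="sees its own value 'ISO-8859-1'"><br />
<setting name="encoding" value="ISO-8859-1"/><br />
</module><br />
<setting name="encoding" value="UTF-8"/><br />
</modules><br />
</syntaxhighlight><br />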
__TOC__<br />
== ElementTextReplacer ==<br />
''What it does'': Replaces the text content of specific elements according to specific rules. <br />
<br />
''Full Name'': <code>org.bgbm.rebind.correction.modules.ElementTextReplacer</code><br />
<br />
''Settings'': <br />
: '''address'''<br />
:: The name (including namespace prefix) of the element whose text should be replaced, or an XPath expression pointing to the element. Whether the value is interpreted as a name or as an XPath depends on the setting <code>isXPath</code>. <br />
:: ''Mandatory'': yes<br />
:: ''Example Values'': <code>abcd:Sex</code> or <code>//abcd:RecordBasis[matches(.,'^Specimen$')]</code><br />
<br />
: '''isXPath'''<br />
:: A flag indicating if the <code>address</code> setting contains an XPath expression or just the name of an element. <br />
:: ''Mandatory'': no<br />
:: ''Default Value'': <code>false</code><br />
:: ''Allowed Values'': <code>true</code> or <code>false</code><br />
<br />
: '''key'''<br />
:: The part of the content that should be replaced. This can be either plain text or a RegEx, depending on the setting <code>isRegEx</code>. Regardless of whether it is plain text or a RegEx, it can contain either a single key or several keys to be replaced, depending on the setting <code>isBatch</code>. If batch mode is used, the character or string with which the different parts are separated can be specified in the setting <code>splitter</code>. <br />
:: ''Mandatory'': yes<br />
:: ''Example Values'': <code>Hello World</code> (plain text), <code>(H[ea]llo World)(\!?)</code> (RegEx), <code>Hello World;Lorem Ipsum</code> (plain text, Batch mode with ';' as splitter), <code>H[ea]llo World\!?;[Ll]orem [iI]psum</code> (RegEx, Batch mode with ';' as splitter),<br />
<br />
: '''value'''<br />
:: The new content with which the content specified in <code>key</code> will be replaced. This can be either plain text or a RegEx, depending on the setting <code>isRegEx</code>. Regardless of whether it is plain text or a RegEx, it can contain either a single fragment or several, depending on the setting <code>isBatch</code>. If batch mode is used, the character or string with which the different parts are separated can be specified in the setting <code>splitter</code>, and each key fragment will be replaced by the corresponding value fragment (e.g. the third key fragment will be replaced by the third value fragment). The number of fragments must therefore be the same for key and value; otherwise the replacement stops once the shorter of the two runs out of fragments. <br />
:: ''Mandatory'': yes<br />
:: ''Example Values'': <code>Hello World again</code> (plain text), <code>$1 again $2</code> (RegEx), <code>Hello World again;Lorem ipsum dolor sit amet</code> (plain text, Batch mode with ';' as splitter), <code>$1 again $2;$& dolor sit amet</code> (RegEx, Batch mode with ';' as splitter),<br />
<br />
: '''isRegEx'''<br />
:: A flag indicating if the <code>key</code> and the <code>value</code> elements are regular expressions or just plain text. <br />
:: ''Mandatory'': no<br />
:: ''Default Value'': <code>false</code><br />
:: ''Allowed Values'': <code>true</code> or <code>false</code><br />
<br />
: '''isBatch'''<br />
:: A flag indicating if the <code>key</code> and the <code>value</code> elements contain just one fragment which is supposed to be replaced, or several. If it is true, the character or string with which the different fragments of <code>key</code> and the <code>value</code> elements are separated can be specified in the attribute <code>splitter</code>.<br />
:: ''Mandatory'': no<br />
:: ''Default Value'': <code>false</code><br />
:: ''Allowed Values'': <code>true</code> or <code>false</code><br />
<br />
: '''splitter'''<br />
:: The character or string with which the <code>key</code> and the <code>value</code> elements are broken into their fragments, if they are in batch mode. The splitting is done using the function [http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#split%28java.lang.String%29 String.split(String)], which interprets the parameter string as a regular expression. This can cause errors when the splitter contains characters with syntactic meaning in RegEx: for example, <code><setting name="splitter" value="."/></code> would cause every character to be matched and therefore return only empty fragments. <br />
:: ''Mandatory'': no<br />
:: ''Default Value'': <code>;</code><br />
:: ''Example Values'': <code>,</code> or <code>\.</code><br />
<br />
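To illustrate this pitfall, here is a small sketch in Python, whose regex-based <code>re.split</code> treats the delimiter the same way as Java's <code>String.split</code>: <br />

```python
import re

# An unescaped dot is a regex wildcard: every character becomes a split
# point, so only empty fragments remain.
print(re.split(r'.', 'f.;var.'))   # eight empty strings

# Escaping the dot (splitter value "\.") splits on literal dots only.
print(re.split(r'\.', 'f.;var.'))  # ['f', ';var', '']
```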
''Examples'':<br />
<syntaxhighlight><br />
<module name="org.bgbm.rebind.correction.modules.ElementTextReplacer" description="replaces 'Specimen'"><br />
<setting name="address" value="//abcd:RecordBasis[matches(.,'^Specimen$')]"/><br />
<setting name="isXPath" value="true"/><br />
<setting name="key" value="^(Specimen)$"/><br />
<setting name="value" value="Preserved$1"/><br />
<setting name="isRegEx" value="true"/><br />
<setting name="isBatch" value="false"/><br />
<setting name="splitter" value=";"/><br />
</module><br />
<module name="org.bgbm.rebind.correction.modules.ElementTextReplacer" description="corrects abcd:Sex"><br />
<setting name="address" value="abcd:Sex"/><br />
<setting name="isXPath" value="false"/><br />
<setting name="key" value="female;male;hermaphrodite"/><br />
<setting name="value" value="F;M;X"/><br />
<setting name="isRegEx" value="false"/><br />
<setting name="isBatch" value="true"/><br />
<setting name="splitter" value=";"/><br />
</module><br />
<module name="org.bgbm.rebind.correction.modules.ElementTextReplacer" description="corrects abcd:Rank"><br />
<setting name="address" value="abcd:Rank"/><br />
<setting name="isXPath" value="false"/><br />
<setting name="key" value="[f.];[subvar.];[var.]"/><br />
<setting name="value" value="f.;subvar.;var."/><br />
<setting name="isRegEx" value="false"/><br />
<setting name="isBatch" value="true"/><br />
<setting name="splitter" value=";"/><br />
</module><br />
<module name="org.bgbm.rebind.correction.modules.ElementTextReplacer" description="corrects abcd:Rank"><br />
<setting name="address" value="//abcd:Rank[matches(.,'^(f|var)$')]"/><br />
<setting name="isXPath" value="true"/><br />
<setting name="key" value="^f$;^var$"/><br />
<setting name="value" value="f.;var."/><br />
<setting name="isRegEx" value="true"/><br />
<setting name="isBatch" value="true"/><br />
<setting name="splitter" value=";"/><br />
</module><br />
</syntaxhighlight><br />
<br />
<br />
== EmptyElementDeleter ==<br />
''What it does:'' Removes empty elements which have neither text content (except white spaces) nor child elements nor attributes. Currently there is a hardcoded exception regarding the attributes. If the only attribute is <code>abcd:language</code> then the element will be deleted as well. Such exceptions will be adjustable via the settings in the future. <br />
<br />
''Full Name:'' <code>org.bgbm.rebind.correction.modules.EmptyElementDeleter</code><br />
<br />
''Settings:'' none<br />
<br />
''Example:''<br />
<syntaxhighlight><br />
<module name="org.bgbm.rebind.correction.modules.EmptyElementDeleter" description="first iteration"/><br />
</syntaxhighlight><br />
<br />
<br />
== ElementDeleter ==<br />
''What it does:'' Deletes specific elements, including all of their content and child elements. <br />
<br />
''Full Name:'' <code>org.bgbm.rebind.correction.modules.ElementDeleter</code><br />
<br />
''Settings:'' <br />
: '''xpath'''<br />
:: The XPath address of the element(s) to be removed. Also works for attributes. <br />
:: ''Mandatory:'' yes<br />
:: ''Example Values:'' <code>//abcd:LogoURI</code> or <code>//abcd:TelephoneNumber[abcd:Device="Fax"]</code><br />
<br />
''Examples:''<br />
<syntaxhighlight><br />
<module name="org.bgbm.rebind.correction.modules.ElementDeleter" description="delete abcd:language attributes"><br />
<setting name="xpath" value="//*/@abcd:language"/><br />
</module><br />
</syntaxhighlight><br />
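A further sketch, built from the example value above (the description text and use case are illustrative), which removes fax numbers: <br />
<syntaxhighlight><br />
<module name="org.bgbm.rebind.correction.modules.ElementDeleter" description="delete fax numbers"><br />
<setting name="xpath" value="//abcd:TelephoneNumber[abcd:Device='Fax']"/><br />
</module><br />
</syntaxhighlight><br />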
<br />
<br />
== ElementRenamer ==<br />
''What it does:'' Renames specific elements or attributes, optionally moving them into a different namespace. <br />
<br />
''Full Name:'' <code>org.bgbm.rebind.correction.modules.ElementRenamer</code><br />
<br />
''Settings:'' <br />
: '''xpath'''<br />
:: The XPath address of the element(s) to be renamed. Also works for attributes. <br />
:: ''Mandatory:'' yes<br />
:: ''Example Values:'' <code>//abcd:LogoURI</code> or <code>//abcd:TelephoneNumber[abcd:Device="Fax"]</code><br />
<br />
: '''newName'''<br />
:: The new name of the element, without namespace prefix.<br />
:: ''Mandatory:'' yes<br />
:: ''Example Values:'' <code>newElementName</code><br />
<br />
: '''useOldNamespace'''<br />
:: A flag indicating if the old namespace (and namespace prefix) of the element should be used after renaming as well. <br />
:: ''Mandatory:'' no<br />
:: ''Default Value:'' <code>true</code><br />
:: ''Allowed Values:'' <code>true</code> or <code>false</code><br />
<br />
: '''newNamespace'''<br />
:: The namespace url of the new namespace, if <code>useOldNamespace</code> is set to <code>false</code>. <br />
:: ''Mandatory:'' no<br />
:: ''Default Value:'' ''(empty string)''<br />
:: ''Example Values:'' <code><nowiki>http://example.com/ns/xyz</nowiki></code><br />
<br />
: '''newNamespacePrefix'''<br />
:: The namespace prefix of the new namespace, if <code>useOldNamespace</code> is set to <code>false</code>. If the colon at the end is missing, it will be added automatically.<br />
:: ''Mandatory:'' no<br />
:: ''Default Value:'' ''(empty string)''<br />
:: ''Example Values:'' <code>xyz:</code><br />
<br />
''Examples:''<br />
<syntaxhighlight><br />
<module name="org.bgbm.rebind.correction.modules.ElementRenamer" description="rename abcd:language attributes"><br />
<setting name="xpath" value="//*/@abcd:language"/><br />
<setting name="newName" value="language"/><br />
<setting name="useOldNamespace" value="false"/><br />
<setting name="newNamespace" value=""/><br />
<setting name="newNamespacePrefix" value=""/><br />
</module><br />
<module name="org.bgbm.rebind.correction.modules.ElementRenamer" description="rename wrong ISO Dates"><br />
<setting name="xpath" value="//abcd:ISODateTimeBegin[not(matches(.,'^(\d\d\d\d(\-(0[1-9]|1[012])(\-((0[1-9])|1\d|2\d|3[01])(T(0\d|1\d|2[0-3])(:[0-5]\d){0,2})?)?)?|\-\-(0[1-9]|1[012])(\-(0[1-9]|1\d|2\d|3[01]))?|\-\-\-(0[1-9]|1\d|2\d|3[01]))$'))]"/><br />
<setting name="newName" value="DateText"/><br />
<setting name="useOldNamespace" value="true"/><br />
</module><br />
</syntaxhighlight><br />
<br />
<br />
== DummyModule ==<br />
''What it does:'' Waits 1-11 seconds before returning a quote from either the homicidal computer HAL 9000 from the movie "2001: A Space Odyssey" or the maniacally depressed robot Marvin from the book/movie "The Hitchhiker's Guide to the Galaxy". This module does not alter the XML code in any way, it only sends the quote back to the Correction Manager. It is only used for testing purposes. <br />
<br />
''Full Name:'' <code>org.bgbm.rebind.correction.modules.DummyModule</code><br />
<br />
''Settings:'' none<br />
<br />
''Examples:''<br />
<syntaxhighlight><br />
<module name="org.bgbm.rebind.correction.modules.DummyModule" description="just wait a bit for a snappy robot remark" /><br />
<module name="org.bgbm.rebind.correction.modules.DummyModule" description="Play it once again, Marvin, for old times' sake." /><br />
</syntaxhighlight><br />
<br />
<br />
== Work in Progress ==<br />
These modules are currently in development, so some of the following descriptions might not reflect their current state.<br />
<br />
=== ABCDDateCorrector ===<br />
''What it does:'' Checks the dates in the <code>abcd:ISODateTimeBegin</code> elements within <code>abcd:Date</code> and <code>abcd:DateTime</code>. If they are not formatted according to the ISO standard, it tries to parse and fix them, or renames the element to <code>abcd:DateText</code>. Conversely, if there are any <code>abcd:DateText</code> elements which are correctly formatted or can be converted, it will turn them into <code>abcd:ISODateTimeBegin</code> elements. <br />
<br />
''Full Name:'' <code>org.bgbm.rebind.correction.modules.ABCDDateCorrector</code><br />
<br />
''Settings:'' none<br />
<br />
''Examples:''<br />
<syntaxhighlight><br />
<module name="org.bgbm.rebind.correction.modules.ABCDDateCorrector" description="fixing the dates" /><br />
</syntaxhighlight><br />
<br />
<br />
=== SimpleCountryCodeChecker ===<br />
''What it does:'' Compares the content of the <code>abcd:ISO3166Code</code> element with a list of country codes provided by Java and warns if it does not occur there. It also has some hardcoded exceptions for the following commonly used but unspecified codes: <br />
ZZ Unknown<br />
XA Unknown or unspecified Africa<br />
XB Unknown or unspecified Middle and South America<br />
XC Unknown or unspecified Asia<br />
XD Unknown or unspecified Australia and Oceania<br />
XE Unknown or unspecified Europe<br />
XF Unknown or unspecified North America<br />
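The check logic can be sketched as follows (a hypothetical Python analogue; in the real module the list of known codes comes from Java, and all names here are invented): <br />

```python
# Codes accepted as hardcoded exceptions (see the list above).
EXCEPTIONS = {'ZZ', 'XA', 'XB', 'XC', 'XD', 'XE', 'XF'}

def country_code_is_known(code, iso_codes):
    """Return True if the code is a known ISO 3166 code or an accepted exception."""
    return code in iso_codes or code in EXCEPTIONS

# A tiny stand-in for the full ISO 3166 list
# (in the Java module it would come from the runtime).
iso_codes = {'DE', 'NI', 'US'}
print(country_code_is_known('NI', iso_codes))  # True
print(country_code_is_known('XE', iso_codes))  # True
print(country_code_is_known('Q1', iso_codes))  # False
```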
<br />
''Full Name:'' <code>org.bgbm.rebind.correction.modules.SimpleCountryCodeChecker</code><br />
<br />
''Settings:'' none<br />
<br />
''Examples:''<br />
<syntaxhighlight><br />
<module name="org.bgbm.rebind.correction.modules.SimpleCountryCodeChecker" description="checking the country codes" /><br />
</syntaxhighlight></div>LornaMorrishttps://wiki.bgbm.org/rebind_documentation/index.php?title=Retrieving_Geocoordinates_From_Points_In_Images&diff=647Retrieving Geocoordinates From Points In Images2014-11-19T01:18:44Z<p>LornaMorris: 4 revisions</p>
<hr />
<div>= Retrieving Geocoordinates From Points In Images =<br />
This manual shows how to retrieve the geocoordinates of points in an image. <br />
<br />
__TOC__<br />
<br />
<br />
The source image we use shows six locations in the western part of Nicaragua. Though the image is of poor quality, the lines for latitude and longitude are fortunately visible in the picture. <br />
[[File:JOSM Mapping 00 Original Image.png|frame|none|The Source Image with six locations marked with circled dots]]<br />
To start, download the program JOSM from [http://josm.openstreetmap.de/ http://josm.openstreetmap.de/]. JOSM is a Java-based editor for [http://www.openstreetmap.org OpenStreetMap]. <br />
[[File:JOSM Mapping 01 Initial Window.png|frame|none|The Initial Screen of JOSM]]<br />
== Download PicLayer Plugin ==<br />
* Start JOSM<br />
* Press F12 to open the Preferences window<br />
* Go to the Plugins tab<br />
* Click on '''Download List'''<br />
* Search for '''PicLayer'''<br />
* Check the checkbox for the PicLayer plugin<br />
* Click OK and restart JOSM<br />
[[File:JOSM Mapping 02 Preferences Plugins.png|frame|none|Selecting the PicLayer Plugin]]<br />
<br />
== Select the OpenStreetMap Image Layer ==<br />
* In the '''Imagery''' menu, select '''OpenStreetMap (Mapnik)'''<br />
* Navigate to the area of interest<br />
[[File:JOSM Mapping 03 Area Of Interest.png|frame|none|The area of interest is selected]]<br />
<br />
== Draw a reference rectangle ==<br />
''This step can be skipped if the image does not show lines for latitude and longitude (the example image above does show them).'' <br />
* Open a new Data Layer by pressing '''Ctrl + N'''<br />
* Insert a new point by pressing '''Shift + D''', entering its coordinates and pressing OK. In this example we start with 13.0 -87.0. Repeat for the other points: 12.0 -87.0; 12.0 -86.0 and 13.0 -86.0.<br />
<br />
[[File:JOSM Mapping 04 Entering Coordinates.png|frame|none|The window for entering the coordinates.]]<br />
* Switch to the drawing tool by pressing '''A'''<br />
* Connect the four points into a rectangle. <br />
[[File:JOSM Mapping 05 Grid Rectangle.png|frame|none|The rectangle on the map]]<br />
<br />
== Loading the reference image ==<br />
* In the menu, go to '''PicLayer''' and select '''New picture layer from file ...'''<br />
* Select the prepared image.<br />
* The Picture Layer may be displayed below the OpenStreetMap Layer. If so, move it up in the Layers Box, so that the image becomes visible. <br />
[[File:JOSM Mapping 06 Image Loaded.png|frame|none|The image loaded as a PicLayer]]<br />
* Also make the picture layer the active layer by selecting it and clicking on the third button in the Layers box toolbar. The tools for calibrating the picture will then become visible in the toolbar on the left side. <br />
[[File:JOSM Mapping 07 Image Layer Selected.png|frame|none|The image layer is now the selected layer]]<br />
<br />
== Calibrating the reference image == <br />
* Click on the button with the green arrow to add three reference points on the image, each on one of the confluence points (where the latitude and longitude lines cross). <br />
* Click on the button with the red arrow to move the reference points until they align with the corners of the rectangle. <br />
* If you now make the image layer semi-transparent, you can see how the image aligns with the map. <br />
* If you do not have the latitude and longitude lines, you should make the image layer semi-transparent and try to align distinct points in the image with the map. <br />
<br />
[[File:JOSM Mapping 08 Image Layer Calibrated.png|frame|none|The image is now calibrated. The gray arrows of the rectangle of the data layer are now over the grid lines in the image.]]<br />
[[File:JOSM Mapping 09 Image Layer Semi-Transparent.png|frame|none|Setting the image layer as semi transparent.]]<br />
<br />
== Mapping the points in the image ==<br />
* Open a new data layer by pressing '''Ctrl + N'''<br />
* Select the drawing tool by pressing '''A'''<br />
* Draw points on the positions of the image from which you want to know the coordinates.<br />
* If there are several points, you should add a name for each point in the Properties Box on the right.<br />
[[File:JOSM Mapping 10 Points Mapped And Named.png|frame|none|Mapping the points. They are visible as the light small rectangles in the black dots on the image.]]<br />
* After all the points are mapped, right-click on the new data layer, select '''Save As ...''' and save it as an XML file on your local file system. <br />
[[File:JOSM Mapping 11 Exporting Points.png|frame|none|Exporting the mapped points.]]<br />
* The file contains the coordinates of the selected points for further processing. <br />
<syntaxhighlight lang="xml"><br />
<?xml version='1.0' encoding='UTF-8'?><br />
<osm version='0.6' upload='true' generator='JOSM'><br />
<node id='-95' action='modify' visible='true' lat='12.881180287556537' lon='-86.11194760698338'><br />
<tag k='name' v='1' /><br />
</node><br />
<node id='-94' action='modify' visible='true' lat='12.79249813630681' lon='-86.10761639569195'><br />
<tag k='name' v='2' /><br />
</node><br />
<node id='-93' action='modify' visible='true' lat='12.403621057420668' lon='-86.22455910056097'><br />
<tag k='name' v='3' /><br />
</node><br />
<node id='-92' action='modify' visible='true' lat='12.084582576302516' lon='-86.11790302250911'><br />
<tag k='name' v='6' /><br />
</node><br />
<node id='-91' action='modify' visible='true' lat='12.20261349095152' lon='-86.46277572159048'><br />
<tag k='name' v='5' /><br />
</node><br />
<node id='-90' action='modify' visible='true' lat='12.40062989709886' lon='-86.83219726377988'><br />
<tag k='name' v='4' /><br />
</node><br />
</osm><br />
</syntaxhighlight><br />
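For further processing, the named points and their coordinates can be pulled out of this file with a short script - a sketch using Python's standard library (the helper and file name are our own, not part of JOSM or reBiND):<br />

```python
import xml.etree.ElementTree as ET

def read_points(osm_xml):
    """Extract {name: (lat, lon)} from the OSM XML exported by JOSM."""
    root = ET.fromstring(osm_xml)
    points = {}
    for node in root.iter('node'):
        tags = {t.get('k'): t.get('v') for t in node.findall('tag')}
        if 'name' in tags:  # only keep the named reference points
            points[tags['name']] = (float(node.get('lat')),
                                    float(node.get('lon')))
    return points

# Usage (hypothetical file name):
# points = read_points(open('mapped_points.osm').read())
```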
<br />
== Closing JOSM ==<br />
* Close JOSM. '''WARNING''': When you try to close JOSM, it will ask whether you want to upload the two data layers to OpenStreetMap, and it pre-selects these options for you. '''DO NOT UPLOAD THE DATA!''' Click on '''Exit Now.'''<br />
* JOSM will then ask whether the calibration file for the image should be saved. You can do so if you wish; it will be stored in the same directory as the image. The next time you select the image in PicLayer, it will detect this file and place the image in the correct position.<br />
<br />
== Useful Links ==<br />
* [http://boulter.com/gps/ GPS Coordinate Converter] Converts coordinates between various formats, shows them on a map and links to different map services.<br />
* [http://www.movable-type.co.uk/scripts/latlong-vincenty.html Inverse Vincenty Formula] Calculates the distance between two coordinates.<br />
* [http://www.movable-type.co.uk/scripts/latlong-vincenty-direct.html Direct Vincenty Formula] Enter GPS coordinates, a direction and a distance and get the resulting coordinates</div>
<hr />
<div>=== Overview of the reBiND workflow === <br />
<br />
[[File:Architecture-Concept.png|thumb|center|1000px|The general structure of the reBiND Framework. Blue solid lines indicate how the document is transformed and processed, orange dashed lines indicate user interaction or input.]]<br />
<br />
This figure shows the general structure of the reBiND processing architecture, covering each step in the workflow from submission of a dataset, through preparation and processing, to its final publication.<br />
<br />
Before the data can be uploaded into the reBiND data portal several steps are required to prepare the data and map it to an appropriate schema. We have used the ABCD - Access to Biological Collections Data - schema ([[ABCD_Access_to_Biological_Collection_Data,_Standard|ABCD in reBiND]]). ABCD is a common data specification for biological collection units, including living and preserved specimens and field observations. <br />
<br />
The majority of data we received from contributing scientists was in spreadsheet format, which can easily be imported into a relational database. Once the data is in a relational database, we used the BioCASe Provider Software (BPS) [http://wiki.bgbm.org/bps] to convert it to XML. The BPS supports many different SQL-based databases, and these databases offer imports for different file types. In order to generate the XML files, the columns of the relational database have to be mapped to the corresponding concepts of ABCD. At this point the expert knowledge of the contributing scientists is needed. At the end of the mapping process an ABCD XML document is generated. <br />
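The mapping idea can be sketched in a few lines - a hypothetical illustration only, with placeholder column names and simplified element names rather than the full ABCD schema:<br />

```python
import xml.etree.ElementTree as ET

# Hypothetical rows already imported from a spreadsheet into a
# relational table; the column names are invented for this example.
rows = [
    {'unit_id': 'P-001', 'scientific_name': 'Puffinus puffinus'},
    {'unit_id': 'P-002', 'scientific_name': 'Natrix natrix'},
]

# Map each column to a (simplified) ABCD-style concept and emit XML.
units = ET.Element('Units')
for row in rows:
    unit = ET.SubElement(units, 'Unit')
    ET.SubElement(unit, 'UnitID').text = row['unit_id']
    ET.SubElement(unit, 'FullScientificNameString').text = row['scientific_name']

print(ET.tostring(units, encoding='unicode'))
```

In the actual workflow this mapping is configured in the BPS rather than coded by hand.<br />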
<br />
Once the data has been converted into an XML format it can be uploaded onto the reBiND web portal. After the XML document has been uploaded, the correction process can be started by the Content Administrator. <br />
<br />
The grey box in the figure highlights the steps between the upload of the data into the reBiND portal and the validation, correction and review steps prior to publication of the data. The Correction Manager runs several correction modules, each for a specific purpose. When any of the modules changes the document or encounters problems, these issues are recorded in a report document so that they can be reviewed later. When the modules have finished running, the corrected document is loaded back into the reBiND system. At this stage the document should be valid; if the correction modules were unable to fix all of the problems encountered, the remaining validation errors will be marked.<br />
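The module-pipeline idea can be sketched as follows (a conceptual illustration only; the module names and issue format are invented for the example, not the actual reBiND API):<br />

```python
# Concept sketch of the Correction Manager: each module transforms the
# document and records the issues it found at a severity level
# ('info', 'warning' or 'error').

def strip_empty_lines(doc, issues):
    """Example module: drop empty lines and record the change."""
    cleaned = [line for line in doc if line.strip()]
    removed = len(doc) - len(cleaned)
    if removed:
        issues.append(('info', f'removed {removed} empty line(s)'))
    return cleaned

def run_correction(doc, modules):
    issues = []
    for module in modules:  # modules run in the configured order
        doc = module(doc, issues)
    return doc, issues

doc, issues = run_correction(['<a>', '', '</a>'], [strip_empty_lines])
```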
<br />
The next step is the review. In the issue list produced by the Correction Manager, issues of three different severity levels are flagged. These are: <br />
* information (a change was made that is not expected to cause any problems)<br />
* warning (a change has been made, or a problem with the content has been detected, that cannot be fixed automatically but has no consequence for the validity of the document)<br />
* error (a problem with the content has been detected that causes the document to be invalid and cannot be fixed automatically). <br />
<br />
The issues should be reviewed. Some problems may be the result of technical issues and can be fixed by specifying new correction modules. Others may be caused by errors in the content itself, in which case discussion with the contributing scientist might be necessary in order to fix them.<br />
<br />
Independently of the review, a metadata document has to be created. This can be done via a dedicated web form where the user can enter information describing the dataset. The data from this form is saved in another XML file (in [[Ecologial_Metadata_Language|EML format]]).<br />
<br />
Once the metadata document has been created and the changes and notes of the automated corrections have been reviewed, the data project can be published. This makes the now valid document and the metadata document publicly available. The data can then be accessed via a search form on the homepage or via data networks. In the case of the reBiND Service, the ABCD documents can be accessed via biodiversity networks like GBIF and BioCASe. A specialised module translates the BioCASe Protocol, in which queries from these two networks are sent, into XQuery and then returns the parts of the documents that are relevant to the query. <br />
<br />
After this overview of the processing architecture of the reBiND Framework, this text will now take a closer look at the individual steps.</div>
<hr />
<div>==Publishing and searching the data ==<br />
<br />
There are two ways of searching the data. You can perform simple full-text or scientific name searches via the search box on the [http://data-rebind.bgbm.org/rebind public part of the reBiND data portal].<br />
<br />
A module has also been implemented to connect the reBiND Service to biodiversity networks like GBIF and BioCASe. The module converts requests sent in the [http://www.biocase.org/products/protocols BioCASe Protocol] for querying data sources into XQuery (which can then select the appropriate parts of the ABCD data files) and returns the result in the ABCD format. The [http://data-rebind.bgbm.org/rebind/biocase/request.xql reBiND capabilities request] shows the fields within the ABCD file which can be queried. More information explaining the output of the capabilities request can be found [http://www.biocase.org/products/protocols here].<br />
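The protocol-translation idea can be pictured with a toy sketch - the collection path and element names below are illustrative assumptions, not the actual reBiND implementation:<br />

```python
# Toy sketch: a (greatly simplified) BioCASe-style concept/value filter
# becomes an XQuery string selecting matching units from the stored
# ABCD documents. Path '/db/rebind' and the element names are invented.

def filter_to_xquery(concept, value):
    return (
        "for $u in collection('/db/rebind')//abcd:Unit "
        f"where $u//{concept}[contains(., '{value}')] "
        "return $u"
    )

query = filter_to_xquery('abcd:FullScientificNameString', 'Natrix')
```

The real module also has to parse the BioCASe request XML and wrap the result in a valid response document.<br />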
<br />
==Searching the data via the reBiND interface==<br />
<br />
The search page of the reBiND interface lists the published datasets by title. The screenshot below shows the search page.<br />
<br />
[[File:Rebind_search.png|border]]<br />
<br />
<br />
Clicking on a title takes you to further details, where you can click on the link 'export data file' to download the ABCD data file:<br />
<br />
<br />
[[File:Rebind_search_result.png|border]]<br />
<br />
<br />
Alternatively, using the search box at the top right of the screen, you can carry out a full-text search or a scientific name search. The screenshot below shows a scientific name search for 'Natrix'.<br />
<br />
<br />
[[File:Natrix_search.png|border]]<br />
<br />
<br />
The full-text search looks for the text anywhere within the ABCD file or the associated EML metadata file. The scientific name search looks for the query within the abcd:FullScientificName element of the ABCD document.<br />
<br />
<br />
[[File:Natrix_result.png|border]]<br />
<br />
==Searching the data via GBIF==<br />
<br />
<br />
The [http://data-rebind.bgbm.org/rebind/biocase/request.xql?inventory=1 inventory] shows the titles of the datasets available for harvesting. The latest published data that has been harvested by GBIF can be found [http://www.gbif.org/installation/ccdd9a0a-26a8-4db7-9d35-c10020305622 here]. <br />
<br />
The screenshot below shows one of the datasets 'Morphologische Daten von Natrix' made available via the GBIF interface:<br />
<br />
<br />
[[File:GBIF_Natrix.PNG|border]]</div>
<hr />
<div>= Metadata Editor Tool =<br />
<br />
Before a dataset can be published by reBiND it is necessary to either upload a metadata file or create one using the reBiND software. We have implemented a tool consisting of a series of web forms for capturing the metadata. The Metadata Editor tool is based on Ecological Metadata Language (EML). The specification for EML is available from [https://knb.ecoinformatics.org/#external//emlparser/docs/index.html 'The Knowledge Network for Biocomplexity']. We are using [[Ecological_Metadata_Language|a subset of EML]] to describe the essential data most frequently used by our data providers - for example title, abstract, owner, content and usage rights. The usage rights can link to a code of conduct, a copyright statement and any licence information. The tool can also be used to enter coverage information (geographical, temporal and taxonomic) and the scientific methods that were used to collect the data. The following screenshots show the series of forms for entering the metadata:<br />
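Behind these forms, the captured values are written into an EML document. The following sketch assembles a minimal, illustrative skeleton covering some of the fields named above (placeholder values; this is not the actual Metadata Editor output):<br />

```python
import xml.etree.ElementTree as ET

# Build a minimal, illustrative EML-style dataset element covering
# title, abstract, owner and usage rights. All values are placeholders.
dataset = ET.Element('dataset')
ET.SubElement(dataset, 'title').text = 'Example dataset title'
abstract = ET.SubElement(dataset, 'abstract')
ET.SubElement(abstract, 'para').text = 'Key features, study design and methods.'
creator = ET.SubElement(dataset, 'creator')
name = ET.SubElement(creator, 'individualName')
ET.SubElement(name, 'surName').text = 'Example Owner'
rights = ET.SubElement(dataset, 'intellectualRights')
ET.SubElement(rights, 'para').text = 'Usage rights, e.g. a code of conduct.'

print(ET.tostring(dataset, encoding='unicode'))
```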
<br />
===Metadata Description===<br />
<br />
The screenshot below shows the description form in the Metadata Capture Tool. The description should distinguish your data from other data and provide an abstract describing key features, the study design or the methods used in the study. Any metadata already contained in the ABCD data files - such as abcd:Title and abcd:Details - is transferred to the EML file and pre-populated in the metadata capture form, saving the user from entering the data again.<br />
<br />
<br />
[[File:Metadata1.PNG|border]]<br />
<br />
<br />
===Metadata Keywords===<br />
<br />
The screenshot below shows the Keywords form. Entering keywords enables better categorisation and searching of the data. Keywords can be entered in multiple sets by creating multiple thesauri.<br />
<br />
<br />
[[File:Metadata2.PNG|border]]<br />
<br />
<br />
===Metadata Contact/Owner===<br />
<br />
<br />
The screenshot shows the Contact/Owner form for entering details about the owner of the data.<br />
<br />
<br />
[[File:Metadata3.PNG|border]]<br />
<br />
<br />
===Metadata Usage Rights===<br />
<br />
The Usage Rights form allows you to enter a free text description describing the usage rights. It could be a standard agreement for reBiND data, such as the [http://www.biocase.org/whats_biocase/code_of_conduct.shtml Biocase Code of Conduct] or a more specific statement describing usage rights for the dataset itself, for example how to cite the data if it is re-used.<br />
<br />
<br />
[[File:Metadata4.PNG|border]]<br />
<br />
<br />
===Metadata Geographical Coverage===<br />
<br />
The screenshot shows the Geographical coverage form. Here you can enter the coordinates directly, or use zoom and drag to reposition the bounding box on the map.<br />
<br />
<br />
[[File:Metadata5.PNG|border]]<br />
<br />
<br />
===Metadata Temporal Coverage===<br />
<br />
The Temporal coverage form allows you to enter multiple date ranges to indicate when the data was gathered.<br />
<br />
<br />
[[File:Metadata6.PNG|border]]<br />
<br />
<br />
===Metadata Taxonomic Coverage===<br />
<br />
The Taxonomic coverage form allows you to add a taxonomic hierarchy or multiple taxa describing the organisms in the dataset.<br />
<br />
<br />
[[File:Metadata7.PNG|border]]<br />
<br />
<br />
===Metadata Methods===<br />
<br />
The Methods form allows you to specify the methods used in the dataset, such as field, laboratory and processing steps, sampling methods and the instrumentation used.<br />
<br />
<br />
[[File:Metadata8.PNG|border]]</div>
<hr />
<div>==Manual Review and Corrections==<br />
The results from the automated correction can be manually reviewed and edited via the web interface. It is especially important to review the errors and warnings, which could be caused by technical issues or by the content of the document itself. More minor changes are flagged as 'info' messages. If there are any errors or warnings, it is advisable to have the contributing scientist and the content administrator go through the review together. Errors in the document can be fixed by modifying the correction configuration and re-running the automated correction with a new configuration file.<br />
<br />
An online XML editor called eXide is included in the eXist release; an online demo is available on the eXist homepage. The eXide editor is a modification of the online source code editor ACE - the [http://ace.ajax.org/ Cloud9 Editor]. With eXide it is possible to directly edit documents stored in the database, with features like syntax highlighting and code folding. <br />
<br />
The eXide editor that comes packaged with eXist has been modified to create the reBiND editor. <br />
<br />
===Reviewing corrections in the reBiND Editor===<br />
<br />
A screenshot showing the results of the correction on the reBiND_Puffinus.xml data file is shown below:<br />
<br />
<br />
[[File:Correction_output_review.png|border]]<br />
<br />
<br />
In the left-hand panel a list of 'Issues' is displayed, and the data file is shown in the main editor window. Clicking on any 'Issue' in the left-hand panel takes the user to the corresponding change in the data file. In the example shown, the first issue in the list has been clicked. This expands the 'Issue' and shows the 'Old Content' and the 'New Content'. In this case the problem was that the XML schema required the content to be of the type xs:dateTime, but the old content only gave a year date range. The automated correction was of a type called 'Element Text Replacer'. This type of correction replaces a specific pattern (a regular expression) at a specified position within the XML file with some other text. The technical documentation details the different types of correction and [[Correction Modules|how to modify the correction modules and specify a different configuration]]. In this example the lower year is taken as the year, the date is assumed to be the 1st of January of that year and the time is assumed to be midnight. If this change is acceptable to the reviewer, they can click the checkbox to indicate that they agree with the change. <br />
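The kind of replacement described here can be sketched with a regular expression - a simplified illustration of an 'Element Text Replacer' rule, not the actual reBiND module:<br />

```python
import re

# Replace a year range such as '1990-1995' with an xs:dateTime built
# from the lower year, assuming the 1st of January at midnight
# (the convention described above).
def year_range_to_datetime(text):
    return re.sub(r'^(\d{4})-\d{4}$', r'\1-01-01T00:00:00', text)

year_range_to_datetime('1990-1995')  # → '1990-01-01T00:00:00'
```

Text that does not match the pattern is left unchanged, so the rule only fires at the positions it is meant for.<br />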
<br />
If the change is not acceptable, the user should run another set of corrections on the original data file. To do this you need to [[Correction Modules|change the correction configuration or add new correction modules]], in consultation with a technical administrator.<br />
<br />
A modified GUI could also allow only changes of a certain type (class), or those carried out by a certain module, to be displayed. It could also hide reviewed changes from the list. Though this might be a good way for the Content Administrator and the Technical Administrator to review the changes, there is still the problem that the Contributing Scientist is confronted with the XML document and required to work with it. So at some point in the future a better interface for the Contributing Scientist to work with the data might be desirable (e.g. automatically generated web forms to edit the data), but for the general infrastructure described in this text, the online XML editor is sufficient. <br />
<br />
It is not necessary to review all the changes at once. The document can be saved at any time and the review process resumed at a later point. So it is possible that, after the correction, one of the administrators first reviews the changes caused by technical issues or by the XML format used, leaving only the changes related to the data itself for the Contributing Scientist to review.<br />
<br />
<br />
After the correction is finished the file can be validated again. This time - if the correction modules have been able to fix the original errors - the file should be valid. The screenshot below shows re-running the validation on reBiND_Puffinus.xml after the correction step has been run.<br />
<br />
<br />
[[File:Validation_final.PNG|border]]</div>
<hr />
<div>= Validation and Correction =<br />
<br />
Once the XML data file has been uploaded into the reBiND system the user can validate and perform automated and manual corrections to the file before publishing. Publishing the data makes it available to the public search interface and also to biodiversity networks, such as GBIF. <br />
<br />
== Validation ==<br />
<br />
The figure below shows the result of clicking on the validation action for the file reBiND_Puffinus.xml. While the validation is running, an information screen opens and displays a throbber. In the screenshot the validation is complete and the screen shows the result: there are 600 errors in the file.<br />
<br />
<br />
[[File:ReBIND_portal_validation.PNG|border]]<br />
<br />
<br />
After validation is complete it is possible to review the validation results in detail in the reBiND editor (a modified version of the eXide editor which comes bundled with the eXist software). To open this editor the user should click on the 'Edit' button in the list of actions below the data file (in this case reBiND_Puffinus.xml). This opens the data file in the editor - a screenshot of this is shown below:<br />
<br />
<br />
[[File:ReBIND_portal_validation_report.PNG|border]]<br />
<br />
<br />
The left-hand panel shows a list of validation errors, and the data file is displayed in the main editor window. Clicking on any individual validation error in the left-hand panel takes the user to the corresponding error in the data file. Errors are marked with a red error icon in the left-hand margin. It is possible to make manual edits to the file to fix these errors, but when there are so many errors within a file this would be labour intensive. In the next step (the automated correction) we show how these errors can be fixed automatically using the reBiND correction software.<br />
<br />
== Running the automated correction ==<br />
<br />
'Start Correction' is the final action in the list below the data file. Clicking on this link takes the user to the following page:<br />
<br />
<br />
[[File:Correction_choose_config.png|border]]<br />
<br />
<br />
A drop-down menu gives a list of the available configuration files. The first correction configuration ('default correction') is suitable for most ABCD files. Alternative configurations can be uploaded by the administrator to run different automated corrections. This could depend, for example, on what sort of errors have been seen in the data file (in the validation step), or on whether a different XML format has been used instead of the default ABCD data.<br />
<br />
<br />
After clicking on 'Start Correction' a throbber appears as the correction modules (specified in the configuration file) are run. When the correction is complete a report is generated (see the following screenshot):<br />
<br />
<br />
[[File:Correction_output_report.png|border]]<br />
<br />
<br />
The output shows a link to the original data file, a link to an XML version of the report and a tabular view of the report showing the number of each type of correction made. The levels 'info', 'warning' and 'error' are used to indicate the effect of the change as follows:<br />
<br />
* info - flags any minor change to the data where no problems are expected from the change.<br />
* warning - flags a change where it is uncertain that the new value is correct and it should be checked by the content administrator.<br />
* error - flags a problem that could not be corrected and results in the file being invalid according to the associated schema.<br />
<br />
<br />
The report indicates that several changes were made, including one change of a year date to an ISO DateTime and 1640 changes of abcd:LowerValue. These changes are both of type 'Element Text Replacer', where a standard pattern in the text is replaced with another. There were also 201 changes to remove unnecessary empty elements. There is another type of change, called an 'Element Renamer', which renames incorrectly named elements within the XML file. Clicking back in the browser and then opening the reBiND editor (by clicking the action 'Edit' under the data file) allows the user to review these results from the automated correction. See the next sections for details on reviewing the corrected data file.</div>
<hr />
<div>= Validation and Correction =<br />
<br />
Once the XML data file has been uploaded into the reBiND system the user can validate and perform automated and manual corrections to the file before publishing. Publishing the data makes it available to the public search interface and also to biodiversity networks, such as GBIF. <br />
<br />
== Validation ==<br />
<br />
The figure below shows the result of clicking on the validation action for the file reBiND_Puffinus.xml. When the validation is running the information screen opens and displays a throbber while the file is being validated. In the screenshot the validation is complete and the screen shows the result - that there are 600 errors in the file.<br />
<br />
<br />
[[File:ReBIND_portal_validation.PNG|border]]<br />
<br />
<br />
After validation is complete it is possible to review the validation results in detail in the reBiND editor (a modified version of the eXide editor which comes bundled with the eXist software). To open this editor the user should click on the 'Edit' button in the list of actions below the data file (in this case reBiND_Puffinus.xml). This opens the data file in the editor - a screenshot of this is shown below:<br />
<br />
<br />
[[File:ReBIND_portal_validation_report.PNG|border]]<br />
<br />
<br />
In the left-hand panel a list of validation errors can be seen and in the main editor window the data file is displayed. Clicking on any individual validation error in the left-hand panel takes the user to the corresponding error in the data file. Errors are marked with an red error icon within the left-hand margin. It is possible to make manual edits to the file to fix these errors, but when there are so many errors within a file this would be labour intensive. It the next step (the automated correction) we show how these errors can be fixed automatically using the reBiND correction software.<br />
<br />
== Running the automated correction ==<br />
<br />
'Start Correction' is the final action in the list below the data file. Clicking on this link takes the user to the following page:<br />
<br />
<br />
[[File:Correction_choose_config.png|border]]<br />
<br />
<br />
A drop-down menu gives a list of available configuration files. The first correction configuration ('default correction') is suitable for most ABCD files. Alternative corrections can be uploaded by the administrator to run different automated corrections This could depend on - for example - what sort of errors have been seen in the data file (in the validation step) or whether a different XML file has been used instead of the default ABCD data.<br />
<br />
<br />
After clicking on 'Start Correction' a throbber appears as the correction modules (specified in the configuration file) are run. When the correction is complete a report is generated (see the following screenshot):<br />
<br />
<br />
[[File:Correction_output_report.png|border]]<br />
<br />
<br />
The output shows a link to the original data file, a link to an XML version of the report and a tabular view of the report showing the number of each type of correction made. The level of 'info', 'warning' and 'error' are used to indicate the effect of the change as follows:<br />
<br />
* info - flags any minor change to the data where no problems are expected from the change.<br />
* warning - flags a change where it is uncertain that the new value is correct and it should be checked by the content administrator.<br />
* error - flags a a problem that could not be corrected and results in the file being invalid according to the associated schema.<br />
<br />
<br />
The report indicates several changes were made, including one change of a year date to an ISO DateTime and 1640 changes of abcd:LowerValue. These changes are both of type 'Element Text Replacer', where a standard pattern in the text is replaced with another. There were also 201 changes to remove unnecessary empty elements. There is another type of change called an 'Element Renamer' which renames incorrectly named elements within the XML file. Clicking back on the browser and then opening the reBiND editor by clicking the action 'Edit' under the data file allows the user to review these results from the automated correction. See the next sections for details on reviewing the corrected data file.</div>LornaMorrishttps://wiki.bgbm.org/rebind_documentation/index.php?title=Validation_and_Corrections&diff=587Validation and Corrections2014-11-19T01:05:12Z<p>LornaMorris: /* Validation */</p>
<hr />
<div>= Validation and Correction =<br />
<br />
Once the XML data file has been uploaded into the reBiND system the user can validate and perform automated and manual corrections to the file before publishing. Publishing the data makes it available to the public search interface and also to biodiversity networks, such as GBIF. <br />
<br />
== Validation ==<br />
<br />
The figure below shows the result of clicking on the validation action for the file reBiND_Puffinus.xml. When the validation is running the information screen opens and displays a throbber while the file is being validated. In the screenshot the validation is complete and the screen shows the result - that there are 600 errors in the file.<br />
<br />
<br />
[[File:ReBIND_portal_validation.PNG/|border]]<br />
<br />
<br />
After validation is complete it is possible to review the validation results in detail in the reBiND editor (a modified version of the eXide editor which comes bundled with the eXist software). To open this editor the user should click on the 'Edit' button in the list of actions below the data file (in this case reBiND_Puffinus.xml). This opens the data file in the editor - a screenshot of this is shown below:<br />
<br />
<br />
[[File:ReBIND_portal_validation_report.PNG|border]]<br />
<br />
<br />
In the left-hand panel a list of validation errors can be seen and in the main editor window the data file is displayed. Clicking on any individual validation error in the left-hand panel takes the user to the corresponding error in the data file. Errors are marked with an red error icon within the left-hand margin. It is possible to make manual edits to the file to fix these errors, but when there are so many errors within a file this would be labour intensive. It the next step (the automated correction) we show how these errors can be fixed automatically using the reBiND correction software.<br />
<br />
== Running the automated correction ==<br />
<br />
'Start Correction' is the final action in the list below the data file. Clicking on this link takes the user to the following page:<br />
<br />
<br />
[[File:Correction_choose_config.png|border]]<br />
<br />
<br />
A drop-down menu gives a list of available configuration files. The first correction configuration ('default correction') is suitable for most ABCD files. Alternative corrections can be uploaded by the administrator to run different automated corrections This could depend on - for example - what sort of errors have been seen in the data file (in the validation step) or whether a different XML file has been used instead of the default ABCD data.<br />
<br />
<br />
After clicking on 'Start Correction' a throbber appears as the correction modules (specified in the configuration file) are run. When the correction is complete a report is generated (see the following screenshot):<br />
<br />
<br />
[[File:Correction_output_report.png|border]]<br />
<br />
<br />
The output shows a link to the original data file, a link to an XML version of the report and a tabular view of the report showing the number of each type of correction made. The level of 'info', 'warning' and 'error' are used to indicate the effect of the change as follows:<br />
<br />
* info - flags any minor change to the data where no problems are expected from the change.<br />
* warning - flags a change where it is uncertain that the new value is correct and it should be checked by the content administrator.<br />
* error - flags a a problem that could not be corrected and results in the file being invalid according to the associated schema.<br />
<br />
<br />
The report indicates several changes were made, including one change of a year date to an ISO DateTime and 1640 changes of abcd:LowerValue. These changes are both of type 'Element Text Replacer', where a standard pattern in the text is replaced with another. There were also 201 changes to remove unnecessary empty elements. There is another type of change called an 'Element Renamer' which renames incorrectly named elements within the XML file. Clicking back on the browser and then opening the reBiND editor by clicking the action 'Edit' under the data file allows the user to review these results from the automated correction. See the next sections for details on reviewing the corrected data file.</div>LornaMorrishttps://wiki.bgbm.org/rebind_documentation/index.php?title=Validation_and_Corrections&diff=586Validation and Corrections2014-11-19T01:05:01Z<p>LornaMorris: /* Validation */</p>
<hr />
<div>= Validation and Correction =<br />
<br />
Once the XML data file has been uploaded into the reBiND system the user can validate and perform automated and manual corrections to the file before publishing. Publishing the data makes it available to the public search interface and also to biodiversity networks, such as GBIF. <br />
<br />
== Validation ==<br />
<br />
The figure below shows the result of clicking on the validation action for the file reBiND_Puffinus.xml. When the validation is running the information screen opens and displays a throbber while the file is being validated. In the screenshot the validation is complete and the screen shows the result - that there are 600 errors in the file.<br />
<br />
<br />
[[File:800px-ReBIND_portal_validation.PNG|border]]<br />
<br />
<br />
After validation is complete it is possible to review the validation results in detail in the reBiND editor (a modified version of the eXide editor which comes bundled with the eXist software). To open this editor the user should click on the 'Edit' button in the list of actions below the data file (in this case reBiND_Puffinus.xml). This opens the data file in the editor - a screenshot of this is shown below:<br />
<br />
<br />
[[File:ReBIND_portal_validation_report.PNG|border]]<br />
<br />
<br />
In the left-hand panel a list of validation errors can be seen and in the main editor window the data file is displayed. Clicking on any individual validation error in the left-hand panel takes the user to the corresponding error in the data file. Errors are marked with a red error icon within the left-hand margin. It is possible to make manual edits to the file to fix these errors, but when there are so many errors within a file this would be labour intensive. In the next step (the automated correction) we show how these errors can be fixed automatically using the reBiND correction software.<br />
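Conceptually, the validation step checks the uploaded file against its registered XML schema and collects every violation rather than stopping at the first one. The sketch below illustrates this idea with the standard Java XML APIs; the tiny schema and record are invented for illustration and are not ABCD or reBiND code.<br />

```java
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class ValidationDemo {
    // Toy schema: a <unit> element whose <date> child must be an ISO DateTime.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='unit'><xs:complexType><xs:sequence>"
      + "<xs:element name='date' type='xs:dateTime'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // A record with a bare year where the schema expects an ISO DateTime.
    static final String XML = "<unit><date>1999</date></unit>";

    // Validates xml against xsd, collecting errors instead of failing fast.
    static List<SAXParseException> validate(String xml, String xsd) {
        List<SAXParseException> errors = new ArrayList<>();
        try {
            Validator v = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)))
                    .newValidator();
            v.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { errors.add(e); }
                public void error(SAXParseException e) { errors.add(e); }
                public void fatalError(SAXParseException e) { errors.add(e); }
            });
            v.validate(new StreamSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validate(XML, XSD).size() + " validation error(s) found");
    }
}
```

The reBiND portal performs the equivalent check server-side and reports the total number of errors found in the file.<br />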
<br />
== Running the automated correction ==<br />
<br />
'Start Correction' is the final action in the list below the data file. Clicking on this link takes the user to the following page:<br />
<br />
<br />
[[File:Correction_choose_config.png|border]]<br />
<br />
<br />
A drop-down menu gives a list of available configuration files. The first correction configuration ('default correction') is suitable for most ABCD files. Alternative configurations can be uploaded by the administrator to run different automated corrections. The choice could depend on, for example, what sort of errors were seen in the data file during the validation step, or on whether an XML format other than the default ABCD data has been used.<br />
<br />
<br />
After clicking on 'Start Correction' a throbber appears as the correction modules (specified in the configuration file) are run. When the correction is complete a report is generated (see the following screenshot):<br />
<br />
<br />
[[File:Correction_output_report.png|border]]<br />
<br />
<br />
The output shows a link to the original data file, a link to an XML version of the report and a tabular view of the report showing the number of each type of correction made. The levels 'info', 'warning' and 'error' indicate the effect of the change as follows:<br />
<br />
* info - flags any minor change to the data where no problems are expected from the change.<br />
* warning - flags a change where it is uncertain that the new value is correct and it should be checked by the content administrator.<br />
* error - flags a problem that could not be corrected and results in the file being invalid according to the associated schema.<br />
<br />
<br />
The report indicates several changes were made, including one change of a year date to an ISO DateTime and 1640 changes of abcd:LowerValue. These changes are both of type 'Element Text Replacer', where a standard pattern in the text is replaced with another. There were also 201 changes to remove unnecessary empty elements. Another type of change, the 'Element Renamer', renames incorrectly named elements within the XML file. Clicking the browser's back button and then opening the reBiND editor via the 'Edit' action under the data file allows the user to review the results of the automated correction. See the next sections for details on reviewing the corrected data file.</div>LornaMorrishttps://wiki.bgbm.org/rebind_documentation/index.php?title=Data_upload_to_rebind_framework&diff=584Data upload to rebind framework2014-11-19T01:02:42Z<p>LornaMorris: /* Uploading data to the reBiND portal */</p>
<hr />
<div>= Uploading data to the reBiND portal =<br />
<br />
The previous data preparation step ensures the data is in ABCD format. Once the data has been prepared an XML file (conforming to the ABCD schema) can be exported from the Biocase Provider software. It should be noted that ABCD files can be prepared using any software. There are other types of software that use a metadata-based approach to extract data from CSV and relational databases and enable transformation into XML format, for example [http://community.pentaho.com/projects/data-integration/ Pentaho Kettle]. Furthermore the reBiND software has been designed to work with any XML file, not just ABCD, providing there is an associated XML schema available.<br />
<br />
In addition to at least one XML data file the project should also contain a metadata file. Additional files can also be uploaded, such as the original data file (from which the XML data file was generated), images, other multimedia objects or PDF files. For the sake of clarity, image and multimedia objects should be placed into dedicated sub-collections within the data project collection, such as ''images/''. A data project can also contain more than one XML data document, but only one metadata file. <br />
<br />
The figures below show a step-by-step guide to the process of uploading data to the reBiND portal. Once these steps are completed the user can continue to the [[Validation_and_Corrections|validation and automated correction]] steps.<br />
<br />
== Logging onto reBiND ==<br />
<br />
The homepage of the reBiND portal has links to the published data-sets and to further information on the project web-site. In figure 1 below the left-hand margin shows the login form. Login is required in order to submit data to the reBiND system. For information on how to set up user accounts for login see the [[Administration|Administration]] page.<br />
<br />
<br />
[[File:Rebind_portal_logon.PNG|border]]<br />
<br />
== Creating and viewing unpublished projects ==<br />
<br />
After logging into the reBiND portal the user is presented with a left-hand side panel which lists the 'Unpublished' and 'Published' projects. The 'Create Project' icon is used to create a new unpublished project into which XML files and other data can be imported. In the screenshot below the unpublished project 'ClemensHBG' has been selected and a summary of the files associated with this project can be seen in the right-hand panel. Below this are icons to upload further data.<br />
<br />
<br />
[[File:ReBIND_portal_project_overview.PNG|border]]<br />
<br />
<br />
To create an entirely new project the user should click 'Create Project' in the left-hand side panel and supply a unique name for the new project in the pop-up form. The project should have a clear descriptive name, but must not contain special characters, digits, spaces or dashes. In the figure below we have used 'Puffinus' to identify the project - data on ''Puffinus creatopus'', the pink-footed shearwater.<br />
<br />
<br />
[[File:ReBIND_portal_project_create_project.PNG|border]]<br />
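The naming restriction above can be expressed as a simple check. Note that the regular expression below (ASCII letters only) is our assumption based on the restrictions listed; the portal's actual rule may differ.<br />

```java
public class ProjectNameCheck {
    // Assumption: once digits, spaces, dashes and special characters are
    // excluded, only letters remain in a valid project name.
    static boolean isValidProjectName(String name) {
        return name != null && name.matches("[A-Za-z]+");
    }

    public static void main(String[] args) {
        System.out.println(isValidProjectName("Puffinus"));      // valid
        System.out.println(isValidProjectName("Puffinus-2014")); // invalid: digits and a dash
    }
}
```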
<br />
== Importing XML and other data files from file system ==<br />
<br />
After creating the new project the project name (in this example 'Puffinus') should appear in the list of un-published projects in the left-hand side panel.<br />
<br />
Clicking on the project name takes you to the list of files associated with the project. In this case there are no files yet associated with the project, as it is a new empty project. The 'Upload File' action can be used to upload any file type from the file system. Several files of various types (e.g. XML, PDF and images) and folders can be added to a project. 'Upload from BioCASE' enables the user to connect to a specific ABCD file stored in the Biocase Provider software, if the URL is known. However this is currently restricted to files below a certain size (a maximum of 700 records / abcd:Units).<br />
<br />
[[File:ReBIND_portal_project_new_project.PNG|border]]<br />
<br />
The screenshot below shows the 'Upload from BioCASE' option. <br />
<br />
[[File:Upload_data_biocase.PNG|border]]<br />
<br />
<br />
Depending on the file type, different options are offered for the current file. All file types have the option to view/download the file in its native form. Text-based file types have the option to edit the file online. XML files can be validated against their schema (if it is registered with the reBiND software) and can be corrected or modified by running automated corrections on them. Below, the details of the data file 'reBiND_Puffinus.xml' are shown together with the list of available actions. 'View XML' and 'View Data' link to an XML view or a tabular view of the data respectively. In the [[Validation_and_Corrections|next section]] we describe the remaining actions in turn, going into detail on how to run the validation and correction actions.<br />
<br />
<br />
[[File:ReBIND_portal_project_upload_file_actions.PNG|border]]</div>LornaMorrishttps://wiki.bgbm.org/rebind_documentation/index.php?title=Administration&diff=583Administration2014-11-19T00:59:13Z<p>LornaMorris: </p>
<hr />
<div>==Administration==<br />
<br />
An XML document can be uploaded to the eXist database even if the document is not valid according to the schema used. However it is mandatory that the document is well formed, otherwise errors will occur when trying to store the document. <br />
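To illustrate the difference: a document is ''well formed'' if it parses as XML at all, while ''validity'' additionally requires conformance to a schema. The following minimal sketch of the well-formedness check uses the standard Java parser and is illustrative only, not the eXist implementation.<br />

```java
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;

public class WellFormedCheck {
    // Returns true if the XML parses at all (well-formed); validity against
    // a schema is a separate, later check in the reBiND workflow.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b/></a>")); // true: well-formed
        System.out.println(isWellFormed("<a><b></a>"));  // false: mismatched tags
    }
}
```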
<br />
But before the documents can be stored, a closer look at the collection structure within eXist is needed. The [[Glossary|reBiND Framework]] depends on two collections for managing the different data projects. One of the collections is for the unpublished data projects, which are still being corrected, reviewed or otherwise prepared for publication. The other collection is for the published projects, which can be publicly searched and accessed. Within this document they will be referred to as ''unpublished'' and ''published''. Both of these collections are located in the root collection of eXist. Within these collections are the collections for the individual data projects. Having the two collections for unpublished and published data projects also makes the security configuration quite straightforward, since the security settings for these two collections are automatically inherited by the data projects within them.<br />
<br />
Instructions on how to create projects via the reBiND user interface have been described in the [[Data_upload_to_rebind_framework|data archiving section]].<br />
<br />
The collection structure of eXist, showing the unpublished and published project collections is shown below:<br />
<pre><br />
/db/<br />
├─ (eXist default)<br />
├─ unpublished/<br />
│ ├─ (data-project-name 1)/<br />
│ │ ├─ data.xml<br />
│ │ ├─ metadata.xml<br />
│ │ ├─ original-data.xls<br />
│ │ └─ images/<br />
│ │ ├─ image_001.jpg<br />
│ │ ├─ image_002.jpg<br />
│ │ └─ ...<br />
│ └─ (data-project-name 2)/<br />
│ └─ ...<br />
└─ published/<br />
└─ (data-project-name 3)/<br />
├─ data.xml<br />
├─ metadata.xml<br />
├─ original-data.xls<br />
├─ images/<br />
│ ├─ image_001.jpg<br />
│ ├─ image_002.jpg<br />
│ └─ ...<br />
└─ multimedia/<br />
├─ movie1.mov<br />
├─ movie2.avi<br />
└─ ...<br />
</pre><br />
<br />
The majority of database administration can be done through the eXist Java webstart client. For example, the client can be used to add new user accounts, modify permissions for different database collections and delete/edit/query the data.<br />
<br />
==User management==<br />
<br />
Users can be created via the eXist Java admin client. For example, the screenshot below shows the creation of a new user for a contributing scientist.<br />
<br />
[[File:Create_exist_user.PNG|border]]<br />
<br />
In the above example the 'contributing scientist' is allocated to the group 'users', so the /db/unpublished/ directory within the eXist database should be made accessible to the group 'users'.<br />
<br />
==Correction Manager==<br />
<br />
Though all of the corrections and modifications to the data document could be done using XQuery and the XQuery Update Facility, it was decided not to run the corrections in XQuery directly. The Correction Manager is written in Java and is only loosely coupled to eXist in order to make the Framework more modular. Instead of the document being accessed directly within eXist by the Correction Manager, it is exported by a custom XQuery function to a regular XML file on the file system of the server. This file is then handed over to the Correction Manager. The source code for the eXist module that interacts with the Correction Manager is available from our [http://ww2.biocase.org/svn/rebind/trunk/reBiND-eXist-Module/ subversion repository]. <br />
<br />
The actual corrections are not done by the Correction Manager, but by individual Correction Modules which are managed by the Correction Manager. The source code for the Correction Manager and correction modules is available from our [http://ww2.biocase.org/svn/rebind/trunk/reBiND-CorrectionManager/ subversion repository]. <br />
<br />
===Writing Custom Correction Modules===<br />
A Correction Module is a Java class which implements a specific Java interface. It is possible to extend the Correction Manager by adding a new module (this should implement the interface [http://ww2.biocase.org/svn/rebind/trunk/reBiND-CorrectionManager/src/org/bgbm/rebind/correction/modules/ Module]). <br />
<br />
{{CodeExample|lang=Java| 1=<br />
package org.bgbm.rebind.correction.modules;<br />
<br />
import java.io.File;<br />
<br />
public interface Module {<br />
public String process(File inputFile, File outputFile, String[][] settings);<br />
}<br />
|description=The Java code of the Module Interface.}}<br />
<br />
A new module can be added to take care of a specific problem for the XML documents of the particular reBiND instance, by writing a new class implementing the interface and putting the compiled class into a specific folder. The eXist server should be restarted but does not have to be rebuilt, which makes it very easy to add new correction modules. This was also one of the reasons why the corrections are done in Java code and not in XQuery. Java is also much more widely used and understood than XQuery. Furthermore the Correction Manager could be used in stand-alone mode, either by creating an independent GUI tool or by calling the appropriate correction functions via the command line. This requires specifying the ABCD data file and the configuration file (which lists the correction modules to run), then calling the startCorrection method in the CorrectionManager class.<br />
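A minimal custom module might look like the following sketch. The interface is duplicated locally so the example compiles on its own (a real module would implement org.bgbm.rebind.correction.modules.Module); the element name, pattern and report strings are invented for illustration and do not reproduce the built-in 'Element Text Replacer'.<br />

```java
import java.io.File;
import java.nio.file.Files;

// Local copy of the Module interface so this sketch compiles standalone.
interface Module {
    String process(File inputFile, File outputFile, String[][] settings);
}

// Hypothetical module: expands bare year values ("1999") inside an invented
// <DateText> element into ISO DateTimes, in the spirit of the built-in
// 'Element Text Replacer'. Report strings are invented for illustration.
public class YearToDateTimeModule implements Module {

    // Pure text transformation, kept separate so it can be tested directly.
    static String normalize(String xml) {
        return xml.replaceAll("(<DateText>)(\\d{4})(</DateText>)",
                              "$1$2-01-01T00:00:00$3");
    }

    @Override
    public String process(File inputFile, File outputFile, String[][] settings) {
        try {
            String xml = new String(Files.readAllBytes(inputFile.toPath()), "UTF-8");
            String fixed = normalize(xml);
            Files.write(outputFile.toPath(), fixed.getBytes("UTF-8"));
            return xml.equals(fixed) ? "info: no changes" : "warning: year dates expanded";
        } catch (Exception e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("<DateText>1999</DateText>"));
    }
}
```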
<br />
===Modifying the correction configuration file===<br />
The correction configuration file specifies which correction modules are run and in what order. It is an XML file stored in a special collection within eXist - the default location is /db/rebind/correction and the default name is default-correction.xml. By creating different configuration files the user can specify different sets of checks: for example, the default configuration checks for all errors that have been observed in ABCD files, while another configuration might only check whether the ISO dates are formatted correctly. Details of how to modify the configuration file and an explanation of the function of the implemented modules are [[Correction Modules|described here]].<br />
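Purely as a hypothetical illustration of the idea (the element and attribute names below are invented; the actual format is documented on the linked page), such a configuration file lists the modules to run in order:<br />

```xml
<!-- Hypothetical sketch only, not the actual reBiND configuration schema. -->
<correction>
  <!-- Modules are applied in document order. -->
  <module class="org.bgbm.rebind.correction.modules.SomeRenamerModule"/>
  <module class="org.bgbm.rebind.correction.modules.SomeTextReplacerModule"/>
</correction>
```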
<br />
==Using a different metadata format== <br />
<br />
We chose Ecological Markup Language (EML) for creating additional metadata to associate with our published ABCD datasets. EML is used by researchers to document a typical dataset in the ecological sciences. We chose to use [[Ecologial_Metadata_Language| a sub-set of EML to describe the core features of our datasets]]. We also investigated [[Metadata| several other Metadata standards]], before selecting EML as the most appropriate for our purpose.</div>LornaMorrishttps://wiki.bgbm.org/rebind_documentation/index.php?title=Administration&diff=582Administration2014-11-19T00:50:30Z<p>LornaMorris: /* Administration */</p>
<hr />
<div>==Administration==<br />
<br />
An XML document can be uploaded to the eXist database even if the document is not valid according to the schema used. However it is mandatory that the document is well formed, otherwise errors will occur when trying to store the document. <br />
<br />
But before the documents can be stored, a closer look at the collection structure within eXist is needed. The [[Glossary|reBiND Framework]] depends on two collections for managing the different data projects. One of the collections is for the unpublished data projects, which are still being corrected, reviewed or otherwise prepared for publication. The other collection is for the published projects, which can be publicly searched and accessed. Within this document they will be referred to as ''unpublished'' and ''published''. Both of these collections are located in the root collection of eXist. Within these collections are the collection for the individual data projects. Having the two collections for unpublished and published data projects makes the security configuration also quite easy, since the security settings for these two collections are automatically inherited by data projects within these.<br />
<br />
Instructions on how to create projects via the reBiND user interface have been desctibed in the [[Data_upload_to_rebind_framework|data archiving section]].<br />
<br />
The collection structure of eXist, showing the unpublished and published project collections is shown below:<br />
<pre><br />
/db/<br />
├─ (eXist default)<br />
├─ unpublished/<br />
│ ├─ (data-project-name 1)/<br />
│ │ ├─ data.xml<br />
│ │ ├─ metadata.xml<br />
│ │ ├─ original-data.xls<br />
│ │ └─ images/<br />
│ │ ├─ image_001.jpg<br />
│ │ ├─ image_002.jpg<br />
│ │ └─ ...<br />
│ └─ (data-project-name 2)/<br />
│ └─ ...<br />
└─ published/<br />
└─ (data-project-name 3)/<br />
├─ data.xml<br />
├─ metadata.xml<br />
├─ original-data.xls<br />
├─ images/<br />
│ ├─ image_001.jpg<br />
│ ├─ image_002.jpg<br />
│ └─ ...<br />
└─ multimedia/<br />
├─ movie1.mov<br />
├─ movie2.avi<br />
└─ ...<br />
</pre><br />
<br />
The majority of database administration can be done through the eXist Java webstart client. For example, the client can be used to add new user accounts, modify permissions for different database collections and delete/edit/query the data.<br />
<br />
==User management==<br />
<br />
Users can be created via the eXist Java admin client. For example creating a new user for the contributing scientist is shown in the screenshot below.<br />
<br />
[[File:Create_exist_user.PNG|border]]<br />
<br />
In the above example the 'contributing scientist' is allocated to the group 'users', so the /db/unpublished/ directory within the eXist database should be made accessible to the group 'users'.<br />
<br />
==Correction Manager==<br />
<br />
Though all of the corrections and modifications to the data document could be done using XQuery and the XQuery Update Facility, it was decided to not have the corrections run in XQuery directly. The Correction Manager is written in Java. It is only loosely coupled to eXist in order to make the Framework more modular. Instead of a document being directly accessed within eXist by the Correction Manager, it will be exported by the custom XQuery function to a regular XML file on the file system of the server. This file is then handed over to the Correction Manager. The source code for the eXist module that interacts with the Correction Manager is available from our [http://ww2.biocase.org/svn/rebind/trunk/reBiND-eXist-Module/ subversion repository]. <br />
<br />
The actual corrections are not done by the Correction Manager, but by individual Correction Modules which are managed by the Correction Manager. The source code for the Correction Manager and correction modules is available from our [http://ww2.biocase.org/svn/rebind/trunk/reBiND-CorrectionManager/ subversion repository]. <br />
<br />
===Writing Custom Correction Modules===<br />
A Correction Module is a Java Class which implements a specific Java Interface. It is possible to extend the correction manager by adding a new module (this should implement the class [http://ww2.biocase.org/svn/rebind/trunk/reBiND-CorrectionManager/src/org/bgbm/rebind/correction/modules/ Module]). <br />
<br />
{{CodeExample|lang=Java| 1=<br />
package org.bgbm.rebind.correction.modules;<br />
<br />
import java.io.File;<br />
<br />
public interface Module {<br />
public String process(File inputFile, File outputFile, String[][] settings);<br />
}<br />
|description=The Java code of the Module Interface.}}<br />
<br />
A new module can be added to take care of a specific problem for the XML documents of the particular reBiND Instance, by writing a new class implementing the interface and putting the compiled class into a specific folder. The eXist server should be restarted but does not have to be rebuilt, which makes it very easy to add new correction modules. This was also one of the reasons why the corrections are done in Java code and not in XQuery. Also Java is much more widely used and understood than XQuery. Furthermore the Correction Manager could be used in stand-alone mode, either by creating an independent GUI tool or by calling the appropriate correction functions via the Command Line. This would require the ABCD data file and the configuration file (specifiying the correction modules to be run) to be specified and then calling the startCorrection method in the CorrectionManager class.<br />
<br />
===Modifying the correction configuration file===<br />
The correction configuration file specifies which correction modules are run and in what order. It is an XML file stored in a special collection within eXist - the default location is /db/rebind/correction and the the default name is default-correction.xml. By creating different configuration files the user can specifiy different checks, for example the default configuration checks for all possible errors that have been seen in ABCD files, in another configuration the user might just want to check if the ISO dates are formatted correctly. Details of how to modify the configuration file and an explanation of the function of the implemented modules are [[Correction Modules|described here]].<br />
<br />
==Using a different metadata format== <br />
<br />
We chose Ecological Markup Lanuguage (EML) for creating additional metadata to associate with our published ABCD datasets. EML is used by researchers to document a typical dataset in the ecological sciences. We chose to use [[Ecologial_Metadata_Language| a sub-set of EML to describe the core features of our datasets]]. We also investigated [[Metadata| several other Metadata standards]], before selecting EML as the most approriate for our purpose.</div>LornaMorrishttps://wiki.bgbm.org/rebind_documentation/index.php?title=Administration&diff=581Administration2014-11-19T00:48:03Z<p>LornaMorris: </p>
<hr />
<div>==Administration==<br />
<br />
An XML document can be uploaded to the eXist database even if the document is not valid according to the schema used. However it is mandatory that the document is well formed, otherwise errors will occur when trying to store the document. <br />
<br />
But before the documents can be stored, a closer look at the collection structure within eXist is needed. The [[Glossary|reBiND Framework]] depends on two collections for managing the different data projects. One of the collections is for the unpublished data projects, which are still being corrected, reviewed or otherwise prepared for publication. The other collection is for the published projects, which can be publicly searched and accessed. Within this document they will be referred to as ''unpublished'' and ''published''. Both of these collections are located in the root collection of eXist. Within these collections are the collection for the individual data projects. Having the two collections for unpublished and published data projects makes the security configuration also quite easy, since the security settings for these two collections are automatically inherited by data projects within these.<br />
<br />
Instructions on how to create projects via the reBiND user interface have been desctibed in the [Data_upload_to_rebind_framework data archiving section].<br />
<br />
The collection structure of eXist, showing the unpublished and published project collections is shown below:<br />
<pre><br />
/db/<br />
├─ (eXist default)<br />
├─ unpublished/<br />
│ ├─ (data-project-name 1)/<br />
│ │ ├─ data.xml<br />
│ │ ├─ metadata.xml<br />
│ │ ├─ original-data.xls<br />
│ │ └─ images/<br />
│ │ ├─ image_001.jpg<br />
│ │ ├─ image_002.jpg<br />
│ │ └─ ...<br />
│ └─ (data-project-name 2)/<br />
│ └─ ...<br />
└─ published/<br />
└─ (data-project-name 3)/<br />
├─ data.xml<br />
├─ metadata.xml<br />
├─ original-data.xls<br />
├─ images/<br />
│ ├─ image_001.jpg<br />
│ ├─ image_002.jpg<br />
│ └─ ...<br />
└─ multimedia/<br />
├─ movie1.mov<br />
├─ movie2.avi<br />
└─ ...<br />
</pre><br />
<br />
The majority of the database administration can be done through the eXist Java webstart client. For example, the client can be used to add new user accounts, modify permissions to different database collections and delete/edit/query the data.<br />
<br />
==User management==<br />
<br />
Users can be created via the eXist Java admin client. For example creating a new user for the contributing scientist is shown in the screenshot below.<br />
<br />
[[File:Create_exist_user.PNG|border]]<br />
<br />
In the above example the 'contributing scientist' is allocated to the group 'users', so the /db/unpublished/ directory within the eXist database should be made accessible to the group 'users'.<br />
<br />
==Correction Manager==<br />
<br />
Though all of the corrections and modifications to the data document could be done using XQuery and the XQuery Update Facility, it was decided to not have the corrections run in XQuery directly. The Correction Manager is written in Java. It is only loosely coupled to eXist in order to make the Framework more modular. Instead of a document being directly accessed within eXist by the Correction Manager, it will be exported by the custom XQuery function to a regular XML file on the file system of the server. This file is then handed over to the Correction Manager. The source code for the eXist module that interacts with the Correction Manager is available from our [http://ww2.biocase.org/svn/rebind/trunk/reBiND-eXist-Module/ subversion repository]. <br />
<br />
The actual corrections are not done by the Correction Manager, but by individual Correction Modules which are managed by the Correction Manager. The source code for the Correction Manager and correction modules is available from our [http://ww2.biocase.org/svn/rebind/trunk/reBiND-CorrectionManager/ subversion repository]. <br />
<br />
===Writing Custom Correction Modules===<br />
A Correction Module is a Java Class which implements a specific Java Interface. It is possible to extend the correction manager by adding a new module (this should implement the class [http://ww2.biocase.org/svn/rebind/trunk/reBiND-CorrectionManager/src/org/bgbm/rebind/correction/modules/ Module]). <br />
<br />
{{CodeExample|lang=Java| 1=<br />
package org.bgbm.rebind.correction.modules;<br />
<br />
import java.io.File;<br />
<br />
public interface Module {<br />
    public String process(File inputFile, File outputFile, String[][] settings);<br />
}<br />
|description=The Java code of the Module Interface.}}<br />
<br />
A new module can be added to handle a specific problem in the XML documents of a particular reBiND instance by writing a new class that implements the interface and placing the compiled class in a specific folder. The eXist server has to be restarted but does not have to be rebuilt, which makes it very easy to add new correction modules. This was one of the reasons why the corrections are done in Java code and not in XQuery; Java is also much more widely used and understood than XQuery. Furthermore, the Correction Manager can be used in stand-alone mode, either through an independent GUI tool or by calling the appropriate correction functions via the command line. This requires specifying the ABCD data file and the configuration file (which defines the correction modules to be run) and then calling the startCorrection method of the CorrectionManager class.<br />
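As an illustration, a minimal custom module might look as follows. The class name and its behaviour (trimming trailing whitespace from each line) are invented for this sketch; a real module would typically repair a concrete validation problem in the ABCD document.<br />
<br />
```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.List;
import java.util.stream.Collectors;

// The Module interface, repeated here so that the sketch is self-contained.
interface Module {
    String process(File inputFile, File outputFile, String[][] settings);
}

// Hypothetical example module: copies the document while trimming
// trailing whitespace from every line. Illustrative only.
public class WhitespaceTrimModule implements Module {
    @Override
    public String process(File inputFile, File outputFile, String[][] settings) {
        try {
            List<String> cleaned = Files.readAllLines(inputFile.toPath()).stream()
                    .map(line -> line.replaceAll("\\s+$", ""))
                    .collect(Collectors.toList());
            Files.write(outputFile.toPath(), cleaned);
            return "OK: processed " + cleaned.size() + " lines";
        } catch (IOException e) {
            return "ERROR: " + e.getMessage();
        }
    }
}
```
The compiled class would then be placed in the module folder as described above.<br />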
<br />
===Modifying the correction configuration file===<br />
The correction configuration file specifies which correction modules are run and in which order. It is an XML file stored in a special collection within eXist; the default location is /db/rebind/correction and the default name is default-correction.xml. By creating different configuration files the user can specify different checks: for example, the default configuration checks for all possible errors that have been seen in ABCD files, while another configuration might only check whether the ISO dates are formatted correctly. Details of how to modify the configuration file and an explanation of the function of the implemented modules are [[Correction Modules|described here]].<br />
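Purely as an illustration of the idea (the element and attribute names below are invented for this sketch and are '''not''' the authoritative format; the real layout is documented on the Correction Modules page), such a configuration file can be imagined as a simple list of module classes in execution order:<br />
<br />
```xml
<!-- Hypothetical sketch only: element and attribute names are invented. -->
<correction>
  <module class="org.bgbm.rebind.correction.modules.SomeDateModule"/>
  <module class="org.bgbm.rebind.correction.modules.SomeOtherModule"/>
</correction>
```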
<br />
==Using a different metadata format== <br />
<br />
We chose Ecological Metadata Language (EML) for creating additional metadata to associate with our published ABCD datasets. EML is used by researchers to document a typical dataset in the ecological sciences. We chose to use [[Ecologial_Metadata_Language| a sub-set of EML to describe the core features of our datasets]]. We also investigated [[Metadata| several other metadata standards]] before selecting EML as the most appropriate for our purpose.</div>LornaMorrishttps://wiki.bgbm.org/rebind_documentation/index.php?title=Export_DataPerfect&diff=580Export DataPerfect2014-11-19T00:45:11Z<p>LornaMorris: </p>
<hr />
<div>This article describes how to export data from a DataPerfect database, based on the Rohwer data set. <br />
<br />
DataPerfect is a DOS-based database system. Its latest release is 2.6Y from June 2008. It can be downloaded via http://dataperfect.nl/ . <br />
<br />
=== Install and run DOSBox ===<br />
: To run DataPerfect a DOS emulator is needed. The free emulator DOSBox works quite well. Go to http://www.dosbox.com/download.php?main=1 or http://sourceforge.net/projects/dosbox/, download the latest release and install it. For this article DOSBox 0.74 will be used and installed under Windows 7. <br />
<br />
=== Download DataPerfect ===<br />
: Go to http://dataperfect.nl/ and download DataPerfect. Unzip the files into a specific folder. The folder used for this article is ''C:/DOS/DP26Y''.<br />
<br />
=== Copy your data files into the DataPerfect Folder===<br />
: You can either copy them directly into the folder or create a subfolder for your data files and copy them in there. The data files for this article are located in the directory ''DIAS/'' within the DataPerfect folder.<br />
<br />
=== Start DOSBox===<br />
: When starting DOSBox a second console window is opened. When running several instances of DOSBox these additional windows can clutter the task bar quite a bit. To avoid this, start the version ''DOSBox-0.74/Extras/DOSBox 0.74 (noconsole)'' from the Program Menu.<br />
[[File:DPE01_DOSBox.png|frame|none|The DOSBOX Start Window]]<br />
<br />
=== Mount the DataPerfect Folder===<br />
: After starting DOSBox the directory of the DataPerfect files needs to be mounted as a virtual drive. Type '''mount c C:\DOS\DP26Y''' to mount the folder and '''C:''' to change to the drive. <br />
[[File:DPE02_DOSBox_mount.png|frame|none|Mounting the DataPerfect Folder]]<br />
<br />
=== Start DataPerfect===<br />
: Start DataPerfect by typing '''DP26YU''' and pressing Enter.<br />
{|<br />
|[[File:DPE03_DataPerfect_start.png|frame|none|Switch to C: and start DataPerfect]]<br />
|[[File:DPE04_DataPerfect_welcome.png|frame|none|The Welcome Screen of DataPerfect]]<br />
|}<br />
<br />
=== Change to the folder of your data files===<br />
: If the data files are not in the directory of DataPerfect you need to change the directory. Press '''2''' to change the directory and type in the path to the directory of your data. <br />
{|<br />
|[[File:DPE05_DataPerfect_open.png|frame|none| The initial screen of DataPerfect]]<br />
|[[File:DPE06_DataPerfect_change_directory.png|frame|none|Change to the data directory]]<br />
|}<br />
<br />
=== Open the data files===<br />
: Once you are in the correct directory you will see the DataPerfect data sets within the folder. Use the cursor keys to select the correct data set and press Enter.<br />
[[File:DPE07_DataPerfect_select_DIAS.png|frame|none|Selecting the DIAS data set.]]<br />
<br />
=== Select the table to export ===<br />
You can now see the tables within this database project. Select the table you want to export by using the up or down keys and pressing enter. <br />
{|<br />
|[[File:DPE08_DIAS_Overview.png|frame|none|The list of available tables.]]<br />
|[[File:DPE09_DIAS_Table1.png|frame|none|The first entry in the first table of DIAS]]<br />
|}<br />
=== Navigate the Table ===<br />
''These steps are not necessary for the export, but will be documented here anyway.'' <br />
Here are the important keys for navigating through a table: <br />
* '''Tab''' highlights the next field<br />
* '''Down''' on a field which references entries in another table displays the referenced entry<br />
* '''Up''' opens the list of entries in this table in which the content of the current field is displayed. Navigate through this list using the '''Up''' and '''Down''' keys. When typing characters in this list, the focus jumps to the entry whose unique key column matches the typed characters. For example, in the table displayed in the image below, typing a number will focus on the entry with that id. The corresponding entry is automatically displayed. To edit this entry, press '''Enter'''. <br />
* '''F7''' goes up one level in the hierarchy; for example, if the list of entries is open, it returns to the entry view. <br />
[[File:DPE10_DIAS_Table1_browse.png|frame|none|Browsing through the first table of DIAS]]<br />
<br />
=== Select Report ===<br />
From the list of available reports select the entry at the top: '''Build-In Short Reports'''.<br />
[[File:DPE11_DIAS_Table1_Report_List.png|frame|none|The list of available reports.]]<br />
<br />
=== Set Export Properties ===<br />
Set the export settings as shown in the first image below. Press the number of the property you want to change. For example, to change the file name of the output file, press '''2''' and the file options will appear, as can be seen in the second image. Press '''1''' to create a new file and enter the name of the file (image 3). <br />
The filename of the export file must follow the DOS 8.3 convention: at most 8 characters before the dot, and a file extension (without the dot) of at most 3 characters.<br />
{|<br />
|[[File:DPE12_DIAS_Table1_Export_1.png|frame|none|Export Settings Overview]]<br />
|[[File:DPE13_DIAS_Table1_Export_2.png|frame|none|Changing the File Options]]<br />
|[[File:DPE14_DIAS_Table1_Export_3.png|frame|none|Changing the File Name]]<br />
|}<br />
<br />
=== Run Export ===<br />
To start the export press '''Shift + F7''' again. The screen now shows a counter of how many elements from the table have already been exported. DOSBox starts with a limited CPU speed for the programs running in it. The speed can be increased by pressing '''Ctrl + F12'''. Increasing the speed too much, however, creates a large overhead which will also result in a slower export: within just 2-3 increase steps the CPU load can jump from around 5% of one core to 100%. The CPU load should therefore be watched when increasing the speed of DOSBox. On a dual-core 3 GHz processor a speed of 50000-55000 cycles (shown in the title bar of the DOSBox window) appears to give a good export speed without overhead. <br />
[[File:DPE15_DIAS_Export.png|frame|none|The export is running.]]<br />
<br />
After the export is done, you will see the list of available reports again. To step to the next higher level of the hierarchy press '''F7'''. Repeat this until you see the list of tables within this database project again. Now repeat the export process for all the other tables you want to export. <br />
<br />
=== Converting special characters ===<br />
The best way to handle special characters is to know the character encoding used by the original file system. On a DOS-based system, the command '''CHCP''' will display which character code page is used. With this knowledge the file can easily be converted. In DOSBox the command is '''KEYB'''; however, it only helps if all of the special characters are displayed correctly within DataPerfect. In the case of the Rohwer data set, the KEYB command showed code page 437 while code page 850 was actually used; most of the special characters are, however, identical between the two sets. <br />
<br />
Once the character encoding is known, the exported file can be converted by a program that is able to read that encoding. Under Windows, Notepad++ does a good job. After opening the file (the special characters will probably be distorted), select the original character encoding as the encoding of the file and then convert the file to UTF-8. <br />
<br />
Google Refine is another software that is able to read the CP 850 encoding and allows for further processing of the files.<br />
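If no graphical tool is at hand, the conversion can also be scripted. The following Java sketch is our own illustration (the class name and the ''.utf8'' output suffix are arbitrary choices); it decodes a file as code page 850 and re-encodes it as UTF-8:<br />
<br />
```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical one-off converter: reads the file given as first argument
// using code page 850 and writes a UTF-8 encoded copy next to it.
public class Cp850ToUtf8 {
    public static void main(String[] args) throws Exception {
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[0] + ".utf8");
        // Decode the raw bytes using CP850 (registered as "IBM850" in the JDK)...
        String text = new String(Files.readAllBytes(in), Charset.forName("IBM850"));
        // ...and write them back out as UTF-8.
        Files.write(out, text.getBytes(StandardCharsets.UTF_8));
    }
}
```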
<br />
Additional information about code pages can be found in the Wikipedia article ''[http://en.wikipedia.org/wiki/Code_page Code page]''. The article ''[http://www.joelonsoftware.com/articles/Unicode.html The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)]'' by Joel Spolsky is also very helpful. <br />
<br />
In retrospect we now know that the Rohwer data set is encoded using code page 850 (also called OEM 850), although some individual characters are in the wrong encoding. This knowledge is the result of a process in which we parsed the file, compiled a list of all the different characters used, looked them up and converted them by hand. Though this process is a lot more work, it works even in the case of mixed character sets. It is documented in detail below.<br />
<br />
<br />
==== Converting characters individually ====<br />
The export out of DataPerfect has problems with special characters, which are not converted correctly into a proper character set. Though all of the occurrences of such special characters can be replaced automatically, each character to be replaced has to be defined manually once. The files as generated by DataPerfect use the ANSI character encoding. This needs to be converted to UTF-8. Under Windows the free program Notepad++ is well suited for this. When selecting the menu item '''Encoding''' (''Kodierung'' in the image, because it is the German version of Notepad++), the entry ''ANSI'' should be marked as the current encoding. Now click on '''Convert to UTF-8''' and save the file. <br />
[[File:DPE16 Notepad CharSet.png|frame|none|Converting the Character Set in Notepad++]]<br />
The next step is to run the file through a program that converts the characters. The small Java program CharReplacer was written to do just that. The program consists of two files: CharReplacer.class (the actual program) and CharReplacer.settings (the settings file, which specifies what characters to replace). The first line in ''CharReplacer.settings'' contains all the characters which will not be replaced. Each of the following lines has the numeric value of the character to be replaced, followed by a tab and the character it will be replaced with.<br />
<br />
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",.-; ()?\':/=&[]!`<>#+%<br />
8222 ä<br />
8221 ö<br />
129 ü<br />
225 ß<br />
381 Ä<br />
8482 Ö<br />
353 Ü<br />
Excerpt from the ''CharReplacer.settings'' file.<br />
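The replacement logic itself can be sketched as follows. This is illustrative only: the real CharReplacer loads its rules from the settings file and additionally reports the file, line and column of unknown characters.<br />
<br />
```java
import java.util.Map;

// Minimal sketch of the replacement logic performed by CharReplacer.
// The allowed characters and the rules are hard-coded for illustration;
// the real program loads them from CharReplacer.settings.
public class CharReplacerSketch {
    static final String ALLOWED =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789\",.-; ()?\\':/=&[]!`<>#+%";
    static final Map<Integer, String> RULES =
            Map.of(8222, "ä", 8221, "ö", 129, "ü", 225, "ß");

    static String replace(String line) {
        StringBuilder out = new StringBuilder();
        line.codePoints().forEach(cp -> {
            if (ALLOWED.indexOf(cp) >= 0) {
                out.appendCodePoint(cp);      // allowed character: keep it
            } else if (RULES.containsKey(cp)) {
                out.append(RULES.get(cp));    // known bad character: replace it
            } else {
                // Unknown character: keep it and warn, so a new rule can be added.
                System.err.println("Unknown character with code " + cp);
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }
}
```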
<br />
To run the CharReplacer program, the Java Runtime Environment (JRE) must be installed on the computer. The program must be run via the command line interface (also known as console). <br />
<br />
Under Windows the Command Prompt (cmd.exe) can be used, but it needs to be adjusted to show the UTF-8 characters. <br />
<br />
''(The following steps displayed in italics only need to be done if the Windows Command Prompt is used.)''<br />
* Copy the files CharReplacer.class and CharReplacer.settings in the folder in which the exported files are (in the example from above it is C:\DOS\DP26Y\DIAS\)<br />
* Start the console.<br />
* Change to the directory in which the files are.<br />
* ''Set the Font for the Command Prompt to '''Lucida Console'''''<br />
* ''Switch the Character Set for the console by typing '''chcp 65001'''''<br />
* Run the program by typing '''java -Dfile.encoding=UTF-8 CharReplacer''' followed by the name or names of the files you want to run the program on, e.g. ''java -Dfile.encoding=UTF-8 CharReplacer OUTPUT1.EXP'', and hit Enter.<br />
<br />
{|<br />
|[[File:DPE17 Console change Font.png|frame|none|Changing the Font in the Windows Command Prompt (German version)]]<br />
|[[File:DPE18 Console start CharReplacer.png|frame|none|Running the CharReplacer]]<br />
|}<br />
<br />
The output of the program will be a file with the name of the input file, but with an additional '''.csv''' extension at the end. So '''OUTPUT1.EXP''' will become '''OUTPUT1.EXP.csv'''.<br />
<br />
If the program finds a character which is not in the list of allowed characters (the first line of the file) and for which no replacement rule exists, it will not replace this character, but it will show a message informing the user that an unknown character was found, where it was found (file, line and column), what it looks like in UTF-8 and what its code is. If this occurs it is necessary to review the file at this position and create a new rule on how to handle that character. If the program prints out an unknown character, open the file and look at the given position. If the character is correct and represents precisely what was meant in the original dataset, then add this character at the end of the first line in the settings file. If it is a wrong character, however, try to figure out what character it is meant to be. This can often be derived from the context in which the unknown character appears. In the example below, the unknown character is displayed as '''†''' and appears in the word '''"R†dhusplassen"'''. People who are familiar with Norwegian might recognize the word as '''"Rådhusplassen"''' (the Norwegian word for "City Hall Place" or "Town Hall Place"). In other cases it may not be so clear, so it becomes necessary to look up the entry in the original DataPerfect table and see what character was originally entered. In the third image it can be seen that in this case the unknown character is indeed an '''å'''. A new line can therefore be added at the end of the settings file with the following content: <br />
8224 å<br />
The 8224 is the code for the character, as shown in the message of the CharReplacer program in the first image. <br />
<br />
To edit the ''CharReplacer.settings'' file, use a regular text editor (like Notepad++ under Windows).<br />
<br />
{|<br />
|[[File:DPE19 Console run CharReplacer.png|frame|none|some unknown characters were found]]<br />
|[[File:DPE20_Notepad_Special_Characters.png|frame|none|Viewing the unconverted character in Notepad++]]<br />
|[[File:DPE21 DataPerfect Special Characters.png|frame|none|Viewing the unconverted character in the original data entry in DataPerfect.]]<br />
|}<br />
<br />
=== Documenting Foreign Key Relations ===<br />
When the data is exported out of DataPerfect the associations between the tables are lost, so it is important to take a look at the exported data and document the relations between the tables: which columns are unique keys for their table and which columns are foreign key links to other tables. Looking at the tables in DataPerfect is helpful in this context, as the UI of DataPerfect sometimes shows columns which are not part of the export of that table, meaning they must be loaded from a different table. Sometimes, when selecting a field, DataPerfect also opens the window of another table, making it obvious that this column is a foreign key connection.<br />
<br />
=== Importing into a modern relational database ===<br />
Once the data is exported, converted into UTF-8 and the foreign key relations have been analyzed, it is possible to import the data into a modern relational database, so it can be accessed by the BioCASe Provider Software. To do this, one must first create a database and the respective tables. The columns of the tables must be prepared in advance: the maximum length allowed for a column must not be smaller than the longest entry in that column, and the data types must be correct. Though it is possible to store any kind of data in a text field, it is more useful to store each value using its actual data type. <br />
<br />
=== Next Steps ===<br />
Depending on how the tables are structured, it might become necessary to do a controlled denormalization so that the entries can be easier mapped to ABCD. This is described on the [http://wiki.bgbm.org/bps/index.php/Preparation Preparation page] of the [http://wiki.bgbm.org/bps BioCASe Provider Software Wiki].<br />
<br />
The other following steps are also described at the BioCASe Provider Software Wiki, like the mapping of the ABCD concepts. <br />
<br />
=== Alternative Ways of exporting data from a DataPerfect database ===<br />
Another way of exporting DataPerfect files is the [http://dans-dp-lib.sourceforge.net/ DANS DataPerfect Library]. There is a reference implementation of a [http://dans-dp-lib.sourceforge.net/Dp2MySqlExport.java DP2MySQLConverter]. After testing it on small sample databases, it worked fine, but it ran into problems exporting the DIAS database, causing huge and corrupt output files. For other DataPerfect files it could still be useful, especially since it already handles the special characters correctly.</div>LornaMorrishttps://wiki.bgbm.org/rebind_documentation/index.php?title=Installation&diff=579Installation2014-11-19T00:43:54Z<p>LornaMorris: /* Using the eXist Java admin client to prepare the database */</p>
<hr />
<div>==Installation==<br />
<br />
Before you use the reBiND archiving system the data must be transformed into a suitable XML format. We use ABCD format ([[ABCD_in_reBiND]]) and to create ABCD XML files we recommend using the Biocase Provider Software (BPS). To install this software please follow the [http://wiki.bgbm.org/bps/index.php/Installation installation instructions in the BPS wiki].<br />
<br />
===reBiND Data Portal Prerequisites===<br />
<br />
To be able to install the reBiND archiving system you first need to [http://exist-db.org/exist/apps/doc/quickstart.xml install the eXist database], version 2.0 or later. The eXist database requires 512 MB of RAM and about 200 MB of disk space. The critical requirement is that Java is installed. We have tested the reBiND Data Portal with eXist version 2.1 and Java version 1.7. All software components have been tested on Windows 7 and Debian GNU/Linux 6.<br />
<br />
===Installing the reBiND Data Portal===<br />
<br />
The following steps should be carried out to install the reBiND components. After each step we've provided the location of the directory on Windows where the files should be added or modified, assuming that eXist has been installed on the C drive at C:\eXist-db.<br />
<br />
# The rebind_module.jar is available from our subversion repository at: [http://ww2.biocase.org/svn/rebind/trunk/reBiND-eXist-Module/rebind-module.jar subversion rebind module]. Add the rebind_module.jar to the eXist/lib/user directory. (on Windows C:\eXist-db\lib\user). <br />
# Copy the folder called rebind ([http://ww2.biocase.org/svn/rebind/trunk available from our subversion repository]) to eXist/webapp (on Windows C:\eXist-db\webapp).<br />
<br />
===Configuring the reBiND Data Portal===<br />
<br />
<ol><br />
<li>Register the rebind-module by editing the eXist configuration file - eXist/conf.xml (on Windows C:\eXist-db\conf.xml). The following lines should be added within the <builtin-modules> element in the conf.xml:</li><br />
<br />
<syntaxhighlight><br />
<!-- modules for reBiND --><br />
<module class="org.exist.xquery.modules.datetime.DateTimeModule" uri="http://exist-db.org/xquery/datetime"/><br />
<module class="org.exist.backup.xquery.BackupModule" uri="http://exist-db.org/xquery/backups"/><br />
<!-- Custom reBiND Module --><br />
<module class="org.bgbm.rebind.exist.module.RebindModule" uri="http://rebind.bgbm.org/code/module"/><br />
</syntaxhighlight><br />
<br />
Consult the [http://exist-db.org/exist/apps/doc eXist documentation] for further details about the module system.<br />
<br />
<li>Add the settings.conf file to the eXist-root (available from our subversion repository at http://ww2.biocase.org/svn/rebind/trunk/configuration/settings.conf). On Windows C:\eXist-db\settings.conf)</li><br />
<li>(Optional) In eXist-root/webapp/controller.xql replace lines 37-40 (within the <dispatch> element) with:</li><br />
<syntaxhighlight><forward url="rebind/"/></syntaxhighlight><br />
This ensures the host URL forwards to the homepage with the login to the reBiND portal.<br />
<li>In eXist-root/webapp/WEB-INF/web.xml add the following at line 371:</li><br />
<br />
<syntaxhighlight><filter-mapping><br />
<filter-name>XFormsFilter</filter-name><br />
<url-pattern>/rebind/*</url-pattern><br />
</filter-mapping><br />
</syntaxhighlight><br />
</ol><br />
<br />
===Using the eXist Java admin client to prepare the database===<br />
<br />
<ol><br />
<li>Open the Java webstart admin client. There are several ways of launching the client. These are [http://exist-db.org/exist/apps/doc/java-admin-client.xml documented in detail on the eXist web-site].</li><br />
<li>When prompted enter the password for the admin user. This should have already been set during eXist installation.</li><br />
<li>Download the eXist db directory [http://ww2.biocase.org/svn/rebind/trunk/exist-db/db from our svn repository] (this contains database files that must be stored directly in eXist).</li><br />
<li>In the admin client navigate to db/apps and create a new collection called eXide2.</li><br />
<li>In the Tools menu of the admin client click 'Restore'. Then use the file browser to select the file to restore - in this case db/apps/eXide2/_contents_.xml from svn (see step 3).</li><br />
<li>Navigate back to the base collection in the admin client and create a collection called 'rebind'. Create the following child collections of 'rebind':<br />
<br />
*correction<br />
*protected<br />
*public<br />
*schema<br />
*template<br />
*xsl<br />
</li><br />
<br />
<li>Navigate to each child collection and restore the corresponding data files from svn as in step 5.</li><br />
</ol> <br />
For example navigate to /db/rebind/correction and restore the data from http://ww2.biocase.org/svn/rebind/trunk/exist-db/db/rebind/correction<br />
<br />
(Note the public and protected collections listed above remain empty until data is stored in eXist via the reBiND workflow.)</div>LornaMorrishttps://wiki.bgbm.org/rebind_documentation/index.php?title=Ecological_Metadata_Language&diff=576Ecological Metadata Language2014-11-19T00:34:51Z<p>LornaMorris: LornaMorris moved page Ecologial Metadata Language to Ecological Metadata Language</p>
<hr />
<div>==EML - Ecological Metadata Language ==<br />
<br />
EML - Ecological Metadata Language is a detailed structured XML metadata specification for earth, environmental and ecological sciences. EML consists of a collection of modules based on XML document types.<br />
<br />
The modules can be used in a flexible way. The following types of modules are provided:<br />
<br />
* Top-level resource modules, distinguishing dataset, literature, software and protocol resources. These are compatible with Dublin Core syntax.<br />
* Supporting modules<br />
* Data organization modules<br />
* Entity types<br />
* Utility modules<br />
<br />
<br />
More information about [http://knb.ecoinformatics.org/#external//emlparser/docs/index.html EML] is available online.<br />
<br />
<br />
Due to the complexity of EML we chose those EML elements which are provided by the Morpho tool (http://knb.ecoinformatics.org/software/dist/MorphoUserGuide.pdf), offering the minimum amount of documentation necessary for the description of a data package in EML.<br />
These are:<br />
<br />
* Title and Abstract (Description)<br />
* Keywords<br />
* Owner/Contact (People and Organizations)<br />
* Usage Rights<br />
* Coverage Details (geographical, temporal, taxonomic)<br />
* Methods</div>LornaMorrishttps://wiki.bgbm.org/rebind_documentation/index.php?title=Ecologial_Metadata_Language&diff=577Ecologial Metadata Language2014-11-19T00:34:51Z<p>LornaMorris: LornaMorris moved page Ecologial Metadata Language to Ecological Metadata Language</p>
<hr />
<div>#REDIRECT [[Ecological Metadata Language]]</div>LornaMorrishttps://wiki.bgbm.org/rebind_documentation/index.php?title=Overview_rebind_workflow&diff=575Overview rebind workflow2014-11-19T00:33:53Z<p>LornaMorris: /* Overview of the reBiND workflow */</p>
<hr />
<div>=== Overview of the reBiND workflow === <br />
<br />
[[File:Architecture-Concept.png|thumb|center|1000px|The general structure of the reBiND Framework. Blue solid lines indicate how the document is transformed and processed, orange dashed lines indicate user interaction or input.]]<br />
<br />
This figure shows the general structure of the reBiND processing architecture. It shows each step in the workflow from submission of a dataset, preparation and processing to its final publication.<br />
<br />
Before the data can be uploaded into the reBiND data portal several steps are required to prepare the data and map it to an appropriate schema. We have used the ABCD - Access to Biological Collections Data - schema ([[ABCD_in_reBiND|ABCD in reBiND]]). ABCD is a common data specification for biological collection units, including living and preserved specimens and field observations. <br />
<br />
The majority of data we received from contributing scientists was in spreadsheet format, which can easily be imported into a relational database. Once the data is in a relational database we used the BioCASe Provider Software (BPS) [http://wiki.bgbm.org/bps]. The BPS supports many different SQL-based databases, and these databases offer imports for different file types. In order to generate the XML files, the columns of the relational database have to be mapped to the corresponding concepts of ABCD. At this point the expert knowledge of the contributing scientists is needed. At the end of the mapping process an ABCD XML document is generated. <br />
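As a rough illustration of the mapping step, the sketch below (Python, with invented column names and a greatly simplified ABCD-like element structure; the real mapping is configured interactively in the BPS against the full ABCD schema) shows how database columns correspond to XML concepts:<br />

```python
import xml.etree.ElementTree as ET

# Hypothetical rows as they might look after importing a spreadsheet
# into a relational database (column names are invented for illustration).
rows = [
    {"catalogue_no": "B-1001", "scientific_name": "Abies alba Mill."},
    {"catalogue_no": "B-1002", "scientific_name": "Picea abies (L.) H.Karst."},
]

# Map each column to a simplified ABCD-like element. In the real workflow
# this mapping is configured in the BPS, not written by hand.
units = ET.Element("Units")
for row in rows:
    unit = ET.SubElement(units, "Unit")
    ET.SubElement(unit, "UnitID").text = row["catalogue_no"]
    ET.SubElement(unit, "FullScientificNameString").text = row["scientific_name"]

xml_doc = ET.tostring(units, encoding="unicode")
```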
<br />
Once the data has been converted into an XML format it can be uploaded onto the reBiND web portal. After the XML document has been uploaded, the correction process can be started by the Content Administrator. <br />
<br />
The grey box in the figure highlights the steps between the upload of the data into the reBiND portal and the validation, correction and review steps prior to publication of the data. The Correction Manager runs several correction modules, each for a specific purpose. When any of the modules makes a change to the document or encounters a problem, the issue is recorded in a separate document so that it can later be reviewed. When the modules have finished running, the corrected document is loaded back into the reBiND system. At this stage the document should be valid; if the correction modules were unable to fix all of the problems encountered, the remaining validation errors are marked.<br />
<br />
The next step is the review. In the issue list produced by the Correction Manager, issues of three different severity levels are flagged. These are: <br />
* information (a change was made that is not expected to cause any problem)<br />
* warning (a change has been made or a problem with the content has been detected that cannot be corrected automatically, but it has no consequence for the validity of the document)<br />
* error (a problem with the content has been detected that causes the document to be invalid and cannot be fixed automatically).<br />
<br />
The issues should be reviewed. Some of the problems could be the result of technical issues and may be fixed by specifying new correction modules. Other problems could be caused by content errors, and therefore discussion with the contributing scientist might be necessary in order to fix these.<br />
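As a minimal sketch of how such an issue list might be represented (the names here are hypothetical; the actual Correction Manager is part of the reBiND Java code):<br />

```python
from dataclasses import dataclass

# The three severity levels used in the Correction Manager's issue list.
INFORMATION, WARNING, ERROR = "information", "warning", "error"

@dataclass
class Issue:
    severity: str
    message: str

# Hypothetical issues as they might be recorded during a correction run.
issues = [
    Issue(INFORMATION, "Normalised the date format of a gathering date."),
    Issue(WARNING, "Country name not recognised; left unchanged."),
    Issue(ERROR, "Mandatory element UnitID is missing and could not be derived."),
]

# Only 'error' issues leave the document invalid; they must be fixed
# technically or in discussion with the contributing scientist.
blocking = [issue for issue in issues if issue.severity == ERROR]
document_valid = not blocking
```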
<br />
Independently of the review, a metadata document has to be created. This can be done via a dedicated web form where the user can enter information describing the dataset. The data from this form is saved in another XML file (in [[Ecologial_Metadata_Language|EML format]]).<br />
<br />
Once the metadata document has been created and the changes and notes of the automated corrections have been reviewed, the data project can be published. This makes the now valid document and the metadata document publicly available. The data can now be accessed via a search form on the homepage or via data networks. In the case of the reBiND Service, the ABCD documents can be accessed via biodiversity networks like GBIF and BioCASe. A specialised module translates the BioCASe Protocol, in which the queries from these two networks are sent, into XQuery and then returns the parts of the documents that are relevant to the query. <br />
<br />
After this overview of the processing architecture of the reBiND Framework, this text will now take a closer look at the individual steps.</div>
<hr />
<div>==Data Rescue - outdated software and hardware==<br />
<br />
This is not a core part of the workflow; however, during the project we gained experience with transforming data stored in old and outdated software and hardware formats.<br />
<br />
===Software===<br />
We were able to rescue data from files in a variety of outdated formats. Details are described in the list below:<br />
* [[File Type Overview|Harvard Graphics and Word 4]]<br />
* [[Export DataPerfect|DataPerfect]]<br />
* [[Export Paradox Data|Paradox]]<br />
* [[Export dBase Data|dBase]]<br />
<br />
===Hardware===<br />
We received some data on 3.5" floppy disks and on 5.25" floppy disks. Details on how to [[Reading 5.25" Floppy Disks|read 5.25" floppy disks]] are provided.</div>
<hr />
<div>==eXist==<br />
<br />
After an extensive evaluation of the available XML storage solutions eXist was selected as the storage solution for the reBiND project.<br />
<br />
Further information about eXist-db and how it benefits the reBiND Project can be found in the article [http://rebind.bgbm.org/exist-db Choosing eXist-db as the XML database for the reBiND project].<br />
<br />
The [http://jetty.codehaus.org/jetty/ Jetty Web Server] is included in the distribution of eXist. It already offers a simple web interface to access the data, an admin backend to manage the files and other tools which are useful for the reBiND Framework.<br />
<br />
When looking at the files stored in eXist, a hierarchical structure can be seen, comparable to a regular file system. Within eXist, '''Collections''' are the equivalent of folders in the file system. The root collection is ''/db/'' but usually this can be omitted, so ''/db/my_collection/data.xml'' is equivalent to ''/my_collection/data.xml''. XML files stored in eXist are usually called '''Documents''', since the more generic term '''Files''' also includes non-XML files, such as text files, images or binary files, which can also be uploaded into eXist. The only files that cannot be uploaded are non-well-formed XML files.<br />
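The path equivalence described above can be sketched as follows (a small Python illustration, not part of eXist's API):<br />

```python
# The root collection /db/ may be omitted when addressing documents,
# so both spellings below refer to the same stored document.
def normalise(path):
    """Make the implicit /db root collection explicit."""
    return path if path.startswith("/db/") else "/db" + path

assert normalise("/my_collection/data.xml") == "/db/my_collection/data.xml"
assert normalise("/db/my_collection/data.xml") == "/db/my_collection/data.xml"
```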
<br />
==XQuery==<br />
<br />
The web application for the reBiND data portal was coded using a combination of XQuery, XSLT, XForms, HTML and Javascript.<br />
<br />
eXist-db provides a complete platform for the development of rich web applications based on XML technologies. Detailed documentation on creating web applications using eXist and XQuery is available on the [http://exist-db.org/ eXist website]. Generating web pages directly in XQuery is possible; however, XQuery is mainly used to query and update the data stored within the eXist XML database. A combination of XSLT, HTML and Javascript is used to present the information to the user and make the web application interactive. The source code for the web application is available from [http://ww2.biocase.org/svn/rebind/trunk/rebind/ our subversion repository] and details on how to install and configure the reBiND data portal are [[Installation|available here]].<br />
<br />
The library of XQuery functions supported by eXist can be [http://exist-db.org/exist/apps/fundocs/index.html searched here].</div>
<hr />
<div>== Software products to support data preparation ==<br />
<br />
In addition to the core reBiND software [[Installation|described in the installation guide]], several other software tools were created to support data preparation through data cleaning, data substitutions and other modifications. These are outlined below.<br />
<br />
===Data Splitter===<br />
<br />
A Java-based program was written to facilitate the preparation of data where a single field in a text file needs to be split into several fields. This ‘Data Splitter’ program requires the user to specify a regular expression defining where data in a single field should be split. <br />
<br />
The user interface is shown in the screenshot below.<br />
<br />
[[File:Data_splitter.PNG|border]]<br />
<br />
For example, the user enters a regular expression in the Regex box, which atomises the data by splitting it at the specified character; in this case the input was a comma-separated value (CSV) file. In the example file the full locality information occurred in one field, but we required it to be atomised. After clicking ‘Split Data’ the original data shown in column 1 is split into several numbered columns (1-7) to the right of this data. <br />
<br />
If, after splitting, any of the atomised data needs to be re-combined, highlighting cells (for example cells 5, 6 and 7) and pressing Ctrl + J re-joins the selected columns.<br />
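The splitting and re-joining behaviour can be sketched as follows (a Python illustration with an invented locality string; the actual Data Splitter is a Java program):<br />

```python
import re

# A hypothetical locality field from one row of a CSV file.
locality = "Germany, Berlin, Dahlem, Botanic Garden, plot 4, N bed, 52m"

# Split at the separator, given as a regular expression
# (here: a comma followed by optional whitespace).
fields = re.split(r",\s*", locality)
# → ['Germany', 'Berlin', 'Dahlem', 'Botanic Garden', 'plot 4', 'N bed', '52m']

# Re-join the last three columns, analogous to selecting cells 5, 6 and 7
# and pressing Ctrl + J in the Data Splitter.
rejoined = fields[:4] + [", ".join(fields[4:7])]
```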
<br />
<br />
===Character Encoding Correcter===<br />
===Stand-alone Correction Manager===<br />
<br />
The Correction Manager has been described in detail in the context of the reBiND data portal. It can also be used in stand-alone mode. To enable this, an additional Java main method can be written that specifies the input and output files and the correction configuration file. This makes the correction modules usable directly from the command line or via an entirely different user interface, so they could be incorporated into other projects that require automated correction of XML files.</div>
<hr />
<div>= Metadata Editor Tool =<br />
<br />
Before a dataset can be published by reBiND it is necessary to either upload a metadata file or create one using the reBiND software. We have implemented a tool consisting of a series of web forms for capturing the metadata. The Metadata Editor tool is based on Ecological Metadata Language (EML). The specification for EML is available from [https://knb.ecoinformatics.org/#external//emlparser/docs/index.html 'The Knowledge Network for Biocomplexity']. We are using a subset of EML to describe the essential data most frequently used by our data providers, for example Title, Abstract, Owner/Contact and Usage Rights. Usage rights can be used to link to a code of conduct, a copyright statement and any license information. The tool can also be used to enter coverage information (geographical, temporal and taxonomic) and the scientific methods that were used to collect the data. The following screenshots show the series of forms for entering the metadata:<br />
<br />
===Metadata Description===<br />
<br />
The screenshot below shows the description form in the Metadata Capture Tool. The description should distinguish your data from other data and provide an abstract describing key features, study design or methods used in the study. Any metadata already contained in the ABCD data files - such as abcd:Title and abcd:Details - is transferred to the EML file and pre-populated in the metadata capture form, saving the user from entering the data again.<br />
<br />
<br />
[[File:Metadata1.PNG|border]]<br />
<br />
<br />
===Metadata Keywords===<br />
<br />
The screenshot below shows the Keywords form. Entering keywords enables better categorisation and searching of the data. Keywords can be entered in multiple sets by creating multiple thesauri.<br />
<br />
<br />
[[File:Metadata2.PNG|border]]<br />
<br />
<br />
===Metadata Contact/Owner===<br />
<br />
<br />
The screenshot shows the Contact/Owner form for entering details about the owner of the data.<br />
<br />
<br />
[[File:Metadata3.PNG|border]]<br />
<br />
<br />
===Metadata Usage Rights===<br />
<br />
The Usage Rights form allows you to enter a free text description describing the usage rights. It could be a standard agreement for reBiND data, such as the [http://www.biocase.org/whats_biocase/code_of_conduct.shtml Biocase Code of Conduct] or a more specific statement describing usage rights for the dataset itself, for example how to cite the data if it is re-used.<br />
<br />
<br />
[[File:Metadata4.PNG|border]]<br />
<br />
<br />
===Metadata Geographical Coverage===<br />
<br />
The screenshot shows the Geographical coverage form. Here you can enter the coordinates directly or use the zoom and drag to re-position the boundary box on the map.<br />
<br />
<br />
[[File:Metadata5.PNG|border]]<br />
<br />
<br />
===Metadata Temporal Coverage===<br />
<br />
The Temporal coverage form allows you to enter multiple date ranges to indicate when the data was gathered.<br />
<br />
<br />
[[File:Metadata6.PNG|border]]<br />
<br />
<br />
===Metadata Taxonomic Coverage===<br />
<br />
The Taxonomic coverage form allows you to add a taxonomic hierarchy or multiple taxa describing the organisms in the dataset.<br />
<br />
<br />
[[File:Metadata7.PNG|border]]<br />
<br />
<br />
===Metadata Methods===<br />
<br />
The Methods form allows you to specify the methods used in the dataset, such as field, laboratory and processing steps, sampling methods and the instrumentation used.<br />
<br />
<br />
[[File:Metadata8.PNG|border]]</div>LornaMorrishttps://wiki.bgbm.org/rebind_documentation/index.php?title=Manual_review_of_data&diff=405Manual review of data2014-11-04T18:15:29Z<p>LornaMorris: /* Reviewing corrections in the reBiND Editor */</p>
<hr />
<div>==Manual Review and Corrections==<br />
The results from the automated correction can be manually reviewed and edited via the web interface. It is especially importanat to review the errors and warnings. Errors and warnings could be caused by technical issues or with the content of the document itself. More minor changes are flagged as 'info' messages. If there are any errors or warnings it is advised to have both, the contributing scientist and the content administrator go through the review together. Errors in the document could be fixed by modifying the correction configuration and re-running the automated correction with a new configuration file.<br />
<br />
There is an online XML editor included in the eXist release, called eXide; an online demo is available on the eXist homepage. The eXide editor is a modification of the online source code editor ACE - [http://ace.ajax.org/ Cloud9 Editor]. With eXide it is possible to directly edit documents stored in the database, with features such as syntax highlighting and code folding. <br />
<br />
The eXide editor that comes packaged with eXist has been modified to create the reBiND editor. <br />
<br />
===Reviewing corrections in the reBiND Editor===<br />
<br />
A screenshot showing the results of the correction on the reBiND_Puffinus.xml data file is shown below:<br />
<br />
<br />
[[File:Correction_output_review.png|border]]<br />
<br />
<br />
In the left-hand panel a list of 'Issues' is displayed, and in the main editor window the data file is shown. Clicking on any 'Issue' in the left-hand panel takes the user to the corresponding change in the data file. In the example shown, the first issue in the list has been clicked. This expands the 'Issue' and shows the 'Old Content' and the 'New Content'. In this case the problem was that the XML schema required the content to be of the type xs:dateTime, but the old content only gave a year date range. The automated correction was of a type called 'Element Text Replacer'. This type of correction replaces a specific pattern (a regular expression) at a specified position within the XML file with some other text. The technical documentation details the different types of correction and [[Correction Modules|how to modify the correction modules and specify a different configuration]]. In this example the lower year is taken as the year, the date is assumed to be the 1st of January of that year, and the time is assumed to be midnight. If this change is acceptable to the reviewer, they can click the checkbox to indicate that they agree with the change. <br />
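The year-range correction described here can be illustrated with a small regular-expression replacement in Java (the pattern and replacement are illustrative, not the exact rule from the reBiND correction configuration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class YearRangeFix {
    // Replace a year range such as "1980 - 1985" with an ISO xs:dateTime value,
    // taking the lower year and assuming 1st January, midnight.
    static String toIsoDateTime(String value) {
        Matcher m = Pattern.compile("^(\\d{4})\\s*-\\s*\\d{4}$").matcher(value.trim());
        return m.matches() ? m.group(1) + "-01-01T00:00:00" : value;
    }

    public static void main(String[] args) {
        System.out.println(toIsoDateTime("1980 - 1985")); // prints 1980-01-01T00:00:00
    }
}
```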
<br />
If the change is not acceptable, the user should run another set of corrections on the original data file. To do this you need to [[Correction Modules|change the correction configuration or add new correction modules]], in consultation with a technical administrator.<br />
<br />
A modified GUI could also display only changes of a certain type (class) or changes carried out by a certain module, and could hide reviewed changes from the list. Though this might be a good way for the Content Administrator and the Technical Administrator to review the changes, the Contributing Scientist is still confronted with the XML document and required to work with it. At some point in the future a better interface for the Contributing Scientist to work with the data (e.g. automatically generated web forms to edit the data) might be desirable, but for the general infrastructure described in this text the online XML editor is sufficient. <br />
<br />
It is not necessary to review all the changes at once. The document can be stored at any time and the review process resumed later. It is therefore possible that, after the correction, one of the administrators first reviews the changes caused by technical issues or by the XML format used, leaving only the changes related to the data for the Contributing Scientist to review.<br />
<br />
<br />
After the correction is finished the file can be validated again. This time - if the correction modules have been able to fix the original errors - the file should be valid. The screenshot below shows re-running the validation on reBiND_Puffinus.xml after the correction step has been run.<br />
<br />
<br />
[[File:Validation_final.PNG|border]]</div>LornaMorrishttps://wiki.bgbm.org/rebind_documentation/index.php?title=Validation_and_Corrections&diff=395Validation and Corrections2014-11-04T18:07:19Z<p>LornaMorris: /* Running the automated correction */</p>
<hr />
<div>= Validation and Correction =<br />
<br />
Once the XML data file has been uploaded into the reBiND system the user can validate and perform automated and manual corrections to the file before publishing. Publishing the data makes it available to the public search interface and also to biodiversity networks, such as GBIF. <br />
<br />
== Validation ==<br />
<br />
The figure below shows the result of clicking on the validation action for the file reBiND_Puffinus.xml. While the validation is running, the information screen opens and displays a throbber. In the screenshot the validation is complete and the screen shows the result: there are 600 errors in the file.<br />
<br />
<br />
[[File:ReBIND_portal_validation.PNG|border]]<br />
<br />
<br />
After validation is complete it is possible to review the validation results in detail in the reBiND editor (a modified version of the eXide editor which comes bundled with the eXist software). To open this editor the user should click on the 'Edit' button in the list of actions below the data file (in this case reBiND_Puffinus.xml). This opens the data file in the editor - a screenshot of this is shown below:<br />
<br />
<br />
[[File:ReBIND_portal_validation_report.PNG|border]]<br />
<br />
<br />
In the left-hand panel a list of validation errors can be seen, and in the main editor window the data file is displayed. Clicking on any individual validation error in the left-hand panel takes the user to the corresponding error in the data file. Errors are marked with a red error icon in the left-hand margin. It is possible to make manual edits to the file to fix these errors, but with so many errors in a file this would be labour-intensive. In the next step (the automated correction) we show how these errors can be fixed automatically using the reBiND correction software.<br />
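Under the hood, this kind of check is standard XML Schema validation. A minimal Java sketch using the JDK's javax.xml.validation API follows (the toy schema and element names are invented for the example, not the actual ABCD schema):

```java
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import java.io.StringReader;

public class ValidateXml {
    // Returns null if the document is valid against the schema,
    // otherwise the first validation error message.
    static String validate(String xsd, String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (Exception e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Toy schema: a single element whose content must be an xs:dateTime.
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                   + "<xs:element name='Date' type='xs:dateTime'/></xs:schema>";
        System.out.println(validate(xsd, "<Date>1980 - 1985</Date>") == null);         // false: not a dateTime
        System.out.println(validate(xsd, "<Date>1980-01-01T00:00:00</Date>") == null); // true
    }
}
```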
<br />
== Running the automated correction ==<br />
<br />
'Start Correction' is the final action in the list below the data file. Clicking on this link takes the user to the following page:<br />
<br />
<br />
[[File:Correction_choose_config.png|border]]<br />
<br />
<br />
A drop-down menu gives a list of available configuration files. The first correction configuration ('default correction') is suitable for most ABCD files. Alternative configurations can be uploaded by the administrator to run different automated corrections. Which configuration to use could depend on, for example, what sort of errors were seen in the data file during the validation step, or whether a different XML format has been used instead of the default ABCD data.<br />
<br />
<br />
After clicking on 'Start Correction' a throbber appears as the correction modules (specified in the configuration file) are run. When the correction is complete a report is generated (see the following screenshot):<br />
<br />
<br />
[[File:Correction_output_report.png|border]]<br />
<br />
<br />
The output shows a link to the original data file, a link to an XML version of the report, and a tabular view of the report showing the number of each type of correction made. The levels 'info', 'warning' and 'error' indicate the effect of each change as follows:<br />
<br />
* info - flags any minor change to the data where no problems are expected from the change.<br />
* warning - flags a change where it is uncertain that the new value is correct and it should be checked by the content administrator.<br />
* error - flags a a problem that could not be corrected and results in the file being invalid according to the associated schema.<br />
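As a concrete illustration of how such levels can be attached to individual corrections, here is a minimal sketch of a text-replacing correction that records each change for the report (the function and the sample data are invented; the real reBiND correction modules are part of the correction software, not this code):<br />

```python
# Sketch of an 'Element Text Replacer'-style correction: every matching
# element's text is rewritten with a regular expression, and each change
# is recorded with a severity level for the correction report.
import re
import xml.etree.ElementTree as ET

def replace_element_text(root, tag, pattern, replacement, level="info"):
    """Rewrite the text of all <tag> elements; return report entries."""
    report = []
    for elem in root.iter(tag):
        old = elem.text or ""
        new = re.sub(pattern, replacement, old)
        if new != old:
            elem.text = new
            report.append({"level": level, "element": tag,
                           "old": old, "new": new})
    return report

root = ET.fromstring(
    "<Units><Unit><LowerValue> 5 </LowerValue></Unit>"
    "<Unit><LowerValue>7</LowerValue></Unit></Units>")
# Strip stray whitespace from LowerValue elements:
entries = replace_element_text(root, "LowerValue", r"^\s+|\s+$", "")
print(len(entries), "change(s) at level", entries[0]["level"])
```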
<br />
<br />
The report indicates that several changes were made, including one change of a year date to an ISO DateTime and 1640 changes to abcd:LowerValue. Both of these changes are of type 'Element Text Replacer', in which a standard pattern in the text is replaced with another. There were also 201 changes removing unnecessary empty elements. A further type of change, the 'Element Renamer', renames incorrectly named elements within the XML file. Clicking the browser's back button and then opening the reBiND editor via the 'Edit' action under the data file allows the user to review the results of the automated correction. See the next sections for details on reviewing the corrected data file.</div>LornaMorrishttps://wiki.bgbm.org/rebind_documentation/index.php?title=Manual_review_of_data&diff=403Manual review of data2014-11-04T18:05:43Z<p>LornaMorris: /* Manual Review and Corrections */</p>
<hr />
<div>==Manual Review and Corrections==<br />
The results from the automated correction can be manually reviewed and edited via the web interface. It is especially importanat to review the errors and warnings. Errors and warnings could be caused by technical issues or with the content of the document itself. More minor changes are flagged as 'info' messages. If there are any errors or warnings it is advised to have both, the contributing scientist and the content administrator go through the review together. Errors in the document could be fixed by modifying the correction configuration and re-running the automated correction with a new configuration file.<br />
<br />
An online XML editor called eXide is included in the eXist release; an online demo is available on the eXist homepage. The eXide editor is a modification of ACE, the online source-code editor from the [http://ace.ajax.org/ Cloud9 Editor]. With eXide it is possible to edit documents stored in the database directly, with features such as syntax highlighting and code folding. <br />
<br />
The eXide editor that comes packaged with eXist has been modified to create the reBiND editor. <br />
<br />
===Reviewing corrections in the reBiND Editor===<br />
<br />
A screenshot showing the results of the correction on the reBiND_Puffinus.xml data file is shown below:<br />
<br />
<br />
<br />
When the reviewer clicks on any of the items in the list, the editor jumps directly to the element that was changed within the document. A modified GUI could also allow the user to view only changes of a certain type (class), or only changes made by a certain module. It could also allow changes to be flagged as reviewed, hiding reviewed changes from the list. <br />
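The filtering and flagging behaviour suggested above could work along these lines (a hypothetical sketch only; no such API exists in the current editor, and the change records are invented):<br />

```python
# Sketch: a change list where entries can be flagged as reviewed and
# the pending view filtered by correction type.
changes = [
    {"id": 1, "type": "ElementTextReplacer", "reviewed": False},
    {"id": 2, "type": "ElementRenamer", "reviewed": False},
    {"id": 3, "type": "ElementTextReplacer", "reviewed": True},
]

def pending(changes, change_type=None):
    """Unreviewed changes, optionally restricted to one type."""
    return [c for c in changes
            if not c["reviewed"]
            and (change_type is None or c["type"] == change_type)]

print([c["id"] for c in pending(changes)])
print([c["id"] for c in pending(changes, "ElementRenamer")])
```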
<br />
Though this might be a good way for the Content Administrator and the Technical Administrator to review the changes, there remains the problem that the Contributing Scientist is confronted with the XML document and required to work with it. At some point in the future a better interface for the Contributing Scientist to work with the data might therefore be desirable (e.g. automatically generated web forms for editing the data), but for the general infrastructure described in this text the online XML editor is sufficient. <br />
<br />
It is not necessary to review all the changes at once. The document can be saved at any time and the review process resumed at a later point. It is therefore possible that, after the correction, one of the administrators first reviews the changes caused by technical issues or by the XML format used, leaving only the changes related to the data. The Contributing Scientist can then review the remaining changes.</div>LornaMorris