Edit the parser, index new elements

From Berlin Harvesting and Indexing Toolkit

First of all, we recommend using Eclipse (version Luna, for example). You will need the Maven extension.
Import the B-HIT project into Eclipse (File/New/Maven Project -> enter the location of the directory where you downloaded B-HIT).

Add a simple element (i.e. it can be added to an existing table in the database)

A) Choose where you want to save it
B) Add a new column to the table of the concept that will hold this new element.
C) Document the change in the file with the "SQL changes".
D) Edit the Java code!


For example, if you want to add an element linked to the gathering group, you will have to create a new String or int field in the Gathering.java class, create the corresponding getter and setter, and extract the value from your document (ABCD, Darwin, etc.). You will also have to edit the create and update queries for the corresponding concept: add the field name to the insert statement, add the value to the prepared statement, and check that the query has the same number of fields as values!
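As a minimal sketch of step D, assume a hypothetical new element habitatNote stored in a hypothetical habitatnote column of a gathering table (none of these names, nor the other columns, come from B-HIT). The class below shows the new field with its getter and setter, plus a helper that derives the question marks from the column list, so the insert statement always has as many values as fields:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the relevant part of Gathering.java.
class GatheringSketch {

    // New element; the name is an assumption made for this example.
    private String habitatNote;

    public String getHabitatNote() {
        return habitatNote;
    }

    public void setHabitatNote(String habitatNote) {
        this.habitatNote = habitatNote;
    }

    // Assumed column list: some existing columns plus the new one at the end.
    static final List<String> COLUMNS =
        Arrays.asList("tripleidstoreid", "country", "locality", "habitatnote");

    // Builds an INSERT statement with exactly one "?" per column.
    static String buildInsert(String table, List<String> columns) {
        String fields = String.join(", ", columns);
        String marks = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + fields + ") VALUES (" + marks + ")";
    }
}
```

If the statements in your DAO are written by hand instead, the same rule applies: count the column names and the question marks after every change.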

Add a repeatable or more complex element or group of elements (i.e. it will need a new table in the database)

1. Choose where you want to save it.
Example: add a reference group (ABCD 2.06): http://www.bgbm.org/tdwg/codata/schema/ABCD_2.06/HTML/ABCD_2.06.html#complexType_Reference_Link031A69A8
A Reference is made of 3 elements: TitleCitation, CitationDetail and URI. As a Reference can be linked to several ABCD concepts, it might make more sense to link the Reference(s) to the concept than to the whole Unit.

A) Create a new table in the database for the references, with an auto-increment ID. Insert an empty row (referenceID 1, titleCitation null, citationDetail null, URI null), because you will need a foreign key target to make the rest easier.
B) Add a new column to the table of the concept that will have a Reference (i.e. fk_referenceID), and configure it as a foreign key with the default value 1 for all the old records.
C) Document the change in the file with the "SQL changes".
D) Edit the Java code!
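Steps A and B can be sketched as three MySQL statements, held here as Java constants so they can be run over JDBC. The table name reference, the preparation table, the constraint name and the TEXT column types are assumptions for illustration; only the column names referenceID, titleCitation, citationDetail, URI and fk_referenceID are taken from the steps above. The statements can equally be pasted into any SQL client.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical schema change for a Reference linked to the preparation concept.
final class ReferenceSchemaSketch {

    // Step A: new table with an auto-increment ID (column types are assumed).
    static final String CREATE_TABLE_SQL =
        "CREATE TABLE `reference` ("
        + "referenceID INT NOT NULL AUTO_INCREMENT PRIMARY KEY, "
        + "titleCitation TEXT NULL, "
        + "citationDetail TEXT NULL, "
        + "URI TEXT NULL)";

    // Step A: the empty row that old records will point to.
    static final String INSERT_EMPTY_ROW_SQL =
        "INSERT INTO `reference` (referenceID, titleCitation, citationDetail, URI) "
        + "VALUES (1, NULL, NULL, NULL)";

    // Step B: foreign key column, defaulting to the empty row for old records.
    static final String ADD_FOREIGN_KEY_SQL =
        "ALTER TABLE preparation "
        + "ADD COLUMN fk_referenceID INT NOT NULL DEFAULT 1, "
        + "ADD CONSTRAINT fk_preparation_reference "
        + "FOREIGN KEY (fk_referenceID) REFERENCES `reference` (referenceID)";

    // Runs the statements in order; the empty row must exist before the
    // foreign key is added, otherwise the default value 1 has no target.
    static void apply(Connection con) throws SQLException {
        try (Statement st = con.createStatement()) {
            st.executeUpdate(CREATE_TABLE_SQL);
            st.executeUpdate(INSERT_EMPTY_ROW_SQL);
            st.executeUpdate(ADD_FOREIGN_KEY_SQL);
        }
    }
}
```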

  • Have a look at the src/org/binhum/abcd/Multimedia.java class. The new class could look like this:
// $Id$
/***************************************************************************
  * Copyright 2015 Global Biodiversity Information Facility Secretariat and Botanic Garden and Botanical Museum Berlin-Dahlem
  * Licensed under the Apache License, Version 2.0 (the "License"); you may not
  * use this file except in compliance with the License. You may obtain a copy of
  * the License at
  * http://www.apache.org/licenses/LICENSE-2.0
  * Unless required by applicable law or agreed to in writing, software
  * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
  * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
  * License for the specific language governing permissions and limitations under
  * the License.

***************************************************************************/
package org.binhum.abcd;

import java.util.Map;

import org.dom4j.Document;


public class Reference extends XMLutil {

   private static final long serialVersionUID = 2096267214610763427L;
   private String URI;
   private String title;
   private String detail;
   private String standard; //abcd or abcd21
   
   private Map<String, String> namespaceMap;
   private Document xmlDocument;
   
   Reference(Map<String, String> namespaceMap, String standard) {
       this.namespaceMap=namespaceMap;
       this.standard=standard;
   }
   /**
    * @return the URI
    */
   public String getURI() {
       return URI;
   }
   /**
    * @param uri the URI to set
    */
   public void setURI(String uri) {
       URI = uri;
   }
   /**
    * @return the title
    */
   public String getTitle() {
       return title;
   }
   /**
    * @param title the title to set
    */
   public void setTitle(String title) {
       this.title = title;
   }
   /**
    * @return the detail
    */
   public String getDetail() {
       return detail;
   }
   /**
    * @param detail the detail to set
    */
   public void setDetail(String detail) {
       this.detail = detail;
   }
   public Document getXmlDocument() {
       return xmlDocument;
   }
   /**
    * Stores the document and extracts the three Reference elements from it.
    * @param xmlDocument the document to read the Reference from
    */
   public void setXmlDocument(Document xmlDocument) {
       this.xmlDocument = xmlDocument;
       String base = "//" + standard + ":Reference/" + standard + ":";
       detail = getTextValue(xmlDocument, base + "CitationDetail", namespaceMap);
       title = getTextValue(xmlDocument, base + "TitleCitation", namespaceMap);
       URI = getTextValue(xmlDocument, base + "URI", namespaceMap);
   }
}
  • Save the new Reference and get its ID (i.e. referenceid = occurrenceDao.createOrUpdateReference(referenceObj);).

--> Create the createOrUpdateReference method in org/binhum/harvest/util/jdbc/dao/OccurrenceDao.java (abstract) and org/binhum/harvest/util/jdbc/dao/OccurrenceDaoImpl.java.
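The following is a rough sketch of what the insert half of such a method might look like with plain JDBC. It is not B-HIT code: the class name, the createReference signature (taking the three strings rather than a Reference object), the table name and the SQL are assumptions. It returns the ID generated by the database, and falls back to the empty row with ID 1 when the document contains no reference at all. It does not look for an existing identical reference; a real createOrUpdate method would probably do that first.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch; the real method belongs in OccurrenceDaoImpl.java.
final class ReferenceDaoSketch {

    // Row 1 is the empty placeholder reference.
    static final int EMPTY_REFERENCE_ID = 1;

    // Assumed table name; column names follow the reference table described above.
    static final String CREATE_REFERENCE_SQL =
        "INSERT INTO `reference` (titleCitation, citationDetail, URI) VALUES (?, ?, ?)";

    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // Inserts one reference and returns its generated ID.
    static int createReference(Connection con, String title, String detail, String uri)
            throws SQLException {
        if (isBlank(title) && isBlank(detail) && isBlank(uri)) {
            // Nothing to store: point the concept at the empty row.
            return EMPTY_REFERENCE_ID;
        }
        try (PreparedStatement ps =
                 con.prepareStatement(CREATE_REFERENCE_SQL, Statement.RETURN_GENERATED_KEYS)) {
            ps.setString(1, title);
            ps.setString(2, detail);
            ps.setString(3, uri);
            ps.executeUpdate();
            try (ResultSet keys = ps.getGeneratedKeys()) {
                if (keys.next()) {
                    return keys.getInt(1);
                }
            }
        }
        throw new SQLException("No generated key returned for the new reference");
    }
}
```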

  • Have a look at the class of the concept you want to link it to.
  • Add this ID to the concept you want to link it to, usually in the Unit.java class, for example:
    Preparation prepa = new Preparation();
    prepa.setTripleidstoreid(triplestoreid);
    prepa.setPreparationDate(extractionDate);
    prepa.setPreparationStaff(extractionStaff);
    prepa.setPreparationMaterials(extractionMethod);
    prepa.setPreparationType(preparationType);
    prepa.setReferenceid(referenceid);
    savePreparation(prepa);

-> Do it for each standard (get them from the public int parse(String accesspoint) method in Unit.java).
-> You will have to create the setReferenceid and getReferenceid methods in the Preparation.java class.
!! For each concept, first reset the referenceID to 1, or the reference from a previous element might end up attached to the current concept !!
You also have to update the savePreparation method (i.e. occurrenceDao.createOrUpdatePreparation(prepa);), which means editing the following elements in the OccurrenceDaoImpl.java class:

  • createPreparation: add ps.setInt(6, prepa.getReferenceid());
  • CREATE_PREPA_SQL: add the table column and a question mark to the SQL statement
  • UPDATE_PREPA_SQL: add the table column with its question mark to the SQL statement
  • createOrUpdatePreparation: add ps.setInt(6, prepa.getReferenceid()); and change ps.setInt(6, prepa.getId()); to ps.setInt(7, prepa.getId());

!! Check the field order and the number of fields/columns !!
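To make the index shift concrete, here is a sketch of how the two statements could look after the change. The actual SQL in OccurrenceDaoImpl.java is not shown on this page, so the table name and column names below are guesses derived from the setter names used in the Preparation example; only the positions (referenceid as parameter 6, the record id moved to parameter 7 in the update) come from the list above. The small counting helper is also an addition for illustration.

```java
// Hypothetical versions of the two constants after adding the foreign key.
final class PreparationSqlSketch {

    // Six columns, six question marks; fk_referenceID is parameter 6.
    static final String CREATE_PREPA_SQL =
        "INSERT INTO preparation (tripleidstoreid, preparationDate, preparationStaff, "
        + "preparationMaterials, preparationType, fk_referenceID) "
        + "VALUES (?, ?, ?, ?, ?, ?)";

    // Same six columns, then the record id as parameter 7 in the WHERE clause.
    static final String UPDATE_PREPA_SQL =
        "UPDATE preparation SET tripleidstoreid = ?, preparationDate = ?, "
        + "preparationStaff = ?, preparationMaterials = ?, preparationType = ?, "
        + "fk_referenceID = ? WHERE id = ?";

    // Counts the "?" placeholders so the number can be compared with the
    // highest index used in the ps.setXxx(...) calls.
    static int countPlaceholders(String sql) {
        int count = 0;
        for (int i = 0; i < sql.length(); i++) {
            if (sql.charAt(i) == '?') {
                count++;
            }
        }
        return count;
    }
}
```

A quick check is that countPlaceholders(CREATE_PREPA_SQL) matches the highest index set in createPreparation, and countPlaceholders(UPDATE_PREPA_SQL) the highest index set in the update branch of createOrUpdatePreparation.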


Now the new field will be stored with the next harvesting. If you want to force the indexing of the new field (the original data might not have changed since the last indexing), you will have to delete the old records from the database (use the Management tab for this).


These new fields also have to be deleted by the next update: add the query statement to the processDeletionBasedOnOccurrenceid or the processDeletionBasedOnTripleidstoreid method (depending on which ID the deletion is based on). You might also have to write a function similar to cleanAssociation to remove the unused references from the database.
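What cleanAssociation does internally is not described here, so the following is only one possible shape for a reference clean-up, under the assumption that every linked concept table carries an fk_referenceID column and that the references live in a table called reference. It builds a single DELETE statement that removes every reference no longer used by any of the listed tables, while keeping the empty row with ID 1.

```java
import java.util.List;

// Hypothetical helper; table and column names are assumptions.
final class ReferenceCleanupSketch {

    // Builds a DELETE that keeps row 1 and every reference still linked
    // from at least one of the given concept tables.
    static String buildCleanupSql(List<String> linkedTables) {
        StringBuilder sql =
            new StringBuilder("DELETE FROM `reference` WHERE referenceID <> 1");
        for (String table : linkedTables) {
            sql.append(" AND NOT EXISTS (SELECT 1 FROM ")
               .append(table)
               .append(" WHERE ")
               .append(table)
               .append(".fk_referenceID = `reference`.referenceID)");
        }
        return sql.toString();
    }
}
```

Table names are concatenated into the statement, so they should come from a fixed list in the code, never from harvested data.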

Add a new data standard

A) Add it in the database, in the standardschema table.
B) Document the changes
C) Add a new method in the Unit.java class, that will be called from the function public int parse(String accesspoint).
For example:

 if (getStandard().equalsIgnoreCase("abcd3")) {
     parseABCD3(utils, tripleIDStore);
 }
....

 private void parseABCD3(XMLutil utils, TripleIDStore tripleIDStore) {.....

Use the closest existing standard already known to B-HIT as a model.


If you have a new namespace, add it to the namespaceMap in src/org/binhum/gbif/util/ABCDProcessor.java.
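How ABCDProcessor.java fills its namespaceMap is not shown on this page, so the sketch below only illustrates the idea: the prefix used as the standard name (as in the Reference class above) has to be a key of the map, because the XPath expressions are built from it. The abcd3 prefix and its example.org URI are placeholders, not a real schema location.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a prefix-to-namespace map and its use in XPath.
final class NamespaceSketch {

    static Map<String, String> namespaces() {
        Map<String, String> map = new HashMap<>();
        // Namespace URI of ABCD 2.06.
        map.put("abcd", "http://www.tdwg.org/schemas/abcd/2.06");
        // Placeholder entry for a new standard; replace with the real URI.
        map.put("abcd3", "http://example.org/schemas/abcd/3.0");
        return map;
    }

    // Same XPath pattern as in Reference.setXmlDocument.
    static String referenceXPath(String standard, String element) {
        return "//" + standard + ":Reference/" + standard + ":" + element;
    }
}
```

For the new standard, referenceXPath("abcd3", "URI") gives //abcd3:Reference/abcd3:URI, which can only be resolved if abcd3 is present in the map.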