End-user workflows for name matching

From TETTRIs

Latest revision as of 14:08, 23 May 2024

The workflows that end-users follow vary significantly based on the specific use case, ranging from checking a single name to uploading a regional or monographic checklist with thousands of names. There are three main types of name-checking processes:

  1. Direct Use of the Aggregator's Name Matching Mechanisms: Utilizing the tools provided directly by the aggregator.
  2. Using Third-Party Tools: Leveraging tools such as OpenRefine that access the aggregator's name matching services.
  3. Using Local Tools: Downloading the aggregator's data and using local tools to perform the matching.

The choice of method depends on the expected outcome, the number of records to be matched, and the technical expertise available to the user. Type 3 processes generally require some expertise in biodiversity data management; TETTRIs provides links to the download sites for the aggregator's data. For type 2 processes, TETTRIs provides example use cases that have been successfully tested. For type 1 processes (direct use of the aggregator's services), the respective documentation will be pointed out in a list paralleling the list of general capabilities of aggregators.
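To give a rough idea of what a type 3 (local) process can look like, the following minimal Python sketch matches local names against a reference list that is assumed to have been extracted from an aggregator download. It relies only on the standard library's difflib for approximate string matching. The function name match_names, the similarity cutoff of 0.85 and the sample names are illustrative assumptions, not part of any aggregator's tooling, and real workflows would typically also handle authorship, ranks and synonyms.

```python
import difflib


def match_names(local_names, reference_names, cutoff=0.85, max_candidates=3):
    """Split local names into exact matches and fuzzy candidates.

    cutoff and max_candidates are illustrative defaults (assumptions),
    not values recommended by any aggregator.
    """
    reference_set = set(reference_names)
    results = {}
    for name in local_names:
        if name in reference_set:
            # Exact string match against the reference list.
            results[name] = {"exact": True, "candidates": [name]}
        else:
            # Approximate matches, best first; may be an empty list.
            candidates = difflib.get_close_matches(
                name, reference_names, n=max_candidates, cutoff=cutoff
            )
            results[name] = {"exact": False, "candidates": candidates}
    return results


# Hypothetical sample data standing in for an aggregator download.
reference = ["Quercus robur", "Quercus petraea", "Fagus sylvatica"]
local = ["Quercus robur", "Fagus silvatica", "Abies alba"]

results = match_names(local, reference)
for name, outcome in results.items():
    print(name, outcome)
```

A name without an exact match and without candidates (here "Abies alba") still needs manual review; the sketch only narrows down the names that need attention.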

The process itself can be divided into four phases:

  1. Preparing the Data:
    • A text-only list of names is required, which can be created from a spreadsheet column or a table. Each name should be on a separate line.
  2. Submitting the Data:
    • This phase depends on the chosen type of checking process.
  3. Getting and Interpreting the Results:
    • For process types 1 and 2, results are provided as lists of exact matches and possible candidates. Interpretation involves assessing these candidates and selecting the correct match if appropriate.
  4. Incorporating the Results Locally:
    • This involves making local corrections based on the matching results. It also includes integrating the aggregator’s name ID into the local dataset to enable linkage and potential interaction with the aggregator.
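Phases 1 and 4 can be sketched in a few lines of Python using only the standard library. The sketch assumes the local dataset is a CSV table with a column holding the names; the column names scientific_name and aggregator_name_id, the sample records and the identifier "ID-0001" are hypothetical placeholders. Removing duplicate names and collapsing stray whitespace before submission are additional assumptions about what a sensible preparation step looks like.

```python
import csv
import io


def normalise(name):
    """Collapse runs of whitespace and trim the name."""
    return " ".join(name.split())


def extract_names(csv_text, column):
    """Phase 1: return unique, non-empty names of one column, in order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    names = []
    for row in reader:
        name = normalise(row[column])
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


def add_aggregator_ids(csv_text, column, id_by_name,
                       id_column="aggregator_name_id"):
    """Phase 4: attach the aggregator's name ID to every local record.

    Records whose name has no accepted match get an empty ID.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        row[id_column] = id_by_name.get(normalise(row[column]), "")
        rows.append(row)
    return rows


# Hypothetical local dataset (note the duplicate and the double space).
local_csv = (
    "record,scientific_name\n"
    "1,Quercus robur\n"
    "2,Fagus  sylvatica\n"
    "3,Quercus robur\n"
)

names = extract_names(local_csv, "scientific_name")
name_list_text = "\n".join(names)  # one name per line, ready to submit

# Hypothetical outcome of phase 3: matches accepted after review.
accepted = {"Quercus robur": "ID-0001"}
rows = add_aggregator_ids(local_csv, "scientific_name", accepted)
```

Keeping the aggregator's ID in a separate column leaves the original name untouched, so local corrections and the link to the aggregator can be reviewed independently.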

We currently focus on OpenRefine for medium to large datasets. Documenting these processes means tracking a moving target, as the main aggregators are continuously evolving, hopefully influenced by the requirements posted by TETTRIs WP2. Close collaboration with the TETTRIs 3PP project on Taxonomic Name Linking Services (TNLS), along with the involvement of the Catalogue of Life as a project member, is expected to complement this work and improve end-user workflows.