Use cases for name matching

Simple use cases, such as checking the names of a local list of insects against the names in the normative European EU-Nomen checklist (a.k.a. the Pan-European Species directories Infrastructure, PESI), are already largely covered by existing tools. Such checks are often used to detect orthographic or taxonomic errors, or to comply with legal requirements (e.g. EU directives, national Red Lists, etc.).

On the other hand, a common use case when elaborating taxonomic treatments that aim to cover all names in a taxonomic group is to identify existing names that still have to be investigated, i.e. names held by the aggregator that are not yet in the treatment. Currently, this use case ("name harvesting") is not covered by a simple mechanism or service. See Examples.
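At its core, name harvesting is a set difference between the aggregator's names and the names already in the treatment. The following is a minimal sketch of that idea, not an existing service: the function names, the whitespace and case normalisation, and the sample data are assumptions made for illustration only.

```python
def normalise(name: str) -> str:
    """Collapse whitespace and case so trivially different strings compare equal."""
    return " ".join(name.split()).casefold()


def harvest_names(aggregator_names, treatment_names):
    """Return aggregator names that are not yet present in the treatment."""
    known = {normalise(n) for n in treatment_names}
    return sorted(n for n in set(aggregator_names) if normalise(n) not in known)


# Illustrative data only, not taken from any aggregator.
aggregator = ["Salmo trutta", "Salmo salar", "Salmo marmoratus"]
treatment = ["salmo  trutta", "Salmo salar"]

missing = harvest_names(aggregator, treatment)
print(missing)
```

Exact comparison of normalised strings will miss orthographic variants and differences in author citation, so a real workflow would probably need fuzzy matching on top of this.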

Another common use case is that local data portals want to include links to aggregator databases when displaying a name (see e.g. the links to other sources in Tropicos). This makes it possible to link to the name record in the target aggregator and thus (directly or indirectly) to the target's current or versioned taxonomic concept related to the specific name, and to its opinion regarding the nomenclatural status. However, such links are usually built by using the name string as the search parameter, which may or may not work correctly (see potential caveats of scientific names).

In contrast, incorporating the target aggregator's resolvable name ID into the local database establishes an unequivocal link between the local name and the name in the aggregator treatment, which improves the data quality of such links. Unique and resolvable name identifiers play a central role because they make it possible to keep track of name usage in the target aggregator. Aggregators should (and some do) offer name matching services that return stable, resolvable name IDs that users can incorporate into their databases.
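The difference between the two linking strategies can be illustrated with a small sketch. The identifiers (`agg:0001` etc.) and the record layout are hypothetical and do not reflect any particular aggregator; the genus name Ficus is used because it exists both in botany (Ficus L.) and in zoology (Ficus Röding, 1798).

```python
from collections import defaultdict

# Hypothetical aggregator records: (name ID, name string, authorship).
RECORDS = [
    ("agg:0001", "Ficus", "L."),
    ("agg:0002", "Ficus", "Röding, 1798"),
    ("agg:0003", "Salmo trutta", "Linnaeus, 1758"),
]

by_id = {rec_id: (name, author) for rec_id, name, author in RECORDS}
by_name = defaultdict(list)
for rec_id, name, _author in RECORDS:
    by_name[name].append(rec_id)


def link_by_string(name):
    """Name-string lookup: may return zero, one or several candidate IDs."""
    return by_name.get(name, [])


def link_by_id(rec_id):
    """ID lookup: returns exactly one record, or None if the ID is unknown."""
    return by_id.get(rec_id)


print(link_by_string("Ficus"))   # two candidates, the link is ambiguous
print(link_by_id("agg:0002"))    # one record, the link is unequivocal
```

Once the ID is stored next to the local name, later lookups no longer depend on how the name string is spelled or on whether homonyms exist.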

Beyond that, users who have matched their names and incorporated the aggregator's name ID should be able to use a "taxon concept subscription" to be informed automatically of changes in the usage of their names. This requires target aggregators of taxonomic data to trace changes in the concept of the taxon in which the respective name is placed. Such changes may be changes in status (accepted name to synonym or vice versa), but also the addition or removal of synonyms. We hope that TETTRIs can instigate the implementation of taxon concept subscriptions by target aggregators, for example by means of the 3PP project funding scheme. Nomenclators (databases that do not treat the taxonomic status of a name) would have to trace changes in the nomenclatural notes or the nomenclatural status of the name.
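The kinds of change a subscription would need to report can be sketched as a comparison of two snapshots of the same name record. This is only an illustration under stated assumptions: the `NameUsage` structure, its fields, the placeholder names ("Aus bus") and the ID are all hypothetical, and no aggregator is known to expose data in exactly this form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameUsage:
    """Hypothetical snapshot of how an aggregator treats one name."""
    name_id: str
    status: str                       # e.g. "accepted" or "synonym"
    synonyms: frozenset = frozenset()


def describe_changes(old: NameUsage, new: NameUsage) -> list[str]:
    """List status changes and added or removed synonyms between two snapshots."""
    changes = []
    if old.status != new.status:
        changes.append(f"status: {old.status} -> {new.status}")
    for name in sorted(new.synonyms - old.synonyms):
        changes.append(f"synonym added: {name}")
    for name in sorted(old.synonyms - new.synonyms):
        changes.append(f"synonym removed: {name}")
    return changes


# Placeholder names and ID, for illustration only.
before = NameUsage("agg:0042", "accepted", frozenset({"Aus bus"}))
after = NameUsage("agg:0042", "accepted", frozenset({"Aus cus"}))
demoted = NameUsage("agg:0042", "synonym")

print(describe_changes(before, after))
print(describe_changes(before, demoted))
```

A subscription service would run such a comparison for every subscribed name ID whenever the aggregator publishes a new version, and notify the user only when the resulting list is not empty.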

A specialised use case for taxonomists (and taxon name registrars) is checking if a specific name already exists, in order to avoid the creation of homonyms.

Use cases

Environmental Research Infrastructures

(external) Navigating taxonomic complexity: A use-case report on FAIR scientific name-matching service usage in ENVRI Research Infrastructures, a paper published in RIO by Sharif Islam, Dario Papale, Lucia Vaira, Ilaria Rosati, Johannes Peterseil and Christian Pichot directly addresses the TETTRIs topic. The "paper presents a use-case conducted within the ENVRI FAIR project, examining challenges and opportunities in deploying FAIR-aligned (ensuring Findability, Accessibility, Interoperability and Reusability) scientific name-matching services across Environmental Research Infrastructures (RIs)."

Monitoring of fresh water fishes in Germany

See website. Data provided Sept. 19, 2023.

The monitoring dataset for 12 federal states in Germany contains 72 native and non-native species of freshwater fish, with abundance data collected at more than 12,000 sample sites. The user wants to check the taxon names used against secondary aggregators.

The DwC-A file exported from DiversityWorkbench and provided to TETTRIs contains 174,897 sampling records, each including the taxon name.
Procedure for name matching: Opened the file in LibreOffice Calc and deleted all columns except taxonID, scientificName, order and family. Reduced the records to unique rows with the menu command Data | Standard Filter (see Nifty: How to Get Unique Values in a Column | LibreOffice Calc). The resulting table (TaxaFromOccurrenceTxt.csv) contains 72 records, as expected. It can now be matched against PESI and Catalogue of Life (CoL).
Results: PESI name matching: 67 exact matches, 1 ambiguous match (2 candidates), 4 without match; the output file does not specify the taxonomic status of the target names. CoL name matching: 68 exact matches, 4 “variants” (orthographical errors, in authors only); all names are accepted by the target aggregator.
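The column selection and deduplication step of the procedure above could also be scripted instead of being done by hand in a spreadsheet. The sketch below uses only the Python standard library; the tab delimiter and the sample rows are assumptions for illustration and are not taken from the monitoring dataset.

```python
import csv
import io

# Columns kept in the procedure described above.
KEEP = ["taxonID", "scientificName", "order", "family"]


def unique_taxa(handle, delimiter="\t"):
    """Return the distinct combinations of the KEEP columns, in first-seen order."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    seen = {}
    for row in reader:
        key = tuple(row[col] for col in KEEP)
        seen.setdefault(key, None)  # dicts preserve insertion order
    return [dict(zip(KEEP, key)) for key in seen]


# Illustrative rows only; a real occurrence file has many more columns and records.
sample = io.StringIO(
    "occurrenceID\ttaxonID\tscientificName\torder\tfamily\n"
    "1\tt1\tSalmo trutta Linnaeus, 1758\tSalmoniformes\tSalmonidae\n"
    "2\tt2\tEsox lucius Linnaeus, 1758\tEsociformes\tEsocidae\n"
    "3\tt1\tSalmo trutta Linnaeus, 1758\tSalmoniformes\tSalmonidae\n"
)

taxa = unique_taxa(sample)
print(len(taxa))
```

For a file on disk, the same function can be called with a handle from `open(path, newline="", encoding="utf-8")`, where the encoding is again an assumption about the export.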

Integrating Euro+Med PlantBase with WFO

Euro+Med PlantBase (E+M) is both a target database (as part of PESI) and a client with respect to CoL and World Flora Online (WFO). The client use case currently consists of name-matching E+M against WFO in order to integrate WFO-IDs, with the side effect that (for the accepted names) a content contribution to WFO can be made. Name-matching methods were tested extensively with the WFO Plant List online matching tool, as well as with the OpenRefine reconciliation API.
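For readers unfamiliar with reconciliation services, the sketch below shows how a batch of names might be packaged for such a service. It assumes the batch query convention of the Reconciliation Service API used by OpenRefine (a form parameter `queries` holding a JSON object keyed `q0`, `q1`, ...); whether a given service accepts exactly this form must be checked against its own documentation. No endpoint URL is given and no request is sent; the function name is hypothetical.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_reconciliation_form(names, limit=3):
    """Encode names as a form body with one query per name (q0, q1, ...)."""
    queries = {
        f"q{i}": {"query": name, "limit": limit}
        for i, name in enumerate(names)
    }
    return urlencode({"queries": json.dumps(queries)})


# Example names only; any list of name strings can be used.
body = build_reconciliation_form(["Bellis perennis L.", "Quercus robur L."])

# Decode the body again to show what the service would receive.
decoded = json.loads(parse_qs(body)["queries"][0])
print(decoded["q0"])
```

The service's response would then have to be inspected per query key to pick the matching candidate and its identifier.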

Data curation of a major herbarium dataset

Herbarium databases always contain taxon names as the identification of the plant specimen (and as the storage location in the herbarium collection). Since these data are accumulated over time, and most collections do not have the personnel to follow all updates in nomenclature and taxonomy, the taxon lists may be quite messy. A name matching workflow may be used to curate the existing names in the database. An additional use case at a later stage would be to provide herbarium curators with synonyms to aid in the search for specimens for a specific loan request to their collection, where specimens may be stored under different names. The herbarium of the Real Jardín Botánico in Madrid agreed to provide a test case. First results: Use case Specify database at the herbarium Madrid.