Name matching services

From TETTRIs
Latest revision as of 15:55, 28 September 2025

These are online services that let users compare their lists of scientific organism names against reference datasets (see "Taxonomic datasets"). Because user needs vary widely, both the dataset and the name-matching service should be chosen carefully before settling on a workflow (see "End-user workflows for name matching"). See also "What is name matching?" for a general discussion of terminology and intent.
The following results from the TNLS (Taxonomic Name Linking Services) TETTRIs Satellite Project provide an important overview of name service functionality:
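Overview of input parameters of aggregator services: https://docs.google.com/spreadsheets/d/1QKvytrNa8TxYY63cfxzttd9tSUAN-xeF8TY3vG-J5gs/edit?gid=0#gid=0
Overview of output fields of aggregator services: https://docs.google.com/spreadsheets/d/1QKvytrNa8TxYY63cfxzttd9tSUAN-xeF8TY3vG-J5gs/edit?gid=223152038#gid=223152038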
We distinguish between services that act upon repositories of taxonomic datasets and single-dataset services.

Repository services

Checklist Bank (GBIF & Catalogue of Life): https://www.checklistbank.org/tools/name-match

Taxonomic scope: All taxa or specific groups, depending on the target dataset chosen
Geographic scope: Global or specific areas, depending on the target dataset chosen
Software updated: current (last checked June 26, 2025)
Codebase/Documentation: https://www.checklistbank.org/about/API
Data updated: depends on the target dataset
Limitation: Direct input of a list is limited to 6000 names; file upload with asynchronous response is not limited
Local ID input returned: YES
Local Name input returned: YES
Aggregator name ID returned: YES - in download only
Interactive mode for partial matches: NO
OpenRefine reconciliation API: NO (but use from OpenRefine is possible via the REST services)
Other: Logging in with a GBIF account is recommended and is required for file upload (self-registration at https://www.gbif.org/user/profile)
Other: Datasets in the repository can also be matched against other datasets in the repository.

Global Names Verifier: https://verifier.globalnames.org/

Taxonomic scope: Defined by the stored datasets, with an option to restrict matching to an individual source dataset
Geographic scope: Global (across datasets or with global datasets), or restricted by choice of dataset
Software updated: active Feb. 2025
Codebase/Documentation: https://github.com/gnames and https://resolver.globalnames.org/api
Data updated: Differs for stored datasets
Limitation: 5000 names, at least in interactive mode
Local ID input returned: NO
Local Name input returned: YES
Aggregator name ID returned: YES (may be a taxon ID)
Interactive mode for partial matches: NO
OpenRefine reconciliation API: YES, with step-by-step documentation: https://github.com/gnames/gnverifier/wiki/OpenRefine-readme
Other: Offers a query language that appears to be very flexible
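
For scripted use, a request body for Global Names Verifier might be assembled as in the following sketch. It illustrates the option, noted above, of restricting matching to individual source datasets. The endpoint URL and the field names `nameStrings` and `dataSources` are assumptions that should be checked against the API documentation linked above; the helper name and the dataset ID in the usage note are hypothetical. The sketch only builds the request and does not send it.

```python
import json

# Assumed endpoint; verify against the current API documentation before use.
GNV_ENDPOINT = "https://verifier.globalnames.org/api/v1/verifications"


def build_verifier_payload(names, data_source_ids=None):
    """Return a JSON request body for a verification call (hypothetical helper).

    Blank entries are dropped and surrounding whitespace is trimmed.
    """
    cleaned = [n.strip() for n in names if n and n.strip()]
    payload = {"nameStrings": cleaned}  # assumed field name
    if data_source_ids:
        # Assumed field name; restricts matching to the given source datasets.
        payload["dataSources"] = list(data_source_ids)
    return json.dumps(payload, ensure_ascii=False)
```

Calling `build_verifier_payload(["Bubo bubo"], [1])` yields a JSON string that could be sent with any HTTP client as a POST body; the ID `1` is a placeholder, not a statement about which dataset it identifies.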

TNRS (Taxonomic Name Resolution Service): https://tnrs.biendata.org/ (also: http://tnrs.iplantcollaborative.org/)

Taxonomic scope: Plants (WFO) and vascular plants (WCVP); more datasets could potentially be included
Geographic scope: Global
Software updated: v. 5.0 Feb. 24, 2021
Codebase/Documentation: https://github.com/ojalaquellueva/TNRSapi and https://github.com/iPlantCollaborativeOpenSource/TNRS/
Data updated: 2023 (2024)
Limitation: 5000 names when pasting; API processing is unlimited (in batches of 5000)
Local ID input returned: NO
Local Name input returned: YES
Aggregator name ID returned: NO
Interactive mode for partial matches: YES
OpenRefine reconciliation API: NO
Other: API and R package available
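
Several services listed here cap the number of names per submission (5000 for TNRS, as stated above). A minimal sketch of splitting a longer list into batches that respect such a cap follows; the function name is hypothetical, and the actual submission of each batch is left out because it depends on the service.

```python
from typing import Iterator, List, Sequence

BATCH_LIMIT = 5000  # batch size stated above for TNRS


def batches(names: Sequence[str], size: int = BATCH_LIMIT) -> Iterator[List[str]]:
    """Yield consecutive slices of `names`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(names), size):
        yield list(names[start:start + size])
```

A list of 12,001 names would be split into batches of 5000, 5000 and 2001 names. The same helper can be used with another limit, for example `batches(names, 1500)`.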


Single-dataset name matching services

Australian Plant Name Index: https://www.anbg.gov.au/apni/

Taxonomic scope: Plants
Geographic scope: Australia
Software updated: ?
Codebase/Documentation: ?
Data updated: ?
Limitation: Not stated; a test with nearly 21,000 names ended in a server error (23 May 2024)
Local ID input returned: NO
Local Name input returned:
Aggregator name ID returned:
Interactive mode for partial matches: NO
OpenRefine reconciliation API: NO
Other: APCalign: an R package workflow and app for aligning and updating flora names to the Australian Plant Census (https://www.biorxiv.org/content/10.1101/2024.02.02.578715v1)

GBIF Taxonomic Backbone: https://www.gbif.org/tools/species-lookup

Taxonomic scope: All taxa
Geographic scope: Global
Software updated: current
Codebase/Documentation: see Species API (https://www.gbif.org/developer/species)
Data updated: August 28, 2023 (no further updates, but will stay online; will most probably be replaced by COL eXtended edition)
Limitation: 6000 records
Local ID input returned: YES
Local Name input returned: YES
Aggregator name ID returned: NO
Interactive mode for partial matches: NO
OpenRefine reconciliation API: NO
Other: "Multi-taxonomy" mode is in preparation; it will allow matching against other taxonomies (e.g. COL eXtended edition). The matching service, including data, will be downloadable as a Docker image.
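
Besides the interactive lookup tool, the backbone can be queried one name at a time through the Species API. The following sketch builds such a query URL. The endpoint path and the parameter names `name`, `kingdom` and `strict` are assumptions based on the Species API documentation and may change once the backbone is replaced; the function name is hypothetical. No request is sent.

```python
from urllib.parse import urlencode

# Assumed endpoint; check the Species API documentation before relying on it.
GBIF_MATCH_URL = "https://api.gbif.org/v1/species/match"


def gbif_match_url(name, kingdom=None, strict=False):
    """Build a match query URL for a single scientific name (hypothetical helper)."""
    params = {"name": name}
    if kingdom:
        params["kingdom"] = kingdom  # optional hint to disambiguate homonyms
    if strict:
        params["strict"] = "true"  # assumed flag that disables fuzzy matching
    return f"{GBIF_MATCH_URL}?{urlencode(params)}"
```

For example, `gbif_match_url("Puma concolor", kingdom="Animalia")` returns a URL that can be opened in a browser or fetched with any HTTP client.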

International Plant Name Index (IPNI): http://namematch.science.kew.org/

Taxonomic scope: Vascular plants (source: POWO; IPNI offered but not working)
Geographic scope: Global
Software updated: ?
Codebase/Documentation: ?
Data updated: current
Limitation: None found (tested with 144,000 records)
Local ID input returned: YES
Local Name input returned: YES
Aggregator name ID returned: YES: IPNI-LSID
Interactive mode for partial matches: NO
OpenRefine reconciliation API: YES - documentation: https://data1.kew.org/reconciliation/help
Other:

LifeWatch

Refers to Global Names for name matching.

PESI / eu-nomen

Taxonomic scope: All taxa
Geographic scope: Europe
Software updated: 2011?
Codebase/Documentation: By reference to components used (Taxamatch algorithm and scientific name parser)
Data updated: 2014
Limitation: 5,000 names
Local ID input returned: NO
Local Name input returned: YES
Aggregator name ID returned: YES (as provided by the primary aggregator)
Interactive mode for partial matches: YES
OpenRefine reconciliation API: NO
Other:

ROpenSci taxize

Taxonomic scope: All taxa or specific groups, depending on dataset used
Geographic scope: Global or regional, depending on dataset used
Software updated: Feb 2025
Codebase/Documentation: https://github.com/ropensci/taxize/
Data updated:
Limitation:
Local ID input returned:
Local Name input returned:
Aggregator name ID returned: YES
Interactive mode for partial matches: NO
OpenRefine reconciliation API: n/a
Other:

Tropicos: https://legacy.tropicos.org/NameMatching.aspx

Taxonomic scope: Plants
Geographic scope: Global
Software updated:
Codebase/Documentation:
Data updated: Current
Limitation:
Local ID input returned: YES
Local Name input returned: YES
Aggregator name ID returned: YES: Tropicos-ID
Interactive mode for partial matches: NO
OpenRefine reconciliation API: NO
Other:

World Flora Online WFO Plant List

Taxonomic scope: Plants
Geographic scope: Global
Software updated: ongoing June 2025 (not stated on website)
Codebase/Documentation: GraphQL API, Name Matching REST API, Reconciliation API
Data updated: December 2024 (semiannual edition)
Limitation: None found (tested with 144,000 records)
Local ID input returned: YES
Local Name input returned: YES
Aggregator name ID returned: YES - WFO-ID
Interactive mode for partial matches: YES
OpenRefine reconciliation API: YES: https://list.worldfloraonline.org/reconcile_index.php
Other: Service can be installed as local copy
Other: R package WorldFlora, see https://cran.r-project.org/web/packages/WorldFlora/index.html
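
Services marked YES for the OpenRefine reconciliation API can also be called from scripts, since OpenRefine uses a generic reconciliation protocol. The sketch below builds the form-encoded `queries` parameter as described in the Reconciliation Service API specification. Which URL to post it to is deliberately left open, because the exact endpoint of each service should be taken from its own documentation; the function names are hypothetical.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_queries(names, limit=3):
    """Map names to a batch of reconciliation queries keyed q0, q1, ..."""
    return {f"q{i}": {"query": name, "limit": limit} for i, name in enumerate(names)}


def reconciliation_form_body(names, limit=3):
    """Return the form-encoded body carrying the `queries` parameter."""
    return urlencode({"queries": json.dumps(build_queries(names, limit))})


if __name__ == "__main__":
    body = reconciliation_form_body(["Quercus robur", "Fagus sylvatica"])
    # Decode again to show what the service would receive.
    print(json.loads(parse_qs(body)["queries"][0]))
```

The keys `q0`, `q1`, ... are arbitrary labels; a service following the specification returns its candidates under the same keys, which makes it possible to line up results with the submitted names.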

WoRMS (World Register of Marine Species)

Taxonomic scope: Marine species
Geographic scope: Global
Software updated: not stated
Codebase/Documentation:
Data updated: current
Limitation: 1500 names
Local ID input returned: NO
Local Name input returned: YES
Aggregator name ID returned: YES (AphiaID)
Interactive mode for partial matches: NO
OpenRefine reconciliation API: NO
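
Where a service returns the submitted name but not the local ID (as listed above for WoRMS, TNRS, PESI and Global Names Verifier), the local IDs can be re-attached afterwards by joining on the input name. A minimal sketch follows; the record layout and the keys `input_name` and `matched_id` are hypothetical and would need to be adapted to the actual output of the service. It assumes the service results contain one row per distinct input name.

```python
def attach_local_ids(local_records, service_results):
    """Join service results back to local IDs via the submitted name.

    local_records: iterable of (local_id, name) pairs from the user's own list.
    service_results: iterable of dicts with an 'input_name' key (hypothetical).
    """
    ids_by_name = {}
    for local_id, name in local_records:
        # The same name may occur under several local IDs.
        ids_by_name.setdefault(name, []).append(local_id)

    merged = []
    for row in service_results:
        for local_id in ids_by_name.get(row["input_name"], []):
            merged.append({"local_id": local_id, **row})
    return merged
```

The join only works if the name string is passed through unchanged, so names should be submitted exactly as stored locally, and any cleaning should be done before the list is sent.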

R-packages that include name matching

Grenié et al. (2022, https://doi.org/10.1111/2041-210X.13802) cover this subject in detail, identifying a number of packages that provide direct or indirect access to online taxonomic datasets. They also point to an application, "taxharmonizexplorer" (https://mgrenie.shinyapps.io/taxharmonizexplorer/), intended to help R users select tools and datasets. The app's website lists and graphically depicts the relationships between taxonomic datasets and R packages (as of the end of July 2025, it covers 68 packages). If the tool continues to be updated, it should be the primary source for R programmers looking for name-matching packages to apply in the workflow detailed in Grenié et al.'s paper.