Wish list for name matching services

From TETTRIs

Terminology

  • candidates = returned partial matches, as opposed to exact matches
  • canonical name = name string without authors/year, but with rank indicator (where required)
  • asynchronous output = (here) output produced as a downloadable file after internal processing, which may take time; the user is notified when the file is ready.

General

Input

  • Allow input of a pasted column of names
  • Allow upload of a table with names (with a dialogue to select the name-bearing column[s])
  • Allow upload of any other column
  • Allow wildcards
  • Do not limit the input (in asynchronous mode)
  • Filter out terms for uncertainty (e.g. c., cf., aff., sp. prox., (?) etc.)
  • Filter out terms for species groups/aggregates/complexes (agg., species group, species complex)
  • Filter out
  • Recognise the different structures of zoological (ICZN) and "botanical" (ICNAFP) names (see "Structure of names" below)
  • Allow input of name components in separate fields (at least in uploads):
    • For ICNAFP: Monomial / genus component; infrageneric rank; infrageneric epithet; species epithet; infraspecific rank; infraspecific epithet; basionym author (team); combination author (team); year of publication.
    • For ICZN: Genus name; subgenus name; specific name; subspecific name; original combination author (team); year of publication.
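
The filtering of uncertainty and aggregate terms listed above could be sketched as follows. This is a minimal illustration, not a complete solution: the term lists only repeat the examples given in the wish list, and the function name `strip_qualifiers` is an assumption made for this example. Matching is case-sensitive on purpose, so that an author initial such as "C." is not mistaken for "c.".

```python
import re

# Qualifier terms taken from the wish list; illustrative, not exhaustive.
UNCERTAINTY = ["c.", "cf.", "aff.", "sp. prox.", "(?)"]
AGGREGATES = ["agg.", "species group", "species complex"]


def _build_pattern(terms):
    # Longest terms first, so multi-word terms are tried before short ones.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(t) for t in ordered)
    # A term must be delimited by whitespace or the string boundary on both sides.
    return re.compile(r"(?<!\S)(?:" + alternatives + r")(?!\S)")


_QUALIFIERS = _build_pattern(UNCERTAINTY + AGGREGATES)


def strip_qualifiers(name):
    """Return (cleaned name, list of removed qualifiers in input order)."""
    removed = [m.group(0) for m in _QUALIFIERS.finditer(name)]
    cleaned = _QUALIFIERS.sub(" ", name)
    return " ".join(cleaned.split()), removed
```

Returning the removed terms alongside the cleaned name lets a service report them back in the output instead of silently discarding them.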

Interactive mode

  • Allow for interactive selection of the matching record

Matching

  • If the name is entered as a string, parse the name and match the name components independently
  • For ICNAFP: Tolerate different abbreviations for infrageneric and infraspecific rank designations (e.g. "subspecies", "subsp.", "ssp.")
  • For ICNAFP: Exactly match the rank of the name, if unambiguous in the input – i.e. do not return a subspecies for a variety, a genus name for a family, etc. (default)
  • For ICZN: Exactly match the rank of the name, if unambiguous in the input – even more important here, because redundant names are allowed (Bufo bufo bufo, Meles meles)
  • Avoid returning names that are completely improbable as candidates
    • Weigh probabilities hierarchically; e.g., in a species name, a full or near full match on a genus name is more important than that of the species epithet (epithets may be used in many genera).
    • Make phonetic matching optional
  • Allow parameters for matching (see below)
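
Two of the matching wishes above, tolerant rank abbreviations and hierarchical weighting, can be illustrated with a short sketch. The weights, the rank synonym table and the use of `difflib.SequenceMatcher` as a similarity measure are all assumptions chosen for the example; a real service would tune or replace them.

```python
from difflib import SequenceMatcher

# Assumed synonym table for infraspecific rank designations (ICNAFP).
RANK_SYNONYMS = {
    "subspecies": "subsp.", "subsp.": "subsp.", "ssp.": "subsp.",
    "variety": "var.", "varietas": "var.", "var.": "var.",
    "form": "f.", "forma": "f.", "f.": "f.",
}

# Assumed weights: the genus counts more than the epithets.
WEIGHTS = {"genus": 0.5, "species": 0.3, "infraspecific": 0.2}


def normalise_rank(token):
    """Map a rank designation to one standard form, or None if unknown."""
    return RANK_SYNONYMS.get(token.lower())


def similarity(a, b):
    """String similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score(query, candidate):
    """Weighted similarity over the name components present in the query."""
    parts = [k for k in WEIGHTS if query.get(k)]
    if not parts:
        return 0.0
    total = sum(WEIGHTS[k] for k in parts)
    weighted = sum(
        WEIGHTS[k] * similarity(query[k], candidate.get(k) or "") for k in parts
    )
    return weighted / total
```

With these weights, a query for Quercus robur ranks the candidate Quercus rubra (same genus, similar epithet) above Rubus robur (same epithet, different genus), which is the behaviour the hierarchical weighting is meant to produce.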

Output

  • Return matched records and candidates in a single table (at least in asynchronous mode)
  • Return all input columns with matching results (at least in asynchronous mode)
  • List candidates sorted by probability (see above)
  • Provide output in standard formats (DwC, ColDP)
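
A single result table that keeps all input columns and lists candidates by decreasing probability might be produced along these lines. The column names `matchType`, `matchedName` and `score` are hypothetical placeholders, not DwC or ColDP terms; mapping to a standard format would be a separate step.

```python
import csv


def write_results(rows, matches, out):
    """Write one table with all input columns plus the matching results.

    rows:    list of dicts, one per input record (the uploaded columns)
    matches: list parallel to rows; each entry is a list of
             (matched_name, score, kind) with kind "exact" or "candidate"
    out:     a writable text file object
    """
    input_cols = list(rows[0].keys()) if rows else []
    extra_cols = ["matchType", "matchedName", "score"]
    writer = csv.DictWriter(out, fieldnames=input_cols + extra_cols)
    writer.writeheader()
    for row, found in zip(rows, matches):
        if not found:
            # Keep unmatched input records in the table as well.
            writer.writerow({**row, "matchType": "none", "matchedName": "", "score": ""})
        # Highest score first, so exact matches precede weaker candidates.
        for name, value, kind in sorted(found, key=lambda m: m[1], reverse=True):
            writer.writerow({**row, "matchType": kind, "matchedName": name, "score": value})
```

Each input record is repeated once per match or candidate, so the user can filter or sort the table without losing the link to the original columns.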

Parameters for exact matches

  • optional (ICNAFP): accept IPNI, Tropicos and full spaced author abbreviations as exact matches (default)
  • optional (ICNAFP): ignore ex authors (the author or team preceding the ex) (explicit)
  • optional (ICNAFP): ignore the hybrid symbol (×, or “x” followed by a space, or “x” between spaces) in the name (explicit)
  • optional (ICZN): ignore indicators for hybrids (kl. for klepton, sk. for subklepton) (default)
  • optional (ICNAFP): ignore authors in autonyms (explicit)
  • optional (ICNAFP): ignore endings in name elements (monomials, species epithet/name, infraspecific epithet/name) (explicit)
  • optional (ICZN): ignore the classic genitive mistake (-ii instead of -i, -iae instead of -ae). [It is a very common mistake. In zoology, -ii/-iae would only occur if the person's name ends in -i. In botany, the double ii is more common. There was a time when -ii was also the norm in zoology (e.g. Theraphosa blondi [correct spelling] was described as blondii, dedicated to Leblond).] (E. Saliba, pers. comm.)
  • optional: treat canonical matches as exact if only one match is found (i.e. ignore authors/year) (explicit)
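
Several of the exact-match parameters above amount to normalising both the input and the reference name before comparing them. The sketch below shows three such normalisations; the function names are assumptions for this example, and the rules are deliberately simple (for instance, hybrid formulas with two parent names are not handled).

```python
import re


def strip_hybrid_marker(name):
    """Remove the multiplication sign, or a lone lowercase 'x' used in its place."""
    name = name.replace("×", " ")
    # A lone "x" at the start or after whitespace, followed by whitespace.
    name = re.sub(r"(?:^|(?<=\s))x\s+", "", name)
    return " ".join(name.split())


def fold_genitive(epithet):
    """Map -ii to -i and -iae to -ae so that both spellings compare equal."""
    if epithet.endswith("iae"):
        return epithet[:-3] + "ae"
    if epithet.endswith("ii"):
        return epithet[:-2] + "i"
    return epithet


def drop_ex_author(authorship):
    """Drop the author (team) preceding 'ex', keeping the one that follows."""
    # The match may not cross parentheses, so basionym brackets stay intact.
    return re.sub(r"[^()\s][^()]*?\s+ex\s+", "", authorship)
```

Applied to the example from the list, `fold_genitive` turns both "blondii" and "blondi" into "blondi", so the two spellings are treated as an exact match when the parameter is switched on.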

Parameters for candidate matches

  • allow different ranks (explicit)
  • activate phonetic matching (explicit)
  • increase tolerance (accept wider range of near matches)
  • ...

Machine Interface

  • OpenRefine Reconciliation Interface
  • ...
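
For orientation, an OpenRefine-style reconciliation exchange can be sketched as a function that takes a JSON batch of queries and returns a JSON object with one result list per query. The sketch follows the general shape of the reconciliation protocol as understood here (query keys such as "q0", result entries with id, name, score and match); the in-memory `INDEX`, the identifiers and the scoring rule are hypothetical, and the exact fields should be checked against the protocol specification.

```python
import json

# Hypothetical in-memory index: canonical name -> record identifier.
INDEX = {"Quercus robur": "t1", "Quercus rubra": "t2", "Rubus idaeus": "t3"}


def reconcile(queries_json):
    """Answer a batch of queries, e.g. {"q0": {"query": "...", "limit": 3}}."""
    queries = json.loads(queries_json)
    response = {}
    for key, query in queries.items():
        text = query["query"]
        limit = query.get("limit", 3)
        results = []
        for name, ident in INDEX.items():
            exact = name == text
            # Toy rule: same first word (genus) makes a name a candidate.
            same_genus = bool(text.split()) and name.split()[:1] == text.split()[:1]
            if exact or same_genus:
                results.append({
                    "id": ident,
                    "name": name,
                    "score": 100 if exact else 50,
                    "match": exact,
                })
        results.sort(key=lambda r: r["score"], reverse=True)
        response[key] = {"result": results[:limit]}
    return json.dumps(response)
```

The `match` flag corresponds to the distinction made in this wish list between exact matches and candidates.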

Structure of names

A botanical name strictly consists of a maximum of three canonical name elements plus a rank indicator, where appropriate.
In zoology, there is only one infraspecific rank, the subspecies, so there is no need to specify the rank. This is not open to interpretation; the Code is quite clear about it. Officially, there is also only one infrageneric rank, the subgenus, but there seem to be other opinions. An ICZN discussion took place in 2018, and if we follow Erna Aescht’s position in Opinion 2420, “the taxonomic terms ‘section’, ‘division’ and ‘superspecies’ are given in quotation marks [in the Code], because they are not nomenclatural terms”, and the subgenus is the only possible rank in the genus series other than the genus rank.
Subgenera are usually given in parentheses when used in a full species name (Art. 6.1). The Code recommends that "no genus-group name other than a valid subgeneric name should be interpolated between a generic name and a specific name, even in square brackets or parentheses. An author who desires to refer to a former generic combination should do so in some explicit form such as 'Branchiostoma lanceolatum [formerly in Amphioxus]'."
[Elie Saliba, pers. comm. 27 May 2024]
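
Because zoological names carry no rank indicator and give the subgenus in parentheses, the rank of a canonical name can be read off its structure. The following is a minimal sketch under simplifying assumptions: it handles only plain canonical names (no authors, years, hyphenated epithets or qualifiers), and `ZooName` and `parse_zoo_canonical` are names invented for this example. "Aus (Bus) cus" in the usage note is a placeholder, not a real name.

```python
import re
from typing import NamedTuple, Optional


class ZooName(NamedTuple):
    genus: str
    subgenus: Optional[str]
    species: Optional[str]
    subspecies: Optional[str]


_ZOO = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)"
    r"(?:\s+\((?P<subgenus>[A-Z][a-z]+)\))?"   # subgenus in parentheses
    r"(?:\s+(?P<species>[a-z]+))?"             # specific name
    r"(?:\s+(?P<subspecies>[a-z]+))?$"         # subspecific name, no rank marker
)


def parse_zoo_canonical(name):
    """Split a canonical zoological name into components, or return None."""
    m = _ZOO.match(name.strip())
    return ZooName(**m.groupdict()) if m else None


def rank_of(parsed):
    """Infer the rank from which components are present."""
    if parsed.subspecies:
        return "subspecies"
    if parsed.species:
        return "species"
    return "subgenus" if parsed.subgenus else "genus"
```

Parsing "Bufo bufo bufo" gives a subspecies and "Meles meles" a species, even though the name elements repeat; a placeholder such as "Aus (Bus) cus" is parsed as a species with subgenus "Bus".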

Literature and Links

  • Conti, M., Nimis, P.L., Martellos, S. 2021: Match Algorithms for Scientific Names in FlorItaly, the Portal to the Flora of Italy. Plants 2021, 10, 974. https://doi.org/10.3390/plants10050974