Wish list for name matching services

From TETTRIs
Revision as of 16:50, 22 May 2024

Parameters for name matching

Terminology

  • candidates: partial matches returned by the service, as opposed to exact matches
  • canonical name
  • asynchronous output

General

Input

  • Allow input of a pasted column of names
  • Allow upload of a table with names (dialogue: name bearing column[s])
  • Allow any other columns to be uploaded along with the name-bearing column(s)
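As a rough illustration of the input options above, the following Python sketch reads a pasted column of names and an uploaded tab-separated table. The function names, the tab delimiter and the record layout are assumptions made for this example, not part of any existing service. The name string is assembled from the user-selected name-bearing column(s), and every other column is carried through unchanged.

```python
import csv
import io


def names_from_pasted_column(text: str) -> list[str]:
    """One name per non-blank line, with surrounding whitespace removed."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def rows_from_uploaded_table(data: str, name_columns: list[str],
                             delimiter: str = "\t") -> list[dict]:
    """Build one record per table row (hypothetical record layout).

    The name string joins the user-selected name-bearing column(s) in the
    order given; all other columns are kept unchanged under "extra" so they
    can be returned alongside the matching results.
    """
    reader = csv.DictReader(io.StringIO(data), delimiter=delimiter)
    missing = [c for c in name_columns if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"name-bearing column(s) not found: {missing}")
    records = []
    for row in reader:
        # Short rows yield None for absent cells, so treat None as empty.
        parts = [(row[c] or "").strip() for c in name_columns]
        records.append({
            "name": " ".join(p for p in parts if p),
            "extra": {k: v for k, v in row.items() if k not in name_columns},
        })
    return records
```

For example, selecting the columns genus and epithet in a table that also has id and note columns yields the name "Abies alba", with id and note kept as extra fields.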

Interactive mode

Output

  • Avoid returning absurd candidates (names that are completely improbable)
  • In asynchronous mode, return matched records and candidates in a single table
  • If the name is entered as a string, parse the name and match the name components independently.
  • Allow wildcards
  • Allow input of name components in separate fields (at least in uploads):
    • For ICNAFP: Monomial / genus component; infrageneric rank; infrageneric epithet; species epithet; infraspecific rank; infraspecific epithet; basionym author (team); combination author (team); year of publication.
    • For ICZN: work in progress.
  • Exactly match the rank of the name, if unambiguous in the input – i.e. do not return a subspecies for a variety, a genus name for a family, etc.
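Parsing a name string into components and respecting an unambiguous rank can be sketched as follows. This is a deliberately simplified, hypothetical parser for ICNAFP-style names. It assumes whitespace-separated tokens, a small set of rank abbreviations, and authorship in the form "(basionym author) combination author". It does not handle hybrid formulae, year of publication or ICZN names. A bare monomial is treated as having an ambiguous rank, so no rank filter is applied to it.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical, deliberately small tables of ICNAFP rank abbreviations.
INFRAGENERIC = {"subg.": "subgenus", "sect.": "section", "ser.": "series"}
INFRASPECIFIC = {"subsp.": "subspecies", "var.": "variety", "f.": "forma"}
EPITHET = re.compile(r"^[a-z][a-z-]*$")
AUTHORSHIP = re.compile(r"^\((?P<basionym>[^)]*)\)\s*(?P<combination>.*)$")


@dataclass
class ParsedName:
    genus: str  # monomial / genus component
    infrageneric_rank: Optional[str] = None
    infrageneric_epithet: Optional[str] = None
    species_epithet: Optional[str] = None
    infraspecific_rank: Optional[str] = None
    infraspecific_epithet: Optional[str] = None
    basionym_author: Optional[str] = None
    combination_author: Optional[str] = None

    @property
    def rank(self) -> Optional[str]:
        """Rank implied by the components, or None when it is ambiguous."""
        if self.infraspecific_rank:
            return self.infraspecific_rank
        if self.species_epithet:
            return "species"
        if self.infrageneric_rank:
            return self.infrageneric_rank
        return None  # a bare monomial may be a genus, a family, an order...


def parse_name(name: str) -> ParsedName:
    """Split a name string into components, left to right."""
    tokens = name.split()
    if not tokens:
        raise ValueError("empty name")
    parsed = ParsedName(genus=tokens[0])
    i = 1
    if i + 1 < len(tokens) and tokens[i] in INFRAGENERIC:
        parsed.infrageneric_rank = INFRAGENERIC[tokens[i]]
        parsed.infrageneric_epithet = tokens[i + 1]
        i += 2
    if i < len(tokens) and EPITHET.match(tokens[i]):
        parsed.species_epithet = tokens[i]
        i += 1
        if i + 1 < len(tokens) and tokens[i] in INFRASPECIFIC:
            parsed.infraspecific_rank = INFRASPECIFIC[tokens[i]]
            parsed.infraspecific_epithet = tokens[i + 1]
            i += 2
    # Whatever remains is treated as authorship: "(basionym) combination".
    authorship = " ".join(tokens[i:])
    if authorship:
        m = AUTHORSHIP.match(authorship)
        if m:
            parsed.basionym_author = m.group("basionym").strip()
            parsed.combination_author = m.group("combination").strip() or None
        else:
            parsed.combination_author = authorship
    return parsed


def same_rank_only(query: ParsedName, candidates: list[ParsedName]) -> list[ParsedName]:
    """Drop candidates of another rank, but only if the query rank is unambiguous."""
    if query.rank is None:
        return candidates
    return [c for c in candidates if c.rank == query.rank]
```

For instance, `parse_name("Picea abies (L.) H.Karst.")` yields the genus "Picea", the species epithet "abies", the basionym author "L." and the combination author "H.Karst."; with a subspecies as the query, `same_rank_only` discards variety candidates.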

Parameters for exact matches

  • optional (ICNAFP): accept IPNI, Tropicos and fully spaced author abbreviations as exact matches
  • optional (ICNAFP): ignore ex authors (the author or team preceding the “ex”)
  • optional (ICNAFP): ignore the hybrid symbol “×” (or its substitute, a letter “x” followed by a space or surrounded by spaces) in the name
  • optional (ICNAFP): ignore authors in autonyms
  • optional: ignore endings of epithets (ICNAFP) or of species/subspecies names (ICZN)
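The optional relaxations for exact matching can be expressed as switches on a comparison key: two names match exactly when their keys are equal. The sketch below is a hypothetical illustration, not an existing API. It assumes that the name and the authorship arrive as separate strings. It reads "fully spaced" as the only difference between spaced and unspaced abbreviations (e.g. "H. Karst." vs "H.Karst."); genuinely different IPNI and Tropicos abbreviations would need a lookup table. The ending list is a crude, assumed stand-in for proper handling of Latin epithet endings, and autonyms are detected naively as names whose final epithet repeats the species epithet.

```python
import re

# Assumed, deliberately incomplete list of Latin endings (longest first),
# used only when epithet endings are to be ignored.
ENDINGS = ("ii", "us", "um", "is", "a", "e", "i")
BASIONYM = re.compile(r"^\((?P<bas>[^)]*)\)\s*(?P<comb>.*)$")


def strip_ending(epithet: str) -> str:
    """Remove one assumed ending, keeping a stem of at least three letters."""
    for ending in ENDINGS:
        if epithet.endswith(ending) and len(epithet) - len(ending) >= 3:
            return epithet[: -len(ending)]
    return epithet


def strip_ex(authors: str) -> str:
    """Drop the ex author: keep only the author(s) after the last ' ex '."""
    return authors.split(" ex ")[-1].strip()


def normalise_authorship(authorship: str, ignore_ex: bool, ignore_spacing: bool) -> str:
    authorship = authorship.strip()
    m = BASIONYM.match(authorship)
    parts = [m.group("bas"), m.group("comb")] if m else [authorship]
    if ignore_ex:
        parts = [strip_ex(p) for p in parts]
    if ignore_spacing:
        # "H. Karst." and "H.Karst." both become "H.Karst."
        parts = [re.sub(r"\.\s+", ".", p) for p in parts]
    return f"({parts[0]}) {parts[1]}".strip() if m else parts[0]


def exact_key(name: str, authorship: str = "", *, ignore_hybrid: bool = False,
              ignore_ex: bool = False, ignore_spacing: bool = False,
              ignore_autonym_authors: bool = False,
              ignore_endings: bool = False) -> tuple[str, str]:
    """Build a comparison key; names match exactly when their keys are equal."""
    if ignore_hybrid:
        name = name.replace("\u00d7", " ")  # the multiplication sign ×
    tokens = name.split()
    if ignore_hybrid:
        tokens = [t for t in tokens if t != "x"]  # "x" used in place of ×
    # Naive autonym test: the final epithet repeats the species epithet.
    is_autonym = len(tokens) >= 4 and tokens[-1] == tokens[1]
    if ignore_endings:
        # Strip endings from lowercase epithets, leaving rank markers alone.
        tokens = tokens[:1] + [
            strip_ending(t) if t.islower() and not t.endswith(".") else t
            for t in tokens[1:]
        ]
    if ignore_autonym_authors and is_autonym:
        authorship = ""
    return " ".join(tokens), normalise_authorship(authorship, ignore_ex, ignore_spacing)
```

With placeholder author names, `exact_key("Abies alba", "(Smith ex Jones) Brown", ignore_ex=True)` and `exact_key("Abies alba", "(Jones) Brown")` produce the same key; with `ignore_hybrid=True`, so do "Mentha × piperita" and "Mentha x piperita".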

For candidate matches

  • exactly match the rank of the name, if unambiguous in the input
  • weigh probabilities hierarchically:
    • e.g., in a species name, a full or near full match on a genus name is more important than that of the species epithet (epithets may be used in many genera).
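The hierarchical weighting can be sketched as a two-step score for (genus, epithet) pairs. A candidate's genus must first be a full or near-full match; only then does epithet similarity contribute, and with a smaller weight. The weights, the thresholds and the use of Python's `difflib` similarity ratio are assumptions chosen for illustration; a production service would more likely use a dedicated fuzzy-matching or phonetic algorithm.

```python
from difflib import SequenceMatcher

# Hypothetical weights: the genus outweighs the species epithet, because the
# same epithet can occur in many genera.
GENUS_WEIGHT = 0.7
EPITHET_WEIGHT = 0.3


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score(query: tuple[str, str], candidate: tuple[str, str],
          genus_floor: float = 0.8) -> float:
    """Score a (genus, epithet) candidate hierarchically: the genus must be a
    full or near-full match before the epithet is considered at all."""
    genus_sim = similarity(query[0], candidate[0])
    if genus_sim < genus_floor:
        return 0.0
    epithet_sim = similarity(query[1], candidate[1])
    return GENUS_WEIGHT * genus_sim + EPITHET_WEIGHT * epithet_sim


def rank_candidates(query: tuple[str, str], candidates: list[tuple[str, str]],
                    min_score: float = 0.8) -> list[tuple[float, tuple[str, str]]]:
    """Return (score, candidate) pairs at or above min_score, best first."""
    scored = [(score(query, c), c) for c in candidates]
    kept = [pair for pair in scored if pair[0] >= min_score]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)
```

For the query Abies alba, the misspelling Abies albo scores 0.925 (genus 1.0, epithet 0.75), while Picea alba is discarded despite its identical epithet, because its genus is not a near match. The genus floor doubles as a simple guard against returning absurd candidates.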