Difference between revisions of "Wish list for name matching services"
From TETTRIs
Revision as of 09:28, 23 May 2024
Terminology
- candidates = returned partial matches, as opposed to exact matches
- canonical name = name string without authors/year, but with rank indicator (where required)
- asynchronous output = (here) output is produced as a downloadable file after internal processing, which may take time; the user is notified when it is ready.
Input
- Allow input of a pasted column of names
- Allow upload of a table with names (dialogue to select the name-bearing column[s])
- Allow upload of any other column
- Allow wildcards
- Do not limit the input (in asynchronous mode)
- Allow input of name components in separate fields (at least in uploads):
  - For ICNAFP: Monomial / genus component; infrageneric rank; infrageneric epithet; species epithet; infraspecific rank; infraspecific epithet; basionym author (team); combination author (team); year of publication.
  - For ICZN: work in progress.
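The ICNAFP components listed above could be carried in a record like the following. This is only a minimal sketch: the class name `ParsedPlantName` and its field names are hypothetical, and the `canonical_name` method follows the definition given under Terminology (no authors or year, rank indicator kept).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedPlantName:
    """Hypothetical container for the ICNAFP name components in the list above."""
    genus_or_monomial: str
    infrageneric_rank: Optional[str] = None
    infrageneric_epithet: Optional[str] = None
    species_epithet: Optional[str] = None
    infraspecific_rank: Optional[str] = None
    infraspecific_epithet: Optional[str] = None
    basionym_author: Optional[str] = None
    combination_author: Optional[str] = None
    year: Optional[str] = None

    def canonical_name(self) -> str:
        """Name string without authors/year, keeping rank indicators.

        Simplification: components are joined with single spaces, so an
        infrageneric part inside a species name is not put in parentheses.
        """
        parts = [
            self.genus_or_monomial,
            self.infrageneric_rank,
            self.infrageneric_epithet,
            self.species_epithet,
            self.infraspecific_rank,
            self.infraspecific_epithet,
        ]
        return " ".join(p for p in parts if p)


# Illustrative input; authors and year are left empty here.
name = ParsedPlantName(
    genus_or_monomial="Pinus",
    species_epithet="nigra",
    infraspecific_rank="subsp.",
    infraspecific_epithet="laricio",
)
print(name.canonical_name())
```

Keeping every component optional except the first allows the same record to hold a monomial (family or genus name) as well as a full infraspecific name.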
Interactive mode
Matching
- If the name is entered as a string, parse the name and match the name components independently
- Tolerate different abbreviations for infrageneric and infraspecific rank designations (e.g. "subspecies", "subsp.", "ssp.")
- Exactly match the rank of the name, if unambiguous in the input – i.e. do not return a subspecies for a variety, a genus name for a family, etc.
- Avoid returning names that are completely improbable as candidates
  - Weigh probabilities hierarchically; e.g., in a species name, a full or near-full match on the genus name is more important than one on the species epithet (epithets may be used in many genera).
  - Make phonetic matching optional
- Allow parameters for matching (see below)
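Two of the matching wishes above, tolerant rank abbreviations and hierarchical weighting, might be sketched as follows. Everything here is an assumption for illustration: the lookup table `RANK_FORMS` is not exhaustive, the weights and the cut-off `MIN_SCORE` are arbitrary, and `difflib.SequenceMatcher` merely stands in for whatever string similarity a real service uses.

```python
from difflib import SequenceMatcher

# Assumed lookup of rank spellings (not exhaustive); keys are lower-case.
RANK_FORMS = {
    "subspecies": "subsp.", "subsp.": "subsp.", "subsp": "subsp.",
    "ssp.": "subsp.", "ssp": "subsp.",
    "varietas": "var.", "variety": "var.", "var.": "var.", "var": "var.",
    "forma": "f.", "form": "f.", "f.": "f.", "fo.": "f.",
}

# Assumed weights: the genus counts for more than the epithet.
GENUS_WEIGHT = 0.6
EPITHET_WEIGHT = 0.4
MIN_SCORE = 0.5  # assumed cut-off below which a candidate is not returned


def normalise_rank(token):
    """Map a rank designation to one standard form; unknown tokens pass through."""
    key = token.strip().lower()
    return RANK_FORMS.get(key, key)


def similarity(a, b):
    """Case-insensitive similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score_species(query, candidate):
    """Weighted score for (genus, epithet) pairs; the genus dominates."""
    q_genus, q_epithet = query
    c_genus, c_epithet = candidate
    return (GENUS_WEIGHT * similarity(q_genus, c_genus)
            + EPITHET_WEIGHT * similarity(q_epithet, c_epithet))


def plausible_candidates(query, names):
    """Return (score, name) pairs at or above MIN_SCORE, best first."""
    scored = [(score_species(query, n), n) for n in names]
    return sorted((pair for pair in scored if pair[0] >= MIN_SCORE), reverse=True)


query = ("Bellis", "perenis")  # misspelt epithet
names = [("Bellis", "perennis"), ("Bellis", "annua"), ("Lolium", "perenne")]
for score, name in plausible_candidates(query, names):
    print(f"{score:.2f}", *name)
```

With weights like these, a candidate in the right genus with a different epithet can outrank one in another genus with a similar epithet; the exact balance would have to be tuned against real data.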
Output
- Return matched records and candidates in a single table (at least in asynchronous mode)
- Return all input columns with matching results (at least in asynchronous mode)
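One possible shape for such a result table is sketched below: each input row is repeated once per exact match or candidate, with all uploaded columns preserved. The added column names (`matched_name`, `match_type`, `score`) and the function `write_results` are hypothetical, and the sketch assumes the upload does not already use those column names.

```python
import csv
import io

RESULT_COLUMNS = ["matched_name", "match_type", "score"]  # assumed names


def write_results(input_rows, matches, out):
    """Write input rows plus matching results as one CSV table.

    input_rows: list of dicts holding the original upload columns.
    matches: dict mapping a row index to a list of
             (matched_name, match_type, score) tuples.
    """
    input_columns = list(input_rows[0].keys()) if input_rows else []
    writer = csv.DictWriter(out, fieldnames=input_columns + RESULT_COLUMNS)
    writer.writeheader()
    for index, row in enumerate(input_rows):
        # Rows without any result are still returned, marked as "none".
        results = matches.get(index) or [("", "none", "")]
        for matched_name, match_type, score in results:
            writer.writerow({**row, "matched_name": matched_name,
                             "match_type": match_type, "score": score})


rows = [
    {"id": "1", "name": "Bellis perenis"},
    {"id": "2", "name": "Bellis perennis L."},
]
matches = {
    0: [("Bellis perennis L.", "candidate", 0.93)],
    1: [("Bellis perennis L.", "exact", 1.0)],
}
buffer = io.StringIO()
write_results(rows, matches, buffer)
print(buffer.getvalue())
```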
Parameters for exact matches
- optional (ICNAFP): accept IPNI, Tropicos and full spaced author abbreviations as exact matches
- optional (ICNAFP): ignore ex authors (the author or team preceding the "ex")
- optional (ICNAFP): ignore the hybrid symbol (or its substitutes: "x" followed by a space, or "x" with a space on either side) in the name
- optional (ICNAFP): ignore authors in autonyms
- optional: ignore endings in epithets (ICNAFP) / species or subspecies names (ICZN)
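Two of these options, ignoring ex authors and ignoring the hybrid symbol, can be illustrated as normalisation steps applied to both the query and the reference name before comparing them. This is a sketch under assumptions: the function names and the flags `ignore_ex` and `ignore_hybrid` are hypothetical, and the author strings in the usage lines are illustrative only.

```python
import re


def strip_ex_authors(authors):
    """Drop the author (team) preceding "ex", inside or outside parentheses."""
    return re.sub(r"[^()\s][^()]*?\sex\s+", "", authors)


def strip_hybrid_marker(name):
    """Remove the multiplication sign, a leading "x " and a free-standing " x "."""
    name = name.replace("\u00d7", " ")          # the hybrid symbol itself
    name = re.sub(r"(^|\s)x\s+", r"\1", name)   # lower-case "x" used as a substitute
    return " ".join(name.split())               # collapse leftover whitespace


def exact_match_key(name, authors, ignore_ex=False, ignore_hybrid=False):
    """Build the string that is compared for an exact match."""
    if ignore_hybrid:
        name = strip_hybrid_marker(name)
    if ignore_ex:
        authors = strip_ex_authors(authors)
    return f"{name} {authors}".strip()


print(strip_ex_authors("(Hook. ex Benth.) Kuntze"))
print(exact_match_key("Mentha x piperita", "L.", ignore_hybrid=True))
```

Because the substitute "x" is only removed when it stands alone or starts the string, names that merely end in "x" (such as Carex) are left untouched.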
Parameters for candidate matches
- allow different ranks
- activate phonetic matching
- ...
Machine Interface
- OpenRefine Reconciliation Interface
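For orientation, a client of a reconciliation service sends a batch of queries as a JSON object in a form parameter named `queries` and receives, per query key, a `result` list whose entries carry `id`, `name`, `score` and a boolean `match`. The sketch below only builds the request body and reads a response; it makes no network call. Treating `match: true` as an exact match and everything else as a candidate is an assumption about how a name matching service would use the flag, and the sample response is made-up test data, not output of any real service.

```python
import json
from urllib.parse import urlencode


def build_query_body(names, limit=5):
    """Form-encoded body for a reconciliation query batch (keys q0, q1, ...)."""
    queries = {f"q{i}": {"query": name, "limit": limit}
               for i, name in enumerate(names)}
    return urlencode({"queries": json.dumps(queries)})


def split_results(response_text):
    """Separate results flagged match=true from the remaining candidates."""
    response = json.loads(response_text)
    split = {}
    for key, value in response.items():
        results = value.get("result", [])
        exact = [r for r in results if r.get("match")]
        candidates = [r for r in results if not r.get("match")]
        split[key] = (exact, candidates)
    return split


body = build_query_body(["Bellis perennis", "Bellis perenis"])

# Made-up response for illustration; the ids are placeholders.
sample = json.dumps({
    "q0": {"result": [
        {"id": "example-1", "name": "Bellis perennis L.", "score": 100, "match": True},
        {"id": "example-2", "name": "Bellis annua L.", "score": 62, "match": False},
    ]}
})
exact, candidates = split_results(sample)["q0"]
print([r["name"] for r in exact], [r["name"] for r in candidates])
```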