Difference between revisions of "What is name matching?"

From TETTRIs
Revision as of 07:49, 26 September 2024


The process of combining biodiversity data from multiple sources currently starts with matching the Latin name strings used for the organisms in each dataset. Studies often contain names that cannot be matched unambiguously, or omit some names entirely. When combining datasets, between 10% and 20% of names will fail to match perfectly and may need some human intervention or an accepted level of error. With datasets of many thousands of species this soon becomes a major hurdle that has to be crossed every time datasets are used in analyses, and it is exacerbated when more than two datasets are used.

It is better if study data can be matched once, at source, then linked on unambiguous name IDs rather than by matching potentially ambiguous name strings.
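The contrast between joining on name strings and linking on name IDs can be sketched in a few lines of Python. This is only an illustration: the two datasets, the `name_ids` lookup and the IDs `N1` to `N3` are hypothetical, and the lookup stands in for whatever matching (automated or human) was done once at source.

```python
# Hypothetical datasets keyed by name string, as supplied by two studies.
traits = {
    "Quercus robur L.": 1,
    "Fagus sylvatica L.": 2,
    "Betula pendula Roth": 3,
}
occurrences = {
    "Quercus robur": 10,          # author string omitted
    "Fagus sylvatica L.": 20,
    "Betula pendula Roth": 30,
}


def join_on_strings(a, b):
    """Join two datasets on exact name strings.

    Returns the matched rows and the names from `a` with no match in `b`.
    """
    matched = {name: (a[name], b[name]) for name in a if name in b}
    unmatched = sorted(name for name in a if name not in b)
    return matched, unmatched


# Hypothetical result of matching once, at source: name string -> name ID.
name_ids = {
    "Quercus robur L.": "N1",
    "Quercus robur": "N1",
    "Fagus sylvatica L.": "N2",
    "Betula pendula Roth": "N3",
}


def key_by_id(dataset, name_ids):
    """Re-key a dataset by name ID using the lookup made at source."""
    return {name_ids[name]: value for name, value in dataset.items()}


# Joining on strings loses the record whose author string was omitted.
by_string, failed = join_on_strings(traits, occurrences)

# Joining on IDs links every record without repeating the matching step.
by_id, failed_ids = join_on_strings(
    key_by_id(traits, name_ids), key_by_id(occurrences, name_ids)
)
```

Here the string join leaves one name unmatched, while the ID join links all three records. The matching decision is made once and recorded in the lookup rather than repeated each time the datasets are combined.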

How Latin names are ambiguous

  • Homonyms are names that are spelt the same but refer to different things. Under the codes of nomenclature one of the names will always have precedence over the other, but because there have not been universal name registries it has not been possible to prevent the creation of duplicate names. In the strict sense, homonym means that the full name, including the authors' names, is identical. Homonym is often used in a looser sense that applies only to the words that make up the name, excluding the author string. This is because author strings are often not standardised or are omitted entirely. Homonyms may occur within or between codes; that is, the same name string may be used for two plants, or for a plant and an animal. Furthermore, there are two types of homonyms:
    • Isonyms occur when a name is based on the same type specimen but published in multiple places. The majority of isonyms are created by the author publishing the name again (perhaps in a paper and in a flora, fauna or catalogue) and so have the same author(s). There is no scope for taxonomic confusion in botany, and the only scope for nomenclatural confusion caused by isonyms is citing the wrong reference as the place of original publication. In zoology the name string may carry different dates, causing matching failures even though the intent of the author(s) was to name the same taxon.
    • 'True' homonyms are names based on different type specimens and, usually, published by different authors. If they are published by different authors (homonyms in the loose sense) and the authors' names are included in the full name, then they should not be ambiguous during matching. However, authors' names may be omitted, causing false matches, or may use nonstandard forms, causing false mismatches. Because the use of species epithets in different genera (new combinations in botany) is not required in zoology, the potential for ambiguity is higher there.
  • Author String variation
    • Legal
    • Illegal
  • Orthographical variants
  • Errors
    • OCR
    • Typographic
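The difference between the loose and strict senses of homonym described above can be illustrated with a minimal Python sketch. The names `Aus bus` and `Cus dus`, the authors and the record IDs are hypothetical placeholders, and the grouping function is only one assumed way to flag ambiguous names, not a description of any existing matching tool.

```python
from collections import defaultdict

# Hypothetical records: (name words, author string, record ID).
records = [
    ("Aus bus", "Smith", "r1"),
    ("Aus bus", "Jones", "r2"),
    ("Aus bus", "", "r3"),      # author string omitted
    ("Cus dus", "Smith", "r4"),
]


def group_ambiguous(records, include_authors):
    """Group record IDs that share a matching key; keep groups of two or more.

    With include_authors=False the key is the name words only (loose sense).
    With include_authors=True the key is the full name (strict sense).
    """
    groups = defaultdict(list)
    for words, authors, record_id in records:
        key = (words, authors) if include_authors else (words,)
        groups[key].append(record_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


loose = group_ambiguous(records, include_authors=False)
strict = group_ambiguous(records, include_authors=True)
```

In this sketch the loose grouping flags `r1`, `r2` and `r3` as sharing the same words, while the strict grouping flags nothing because the author strings differ. Record `r3` shows the problem with omitted authors: on the name words alone it cannot be assigned to either of the other two.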

Matching vs Searching