Difference between revisions of "Global or regional aggregators"
Latest revision as of 09:15, 27 June 2025
"Taxonomic aggregators" in the TETTRIs context
In the context of TETTRIs WP2, we define taxonomic aggregators as online databases that compile taxonomic information from multiple sources. Our focus is on aggregators that offer name-matching services; however, we distinguish between the taxonomic datasets used by these services and the services themselves.
Taxonomic datasets here are structured lists of scientific names that follow a single classification system. They typically present a hierarchical, tree-like taxonomy in which each taxon represents a node. Each scientific name is either assigned as the accepted name of a taxon or treated as a synonym (except in cases where the name exists but cannot currently be resolved).
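The structure described above can be sketched in a few lines of Python. This is only an illustrative model, not the schema of any actual dataset: the class names (Taxon, TaxonomicDataset, NameStatus), the resolve method, and the placeholder name "Felis incerta" are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator, Optional


class NameStatus(Enum):
    ACCEPTED = "accepted"
    SYNONYM = "synonym"
    UNRESOLVED = "unresolved"


@dataclass
class Taxon:
    """One node of the classification tree (hypothetical model)."""
    accepted_name: str
    rank: str
    synonyms: list[str] = field(default_factory=list)
    children: list["Taxon"] = field(default_factory=list)
    # Excluded from repr/eq so the parent/child cycle cannot cause recursion.
    parent: Optional["Taxon"] = field(default=None, repr=False, compare=False)

    def add_child(self, child: "Taxon") -> "Taxon":
        child.parent = self
        self.children.append(child)
        return child

    def lineage(self) -> list[str]:
        """Accepted names from the root of the tree down to this taxon."""
        path: list[str] = []
        node: Optional[Taxon] = self
        while node is not None:
            path.append(node.accepted_name)
            node = node.parent
        return path[::-1]


@dataclass
class TaxonomicDataset:
    """A single classification plus the names it cannot currently resolve."""
    root: Taxon
    unresolved: set[str] = field(default_factory=set)

    def _walk(self, node: Taxon) -> Iterator[Taxon]:
        yield node
        for child in node.children:
            yield from self._walk(child)

    def resolve(self, name: str) -> tuple[NameStatus, Optional[Taxon]]:
        """Return the status of a name and the taxon it is attached to, if any."""
        for taxon in self._walk(self.root):
            if name == taxon.accepted_name:
                return NameStatus.ACCEPTED, taxon
            if name in taxon.synonyms:
                return NameStatus.SYNONYM, taxon
        if name in self.unresolved:
            return NameStatus.UNRESOLVED, None
        raise KeyError(f"{name!r} is not listed in this dataset")


# Illustrative content only; "Felis incerta" is a made-up placeholder name.
family = Taxon("Felidae", "family")
genus = family.add_child(Taxon("Puma", "genus"))
species = genus.add_child(
    Taxon("Puma concolor", "species", synonyms=["Felis concolor"])
)
dataset = TaxonomicDataset(root=family, unresolved={"Felis incerta"})

status, taxon = dataset.resolve("Felis concolor")
```

Under these assumptions, resolving a synonym returns the taxon that carries the accepted name, from which the position in the hierarchy can be read via lineage(). A name that exists in the dataset but has no placement is reported as unresolved rather than being attached to a node.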
Such datasets aim to provide comprehensive coverage for particular organism groups, either globally or regionally (e.g., at a European scale). The Catalogue of Life monthly and annual editions are currently the only datasets attempting to unite all biota in a single global dataset. At the regional level, PESI-EU Nomen serves a similar purpose for European taxa.
The sources of the data may include scientific literature, smaller aggregators, or, in some cases, even original unpublished data. An important consideration for any dataset is the extent to which it is driven by, and also used by, the respective taxonomic community.
A special class of datasets is formed by “nomenclators”, which systematically catalog scientific names along with their authorship, publication dates, and references. These focus on nomenclatural accuracy, such as correct spelling and nomenclatural validity, rather than on providing taxonomic opinion or classification.
Taxonomic aggregators may be based on a single dataset or serve as repositories for multiple independent datasets. In the latter case, aggregator services can be configured to operate across all or selected datasets. When these services are restricted to a single dataset, users can generally expect a taxonomically coherent outcome.
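The effect of restricting a service to selected datasets can be illustrated with a small Python sketch. Everything here is hypothetical: the dataset identifiers, their contents, and the functions match_name and is_coherent are invented for illustration and do not reflect the API of any existing service.

```python
from typing import Iterable, Optional

# Hypothetical datasets: each maps a name string to the accepted name
# that this particular dataset assigns to it.
DATASETS: dict[str, dict[str, str]] = {
    "dataset_a": {
        "Puma concolor": "Puma concolor",
        "Felis concolor": "Puma concolor",  # treated as a synonym here
    },
    "dataset_b": {
        "Felis concolor": "Felis concolor",  # treated as accepted here (made up)
    },
}


def match_name(
    name: str,
    datasets: dict[str, dict[str, str]],
    selected: Optional[Iterable[str]] = None,
) -> dict[str, str]:
    """Look up an exact name in all datasets, or only in the selected ones.

    Returns a mapping from dataset identifier to the accepted name found there.
    """
    ids = list(datasets) if selected is None else list(selected)
    return {
        ds_id: datasets[ds_id][name]
        for ds_id in ids
        if name in datasets[ds_id]
    }


def is_coherent(result: dict[str, str]) -> bool:
    """True if all consulted datasets agree on one accepted name."""
    return len(set(result.values())) <= 1


across_all = match_name("Felis concolor", DATASETS)
single = match_name("Felis concolor", DATASETS, selected=["dataset_a"])
```

In this toy setup the query across all datasets returns two different accepted names, whereas the query restricted to one dataset returns a single answer. Real services typically add fuzzy matching and authorship handling, which this sketch deliberately leaves out.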
Aggregator services vary in scope and may cover diverse types of biological data. Within TETTRIs WP2, our focus is specifically on name-matching mechanisms: see What is name matching (https://wiki.bgbm.org/tettris/What_is_name_matching%3F) for a discussion of the name-matching process and Existing Name Matching Mechanisms (https://wiki.bgbm.org/tettris/Existing_name_checking_mechanisms) for a preliminary list of services.
Currently, users face a fragmented and sometimes confusing landscape of multiple, partially overlapping datasets and name-checking services. Services differ in their features and may even give different results when applied to the same dataset. To address this, TETTRIs is developing data matrices to characterize the properties of datasets and services, thereby supporting users in selecting tools that best meet their needs. Additionally, we have compiled a wish list of desired properties (https://wiki.bgbm.org/tettris/Wish_list_for_name_matching_services) based on use-case studies, which may help to improve name-matching services.