Difference between revisions of "Global or regional aggregators"

From TETTRIs
 
(2 intermediate revisions by the same user not shown)
Line 1: Line 1:
=="Taxonomic aggregators" - Categorisation in the TETTRIs context==
+
=="Taxonomic aggregators" in the TETTRIs context==
===Primary Target Aggregators===
+
In the context of TETTRIs WP2, we define taxonomic aggregators as online-accessible databases that compile taxonomic information from multiple sources. Our focus is on aggregators that offer name-matching services; however, we distinguish between the taxonomic datasets used by these services and the services themselves.
These are aggregators that offer comprehensive taxonomic coverage for a certain group of organisms, either globally or with European geographic coverage. As a rule, they offer more in-depth information than the other types of aggregators. They may contain data from literature, from smaller-scale aggregators, or sometimes even original data. TETTRIs here focuses on sites directly driven by the respective taxonomic community: for Fungi, Index Fungorum / Species Fungorum (IF); for vascular plants and bryophytes, World Flora Online Plant List (WFO) and, for Europe, Euro+Med PlantBase. IF also contributes to CoL and PESI, Euro+Med contributes to PESI (and will use WFO Name IDs), and the WFO Plant List will probably contribute to CoL in the future, too. The question of identifying appropriate community datasets for animal groups and for Algae is being discussed.
+
 
Species Fungorum and Euro+Med do not provide their own name-matching services. However, since the CoL services essentially also cover ChecklistBank, versions of these datasets deposited there in a standard format can be accessed directly by selecting them as target datasets. Summary:
+
'''Taxonomic datasets''' here are structured lists of scientific names that follow a single classification system. They typically present a hierarchical, tree-like taxonomy in which each taxon represents a node. Each scientific name is either assigned as the accepted name of a taxon or treated as a synonym (except in cases where the name exists but cannot currently be resolved).
* Specific group of organisms, global (e.g. Index Fungorum) or regional (e.g. Euro+Med PlantBase).
+
* Content directly driven by and (also) used by the respective taxonomic community.  
+
Such datasets aim to provide comprehensive coverage for particular organism groups, either globally or regionally (e.g., at a European scale). The ''Catalogue of Life'' monthly and annual editions are currently the only datasets attempting to unite all biota in a single global dataset. At the regional level, ''PESI-EU Nomen'' serves a similar purpose for European taxa.
* Single classification of name usage, either as synonyms, accepted taxon names or for some reason unplaced names.
+
 
===Secondary aggregators===
+
The sources of the data may include scientific literature, smaller aggregators, or, in some cases, even original unpublished data. An important consideration with respect to datasets is the extent to which they are driven by, and also used by, the respective taxonomic community.
These are aggregators that offer comprehensive taxonomic coverage of names and taxa irrespective of the taxonomic group, with either global (Catalogue of Life) or European (PESI/EU-nomen) geographic coverage. Their records are mostly contributed by secondary aggregators representing a certain taxonomic group. In contrast to "Lookup" aggregators, they provide a single classification of name usage, treating each name as a synonym, as an accepted taxon name, or as a name that is unplaced for some reason. Primary target aggregators are the principal components of the TETTRIs taxonomic backbone. TETTRIs focuses on Catalogue of Life and PESI/EU-nomen as Secondary Target Aggregators. Summary:
+
 
* Comprehensive taxonomic coverage of names and taxa irrespective of the taxonomic group.
+
A special class of datasets is the “nomenclators”, which systematically catalog scientific names along with their authorship, publication dates, and references. These focus on nomenclatural accuracy, such as correct spelling and nomenclatural validity, rather than on providing taxonomic opinion or classification.
* Global (Catalogue of Life) or regional (e.g. European: PESI/EU-nomen).  
+
 
* Single classification of name usage.
+
Taxonomic aggregators may be based on a single dataset or serve as repositories for multiple independent datasets. In the latter case, aggregator services can be configured to operate across all or selected datasets. When these services are restricted to a single dataset, users can generally expect a taxonomically coherent outcome.
* Ideally, aggregating primary aggregators
+
 
==="Lookup" target aggregators (Repositories)===
+
'''Aggregator services''' vary in scope and may cover diverse types of biological data. Within TETTRIs WP2, our focus is specifically on name-matching mechanisms; see [https://wiki.bgbm.org/tettris/What_is_name_matching%3F What is name matching] for a discussion of the name-matching process and [https://wiki.bgbm.org/tettris/Existing_name_checking_mechanisms Existing Name Matching Mechanisms] for a preliminary list of services.
These are services that provide access to a multitude of stored datasets, which may be up to date or rather out of date. They are of interest for name discovery and for identifying where a designation comes from. However, TETTRIs focuses here on GBIF Checklist Bank and on Global Names, both of which offer a service to match names against specific datasets; this essentially renders them (indirect) secondary target aggregators. Summary:
+
 
* Provide access to a multitude of stored datasets, legacy or updated (e.g. GBIF/CoL Checklist Bank, GNA Global Names).
+
Currently, '''users''' face a fragmented and sometimes confusing landscape of multiple and partially overlapping datasets and name-checking services. Services differ in their features and may even give different results when applied to the same dataset. To address this, TETTRIs is developing data matrices to characterize the properties of datasets and services, thereby supporting users in selecting tools that best meet their needs. Additionally, we have compiled a [https://wiki.bgbm.org/tettris/Wish_list_for_name_matching_services wish list of desired properties] based on use case studies, which may help to improve name-matching services.
* Useful for name discovery and to trace provenance of designations
 
* If offering a service to match names against specific datasets they become (indirect) secondary or primary target aggregators
 

Latest revision as of 09:15, 27 June 2025

"Taxonomic aggregators" in the TETTRIs context

In the context of TETTRIs WP2, we define taxonomic aggregators as online-accessible databases that compile taxonomic information from multiple sources. Our focus is on aggregators that offer name-matching services; however, we distinguish between the taxonomic datasets used by these services and the services themselves.

Taxonomic datasets here are structured lists of scientific names that follow a single classification system. They typically present a hierarchical, tree-like taxonomy in which each taxon represents a node. Each scientific name is either assigned as the accepted name of a taxon or treated as a synonym (except in cases where the name exists but cannot currently be resolved).
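
The structure described above can be illustrated with a minimal Python sketch. It is not the data model of any particular aggregator: the class names, the field names, and the taxon names (all beginning with "Exemplum") are invented placeholders used only to show a tree of taxa in which every name is accepted, a synonym, or unresolved.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class NameStatus(Enum):
    ACCEPTED = "accepted"
    SYNONYM = "synonym"
    UNRESOLVED = "unresolved"  # the name exists but cannot currently be placed


@dataclass
class Taxon:
    """A node in the classification tree, labelled by its accepted name."""
    accepted_name: str
    rank: str
    parent: Optional["Taxon"] = None
    synonyms: list[str] = field(default_factory=list)

    def lineage(self) -> list[str]:
        """Return the accepted names from the root down to this taxon."""
        node, path = self, []
        while node is not None:
            path.append(node.accepted_name)
            node = node.parent
        return path[::-1]


@dataclass
class TaxonomicDataset:
    """One classification: a list of taxa plus names that are not yet placed."""
    taxa: list[Taxon] = field(default_factory=list)
    unresolved: set[str] = field(default_factory=set)

    def status_of(self, name: str) -> tuple[NameStatus, Optional[Taxon]]:
        """Report how the dataset treats a name and which taxon it belongs to."""
        for taxon in self.taxa:
            if name == taxon.accepted_name:
                return NameStatus.ACCEPTED, taxon
            if name in taxon.synonyms:
                return NameStatus.SYNONYM, taxon
        if name in self.unresolved:
            return NameStatus.UNRESOLVED, None
        raise KeyError(name)


# Placeholder names, invented purely for illustration.
genus = Taxon("Exemplum", "genus")
species = Taxon("Exemplum primum", "species", parent=genus,
                synonyms=["Exemplum vetus"])
dataset = TaxonomicDataset(taxa=[genus, species],
                           unresolved={"Exemplum dubium"})
```

In this sketch a synonym resolves to the taxon whose accepted name it points to, so `dataset.status_of("Exemplum vetus")` yields the synonym status together with the `Exemplum primum` node.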

Such datasets aim to provide comprehensive coverage for particular organism groups, either globally or regionally (e.g., at a European scale). The Catalogue of Life monthly and annual editions are currently the only datasets attempting to unite all biota in a single global dataset. At the regional level, PESI-EU Nomen serves a similar purpose for European taxa.

The sources of the data may include scientific literature, smaller aggregators, or, in some cases, even original unpublished data. An important consideration with respect to datasets is the extent to which they are driven by, and also used by, the respective taxonomic community.

A special class of datasets is the “nomenclators”, which systematically catalog scientific names along with their authorship, publication dates, and references. These focus on nomenclatural accuracy, such as correct spelling and nomenclatural validity, rather than on providing taxonomic opinion or classification.

Taxonomic aggregators may be based on a single dataset or serve as repositories for multiple independent datasets. In the latter case, aggregator services can be configured to operate across all or selected datasets. When these services are restricted to a single dataset, users can generally expect a taxonomically coherent outcome.
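
The difference between querying all datasets and querying a single one can be sketched as follows. This is a toy model under stated assumptions, not the interface of any existing service: the repository contents, the dataset keys, and the function names are hypothetical, and matching is reduced to an exact string lookup.

```python
from typing import Optional

# Hypothetical repository: dataset key -> {name: accepted name in that dataset}.
# The two datasets deliberately disagree about which name is accepted.
REPOSITORY = {
    "dataset_a": {
        "Exemplum primum": "Exemplum primum",
        "Exemplum vetus": "Exemplum primum",
    },
    "dataset_b": {
        "Exemplum vetus": "Exemplum vetus",
        "Exemplum primum": "Exemplum vetus",
    },
}


def match_name(name: str, datasets: Optional[list[str]] = None) -> dict[str, str]:
    """Look up a name in the selected datasets (all datasets if none are given).

    Returns a mapping of dataset key -> accepted name for every dataset
    that knows the name.
    """
    keys = datasets if datasets is not None else list(REPOSITORY)
    hits = {}
    for key in keys:
        accepted = REPOSITORY[key].get(name)
        if accepted is not None:
            hits[key] = accepted
    return hits


def is_coherent(hits: dict[str, str]) -> bool:
    """True if all matched datasets agree on a single accepted name."""
    return len(set(hits.values())) <= 1
```

Calling `match_name("Exemplum vetus")` across both placeholder datasets returns two conflicting accepted names, whereas `match_name("Exemplum vetus", ["dataset_a"])` returns a single answer, which mirrors the coherence that can be expected when a service is restricted to one dataset.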

Aggregator services vary in scope and may cover diverse types of biological data. Within TETTRIs WP2, our focus is specifically on name-matching mechanisms; see "What is name matching" (https://wiki.bgbm.org/tettris/What_is_name_matching%3F) for a discussion of the name-matching process and "Existing Name Matching Mechanisms" (https://wiki.bgbm.org/tettris/Existing_name_checking_mechanisms) for a preliminary list of services.

Currently, users face a fragmented and sometimes confusing landscape of multiple and partially overlapping datasets and name-checking services. Services differ in their features and may even give different results when applied to the same dataset. To address this, TETTRIs is developing data matrices to characterize the properties of datasets and services, thereby supporting users in selecting tools that best meet their needs. Additionally, we have compiled a wish list of desired properties (https://wiki.bgbm.org/tettris/Wish_list_for_name_matching_services) based on use case studies, which may help to improve name-matching services.
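
As a rough illustration of how such a data matrix might support tool selection, the sketch below filters services by required properties. The service names, the property names, and the values are placeholders invented for this example; they do not reflect the actual TETTRIs matrices or any real service.

```python
# Hypothetical property matrix: rows are services, columns are properties.
# All names and values are placeholders, not TETTRIs results.
PROPERTIES = ["fuzzy_matching", "dataset_selectable", "batch_input"]

MATRIX = {
    "service_x": [True, True, False],
    "service_y": [False, True, True],
    "service_z": [True, False, True],
}


def services_with(required: list[str]) -> list[str]:
    """Return the services (sorted by name) that offer every required property."""
    columns = [PROPERTIES.index(prop) for prop in required]
    return sorted(
        service
        for service, row in MATRIX.items()
        if all(row[col] for col in columns)
    )
```

A user who needs to restrict matching to one dataset would call `services_with(["dataset_selectable"])` and, with the placeholder values above, be pointed to `service_x` and `service_y`.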