Taxonomic datasets
Contents
- 1 Repositories
- 2 Taxonomic datasets
- 2.1 Catalogue of Life (COL)
- 2.2 Catalogue of Life eXtended Release (COL XR)
- 2.3 Encyclopedia of Life (EOL)
- 2.4 Fungal Names
- 2.5 GBIF Backbone Taxonomy
- 2.6 Integrated Taxonomic Information System (ITIS)
- 2.7 International Committee on Taxonomy of Viruses (ICTV)
- 2.8 Mycobank
- 2.9 NCBI Taxonomy
- 2.10 (The) Paleobiology Database
- 2.11 World Checklist of Vascular Plants (WCVP)
- 2.12 World Flora Online (WFO) Plant List
- 3 Nomenclators
Repositories
The following sites host datasets uploaded or imported from various sources. All of them provide some kind of API and name-matching service for the included datasets. Only ChecklistBank and GBIF also allow the datasets to be downloaded.
ChecklistBank (CLB)
CLB was developed by the Catalogue of Life (COL) and the Global Biodiversity Information Facility (GBIF). It is a repository holding a very large number of individual datasets, ranging from large checklists (such as COL or World Flora Online editions) to data extracted from individual publications (taxonomic treatments). The latter, provided by Plazi, form the bulk of the submissions (as of August 2025, more than 58,000 of a total of about 61,100 datasets). The portal's functions are documented in a user tutorial, and the code is managed on GitHub.
Downloads: All data can be downloaded, either in the original format or converted to various other formats. Downloading requires logging in with a free GBIF user account. Parts of a checklist can be downloaded by selecting a root taxon (e.g., a genus within the checklist) or a specific taxonomic rank.
Global Biodiversity Information Facility (GBIF)
GBIF stores uploaded taxonomic datasets, assigns each a DOI, and makes them available for download. As of August 2025, nearly 61,000 datasets are listed, largely overlapping with those in ChecklistBank. Metadata descriptions are usually comprehensive. The datasets have been used to assemble the GBIF Backbone Taxonomy (see under Taxonomic datasets below) as a "single, synthetic management classification with the goal of covering all names GBIF is dealing with". The Backbone will be replaced by the Catalogue of Life eXtended Release (COL XR, also see below) in the future [Hernández-Robles & al. 2023].
Downloads: All datasets can be downloaded as DwCA files.
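A DwC-A is a zip file whose meta.xml descriptor names the core data file, its delimiter, and the Darwin Core term of each column. The following Python sketch reads the core rows of such an archive using only the standard library. It is a simplification, not a complete reader: extensions, default values, and several descriptor options are ignored, and the fallback values used when attributes are missing (comma delimiter, double-quote enclosure) are assumptions. The function name `read_dwca_core` and the "id" key for the core identifier column are illustrative choices.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"


def _tag(name):
    # Qualify an element name with the DwC-A text namespace used in meta.xml
    return f"{{{DWC_TEXT_NS}}}{name}"


def _unescape(value):
    # meta.xml writes tab and newline delimiters as the two-character escapes "\t" and "\n"
    return value.replace("\\t", "\t").replace("\\n", "\n")


def read_dwca_core(archive):
    """Yield core rows of a DwC-A (path or binary file object) as dicts keyed by term name."""
    with zipfile.ZipFile(archive) as zf:
        core = ET.fromstring(zf.read("meta.xml")).find(_tag("core"))
        location = core.find(f"{_tag('files')}/{_tag('location')}").text
        delimiter = _unescape(core.get("fieldsTerminatedBy", ","))
        enclosed = _unescape(core.get("fieldsEnclosedBy", '"'))  # assumed default
        skip = int(core.get("ignoreHeaderLines", "0"))

        # Map column index -> short term name, e.g. ".../terms/scientificName" -> "scientificName"
        columns = {}
        id_elem = core.find(_tag("id"))
        if id_elem is not None:
            columns[int(id_elem.get("index"))] = "id"
        for field in core.findall(_tag("field")):
            if field.get("index") is not None:  # fields with only a default value are skipped
                columns[int(field.get("index"))] = field.get("term").rsplit("/", 1)[-1]

        with zf.open(location) as raw:
            text = io.TextIOWrapper(raw, encoding=core.get("encoding", "UTF-8"), newline="")
            if enclosed:
                reader = csv.reader(text, delimiter=delimiter, quotechar=enclosed)
            else:
                reader = csv.reader(text, delimiter=delimiter, quoting=csv.QUOTE_NONE)
            for line_no, row in enumerate(reader):
                if line_no < skip or not row:
                    continue
                yield {name: row[i] for i, name in columns.items() if i < len(row)}
```

Because the rows are yielded one at a time, even a large checklist archive can be scanned without loading it into memory at once.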
Global Names Architecture (GNA)
The Global Names Architecture (GNA) is a system of web services that helps users register, find, index, check, and organize scientific names of organisms and interconnect online information about species. For its name-matching tool (Global Names Verifier), it draws on a number of imported taxonomic datasets, which are regularly updated where possible [checked 22 Aug 2025]. Name matching can be restricted to one or more of these datasets. However, the datasets cannot be downloaded directly from the GNA website.
Botanical Information and Ecology Network (BIEN)
BIEN is a network based at the National Center for Ecological Analysis and Synthesis (NCEAS) that aims to integrate global botanical data (see https://bien.nceas.ucsb.edu/bien/about/). For its name-matching service, the Taxonomic Name Resolution Service (TNRS), it hosts botanical datasets that can be selected for the matching process.
Taxonomic datasets
These are structured lists of scientific names that follow a single classification system. They typically present a hierarchical, tree-like taxonomy in which each taxon is a node. Each scientific name is either the accepted name of a taxon or treated as a synonym (except where a name exists but cannot currently be resolved). The following list is restricted to datasets that are accessible, directly or indirectly, via name-matching services or are available for download, and that either cover broad taxonomic groups (animals, plants, ...) globally or give broad coverage of taxonomic groups for a country or region. For more specialized purposes, please refer to the listings (and search functions) provided by COL ChecklistBank or GBIF (see above).
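The accepted-name/synonym structure described above can be illustrated with a small Python sketch. The field names loosely mirror the Darwin Core terms taxonID, parentNameUsageID, acceptedNameUsageID, and taxonomicStatus; the class and function names and the abbreviated toy classification are hypothetical. In this model, synonyms point to their accepted name and only accepted names carry a parent, so a classification is obtained by first resolving a name and then walking up the tree.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NameUsage:
    taxon_id: str                      # cf. dwc:taxonID
    scientific_name: str               # cf. dwc:scientificName
    status: str                        # cf. dwc:taxonomicStatus: "accepted", "synonym", ...
    parent_id: Optional[str] = None    # cf. dwc:parentNameUsageID (accepted names only)
    accepted_id: Optional[str] = None  # cf. dwc:acceptedNameUsageID (synonyms only)


def accepted_usage(usages: Dict[str, NameUsage], taxon_id: str) -> NameUsage:
    """Resolve a synonym to its accepted name; other usages are returned unchanged."""
    usage = usages[taxon_id]
    if usage.status == "synonym" and usage.accepted_id:
        return usages[usage.accepted_id]
    return usage


def classification(usages: Dict[str, NameUsage], taxon_id: str) -> List[str]:
    """Scientific names from the root down to the accepted taxon for taxon_id."""
    path = []
    node = accepted_usage(usages, taxon_id)
    while node is not None:
        path.append(node.scientific_name)
        node = usages.get(node.parent_id) if node.parent_id else None
    return path[::-1]


# Toy data: an abbreviated classification (intermediate ranks omitted).
usages = {u.taxon_id: u for u in [
    NameUsage("1", "Animalia", "accepted"),
    NameUsage("2", "Felidae", "accepted", parent_id="1"),
    NameUsage("3", "Puma", "accepted", parent_id="2"),
    NameUsage("4", "Puma concolor", "accepted", parent_id="3"),
    NameUsage("5", "Felis concolor", "synonym", accepted_id="4"),
]}
print(classification(usages, "5"))  # ['Animalia', 'Felidae', 'Puma', 'Puma concolor']
```

Names that cannot currently be resolved would simply keep their own status and have neither a parent nor an accepted name in this model.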
Catalogue of Life (COL)
Dataset description: https://www.catalogueoflife.org/about/catalogueoflife
Scope: All organisms / global
Downloads: The entire COL in its latest version can be downloaded from https://www.catalogueoflife.org/data/download in ColDP Archive, Darwin Core Archive, ACEF Archive, or TextTree format. ChecklistBank also offers partial downloads (in various formats, with DOI); these require a free GBIF user account.
Catalogue of Life eXtended Release (COL XR)
Dataset description: Hernández-Robles & al. 2023. As of August 2025, COL XR is still under development and published only as a temporary preview.
Scope: All organisms / global
Downloads: Available from ChecklistBank (search for "XR" preceded by a space) or from GBIF.
Encyclopedia of Life (EOL)
Dataset description: https://eol.org/docs/eol-dynamic-hierarchy
Scope: All organisms / global
Downloads: "EOL Dynamic Hierarchy Version 2.2" (TSV format) and "EOL Dynamic Hierarchy Active Version" (TSV format, ".tab" files). See also https://eol.org/docs/what-is-eol/data-services
Fungal Names
Dataset description: https://nmdc.cn/fungalnames/toabout
Scope: Fungi, global
Downloads: see https://nmdc.cn/fungalnames/towebservice
GBIF Backbone Taxonomy
Dataset description: https://www.checklistbank.org/dataset/53147/about
Scope: All organisms / global
Downloads: https://hosted-datasets.gbif.org/datasets/backbone/current/backbone.zip in DwCA format
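Besides the full download, GBIF offers name matching against the Backbone through its species match web service. The Python sketch below assumes the v1 endpoint `https://api.gbif.org/v1/species/match` with the `name`, `kingdom`, and `strict` parameters and the response fields `scientificName`, `status`, and `matchType`; since the Backbone is to be replaced by COL XR, GBIF's current API documentation should be checked before relying on it. The function names are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

MATCH_ENDPOINT = "https://api.gbif.org/v1/species/match"  # GBIF v1 API; may change with COL XR


def match_url(name, kingdom=None, strict=False):
    """Build a match request; 'kingdom' helps to separate homonyms across kingdoms."""
    params = {"name": name, "strict": str(strict).lower()}
    if kingdom:
        params["kingdom"] = kingdom
    return f"{MATCH_ENDPOINT}?{urlencode(params)}"


def summarize_match(result):
    """Reduce a JSON match response to (name, status, match type), or None if nothing matched."""
    if result.get("matchType", "NONE") == "NONE":
        return None
    return result.get("scientificName"), result.get("status"), result.get("matchType")


def match_name(name, kingdom=None):
    """Query the live service (requires network access)."""
    with urlopen(match_url(name, kingdom), timeout=30) as response:
        return summarize_match(json.load(response))
```

For example, `match_name("Puma concolor", kingdom="Animalia")` should return a tuple whose status element tells whether the Backbone treats the name as accepted or as a synonym.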
Integrated Taxonomic Information System (ITIS)
Dataset description: https://www.itis.gov/about_itis.html
Scope: All organisms / US and global
Downloads: Up to 32,727 records of a specific taxonomic group can be downloaded in Taxonomic Workbench format or as DwC-A - see https://www.itis.gov/access.html.
International Committee on Taxonomy of Viruses (ICTV)
Dataset description: The internationally agreed dataset for viruses and viroids is the ICTV Master Species List, which provides the current taxonomic status of virus names and their classification.
Scope: Viruses and viroids, global
Downloads: The dataset is included in Catalogue of Life’s ChecklistBank
Mycobank
Dataset description: Crous & al. 2004
Scope: Fungi, global
Downloads: An Excel version of the list of taxa present in MycoBank (export date: 13th of January 2025) can be downloaded from https://www.mycobank.org/Images/MBList.zip.
NCBI Taxonomy
Dataset description: Schoch 2011 / 2021. Metadata on the GBIF site point out that "The NCBI taxonomy database is not a primary source for taxonomic or phylogenetic information. ... the NCBI taxonomy database is not a phylogenetic or taxonomic authority and should not be cited as such".
Scope: All organisms / global
Downloads: The full taxonomy database along with files associating nucleotide and protein sequence records with their taxonomy IDs: https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/
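Among the files in this directory is the archive taxdump.tar.gz. Its nodes.dmp (tax_id, parent tax_id, rank, ...) and names.dmp (tax_id, name, unique name, name class) use "\t|\t" as the field separator and end each line with "\t|". The Python sketch below loads these two files and reconstructs a lineage; it uses only the first few columns, assumes UTF-8 encoding, and its function names are illustrative.

```python
import io
import tarfile


def _dmp_rows(lines):
    """Split NCBI .dmp lines: fields are separated by '\\t|\\t' and each line ends with '\\t|'."""
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\t|"):
            line = line[:-2]
        if line:
            yield line.split("\t|\t")


def parse_nodes(lines):
    """Return {tax_id: parent_tax_id} and {tax_id: rank} from nodes.dmp."""
    parent, rank = {}, {}
    for row in _dmp_rows(lines):
        parent[row[0]], rank[row[0]] = row[1], row[2]
    return parent, rank


def parse_scientific_names(lines):
    """Return {tax_id: name} from names.dmp, keeping only the 'scientific name' class."""
    return {row[0]: row[1] for row in _dmp_rows(lines) if row[3] == "scientific name"}


def load_taxdump(path="taxdump.tar.gz"):
    """Read nodes.dmp and names.dmp directly from the downloaded archive."""
    with tarfile.open(path, "r:gz") as tar:
        with io.TextIOWrapper(tar.extractfile("nodes.dmp"), encoding="utf-8") as fh:
            parent, rank = parse_nodes(fh)
        with io.TextIOWrapper(tar.extractfile("names.dmp"), encoding="utf-8") as fh:
            names = parse_scientific_names(fh)
    return parent, rank, names


def lineage(tax_id, parent, names):
    """Scientific names from the root to tax_id; the root node (tax_id 1) is its own parent."""
    path = [names.get(tax_id, tax_id)]
    while parent[tax_id] != tax_id:
        tax_id = parent[tax_id]
        path.append(names.get(tax_id, tax_id))
    return path[::-1]
```

Given the caveat quoted above, such a lineage reflects NCBI's working classification and is not a taxonomic authority.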
(The) Paleobiology Database
Dataset description: Uhen & al. 2023
Scope: All organisms (fossils only) / global
Downloads: Records can be downloaded in several formats using the Download Generator
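The Download Generator assembles request URLs for the PBDB data service, and such URLs can also be built directly. The Python sketch below assumes the data service version 1.2 path `data1.2/taxa/list.<format>` and its `base_name` parameter (all taxa within the named group); further parameters should be taken from the data service documentation. The function name is illustrative.

```python
from urllib.parse import urlencode

PBDB_DATA_SERVICE = "https://paleobiodb.org/data1.2"  # assumed data service version 1.2


def pbdb_taxa_url(base_name, fmt="csv", **extra):
    """Build a taxa list request for all taxa contained in base_name (e.g. a family)."""
    if fmt not in {"csv", "tsv", "txt", "json"}:
        raise ValueError(f"unsupported format: {fmt}")
    params = {"base_name": base_name, **extra}  # extra parameters are passed through unchecked
    return f"{PBDB_DATA_SERVICE}/taxa/list.{fmt}?{urlencode(params)}"
```

The resulting URL can be opened in a browser or fetched with `urllib.request.urlopen`.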
World Checklist of Vascular Plants (WCVP)
Dataset description: Govaerts & al. 2021
Scope: Vascular plants
Downloads: According to Rafael Govaerts (pers. comm., July 17, 2024), the primary location for the WCVP dataset is Plants of the World Online (POWO), under WCVP "DATA". Both text (CSV) and Darwin Core downloads are available, with citation and other metadata provided in a readme.txt and an eml.xml file, respectively. The latest version of the WCVP data can also be checked and downloaded via ChecklistBank (where the metadata are less precise).
World Flora Online (WFO) Plant List
Dataset description: https://wfoplantlist.org/background
Scope: Plants (Vascular plants and "bryophytes")
Downloads: Available for the published versions (currently six-monthly updates, published via Zenodo with a DOI). The current version can be found at https://zenodo.org/records/8079052; if a later version is available, this is indicated at the top of that page.
The following formats are available (example for the December 2022 version):
- wfo_plantlist_2022-12.zip: The Catalogue of Life Data Package (ColDP) of the WFO Plant List. This is the most expressive standards-based form of the list.
- plant_list_2022-12.json.zip: JSON-formatted version of the WFO Plant List. It has been designed for direct import into a schemaless instance of a Solr index and is used to drive the WFO Plant List API (https://list.worldfloraonline.org), which in turn drives the WFO Plant List in the portal. This is recommended if you want a local, read-only version of the list rather than using the API.
- plant_list_2022-12.sql.gz: The complete production database (minus logging data and API keys) as a MySQL backup file. It can be restored directly to a MySQL 5.7 or later instance if you require the list in SQL format.
- ipni_to_wfo.csv.gz: A file mapping all tracked IPNI IDs to their associated WFO IDs.
- families_dwc.tar.gz: Individual Darwin Core Archive files for each of the 718 recognized families. If you want a single family in DwC format but cannot load the whole list, download and expand this file. Family and genus files are also available for download through the portal.
- DwC_backbone_R.zip: A single Darwin Core Archive file containing non-deprecated names and taxa, for use in the existing R package.
- _uber.zip: A single Darwin Core Archive file containing all names and taxa, including deprecated ones, along with some extra columns.
Weekly updated DwC-Archive files for all families and for the _uber.zip are available at https://list.worldfloraonline.org/rhakhis/api/downloads/dwc/
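The ipni_to_wfo.csv.gz file listed above can be turned into a lookup table for linking IPNI records to WFO IDs. The following Python sketch assumes, without having verified the file's layout, that it has a header row and that the IPNI ID and the WFO ID are its first two columns; the column positions should be adjusted to the actual header. The function name is hypothetical.

```python
import csv
import gzip
import io


def load_ipni_to_wfo(source, has_header=True):
    """Map IPNI IDs to WFO IDs from ipni_to_wfo.csv.gz (a path or a binary file object).

    ASSUMPTION: column 0 holds the IPNI ID and column 1 the WFO ID; verify against the file.
    A list is kept per IPNI ID so that unexpected one-to-many rows are not silently dropped.
    """
    mapping = {}
    with gzip.open(source, "rt", encoding="utf-8", newline="") as fh:
        reader = csv.reader(fh)
        if has_header:
            next(reader, None)  # skip the (assumed) header row
        for row in reader:
            if len(row) >= 2 and row[0] and row[1]:
                mapping.setdefault(row[0], []).append(row[1])
    return mapping
```

For example, `load_ipni_to_wfo("ipni_to_wfo.csv.gz")` would read the downloaded file directly, without unpacking it first.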
Nomenclators
Nomenclators are a special class of taxonomic datasets that systematically catalog scientific names together with their authorship, publication date and reference, nomenclatural status, and (sometimes) type information. They focus on nomenclatural accuracy, such as correct spelling and nomenclatural validity, rather than on providing a taxonomic opinion or classification.
Index Nominum Algarum
Dataset description: Index Nominum Algarum is a nomenclator comprising >200,000 algal names compiled from index cards.
Scope: Algae
Downloads: Available in Catalogue of Life’s data infrastructure ChecklistBank
International Plant Name Index (IPNI)
Dataset description: https://ipni.org/about
Scope: Vascular plants
Downloads: upon request
List of Prokaryotic names with Standing in Nomenclature (LPSN)
Dataset description: The LPSN covers names of prokaryotes validly published under the International Code of Nomenclature of Prokaryotes (ICNP; Parker & al. 2019). Note that cyanobacteria may also be named under the International Code of Nomenclature for algae, fungi, and plants (ICNAFP). This dual treatment may result in some names being valid under the ICNAFP but not under the ICNP.
Scope: Prokaryotes
Downloads: with registration
Nomenclator Zoologicus
Dataset description: Wikipedia / ChecklistBank
Scope: animals
Downloads: via Zenodo [1], also available in Catalogue of Life’s data infrastructure ChecklistBank
Zoobank
Dataset description: https://zoobank.org/About
Scope: animals
Downloads: Formerly available as a GBIF dataset, but currently not accessible [22 Sep 2025]