Taxonomic datasets
Contents
- 1 Repositories
- 2 Taxonomic datasets
- 2.1 Algaebase
- 2.2 Artsdatabanken (Norway) Nortaxa
- 2.3 Barcode of Life Taxonomy
- 2.4 Catalogue of Life (COL)
- 2.5 Catalogue of Life eXtended Release (COL XR)
- 2.6 Dyntaxa (Sweden)
- 2.7 Elurikkus Checklist of Estonian Species
- 2.8 Encyclopedia of Life (EOL)
- 2.9 Euro+Med PlantBase
- 2.10 FinBIF Taxon Database
- 2.11 GBIF Backbone Taxonomy
- 2.12 iNaturalist Taxonomy
- 2.13 Index to Organism Names (ION)
- 2.14 Integrated Taxonomic Information System (ITIS)
- 2.15 International Committee on Taxonomy of Viruses (ICTV)
- 2.16 Le Référentiel Taxonomique Taxref
- 2.17 Leipzig Catalogue of Vascular Plants
- 2.18 Mycobank
- 2.19 NCBI Taxonomy
- 2.20 (The) Paleobiology Database
- 2.21 Pan European Species Information Infrastructure EU - Nomen
- 2.22 Species Fungorum [Plus]
- 2.23 UK Species Inventory / Species Dictionary
- 2.24 World Checklist of Vascular Plants (WCVP)
- 2.25 World Flora Online (WFO) Plant List
- 2.26 World Register of Marine Species (WoRMS)
- 3 Nomenclators
- 4 Footnotes
Repositories
Listed here are sites that host uploaded or imported datasets from various sources. All of them also provide some kind of API and name matching services for the included datasets. Only ChecklistBank and GBIF also allow the datasets to be downloaded.
ChecklistBank (CLB)
CLB was developed by the Catalogue of Life (COL) and the Global Biodiversity Information Facility (GBIF). It is a repository holding a huge number of individual datasets, ranging from large checklists (like COL or World Flora Online editions) to data extracted from individual publications (taxonomic treatments). The latter, provided by Plazi, form the bulk of the submissions (as of August 2025, more than 58,000 datasets of a total of about 61,100). The functioning of the portal is documented in a tutorial for users, and the code is managed on GitHub.
Downloads: All data can be downloaded, in the original format or in various other formats. Download requires a login with a free GBIF user account. Parts of checklists can be downloaded by selecting a root taxon (e.g. a genus within the checklist) or a specific taxonomic rank.
Global Biodiversity Information Facility (GBIF)
GBIF stores uploaded taxonomic datasets, assigns each a DOI, and makes them available for download. As of August 2025, nearly 61,000 datasets are listed, largely overlapping with ChecklistBank. Metadata descriptions are usually comprehensive. The datasets have been used to assemble the GBIF Backbone Taxonomy (see under Taxonomic datasets below) as a "single, synthetic management classification with the goal of covering all names GBIF is dealing with". In the future, this dataset will be replaced by the Catalogue of Life eXtended Release (COL XR, see also below) [Hernández-Robles & al. 2023].
Downloads: All datasets can be downloaded as DwCA files.
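As a rough illustration of what working with such a download involves: a Darwin Core Archive is a zip file whose meta.xml describes the core data table (file location, delimiter, column terms). The sketch below reads that description and returns the core rows as dictionaries. The archive is built in memory purely for demonstration; the file name taxon.tsv and the two example records are assumptions, not taken from any actual GBIF dataset, and real archives may use options (quoting, extensions, default values) that this sketch ignores.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

# XML namespace of the Darwin Core text guide (used by meta.xml)
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"


def read_core_rows(archive_bytes):
    """Return the core table of a Darwin Core Archive as a list of dicts."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        meta = ET.fromstring(zf.read("meta.xml"))
        core = meta.find(f"{DWC_TEXT_NS}core")
        location = core.find(f"{DWC_TEXT_NS}files/{DWC_TEXT_NS}location").text
        # meta.xml writes a tab delimiter as the two characters backslash + t
        delimiter = core.get("fieldsTerminatedBy", ",").replace("\\t", "\t")
        skip = int(core.get("ignoreHeaderLines", "0"))
        # Map column index -> short term name; a <field> sharing the index
        # of <id> replaces the generic "id" label with its term name.
        columns = {}
        id_el = core.find(f"{DWC_TEXT_NS}id")
        if id_el is not None:
            columns[int(id_el.get("index"))] = "id"
        for field in core.findall(f"{DWC_TEXT_NS}field"):
            if field.get("index") is not None:
                term = field.get("term").rsplit("/", 1)[-1]
                columns[int(field.get("index"))] = term
        text = zf.read(location).decode(core.get("encoding", "utf-8"))
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        quoting=csv.QUOTE_NONE)
    rows = list(reader)[skip:]
    return [{name: row[i] for i, name in columns.items() if i < len(row)}
            for row in rows]


def build_demo_archive():
    """Build a tiny, made-up archive in memory (demonstration data only)."""
    meta = r'''<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n"
        ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Taxon">
    <files><location>taxon.tsv</location></files>
    <id index="0"/>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/taxonID"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/taxonomicStatus"/>
  </core>
</archive>'''
    data = ("taxonID\tscientificName\ttaxonomicStatus\n"
            "1\tPuma concolor (Linnaeus, 1771)\taccepted\n"
            "2\tFelis concolor Linnaeus, 1771\tsynonym\n")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("meta.xml", meta)
        zf.writestr("taxon.tsv", data)
    return buffer.getvalue()


rows = read_core_rows(build_demo_archive())
for row in rows:
    print(row["taxonID"], row["scientificName"], row["taxonomicStatus"])
```

For a downloaded file, the same function could be called with the bytes of the zip read from disk.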
Global Names Architecture (GNA)
The Global Names Architecture (GNA) is a system of web services that helps people register, find, index, check, and organize biological scientific names and interconnect online information about species. For its name matching tool (Global Names Verifier) it draws on a number of imported taxonomic datasets, which are updated regularly where possible [checked 22 Aug 2025]. Name matching can be restricted to one or more of these. However, these datasets cannot be downloaded directly from the GNA website.
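Restricting a match to selected datasets can be pictured as building a request URL that lists the names and the chosen data sources. The following sketch only constructs such a URL and sends nothing. The endpoint path, the data_sources parameter, and the pipe separator are assumptions based on the public Verifier API and should be checked against its current documentation; the data source numbers 1 and 11 are illustrative placeholders.

```python
from urllib.parse import quote

# Assumed base path of the Global Names Verifier REST API (verify before use)
VERIFIER_BASE = "https://verifier.globalnames.org/api/v1/verifications"


def verifier_url(names, data_source_ids=()):
    """Build a verification URL for several names, optionally limited
    to the given data source IDs."""
    joined = "|".join(name.strip() for name in names)
    # Percent-encode spaces and other characters, but keep the pipe separator
    url = f"{VERIFIER_BASE}/{quote(joined, safe='|')}"
    if data_source_ids:
        url += "?data_sources=" + "|".join(str(i) for i in data_source_ids)
    return url


print(verifier_url(["Puma concolor", "Bubo bubo"], [1, 11]))
```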
Botanical Information and Ecology Network (BIEN)
BIEN is a network based at the National Center for Ecological Analysis and Synthesis (NCEAS). BIEN aims at integrating global botanical data (see https://bien.nceas.ucsb.edu/bien/about/). For its name matching service (Taxonomic Name Resolution Service, TNRS) it hosts botanical datasets that can be selected for the matching process.
Taxonomic datasets
These are structured lists of scientific names that follow a single classification system. They typically present a hierarchical, tree-like taxonomy in which each taxon represents a node. Each scientific name is either assigned as the accepted name of a taxon or treated as a synonym (except in cases where the name exists but cannot currently be resolved). The following list is restricted to datasets that are directly or indirectly accessible via name matching services, or available for download, and that either cover broad taxonomic groups (animals, plants, ...) globally or provide broad coverage of taxonomic groups for a country or region. For more specialized purposes, please refer to the listings (and searches) provided by COL-CLB or GBIF (see above).
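The structure described above (a tree of taxa, with every name either accepted, a synonym, or unresolved) can be sketched as a small data model. The class, field names, and identifiers below are hypothetical and do not correspond to the schema of any particular dataset; the point is only to show how a synonym leads to its accepted name and how the classification is read by walking up the tree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameUsage:
    id: str
    name: str
    status: str                        # "accepted", "synonym" or "unresolved"
    parent_id: Optional[str] = None    # set for accepted names below the root
    accepted_id: Optional[str] = None  # set for synonyms only


def accepted_usage(usages, usage_id):
    """Return the accepted usage for an ID, or None if it is unresolved."""
    usage = usages[usage_id]
    if usage.status == "synonym":
        return usages[usage.accepted_id]
    if usage.status == "accepted":
        return usage
    return None


def lineage(usages, usage_id):
    """Return the names from the root of the tree down to the accepted taxon."""
    taxon = accepted_usage(usages, usage_id)
    path = []
    while taxon is not None:
        path.append(taxon.name)
        taxon = usages[taxon.parent_id] if taxon.parent_id else None
    return list(reversed(path))


# Made-up identifiers; the names are used for illustration only.
records = [
    NameUsage("t1", "Felidae", "accepted"),
    NameUsage("t2", "Puma", "accepted", parent_id="t1"),
    NameUsage("t3", "Puma concolor", "accepted", parent_id="t2"),
    NameUsage("t4", "Felis concolor", "synonym", accepted_id="t3"),
    NameUsage("t5", "Felis incerta", "unresolved"),  # invented placeholder name
]
usages = {u.id: u for u in records}

print(accepted_usage(usages, "t4").name)
print(lineage(usages, "t4"))
print(lineage(usages, "t5"))
```

Datasets differ in whether a synonym's identifier resolves to the name record itself or to the accepted taxon, which is what the "Synonym ID directly resolves to" field in the entries below records.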
Algaebase
Scope: Algae / global
Dataset description: https://www.algaebase.org/about/
Downloads: Not freely available.
Name-ID / attribute name: yes, numeric / species-ID (for names ranked species and below, for genera and above "#" + numeric)
Example:
https://www.algaebase.org/search/species/detail/?species_id=194111
https://www.algaebase.org/browse/taxonomy/#86701
Synonym ID directly resolves to: Name
Stability: Not stated (but apparently stable at least since 2014)
Policy on orthographic changes: Not stated
Artsdatabanken (Norway) Nortaxa
Scope: All organisms / Norway
Dataset description: https://nortaxa.artsdatabanken.no/about
Downloads:
Name-ID / attribute name: / ScientificNameId
Example: https://nortaxa.artsdatabanken.no/name-info/100140
Synonym ID directly resolves to: Name
Stability: Not explicitly stated; use of ScientificNameID in the API implies stability.
Policy on orthographic changes: No explicit policy stated
Barcode of Life Taxonomy
Scope: All organisms / global
Dataset description: https://boldsystems.org/data/taxonomy-page/
Name-ID / attribute name: taxid / taxid
Example: https://bench.boldsystems.org/index.php/Taxbrowser_Taxonpage?taxid=1222
Synonym ID directly resolves to: n/a
Stability: not stated
Policy on orthographic changes: not stated
Catalogue of Life (COL)
Scope: All organisms / global
Dataset description: https://www.catalogueoflife.org/about/catalogueoflife
Downloads: The entire COL in its latest version can be downloaded from https://www.catalogueoflife.org/data/download in ColDP Archive, Darwin Core Archive, ACEF Archive, or TextTree format. COL-ChecklistBank also offers partial downloads (in various formats, with DOI); this requires a free GBIF user account.
Name-ID / attribute name: yes, alphanumeric / COL Stable Identifier
Example: https://www.catalogueoflife.org/data/taxon/4HQ39
Synonym ID directly resolves to: Accepted name
Stability: Stable since March 2022 (nomenclatural author corrections possible)
Policy on orthographic changes: See blog post - information page in preparation (MD, pers. comm. June 2025)
Catalogue of Life eXtended Release (COL XR)
Scope: All organisms / global
Dataset description: Hernández-Robles & al. 2023. As of August 2025, COL XR is still under development and published only as a temporary preview.
Downloads: Available from ChecklistBank (search for <space>XR) or from GBIF.
Name-ID / attribute name: ?? yes, alphanumeric / COL Stable Identifier
Example:
Synonym ID directly resolves to:
Stability:
Policy on orthographic changes:
Dyntaxa (Sweden)
Elurikkus Checklist of Estonian Species
Encyclopedia of Life (EOL)
Scope: All organisms / global
Dataset description: https://eol.org/docs/eol-dynamic-hierarchy
Downloads: "EOL Dynamic Hierarchy Version 2.2", format: tsv. EOL Dynamic Hierarchy Active Version, format tsv (".tab"). See also https://eol.org/docs/what-is-eol/data-services
Name-ID / attribute name:
Example:
Synonym ID directly resolves to:
Stability:
Policy on orthographic changes:
Euro+Med PlantBase
FinBIF Taxon Database
GBIF Backbone Taxonomy
Scope: All organisms / global
Dataset description: https://www.checklistbank.org/dataset/53147/about
Downloads: https://hosted-datasets.gbif.org/datasets/backbone/current/backbone.zip in DwCA format
Name-ID / attribute name:
Example:
Synonym ID directly resolves to:
Stability:
Policy on orthographic changes:
iNaturalist Taxonomy
Index to Organism Names (ION)
Integrated Taxonomic Information System (ITIS)
Scope: All organisms / US and global
Dataset description: https://www.itis.gov/about_itis.html
Downloads: Up to 32,727 records of a specific taxonomic group can be downloaded in Taxonomic Workbench format or as DwC-A - see https://www.itis.gov/access.html.
Name-ID / attribute name:
Example:
Synonym ID directly resolves to:
Stability:
Policy on orthographic changes:
International Committee on Taxonomy of Viruses (ICTV)
Scope: Viruses and viroids, global
Dataset description: The internationally agreed dataset for viruses and viroids is the ICTV Master Species List (ICTV), which provides the current taxonomic status of viral names and their classification.
Downloads: The dataset is included in Catalogue of Life’s ChecklistBank
Name-ID / attribute name:
Example:
Synonym ID directly resolves to:
Stability:
Policy on orthographic changes:
Le Référentiel Taxonomique Taxref
Leipzig Catalogue of Vascular Plants
Mycobank
Scope: Fungi, global
Dataset description: Crous & al. 2004
Downloads: An Excel version of the list of taxa present in MycoBank (export date: 13th of January 2025) can be downloaded from https://www.mycobank.org/Images/MBList.zip.
Name-ID / attribute name:
Example:
Synonym ID directly resolves to:
Stability:
Policy on orthographic changes:
NCBI Taxonomy
Scope: All organisms / global
Dataset description: Schoch 2011 / 2021. Metadata on the GBIF site point out that "The NCBI taxonomy database is not a primary source for taxonomic or phylogenetic information. ... the NCBI taxonomy database is not a phylogenetic or taxonomic authority and should not be cited as such".
Downloads: The full taxonomy database along with files associating nucleotide and protein sequence records with their taxonomy IDs: https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/
Name-ID / attribute name:
Example:
Synonym ID directly resolves to:
Stability:
Policy on orthographic changes:
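For readers who want to work with the taxonomy dump mentioned under Downloads above, the sketch below parses lines in the layout used by the dump's names file. It assumes the usual taxdump conventions, which are not stated in this page: fields are separated by tab, pipe, tab; each line ends with tab, pipe; and the four columns are tax ID, name, unique name, and name class. The sample lines are typed in by hand for illustration rather than read from the FTP site.

```python
def parse_dmp_line(line):
    """Split one taxdump line into its fields."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]          # drop the end-of-row marker
    return line.split("\t|\t")


def scientific_names(lines):
    """Map tax ID -> scientific name, ignoring other name classes."""
    names = {}
    for line in lines:
        tax_id, name, _unique_name, name_class = parse_dmp_line(line)
        if name_class == "scientific name":
            names[int(tax_id)] = name
    return names


# Hand-typed sample in the assumed names-file layout
sample = [
    "9605\t|\tHomo\t|\t\t|\tscientific name\t|\n",
    "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n",
    "9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n",
]

print(scientific_names(sample))
```

With the real dump, the same function could be fed an open file handle, since iterating over a text file yields one line at a time.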
(The) Paleobiology Database
Scope: All organisms (fossils only) / global
Dataset description: Uhen & al. 2023
Downloads: Records can be downloaded in several formats using the Download Generator
Name-ID / attribute name:
Example:
Synonym ID directly resolves to:
Stability:
Policy on orthographic changes:
Pan European Species Information Infrastructure EU - Nomen
Species Fungorum [Plus]
UK Species Inventory / Species Dictionary
World Checklist of Vascular Plants (WCVP)
Scope: Vascular plants
Dataset description: Govaerts & al. 2021
Downloads: According to Rafael Govaerts (pers. comm. July 17, 2024), the primary source for the WCVP dataset is POWO, under WCVP "DATA". Both text (CSV) and Darwin Core downloads are available, with citation and other metadata provided in a readme.txt and the eml.xml file, respectively. The latest version of the WCVP data can also be checked and downloaded via ChecklistBank (the metadata there are less precise).
Name-ID / attribute name:
Example:
Synonym ID directly resolves to:
Stability:
Policy on orthographic changes:
World Flora Online (WFO) Plant List
Scope: Plants (Vascular plants and "bryophytes")
Dataset description: https://wfoplantlist.org/background
Downloads: Available for the published versions (currently updated every six months and published via Zenodo, with DOI). The current version can be found under https://zenodo.org/records/8079052. If a later version is available, this is indicated at the top of the page.
The following formats are available (example for the December 2022 version):
- wfo_plantlist_2022-12.zip The Catalogue of Life Data Package of the WFO Plant List. This is the most expressive standards-based form of the list.
- plant_list_2022-12.json.zip JSON-formatted version of the WFO Plant List. This has been designed for direct import into a schemaless instance of a SOLR index and is used to drive the WFO Plant List API (https://list.worldfloraonline.org), which in turn drives the WFO Plant List in the portal. This is recommended if you want a local, read-only version of the list rather than using the API.
- plant_list_2022-12.sql.gz This is the complete production database (minus logging data and API keys) as a MySQL backup file. It can be restored directly to a MySQL 5.7 or later instance if you require the list in SQL format.
- ipni_to_wfo.csv.gz A file mapping all the IPNI IDs we track to their associated WFO IDs.
- families_dwc.tar.gz Individual Darwin Core Archive files for each of 718 recognized families. If you want a single family in DwC but can't load the whole list, download and expand this file. Family and genus files are also available for download through the portal.
- DwC_backbone_R.zip A single Darwin Core Archive file containing non-deprecated names and taxa for use in the existing R package.
- _uber.zip A single Darwin Core Archive file containing all names and taxa, even those that are deprecated, along with some extra columns.
Weekly updated DwC-Archive files for all families and for the _uber.zip are available at https://list.worldfloraonline.org/rhakhis/api/downloads/dwc/
Name-ID / attribute name: wfo-id
Example: https://www.worldfloraonline.org/taxon/wfo-0000891536
Synonym ID directly resolves to: Taxon
Stability: Stable
Policy on orthographic changes: Variants of the canonical name normally receive a new WFO-ID (see https://biss.pensoft.net/article/111210/). No ID is ever deleted (see https://list.worldfloraonline.org/index.php under Identifiers).
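Because WFO identifiers are stable, they can be stored locally and turned into links when needed. The sketch below checks the shape of an identifier and builds the taxon URL in the pattern of the example given above. The rule that an identifier is "wfo-" followed by exactly ten digits is an assumption inferred from that single example, not a documented specification.

```python
import re

# Assumed shape of a WFO ID, inferred from the example wfo-0000891536
WFO_ID_PATTERN = re.compile(r"wfo-\d{10}")


def wfo_taxon_url(wfo_id):
    """Return the portal URL for a WFO ID, rejecting malformed IDs."""
    if not WFO_ID_PATTERN.fullmatch(wfo_id):
        raise ValueError(f"not a well-formed WFO ID: {wfo_id!r}")
    return f"https://www.worldfloraonline.org/taxon/{wfo_id}"


print(wfo_taxon_url("wfo-0000891536"))
```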
World Register of Marine Species (WoRMS)
Scope: Marine species, global
Dataset description:
Downloads:
Name-ID / attribute name: yes, numeric / AphiaID
Example:
Synonym ID directly resolves to: Taxon
Stability:
Policy on orthographic changes:
Nomenclators
A special class of taxonomic datasets that systematically catalog scientific names along with their authorship, publication date and references, nomenclatural status, and (sometimes) type information. These focus on nomenclatural accuracy such as on correct spelling and nomenclatural validity rather than providing taxonomic opinion or classification.
Australian Plant Name Index
Scope: Plant names (vascular plants and mosses)
Dataset description: [1]
Downloads:
Name-ID / attribute name: yes, numeric
Example: https://id.biodiversity.org.au/name/apni/102507
Stability: Not stated, but presumed
Policy on orthographic changes: Not stated, but some names with IDs have been deprecated
Index Nominum Algarum
Scope: Algae
Dataset description: Index Nominum Algarum is a nomenclator comprising >200,000 algal names compiled from index cards.
Downloads: Available in Catalogue of Life’s data infrastructure ChecklistBank
Name-ID / attribute name:
Example:
Synonym ID directly resolves to:
Stability:
Policy on orthographic changes:
Fungal Names
Scope: Fungi, global (incl. lichens)
Dataset description: https://nmdc.cn/fungalnames/toabout
Downloads: see https://nmdc.cn/fungalnames/towebservice
Name-ID / attribute name:
Example:
Synonym ID directly resolves to:
Stability:
Policy on orthographic changes:
Index Fungorum
Scope: Fungi, global (incl. lichens)
Dataset description:
Downloads:
Name-ID / attribute name: Yes, numerical /
Example: http://www.indexfungorum.org/Names/NamesRecord.asp?RecordID=550000
Synonym ID directly resolves to: Name
Stability: Apparently stable, links to mandatory registration for recent names
Policy on orthographic changes: Not stated
International Plant Names Index (IPNI)
Scope: Vascular plants
Dataset description: https://ipni.org/about
Downloads: upon request
Name-ID / attribute name: yes, LSID*
Example: https://ipni.org/n/77108241-1
Synonym ID directly resolves to: n/a
Stability: Stable but records may be deprecated and inaccessible. Duplicates exist but are linked to each other.
Policy on orthographic changes: Not stated, but orthographic corrections are made
List of Prokaryotic names with Standing in Nomenclature (LPSN)
Dataset description: The LPSN covers validly published names of cyanobacteria under the International Code of Nomenclature of Prokaryotes (ICNP; Parker & al. 2019). Note that cyanobacteria may also be named under the International Code of Nomenclature for algae, fungi, and plants (ICNAFP). This dual treatment may result in some names being valid under ICNAFP but not under ICNP.
Scope: Prokaryotes
Downloads: with registration
Name-ID / attribute name:
Example:
Synonym ID directly resolves to:
Stability:
Policy on orthographic changes:
Nomenclator Zoologicus
Dataset description: Wikipedia / ChecklistBank
Scope: Animals
Downloads: via Zenodo [2], also available in Catalogue of Life’s data infrastructure ChecklistBank
Name-ID / attribute name:
Example:
Synonym ID directly resolves to:
Stability:
Policy on orthographic changes:
TROPICOS
Scope: Plants / global
Dataset description: https://tropicos.org/home
Downloads: Partial downloads possible with login
Name-ID / attribute name:
Example:
Stability:
Policy on orthographic changes:
Zoobank
Dataset description: https://zoobank.org/About
Scope: Animals
Downloads: was available as a GBIF dataset, but currently not accessible [22 Sep 2025]
Name-ID / attribute name: yes, LSID*
Example:
Synonym ID directly resolves to: n/a
Stability: Linked to ICZN registration.
Policy on orthographic changes: Not stated, but practices may vary for later changes
Footnotes
- LSID: The IPNI LSID (Life Science Identifier) is a unique identifier used to represent scientific names of plants in the International Plant Names Index (IPNI). The LSID is a standardized format for representing these names and their associated data in a machine-readable way. An LSID consists of several parts, including a prefix that identifies the issuing authority (in this case, IPNI), a namespace that identifies the type of data (usually "names"), and a unique identifier for the specific record. For example, a typical IPNI LSID might look something like this: urn:lsid:ipni.org:names:77108241-1. In this LSID, "ipni.org" is the domain of the issuing authority (IPNI), "names" indicates that it's a record for a plant name, and "77108241-1" is the unique identifier for that specific record. (Source: ChatGPT GPT3.5 "IPNI LSID" - accessed 7 nov 2023)
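The parts of an LSID described in this footnote can be separated mechanically. The sketch below splits an LSID into authority, namespace, and object identifier (plus an optional revision, which the general LSID syntax allows but the IPNI example does not use) and derives the short IPNI link in the pattern of the example given in the IPNI entry. The function and class names are hypothetical, and the URL pattern is inferred from that one example.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text):
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of parts in {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"missing urn:lsid prefix in {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"empty component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


def ipni_url(lsid):
    """Build the short IPNI link for an IPNI name LSID (pattern assumed)."""
    if lsid.authority != "ipni.org" or lsid.namespace != "names":
        raise ValueError("not an IPNI name LSID")
    return f"https://ipni.org/n/{lsid.object_id}"


lsid = parse_lsid("urn:lsid:ipni.org:names:77108241-1")
print(lsid)
print(ipni_url(lsid))
```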