Database statistics#

mOTUs taxonomy overview#

Release 4.1 contains 124,295 mOTUs. Each mOTU is mapped to the GTDB R226 taxonomy, and the table below gives the number of distinct taxa at each taxonomic rank.

mOTUs Taxonomy Statistics#

Rank     Bacteria   Archaea     Total
Phylum        145        19       164
Class         410        52       462
Order       1,431       115     1,546
Family      3,653       370     4,023
Genus      15,904     1,090    16,994
mOTU      120,445     3,850   124,295
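
The per-rank counts above amount to counting distinct taxa at each rank, split by domain. A minimal sketch of that tally, assuming each mOTU carries a GTDB-style lineage string (`d__...;p__...;...;g__...`); the function name, mOTU identifiers and lineages are hypothetical, and this is not the mOTUs project's own code.

```python
from collections import defaultdict

# GTDB rank prefixes in lineage order (species is omitted; the mOTU is the lowest unit here).
RANKS = {"p": "Phylum", "c": "Class", "o": "Order", "f": "Family", "g": "Genus"}


def count_taxa_per_rank(lineages):
    """Count distinct taxa per rank and domain.

    `lineages` maps a (hypothetical) mOTU identifier to a GTDB-style lineage string.
    Returns {rank_name: {domain: n_distinct_taxa}}, including an "mOTU" row.
    """
    seen = defaultdict(lambda: defaultdict(set))
    for motu_id, lineage in lineages.items():
        # "p__Bacillota" -> ("p", "Bacillota"); "g__" -> ("g", "")
        fields = dict(part.split("__", 1) for part in lineage.split(";"))
        domain = fields["d"]
        for prefix, rank in RANKS.items():
            taxon = fields.get(prefix, "")
            if taxon:  # skip ranks without an assignment
                seen[rank][domain].add(taxon)
        seen["mOTU"][domain].add(motu_id)
    return {
        rank: {dom: len(taxa) for dom, taxa in by_domain.items()}
        for rank, by_domain in seen.items()
    }


# Toy input; identifiers and lineages are illustrative only.
example = {
    "motu_a": "d__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Streptococcus",
    "motu_b": "d__Bacteria;p__Bacillota;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__",
    "motu_c": "d__Archaea;p__Halobacteriota;c__Halobacteria;o__Halobacteriales;f__Haloferacaceae;g__Haloferax",
}
counts = count_taxa_per_rank(example)
```

Sets are used so that a taxon shared by many mOTUs is counted once per rank; the "Total" column would then be the sum over domains.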

Genome Categories#

The mOTUs database contains ~3 million MAGs and ~1 million isolate/single-cell genomes, all mapped to the GTDB R226 taxonomy. The plot displays, for each taxonomic rank, the percentage of taxa represented by MAGs only, isolates only, or a combination of both (Mixed). The final “Genomes” bar shows the overall proportion of MAGs among all genomes in the database.

../_images/genome_categories.png
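
The three categories in the plot can be derived by collecting, per taxon, the set of genome sources that represent it. A minimal sketch, assuming an input of (taxon, source) pairs with the hypothetical source labels `"MAG"` and `"isolate"` (the latter standing in for isolate and single-cell genomes); the function name and toy data are illustrative only.

```python
from collections import Counter


def categorise_taxa(genomes):
    """Return the percentage of taxa that are "MAG only", "Isolate only" or "Mixed".

    `genomes` is an iterable of (taxon, source) pairs, where source is
    "MAG" or "isolate" (hypothetical labels).
    """
    sources_by_taxon = {}
    for taxon, source in genomes:
        sources_by_taxon.setdefault(taxon, set()).add(source)

    labels = Counter()
    for sources in sources_by_taxon.values():
        if sources == {"MAG"}:
            labels["MAG only"] += 1
        elif sources == {"isolate"}:
            labels["Isolate only"] += 1
        else:
            labels["Mixed"] += 1

    total = sum(labels.values())
    if total == 0:  # no taxa, nothing to report
        return {}
    return {label: 100.0 * n / total for label, n in labels.items()}


# Toy input: four genera, illustrative only.
toy = [
    ("g__A", "MAG"), ("g__A", "MAG"),
    ("g__B", "isolate"),
    ("g__C", "MAG"), ("g__C", "isolate"),
    ("g__D", "MAG"),
]
percentages = categorise_taxa(toy)
```

Running the same function once per rank (phylum, class, and so on) would give one stacked bar per rank.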

Genome Quality#

All genomes (~4 million) in the mOTUs 4.1 genome database were quality controlled: bins with completeness below 50% or contamination above 10% were removed (exceptions were made for Eremiobacterota, see here). The figure below shows the quality distribution for all genomes in the mOTUs 4.1 database.

../_images/genome_quality.png
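
A threshold filter of this kind can be sketched as follows. This assumes per-genome completeness and contamination estimates in percent and takes the 10% cut-off to apply to contamination; the record fields and genome identifiers are hypothetical, and the Eremiobacterota exception is not modelled.

```python
MIN_COMPLETENESS = 50.0   # percent; genomes below this are removed
MAX_CONTAMINATION = 10.0  # percent; assumed to be a contamination cut-off


def passes_qc(completeness, contamination):
    """Return True if a genome meets both quality thresholds."""
    return completeness >= MIN_COMPLETENESS and contamination <= MAX_CONTAMINATION


# Toy records; identifiers and values are illustrative only.
genomes = [
    {"id": "bin_1", "completeness": 92.3, "contamination": 1.1},
    {"id": "bin_2", "completeness": 48.0, "contamination": 0.5},   # too incomplete
    {"id": "bin_3", "completeness": 75.0, "contamination": 12.4},  # too contaminated
]
kept = [g["id"] for g in genomes if passes_qc(g["completeness"], g["contamination"])]
```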

Representative Genome Quality#

Representative genomes for each mOTU are selected based on a hierarchical scoring system designed to prioritize biological reliability and assembly quality.

Selection Priority#

Priority         Criterion
1. Source        Isolate genomes and SAGs are preferred over MAGs.
2. Quality       Highest Q-Score (Completeness - 5 * Contamination).
3. Continuity    Highest N50 (assembly statistic: half of the genome is contained in contigs of this length or longer).
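
One way to express such a hierarchy is a lexicographic sort key: source first, then Q-Score, then N50. The sketch below assumes the criteria are applied strictly in that order, with later criteria only breaking ties; the `Genome` class, its field names and the example values are hypothetical and not taken from the mOTUs code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Genome:
    genome_id: str
    source: str           # "isolate", "SAG" or "MAG" (hypothetical labels)
    completeness: float   # percent
    contamination: float  # percent
    n50: int              # base pairs


def q_score(genome):
    """Q-Score as defined above: Completeness - 5 * Contamination."""
    return genome.completeness - 5 * genome.contamination


def selection_key(genome):
    """Tuples compare element by element, so earlier criteria dominate later ones."""
    is_preferred_source = genome.source in {"isolate", "SAG"}
    return (is_preferred_source, q_score(genome), genome.n50)


def pick_representative(genomes):
    """Return the genome that ranks highest under the three criteria."""
    return max(genomes, key=selection_key)


# Toy candidates for a single mOTU; values are illustrative only.
candidates = [
    Genome("mag_1", "MAG", 99.0, 0.5, 500_000),       # best Q-Score, but a MAG
    Genome("iso_1", "isolate", 95.0, 1.0, 200_000),   # Q-Score 90.0
    Genome("iso_2", "isolate", 100.0, 2.0, 900_000),  # Q-Score 90.0, higher N50
]
representative = pick_representative(candidates)
```

In this toy case the MAG has the highest Q-Score but loses on source, and the two isolates tie on Q-Score, so N50 decides in favour of `iso_2`.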

The figure below shows the distribution of genome quality of representative genomes:

../_images/rep_genome_quality.png

Environmental annotation#

MAGs in the mOTUs 4.1 database were reconstructed from more than 120,000 short-read metagenomic samples. Where possible, samples were annotated using a hierarchical six-level environmental ontology. The figure below shows the distribution of environments across samples.

Taxa with the largest number of mOTUs#

The ten GTDB taxa containing the most mOTUs at each taxonomic rank, sorted by decreasing number of mOTUs (shown in parentheses).

Top 10 Taxa per Rank#

Phylum                     Class                         Order                       Family                      Genus
Pseudomonadota (30’707)    Clostridia (19’431)           Bacteroidales (8’738)       Lachnospiraceae (4’213)     Streptococcus (2’185)
Bacillota (29’895)         Bacteroidia (17’706)          Oscillospirales (8’353)     Burkholderiaceae (3’782)    Collinsella (1’807)
Bacteroidota (17’706)      Gammaproteobacteria (17’207)  Burkholderiales (5’789)     Flavobacteriaceae (3’085)   Nanosyncoccus (1’541)
Actinomycetota (13’272)    Alphaproteobacteria (13’459)  Flavobacteriales (4’962)    Bacteroidaceae (2’887)      Prevotella (1’455)
Patescibacteriota (5’738)  Bacilli (9’086)               Lachnospirales (4’461)      Oscillospiraceae (2’514)    Cryptobacteroides (1’164)
Verrucomicrobiota (3’097)  Actinomycetes (8’175)         Pseudomonadales (3’770)     Streptococcaceae (2’224)    Faecousia (1’028)
Chloroflexota (2’882)      Saccharimonadia (3’365)       Saccharimonadales (3’327)   Rhodobacteraceae (2’219)    Flavobacterium (894)
Cyanobacteriota (2’277)    Coriobacteriia (2’926)        Actinomycetales (3’243)     CAG-272 (2’115)             Pelagibacter (886)
Planctomycetota (1’869)    Verrucomicrobiia (2’154)      Lactobacillales (3’175)     Acutalibacteraceae (2’034)  Streptomyces (734)
Acidobacteriota (1’766)    Acidimicrobiia (1’695)        Christensenellales (2’943)  UBA660 (1’943)              Colicola (715)

Global distribution of environmental samples#

../_images/motus4.1.samples_worldmap.png


mOTUs is part of SIB's portfolio of open tools and databases.

mOTUs is part of the ELIXIR-CH Service Delivery Plan.