Output files

The mOTUs commands produce several types of output file. This section explains their format and content.

BAM file

The profile and map_tax routines output a BAM-formatted alignment file. This file contains sorted and prefiltered alignments of the submitted sequences against the mOTUs marker gene databases. Keeping this intermediate file allows the user to calculate mOTUs abundances without re-aligning the raw sequencing files (see calc_mgc).

MGC file

The profile and calc_mgc routines output abundances of marker gene clusters (MGC). The first column indicates the MGC, while the other columns provide individual quantification metrics. For more details, refer to the Concept section.

The first line of every MGC file is a header that records the tool version, database version, and parameters used.

#tool_version=4.0.2     database_version=4.0    min_alignment_length=110
MGC     INSERT_RAW      INSERT_NORM     INSERT_SCALED   BASE_RAW        BASE_NORM
mOTUv4.0_000000.MGC_COG0012_0000000001  55.1894 0.0009416992    73.7742 10975.4584      0.0009676089
mOTUv4.0_000000.MGC_COG0016_0000000002  38.2247 0.0007244105    56.7515 6895.8274       0.0006755164
mOTUv4.0_000000.MGC_COG0018_0000000003  66.7233 0.0007233353    56.6673 13976.5549      0.0007860029
mOTUv4.0_000000.MGC_COG0172_0000000004  47.8899 0.0006986440    54.7329 8275.3306       0.0006230695
mOTUv4.0_000000.MGC_COG0215_0000000005  43.5143 0.0006030045    47.2404 7645.3935       0.0005472397
mOTUv4.0_000000.MGC_COG0495_0000000006  43.7432 0.0003359534    26.3191 8815.9793       0.0003501499
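
The layout above (a `#key=value` header line, a column-name line, then one row per MGC) can be read with a few lines of Python. This is a minimal sketch, not part of the mOTUs tool: the function name `read_mgc` is hypothetical, and it assumes that columns are tab-separated and that header values contain no whitespace.

```python
import csv
import io


def read_mgc(handle):
    """Return (header_params, rows) from an open MGC file handle.

    Assumption: columns are tab-separated and the first line holds
    '#key=value' pairs separated by whitespace.
    """
    first = handle.readline().rstrip("\n")
    params = dict(item.split("=", 1) for item in first.lstrip("#").split())

    # The second line holds the column names (MGC, INSERT_RAW, ...).
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for rec in reader:
        mgc = rec.pop("MGC")
        rows.append((mgc, {name: float(value) for name, value in rec.items()}))
    return params, rows


# Two lines of the example above plus the header, joined with tabs.
example = (
    "#tool_version=4.0.2\tdatabase_version=4.0\tmin_alignment_length=110\n"
    "MGC\tINSERT_RAW\tINSERT_NORM\tINSERT_SCALED\tBASE_RAW\tBASE_NORM\n"
    "mOTUv4.0_000000.MGC_COG0012_0000000001\t55.1894\t0.0009416992\t73.7742\t10975.4584\t0.0009676089\n"
    "mOTUv4.0_000000.MGC_COG0016_0000000002\t38.2247\t0.0007244105\t56.7515\t6895.8274\t0.0006755164\n"
)
params, rows = read_mgc(io.StringIO(example))
```

Here `params` holds the header as a dictionary of strings, and each entry of `rows` pairs an MGC identifier with its quantification metrics as floats.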

Insert Fate file

The profile and calc_mgc routines generate an additional output file called Insert Fate (file suffix .inserts.gz). This file reports the marker gene or, in the case of multimappers, the marker genes to which each aligned insert was assigned. It is a gzipped TSV file with three columns: the name of the insert, the marker gene it was assigned to, and the weight of that assignment. For a unique mapper (an insert with only one best hit), the weight is 1 (for example, ERR2726419.1174). For a multimapper (an insert with multiple best hits), the weight is distributed across the matching marker genes, with one line per marker gene (for example, ERR2726419.18784141). The weights are assigned in proportion to the abundance of the MGCs that contain the marker genes.

ERR2726419.1174             mOTUv4.0_002372.MGC_COG0525_0000023616.MG_0000652896                    1.0000
ERR2726419.1849             mOTUv4.0_002737.MGC_COG0018_0000027226.MG_0000679617                    1.0000
ERR2726419.2768             mOTUv4.0_000007.MGC_COG0012_0000000071.MG_0000029450                    1.0000
ERR2726419.18784141         mOTUv4.0_000215.MGC_COG0525_0000002155.MG_0000291111                    0.2942
ERR2726419.18784141         mOTUv4.0_000410.MGC_COG0525_0000004097.MG_0000377269                    0.7058
ERR2726419.18784674         mOTUv4.0_000245.MGC_COG0018_0000002447.MG_0000307560                    1.0000
ERR2726419.18785094         mOTUv4.0_000023.MGC_COG0016_0000000232.MG_0000075690                    0.6326
ERR2726419.18785094         mOTUv4.0_unassigned.MGC_COG0016_1001268401.MG_1002418423            0.3674
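
To illustrate how the weights of a multimapper are split, the sketch below reads Insert Fate records and sums the weights per insert. It is an illustration only: the function names are hypothetical, and it assumes three whitespace-separated columns per line, as in the example above.

```python
import gzip
from collections import defaultdict


def read_insert_fate(lines):
    """Yield (insert, marker_gene, weight) from an iterable of text lines."""
    for line in lines:
        if not line.strip():
            continue
        insert, marker_gene, weight = line.split()
        yield insert, marker_gene, float(weight)


def weight_per_insert(records):
    """Sum the assignment weights of every insert."""
    totals = defaultdict(float)
    for insert, _marker_gene, weight in records:
        totals[insert] += weight
    return dict(totals)


# A few lines taken from the example above.
example = [
    "ERR2726419.1174\tmOTUv4.0_002372.MGC_COG0525_0000023616.MG_0000652896\t1.0000",
    "ERR2726419.18784141\tmOTUv4.0_000215.MGC_COG0525_0000002155.MG_0000291111\t0.2942",
    "ERR2726419.18784141\tmOTUv4.0_000410.MGC_COG0525_0000004097.MG_0000377269\t0.7058",
]
totals = weight_per_insert(read_insert_fate(example))

# For a real file, open it in text mode (the path is a placeholder):
# with gzip.open("sample.inserts.gz", "rt") as fh:
#     totals = weight_per_insert(read_insert_fate(fh))
```

In this example both the unique mapper and the multimapper end up with a total weight of 1, up to floating-point rounding.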

mOTUs file

The primary output of the mOTUs tool (profile, calc_motu) is a taxonomic profile file.

Each mOTUs file begins with a header line that tracks versions and parameters, followed by one line per mOTU. The columns of each line give the mOTU identifier, its taxonomy, and the associated abundance. By default, mOTUs generates integer counts, similar to standard ASV/OTU pipelines. However, mOTUs also creates a second abundance file in which the counts are converted into relative abundances.

Output as counts

#tool_version=4.0.2 database_version=4.0    min_alignment_length=75 min_mgcs=3      count_mode=INSERT_SCALED        value_type=counts
mOTU        Taxonomy        WIRB19-1_SAMEA4817893_METAG_ERR2726404
mOTUv4.0_000000     d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium duncaniae        713
mOTUv4.0_000002     d__Bacteria;p__Bacillota_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Roseburia;s__Roseburia inulinivorans   119
mOTUv4.0_000004     d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium prausnitzii      30
mOTUv4.0_000006     d__Bacteria;p__Bacillota_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Agathobacter;s__Agathobacter rectalis  1177
mOTUv4.0_000007     d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium longum   327
mOTUv4.0_000011     d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium sp900539945      328
mOTUv4.0_000012     d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium prausnitzii_D    389
mOTUv4.0_000016     d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Actinomycetales;f__Bifidobacteriaceae;g__Bifidobacterium;s__Bifidobacterium adolescentis      93
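
A counts file like the one above can be loaded and summarised at a higher taxonomic rank. The following is a minimal sketch under stated assumptions: the function names are hypothetical, columns are assumed to be tab-separated, and every column after Taxonomy is treated as a sample column.

```python
import io
from collections import Counter


def read_profile(handle):
    """Return (header_params, sample_names, rows) from a mOTUs profile."""
    header = handle.readline().rstrip("\n")
    params = dict(item.split("=", 1) for item in header.lstrip("#").split())
    columns = handle.readline().rstrip("\n").split("\t")
    samples = columns[2:]  # everything after 'mOTU' and 'Taxonomy'
    rows = []
    for line in handle:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        rows.append((fields[0], fields[1], [float(v) for v in fields[2:]]))
    return params, samples, rows


def genus_of(taxonomy):
    """Extract the genus name from a 'd__...;p__...;...;g__...;s__...' string."""
    for rank in taxonomy.split(";"):
        if rank.startswith("g__"):
            return rank[3:]
    return "unknown"


def sum_by_genus(rows, sample_index=0):
    """Add up the values of all mOTUs that share a genus."""
    totals = Counter()
    for _motu, taxonomy, values in rows:
        totals[genus_of(taxonomy)] += values[sample_index]
    return totals


# Header plus three rows taken from the example above.
example = (
    "#tool_version=4.0.2\tdatabase_version=4.0\tmin_alignment_length=75\tmin_mgcs=3\tcount_mode=INSERT_SCALED\tvalue_type=counts\n"
    "mOTU\tTaxonomy\tWIRB19-1_SAMEA4817893_METAG_ERR2726404\n"
    "mOTUv4.0_000000\td__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium duncaniae\t713\n"
    "mOTUv4.0_000002\td__Bacteria;p__Bacillota_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Roseburia;s__Roseburia inulinivorans\t119\n"
    "mOTUv4.0_000004\td__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium prausnitzii\t30\n"
)
params, samples, rows = read_profile(io.StringIO(example))
by_genus = sum_by_genus(rows)
```

With these three rows, the two Faecalibacterium mOTUs are combined into a single genus-level count, while Roseburia keeps its own count.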

Output as relative abundances

#tool_version=4.0.2 database_version=4.0    min_alignment_length=75 min_mgcs=3      count_mode=INSERT_SCALED        value_type=relative_abundances
mOTU        Taxonomy        WIRB19-1_SAMEA4817893_METAG_ERR2726404
mOTUv4.0_000000     d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium duncaniae        0.06199401
mOTUv4.0_000002     d__Bacteria;p__Bacillota_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Roseburia;s__Roseburia inulinivorans   0.01039013
mOTUv4.0_000004     d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium prausnitzii      0.00263340
mOTUv4.0_000006     d__Bacteria;p__Bacillota_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Agathobacter;s__Agathobacter rectalis  0.10242714
mOTUv4.0_000007     d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium longum   0.02840710
mOTUv4.0_000011     d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium sp900539945      0.02856296
mOTUv4.0_000012     d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium prausnitzii_D    0.03384779
mOTUv4.0_000016     d__Bacteria;p__Actinomycetota;c__Actinomycetes;o__Actinomycetales;f__Bifidobacteriaceae;g__Bifidobacterium;s__Bifidobacterium adolescentis      0.00805011

Classify file

The classify routine attempts to assign user-submitted genomes to existing mOTUs. The tabular output file contains one line per submitted genome. The MOTU column holds the assigned mOTU, <6MGs-no_mOTU if the genome contained fewer than 6 of the 10 marker genes, or Novel-no_mOTU if the genome contained at least 6 marker genes but could not be assigned to any mOTU.

GENOME       MOTU             SIMILARITY      NUM_MGS
GENOME_1.fa  mOTUv4.0_000162  98.54           9
GENOME_2.fa  <6MGs-no_mOTU    -1              5
GENOME_3.fa  Novel-no_mOTU    -1              7
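
The three possible outcomes can be separated programmatically by inspecting the MOTU column. The sketch below is an illustration only: the function names and the status labels are hypothetical, and columns are assumed to be separated by whitespace, as in the example above.

```python
def read_classify(lines):
    """Return one dictionary per genome, keyed by the column names."""
    it = iter(lines)
    header = next(it).split()
    return [dict(zip(header, line.split())) for line in it if line.strip()]


def status(record):
    """Map the MOTU column onto one of three outcome labels."""
    motu = record["MOTU"]
    if motu == "<6MGs-no_mOTU":
        return "too_few_marker_genes"
    if motu == "Novel-no_mOTU":
        return "novel"
    return "assigned"


# The example table from above.
example = [
    "GENOME       MOTU             SIMILARITY      NUM_MGS",
    "GENOME_1.fa  mOTUv4.0_000162  98.54           9",
    "GENOME_2.fa  <6MGs-no_mOTU    -1              5",
    "GENOME_3.fa  Novel-no_mOTU    -1              7",
]
records = read_classify(example)
outcomes = {rec["GENOME"]: status(rec) for rec in records}
```

All values are kept as strings here; convert SIMILARITY and NUM_MGS with `float` and `int` if numeric comparisons are needed.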

Genomes file

The motus genomes routine takes a query as input, which can be a genome name, a functional annotation, or a taxonomic annotation, and reports the matching genomes if any are found. By default, only the names of the genomes and the search query are reported. For example, when searching for PF05247, the result has two columns:

GENOME                                  QUERY
ACIN21-1_SAMN05421660_MAG_00000013      PF05247
ACIN21-1_SAMN05421901_MAG_00000031      PF05247
ACIN21-1_SAMN05421913_MAG_00000003      PF05247
ACIN21-1_SAMN05422104_MAG_00000029      PF05247
ACIN21-1_SAMN05422107_MAG_00000026      PF05247
ACIN21-1_SAMN05422110_MAG_00000010      PF05247
ACIN21-1_SAMN05422113_MAG_00000020      PF05247
ACIN21-1_SAMN05422114_MAG_00000003      PF05247
...

When KEGG and taxonomy annotations are requested in addition, the output is a tabular file with five columns (column values are shortened to fit into the document):

GENOME                                  QUERY   TAXONOMY                     mOTU            KEGG
ACIN21-1_SAMN05421660_MAG_00000013      PF05247 Bacteria;Pseudomonadota;.... mOTUv4.0_000917 K00001;K00003;...
ACIN21-1_SAMN05421901_MAG_00000031      PF05247 Bacteria;Pseudomonadota;.... no_mOTU         K00003;K00014...
ACIN21-1_SAMN05421913_MAG_00000003      PF05247 Bacteria;Pseudomonadota;.... no_mOTU         K00003;K00012;...
ACIN21-1_SAMN05422104_MAG_00000029      PF05247 Bacteria;Pseudomonadota;.... no_mOTU         K00004;K00012;...
ACIN21-1_SAMN05422107_MAG_00000026      PF05247 Bacteria;Pseudomonadota;.... mOTUv4.0_000917 K00001;K00003;...
ACIN21-1_SAMN05422110_MAG_00000010      PF05247 Bacteria;Pseudomonadota;.... no_mOTU         K00003;K00004;...
ACIN21-1_SAMN05422113_MAG_00000020      PF05247 Bacteria;Pseudomonadota;.... no_mOTU         K00003;K00012;...
...
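
As a rough sketch of how the five-column output could be post-processed, the snippet below groups matching genomes by the mOTU column. The function name is hypothetical, the columns are assumed to be tab-separated, and the example rows contain only the shortened values shown above, not the full annotation strings.

```python
from collections import defaultdict


def genomes_by_motu(lines):
    """Group genome names by the value of the mOTU column."""
    it = iter(lines)
    header = next(it).rstrip("\n").split("\t")
    groups = defaultdict(list)
    for line in it:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(header):
            continue  # skip blank or incomplete lines
        record = dict(zip(header, fields))
        groups[record["mOTU"]].append(record["GENOME"])
    return dict(groups)


# Shortened rows from the example above (values are truncated in the document).
example = [
    "GENOME\tQUERY\tTAXONOMY\tmOTU\tKEGG",
    "ACIN21-1_SAMN05421660_MAG_00000013\tPF05247\tBacteria;Pseudomonadota\tmOTUv4.0_000917\tK00001;K00003",
    "ACIN21-1_SAMN05421901_MAG_00000031\tPF05247\tBacteria;Pseudomonadota\tno_mOTU\tK00003;K00014",
    "ACIN21-1_SAMN05422107_MAG_00000026\tPF05247\tBacteria;Pseudomonadota\tmOTUv4.0_000917\tK00001;K00003",
]
groups = genomes_by_motu(example)
```

Genomes that do not belong to any mOTU are collected under the `no_mOTU` key.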


mOTUs is part of SIB's portfolio of open tools and databases.

mOTUs is part of the ELIXIR-CH Service Delivery Plan.