motus profile

Note: This tutorial has been designed to be run on Unix-based systems (macOS or Linux) and requires the mOTUs profiler to be correctly installed as described on the quickstart page.

The motus profile command produces a taxonomic profile from short-read metagenomic sequencing data by running motus map_tax, motus calc_mgc, and motus calc_motu in succession.

The main parameters of motus profile are the following (see option manual):

  • -f | --forward: FastQ/A file(s) containing forward reads from paired-end shotgun metagenome data, separated by spaces.

  • -r | --reverse: FastQ/A file(s) containing reverse reads from paired-end shotgun metagenome data, separated by spaces.

  • -s | --single: FastQ/A file(s) containing reads from single-end shotgun metagenome data, separated by spaces.

  • -o | --output-file: Path and prefix for the output files. This prefix is also used for naming intermediate files.

To run the tool, you must provide either -f together with -r (paired-end), or -s by itself (single-end). When using paired-end data, ensure the file order in -f matches the order in -r.

Note: Although -n | --sample-name is not required for motus profile, we strongly recommend providing it when profiling multiple samples, as it enables merging the resulting profiles into a single taxonomic profile.
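The input rules above can be sketched as a small Python helper that assembles the argument list for motus profile. This is only an illustration under stated assumptions: the function name build_profile_command is hypothetical, and it uses only the flags described on this page (-f, -r, -s, -n, -o).

```python
def build_profile_command(output, forward=None, reverse=None, single=None, sample_name=None):
    """Assemble an argument list for `motus profile` (hypothetical helper)."""
    forward, reverse, single = list(forward or []), list(reverse or []), list(single or [])
    if forward or reverse:
        # Paired-end mode: -f and -r go together, -s is not mixed in.
        if single:
            raise ValueError("use either -f together with -r, or -s by itself")
        if len(forward) != len(reverse):
            raise ValueError("-f and -r need the same number of files, in matching order")
        inputs = ["-f", *forward, "-r", *reverse]
    elif single:
        # Single-end mode: only -s is given.
        inputs = ["-s", *single]
    else:
        raise ValueError("no input files given")
    cmd = ["motus", "profile", *inputs]
    if sample_name:
        cmd += ["-n", sample_name]
    return cmd + ["-o", output]


cmd = build_profile_command(
    "sampleA.mOTUs4",
    forward=["sampleA_1.fastq"],
    reverse=["sampleA_2.fastq"],
    sample_name="sampleA",
)
print(" ".join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) once mOTUs is installed.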

Before we begin the tutorial, we need to download example paired-end short read sequencing data: forward (*_1.fastq) and reverse reads (*_2.fastq) from metagenomic samples A, B and C.

If you are working on Linux, you can download the data with wget:

wget https://zenodo.org/record/7188406/files/sampleA_1.fastq
wget https://zenodo.org/record/7188406/files/sampleA_2.fastq

wget https://zenodo.org/record/7188406/files/sampleB_1.fastq
wget https://zenodo.org/record/7188406/files/sampleB_2.fastq

wget https://zenodo.org/record/7188406/files/sampleC_1.fastq
wget https://zenodo.org/record/7188406/files/sampleC_2.fastq

If you are working on macOS, you can download the data with curl:

curl https://zenodo.org/records/7188406/files/sampleA_1.fastq -o sampleA_1.fastq
curl https://zenodo.org/records/7188406/files/sampleA_2.fastq -o sampleA_2.fastq

curl https://zenodo.org/records/7188406/files/sampleB_1.fastq -o sampleB_1.fastq
curl https://zenodo.org/records/7188406/files/sampleB_2.fastq -o sampleB_2.fastq

curl https://zenodo.org/records/7188406/files/sampleC_1.fastq -o sampleC_1.fastq
curl https://zenodo.org/records/7188406/files/sampleC_2.fastq -o sampleC_2.fastq

The files should contain 67’926 reads for sampleA, 196’034 reads for sampleB, and 139’238 reads for sampleC.
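To check the downloads against these numbers, the reads in a FASTQ file can be counted. The sketch below assumes plain (uncompressed) FASTQ with exactly four lines per record and no blank lines; count_fastq_reads is a hypothetical helper name. Whether the figures above refer to each file or to the forward and reverse files combined is not stated here, so the function simply reports the count for one file.

```python
import io


def count_fastq_reads(handle):
    """Count records in a FASTQ stream, assuming four lines per record."""
    n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{n_lines} lines is not a multiple of 4; file may be truncated")
    return n_lines // 4


# Tiny in-memory example with two made-up records
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n")
print(count_fastq_reads(example))
```

For a downloaded file, open it first, for example `with open("sampleA_1.fastq") as fh: print(count_fastq_reads(fh))`.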

Profiling a single sample

You can create a taxonomic profile for a single metagenomic sample using the motus profile command. To create a profile for sampleA, run:

motus profile -f sampleA_1.fastq -r sampleA_2.fastq -n sampleA -o sampleA.mOTUs4

Important: Providing multiple FastQ/A files to a single motus profile command is intended for combining multiple sequencing runs from the same biological sample. Each unique biological sample must be profiled using a separate motus profile command. For a detailed explanation, see Input files.

After running the command, the beginning of the sampleA.mOTUs4 file should look like the following:

#tool_version=4.0.4   database_version=4.0    min_alignment_length=75 min_mgcs=3      count_mode=INSERT_SCALED        value_type=counts
mOTU               Taxonomy                                                                                                                                         sampleA
mOTUv4.0_000021    d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Hominicoprocola;s__Unknown Hominicoprocola mOTUv4.0_000021    113
mOTUv4.0_000030    d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Acutalibacteraceae;g__Ruminococcus_E;s__Ruminococcus_E bromii_B                   6
mOTUv4.0_000036    d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Acutalibacteraceae;g__CAG-217;s__CAG-217 sp000436335                              107
mOTUv4.0_000060    d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Acutalibacteraceae;g__Hominenteromicrobium;s__Hominenteromicrobium mulieris       2
mOTUv4.0_000063    d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli                345
mOTUv4.0_000080    d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Vescimonas;s__Vescimonas coprocola                            2
mOTUv4.0_000147    d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Acutalibacteraceae;g__Acutalibacter;s__Acutalibacter ornithocaccae                10
mOTUv4.0_000239    d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides ovatus                               13

The first line indicates which versions of the mOTUs tool and the marker gene database were used, as well as the parameters, which you can adjust to your use case based on the instructions in the option manual.

The second line is the header, with three columns: the mOTU identifier, the GTDB taxonomy assigned to the corresponding mOTU, and the sample name as specified by the -n flag. The third column contains the counts of the corresponding mOTUs in the sample (see Counting method for more information). The mOTUv4.0_unassigned entry in the last row of the profile gives the number of inserts that mapped to unlinked marker genes, i.e. the number of detected cells in the sample for which the species could not be identified.
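The layout described above can be read with a few lines of Python. This is a minimal sketch, assuming the columns are tab-separated (taxonomy strings contain spaces, so splitting on whitespace would not work) and that the first line holds whitespace-separated key=value pairs; parse_profile is a hypothetical helper name, and the embedded rows are copied from the example output above.

```python
import io


def parse_profile(handle):
    """Return (settings, column_names, counts) from a single-sample profile."""
    settings, columns, counts = {}, None, {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # First line: '#key=value' pairs describing versions and parameters
            for token in line.lstrip("#").split():
                key, _, value = token.partition("=")
                settings[key] = value
        elif columns is None:
            # Second line: column header (mOTU, Taxonomy, sample name)
            columns = line.split("\t")
        else:
            # Data rows: the last column holds the value for the sample
            fields = line.split("\t")
            counts[fields[0]] = float(fields[-1])
    return settings, columns, counts


SAMPLE = (
    "#tool_version=4.0.4\tdatabase_version=4.0\tmin_alignment_length=75\t"
    "min_mgcs=3\tcount_mode=INSERT_SCALED\tvalue_type=counts\n"
    "mOTU\tTaxonomy\tsampleA\n"
    "mOTUv4.0_000063\td__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
    "o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli\t345\n"
    "mOTUv4.0_000239\td__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;"
    "f__Bacteroidaceae;g__Bacteroides;s__Bacteroides ovatus\t13\n"
)

settings, columns, counts = parse_profile(io.StringIO(SAMPLE))
print(settings["tool_version"], columns[-1], counts["mOTUv4.0_000063"])
```

On a real profile, counts.get("mOTUv4.0_unassigned") would return the value of the last row described above.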

In addition to the sampleA.mOTUs4 file, the command generates the following files (see Output files for examples):

Overview of output files generated by motus profile

File Name                    Description
sampleA.mOTUs4               taxonomic profile containing counts for each mOTU
sampleA.mOTUs4.relab         taxonomic profile containing relative abundances for each mOTU
sampleA.mOTUs4.bam           alignment file produced by the bwa aligner
sampleA.mOTUs4.mgc           abundances of each marker gene cluster
sampleA.mOTUs4.inserts.gz    overview of which marker gene sequence each insert in the input file was mapped to
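A quick way to confirm that a run completed is to check that every file listed above exists for a given output prefix. The following is a small sketch; missing_outputs is a hypothetical helper, and the suffix list is taken directly from the overview above.

```python
from pathlib import Path

# Suffixes from the overview above; the empty string is the count profile itself.
EXPECTED_SUFFIXES = ["", ".relab", ".bam", ".mgc", ".inserts.gz"]


def missing_outputs(prefix):
    """Return the expected output paths for `prefix` that do not exist as files."""
    return [prefix + suffix for suffix in EXPECTED_SUFFIXES
            if not Path(prefix + suffix).is_file()]


# An empty list means all expected files are present for this prefix.
print(missing_outputs("sampleA.mOTUs4"))
```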

Multi-threading

The runtime of the motus profile command is limited by the mapping of input reads against the marker gene database (motus map_tax). You can assign multiple threads (-t flag) to accelerate the alignment process:

motus profile -f sampleA_1.fastq -r sampleA_2.fastq -n sampleA -o sampleA.mOTUs4 -t 4

In our tests, runtime scaled almost linearly up to 16 threads (-t 16).

Profiling multiple samples

The motus profile command is meant to run on one sample at a time. To profile many samples, we recommend creating a file that lists the samples to profile. For example, let’s assume we have a samples.txt file containing the following three samples:

sampleA
sampleB
sampleC

The following bash script can then be run assuming sampleA_1.fastq, sampleA_2.fastq, etc. are in the same folder:

#!/bin/bash

# Define input file
SAMPLE_FILE="samples.txt"

# Check if input file exists
if [[ ! -f "$SAMPLE_FILE" ]]; then
    echo "Error: $SAMPLE_FILE not found!" >&2
    exit 1
fi

# Create the output folder used by -o below
mkdir -p results

# Loop through each line in the file (also handles a missing final newline)
while IFS= read -r SAMPLE || [[ -n "$SAMPLE" ]]; do
    # Skip empty lines
    [[ -z "$SAMPLE" ]] && continue

    echo "Processing: $SAMPLE"

    # Run motus profile on the selected sample
    motus profile \
        -f "${SAMPLE}_1.fastq" \
        -r "${SAMPLE}_2.fastq" \
        -n "$SAMPLE" \
        -o "results/${SAMPLE}.mOTUs4" \
        -t 4

    echo "Finished: $SAMPLE"
    echo "-----------------------------------"
done < "$SAMPLE_FILE"

Alternatively, motus profile can be incorporated into a Snakemake pipeline:

# Load sample IDs from a text file
# .strip() removes whitespace/newlines; 'if line.strip()' skips empty lines
with open("samples.txt") as handle:
    SAMPLES = [line.strip() for line in handle if line.strip()]

rule all:
    input:
        expand("results/{sample}.mOTUs4", sample=SAMPLES)

# Run motus profile on all samples from the samples.txt file
rule motus_profile:
    input:
        fwd = "{sample}_1.fastq",
        rev = "{sample}_2.fastq"
    output:
        "results/{sample}.mOTUs4"
    threads: 4
    shell:
        """
        motus profile \
            -f {input.fwd} \
            -r {input.rev} \
            -n {wildcards.sample} \
            -o {output} \
            -t {threads}
        """

Before starting the Snakemake job, make sure to run snakemake --dry-run to verify that all profiling commands have been constructed correctly.

Merging profiles into one table

Note: For merging, ensure all profiles have been generated using the same tool and database version, with consistent parameters.
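One way to check this before merging is to compare the first line of each profile, which records the tool version, database version, and parameters. The sketch below assumes that this line consists of whitespace-separated key=value pairs, as in the example profile shown earlier; read_header and conflicting_settings are hypothetical helper names, and the second header in the demo (with min_mgcs=2) is invented purely for illustration.

```python
import io


def read_header(handle):
    """Parse the leading '#key=value ...' line of a profile into a dict."""
    first = handle.readline().strip()
    if not first.startswith("#"):
        raise ValueError("profile does not start with a '#' header line")
    return dict(token.partition("=")[::2] for token in first.lstrip("#").split())


def conflicting_settings(headers):
    """Map each setting that differs between profiles to the set of values seen."""
    keys = set().union(*headers.values())
    conflicts = {}
    for key in sorted(keys):
        values = {header.get(key) for header in headers.values()}
        if len(values) > 1:
            conflicts[key] = values
    return conflicts


# Demo with two in-memory header lines; the second one is made up.
headers = {
    "sampleA.mOTUs4": read_header(io.StringIO(
        "#tool_version=4.0.4\tdatabase_version=4.0\tmin_alignment_length=75\tmin_mgcs=3\n")),
    "sampleB.mOTUs4": read_header(io.StringIO(
        "#tool_version=4.0.4\tdatabase_version=4.0\tmin_alignment_length=75\tmin_mgcs=2\n")),
}
print(conflicting_settings(headers))
```

An empty result indicates that the header lines agree; any reported key points to a profile that would need to be regenerated with matching settings.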

In metagenomic studies, profiles from multiple samples are frequently compiled into a single abundance table. In this table, rows represent taxa, columns represent samples, and each cell is the abundance of a specific taxon in a specific sample. To generate an abundance table, run the motus merge command, which requires the following parameters:

  • -i | --input-files: a space-separated list of profile files OR a text file listing profile files, with one file per line

  • -o | --output-file: path of the file in which to store the generated abundance table

To merge all profiles output by motus profile, run:

motus merge -i sampleA.mOTUs4 sampleB.mOTUs4 sampleC.mOTUs4 -o output.mOTUs4

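Alternatively, if the working directory contains no other files ending in .mOTUs4 (such as an output.mOTUs4 left over from a previous merge), you can use a shell wildcard: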

motus merge -i *.mOTUs4 -o output.mOTUs4

If there are many profiles, create a file called profiles.txt that lists the path to each sample profile on its own line:

sampleA.mOTUs4
sampleB.mOTUs4
sampleC.mOTUs4
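For a large number of samples, such a list can be generated rather than typed. The following sketch collects every file ending in .mOTUs4 in a folder and writes one path per line; write_profile_list is a hypothetical helper, and the default of skipping a file named output.mOTUs4 is an assumption made so that an earlier merge result is not listed as an input.

```python
from pathlib import Path


def write_profile_list(folder, list_path="profiles.txt", exclude=("output.mOTUs4",)):
    """Write the paths of all *.mOTUs4 files in `folder` to `list_path`, one per line."""
    # The pattern matches names ending in .mOTUs4, so .relab, .bam, etc. are not included.
    paths = sorted(str(p) for p in Path(folder).glob("*.mOTUs4") if p.name not in exclude)
    Path(list_path).write_text("".join(path + "\n" for path in paths))
    return paths


# List the profiles found in the current folder
print(write_profile_list("."))
```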

Then run:

motus merge -i profiles.txt -o output.mOTUs4

The first rows of the resulting tab-separated values file output.mOTUs4 should contain the following data:

#tool_version=4.0.4   database_version=4.0    min_alignment_length=75 min_mgcs=3      count_mode=INSERT_SCALED        value_type=counts
mOTU               Taxonomy                                                                                                                              sampleA    sampleB    sampleC
mOTUv4.0_000000    d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium duncaniae      0          0          5
mOTUv4.0_000001    d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Prevotella;s__Unknown Prevotella mOTUv4.0_000001     0          0          3218
mOTUv4.0_000002    d__Bacteria;p__Bacillota_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Roseburia;s__Roseburia inulinivorans                 0          6          0
mOTUv4.0_000003    d__Bacteria;p__Bacillota_A;c__Clostridia;o__Christensenellales;f__Aristaeellaceae;g__UBA11524;s__UBA11524 sp000437595                 0          0          5
mOTUv4.0_000004    d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium prausnitzii    0          0          48
mOTUv4.0_000006    d__Bacteria;p__Bacillota_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Agathobacter;s__Agathobacter rectalis                0          9          4
mOTUv4.0_000007    d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium longum         0          15         0
mOTUv4.0_000011    d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium sp900539945    0          0          5

You can now load output.mOTUs4 into Python or R for downstream analysis.
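As a starting point in Python, the merged table can be read with the standard csv module. This is a minimal sketch, assuming a tab-separated file whose first line starts with "#" and whose first two columns are the mOTU identifier and the taxonomy; load_abundance_table and sample_totals are hypothetical helper names, and the embedded rows are copied from the example table above.

```python
import csv
import io


def load_abundance_table(handle):
    """Return (samples, table); table maps each mOTU to its values, one per sample."""
    data_lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(rows)
    samples = header[2:]  # columns after 'mOTU' and 'Taxonomy'
    table = {row[0]: [float(value) for value in row[2:]] for row in rows if row}
    return samples, table


def sample_totals(samples, table):
    """Sum the values of all listed mOTUs for each sample."""
    return {sample: sum(values[i] for values in table.values())
            for i, sample in enumerate(samples)}


MERGED = (
    "#tool_version=4.0.4\tdatabase_version=4.0\tmin_alignment_length=75\t"
    "min_mgcs=3\tcount_mode=INSERT_SCALED\tvalue_type=counts\n"
    "mOTU\tTaxonomy\tsampleA\tsampleB\tsampleC\n"
    "mOTUv4.0_000001\td__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;"
    "f__Bacteroidaceae;g__Prevotella;s__Unknown Prevotella mOTUv4.0_000001\t0\t0\t3218\n"
    "mOTUv4.0_000002\td__Bacteria;p__Bacillota_A;c__Clostridia;o__Lachnospirales;"
    "f__Lachnospiraceae;g__Roseburia;s__Roseburia inulinivorans\t0\t6\t0\n"
    "mOTUv4.0_000006\td__Bacteria;p__Bacillota_A;c__Clostridia;o__Lachnospirales;"
    "f__Lachnospiraceae;g__Agathobacter;s__Agathobacter rectalis\t0\t9\t4\n"
)

samples, table = load_abundance_table(io.StringIO(MERGED))
print(samples)
print(sample_totals(samples, table))
```

If pandas is available, pandas.read_csv("output.mOTUs4", sep="\t", skiprows=1, index_col=0) should give a comparable table under the same assumptions.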



mOTUs is part of SIB's portfolio of open tools and databases.

mOTUs is part of the ELIXIR-CH Service Delivery Plan.