`motus profile`#

Note: This tutorial has been designed to be run on Unix-based systems (macOS or Linux) and requires the mOTUs profiler to be correctly installed as described on the quickstart page.

The motus profile command produces a taxonomic profile from short read metagenomic sequencing data by running motus map_tax, motus calc_mgc, and motus calc_motu in succession.

motus profile takes the following main parameters (see option manual):

-f | --forward: FastQ/A file(s) containing forward reads from paired-end shotgun metagenome data, separated by spaces.
-r | --reverse: FastQ/A file(s) containing reverse reads from paired-end shotgun metagenome data, separated by spaces.
-s | --single: FastQ/A file(s) containing reads from single-end shotgun metagenome data, separated by spaces.
-o | --output-file: Path and prefix for the output files. This prefix is also used for naming intermediate files.

To run the tool, you must provide either -f together with -r (paired-end), or -s by itself (single-end). When using paired-end data, ensure the file order in -f matches the order in -r.
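The input rule above can be sketched as a small Python helper that assembles the argument list before anything is run. The function name `build_profile_command` is hypothetical and not part of mOTUs. It uses only the flags listed above and follows the rule exactly as stated, so it rejects mixing paired-end and single-end input; whether the tool itself accepts other combinations is not checked here.

```python
from typing import List, Optional, Sequence


def build_profile_command(
    output: str,
    forward: Sequence[str] = (),
    reverse: Sequence[str] = (),
    single: Sequence[str] = (),
    sample_name: Optional[str] = None,
) -> List[str]:
    """Assemble a `motus profile` argument list (hypothetical helper)."""
    if forward or reverse:
        # Paired-end mode: -f and -r must list the same number of files,
        # in matching order (the order itself cannot be verified here).
        if len(forward) != len(reverse):
            raise ValueError("-f and -r need the same number of files")
        if single:
            raise ValueError("use either -f with -r, or -s by itself")
        command = ["motus", "profile", "-f", *forward, "-r", *reverse]
    elif single:
        # Single-end mode: -s by itself.
        command = ["motus", "profile", "-s", *single]
    else:
        raise ValueError("provide -f with -r (paired-end) or -s (single-end)")

    if sample_name:
        command += ["-n", sample_name]
    command += ["-o", output]
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)` once mOTUs is installed.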

Note: Although -n | --sample-name is not required for motus profile, we strongly recommend providing it when profiling multiple samples, as this enables merging them into a single taxonomic profile.

Before we begin the tutorial, we need to download example paired-end short read sequencing data: forward (*_1.fastq) and reverse reads (*_2.fastq) from metagenomic samples A, B and C.

On Linux or macOS, you can download the data with curl:

curl https://zenodo.org/records/7188406/files/sampleA_1.fastq -o sampleA_1.fastq
curl https://zenodo.org/records/7188406/files/sampleA_2.fastq -o sampleA_2.fastq

curl https://zenodo.org/records/7188406/files/sampleB_1.fastq -o sampleB_1.fastq
curl https://zenodo.org/records/7188406/files/sampleB_2.fastq -o sampleB_2.fastq

curl https://zenodo.org/records/7188406/files/sampleC_1.fastq -o sampleC_1.fastq
curl https://zenodo.org/records/7188406/files/sampleC_2.fastq -o sampleC_2.fastq

The files should contain 67’926 reads for sampleA, 196’034 reads for sampleB, and 139’238 reads for sampleC.
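To check a download against these numbers, the reads in a FASTQ file can be counted with a few lines of Python. This is a minimal sketch that assumes the standard four-line FASTQ layout (no wrapped sequence lines); `count_fastq_reads` is a hypothetical helper name. The text above does not say whether the quoted counts are per file or per read pair, so compare both if in doubt.

```python
import gzip


def count_fastq_reads(path: str) -> int:
    """Count reads in a FASTQ file, assuming four lines per record."""
    # Plain and gzip-compressed files are both read as text.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # A truncated download typically ends in the middle of a record.
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

For example, `count_fastq_reads("sampleA_1.fastq")` returns the number of records in the forward-read file of sample A.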

Profiling a single sample#

You can create a taxonomic profile for a single metagenomic sample using the motus profile command. For example, to profile sampleA, run:

motus profile -f sampleA_1.fastq -r sampleA_2.fastq -n sampleA -o sampleA.mOTUs4

Important: Providing multiple FastQ/A files to a single motus profile command is intended for combining multiple sequencing runs from the same biological sample. Each unique biological sample must be profiled using a separate motus profile command. For a detailed explanation, see Input files.

After running the command, the beginning of the sampleA.mOTUs4 file should look like the following:

#tool_version=4.1.0   database_version=4.1    min_alignment_length=75 min_mgcs=3      count_mode=INSERT_SCALED        value_type=counts
mOTU              Taxonomy                                                                                                                                   sampleA
mOTUv4.0_000021   d__Bacteria;p__Bacillota;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Hominicoprocola;s__Unknown Hominicoprocola mOTUv4.0_000021    115
mOTUv4.0_000030   d__Bacteria;p__Bacillota;c__Clostridia;o__Oscillospirales;f__Acutalibacteraceae;g__Ruminococcoides;s__Ruminococcoides intestinale                6
mOTUv4.0_000036   d__Bacteria;p__Bacillota;c__Clostridia;o__Oscillospirales;f__Acutalibacteraceae;g__CAG-217;s__CAG-217 sp000436335                              105
mOTUv4.0_000060   d__Bacteria;p__Bacillota;c__Clostridia;o__Oscillospirales;f__Acutalibacteraceae;g__Hominenteromicrobium;s__Hominenteromicrobium mulieris         2
mOTUv4.0_000063   d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli              343
mOTUv4.0_000080   d__Bacteria;p__Bacillota;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Vescimonas;s__Vescimonas coprocola                              2
mOTUv4.0_000147   d__Bacteria;p__Bacillota;c__Clostridia;o__Oscillospirales;f__Acutalibacteraceae;g__Acutalibacter;s__Acutalibacter ornithocaccae                 10
mOTUv4.0_000239   d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides ovatus                              12

The first line indicates which version of the mOTUs tool and the marker gene database were used, as well as the parameters, which you can adjust to your use case based on the instructions in the option manual.

The second line is the header containing three columns: the mOTU identifier, the GTDB taxonomy assigned to the corresponding mOTU, and the sample name as specified by the -n flag. The third column contains the counts of the corresponding mOTUs in the sample (see Counting method for more information). The mOTUv4.0_unassigned entry in the last row of the profile gives the number of inserts that mapped to unlinked marker genes, i.e. the number of detected cells in the sample for which we were not able to identify the species.
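As an illustration of this layout, the following sketch computes the share of the total count that falls on the mOTUv4.0_unassigned row. It assumes the profile is tab-separated, that lines starting with `#` are comments, and that the last column holds numeric counts; `unassigned_fraction` is a hypothetical helper and not part of mOTUs.

```python
import csv


def unassigned_fraction(profile_path: str,
                        unassigned_id: str = "mOTUv4.0_unassigned") -> float:
    """Return unassigned count / total count for a single-sample profile."""
    total = 0.0
    unassigned = 0.0
    with open(profile_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Skip blank lines, the '#' settings line and the column header.
            if not row or row[0].startswith("#") or row[0] == "mOTU":
                continue
            value = float(row[-1])
            total += value
            if row[0] == unassigned_id:
                unassigned = value
    if total == 0:
        raise ValueError(f"{profile_path}: no counts found")
    return unassigned / total
```

A high value suggests that many of the detected cells belong to species that could not be identified.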

After running the command, the following files will be generated together with the sampleA.mOTUs4 file (see Output files for examples):

Overview of output files generated by `motus profile`#
File Name	Description
`sampleA.mOTUs4`	taxonomic profile containing counts for each mOTU
`sampleA.mOTUs4.relab`	taxonomic profile containing relative abundances for each mOTU
`sampleA.mOTUs4.bam`	alignment file produced by the bwa aligner
`sampleA.mOTUs4.mgc`	abundances of each marker gene cluster
`sampleA.mOTUs4.inserts.gz`	overview of which marker gene sequence each insert in the input file was mapped to

Multi-threading#

The runtime of the motus profile command is dominated by the mapping of input reads against the marker gene database (motus map_tax). You can assign multiple threads (-t flag) to accelerate the alignment process:

motus profile -f sampleA_1.fastq -r sampleA_2.fastq -n sampleA -o sampleA.mOTUs4 -t 4

In our tests, runtime scaled almost linearly up to 16 threads (-t 16).

Profiling multiple samples#

The motus profile command is meant to run on one sample at a time. For profiling many samples at a time, we recommend generating a file which contains the list of samples to profile. For example, let’s assume we have a samples.txt file containing the following three samples:

sampleA
sampleB
sampleC

The following bash script can then be run, assuming that samples.txt and the read files (sampleA_1.fastq, sampleA_2.fastq, etc.) are in the current working directory:

#!/bin/bash

# Define input file
SAMPLE_FILE="samples.txt"

# Check if input file exists
if [[ ! -f "$SAMPLE_FILE" ]]; then
    echo "Error: $SAMPLE_FILE not found!"
    exit 1
fi

# Create the output folder
mkdir -p results

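# Loop through each line in the file
# (IFS= and -r keep each line unmodified; the || test also handles a last line without a trailing newline)
while IFS= read -r SAMPLE || [[ -n "$SAMPLE" ]]; do
    # Skip empty lines in the sample list
    [[ -z "$SAMPLE" ]] && continue

    echo "Processing: $SAMPLE"

    # Run mOTUs profile on selected sample
    motus profile \
        -f "${SAMPLE}_1.fastq" \
        -r "${SAMPLE}_2.fastq" \
        -n "$SAMPLE" \
        -o "results/${SAMPLE}.mOTUs4" \
        -t 4

    echo "Finished: $SAMPLE"
    echo "-----------------------------------"
done < "$SAMPLE_FILE"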

Alternatively, motus profile can be incorporated into a Snakemake pipeline:

# Load sample IDs from a text file
# .strip() removes whitespace/newlines; 'if line.strip()' skips empty lines
with open("samples.txt") as handle:
    SAMPLES = [line.strip() for line in handle if line.strip()]

rule all:
    input:
        expand("results/{sample}.mOTUs4", sample=SAMPLES)

# Run motus profile on all samples from the samples.txt file
rule motus_profile:
    input:
        fwd = "{sample}_1.fastq",
        rev = "{sample}_2.fastq"
    output:
        "results/{sample}.mOTUs4"
    threads: 4
    shell:
        """
        motus profile \
            -f {input.fwd} \
            -r {input.rev} \
            -n {wildcards.sample} \
            -o {output} \
            -t {threads}
        """

Before starting the Snakemake job, make sure to run snakemake --dry-run to verify that all profiling commands have been constructed correctly.
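Both approaches read sample names from samples.txt. If the read files follow the `<sample>_1.fastq` / `<sample>_2.fastq` naming used in this tutorial, that list could be derived from the folder contents instead of being typed by hand. The sketch below assumes exactly this naming scheme; the helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def find_paired_samples(folder: str = ".") -> List[str]:
    """Return sample names that have both a _1.fastq and a _2.fastq file."""
    folder_path = Path(folder)
    samples = []
    for fwd in sorted(folder_path.glob("*_1.fastq")):
        name = fwd.name[: -len("_1.fastq")]
        # Keep the sample only if the matching reverse-read file exists.
        if (folder_path / f"{name}_2.fastq").is_file():
            samples.append(name)
    return samples


def write_sample_list(samples: Iterable[str], path: str = "samples.txt") -> None:
    """Write one sample name per line, as expected by the scripts above."""
    Path(path).write_text("".join(f"{sample}\n" for sample in samples))
```

Calling `write_sample_list(find_paired_samples("."))` in the tutorial folder would write the three sample names to samples.txt, provided all six files were downloaded.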

Merging profiles into one table#

Note: For merging, ensure all profiles have been generated using the same tool and database version, with consistent parameters.
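One way to check this before merging is to compare the first line of each profile, which records the tool version, database version and parameters. The sketch below assumes that this line starts with `#` and consists of whitespace-separated `key=value` fields, as in the example profile shown earlier; the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def read_settings(profile_path: str) -> Dict[str, str]:
    """Parse the '#key=value ...' settings line at the top of a profile."""
    with open(profile_path) as handle:
        first = handle.readline().strip()
    if not first.startswith("#"):
        raise ValueError(f"{profile_path}: first line is not a settings line")
    settings = {}
    for field in first.lstrip("#").split():
        key, _, value = field.partition("=")
        settings[key] = value
    return settings


def inconsistent_profiles(paths: Sequence[str]) -> Dict[str, List[str]]:
    """Map each profile that differs from the first one to the differing keys."""
    reference = read_settings(paths[0])
    problems = {}
    for path in paths[1:]:
        settings = read_settings(path)
        differing = sorted(
            key for key in reference.keys() | settings.keys()
            if reference.get(key) != settings.get(key)
        )
        if differing:
            problems[path] = differing
    return problems
```

An empty result means that all profiles report the same settings as the first one.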

In metagenomic studies, profiles from multiple samples are frequently compiled into a single abundance table. In this table, rows represent taxa, columns represent samples, and each cell is the abundance of a specific taxon in a specific sample. To generate an abundance table, run the motus merge command, which requires the following parameters:

-i | --input-files: a space-separated list of profile files OR a text file listing profile files, with one file per line
-o | --output-file: path where the generated abundance table is stored

To merge the profiles of samples A, B and C, run the following from the folder that contains them (for example results/ if you used the scripts above):

motus merge -i sampleA.mOTUs4 sampleB.mOTUs4 sampleC.mOTUs4 -o output.mOTUs4

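Alternatively, let the shell expand the file names. Note that the pattern also matches an output.mOTUs4 left over from a previous merge in the same folder, so remove or rename such a file first: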

motus merge -i *.mOTUs4 -o output.mOTUs4

If there are many profiles, create a file called profiles.txt that lists the path to each sample profile on its own line:

sampleA.mOTUs4
sampleB.mOTUs4
sampleC.mOTUs4

And run:

motus merge -i profiles.txt -o output.mOTUs4

The first rows of the resulting tab-separated values file output.mOTUs4 should contain the following data:

#tool_version=4.1.0     database_version=4.1    min_alignment_length=75 min_mgcs=3      count_mode=INSERT_SCALED        value_type=counts
mOTU              Taxonomy                                                                                                                           sampleA  sampleB  sampleC
mOTUv4.0_000000   d__Bacteria;p__Bacillota;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium duncaniae           0        0        5
mOTUv4.0_000001   d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Prevotella;s__Unknown Prevotella mOTUv4.0_000001        0        0     3238
mOTUv4.0_000002   d__Bacteria;p__Bacillota;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Roseburia;s__Roseburia inulinivorans                      0        6        0
mOTUv4.0_000003   d__Bacteria;p__Bacillota;c__Clostridia;o__Christensenellales;f__Aristaeellaceae;g__UBA11524;s__UBA11524 sp000437595                      0        0        5
mOTUv4.0_000004   d__Bacteria;p__Bacillota;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium prausnitzii         0        0       49
mOTUv4.0_000006   d__Bacteria;p__Bacillota;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Agathobacter;s__Agathobacter rectalis                     0        9        4
mOTUv4.0_000007   d__Bacteria;p__Bacillota;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium longum              0       17        0
mOTUv4.0_000011   d__Bacteria;p__Bacillota;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium sp900539945         0        0        5

You can now load output.mOTUs4 into Python or R for downstream analysis.
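As a starting point in Python, the merged table can be read with the standard library alone. This minimal sketch assumes the layout shown above: a `#` settings line, a header row with the sample names from the third column onwards, and tab-separated numeric values. `load_merged_table` is a hypothetical helper name.

```python
import csv
from typing import Dict


def load_merged_table(path: str) -> Dict[str, Dict[str, float]]:
    """Return {sample: {mOTU identifier: value}} from a merged profile."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        # Drop blank lines and the '#' settings line.
        rows = [row for row in reader if row and not row[0].startswith("#")]
    header, body = rows[0], rows[1:]
    samples = header[2:]  # columns after 'mOTU' and 'Taxonomy'
    table = {sample: {} for sample in samples}
    for row in body:
        for sample, value in zip(samples, row[2:]):
            table[sample][row[0]] = float(value)
    return table
```

For instance, `table = load_merged_table("output.mOTUs4")` followed by `[m for m, v in table["sampleC"].items() if v > 0]` lists the mOTUs detected in sample C. For larger studies, a data-frame library is likely more convenient.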

mOTUs is part of SIB's portfolio of open tools and databases.

mOTUs is part of the ELIXIR-CH Service Delivery Plan.
