motus profile#
Note: This tutorial has been designed to be run on Unix-based systems (macOS or Linux) and requires the mOTUs profiler to be correctly installed as described on the quickstart page.
The motus profile command produces a taxonomic profile from short-read metagenomic sequencing data by running motus map_tax, motus calc_mgc,
and motus calc_motu in succession.
By default, motus profile requires the following parameters (see option manual):
-f|--forward: FastQ/A file(s) containing forward reads from paired-end shotgun metagenome data, separated by spaces.
-r|--reverse: FastQ/A file(s) containing reverse reads from paired-end shotgun metagenome data, separated by spaces.
-s|--single: FastQ/A file(s) containing reads from single-end shotgun metagenome data, separated by spaces.
-o|--output-file: Path and prefix for the output files. This prefix is also used for naming intermediate files.
To run the tool, you must provide either -f together with -r (paired-end), or -s by itself (single-end). When using paired-end data,
ensure the file order in -f matches the order in -r.
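The input rule above can be checked before launching a run. The following is a minimal Python sketch, not part of mOTUs: the helper name build_profile_command is hypothetical, and it only uses the flags listed on this page (-f, -r, -s, -n, -o). It follows the rule exactly as stated here, so it may be stricter than the tool itself.

```python
from typing import List, Optional, Sequence


def build_profile_command(output: str,
                          forward: Sequence[str] = (),
                          reverse: Sequence[str] = (),
                          single: Sequence[str] = (),
                          sample_name: Optional[str] = None) -> List[str]:
    """Return an argument list for `motus profile` (hypothetical helper).

    Raises ValueError if the inputs do not follow the rule described above:
    either -f together with -r, or -s by itself.
    """
    forward, reverse, single = list(forward), list(reverse), list(single)
    if forward or reverse:
        if single:
            raise ValueError("use either -f/-r or -s, not both")
        if len(forward) != len(reverse):
            raise ValueError("-f and -r need the same number of files, in matching order")
        reads = ["-f", *forward, "-r", *reverse]
    elif single:
        reads = ["-s", *single]
    else:
        raise ValueError("provide -f together with -r, or -s")
    name = ["-n", sample_name] if sample_name else []
    return ["motus", "profile", *reads, *name, "-o", output]


if __name__ == "__main__":
    cmd = build_profile_command("sampleA.mOTUs4",
                                forward=["sampleA_1.fastq"],
                                reverse=["sampleA_2.fastq"],
                                sample_name="sampleA")
    print(" ".join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) if you prefer to launch the profiler from Python.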
Note: Although this parameter is not required for motus profile, we strongly recommend providing -n or --sample-name
when profiling multiple samples as this enables merging them into a single taxonomic profile.
Before we begin the tutorial, we need to download example paired-end short read sequencing data: forward (*_1.fastq) and reverse reads (*_2.fastq)
from metagenomic samples A, B and C.
If you are working on Linux, you can download the data with wget:
wget https://zenodo.org/record/7188406/files/sampleA_1.fastq
wget https://zenodo.org/record/7188406/files/sampleA_2.fastq
wget https://zenodo.org/record/7188406/files/sampleB_1.fastq
wget https://zenodo.org/record/7188406/files/sampleB_2.fastq
wget https://zenodo.org/record/7188406/files/sampleC_1.fastq
wget https://zenodo.org/record/7188406/files/sampleC_2.fastq
If you are working on macOS, you can download the data with curl:
curl https://zenodo.org/records/7188406/files/sampleA_1.fastq -o sampleA_1.fastq
curl https://zenodo.org/records/7188406/files/sampleA_2.fastq -o sampleA_2.fastq
curl https://zenodo.org/records/7188406/files/sampleB_1.fastq -o sampleB_1.fastq
curl https://zenodo.org/records/7188406/files/sampleB_2.fastq -o sampleB_2.fastq
curl https://zenodo.org/records/7188406/files/sampleC_1.fastq -o sampleC_1.fastq
curl https://zenodo.org/records/7188406/files/sampleC_2.fastq -o sampleC_2.fastq
The files should contain 67’926 reads for sampleA, 196’034 reads for sampleB, and 139’238 reads for sampleC.
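To verify the download, you can count the reads in each file. The sketch below is a hypothetical helper (count_fastq_reads is not part of mOTUs) and assumes uncompressed FASTQ files with exactly four lines per read. This page does not state whether the expected numbers refer to a single file or to both files of a pair, so compare both.

```python
from pathlib import Path


def count_fastq_reads(path: str) -> int:
    """Count reads in an uncompressed FASTQ file, assuming 4 lines per read."""
    with open(path, "rb") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4; "
                         "the file may be truncated")
    return n_lines // 4


if __name__ == "__main__":
    for sample in ("sampleA", "sampleB", "sampleC"):
        for mate in ("1", "2"):
            fastq = Path(f"{sample}_{mate}.fastq")
            if fastq.exists():  # skip files that were not downloaded
                print(fastq, count_fastq_reads(str(fastq)))
```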
Profiling a single sample#
You can create a taxonomic profile for a single metagenomic sample using the motus profile command. To create a profile for sampleA, run:
motus profile -f sampleA_1.fastq -r sampleA_2.fastq -n sampleA -o sampleA.mOTUs4
Important: Providing multiple FastQ/A files to a single motus profile command is intended for combining multiple sequencing
runs from the same biological sample. Each unique biological sample must be profiled using a separate motus profile command. For detailed explanation,
see Input files.
After running the command, the beginning of the sampleA.mOTUs4 file should look like the following:
#tool_version=4.0.4 database_version=4.0 min_alignment_length=75 min_mgcs=3 count_mode=INSERT_SCALED value_type=counts
mOTU Taxonomy sampleA
mOTUv4.0_000021 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Hominicoprocola;s__Unknown Hominicoprocola mOTUv4.0_000021 113
mOTUv4.0_000030 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Acutalibacteraceae;g__Ruminococcus_E;s__Ruminococcus_E bromii_B 6
mOTUv4.0_000036 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Acutalibacteraceae;g__CAG-217;s__CAG-217 sp000436335 107
mOTUv4.0_000060 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Acutalibacteraceae;g__Hominenteromicrobium;s__Hominenteromicrobium mulieris 2
mOTUv4.0_000063 d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli 345
mOTUv4.0_000080 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Vescimonas;s__Vescimonas coprocola 2
mOTUv4.0_000147 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Acutalibacteraceae;g__Acutalibacter;s__Acutalibacter ornithocaccae 10
mOTUv4.0_000239 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides ovatus 13
The first line indicates which versions of the mOTUs tool and the marker gene database were used, as well as the parameters, which you can adjust to your use case based on the instructions in the option manual.
The second line is the header containing three columns: the mOTU identifier, the GTDB taxonomy assigned to the corresponding mOTU, and the sample name as specified by the -n flag.
The third column contains counts of the corresponding mOTUs in the sample (see Counting method for more information). The mOTUv4.0_unassigned in the last row of the profile
refers to the number of inserts which mapped to unlinked marker genes, i.e. the number of detected cells in the sample for which we were not able to identify the species.
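The first line described above can be read programmatically, for example to record which tool and database versions produced a profile. This is a minimal sketch with a hypothetical helper name; it assumes the line consists of key=value fields separated by whitespace, as in the example output shown on this page.

```python
from typing import Dict


def parse_profile_header(line: str) -> Dict[str, str]:
    """Split the first line of a profile into a dict of its key=value fields."""
    if not line.startswith("#"):
        raise ValueError("expected the first line of a profile, starting with '#'")
    fields = line[1:].split()  # handles both tabs and spaces
    return dict(field.split("=", 1) for field in fields)


if __name__ == "__main__":
    example = ("#tool_version=4.0.4 database_version=4.0 min_alignment_length=75 "
               "min_mgcs=3 count_mode=INSERT_SCALED value_type=counts")
    header = parse_profile_header(example)
    print(header["tool_version"], header["database_version"])  # -> 4.0.4 4.0
```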
After running the command, the following files will be generated together with the sampleA.mOTUs4 file (see Output files for examples):
| File Name | Description |
|---|---|
| | taxonomic profile containing counts for each mOTU |
| | taxonomic profile containing relative abundances for each mOTU |
| | alignment file produced by the bwa aligner |
| | abundances of each marker gene cluster |
| | overview of which marker gene sequence each insert in the input file was mapped to |
Multi-threading#
The runtime of the motus profile command is limited by the mapping of input reads against the marker gene database (motus map_tax). You can assign multiple threads (-t flag)
to accelerate the alignment process:
motus profile -f sampleA_1.fastq -r sampleA_2.fastq -n sampleA -o sampleA.mOTUs4 -t 4
In our tests, runtime scaled almost linearly up to 16 threads (-t 16).
Profiling multiple samples#
The motus profile command is meant to run on one sample at a time. To profile many samples, we recommend generating a file that contains the list of samples to profile.
For example, let’s assume we have a samples.txt file containing the following three samples:
sampleA
sampleB
sampleC
The following bash script can then be run assuming sampleA_1.fastq, sampleA_2.fastq, etc. are in the same folder:
#!/bin/bash
# Define input file
SAMPLE_FILE="samples.txt"
# Check if input file exists
if [[ ! -f "$SAMPLE_FILE" ]]; then
    echo "Error: $SAMPLE_FILE not found!" >&2
    exit 1
fi
# Create the output folder if it does not exist yet
mkdir -p results
# Loop through each line in the file
while IFS= read -r SAMPLE; do
    # Skip empty lines
    [[ -z "$SAMPLE" ]] && continue
    echo "Processing: $SAMPLE"
    # Run mOTUs profile on selected sample
    motus profile \
        -f "${SAMPLE}_1.fastq" \
        -r "${SAMPLE}_2.fastq" \
        -n "$SAMPLE" \
        -o "results/${SAMPLE}.mOTUs4" \
        -t 4
    echo "Finished: $SAMPLE"
    echo "-----------------------------------"
done < "$SAMPLE_FILE"
Alternatively, motus profile can be incorporated into a Snakemake pipeline:
# Load sample IDs from a text file
# .strip() removes whitespace/newlines; 'if line.strip()' skips empty lines
with open("samples.txt") as handle:
    SAMPLES = [line.strip() for line in handle if line.strip()]
rule all:
    input:
        expand("results/{sample}.mOTUs4", sample=SAMPLES)
# Run motus profile on all samples from the samples.txt file
rule motus_profile:
    input:
        fwd = "{sample}_1.fastq",
        rev = "{sample}_2.fastq"
    output:
        "results/{sample}.mOTUs4"
    threads: 4
    shell:
        # Wildcards must be referenced as {wildcards.sample} inside shell commands
        """
        motus profile \
            -f {input.fwd} \
            -r {input.rev} \
            -n {wildcards.sample} \
            -o {output} \
            -t {threads}
        """
Before starting the Snakemake job, make sure to run snakemake --dry-run to verify that all profiling commands have been constructed correctly.
Merging profiles into one table#
Note: For merging, ensure all profiles have been generated using the same tool and database version, with consistent parameters.
In metagenomic studies, profiles from multiple samples are frequently compiled into a single abundance table. In this table, rows represent taxa, columns represent samples,
and each cell is the abundance of a specific taxon in a specific sample. To generate an abundance table, run the motus merge command, which requires the following parameters:
-i|--input-files: a space-separated list of profile files OR a text file listing profile files, with one file per line
-o|--output-file: a path where to store the generated abundance table
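Because merging expects profiles produced with the same tool version, database version, and parameters, it can help to compare the first line of each profile beforehand. The sketch below is a hypothetical pre-check, not a mOTUs feature; it relies on the first line of every profile recording those settings, as described for the single-sample output.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def group_profiles_by_header(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group profile files by their first line (versions and parameters)."""
    groups: Dict[str, List[str]] = {}
    for path in paths:
        with open(path) as handle:
            header = handle.readline().strip()
        groups.setdefault(header, []).append(str(path))
    return groups


if __name__ == "__main__":
    candidates = ("sampleA.mOTUs4", "sampleB.mOTUs4", "sampleC.mOTUs4")
    profiles = [p for p in candidates if Path(p).exists()]
    groups = group_profiles_by_header(profiles)
    if len(groups) > 1:
        print("Profiles were generated with different settings:")
        for header, files in groups.items():
            print(f"  {header}: {', '.join(files)}")
```

A single group means all first lines are identical; more than one group points to profiles that should be regenerated before merging.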
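To merge the profiles of samples A, B and C (adjust the paths if the profiles were written to another folder, such as results/ in the scripts above), run: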
motus merge -i sampleA.mOTUs4 sampleB.mOTUs4 sampleC.mOTUs4 -o output.mOTUs4
Alternatively, you can run:
motus merge -i *.mOTUs4 -o output.mOTUs4
If there are many profiles, create a file called profiles.txt that lists the path to each sample profile on its own line:
sampleA.mOTUs4
sampleB.mOTUs4
sampleC.mOTUs4
And run:
motus merge -i profiles.txt -o output.mOTUs4
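For a large study, the profiles.txt file can be generated rather than typed. The following is a minimal sketch under stated assumptions: the helper name is hypothetical, profiles are assumed to end in .mOTUs4 as in this tutorial, and a previously merged table named output.mOTUs4 is skipped so it is not merged into itself.

```python
from pathlib import Path
from typing import List, Sequence


def write_profile_list(profile_dir: str,
                       list_path: str = "profiles.txt",
                       pattern: str = "*.mOTUs4",
                       exclude: Sequence[str] = ("output.mOTUs4",)) -> List[Path]:
    """Write one profile path per line to list_path and return the paths."""
    paths = sorted(p for p in Path(profile_dir).glob(pattern)
                   if p.name not in exclude)
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return paths
```

Calling write_profile_list(".") in the folder that holds the profiles writes profiles.txt with one path per line, ready to be passed to motus merge -i.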
The first rows of the resulting tab-separated values file output.mOTUs4 should contain the following data:
#tool_version=4.0.4 database_version=4.0 min_alignment_length=75 min_mgcs=3 count_mode=INSERT_SCALED value_type=counts
mOTU Taxonomy sampleA sampleB sampleC
mOTUv4.0_000000 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium duncaniae 0 0 5
mOTUv4.0_000001 d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Prevotella;s__Unknown Prevotella mOTUv4.0_000001 0 0 3218
mOTUv4.0_000002 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Roseburia;s__Roseburia inulinivorans 0 6 0
mOTUv4.0_000003 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Christensenellales;f__Aristaeellaceae;g__UBA11524;s__UBA11524 sp000437595 0 0 5
mOTUv4.0_000004 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium prausnitzii 0 0 48
mOTUv4.0_000006 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Agathobacter;s__Agathobacter rectalis 0 9 4
mOTUv4.0_000007 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium longum 0 15 0
mOTUv4.0_000011 d__Bacteria;p__Bacillota_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium;s__Faecalibacterium sp900539945 0 0 5
You can now load output.mOTUs4 into Python or R for downstream analysis.
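As a starting point in Python, the merged table can be read with the standard library alone. This sketch uses a hypothetical helper, load_merged_profile, and assumes the layout shown above: a first line starting with #, a tab-separated header whose first two columns are mOTU and Taxonomy, and one numeric column per sample.

```python
import csv
from typing import Dict, List, Tuple


def load_merged_profile(path: str) -> Tuple[List[str], Dict[str, List[float]]]:
    """Return (sample names, {mOTU identifier: values per sample})."""
    with open(path, newline="") as handle:
        # Skip the '#' line holding versions and parameters
        data_lines = (line for line in handle if not line.startswith("#"))
        rows = csv.reader(data_lines, delimiter="\t")
        header = next(rows)
        samples = header[2:]  # columns after mOTU and Taxonomy
        values = {row[0]: [float(v) for v in row[2:]] for row in rows if row}
    return samples, values


def sample_totals(samples: List[str],
                  values: Dict[str, List[float]]) -> Dict[str, float]:
    """Sum the values of all mOTUs for each sample."""
    totals = [sum(column) for column in zip(*values.values())]
    return dict(zip(samples, totals))
```

For example, samples, values = load_merged_profile("output.mOTUs4") followed by sample_totals(samples, values) gives the total per sample, which is a quick sanity check before further analysis.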