Option manual#

Here, we provide a full overview of mOTUs commands and their options, as well as recommendations on parameter selection. On the command line, you can always type motus <command> to obtain a short description of its options and usage.

To execute the tool you need to call motus <command> [options], where <command> can be:

  • profile, perform taxonomic profiling on a sample. The command has the following subroutines:

    • map_tax, map reads to the marker gene database, output a SAM/BAM file,

    • calc_mgc, aggregate reads from the same marker gene cluster (MGC) and output the MGC abundance table. It uses the SAM/BAM file produced by map_tax,

    • calc_motu, from an MGC abundance table (created by calc_mgc), produce the mOTUs abundance table.

  • downloadMGDB, download the mOTUs marker gene database,

  • merge, merge multiple sample profiles into a single table,

  • classify, assign your genomes to the mOTUs they belong to,

  • prep_long, convert long read data into an input format suitable for motus profile,

  • genomes, search the mOTUs genome database for genomes matching specified taxonomic and functional annotations,

  • download, download sequence data for any genome within the mOTUs database.

motus profile#

Produces a taxonomic profile from short read metagenomic sequencing data by executing motus map_tax, motus calc_mgc, and motus calc_motu in succession. Can be used to profile long read metagenomic sequencing data after running motus prep_long first.
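As a rough illustration of this chaining, the sketch below prints the three commands that a single motus profile call stands for. The helper name print_pipeline and the intermediate file names (*.bam, *.mgc, *.motus) are hypothetical; only the subcommands and flags are taken from the usage messages on this page.

```shell
# Sketch only: print the step-by-step equivalent of one `motus profile` run.
# print_pipeline and the .bam/.mgc/.motus file names are illustrative choices.
print_pipeline() {
    prefix=$1   # prefix for all output files
    fwd=$2      # forward reads
    rev=$3      # reverse reads
    name=$4     # sample name passed to -n
    printf 'motus map_tax -f %s -r %s -o %s.bam\n' "$fwd" "$rev" "$prefix"
    printf 'motus calc_mgc -i %s.bam -o %s.mgc\n' "$prefix" "$prefix"
    printf 'motus calc_motu -i %s.mgc -n %s -o %s.motus\n' "$prefix" "$name" "$prefix"
}

# Print the commands for one paired-end run; pipe the output to `sh` to run them.
print_pipeline ERR1913344 ERR1913344.1.fq.gz ERR1913344.2.fq.gz ERR1913344
```

When running the steps separately, options such as -l (map_tax, calc_mgc) and -g or -y (calc_motu) have to be passed to the step that accepts them.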

command line interface
Program: motus - a tool for marker gene-based OTU (mOTU) profiling
    Version: 4.0.4


    References:
        Profiler: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand
        taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022).
        doi: https://doi.org/10.1186/s40168-022-01410-z

        Database: Dmitrijeva, Ruscheweyh et al. The mOTUs online database provides web-accessible
        genomic context to taxonomic profiling of microbial communities. Nucleic Acids Research (2025).
        doi: https://doi.org/10.1093/nar/gkae1004


    Summary:
        The profile command is the main function in mOTUs that executes map_tax, calc_mgc,
        and calc_motu in sequence. It takes short read metagenomic sequencing data as input
        and generates a taxonomic profile.


    Usage:
        motus profile -f FILE [FILE ...] -r FILE [FILE ...] -s FILE [FILE ...] -o FILE [options]
        motus profile -f FILE [FILE ...] -r FILE [FILE ...] -o FILE [options]
        motus profile -s FILE [FILE ...] -o FILE [options]


    Input options:
        -f, --forward  FILE [FILE ...]
            Input file(s) for reads in forward orientation, FastQ/A(.gz)-formatted

        -r, --reverse  FILE [FILE ...]
            Input file(s) for reads in reverse orientation, FastQ/A(.gz)-formatted

        -s, --single  FILE [FILE ...]
            Input file(s) for unpaired reads, FastQ/A(.gz)-formatted

        -n, --sample-name  STR
            Sample name (default: 'unnamed sample')

    Output options:
        -o, --output-file  FILE
            Output file name [required]

    Algorithm options:
        -g, --marker-genes  INT
            Required number of marker genes for a mOTU to be called present:
            1=higher recall, 6=higher precision, 10=maximum (default: 3)

        -l, --alignment-length  INT
            Minimum length of the alignment (bp) (default: 75)

        -t, --threads  INT
            Number of threads (default: 1)

        -y, --counting-mode  STR
            Which scale the abundances are reported in (default: INSERT_SCALED)
            Choices: [INSERT_RAW, INSERT_NORM, INSERT_SCALED, BASE_RAW, BASE_NORM]

Required arguments#

Input -f, -r, -s: one or multiple FastQ/A files, which can be gzipped. Ensure the order of input files is the same for -f and -r if using paired-end data.

Output -o: path to the output file. This also serves as a prefix for intermediate files.

Option overview#

Option

Description

-f, --forward

Input path - Paired Forward: One or more FastQ/A files (optionally gzipped) containing forward reads. The input files must have the same order for both forward and reverse.

-r, --reverse

Input path - Paired Reverse: One or more FastQ/A files (optionally gzipped) containing reverse reads. The input files must have the same order for both forward and reverse.

-s, --single

Input path - Single: One or more FastQ/A files (optionally gzipped). The order of the input files does not matter for single-end files.

-o, --output-file

Output prefix: Path to the output files. This prefix is also used for intermediate files.

-n, --sample-name

Sample name: Name of the sample. Required when merging samples. The default value is unnamed sample.

-g, --marker-genes

Sensitivity: The number of marker genes with abundance required to call a mOTU present. Default value is 3, with a minimum of 1 and a maximum of 10.

-l, --alignment-length

Length: Filter alignments if their length is below this value. Default value is 75. Note: Choose a value less than or equal to the length of the reads.

-t, --threads

Threads: Number of threads to use for the alignment step. Default is 1.

-y, --counting-mode

Counting method: mOTUs can count in different modes. The default mode is INSERT_SCALED; the other options are INSERT_RAW, INSERT_NORM, BASE_RAW, and BASE_NORM. For more details, see algorithm.

Option description#

Input files#

Option

Description

-f, --forward

Input path - Paired Forward: One or more FastQ/A files (optionally gzipped) containing forward reads. The input files must have the same order for both forward and reverse.

-r, --reverse

Input path - Paired Reverse: One or more FastQ/A files (optionally gzipped) containing reverse reads. The input files must have the same order for both forward and reverse.

-s, --single

Input path - Single: One or more FastQ/A files (optionally gzipped). The order of the input files does not matter for single-end files.

The mOTUs profile and map_tax routines accept short read sequence files in FastA or FastQ format. Gzipped input is supported as well.

It’s possible to submit single and paired-end files belonging to multiple runs from the same sample if the file order between forward and reverse reads is maintained. Files submitted together should be separated by a space.

Examples

We use the sequencing files from the biosample SAMN06172452 as input. This sample has been sequenced multiple times, including runs ERR1913344, ERR1913349, and so on, for a total of 25 runs.

Correct usage

  • -f ERR1913344.1.fq.gz -r ERR1913344.2.fq.gz Profile one paired-end run.

  • -f ERR1913344.1.fq.gz ERR1913349.1.fq.gz -r ERR1913344.2.fq.gz ERR1913349.2.fq.gz Profile two paired-end runs as one sample.

Incorrect usage (mOTUs will abort with an error)

  • -f ERR1913349.1.fq.gz -r ERR1913349.1.fq.gz The same FastQ file is submitted twice. If the run was single-end, use -s instead of -f.

  • -f ERR1913349.1.fq.gz -r ERR1913344.2.fq.gz ERR1913349.2.fq.gz Unequal number of forward and reverse FastQ input files.

  • -f ERR1913349.1.fq.gz ERR1913344.1.fq.gz -r ERR1913344.2.fq.gz ERR1913349.2.fq.gz Runs submitted in the wrong order. mOTUs checks read names and will abort.

Incorrect usage (mOTUs will not abort with an error)

  • -s ERR1913344.1.fq.gz mOTUs will profile a paired-end run as single-end data if only the forward file is submitted.

  • -f ERR1913344.1.fq.gz ERR1913368.1.fq.gz -r ERR1913344.2.fq.gz ERR1913368.2.fq.gz Run ERR1913368 belongs to a different biosample (SAMN06172417), but mOTUs will profile these two runs as one sample.

In general, when a sample has been sequenced multiple times, we recommend profiling all available runs together, rather than processing each run separately. The main benefits are:

  1. Increased sensitivity: pooling runs raises the chance that enough marker genes of a low-abundance taxon are detected to pass the threshold defined by -g.

  2. More accurate handling of multi-mapper reads, as their weights are influenced by uniquely mapped reads from the same sample (see algorithm for more details).
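The pairing rules from the examples above can be checked on file names before launching a run. The sketch below assumes the RUN.1.fq.gz / RUN.2.fq.gz naming used in the examples and file names without spaces. check_pairs is a hypothetical helper, not part of mOTUs, and it only looks at file names, whereas mOTUs itself checks read names.

```shell
# check_pairs FORWARD_LIST REVERSE_LIST
# Each argument is a space-separated list of file names (no spaces in names).
# Returns 0 if both lists pair up run by run, 1 otherwise.
check_pairs() {
    fwd=$1
    rev=$2
    set -- $fwd; n_fwd=$#
    set -- $rev; n_rev=$#
    if [ "$n_fwd" -ne "$n_rev" ]; then
        echo "unequal number of forward and reverse files" >&2
        return 1
    fi
    # The positional parameters now hold the reverse files; walk both lists together.
    for f in $fwd; do
        r=$1
        shift
        if [ "$f" = "$r" ]; then
            echo "same file given as forward and reverse: $f" >&2
            return 1
        fi
        if [ "${f%.1.fq.gz}" != "${r%.2.fq.gz}" ]; then
            echo "run mismatch: $f vs $r" >&2
            return 1
        fi
    done
    return 0
}

fwd="ERR1913344.1.fq.gz ERR1913349.1.fq.gz"
rev="ERR1913344.2.fq.gz ERR1913349.2.fq.gz"
if check_pairs "$fwd" "$rev"; then
    # The lists stay unquoted on purpose so that each file becomes its own argument.
    echo motus profile -f $fwd -r $rev -n SAMN06172452 -o SAMN06172452.motus
fi
```

The final line only prints the command; remove echo to run it. The output file name is an arbitrary choice.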

Sample name#

Option

Description

-n, --sample-name

Sample name: Name of the sample. Required when merging samples. The default value is unnamed sample.

The -n option can be used within the profile and calc_motu routines to add the name of the sample to the mOTUs profile file.

This is particularly important when merging multiple mOTUs profiles using the motus merge command, which requires distinct sample names. By default, mOTUs assigns each sample the name unnamed sample.

Recommendation: Although this parameter is optional, we recommend using it and adding meaningful sample names that match the user’s metadata. For public data, it’s best to use stable identifiers. For example, when profiling all runs from the biosample SAMN06172452, use -n SAMN06172452. If profiling only one run from the same biosample (e.g., ERR1913349), use -n ERR1913349.
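One way to keep sample names in sync with metadata is to generate the profile commands from a sample sheet. The sketch below assumes a tab-separated sheet with three columns (sample name, forward file, reverse file); the sheet layout, the sampleA/sampleB entries, the helper name profile_commands, and the .mOTUs4 output suffix are illustrative assumptions.

```shell
# profile_commands SHEET
# Reads "sample<TAB>forward<TAB>reverse" lines and prints one command per sample.
profile_commands() {
    while IFS="$(printf '\t')" read -r sample fwd rev; do
        [ -n "$sample" ] || continue   # skip empty lines
        printf 'motus profile -f %s -r %s -n %s -o %s.mOTUs4\n' \
            "$fwd" "$rev" "$sample" "$sample"
    done < "$1"
}

# Demo with a throwaway sheet holding two hypothetical samples.
sheet=$(mktemp)
printf 'sampleA\tsampleA.1.fq.gz\tsampleA.2.fq.gz\n' > "$sheet"
printf 'sampleB\tsampleB.1.fq.gz\tsampleB.2.fq.gz\n' >> "$sheet"
profile_commands "$sheet"
rm -f "$sheet"
```

Because every profile carries its own -n value, the resulting files satisfy the distinct-name requirement of motus merge.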

Number of marker genes#

Option

Description

-g, --marker-genes

Sensitivity: The number of marker genes with abundance required to call a mOTU present. Default value is 3, with a minimum of 1 and a maximum of 10.

The profile and calc_motu routines estimate taxonomic (per-mOTU) abundance based on the median abundance of marker genes for mOTUs in which at least [-g] marker genes have been detected. By default, at least three marker genes need to display non-zero abundance for calculating the abundance of the corresponding mOTU. Varying the -g parameter allows users to increase or decrease the required number of non-zero abundance marker genes, prioritising precision or recall respectively.

Examples

  • -g 3 default value, balanced trade-off between precision and recall

  • -g 1 lower precision, higher recall

  • -g 6 higher precision, lower recall

  • -g 10 highest precision, lowest recall.

Note: A genome is included in marker gene-based clustering only if it contains at least six marker genes (see conceptual implementation). Some mOTUs may therefore be represented by as few as six marker genes, and setting -g higher than 6 automatically omits those mOTUs from the taxonomic profile.
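The presence call controlled by -g can be pictured with a toy example. The sketch below only counts marker genes with non-zero abundance and compares that count with the threshold; it does not reproduce the abundance calculation itself. The helper names and the ten abundance values are made up for illustration.

```shell
# count_detected V1 V2 ...: print how many values are greater than zero.
count_detected() {
    printf '%s\n' "$@" | awk '$1 > 0 { n++ } END { print n + 0 }'
}

# motu_called G V1 V2 ...: succeed if at least G marker genes are detected.
motu_called() {
    g=$1
    shift
    [ "$(count_detected "$@")" -ge "$g" ]
}

# Hypothetical abundances of the ten marker genes of one mOTU.
mg="0 0 4 0 7 0 0 2 0 0"
for g in 1 3 6; do
    if motu_called "$g" $mg; then
        echo "-g $g: mOTU reported"
    else
        echo "-g $g: mOTU not reported"
    fi
done
```

With three detected marker genes, this toy mOTU passes -g 1 and -g 3 but not -g 6.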

Alignment length#

Option

Description

-l, --alignment-length

Length: Filter alignments if their length is below this value. Default value is 75. Note: Choose a value less than or equal to the length of the reads.

Used within the profile, map_tax, and calc_mgc routines to filter alignments from the BAM file.

mOTUs uses the bwa aligner to map reads against the mOTUs marker gene database. By default, alignments with less than 97% identity or shorter than 75 bases are filtered out; the length cutoff can be adjusted with -l.

The 75-base cutoff has proven to be a robust default: it avoids the spurious alignments that become more frequent as alignment length decreases, and it is still low enough to profile sequencing runs that yield 100 bp reads.

Note

If taxonomic profiles from different samples are merged, these profiles should be generated using the same alignment length parameter.
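Because -l should not exceed the read length, it can help to look at the read lengths before profiling. A minimal sketch, assuming uncompressed four-line FastQ records on standard input (for gzipped files, decompress with gzip -dc first). read_length_range is a hypothetical helper and the two reads are made up.

```shell
# read_length_range: print "<shortest> <longest>" read length of a FastQ stream.
read_length_range() {
    awk 'NR % 4 == 2 {
             n = length($0)
             if (min == "" || n < min) min = n
             if (n > max) max = n
         }
         END { print min + 0, max + 0 }'
}

min_len=75   # value intended for -l
set -- $(printf '@r1\nACGTACGTACGT\n+\nIIIIIIIIIIII\n@r2\nACGTACGT\n+\nIIIIIIII\n' | read_length_range)
shortest=$1
longest=$2
echo "read lengths: $shortest to $longest bp"
if [ "$longest" -lt "$min_len" ]; then
    echo "all reads are shorter than -l $min_len; consider a lower value"
fi
```

For a real file, the same function can be used as gzip -dc reads.fq.gz | read_length_range, where reads.fq.gz is a placeholder name.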

Number of threads#

Option

Description

-t, --threads

Threads: Number of threads to use for the alignment step. Default is 1.

As part of the profile and map_tax routines, mOTUs uses the bwa aligner to map reads against its marker gene database.

By default, one thread is used, but you can increase the number of threads with -t to improve runtime. In our tests, runtime scaled almost linearly up to 16 threads (-t 16).

Counting method#

Option

Description

-y, --counting-mode

Counting method: mOTUs can count in different modes. The default mode is INSERT_SCALED; the other options are INSERT_RAW, INSERT_NORM, BASE_RAW, and BASE_NORM. For more details, see algorithm.

The -y option in the profile and calc_motu routines is used to set the counting method. Five counting methods are available, each with its own use cases. By default, INSERT_SCALED is applied because it’s most suitable for standard taxonomic profiling followed by alpha- and/or beta-diversity analysis. Detailed formulas and a visual overview of all counting methods are provided on the algorithm page.

Short description with use cases:

INSERT_SCALED: Number of inserts mapped to each mOTU after normalising coverage of each MG by gene length, scaled so that abundance values are above 1 and suitable for use in methods requiring count data. This mode is the recommended default and can be used for most downstream analyses, including alpha-diversity, beta-diversity, and differential abundance analysis.

INSERT_RAW: Number of inserts mapped to each mOTU before any length normalisation. In general, this mode is not recommended for downstream analyses: the MGs vary in length and taking the median coverage across them without accounting for length variation is not meaningful for species quantification. However, the INSERT_RAW column in the MGC output file can be useful for diagnosing technical issues, such as identifying MGs with unusually high or low coverage, or comparing consistency across runs from the same biological sample.

INSERT_NORM: Number of inserts mapped to each mOTU after normalising coverage of each MG by gene length. This number represents the median coverage of universal single-copy marker genes in the sample and can therefore be used to normalise the coverage of gene catalogues. Dividing the coverage of other genes by this value provides the average gene copy number per cell, enabling cross-sample comparison of functional abundances.

BASE_RAW: Base coverage of each mOTU before any length normalisation. As with INSERT_RAW, this mode is not recommended for species quantification and downstream analyses, but can be inspected in the MGC output file to assess technical issues.

BASE_NORM: Base coverage of each mOTU after normalising coverage of each MG by gene length. This mode can be used to normalise gene catalogues, similar to INSERT_NORM. The difference between these two modes depends on how the gene catalogue itself was quantified: in number of inserts or in average base coverage.
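The gene catalogue normalisation mentioned for INSERT_NORM and BASE_NORM comes down to a division. The sketch below assumes that a single per-sample marker gene coverage value is already available and that the gene coverage was quantified in the same unit (inserts or bases); how that value is obtained from a profile is not shown. copies_per_cell and the numbers are illustrative only.

```shell
# copies_per_cell GENE_COVERAGE MARKER_GENE_COVERAGE
# Prints the gene coverage relative to the marker gene coverage.
copies_per_cell() {
    awk -v gene="$1" -v mg="$2" 'BEGIN {
        if (mg <= 0) { print "NA"; exit }   # no marker gene coverage to divide by
        printf "%.2f\n", gene / mg
    }'
}

# A gene with coverage 30 in a sample whose marker gene coverage is 12.
copies_per_cell 30 12
```

Under these assumptions the example gene is present at about 2.5 copies per cell on average.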

motus map_tax#

Maps short reads against the mOTUs marker gene database.

command line interface
Program: motus - a tool for marker gene-based OTU (mOTU) profiling
    Version: 4.0.4


    References:
        Profiler: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand
        taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022).
        doi: https://doi.org/10.1186/s40168-022-01410-z

        Database: Dmitrijeva, Ruscheweyh et al. The mOTUs online database provides web-accessible
        genomic context to taxonomic profiling of microbial communities. Nucleic Acids Research (2025).
        doi: https://doi.org/10.1093/nar/gkae1004


    Summary:
        The map_tax command takes short read metagenomic sequencing data as input and
        maps reads to the mOTUs marker gene database.


    Usage:
        motus map_tax -f FILE [FILE ...] -r FILE [FILE ...] -s FILE [FILE ...] -o FILE [options]
        motus map_tax -f FILE [FILE ...] -r FILE [FILE ...] -o FILE [options]
        motus map_tax -s FILE [FILE ...] -o FILE [options]


    Input options:
        -f, --forward  FILE [FILE ...]
            Input file(s) for reads in forward orientation, FastQ/A(.gz)-formatted

        -r, --reverse  FILE [FILE ...]
            Input file(s) for reads in reverse orientation, FastQ/A(.gz)-formatted

        -s, --single  FILE [FILE ...]
            Input file(s) for unpaired reads, FastQ/A(.gz)-formatted

    Output options:
        -o, --output-file  FILE
            Output file name [required]

    Algorithm options:
        -l, --alignment-length  INT
            Minimum length of the alignment (bp) (default: 75)

        -t, --threads  INT
            Number of threads (default: 1)

Required arguments#

Input -f, -r, -s: one or multiple FastQ/A files, which can be gzipped. Ensure the order of input files is the same for -f and -r if using paired-end data.

Output -o: path to the output file.

Option overview#

Option

Description

-f, --forward

Input path - Paired Forward: One or more FastQ/A files (optionally gzipped) containing forward reads. The input files must have the same order for both forward and reverse.

-r, --reverse

Input path - Paired Reverse: One or more FastQ/A files (optionally gzipped) containing reverse reads. The input files must have the same order for both forward and reverse.

-s, --single

Input path - Single: One or more FastQ/A files (optionally gzipped). The order of the input files does not matter for single-end files.

-o, --output-file

Output prefix: Path to the output files. This prefix is also used for intermediate files.

-l, --alignment-length

Length: Filter alignments if their length is below this value. Default value is 75. Note: Choose a value less than or equal to the length of the reads.

-t, --threads

Threads: Number of threads to use for the alignment step. Default is 1.

motus calc_mgc#

Calculates number of inserts mapping to each marker gene cluster within the mOTUs marker gene database.

command line interface
Program: motus - a tool for marker gene-based OTU (mOTU) profiling
    Version: 4.0.4


    References:
        Profiler: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand
        taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022).
        doi: https://doi.org/10.1186/s40168-022-01410-z

        Database: Dmitrijeva, Ruscheweyh et al. The mOTUs online database provides web-accessible
        genomic context to taxonomic profiling of microbial communities. Nucleic Acids Research (2025).
        doi: https://doi.org/10.1093/nar/gkae1004


    Summary:
        The calc_mgc command takes a file storing the alignments of sequencing reads
        to the mOTUs marker gene database and calculates marker gene cluster abundances.


    Usage:
        motus calc_mgc -i FILE -o FILE [options]


    Input options:
        -i, --input-file  FILE
            Path to BAM file generated after running the motus map_tax command [required]

    Output options:
        -o, --output-file  FILE
            Output file name [required]

    Algorithm options:
        -l, --alignment-length  INT
            Minimum length of the alignment (bp) (default: 75)

Required arguments#

Input -i: a SAM or BAM file generated after running motus map_tax.

Output -o: path to the output file.

Option overview#

Option

Description

-i, --input-file

Input path: Path to BAM file generated after running motus map_tax.

-o, --output-file

Output path: Path to the output file.

-l, --alignment-length

Length: Filter alignments if their length is below this value. Default value is 75. Note: Choose a value less than or equal to the length of the reads.

motus calc_motu#

Calculates the taxonomic profile based on number of inserts mapped to the corresponding marker gene clusters.

command line interface
Program: motus - a tool for marker gene-based OTU (mOTU) profiling
    Version: 4.0.4


    References:
        Profiler: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand
        taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022).
        doi: https://doi.org/10.1186/s40168-022-01410-z

        Database: Dmitrijeva, Ruscheweyh et al. The mOTUs online database provides web-accessible
        genomic context to taxonomic profiling of microbial communities. Nucleic Acids Research (2025).
        doi: https://doi.org/10.1093/nar/gkae1004


    Summary:
        The calc_motu command takes a file containing marker gene cluster
        abundances and generates a taxonomic profile.


    Usage:
        motus calc_motu -i FILE -o FILE [options]


    Input options:
        -i, --input-file  FILE
            MGC abundance table generated by the calc_mgc command [required]

        -n, --sample-name  STR
            Sample name (default: 'unnamed sample')

    Output options:
        -o, --output-file  FILE
            Output file name [required]

    Algorithm options:
        -g, --marker-genes  INT
            Required number of marker genes for a mOTU to be called present:
            1=higher recall, 6=higher precision, 10=maximum (default: 3)

        -y, --counting-mode  STR
            Which scale the abundances are reported in (default: INSERT_SCALED)
            Choices: [INSERT_RAW, INSERT_NORM, INSERT_SCALED, BASE_RAW, BASE_NORM]

Required arguments#

Input -i: the MGC file generated after running motus calc_mgc.

Output -o: path to the taxonomic profiles containing abundances as counts and as relative abundances.

Option overview#

Option

Description

-i, --input-file

Input path: Path to MGC abundance table generated after running motus calc_mgc.

-o, --output-file

Output path: Path to the output files.

-n, --sample-name

Sample name: Name of the sample. Required when merging samples. The default value is unnamed sample.

-g, --marker-genes

Sensitivity: The number of marker genes with abundance required to call a mOTU present. Default value is 3, with a minimum of 1 and a maximum of 10.

-y, --counting-mode

Counting method: mOTUs can count in different modes. The default mode is INSERT_SCALED; the other options are INSERT_RAW, INSERT_NORM, BASE_RAW, and BASE_NORM. For more details, see algorithm.

motus downloadMGDB#

Downloads the marker gene reference database required for profiling.

command line interface
Program: motus - a tool for marker gene-based OTU (mOTU) profiling
    Version: 4.0.4


    References:
        Profiler: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand
        taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022).
        doi: https://doi.org/10.1186/s40168-022-01410-z

        Database: Dmitrijeva, Ruscheweyh et al. The mOTUs online database provides web-accessible
        genomic context to taxonomic profiling of microbial communities. Nucleic Acids Research (2025).
        doi: https://doi.org/10.1093/nar/gkae1004


    Summary:
        The downloadMGDB command downloads the marker gene reference database used
        by the profile and map_tax commands.


    Usage:
        motus downloadMGDB [options]


    Options:
        -f, --force
            Force download even when database is already present

Option overview#

Option

Description

-f, --force

Force: Download the database even if it’s already present.

motus merge#

Merges taxonomic profiles from multiple samples into one (tab-separated) table.

Note: Requires that each profile is named (using -n in motus profile or motus calc_motu).

command line interface
Program: motus - a tool for marker gene-based OTU (mOTU) profiling
    Version: 4.0.4


    References:
        Profiler: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand
        taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022).
        doi: https://doi.org/10.1186/s40168-022-01410-z

        Database: Dmitrijeva, Ruscheweyh et al. The mOTUs online database provides web-accessible
        genomic context to taxonomic profiling of microbial communities. Nucleic Acids Research (2025).
        doi: https://doi.org/10.1093/nar/gkae1004


    Summary:
        The merge command takes multiple profiles produced after running the
        profile command and combines them into a single table.


    Usage:
        motus merge -i FILE [FILE ...] -o FILE


    Input options:
        -i, --input-files  FILE [FILE ...]
            A list of mOTUs profile files or a text file containing the list of profile
            files to be merged, with one line per file [required]

    Output options:
        -o, --output-file  FILE
            Output file name [required]

Required arguments#

Input -i: mOTUs profile files to merge. The input can be provided as a text file with one line per profile or as a space-separated list containing multiple mOTUs profiles. At least two profiles are required. Note: These files must be generated using the same mOTUs version and the same parameters.

Output -o: path to the merged profile file.

Option overview#

Option

Description

-i, --input-files

Input files: A space-separated list of profiles produced after running motus profile or motus calc_motu. Alternatively, a text file containing paths to generated profiles, one profile per line.

-o, --output-file

Output path: Path to the output file containing merged profiles.

Option description#

Input files#

Option

Description

-i, --input-files

Input files: A space-separated list of profiles produced after running motus profile or motus calc_motu. Alternatively, a text file containing paths to generated profiles, one profile per line.

We will use samples from project PRJEB52368 as an example. After running motus profile on selected samples, we obtain the following output files: SAMEA5998847.mOTUs4, SAMEA6009611.mOTUs4, SAMEA6009792.mOTUs4, SAMEA6009843.mOTUs4, and SAMEA6009888.mOTUs4.

IMPORTANT: When running motus profile, sample names should be indicated using the -n parameter (see the Sample name section of motus profile). Unnamed samples are unsuitable for merging.

Correct usage

motus merge -i SAMEA5998847.mOTUs4 SAMEA6009611.mOTUs4 SAMEA6009792.mOTUs4 SAMEA6009843.mOTUs4 SAMEA6009888.mOTUs4 -o PRJEB52368.tsv listing profiles directly in the command line.

motus merge -i profiles_to_merge.txt -o PRJEB52368.tsv indicating file with a list of profiles, where the content of profiles_to_merge.txt is:

/path/to/SAMEA5998847.mOTUs4
/path/to/SAMEA6009611.mOTUs4
/path/to/SAMEA6009792.mOTUs4
/path/to/SAMEA6009843.mOTUs4
/path/to/SAMEA6009888.mOTUs4

Incorrect usage (mOTUs will abort with an error)

motus merge -i SAMEA5998847.mOTUs4 -o PRJEB52368.tsv merge requires at least two profiles to be provided.

motus merge -i SAMEA5998847.mOTUs4 SAMEA6009611.mOTUs4 SAMEA6009611.mOTUs4 -o PRJEB52368.tsv the same sample is indicated twice.
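For larger projects, the list file can be generated rather than written by hand. The sketch below collects every *.mOTUs4 file in a directory into a list and refuses to continue with fewer than two profiles. make_merge_list is a hypothetical helper, the .mOTUs4 suffix simply follows the example file names above, and the demo profiles are empty placeholders.

```shell
# make_merge_list DIR LIST
# Writes the path of every *.mOTUs4 file in DIR to LIST, one per line.
# Succeeds only if at least two profiles were found.
make_merge_list() {
    dir=$1
    list=$2
    n=0
    : > "$list"
    for f in "$dir"/*.mOTUs4; do
        [ -e "$f" ] || continue   # skip the unexpanded pattern when nothing matches
        printf '%s\n' "$f" >> "$list"
        n=$((n + 1))
    done
    [ "$n" -ge 2 ]
}

# Demo in a throwaway directory with two empty placeholder profiles.
demo=$(mktemp -d)
: > "$demo/SAMEA5998847.mOTUs4"
: > "$demo/SAMEA6009611.mOTUs4"
if make_merge_list "$demo" "$demo/profiles_to_merge.txt"; then
    echo "motus merge -i $demo/profiles_to_merge.txt -o PRJEB52368.tsv"
fi
rm -rf "$demo"
```

Since each file in a directory is listed once, the list cannot contain the same profile twice.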

motus classify#

Assigns the provided genomes to a mOTU if the corresponding taxon is present in the database.

command line interface
Program: motus - a tool for marker gene-based OTU (mOTU) profiling
    Version: 4.0.4


    References:
        Profiler: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand
        taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022).
        doi: https://doi.org/10.1186/s40168-022-01410-z

        Database: Dmitrijeva, Ruscheweyh et al. The mOTUs online database provides web-accessible
        genomic context to taxonomic profiling of microbial communities. Nucleic Acids Research (2025).
        doi: https://doi.org/10.1093/nar/gkae1004


    Summary:
        The classify command takes a list of genome sequence files as input and
        assigns these genomes to existing mOTUs in the database.


    Usage:
        motus classify -i FILE -o FILE [options]


    Input options:
        -i, --input-file  FILE
            Text file listing genome sequence files in FastA(.gz) format to classify.
            One line per genome file [required]

    Output options:
        -o, --output-file  FILE
            Output file name. Each line contains a genome and its associated mOTU [required]

    Algorithm options:
        -t, --threads  INT
            Number of threads (default: 1)

Required arguments#

Input -i: a text file containing a list of paths to genome files in FastA(.gz) format.

Output -o: path to output file, which contains one line per genome with its associated mOTU.

Option overview#

Option

Description

-i, --input-file

Input file - Genome list: A text file listing genome files that will be associated with existing mOTUs.

-o, --output-file

Output path - Classification: The output file, containing one line per genome with its associated mOTU.

-t, --threads

Threads: Number of threads to use for aligning against the mOTUs MG database. Default is 1.

Option description#

Input file#

Option

Description

-i, --input-file

Input file - Genome list: A text file listing genome files that will be associated with existing mOTUs.

The input of motus classify is a text file listing genome sequence files in FastA format, one file per line. The genome sequence files can be gzipped, and the filenames of all genomes must be unique.

Correct usage

The content of genomes.txt is:

$ cat genomes.txt
/a/b/c.fa
/a/c/d.fa.gz

Incorrect usage (mOTUs will abort with an error)

Two genome files are listed on the same line:

$ cat genomes.txt
/a/b/c.fa /a/c/d.fa

Two genome files share the same filename (c.fa):

$ cat genomes.txt
/a/b/c.fa
/a/c/c.fa
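Both input rules (one genome file per line, unique filenames) can be checked before calling motus classify. A minimal sketch, assuming paths without whitespace and comparing the full filename including its extension; check_genome_list is a hypothetical helper, and skipping empty lines is a choice made here, not documented mOTUs behaviour.

```shell
# check_genome_list FILE
# Succeeds if every non-empty line holds exactly one path and no filename repeats.
check_genome_list() {
    awk '
        NF == 0 { next }
        NF > 1  { printf "line %d: more than one path\n", NR; bad = 1; next }
        {
            n = split($1, parts, "/")
            if (seen[parts[n]]++) {
                printf "line %d: duplicate filename %s\n", NR, parts[n]
                bad = 1
            }
        }
        END { exit bad + 0 }
    ' "$1"
}

# Demo with the valid list from the example above.
list=$(mktemp)
printf '/a/b/c.fa\n/a/c/d.fa.gz\n' > "$list"
if check_genome_list "$list"; then
    echo "genome list looks fine"
fi
rm -f "$list"
```

The check works on the text of the list only and does not test whether the genome files exist.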

Output file#

Option

Description

-o, --output-file

Output path - Classification: The output file, containing one line per genome with its associated mOTU.

The tabular output file contains one line per submitted genome. Each line reports either the assigned mOTU, <6MGs-no_mOTU if the genome contains fewer than 6 of the 10 marker genes, or Novel-no_mOTU if the genome contains at least 6 marker genes but could not be assigned to any mOTU.

Threads#

Option

Description

-t, --threads

Threads: Number of threads to use for aligning against the mOTUs MG database. Default is 1.

motus classify is partly multi-threaded. Using multiple threads (up to 32) usually gives a considerable speed-up when classifying larger genome sets.

motus prep_long#

Prepares long reads for profiling by mOTUs. For long-read data, this command must be run before motus profile.

command line interface
Program: motus - a tool for marker gene-based OTU (mOTU) profiling
    Version: 4.0.4


    References:
        Profiler: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand
        taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022).
        doi: https://doi.org/10.1186/s40168-022-01410-z

        Database: Dmitrijeva, Ruscheweyh et al. The mOTUs online database provides web-accessible
        genomic context to taxonomic profiling of microbial communities. Nucleic Acids Research (2025).
        doi: https://doi.org/10.1093/nar/gkae1004


    Summary:
        The prep_long command takes long-read sequencing data and converts it
        into the appropriate input format to be used by the profile and map_tax commands.


    Usage:
        motus prep_long -i FILE -o FILE [options]


    Input options:
        -i, --input-file  FILE
            Long-read sequencing file to convert, can be in FastQ/A(.gz) format [required]

    Output options:
        -o, --output-file  FILE
            Output file name. This converted file is ready to be used by motus profile [required]

    Algorithm options:
        -sl, --splitting-length  INT
            Target fragment length (in bp) for splitting long reads (default: 300)

        -ml, --minimum-length  INT
            Minimum read length after splitting. Shorter reads are discarded (default: 50)

Required arguments#

Input -i: input file containing long reads in FastQ/A(.gz) format.

Output -o: output FastA file containing converted reads. Appropriate to be used as input for motus profile.

Option overview#

Option

Description

-i, --input-file

Input path: The input file containing long reads in FastQ/A(.gz) format.

-o, --output-file

Output path: The output file containing converted short reads in FastA format.

-sl, --splitting-length

Splitting length: Target length of the converted reads. Default value is 300.

-ml, --minimum-length

Minimum length: Converted reads shorter than indicated length will not be written to the output. The default value is 50.

Option description#

Splitting length and minimum length#

The motus prep_long function splits every long read in the dataset into non-overlapping fragments of 300 base pairs (or the value of -sl) in length:

|--SL--|                                                |--- >ML?
|======|======|======|======|======|======|======|======|===

The final fragment is only written to the output file if its length is at least 50 base pairs (or the value of -ml). Fragments do not overlap, because overlapping fragments would distort base coverage quantification.
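The splitting rule can be traced with plain arithmetic. The sketch below only computes the fragment lengths that a read of a given length would yield under the rule described above; it does not split sequences. split_lengths is a hypothetical helper whose defaults mirror -sl 300 and -ml 50.

```shell
# split_lengths READ_LENGTH [SPLITTING_LENGTH] [MINIMUM_LENGTH]
# Prints the lengths of the fragments that would be written, separated by spaces.
split_lengths() {
    len=$1
    sl=${2:-300}
    ml=${3:-50}
    [ "$sl" -gt 0 ] || return 1   # guard against an endless loop
    out=""
    # Cut full-length, non-overlapping fragments from the start of the read.
    while [ "$len" -ge "$sl" ]; do
        out="$out $sl"
        len=$((len - sl))
    done
    # Keep the remainder only if it reaches the minimum length.
    if [ "$len" -gt 0 ] && [ "$len" -ge "$ml" ]; then
        out="$out $len"
    fi
    echo $out
}

split_lengths 1000   # prints: 300 300 300 100
split_lengths 620    # prints: 300 300
```

In the second call the 20 bp remainder is shorter than the minimum length and is dropped, which is the case marked with ">ML?" in the diagram.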

motus genomes#

Queries the mOTUs genome database to find genomes matching indicated mOTUs identifiers, taxonomic clades, or functional annotations.

Note

First-time execution of this command downloads the mOTUs annotation database, which requires 17.7G of storage.

command line interface
Program: motus - a tool for marker gene-based OTU (mOTU) profiling
    Version: 4.0.4


    References:
        Profiler: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand
        taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022).
        doi: https://doi.org/10.1186/s40168-022-01410-z

        Database: Dmitrijeva, Ruscheweyh et al. The mOTUs online database provides web-accessible
        genomic context to taxonomic profiling of microbial communities. Nucleic Acids Research (2025).
        doi: https://doi.org/10.1093/nar/gkae1004


    Summary:
        The genomes command queries the mOTUs-db based on identifiers, functional,
        or taxonomic annotations and returns a list of genomes matching indicated query.


    Usage:
        motus genomes -i FILE -o FILE [options]
        motus genomes -i STR [STR ...] -o FILE [options]


    Input options:
        -i, --input-queries  FILE/STR
            Can be either a list of search queries or a text file listing search queries
            with one line per query. Queries can be genome or mOTUs identifiers, PFAM, KEGG, EGGNOG,
            or GTDB taxonomy names. If the query does not exactly match any database entry,
            alternative queries will be suggested [required]

    Output options:
        -o, --output-file  FILE
            Output file containing a list of genome identifiers matching search queries and their
            annotations as indicated by the -d parameter. This output file can be used as input
            for the motus download command [required]

        -d, --details  STR [STR ...]
            List of annotations to report. Choose any combination of [KEGG, PFAM, EGGNOG, TAXONOMY],
            for example, -d KEGG PFAM.

Required arguments#

Input -i: a list of search queries separated by spaces. Alternatively, a text file listing search queries, with one query per line.

Output -o: output table containing genome identifiers matching the search queries, together with the annotations requested by the user. This file can be used as input for motus download.

Option overview#

Option                 Description
-i, --input-queries    Input - List of queries: A list of terms to query within the mOTUs annotation database.
-o, --output-file      Output file: The output table contains an overview of genomes matching indicated queries, accompanied by annotations specified with the -d parameter.
-d, --details          Details: Annotations to report. Options include KEGG, PFAM, EGGNOG, and TAXONOMY.

Option description#

List of queries#

Option                 Description
-i, --input-queries    Input - List of queries: A list of terms to query within the mOTUs annotation database.

Search queries can be given directly on the command line, separated by spaces, or in a text file with one query per line.
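Both input styles can be sketched as follows. The file names are placeholders, and the two queries are simply identifiers that appear elsewhere in this manual; only the -i, -o, and -d options documented for motus genomes are used. The motus call is guarded so that the sketch still runs on a machine without motus installed.

```shell
#!/bin/sh
# Sketch: build a query file (one query per line) and pass it to motus genomes.
# File names are placeholders chosen for this example.
queries_file="queries.txt"
printf '%s\n' \
    "mOTUv4.0_001734" \
    "ELLE19-1_SAMN09288280_MAG_00000004" > "$queries_file"

if command -v motus >/dev/null 2>&1; then
    # File input; -d adds the requested annotation columns to the output table.
    motus genomes -i "$queries_file" -o genomes_table.tsv -d TAXONOMY KEGG
    # The same queries given directly on the command line would be:
    # motus genomes -i mOTUv4.0_001734 ELLE19-1_SAMN09288280_MAG_00000004 -o genomes_table.tsv
else
    echo "motus is not installed; only $queries_file was written" >&2
fi
```

A query file is usually the more convenient form once the list grows beyond a handful of terms, since it can be kept alongside the results.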

Output file#

Option               Description
-o, --output-file    Output file: The output table contains an overview of genomes matching indicated queries, accompanied by annotations specified with the -d parameter.

By default, only the genome names and the matching search query are reported. The -d option adds columns with functional and taxonomic annotations.

Details#

Option           Description
-d, --details    Details: Annotations to report. Options include KEGG, PFAM, EGGNOG, and TAXONOMY.

The -d option specifies which annotations the motus genomes command reports. Several annotation types can be requested at once, for example -d KEGG PFAM.

motus download#

Downloads sequences for indicated genomes from the mOTUs genome database.

command line interface
Program: motus - a tool for marker gene-based OTU (mOTU) profiling
    Version: 4.0.4


    References:
        Profiler: Ruscheweyh, Milanese et al. Cultivation-independent genomes greatly expand
        taxonomic-profiling capabilities of mOTUs across various environments. Microbiome (2022).
        doi: https://doi.org/10.1186/s40168-022-01410-z

        Database: Dmitrijeva, Ruscheweyh et al. The mOTUs online database provides web-accessible
        genomic context to taxonomic profiling of microbial communities. Nucleic Acids Research (2025).
        doi: https://doi.org/10.1093/nar/gkae1004


    Summary:
        The download command downloads listed genome files from mOTUs-db.


    Usage:
        motus download -i FILE -o PATH [options]
        motus download -i STR [STR ...] -o PATH [options]


    Input options:
        -i, --input-genomes  FILE/STR
            Can be either a list of genome identifiers separated by spaces or a text file
            listing the identifiers of genomes for download. One line per genome. The output of
            the motus genomes command can be used as input for this command [required]

    Output options:
        -o, --output-folder  PATH
            Path to output folder where the downloaded sequences will be saved [required]

        -r, --representatives
            Download only sequences from representative genomes.

Required arguments#

Input -i: a list of genome identifiers separated by spaces. Alternatively, a text file listing genome identifiers, with one genome per line. The file generated by motus genomes can be used as input.

Output -o: path to folder that will store downloaded sequences.

Option overview#

Option                   Description
-i, --input-genomes      Input - List of genomes: A list of genome identifiers specifying which sequences to download.
-o, --output-folder      Output path: Output directory in which the downloaded sequences will be saved.
-r, --representatives    Representatives only: Only download sequences from representative genomes.

Option description#

List of genome identifiers#

Option                 Description
-i, --input-genomes    Input - List of genomes: A list of genome identifiers specifying which sequences to download.

Genome identifiers can be passed directly on the command line, separated by spaces, or in a file. The download command supports two types of input file: a simple list of genome names, or the table produced by a previous motus genomes run.

Option 1: Simple Genome List

This is a basic text file where you provide one genome name per line. Use this format if you already have a specific list of genomes you want to download.

ELLE19-1_SAMN09288280_MAG_00000004
ELLE19-1_SAMN09288282_MAG_00000006
ELLE19-1_SAMN09288284_MAG_00000011

Option 2: mOTUs Genome Table

Alternatively, you can provide the output file generated by the motus genomes command. This format is a tab-separated table that includes a header line followed by the genome names and their associated query data.

GENOME                                QUERY
ELLE19-1_SAMN09288280_MAG_00000004    mOTUv4.0_001734
ELLE19-1_SAMN09288282_MAG_00000006    mOTUv4.0_001734
ELLE19-1_SAMN09288284_MAG_00000011    mOTUv4.0_001734
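Because motus download accepts the genome table directly, converting it is only needed when a subset of its rows should be downloaded. The sketch below reduces a table to a simple genome list for one query. It assumes the first two tab-separated columns are GENOME and QUERY, as in the example above; the helper genomes_for_query and all file names are hypothetical.

```shell
#!/bin/sh
# Sketch: reduce a motus genomes table to a plain genome list for one query.
# Assumes the first two tab-separated columns are GENOME and QUERY.
# genomes_for_query is a hypothetical helper, not part of mOTUs.
genomes_for_query() {
    table=$1
    query=$2
    # Skip the header line, keep rows whose QUERY column matches, print GENOME.
    awk -F '\t' -v q="$query" 'NR > 1 && $2 == q { print $1 }' "$table"
}

# Small stand-in table (tab-separated) so the sketch runs without motus.
printf 'GENOME\tQUERY\n' > example_table.tsv
printf '%s\tmOTUv4.0_001734\n' \
    ELLE19-1_SAMN09288280_MAG_00000004 \
    ELLE19-1_SAMN09288282_MAG_00000006 \
    ELLE19-1_SAMN09288284_MAG_00000011 >> example_table.tsv

genomes_for_query example_table.tsv mOTUv4.0_001734 > genome_list.txt

if command -v motus >/dev/null 2>&1; then
    # -r restricts the download to representative genomes.
    motus download -i genome_list.txt -o downloaded_genomes -r
fi
```

The resulting genome_list.txt has the one-name-per-line layout described under Option 1.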


mOTUs is part of SIB's portfolio of open tools and databases.

mOTUs is part of the ELIXIR-CH Service Delivery Plan.