Starting command line runs

Command Line Use

The epsSMASH command line tool comes with a built-in help system.

Use epsSMASH --help to display help for the most common options.

Use epsSMASH --help-showall to get a description of all possible options.

Default run

Running epsSMASH with only an input file and no additional options runs the core detection module and the analyses that are fast to run.

More time-consuming options, such as the ClusterBlast analysis, are not run.

On a quad-core machine, analysing an annotated Pseudomonas aeruginosa genome with these options takes about 15 seconds.

Example:

epsSMASH pseudomonas_aeruginosa.gbk

Minimal run

Running epsSMASH with the --minimal parameter will only run the core detection module and no other modules. Any modules disabled by this (e.g. HTML output) can be explicitly re-enabled, if desired, with their matching option (see --help-showall).

In general, we recommend running without the --minimal option, as a default run generates much more useful results. However, if you want to run epsSMASH on many hundreds or thousands of genomes, the --minimal option might be preferable to keep the number of output files to a minimum.

Example:

epsSMASH --minimal pseudomonas_aeruginosa.gbff
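For the many-genomes case described above, a minimal sketch of a batch loop might look like the following. The function name run_minimal_batch and the EPSSMASH_BIN variable are hypothetical helpers introduced for illustration, not part of epsSMASH; the sketch assumes the genomes are GenBank files with a .gbff extension in a single directory and that the epsSMASH executable is on the PATH.

```shell
# Hypothetical helper: run a minimal epsSMASH analysis on every .gbff file in a directory.
# EPSSMASH_BIN is an illustrative variable (not an epsSMASH setting); it defaults to
# the epsSMASH executable found on PATH.
run_minimal_batch() {
    genome_dir="$1"
    for genome in "$genome_dir"/*.gbff; do
        # Skip the unexpanded pattern when the directory holds no .gbff files.
        [ -e "$genome" ] || continue
        # Report a failed genome on stderr and carry on with the remaining ones.
        "${EPSSMASH_BIN:-epsSMASH}" --minimal "$genome" || echo "failed: $genome" >&2
    done
}

# Usage: run_minimal_batch genomes/
```

The loop processes genomes one at a time; for parallel execution, see the section on processing many genomes in parallel.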

Reusing results from a previous run

The JSON output file previously generated by epsSMASH can be reused to regenerate other output files. Additional analyses can be enabled for the new run by adding their options.

NOTE: in some situations results cannot be reused, typically when detection modules have changed since the results were generated. In that case, the version of epsSMASH that generated the results is required.

Example:

epsSMASH --reuse-results pseudomonas_aeruginosa.json
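As a rough sketch of how reuse and additional analyses combine, the wrapper below checks that the JSON file exists and passes any extra options through unchanged. The function name reuse_results and the EPSSMASH_BIN variable are hypothetical and used only for illustration; the options for the additional analyses themselves are not listed here and should be taken from --help-showall.

```shell
# Hypothetical helper: regenerate outputs from an existing results file,
# forwarding any additional analysis options given after the JSON path.
reuse_results() {
    results_json="$1"
    shift
    # Fail early if the results file is missing.
    if [ ! -f "$results_json" ]; then
        echo "no such results file: $results_json" >&2
        return 1
    fi
    # EPSSMASH_BIN is an illustrative variable; it defaults to epsSMASH on PATH.
    "${EPSSMASH_BIN:-epsSMASH}" --reuse-results "$results_json" "$@"
}

# Usage: reuse_results pseudomonas_aeruginosa.json [additional options from --help-showall]
```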

Customising output

By default, output names are derived from the input filename. A custom output directory and custom output names can be specified instead (see the --output family of arguments).

HTML output can also be customised with alternative titles and descriptions (see the --html family of arguments).
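A sketch of what a customised run could look like is given below. The flag names --output-dir and --html-title are assumptions taken from the antiSMASH command line, on which epsSMASH is based; confirm them with epsSMASH --help-showall before relying on them. The function name run_with_custom_output and the EPSSMASH_BIN variable are hypothetical.

```shell
# Hypothetical helper: run epsSMASH with a chosen output directory and HTML title.
# ASSUMPTION: --output-dir and --html-title follow antiSMASH naming; verify the
# exact option names with "epsSMASH --help-showall".
run_with_custom_output() {
    genome="$1"
    out_dir="$2"
    title="$3"
    # EPSSMASH_BIN is an illustrative variable; it defaults to epsSMASH on PATH.
    "${EPSSMASH_BIN:-epsSMASH}" \
        --output-dir "$out_dir" \
        --html-title "$title" \
        "$genome"
}

# Usage: run_with_custom_output pseudomonas_aeruginosa.gbk results/pa "P. aeruginosa run"
```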

Processing many genomes in parallel

Bioinformatic tools are often run locally in order to analyse amounts of data that are not feasible to process on the web service. The Snakemake workflow multiSMASH was created to streamline large-scale analyses of BGCs across multiple genomes using the antiSMASH command-line tool. Since epsSMASH uses the same framework as antiSMASH, multiSMASH can be made to run epsSMASH commands by changing "antismash" to "epsSMASH" in the "antismash_command:" line of the multiSMASH YAML configuration file. Note that there is a common json-schema versioning issue between epsSMASH and multiSMASH; if you encounter it, upgrade json-schema to version 4.20.0.
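The configuration change described above can be sketched as a small sed-based helper. The function name patch_multismash_config and the file names in the usage line are hypothetical, and the sketch assumes the configuration contains a line of the form "antismash_command: antismash"; the change can equally be made by hand in a text editor.

```shell
# Hypothetical helper: write a copy of a multiSMASH YAML configuration in which the
# command on the "antismash_command:" line is switched from antismash to epsSMASH.
# ASSUMPTION: the line has the form "antismash_command: antismash" (optionally quoted).
patch_multismash_config() {
    config_in="$1"
    config_out="$2"
    # Only the antismash_command line is rewritten; all other lines are copied unchanged.
    sed -E 's/^([[:space:]]*antismash_command:[[:space:]]*"?)antismash/\1epsSMASH/' \
        "$config_in" > "$config_out"
}

# Usage: patch_multismash_config config.yaml config_epssmash.yaml
```

Writing to a new file rather than editing in place keeps the original configuration available for antiSMASH runs.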