Command-Line Interface Documentation¶
Overview¶
Ensembler can be used via the command-line tool ensembler
or via the API.
For API documentation, see the source code.
The ensembler
tool is operated via a number of subcommands, which should be executed successively
ensembler init
ensembler gather_targets
ensembler gather_templates
ensembler loopmodel
ensembler align
ensembler build_models
ensembler cluster
ensembler refine_implicit
ensembler solvate
ensembler refine_explicit
ensembler package_models
The optional ensembler validate
subcommand uses the
MolProbity command-line tools to
conduct model quality validation based on criteria such as Ramachandran angles,
backbone distortion, and atom clashes.
The ensembler quickmodel
subcommand allows the entire modeling pipeline to
be run in one go for a single target and a small number of templates. Note that
this command will not work with MPI.
To print helpstrings for each subcommand, pass the -h
flag.
If desired, target-selection and template-selection can be set up manually,
rather than using the gather_targets
and gather_templates
subcommands.
Targets should be provided as a fasta-format file (targets/targets.fa
)
containing target sequences and arbitrary identifiers. Template sequences and
arbitrary identifiers should be provided in a fasta-format file
(templates/templates-resolved-seq.fa
), and structures should be provided as
PDB-format coordinate files in the directory templates/structures-resolved
.
Each structure should be named XXX.pdb, where XXX matches the identifier in the
fasta file. The residues in the coordinate files should also match the
sequences in the fasta file.
Custom settings¶
Many aspects of the behavior of Ensember can be specified by using the Python API instead of the main command-line interface. For API documentation, see the source code, or view the docstrings in iPython.
Custom settings via the manual_overrides.yaml file¶
Some options can instead be specified via the manual_overrides.yaml
file,
which is created when initializing a new Ensembler project. The file contains
an example configuration, with each line commented out. The user can thus
uncomment the relevant lines and edit as necessary.
target-selection:
domain-spans:
ABL1_HUMAN_D0: 242-513
template-selection:
min-domain-len: 0
max-domain-len: 350
domain-spans:
ABL1_HUMAN_D0: 242-513
skip-pdbs:
- 4CYJ
- 4P41
- 4P2W
- 4QTD
- 4Q2A
- 4CTB
- 4QOX
refinement:
ph: 8.0
custom_residue_variants:
DDR1_HUMAN_D0_PROTONATED:
# keyed by 0-based residue index
35: ASH
The above configuration makes the following specifications (in order of appearance):
- Specifies a custom residue span for the target
ABL1_HUMAN_D0
. This is useful in cases where a different domain span is desired from that annotated in UniProt.- Specifies minimum and maximum domain lengths for templates. Any domain with more than 350 residues would be excluded. The same custom residue span used for target domains is also specified for the template domains.
- Certain PDB files can be skipped if they cause problems.
- A custom pH level (default: 7.0) is set, which determines how protonation states are assigned by OpenMM prior to molecular dynamics refinement.
- Custom residue variants are specified. This can be used to set specific protonation states, rather than rely purely on a defined pH level. These specified protonation states would override those determined by pH. The naming of residue variants (e.g.
ASH
) follows the OpenMM conventions.
Additional Tools¶
Ensembler includes a tools
submodule, which allows the user to conduct
various useful tasks which are not considered core pipeline functions. The
use-cases for many of these tools are quite specific, so they may not be
applicable to every project, and should be used with caution.
Residue renumbering according to UniProt sequence coordinates¶
$ ensembler renumber_residues --target EGFR_HUMAN_D0
The given target ID must begin with a UniProt mnemonic, e.g. “EGFR_HUMAN”.
This will output two files in the models/[target_id]
directory:
topol-renumbered-implicit.pdb
and topol-renumbered-explicit.pdb
.
The coordinates are simply copied from the first example found for each of
refined-implicit.pdb.gz
and refined-explicit.pdb.gz
. The residue
numbers are renumbered according to the canonical isoform sequence coordinates
in the UniProt entry.
Generating unrefined model structures¶
In some cases it may be useful to analyze model structures which have not
undergone refinement, but which have topologies equivalent to the final refined
models. These structures are not saved by the main pipeline functions by
default, but can be regenerated using
ensembler.tools.mktraj.MkTrajImplicitStart
. This code simply loads each
model structure with openmm
, adds hydrogens, and writes the resultant
structure as a pdb file (implicit-start.pdb.gz
). It also combines the
structures into a trajectory (traj-implicit-start.xtc
). This function is
accessed via the Python API as follows:
from ensembler.tools.mktraj import MkTrajImplicitStart
MkTrajImplicitStart(targetid='EGFR_HUMAN_D0')