Command-Line Interface Documentation¶
Ensembler can be used via the command-line tool
ensembler or via the API.
For API documentation, see the source code.
ensembler tool is operated via a number of subcommands, which should be executed successively
ensembler init ensembler gather_targets ensembler gather_templates ensembler loopmodel ensembler align ensembler build_models ensembler cluster ensembler refine_implicit ensembler solvate ensembler refine_explicit ensembler package_models
ensembler validate subcommand uses the
MolProbity command-line tools to
conduct model quality validation based on criteria such as Ramachandran angles,
backbone distortion, and atom clashes.
ensembler quickmodel subcommand allows the entire modeling pipeline to
be run in one go for a single target and a small number of templates. Note that
this command will not work with MPI.
To print helpstrings for each subcommand, pass the
If desired, target-selection and template-selection can be set up manually,
rather than using the
Targets should be provided as a fasta-format file (
containing target sequences and arbitrary identifiers. Template sequences and
arbitrary identifiers should be provided in a fasta-format file
templates/templates-resolved-seq.fa), and structures should be provided as
PDB-format coordinate files in the directory
Each structure should be named XXX.pdb, where XXX matches the identifier in the
fasta file. The residues in the coordinate files should also match the
sequences in the fasta file.
Many aspects of the behavior of Ensember can be specified by using the Python API instead of the main command-line interface. For API documentation, see the source code, or view the docstrings in iPython.
Custom settings via the manual_overrides.yaml file¶
Some options can instead be specified via the
which is created when initializing a new Ensembler project. The file contains
an example configuration, with each line commented out. The user can thus
uncomment the relevant lines and edit as necessary.
target-selection: domain-spans: ABL1_HUMAN_D0: 242-513 template-selection: min-domain-len: 0 max-domain-len: 350 domain-spans: ABL1_HUMAN_D0: 242-513 skip-pdbs: - 4CYJ - 4P41 - 4P2W - 4QTD - 4Q2A - 4CTB - 4QOX refinement: ph: 8.0 custom_residue_variants: DDR1_HUMAN_D0_PROTONATED: # keyed by 0-based residue index 35: ASH
The above configuration makes the following specifications (in order of appearance):
- Specifies a custom residue span for the target
ABL1_HUMAN_D0. This is useful in cases where a different domain span is desired from that annotated in UniProt.
- Specifies minimum and maximum domain lengths for templates. Any domain with more than 350 residues would be excluded. The same custom residue span used for target domains is also specified for the template domains.
- Certain PDB files can be skipped if they cause problems.
- A custom pH level (default: 7.0) is set, which determines how protonation states are assigned by OpenMM prior to molecular dynamics refinement.
- Custom residue variants are specified. This can be used to set specific protonation states, rather than rely purely on a defined pH level. These specified protonation states would override those determined by pH. The naming of residue variants (e.g.
ASH) follows the OpenMM conventions.
Ensembler includes a
tools submodule, which allows the user to conduct
various useful tasks which are not considered core pipeline functions. The
use-cases for many of these tools are quite specific, so they may not be
applicable to every project, and should be used with caution.
Residue renumbering according to UniProt sequence coordinates¶
$ ensembler renumber_residues --target EGFR_HUMAN_D0
The given target ID must begin with a UniProt mnemonic, e.g. “EGFR_HUMAN”.
This will output two files in the
The coordinates are simply copied from the first example found for each of
refined-explicit.pdb.gz. The residue
numbers are renumbered according to the canonical isoform sequence coordinates
in the UniProt entry.
Generating unrefined model structures¶
In some cases it may be useful to analyze model structures which have not
undergone refinement, but which have topologies equivalent to the final refined
models. These structures are not saved by the main pipeline functions by
default, but can be regenerated using
ensembler.tools.mktraj.MkTrajImplicitStart. This code simply loads each
model structure with
openmm, adds hydrogens, and writes the resultant
structure as a pdb file (
implicit-start.pdb.gz). It also combines the
structures into a trajectory (
traj-implicit-start.xtc). This function is
accessed via the Python API as follows:
from ensembler.tools.mktraj import MkTrajImplicitStart MkTrajImplicitStart(targetid='EGFR_HUMAN_D0')