Ensembler is a Python application and library that generates diverse protein configurational ensembles suitable for seeding highly parallel molecular simulations.

Protein models are generated by mapping a set of target sequences onto a set of template structures, followed by successive refinement and filtering stages that optimize model quality. This approach:

  • Exploits the entire variety of available genomic and structural data to provide diverse arrays of high-quality configurational models
  • Scales to allow the study of entire (super)families
  • Automates much of the time-consuming process of setting up protein systems for molecular simulation

The resulting models can be used as starting configurations for highly parallel molecular simulations, which take advantage of modern high-performance computing architectures. This is of particular benefit alongside techniques, such as Markov state models, that combine data from multiple independent trajectories into a single kinetic model.
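To illustrate how independent trajectories can be pooled into one kinetic model, here is a minimal sketch of a Markov state model estimate in plain Python. It is not Ensembler's or MSMBuilder's API: the function names `count_transitions` and `transition_matrix`, the toy state labels, and the lag time are all assumptions made for illustration, and the row normalization shown is the simplest estimator (no detailed-balance constraint).

```python
from collections import defaultdict


def count_transitions(trajectories, lag=1):
    """Count state-to-state transitions at a given lag time.

    Each trajectory is a sequence of discrete state labels. Transitions are
    counted within each trajectory only, never across the boundary between
    two trajectories, so independent runs can be pooled safely.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for i, j in zip(traj[:-lag], traj[lag:]):
            counts[i][j] += 1
    return counts


def transition_matrix(counts):
    """Row-normalize the counts into transition probabilities."""
    matrix = {}
    for i, row in counts.items():
        total = sum(row.values())
        matrix[i] = {j: c / total for j, c in row.items()}
    return matrix


# Three short, independent toy trajectories over two states (0 and 1).
trajs = [[0, 0, 1, 1], [1, 0, 0, 0], [1, 1, 1, 0]]
T = transition_matrix(count_transitions(trajs))
print(T[0][1])  # estimated probability of moving from state 0 to state 1
```

Because counts are accumulated per trajectory, many short simulations started from different models contribute to the same estimate, which is why a diverse set of starting configurations is useful.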

Ensembler can be used via the command-line application (ensembler) or scripted via the API, and can be run on a single computer or on a parallel compute cluster. It makes use of a number of external packages:

  • Modeller for comparative modeling of target sequences onto template structures
  • Rosetta loopmodel for reconstruction of missing template loops
  • OpenMM for model refinement with highly efficient, GPU-accelerated molecular dynamics simulation
  • MDTraj for trajectory manipulation and fast RMSD calculation
  • MSMBuilder for clustering

Overview of the Ensembler Pipeline

The modeling and refinement process comprises a series of successive stages:

  1. Retrieval of protein target sequences and template structures, e.g. from UniProt and the PDB
  2. (optional) Reconstruction of missing template loops, using Rosetta loopmodel
  3. Model generation - each target sequence is mapped onto each available template structure, using Modeller
  4. Culling of models based on close structural similarity
  5. Refinement with implicit solvent molecular dynamics simulation, using OpenMM
  6. (optional) Solvation of models with explicit water
  7. (optional) Refinement with explicit solvent molecular dynamics simulation, using OpenMM
  8. (optional) Packaging of models, ready for transfer or set-up on production simulation platforms such as Folding@Home
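
Stages 3 and 4 above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not Ensembler's implementation: `candidate_models` and `cull_similar` are hypothetical names, the greedy culling rule stands in for whatever clustering method is actually used, and the toy models are plain numbers with an absolute-difference distance in place of real structures and RMSD.

```python
from itertools import product


def candidate_models(targets, templates):
    """Stage 3: pair every target with every template (Cartesian product)."""
    return list(product(targets, templates))


def cull_similar(models, distance, cutoff):
    """Stage 4 (greedy stand-in): keep a model only if it lies farther than
    `cutoff` from every model already kept."""
    kept = []
    for model in models:
        if all(distance(model, other) > cutoff for other in kept):
            kept.append(model)
    return kept


# Two hypothetical targets and three hypothetical templates give six pairs.
pairs = candidate_models(["target1", "target2"], ["tmplA", "tmplB", "tmplC"])

# Toy "structures": numbers, with |a - b| standing in for RMSD.
toy_models = [0.0, 0.1, 1.0, 1.05, 2.0]
kept = cull_similar(toy_models, distance=lambda a, b: abs(a - b), cutoff=0.5)
print(len(pairs), kept)
```

The number of candidate models grows as the product of the number of targets and templates, which is why culling near-duplicates before the more expensive refinement stages is worthwhile.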