tutorial-icon

Tutorial 1c: Analyzing the toy project

0) Download the finished toy project

If you didn’t follow all the setup and simulation run, you can download the completed toy project started in Tutorial 1a with the simulation results from here. Extract the folder to proceed and move to the project directory (pep_amber) to start with the analysis.

1) Check project status

Execute the following command inside the project folder to get a description of the replicas (independent simulations) contained and their simulation status. At this stage, we should find a True in Min, Eq and Prod tags:

> mdmix info project
 ==========================================================
 || pyMDMix User Interface ||
 ==========================================================
 || Author: Daniel Alvarez-Garcia ||
 || Version : 0.1 ||
 ==========================================================
------------------------------
PROJECT pep_amber INFO
------------------------------
SYSTEMS:
========
SYSTEM:pep
REPLICAS
========
REPLICA:ETA_1 system:pep_ETA nanos:1 restrMode:HA restrForce:0.010 Min:True Eq:True Prod:True Align:False
REPLICA:WAT_1 system:pep_WAT nanos:1 restrMode:HA restrForce:0.010 Min:True Eq:True Prod:True Align:False
------------------------------

2) Align trajectory and quality check

If all is correct, let’s align the trajectory for all the replicas in the project. The reference structure used for aligning the trajectory is a pdb file ending with _ref.pdb. In our example pep_ETA_ETA_1_ref.pdb. This file is exactly the same as pep_ref.pdb in main project folder.

The process involves the automatic detection of all residue numbers part of the protein. The backbone atoms of these residues will be used for the alignment process. If you wish to use different residues for the alignment process (e.g. align just a subdomain), read the alignment section in the documentation.

> mdmix analyze align all
==========================================================
 || pyMDMix User Interface ||
 ==========================================================
 || Author: Daniel Alvarez-Garcia ||
 || Version : 0.1 ||
 ==========================================================

Selected replica names: ['ETA_1', 'WAT_1']
INFO Running alignment for replicas [ETA_1 Replica, WAT_1 Replica]
INFO Running alignment of trajectory in replica WAT_1
INFO Running alignment of trajectory in replica ETA_1
DONE
Total execution time: 8.708s

The alignment of 1ns for each replica in this small system took just 8.7 seconds in my machine. If we check again project status, we should find a Align:True for each of the replicas.

Inside each replica folder (e.g. MD/ETA_1/), a new folder align will be written with the resulting trajectory files, input ptraj scripts, one pdb file for each job run containing the protein average, and two text files with rmsd data for each job run (one for the backbone atoms and one for all heavy atoms).If you remember, when we set up the project, we modified the defaults and each file was saving 500ps. We chose 1ns simulation and thus we will have two files for one nanosecond of simulation.

Eg. this is the content of MD/ETA_1/align folder:

md1_bb_rmsd.out
md1_ha_rmsd.out
md1.nc
md1.ptraj
md1_ptraj.log
md2_bb_rmsd.out
md2_ha_rmsd.out
md2.nc
md2.ptraj
md2_ptraj.log
prot_avg_1.pdb
prot_avg_2.pdb

3) Build density grids

Once trajectory has been aligned, we can calculate the density maps for each probe in the solvents we have simulated. We can query the solvent probe names. For instance:

> mdmix info solvents
 ==========================================================
 || pyMDMix User Interface ||
 ==========================================================
 || Author: Daniel Alvarez-Garcia ||
 || Version : 0.1 ||
 ==========================================================
--------------------------------------------------
SOLVENT DB in USE: /home/daniel/.local/lib/python2.7/site-packages/pyMDMix-0.1-py2.7.egg/pyMDMix/data/solventlib/SOLVENTS.db
Package SOLVENT DB: /home/daniel/.local/lib/python2.7/site-packages/pyMDMix-0.1-py2.7.egg/pyMDMix/data/solventlib/SOLVENTS.db
SOLVENTS Directory: /home/daniel/.local/lib/python2.7/site-packages/pyMDMix-0.1-py2.7.egg/pyMDMix/data/solventlib

Solvents:
 SOLVENT: MOH INFO: Methanol 20% mixture BOXUNIT: MOHWAT20 PROBES: MOH_WAT, MOH_OH, MOH_CT
 SOLVENT: ANT INFO: Acetonitrile 20% mixture BOXUNIT: ANTWAT20 PROBES: ANT_C, ANT_WAT, ANT_N
 SOLVENT: ION INFO: Ammonium Acetate mixture (Charged mixture) BOXUNIT: CN3COOWAT PROBES: ION_NEG, ION_WAT, ION_POS
 SOLVENT: ETA INFO: Ethanol 20% mixture BOXUNIT: ETAWAT20 PROBES: ETA_CT, ETA_WAT, ETA_OH
 SOLVENT: ISO INFO: Isopropanol 20% mixture BOXUNIT: ISOWAT20 PROBES: ISO_WAT, ISO_OH, ISO_CT
 SOLVENT: ISO5 INFO: Isopropanol 5% mixture BOXUNIT: ISOWAT5 PROBES: ISO5_WAT, ISO5_OH, ISO5_CT
 SOLVENT: WAT INFO: Water box BOXUNIT: TIP3PBOX PROBES: WAT_WAT
 SOLVENT: MAM INFO: Acetamide 20% mixture BOXUNIT: MAMWAT20 PROBES: MAM_N, MAM_WAT, MAM_O, MAM_CT
--------------------------------------------------

Ethanol mixture (ETA) has three probes: ETA_CT, ETA_WAT and ETA_OH (being the methyl carbon, the water oxygen and the ethanol hydroxyl oxygen respectively). WAT mixture, which is actually not a mixture but just a box with water, has WAT_WAT probe.

For the current example we will calculate all probes for each solvent box plus a center of mass density map for each molecule type in the mixture:

> mdmix analyze density all --com
> mdmix analyze density all --com -C4 # same using 4 cpus

With this command we select all replicas in the project, calculate all density maps for all probes plus the center of mass density maps.

We could run just the calculation for all replicas with ETA solvent like this:

> mdmix analyze density bysolvent -s ETA --com -C3 # using 3 cpus
==========================================================
 || pyMDMix User Interface ||
 ==========================================================
 || Author: Daniel Alvarez-Garcia ||
 || Version : 0.1 ||
 ==========================================================

Selected replica names: ['ETA_1']
INFO Preparing analysis...
INFO Setting up density grids calculation for replica ETA_1
INFO Calculating density for probes: ['ETA_CT', 'ETA_WAT', 'ETA_OH', 'ETA_COM', 'WAT_COM']
INFO Ready to calculate
INFO --------------------------------------------------
INFO Running ETA_1 analysis...
INFO Working on file /home/daniel/Dropbox/WORK/mdmixDocAndMore/pep_amber/MD/ETA_1/align/md1.nc
INFO Working on file /home/daniel/Dropbox/WORK/mdmixDocAndMore/pep_amber/MD/ETA_1/align/md2.nc
INFO Processing density results for replica ETA_1
INFO Writing density grid dgrids/ETA_1_ETA_CT.dx
INFO Writing density grid dgrids/ETA_1_ETA_OH.dx
INFO Writing density grid dgrids/ETA_1_ETA_COM.dx
INFO Writing density grid dgrids/ETA_1_ETA_WAT.dx
INFO Writing density grid dgrids/ETA_1_WAT_COM.dx
INFO --------------------------------------------------
DONE
Total execution time: 66.731s

Alternatively, we could choose just some of the probes (e.g. ETA_CT and ETA_OH only):

> mdmix analyze density bysolvent -s ETA -P ETA_CT ETA_OH

Inside the replica folder (pep_amber/MD/ETA_1) a new directory dgrids/ will be created with one grid with format DX for each probe.

These grids can be visualized with VMD, PyMOL and the majority of molecular visualization programs. Use pep_ref.pdb in the main project folder as the reference structure for visualizing the results (the same used for aligning the trajectory).

Interpretation of the density grids obtained with this toy project are non-sense due to the short time of simulation and the simple peptide structure simulated.

4) Convert density maps to energy grids

Last step is converting the recently calculated density maps (or grids) to energy values using inverse Boltzmann equation as described in the documentation. There are two flavors of this function:

  • Convert single replicas: Each replica will have its own energy grids for the probes calculated.
  • Merge replicas and convert: Several replicas (same as independent simulations with same solvent) for which density maps have been calculated, can be merged to obtain more converged results. The process involves the addition of the density maps before conversion. This is done by selecting more than one replica of the same solvent. In this example, we have just one replica for each solvent and thus cannot do this analysis.

Converting single replicas

Again in project folder execute the following command to convert all density grids inside replica named ETA_1 to free energy maps (see Analysis Guide for theory and more examples).

> mdmix analyse energy byname -s ETA_1
 ==========================================================
 || pyMDMix User Interface ||
 ==========================================================
 || Author: Daniel Alvarez-Garcia ||
 || Version : 0.1 ||
 ==========================================================

Selected replica names: ['ETA_1']
INFO Converting grids for replica ETA_1
INFO Probe ETA_OH...
INFO Energy grid /home/daniel/Dropbox/WORK/mdmixDocAndMore/pep_amber/MD/ETA_1/egrids/ETA_OH_DG0.dx saved. OK.
INFO Probe ETA_CT...
INFO Energy grid /home/daniel/Dropbox/WORK/mdmixDocAndMore/pep_amber/MD/ETA_1/egrids/ETA_CT_DG0.dx saved. OK.
INFO Probe WAT_COM...
INFO Energy grid /home/daniel/Dropbox/WORK/mdmixDocAndMore/pep_amber/MD/ETA_1/egrids/WAT_COM_DG.dx saved. OK.
INFO Probe ETA_WAT...
INFO Energy grid /home/daniel/Dropbox/WORK/mdmixDocAndMore/pep_amber/MD/ETA_1/egrids/ETA_WAT_DG.dx saved. OK.
INFO Probe ETA_COM...
INFO Energy grid /home/daniel/Dropbox/WORK/mdmixDocAndMore/pep_amber/MD/ETA_1/egrids/ETA_COM_DG0.dx saved. OK.
INFO Done with all probes for replicas ['ETA_1']
Total execution time: 10.280s

Inside the replica folder (MD/ETA_1) a new folder egrids will be created containing the resulting grids after energy conversion.  These grids can be visualized with any molecular visualization program (pymol, VMD, etc.). When file name ends with DG0, it means that the volume correction has been applied to obtain the results. If the file name ends with DG, no volumetric correction has been done (as is the case for water grids which do not require such standard energy correction) – see Analysis Guide for details.

5) Otaining PDB files from grids

asdf

tutorial-icon

Tutorial 1b: Simulating the toy project

1) The COMMANDS.sh file

We have our project folder pep_amber created in Tutorial 1a ready for simulation. It contains 2 folders inside MD directory: ETA_1 and WAT_1. Each one contains all files needed to run an independent simulation: system topology, coordinates, min/ eq/ and md/ folders with the configuration files for each simulation step and, last but not least, a COMMANDS.sh file.

This file contains all the commands the user should execute to run all the simulation steps. E.g. entering in ETA_1 folder, COMMANDS.sh file contains:

cd min
sander -O -i min.in -o min.out -p ../pep_ETA_ETA_1.prmtop -c ../pep_ETA_ETA_1.prmcrd -r min.rst -ref ../pep_ETA_ETA_1.prmcrd
cd ../eq
pmemd.cuda -O -i eq1.in -o eq1.out -p ../pep_ETA_ETA_1.prmtop -c ../min/min.rst -r eq1.rst -x eq1.nc -ref ../pep_ETA_ETA_1.prmcrd
sh ../check_com.sh eq1.rst
pmemd.cuda -O -i eq2.in -o eq2.out -p ../pep_ETA_ETA_1.prmtop -c eq1.rst -r eq2.rst -x eq2.nc -ref ../pep_ETA_ETA_1.prmcrd
sh ../check_com.sh eq2.rst
pmemd.cuda -O -i eq3.in -o eq3.out -p ../pep_ETA_ETA_1.prmtop -c eq2.rst -r eq3.rst -x eq3.nc -ref ../pep_ETA_ETA_1.prmcrd
sh ../check_com.sh eq3.rst
pmemd.cuda -O -i eq4.in -o eq4.out -p ../pep_ETA_ETA_1.prmtop -c eq3.rst -r eq4.rst -x eq4.nc -ref ../pep_ETA_ETA_1.prmcrd
sh ../check_com.sh eq4.rst
pmemd.cuda -O -i eq5.in -o eq5.out -p ../pep_ETA_ETA_1.prmtop -c eq4.rst -r eq5.rst -x eq5.nc -ref ../pep_ETA_ETA_1.prmcrd
sh ../check_com.sh eq5.rst
cd ../md
pmemd.cuda -O -i md.in -o md1.out -p ../pep_ETA_ETA_1.prmtop -c ../eq/eq5.rst -r md1.rst -x md1.nc -ref ../pep_ETA_ETA_1.prmcrd
sh ../check_com.sh md1.rst
pmemd.cuda -O -i md.in -o md2.out -p ../pep_ETA_ETA_1.prmtop -c md1.rst -r md2.rst -x md2.nc -ref ../pep_ETA_ETA_1.prmcrd
sh ../check_com.sh md2.rst

Let me explain the file:

  •  Minimization, Equilibration and Production are stored in separate folders so to run each of the commands, the shell script moves to the appropriate directory.
  • By default, minimization step command is constructed to run with sander and all other steps with pmem.cuda. You can change this default (see bottom of this post). TODO!!!
  • Output file names should always be respected. During analysis process, the program will try to locate these files in their correct folders.
  • In this case, we are running a simulation with harmonic restraints over all non-hydrogen atoms of the peptide. After each simulation step is finished, an extra script called check_com.sh is called to run a re-centering, alignment and imaging process of the last step of the simulation (the restart file that will be used by next command). If this step is not correctly done, the simulation might crash or produce artifacts. Internally, this script calls ptraj progam.

2) Local simulation submission

You should directly execute this file if you wish to submit all the simulation process at the local machine:

> sh COMMANDS.sh >& COMMANDS.log &

This will submit ETA_1 simulation to run. You should repeat this process in all the folders inside MD if you want to submit all of them at the same time (ETA_1 and WAT_1 in this case).

3) Preparing a submission to a computing cluster through a queueing system

Usually, you will submit the simulation process in a computing cluster where all jobs are controlled through a GRID system (e.g. Sun Grid Engine, SLURM, etc.) or, colloquially, a queueing system.

In this case, you will need one queue input file for each independent job to be executed. pyMDMix incorporates a tool to help you in the process of setting up the runs. The program will take a template file corresponding to the queueing system you use and fill the blanks in with the corresponding command for each job execution. Read more in Queue Inputs documentation for setting up your own queue template file.

pyMDMix comes with a simple template file for SGE system. We will use it for this example.

> mdmix queue list
 ==========================================================
 || pyMDMix User Interface ||
 ==========================================================
 || Author: Daniel Alvarez-Garcia ||
 || Version : 0.1 ||
 ==========================================================
Installed queue system templates: SGE
Total execution time: 0.019s

The template should appear in the list. If not, check that the file name and location is correct (read documentation).

> mdmix queue write -n SGE
 ==========================================================
 || pyMDMix User Interface ||
 ==========================================================
 || Author: Daniel Alvarez-Garcia ||
 || Version : 0.1 ||
 ==========================================================

INFO Writing Queue SGE input files for replica ETA_1
INFO Writing Queue SGE input files for replica WAT_1
Total execution time: 0.212s

This command should be executed inside the project folder (pep_amber/). It has created one queue file for each simulation job. E.g. check in pep_amber/MD/ETA_1/min folder. You will have now a file min.in and a file min.q. min.q is the queue input file.

All queue files are connected and you will need to submit the minimization and the rest will automatically follow until the last step of the production stage. E.g.:

> cd MD/ETA_1/min
> qsub min.q

Last lines of min.q are

cd ../eq
qsub eq.q

Thus when minimization is finished, the queue will change de directory to ../eq and submit equilibration process automatically.

Remember you should submit min.q in every independent simulation you are running. You might use the following bash loop from within the project folder (pep_amber):

for d in MD/*; do cd $d/min; qsub min.q; cd -;done

4) Finish MD and move files

If you were running the simulation in a cluster and analyzing in a different location, when all simulations are finished, make sure you copy the resulting files to the appropriate directories. Please keep the same folder structure and naming for the analysis process to work correctly.

Jump to Tutorial 1c to start the analysis.

tutorial-icon

Tutorial 1a: Preparing a toy project

To follow this tutorial a basic knowledge on linux is needed (create, move, copy, edit files and directories).

1) Prepare a PDB file with your system to simulate

Download the following PDB file with a toy 8 aminoacid peptide PDB (pep.pdb). The molecule was built using MOE software, protonated and saved following AMBER naming convention. For a correct preparation of a protein for simualtion, if you are not familiar with the process, please read AMBER tutorials.

2) Create an Amber Object File containing the system to simulate

Open a terminal, create a working directory and move the recently downloaded pdb file there.

> mkdir testpymdmix
> cp $DOWNLOAD_DIR/pep.pdb testpymdmix/ # replace $DOWNLOAD_DIR with the path were pep.pdb was saved
> cd testpymdmix

Once inside, we will create an AMBER object file from the pdb file downloaded. To do so, we will open tLeap (make sure you have installed correctly Amber or AmberTools) using AmberFF99SB:

> tleap -f leaprc.ff99SB # or $AMBERHOME/exe/tleap -f leaprc.ff99SB if the program is not found in your PATH environmental variable

Some messages will be printed to screen and finally a new prompt sign (>) will appear. This time we are inside tleap console and ready to load our pdb file:

> peptide = loadpdb pep.pdb
Loading PDB file: ./pep.pdb
total atoms in file: 122
> check peptide
Checking 'pep'....
Checking parameters for unit 'pep'.
Checking for bond parameters.
Checking for angle parameters.
Unit is OK.
> saveOff peptide pep.off
> quit

We have here 3 commands. The first one (loadpdb) will load the molecule contained in pep.pdb file and create a tleap unit with the name peptide. This unit name is arbitrary and up to your choice. Second command will check that the unit is correctly parameterized (no error in this case). Final command will create a file named pep.off where peptide unit will be stored. Finally we exit tleap and we are back to our working directory and with a normal shell prompt.

3) Create a pyMDMix configuration file

Write a pyMDMix project template file using the following command:

> mdmix tools projecttemplate -f pep_test_project.cfg

This command will write a file named pep_test_project.cfg with some default fields to be filled in. Edit the file with your favorite editor and make the following modifications: Under [SYSTEM]:

NAME = pep
OFF = pep.off
#PDB =     # Comment out
UNAME = peptide # Un-comment and give the unit name contained in pep.off

Under [MDSETTINGS]:

SOLVENTS = WAT, ETA
NANOS = 1  #Un-comment and write a 1 for simulating only 1 ns
RESTR = HA # Un-comment to apply restraints on all non-hydrogen atoms of the peptide
FORCE = 0.01 # Aplly restraints with 0.01 kcal/mol.A^2 force

# Add the following lines
npt_eq_steps= 250000 # 500ps npt equilibration
nvt_prod_steps= 250000 # 500ps per nvt production file
trajfrequency = 2000  # write trajectory snapshots every 4ps (2000 steps x 2fs timestep)

Other options should remain commented (with # symbol in front of the line). In the first section we have defined a system with name pep that will use a unit named peptide which is saved inside pep.off file (from previous step). In the second section  we tell how to simulate this system: will use two solvent mixtures (identified by ETA and WAT names), each solvent will be simulated for 1ns applying positional restraints over all non-hydrogen atoms (HA) of the peptide with a force constant of 0.01 kcal/mol.A^2. Residues part of the peptide are automatically identified, you can specify which residues to restraint at RESTRMASK field in this same configuration file changing auto to the desired residue number range.

The last 3 lines modify the number of steps to run in the equilibration stage (from the default 500000 to 250000) and the number of steps each production job will simulate (250000, 500ps each job run instead of 1ns default). And will write snapshots of the trajectory every 4ps instead of 2ps (default: 1000, here modified to 2000). All these changes are done to decrease the weight of the simulation for the toy example and to speed up the analysis.

Once the file is saved, we are ready to create the project.

EXTRA: To query what are the solvent mixture identifiers, you can issue the following command:

> mdmix info solvents

5) Create the project

Using the configuration file, we will create a new project named pep_amber. With this command, all files needed for simulation will be created.

> mdmix create project -n pep_amber -f pep_test_project.cfg
==========================================================
 || pyMDMix User Interface ||
 ==========================================================
 || Author: Daniel Alvarez-Garcia ||
 || Version : 0.1 ||
 ==========================================================

Creating PROJECT pep_amber from config file: pep_test_project.cfg
Parsing System information...
INFO Using Forcefield or FRCMOD file: /data2/amber12/dat/leap/cmd/leaprc.ff99SB
INFO Using Forcefield or FRCMOD file: /data2/amber12/dat/leap/cmd/leaprc.gaff
Parsing md settings for replica creation...
Creating project pep_amber
INFO Creating project folder
INFO Writing system reference file: pep_ref.pdb
INFO Creating replicas for system pep...
INFO Solvating pep with solvent mixture ETA
INFO Solvating pep with solvent mixture WAT
INFO Creating folder structure for replica ETA_1
INFO Writing AMBER simulation input files for replica ETA_1 ...
INFO Restrain mask: :1-8 & !@H=
INFO Creating folder structure for replica WAT_1
INFO Writing AMBER simulation input files for replica WAT_1 ...
INFO Restrain mask: :1-8 & !@H=
INFO DONE
DONE
Total execution time: 12.531s

By default leaprc.ff99SB and leaprc.gaff are loaded. If you wish to use a different forcefield modify EXTRAFF field in the project configuration file. We will have a new folder under the current working directory with name pep_amber. From now on this folder is the Project folder and any mdmix command will be executed inside this folder. So let’s move there:

> cd pep_amber

And retrieve some information about the recently created project:

> mdmix info project
==========================================================
 || pyMDMix User Interface ||
 ==========================================================
 || Author: Daniel Alvarez-Garcia ||
 || Version : 0.1 ||
 ==========================================================
------------------------------
PROJECT pep_amber INFO
------------------------------
SYSTEMS:
========
SYSTEM:pep
REPLICAS
========
REPLICA:ETA_1 system:pep_ETA nanos:1 restrMode:HA restrForce:0.010 Min:False Eq:False Prod:False Align:False
REPLICA:WAT_1 system:pep_WAT nanos:1 restrMode:HA restrForce:0.010 Min:False Eq:False Prod:False Align:False

------------------------------
Total execution time: 0.040s

This general description will tell us what systems are included in the project, which replicas have been created for what systems and under which conditions these replicas will be simulated (restraining scheme and length in nanoseconds). Moreover for each of the replicas, a True False flag will indicate the current status of simulation (Min: True – minimization finished -; Eq:True – equilibration process finished -; Prod:True – Production stage finished -; Align:True – trajectory is aligned-). 6) It is time to run the simulation. Jump to Tutorial 1b for instructions on how to submit the simulation and prepare job control files for your own cluster (if needed). Jump directly to Tutorial 1c to download the results and directly proceed with the analysis.

6) Simulation time

It is time to grab all the files and run the simulation. For a general explanation on how to submit the simulation process, go to Tutorial 1b. Skip the simulation step and head to analysis section in Tutorial 1c for quick testing and learning pyMDMix operations (there you can download simulation results to proceed with their analysis).