Analysis of molecular dynamics simulations

Overview
Questions:
  • Which analysis tools are available?

Objectives:
  • Learn which analysis tools are available.

  • Analyse a protein and discuss the meaning behind each analysis.

Requirements:
Time estimation: 1 hour
Level: Intermediate
Supporting Materials:
Published: Jun 3, 2019
Last modification: Nov 3, 2023
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
PURL: https://gxy.io/GTN:T00047
Rating: 5.0 (1 recent ratings, 5 all time)
Revision: 32

Molecular dynamics simulations return highly complex data. The Cartesian positions of each atom of the system (thousands or even millions) are recorded at every time step of the trajectory; this may again be thousands to millions of steps in length. Therefore, some kind of further analysis is needed to extract useful information from the data.

In this tutorial, we illustrate some of the analytical tools that can be used to investigate conformational changes, by analysing a typical short protein simulation, in this case of CBH1.

There are other analysis tools available; you are encouraged to try these out too.

Agenda

In this tutorial, we will cover:

  1. Get data
  2. Analysis with BIO3D
    1. RMSD
    2. RMSF
    3. PCA
    4. Workflow vs. individual tools
  3. Further analysis
  4. Conclusion

Get data

The data required can be generated by completing the NAMD simulation tutorial. Access it from your history. Alternatively, download the data from the Zenodo link provided.

Hands-on: Upload cellulose simulation trajectory
  1. Create a new history

    Click the new-history icon at the top of the history panel.

  2. Import the files from Zenodo, or from your history, if you completed the previous NAMD simulation tutorial:

    https://zenodo.org/record/2537734/files/cbh1test.dcd
    https://zenodo.org/record/2537734/files/cbh1test.pdb
    
    • Copy the link location
    • Click Upload Data at the top of the tool panel
    • Select Paste/Fetch Data
    • Paste the link(s) into the text field
    • Press Start
    • Close the window

  3. Rename the dcd file ‘CBH1 trajectory’ and rename the pdb file ‘CBH1 structure’

    • Click on the pencil icon for the dataset to edit its attributes
    • In the central panel, change the Name field
    • Click the Save button

Analysis with BIO3D

We’ll carry out some basic analysis by calculating RMSD, RMSF and PCA. The tools use the Bio3D package, developed by the Grant lab.

RMSD

RMSD, or root-mean-square deviation, is a standard measure of structural distance between two sets of coordinates. It measures the average distance between corresponding atoms in a chosen group (e.g. the backbone atoms of a protein). If we calculate the RMSD between two sets of atomic coordinates, for example two time points from the trajectory, the value is a measure of how much the protein conformation has changed. Wikipedia provides more information.
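The definition above can be sketched in a few lines of NumPy. This is only an illustration of the formula, not the Bio3D implementation used by the Galaxy tool: the function names `rmsd` and `rmsd_to_first_frame` are hypothetical, and the sketch assumes that the two coordinate sets contain the same atoms in the same order and have already been superimposed (no fitting is performed here).

```python
import numpy as np


def rmsd(coords_a, coords_b):
    """RMSD between two (n_atoms, 3) coordinate arrays.

    Assumption: both arrays list the same atoms in the same order and are
    already superimposed; no structural fitting is done in this sketch.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("coordinate arrays must have the same shape")
    sq_dist = np.sum((a - b) ** 2, axis=1)  # squared displacement per atom
    return float(np.sqrt(sq_dist.mean()))   # root of the mean over atoms


def rmsd_to_first_frame(trajectory):
    """RMSD of every frame of a (n_frames, n_atoms, 3) array to frame 0."""
    traj = np.asarray(trajectory, dtype=float)
    return np.array([rmsd(frame, traj[0]) for frame in traj])


# Toy example (made-up coordinates): every atom is shifted by (3, 4, 0),
# i.e. by a distance of 5, so the RMSD is 5.
reference = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
shifted = reference + np.array([3.0, 4.0, 0.0])
print(rmsd(reference, shifted))  # 5.0
```

Applying `rmsd_to_first_frame` to a trajectory gives one value per frame, which is the kind of time series shown in an RMSD plot.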

Hands-on: Calculate RMSD

  1. RMSD Analysis (Galaxy version 2.3.4) with the following parameters:
    • “dcd trajectory input”: Trajectory file
    • “pdb input”: Structure file
    • “Select domains”: Calpha (calculate RMSD only for the C-alpha atoms of the protein)

Figure 1: RMSD plot for a short CBH1 simulation

Figure 2: RMSD histogram for a short CBH1 simulation
Question

What do the features in the RMSD plot tell us?

The increase in the RMSD plot with time shows the protein steadily deviates from its original conformation.

The three peaks visible in the histogram suggest the presence of three main conformations which are accessed during the trajectory.

RMSF

The root-mean-square fluctuation (RMSF) measures the average deviation of a particle (e.g. a protein residue) over time from a reference position (typically the time-averaged position of the particle). Thus, RMSF identifies the portions of the structure that fluctuate the most (or least) around their mean position.
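As a minimal sketch of this definition, the NumPy function below computes a per-atom RMSF using the time-averaged position as the reference. The name `rmsf` and the toy coordinates are hypothetical, the frames are assumed to be superimposed already, and this is not the code run by the Galaxy tool.

```python
import numpy as np


def rmsf(trajectory):
    """Per-atom RMSF for a (n_frames, n_atoms, 3) coordinate array.

    Assumption: frames are already superimposed; the reference position
    of each atom is its time-averaged position.
    """
    traj = np.asarray(trajectory, dtype=float)
    mean_pos = traj.mean(axis=0)                      # (n_atoms, 3)
    sq_dev = np.sum((traj - mean_pos) ** 2, axis=2)   # (n_frames, n_atoms)
    return np.sqrt(sq_dev.mean(axis=0))               # average over frames


# Toy example (made-up coordinates): atom 0 never moves, atom 1 alternates
# between x = -1 and x = +1 around its mean position at the origin.
toy = np.array([
    [[0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
])
print(rmsf(toy))  # atom 0 has RMSF 0, atom 1 has RMSF 1
```

Whereas RMSD gives one value per frame, RMSF gives one value per atom (or residue), which is why the RMSF plot has residue position rather than time on its horizontal axis.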

Hands-on: Calculate RMSF
  1. RMSF Analysis (Galaxy version 2.3.4) with the following parameters:
    • “dcd trajectory input”: Trajectory file
    • “pdb input”: Structure file
    • “Select domains”: Calpha (calculate RMSF only for the C-alpha atoms of the protein)

Figure 3: RMSF plot for a short CBH1 simulation
Question

What can we learn from the features in the RMSF plot?

Residues with higher RMSF values most likely belong to loop regions, which have more conformational flexibility and a less well-defined structure.

This allows the simulation to be linked with experimental spectroscopic techniques which detect the secondary structure of a protein.

PCA

Principal component analysis (PCA) converts a set of correlated observations (the movements of all atoms in the protein) to a set of principal components which are linearly uncorrelated. Mathematically, it is a transformation of the data to a new coordinate system, in which the first coordinate captures the greatest variance, the second coordinate captures the second greatest variance, and so on.

You can read more about PCA on Wikipedia. In a nutshell, PCA takes a complex dataset with many variables and tries to distill the variables down to a few ‘principal components’ which still preserve most of the differences between the data.
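The transformation described above can be sketched with NumPy as follows. This is an illustrative assumption of how such a calculation may look, not the Bio3D code behind the Galaxy tool: the function name `pca` and its `use_svd` flag are hypothetical, the frames are assumed to be superimposed, and the example data are random numbers rather than a real trajectory. The flag switches between an eigenvalue decomposition of the covariance matrix and a singular value decomposition of the centred data, which give the same variances.

```python
import numpy as np


def pca(trajectory, use_svd=False):
    """PCA of a (n_frames, n_atoms, 3) coordinate array.

    Returns (variances, components, scores), sorted by decreasing variance.
    Assumption: frames are already superimposed.
    """
    traj = np.asarray(trajectory, dtype=float)
    n_frames = traj.shape[0]
    # One row per frame, one column per Cartesian coordinate.
    x = traj.reshape(n_frames, -1)
    x = x - x.mean(axis=0)  # centre each coordinate on its time average

    if use_svd:
        # Singular value decomposition of the centred data matrix.
        _, s, vt = np.linalg.svd(x, full_matrices=False)
        variances = s ** 2 / (n_frames - 1)
        components = vt
    else:
        # Eigenvalue decomposition of the covariance matrix.
        cov = np.cov(x, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        order = np.argsort(eigvals)[::-1]  # largest variance first
        variances = eigvals[order]
        components = eigvecs[:, order].T

    scores = x @ components.T  # projection of each frame onto each PC
    return variances, components, scores


# Made-up data: 200 frames of 2 atoms, with one coordinate given a much
# larger spread so that it dominates the first principal component.
rng = np.random.default_rng(0)
demo = rng.normal(size=(200, 2, 3))
demo[:, 0, 0] *= 5.0
variances, components, scores = pca(demo)
print(variances / variances.sum())  # fraction of variance per component
```

Plotting the columns of `scores` against each other (for example PC1 against PC2) gives the kind of scatter plot in which groups of similar conformations appear as clusters.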

In summary:

  • The PCA tool calculates a PCA to determine the relationship between statistically meaningful conformations (major global motions) sampled during the trajectory. The tool returns several images of the PCA and the raw data in tab-separated format.
  • The PCA visualization tool carries out PCA and returns a trajectory of the selected principal component. This trajectory is useful for visualising and further investigating the modes and changes that occur within the selected principal component.
Hands-on: Calculate PCA
  1. PCA (Galaxy version 2.3.4) with the following parameters:
    • “dcd trajectory input”: Trajectory file
    • “pdb input”: Structure file
    • “Use singular value decomposition (SVD) instead of default eigenvalue decomposition?”: No
    • “Select domains”: Calpha
  2. PCA visualization (Galaxy version 2.3.4) with the following parameters:
    • “dcd trajectory input”: Trajectory file
    • “pdb input”: Structure file
    • “Use singular value decomposition (SVD) instead of default eigenvalue decomposition?”: No
    • “Select domains”: Calpha
    • “Principal component id”: 1

PCA visualisation: This tool can generate small trajectories of the first three principal components. The resulting .pdb or .nc files can be viewed using visualization software such as VMD.


Figure 4: PCA plot for a short CBH1 simulation
Question

What do the features in the PCA plot tell us? Do the principal coordinates have a meaning?

Here, PCA shows the statistically meaningful conformations in the CBH1 trajectory. The principal motions within the trajectory and the vital motions needed for conformational changes can be identified. Two distinct groupings along the PC1 plane, indicating a non-periodic conformational change, are identified. The groupings along the PC2 and PC3 planes do not completely cluster separately, implying that these global motions are periodic. The PC1 is linked to an active site motion that limits the motion to a key glycosidic bond.

Workflow vs. individual tools

You can use the tools one by one as described above, or alternatively combine them into a single analysis using the workflow provided.


Figure 5: A simple analysis workflow
Hands-on: Upload a workflow
  1. Click on ‘Workflow’ in the toolbar at the top of the main Galaxy page. In the upper right corner of the central pane, click the ‘Upload or import workflow’ icon.

  2. Enter the ‘Archived workflow URL’ and click ‘Import workflow’.

    https://raw.githubusercontent.com/galaxyproject/training-material/master/topics/computational-chemistry/tutorials/analysis-md-simulations/workflows/main_workflow.ga
    

Further analysis

Further analyses are available; try out the MDAnalysis workflow, which includes a Ramachandran plot and various timeseries.


Figure 6: MD analysis workflow

Conclusion