# Setting up molecular systems

### Overview

Questions:
• How to get started modelling a protein and a ligand?

Objectives:
• learn about the Protein Data Bank

• learn how to set up a model protein and ligand system (with CHARMM-GUI)

• learn how to upload the system to Galaxy

Time estimation: 2 hours
Level: Intermediate
Last modification: Mar 12, 2021

### Audience

This tutorial is intended for those who are new to the computational chemistry tools in Galaxy.

# Introduction

In this tutorial, we’ll cover the basics of molecular modelling by setting up a protein in complex with a ligand and uploading the structure to Galaxy. The tutorial makes use of CHARMM-GUI. Please note that the follow-up to this tutorial requires the NAMD Galaxy tools, which can be accessed using the Docker container but are currently not available on any public Galaxy server.

### Agenda

In this tutorial, we will cover:

1. Cellulase and cellulose
2. Modelling with CHARMM-GUI

# Cellulase and cellulose

To start we’ll look at the PDB and find the entry for a fungal enzyme that cleaves cellulose. The enzyme is 7CEL, a hydrolase as seen in the figure.

In this section we’ll access the PDB, download the correct structure, import it into Galaxy and view it there.

### Background: What is the PDB (Protein Data Bank) and format?

The Protein Data Bank (PDB) format contains atomic coordinates of biomolecules and provides a standard representation for macromolecular structure data derived from X-ray diffraction and NMR studies. Each structure is stored under a four-character alphanumeric accession code. For example, the PDB file we will use is assigned the code 7CEL.
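To make the format more concrete, here is a minimal Python sketch that reads the coordinate records of a PDB file. It assumes the standard fixed-column layout of ATOM and HETATM lines. The names `Atom`, `parse_atom_line` and `iter_atoms` are illustrative and not part of any tool used in this tutorial. Reading the residue name from four columns rather than three is a further assumption, made so that four-letter CHARMM-style names such as BGLC are not truncated.

```python
from typing import Iterable, Iterator, NamedTuple


class Atom(NamedTuple):
    record: str    # "ATOM" or "HETATM"
    name: str      # atom name, e.g. "CA"
    res_name: str  # residue name, e.g. "GLU"
    chain: str     # chain identifier (may be empty)
    res_seq: int   # residue sequence number
    x: float       # coordinates in angstroms
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Slice one ATOM/HETATM line using the fixed PDB columns."""
    return Atom(
        record=line[0:6].strip(),
        name=line[12:16].strip(),
        # Standard PDB uses columns 18-20; column 21 is normally blank.
        # Including it is an assumption to tolerate 4-letter names.
        res_name=line[17:21].strip(),
        chain=line[21].strip(),
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


def iter_atoms(lines: Iterable[str]) -> Iterator[Atom]:
    """Yield an Atom for every coordinate record, skipping all other lines."""
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            yield parse_atom_line(line)
```

With a downloaded file, something like `with open("7cel.pdb") as fh: atoms = list(iter_atoms(fh))` would collect every atom; the file name here is only a placeholder.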

### Background: Why choose a cellulase?

Using enzymes to break down abundant cellulose into disaccharide units (cellobiose) is a method to optimise the biofuel process (Barnett et al. 2011).

## Get data

The 7CEL PDB entry does not include a complete 8-unit substrate, so some modelling is required. The correctly modelled substrate is provided for this tutorial.

### More details about the modelling done

• VMD (visualisation software) was used for atomic placement and CHARMM was used for energy minimisation.
• The PDB structure contains a mutation at position 217 (glutamate to glutamine). Our structure reverses this.
• The ligand was modelled separately and inserted into the binding site.

1. Create a new history for this tutorial.

### Tip: Creating a new history

Click the new-history icon at the top of the history panel.

If the new-history icon is missing:

1. Click on the gear icon (History options) at the top of the history panel
2. Select the option Create New from the menu

2. Import the files from the Zenodo link provided.

https://zenodo.org/record/2600690

• Open the Galaxy Upload Manager (the upload icon at the top-right of the tool panel)

• Select Paste/Fetch Data
• Paste the link into the text field

• Press Start

• Close the window

• By default, Galaxy uses the URL as the name, so rename the files with a more useful name.

### Tip: Importing data from a data library

As an alternative to uploading the data from a URL or your computer, the files may also have been made available from a shared data library:

• Go into Shared data (top panel) then Data libraries
• Navigate to the correct folder as indicated by your instructor
• Select the desired files
• Click on the To History button near the top and select as Datasets from the dropdown menu
• In the pop-up window, select the history you want to import the files to (or create a new one)
• Click on Import

3. Rename the datasets.
4. Check that the datatype is correct. The file should have the PDB datatype.

### Tip: Changing the datatype

• Click on the pencil icon for the dataset to edit its attributes
• In the central panel, click on the Datatypes tab at the top
• Select the pdb datatype
• Click the Change datatype button

# Modelling with CHARMM-GUI

It is convenient to set up the molecular system outside Galaxy using a tool such as CHARMM-GUI (Jo et al. 2016). Alternative methods are possible; see the GROMACS tutorial for an example.

### Tip: Viewing figures

• Some of the figures are screenshots and it may be difficult to make out details
• Right-click on the image and choose ‘Open image in new tab’ to view
• Zoom in and out as needed to see the content

Go to the correct section depending on which MD engine you will be using.

## CHARMM

### Upload the PDB to CHARMM-GUI

Navigate to CHARMM-GUI and open the Input Generator. Choose the PDB Reader tool and upload the cellulase PDB file. Press ‘Next Step: Select Model/Chain’ in the bottom right corner.

### Hands-on: Upload the PDB to CHARMM-GUI

1. Retrieve the modelled PDB structure from Zenodo.
2. Upload the PDB and choose CHARMM format.

### Hands-on: Generate PDB file

Two model chains are presented for selection: the protein (PROA) and the hetero residue, which is the ligand or glycan in this case (HETA). Select both, and press ‘Next Step: Generate PDB’ in the bottom right corner.

### Hands-on: Make necessary modifications

Rename the hetero chain to BGLC and add ten disulfide bonds to the protein, as shown in the figure. Then press ‘Next Step: Manipulate PDB’ in the bottom right corner.
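If the figure is hard to read, candidate disulfide pairs can also be worked out from the coordinates. The sketch below is one possible approach and not something CHARMM-GUI provides: it collects the SG atoms of CYS residues from a PDB file and reports pairs that lie within a distance cutoff. The 2.5 angstrom cutoff is an assumption (a typical S–S bond is about 2.05 angstroms long), and the function names are illustrative. Check the resulting list against the bonds CHARMM-GUI proposes before accepting it.

```python
import math
from itertools import combinations


def cysteine_sg_atoms(lines):
    """Map (chain, residue number) -> (x, y, z) for the SG atom of each CYS."""
    sg = {}
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() == "SG" and line[17:21].strip() == "CYS":
            key = (line[21].strip(), int(line[22:26]))
            sg[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return sg


def candidate_disulfides(lines, cutoff=2.5):
    """Return (residue_a, residue_b, distance) for SG pairs within `cutoff`.

    The cutoff (in angstroms) is an assumed value, not taken from the tutorial.
    """
    sg = cysteine_sg_atoms(lines)
    pairs = []
    for (res_a, pos_a), (res_b, pos_b) in combinations(sorted(sg.items()), 2):
        distance = math.dist(pos_a, pos_b)
        if distance <= cutoff:
            pairs.append((res_a, res_b, round(distance, 2)))
    return pairs
```

Calling `candidate_disulfides(open("model.pdb"))` on the modelled structure (the file name is a placeholder) would list the residue pairs to compare with the figure.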

The output is a .tgz file (a gzip-compressed tarball). Inside the archive you will find all inputs and outputs from CHARMM-GUI.

### Tip: What is a .tgz file?

This is a compressed archive containing all the output files created by CHARMM-GUI. To access them, the .tgz file needs to be decompressed. There should be a tool available on your operating system for this. If you prefer the command line on Linux or Mac, tar works fine: tar -zxvf example.tgz. On Windows, use 7zip, or download Git for Windows and use Git Bash.
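The archive can also be unpacked from Python, which behaves the same on Linux, Mac and Windows. This is a minimal sketch using the standard-library `tarfile` module; the function name `extract_tgz` and the file names in the usage note are placeholders.

```python
import tarfile
from pathlib import Path


def extract_tgz(archive, dest):
    """Unpack a gzip-compressed tarball into `dest` and return member names.

    Only use this on archives from a source you trust, such as your own
    CHARMM-GUI job, because tar members can carry arbitrary paths.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return names
```

For example, `extract_tgz("example.tgz", "charmm-gui-output")` unpacks the archive into a new folder and returns the list of files it contained.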

## NAMD

### Hands-on: Upload the PDB to CHARMM-GUI

Retrieve the modelled PDB structure from Zenodo. Navigate to CHARMM-GUI and use the Input Generator, specifically the Quick MD Simulator tool. Upload the PDB file, selecting ‘CHARMM’ as the file format. Press ‘Next Step: Select Model/Chain’ in the bottom right corner.

### Hands-on: Generate PDB file

Two model chains are presented for selection: the protein (PROA) and the hetero residue, which is the ligand or glycan in this case (HETA). Select both, and press ‘Next Step: Generate PDB’ in the bottom right corner.

### Hands-on: Make necessary modifications

Rename the hetero chain to BGLC and add disulfide bonds.

### Hands-on: Solvate the protein

Set up a waterbox. Use a size of 10 angstroms and choose a cubic box (‘rectangular’ option).

### Question

Why is 10 angstroms a fair choice for the buffer? Why choose 0.15 M NaCl?

### Solution

Under periodic boundary conditions, we need to ensure the protein can never interact with its periodic image, otherwise artefacts are introduced. Allowing 10 angstroms between the protein and the box edge ensures that the protein and its nearest image will always be at least 20 angstroms apart, which is sufficient.

Some of the residues on the protein surface are charged and counter-ions need to be present nearby to neutralise them. Failure to explicitly model salt ions may destabilise the protein.
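The arithmetic behind both answers can be sketched in a few lines of Python. The functions below are illustrative only and are not what CHARMM-GUI runs internally. The box edge is taken as the largest extent of the solute plus the buffer on each side. The ion estimate simply multiplies concentration by the whole box volume, which ignores the volume taken up by the solute and any extra counter-ions needed to neutralise it.

```python
AVOGADRO = 6.02214076e23            # particles per mole
LITRES_PER_CUBIC_ANGSTROM = 1e-27   # 1 A^3 = 1e-30 m^3 = 1e-27 L


def largest_extent(coords):
    """Largest span of the coordinates along x, y or z, in angstroms."""
    xs, ys, zs = zip(*coords)
    return max(max(xs) - min(xs), max(ys) - min(ys), max(zs) - min(zs))


def cubic_box_edge(coords, buffer=10.0):
    """Edge of a cubic box leaving `buffer` angstroms on every side."""
    return largest_extent(coords) + 2 * buffer


def min_image_gap(coords, edge):
    """Smallest axis-aligned gap between the solute and its periodic image."""
    return edge - largest_extent(coords)


def salt_pairs(edge, molar=0.15):
    """Rough number of NaCl ion pairs for a cubic box at `molar` mol/L."""
    litres = edge ** 3 * LITRES_PER_CUBIC_ANGSTROM
    return round(molar * litres * AVOGADRO)
```

For a solute whose longest dimension is 60 angstroms, `cubic_box_edge` gives an 80 angstrom box, `min_image_gap` gives the 20 angstrom separation discussed above, and `salt_pairs(80.0)` gives roughly 46 ion pairs.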

### Hands-on: Generate the FFT

Particle Mesh Ewald (PME) summation is the method used to calculate long-range interactions in this system. To reduce the computation time, a Fast Fourier Transform (FFT) is used. A detailed discussion of FFT will not be presented here; there are many articles on the subject. For background, try Wikipedia or the review “Ewald summation techniques in perspective: a survey”.
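To give a feel for what the PME grid specification contains, the sketch below picks a grid size for one box dimension. It assumes a common rule of thumb: about one grid point per angstrom, rounded up to a number whose only prime factors are 2, 3 and 5, since FFTs are most efficient for such sizes. The exact rule CHARMM-GUI applies may differ, and the function names are illustrative.

```python
import math


def is_fft_friendly(n, primes=(2, 3, 5)):
    """True if n is a positive integer with no prime factors outside `primes`."""
    if n < 1:
        return False
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1


def pme_grid_size(box_length, max_spacing=1.0):
    """Smallest FFT-friendly grid size with spacing <= `max_spacing` angstroms.

    The default spacing of 1.0 is an assumed rule of thumb.
    """
    n = max(1, math.ceil(box_length / max_spacing))
    while not is_fft_friendly(n):
        n += 1
    return n
```

An 80 angstrom box edge maps directly to a grid of 80, since 80 = 2^4 x 5, whereas an edge of 83.5 angstroms is rounded up to 90.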

### Hands-on: Solvate the protein

The output is a .tgz file (a gzip-compressed tarball). Inside the archive you will find all inputs and outputs from CHARMM-GUI.

### Tip: What is a .tgz file?

This is a compressed archive and needs to be decompressed using a suitable tool. On Linux or Mac, tar works fine: tar -zxvf example.tgz. On Windows, use 7zip, or download Git for Windows and use Git Bash.

### Hands-on: Upload files to Galaxy

Upload the following files to your BRIDGE instance and ensure the correct datatype is selected:

• step3_pbcsetup.xplor.ext.psf -> xplor psf input (psf format)
• step3_pbcsetup.pdb -> pdb input (pdb format)
• Checkfft.str -> PME grid specs (txt format)
• step2.1_waterbox.prm -> waterbox prm input (txt format)
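Before uploading, it can help to confirm that all four files are present in the extracted archive. The following sketch searches a folder for them and pairs each with the datatype listed above. The function name and the case-insensitive matching are assumptions made for convenience, not part of Galaxy or CHARMM-GUI.

```python
from pathlib import Path

# File name -> Galaxy datatype, as listed in this tutorial.
EXPECTED = {
    "step3_pbcsetup.xplor.ext.psf": "psf",
    "step3_pbcsetup.pdb": "pdb",
    "Checkfft.str": "txt",
    "step2.1_waterbox.prm": "txt",
}


def find_upload_files(root):
    """Locate each expected file under `root` (searched recursively).

    Returns {name: (path or None, datatype)}. Names are compared without
    regard to case, as an assumption to tolerate capitalisation differences.
    """
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    found = {}
    for name, datatype in EXPECTED.items():
        matches = sorted(p for p in files if p.name.lower() == name.lower())
        found[name] = (matches[0] if matches else None, datatype)
    return found
```

Any entry whose path is `None` is missing from the folder and should be located before continuing.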

You are now ready to run the NAMD workflow, which is discussed in another tutorial.

# Conclusion

Well done! You have started modelling a cellulase protein and uploaded it into Galaxy. The next step is running molecular dynamics simulations, which is covered in the follow-up tutorial.

### Key points

• The PDB is a key resource for finding protein structures.

• Using CHARMM-GUI is one way to prepare a protein and ligand system.

• To get data into Galaxy you can upload a file from your computer or paste in a web address.

# References

1. Barnett, C. B., K. A. Wilkinson, and K. J. Naidoo, 2011 Molecular Details from Computational Reaction Dynamics for the Cellobiohydrolase I Glycosylation Reaction. Journal of the American Chemical Society 133: 19474–19482. 10.1021/ja206842j
2. Jo, S., X. Cheng, J. Lee, S. Kim, S.-J. Park et al., 2016 CHARMM-GUI 10 years for biomolecular modeling and simulation. Journal of Computational Chemistry 38: 1114–1124. 10.1002/jcc.24660

# Citing this Tutorial

1. Christopher Barnett, Simon Bray, 2021 Setting up molecular systems (Galaxy Training Materials). https://training.galaxyproject.org/training-material/topics/computational-chemistry/tutorials/setting-up-molecular-systems/tutorial.html Online; accessed TODAY
2. Batut et al., 2018 Community-Driven Data Analysis Training for Biology Cell Systems 10.1016/j.cels.2018.05.012

### BibTeX

@misc{computational-chemistry-setting-up-molecular-systems,
author = "Christopher Barnett and Simon Bray",
title = "Setting up molecular systems (Galaxy Training Materials)",
year = "2021",
month = "03",
day = "12",
url = "\url{https://training.galaxyproject.org/training-material/topics/computational-chemistry/tutorials/setting-up-molecular-systems/tutorial.html}",
note = "[Online; accessed TODAY]"
}
@article{Batut_2018,
doi = {10.1016/j.cels.2018.05.012},
url = {https://doi.org/10.1016%2Fj.cels.2018.05.012},
year = 2018,
month = {jun},
publisher = {Elsevier {BV}},
volume = {6},
number = {6},
pages = {752--758.e1},
author = {B{\'{e}}r{\'{e}}nice Batut and Saskia Hiltemann and Andrea Bagnacani and Dannon Baker and Vivek Bhardwaj and Clemens Blank and Anthony Bretaudeau and Loraine Brillet-Gu{\'{e}}guen and Martin {\v{C}}ech and John Chilton and Dave Clements and Olivia Doppelt-Azeroual and Anika Erxleben and Mallory Ann Freeberg and Simon Gladman and Youri Hoogstrate and Hans-Rudolf Hotz and Torsten Houwaart and Pratik Jagtap and Delphine Larivi{\`{e}}re and Gildas Le Corguill{\'{e}} and Thomas Manke and Fabien Mareuil and Fidel Ram{\'{\i}}rez and Devon Ryan and Florian Christoph Sigloch and Nicola Soranzo and Joachim Wolff and Pavankumar Videm and Markus Wolfien and Aisanjiang Wubuli and Dilmurat Yusuf and James Taylor and Rolf Backofen and Anton Nekrutenko and Björn Grüning},
title = {Community-Driven Data Analysis Training for Biology},
journal = {Cell Systems}
}