The contours depict the average water oxygen (yellow) and sodium (green) distributions around the MD-averaged structure of a DNA decamer.
Before starting with the practical issues, you should be warned about the two most notable shortcomings of the suggested procedure :
1. The method will fail if the solute diffused away from the water box. The reason for this is that the periodic boundary periodicity is not being taken into account (and the reason for this is programmer's lazyness).
The easiest way to check that the solute does stay inside the solvent box is to view the trajectory with carma (See also Quick view of large trajectory (DCD) files). For example, giving something in the spirit of
carma -gl 80.0 -trace -atmid " OH2 " -atmid " CA " my.psf my.dcd
will pop-up a window showing the trajectory. Use the x, y, z keys to change the view. You should be seeing something like :
2. The method will fail if the solute does not have a well behaved average structure. In other words, if the structure of the solute fluctuates too much, you should view the results with scepticism (or, repeat the calculation using shorter trajectory segments).
There are two ways of doing the calculation (both with carma). The first requires the creation of just one index file but creates a large intermediate DCD file. The second is doing the calculation on the fly (no intermediate files created) but requires you to create two index files (one for the atoms that will be used for fitting, the other for the atoms for which the density map will be build). You should probably use the second method.
Method 1 (creates large intermediate DCD file)
- Start from a DCD file containing all atoms of your system (and its corresponding PSF). The first frame of your DCD should be the first frame from the heating phase (the heat_out.dcd). The reason is that the first frame from the heating phase is almost identical with the initial system you've prepared, and thus, the axes of inertia of the molecule are aligned with the orthogonal frame. We want the axes of inertia aligned in order to keep the size of the map to a minimum. You can artificially add the first frame from heat_out.dcd to an existing DCD using catdcd. You should probably avoid including the frames corresponding to the initial minimisation and heating-up phase (but you can include all frames after the final temperature is reached).
- The first step is to align all frames of the trajectory using the solute atoms, but keeping all ions and water oxygens. You can do this with carma by using its -fit and -index keywords :
carma -v -fit -index -atmid " OH2 " -atmid " SOD " -atmid " CLA " -atmid " CA " my.dcd my.psf
where the fit.index file (which should be present in the current directory) should contain indeces for the Cα atoms1.
- Rename the DCD file prepared by carma : mv carma.fitted.dcd waters_ions.dcd
- Now prepare a PSF file for the reduced system using a xplor script :
structure @my.psf end delete selection=( not (name OH2 or name CA or name CLA or name SOD)) end write structure output=waters_ions.psf end stop
- You should now determine the limits on x, y, z inside which a map will be calculated. If you did what suggested above (and the first frame of your DCD is the first frame from the heating phase), you can calculate these limits using pdbset :
- Find the PDB file containing the solute coordinates from psfgen and just before running autoionize (something like psfgen.pdb).
- Run pdbset : pdbset xyzin psfgen.pdb (and type END).
- Find the limits of the molecule in the output :
Orthogonal Coordinate limits in output file: Minimum Maximum Centre Range On X : -24.30 24.30 0.00 48.60 On Y : -15.05 15.05 0.00 30.11 On Z : -10.98 15.87 2.44 26.85
- You will have to add a border around it, so in this example the limits would be something like -30.0 to 30.0 on x, -20.0 to 20.0 on y and -15.0 to 20.0 on z. Write them down.
- Calculate the hydration map :
carma -v -map -30 30 -20 20 -15 20 0.50 -atmid " OH2 " waters_ions.psf waters_ions.dcdwhere the numbers after '-map' are the limits of the map and the grid size (in this case 0.50 Angstrom).
Method 2 (requires the creation of two index files)
- Start from a DCD file containing all atoms of your system (and its corresponding PSF). The first frame of your DCD should be the first frame from the heating phase (the heat_out.dcd). The reason is that the first frame from the heating phase is almost identical with the initial system you've prepared, and thus, the axes of inertia of the molecule are aligned with the orthogonal frame. We want the axes of inertia aligned in order to keep the size of the map to a minimum. You can artificially add the first frame from heat_out.dcd to an existing DCD using catdcd. You should probably avoid including the frames corresponding to the initial minimisation and heating-up phase (but you can include all frames after the final temperature is reached).
- You will need to prepare two index files. The first (fit.index) will contain the indeces for the solute atoms that will be used to remove overall rotations and translations. The second index file (fmap.index) should contain the indeces of the atoms for which the distribution map will be calculated (eg. water oxygens). Both index files should be zero-offset. An easy way to prepare these files (assuming that you start from my.dcd and my.psf which describe the whole system) is :
- Prepare a xplor script to select the atoms that will be used for fitting plus the atoms for which you want to calculate the distribution map :
structure @my.psf end delete selection=( not (name OH2 or name CA)) end write structure output=waters_CAs.psf end stop
- Examine the waters_CAs.psf file. Write down the atom number range corresponding to the two classes of atoms (for example, 1-112 for Cα atoms, 113-4000 for water oxygens).
- Use awk to prepare the index files. You need something in the spirit of :
awk '{for (i=0 ; i < 112 ; i++) print i}'
- Do not forget to edit the index files and confirm that the numbers you have inside are indeed what you've wanted.
- Determine the limits on x, y, z inside which a map will be calculated. If you did what suggested above (and the first frame of your DCD is the first frame from the heating phase), you can calculate these limits using pdbset :
- Find the PDB file containing the solute coordinates from psfgen just before running autoionize (something like psfgen.pdb).
- Run pdbset : pdbset xyzin psfgen.pdb (and type END).
- Find the limits of the molecule in the output :
Orthogonal Coordinate limits in output file: Minimum Maximum Centre Range On X : -24.30 24.30 0.00 48.60 On Y : -15.05 15.05 0.00 30.11 On Z : -10.98 15.87 2.44 26.85
- You will have to add a border around it, so in this example the limits would be something like -30.0 to 30.0 on x, -20.0 to 20.0 on y and -15.0 to 20.0 on z. Write them down.
- Calculate the hydration map :
carma -v -fit -index -fmap -30 30 -20 20 -15 20 0.50 -atmid " OH2 " -atmid " CA " my.psf my.dcd
where the numbers after '-fmap' are the limits of the map and the grid size (in this case 0.50 Angstrom).
Viewing the map
The output from carma will be a map file named carma.fitted.dcd.map. This is an electron density format used by xplor and cns. For vmd to understand the format you will have to rename it so that its suffix is .edm. So :
mv carma.fitted.dcd.map carma.edm vmd carma.edm
You will see nothing. Go to 'representations', select 'isosurface', select 'None' (instead of 'Box'), click 'create replication', and change the contouring level ('isolevel'). You should see it.
You can now quit vmd, use xplor to calculate an average solute structure (for the same frames) and then use vmd to view and study the model + map. If you have a crystal structure for your molecule, you could even compare the experimentally observed and the MD-derived hydration patterns.
One other interesting analysis you can perform is whether the distribution of waters (or ions) converged. A simple-minded way of doing this is to calculate the distributions for two non-overlapping parts of the trajectory (which will have to be a sufficiently long one). If the two maps (from the two trajectory segments) do look alike (say, have a high value of the linear correlation coefficient) then you can argue that the distribution converged.