NOTE WELL : For these analyses to be valid, you have to be able to show that your simulation sufficiently sampled the molecule's configurational space. See Eigenvectors and eigenvalues, sufficient sampling issues.
Having a physical analogue in your mind for what PCA is doing is not easy. A nice simplified but relevant example can be found on Wikipedia [1]
Determination of collective motions
If you plot the eigenvalues (xmgr carma.eigenvalues.dat) you will see that the top ~10% of the eigenvectors account for the majority of the observed motion. You can visualise the motion due to each selected eigenvector with carma and vmd. Here is how :
- First determine the amplitudes of the motion along the selected eigenvector. For the first eigenvector it would be something like :
carma -v -cov -eigen -proj 1 10 1 CA.dcd CA.psf
echo "Amplitude limits are " `sort -r -n -k 2 carma.fluctuations.dat | tail -1 | awk '{print $2}'` `sort -n -k 2 carma.fluctuations.dat | tail -1 | awk '{print $2}'`
The file carma.fluctuations.dat is organised as follows : successive lines correspond to successive frames from the DCD file. Successive columns correspond to successive eigenvectors (in decreasing order of the corresponding eigenvalues). The numbers are the amplitudes with which the given eigenvectors contribute to the given frames (i.e. structures). Large numbers (in absolute value) imply that the corresponding eigenvector contributes significantly to the corresponding structure. The shell line above determines the range of amplitudes observed in the given run.
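The sort/awk pipeline can be sketched on a toy stand-in for carma.fluctuations.dat (all numbers below are made up for illustration; the real file is produced by carma) :

```shell
# Build a toy fluctuations file : column 1 is the frame number, columns 2,3,...
# hold the amplitudes along eigenvectors 1,2,... (hypothetical values).
cat > toy.fluctuations.dat <<'EOF'
1   -1.50   0.20
2    2.10  -0.40
3    0.30   0.10
EOF

# Smallest amplitude along eigenvector 1 : sort descending on column 2, keep the last line.
min=$(sort -r -n -k 2 toy.fluctuations.dat | tail -1 | awk '{print $2}')

# Largest amplitude along eigenvector 1 : sort ascending on column 2, keep the last line.
max=$(sort -n -k 2 toy.fluctuations.dat | tail -1 | awk '{print $2}')

echo "Amplitude limits are $min $max"
```

With the toy numbers above this prints "Amplitude limits are -1.50 2.10", i.e. the range of amplitudes observed along the first eigenvector.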
- Now, use the amplitudes you've determined to calculate a pseudo-trajectory file depicting the motion due to the selected eigenvector :
carma -verb -write -cov -eigen -play 1 <amp1> <amp2> CA.dcd CA.psf
vmd CA.psf carma.play.dcd
While you are inside VMD, you can save a PDB file containing snapshots from carma.play.dcd (every, say, 200 frames) and use this PDB to prepare a figure with rasmol (or vmd, pymol, …).
Essential dynamics
Once you know the principal components (and assuming that sufficient sampling has been demonstrated), you can -at least in principle- use these components to obtain all structures that are consistent with them. Carma is rather primitive in its approach : it can produce a pseudo-trajectory file containing any number of structures (e.g. 5 or 10 million) that are consistent with the components observed in the simulation. Note that the succession of structures in this file has no physical meaning (there is no such thing as 'time'), and that the amplitudes for the various eigenvectors are chosen randomly and uniformly from the observed fluctuations (contained in the carma.fluctuations.dat file). Performing the calculation is easy :
carma -v -cov -eigen -proj 1 50 1 CA.dcd CA.psf
carma -v -cov -eigen -arti 50 20000 CA.dcd CA.psf
will produce a DCD file with the name carma.arti.dcd containing 20000 structures consistent with the 50 largest components.
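In spirit, each artificial structure is just the average structure plus a randomly chosen amplitude times each eigenvector. A minimal sketch for a single atom and two eigenvectors (all numbers hypothetical, and the amplitudes fixed instead of random so that the example is reproducible) :

```shell
# Average position of one CA atom (x y z), followed by the components of
# eigenvectors 1 and 2 for that atom (toy values) :
avg_and_vecs="1.0 2.0 3.0  0.6 0.8 0.0  0.0 0.0 1.0"

# Amplitudes that carma would draw uniformly from the observed ranges;
# fixed here for reproducibility :
a1=2.0
a2=-1.0

# Reconstructed ("artificial") position : avg + a1*v1 + a2*v2, per coordinate.
arti=$(echo "$avg_and_vecs" | awk -v a1="$a1" -v a2="$a2" '{
    x = $1 + a1*$4 + a2*$7
    y = $2 + a1*$5 + a2*$8
    z = $3 + a1*$6 + a2*$9
    printf "%.2f %.2f %.2f", x, y, z
}')
echo "$arti"
```

With the toy numbers above the reconstructed position is 2.20 3.60 2.00. carma does the same thing for all atoms and the 50 requested eigenvectors, once per output frame.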
You should view the resulting structures with caution (and you should never try to obtain frequencies from them) : because carma chooses the amplitudes randomly and uniformly, the sampling of conformations is unphysical. You can convince yourself of this by performing (on the carma.arti.dcd file) an analysis similar to the one discussed in Eigenvectors and eigenvalues, sufficient sampling issues. For example, a 1-2 plot would look like this
which is convincingly unnatural. Note also that the structures corresponding to the corners of the distribution shown above can be highly distorted.
Determination of correlated motions' fluctuations
When you calculate the average structure (with xplor) you also get the atomic fluctuations. These fluctuations do not differentiate between correlated and uncorrelated motion. After principal component analysis, it is possible to prepare graphs that show the amount of correlated fluctuation (per residue, per eigenvector). Here is how :
- Use your intelligent editor to select only the first column (first eigenvector) from the carma.eigenvectors.dat file. Save this column in a new file, say 1.dat.
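If you prefer the command line to an editor, the same column can be extracted with awk. The file below is a toy stand-in for carma.eigenvectors.dat (made-up numbers; in the real file each line holds one Cartesian component, with successive columns corresponding to successive eigenvectors) :

```shell
# Toy stand-in for carma.eigenvectors.dat : each line is one component
# (x, y or z of one atom); columns are eigenvectors 1, 2, ...
cat > toy.eigenvectors.dat <<'EOF'
0.60 0.10
0.80 0.20
0.00 0.30
EOF

# Keep only the first column (first eigenvector) :
awk '{print $1}' toy.eigenvectors.dat > 1.dat
cat 1.dat
```

For the real file, replace toy.eigenvectors.dat with carma.eigenvectors.dat.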
- Compile the following small program in C :
#include <stdio.h>
#include <math.h>

int main(void)
{
        float x, y, z;
        int   c;

        /* Read triplets (x, y, z of successive atoms) and print their norm */
        while ( (c = scanf("%f %f %f", &x, &y, &z)) == 3 )
                printf("%10.8f\n", sqrt( x*x + y*y + z*z ));
        if ( c > 0 )
                fprintf(stderr, "ERROR. Not triplets ???\n");
        return 0;
}
- Rename the resulting executable to mkfluct
- Use mkfluct to calculate the per residue correlated fluctuations :
mkfluct < 1.dat > 1.fluct
xmgr 1.fluct
- Repeat the procedure for the other eigenvectors corresponding to large eigenvalues.
- Prepare a nice plot.
Entropy estimation
It is possible to use the eigenvalues of the mass-weighted covariance matrix to calculate an upper bound to the solute's configurational entropy (including all atoms, even the hydrogens). carma can do the trick, but because the results have not been validated, we will skip this for the time being.
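For reference, the upper bound in question is presumably Schlitter's formula (an assumption on our part about what carma implements), which bounds the entropy using the eigenvalues \(\lambda_i\) of the mass-weighted covariance matrix :

```latex
S \;\le\; S' \;=\; \frac{k_B}{2}\,\ln\det\!\left[\,\mathbf{1}
      + \frac{k_B T\, e^2}{\hbar^2}\,\mathbf{M}^{1/2}\boldsymbol{\sigma}\,\mathbf{M}^{1/2}\right]
   \;=\; \frac{k_B}{2}\sum_i \ln\!\left(1 + \frac{k_B T\, e^2}{\hbar^2}\,\lambda_i\right)
```

where \(\boldsymbol{\sigma}\) is the covariance matrix of the Cartesian coordinates, \(\mathbf{M}\) the diagonal mass matrix, \(T\) the temperature, and \(e\) Euler's number.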