MBG wiki: Editing More on eigenvectors and eigenvalues

Calculation of eigenvalues and eigenvectors is the first step for performing a large number of other analyses. Some of these are discussed below.

<b>NOTE WELL :</b> For these analyses to be valid, you have to be able to show that your simulation sufficiently sampled the molecule's configurational space. See [[Eigenvectors and eigenvalues, sufficient sampling issues]].

Having a physical analogue in your mind for what PCA analysis is doing is not easy. A nice simplified but revelant example can be found in wikipedia [http://en.wikipedia.org/wiki/Eigenvalue%2C_eigenvector_and_eigenspace#Examples]

== Determination of collective motions ==

If you do a plot of the eigenvalues (xmgr carma.eigenvalues.dat) you will see that the top ~10% of eigenvectors accounts for the majority of the observed motion. You can visualise the motion due to each selected eigenvector with /carma/ and /vmd/. Here it goes :

* First determine the amplitudes of the motion along the selected eigenvector. For the first eigenvector it would be something like :

<code>
carma -v -cov -eigen -proj 1 10 1 CA.dcd CA.psf
echo  "Amplitude limits are " `sort -r -n -k 2 carma.fluctuations.dat | tail -1 | awk '{print $2}'` `sort -n -k 2 carma.fluctuations.dat | tail -1 | awk '{print $2}'`
</code>

The file /carma.fluctuations.dat/ is organised as follows : successive lines correspond to successive frames from the DCD file. Successive columns correspond to successive eigenvectors (in decreasing order of corresponding eigenvalues). The numbers are the amplitudes with which the given eigenvectors contribute to the given frames (ie structures). Large numbers (in absolute value) imply that the corresponding eigenvector contributes significantly to the corresponding structure. The unix shell line above determines the range of amplitudes observed in the given run.

* Now, use the amplitudes you've determined to calculate a pseudo-trajectory file depicting the motion due to the selected eigenvector :

<code>
carma -verb -write -cov -eigen -play 1 <amp1> <amp2> CA.dcd CA.psf
vmd CA.psf carma.play.dcd
</code>

While you are inside VMD, you can save a pdb file containing snapshots from the /carma.play.dcd/ (every, say, 200 frames) and use the pdb to prepare a figure  with /rasmol/ (or /vmd/, /pymol/, ...) :

[[image: Motion of HrcQb due to principal component]]

== Essential dynamics ==

Once you know the principal components (and assuming that sufficient sampling has been demonstrated), you can -at least in principle- use these components to obtain all structures that are consistent with them. /Carma/ is rather primitive in its approach : it can produce a pseudo-trajectory file containing a number of structures (but _any_ number, eg 5 or 10 million) that are consistent with the observed (from the simulation) components. Note that the succession of structures in this file has no physical meaning (there is no such thing as 'time'), and that the amplitudes for the various eigenvectors are chosen randomly and uniformly from the observed fluctuations (contained in the carma.fluctuations.dat file). Performing the calculation is easy :

<code>
carma -v -cov -eigen -proj 1 50 1 CA.dcd CA.psf
carma -v -cov -eigen -arti 50 20000 CA.dcd CA.psf
</code>

will produce a DCD file with the name *carma.arti.dcd* containing 20000 structures consistent with the 50 largest components.

You should view the resulting structures with caution (and you should never try to obtain frequencies from it) : because /carma/ choses amplitudes randomly and uniformly the sampling of conformations is unphysical. You can convince yourself for this by performing (on the *carma.arti.dcd* file) an analysis similar to the one discussed in [[Eigenvectors and eigenvalues, sufficient sampling issues]]. For example, a 1-2 plot would look like this

[[image: projection of fluctuations on the 1-2 eigenvector plane for ARTI]]
 
which is convincingly unnatural. Note also that the structures corresponding to the corners of the distribution shown above can be highly distorted.

== Determination of correlated motions' fluctuations ==

When you calculate the average structure (with /xplor/) you also get the atomic fluctuations. These fluctuations do not differentiate between being correlated or not. After principal component analysis, it is possible to prepare graphs that show the amount of correlated fluctuations (per residue per eigenvector). Doing it :

* Use your intelligent editor to select only the first column (first eigenvector) from the *carma.eigenvectors.dat* file. Save this column in a new file, say <b>1.dat</b>.

* Compile the following small program in C :
<code>
#include <stdio.h>
#include <math.h>

main()
{
        float   x, y, z;
        int     c;

while( ( c = scanf("%f %f %f", &x, &y, &z)) == 3 )
                printf("%10.8f\n", sqrt( x*x + y*y + z*z));
        if ( c > 0 )
                fprintf(stderr, "ERROR. Not triplets ???\n");
}
</code>

* Rename the resulting executable to *mkfluct*

* Use *mkfluct* to calculate the per residue correlated fluctuations :
<code>
mkfluct < 1.dat > 1.fluct
xmgr 1.fluct
</code>

* Repeat the procedure for the other eigenvectors corresponding to large eigenvalues.

* Prepare a nice plot :

[[image: Correlated fluctuations per residue per eigenvector]]

== Entropy estimation ==

It is possible to use the eigenvalues of the *mass weighted covariance matrix* to calculate an upper bound to solute's configurational entropy (but including all atoms, even hydrogens). /carma/ can do the trick, but because the results have not been validated, we will skip this for the time being.

Summary:

This change is a minor edit.

Username:

Replace this text with a file.