MBG wiki | RecentChanges | Blog | 2021-09-24 | 2021-09-23

Eigenvectors and eigenvalues, sufficient sampling issues

Doing a principal component analysis (PCA) of a trajectory is easy. And is even easier to prepare nice figures showing collective motions of the solute (see Calculation of eigenvalues and eigenvectors and More on eigenvectors and eigenvalues). And if you are not careful, it is just as easy for all these nice figures to be fantasy pure and simple.

Being careful is this : you must be able to demonstrate that you have sufficiently sampled the molecule's configurational space during your simulation. The truth is that you never are (recall the time scales of the various types of macromolecular motion). Still, if (in a hypothetical case) your molecule has two major and relatively stable conformers, you may have sufficient sampling of the given conformers (but you will never know whether there are other conformations that you've never seen in your dynamics). In this case you may be able to demonstrate sufficient sampling for these conformers (clusters).

Demonstrating convergence of sampling is still an active research area (Google for "converge of sampling in protein simulations" and you'll find an interesting paper by B. Hess). Here I will only show two types of graphs that you can prepare with the output from carma (file carma.fluctuations.dat) and how these should look like in you have sufficient sampling (note that these indicators are necessary but not sufficient conditions).

The first type of graphs you would want to have a look at is a histogram of the fluctuations along the first few eigenvectors. This you do with xmgr (read block data with columns 1-2, or 1-3, or 1-4, …, and then produce a histogram from the 'Transformations' pull-down menu). These histograms should look like Gaussians centered on zero. Compare the following :

 Sufficient sampling, 1, 2, 3 eigenvector fluctuations histogram

 Insufficient sampling, 1, 2, 3 eigenvector fluctuations histogram

The second type of graph is produced by ploting the fluctuations of pairs of eigenvectors. The example that follows shows graphs for the combinations 1-2, 1-3 and 2-3 of eigenvectors. In the case of sufficient sampling, these should look as 2D Gaussians centered on (0,0). Compare these :

 Sufficient sampling, 1,2 1,3 2,3 eigenvector pairs

 Insufficient sampling, 1,2 1,3 2,3 eigenvector pairs

A better way to display these graphs would be to draw the distribution of density of points. This footnote tells you how1.

You could even try to examine a 3D distribution (you'll need a trick2 to convert fluctuations along three eigenvectors to x, y and z coordinates in a pseudo-pdb file). Compare these:

 Sufficient sampling, 1,2,3 eigenvectors 3D

 Insufficient sampling, 1,2,3 eigenvectors 3D


1. Use an editor to create a file containing the fluctuations along the two eigenvectors you are interested in. Then compile this program :

The program will read a file containing pairs of coordinates of points and will output an ASCII file containing density distribution matrix. Then use a gnuplot script to do the plotting :

giving something like

 sufficient sampling density distribution graph

2. If you have a file with three columns containing the fluctuations along the eigenvectors you want to plot, this tiny C program should prepare a PDB file for you (use input and output redirection to use it) :