So, now you have your crossDCD.matrix. Lets make a dendrogram based on these rmsds (which will be treated as distances).
We will use the statistical package R [1]. For the example shown below I will assume that your matrix is 460x460 (do a wc crossDCD.matrix to determine the actual dimension (if your matrix is square)) :
# R R : Copyright 2004, The R Foundation for Statistical Computing Version 2.0.1 (2004-11-15), ISBN 3-900051-07-0 R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for a HTML browser interface to help. Type 'q()' to quit R. > A <- matrix(scan("crossDCD.matrix", n = 460*460), 460, 460, byrow = TRUE) Read 211600 items > hc <- hclust( dist(A), method="complete" ) > ls () [1] "A" "hc" > hc Call: hclust(d = dist(A), method = "complete") Cluster method : complete Distance : euclidean Number of objects: 460 > plclust(hc) > q() Save workspace image? [y/n/c]: n
giving something like :
If you want to save, say a postscript, of your image, while in R and before you type any of the commands type:
> postscript()
This way, you open the device to print or save postscripts. When you are done with your work, type:
> dev.off()
to close the device and finish the postscript. Of cource, there are numerous options you can play with. If you insist on changing your postscript's properties ask N.M.G for R's manual.
From this dendrogram you could possibly make an educated guess as to how many distinctive clusters you have and then you can use a more advanced clustering method to define them (note however that there are proper statistical methods to help you decide the number of clusters). Using fuzzy clustering and assuming 3 clusters it will be something like :
# R R : Copyright 2004, The R Foundation for Statistical Computing Version 2.0.1 (2004-11-15), ISBN 3-900051-07-0 R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for a HTML browser interface to help. Type 'q()' to quit R. > A <- matrix(scan("crossDCD.matrix", n = 460*460), 460, 460, byrow = TRUE) Read 211600 items > library(cluster) > f <- fanny(A,3) > summary(f) iterations objective 33.000 1035.766 Membership coefficients: [,1] [,2] [,3] [1,] 0.53150274 0.2677229 0.2007743 [2,] 0.53087205 0.2679982 0.2011298 [3,] 0.53662002 0.2649609 0.1984191 ..................................... [459,] 0.11053637 0.3214276 0.5680360 [460,] 0.11896391 0.3286576 0.5523785 Coefficients: dunn_coeff normalized 0.4334923 0.1502384 Closest hard clustering: [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [112] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 [186] 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [223] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [260] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 [297] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 [334] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 [371] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 [408] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 2 3 3 3 3 3 [445] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 Silhouette plot information: cluster neighbor sil_width 76 1 2 5.967160e-01 122 1 2 5.956911e-01 121 1 2 5.956660e-01 79 1 2 5.918693e-01 119 1 2 5.910336e-01 65 1 2 5.907392e-01 113 1 2 5.887318e-01 120 1 2 5.881414e-01 .................................. 435 3 2 7.393940e-02 440 3 2 5.069895e-02 288 3 2 -1.797243e-05 Average silhouette width per cluster: [1] 0.4781967 0.4395254 0.4025648 Average silhouette width of total data set: [1] 0.4414223 105570 dissimilarities, summarized : Min. 1st Qu. Median Mean 3rd Qu. Max. 0.2123 8.5210 14.6000 14.5900 20.5500 30.2600 Metric : euclidean Number of objects : 460 Available components: [1] "membership" "coeff" "clustering" "objective" "diss" [6] "call" "silinfo" "data" > plot(f) Hit <Return> to see next plot: Hit <Return> to see next plot: > q() Save workspace image? [y/n/c]: n
giving :