hclust(stats) merge matrix interpretation

Tarun Kumar Singh
Hi,

We are trying to interpret the clusters generated by the hclust function in
R's "stats" package. The problem is that hc$order returns the observations
in a specific order, but that order is lost when we export the results to a
file. Here is the example code and its results:

> hc <- hclust(dist(USArrests), "ave")
> plot(hc)
> hc$labels
 [1] "Alabama"        "Alaska"         "Arizona"        "Arkansas"
 [5] "California"     "Colorado"       "Connecticut"    "Delaware"
 [9] "Florida"        "Georgia"        "Hawaii"         "Idaho"
[13] "Illinois"       "Indiana"        "Iowa"           "Kansas"
[17] "Kentucky"       "Louisiana"      "Maine"          "Maryland"
[21] "Massachusetts"  "Michigan"       "Minnesota"      "Mississippi"
[25] "Missouri"       "Montana"        "Nebraska"       "Nevada"
[29] "New Hampshire"  "New Jersey"     "New Mexico"     "New York"
[33] "North Carolina" "North Dakota"   "Ohio"           "Oklahoma"
[37] "Oregon"         "Pennsylvania"   "Rhode Island"   "South Carolina"
[41] "South Dakota"   "Tennessee"      "Texas"          "Utah"
[45] "Vermont"        "Virginia"       "Washington"     "West Virginia"
[49] "Wisconsin"      "Wyoming"
> hc$order
 [1]  9 33  5 20  3 31  8  1 18 13 32 22 28  2 24 40 47 37 50 36 46 39 21 30 25
[26]  4 42 10  6 43 12 27 17 26 35 44 14 16  7 38 11 48 19 41 34 45 23 49 15 29
> hc$height
 [1]   2.291288   3.834058   3.929377   6.236986   6.637771   7.355270
 [7]   8.027453   8.537564  10.184218  10.736739  10.771175  11.456439
[13]  12.438692  12.614278  12.878100  13.044922  13.297368  13.352260
[19]  13.896043  14.501034  15.026107  15.122897  15.453120  15.454449
[25]  16.425489  16.891499  18.417331  18.993398  20.198479  20.598507
[31]  21.167192  22.595978  23.972143  26.363428  26.713777  27.779904
[37]  28.012211  28.095803  29.054195  33.117815  38.527912  39.394633
[43]  41.094765  44.283922  44.837933  54.746831  77.605024  89.232093
[49] 152.313999

> hc$merge
      [,1] [,2]
 [1,]  -15  -29
 [2,]  -17  -26
 [3,]  -14  -16
 [4,]  -13  -32
 [5,]  -35  -44
 [6,]  -36  -46
 [7,]   -7  -38
 [8,]  -19  -41
 [9,]  -49    1
[10,]  -50    6
[11,]  -48    8
[12,]  -21  -30
[13,]  -27    2
[14,]   -4  -42
[15,]  -37   10
[16,]  -34  -45
[17,]  -22  -28
[18,]    3    7
[19,]   -3  -31
[20,]   -6  -43
[21,]  -12   13
[22,]    5   18
[23,]  -20   19
[24,]   -1  -18
[25,]  -47   15
[26,]   -8   24
[27,]    4   17
[28,]  -23    9
[29,]  -25   14
[30,]   21   22
[31,]  -24  -40
[32,]  -39   12
[33,]  -10   20
[34,]   26   27
[35,]   25   32
[36,]   16   28
[37,]   -5   23
[38,]   -2   31
[39,]   29   33
[40,]   11   36
[41,]   -9  -33
[42,]   34   38
[43,]  -11   40
[44,]   37   42
[45,]   35   39
[46,]   30   43
[47,]   41   44
[48,]   45   46
[49,]   47   48

plot() generates a dendrogram of the clustered nodes. The ideal solution for
us would be a method that generates a matrix of distance attributes for each
node of the dendrogram. Failing that, if anyone could suggest a way to keep
the hc$order structure intact when exporting, it would help us a lot.
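
To make the request concrete, here is a minimal sketch of the kind of export we have in mind. It assumes that a table of labels arranged by hc$order, plus the cophenetic distance matrix from stats::cophenetic(), is an acceptable form of "distance attributes"; that choice is an assumption on our part. The output path out_file is hypothetical.

```r
hc <- hclust(dist(USArrests), "ave")

# One row per observation, arranged in the order given by hc$order.
ordered <- data.frame(
  position = seq_along(hc$order),   # position within hc$order
  index    = hc$order,              # original row number in USArrests
  label    = hc$labels[hc$order]    # corresponding label
)

# Hypothetical output path; replace with a real file name as needed.
out_file <- tempfile(fileext = ".csv")
write.csv(ordered, out_file, row.names = FALSE)

# Cophenetic distance: for each pair of observations, the height at
# which the two are first joined into the same cluster.
coph <- as.matrix(cophenetic(hc))

# Same matrix with rows and columns rearranged to follow hc$order.
coph_ordered <- coph[hc$order, hc$order]
```

Because the order is stored as an explicit column, it survives the round trip through the file, and coph_ordered can be written out the same way with write.csv().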

The second problem is how to interpret the matrix returned by hc$merge.
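
Our current reading of ?hclust is that row i of hc$merge describes the merge made at step i, that a negative entry -j stands for the single observation j, and that a positive entry j stands for the cluster formed at step j. The sketch below translates the matrix under that reading; describe() and steps are illustrative names of our own, not part of the stats package.

```r
hc <- hclust(dist(USArrests), "ave")

# Turn one entry of hc$merge into readable text:
# negative -> label of a single observation,
# positive -> reference to the cluster formed at an earlier step.
describe <- function(x) {
  if (x < 0) hc$labels[-x] else paste0("cluster from step ", x)
}

steps <- data.frame(
  step   = seq_len(nrow(hc$merge)),
  left   = vapply(hc$merge[, 1], describe, character(1)),
  right  = vapply(hc$merge[, 2], describe, character(1)),
  height = hc$height                 # height at which the merge happens
)

head(steps, 10)
```

If this reading is right, row 1 (-15, -29) says Iowa and New Hampshire are joined first, and row 9 (-49, 1) says Wisconsin then joins the cluster formed at step 1.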

Thank you,
-Tarun

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html