# cluster analysis with pairwise data

## cluster analysis with pairwise data

Hello,

I want to do a cluster analysis with my data. The problem is that the variables don't consist of single values; instead, each entry is a pair of values. It looks like this:

    Variable1   Variable2   Variable3   ...
    (1,2)       (1,5)       (4,2)
    (7,8)       (3,88)      (6,5)
    (4,7)       (12,4)      (4,4)
    ...         ...         ...

Is it possible to perform a cluster analysis with this kind of data in R? I don't even know how to get this data into a matrix or a data frame or anything like that. It would be really nice if somebody could help me.

Best regards and happy Easter,
Claudia

## Re: cluster analysis with pairwise data

You can create a distance matrix for each variable, square them, sum them, and take the square root. As for getting the data into a data frame, the simplest approach is to enter the three variables into six columns, like the following:

    data
         [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]    1    2    1    5    4    2
    [2,]    7    8    3   88    6    5
    [3,]    4    7   12    4    4    4

Then use dist() on each pair of columns (1:2, 3:4, 5:6), e.g. for the 3 rows of data you provided:

    # start from an all-zero dist object with one entry per pair of rows
    dm <- dist(rep(0, nrow(data)))
    # add the squared Euclidean distance for each variable (pair of columns)
    for (i in seq(1, 6, 2)) {
      dm <- dm + dist(data[, i:(i+1)])^2
    }
    dm <- sqrt(dm)
    dm

David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

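The recipe above can be sketched end to end as follows. This is only an illustration under stated assumptions: the matrix is typed in by hand from the three example rows, `pair_dist` is a hypothetical helper name that does not appear in the thread, and `hclust()` with average linkage is just one possible clustering step. Summing the squared per-variable Euclidean distances and taking the square root gives the same numbers as the ordinary Euclidean distance over all six columns, which the sketch checks.

```r
# Example data from the thread: three variables, each stored as two columns.
data <- matrix(c(1, 2,  1,  5, 4, 2,
                 7, 8,  3, 88, 6, 5,
                 4, 7, 12,  4, 4, 4),
               nrow = 3, byrow = TRUE)

# Hypothetical helper (not from the thread): combine the per-variable
# Euclidean distances, where each variable occupies two adjacent columns.
pair_dist <- function(m) {
  stopifnot(ncol(m) %% 2 == 0)
  # all-zero dist object with one entry per pair of rows
  dm <- dist(rep(0, nrow(m)))
  for (i in seq(1, ncol(m), by = 2)) {
    dm <- dm + dist(m[, i:(i + 1)])^2
  }
  sqrt(dm)
}

dm <- pair_dist(data)

# Same values as the Euclidean distance over all six columns at once.
same_as_plain <- isTRUE(all.equal(as.vector(dm), as.vector(dist(data))))

# One possible next step: hierarchical clustering on the combined distances.
hc <- hclust(dm, method = "average")
groups <- cutree(hc, k = 2)
groups
```

With these three rows, the value 88 in the second variable dominates the distances, so row 2 ends up apart from rows 1 and 3. The per-variable loop becomes useful mainly when the variables should be weighted or scaled differently before summing; otherwise `dist(data)` alone may be enough.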
## Re: cluster analysis with pairwise data

On Wed, Apr 04, 2012 at 01:32:10PM +0200, paladini wrote:

> [...]
> Is it possible to perform a cluster analysis with this kind of data in R? I don't even know how to get this data into a matrix or a data frame or anything like that.

Hi.

The data as they are may be read into R as character data. The exact way depends on the format of the data in the file. The result may look like the following.

    Var1 <- c("(1,2)", "(7,8)", "(4,7)")
    Var2 <- c("(1,5)", "(3,88)", "(12,4)")
    Var3 <- c("(4,2)", "(6,5)", "(4,4)")
    DF <- data.frame(Var1, Var2, Var3, stringsAsFactors=FALSE)

If you want to use a distance between pairs that depends on the numbers (and not only on whether two pairs are equal or different), then the data should be transformed to a numeric format, for example as follows:

    trans <- function(x)
    {
        # strip the parentheses and split each "(a,b)" string at the comma
        y <- strsplit(gsub("[()]", "", x), ",")
        # convert each pair to numeric: one row per entry, two columns
        unname(t(vapply(y, FUN=as.numeric, FUN.VALUE=c(0, 0))))
    }
    DF <- data.frame(Var1=trans(Var1), Var2=trans(Var2), Var3=trans(Var3))
    DF
      Var1.1 Var1.2 Var2.1 Var2.2 Var3.1 Var3.2
    1      1      2      1      5      4      2
    2      7      8      3     88      6      5
    3      4      7     12      4      4      4

Then, see library(help=cluster).

Hope this helps.

Petr Savicky.
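As a rough sketch of where this could lead, the numeric data frame can be passed to one of the functions in the cluster package. The choice of `agnes()` with average linkage and the cut into two groups are assumptions made here for illustration; the thread itself only points to `library(help=cluster)` and does not recommend a particular method.

```r
library(cluster)  # recommended package, normally installed with R

# Character data in the "(a,b)" format from the thread.
Var1 <- c("(1,2)", "(7,8)", "(4,7)")
Var2 <- c("(1,5)", "(3,88)", "(12,4)")
Var3 <- c("(4,2)", "(6,5)", "(4,4)")

# Convert "(a,b)" strings into a two-column numeric matrix.
trans <- function(x) {
  y <- strsplit(gsub("[()]", "", x), ",")
  unname(t(vapply(y, FUN = as.numeric, FUN.VALUE = c(0, 0))))
}

DF <- data.frame(Var1 = trans(Var1), Var2 = trans(Var2), Var3 = trans(Var3))

# Assumed choice: agglomerative clustering on Euclidean distances.
ag <- agnes(DF, metric = "euclidean", method = "average")

# Cut the resulting tree into two groups (k = 2 is arbitrary here).
groups <- cutree(as.hclust(ag), k = 2)
groups
```

Other functions from the same package, such as `pam()`, accept the same data frame, so the clustering call can be swapped without changing the conversion step.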