|
Hello,
I want to do a cluster analysis with my data. The problem is that the variables don't consist of single values; the entries are pairs of values. It looks like this:

Variable 1:   Variable 2:   Variable 3:   . . .
(1,2)         (1,5)         (4,2)
(7,8)         (3,88)        (6,5)
(4,7)         (12,4)        (4,4)
. . .         . . .         . . .

Is it possible to perform a cluster analysis with this kind of data in R? I don't even know how to get this data into a matrix or a data frame or anything like that. It would be really nice if somebody could help me.

Best regards and happy Easter
Claudia

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code. |
|
You can create a distance matrix for each variable, square them, sum them, and take the square root. As for getting the data into a data frame, the simplest way would be to enter the three variables into six columns, like the following:

data
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    1    5    4    2
[2,]    7    8    3   88    6    5
[3,]    4    7   12    4    4    4

Then use dist() on each pair of columns (1:2, 3:4, 5:6, ...), e.g. for the 3 rows of data you provided:

    # start from an all-zero dist object with one entry per pair of rows
    dm <- dist(rep(0, nrow(data)))
    for (i in seq(1, 6, 2)) {
      # add the squared distances for the variable in columns i and i+1
      dm <- dm + dist(data[, i:(i+1)])^2
    }
    dm <- sqrt(dm)
    dm

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of paladini
Sent: Wednesday, April 04, 2012 6:32 AM
To: [hidden email]
Subject: [R] cluster analysis with pairwise data |
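Because a squared Euclidean distance is just a sum of squared coordinate differences, the per-variable sum described above should give the same distances as a single dist() call on all six columns. A minimal sketch that checks this on the three example rows and then passes the result to hclust(); the average linkage is only an assumption for illustration and was not suggested in the thread.

```r
# The three example rows, two columns per variable
data <- matrix(c(1, 2,  1,  5, 4, 2,
                 7, 8,  3, 88, 6, 5,
                 4, 7, 12,  4, 4, 4),
               nrow = 3, byrow = TRUE)

# Sum the squared per-variable distances, then take the square root
dm <- dist(rep(0, nrow(data)))
for (i in seq(1, ncol(data), by = 2)) {
  dm <- dm + dist(data[, i:(i + 1)])^2
}
dm <- sqrt(dm)

# The same distances from one call on all columns
dm_all <- dist(data)
print(all.equal(as.vector(dm), as.vector(dm_all)))

# Hierarchical clustering on the combined distances
# (average linkage is an arbitrary choice for this sketch)
hc <- hclust(dm, method = "average")
print(hc$merge)
```

With real data, plot(hc) would draw the dendrogram.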
|
On Wed, Apr 04, 2012 at 01:32:10PM +0200, paladini wrote:
> Is it possible to perform a cluster analysis with this kind of data in
> R?
> I don't even know how to get this data into a matrix or a data frame or
> anything like that.

Hi.

The data as they are may be read into R as character data. The exact way depends on the format of the data in the file. The result may look like the following.

    Var1 <- c("(1,2)", "(7,8)", "(4,7)")
    Var2 <- c("(1,5)", "(3,88)", "(12,4)")
    Var3 <- c("(4,2)", "(6,5)", "(4,4)")
    DF <- data.frame(Var1, Var2, Var3, stringsAsFactors=FALSE)

If you want to use a distance between pairs that depends on the numbers (and not only on whether two pairs are equal or different), then the data should be transformed to a numeric format. For example, as follows

    trans <- function(x)
    {
        # drop the parentheses and split each "a,b" string at the comma
        y <- strsplit(gsub("[()]", "", x), ",")
        # one row per entry, two numeric columns
        unname(t(vapply(y, FUN=as.numeric, FUN.VALUE=c(0, 0))))
    }

    DF <- data.frame(Var1=trans(Var1), Var2=trans(Var2), Var3=trans(Var3))
    DF

      Var1.1 Var1.2 Var2.1 Var2.2 Var3.1 Var3.2
    1      1      2      1      5      4      2
    2      7      8      3     88      6      5
    3      4      7     12      4      4      4

Then, see library(help=cluster).

Hope this helps.

Petr Savicky. |
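How the pairs are read in depends on the file layout. As a sketch, assuming a whitespace-separated text file with one "(a,b)" pair per field and no header line, read.table() can bring the pairs in as character columns, which are then split into numeric columns. The inline string `raw` stands in for such a file, and the helper name `split_pairs` is made up for this example.

```r
# Hypothetical file contents; for a real file use read.table(file = ...) instead
raw <- "(1,2) (1,5) (4,2)
(7,8) (3,88) (6,5)
(4,7) (12,4) (4,4)"

# Read every field as a character string such as "(1,2)"
DF_chr <- read.table(text = raw, header = FALSE,
                     col.names = c("Var1", "Var2", "Var3"),
                     colClasses = "character")

# Split "(a,b)" strings into a two-column numeric matrix
split_pairs <- function(x) {
  parts <- strsplit(gsub("[()]", "", x), ",")
  unname(t(vapply(parts, FUN = as.numeric, FUN.VALUE = c(0, 0))))
}

# One two-column matrix per variable, combined into six numeric columns
DF_num <- do.call(data.frame, lapply(DF_chr, split_pairs))
print(DF_num)
```

The numeric data frame can then go to dist() or to the functions in the cluster package.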
|
On Wed, Apr 4, 2012 at 10:12 AM, Petr Savicky <[hidden email]> wrote:
> Var1 <- c("(1,2)", "(7,8)", "(4,7)")
> Var2 <- c("(1,5)", "(3,88)", "(12,4)")
> Var3 <- c("(4,2)", "(6,5)", "(4,4)")
> DF <- data.frame(Var1, Var2, Var3, stringsAsFactors=FALSE)
>
> If you want to use a distance between pairs depending on the
> numbers (and not only equal/different pair), then the data should
> be transformed to a numeric format.

Or, if the pairs have unique meaning (each distinct pair is a category of its own), ?daisy, also in the cluster package, comes in handy. In this case you'll want to keep the Vi as factors in the data.frame() call that creates DF.

Cheers |
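A minimal sketch of the daisy() route, assuming the cluster package is installed. The fourth row is hypothetical and is added only so that some rows share a pair; in the three rows from the thread every pair is different, so every dissimilarity would be 1. With factor columns and Gower's coefficient, each dissimilarity is the share of variables on which two rows carry different pairs.

```r
library(cluster)

# The thread's three rows plus one hypothetical fourth row that reuses pairs
Var1 <- c("(1,2)", "(7,8)", "(4,7)", "(1,2)")
Var2 <- c("(1,5)", "(3,88)", "(12,4)", "(3,88)")
Var3 <- c("(4,2)", "(6,5)", "(4,4)", "(4,4)")

# Keep the pairs as factors so that each distinct pair is one category
DF <- data.frame(Var1, Var2, Var3, stringsAsFactors = TRUE)

# Gower dissimilarity; for factors this counts mismatching variables
d <- daisy(DF, metric = "gower")
print(d)
```

The resulting object can be passed to clustering functions that accept dissimilarities, such as hclust() or the cluster package's agnes() and pam().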
