cluster analysis with pairwise data

cluster analysis with pairwise data

paladini-2
Hello,
I want to do a cluster analysis with my data. The problem is that the
variables don't consist of single values; the entries are pairs of
values.
It looks like this:


Variable 1:    Variable2:      Variable3:  .    .    .
(1,2)          (1,5)           (4,2)
(7,8)          (3,88)          (6,5)
(4,7)          (12,4)          (4,4)
.               .              .
.               .              .
.               .              .
Is it possible to perform a cluster analysis with this kind of data in
R?
I don't even know how to get this data into a matrix or a data frame or
anything like that.

It would be really nice if somebody could help me.

Best regards and happy Easter

Claudia

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: cluster analysis with pairwise data

David Carlson
You can create distance matrices for each Variable, square them, sum them,
and take the square root. As for getting the data into a data frame, the
simplest would be to enter the three variables into six columns like the
following:

data
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    1    5    4    2
[2,]    7    8    3   88    6    5
[3,]    4    7   12    4    4    4

Then use dist() on each pair of columns:

1:2, 3:4, 5:6 . . .

e.g. for the 3 rows of data you provided

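data <- matrix(c(1, 2,  1,  5, 4, 2,
                 7, 8,  3, 88, 6, 5,
                 4, 7, 12,  4, 4, 4), nrow = 3, byrow = TRUE)
# all-zero dist object with one entry per pair of rows
dm <- dist(rep(0, nrow(data)))
for(i in seq(1, ncol(data), 2)) {
  # add the squared Euclidean distance for the pair in columns i, i+1
  dm <- dm + dist(data[,i:(i+1)])^2
}
dm <- sqrt(dm)
dm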
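
As a rough sketch of how the combined distances might then be used: with plain Euclidean distance, summing the squared per-pair distances and taking the square root gives the same result as calling dist() on all six columns at once, so the loop matters mainly when a different distance is wanted per pair. The average linkage and the cut into two groups below are assumptions for illustration, not part of the suggestion above.

```r
# Example data from the thread: three paired variables stored as six columns.
data <- matrix(c(1, 2,  1,  5, 4, 2,
                 7, 8,  3, 88, 6, 5,
                 4, 7, 12,  4, 4, 4), nrow = 3, byrow = TRUE)

# Squared Euclidean distance for each pair of columns (1:2, 3:4, 5:6).
pair_starts <- seq(1, ncol(data), by = 2)
sq <- lapply(pair_starts, function(i) dist(data[, i:(i + 1)])^2)

# Sum the squared distances and take the square root.
dm <- sqrt(Reduce(`+`, sq))

# For Euclidean distance this matches dist() on the full matrix.
same <- isTRUE(all.equal(as.vector(dm), as.vector(dist(data))))

# Hierarchical clustering on the combined distances
# (method = "average" and k = 2 are illustrative choices).
hc <- hclust(dm, method = "average")
groups <- cutree(hc, k = 2)
```

Here `same` is a check on the equivalence and `groups` holds one cluster label per row.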

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352





Re: cluster analysis with pairwise data

Petr Savicky
In reply to this post by paladini-2
On Wed, Apr 04, 2012 at 01:32:10PM +0200, paladini wrote:

> Hello,
> I want to do a cluster analysis with my data. The problem is that the
> variables don't consist of single values; the entries are pairs of
> values.
> It looks like this:
>
>
> Variable 1:    Variable2:      Variable3:  .    .    .
> (1,2)          (1,5)           (4,2)
> (7,8)          (3,88)          (6,5)
> (4,7)          (12,4)          (4,4)
> .               .              .
> .               .              .
> .               .              .
> Is it possible to perform a cluster analysis with this kind of data in
> R?
> I don't even know how to get this data into a matrix or a data frame or
> anything like that.

Hi.

The data as they are may be read into R as character data. The
exact way depends on the format of the data in the file. The
result may look like the following.

  Var1 <- c("(1,2)", "(7,8)", "(4,7)")
  Var2 <- c("(1,5)", "(3,88)", "(12,4)")
  Var3 <- c("(4,2)", "(6,5)", "(4,4)")
  DF <- data.frame(Var1, Var2, Var3, stringsAsFactors=FALSE)
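
One way such a character data frame might be read in, as a sketch only: it assumes the file has one row per line with the pairs separated by whitespace, which the original post does not confirm. The inline `txt` string stands in for a file, and the column names are chosen here for illustration.

```r
# Stand-in for the contents of a whitespace-separated file (assumed format).
txt <- "(1,2) (1,5) (4,2)
(7,8) (3,88) (6,5)
(4,7) (12,4) (4,4)"

# Read every column as character so the "(a,b)" entries are kept as text.
DF <- read.table(text = txt, header = FALSE,
                 colClasses = "character",
                 col.names = c("Var1", "Var2", "Var3"))
```

For a real file, `text = txt` would be replaced by the file name.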

If you want to use a distance between pairs that depends on the
numbers (and not only on whether two pairs are equal or different),
then the data should be transformed to a numeric format, for example
as follows:

  trans <- function(x)
  {
      y <- strsplit(gsub("[()]", "", x), ",")
      unname(t(vapply(y, FUN=as.numeric, FUN.VALUE=c(0, 0))))
  }

  DF <- data.frame(Var1=trans(Var1), Var2=trans(Var2), Var3=trans(Var3))
  DF

    Var1.1 Var1.2 Var2.1 Var2.2 Var3.1 Var3.2
  1      1      2      1      5      4      2
  2      7      8      3     88      6      5
  3      4      7     12      4      4      4

Then, see library(help=cluster).
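
As one possible next step, a minimal sketch using agnes() from the cluster package on the numeric data frame. The choice of agnes(), average linkage and two groups are assumptions for illustration; the data frame is rebuilt here so the block runs on its own, and its column names are illustrative. Since one value (88) is much larger than the rest, standardizing with `stand = TRUE` may be worth considering.

```r
library(cluster)  # recommended package, normally installed with R

# Numeric version of the example data, one column per element of each pair.
DF <- data.frame(Var1.1 = c(1, 7, 4),  Var1.2 = c(2, 8, 7),
                 Var2.1 = c(1, 3, 12), Var2.2 = c(5, 88, 4),
                 Var3.1 = c(4, 6, 4),  Var3.2 = c(2, 5, 4))

# Agglomerative clustering; agnes() uses Euclidean distance by default.
ag <- agnes(DF, method = "average")

# Cut the tree into two groups (k = 2 is an illustrative choice).
groups <- cutree(as.hclust(ag), k = 2)
```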

Hope this helps.

Petr Savicky.


Re: cluster analysis with pairwise data

ilai-2
On Wed, Apr 4, 2012 at 10:12 AM, Petr Savicky <[hidden email]> wrote:
> On Wed, Apr 04, 2012 at 01:32:10PM +0200, paladini wrote:

>  Var1 <- c("(1,2)", "(7,8)", "(4,7)")
>  Var2 <- c("(1,5)", "(3,88)", "(12,4)")
>  Var3 <- c("(4,2)", "(6,5)", "(4,4)")
>  DF <- data.frame(Var1, Var2, Var3, stringsAsFactors=FALSE)
>
> If you want to use a distance between pairs depending on the
> numbers (and not only equal/different pair), then the data should
> be transformed to a numeric format.

Or, if each pair is a label with its own meaning rather than two
numbers, see ?daisy, also in the cluster package (in this case you'll
want to keep the Vi as factors in the data.frame() call that creates DF).
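
A rough sketch of what that could look like. The fourth row is hypothetical and is added only so that two rows share some labels; with the three rows from the original post every pair of rows differs in every variable, so all dissimilarities would be 1. The metric is set to "gower" explicitly, which for factor columns counts the share of variables on which two rows differ.

```r
library(cluster)

# Pairs kept as labels; the 4th row is hypothetical, for illustration only.
Var1 <- c("(1,2)", "(7,8)", "(4,7)", "(1,2)")
Var2 <- c("(1,5)", "(3,88)", "(12,4)", "(1,5)")
Var3 <- c("(4,2)", "(6,5)", "(4,4)", "(9,9)")
DF <- data.frame(Var1, Var2, Var3, stringsAsFactors = TRUE)

# Gower dissimilarity on nominal variables: fraction of variables that differ.
d <- daisy(DF, metric = "gower")
dmat <- as.matrix(d)
```

Rows 1 and 4 agree on two of three variables, so their dissimilarity is 1/3, while rows that share no label are at the maximum of 1.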

Cheers
