Clustering and Rand Index

Clustering and Rand Index

Mark Hempelmann
Dear WizaRds,

I am trying to compute the (adjusted) Rand index in order to understand
the variable selection heuristic (VS-KM) of Brusco and Cradit
(Psychometrika 66, No. 2, pp. 249-270, 2001).

Unfortunately, I am unable to use cl_ensemble and cl_agreement
(package: clue) correctly. Here is what I am trying to do:

library(clue)

## Let p1..p4 be four partitions of the kind

p1=c(1,1,1,2,2,2,3,3,3)
p2=c(1,1,1,3,2,2,3,3,2)
p3=c(1,2,1,3,1,3,1,3,2)
p4=c(1,2,1,3,1,3,1,3,2)

In every partition, each object is assigned to one of the clusters 1, 2,
or 3. Now I have to create a cl_ensemble object so that I can calculate
the Rand index:

ens <- cl_ensemble(list=c(p1,p2,p3,p4))

which only leads to
"Ensemble elements must be all partitions or all hierarchies."

Although I understand that p1..p4 are vectors in this example, they
represent the partitions I want to use. I don't know how to create the
necessary partition object in order to transform it into an ensemble
object, so that I can run cl_agreement - so much transformation, so
little time...
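
The error above arises because c(p1, p2, p3, p4) concatenates the four
vectors into one vector of length 36, so cl_ensemble never receives four
separate partitions. What follows is a minimal sketch of one possible
fix, not a definitive recipe. It assumes a clue version that provides
as.cl_partition() for raw class-id vectors and the "cRand" method of
cl_agreement(); that part is skipped when clue is not installed. The
helper adj_rand() is hypothetical (it is not part of clue) and implements
the usual Hubert-Arabie adjusted Rand index in base R, so the numbers can
be cross-checked without any package.

```r
p1 <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
p2 <- c(1, 1, 1, 3, 2, 2, 3, 3, 2)
p3 <- c(1, 2, 1, 3, 1, 3, 1, 3, 2)
p4 <- c(1, 2, 1, 3, 1, 3, 1, 3, 2)

# Hypothetical helper (not from clue): adjusted Rand index of two
# label vectors, computed from their contingency table.
# Returns NaN in the degenerate case where both partitions are trivial.
adj_rand <- function(a, b) {
  stopifnot(length(a) == length(b))
  tab       <- table(a, b)
  sum_ij    <- sum(choose(tab, 2))            # pairs together in both
  sum_a     <- sum(choose(rowSums(tab), 2))   # pairs together in a
  sum_b     <- sum(choose(colSums(tab), 2))   # pairs together in b
  expected  <- sum_a * sum_b / choose(length(a), 2)
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}

# Keep the partitions in a list; c() would merge them into one vector.
parts <- list(p1 = p1, p2 = p2, p3 = p3, p4 = p4)

# Pairwise adjusted Rand indices as a 4 x 4 matrix.
ari <- sapply(parts, function(a) sapply(parts, function(b) adj_rand(a, b)))
print(round(ari, 3))

# Assumed clue usage: coerce each vector to a partition, then build
# the ensemble from the list of partitions.
if (requireNamespace("clue", quietly = TRUE)) {
  ens <- clue::cl_ensemble(list = lapply(parts, clue::as.cl_partition))
  print(clue::cl_agreement(ens, method = "cRand"))
}
```

Since p3 and p4 are identical, their agreement is 1 under either route,
which makes a quick sanity check.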

I have also tried to work around this problem by creating the partitions
via k-means, but I do not get the same partitions I need to validate. I
am sure the following code needs improvement, especially the way the
results are put into a list through a for loop (ouch). I would be very
grateful for comments on improving this terrible piece of R work (is it
easier to do something with apply?).

Thank you very much for your help and support
Mark

mat <- matrix( c(6,7,8,2,3,4,12,14,14, 14,15,13,3,1,2,3,4,2,
15,3,10,5,11,7,13,6,1, 15,4,10,6,12,8,12,7,1), ncol=9, byrow=TRUE )
rownames(mat) <- paste("v", 1:4, sep="" )

clus.mat <- vector(mode="list", length=4)
for (i in 1:4){
        ## run kmeans on each row (clustering per single variable)
        clus.mat[[i]] <- kmeans(mat[i,], centers=3, nstart=1,
                                algorithm="MacQueen")
}

clus.mat
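
On the apply question, here is a minimal sketch of how the loop might be
replaced. apply() over the rows returns the cluster labels as a single
matrix with one column per variable. The seed and nstart = 25 are
assumptions added here: with nstart = 1 the result depends on a single
random start and can change from run to run. The labels returned by
kmeans are arbitrary, so a partition can match p1 while using different
numbers. That is one reason to compare partitions with a cross-table or
the Rand index rather than element by element.

```r
mat <- matrix(c( 6,  7,  8, 2,  3, 4, 12, 14, 14,
                14, 15, 13, 3,  1, 2,  3,  4,  2,
                15,  3, 10, 5, 11, 7, 13,  6,  1,
                15,  4, 10, 6, 12, 8, 12,  7,  1),
              ncol = 9, byrow = TRUE)
rownames(mat) <- paste0("v", 1:4)

set.seed(1)  # arbitrary seed, assumed here only for reproducibility

# Cluster the nine values of one variable and return the label vector.
cluster_one <- function(x) {
  kmeans(x, centers = 3, nstart = 25, algorithm = "MacQueen")$cluster
}

# One column of labels per variable (9 objects x 4 variables).
cl_labels <- apply(mat, 1, cluster_one)
print(cl_labels)

# Cross-tabulate against a reference partition; matching partitions
# show one non-zero cell per row and column, whatever the label numbers.
p1 <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
print(table(kmeans = cl_labels[, "v1"], p1 = p1))
```

Using lapply(seq_len(nrow(mat)), ...) instead would keep the full kmeans
objects in a list, as in the original loop.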

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: Clustering and Rand Index

Christian Hennig
Hi Mark,

I don't have the time at the moment to work through your code, but the
adjusted Rand index can be computed with the function cluster.stats in
package fpc.
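
A minimal sketch of this suggestion, assuming an installed fpc whose
cluster.stats() accepts the alt.clustering and compareonly arguments and
returns a corrected.rand component. If fpc is missing, the block only
defines the inputs and leaves the result as NA. The dissimilarities are
taken from the first variable of the matrix in the question, purely as an
assumption for illustration, because cluster.stats() works on a distance
object.

```r
p1 <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
p2 <- c(1, 1, 1, 3, 2, 2, 3, 3, 2)

# Values of variable v1 from the question, used only to build a
# distance object (an assumption for this sketch).
v1 <- c(6, 7, 8, 2, 3, 4, 12, 14, 14)
d  <- dist(v1)

ari <- NA_real_
if (requireNamespace("fpc", quietly = TRUE)) {
  # compareonly = TRUE is assumed to restrict the output to the
  # comparison statistics between the two clusterings.
  stats <- fpc::cluster.stats(d, clustering = p1, alt.clustering = p2,
                              compareonly = TRUE)
  ari <- stats$corrected.rand
}
print(ari)
```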

Best,
Christian

On Sat, 7 Jan 2006, Mark Hempelmann wrote:

> [...]

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
[hidden email], www.homepages.ucl.ac.uk/~ucakche
