question regarding using weights in the hierarchical/ kmeans clustering process

Hi R users!

I have a bit of a problem with using a hierarchical clustering algorithm:

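 b <- rep(1:3, 5)
 c <- rnorm(15, 0, 1)
 d <- sample(1:100, 15, replace = TRUE)
 e <- sample(1:100, 15, replace = TRUE)
 f <- sample(1:100, 15, replace = TRUE)
 # collect the variables in one data frame so that data$c, data$d, ... exist
 data <- data.frame(b, c, d, e, f)
 q <- data.frame(d = data$d, e = data$e, f = data$f)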

What I want to do is run a hierarchical cluster analysis on the q data frame, using data$c as a weighting variable. Can this be done? Or is there a package that would let me use my weights in a hierarchical clustering process?
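One possible route in base R is the `members` argument of hclust(), which treats each row as a cluster that already contains the given number of observations. The sketch below is only an illustration under assumptions: the weights are taken to be per-observation and positive (the simulated normal variable can be negative, so a hypothetical positive vector `w` is built from it), and average linkage is an arbitrary choice.

```r
# Sketch: observation-weighted hierarchical clustering via hclust(members = ).
set.seed(1)
data <- data.frame(
  c = rnorm(15),
  d = sample(1:100, 15, replace = TRUE),
  e = sample(1:100, 15, replace = TRUE),
  f = sample(1:100, 15, replace = TRUE)
)
q <- data[, c("d", "e", "f")]

# Assumption: weights must be positive, so abs() plus a small offset is
# used here purely for illustration; this is not part of the original data.
w <- abs(data$c) + 0.1

# 'members' tells hclust how many observations each starting cluster holds,
# so heavier rows pull the average-linkage distances toward themselves.
hc <- hclust(dist(scale(q)), method = "average", members = w)

# Cut the tree into three groups and inspect their sizes
groups <- cutree(hc, k = 3)
table(groups)
```

Whether a frequency-style reading of the weights is appropriate depends on what the weights mean in the actual data.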

Another question:
Say I wanted to run a t.test on data$d and data$e, again with data$c as weights. How could that be done?
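Base t.test() has no weights argument, but a weighted comparison of two means can be sketched with lm(), which does accept weights. This is a rough sketch under assumptions: the weights are treated as precision-style weights, the variance is pooled (closer to t.test(..., var.equal = TRUE) than to the Welch default), and `w` is a hypothetical positive vector standing in for data$c.

```r
# Sketch: weighted two-group comparison of means using lm(weights = ).
set.seed(1)
d <- sample(1:100, 15, replace = TRUE)
e <- sample(1:100, 15, replace = TRUE)
w <- abs(rnorm(15)) + 0.1   # hypothetical positive weights (assumption)

# Stack both variables into long format, carrying each row's weight along
long <- data.frame(
  value = c(d, e),
  group = factor(rep(c("d", "e"), each = 15)),
  w     = rep(w, times = 2)
)

# The coefficient 'groupe' is the weighted mean of e minus the weighted
# mean of d; its t value and p-value play the role of the weighted t-test.
fit <- lm(value ~ group, data = long, weights = w)
coef(summary(fit))["groupe", ]
```

The estimate can be checked against weighted.mean(e, w) - weighted.mean(d, w).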

And the last two questions:
1. How can I weight a whole data frame so that the weights carry over to a specific analysis, such as a cluster analysis, a t.test, or any other analysis that does not offer a "weight" option? I am looking for something like SPSS, where I can weight a whole data frame and use it in a subsequent analysis, or something like the survey package in R but flexible enough to work with any analysis I want (as far as I can see, the survey package supports only a limited set of analyses).
2. Why does a kmeans cluster analysis give so many different results?
I ran both of these several times:
>cclust(scale(q), 3, verbose=T)
>kmeans(scale(q), 3)
and both seem very unstable with respect to the cluster sizes, even with this small data frame, and I don't know why. Does it always behave like this?
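For what it's worth, both points can be sketched in base R under some assumptions. If the weights are integer case counts (as with SPSS frequency weights), the rows can simply be replicated, and the expanded data frame passed to any function. The run-to-run variation in kmeans() comes from its random starting centers, so fixing the seed and using several random starts (`nstart`) usually makes the result repeatable. The vector `freq` below is hypothetical and not part of the original data.

```r
# Sketch: frequency-weight a data frame by row replication, then run a
# reproducible kmeans on it.
set.seed(42)
data <- data.frame(
  d = sample(1:100, 15, replace = TRUE),
  e = sample(1:100, 15, replace = TRUE),
  f = sample(1:100, 15, replace = TRUE)
)
freq <- sample(1:5, 15, replace = TRUE)  # hypothetical integer case weights

# Repeat row i freq[i] times; the result works with any unweighted analysis
expanded <- data[rep(seq_len(nrow(data)), times = freq), ]

# Same seed plus multiple random starts gives the same partition each time
set.seed(123)
km1 <- kmeans(scale(expanded), centers = 3, nstart = 25)
set.seed(123)
km2 <- kmeans(scale(expanded), centers = 3, nstart = 25)

identical(km1$cluster, km2$cluster)
table(km1$cluster)
```

Row replication inflates the apparent sample size, so standard errors and p-values from the expanded data are only meaningful when the weights really are counts.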

Thank you and have a great day!!


