cluster analysis in R


cluster analysis in R

KitKat
I have two issues.

1- I am trying to use morphology to identify gender. I have 9 variables, both continuous and categorical. I was using two-step cluster analysis in SPSS because two-step can deal with different types of variables. But the output only tells me that an animal is in cluster 1 or 2; it does not give me a probability (e.g. 0.70 for cluster 2). I also did not want to specify that I want two clusters; I wanted to see if the analysis would naturally give me two clusters. Handling mixed variable types and not having to fix the number of clusters were the advantages of using SPSS, but without probabilities I'm now having trouble.

Does cluster analysis in R give probabilities?
Which type of cluster analysis in R is best to use? I did not think hierarchical analysis was a great choice, but maybe I'm wrong. I don't want to create the average variable myself; I want the analysis to do it on its own.
I'm also new to R, so I would have to figure out the right code to enter, etc.

2- I was also told to analyze each variable on its own before including it in cluster analysis. I had first included them all and then teased out which ones were not important, but now I have been asked to do the reverse. I cannot do cluster analysis on one variable. For example, one variable is either present or absent on an individual, so of course cluster analysis gives me two clusters, one representing present and one representing absent. I was told to use regression, but how would regression not give the same result? I feel like it would give me a line connecting a bunch of 0s to 1s. I don't know what to use, or whether I can analyze each variable like this before putting them into cluster analysis. Ultimately I want to use only the smallest number of variables necessary to identify gender.

I have tried reading manuals etc. and talking to people at my school, but nothing has helped. If anyone has any insight, that would be much appreciated.
Thank you!

Re: cluster analysis in R

Ingmar Visser
Dear KitKat,

After installing R and reading some introductory material on getting
started with R, you may want to check the CRAN task view on cluster analysis:
http://cran.r-project.org/web/views/Cluster.html
It has many useful references to all kinds and flavors of clustering
techniques, hierarchical or not, including methods that select the number
of clusters based on some model selection statistic, et cetera.

hth, Ingmar

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: cluster analysis in R

Jose Iparraguirre
Have a look at the package mclust.
Jose
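
For readers landing on this thread, here is a minimal sketch of what mclust offers for the first question. It assumes the mclust package is installed, and it uses the built-in iris measurements purely as a stand-in for the morphology data. Note that Mclust() fits Gaussian mixtures, so it only takes the continuous variables; the categorical ones would need a different method.

```r
# Sketch only: iris stands in for the real data (continuous columns only).
library(mclust)

X <- iris[, 1:4]

# Let BIC choose the number of components instead of fixing it at 2.
fit <- Mclust(X, G = 1:5)

fit$G                     # number of clusters chosen by BIC
head(round(fit$z, 2))     # posterior probability of each cluster, one row per individual
head(fit$classification)  # hard assignment (the most probable cluster)
```

The matrix fit$z is the kind of output asked about (for example 0.70 for cluster 2), and each row sums to 1.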

Re: cluster analysis in R

Hennig, Christian
In reply to this post by KitKat
Dear Katherine,

The function flexmixedruns in package fpc may do what you want: it fits mixtures with continuous and categorical variables, can use the BIC to choose the number of mixture components, and also gives posterior probabilities for cases to belong to components.
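
A rough sketch of how such a call might look, modelled on the example in the package's help page. It assumes the packages fpc and flexmix are installed; the data are simulated and the variable names (len, wid, crest) are made up for illustration. The continuous columns come first, followed by the categorical ones.

```r
# Sketch only: simulated data; requires packages fpc and flexmix.
library(fpc)

set.seed(1)
n <- 100
grp <- rep(1:2, each = n / 2)                        # hidden groups, used only to simulate
len <- rnorm(n, mean = c(10, 14)[grp])               # continuous variable
wid <- rnorm(n, mean = c(5, 7)[grp])                 # continuous variable
crest <- ifelse(runif(n) < c(0.1, 0.9)[grp], 2, 1)   # categorical: 1 = absent, 2 = present
dat <- cbind(len, wid, crest)                        # continuous columns first

# Try 1 to 3 components; BIC picks among them.
fr <- flexmixedruns(dat, continuous = 2, discrete = 1,
                    n.cluster = 1:3, simruns = 3, allout = FALSE)

fr$optimalk                                  # number of components chosen by BIC
head(round(fr$flexout@posterior$scaled, 2))  # posterior membership probabilities
```

With allout = FALSE, fr$flexout holds only the fitted flexmix object for the chosen number of components, which keeps the result easy to inspect.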

Note that finding the right cluster analysis method is generally a complicated task and depends crucially on your application, what use you want to make of the clusters, etc., so what is best cannot be settled conclusively on a mailing list. The same holds for whether and how to select variables. It is certainly not wrong in general to use all the variables that you have, but whether a subset would be better depends on what your variables mean, how this relates to the aim of clustering, what you want to do with the variables afterwards, etc.

You may have a look at
http://www.rss.org.uk/site/cms/contentviewarticle.asp?article=866#Link%20to%20Nov.%202012%20paper
where I discuss a number of related issues.

Best regards,
Christian


*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
[hidden email], www.homepages.ucl.ac.uk/~ucakche


Re: cluster analysis in R

KitKat
In reply to this post by KitKat
Thank you for replying!
I made a new post asking whether there are any websites or files on how to download the package mclust (or other Bayesian cluster analysis packages) and which R functions to use. Sorry, I don't know how this forum works yet.

Re: cluster analysis in R

signal


http://cran.r-project.org/web/views/Cluster.html

might be a good start

Brian
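
On the "how to download" part of the question, a small sketch in base R. The helper name missing_pkgs is made up for this example, and the package vector is just the two packages named in this thread. The sketch only reports what is missing rather than installing anything, so it can be run without a network connection.

```r
# Packages mentioned in this thread; edit the vector as needed.
wanted <- c("mclust", "fpc")

# Helper (hypothetical name): which of these are not installed yet?
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_pkgs(wanted)
if (length(to_install) > 0) {
  message("Run once: install.packages(c(",
          paste0('"', to_install, '"', collapse = ", "), "))")
}

# Installing is done once; loading is needed in every new R session:
# library(mclust)
# ?Mclust    # help page for the main fitting function
```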


Re: cluster analysis in R

KitKat
Thanks, I have been trying that site and another one (http://www.statmethods.net/advstats/cluster.html)

I don't know if I should be using mclust or mcclust, but either way, the code is not working. I am following the guidelines online at:
mcclust - http://cran.r-project.org/web/packages/mcclust/mcclust.pdf
mclust - http://cran.r-project.org/

I am relatively new to R, but so far I have been able to figure out dfa, manova, pca... I cannot get this code to work; I keep getting various errors. Are there other resources that give details about what code to use or what to do when errors occur? I have not found anything else helpful.

Thank you

Re: cluster analysis in R

Ingmar Visser
It's hard to answer these questions without knowing what the errors are and
how they can be reproduced.
Best, Ingmar
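
As an illustration of what makes an error reproducible for the list, here is a base R sketch. The small data frame and the deliberately wrong call are invented for the example; the point is the three pieces of information that are collected.

```r
# Sketch: collect what a helper needs. The data frame is a made-up stand-in.
mydat <- data.frame(len = c(10.1, 12.3, 9.8), crest = c(0, 1, 0))

# 1. A copy-and-paste version of (a small part of) the data.
dput(head(mydat))

# 2. The exact call and the exact error text it produces.
res <- tryCatch(
  mydat[, "no_such_column"],            # deliberately wrong call, for illustration
  error = function(e) conditionMessage(e)
)
print(res)

# 3. The R version and the versions of the loaded packages.
print(sessionInfo())
```

Pasting the output of these three steps into a post usually lets others rerun the failing call on their own machine.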


Re: cluster analysis in R

KitKat
In reply to this post by KitKat
These are the errors I've been having. I have been trying 3 different things:

1- Mclust:
This is the example I have been following:
# Model Based Clustering
library(mclust)
fit <- Mclust(mydata)
plot(fit, mydata) # plot results
print(fit) # display the best model
 
What I have done:
> fit <- Mclust(mydat)
> plot(fit, mydat) #plot results
Error in match.arg(what, c("BIC", "classification", "uncertainty", "density"),  :
  'arg' must be NULL or a character vector
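
A note on this error, offered as a likely explanation rather than a confirmed diagnosis: the message comes from matching the argument what, which suggests that the second positional argument in plot(fit, mydat) is being read as the plot type rather than as data. In current versions of mclust the plot method for an Mclust object takes the plot type through what, so the older example's call no longer fits. A sketch, assuming mclust is installed and using iris as stand-in data:

```r
library(mclust)

mydat <- iris[, 1:4]     # stand-in for the real (numeric) data
fit <- Mclust(mydat)

# Name the plot type explicitly instead of passing the data again.
plot(fit, what = "BIC")             # BIC for each model and number of clusters
plot(fit, what = "classification")  # individuals coloured by assigned cluster
```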

2- Mclust using different website (cran-r) instructions
This is the example:
> mydatMclust <- Mclust(mydat)
> summary(mydatMclust)
> summary(mydatMclust, parameters = TRUE)
> plot(mydatMclust)

There are a couple of other steps, but the plot is the problem. I get two plots where there should be four. One should be plotting all my individuals, but it's plotting my variables instead. It's also taking a very long time. R at this point says: "Waiting to confirm page change…"
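
Two remarks that may explain this, with a sketch. "Waiting to confirm page change…" is what R shows while it waits for a click on the plot window (or Enter in the console) before drawing the next plot, so the remaining plots have probably not been drawn yet rather than gone missing. And with more than two variables the classification plot is a matrix of panels, one per pair of variables, with the individuals as points inside each panel. The sketch below assumes mclust is installed, uses iris as stand-in data, and writes all four plots to a PDF so nothing waits for input; the file name is arbitrary.

```r
library(mclust)

mydat <- iris[, 1:4]                 # stand-in for the real (numeric) data
mydatMclust <- Mclust(mydat)

# Write every plot type to one PDF, one page per plot, without prompting.
out <- file.path(tempdir(), "mclust_plots.pdf")
pdf(out)
for (w in c("BIC", "classification", "uncertainty", "density")) {
  plot(mydatMclust, what = w)
}
dev.off()

out   # path of the PDF that was written
```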

3- Mcclust
Instructions from cran-r:
data(cls.draw2)
# sample of 500 clusterings from a Bayesian cluster model
tru.class <- rep(1:8,each=50)
# the true grouping of the observations
psm2 <- comp.psm(cls.draw2)
# posterior similarity matrix
# optimize criteria based on PSM
mbind2 <- minbinder(psm2)
mpear2 <- maxpear(psm2)
# Relabelling
k <- apply(cls.draw2,1, function(cl) length(table(cl)))
max.k <- as.numeric(names(table(k))[which.max(table(k))])
relab2 <- relabel(cls.draw2[k==max.k,])
# compare clusterings found by different methods with true grouping
arandi(mpear2$cl, tru.class)
arandi(mbind2$cl, tru.class)
arandi(relab2$cl, tru.class)

I called my data mydat, so I changed that where appropriate. I cannot get past one early step, psm2 <- comp.psm(cls.draw2). The error reads: "Error: could not find function "comp.psm""
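
A "could not find function" error usually means the package that defines the function is not attached in the current session; installing a package does not load it. The sketch below checks for that in base R (the helper name is_attached is made up). Separately, going by the comments in the example itself, cls.draw2 is a sample of clusterings from a Bayesian cluster model, so replacing it with a table of raw measurements is unlikely to give meaningful results even once the function is found.

```r
# Helper (hypothetical name): is a package attached in this session?
is_attached <- function(pkg) paste0("package:", pkg) %in% search()

print(is_attached("mcclust"))                  # typically FALSE in a fresh session
print(exists("comp.psm", mode = "function"))   # whether R can currently see comp.psm

if (requireNamespace("mcclust", quietly = TRUE)) {
  library(mcclust)                             # attach the installed package
  print(exists("comp.psm", mode = "function")) # should now be visible
} else {
  message('mcclust is not installed; run install.packages("mcclust") first')
}
```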

I think I have all appropriate packages installed. I don't know what more to do on these three errors.  Any help would be great! Thank you