mclust: modelName="E" vs modelName="V"

Nico902
Hi,

I'm trying to use the mclust library to fit a Gaussian mixture to a numeric vector. The call Mclust(data,G=3) runs fine, but the fit is not optimal and it uses modelNames="E". When I try Mclust(data,G=3,modelName="V") I get the following message:

Error in if (Sumry$G > 1) ans[c(orderedNames, "z")] else ans[orderedNames] :
  argument is of length zero
In addition: Warning message:
In pickBIC(object[as.character(G), modelNames, drop = FALSE], k = 3) :
  none of the selected models could be fitted


A model with variable variance would fit my data better. Any idea how to do this?

Thanks a lot.

Re: mclust: modelName="E" vs modelName="V"

Christian Hennig
This normally happens if the algorithm gets caught in a solution where one
of the components has variance converging to zero.

One way of dealing with this is the use of a prior that penalises too
small variances. This works through the prior argument of Mclust (the
defaultPrior should do the trick but I currently don't have the time to
figure out again how to do this precisely; I have done it before with
success).

Another option is to have a look at the flexmix package.

Best regards,
Christian
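
The prior-based approach suggested above can be sketched roughly as follows. This is a minimal sketch, assuming a reasonably recent mclust version; the simulated vector x is hypothetical and only stands in for the poster's data. modelNames is the documented argument name (the modelName spelling used in the thread relies on R's partial argument matching), and priorControl() with no arguments selects defaultPrior.

```r
library(mclust)

set.seed(1)
# Hypothetical data (an assumption, not the poster's data):
# three univariate Gaussian clusters with unequal variances.
x <- c(rnorm(200, -3, 1), rnorm(80, 0, 0.1), rnorm(100, 3, 1.2))

# Unequal-variance univariate model ("V") fitted with the default
# conjugate prior, which regularises the component variances so that
# none of them can collapse towards zero.
fit <- Mclust(x, G = 3, modelNames = "V", prior = priorControl())

# Estimated component variances
fit$parameters$variance$sigmasq
```

With a prior in place the reported estimates are posterior modes rather than plain maximum likelihood estimates, so they can differ slightly from an unregularised fit.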

On Sun, 4 Sep 2011, Nico902 wrote:

> [...]

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
[hidden email], www.homepages.ucl.ac.uk/~ucakche

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: mclust: modelName="E" vs modelName="V"

Nico902
Hi,

Thanks a lot for your answer. I was indeed able to get rid of this message by doing:

> resClust <- Mclust(data,G=3,modelName="V",prior=priorControl(scale=c(1.44,0.81,0.49)));


However, I would like to be able to retrieve the variances I defined in the result. I found:

> resClust$parameters
$Vinv
NULL

$pro
[1] 0.5502496 0.1986852 0.2510652

$mean
            1             2             3
-2.8390006980 -0.0003267873  3.1072574619

$variance
$variance$modelName
[1] "V"

$variance$d
[1] 1

$variance$G
[1] 3

$variance$sigmasq
[1] 0.840267666 0.009466821 1.510263146

$variance$scale
[1] 0.840267666 0.009466821 1.510263146


I cannot work out where sigmasq comes from. I tried taking the square root or the square of sigmasq, but neither corresponds to what I defined. I found nothing in the manual. If I am missing something obvious, or if somebody has the solution, it would help me a lot. I want to retrieve those values automatically to plot the individual fitted curves and to be sure the fit is doing what I want.

Thank you very much again.

Re: mclust: modelName="E" vs modelName="V"

Christian Hennig
I probably don't understand the problem. I'd assume that variance$sigmasq are
the three estimated component variances (probably estimated by maximum a
posteriori, but consult the mclust documentation).

What's wrong with that?

(The values you submit as scale in "prior" are not fixed variances, but
parameters of the prior distribution - your problem may be that you
believe that they are meant to be variances fixed by you!?)

Christian
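
If sigmasq is read as the estimated component variances, as suggested above, the fitted curves can be drawn with base R alone. The following is only a sketch: the numbers are copied from the resClust$parameters output earlier in the thread, while the helper names component_density and mixture_density and the plotting grid are made up for illustration.

```r
# Parameter values copied from the resClust$parameters output in the thread
pro     <- c(0.5502496, 0.1986852, 0.2510652)        # mixing proportions
mu      <- c(-2.8390006980, -0.0003267873, 3.1072574619)
sigmasq <- c(0.840267666, 0.009466821, 1.510263146)  # component variances

# Density of component k, weighted by its mixing proportion.
# dnorm() expects a standard deviation, hence sqrt(sigmasq).
component_density <- function(x, k) {
  pro[k] * dnorm(x, mean = mu[k], sd = sqrt(sigmasq[k]))
}

# Mixture density = sum of the weighted component densities
mixture_density <- function(x) {
  Reduce(`+`, lapply(seq_along(pro), function(k) component_density(x, k)))
}

# Hypothetical plotting grid, chosen to cover all three components
grid <- seq(-7, 8, length.out = 1000)
plot(grid, mixture_density(grid), type = "l", xlab = "x", ylab = "density")
for (k in seq_along(pro)) lines(grid, component_density(grid, k), lty = 2)
```

On a fitted object the same three vectors would be read from resClust$parameters$pro, resClust$parameters$mean and resClust$parameters$variance$sigmasq instead of being typed in by hand.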

On Tue, 6 Sep 2011, Nico902 wrote:

> [...]

Re: mclust: modelName="E" vs modelName="V"

Nico902
"What's wrong with that?

(The values you submit as scale in "prior" are not fixed variances, but
parameters of the prior distribution - your problem may be that you
believe that they are meant to be variances fixed by you!?)"

Yes, I did, so I think it is not possible to fix the variance. Anyway, thanks a lot for your help; I think I will find a way to do what I want.
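
One possible way to hold the variances fixed, outside mclust, is a small hand-written EM loop that updates only the mixing proportions and the means. The sketch below is an illustration under stated assumptions, not something proposed in the thread: the function fixed_var_em, the simulated data and the starting means are all hypothetical, and the values 1.44, 0.81, 0.49 are reused from the earlier post as the variances to be fixed.

```r
# Minimal EM for a univariate Gaussian mixture with FIXED component
# variances (hypothetical helper, base R only, not part of mclust).
fixed_var_em <- function(x, mu, sigmasq,
                         pro = rep(1 / length(mu), length(mu)),
                         max_iter = 500, tol = 1e-8) {
  sd_k <- sqrt(sigmasq)
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: weighted component densities, one column per component
    dens <- sapply(seq_along(mu),
                   function(k) pro[k] * dnorm(x, mu[k], sd_k[k]))
    mix <- rowSums(dens)
    z <- dens / mix                      # posterior membership probabilities
    # Log-likelihood at the parameters used in this E-step
    loglik <- sum(log(mix))
    # M-step: update proportions and means only; variances stay untouched
    pro <- colMeans(z)
    mu <- colSums(z * x) / colSums(z)
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(pro = pro, mean = mu, sigmasq = sigmasq, loglik = loglik, z = z)
}

set.seed(42)
# Hypothetical data simulated with the variances that are then held fixed
x <- c(rnorm(300, -3, 1.2), rnorm(100, 0, 0.9), rnorm(150, 3, 0.7))
fit <- fixed_var_em(x, mu = c(-2, 0, 2), sigmasq = c(1.44, 0.81, 0.49))
fit$pro
fit$mean
```

Because the variances never change, the degenerate solution with one variance shrinking to zero cannot occur here, at the cost of a fit that is only as good as the variances supplied.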