Distribution Fitting and fitdistr()

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Distribution Fitting and fitdistr()

Lorenzo Isella
Dear All,
I think I understood how to use the fitdistr() function to fit random
draws to a certain distribution.
I first tested it on a unimodal Gaussian distribution:

rm(list=ls())
library(MASS)
set.seed(123)
mydata<-rnorm(10000,2.0,4.0)
gauden<-function(x,mu,sig)
{
1/(sqrt(2*pi)*sig)*exp(-1/(2*sig^2)*(x-mu)^2)
}
myfit<-fitdistr(mydata,gauden,start=list(mu=5.0,sig=5.0),upper=c(20,20))
myfit2<-fitdistr(mydata,"normal")

seeing the (expected) agreement between myfit and myfit2.

Then I tested it on a set of draws from a mixture made up of three
Gaussian distributions.
This time I used 9 fitting parameters:
- 3 weights N1,N2,N3 for the components of the Gaussian mixture
- 3 standard deviations sig1,sig2,sig3
- 3 means mu1,mu2,mu3

Here is the code I used:

rm(list=ls())
library(MASS)
set.seed(123)
NN<-10000
three<-3 #number of gaussian sequences I want to generate
mysample<-matrix(nrow=NN,ncol=three)
sdvec<-seq(1:three)
sdvec[1]<-4
sdvec[2]<-3
sdvec[3]<-3.5
meanvec<-seq(1:three)
meanvec[1]<-20
meanvec[2]<-50
meanvec[3]<-70

for (i in 1:three)
{
 mysample[ ,i] <-rnorm(NN,meanvec[i],sdvec[i])
}
dim(mysample)<-c(3*NN,1)
plot(density(mysample,kern="gaussian"),lwd=2,col=300)

mydistr<-function(x,mu1,sig1,N1,mu2,sig2,N2,mu3,sig3,N3)
{
N1/(sqrt(2*pi)*sig1)*exp(-1/(2*sig1^2)*(x-mu1)^2)+
N2/(sqrt(2*pi)*sig2)*exp(-1/(2*sig2^2)*(x-mu2)^2)+
N3/(sqrt(2*pi)*sig3)*exp(-1/(2*sig3^2)*(x-mu3)^2)
}
myfit<-fitdistr(mysample,mydistr,start=list(sig1=2,sig2=2,sig3=2,
N1=0.3,N2=0.3,N3=0.4,mu1=25,mu2=55,mu3=76))

But this time the fitdistr() is not able to carry out the optimization
and the warnings do not tell (me at least) that much.
Am I mistaking something?  Furthermore, how can I be sure that the
optimization will lead to weights such that N1+N2+N3=1?
I have not been able to try the mixdist package or any other tool, but
any suggestion about how to fit this threemodal distribution (which
should not be rocket science for someone more experienced than myself)
is welcome.
Best Regards

Lorenzo

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
Reply | Threaded
Open this post in threaded view
|

Re: Distribution Fitting and fitdistr()

Krishna Kumar-2
Lorenzo Isella wrote:

>Dear All,
>I think I understood how to use the fitdistr() function to fit random
>draws to a certain distribution.
>I first tested it on a unimodal Gaussian distribution:
>
>  
>

This only works for distributions that are specified below
 >>>
Distributions '"beta"', '"cauchy"', '"chi-squared"',
          '"exponential"', '"f"', '"gamma"', '"geometric"',
          '"log-normal"', '"lognormal"', '"logistic"', '"negative
          binomial"', '"normal"', '"Poisson"', '"t"' and '"weibull"'
          are recognised, case being ignored.
 >>>>>>>>

>Then I tested it on a set of draws from a mixture made up of three
>Gaussian distributions.
>This time I used 9 fitting parameters:
>- 3 weights N1,N2,N3 for the components of the Gaussian mixture
>- 3 standard deviations sig1,sig2,sig3
>- 3 means mu1,mu2,mu3
>
>  
>

The package nor1Mix has density , distribution and rng for mixtures but
no fitting so here is a  fitting function..

mydistr<-function(x,mu,sig,wt)
{
sum(wt*dnorm(x,mu,sig))
}

mydistrmle<-function(x, y = x) {

#mu goes from x1:3 sig goes x 4:6 and wt goes x 7:9
        f = -sum(log(mydistr(y, c(x[1],x[2],x[3]),    
c(x[4],x[5],x[6]),     c(x[7],x[8],x[9])  )))
        cat("\n Function value:  ", -f)
        cat("\n  Estimated parameters:       ",
c(x[1],x[2],x[3]),c(x[4],x[5],x[6]),c(x[7],x[8],x[9]), "\n")

        f
    }
   


#generate 1000 samples from norMix  mixture MW.nm2
x<-rnorMix(1000,MW.nm2)
hist(rnorMix(1000,MW.nm2))
#  Give it a starting values
mu<-c(-0.2,0.4,0.9)
sigma<-c(0.2, 0.2, 0.6)
wt<-c(0.3,0.3,0.2)
r<-optim(c(mu,sigma,wt),mydistrmle,control=list(maxit=50000))
print(matrix(r$par,3,3))
print(MW.nm2)

You will have to add additional constraints on the "wt" etc.  ( see
?constrOptim)

Further I don't think there is a guaranteed unique decomposition of
variance between the three gaussians. Just try different starting values
and also is there a reason to stop at 3 gaussians ?..

Krishna

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
Reply | Threaded
Open this post in threaded view
|

Re: Distribution Fitting and fitdistr()

Lorenzo Isella
On Mon, 2006-04-24 at 23:09 -0400, Krishna Kumar wrote:

> Lorenzo Isella wrote:
>
> >Dear All,
> >I think I understood how to use the fitdistr() function to fit random
> >draws to a certain distribution.
> >I first tested it on a unimodal Gaussian distribution:
> >
> >  
> >
>
> This only works for distributions that are specified below
>  >>>
> Distributions '"beta"', '"cauchy"', '"chi-squared"',
>           '"exponential"', '"f"', '"gamma"', '"geometric"',
>           '"log-normal"', '"lognormal"', '"logistic"', '"negative
>           binomial"', '"normal"', '"Poisson"', '"t"' and '"weibull"'
>           are recognised, case being ignored.
>  >>>>>>>>
>
> >Then I tested it on a set of draws from a mixture made up of three
> >Gaussian distributions.
> >This time I used 9 fitting parameters:
> >- 3 weights N1,N2,N3 for the components of the Gaussian mixture
> >- 3 standard deviations sig1,sig2,sig3
> >- 3 means mu1,mu2,mu3
> >
> >  
> >
>
> The package nor1Mix has density , distribution and rng for mixtures but
> no fitting so here is a  fitting function..
>
> mydistr<-function(x,mu,sig,wt)
> {
> sum(wt*dnorm(x,mu,sig))
> }
>
> mydistrmle<-function(x, y = x) {
>
> #mu goes from x1:3 sig goes x 4:6 and wt goes x 7:9
>         f = -sum(log(mydistr(y, c(x[1],x[2],x[3]),    
> c(x[4],x[5],x[6]),     c(x[7],x[8],x[9])  )))
>         cat("\n Function value:  ", -f)
>         cat("\n  Estimated parameters:       ",
> c(x[1],x[2],x[3]),c(x[4],x[5],x[6]),c(x[7],x[8],x[9]), "\n")
>
>         f
>     }
>    
>
>
> #generate 1000 samples from norMix  mixture MW.nm2
> x<-rnorMix(1000,MW.nm2)
> hist(rnorMix(1000,MW.nm2))
> #  Give it a starting values
> mu<-c(-0.2,0.4,0.9)
> sigma<-c(0.2, 0.2, 0.6)
> wt<-c(0.3,0.3,0.2)
> r<-optim(c(mu,sigma,wt),mydistrmle,control=list(maxit=50000))
> print(matrix(r$par,3,3))
> print(MW.nm2)
>
> You will have to add additional constraints on the "wt" etc.  ( see
> ?constrOptim)
>
> Further I don't think there is a guaranteed unique decomposition of
> variance between the three gaussians. Just try different starting values
> and also is there a reason to stop at 3 gaussians ?..
>
> Krishna

Thanks. There is not a particular reason to stop at three Gaussians
apart from the fact that I know that the data I will soon be given show
a 3 modal distribution.
Apart from implementing some extra constrains, are these sort of
problems numerically stable? I get plenty of warnings when I run your
code, although it executes.
Are those warnings anything to worry about or do they simply mean that
in some iterations the optimizer rejects the solution (though they seem
to deal with dnorm())?
Cheers

Lorenzo

_______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance