calculating goodness-of-fit statistics

Taka Matzmoto
Hi R users

I have a simple data set for calculating goodness-of-fit statistics (e.g., X2 by
Pearson, G2 by Wilks):

#################################################
observed<-c(424,174,0,402)
expected<-c(282.7174, 314.2972, 142.3142, 260.6712)
2*sum(observed*log(observed/expected)) # for X2
sum((observed-expected)^2/expected)  # for G2
#################################################

(Note: the expected values were calculated from a model I used, not from the
marginals of the observed values.)

The third element of the observed vector is zero.

For third element, 0 * log(0/142.3142) is NaN. That is why I got NaN for G2.

I think 0 multiplied by anything should be zero. Am I wrong?

Are there any R functions that correct for zero cells when calculating G2? If
there are, I would like to know of some references justifying the correction.

Thank you in advance

TM

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: calculating goodness-of-fit statistics

Pontarelli, Brett
The trouble is log(0/anything) = log(0) = NaN.  If you want them to evaluate to zero you might try zeroing out the values you know will be NaN:

terms = observed*log(observed/expected);  # per-cell terms; NaN where observed is 0
terms[observed==0] = 0;                   # zero those cells before summing
X2 = 2*sum(terms);

--Brett
 


Re: calculating goodness-of-fit statistics

Brian Ripley
On Tue, 31 Jan 2006, Pontarelli, Brett wrote:

> The trouble is log(0/anything) = log(0) = NaN.

Hmm: log(0) = -Inf.

Taka Matzmoto said

> I think 0 multiplied by anything should be zero. Am I wrong ?

Yes!  0 * Inf = NaN, 0 * -Inf = NaN, and 0 * NaN = NaN.
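
The arithmetic above can be checked directly at the R prompt. The following is a minimal sketch in base R, reusing the observed and expected vectors from the original post; the name `terms` is introduced here only for illustration.

```r
# Base R only; no packages assumed.
log(0)        # -Inf, not NaN
0 * log(0)    # NaN, because 0 * -Inf is undefined in floating-point arithmetic
0 * Inf       # NaN

observed <- c(424, 174, 0, 402)
expected <- c(282.7174, 314.2972, 142.3142, 260.6712)

# Per-cell contributions to the log-likelihood-ratio sum.
terms <- observed * log(observed / expected)
is.nan(terms) # FALSE FALSE TRUE FALSE: only the zero cell is NaN
sum(terms)    # NaN: a single NaN term propagates through sum()
```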

Conventionally 0 log 0 = 0, since this is the limit of x log x as x -> 0.
That is also what is appropriate in the G^2 formula since it refers to a
Poisson(0).
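
For reference, the limit behind that convention can be worked out with L'Hopital's rule. The notation O_i and E_i for observed and expected counts is an assumption made here, not taken from the thread.

```latex
\lim_{x \to 0^+} x \log x
  = \lim_{x \to 0^+} \frac{\log x}{1/x}
  = \lim_{x \to 0^+} \frac{1/x}{-1/x^{2}}
  = \lim_{x \to 0^+} (-x)
  = 0,
\qquad\text{so in}\qquad
G^{2} = 2 \sum_{i} O_i \log\frac{O_i}{E_i}
\quad\text{a cell with } O_i = 0 \text{ contributes } 0 .
```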

> If you want them to evaluate to zero you might try zeroing out the
> values you know will be NaN:
>
> terms = observed*log(observed/expected);  # per-cell terms; NaN where observed is 0
> terms[observed==0] = 0;                   # zero those cells before summing
> X2 = 2*sum(terms);

A trick from way back by Bill Venables is to use pmax(observed, 1) inside
the log.
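
A minimal sketch of how that trick might be applied to the data from the original post follows. It assumes the observed values are non-negative integer counts and the expected values are strictly positive; with fractional counts between 0 and 1, pmax(observed, 1) would change the result. The names `X2`, `G2`, `terms` and `G2_alt` are chosen here for illustration.

```r
observed <- c(424, 174, 0, 402)
expected <- c(282.7174, 314.2972, 142.3142, 260.6712)

# Pearson X^2: no logarithm involved, so zero counts need no special handling.
X2 <- sum((observed - expected)^2 / expected)

# Likelihood-ratio G^2 with pmax() inside the log: a zero cell becomes
# 0 * log(1 / expected), which is exactly 0 instead of NaN.
G2 <- 2 * sum(observed * log(pmax(observed, 1) / expected))

# The same statistic with the zero cells set to 0 explicitly.
terms <- ifelse(observed == 0, 0, observed * log(observed / expected))
G2_alt <- 2 * sum(terms)

all.equal(G2, G2_alt)  # TRUE
```

Both forms leave the nonzero cells untouched, so they agree whenever the counts are integers; the pmax() version just avoids the extra indexing step.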

> Hi R users
>
> I have a simple data for calculating goodness-of-fit statistics (e.g.,
> X2 by Pearson, G2 by Wilks)

You have these labelled backwards!

> #################################################
> observed<-c(424,174,0,402)
> expected<-c(282.7174, 314.2972, 142.3142, 260.6712)
> 2*sum(observed*log(observed/expected)) # for X2
> sum((observed-expected)^2/expected)  # for G2
> #################################################

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
