boot() with glm/gnm on a contingency table

boot() with glm/gnm on a contingency table

Milan Bouchet-Valat
Hi everyone!

In a package I'm developing, I have created a custom function to get
jackknife standard errors for the parameters of a gnm model (which is
essentially the same as a glm model for this issue). I'd like to add
support for bootstrap using package boot, but I couldn't find how to
proceed.

The problem is, my data is a table object. Thus, I don't have one
individual per line: when the object is converted to a data frame, one
row represents one cell, or one combination of factor levels. I cannot
pass this to boot() as the "data" argument and use "indices" from my
custom statistic() function, since I would drop cells, not individual
observations.

A very inefficient solution would be to create a data frame with one row
per observation, by replicating each cell using its frequencies. That's
really a brute force solution, though. ;-)
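
[Editor's sketch] For reference, the brute-force expansion described above could look roughly like this in base R. The table `tab` and its dimension names `A` and `B` are made-up illustrative data, not taken from the package under discussion.

```r
# Hypothetical 2 x 3 table of counts (illustrative data only)
tab <- as.table(matrix(c(10, 5, 8, 12, 3, 7), nrow = 2,
                       dimnames = list(A = c("a1", "a2"),
                                       B = c("b1", "b2", "b3"))))

# One row per cell, with the cell count in column Freq
cells <- as.data.frame(tab)

# Replicate each cell row Freq times to get one row per observation
long <- cells[rep(seq_len(nrow(cells)), cells$Freq), c("A", "B")]

# Tabulating the long data frame recovers the original counts
rebuilt <- table(long)
```

The long data frame has sum(tab) rows, so it can be passed to boot() as "data" in the usual way, at the cost of the extra memory mentioned above.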

The other way would be to generate importance weights based on observed
frequencies, and to multiply the original data by the weights at each
iteration, but I'm not sure that's correct. Thoughts?


Thanks for your help!

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: boot() with glm/gnm on a contingency table

Tim Hesterberg-2
One approach is to bootstrap the vector 1:n, where n is the number
of individuals, with a function that does:
f <- function(vectorOfIndices, theTable) {
  # (1) Create a new table with the same dimensions, but with the counts
  #     in the table based on vectorOfIndices.
  # (2) Calculate the statistics of interest on the new table.
}

When f is called with 1:n, the table it creates should be the same
as the original table.  When called with a bootstrap sample of
values from 1:n, it should create a table corresponding to the
bootstrap sample.
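
[Editor's sketch] One possible way to fill in step (1), assuming the table holds non-negative integer counts. The helper name `resampleTable`, the example counts, and the placeholder statistic are all hypothetical; the boot() call is guarded because it needs package boot to be installed.

```r
# Rebuild a table of counts from a vector of resampled observation indices.
# 'resampleTable' is a hypothetical helper name, not part of any package.
resampleTable <- function(vectorOfIndices, theTable) {
  # Cell number of each of the n observations, in cell order
  cellOfObs <- rep(seq_along(theTable), as.vector(theTable))
  newTable <- theTable
  # Count how many resampled observations fall in each cell
  newTable[] <- tabulate(cellOfObs[vectorOfIndices], nbins = length(theTable))
  newTable
}

tab <- as.table(matrix(c(10, 5, 8, 12, 3, 7), nrow = 2))  # illustrative counts
n <- sum(tab)

# Called with 1:n, the original counts are recovered
sameAsOriginal <- all(resampleTable(seq_len(n), tab) == tab)

if (requireNamespace("boot", quietly = TRUE)) {
  # Placeholder statistic: proportion of observations in the first cell
  stat <- function(obs, indices, theTable) {
    bt <- resampleTable(obs[indices], theTable)
    as.vector(bt)[1] / sum(bt)
  }
  # Bootstrap the vector 1:n; 'theTable' is passed through to stat()
  b <- boot::boot(data = seq_len(n), statistic = stat, R = 200,
                  theTable = tab)
}
```

In real use the placeholder statistic would be replaced by a function that refits the glm/gnm model on the resampled table and returns its coefficients.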

Tim Hesterberg
http://www.timhesterberg.net
 (resampling, water bottle rockets, computers to Costa Rica, shower = 2650 light bulbs, ...)

NEW!  Mathematical Statistics with Resampling and R, Chihara & Hesterberg
http://www.amazon.com/Mathematical-Statistics-Resampling-Laura-Chihara/dp/1118029852/ref=sr_1_1?ie=UTF8



Re: boot() with glm/gnm on a contingency table

Milan Bouchet-Valat
On Wednesday, 12 September 2012 at 07:08 -0700, Tim Hesterberg wrote:

> One approach is to bootstrap the vector 1:n, where n is the number
> of individuals, with a function that does:
> f <- function(vectorOfIndices, theTable) {
>   (1) create a new table with the same dimensions, but with the counts
>   in the table based on vectorOfIndices.
>   (2) Calculate the statistics of interest on the new table.
> }
>
> When f is called with 1:n, the table it creates should be the same
> as the original table.  When called with a bootstrap sample of
> values from 1:n, it should create a table corresponding to the
> bootstrap sample.
Indeed, that's another solution I considered, but I wanted to be sure
nothing more reasonable exists. You're right that it's more efficient
than replicating the whole data set. But still, with a typical table of
less than 100 cells and several thousands of observations, this means
creating a potentially long vector, much larger than the original data;
nothing really hard with common machines, to be sure.

If no other way exists, I'll use this. Thanks.


Re: boot() with glm/gnm on a contingency table

Tim Hesterberg-2
>On Wednesday, 12 September 2012 at 07:08 -0700, Tim Hesterberg wrote:
>> One approach is to bootstrap the vector 1:n, where n is the number
>> of individuals, with a function that does:
>> f <- function(vectorOfIndices, theTable) {
>>   (1) create a new table with the same dimensions, but with the counts
>>   in the table based on vectorOfIndices.
>>   (2) Calculate the statistics of interest on the new table.
>> }
>>
>> When f is called with 1:n, the table it creates should be the same
>> as the original table.  When called with a bootstrap sample of
>> values from 1:n, it should create a table corresponding to the
>> bootstrap sample.
>Indeed, that's another solution I considered, but I wanted to be sure
>nothing more reasonable exists. You're right that it's more efficient
>than replicating the whole data set. But still, with a typical table of
>less than 100 cells and several thousands of observations, this means
>creating a potentially long vector, much larger than the original data;
>nothing really hard with common machines, to be sure.
>
>If no other way exists, I'll use this. Thanks.

In your original posting you also suggested:
>>>The other way would be generate importance weights based on observed
>>>frequencies, and to multiply the original data by the weights at each
>>>iteration, but I'm not sure that's correct. Thoughts?

You could do:

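bootstrapTable <- x  # where x is the original table
# Storage for the results; assumes myFunction returns a single number
replicates <- numeric(numberOfBootstrapSamples)
for(i in seq_len(numberOfBootstrapSamples)) {
  # Draw sum(x) observations with cell probabilities proportional to x
  bootstrapTable[] <- rmultinom(1, size = sum(x), prob = x)
  replicates[i] <- myFunction(bootstrapTable)
}
# caveat - not tested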


I can't tell from help(boot) whether you could do it correctly there.
boot has a 'weights' argument that you could use for the sampling
probabilities, but you also need a way to tell it to draw sum(x)
observations.  Or, you could also pass boot a "parametric" sampler.  
But be careful if you use boot in either of these ways; you not only
need to generate the bootstrap samples, you also need to make sure
that it does all other calculations correctly, including
calculating the statistic for the original data, calculating jackknife
statistics if they are used for confidence intervals, etc.
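
[Editor's sketch] The "parametric" sampler route could be set up roughly as follows, assuming boot() is called with sim = "parametric", a ran.gen function taking (data, mle), and a statistic taking the data as its first argument. The names `ranTable` and `stat` and the example counts are hypothetical, and the caution above still applies: nothing here has been checked against the confidence-interval code in boot.

```r
tab <- as.table(matrix(c(10, 5, 8, 12, 3, 7), nrow = 2))  # illustrative counts

# Generator: draw a new table of the same shape and total from a multinomial
# whose cell probabilities are the observed proportions (passed in as 'mle')
ranTable <- function(data, mle) {
  data[] <- rmultinom(1, size = sum(data), prob = mle)
  data
}

# Placeholder statistic; with sim = "parametric" it receives the data only
stat <- function(data) as.vector(data)[1] / sum(data)

if (requireNamespace("boot", quietly = TRUE)) {
  b <- boot::boot(data = tab, statistic = stat, R = 200,
                  sim = "parametric", ran.gen = ranTable,
                  mle = as.vector(tab) / sum(tab))
  # Bootstrap standard error of the placeholder statistic
  se <- sd(b$t[, 1])
}
```

Here boot() evaluates the statistic on the original table itself, and each replicate is a fresh multinomial draw of sum(tab) observations, which matches the loop shown earlier in this message.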

Wistful sigh - this would be pretty easy to do with S+Resample.

Tim Hesterberg


Re: boot() with glm/gnm on a contingency table

Milan Bouchet-Valat
In reply to this post by Tim Hesterberg-2
On Wednesday, 12 September 2012 at 07:08 -0700, Tim Hesterberg wrote:

> One approach is to bootstrap the vector 1:n, where n is the number
> of individuals, with a function that does:
> f <- function(vectorOfIndices, theTable) {
>   (1) create a new table with the same dimensions, but with the counts
>   in the table based on vectorOfIndices.
>   (2) Calculate the statistics of interest on the new table.
> }
>
> When f is called with 1:n, the table it creates should be the same
> as the original table.  When called with a bootstrap sample of
> values from 1:n, it should create a table corresponding to the
> bootstrap sample.
If anybody is interested, I finally went this way; the function
described above is implemented below. The idea is to assign an
index to each observation, and to identify which cell the observation
comes from using the cumulative sum of the counts. Instead of going over
all indices and incrementing the corresponding cell count for each, I
decided to start with the original data, decrementing the counts for
missing indices and incrementing them for duplicates. There are probably
better implementations, but performance-wise it seems good enough.

# tab is a table object
f <- function(tab, indices) {
  # Cumulative counts: observation i belongs to the first cell with i <= cs
  cs <- cumsum(tab)

  # Remove missing observations
  for(i in setdiff(seq_len(sum(tab)), indices)) {
      index <- min(which(i <= cs))
      tab[index] <- tab[index] - 1
  }

  # Add duplicate observations
  for(i in indices[duplicated(indices)]) {
      index <- min(which(i <= cs))
      tab[index] <- tab[index] + 1
  }

  # Return the resampled table (a for loop on its own returns NULL)
  tab
}
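
[Editor's sketch] The same cumulative-sum idea can also be written without explicit loops, using findInterval() to map each index to its cell. This is an alternative under the same assumptions (non-negative integer counts, indices drawn from 1:sum(tab)), not the code used in the package; the name `resampleCounts` and the small example table are hypothetical.

```r
# Vectorized variant of the cumulative-sum lookup
resampleCounts <- function(tab, indices) {
  cs <- cumsum(as.vector(tab))
  # Observation i lies in the first cell k with i <= cs[k]
  cell <- findInterval(indices - 1, cs) + 1
  # Count the resampled observations per cell, keeping the table's shape
  tab[] <- tabulate(cell, nbins = length(tab))
  tab
}

tab <- as.table(c(a = 3, b = 0, c = 2))  # illustrative counts, one empty cell

# Identity indices give back the original counts
unchanged <- resampleCounts(tab, seq_len(sum(tab)))

# A resample where observations 2 and 3 are dropped and 1 and 5 are duplicated
resampled <- resampleCounts(tab, c(1, 1, 4, 5, 5))
```

Empty cells are never selected, since no index falls strictly between two equal cumulative sums.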


Thanks for the pointers!
