Axis/Ticks/Scale

R.C.GILL

Dear All,

Apologies for this simple question and thanks in advance for any help given.

Suppose I want to plot 1 million observations using the command

plot(rnorm(1000000))

The labels of the x-axis are 0e+00, 2e+05, etc. These are clearly not very
attractive (the plots are for a PhD thesis).

I would like the axis labels to be 0,2,4,6,8,10 with a *10^5 on the right hand
side.

Is there a simple command for this?

Best Wishes

Roger

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: Axis/Ticks/Scale

jholtman
Here is one way to do it with a smaller set of data, but the 'range' is the
same:


> x <- c(1,1000,1000000)
> y <- pretty(range(x))
> y
[1] 0e+00 2e+05 4e+05 6e+05 8e+05 1e+06
> plot(x,1:3,xaxt='n', xlab="X * 10^5")
> axis(1, at=y, labels=y/100000)
>
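The divisor 100000 above is hard-coded. As a minimal sketch, assuming the spacing of the ticks returned by pretty() is a sensible basis for the scale, the power of ten can be derived from the ticks themselves. The helper name scaled_ticks is hypothetical; it is not part of R or of the post above:

```r
# Hypothetical helper: derive tick labels and a common power of ten
# from the data range instead of hard-coding the divisor.
scaled_ticks <- function(x) {
  at <- pretty(range(x))             # same tick positions as above
  k  <- floor(log10(diff(at)[1]))    # exponent of the tick spacing
  list(at = at, labels = at / 10^k, exponent = k)
}

x  <- c(1, 1000, 1000000)
tk <- scaled_ticks(x)

# Suppress the default x-axis, then draw it with the rescaled labels
plot(x, 1:3, xaxt = "n", xlab = paste0("X * 10^", tk$exponent))
axis(1, at = tk$at, labels = tk$labels)
```

For the data used here the tick spacing is 2e+05, so the exponent comes out as 5 and the labels as 0, 2, ..., 10. Other data may give spacings for which a different choice of exponent reads better.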






--
Jim Holtman
Cincinnati, OH
+1 513 247 0281

What is the problem you are trying to solve?


Re: Axis/Ticks/Scale

P Ehlers
In reply to this post by R.C.GILL
Try this:

x <- c(1, 1e6); y <- 0:1
par(mar = c(5, 4, 4, 5) + 0.1)  # make room at the right
plot(x, y, axes = FALSE)
box()
axis(2)
axis(1, at = 0:5 * 2 * 1e5, labels = 0:5 * 2)
mtext(text = expression(phantom(0)%*%10^5),
       side = 1, line = 1, at = 11.0 * 1e5)

Peter Ehlers


Re: Axis/Ticks/Scale

Marc Schwartz (via MN)
In reply to this post by R.C.GILL
On Wed, 2005-12-28 at 20:15 +0000, [hidden email] wrote:

> I would like the axes to be 0,2,4,6,8,10 with a *10^5 on the right
> hand side.
>
> Is there a simple command for this?


See ?plotmath for some additional examples and there are some others in
the list archives.

 set.seed(1)
 x <- rnorm(1000000)

 # Now do the plot, but leave the x axis blank
 plot(x, xaxt = "n")

 # Set the x axis label tick marks
 x.at <- seq(0, 10, 2) * 10 ^ 5

 # Create the expressions for the tick mark labels
 # Using parse() takes the character vectors from paste()
 # and converts them to expressions for use in plotmath
 x.lab <- parse(text = paste(seq(0, 10, 2), "%*% 10^5"))

 # Now do the axis labels
 axis(1, at = x.at, labels = x.lab)


HTH,

Marc Schwartz


Re: Axis/Ticks/Scale

Martin Maechler
>>>>> "Marc" == Marc Schwartz (via MN) <[hidden email]>
>>>>>     on Wed, 28 Dec 2005 15:46:37 -0600 writes:



    Marc> See ?plotmath for some additional examples and there
    Marc> are some others in the list archives.

Yes, I think this one is there too.
It puts the "* 10^k" after each number;
the nice thing about it is that it works for all kinds of data
-- and of course, in principle it could be built into R ...



###----------------- Do "a 10^k" labeling instead of "a e<k>" ---

axTexpr <- function(side, at = axTicks(side, axp=axp, usr=usr, log=log),
                    axp = NULL, usr = NULL, log = NULL)
{
    ## Purpose: Do "a 10^k" labeling instead of "a e<k>"
    ##      this auxiliary should return 'at' and 'label' (expression)
    ## ----------------------------------------------------------------------
    ## Arguments: as for axTicks()
    ## ----------------------------------------------------------------------
    ## Author: Martin Maechler, Date:  7 May 2004, 18:01
    eT <- floor(log10(abs(at)))# at == 0 case is dealt with below
    mT <- at / 10^eT
    ss <- lapply(seq(along = at),
                 function(i) if(at[i] == 0) quote(0) else
                 substitute(A %*% 10^E, list(A=mT[i], E=eT[i])))
    do.call("expression", ss)
}


x <- 1e7*(-10:50)
y <- dnorm(x, mean=10e7, sd=20e7)
plot(x,y)
## ^^^^^^ not so nice; ok, try

par(mar=.1+c(5,5,4,1))## << For the horizontal y-axis labels, need more space
plot(x,y, axes= FALSE, frame.plot=TRUE)
aX <- axTicks(1); axis(1, at=aX, labels= axTexpr(1, aX))
if(FALSE) # rather the next one
{ aY <- axTicks(2); axis(2, at=aY, labels= axTexpr(2, aY))}
## or rather (horizontal labels on y-axis):
aY <- axTicks(2); axis(2, at=aY, labels= axTexpr(2, aY), las=2)
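
To see why axTexpr() treats zero separately, here is a small sketch of the mantissa/exponent split it relies on, using a few made-up tick values. It mirrors the function body above rather than calling it, so the variable names are only illustrative:

```r
# Made-up tick positions, including zero and a negative value
at <- c(0, 2e5, 5e7, -3e8)

eT <- floor(log10(abs(at)))   # exponent; -Inf where at == 0
mT <- at / 10^eT              # mantissa; NaN where at == 0 (0 / 0)

print(data.frame(at = at, exponent = eT, mantissa = mT))

# Build plotmath labels, substituting a plain 0 for the zero tick
labs <- lapply(seq_along(at), function(i) {
  if (at[i] == 0) quote(0)
  else substitute(A %*% 10^E, list(A = mT[i], E = eT[i]))
})
labs <- do.call("expression", labs)
```

Without the special case, the zero tick would be labelled from a NaN mantissa and an infinite exponent, which is why the function replaces it with a plain 0.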


Re: Axis/Ticks/Scale

Marc Schwartz (via MN)
On Thu, 2005-12-29 at 22:06 +0100, Martin Maechler wrote:

> Yes, I think this one is there too:
> It has the "* 10^k" after each number;
> the nice thing about it is that it works for all kinds of data
> -- and of course, in principle it could be built into R ...
>
> ...


Nice, Martin!

I do like that.  I also like the handling of zero, which I only thought
about after sending my reply. I should have used:

  x <- rnorm(1000000)
  plot(x, xaxt = "n")
  x.at <- seq(0, 10, 2) * 10 ^ 5

  # Handle the zero here this time
  x.lab <- parse(text = paste(seq(2, 10, 2), "%*% 10^5"))
  axis(1, at = x.at, labels = c(0, x.lab))


BTW, on your approach, it was here:

  http://finzi.psych.upenn.edu/R/Rhelp02a/archive/36462.html

and more recently, here:

  http://finzi.psych.upenn.edu/R/Rhelp02a/archive/57011.html

:-)

Best regards and Happy New Year,

Marc


Re: Axis/Ticks/Scale

Jim Lemon-2
In reply to this post by Martin Maechler
[hidden email] wrote:
> ...
> I would like the axes to be 0,2,4,6,8,10 with a *10^5 on
> the right hand side.
>  
> Is there a simple command for this?

This post so interested me that I wrote the following function:

axis.mult<-function(side=1,at=NULL,labels,mult=1,mult.label,mult.line,
  mult.labelpos=NULL,...) {
  if(is.null(at)) at<-axTicks(side)
  if(missing(labels)) labels<-at/mult
  axis(side,at,labels,...)
  if(missing(mult.label)) mult.label<-paste("x",mult,collapse="")
  # multiplier position defaults to centered on the outside
  if(is.null(mult.labelpos)) mult.labelpos<-side
  edges<-par("usr")
  if(side %% 2) {
   # either top or bottom
   if(mult.labelpos %% 2) {
    adj<-0.5
    at<-(edges[1]+edges[2])/2
    if(missing(mult.line)) mult.line<-ifelse(mult.labelpos == side,3,0)
   }
   else {
    adj<-ifelse(mult.labelpos == 2,1,0)
    at<-ifelse(mult.labelpos == 2,edges[1],edges[2])
    if(missing(mult.line)) mult.line<-1
   }
  }
  else {
   # either left or right
   if(mult.labelpos %% 2) {
    adj<-ifelse(mult.labelpos == 1,1,0)
    at<-ifelse(mult.labelpos == 1,edges[3],edges[4])
    if(missing(mult.line)) mult.line<-1
   }
   else {
    adj<-0.5
    at<-(edges[3]+edges[4])/2
    if(missing(mult.line)) mult.line<-ifelse(mult.labelpos == side,3,0)
   }
  }
  mtext(mult.label,side,mult.line,at=at,adj=adj)
}

which will be in the next (v2.0.1) version of plotrix.

Jim


A difficulty with boot package

justin bem
Hi,
   
  I am having difficulty with the bootstrap procedure in the boot package. How can I specify the sample size for each bootstrap replicate?
  I use
        >myboot<-boot(data,boot.fun,R=300)
  When I display myboot$t, I get a vector with just one value, the same as the statistic computed on my data file. After reading about the bootstrap in the MASS book, I added
       >set.seed(101)
  but I get the same result. That means that at each bootstrap the sample drawn is exactly the data file. How can I resolve this?
   
  Sincerely!

               

Re: A difficulty with boot package

Marc Schwartz (via MN)
On Fri, 2005-12-30 at 18:37 +0100, justin bem wrote:

> I have a difficulty with the bootstrap procedure in boot package.
> How can I specify the size of sample at each bootstrap ?
> ...

Justin,

First, when creating a new post, please do not do so by replying to an
existing post. This screws up the threading in the list archive, making
it difficult for folks to search for your post and replies, when the
thread ends up being embedded in another unrelated one.

Second, with bootstrapping, the bootstrap samples are drawn by
sampling with replacement from the original sample. Thus, the bootstrap
replicates are, by default, the same size as the original sample.

You might want to look at the examples in ?sample, specifically the
resample() function created there, for some additional insight.

Using set.seed() simply allows for the ability to repeat the same
sequence of sampling randomizations. It has no effect on the replicate
sample size.
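
Both points can be checked in base R without the boot package. This is only a sketch with made-up data; the names x, a and b are illustrative:

```r
# Made-up data standing in for the original sample
x <- c(10, 20, 30, 40, 50)

# One resample drawn with replacement
set.seed(101)
a <- sample(x, replace = TRUE)

# Resetting the seed reproduces exactly the same draw
set.seed(101)
b <- sample(x, replace = TRUE)

identical(a, b)          # the two draws match
length(a) == length(x)   # the resample is as long as the original
```

The seed controls which values are drawn, not how many.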

In terms of what you are seeing with respect to only getting a single
statistic value, I suspect that there is a problem in how you have
defined your statistic function.

Using the example from page 134 in MASS4:

  library(MASS)
  library(boot)

  set.seed(101)
  gal.boot <- boot(galaxies, function(x, i) median(x[i]), R = 1000)

  > str(gal.boot$t)
   num [1:1000, 1] 20930 21492 20196 20179 21184 ...

Here 't' has 1000 rows, one per replicate, the same as R.

The median function is passed the galaxies data as 'x' and a set of
sampled indices 'i'. Thus, x[i] is the randomly selected replicate
passed to median(). Each replicate has 82 elements (the length of
galaxies) and this is repeated 1000 times.

Note that by default, the 'statistic' function in boot() is defined to
take two arguments, 'data' and 'i'. Since median() does not accept an
index argument, we wrap it in an anonymous function in the boot() call
above. We could have done it as:

my.median <- function(data, i)
{
  median(data[i])
}

set.seed(101)
my.boot <- boot(galaxies, my.median, R = 1000)


# Now let's compare the resultant boot objects
> all.equal(gal.boot, my.boot)
[1] "Component 6: target, current don't match when deparsed"
[2] "Component 8: target, current don't match when deparsed"


They are identical except for the 'statistic' and 'call' elements, which
simply reflect the different statistic function expressions used.

The key is that the statistic function (in a routine bootstrap such as
these) takes two arguments. It could take more (see below).
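
As a hedged illustration of what can go wrong, the sketch below uses made-up data and two hypothetical statistic functions, bad.stat and good.stat. The first ignores the index argument, so every replicate recomputes the statistic on the full data:

```r
library(boot)

# Made-up data, for illustration only
x <- c(2, 4, 4, 5, 7, 9, 10, 12)

# Ignores 'i': always computes the mean of the full data
bad.stat  <- function(data, i) mean(data)

# Uses 'i': computes the mean of the resampled values
good.stat <- function(data, i) mean(data[i])

set.seed(101)
bad  <- boot(x, bad.stat,  R = 50)
good <- boot(x, good.stat, R = 50)

length(unique(as.vector(bad$t)))    # a single repeated value
length(unique(as.vector(good$t)))   # many distinct values
```

If 't' contains a single repeated value, checking whether the statistic function actually subsets by 'i' is a reasonable first step.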

If you want to get a feel for how each of the 1000 replicates looks, you
could use the boot.array() function:
 
  boot.array(gal.boot, indices = TRUE)

boot.array() with 'indices = TRUE' will return a matrix with one row
per replicate (1000 rows in this case), each row holding the _indices_
of the values from galaxies (1:82) that were used in that replicate.

With 'indices = FALSE' (the default here), boot.array() will give you
information pertaining to how frequently each of the 82 elements in
galaxies was used in each replicate. Keep in mind that this is sampling
WITH replacement, so some values will be used more than once in each
replicate.

For example:

  > table(boot.array(gal.boot, indices = FALSE))

      0     1     2     3     4     5     6     7
  30027 30317 15134  4998  1248   233    37     6


This shows that the 82 values in galaxies were used anywhere from 0 to 7
times in any given replicate. The sum() of the above is 82000 of course
and each replicate does have 82 elements:

# Get the sum of each row from the boot.array() call
# above and create a table
> table(rowSums(boot.array(gal.boot, indices = FALSE)))

  82
1000



If you actually wanted to generate the replicates used, you could do
something like:

  matrix(galaxies[boot.array(gal.boot, indices = TRUE)],
         ncol = 82, byrow = TRUE)

which would return a 1000 x 82 matrix with the actual galaxies data as
sampled in the 1000 replicates.

In terms of defining the sample size in each replicate, there is an
example in Davison and Hinkley's book (cited in ?boot), for which the
boot package is the supporting software. The example is on page 528 and
provides an approach for specifying the sample size for the replicates,
if that is what you truly want to do. This example uses the 'city' data:

> city
     u   x
1  138 143
2   93 104
3   61  69
4  179 260
5   48  75
6   37  63
7   29  50
8   23  48
9   30 111
10   2  50

city.subset <- function(data, i, n = 10)
{
  # This step takes the sampled indices
  # and returns the first 'n' of them.
  # Then subsets 'data' using these, in
  # this case, using the first 'n' rows of
  # all of the sampled rows in the city dataset
  d <- data[i[1:n], ]

  # Now the statistic is calculated on a
  # replicate with a sample size of 'n'
  mean(d[, 2]) / mean(d[, 1])
}

# Let's do the boot with 200 replicates
# each with a sample size of 5. 'n' is passed as
# part of the "..." argument in boot().
city.boot <- boot(city, city.subset, R = 200, n = 5)


Note that the 'statistic' function here takes three arguments:

  data: this will be 'city'

  i: sampled indices

  n: replicate sample size


As pointed out in D&H, the boot.array() function would need to be
adjusted here to work properly:

  boot.array(city.boot, indices = TRUE)[, 1:5]

so that you only return the first 5 values for each replicate.

There are important considerations if your replicates are sized less
than the original sample size, so I would not do this lightly and would
recommend securing a copy of D&H to cover some of this area.

HTH,

Marc Schwartz


Re: A difficulty with boot package

justin bem

Thank you very much for your answer.  The mistake was in the statistic function:
  I didn't specify the indices.
 
  Best wishes to all R users and the R core team
 
  Sincerely.
 
 