Formatting data for bootstrapping for confidence intervals

Paul Wennekes
Hi all,

New to R, so this may be obvious to some.
I've been trying to figure this out for a while. I have a dataset "events" that looks something like this:

Area NAME DATE    X Xn Y
1    X    1/10/10 1 1  0
1    Y    1/11/10 0 0  1
1    X    1/12/10 1 0  0
1    X    1/12/10 1 0  0
1    X    1/12/10 1 0  0
2    X    2/12/10 1 1  0
2    X    2/12/10 1 0  0
2    Y    2/12/10 0 0  1
2    X    2/13/10 1 0  0
2    X    2/13/10 1 0  0
2    X    2/13/10 1 0  0
2    X    2/14/10 1 0  0
2    X    2/14/10 1 0  0
2    X    2/14/10 1 1  0
2    X    2/14/10 1 0  0
3    X    7/27/11 1 0  0
3    X    7/27/11 1 1  0
3    X    7/27/11 1 0  0
3    X    7/28/11 1 0  0
3    X    7/28/11 1 1  0
3    X    7/28/11 1 0  0
3    X    7/28/11 1 0  0
3    Y    7/28/11 0 0  1
3    X    7/28/11 1 0  0
3    X    7/28/11 1 1  0
3    Y    7/28/11 0 0  1
3    X    7/28/11 1 0  0
3    X    7/29/11 1 0  0
3    X    7/29/11 1 0  0
3    X    7/29/11 1 1  0

X and Y are events. Every row represents a single event, with a 1 indicating which event occurred. Xn indicates that X happened at night. I want to bootstrap these events over days, but I think I need to summarize them first, i.e., get something that looks like this:

Area DATE    X Xn Y
1    1/10/10 1 1  0
1    1/11/10 0 0  1
1    1/12/10 3 0  0
2    2/12/10 2 1  1
etc.

and then, for each Area, bootstrap the data over the days. Any ideas? I've tried using the 'reshape' package, but I don't know how to sum over parts of the columns as defined by the DATE values...

Many thanks in advance!

Re: Formatting data for bootstrapping for confidence intervals

Rui Barradas
Hello,

To aggregate the data, use the function aggregate() (yes, it exists).

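# 'dat' is the data frame holding the events data;
# sum X, Xn and Y within every Area/DATE combination
with(dat, aggregate(cbind(X, Xn, Y), list(Area, DATE), FUN = sum))
# output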
   Group.1 Group.2 X Xn Y
1       1 1/10/10 1  1 0
2       1 1/11/10 0  0 1
3       1 1/12/10 3  0 0
4       2 2/12/10 2  1 1
5       2 2/13/10 3  0 0
6       2 2/14/10 4  1 0
7       3 7/27/11 3  1 0
8       3 7/28/11 7  2 2
9       3 7/29/11 3  1 0
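
A variant of the same call, offered as a sketch rather than as the method used above: the formula interface of aggregate() keeps the original column names instead of Group.1 and Group.2. The data frame name `events` is taken from the original question, and only the first eight rows of the posted data are reproduced here to keep the example short.

```r
# First eight rows of the posted data (NAME column omitted, it is not needed for the sums)
events <- data.frame(
  Area = c(1, 1, 1, 1, 1, 2, 2, 2),
  DATE = c("1/10/10", "1/11/10", "1/12/10", "1/12/10", "1/12/10",
           "2/12/10", "2/12/10", "2/12/10"),
  X    = c(1, 0, 1, 1, 1, 1, 1, 0),
  Xn   = c(1, 0, 0, 0, 0, 1, 0, 0),
  Y    = c(0, 1, 0, 0, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Left-hand side: columns to sum; right-hand side: grouping variables
daily <- aggregate(cbind(X, Xn, Y) ~ Area + DATE, data = events, FUN = sum)
print(daily)
```

The result has one row per Area/DATE combination with columns Area, DATE, X, Xn and Y, which is the layout asked for in the question.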

And take a look at package boot. Maybe you'll find something there.
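
As a rough sketch of what that could look like (not a definitive analysis): the code below resamples days within each Area using boot() and asks for a percentile interval with boot.ci(). The statistic, the mean number of X events per day, is only an assumption for illustration, since the question does not say which quantity the confidence interval is for. The names `daily`, `mean_daily_x` and `boot_by_area` are made up for this example. With only three days per Area the intervals will be very crude.

```r
library(boot)  # recommended package, included in standard R distributions

# Daily totals, copied from the aggregate() output above
daily <- data.frame(
  Area = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  DATE = c("1/10/10", "1/11/10", "1/12/10", "2/12/10", "2/13/10",
           "2/14/10", "7/27/11", "7/28/11", "7/29/11"),
  X    = c(1, 0, 3, 2, 3, 4, 3, 7, 3),
  Xn   = c(1, 0, 0, 1, 0, 1, 1, 2, 1),
  Y    = c(0, 1, 0, 1, 0, 0, 0, 2, 0),
  stringsAsFactors = FALSE
)

# boot() calls the statistic with the data and a vector of resampled row indices.
# ASSUMPTION: mean daily count of X is the quantity of interest.
mean_daily_x <- function(d, idx) mean(d$X[idx])

set.seed(1)  # make the resampling reproducible
boot_by_area <- lapply(split(daily, daily$Area), function(d) {
  boot(data = d, statistic = mean_daily_x, R = 999)  # each row of d is one day
})

# Percentile confidence interval for each Area
lapply(boot_by_area, boot.ci, type = "perc")
```

Splitting by Area first means days are only ever resampled within their own Area, which matches the "for each Area, bootstrap the data over the days" wording of the question.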

Hope this helps,

Rui Barradas


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Formatting data for bootstrapping for confidence intervals

Paul Wennekes
Thank you! That had me stuck for quite a while and this worked like a charm!

Re: Formatting data for bootstrapping for confidence intervals

arun kirshna
In reply to this post by Paul Wennekes


Hi,
Try this:

dat1<-read.table(text="
Area    NAME    DATE    X    Xn    Y
1            X    1/10/10            1    1    0
1            Y    1/11/10            0    0    1
1            X    1/12/10            1    0    0
1            X    1/12/10            1    0    0
1            X    1/12/10            1    0    0
2            X    2/12/10            1    1    0
2            X    2/12/10            1    0    0
2            Y    2/12/10            0    0    1
2            X    2/13/10            1    0    0
2            X    2/13/10            1    0    0
2            X    2/13/10            1    0    0
2            X    2/14/10            1    0    0
2            X    2/14/10            1    0    0
2            X    2/14/10            1    1    0
2            X    2/14/10            1    0    0
3            X    7/27/11            1    0    0
3            X    7/27/11            1    1    0
3            X    7/27/11            1    0    0
3            X    7/28/11            1    0    0
3            X    7/28/11            1    1    0
3            X    7/28/11            1    0    0
3            X    7/28/11            1    0    0
3            Y    7/28/11            0    0    1
3            X    7/28/11            1    0    0
3            X    7/28/11            1    1    0
3            Y    7/28/11            0    0    1
3            X    7/28/11            1    0    0
3            X    7/29/11            1    0    0
3            X    7/29/11            1    0    0
3            X    7/29/11            1    1    0
",sep="",header=TRUE,stringsAsFactors=FALSE)

# You can use aggregate(), ddply() from library(plyr), or library(data.table)
library(data.table)
dat2 <- data.table(dat1)
# Sum each event column within every Area/DATE group
dat2[, list(X = sum(X), Xn = sum(Xn), Y = sum(Y)), by = list(Area, DATE)]
#   Area    DATE X Xn Y
#1:    1 1/10/10 1  1 0
#2:    1 1/11/10 0  0 1
#3:    1 1/12/10 3  0 0
#4:    2 2/12/10 2  1 1
#5:    2 2/13/10 3  0 0
#6:    2 2/14/10 4  1 0
#7:    3 7/27/11 3  1 0
#8:    3 7/28/11 7  2 2
#9:    3 7/29/11 3  1 0
library(plyr)
ddply(dat1,.(Area,DATE),colwise(sum,c("X","Xn","Y")))
# Area    DATE X Xn Y
#1    1 1/10/10 1  1 0
#2    1 1/11/10 0  0 1
#3    1 1/12/10 3  0 0
#4    2 2/12/10 2  1 1
#5    2 2/13/10 3  0 0
#6    2 2/14/10 4  1 0
#7    3 7/27/11 3  1 0
#8    3 7/28/11 7  2 2
#9    3 7/29/11 3  1 0
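
One side note, as a hedged sketch: DATE is read in as text here, so any sorting of the grouped result compares characters rather than calendar dates. If chronological order matters, the column can be converted with as.Date(). The format string assumes the month/day/two-digit-year layout seen in the posted data, and the value "10/1/10" is invented purely to show the difference.

```r
# Values in the thread's month/day/two-digit-year layout;
# "10/1/10" is a made-up date added for illustration
dates <- c("2/13/10", "10/1/10", "1/10/10")

# Sorting as text puts the October date before the February one
sort(dates)

# Converting to Date gives chronological ordering
parsed <- as.Date(dates, format = "%m/%d/%y")
sort(parsed)
```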

A.K.


----- Original Message -----
From: Paul Wennekes <[hidden email]>
To: [hidden email]
Cc:
Sent: Thursday, October 11, 2012 11:55 AM
Subject: [R] Formatting data for bootstrapping  for confidence intervals
