within group sequential subtraction


natalie.vanzuydam
Hi Everyone,

I would like to do sequential subtractions within a group so that I know the time between separate observations for a group of individuals.  

My data:

data <- structure(list(group = c("IND1", "IND1", "IND2",
"IND2", "IND2", "IND3", "IND4", "IND5",
"IND6", "IND6"), date_obs = structure(c(6468,
7063, 9981, 14186, 14372, 5129, 9767, 11168, 10243, 10647), class = "Date")), .Names = c("group",
"date_obs"), row.names = c(NA, 10L), class = "data.frame")

So I start with:

   group   date_obs
1   IND1 1987-09-17
2   IND1 1989-05-04
3   IND2 1997-04-30
4   IND2 2008-11-03
5   IND2 2009-05-08
6   IND3 1984-01-17
7   IND4 1996-09-28
8   IND5 2000-07-30
9   IND6 1998-01-17
10  IND6 1999-02-25

What I would like:

   group   date_obs time
1   IND1 1987-09-17   NA
2   IND1 1989-05-04  595
3   IND2 1997-04-30   NA
4   IND2 2008-11-03 4205
5   IND2 2009-05-08  186
6   IND3 1984-01-17   NA
7   IND4 1996-09-28   NA
8   IND5 2000-07-30   NA
9   IND6 1998-01-17   NA
10  IND6 1999-02-25  404

If an individual has only one entry, a 0 or NA would be acceptable; if an individual has more than one entry, the sequential differences should be calculated.

I started with some code, but I cannot edit it appropriately.

x <- do.call(rbind, lapply(split(data, data$group),
        function(dat) {
                        dat <- dat[order(dat$date_obs), ]
                        d<-diff(dat$date_obs)
                         dat <- rbind(dat,d)
                        }))

I get this error: "Error in as.Date.numeric(value) : 'origin' must be supplied", so I'm not sure whether it does what I need it to do.  In addition, the vector lengths won't match up, because the first date in each sequence has no earlier date to be subtracted from.
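
The length mismatch described above can be seen in a minimal sketch. It uses three of the IND2 dates from the data; the object names d, gaps and time are only illustrative. The error itself appears to arise because rbind() treats d as a new row and tries to coerce its numbers into the Date column, although that reading is an assumption rather than something confirmed in this thread.

```r
# Three observation dates for one individual (the IND2 rows above)
d <- as.Date(c("1997-04-30", "2008-11-03", "2009-05-08"))

# diff() on Dates returns a difftime vector one element shorter than d
gaps <- diff(d)
length(d)     # 3
length(gaps)  # 2

# Prepending NA restores the original length, so the result
# can be stored as a column next to the dates
time <- c(NA, as.numeric(gaps, units = "days"))
time          # NA 4205 186
```

The NA marks the first observation of the individual, which has no earlier date to compare against.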

I'm not sure if anyone knows an easier way to achieve this.  

Thanks for the help,
Natalie
                       

Natalie Van Zuydam

PhD Student
University of Dundee
nvanzuydam@dundee.ac.uk

Re: within group sequential subtraction

Joshua Wiley-2
Dear Natalie,

I am sure there are other ways, but one way you can do this is by
applying diff() to each group using tapply() or by().  Because those
return lists, if you want to add it back into your data frame, you can
wrap the whole call in unlist().  Here is an example:

dat <- structure(list(group = c("IND1", "IND1", "IND2",
"IND2", "IND2", "IND3", "IND4", "IND5",
"IND6", "IND6"), date_obs = structure(c(6468,
7063, 9981, 14186, 14372, 5129, 9767, 11168, 10243, 10647), class =
"Date")), .Names = c("group",
"date_obs"), row.names = c(NA, 10L), class = "data.frame")

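## calculate differences using diff() by each group
## note the prepended NA, which keeps each result as long as its group
## (assumes the rows are already sorted by group, because unlist()
## returns the results in sorted-group order)
dat$time <- unlist(tapply(dat$date_obs, dat$group,
  function(x) {diff(c(NA, x))}))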

dat ## updated data frame
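
If the rows are not sorted by group, one possible variation is to split the dates, difference each piece, and reassemble them with unsplit(), which puts every value back in its original row. The following is only a sketch under that assumption; dat2, pieces and gaps are hypothetical names and the four rows are taken from the IND1 and IND2 dates above, shuffled for illustration.

```r
# Hypothetical data whose rows are NOT sorted by group
dat2 <- data.frame(
  group    = c("IND2", "IND1", "IND2", "IND1"),
  date_obs = as.Date(c("1997-04-30", "1987-09-17",
                       "2008-11-03", "1989-05-04"))
)

# Split the dates (as day counts) by group and difference each piece,
# prepending NA so each piece keeps its original length
pieces <- split(as.numeric(dat2$date_obs), dat2$group)
gaps   <- lapply(pieces, function(x) c(NA, diff(x)))

# unsplit() returns the values in the original row order
dat2$time <- unsplit(gaps, dat2$group)

dat2
```

Here the first row of each group gets NA and the later rows get the gap in days, regardless of how the groups are interleaved.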

HTH,

Josh

On Thu, Mar 10, 2011 at 6:56 AM, natalie.vanzuydam <[hidden email]> wrote:

> [...]
> --
> View this message in context: http://r.789695.n4.nabble.com/within-group-sequential-subtraction-tp3346033p3346033.html
> Sent from the R help mailing list archive at Nabble.com.



--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: within group sequential subtraction

jholtman
In reply to this post by natalie.vanzuydam
Try this:

> data$diff <- ave(as.numeric(data$date_obs), data$group, FUN=function(x)c(NA, diff(x)))
> data
   group   date_obs diff
1   IND1 1987-09-17   NA
2   IND1 1989-05-04  595
3   IND2 1997-04-30   NA
4   IND2 2008-11-03 4205
5   IND2 2009-05-08  186
6   IND3 1984-01-17   NA
7   IND4 1996-09-28   NA
8   IND5 2000-07-30   NA
9   IND6 1998-01-17   NA
10  IND6 1999-02-25  404
>
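
The ave() call above differences the dates in the order in which they appear within each group, so it relies on the dates already being chronological. A hedged sketch of sorting first, using a small hypothetical data frame dat3 built from dates in the original data:

```r
# Hypothetical data with IND1's dates out of chronological order
dat3 <- data.frame(
  group    = c("IND1", "IND1", "IND3"),
  date_obs = as.Date(c("1989-05-04", "1987-09-17", "1984-01-17"))
)

# Sort by group, then by date within group, before differencing
dat3 <- dat3[order(dat3$group, dat3$date_obs), ]

# Same ave() idiom as above: NA for the first row of each group,
# then the gap in days to the previous observation
dat3$diff <- ave(as.numeric(dat3$date_obs), dat3$group,
                 FUN = function(x) c(NA, diff(x)))

dat3
```

Without the order() step the IND1 gap would come out as -595 rather than 595.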


On Thu, Mar 10, 2011 at 9:56 AM, natalie.vanzuydam <[hidden email]> wrote:

> [...]



--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?