Looping through data tables (or data frames) by removing previous individuals


Looping through data tables (or data frames) by removing previous individuals

Frank S.
Dear R users,

With this mail I send the third and last question I wanted to ask these days. First of all, many thanks for the support I received on my previous mails! My question is this: starting from a series of (for example) "k" different dates, all contained in vector "v", I want to get a list of "k" data tables (or data frames), one per date in "v", such that each contains the individuals who are at least 65 for the first time on that date. Consider the following example with 5 individuals:


library(data.table)

dt <- data.table(
   id = 1:5,
   fborn = as.Date(c("1935-07-25", "1942-10-05", "1942-09-07", "1943-09-07", "1943-12-31")),
   sex = as.factor(rep(c(0, 1), c(2, 3)))
   )

v <- seq(as.Date("2006-01-01"), as.Date("2009-01-01"), by ="year") # k=4


I would expect to obtain k=4 data tables so that:
dt_p1: contains id = 1 (he is at least 65 for the first time on date v[1])
dt_p2: is NULL (no subject reaches 65 for the first time on date v[2])
dt_p3: contains id = 2 & id = 3 (they are at least 65 for the first time on v[3])
dt_p4: contains id = 4 & id = 5 (they are at least 65 for the first time on v[4])
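Restated, individual j belongs to table number i exactly when v[i] is the first date in v that falls on or after their 65th birthday. A minimal, loop-free sketch of that restatement in base R (a plain data.frame is assumed so no extra packages are needed; `b65` and `first_i` are illustrative names) computes each 65th birthday and locates it among the dates with findInterval():

```r
# Example data from the question, as a plain data.frame
dt <- data.frame(
  id    = 1:5,
  fborn = as.Date(c("1935-07-25", "1942-10-05", "1942-09-07",
                    "1943-09-07", "1943-12-31")),
  sex   = factor(rep(c(0, 1), c(2, 3)))
)
v <- seq(as.Date("2006-01-01"), as.Date("2009-01-01"), by = "year")

# Date of each person's 65th birthday
# (29 Feb birthdays would need an explicit convention; not handled here)
b65 <- as.POSIXlt(dt$fborn)
b65$year <- b65$year + 65
b65 <- as.Date(b65)

# findInterval(..., left.open = TRUE) counts the dates in v strictly before b65,
# so adding 1 gives the index of the first v[i] on or after the 65th birthday.
# A value of length(v) + 1 means the person is under 65 on every date.
first_i <- findInterval(b65, v, left.open = TRUE) + 1

# One data.frame per date; ids still under 65 everywhere become NA and are dropped
dt_p <- split(dt, factor(first_i, levels = seq_along(v)))
names(dt_p) <- paste0("dt_p", seq_along(v))
```

Because the levels are fixed to 1..k, dates on which nobody turns 65 (here v[2]) still get an entry, as a 0-row data.frame.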


I have tried:

dt_p <- list( )                        # Empty list to allocate data tables

for (i in 1:length(v)) {
  dt_p[[i]] <- dt[ !(id %in% dt_p[[1:(i-1)]]$id) &  # Remove subjects from previous dt_p's
         round((v[i] - fborn)/365.25, 2) >= 65, ][ , list(id, fborn, sex)]

 dt.names <- paste0("dt_p", 1:length(v))
 assign(dt.names[i], dt_p[[i]])         # Assign a name to each data table
 }

However, I cannot correctly refer to the previous data tables, because for the first data table in the loop there are no previous ones. Consequently, I get an error message:

# Error in dt_p[[1:(i - 1)]] : no such index at level 1
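The error comes from how `[[` treats a vector index: it indexes recursively, so `dt_p[[c(1, 2)]]` means `dt_p[[1]][[2]]`, not "elements 1 and 2". On the first pass `1:(i-1)` is `1:0`, i.e. `c(1, 0)`, and the empty list has no element 1, hence "no such index at level 1". A minimal way around it is to keep a running vector of the ids already placed in an earlier table. The sketch below does that with a plain data.frame (an assumption, so it runs without data.table) and the same approximate age rule as the attempt; `seen` is an illustrative name, and naming the list stands in for the `assign()` calls.

```r
# Example data from the question, as a plain data.frame
dt <- data.frame(
  id    = 1:5,
  fborn = as.Date(c("1935-07-25", "1942-10-05", "1942-09-07",
                    "1943-09-07", "1943-12-31")),
  sex   = factor(rep(c(0, 1), c(2, 3)))
)
v <- seq(as.Date("2006-01-01"), as.Date("2009-01-01"), by = "year")

dt_p <- vector("list", length(v))  # one slot per date
seen <- integer(0)                  # ids already placed in an earlier dt_p
for (i in seq_along(v)) {
  # approximate age in years on v[i], same rule as the original attempt
  age <- round(as.numeric(v[i] - dt$fborn) / 365.25, 2)
  # keep people aged 65+ who have not been placed yet
  dt_p[[i]] <- dt[!(dt$id %in% seen) & age >= 65, c("id", "fborn", "sex")]
  seen <- c(seen, dt_p[[i]]$id)     # remember who has been placed
}
names(dt_p) <- paste0("dt_p", seq_along(v))  # access as dt_p$dt_p1, ...
```

On the first pass `seen` is empty, so `dt$id %in% seen` is all FALSE and nobody is excluded, which is exactly the case the original indexing could not express.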


I would be very grateful for any suggestion!

Frank S.


______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Looping through data tables (or data frames) by removing previous individuals

Ista Zahn
Hi Frank,

How about

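library(data.table)
library(lubridate)
# one row per (id, reference date) pair
dtf <- merge(dt, expand.grid(id = dt$id, refdate = v), by = "id")
# TRUE when the person is at least 65 on refdate ("at least" => >=)
dtf[, gt65 := as.period(interval(fborn, refdate), unit = "years") >= years(65)]
# keep, for each id, only the earliest refdate on which they are 65 or older
dtf <- dtf[gt65 == TRUE,][, .SD[refdate == min(refdate)], by = id]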

Best,
Ista
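This approach (cross join every id with every date, flag who is 65 or older, keep the earliest date per id) returns one long table with a `refdate` column rather than k separate tables. The sketch below redoes the same idea in base R, swapping lubridate's period arithmetic for a hand-written completed-age helper (`age_years`, hypothetical), and then splits the result by date. It assumes a plain data.frame so that no extra packages are needed.

```r
# Example data from the question, as a plain data.frame
dt <- data.frame(
  id    = 1:5,
  fborn = as.Date(c("1935-07-25", "1942-10-05", "1942-09-07",
                    "1943-09-07", "1943-12-31")),
  sex   = factor(rep(c(0, 1), c(2, 3)))
)
v <- seq(as.Date("2006-01-01"), as.Date("2009-01-01"), by = "year")

# Hypothetical helper: completed age in whole years of someone born on `born`
# as of date `on` (vectorised over both arguments)
age_years <- function(born, on) {
  b <- as.POSIXlt(born)
  o <- as.POSIXlt(on)
  age <- o$year - b$year
  # subtract 1 where this year's birthday has not been reached yet
  not_yet <- o$mon < b$mon | (o$mon == b$mon & o$mday < b$mday)
  age - not_yet
}

# Cross join: by = NULL gives one row per (individual, reference date) pair
grid <- merge(dt, data.frame(refdate = v), by = NULL)
# Keep pairs where the person is at least 65, earliest date first within id
grid <- grid[age_years(grid$fborn, grid$refdate) >= 65, ]
grid <- grid[order(grid$id, grid$refdate), ]
# First date on which each id is 65 or older
first <- grid[!duplicated(grid$id), ]

# One data.frame per date in v; keeping all levels leaves v[2] as a 0-row table
dt_p <- split(first[, c("id", "fborn", "sex")],
              factor(as.character(first$refdate), levels = as.character(v)))
names(dt_p) <- paste0("dt_p", seq_along(v))
```

Converting the dates to character before building the factor keeps the level matching explicit, so every date in `v` gets its own list entry even when it is empty.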

On Mon, Oct 3, 2016 at 1:17 PM, Frank S. <[hidden email]> wrote:

[quoted original message deleted]


Re: Looping through data tables (or data frames) by removing previous individuals

Berry, Charles
On Mon, 3 Oct 2016, Frank S. wrote:

> Dear R users,
>
>

[deleted]

> I want to get a list of "k" data tables (or data frames) so that each
> contains those individuals who for the first time are at least 65,
> looping on each of the dates of vector "v". Let's consider the following
> example with 5 individuals:
>
>
> dt <- data.table(
>   id = 1:5,
>   fborn = as.Date(c("1935-07-25", "1942-10-05", "1942-09-07", "1943-09-07", "1943-12-31")),
>   sex = as.factor(rep(c(0, 1), c(2, 3)))
>   )
>
> v <- seq(as.Date("2006-01-01"), as.Date("2009-01-01"), by ="year") # k=4
>
>
> I would expect to obtain k=4 data tables so that:
> dt_p1: contains id = 1 (he is for the first time at least 65 on date v[1])
> dt_p2: is NULL (no subject reach for the first time 65 on date v[2])
> dt_p3: contains id = 2 & id = 3 (they are for the first time at least 65 on v[3])
> dt_p4: contains id = 4 & id = 5 (they are for the first time at least 65 on v[4])
>
>

Here is a start (using a data.frame for dt):

> vp <- as.POSIXlt( c( as.Date("1000-01-01"), v ))
> vp$year <- vp$year-65
> dt.cut <- as.numeric(cut(as.POSIXlt(dt$fborn),vp))
> split(dt,factor(dt.cut, 1:length(v)))
$`1`
   id      fborn sex
1  1 1935-07-25   0

$`2`
[1] id    fborn sex
<0 rows> (or 0-length row.names)

$`3`
   id      fborn sex
2  2 1942-10-05   0
3  3 1942-09-07   1

$`4`
   id      fborn sex
4  4 1943-09-07   1
5  5 1943-12-31   1


See
   ?as.POSIXlt
   ?cut.POSIXt
   ?split

HTH,

Chuck
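One detail worth checking in this approach: `cut.POSIXt` defaults to `right = FALSE`, so each interval is closed on the left, `[a, b)`. Someone born exactly on a cut point, i.e. who turns exactly 65 on one of the dates in `v`, is then placed in the following group, although "at least 65" puts them in the earlier one. None of the five individuals in the example is affected. The sketch below adds a hypothetical sixth person born on 1941-01-01 (exactly 65 on `v[1]`) to show the difference and uses `right = TRUE` to get `(a, b]` intervals; it is a minimal variant under those assumptions.

```r
dt <- data.frame(
  id    = 1:6,
  # ids 1-5 as in the question; id 6 is a hypothetical extra person
  # who turns exactly 65 on v[1]
  fborn = as.Date(c("1935-07-25", "1942-10-05", "1942-09-07",
                    "1943-09-07", "1943-12-31", "1941-01-01")),
  sex   = factor(c(0, 0, 1, 1, 1, 1))
)
v <- seq(as.Date("2006-01-01"), as.Date("2009-01-01"), by = "year")

# Birth-date cut points: each v shifted back 65 years, plus a very early lower bound
vp <- as.POSIXlt(c(as.Date("1000-01-01"), v))
vp$year <- vp$year - 65

# right = FALSE (the default): intervals are [a, b), so id 6 lands in group 2
left_closed  <- as.numeric(cut(as.POSIXlt(dt$fborn), vp))
# right = TRUE: intervals are (a, b], so id 6 lands in group 1 (65 on v[1])
right_closed <- as.numeric(cut(as.POSIXlt(dt$fborn), vp, right = TRUE))

dt_p <- split(dt, factor(right_closed, levels = seq_along(v)))
```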


Re: Looping through data tables (or data frames) by removing previous individuals

Frank S.
Thank you very much, Ista and Charles!


Best,


Frank S.

________________________________
From: Charles C. Berry <[hidden email]>
Sent: Monday, 3 October 2016 21:38:05
To: Frank S.
Cc: [hidden email]
Subject: Re: Looping through data tables (or data frames) by removing previous individuals

[quoted message deleted]