Using lapply in R data table

Frank S.
Dear all,

I have an R data.table like this:

library(data.table)
DT <- data.table(
  id = rep(c(2, 5, 7), c(3, 2, 2)),
  fini = rep(as.Date(c('2005-04-20', '2006-02-19', '2006-10-08')), c(3, 2, 2)),
  group = rep(c("A", "B", "A"), c(3, 2, 2)))


I want to construct a new variable "exposure" defined as follows:

1) If "fini" is earlier than 2006-01-01 --> "exposure" = 1
2) If "fini" is in [2006-01-01, 2006-06-30] --> "exposure" = ("2007-01-01" - "fini") in years, i.e. days / 365.25
3) If "fini" is in [2006-07-01, 2006-12-31] --> "exposure" = 0.5


So the desired output would be the following data table:

   id       fini exposure group
1:  2 2005-04-20     1.00     A
2:  2 2005-04-20     1.00     A
3:  2 2005-04-20     1.00     A
4:  5 2006-02-19     0.87     B
5:  5 2006-02-19     0.87     B
6:  7 2006-10-08     0.50     A
7:  7 2006-10-08     0.50     A


I have tried:

DT <- DT[ , list(id, fini, exposure = 0, group)]
DT.new <- lapply(DT, function(exposure){
  exposure[fini < as.Date("2006-01-01")] <- 1   # 1st case
  exposure[fini >= as.Date("2006-01-01") & fini <= as.Date("2006-06-30")] <-
    difftime(as.Date("2007-01-01"), fini, units="days")/365.25   # 2nd case
  exposure[fini >= as.Date("2006-07-01") & fini <= as.Date("2006-12-31")] <- 0.5   # 3rd case
  exposure  # return value
})


But I get an error message.

Thanks for any help!!


Frank S.


______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Using lapply in R data table

Ista Zahn
Hi Frank,

lapply(DT) iterates over each column. That doesn't seem to be what you want.
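
To illustrate the point about lapply(), here is a minimal sketch on a small base R data frame. The names df and visited are made up for this example and are not from the original post. A data frame, like a data.table, is a list of columns, so the function is called once per column. Inside that function a name such as fini is not looked up in the table, which is probably why the original attempt stops with an error.

```r
# A data.frame (and likewise a data.table) is a list of columns,
# so lapply() calls the function once per column, not once per row.
df <- data.frame(
  id   = c(2, 5, 7),
  fini = as.Date(c("2005-04-20", "2006-02-19", "2006-10-08"))
)

# Record the class of whatever lapply() hands to the function.
visited <- lapply(df, function(column) class(column))

# One list element per column, named after the column.
str(visited)
```

Each call receives a single column, so row-wise conditions that involve another column have to be written against the table itself rather than inside lapply().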

There are probably better ways, but here is one approach.

DT[, exposure := vector(mode = "numeric", length = .N)]
DT[fini < as.Date("2006-01-01"), exposure := 1]
DT[fini >= as.Date("2006-01-01") & fini <= as.Date("2006-06-30"),
      exposure := difftime(as.Date("2007-01-01"), fini, units="days")/365.25]
DT[fini >= as.Date("2006-07-01"), exposure := 0.5]
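
As a possible one-step variant of the same idea, here is a sketch that uses data.table's fcase(). This is an assumption about the installed version: as far as I know, fcase() only exists in data.table releases newer than this thread (1.13.0 or later). The helper names cut1, cut2 and year_end are made up for this example. Conditions are checked in order, and rows matching none of them (dates on or after 2007-01-01) are left as NA.

```r
library(data.table)

DT <- data.table(
  id    = rep(c(2, 5, 7), c(3, 2, 2)),
  fini  = rep(as.Date(c("2005-04-20", "2006-02-19", "2006-10-08")), c(3, 2, 2)),
  group = rep(c("A", "B", "A"), c(3, 2, 2))
)

# Illustrative cut points (names are hypothetical, not from the thread).
cut1     <- as.Date("2006-01-01")
cut2     <- as.Date("2006-07-01")
year_end <- as.Date("2007-01-01")

# The first matching condition wins; all values are plain doubles.
DT[, exposure := fcase(
  fini < cut1,     1,
  fini < cut2,     as.numeric(year_end - fini) / 365.25,
  fini < year_end, 0.5
)]

DT
```

Converting the date difference with as.numeric() keeps the column a plain numeric rather than a difftime.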

Best,
Ista

Re: Using lapply in R data table

Bert Gunter-2
This seems like a job for cut().

(I made DT a data frame to avoid loading the data.table package. But I
assume it would work with a data table too. Check this, though!)

> DT <- within(DT, exposure <- cut(fini, as.Date(c("2000-01-01","2006-01-01","2006-07-01","2007-01-01")), labels = c(1,.87,.5), right = FALSE))

> DT
  id       fini group exposure
1  2 2005-04-20     A        1
2  2 2005-04-20     A        1
3  2 2005-04-20     A        1
4  5 2006-02-19     B     0.87
5  5 2006-02-19     B     0.87
6  7 2006-10-08     A      0.5
7  7 2006-10-08     A      0.5


(but note that exposure is a factor, not numeric)
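
A small sketch of what that caveat means in practice, using a made-up factor f rather than the DT column: calling as.numeric() directly on a factor returns the internal level codes, so the labels have to go through as.character() first to recover the numbers.

```r
# A factor whose labels look like numbers (illustrative values).
f <- factor(c("1", "0.87", "0.5", "0.87"), levels = c("1", "0.87", "0.5"))

codes  <- as.numeric(f)                # level codes, not the labels
values <- as.numeric(as.character(f))  # the labels converted to numbers

codes
values
```

Here codes holds the positions of the levels (1, 2, 3, 2), while values holds 1, 0.87, 0.5, 0.87.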


Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


Re: Using lapply in R data table

Ista Zahn
On Mon, Sep 26, 2016 at 1:59 PM, Bert Gunter <[hidden email]> wrote:
> This seems like a job for cut() .

I thought that at first too, but the middle group shouldn't be a fixed .87 but rather

"exposure" = "2007-01-01" - "fini"

so I think cut() alone won't do it.

Best,
Ista

Re: Using lapply in R data table

Bert Gunter-2
I thought that that was a typo from the OP, as it disagrees with his
example. But the labels are arbitrary, so in fact cut() will do it
whichever way he meant.

-- Bert

Re: Using lapply in R data table

Ista Zahn
On Mon, Sep 26, 2016 at 2:48 PM, Bert Gunter <[hidden email]> wrote:
> I thought that that was a typo from the OP, as it disagrees with his
> example. But the labels are arbitrary, so in fact cut() will do it
> whichever way he meant.

I don't see how cut will do it, at least not conveniently. Consider
this slightly altered example:

library(data.table)
DT <- data.table(
  id = rep(c(2, 5, 7), c(3, 2, 2)),
  fini = rep(as.Date(c('2005-04-20',
                       '2006-02-19',
                       '2006-06-29',
                       '2006-10-08')),
             c(3, 1, 1, 2)),
  group = rep(c("A", "B", "A"), c(3, 2, 2))  )

DT[, exposure := vector(mode = "numeric", length = .N)]
DT[fini < as.Date("2006-01-01"), exposure := 1]
DT[fini >= as.Date("2006-01-01") & fini <= as.Date("2006-06-30"),
   exposure := difftime(as.Date("2007-01-01"), fini, units="days")/365.25]
DT[fini >= as.Date("2006-07-01"), exposure := 0.5]

DT

##    id       fini group  exposure
## 1:  2 2005-04-20     A 1.0000000
## 2:  2 2005-04-20     A 1.0000000
## 3:  2 2005-04-20     A 1.0000000
## 4:  5 2006-02-19     B 0.8651608
## 5:  5 2006-06-29     B 0.5092402
## 6:  7 2006-10-08     A 0.5000000
## 7:  7 2006-10-08     A 0.5000000
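
The two middle values can be checked by hand. This is a small sketch with made-up variable names (mid_dates, days): the exposure for the second case is the number of days from fini to 2007-01-01, divided by 365.25.

```r
# Dates that fall in the second case of the example above.
mid_dates <- as.Date(c("2006-02-19", "2006-06-29"))

# Whole days remaining until 2007-01-01.
days <- as.numeric(difftime(as.Date("2007-01-01"), mid_dates, units = "days"))
days            # 316 186

# Expressed as a fraction of a year.
days / 365.25   # 0.8651608 0.5092402
```

The value 0.87 in the original post is therefore just 316 / 365.25 rounded to two decimals.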

Best,
Ista

Re: Using lapply in R data table

Bert Gunter-2
Ista:

Aha -- now I see the point. My bad. You are right. I was careless.

However, cut() with ifelse() might simplify the code a bit and/or make
it more readable. To be clear, this is just a matter of taste; e.g.
using your data and a data frame instead of a data table:

> DT <- within(DT,
         exposure <- {
           # bins are [lower, upper): a = before 2006, b = Jan-Jun 2006, c = Jul-Dec 2006
           f <- cut(fini, as.Date(c("2000-01-01","2006-01-01","2006-07-01","2007-01-01")),
                    labels = letters[1:3], right = FALSE)
           ifelse(f == "a", 1,
                  ifelse(f == "c", .5,
                         difftime(as.Date("2007-01-01"), fini, units="days")/365.25))
         })


> DT
  id       fini group  exposure f
1  2 2005-04-20     A 1.0000000 a
2  2 2005-04-20     A 1.0000000 a
3  2 2005-04-20     A 1.0000000 a
4  5 2006-02-19     B 0.8651608 b
5  5 2006-06-29     B 0.5092402 b
6  7 2006-10-08     A 0.5000000 c
7  7 2006-10-08     A 0.5000000 c
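
One thing worth checking with any cut()-based version is where dates exactly on a boundary end up. Here is a minimal sketch with made-up test dates (the names breaks, x and f are only for this example). It passes right = FALSE explicitly, which to my understanding is also the default for Dates, so each bin includes its lower break and excludes its upper one. To cover the closed ranges [2006-01-01, 2006-06-30] and [2006-07-01, 2006-12-31], the breaks are placed on the first day after each range.

```r
# Breaks sit on the first day of each bin.
breaks <- as.Date(c("2000-01-01", "2006-01-01", "2006-07-01", "2007-01-01"))

# Dates on either side of every boundary.
x <- as.Date(c("2005-12-31", "2006-01-01", "2006-06-30",
               "2006-07-01", "2006-12-31", "2007-01-01"))

# right = FALSE: intervals are [lower, upper).
f <- cut(x, breaks, labels = letters[1:3], right = FALSE)

data.frame(x, f)
```

With these breaks, 2006-06-30 lands in bin b and 2006-12-31 in bin c, while 2007-01-01 falls outside the last bin and comes back as NA.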

Re: Using lapply in R data table

Bert Gunter-2
... and just for fun, here's an alternative in which mapply() is used
to vectorize switch(); again, whether you like it may be just a matter
of taste, although I suspect it might be less efficient than ifelse(),
which is already vectorized:

DT <- within(DT,
             exposure <- {
               # one switch() call per row; bins are [lower, upper)
               mapply(function(x, fac) switch(as.character(fac),
                                              a = 1,
                                              b = difftime(as.Date("2007-01-01"), x, units="days")/365.25,
                                              c = .5),
                      x = fini,
                      fac = cut(fini, as.Date(c("2000-01-01","2006-01-01","2006-07-01","2007-01-01")),
                                labels = letters[1:3], right = FALSE))
             })


> DT
  id       fini group  exposure
1  2 2005-04-20     A 1.0000000
2  2 2005-04-20     A 1.0000000
3  2 2005-04-20     A 1.0000000
4  5 2006-02-19     B 0.8651608
5  5 2006-06-29     B 0.5092402
6  7 2006-10-08     A 0.5000000
7  7 2006-10-08     A 0.5000000
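
To make the mapply()/switch() versus ifelse() comparison concrete, here is a minimal base-R sketch on the four distinct dates shown in the output. The names period, years_left, by_switch and by_ifelse are made up for illustration; this is only meant to show that both routes give the same numbers, not to measure speed.

```r
# The four distinct fini values from the printed output
fini <- as.Date(c("2005-04-20", "2006-02-19", "2006-06-29", "2006-10-08"))

# Time from fini to 2007-01-01 in years, using the thread's 365.25-day year
years_left <- as.numeric(difftime(as.Date("2007-01-01"), fini,
                                  units = "days")) / 365.25

# Period label per date ("period" is a hypothetical helper vector)
period <- ifelse(fini < as.Date("2006-01-01"), "a",
                 ifelse(fini <= as.Date("2006-06-30"), "b", "c"))

# switch() handles one label at a time, so mapply() feeds it element by element
by_switch <- mapply(function(p, y) switch(p, a = 1, b = y, c = 0.5),
                    period, years_left, USE.NAMES = FALSE)

# ifelse() works on the whole vectors in a single call
by_ifelse <- ifelse(period == "a", 1,
                    ifelse(period == "c", 0.5, years_left))

all.equal(by_switch, by_ifelse)
```

Both vectors hold one value per date, so either can be assigned directly as the exposure column.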


Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )




Re: Using lapply in R data table

Frank S.
Many thanks Ista and Bert for your nice solutions!


As Ista pointed out in a previous mail, the 0.87 value in my example is not fixed: for each subject
it depends on the difference "2007-01-01 - fini", expressed in years. Both of your solutions take
this into account.
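
For reference, the arithmetic behind the middle-period values can be checked by hand. This is a small base-R sketch using the two middle-period dates that appear in the thread; days_left and exposure_mid are names chosen only for this example.

```r
# The two dates from the thread that fall in [2006-01-01, 2006-06-30]
fini <- as.Date(c("2006-02-19", "2006-06-29"))

# Days from fini to 2007-01-01, as a plain number
days_left <- as.numeric(difftime(as.Date("2007-01-01"), fini, units = "days"))
days_left               # 316 186

# Convert to years with the 365.25-day year used in the thread's code
exposure_mid <- days_left / 365.25
round(exposure_mid, 7)  # 0.8651608 0.5092402
```

The first value rounds to the 0.87 shown in the desired output.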


Frank S.
