writing a function to work with dplyr::mutate()

writing a function to work with dplyr::mutate()

Steven Rigatti
I am having some problems with what seems like a pretty simple issue. I
have some data where I want to convert numbers. Specifically, this is
cancer data and the size of tumors is encoded using millimeter
measurements. However, if the actual measurement is not available the
coding may imply a less specific range of sizes. For instance, numbers 0-89
may indicate size in mm, but 90 indicates "greater than 90 mm", 91
indicates "1 to 2 cm", etc. So, I want to translate 91 to 90, 92 to 15, etc.

I have many such tables so I would like to be able to write a function
which takes as input a threshold over which new values need to be looked
up, and the new lookup table, returning the new values.

I successfully wrote the function:

translate_seer_numeric <- function(var, upper, lookup) {
    # rename the lookup columns so that the join key is 'old'
    names(lookup) <- c('old','new')
    names(var) <- 'old'
    var <- as.data.frame(var)
    # identity mapping for the codes 1..upper, which are kept as they are
    lookup2 <- data.frame(old = c(1:upper),
                          new = c(1:upper))
    lookup3 <- rbind(lookup, lookup2)
    print(var)
    res <- left_join(var, lookup3, by = 'old') %>%
         select(new)

    res

}

test1 <- data.frame(old = c(99,95,93, 8))
lup <- data.frame(bif = c(93, 95, 99),
                  new = c(3, 5, NA))
translate_seer_numeric(test1, 90, lup)

The above test generates the desired output:

  old
1  99
2  95
3  93
4   8
  new
1  NA
2   5
3   3
4   8

My problem comes when I try to put this in line with pipes and the mutate
function:

test1 %>%
     mutate(varb = translate_seer_numeric(var = old, 90, lup))

Error: Problem with `mutate()` input `varb`.
x Join columns must be present in data.
x Problem with `old`.
i Input `varb` is `translate_seer_numeric(var = test1$old, 90, lup)`.

Thoughts??

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: writing a function to work with dplyr::mutate()

Bert Gunter-2
If you are willing to entertain another approach, have a look at ?cut. By
defining the 'breaks' argument appropriately, you can easily create a
factor that tells you which values should be looked up and which accepted
as is. If I understand correctly, this seems to be what you want. If I have
not, just ignore and wait for a more useful reply.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
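
A minimal sketch of what the ?cut suggestion above might look like, assuming a threshold of 90 as in the example calls in this thread. The names `codes`, `kind` and `needs_lookup` and the two labels are made up for illustration; they do not come from any of the posts.

```r
# Sketch only: split codes into "keep as is" and "needs a lookup".
# The threshold (90) and the labels are assumptions for illustration.
codes <- c(99, 95, 93, 8)
kind <- cut(codes,
            breaks = c(-Inf, 90, Inf),       # (-Inf, 90] and (90, Inf)
            labels = c("as_is", "lookup"))
# kind is a factor; comparing with a label gives a logical flag per value
needs_lookup <- kind == "lookup"
print(data.frame(codes, kind, needs_lookup))
```

The factor only says which values need translating; the translation itself still has to come from a lookup table.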



Re: writing a function to work with dplyr::mutate()

David Winsemius
In reply to this post by Steven Rigatti

On 1/19/21 7:50 AM, Steven Rigatti wrote:

> test1 <- data.frame(old = c(99,95,93, 8))lup <- data.frame(bif = c(93, 95, 99),


This throws an error when copy-pasted, since you posted in html and
there was no line separator.


>                    new = c(3, 5, NA))
> translate_seer_numeric(test1, 90, lup)
>
> The above test generates the desired output:
>
>   old
> 1  99
> 2  95
> 3  93
> 4   8
>   new
> 1  NA
> 2   5
> 3   3
> 4   8
>
> My problem comes when I try to put this in line with pipes and the mutate
> function:
>
> test1 %>%
>       mutate(varb = translate_seer_numeric(var = old, 90, lup))


#Added:

library(tidyverse)   # since many people on rhelp are not particularly "tidy".

>   Error: Problem with `mutate()` input `varb`.
> x Join columns must be present in data.
> x Problem with `old`.
> i Input `varb` is `translate_seer_numeric(var = test1$old, 90, lup)`.


I think I got useful results with this although you might need to
extract the "new" column from the dataframe result.


test1 %>%    mutate(varb = translate_seer_numeric( . , 90, lup))

#----------

   old
1  99
2  95
3  93
4   8
   old new
1  99  NA
2  95   5
3  93   3
4   8   8

  When you want to refer to the prior result in a piped chain you use a
dot ("."). I'm guessing you know this. But what I saw was that your
successful test case was using a dataframe as the input to the first
parameter of translate_seer_numeric, but you were apparently passing a
column name when it was being used in a pipe.


The error message wasn't particularly helpful to me, but maybe that's
because I don't have enough experience in that non-standard universe. It
did tell us that there was a problem with "varb" and that was
probably because that was the wrong parameter name. However even
changing the call to just `var=old` would probably have failed as well
because you didn't write the function to accept a variable name as the
first parameter.

Best;

David.
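
A minimal base R sketch of one likely reason the join cannot find `old` when the function is called inside mutate(). It assumes that mutate() hands the function the bare numeric column rather than a data frame; the variable names are simply those used in the post.

```r
var <- c(99, 95, 93, 8)       # what the function receives for the column `old`
names(var) <- 'old'           # names only the first element; the rest get NA names
var_df <- as.data.frame(var)  # the single column is named after the argument: "var"
print(names(var_df))          # there is no column called "old" to join on
```

Building the data frame explicitly, for example with data.frame(old = var), avoids relying on how as.data.frame() names the column.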


Re: writing a function to work with dplyr::mutate()

Steven Rigatti
In reply to this post by Bert Gunter-2
It's not that I can't get the output I want. I was able to do that.
It is just that I can't make it pipeable - I get that weird error message
that I don't understand.


Re: writing a function to work with dplyr::mutate()

Bill Dunlap-2
In reply to this post by Steven Rigatti
Your translate... function seems unnecessarily complicated and reusing the
name 'var' for both the input and the data.frame containing the input makes
it confusing to me.  The following replacement, f, uses your algorithm but
I think gets the answer you want.

f <-
function(var, upper, lookup) {
    names(lookup) <- c('old','new')
    var_df <- data.frame(old = var)
    lookup2 <- data.frame(old = c(1:upper),
                          new = c(1:upper))
    lookup3 <- rbind(lookup, lookup2)
    res <- left_join(var_df, lookup3, by = 'old')
    res$new # return a vector, not a data.frame or tibble.
}
E.g.,
> data.frame(XXX=c(95,93,10,20), YYY=c(55,66,93,98)) %>%
+     mutate(YYY_mm = f(YYY, 90, lup))
  XXX YYY YYY_mm
1  95  55     55
2  93  66     66
3  10  93      3
4  20  98     NA

You can modify this so that it names the output column based on the name of
the input column (by returning a data.frame/tibble instead of a numeric
vector):

f1 <-
function(var, upper, lookup,
         new_varname = paste0(deparse1(substitute(var)), "_mm")) {
    names(lookup) <- c('old',new_varname)
    var_df <- data.frame(old = var)
    lookup2 <- data.frame(old = c(1:upper),
                          new = c(1:upper))
    names(lookup2)[2] <- new_varname
    lookup3 <- rbind(lookup, lookup2)
    res <- left_join(var_df, lookup3, by = 'old')[2]
    res
}
E.g.,
> data.frame(XXX=c(95,93,10,20), YYY=c(55,66,93,98)) %>%
+     mutate(f1(YYY, 90, lup))
  XXX YYY YYY_mm
1  95  55     55
2  93  66     66
3  10  93      3
4  20  98     NA

-Bill


Re: writing a function to work with dplyr::mutate()

David Winsemius

On 1/19/21 11:17 AM, Bill Dunlap wrote:
> Your translate... function seems unnecessarily complicated and reusing the
> name 'var' for both the input and the data.frame containing the input makes
> it confusing to me.  The following replacement, f, uses your algorithm but
> I think gets the answer you want.


I was thinking that the tidyverse might already have a recode-like
operation. But the dplyr::recode appears to be deprecated and you get
referred to case_when. Perhaps following an example from the `case_when`
help page:


case_SEER_tsize <- function(tsize, upper, exceptions){
     case_when(tsize <= upper ~ tsize,
               tsize %in% exceptions$bif ~ exceptions$new[match(tsize, exceptions$bif)])}


I'm guessing that my lack of tidyversatility means there's probably a
`match`-equivalent that I'm not familiar with.


 > test1 <- data.frame(old = c(99,95,93, 8)); lup <- data.frame(bif = c(93, 95, 99),
 +                                                             new = c(3, 5, NA))
 >
 > test1 %>%
+     mutate(varb = case_SEER_tsize(.$old, 90, lup))
   old varb
1  99   NA
2  95    5
3  93    3
4   8    8

--

David.
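
For comparison, a base R sketch of the same table-driven idea using only match() and ifelse(). The helper name `translate_codes` is hypothetical and not from the thread. It assumes the lookup table has the old codes in its first column and the replacements in its second.

```r
# Hypothetical helper: codes at or below `upper` are kept as they are,
# anything above is replaced via the two-column lookup table.
translate_codes <- function(x, upper, lookup) {
    idx <- match(x, lookup[[1]])   # row of each code in the lookup, NA if absent
    ifelse(x <= upper, x, lookup[[2]][idx])
}

lup <- data.frame(bif = c(93, 95, 99), new = c(3, 5, NA))
res <- translate_codes(c(99, 95, 93, 8), 90, lup)
print(res)
```

Because it takes a vector and returns a vector of the same length, a function of this shape should also be usable as mutate(varb = translate_codes(old, 90, lup)).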

>
> f <-
> function(var, upper, lookup) {
>      names(lookup) <- c('old','new')
>      var_df <- data.frame(old = var)
>      lookup2 <- data.frame(old = c(1:upper),
>                            new = c(1:upper))
>      lookup3 <- rbind(lookup, lookup2)
>      res <- left_join(var_df, lookup3, by = 'old')
>      res$new # return a vector, not a data.frame or tibble.
> }
> E.g.,
>> data.frame(XXX=c(95,93,10,20), YYY=c(55,66,93,98)) %>% mutate( YYY_mm =
> f(YYY, 90, lup))
>    XXX YYY YYY_mm
> 1  95  55     55
> 2  93  66     66
> 3  10  93      3
> 4  20  98     NA
>
> You can modify this so that it names the output column based on the name of
> the input column (by returning a data.frame/tibble instead of a numeric
> vector):
>
> f1 <-
> function(var, upper, lookup,  new_varname =
> paste0(deparse1(substitute(var)), "_mm")) {
>      names(lookup) <- c('old',new_varname)
>      var_df <- data.frame(old = var)
>      lookup2 <- data.frame(old = c(1:upper),
>                            new = c(1:upper))
>      names(lookup2)[2] <- new_varname
>      lookup3 <- rbind(lookup, lookup2)
>      res <- left_join(var_df, lookup3, by = 'old')[2]
>      res
> }
> E.g.,
>> data.frame(XXX=c(95,93,10,20), YYY=c(55,66,93,98)) %>% mutate( f1(YYY,
> 90, lup))
>    XXX YYY YYY_mm
> 1  95  55     55
> 2  93  66     66
> 3  10  93      3
> 4  20  98     NA
>
> -Bill
>
> On Tue, Jan 19, 2021 at 10:24 AM Steven Rigatti <[hidden email]> wrote:
>
>> I am having some problems with what seems like a pretty simple issue. I
>> have some data where I want to convert numbers. Specifically, this is
>> cancer data and the size of tumors is encoded using millimeter
>> measurements. However, if the actual measurement is not available the
>> coding may imply a less specific range of sizes. For instance numbers 0-89
>> may indicate size in mm, but 90 indicates "greater than 90 mm" , 91
>> indicates "1 to 2 cm", etc. So, I want to translate 91 to 90, 92 to 15,
>> etc.
>>
>> I have many such tables so I would like to be able to write a function
>> which takes as input a threshold over which new values need to be looked
>> up, and the new lookup table, returning the new values.
>>
>> I successfully wrote the function:
>>
>> translate_seer_numeric <- function(var, upper, lookup) {
>>      names(lookup) <- c('old','new')
>>      names(var) <- 'old'
>>      var <- as.data.frame(var)
>>      lookup2 <- data.frame(old = c(1:upper),
>>                            new = c(1:upper))
>>      lookup3 <- rbind(lookup, lookup2)
>>   print(var)
>>      res <- left_join(var, lookup3, by = 'old') %>%
>>           select(new)
>>
>>      res
>>
>> }
>>
>> test1 <- data.frame(old = c(99,95,93, 8))lup <- data.frame(bif = c(93, 95,
>> 99),
>>                    new = c(3, 5, NA))
>> translate_seer_numeric(test1, 90, lup)
>>
>> The above test generates the desired output:
>>
>>    old1  992  953  934   8
>>    new1  NA2   53   34   8
>>
>> My problem comes when I try to put this in line with pipes and the mutate
>> function:
>>
>> test1 %>%
>>       mutate(varb = translate_seer_numeric(var = old, 90, lup))####
>>   Error: Problem with `mutate()` input `varb`.
>> x Join columns must be present in data.
>> x Problem with `old`.
>> i Input `varb` is `translate_seer_numeric(var = test1$old, 90, lup)`.
>>
>> Thoughts??
>>
>>          [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: writing a function to work with dplyr::mutate()

Jeff Newmiller
In reply to this post by Bert Gunter-2
Second this. There is also the findInterval function, which omits the factor attributes and just returns integers that can be used in lookup tables.
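
A small sketch of the findInterval idea, assuming the same threshold of 90 used elsewhere in the thread. The names `codes`, `bucket` and `action` are illustrative only.

```r
codes <- c(99, 95, 93, 8)
upper <- 90                                # assumed threshold
# 0 for values below upper + 1, 1 for values at or above it
bucket <- findInterval(codes, upper + 1)
# the plain integers can index straight into a vector of labels or actions
action <- c("as_is", "lookup")[bucket + 1]
print(data.frame(codes, bucket, action))
```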


--
Sent from my phone. Please excuse my brevity.


Re: writing a function to work with dplyr::mutate()

Steven Rigatti
I use case_when a lot - but I have a lot of dynamic tables to treat this
way and case_when has to be hard-coded.


Re: writing a function to work with dplyr::mutate()

David Winsemius


Sent from my iPhone

> On Jan 19, 2021, at 1:52 PM, Steven Rigatti <[hidden email]> wrote:
>
> I use case_when a lot - but I have a lot of dynamic tables to treat this
> way and case_when has to be hard-coded.

But, but, but .... my case_when-based illustration let you pass a parameter dataframe that contains a translation table.


David.
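
A rough base R sketch of how passing the translation table as a parameter could cover several tables at once. Everything named here (`recode_with`, `tables`, `dat`, the `depth` column and its table) is hypothetical. The sketch assumes each lookup table has columns `bif` and `new`, and that the data contain no missing codes.

```r
# Hypothetical helper: replace codes above `upper` using a lookup table.
# Assumes x has no NA values.
recode_with <- function(x, upper, lookup) {
    hit <- match(x, lookup$bif)      # row of each code in the lookup table
    out <- x
    above <- x > upper
    out[above] <- lookup$new[hit[above]]
    out
}

# one lookup table per column, keyed by column name
tables <- list(size  = data.frame(bif = c(93, 95, 99), new = c(3, 5, NA)),
               depth = data.frame(bif = c(91, 92),     new = c(90, 15)))
dat <- data.frame(size = c(99, 95, 93, 8), depth = c(91, 92, 10, 20))

# apply each table to the column of the same name
dat[names(tables)] <- Map(recode_with, dat[names(tables)], 90, tables)
print(dat)
```

Nothing is hard-coded per table here: adding another variable only means adding another entry to the list.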

>
>> On Tue, Jan 19, 2021 at 3:48 PM Jeff Newmiller <[hidden email]>
>> wrote:
>>
>> Second this. There is also the findInterval function, which omits the
>> factor attributes and just returns integers that can be used in lookup
>> tables.
>>
>>> On January 19, 2021 10:33:59 AM PST, Bert Gunter <[hidden email]>
>>> wrote:
>>> If you are willing to entertain another approach, have a look at ?cut.
>>> By defining the 'breaks' argument appropriately, you can easily create a
>>> factor that tells you which values should be looked up and which
>>> accepted as is. If I understand correctly, this seems to be what you
>>> want. If I have not, just ignore and wait for a more useful reply.
>>>
>>> Cheers,
>>> Bert

Re: writing a function to work with dplyr::mutate()

John Kane-3
In reply to this post by David Winsemius
David
library(tidyverse)
char_vec <- sample(c("a", "b", "c"), 10, replace = TRUE)
recode(char_vec, a = "Apple")

works for me.
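
For tables that are built at run time, a possible extension of the recode() call above is to splice a named vector in with !!!, a pattern shown in the recode() help page. This is only a sketch: key_table and its contents are made up for illustration, and it is worth checking ?recode for the function's lifecycle status in your installed dplyr version.

```r
library(dplyr)

# Illustrative key table; in practice this could be read from a file.
key_table <- data.frame(old = c("a", "b"),
                        new = c("Apple", "Banana"),
                        stringsAsFactors = FALSE)

# Named vector: names are the old values, elements are the new values.
level_key <- setNames(key_table$new, key_table$old)

char_vec <- c("a", "b", "c", "a")

# Values without an entry in the key ("c" here) are left unchanged.
recoded <- recode(char_vec, !!!level_key)
recoded
```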

On Tue, 19 Jan 2021 at 15:13, David Winsemius <[hidden email]>
wrote:

> On 1/19/21 11:17 AM, Bill Dunlap wrote:
> > Your translate... function seems unnecessarily complicated and reusing
> > the name 'var' for both the input and the data.frame containing the
> > input makes it confusing to me.  The following replacement, f, uses your
> > algorithm but I think gets the answer you want.
>
> I was thinking that the tidyverse might already have a recode-like
> operation. But the dplyr::recode appears to be deprecated and you get
> referred to case_when. Perhaps following an example from the `case_when`
> help page:
>
> case_SEER_tsize <- function(tsize, upper, exceptions){
>      case_when(tsize <= upper ~ tsize,
>                tsize %in% exceptions$bif ~
>                    exceptions$new[match(tsize, exceptions$bif)])}
>
> I'm guessing that my lack of tidyversatility means there's probably a
> `match`-equivalent that I'm not familiar with.
>
>  > test1 <- data.frame(old = c(99,95,93, 8)); lup <- data.frame(bif = c(93, 95, 99),
> +                                             new = c(3, 5, NA))
>  >
>  > test1 %>%
> +     mutate(varb = case_SEER_tsize(.$old, 90, lup))
>    old varb
> 1  99   NA
> 2  95    5
> 3  93    3
> 4   8    8
>
> --
> David.


--
John Kane
Kingston ON Canada


Re: writing a function to work with dplyr::mutate()

Jeff Newmiller
In reply to this post by Steven Rigatti
I avoid case_when, so don't complain to me about it. Bert and I both suggested standard evaluation approaches that are very amenable to using lookup tables.
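
As a rough illustration of that kind of approach, the sketch below uses findInterval() to decide which values need translating and match() to pull replacements from a lookup table. The function name translate_codes is hypothetical, and the sketch assumes column 1 of the table holds the special codes and column 2 their replacements; it is one way to do it, not necessarily what either of us had in mind.

```r
# Hypothetical helper; assumes column 1 of `lookup` holds the special codes
# and column 2 holds their replacements.
translate_codes <- function(x, upper, lookup) {
  # With left.open = TRUE, findInterval() returns 0 for x <= upper
  # and 1 for x > upper (NA stays NA).
  band <- findInterval(x, upper, left.open = TRUE)
  out <- x
  idx <- which(band == 1)
  # Codes above the threshold that are missing from the table become NA.
  out[idx] <- lookup[[2]][match(x[idx], lookup[[1]])]
  out
}

lup <- data.frame(bif = c(93, 95, 99), new = c(3, 5, NA))
translate_codes(c(99, 95, 93, 8), 90, lup)
```

Because the function returns a plain vector of the same length as its input, it should also drop straight into mutate(), e.g. mutate(varb = translate_codes(old, 90, lup)).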

On January 19, 2021 1:51:17 PM PST, Steven Rigatti <[hidden email]> wrote:

>I use case_when a lot - but I have a lot of dynamic tables to treat this
>way and case_when has to be hard-coded.

Re: writing a function to work with dplyr::mutate()

Steven Rigatti
In reply to this post by Bill Dunlap-2
This works perfectly. Ah, just needed a vector as output instead of a
1-column df.
Thank you!!!
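
For anyone reading later, a small sketch of the difference, using the same test values as in the thread. It also shows one other detail that may have contributed to the original error (this is a reading of the error message, not something confirmed in the thread): as.data.frame() on a bare vector names the column after the variable, so there is no 'old' column to join on, whereas data.frame(old = ...) names it explicitly.

```r
library(dplyr)

x <- c(99, 95, 93, 8)
lookup <- data.frame(old = c(93, 95, 99), new = c(3, 5, NA))

# as.data.frame() on a bare vector names the column after the variable ("x"),
# whereas data.frame(old = x) gives the join key the name it needs.
names(as.data.frame(x))
names(data.frame(old = x))

joined <- left_join(data.frame(old = x), lookup, by = "old")

one_col_df <- joined["new"]  # still a data frame with one column
new_vec    <- joined$new     # plain numeric vector, one value per input row

is.data.frame(one_col_df)
is.numeric(new_vec)
```

Note that this lookup table only covers the special codes, so the value 8 comes back as NA here; the function quoted below avoids that by adding the identity rows 1:upper to the table before joining.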

On Tue, Jan 19, 2021 at 2:18 PM Bill Dunlap <[hidden email]>
wrote:

> Your translate... function seems unnecessarily complicated and reusing the
> name 'var' for both the input and the data.frame containing the input makes
> it confusing to me.  The following replacement, f, uses your algorithm but
> I think gets the answer you want.
>
> f <-
> function(var, upper, lookup) {
>     names(lookup) <- c('old','new')
>     var_df <- data.frame(old = var)
>     lookup2 <- data.frame(old = c(1:upper),
>                           new = c(1:upper))
>     lookup3 <- rbind(lookup, lookup2)
>     res <- left_join(var_df, lookup3, by = 'old')
>     res$new # return a vector, not a data.frame or tibble.
> }
> E.g.,
> > data.frame(XXX=c(95,93,10,20), YYY=c(55,66,93,98)) %>% mutate( YYY_mm =
> f(YYY, 90, lup))
>   XXX YYY YYY_mm
> 1  95  55     55
> 2  93  66     66
> 3  10  93      3
> 4  20  98     NA
>
> You can modify this so that it names the output column based on the name
> of the input column (by returning a data.frame/tibble instead of a numeric
> vector):
>
> f1 <-
> function(var, upper, lookup,  new_varname =
> paste0(deparse1(substitute(var)), "_mm")) {
>     names(lookup) <- c('old',new_varname)
>     var_df <- data.frame(old = var)
>     lookup2 <- data.frame(old = c(1:upper),
>                           new = c(1:upper))
>     names(lookup2)[2] <- new_varname
>     lookup3 <- rbind(lookup, lookup2)
>     res <- left_join(var_df, lookup3, by = 'old')[2]
>     res
> }
> E.g.,
> > data.frame(XXX=c(95,93,10,20), YYY=c(55,66,93,98)) %>% mutate( f1(YYY,
> 90, lup))
>   XXX YYY YYY_mm
> 1  95  55     55
> 2  93  66     66
> 3  10  93      3
> 4  20  98     NA
>
> -Bill