Changing cell value for MANY unique pairings of values in 2 columns

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Changing cell value for MANY unique pairings of values in 2 columns

Alicia Ellis
 am cleaning some very messy health record lab data.  Several of the rows
in the VALUE column have text entries and they need to be converted to
numeric in the NUMERIC_VALUE column based on the values in VALUE and
DESCRIPTION.  For example:

df <- data.frame(VALUE = c("<60", "Positive", "Negative", "Less than 0.30",
"12%", "<0.2", "Unknown"),
                 DESCRIPTION = c("A","A", "B", "C", "D", "E", "E"),
                 NUMERIC_VALUE=c(9, 9,9,9,9,9,9))
df

df$NUMERIC_VALUE[df$VALUE == "Positive" & df$DESCRIPTION == "A"]=999999999


However, I need to do this for ~500 unique pairings of VALUE and
DESCRIPTION entries.  I'm trying to find an easy way to do this without
having to have 500 lines of code for each unique pairing.  Some of the
pairings will be changed to the same value (e.g., 99999999, or -999999999)
but many will be unique numeric values.


I've started by creating a new object called rules where a SUBSET of df
rows are included with the new value they should be changed to.


rules <- data.frame(VALUE = c("<60",  "Negative", "Less than 0.30", "<0.2",
"Unknown"),
                    DESCRIPTION = c("A", "B", "C", "E", "E"),
                    NEW_VALUE=c(60, -999999,0.29,0.1,777777))
rules


I tried doing a loop to change the values in df based on the suggested
value in rules:

for (i in (1 : nrow(rules))) {
  df$NUMERIC_VALUE[df$VALUE == rules[i,1] & df$DESCRIPTION == rules
[i,2]]=rules[i,3]
}
df

This gives the error:

Error in Ops.factor(df$VALUE, rules[i, 1]) :
  level sets of factors are differentwork and I think because when I write

If I create rules using the exact same levels as df it works:

rules <- data.frame(VALUE = c("<60", "Positive", "Negative", "Less than
0.30", "12%", "<0.2", "Unknown"),
                    DESCRIPTION = c("A","A", "B", "C", "D", "E", "E"),
                    NEW_VALUE=c(60, 999999,-999999,0.29,12,0.1,777777))
rules

for (i in (1 : nrow(rules))) {
  df$NUMERIC_VALUE[df$VALUE == rules[i,1] & df$DESCRIPTION == rules
[i,2]]=rules[i,3]
}
df


Can anyone suggest a way to modify my for loop so that it works for a
subset of rows in df and accomplish what I want?  Or suggest a completely
different method that works?


Thanks!

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Changing cell value for MANY unique pairings of values in 2 columns

David Winsemius

> On Mar 17, 2017, at 11:33 AM, Alicia Ellis <[hidden email]> wrote:
>
> am cleaning some very messy health record lab data.  Several of the rows
> in the VALUE column have text entries and they need to be converted to
> numeric in the NUMERIC_VALUE column based on the values in VALUE and
> DESCRIPTION.  For example:
>
> df <- data.frame(VALUE = c("<60", "Positive", "Negative", "Less than 0.30",
> "12%", "<0.2", "Unknown"),
>                 DESCRIPTION = c("A","A", "B", "C", "D", "E", "E"),
>                 NUMERIC_VALUE=c(9, 9,9,9,9,9,9))
> df
>
> df$NUMERIC_VALUE[df$VALUE == "Positive" & df$DESCRIPTION == "A"]=999999999
>
>
> However, I need to do this for ~500 unique pairings of VALUE and
> DESCRIPTION entries.  I'm trying to find an easy way to do this without
> having to have 500 lines of code for each unique pairing.  Some of the
> pairings will be changed to the same value (e.g., 99999999, or -999999999)
> but many will be unique numeric values.

I'm not convinced that you have the necessary scientific background to do this job properly. There are different sorts of lab tests: enzymatic activity, solute concentrations, and viral titers come to mind immediately for which the handling would be materially different. If you have a serum sodium level of less than 100 mEq/ml, then that is a value inconsistent with human life and the value should be set to NA. If you have a value of alkaline phosphatase that is 0 it is most suggestive of specimen mishandling. If you have a hepatitis B antigen level of "Positive", then it seems a perfectly informative value that should not be changed.

Furthermore the attempt to change vlaues that you think are missing to 99999999, or -999999999 is simply wrongheaded. Learn to use the missing value indicator NA rather thna setting these to a numeric value.

You should seek advice within your organization before you charge ahead with this strategy.

--
David.

>
>
> I've started by creating a new object called rules where a SUBSET of df
> rows are included with the new value they should be changed to.
>
>
> rules <- data.frame(VALUE = c("<60",  "Negative", "Less than 0.30", "<0.2",
> "Unknown"),
>                    DESCRIPTION = c("A", "B", "C", "E", "E"),
>                    NEW_VALUE=c(60, -999999,0.29,0.1,777777))
> rules
>
>
> I tried doing a loop to change the values in df based on the suggested
> value in rules:
>
> for (i in (1 : nrow(rules))) {
>  df$NUMERIC_VALUE[df$VALUE == rules[i,1] & df$DESCRIPTION == rules
> [i,2]]=rules[i,3]
> }
> df
>
> This gives the error:
>
> Error in Ops.factor(df$VALUE, rules[i, 1]) :
>  level sets of factors are differentwork and I think because when I write
>
> If I create rules using the exact same levels as df it works:
>
> rules <- data.frame(VALUE = c("<60", "Positive", "Negative", "Less than
> 0.30", "12%", "<0.2", "Unknown"),
>                    DESCRIPTION = c("A","A", "B", "C", "D", "E", "E"),
>                    NEW_VALUE=c(60, 999999,-999999,0.29,12,0.1,777777))
> rules
>
> for (i in (1 : nrow(rules))) {
>  df$NUMERIC_VALUE[df$VALUE == rules[i,1] & df$DESCRIPTION == rules
> [i,2]]=rules[i,3]
> }
> df
>
>
> Can anyone suggest a way to modify my for loop so that it works for a
> subset of rows in df and accomplish what I want?  Or suggest a completely
> different method that works?
>
>
> Thanks!
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Changing cell value for MANY unique pairings of values in 2 columns

David Winsemius

> On Mar 18, 2017, at 9:52 AM, David Winsemius <[hidden email]> wrote:
>
>
>> On Mar 17, 2017, at 11:33 AM, Alicia Ellis <[hidden email]> wrote:
>>
>> am cleaning some very messy health record lab data.  Several of the rows
>> in the VALUE column have text entries and they need to be converted to
>> numeric in the NUMERIC_VALUE column based on the values in VALUE and
>> DESCRIPTION.  For example:
>>
>> df <- data.frame(VALUE = c("<60", "Positive", "Negative", "Less than 0.30",
>> "12%", "<0.2", "Unknown"),
>>                DESCRIPTION = c("A","A", "B", "C", "D", "E", "E"),
>>                NUMERIC_VALUE=c(9, 9,9,9,9,9,9))
>> df
>>
>> df$NUMERIC_VALUE[df$VALUE == "Positive" & df$DESCRIPTION == "A"]=999999999
>>
>>
>> However, I need to do this for ~500 unique pairings of VALUE and
>> DESCRIPTION entries.  I'm trying to find an easy way to do this without
>> having to have 500 lines of code for each unique pairing.  Some of the
>> pairings will be changed to the same value (e.g., 99999999, or -999999999)
>> but many will be unique numeric values.
>
> I'm not convinced that you have the necessary scientific background to do this job properly. There are different sorts of lab tests: enzymatic activity, solute concentrations, and viral titers come to mind immediately for which the handling would be materially different. If you have a serum sodium level of less than 100 mEq/ml, then that is a value inconsistent with human life and the value should be set to NA. If you have a value of alkaline phosphatase that is 0 it is most suggestive of specimen mishandling. If you have a hepatitis B antigen level of "Positive", then it seems a perfectly informative value that should not be changed.
>
> Furthermore the attempt to change vlaues that you think are missing to 99999999, or -999999999 is simply wrongheaded. Learn to use the missing value indicator NA rather thna setting these to a numeric value.
>
> You should seek advice within your organization before you charge ahead with this strategy.

I've been informed that my concerns are misplaced and that the scientific concerns I raised were inflated. The "answer" then might be:

merge( df, rules, by=1:2, all.x=TRUE)  # Or

merge( df, rules, by=c("VALUE", "DESCRIPTION") , all.x=TRUE)


Which returns in this example:


           VALUE DESCRIPTION NUMERIC_VALUE  NEW_VALUE
1           <0.2           E         9e+00       0.10
2            <60           A         9e+00      60.00
3            12%           D         9e+00         NA
4 Less than 0.30           C         9e+00       0.29
5       Negative           B         9e+00 -999999.00
6       Positive           A         1e+09         NA
7        Unknown           E         9e+00  777777.00

I'm leaving it in this form because I remain worried. I have no particular concerns about setting a value of "<0.2" to 0.1 but worry about setting values of  "<60" to 60.00 or "Less than 0.30" to 0.29 don't really accord with practice that I have seen in analysis of laboratory values. And I remain concerned that using flagrantly false numeric values as indicators will create serious errors down the line if this data is ever used by someone who has not been involved with its manipulation or has even forgotten months later how it was massaged.

--
David.

> --
> David.
>>
>>
>> I've started by creating a new object called rules where a SUBSET of df
>> rows are included with the new value they should be changed to.
>>
>>
>> rules <- data.frame(VALUE = c("<60",  "Negative", "Less than 0.30", "<0.2",
>> "Unknown"),
>>                   DESCRIPTION = c("A", "B", "C", "E", "E"),
>>                   NEW_VALUE=c(60, -999999,0.29,0.1,777777))
>> rules
>>
>>
>> I tried doing a loop to change the values in df based on the suggested
>> value in rules:
>>
>> for (i in (1 : nrow(rules))) {
>> df$NUMERIC_VALUE[df$VALUE == rules[i,1] & df$DESCRIPTION == rules
>> [i,2]]=rules[i,3]
>> }
>> df
>>
>> This gives the error:
>>
>> Error in Ops.factor(df$VALUE, rules[i, 1]) :
>> level sets of factors are differentwork and I think because when I write
>>
>> If I create rules using the exact same levels as df it works:
>>
>> rules <- data.frame(VALUE = c("<60", "Positive", "Negative", "Less than
>> 0.30", "12%", "<0.2", "Unknown"),
>>                   DESCRIPTION = c("A","A", "B", "C", "D", "E", "E"),
>>                   NEW_VALUE=c(60, 999999,-999999,0.29,12,0.1,777777))
>> rules
>>
>> for (i in (1 : nrow(rules))) {
>> df$NUMERIC_VALUE[df$VALUE == rules[i,1] & df$DESCRIPTION == rules
>> [i,2]]=rules[i,3]
>> }
>> df
>>
>>
>> Can anyone suggest a way to modify my for loop so that it works for a
>> subset of rows in df and accomplish what I want?  Or suggest a completely
>> different method that works?
>>
>>
>> Thanks!
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Changing cell value for MANY unique pairings of values in 2 columns

Alicia Ellis
The solution proposed below does not accomplish my goal.  In the column
labeled NEW_VALUE, there are two NAs where the value in NUMERIC_VALUE for
that row should be.

Can anyone else suggest a solution to my problem?

Thanks!

On Sat, Mar 18, 2017 at 1:33 PM, David Winsemius <[hidden email]>
wrote:

>
> > On Mar 18, 2017, at 9:52 AM, David Winsemius <[hidden email]>
> wrote:
> >
> >
> >> On Mar 17, 2017, at 11:33 AM, Alicia Ellis <[hidden email]>
> wrote:
> >>
> >> am cleaning some very messy health record lab data.  Several of the rows
> >> in the VALUE column have text entries and they need to be converted to
> >> numeric in the NUMERIC_VALUE column based on the values in VALUE and
> >> DESCRIPTION.  For example:
> >>
> >> df <- data.frame(VALUE = c("<60", "Positive", "Negative", "Less than
> 0.30",
> >> "12%", "<0.2", "Unknown"),
> >>                DESCRIPTION = c("A","A", "B", "C", "D", "E", "E"),
> >>                NUMERIC_VALUE=c(9, 9,9,9,9,9,9))
> >> df
> >>
> >> df$NUMERIC_VALUE[df$VALUE == "Positive" & df$DESCRIPTION ==
> "A"]=999999999
> >>
> >>
> >> However, I need to do this for ~500 unique pairings of VALUE and
> >> DESCRIPTION entries.  I'm trying to find an easy way to do this without
> >> having to have 500 lines of code for each unique pairing.  Some of the
> >> pairings will be changed to the same value (e.g., 99999999, or
> -999999999)
> >> but many will be unique numeric values.
> >
> > I'm not convinced that you have the necessary scientific background to
> do this job properly. There are different sorts of lab tests: enzymatic
> activity, solute concentrations, and viral titers come to mind immediately
> for which the handling would be materially different. If you have a serum
> sodium level of less than 100 mEq/ml, then that is a value inconsistent
> with human life and the value should be set to NA. If you have a value of
> alkaline phosphatase that is 0 it is most suggestive of specimen
> mishandling. If you have a hepatitis B antigen level of "Positive", then it
> seems a perfectly informative value that should not be changed.
> >
> > Furthermore the attempt to change vlaues that you think are missing to
> 99999999, or -999999999 is simply wrongheaded. Learn to use the missing
> value indicator NA rather thna setting these to a numeric value.
> >
> > You should seek advice within your organization before you charge ahead
> with this strategy.
>
> I've been informed that my concerns are misplaced and that the scientific
> concerns I raised were inflated. The "answer" then might be:
>
> merge( df, rules, by=1:2, all.x=TRUE)  # Or
>
> merge( df, rules, by=c("VALUE", "DESCRIPTION") , all.x=TRUE)
>
>
> Which returns in this example:
>
>
>            VALUE DESCRIPTION NUMERIC_VALUE  NEW_VALUE
> 1           <0.2           E         9e+00       0.10
> 2            <60           A         9e+00      60.00
> 3            12%           D         9e+00         NA
> 4 Less than 0.30           C         9e+00       0.29
> 5       Negative           B         9e+00 -999999.00
> 6       Positive           A         1e+09         NA
> 7        Unknown           E         9e+00  777777.00
>
> I'm leaving it in this form because I remain worried. I have no particular
> concerns about setting a value of "<0.2" to 0.1 but worry about setting
> values of  "<60" to 60.00 or "Less than 0.30" to 0.29 don't really accord
> with practice that I have seen in analysis of laboratory values. And I
> remain concerned that using flagrantly false numeric values as indicators
> will create serious errors down the line if this data is ever used by
> someone who has not been involved with its manipulation or has even
> forgotten months later how it was massaged.
>
> --
> David.
> > --
> > David.
> >>
> >>
> >> I've started by creating a new object called rules where a SUBSET of df
> >> rows are included with the new value they should be changed to.
> >>
> >>
> >> rules <- data.frame(VALUE = c("<60",  "Negative", "Less than 0.30",
> "<0.2",
> >> "Unknown"),
> >>                   DESCRIPTION = c("A", "B", "C", "E", "E"),
> >>                   NEW_VALUE=c(60, -999999,0.29,0.1,777777))
> >> rules
> >>
> >>
> >> I tried doing a loop to change the values in df based on the suggested
> >> value in rules:
> >>
> >> for (i in (1 : nrow(rules))) {
> >> df$NUMERIC_VALUE[df$VALUE == rules[i,1] & df$DESCRIPTION == rules
> >> [i,2]]=rules[i,3]
> >> }
> >> df
> >>
> >> This gives the error:
> >>
> >> Error in Ops.factor(df$VALUE, rules[i, 1]) :
> >> level sets of factors are differentwork and I think because when I write
> >>
> >> If I create rules using the exact same levels as df it works:
> >>
> >> rules <- data.frame(VALUE = c("<60", "Positive", "Negative", "Less than
> >> 0.30", "12%", "<0.2", "Unknown"),
> >>                   DESCRIPTION = c("A","A", "B", "C", "D", "E", "E"),
> >>                   NEW_VALUE=c(60, 999999,-999999,0.29,12,0.1,777777))
> >> rules
> >>
> >> for (i in (1 : nrow(rules))) {
> >> df$NUMERIC_VALUE[df$VALUE == rules[i,1] & df$DESCRIPTION == rules
> >> [i,2]]=rules[i,3]
> >> }
> >> df
> >>
> >>
> >> Can anyone suggest a way to modify my for loop so that it works for a
> >> subset of rows in df and accomplish what I want?  Or suggest a
> completely
> >> different method that works?
> >>
> >>
> >> Thanks!
> >>
> >>      [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> > David Winsemius
> > Alameda, CA, USA
> >
> > ______________________________________________
> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Changing cell value for MANY unique pairings of values in 2 columns

Jim Lemon-4
Hi Alicia,
If I understand this, perhaps:

df<-data.frame(VALUE=c("<60","Positive","Negative","Less than 0.30",
 "12%","<0.2","Unknown"),
 DESCRIPTION = c("A","A", "B","C","D","E","E"),
 NUMERIC_VALUE=c(9,9,9,9,9,9,9))

df$NUMERIC_VALUE[df$VALUE == "Positive" & df$DESCRIPTION == "A"]<-
 999999999

rules<-data.frame(VALUE = c("<60","Negative","Less than 0.30",
 "<0.2","Unknown"),
 DESCRIPTION = c("A", "B", "C", "E", "E"),
 NEW_VALUE=c(60, -999999,0.29,0.1,777777))

for(row in 1:dim(df)[1]) {
 if(df$VALUE %in% rules$VALUE && df$DESCRIPTION %in% rules$DESCRIPTION) {
  which_value<-which(as.character(df$VALUE[row]) ==
   as.character(rules$VALUE))
  which_desc<-which(as.character(df$DESCRIPTION[row]) ==
   as.character(rules$DESCRIPTION))
  cat(which_value,which_desc,"\n")
  if(length(which_value) && length(which_desc) &&
   which_value == which_desc)
   df$NUMERIC_VALUE[row]<-rules$NEW_VALUE[which_value]
 }
}

Jim

On Tue, Mar 21, 2017 at 9:51 AM, Alicia Ellis <[hidden email]> wrote:

> The solution proposed below does not accomplish my goal.  In the column
> labeled NEW_VALUE, there are two NAs where the value in NUMERIC_VALUE for
> that row should be.
>
> Can anyone else suggest a solution to my problem?
>
> Thanks!
>
> On Sat, Mar 18, 2017 at 1:33 PM, David Winsemius <[hidden email]>
> wrote:
>
>>
>> > On Mar 18, 2017, at 9:52 AM, David Winsemius <[hidden email]>
>> wrote:
>> >
>> >
>> >> On Mar 17, 2017, at 11:33 AM, Alicia Ellis <[hidden email]>
>> wrote:
>> >>
>> >> am cleaning some very messy health record lab data.  Several of the rows
>> >> in the VALUE column have text entries and they need to be converted to
>> >> numeric in the NUMERIC_VALUE column based on the values in VALUE and
>> >> DESCRIPTION.  For example:
>> >>
>> >> df <- data.frame(VALUE = c("<60", "Positive", "Negative", "Less than
>> 0.30",
>> >> "12%", "<0.2", "Unknown"),
>> >>                DESCRIPTION = c("A","A", "B", "C", "D", "E", "E"),
>> >>                NUMERIC_VALUE=c(9, 9,9,9,9,9,9))
>> >> df
>> >>
>> >> df$NUMERIC_VALUE[df$VALUE == "Positive" & df$DESCRIPTION ==
>> "A"]=999999999
>> >>
>> >>
>> >> However, I need to do this for ~500 unique pairings of VALUE and
>> >> DESCRIPTION entries.  I'm trying to find an easy way to do this without
>> >> having to have 500 lines of code for each unique pairing.  Some of the
>> >> pairings will be changed to the same value (e.g., 99999999, or
>> -999999999)
>> >> but many will be unique numeric values.
>> >
>> > I'm not convinced that you have the necessary scientific background to
>> do this job properly. There are different sorts of lab tests: enzymatic
>> activity, solute concentrations, and viral titers come to mind immediately
>> for which the handling would be materially different. If you have a serum
>> sodium level of less than 100 mEq/ml, then that is a value inconsistent
>> with human life and the value should be set to NA. If you have a value of
>> alkaline phosphatase that is 0 it is most suggestive of specimen
>> mishandling. If you have a hepatitis B antigen level of "Positive", then it
>> seems a perfectly informative value that should not be changed.
>> >
>> > Furthermore the attempt to change vlaues that you think are missing to
>> 99999999, or -999999999 is simply wrongheaded. Learn to use the missing
>> value indicator NA rather thna setting these to a numeric value.
>> >
>> > You should seek advice within your organization before you charge ahead
>> with this strategy.
>>
>> I've been informed that my concerns are misplaced and that the scientific
>> concerns I raised were inflated. The "answer" then might be:
>>
>> merge( df, rules, by=1:2, all.x=TRUE)  # Or
>>
>> merge( df, rules, by=c("VALUE", "DESCRIPTION") , all.x=TRUE)
>>
>>
>> Which returns in this example:
>>
>>
>>            VALUE DESCRIPTION NUMERIC_VALUE  NEW_VALUE
>> 1           <0.2           E         9e+00       0.10
>> 2            <60           A         9e+00      60.00
>> 3            12%           D         9e+00         NA
>> 4 Less than 0.30           C         9e+00       0.29
>> 5       Negative           B         9e+00 -999999.00
>> 6       Positive           A         1e+09         NA
>> 7        Unknown           E         9e+00  777777.00
>>
>> I'm leaving it in this form because I remain worried. I have no particular
>> concerns about setting a value of "<0.2" to 0.1 but worry about setting
>> values of  "<60" to 60.00 or "Less than 0.30" to 0.29 don't really accord
>> with practice that I have seen in analysis of laboratory values. And I
>> remain concerned that using flagrantly false numeric values as indicators
>> will create serious errors down the line if this data is ever used by
>> someone who has not been involved with its manipulation or has even
>> forgotten months later how it was massaged.
>>
>> --
>> David.
>> > --
>> > David.
>> >>
>> >>
>> >> I've started by creating a new object called rules where a SUBSET of df
>> >> rows are included with the new value they should be changed to.
>> >>
>> >>
>> >> rules <- data.frame(VALUE = c("<60",  "Negative", "Less than 0.30",
>> "<0.2",
>> >> "Unknown"),
>> >>                   DESCRIPTION = c("A", "B", "C", "E", "E"),
>> >>                   NEW_VALUE=c(60, -999999,0.29,0.1,777777))
>> >> rules
>> >>
>> >>
>> >> I tried doing a loop to change the values in df based on the suggested
>> >> value in rules:
>> >>
>> >> for (i in (1 : nrow(rules))) {
>> >> df$NUMERIC_VALUE[df$VALUE == rules[i,1] & df$DESCRIPTION == rules
>> >> [i,2]]=rules[i,3]
>> >> }
>> >> df
>> >>
>> >> This gives the error:
>> >>
>> >> Error in Ops.factor(df$VALUE, rules[i, 1]) :
>> >> level sets of factors are differentwork and I think because when I write
>> >>
>> >> If I create rules using the exact same levels as df it works:
>> >>
>> >> rules <- data.frame(VALUE = c("<60", "Positive", "Negative", "Less than
>> >> 0.30", "12%", "<0.2", "Unknown"),
>> >>                   DESCRIPTION = c("A","A", "B", "C", "D", "E", "E"),
>> >>                   NEW_VALUE=c(60, 999999,-999999,0.29,12,0.1,777777))
>> >> rules
>> >>
>> >> for (i in (1 : nrow(rules))) {
>> >> df$NUMERIC_VALUE[df$VALUE == rules[i,1] & df$DESCRIPTION == rules
>> >> [i,2]]=rules[i,3]
>> >> }
>> >> df
>> >>
>> >>
>> >> Can anyone suggest a way to modify my for loop so that it works for a
>> >> subset of rows in df and accomplish what I want?  Or suggest a
>> completely
>> >> different method that works?
>> >>
>> >>
>> >> Thanks!
>> >>
>> >>      [[alternative HTML version deleted]]
>> >>
>> >> ______________________________________________
>> >> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide http://www.R-project.org/
>> posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >
>> > David Winsemius
>> > Alameda, CA, USA
>> >
>> > ______________________________________________
>> > [hidden email] mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/
>> posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius
>> Alameda, CA, USA
>>
>>
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Loading...