problem subsetting data frame with variable instead of constant

vaneet
Hello,

I've encountered a very weird issue with subset(). Or maybe there's something I don't know about it: when subsetting on the columns of a data frame, can you only use constants (0.1, 2.3, 2.2) rather than variables?

Here's a look at my data frame, called 'ea.cad.pwr':
> ea.cad.pwr[1:5,]
   MAF   OR  POWER
1 0.02 0.01 0.9999
2 0.02 0.02 0.9998
3 0.02 0.03 0.9997
4 0.02 0.04 0.9995
5 0.02 0.05 0.9993


Here are my subset lines, which find no rows:

power1 = subset(ea.cad.pwr, MAF == maf1 & OR == odds)
power2 = subset(ea.cad.pwr, MAF == maf2 & OR == odds)

When maf1 = 0.2 and odds = 1.2, it finds nothing. I know for a fact that there's a row with these values:
> ea.cad.pwr[1430:1432,]
     MAF   OR  POWER
1430 0.2 0.58 0.9996
1431 0.2 1.20 0.3092
1432 0.2 1.22 0.3914


This code runs in a loop, and in every previous iteration subset() works fine. In this iteration, however, some different lines that set these variables are executed. Here they are:

maf1 = maf.adj - 0.01
maf2 = maf.adj + 0.01


Basically, maf.adj is always a two-decimal number (0.21 in this case), and I compute the values on either side of it by subtracting and adding 0.01 (giving 0.2 and 0.22) in case maf.adj itself isn't in the table. maf.adj is read from another data frame; when I use it directly to subset, it always works, but after this innocent subtraction it doesn't, for some reason. If I rewrite the statements like this, it works:

power1 = subset(ea.cad.pwr, MAF == 0.2 & OR == odds)
power2 = subset(ea.cad.pwr, MAF == 0.22 & OR == odds)


Even if I write this first:

maf1 = 0.2

Then:

power1 = subset(ea.cad.pwr, MAF == maf1 & OR == odds)

It works as well! That's what's really confusing: how can this subtraction mess everything up? Please help if you can. Thank you!

Vaneet
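
A minimal reproduction of the problem, under stated assumptions: the real ea.cad.pwr isn't available, so the sketch below uses a hypothetical three-row stand-in built from rows 1430-1432 shown above, and assumes maf.adj holds 0.21 as described in the post.

```r
# Hypothetical stand-in for ea.cad.pwr, built from rows 1430-1432 shown above
ea.cad.pwr <- data.frame(MAF   = c(0.2, 0.2, 0.2),
                         OR    = c(0.58, 1.20, 1.22),
                         POWER = c(0.9996, 0.3092, 0.3914))

maf.adj <- 0.21          # assumed value, as described in the post
odds    <- 1.2
maf1    <- maf.adj - 0.01

# Computed value: no rows match
nrow(subset(ea.cad.pwr, MAF == maf1 & OR == odds))

# Typed constant: one row matches
nrow(subset(ea.cad.pwr, MAF == 0.2 & OR == odds))
```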

Re: problem subsetting data frame with variable instead of constant

Petr Savicky
On Fri, Feb 10, 2012 at 08:15:39AM -0800, vaneet wrote:

Hi.

This may be a rounding problem. Try

  0.3 - 0.1 == 0.2

  [1] FALSE
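
The numbers from the loop behave the same way. Printing them at full precision shows what the subtraction actually produces; this sketch assumes maf.adj is the double nearest 0.21, as in the post.

```r
maf.adj <- 0.21          # assumed value from the post
maf1 <- maf.adj - 0.01
maf2 <- maf.adj + 0.01

# sprintf() with 20 decimals exposes the binary approximation of each double
sprintf("%.20f", c(maf1, 0.2))
sprintf("%.20f", c(maf2, 0.22))

maf1 == 0.2    # FALSE: the result lands one step below the double for 0.2
maf2 == 0.22   # TRUE here, but only because the rounding happened to land on it
```

So whether == succeeds depends on how each individual result happens to round, which is why the loop works in some iterations and not others.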

Explicitly rounding to a moderate number of decimal
digits can help.

  round(0.3 - 0.1, digits=7) == 0.2

  [1] TRUE
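
Applied to the two lines from the loop, this suggestion could look like the sketch below. It keeps digits = 7; since the post says the MAF values always have two decimals, digits = 2 should also work (an assumption about the data). The data frame is again a hypothetical stand-in built from the rows shown in the original post.

```r
# Hypothetical stand-in for ea.cad.pwr (rows 1430-1432 from the post)
ea.cad.pwr <- data.frame(MAF   = c(0.2, 0.2, 0.2),
                         OR    = c(0.58, 1.20, 1.22),
                         POWER = c(0.9996, 0.3092, 0.3914))
maf.adj <- 0.21   # assumed, as in the post
odds    <- 1.2

# Rounding snaps the neighbours back to the same doubles the literals 0.2 and 0.22 give
maf1 <- round(maf.adj - 0.01, digits = 7)
maf2 <- round(maf.adj + 0.01, digits = 7)

power1 <- subset(ea.cad.pwr, MAF == maf1 & OR == odds)
power1
```

This relies on the column itself holding the doubles nearest to two-decimal values, as it would if read from a text file; if the column were also computed, it would need the same rounding.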


See also FAQ 7.31 or http://rwiki.sciviews.org/doku.php?id=misc:r_accuracy

Hope this helps.

Petr Savicky.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: problem subsetting data frame with variable instead of constant

Sarah Goslee
This is likely a representation issue, as in R FAQ 7.31.

?"==" suggests that combining identical() and all.equal() is a better strategy:

     x1 <- 0.5 - 0.3
     x2 <- 0.3 - 0.1
     x1 == x2                           # FALSE on most machines
     identical(all.equal(x1, x2), TRUE) # TRUE everywhere
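
all.equal() returns a single summary for whole objects, so it can't be dropped straight into subset() as a row-wise condition. One option in the same spirit is a vectorised tolerance test. The helper near_equal below is hypothetical; it uses an absolute tolerance of sqrt(.Machine$double.eps) (about 1.5e-8, the same number all.equal() uses by default), whereas all.equal() itself compares a mean relative difference.

```r
# Hypothetical helper: element-wise comparison within an absolute tolerance
near_equal <- function(x, y, tol = sqrt(.Machine$double.eps)) {
  abs(x - y) < tol
}

# Hypothetical stand-in for ea.cad.pwr (rows 1430-1432 from the post)
ea.cad.pwr <- data.frame(MAF   = c(0.2, 0.2, 0.2),
                         OR    = c(0.58, 1.20, 1.22),
                         POWER = c(0.9996, 0.3092, 0.3914))
maf1 <- 0.21 - 0.01   # deliberately computed, not typed
odds <- 1.2

power1 <- subset(ea.cad.pwr, near_equal(MAF, maf1) & near_equal(OR, odds))
power1
```

An absolute tolerance is reasonable here because the MAF and OR values are all of order 0.01 to a few units; for data spanning many orders of magnitude a relative tolerance would be safer.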

Sarah

On Fri, Feb 10, 2012 at 11:15 AM, vaneet <[hidden email]> wrote:

--
Sarah Goslee
http://www.functionaldiversity.org

Re: problem subsetting data frame with variable instead of constant

vaneet
Thanks guys, both those solutions work.  I really appreciate the help!