Finicky factor comparison operators


Finicky factor comparison operators

johnmark
 This error occurs because the "==" comparison operator doesn't allow comparison of ordered and normal factors:

df[df$close_quarter == as.Date("2011-02-01"),]
Warning message:
In `[.data.frame`(df, df$close_quarter == as.Date("2011-02-01"),  :
  Incompatible methods ("Ops.ordered", "Ops.Date") for "=="

Why should this be a problem? Isn't this being overly cautious? Can anyone think of a case where coercing the ordered factor to a normal factor for == comparisons would do the wrong thing?

Perhaps this is a question for the developers' section.

Cheers -john mark agosta


Re: Finicky factor comparison operators

Michael Weylandt
It's not a matter of unordered and ordered factors, but of ordered factors
and Dates (as the warning says).

I can see at least one ambiguity -- should comparison be made from the
level or the internal code -- so the warning makes sense to me (though
an error might make even more sense). Generally, for factors that
correspond to non-Date quantities, the comparison likely isn't well
defined.
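To make the ambiguity concrete, here is a minimal sketch with a hypothetical close_quarter vector whose labels happen to be date strings. The same values can be read through their internal codes or through their labels; converting the labels to Date explicitly is one unambiguous way to get the comparison the original post was after.

```r
# Hypothetical data: quarter labels stored as an ordered factor
close_quarter <- factor(c("2011-02-01", "2010-11-01", "2011-02-01"),
                        ordered = TRUE)

# Two possible readings of the same values:
codes <- as.integer(close_quarter)  # internal codes: 2 1 2 (levels are sorted)
lab   <- as.character(close_quarter) # the printed labels

# Converting the labels to Date first makes the intent explicit
target <- as.Date("2011-02-01")
as.Date(lab) == target
```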

How would you resolve this comparison in general?

Michael

On Sat, Feb 18, 2012 at 3:08 PM, johnmark <[hidden email]> wrote:

>  This error occurs because the "==" comparison operator doesn't allow
> comparison of ordered and normal factors:
>
> df[df$close_quarter == as.Date("2011-02-01"),]
> Warning message:
> In `[.data.frame`(df, df$close_quarter == as.Date("2011-02-01"),  :
>  Incompatible methods ("Ops.ordered", "Ops.Date") for "=="
>
> Why should this be a problem -- Isn't this being overly cautious?  Can
> anyone think of a case where coercing the ordered factor to a normal factor
> for comparisons of == would do the wrong thing?
>
> Perhaps this is a question for the developer's section.
>
> Cheers -john mark agosta
>
>
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


Re: Finicky factor comparison operators

johnmark
Michael -

Thanks for your insight.  I think I see where you're going with this.  

To make '==' comparisons for subsetting against an ordered factor, I've had to create a lookup table for all possible values I'd ever want to compare against (all dates covered by the quarters in question, in this case) that maps onto the ordered factor's values. This is wrapped by a function that returns an ordered factor, which allows me to write:

opps$close_quarter == which.quarter.end("2010-10-20")
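The poster's which.quarter.end() is not shown. A hypothetical stand-in might look like the sketch below. It assumes calendar quarters and "YYYY-MM-DD" quarter-end labels (the poster's quarters may well be fiscal ones), and it computes the quarter end arithmetically rather than through a lookup table. The part that matters for '==' is that it returns a factor built on the levels of the vector it will be compared against.

```r
# Hypothetical helper, NOT the poster's which.quarter.end().
# Assumes calendar quarters and quarter-end labels like "2010-12-31".
quarter_end_factor <- function(x, ref) {
  d <- as.Date(x)
  year  <- as.integer(format(d, "%Y"))
  month <- as.integer(format(d, "%m"))
  # First month of the following quarter (13 means January of next year)
  next_q_month <- ((month - 1L) %/% 3L + 1L) * 3L + 1L
  rollover <- next_q_month > 12L
  next_start <- as.Date(sprintf("%04d-%02d-01",
                                year + rollover,
                                ifelse(rollover, 1L, next_q_month)))
  # Quarter end = the day before the next quarter starts
  label <- format(next_start - 1)
  # Reuse ref's levels so that == against ref is allowed
  factor(label, levels = levels(ref), ordered = TRUE)
}

# Hypothetical data
close_quarter <- factor(c("2010-09-30", "2010-12-31", "2010-12-31"),
                        ordered = TRUE)
close_quarter == quarter_end_factor("2010-10-20", close_quarter)
```

A date whose quarter end is not among the levels of ref comes back as NA, so the comparison yields NA rather than an error.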

Otherwise, if I try to create an ordered factor from the constant just for the purposes of comparison, the error tells me that ordered factors from different sources cannot be compared:

opps$close_quarter == factor("2007-10-20", ordered=T)
Error in Ops.factor(factor("2007-10-30", ordered = T), quarter.factors[1, 2]) :
  level sets of factors are different


That makes sense, since internally factors are integers -- "enums" in other terms.
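A minimal sketch (hypothetical data) of why that comparison fails and of one way around it: build the constant with the same level set as the vector it is compared with.

```r
# Hypothetical ordered factor of quarter-end dates
close_quarter <- factor(c("2007-12-31", "2008-03-31", "2007-12-31"),
                        ordered = TRUE)

# A one-element factor carries only its own level, so its level set
# differs and == stops with "level sets of factors are different"
bad <- factor("2007-12-31", ordered = TRUE)
res <- tryCatch(close_quarter == bad, error = function(e) e)

# Giving the constant the same levels makes the comparison valid
key <- factor("2007-12-31", levels = levels(close_quarter), ordered = TRUE)
close_quarter == key
```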

But what I want to avoid -- and what I don't see as necessary -- is explicitly coercing the terms to a common representation that mimics their print form:

as.character("2007-10-20") == as.character(factor("2007-10-20", ordered=T))

I don't think there should be confusion since the conversion to print form is "obvious" -- but it does conflict with the conversion rules for creating vectors by c():

c("2011-10-20", factor("2007-10-20", ordered=T))
[1] "2011-10-20" "1"


where the factor is converted to its internal "enum" representation, then to a character.
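A small illustration of that c() behaviour (hypothetical values): with a character first argument the default c() method is used and the factor contributes its integer code, while an explicit as.character() keeps the label.

```r
f <- factor("2007-10-20", ordered = TRUE)

# Character first argument: default c() uses the factor's integer code
mixed <- c("2011-10-20", f)

# Explicit conversion keeps the printed label
kept <- c("2011-10-20", as.character(f))
```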

Having given some more thought to what motivated the original question, I see that one could use which() to invert the factor's levels vector:

which("2008-04-30" == levels(quarter.factors[,2]))
[1] 3


It's still not clear to me what exactly the implicit conversion rules for factors are.

Cheers -jm


Re: Finicky factor comparison operators

David Winsemius

On Feb 20, 2012, at 1:45 AM, johnmark wrote:

> Michael -
>
> Thanks for your insight.  I think I see where you're going with this.
>
> To make '==' comparisons for subsetting against an ordered factor, I've had
> to create a lookup table for all possible values I'd ever want to compare
> against (all dates covered by the quarters in question, in this case) that
> maps onto the ordered factor's values.  This is wrapped by a function that
> returns an ordered factor, which allows me to write:
>
> opps$close_quarter == which.quarter.end("2010-10-20")
>
> Otherwise, if I try to create an ordered factor from the constant just for
> the purposes of comparison, the error tells me that ordered factors from
> different sources cannot be compared:
>
> opps$close_quarter == factor("2007-10-20", ordered=T)
> Error in Ops.factor(factor("2007-10-30", ordered = T), quarter.factors[1, 2]) :
>   level sets of factors are different

Actually, it is telling you that you cannot compare ordered factors  
which have different levels. That makes perfect sense for the same  
reasons that you are not allowed to compare Dates to ordered factors.  
If the factors from different sources had the same levels, you would  
have succeeded.

 > z <- factor(LETTERS[3:1], ordered = TRUE)
 > z3 <- factor(LETTERS[1:3] , ordered=TRUE)
 > z[2] == z3[2]
[1] TRUE


>
> That makes sense, since internally factors are integers -- "enums" in other
> terms.
>
> But what I want to avoid -- and what I don't see as necessary -- is
> explicitly coercing the terms to a common representation that mimics their
> print form:
>
> as.character("2007-10-20") == as.character(factor("2007-10-20", ordered=T))
>
> I don't think there should be confusion since the conversion to print form
> is "obvious" -- but it does conflict with the conversion rules for creating
> vectors by c():
>
> c("2011-10-20", factor("2007-10-20", ordered=T))
> [1] "2011-10-20" "1"
>
> where the factor is converted to its internal "enum" representation, then to
> a character.

That's just an example of the need to use as.character when converting  
data out of factor class.

>
> Having given some more thought to what motivated the original question, I
> see that one could use which() to invert the factor's levels vector:
>
> which("2008-04-30" == levels(quarter.factors[,2]))
> [1] 3
>
> It's still not clear to me what exactly the implicit conversion rules for
> factors are.

In your last case you are comparing a character value to a character  
vector and getting the expected result (levels(quarter.factors[,2]) is  
NOT a factor). You should also succeed when testing equality between  
ordered factor and character types. You have still not provided an  
example for testing, so this may suffice.

 > z <- factor(LETTERS[3:1], ordered = TRUE)
 > z == "A"
[1] FALSE FALSE  TRUE

You should be able to assemble a list of valid candidate (character)  
values with levels(fac). Or if you want them in factor representation  
then use unique(fac).
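A sketch of those suggestions on a hypothetical ordered factor fac: levels() gives the valid character candidates, unique() keeps the distinct values as an ordered factor with the full level set, the factor can be compared with a character value directly, and the position which() finds in the levels is the same as the internal code.

```r
# Hypothetical data
fac <- factor(c("2008-04-30", "2007-10-31", "2008-04-30"), ordered = TRUE)

# All valid labels as a character vector, in level order
candidates <- levels(fac)

# Distinct values, still an ordered factor with the full level set
uniq <- unique(fac)

# Comparing an (ordered) factor with a character value compares labels
hits <- fac == "2008-04-30"

# which() over the levels returns the same position as the internal code
pos  <- which(levels(fac) == "2008-04-30")
code <- as.integer(fac[1])
```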


--
David Winsemius, MD
West Hartford, CT
