'==' operator: inconsistency in data.frame(...) == NULL

'==' operator: inconsistency in data.frame(...) == NULL

Hilmar Berger-4
Dear all,

I just stumbled upon some behavior of the == operator which is at least
somewhat inconsistent.

R version 3.6.1 (2019-07-05) -- "Action of the Toes"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

 > list(a=1:3, b=LETTERS[1:3]) == NULL
logical(0)
 > matrix(1:6, 2,3) == NULL
logical(0)
 > data.frame(a=1:3, b=LETTERS[1:3]) == NULL # same for == logical(0)
Error in matrix(if (is.null(value)) logical() else value, nrow = nr,
dimnames = list(rn,  :
   length of 'dimnames' [2] not equal to array extent

 > data.frame(NULL) == 1
<0 x 0 matrix>
 > data.frame(NULL) == NULL
<0 x 0 matrix>
 > data.frame(NULL) == logical(0)
<0 x 0 matrix>

I wonder if data.frame(<some non-empty data>) == NULL should also return
a value instead of an error. R help reads:

"At least one of 'x' and 'y' must be an atomic vector, but if the other
is a list R attempts to coerce it to the type of the atomic vector:
this will succeed if the list is made up of elements of length one that
can be coerced to the correct type.

If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw."
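
A minimal sketch of the two coercion rules quoted above. The values are chosen only for illustration and are not taken from the help page:

```r
# Rule 1: a list made up of length-one elements is coerced to the type
# of the atomic vector it is compared with.
list_vs_atomic <- list(1, 2, 3) == c(1, 2, 4)
print(list_vs_atomic)

# Rule 2: for two atomic vectors of different types, the type that is
# lower in the precedence order is coerced to the higher one.
int_vs_chr <- 1L == "1"        # integer 1L is compared as "1"
lgl_vs_num <- TRUE == 1        # logical TRUE is compared as 1
lgl_vs_chr <- TRUE == "TRUE"   # logical TRUE is compared as "TRUE"
print(c(int_vs_chr, lgl_vs_num, lgl_vs_chr))
```

Neither rule says anything about a zero-length or NULL operand, which is the gap discussed here.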

It is not clear from the help what to expect for NULL or empty atomic
vectors. It is also strange that for list() there is no error but for
data.frame() with the same data an error is thrown. I can see that there
might be reasons to return logical(0) instead of FALSE, but I do not
fully understand why there should be differences between e.g. matrix()
and data.frame().

Also, it is at least somewhat strange that data.frame(NULL) == NULL and
similar expressions return an empty matrix, while comparing a normal
filled matrix to NULL returns logical(0).

Even if this behavior is expected, the error message shown by
data.frame(...) == NULL is not very informative.

Thanks and best regards,

Hilmar






______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: '==' operator: inconsistency in data.frame(...) == NULL

Martin Maechler
>>>>> Hilmar Berger
>>>>>     on Wed, 4 Sep 2019 15:25:46 +0200 writes:

    > Dear all,

    > I just stumbled upon some behavior of the == operator which is at least
    > somewhat inconsistent.

    > R version 3.6.1 (2019-07-05) -- "Action of the Toes"
    > Copyright (C) 2019 The R Foundation for Statistical Computing
    > Platform: x86_64-w64-mingw32/x64 (64-bit)

    >> list(a=1:3, b=LETTERS[1:3]) == NULL
    > logical(0)
    >> matrix(1:6, 2,3) == NULL
    > logical(0)
    >> data.frame(a=1:3, b=LETTERS[1:3]) == NULL # same for == logical(0)
    > Error in matrix(if (is.null(value)) logical() else value, nrow = nr,
    > dimnames = list(rn,  :
    >   length of 'dimnames' [2] not equal to array extent

    >> data.frame(NULL) == 1
    > <0 x 0 matrix>
    >> data.frame(NULL) == NULL
    > <0 x 0 matrix>
    >> data.frame(NULL) == logical(0)
    > <0 x 0 matrix>

    > I wonder if data.frame(<some non-empty data>) == NULL should also return
    > a value instead of an error. R help reads:

        > "At least one of |x| and |y| must be an atomic vector, but
        >  if the other is a list R attempts to coerce it to the
        >  type of the atomic vector: this will succeed if the list
        >  is made up of elements of length one that can be coerced
        >  to the correct type.

        >  If the two arguments are atomic vectors of different
        >  types, one is coerced to the type of the other, the
        >  (decreasing) order of precedence being character, complex,
        >  numeric, integer, logical and raw."

    > It is not clear from the help what to expect for NULL or
    > empty atomic vectors.

Well, strictly speaking an error would be expected for NULL,
as it is *not* an atomic vector, and your main issue

 " data.frame(..) == NULL "

would already be settled by the first half-sentence of the
doc. Strictly speaking, even  data.frame(NULL) == NULL
"should" return an error. (Note: I'm not saying it really
should, but at least the reference does not say it should work at all.)

Now,  logical(0)  on the other hand *is* an atomic vector ...


    > It is also strange that for list()
    > there is no error but for data.frame() with the same data
    > an error is thrown. I can see that there might be reasons
    > to return logical(0) instead of FALSE, but I do not fully
    > understand why there should be differences between
    > e.g. matrix() and data.frame().

Well, a [regular base R] matrix() is atomic  and a data frame is not.
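
The distinction can be checked directly. This is a small sketch that reuses the objects from the original post:

```r
m  <- matrix(1:6, 2, 3)
df <- data.frame(a = 1:3, b = LETTERS[1:3])

# A matrix is an atomic vector carrying a 'dim' attribute.
m_atomic <- is.atomic(m)
m_type   <- typeof(m)

# A data frame is a list of columns with a class attribute.
df_atomic <- is.atomic(df)
df_list   <- is.list(df)
df_type   <- typeof(df)

print(c(m_atomic = m_atomic, df_atomic = df_atomic, df_list = df_list))
print(c(m_type = m_type, df_type = df_type))
```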

    > Also, It is at least somewhat strange that
    > data.frame(NULL) == NULL and similar expressions return an
    > empty matrix, while comparing a normal filled matrix to
    > NULL returns logical(0).

    > Even if this behavior is expected, the error message shown
    > by data.frame(...) == NULL is not very informative.

I'm not at all sure there's any need for a change here.

I would say the following general thinking should be applied

1. The general rule that '==' should be used only for comparing
  atomic objects (as it returns an atomic object, a 'logical' with
  corresponding attributes) is fundamental,
  and using '==' for anything else has never been "the idea".

2. There are (two) "semi-exceptions" to the above:
2a) Sometimes it has been convenient to treat NULL as if it was
     a zero-length atomic object (of "arbitrary" type/mode).
2b) data.frame()s "should typically" behave like matrices in
    many situations, notably when indexed {and that rule is
    violated (on purpose) by tibbles .. ("drop=FALSE" etc, but
    that's another story)}

So because of these exceptions, you and possibly others may
think  '=='  should "work" with data.frame()s and/or NULL, but
I would not tend to agree.
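
To make the two "semi-exceptions" concrete, here is a small sketch. The objects are made up for illustration, and the last lines show is.null() and identical() as the usual alternatives to '==' for this kind of test:

```r
# 2a) NULL behaving like a zero-length atomic vector
null_len <- length(NULL)
null_cmp <- NULL == 1
null_cat <- c(NULL, 1:2)

# 2b) a data frame indexed like a matrix
m  <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))
df <- as.data.frame(m)
m_elem  <- m[2, "b"]
df_elem <- df[2, "b"]
df_col  <- df[, "a"]      # dropped to a plain vector, as for a matrix

# Testing against NULL without '=='
df_is_null   <- is.null(df)
df_same_null <- identical(df, NULL)

print(null_cmp)
print(c(m_elem, df_elem))
print(df_col)
```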

    > Thanks and best regards,
    > Hilmar

You are welcome!
Martin


Re: '==' operator: inconsistency in data.frame(...) == NULL

Hilmar Berger-4
Dear Martin,

On 11/09/2019 09:56, Martin Maechler wrote:

>
>     > I wonder if data.frame(<some non-empty data>) == NULL should also return
>     > a value instead of an error. R help reads:
>
>     > "At least one of |x| and |y| must be an atomic vector, but
>     >  if the other is a list R attempts to coerce it to the
>     >  type of the atomic vector: this will succeed if the list
>     >  is made up of elements of length one that can be coerced
>     >  to the correct type.
>
>     >  If the two arguments are atomic vectors of different
>     >  types, one is coerced to the type of the other, the
>     >  (decreasing) order of precedence being character, complex,
>     >  numeric, integer, logical and raw."
>
>     > It is not clear from the help what to expect for NULL or
>     > empty atomic vectors.
>
> Well, strictly speaking an error would be expected for NULL,
> as it is *not* an atomic vector, and your main issue
>
>   " data.frame(..) == NULL "
>
> would already be settled by the first half sentence from the
> doc, and strictly speaking, even  data.frame(NULL) == NULL
> "should" return an error ((Note: I'm not saying it really
>   should, but at least the reference does not say it should work at all))
Thanks, this explanation makes total sense to me. I did not consider
that NULL might be non-atomic. Strangely, is.atomic(NULL) returns TRUE.
On the other hand, I understand that one would not like to treat it like
atomic in ==.
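
A short sketch for checking this on a given installation. The is.atomic(NULL) result is only printed, not relied upon, because it appears to depend on the R version: TRUE in R 3.6.1 as noted above, and to my knowledge FALSE from R 4.4.0 onwards.

```r
# Version-dependent, so only printed here
print(is.atomic(NULL))

# These do not depend on that result
null_is_null   <- is.null(NULL)
null_is_vector <- is.vector(NULL)
null_type      <- typeof(NULL)
null_vs_null   <- NULL == NULL   # a zero-length logical, not TRUE

print(c(null_is_null = null_is_null, null_is_vector = null_is_vector))
print(null_type)
print(null_vs_null)
```

Since NULL == NULL has length zero, it cannot be used as an if() condition; is.null() is the dependable test.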

However, in this case one might expect that the error message would be
more like that for S4 objects (which always seem to report an
informative error message for ==):

 > Pos <- setClass("Pos", slots = c(latitude = "numeric", longitude =
"numeric", altitude = "numeric"))
 > p = Pos()
 > p == NULL
Error in p == NULL :
   comparison (1) is possible only for atomic and list types
 > p == "FOO"
Error in p == "FOO" :
   comparison (1) is possible only for atomic and list types

In the data.frame()==NULL cases I have the impression that the fact that
both sides are non-atomic is not properly detected and therefore R tries
to go on with the == method for data.frames.

From a cursory check in Ops.data.frame() and some debugging I have the
impression that the case of the second argument being non-atomic or
empty is not handled at all and the function progresses until the end,
where it fails in the last step on an empty value:

matrix(unlist(value, recursive = FALSE, use.names = FALSE),
     nrow = nr, dimnames = list(rn, cn))
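
The failing step can be mimicked outside Ops.data.frame(). This is only a sketch under the assumption that matrix() ends up with an empty value but non-empty dimnames; nr, rn, cn and value borrow the names from the line above, and their contents are my guess for a 3-row, 1-column data frame:

```r
nr    <- 3L
rn    <- as.character(1:3)   # assumed row names
cn    <- "a"                 # assumed column name
value <- logical(0)          # assumed empty comparison result

# With no data and nrow = 3, matrix() builds a 3 x 0 matrix, so the single
# column name in 'dimnames' no longer fits the second dimension.
res <- tryCatch(
  matrix(value, nrow = nr, dimnames = list(rn, cn)),
  error = function(e) e
)
print(conditionMessage(res))
```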

Best regards,
Hilmar

--
Dr. Hilmar Berger, MD
Max Planck Institute for Infection Biology
Charitéplatz 1
D-10117 Berlin
GERMANY

Phone:  + 49 30 28460 430
Fax:    + 49 30 28460 401
 
E-Mail: [hidden email]
Web   : www.mpiib-berlin.mpg.de



Re: '==' operator: inconsistency in data.frame(...) == NULL

Hilmar Berger-4
Another example where a data.frame is compared to (here non-null,
non-empty) non-atomic values in Ops.data.frame, resulting in an error
message:

setClass("FOOCLASS2",
          slots = c(M="matrix")
)
ma = new("FOOCLASS2", M=matrix(rnorm(300), 30,10))

 > isS4(ma)
[1] TRUE
 > ma == data.frame(a=1:3)
Error in eval(f) : dims [product 1] do not match the length of object [3]

As for the NULL/logical(0) cases, I would suggest explicitly testing for
invalid conditions in Ops.data.frame and generating a comprehensible
message (e.g. "comparison is possible only for atomic and list types")
if appropriate.
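
As a rough sketch of what such a test could look like, written as a user-level wrapper rather than a patch to Ops.data.frame: checked_df_equal() is a hypothetical name, and the exact conditions it rejects are my assumption, not existing R behavior.

```r
# Hypothetical helper, not part of base R: validate the right-hand side
# before delegating to the data.frame method of '=='.
checked_df_equal <- function(df, other) {
  stopifnot(is.data.frame(df))
  if (is.null(other) || isS4(other) ||
      !(is.atomic(other) || is.list(other))) {
    stop("comparison is possible only for atomic and list types")
  }
  if (length(other) == 0L) {
    stop("cannot compare a data frame with a zero-length object")
  }
  df == other
}

ok  <- checked_df_equal(data.frame(a = 1:3), 2L)
err <- tryCatch(
  checked_df_equal(data.frame(a = 1:3), NULL),
  error = function(e) conditionMessage(e)
)
print(ok)
print(err)
```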

Best regards,
Hilmar



Re: '==' operator: inconsistency in data.frame(...) == NULL

Hilmar Berger-4
Sorry, I can't reproduce the example below even on the same machine.
However, the following example produces the same error as NULL values in
prior examples:

 > setClass("FOOCLASS",
+          representation("list")
+ )
 > ma = new("FOOCLASS", list(M=matrix(rnorm(300), 30,10)))
 > isS4(ma)
[1] TRUE
 > data.frame(a=1:3) == ma
Error in matrix(unlist(value, recursive = FALSE, use.names = FALSE),
nrow = nr,  :
   length of 'dimnames' [2] not equal to array extent

Best,
Hilmar


On 11/09/2019 12:24, Hilmar Berger wrote:

> Another example where a data.frame is compared to (here non-null,
> non-empty) non-atomic values in Ops.data.frame, resulting in an error
> message:
>
> setClass("FOOCLASS2",
>          slots = c(M="matrix")
> )
> ma = new("FOOCLASS2", M=matrix(rnorm(300), 30,10))
>
> > isS4(ma)
> [1] TRUE
> > ma == data.frame(a=1:3)
> Error in eval(f) : dims [product 1] do not match the length of object [3]
>
> As for the NULL/logical(0) cases I would suggest to explicitly test
> for invalid conditions in Ops.data.frame and generate a comprehensible
> message (e.g. "comparison is possible only for atomic and list types")
> if appropriate.
>
> Best regards,
> Hilmar

Re: '==' operator: inconsistency in data.frame(...) == NULL

Hilmar Berger-4
Dear all,

I did some more tests regarding the == operator in Ops.data.frame (see
below).  All tests done in R 3.6.1 (x86_64-w64-mingw32).

I find that errors are also thrown when comparing a zero-length
data.frame to atomic objects with length > 0, which should be a valid case
according to the documentation. This can be traced to a check in the
last line of Ops.data.frame which tests for the presence of an empty
result value (i.e. list() ) but does not handle a list of empty values
(i.e. list(logical(0))) which in fact is generated in those cases. There
is a simple fix (see also below).
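
The difference between the two kinds of empty value can be seen in isolation. This sketch assumes the result list is flattened with unlist() as in the line quoted earlier in the thread; it only shows how an is.null() test and a length test differ, and does not by itself verify that the change removes the errors.

```r
# An empty result list flattens to NULL ...
empty_result  <- unlist(list(), recursive = FALSE, use.names = FALSE)
# ... while a list holding one empty vector flattens to logical(0).
list_of_empty <- unlist(list(logical(0)), recursive = FALSE, use.names = FALSE)

print(is.null(empty_result))         # caught by an is.null() test
print(is.null(list_of_empty))        # not caught by an is.null() test
print(length(empty_result) == 0)     # caught by a length test
print(length(list_of_empty) == 0)    # also caught by a length test
```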

There are other issues with the S4 class example (i.e. data.frame() ==
<s4_object with representation as list>) which fails for different reasons.

##############################################################################

d_0 = data.frame(a = numeric(0)) # zero-length data.frame
d_00 = data.frame(numeric(0)) # zero-length data.frame without names
# remove names so that value is an empty list() at the end of Ops.data.frame
names(d_00) <- NULL
d_3 = data.frame(a=1:3) # non-empty data.frame

m_0 = matrix(logical(0)) # zero-length matrix
#------------------------
# error A:
# Error in matrix(if (is.null(value)) logical() else value, nrow = nr,
#   dimnames = list(rn,  :
# length of 'dimnames' [2] not equal to array extent

d_0 == 1   # error A
d_00 == 1  # <0 x 0 matrix>
d_3 == 1   # <3 x 1 matrix>

d_0 == logical(0) # error A
d_00 == logical(0) # <0 x 0 matrix>
d_3 == logical(0) # error A

d_0 == NULL # error A
d_00 == NULL # <0 x 0 matrix>
d_3 == NULL # error A

m_0 == d_0  # error A
m_0 == d_00 # <0 x 0 matrix>
m_0 == d_3  # error A

# empty matrix for comparison
m_0 == 1 # < 0 x 1 matrix>
m_0 == logical(0) # < 0 x 1 matrix>
m_0 == NULL # < 0 x 1 matrix>

# All errors above could be solved by changing the last line in
# Ops.data.frame from
#   matrix(if (is.null(value)) logical() else value, nrow = nr,
#          dimnames = list(rn, cn))
# to
#   matrix(if (length(value)==0) logical() else value, nrow = nr,
#          dimnames = list(rn, cn))
# Alternatively or in addition one could add an explicit test for
# data.frame() == NULL if desired and raise an error

#########################################################################################
# non-empty return value but failing in the same code line due to
# incompatible dimensions.
# should Ops.data.frame at all be dispatched for <data.frame> == <S4 object> ?
setClass("FOOCLASS",
           representation("list")
)
ma = new("FOOCLASS", list(M=matrix(rnorm(300), 30,10)))
isS4(ma)
d_3 == ma # error A
##########################################################################################

Best regards,
Hilmar
