Dispatch mechanism seems to alter object before calling method on it

Hervé Pagès-2
Hi,

This was quite unexpected:

   setGeneric("foo", function(x) standardGeneric("foo"))

   setMethod("foo", "vector", identity)

   foo(matrix(1:12, ncol=3))
   # [1]  1  2  3  4  5  6  7  8  9 10 11 12

   foo(array(1:24, 4:2))
   # [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

If I define a method for array objects, things work as expected though:

   setMethod("foo", "array", identity)

   foo(matrix(1:12, ncol=3))
   #      [,1] [,2] [,3]
   # [1,]    1    5    9
   # [2,]    2    6   10
   # [3,]    3    7   11
   # [4,]    4    8   12

So, luckily, I have a workaround.
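
The workaround can be checked without reading printed output. The following is a minimal sketch, assuming the generic and both methods are defined as above in a fresh session; the variable names 'm' and 'a' are only illustrative:

```r
library(methods)

setGeneric("foo", function(x) standardGeneric("foo"))
setMethod("foo", "vector", identity)
setMethod("foo", "array", identity)   # the workaround from this message

m <- matrix(1:12, ncol = 3)
a <- array(1:24, 4:2)

# With an "array" method in place, the dim attribute should come back
# untouched for both the matrix and the 3-d array.
dim(foo(m))
dim(foo(a))
identical(foo(m), m)
identical(foo(a), a)
```

Comparing with identical() also catches any attribute other than dim that might be dropped on the way to the method.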

But shouldn't the dispatch mechanism stay out of the business of
altering objects before they are passed to a method?

Thanks,
H.

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [hidden email]
Phone:  (206) 667-5791
Fax:    (206) 667-1319

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Dispatch mechanism seems to alter object before calling method on it

Michael Lawrence-3
My understanding is that array (or any other structure) does not
"simply" inherit from vector, because structures are not vectors in
the strictest sense. Basically, once a vector gains attributes, it is
a structure, not a vector. The methods package accommodates this by
defining an "is" relationship between "structure" and "vector" via an
"explicit coerce", such that any "structure" passed to a "vector"
method is first passed to as.vector(), which strips attributes. This
is very much by design.

Michael
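
The relationships described above can be inspected from the R prompt. This is a rough sketch rather than a statement of how the methods package is implemented; only the base R results carry an expected value in the comments, and the S4 calls are left for the reader to run because their printed form may differ between R versions:

```r
library(methods)

m <- matrix(1:12, ncol = 3)

# Base R view: a matrix carries a dim attribute, so it is not a plain vector.
is.vector(m)                     # FALSE
# as.vector() drops dim and returns the underlying integer vector.
identical(as.vector(m), 1:12)    # TRUE

# S4 view: query the class hierarchy that dispatch relies on.
extends("matrix", "structure")
extends("structure", "vector")
is(m, "vector")                  # reported as TRUE later in this thread

# Show "structure" together with its known subclasses and superclasses.
print(getClass("structure"))
```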


On Tue, May 15, 2018 at 5:25 PM, Hervé Pagès <[hidden email]> wrote:

> [...]

Re: Dispatch mechanism seems to alter object before calling method on it

Hervé Pagès-2
On 05/15/2018 09:13 PM, Michael Lawrence wrote:
> My understanding is that array (or any other structure) does not
> "simply" inherit from vector, because structures are not vectors in
> the strictest sense. Basically, once a vector gains attributes, it is
> a structure, not a vector. The methods package accommodates this by
> defining an "is" relationship between "structure" and "vector" via an
> "explicit coerce", such that any "structure" passed to a "vector"
> method is first passed to as.vector(), which strips attributes. This
> is very much by design.

It seems that the problem is really with matrices and arrays, not
with "structures" in general:

   f <- factor(c("z", "x", "z"), levels=letters)
   m <- matrix(1:12, ncol=3)
   df <- data.frame(f=f)
   x <- structure(1:3, titi="A")

Only the matrix loses its attributes when passed to a "vector"
method:

   setGeneric("foo", function(x) standardGeneric("foo"))
   setMethod("foo", "vector", identity)

   foo(f)     # attributes are preserved
   # [1] z x z
   # Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z

   foo(m)     # attributes are stripped
   # [1]  1  2  3  4  5  6  7  8  9 10 11 12

   foo(df)    # attributes are preserved
   #   f
   # 1 z
   # 2 x
   # 3 z

   foo(x)     # attributes are preserved
   # [1] 1 2 3
   # attr(,"titi")
   # [1] "A"

Also if structures are passed to as.vector() before being passed to
a "vector" method, shouldn't as.vector() and foo() be equivalent on
them? For 'f' and 'x' they're not:

   as.vector(f)
   # [1] "z" "x" "z"

   as.vector(x)
   # [1] 1 2 3

Finally note that for factors and data frames the "vector" method gets
selected despite the fact that is( , "vector") is FALSE:

   is(f, "vector")
   # [1] FALSE

   is(m, "vector")
   # [1] TRUE

   is(df, "vector")
   # [1] FALSE

   is(x, "vector")
   # [1] TRUE

Couldn't we recognize these problems as real, even if they are by
design? Hopefully we can all agree that:
- the dispatch mechanism should only dispatch, not alter objects;
- is() and selectMethod() should not contradict each other.

Thanks,
H.
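
The two observations above (what is() reports, and whether attributes survive the call) can be put side by side. The sketch below assumes a fresh session with only the "vector" method defined; 'objs' and 'report' are hypothetical names introduced here for illustration:

```r
library(methods)

setGeneric("foo", function(x) standardGeneric("foo"))
setMethod("foo", "vector", identity)

f <- factor(c("z", "x", "z"), levels = letters)
objs <- list(
  f  = f,
  m  = matrix(1:12, ncol = 3),
  df = data.frame(f = f),
  x  = structure(1:3, titi = "A")
)

# One row per object: what is() says, and whether the object that comes
# back from the "vector" method still has the attributes it went in with.
report <- data.frame(
  is_vector     = vapply(objs, function(obj) is(obj, "vector"), logical(1)),
  attrs_survive = vapply(objs, function(obj)
    identical(attributes(foo(obj)), attributes(obj)), logical(1))
)
print(report)
```

No expected values are given because the result depends on the R version; with the version used in this thread, 'm' would be the only row where attrs_survive is FALSE.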


Re: Dispatch mechanism seems to alter object before calling method on it

Michael Lawrence-3
Factors and data.frames are not structures, because they must have a
class attribute. Just call them "objects". They are higher level than
structures, which in practice just shape data without adding a lot of
semantics. Compare getClass("matrix") and getClass("factor").

I agree that inheritance through explicit coercion is confusing. As
far as I know, there are only 2 places where it is used:
1) Objects with attributes but no class, basically "structure" and its
subclasses "array" <- "matrix"
2) Classes that extend a reference type ("environment", "name" and
"externalptr") via hidden delegation (@.xData)

I'm not sure if anyone should be doing #2. For #1, a simple "fix"
would be just to drop inheritance of "structure" from "vector". I
think the intent was to mimic base R behavior, where it will happily
strip (or at least ignore) attributes when passing an array or matrix
to an internal function that expects a vector.

A related problem, which explains why factor and data.frame inherit
from "vector" even though they are objects, is that any S4 object
derived from those needs to be (for pragmatic compatibility reasons)
an integer vector or list, respectively, internally (the virtual
@.Data slot). Separating that from inheritance would probably be
difficult.

Yes, we can consider these to be problems, to some extent stemming
from the behavior and design of R itself, but I'm not sure it's worth
doing anything about them at this point.

Michael
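
The distinction between "structures" and "objects" drawn above comes down to whether a class attribute is set, which base R exposes through is.object(). A small sketch using the four objects from earlier in the thread (the S4 class definitions are printed without comment, since their exact layout may vary):

```r
library(methods)

f  <- factor(c("z", "x", "z"), levels = letters)
m  <- matrix(1:12, ncol = 3)
df <- data.frame(f = f)
x  <- structure(1:3, titi = "A")

# is.object() is TRUE only when the class attribute is set.
is.object(f)     # TRUE
is.object(m)     # FALSE
is.object(df)    # TRUE
is.object(x)     # FALSE

# The attribute names show what each one actually carries.
names(attributes(m))
names(attributes(f))

# The comparison suggested above.
print(getClass("matrix"))
print(getClass("factor"))
```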

On Wed, May 16, 2018 at 8:33 AM, Hervé Pagès <[hidden email]> wrote:

> [...]

Re: Dispatch mechanism seems to alter object before calling method on it

Hervé Pagès-2
On 05/16/2018 10:22 AM, Michael Lawrence wrote:

> Factors and data.frames are not structures, because they must have a
> class attribute. Just call them "objects". They are higher level than
> structures, which in practice just shape data without adding a lot of
> semantics. Compare getClass("matrix") and getClass("factor").
>
> I agree that inheritance through explicit coercion is confusing. As
> far as I know, there are only 2 places where it is used:
> 1) Objects with attributes but no class, basically "structure" and its
> subclasses "array" <- "matrix"
> 2) Classes that extend a reference type ("environment", "name" and
> "externalptr") via hidden delegation (@.xData)
>
> I'm not sure if anyone should be doing #2. For #1, a simple "fix"
> would be just to drop inheritance of "structure" from "vector". I
> think the intent was to mimic base R behavior, where it will happily
> strip (or at least ignore) attributes when passing an array or matrix
> to an internal function that expects a vector.
>
> A related problem, which explains why factor and data.frame inherit
> from "vector" even though they are objects, is that any S4 object
> derived from those needs to be (for pragmatic compatibility reasons)
> an integer vector or list, respectively, internally (the virtual
> @.Data slot). Separating that from inheritance would probably be
> difficult.
>
> Yes, we can consider these to be problems, to some extent stemming
> from the behavior and design of R itself, but I'm not sure it's worth
> doing anything about them at this point.

Thanks for the informative discussion. It still doesn't explain
why 'm' gets its attributes stripped and 'x' does not though:

   m <- matrix(1:12, ncol=3)
   x <- structure(1:3, titi="A")

   setGeneric("foo", function(x) standardGeneric("foo"))
   setMethod("foo", "vector", identity)

   foo(m)
   # [1]  1  2  3  4  5  6  7  8  9 10 11 12

   foo(x)
   # [1] 1 2 3
   # attr(,"titi")
   # [1] "A"

If I understand correctly, both are "structures", not "objects".

Why aren't these problems worth fixing? More generally speaking,
the erratic behavior of the S4 system with respect to S3 objects
has been a plague since the beginning of the methods package.
And many people have complained about this on many occasions in
one way or another. For the record, here are some of the most
notorious problems:

   class(as.numeric(1:4))
   # [1] "numeric"
   class(as(1:4, "numeric"))
   # [1] "integer"

   is.vector(matrix())
   # [1] FALSE
   is(matrix(), "vector")
   # [1] TRUE

   is.list(data.frame())
   # [1] TRUE
   is(data.frame(), "list")
   # [1] FALSE
   extends("data.frame", "list")
   # [1] TRUE

   setClassUnion("vector_OR_factor", c("vector", "factor"))
   is(data.frame(), "vector")
   # [1] FALSE
   is(data.frame(), "factor")
   # [1] FALSE
   is(data.frame(), "vector_OR_factor")
   # [1] TRUE

   etc...

Many people stay away from S4 because of these incomprehensible
behaviors.

Finally note that even pure S3 operations can produce output that
doesn't make sense:

   is.list(data.frame())
   # [1] TRUE
   is.vector(list())
   # [1] TRUE
   is.vector(data.frame())
   # [1] FALSE

   (that is: a data frame is a list and a list is a vector but
   a data frame is not a vector!)

Why aren't these problems taken more seriously?

Thanks,
H.


Re: Dispatch mechanism seems to alter object before calling method on it

Michael Lawrence-3
On Wed, May 16, 2018 at 12:23 PM, Hervé Pagès <[hidden email]> wrote:

> On 05/16/2018 10:22 AM, Michael Lawrence wrote:
>>
>> Factors and data.frames are not structures, because they must have a
>> class attribute. Just call them "objects". They are higher level than
>> structures, which in practice just shape data without adding a lot of
>> semantics. Compare getClass("matrix") and getClass("factor").
>>
>> I agree that inheritance through explicit coercion is confusing. As
>> far as I know, there are only 2 places where it is used:
>> 1) Objects with attributes but no class, basically "structure" and its
>> subclasses "array" <- "matrix"
>> 2) Classes that extend a reference type ("environment", "name" and
>> "externalptr") via hidden delegation (@.xData)
>>
>> I'm not sure if anyone should be doing #2. For #1, a simple "fix"
>> would be just to drop inheritance of "structure" from "vector". I
>> think the intent was to mimic base R behavior, where it will happily
>> strip (or at least ignore) attributes when passing an array or matrix
>> to an internal function that expects a vector.
>>
>> A related problem, which explains why factor and data.frame inherit
>> from "vector" even though they are objects, is that any S4 object
>> derived from those needs to be (for pragmatic compatibility reasons)
>> an integer vector or list, respectively, internally (the virtual
>> @.Data slot). Separating that from inheritance would probably be
>> difficult.
>>
>> Yes, we can consider these to be problems, to some extent stemming
>> from the behavior and design of R itself, but I'm not sure it's worth
>> doing anything about them at this point.
>
>
> Thanks for the informative discussion. It still doesn't explain
> why 'm' gets its attributes stripped and 'x' does not though:
>
>   m <- matrix(1:12, ncol=3)
>   x <- structure(1:3, titi="A")
>
>   setGeneric("foo", function(x) standardGeneric("foo"))
>   setMethod("foo", "vector", identity)
>
>   foo(m)
>   # [1]  1  2  3  4  5  6  7  8  9 10 11 12
>
>   foo(x)
>   # [1] 1 2 3
>   # attr(,"titi")
>   # [1] "A"
>
> If I understand correctly, both are "structures", not "objects".
>

The structure 'x' has no class, so nothing special is going to happen.
As you know, S4 has a well-defined class hierarchy. Just look at
getClass("structure") to see its subclasses. There was at some point
an attempt to create a sort of dynamic inheritance, where a 'test'
function would be called and could figure this out. However, that was
never implemented. For one thing, it would be even more confusing.

> Why aren't these problems worth fixing? More generally speaking,
> the erratic behavior of the S4 system with respect to S3 objects
> has been a plague since the beginning of the methods package.
> And many people have complained about this on many occasions in
> one way or another. For the record, here are some of the most
> notorious problems:
>
>   class(as.numeric(1:4))
>   # [1] "numeric"
>   class(as(1:4, "numeric"))
>   # [1] "integer"
>

This is not really a problem with the methods package. is.numeric(1L)
is TRUE, thus integer extends numeric, so coercing an integer to
numeric is a no-op. as.numeric() should really be called as.double()
or something. But that's not going to change, of course.
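
The point about as.numeric() can be seen by looking at storage types rather than classes. A minimal sketch, where 'x' is just an illustrative name; the last line is left without an expected value because the result of as(, "numeric") may depend on the R version:

```r
library(methods)

x <- 1:4

is.numeric(x)                             # TRUE: integers count as numeric
typeof(x)                                 # "integer"

# as.numeric() converts to double, exactly as as.double() does.
typeof(as.numeric(x))                     # "double"
identical(as.numeric(x), as.double(x))    # TRUE

# The S4 coercion discussed above; compare its class with the line before.
class(as(x, "numeric"))
```
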

>   is.vector(matrix())
>   # [1] FALSE
>   is(matrix(), "vector")
>   # [1] TRUE
>

We already discussed this in the context of "structure" inheriting
from "vector" and explicit coercion.

>   is.list(data.frame())
>   # [1] TRUE
>   is(data.frame(), "list")
>   # [1] FALSE
>   extends("data.frame", "list")
>   # [1] TRUE
>

This is a compromise for compatibility with inherits(), since the
result of data.frame() is an S3 object.

>
>   is(data.frame(), "vector")
>   # [1] FALSE
>   is(data.frame(), "factor")
>   # [1] FALSE
>   is(data.frame(), "vector_OR_factor")
>   # [1] TRUE
>

The question is: which inheritance to follow, S3 or S4? Since "vector"
is a basic class, inheritance follows S3 rules. But the class union is
an S4 class, so it follows S4 rules.

>   etc...
>
> Many people stay away from S4 because of these incomprehensible
> behaviors.
>
> Finally note that even pure S3 operations can produce output that
> doesn't make sense:
>
>   is.list(data.frame())
>   # [1] TRUE
>   is.vector(list())
>   # [1] TRUE
>   is.vector(data.frame())
>   # [1] FALSE
>
>   (that is: a data frame is a list and a list is a vector but
>   a data frame is not a vector!)
>

R has no notion of inheritance here. These are just different
functions checking different things. Bringing this up again after so
many discussions borders on trolling.

> Why aren't these problems taken more seriously?
>

They are taken seriously. But there are serious semantic differences
between S3, S4 and base type checking functions. The S3/S4 integration
should be viewed as a tool that is useful in practice, despite forced
compromises.
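
To make the semantic differences concrete, here are the three kinds of checks applied to one data frame. This is only an illustrative sketch ('df' is a made-up example); the is() result is not annotated, since it belongs to the behavior under discussion:

```r
library(methods)

df <- data.frame(a = 1:2)

# Base type checks look at storage and attributes.
typeof(df)                # "list"
is.list(df)               # TRUE: the underlying type is a list
is.vector(df)             # FALSE: it has attributes other than names
is.vector(as.list(df))    # TRUE once class and row.names are dropped

# The S3 check looks only at the class attribute.
inherits(df, "list")      # FALSE
inherits(df, "data.frame")  # TRUE

# The S4 check consults the class hierarchy known to the methods package.
is(df, "list")
extends("data.frame", "list")
```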

There are changes that would resolve some of these issues, like those
suggested earlier in this thread, but it's likely too disruptive to
make them now. Energy is better spent thinking about how we will do it
"right" the next time around.

> Thanks,
> H.
>
>>
>> Michael
>>
>> On Wed, May 16, 2018 at 8:33 AM, Hervé Pagès <[hidden email]> wrote:
>>>
>>> On 05/15/2018 09:13 PM, Michael Lawrence wrote:
>>>>
>>>>
>>>> My understanding is that array (or any other structure) does not
>>>> "simply" inherit from vector, because structures are not vectors in
>>>> the strictest sense. Basically, once a vector gains attributes, it is
>>>> a structure, not a vector. The methods package accommodates this by
>>>> defining an "is" relationship between "structure" and "vector" via an
>>>> "explicit coerce", such that any "structure" passed to a "vector"
>>>> method is first passed to as.vector(), which strips attributes. This
>>>> is very much by design.
>>>
>>>
>>>
>>> It seems that the problem is really with matrices and arrays, not
>>> with "structures" in general:
>>>
>>>    f <- factor(c("z", "x", "z"), levels=letters)
>>>    m <- matrix(1:12, ncol=3)
>>>    df <- data.frame(f=f)
>>>    x <- structure(1:3, titi="A")
>>>
>>> Only the matrix looses its attributes when passed to a "vector"
>>> method:
>>>
>>>    setGeneric("foo", function(x) standardGeneric("foo"))
>>>    setMethod("foo", "vector", identity)
>>>
>>>    foo(f)     # attributes are preserved
>>>    # [1] z x z
>>>    # Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
>>>
>>>    foo(m)     # attributes are stripped
>>>    # [1]  1  2  3  4  5  6  7  8  9 10 11 12
>>>
>>>    foo(df)    # attributes are preserved
>>>    #   f
>>>    # 1 z
>>>    # 2 x
>>>    # 3 z
>>>
>>>    foo(x)     # attributes are preserved
>>>    # [1] 1 2 3
>>>    # attr(,"titi")
>>>    # [1] "A"
>>>
>>> Also if structures are passed to as.vector() before being passed to
>>> a "vector" method, shouldn't as.vector() and foo() be equivalent on
>>> them? For 'f' and 'x' they're not:
>>>
>>>    as.vector(f)
>>>    # [1] "z" "x" "z"
>>>
>>>    as.vector(x)
>>>    # [1] 1 2 3
>>>
>>> Finally note that for factors and data frames the "vector" method gets
>>> selected despite the fact that is( , "vector") is FALSE:
>>>
>>>    is(f, "vector")
>>>    # [1] FALSE
>>>
>>>    is(m, "vector")
>>>    # [1] TRUE
>>>
>>>    is(df, "vector")
>>>    # [1] FALSE
>>>
>>>    is(x, "vector")
>>>    # [1] TRUE
>>>
>>> Couldn't we recognize these problems as real, even if they are by
>>> design? Hopefully we can all agree that:
>>> - the dispatch mechanism should only dispatch, not alter objects;
>>> - is() and selectMethod() should not contradict each other.
>>>
>>> Thanks,
>>> H.
>>>
>>>>
>>>> Michael
>>>>
>>>>
>>>> On Tue, May 15, 2018 at 5:25 PM, Hervé Pagès <[hidden email]>
>>>> wrote:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> This was quite unexpected:
>>>>>
>>>>>     setGeneric("foo", function(x) standardGeneric("foo"))
>>>>>
>>>>>     setMethod("foo", "vector", identity)
>>>>>
>>>>>     foo(matrix(1:12, ncol=3))
>>>>>     # [1]  1  2  3  4  5  6  7  8  9 10 11 12
>>>>>
>>>>>     foo(array(1:24, 4:2))
>>>>>     # [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
>>>>> 21
>>>>> 22 23
>>>>> 24
>>>>>
>>>>> If I define a method for array objects, things work as expected though:
>>>>>
>>>>>     setMethod("foo", "array", identity)
>>>>>
>>>>>     foo(matrix(1:12, ncol=3))
>>>>>     #      [,1] [,2] [,3]
>>>>>     # [1,]    1    5    9
>>>>>     # [2,]    2    6   10
>>>>>     # [3,]    3    7   11
>>>>>     # [4,]    4    8   12
>>>>>
>>>>> So, luckily, I have a workaround.
>>>>>
>>>>> But shouldn't the dispatch mechanism stay away from the business of
>>>>> altering objects before passed to it?
>>>>>
>>>>> Thanks,
>>>>> H.
>>>>>
>>>>> --
>>>>> Hervé Pagès
>>>>>
>>>>> Program in Computational Biology
>>>>> Division of Public Health Sciences
>>>>> Fred Hutchinson Cancer Research Center
>>>>> 1100 Fairview Ave. N, M1-B514
>>>>> P.O. Box 19024
>>>>> Seattle, WA 98109-1024
>>>>>
>>>>> E-mail: [hidden email]
>>>>> Phone:  (206) 667-5791
>>>>> Fax:    (206) 667-1319
>>>>>
>>>>> ______________________________________________
>>>>> [hidden email] mailing list
>>>>>
>>>>>
>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_r-2Ddevel&d=DwIFaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=gynT4YhbmVKZhnX4srXlCWZZRyVBMXG211CKgftdEs0&s=_I0aFHQVnXdBfB5kTLg9TxK_2LHdSuaB6gqZwSx1orQ&e=
>>>>>
>>>
>>> --
>>> Hervé Pagès
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>>
>>> E-mail: [hidden email]
>>> Phone:  (206) 667-5791
>>> Fax:    (206) 667-1319
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: [hidden email]
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>


Re: Dispatch mechanism seems to alter object before calling method on it

Hervé Pagès-2
On 05/16/2018 01:24 PM, Michael Lawrence wrote:

> On Wed, May 16, 2018 at 12:23 PM, Hervé Pagès <[hidden email]> wrote:
>> On 05/16/2018 10:22 AM, Michael Lawrence wrote:
>>>
>>> Factors and data.frames are not structures, because they must have a
>>> class attribute. Just call them "objects". They are higher level than
>>> structures, which in practice just shape data without adding a lot of
>>> semantics. Compare getClass("matrix") and getClass("factor").
>>>
>>> I agree that inheritance through explicit coercion is confusing. As
>>> far as I know, there are only 2 places where it is used:
>>> 1) Objects with attributes but no class, basically "structure" and its
>>> subclasses "array" <- "matrix"
>>> 2) Classes that extend a reference type ("environment", "name" and
>>> "externalptr") via hidden delegation (@.xData)
>>>
>>> I'm not sure if anyone should be doing #2. For #1, a simple "fix"
>>> would be just to drop inheritance of "structure" from "vector". I
>>> think the intent was to mimic base R behavior, where it will happily
>>> strip (or at least ignore) attributes when passing an array or matrix
>>> to an internal function that expects a vector.
>>>
>>> A related problem, which explains why factor and data.frame inherit
>>> from "vector" even though they are objects, is that any S4 object
>>> derived from those needs to be (for pragmatic compatibility reasons)
>>> an integer vector or list, respectively, internally (the virtual
>>> @.Data slot). Separating that from inheritance would probably be
>>> difficult.
>>>
>>> Yes, we can consider these to be problems, to some extent stemming
>>> from the behavior and design of R itself, but I'm not sure it's worth
>>> doing anything about them at this point.
>>
>>
>> Thanks for the informative discussion. It still doesn't explain
>> why 'm' gets its attributes stripped and 'x' does not though:
>>
>>    m <- matrix(1:12, ncol=3)
>>    x <- structure(1:3, titi="A")
>>
>>    setGeneric("foo", function(x) standardGeneric("foo"))
>>    setMethod("foo", "vector", identity)
>>
>>    foo(m)
>>    # [1]  1  2  3  4  5  6  7  8  9 10 11 12
>>
>>    foo(x)
>>    # [1] 1 2 3
>>    # attr(,"titi")
>>    # [1] "A"
>>
>> If I understand correctly, both are "structures", not "objects".
>>
>
> The structure 'x' has no class, so nothing special is going to happen.
> As you know, S4 has a well-defined class hierarchy. Just look at
> getClass("structure") to see its subclasses. There was at some point
> an attempt to create a sort of dynamic inheritance, where a 'test'
> function would be called and could figure this out. However, that was
> never implemented. For one thing, it would be even more confusing.
>
>> Why aren't these problems worth fixing? More generally speaking
>> the erratic behavior of the S4 system with respect to S3 objects
>> has been a plague since the beginning of the methods package.
>> And many people have complained about this on many occasions in
>> one way or another. For the record, here are some of the most
>> notorious problems:
>>
>>    class(as.numeric(1:4))
>>    # [1] "numeric"
>>    class(as(1:4, "numeric"))
>>    # [1] "integer"
>>
>
> This is not really a problem with the methods package. is.numeric(1L)
> is TRUE, thus integer extends numeric, so coercing an integer to
> numeric is a no-op.

Only as(1:4, "numeric", strict=FALSE) should be a no-op.
as(1:4, "numeric") should still coerce because as() is supposed
to perform strict coercion by default.
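
A minimal sketch of the two coercions being compared, for anyone who wants to check their own installation. The variable names are illustrative, and no result is asserted for as() because the thread itself notes that a change was attempted and reverted, so the answer may depend on the R version:

```r
library(methods)

x <- 1:4

# Base coercion always returns a double vector.
typeof(as.numeric(x))

# is.numeric() is TRUE for integers, which is the reason given in this
# thread for as() treating integer -> "numeric" as already satisfied.
is.numeric(x)

# S4 coercion; the thread reports class "integer" here.
class(as(x, "numeric"))

# An explicit way to force double storage when that is what is needed.
y <- x
storage.mode(y) <- "double"
typeof(y)
```

Setting the storage mode directly sidesteps the question of what "numeric" means to the methods package.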

> as.numeric() should really be called as.double()
> or something. But that's not going to change, of course.

as.numeric() is doing the right thing (i.e. strict coercion) so there
is no need to touch it.

>
>>    is.vector(matrix())
>>    # [1] FALSE
>>    is(matrix(), "vector")
>>    # [1] TRUE
>>
>
> We already discussed this in the context of "structure" inheriting
> from "vector" and explicit coercion.
>
>>    is.list(data.frame())
>>    # [1] TRUE
>>    is(data.frame(), "list")
>>    # [1] FALSE
>>    extends("data.frame", "list")
>>    # [1] TRUE
>>
>
> This is a compromise for compatibility with inherits(), since the
> result of data.frame() is an S3 object.

So we should add to the list that inherits(data.frame(), "list") is
broken too. Once it gets fixed, is(data.frame(), "list") won't need
to compromise anymore and will be free to return the correct answer.
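
To make the disagreement concrete, here is a small sketch that puts the base, S3 and S4 checks side by side. The S4 results in the comments are the ones reported earlier in this thread, not something guaranteed for every R version:

```r
library(methods)

df <- data.frame(a = 1:2)

# Base type check: a data frame is stored as a list.
is.list(df)

# S3 inheritance looks only at the class attribute, which does not
# contain "list".
class(df)
inherits(df, "list")

# S4 view of the object: reported as FALSE in this thread.
is(df, "list")

# S4 view of the class definitions: reported as TRUE in this thread.
extends("data.frame", "list")
```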

>
>>
>>    is(data.frame(), "vector")
>>    # [1] FALSE
>>    is(data.frame(), "factor")
>>    # [1] FALSE
>>    is(data.frame(), "vector_OR_factor")
>>    # [1] TRUE
>>
>
> The question is: which inheritance to follow, S3 or S4? Since "vector"
> is a basic class, inheritance follows S3 rules. But the class union is
> an S4 class, so it follows S4 rules.
>
>>    etc...
>>
>> Many people stay away from S4 because of these incomprehensible
>> behaviors.
>>
>> Finally note that even pure S3 operations can produce output that
>> doesn't make sense:
>>
>>    is.list(data.frame())
>>    # [1] TRUE
>>    is.vector(list())
>>    # [1] TRUE
>>    is.vector(data.frame())
>>    # [1] FALSE
>>
>>    (that is: a data frame is a list and a list is a vector but
>>    a data frame is not a vector!)
>>
>
> R has no notion of inheritance here. These are just different
> functions checking different things.

Yes, I see that R does not care about inheritance here.
But is that it? Is that the end of the story? 3 different
functions checking 3 different things but isn't the last one
broken?
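
For reference, ?is.vector documents that, with the default mode, it returns TRUE only for vectors that have no attributes other than names. The sketch below (illustrative variable names) shows that this rule, rather than any notion of inheritance, accounts for the three results:

```r
df <- data.frame(a = 1:2)
l  <- list(a = 1:2)

# A plain named list has no attributes besides names.
is.vector(l)

# A data frame also carries class and row.names attributes.
setdiff(names(attributes(df)), "names")
is.vector(df)

# Dropping the class is not enough: row.names is still attached.
is.vector(unclass(df))

# as.list() on a data frame drops both class and row.names.
is.vector(as.list(df))

# Any extra attribute has the same effect on an atomic vector.
is.vector(structure(1:3, titi = "A"))
```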

> Bringing this up again after so
> many discussions borders on trolling.

Hopefully these issues are not officially "closed".

As you know these issues are serious flaws. They've been biting me
and other Bioconductor developers (including you) over and over in
our development effort in S4Vectors and other Bioconductor packages
that heavily rely on the S4 system.

Unfortunately the discussions I've seen about these issues almost
always die under the weight of complex technical considerations
that are almost impossible to understand if one is not familiar
with the internals of the methods package. Very few of us are
(I'm not counting myself). The problem is that this complexity,
or some obscure early design decisions, seems to be used as an
excuse for not fixing these issues. So yes, I'm finding this
situation quite frustrating to be honest, and I'm only expressing
this frustration here. Note that this is not the same as trolling.
Forgive me if it sounded like that.

H.

>
>> Why aren't these problems taken more seriously?
>>
>
> They are taken seriously. But there are serious semantic differences
> between S3, S4 and base type checking functions. The S3/S4 integration
> should be viewed as a tool that is useful in practice, despite forced
> compromises.
>
> There are changes that would resolve some of these issues, like those
> suggested earlier in this thread, but it's likely too disruptive to
> make them now. Energy is better spent thinking about how we will do it
> "right" the next time around.
>
>> Thanks,
>> H.
>>
>>>
>>> Michael
>>>
>>> On Wed, May 16, 2018 at 8:33 AM, Hervé Pagès <[hidden email]> wrote:
>>>>
>>>> On 05/15/2018 09:13 PM, Michael Lawrence wrote:
>>>>>
>>>>>
>>>>> My understanding is that array (or any other structure) does not
>>>>> "simply" inherit from vector, because structures are not vectors in
>>>>> the strictest sense. Basically, once a vector gains attributes, it is
>>>>> a structure, not a vector. The methods package accommodates this by
>>>>> defining an "is" relationship between "structure" and "vector" via an
>>>>> "explicit coerce", such that any "structure" passed to a "vector"
>>>>> method is first passed to as.vector(), which strips attributes. This
>>>>> is very much by design.
>>>>
>>>>
>>>>
>>>> It seems that the problem is really with matrices and arrays, not
>>>> with "structures" in general:
>>>>
>>>>     f <- factor(c("z", "x", "z"), levels=letters)
>>>>     m <- matrix(1:12, ncol=3)
>>>>     df <- data.frame(f=f)
>>>>     x <- structure(1:3, titi="A")
>>>>
>>>> Only the matrix loses its attributes when passed to a "vector"
>>>> method:
>>>>
>>>>     setGeneric("foo", function(x) standardGeneric("foo"))
>>>>     setMethod("foo", "vector", identity)
>>>>
>>>>     foo(f)     # attributes are preserved
>>>>     # [1] z x z
>>>>     # Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
>>>>
>>>>     foo(m)     # attributes are stripped
>>>>     # [1]  1  2  3  4  5  6  7  8  9 10 11 12
>>>>
>>>>     foo(df)    # attributes are preserved
>>>>     #   f
>>>>     # 1 z
>>>>     # 2 x
>>>>     # 3 z
>>>>
>>>>     foo(x)     # attributes are preserved
>>>>     # [1] 1 2 3
>>>>     # attr(,"titi")
>>>>     # [1] "A"
>>>>
>>>> Also if structures are passed to as.vector() before being passed to
>>>> a "vector" method, shouldn't as.vector() and foo() be equivalent on
>>>> them? For 'f' and 'x' they're not:
>>>>
>>>>     as.vector(f)
>>>>     # [1] "z" "x" "z"
>>>>
>>>>     as.vector(x)
>>>>     # [1] 1 2 3
>>>>
>>>> Finally note that for factors and data frames the "vector" method gets
>>>> selected despite the fact that is( , "vector") is FALSE:
>>>>
>>>>     is(f, "vector")
>>>>     # [1] FALSE
>>>>
>>>>     is(m, "vector")
>>>>     # [1] TRUE
>>>>
>>>>     is(df, "vector")
>>>>     # [1] FALSE
>>>>
>>>>     is(x, "vector")
>>>>     # [1] TRUE
>>>>
>>>> Couldn't we recognize these problems as real, even if they are by
>>>> design? Hopefully we can all agree that:
>>>> - the dispatch mechanism should only dispatch, not alter objects;
>>>> - is() and selectMethod() should not contradict each other.
>>>>
>>>> Thanks,
>>>> H.
>>>>
>>>>>
>>>>> Michael
>>>>>
>>>>>
>>>>> On Tue, May 15, 2018 at 5:25 PM, Hervé Pagès <[hidden email]>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> This was quite unexpected:
>>>>>>
>>>>>>      setGeneric("foo", function(x) standardGeneric("foo"))
>>>>>>
>>>>>>      setMethod("foo", "vector", identity)
>>>>>>
>>>>>>      foo(matrix(1:12, ncol=3))
>>>>>>      # [1]  1  2  3  4  5  6  7  8  9 10 11 12
>>>>>>
>>>>>>      foo(array(1:24, 4:2))
>>>>>>      # [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
>>>>>>
>>>>>> If I define a method for array objects, things work as expected though:
>>>>>>
>>>>>>      setMethod("foo", "array", identity)
>>>>>>
>>>>>>      foo(matrix(1:12, ncol=3))
>>>>>>      #      [,1] [,2] [,3]
>>>>>>      # [1,]    1    5    9
>>>>>>      # [2,]    2    6   10
>>>>>>      # [3,]    3    7   11
>>>>>>      # [4,]    4    8   12
>>>>>>
>>>>>> So, luckily, I have a workaround.
>>>>>>
>>>>>> But shouldn't the dispatch mechanism stay away from the business of
>>>>>> altering objects before passing them to the method?
>>>>>>
>>>>>> Thanks,
>>>>>> H.
>>>>>>



Re: Dispatch mechanism seems to alter object before calling method on it

Michael Lawrence-3
On Wed, May 16, 2018 at 3:45 PM, Hervé Pagès <[hidden email]> wrote:

> On 05/16/2018 01:24 PM, Michael Lawrence wrote:
>>
>> On Wed, May 16, 2018 at 12:23 PM, Hervé Pagès <[hidden email]>
>> wrote:
>>>
>>> On 05/16/2018 10:22 AM, Michael Lawrence wrote:
>>>>
>>>>
>>>> Factors and data.frames are not structures, because they must have a
>>>> class attribute. Just call them "objects". They are higher level than
>>>> structures, which in practice just shape data without adding a lot of
>>>> semantics. Compare getClass("matrix") and getClass("factor").
>>>>
>>>> I agree that inheritance through explicit coercion is confusing. As
>>>> far as I know, there are only 2 places where it is used:
>>>> 1) Objects with attributes but no class, basically "structure" and its
>>>> subclasses "array" <- "matrix"
>>>> 2) Classes that extend a reference type ("environment", "name" and
>>>> "externalptr") via hidden delegation (@.xData)
>>>>
>>>> I'm not sure if anyone should be doing #2. For #1, a simple "fix"
>>>> would be just to drop inheritance of "structure" from "vector". I
>>>> think the intent was to mimic base R behavior, where it will happily
>>>> strip (or at least ignore) attributes when passing an array or matrix
>>>> to an internal function that expects a vector.
>>>>
>>>> A related problem, which explains why factor and data.frame inherit
>>>> from "vector" even though they are objects, is that any S4 object
>>>> derived from those needs to be (for pragmatic compatibility reasons)
>>>> an integer vector or list, respectively, internally (the virtual
>>>> @.Data slot). Separating that from inheritance would probably be
>>>> difficult.
>>>>
>>>> Yes, we can consider these to be problems, to some extent stemming
>>>> from the behavior and design of R itself, but I'm not sure it's worth
>>>> doing anything about them at this point.
>>>
>>>
>>>
>>> Thanks for the informative discussion. It still doesn't explain
>>> why 'm' gets its attributes stripped and 'x' does not though:
>>>
>>>    m <- matrix(1:12, ncol=3)
>>>    x <- structure(1:3, titi="A")
>>>
>>>    setGeneric("foo", function(x) standardGeneric("foo"))
>>>    setMethod("foo", "vector", identity)
>>>
>>>    foo(m)
>>>    # [1]  1  2  3  4  5  6  7  8  9 10 11 12
>>>
>>>    foo(x)
>>>    # [1] 1 2 3
>>>    # attr(,"titi")
>>>    # [1] "A"
>>>
>>> If I understand correctly, both are "structures", not "objects".
>>>
>>
>> The structure 'x' has no class, so nothing special is going to happen.
>> As you know, S4 has a well-defined class hierarchy. Just look at
>> getClass("structure") to see its subclasses. There was at some point
>> an attempt to create a sort of dynamic inheritance, where a 'test'
>> function would be called and could figure this out. However, that was
>> never implemented. For one thing, it would be even more confusing.
>>
>>> Why aren't these problems worth fixing? More generally speaking
>>> the erratic behavior of the S4 system with respect to S3 objects
>>> has been a plague since the beginning of the methods package.
>>> And many people have complained about this on many occasions in
>>> one way or another. For the record, here are some of the most
>>> notorious problems:
>>>
>>>    class(as.numeric(1:4))
>>>    # [1] "numeric"
>>>    class(as(1:4, "numeric"))
>>>    # [1] "integer"
>>>
>>
>> This is not really a problem with the methods package. is.numeric(1L)
>> is TRUE, thus integer extends numeric, so coercing an integer to
>> numeric is a no-op.
>
>
> Only as(1:4, "numeric", strict=FALSE) should be a no-op.
> as(1:4, "numeric") should still coerce because as() is supposed
> to perform strict coercion by default.
>
>> as.numeric() should really be called as.double()
>> or something. But that's not going to change, of course.
>
>
> as.numeric() is doing the right thing (i.e. strict coercion) so there
> is no need to touch it.

Yes, sorry, you're right.

Line 342 of as.R in the methods package has a comment block explaining
that this change was attempted in 2015, but it caused too many problems
and so was reverted.

>
>>
>>>    is.vector(matrix())
>>>    # [1] FALSE
>>>    is(matrix(), "vector")
>>>    # [1] TRUE
>>>
>>
>> We already discussed this in the context of "structure" inheriting
>> from "vector" and explicit coercion.
>>
>>>    is.list(data.frame())
>>>    # [1] TRUE
>>>    is(data.frame(), "list")
>>>    # [1] FALSE
>>>    extends("data.frame", "list")
>>>    # [1] TRUE
>>>
>>
>> This is a compromise for compatibility with inherits(), since the
>> result of data.frame() is an S3 object.
>
>
> So we should add to the list that inherits(data.frame(), "list") is
> broken too. Once it gets fixed, is(data.frame(), "list") won't need
> to compromise anymore and will be free to return the correct answer.
>

But it's not broken according to S3 rules. Adding "list" to
class(data.frame()) would probably be very disruptive.

>>
>>>
>>>    is(data.frame(), "vector")
>>>    # [1] FALSE
>>>    is(data.frame(), "factor")
>>>    # [1] FALSE
>>>    is(data.frame(), "vector_OR_factor")
>>>    # [1] TRUE
>>>
>>
>> The question is: which inheritance to follow, S3 or S4? Since "vector"
>> is a basic class, inheritance follows S3 rules. But the class union is
>> an S4 class, so it follows S4 rules.
>>
>>>    etc...
>>>
>>> Many people stay away from S4 because of these incomprehensible
>>> behaviors.
>>>
>>> Finally note that even pure S3 operations can produce output that
>>> doesn't make sense:
>>>
>>>    is.list(data.frame())
>>>    # [1] TRUE
>>>    is.vector(list())
>>>    # [1] TRUE
>>>    is.vector(data.frame())
>>>    # [1] FALSE
>>>
>>>    (that is: a data frame is a list and a list is a vector but
>>>    a data frame is not a vector!)
>>>
>>
>> R has no notion of inheritance here. These are just different
>> functions checking different things.
>
>
> Yes, I see that R does not care about inheritance here.
> But is that it? Is that the end of the story? 3 different
> functions checking 3 different things but isn't the last one
> broken?
>
>> Bringing this up again after so
>> many discussions borders on trolling.
>
>
> Hopefully these issues are not officially "closed".
>
> As you know these issues are serious flaws. They've been biting me
> and other Bioconductor developers (including you) over and over in
> our development effort in S4Vectors and other Bioconductor packages
> that heavily rely on the S4 system.
>
> Unfortunately the discussions I've seen about these issues almost
> always die under the weight of complex technical considerations
> that are almost impossible to understand if one is not familiar
> with the internals of the methods package. Very few of us are
> (I'm not counting myself). The problem is that this complexity,
> or some obscure early design decisions, seems to be used as an
> excuse for not fixing these issues. So yes, I'm finding this
> situation quite frustrating to be honest, and I'm only expressing
> this frustration here. Note that this is not the same as trolling.
> Forgive me if it sounded like that.

No worries. The problems here are mostly conceptual, because of the
incompatibility of the different type systems, or they are things that
could technically be fixed, but at an unacceptable loss of backwards
compatibility. It's helpful that you've identified these issues. We
could compile them somewhere so that others are not so easily bitten,
and so that future object systems avoid the same mistakes.
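
As a possible first entry for such a compilation, here is a minimal sketch of the matrix case that started this thread, together with the workaround from the first message. The generic name foo comes from the original post. The result of the first call is not asserted, since it is the behavior under discussion:

```r
library(methods)

setGeneric("foo", function(x) standardGeneric("foo"))
setMethod("foo", "vector", identity)

m <- matrix(1:12, ncol = 3)

# With only the "vector" method defined, the thread reports that the
# dim attribute is gone by the time the method sees the object.
dim(foo(m))

# Workaround: a method for "array", which matrices inherit from,
# receives the object unchanged.
setMethod("foo", "array", identity)
after <- foo(m)
dim(after)
```

Defining the "array" method alongside the "vector" one keeps the dimensions intact without touching the class hierarchy.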

>
> H.
>
>
>>
>>> Why aren't these problems taken more seriously?
>>>
>>
>> They are taken seriously. But there are serious semantic differences
>> between S3, S4 and base type checking functions. The S3/S4 integration
>> should be viewed as a tool that is useful in practice, despite forced
>> compromises.
>>
>> There are changes that would resolve some of these issues, like those
>> suggested earlier in this thread, but it's likely too disruptive to
>> make them now. Energy is better spent thinking about how we will do it
>> "right" the next time around.
>>
>>> Thanks,
>>> H.
>>>
>>>>
>>>> Michael
>>>>
>>>> On Wed, May 16, 2018 at 8:33 AM, Hervé Pagès <[hidden email]>
>>>> wrote:
>>>>>
>>>>>
>>>>> On 05/15/2018 09:13 PM, Michael Lawrence wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> My understanding is that array (or any other structure) does not
>>>>>> "simply" inherit from vector, because structures are not vectors in
>>>>>> the strictest sense. Basically, once a vector gains attributes, it is
>>>>>> a structure, not a vector. The methods package accommodates this by
>>>>>> defining an "is" relationship between "structure" and "vector" via an
>>>>>> "explicit coerce", such that any "structure" passed to a "vector"
>>>>>> method is first passed to as.vector(), which strips attributes. This
>>>>>> is very much by design.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> It seems that the problem is really with matrices and arrays, not
>>>>> with "structures" in general:
>>>>>
>>>>>     f <- factor(c("z", "x", "z"), levels=letters)
>>>>>     m <- matrix(1:12, ncol=3)
>>>>>     df <- data.frame(f=f)
>>>>>     x <- structure(1:3, titi="A")
>>>>>
>>>>> Only the matrix loses its attributes when passed to a "vector"
>>>>> method:
>>>>>
>>>>>     setGeneric("foo", function(x) standardGeneric("foo"))
>>>>>     setMethod("foo", "vector", identity)
>>>>>
>>>>>     foo(f)     # attributes are preserved
>>>>>     # [1] z x z
>>>>>     # Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
>>>>>
>>>>>     foo(m)     # attributes are stripped
>>>>>     # [1]  1  2  3  4  5  6  7  8  9 10 11 12
>>>>>
>>>>>     foo(df)    # attributes are preserved
>>>>>     #   f
>>>>>     # 1 z
>>>>>     # 2 x
>>>>>     # 3 z
>>>>>
>>>>>     foo(x)     # attributes are preserved
>>>>>     # [1] 1 2 3
>>>>>     # attr(,"titi")
>>>>>     # [1] "A"
>>>>>
>>>>> Also if structures are passed to as.vector() before being passed to
>>>>> a "vector" method, shouldn't as.vector() and foo() be equivalent on
>>>>> them? For 'f' and 'x' they're not:
>>>>>
>>>>>     as.vector(f)
>>>>>     # [1] "z" "x" "z"
>>>>>
>>>>>     as.vector(x)
>>>>>     # [1] 1 2 3
>>>>>
>>>>> Finally note that for factors and data frames the "vector" method gets
>>>>> selected despite the fact that is( , "vector") is FALSE:
>>>>>
>>>>>     is(f, "vector")
>>>>>     # [1] FALSE
>>>>>
>>>>>     is(m, "vector")
>>>>>     # [1] TRUE
>>>>>
>>>>>     is(df, "vector")
>>>>>     # [1] FALSE
>>>>>
>>>>>     is(x, "vector")
>>>>>     # [1] TRUE
>>>>>
>>>>> Couldn't we recognize these problems as real, even if they are by
>>>>> design? Hopefully we can all agree that:
>>>>> - the dispatch mechanism should only dispatch, not alter objects;
>>>>> - is() and selectMethod() should not contradict each other.
>>>>>
>>>>> Thanks,
>>>>> H.
>>>>>
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>> On Tue, May 15, 2018 at 5:25 PM, Hervé Pagès <[hidden email]>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> This was quite unexpected:
>>>>>>>
>>>>>>>      setGeneric("foo", function(x) standardGeneric("foo"))
>>>>>>>
>>>>>>>      setMethod("foo", "vector", identity)
>>>>>>>
>>>>>>>      foo(matrix(1:12, ncol=3))
>>>>>>>      # [1]  1  2  3  4  5  6  7  8  9 10 11 12
>>>>>>>
>>>>>>>      foo(array(1:24, 4:2))
>>>>>>>      # [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
>>>>>>>
>>>>>>> If I define a method for array objects, things work as expected
>>>>>>> though:
>>>>>>>
>>>>>>>      setMethod("foo", "array", identity)
>>>>>>>
>>>>>>>      foo(matrix(1:12, ncol=3))
>>>>>>>      #      [,1] [,2] [,3]
>>>>>>>      # [1,]    1    5    9
>>>>>>>      # [2,]    2    6   10
>>>>>>>      # [3,]    3    7   11
>>>>>>>      # [4,]    4    8   12
>>>>>>>
>>>>>>> So, luckily, I have a workaround.
>>>>>>>
>>>>>>> But shouldn't the dispatch mechanism stay away from the business of
>>>>>>> altering objects before passing them to the method?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> H.
>>>>>>>
