head.matrix can return 1000s of columns -- limit to n or add new argument?

head.matrix can return 1000s of columns -- limit to n or add new argument?

Michael Chirico
I think of head() as a standard helper for "glancing" at objects, so I'm
sometimes surprised that head() produces massive output:

M = matrix(nrow = 10L, ncol = 100000L)
print(head(M)) # <- beware, could be a huge print

I assume there are lots of backwards-compatibility issues as well as valid
use cases for this behavior, so I guess defaulting to M[1:6, 1:6] is out of
the question.

Is there any scope for adding a new argument to head.matrix that would
allow this flexibility? If I'm not mistaken, a head.array could be
essentially as simple as:

do.call(`[`, c(list(x, drop = FALSE), lapply(pmin(dim(x), n), seq_len)))

(with extra decoration to handle -n, etc)
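
For illustration, here is a minimal sketch of that one-liner wrapped in a function. The name head_array is hypothetical (it is not an existing method), and the sketch only handles a single non-negative n; drop = FALSE is passed last, which should be equivalent to the form above.

```r
# Hypothetical wrapper around the one-liner; handles only a scalar n >= 0.
head_array <- function(x, n = 6L) {
  stopifnot(!is.null(dim(x)), length(n) == 1L, n >= 0L)
  # keep at most n indices along every dimension
  idx <- lapply(pmin(dim(x), n), seq_len)
  do.call(`[`, c(list(x), idx, list(drop = FALSE)))
}

M <- matrix(nrow = 10L, ncol = 1000L)
dim(head_array(M))         # every dimension is capped at 6

A <- array(0, c(3L, 8L, 2L))
dim(head_array(A, 4L))     # dimensions shorter than n are kept whole
```

Capping every dimension at the same n is the simplest choice here; it is exactly the M[1:6, 1:6] default discussed above, so it would not be backwards compatible as a replacement for head.matrix.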

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: head.matrix can return 1000s of columns -- limit to n or add new argument?

Abby Spurdle
> I assume there are lots of backwards-compatibility issues as well as valid
> use cases for this behavior, so I guess defaulting to M[1:6, 1:6] is out of
> the question.

Agree.

> Is there any scope for adding a new argument to head.matrix that would
> allow this flexibility?

I agree with what you're trying to achieve.
However, I'm not sure this is as simple as you're suggesting.

What if the user wants "head" in rows but "tail" in columns?
Or "head" in rows, and both "head" and "tail" in columns?
With head and tail alone, there's a combinatorial explosion.

Also, when using tail on an unnamed matrix, it may be desirable to name
rows and columns.
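
To make the first of those cases concrete, here is a rough sketch of "head in rows, tail in columns" that also labels the selected columns with their original positions. The function name and its arguments are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: first rows, last columns. Unnamed columns are
# labelled with their original positions so the output stays readable.
head_rows_tail_cols <- function(x, n_row = 3L, n_col = 3L) {
  stopifnot(is.matrix(x), n_row >= 0L, n_col >= 0L)
  ri <- seq_len(min(n_row, nrow(x)))
  ci <- seq.int(to = ncol(x), length.out = min(n_col, ncol(x)))
  out <- x[ri, ci, drop = FALSE]
  if (is.null(colnames(out))) colnames(out) <- sprintf("[,%d]", ci)
  out
}

mat <- matrix(1:100, 10, 10)
head_rows_tail_cols(mat, 2L, 3L)
```

Each further combination (tail in rows, both ends in columns, and so on) would need its own helper or extra arguments, which is the explosion I mean.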

And all of this assumes standard matrix objects.
Add in matrix subclasses and related objects, and things get more complex
still.

As I suggested in another thread, a few days ago, I'm planning to write
an R package for matrices and matrix-like objects (possibly extending the
Matrix package), with an initial emphasis on subsetting, printing and
formatting.
So, I'm interested to hear more suggestions on this topic.


Re: head.matrix can return 1000s of columns -- limit to n or add new argument?

Gabriel Becker-2
Hi Michael and Abby,

So one thing that could happen that would be backwards compatible (with the
exception of something that was an error no longer being an error) is that
head and tail could take, for n, vectors of length length(dim(x)) rather than
integers of length 1, with the default n = 6 being equivalent to n = c(6,
dim(x)[2], <...>, dim(x)[k]), at least for the deprecation cycle, if not
permanently. Not recycling n would be unexpected given the behavior of
many R functions, but it would preserve the current behavior while granting
more fine-grained control to users who feel they need it.

A rapidly thrown-together prototype of such a method for the head of a
matrix case is as follows:

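head2 = function(x, n = 6L, ...) {
    # build one index vector per dimension of x
    indvecs = lapply(seq_along(dim(x)), function(i) {
        if(length(n) >= i) {
            ni = n[i]
        } else {
            # n is not recycled: unspecified dimensions are kept whole
            ni =  dim(x)[i]
        }
        if(ni < 0L)
            # negative n drops the last |n| entries along dimension i
            ni = max(dim(x)[i] + ni, 0L)
        else
            ni = min(ni, dim(x)[i])
        seq_len(ni)
    })
    lstargs = c(list(x),indvecs, drop = FALSE)
    do.call("[", lstargs)
}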


> mat = matrix(1:100, 10, 10)

> head(mat)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1   11   21   31   41   51   61   71   81    91
[2,]    2   12   22   32   42   52   62   72   82    92
[3,]    3   13   23   33   43   53   63   73   83    93
[4,]    4   14   24   34   44   54   64   74   84    94
[5,]    5   15   25   35   45   55   65   75   85    95
[6,]    6   16   26   36   46   56   66   76   86    96

> head2(mat)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1   11   21   31   41   51   61   71   81    91
[2,]    2   12   22   32   42   52   62   72   82    92
[3,]    3   13   23   33   43   53   63   73   83    93
[4,]    4   14   24   34   44   54   64   74   84    94
[5,]    5   15   25   35   45   55   65   75   85    95
[6,]    6   16   26   36   46   56   66   76   86    96

> head2(mat, c(2, 3))
     [,1] [,2] [,3]
[1,]    1   11   21
[2,]    2   12   22

> head2(mat, c(2, -9))
     [,1]
[1,]    1
[2,]    2
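
The examples above use a square matrix, where the row and column extents are the same. A non-square case shows why a negative n has to be resolved against the extent of its own dimension. The sketch below is a compact variant written for this check; head_nd is a hypothetical name, and it assumes length(n) <= length(dim(x)).

```r
# Sketch: per-dimension head, where a negative n means
# "all but the last |n|" along that same dimension.
head_nd <- function(x, n = 6L) {
  d <- dim(x)
  stopifnot(!is.null(d), length(n) <= length(d))
  idx <- lapply(seq_along(d), function(i) {
    ni <- if (i <= length(n)) n[i] else d[i]   # unspecified dims kept whole
    ni <- if (ni < 0L) max(d[i] + ni, 0L) else min(ni, d[i])
    seq_len(ni)
  })
  do.call(`[`, c(list(x), idx, list(drop = FALSE)))
}

wide <- matrix(1:30, nrow = 3L, ncol = 10L)
dim(head_nd(wide, c(2, -9)))   # 2 rows, and 10 - 9 = 1 column
```

Resolving -9 against the 3 rows instead of the 10 columns would give max(3 - 9, 0) = 0 columns here.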


Now one thing to keep in mind here is that I think we'd either a) have to
make the non-recycling behavior permanent, or b) have head treat
data.frames and matrices differently with respect to the subsets they grab
(which strikes me as a Bad Plan (tm)).

So I don't think the default behavior would ever be mat[1:6, 1:6], not
because of backwards compatibility, but because at least in my intuition
that is just not what head on a data.frame should do by default, and I
think the behaviors for the basic rectangular datatypes should "stick
together". I mean, also because of backwards compatibility, but that could in
theory change across a long enough deprecation cycle, whereas the
conceptually right thing to do with a data.frame probably won't.

All of that said, is head(mat, c(6, 6)) really that much easier to
type/better than just mat[1:6, 1:6, drop=FALSE] (I know this will behave
differently if any of the dims of mat are less than 6, but if so why are
you heading it in the first place ;) )? I don't really have a strong
feeling on the answer to that.
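
For the record, a small sketch of that difference on a matrix with fewer than 6 rows and columns. The object names are made up for the example.

```r
small <- matrix(1:12, nrow = 3L, ncol = 4L)

# Plain indexing past the extent of a matrix is an error ...
res <- tryCatch(small[1:6, 1:6, drop = FALSE],
                error = function(e) "error")

# ... whereas clamping each index to the dimension, as head() already
# does for rows, just returns what is there.
clamped <- small[seq_len(min(6L, nrow(small))),
                 seq_len(min(6L, ncol(small))), drop = FALSE]
dim(clamped)
```

So the clamping is the only thing a vectorised n would buy over explicit indexing, which matters mostly in non-interactive code where the dimensions are not known in advance.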

I'm happy to put together a patch for head.matrix, head.data.frame,
tail.matrix and tail.data.frame, plus documentation, if people on R-core
are interested in this.

Note, as most here probably know, and as alluded to above, length(n) > 1
for head or tail currently gives an error, so this would be an extension
of the existing functionality in the mathematical sense, where
all existing behavior would remain identical, but the support/valid
parameter space would grow.

Best,
~G


On Fri, Jul 12, 2019 at 4:03 PM Abby Spurdle <[hidden email]> wrote:

> [...]


Re: head.matrix can return 1000s of columns -- limit to n or add new argument?

Michael Chirico
Finally read in detail your response Gabe. Looks great, and I agree it's
quite intuitive, as well as agree against non-recycling.

Once the length(n) == length(dim(x)) behavior is enabled, I don't think
there's any need/desire to have head() do x[1:6,1:6] anymore. head(x, c(6,
6)) is quite clear for those familiar with head(x, 6), it would seem to me.

Mike C

On Sat, Jul 13, 2019 at 8:35 AM Gabriel Becker <[hidden email]>
wrote:

> [...]


Re: head.matrix can return 1000s of columns -- limit to n or add new argument?

Martin Maechler
>>>>> Michael Chirico
>>>>>     on Sun, 15 Sep 2019 20:52:34 +0800 writes:

    > Finally read in detail your response Gabe. Looks great,
    > and I agree it's quite intuitive, as well as agree against
    > non-recycling.

    > Once the length(n) == length(dim(x)) behavior is enabled,
    > I don't think there's any need/desire to have head() do
    > x[1:6,1:6] anymore. head(x, c(6, 6)) is quite clear for
    > those familiar with head(x, 6), it would seem to me.

    > Mike C

Thank you, Gabe, and Michael.
I did like Gabe's proposal already back in July but was
busy and/or vacationing then ...

If you submit this with a patch (that includes changes to both
*.R and *.Rd, including some examples) as a "wishlist" item to R's
bugzilla, I'm willing/happy to check and commit this to R-devel.

Martin




Re: head.matrix can return 1000s of columns -- limit to n or add new argument?

Michael Chirico
Awesome. Gabe, since you already have a workshopped version, would you like
to proceed? Feel free to ping me to review the patch once it's posted.

On Mon, Sep 16, 2019 at 3:26 PM Martin Maechler <[hidden email]>
wrote:

> [...]


Re: head.matrix can return 1000s of columns -- limit to n or add new argument?

Pages, Herve
Hi,

Alternatively, how about a new glance() generic that would do something
like this:

 > library(DelayedArray)
 > glance <- DelayedArray:::show_compact_array

 > M <- matrix(rnorm(1e6), nrow = 1000L, ncol = 2000L)
 > glance(M)
<1000 x 2000> matrix object of type "double":
                [,1]        [,2]        [,3] ...    [,1999]    [,2000]
    [1,]  -0.8854896   1.8010288   1.3051341   . -0.4473593  0.4684985
    [2,]  -0.8563415  -0.7102768  -0.9309155   . -1.8743504  0.4300557
    [3,]   1.0558159  -0.5956583   1.2689806   .  2.7292249  0.2608300
    [4,]   0.7547356   0.1465714   0.1798959   . -0.1778017  1.3417423
    [5,]   0.8037360  -2.7081809   0.9766657   . -0.9902788  0.1741957
     ...           .           .           .   .          .          .
  [996,]  0.67220752  0.07804320 -0.38743454   .  0.4438639 -0.8130713
  [997,] -0.67349962 -1.15292067 -0.54505567   .  0.4630923 -1.6287694
  [998,]  0.03374595 -1.68061325 -0.88458368   . -0.2890962  0.2552267
  [999,]  0.47861492  1.25530912  0.19436708   . -0.5193121 -1.1695501
 [1000,]  1.52819218  2.23253275 -1.22051720   . -1.0342430 -0.1703396

 > A <- array(rnorm(1e6), c(50, 20, 10, 100))
 > glance(A)
<50 x 20 x 10 x 100> array object of type "double":
,,1,1
             [,1]       [,2]       [,3] ...      [,19]      [,20]
  [1,] 0.78319619 0.82258390 0.09122269   .  1.7288189  0.7968574
  [2,] 2.80687459 0.63709640 0.80844430   . -0.3963161 -1.2768284
   ...          .          .          .   .          .          .
 [49,] -1.0696320 -0.1698111  2.0082890   .  0.4488292  0.5215745
 [50,] -0.7012526 -2.0818229  0.7750518   .  0.3189076  0.1437394

...

,,10,100
             [,1]       [,2]       [,3] ...      [,19]      [,20]
  [1,]  0.5360649  0.5491561 -0.4098350   .  0.7647435  0.5640699
  [2,]  0.7924093 -0.7395815 -1.3792913   .  0.1980287 -0.2897026
   ...          .          .          .   .          .          .
 [49,]  0.6266209  0.3778512  1.4995778   . -0.3820651 -1.4241691
 [50,]  1.9218715  3.5475949  0.5963763   .  0.4005210  0.4385623
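
For anyone without DelayedArray installed, a rough base-R sketch of the same idea for plain matrices follows. glance_sketch and its k argument are hypothetical names used only for illustration, and this is not how DelayedArray:::show_compact_array is implemented; it simply shows the first and last k rows and columns with a filler row and column in between.

```r
# Hypothetical base-R sketch: show the four corners of a matrix.
glance_sketch <- function(x, k = 2L) {
  stopifnot(is.matrix(x), nrow(x) > 2L * k, ncol(x) > 2L * k)
  ri <- c(seq_len(k), nrow(x) - k + seq_len(k))   # first and last k rows
  ci <- c(seq_len(k), ncol(x) - k + seq_len(k))   # first and last k columns
  out <- format(x[ri, ci, drop = FALSE])          # character matrix
  dimnames(out) <- list(sprintf("[%d,]", ri), sprintf("[,%d]", ci))
  # splice a filler column into the middle
  out <- cbind(out[, seq_len(k), drop = FALSE], ".",
               out[, k + seq_len(k), drop = FALSE])
  colnames(out)[k + 1L] <- "..."
  # splice a filler row into the middle
  out <- rbind(out[seq_len(k), , drop = FALSE], ".",
               out[k + seq_len(k), , drop = FALSE])
  rownames(out)[k + 1L] <- "..."
  noquote(out)
}

glance_sketch(matrix(rnorm(1e4), nrow = 100L, ncol = 100L))
```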

H.


On 9/16/19 00:54, Michael Chirico wrote:

> Awesome. Gabe, since you already have a workshopped version, would you like
> to proceed? Feel free to ping me to review the patch once it's posted.
>
> On Mon, Sep 16, 2019 at 3:26 PM Martin Maechler <[hidden email]>
> wrote:
>
>>>>>>> Michael Chirico
>>>>>>>      on Sun, 15 Sep 2019 20:52:34 +0800 writes:
>>
>>      > Finally read in detail your response Gabe. Looks great,
>>      > and I agree it's quite intuitive, as well as agree against
>>      > non-recycling.
>>
>>      > Once the length(n) == length(dim(x)) behavior is enabled,
>>      > I don't think there's any need/desire to have head() do
>>      > x[1:6,1:6] anymore. head(x, c(6, 6)) is quite clear for
>>      > those familiar with head(x, 6), it would seem to me.
>>
>>      > Mike C
>>
>> Thank you, Gabe, and Michael.
>> I did like Gabe's proposal already back in July but was
>> busy and/or vacationing then ...
>>
>> If you submit this with a patch (that includes changes to both
>> *.R and *.Rd , including some example) as "wishlist" item to R's
>> bugzilla, I'm willing/happy to check and commit this to R-devel.
>>
>> Martin
>>
>>
>>      > On Sat, Jul 13, 2019 at 8:35 AM Gabriel Becker
>>      > <[hidden email]> wrote:
>>
>>      >> Hi Michael and Abby,
>>      >>
>>      >> So one thing that could happen that would be backwards
>>      >> compatible (with the exception of something that was an
>>      >> error no longer being an error) is head and tail could
>>      >> take vectors of length length(dim(x)) rather than integers of
>>      >> length 1 for n, with the default n=6 being equivalent
>>      >> to n = c(6, dim(x)[2], <...>, dim(x)[k]), at least for
>>      >> the deprecation cycle, if not permanently. It not
>>      >> recycling would be unexpected based on the behavior of
>>      >> many R functions but would preserve the current behavior
>>      >> while granting more fine-grained control to users that
>>      >> feel they need it.
>>      >>
>>      >> A rapidly thrown-together prototype of such a method for
>>      >> the head of a matrix case is as follows:
>>      >>
>>      >> head2 = function(x, n = 6L, ...) {
>>      >>   indvecs = lapply(seq_along(dim(x)), function(i) {
>>      >>     # dimensions not covered by n are kept whole (no recycling)
>>      >>     ni = if (length(n) >= i) n[i] else dim(x)[i]
>>      >>     # negative ni drops from the end of dimension i
>>      >>     ni = if (ni < 0L) max(dim(x)[i] + ni, 0L) else min(ni, dim(x)[i])
>>      >>     seq_len(ni)
>>      >>   })
>>      >>   lstargs = c(list(x), indvecs, drop = FALSE)
>>      >>   do.call("[", lstargs)
>>      >> }
>>      >>
>>      >>
>>      >> > mat = matrix(1:100, 10, 10)
>>      >>
>>      >> > head(mat)
>>      >>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
>>      >> [1,]    1   11   21   31   41   51   61   71   81    91
>>      >> [2,]    2   12   22   32   42   52   62   72   82    92
>>      >> [3,]    3   13   23   33   43   53   63   73   83    93
>>      >> [4,]    4   14   24   34   44   54   64   74   84    94
>>      >> [5,]    5   15   25   35   45   55   65   75   85    95
>>      >> [6,]    6   16   26   36   46   56   66   76   86    96
>>      >>
>>      >> > head2(mat)
>>      >>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
>>      >> [1,]    1   11   21   31   41   51   61   71   81    91
>>      >> [2,]    2   12   22   32   42   52   62   72   82    92
>>      >> [3,]    3   13   23   33   43   53   63   73   83    93
>>      >> [4,]    4   14   24   34   44   54   64   74   84    94
>>      >> [5,]    5   15   25   35   45   55   65   75   85    95
>>      >> [6,]    6   16   26   36   46   56   66   76   86    96
>>      >>
>>      >> > head2(mat, c(2, 3))
>>      >>      [,1] [,2] [,3]
>>      >> [1,]    1   11   21
>>      >> [2,]    2   12   22
>>      >>
>>      >> > head2(mat, c(2, -9))
>>      >>      [,1]
>>      >> [1,]    1
>>      >> [2,]    2
>>      >>
>>      >>
>>      >> Now one thing to keep in mind here, is that I think we'd
>>      >> either a) have to make the non-recycling behavior
>>      >> permanent, or b) have head treat data.frames and matrices
>>      >> differently with respect to the subsets they grab (which
>>      >> strikes me as a *Bad Plan* (tm)).
>>      >>
>>      >> So I don't think the default behavior would ever be
>>      >> mat[1:6, 1:6], not because of backwards compatibility,
>>      >> but because at least in my intuition that is just not
>>      >> what head on a data.frame should do by default, and I
>>      >> think the behaviors for the basic rectangular datatypes
>>      >> should "stick together". I mean, also because of
>>      >> backwards compatibility, but that could *in theory*
>>      >> change across a long enough deprecation cycle, but the
>>      >> conceptually right thing to do with a data.frame probably
>>      >> won't.
>>      >>
>>      >> All of that said, is head(mat, c(6, 6)) really that much
>>      >> easier to type/better than just mat[1:6, 1:6, drop=FALSE]
>>      >> (I know this will behave differently if any of the dims
>>      >> of mat are less than 6, but if so why are you heading it
>>      >> in the first place ;) )? I don't really have a strong
>>      >> feeling on the answer to that.
>>      >>
>>      >> I'm happy to put a patch for head.matrix,
>>      >> head.data.frame, tail.matrix and tail.data.frame, plus
>>      >> documentation, if people on R-core are interested in
>>      >> this.
>>      >>
>>      >> Note, as most here probably know, and as alluded to
>>      >> above, length(n) > 1 for head or tail currently gives an
>>      >> error, so this would be an extension of the existing
>>      >> functionality in the mathematical extension sense, where
>>      >> all existing behavior would remain identical, but the
>>      >> support/valid parameter space would grow.
>>      >>
>>      >> Best, ~G
>>      >>
>>      >>
>>      >> On Fri, Jul 12, 2019 at 4:03 PM Abby Spurdle
>>      >> <[hidden email]> wrote:
>>      >>
>>      >>> > I assume there are lots of backwards-compatibility
>>      >>> > issues as well as valid use cases for this behavior,
>>      >>> > so I guess defaulting to M[1:6, 1:6] is out of the
>>      >>> > question.
>>      >>>
>>      >>> Agree.
>>      >>>
>>      >>> > Is there any scope for adding a new argument to
>>      >>> > head.matrix that would allow this flexibility?
>>      >>>
>>      >>> I agree with what you're trying to achieve.  However,
>>      >>> I'm not sure this is as simple as you're suggesting.
>>      >>>
>>      >>> What if the user wants "head" in rows but "tail" in
>>      >>> columns.  Or "head" in rows, and both "head" and "tail"
>>      >>> in columns.  With head and tail alone, there's a
>>      >>> combinatorial explosion.
>>      >>>
>>      >>> Also, when using tail on an unnamed matrix, it may be
>>      >>> desirable to name rows and columns.
>>      >>>
>>      >>> And all of this assumes standard matrix objects.  Add in
>>      >>> matrix subclasses and related objects, and things get
>>      >>> more complex still.
>>      >>>
>>      >>> As I suggested in another thread, a few days ago, I'm
>>      >>> planning to write an R package for matrices and
>>      >>> matrix-like objects (possibly extending the Matrix
>>      >>> package), with an initial emphasis on subsetting,
>>      >>> printing and formatting.  So, I'm interested to hear
>>      >>> more suggestions on this topic.
>>      >>>

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [hidden email]
Phone:  (206) 667-5791
Fax:    (206) 667-1319
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
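
For readers following the proposal quoted in the message above, here is a minimal sketch of the non-recycling vector n idea, extended to tail and to arrays of any rank. The names nd_counts, head_nd and tail_nd are made up for this illustration and come neither from the thread nor from base R. Unlike tail() on a matrix without row names, the sketch does not label rows with their original indices.

```r
# Hypothetical helpers for illustration only; not from the thread or base R.
# n[i] > 0 keeps n[i] indices along dimension i; n[i] < 0 drops |n[i]| of them.
# Dimensions not covered by n are kept whole (n is deliberately not recycled).
nd_counts <- function(d, n) {
  stopifnot(length(n) >= 1L, length(n) <= length(d))
  k <- d                                   # default: keep every index
  i <- seq_along(n)
  k[i] <- ifelse(n < 0L, pmax(d[i] + n, 0L), pmin(n, d[i]))
  as.integer(k)
}

head_nd <- function(x, n = 6L) {
  # first k[i] indices along each dimension
  idx <- lapply(nd_counts(dim(x), n), seq_len)
  do.call(`[`, c(list(x), idx, list(drop = FALSE)))
}

tail_nd <- function(x, n = 6L) {
  d <- dim(x)
  # last k[i] indices along each dimension
  idx <- unname(Map(function(to, k) seq.int(to = to, length.out = k),
                    d, nd_counts(d, n)))
  do.call(`[`, c(list(x), idx, list(drop = FALSE)))
}

mat <- matrix(1:100, 10, 10)
head_nd(mat, c(2, 3))    # rows 1:2, columns 1:3
tail_nd(mat, c(2, 3))    # rows 9:10, columns 8:10
head_nd(mat, c(2, -9))   # rows 1:2, all but the last 9 columns
```

Leaving unspecified dimensions whole is what keeps head_nd(mat) equal to the usual six-row result, which is the backwards-compatibility point made in the quoted discussion.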

Re: head.matrix can return 1000s of columns -- limit to n or add new argument?

Fox, John
In reply to this post by Michael Chirico
Dear Herve,

The brief() generic function in the car package does something very similar to that for data frames (and has methods for other classes of objects as well).

Best,
 John

  -----------------------------
  John Fox, Professor Emeritus
  McMaster University
  Hamilton, Ontario, Canada
  Web: http://socserv.mcmaster.ca/jfox

> On Sep 17, 2019, at 2:52 AM, Pages, Herve <[hidden email]> wrote:
>
> Hi,
>
> Alternatively, how about a new glance() generic that would do something
> like this:
>
>> library(DelayedArray)
>> glance <- DelayedArray:::show_compact_array
>
>> M <- matrix(rnorm(1e6), nrow = 1000L, ncol = 2000L)
>> glance(M)
> <1000 x 2000> matrix object of type "double":
>                [,1]        [,2]        [,3] ...    [,1999]    [,2000]
>    [1,]  -0.8854896   1.8010288   1.3051341   . -0.4473593  0.4684985
>    [2,]  -0.8563415  -0.7102768  -0.9309155   . -1.8743504  0.4300557
>    [3,]   1.0558159  -0.5956583   1.2689806   .  2.7292249  0.2608300
>    [4,]   0.7547356   0.1465714   0.1798959   . -0.1778017  1.3417423
>    [5,]   0.8037360  -2.7081809   0.9766657   . -0.9902788  0.1741957
>     ...           .           .           .   .          .          .
>  [996,]  0.67220752  0.07804320 -0.38743454   .  0.4438639 -0.8130713
>  [997,] -0.67349962 -1.15292067 -0.54505567   .  0.4630923 -1.6287694
>  [998,]  0.03374595 -1.68061325 -0.88458368   . -0.2890962  0.2552267
>  [999,]  0.47861492  1.25530912  0.19436708   . -0.5193121 -1.1695501
> [1000,]  1.52819218  2.23253275 -1.22051720   . -1.0342430 -0.1703396
>
>> A <- array(rnorm(1e6), c(50, 20, 10, 100))
>> glance(A)
> <50 x 20 x 10 x 100> array object of type "double":
> ,,1,1
>             [,1]       [,2]       [,3] ...      [,19]      [,20]
>  [1,] 0.78319619 0.82258390 0.09122269   .  1.7288189  0.7968574
>  [2,] 2.80687459 0.63709640 0.80844430   . -0.3963161 -1.2768284
>   ...          .          .          .   .          .          .
> [49,] -1.0696320 -0.1698111  2.0082890   .  0.4488292  0.5215745
> [50,] -0.7012526 -2.0818229  0.7750518   .  0.3189076  0.1437394
>
> ...
>
> ,,10,100
>             [,1]       [,2]       [,3] ...      [,19]      [,20]
>  [1,]  0.5360649  0.5491561 -0.4098350   .  0.7647435  0.5640699
>  [2,]  0.7924093 -0.7395815 -1.3792913   .  0.1980287 -0.2897026
>   ...          .          .          .   .          .          .
> [49,]  0.6266209  0.3778512  1.4995778   . -0.3820651 -1.4241691
> [50,]  1.9218715  3.5475949  0.5963763   .  0.4005210  0.4385623
>
> H.
>
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
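
The compact display that glance() and brief() are described as producing in the messages above can be approximated in base R. What follows is only a sketch under assumptions: corner_view is a made-up name, it is not how DelayedArray or car implement their output, and it ignores any existing dimnames.

```r
# Hypothetical helper for illustration; not DelayedArray's or car's code.
# Returns a character matrix with the first and last k rows and columns of m,
# using "..." to mark whatever was left out.
corner_view <- function(m, k = 3L, digits = 3L) {
  stopifnot(is.matrix(m), k >= 1L)
  pick <- function(len) {
    if (len <= 2L * k) {
      seq_len(len)                                   # small enough: show all
    } else {
      c(seq_len(k), NA, seq.int(to = len, length.out = k))  # NA marks the gap
    }
  }
  r <- pick(nrow(m))
  cc <- pick(ncol(m))
  keep_r <- !is.na(r)
  keep_c <- !is.na(cc)
  out <- matrix("...", length(r), length(cc))
  out[keep_r, keep_c] <- format(m[r[keep_r], cc[keep_c], drop = FALSE],
                                digits = digits)
  dimnames(out) <- list(ifelse(keep_r, sprintf("[%d,]", r), "..."),
                        ifelse(keep_c, sprintf("[,%d]", cc), "..."))
  noquote(out)
}

M <- matrix(rnorm(1e6), nrow = 1000L, ncol = 2000L)
corner_view(M)   # 7 x 7 character view: 3 + gap + 3 in each direction
```

Note that this selects a subset and formats it for printing in one step, so the result is a display object rather than a smaller matrix of the original type.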

Re: head.matrix can return 1000s of columns -- limit to n or add new argument?

Fox, John
In reply to this post by Michael Chirico
Dear Herve,

Sorry, I should have said "matrices" rather than "data frames" -- brief() has methods for both.

Best,
 John

  -----------------------------
  John Fox, Professor Emeritus
  McMaster University
  Hamilton, Ontario, Canada
  Web: http://socserv.mcmaster.ca/jfox

> On Sep 17, 2019, at 8:29 AM, Fox, John <[hidden email]> wrote:
>
> Dear Herve,
>
> The brief() generic function in the car package does something very similar to that for data frames (and has methods for other classes of objects as well).
>
> Best,
> John
>
>  -----------------------------
>  John Fox, Professor Emeritus
>  McMaster University
>  Hamilton, Ontario, Canada
>  Web: http://socserv.mcmaster.ca/jfox
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: head.matrix can return 1000s of columns -- limit to n or add new argument?

Martin Maechler
>>>>> Fox, John
>>>>>     on Tue, 17 Sep 2019 12:32:13 +0000 writes:

    > Dear Herve,
    > Sorry, I should have said "matrices" rather than "data frames" -- brief() has methods for both.

    > Best,
    > John

    > -----------------------------
    > John Fox, Professor Emeritus
    > McMaster University
    > Hamilton, Ontario, Canada
    > Web: http://socserv.mcmaster.ca/jfox

    >> On Sep 17, 2019, at 8:29 AM, Fox, John <[hidden email]> wrote:
    >>
    >> Dear Herve,
    >>
    >> The brief() generic function in the car package does something very similar to that for data frames (and has methods for other classes of objects as well).
    >>
    >> Best,
    >> John
    >>
    >> -----------------------------
    >> John Fox, Professor Emeritus
    >> McMaster University
    >> Hamilton, Ontario, Canada
    >> Web: http://socserv.mcmaster.ca/jfox
    >>
    >>> On Sep 17, 2019, at 2:52 AM, Pages, Herve <[hidden email]> wrote:
    >>>
    >>> Hi,
    >>>
    >>> Alternatively, how about a new glance() generic that would do something
    >>> like this:
    >>>
    >>>> library(DelayedArray)
    >>>> glance <- DelayedArray:::show_compact_array
    >>>
    >>>> M <- matrix(rnorm(1e6), nrow = 1000L, ncol = 2000L)
    >>>> glance(M)
    >>> <1000 x 2000> matrix object of type "double":
    >>> [,1]        [,2]        [,3] ...    [,1999]    [,2000]
    >>> [1,]  -0.8854896   1.8010288   1.3051341   . -0.4473593  0.4684985
    >>> [2,]  -0.8563415  -0.7102768  -0.9309155   . -1.8743504  0.4300557
    >>> [3,]   1.0558159  -0.5956583   1.2689806   .  2.7292249  0.2608300
    >>> [4,]   0.7547356   0.1465714   0.1798959   . -0.1778017  1.3417423
    >>> [5,]   0.8037360  -2.7081809   0.9766657   . -0.9902788  0.1741957
    >>> ...           .           .           .   .          .          .
    >>> [996,]  0.67220752  0.07804320 -0.38743454   .  0.4438639 -0.8130713
    >>> [997,] -0.67349962 -1.15292067 -0.54505567   .  0.4630923 -1.6287694
    >>> [998,]  0.03374595 -1.68061325 -0.88458368   . -0.2890962  0.2552267
    >>> [999,]  0.47861492  1.25530912  0.19436708   . -0.5193121 -1.1695501
    >>> [1000,]  1.52819218  2.23253275 -1.22051720   . -1.0342430 -0.1703396
    >>>
    >>>> A <- array(rnorm(1e6), c(50, 20, 10, 100))
    >>>> glance(A)
    >>> <50 x 20 x 10 x 100> array object of type "double":
    >>> ,,1,1
    >>> [,1]       [,2]       [,3] ...      [,19]      [,20]
    >>> [1,] 0.78319619 0.82258390 0.09122269   .  1.7288189  0.7968574
    >>> [2,] 2.80687459 0.63709640 0.80844430   . -0.3963161 -1.2768284
    >>> ...          .          .          .   .          .          .
    >>> [49,] -1.0696320 -0.1698111  2.0082890   .  0.4488292  0.5215745
    >>> [50,] -0.7012526 -2.0818229  0.7750518   .  0.3189076  0.1437394
    >>>
    >>> ...
    >>>
    >>> ,,10,100
    >>> [,1]       [,2]       [,3] ...      [,19]      [,20]
    >>> [1,]  0.5360649  0.5491561 -0.4098350   .  0.7647435  0.5640699
    >>> [2,]  0.7924093 -0.7395815 -1.3792913   .  0.1980287 -0.2897026
    >>> ...          .          .          .   .          .          .
    >>> [49,]  0.6266209  0.3778512  1.4995778   . -0.3820651 -1.4241691
    >>> [50,]  1.9218715  3.5475949  0.5963763   .  0.4005210  0.4385623
    >>>
    >>> H.

Thank you, Hervé and John.
Both glance() and brief() are nice, and I think a version of one of
them could also make a nice addition to the 'utils' package.

However, there's a principal difference between them and the
proposed generalized head {or tail} :
The latter really does *return* a sub matrix/array of chosen
dimensions with modified dimnames and that *object* then is
printed if not assigned.

OTOH,  glance() and brief() rather are versions of print()
and I think have a dedicated "display-only" purpose {yes, I see they do
return something; glance() returning a character object, brief()
returning the principal argument invisibly, the same as any
"correct" print() method..}
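
A minimal sketch of that distinction, using base R only. `show_top` is a hypothetical display-only helper made up for this illustration; it is not the implementation of brief() or glance().

```r
M <- matrix(seq_len(200), nrow = 20, ncol = 10)

## head() returns a sub-matrix: an ordinary object that can be
## assigned and used in further computation.
h <- head(M, 3)
dim(h)
colSums(h)

## A display-only helper in the spirit of a print() method
## (hypothetical): it prints a small piece of x and then returns
## x itself, invisibly and unchanged.
show_top <- function(x, n = 3L) {
    print(x[seq_len(min(n, nrow(x))), , drop = FALSE])
    invisible(x)
}
res <- show_top(M)
identical(res, M)
```

The value of head() is the thing of interest, whereas the value of a print-like function is incidental to what it displays.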

From the above, I think it may make sense to entertain both a
generalization of head() and a glance() / brief() / ... style
function which shows all 4 corners of a matrix or data frame.
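
As a rough illustration of the "4 corners" idea, here is a sketch in base R. The name `corners` and its argument `n` are assumptions made for this example only; unlike glance() or brief() it returns the selected sub-object instead of printing it, and it inserts no "..." markers between the corners.

```r
## Hypothetical helper: keep the first and last `n` rows and columns of a
## matrix or data frame, so that printing the result shows all four corners.
corners <- function(x, n = 3L) {
    pick <- function(len) {
        ## short dimensions are kept whole to avoid duplicated indices
        if (len <= 2L * n) seq_len(len)
        else c(seq_len(n), seq.int(to = len, length.out = n))
    }
    x[pick(nrow(x)), pick(ncol(x)), drop = FALSE]
}

M <- matrix(seq_len(1000 * 2000), nrow = 1000, ncol = 2000)
cm <- corners(M, 2L)
dim(cm)
cm
```

Row and column positions of the original are not recorded in the result here; a real implementation would probably want to carry them along as dimnames.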

There's another important criterion here:  __Simplicity__ in the
code that's added (and will have to be maintained as part of R
"forever" into the future)...
AFAICS, the DelayedArray stuff is beautifully modular, but
possibly also much entangled in the dependent packages and classes we
cannot require for 'utils'.

The current source for head() and tail() and all their methods
in utils is just 83 lines of code  {file utils/R/head.R minus
the initial mostly copyright comments}.
I am very reluctant to consider blowing that up by factors...


Martin

    >>> On 9/16/19 00:54, Michael Chirico wrote:
    >>>> Awesome. Gabe, since you already have a workshopped version, would you like
    >>>> to proceed? Feel free to ping me to review the patch once it's posted.
    >>>>
    >>>> On Mon, Sep 16, 2019 at 3:26 PM Martin Maechler <[hidden email]>
    >>>> wrote:
    >>>>
    >>>>>>>>>> Michael Chirico
    >>>>>>>>>> on Sun, 15 Sep 2019 20:52:34 +0800 writes:
    >>>>>
>>>>> Finally read in detail your response Gabe. Looks great,
>>>>> and I agree it's quite intuitive, as well as agree against
>>>>> non-recycling.
    >>>>>
>>>>> Once the length(n) == length(dim(x)) behavior is enabled,
>>>>> I don't think there's any need/desire to have head() do
>>>>> x[1:6,1:6] anymore. head(x, c(6, 6)) is quite clear for
>>>>> those familiar with head(x, 6), it would seem to me.
    >>>>>
>>>>> Mike C
    >>>>>
    >>>>> Thank you, Gabe, and Michael.
    >>>>> I did like Gabe's proposal already back in July but was
    >>>>> busy and/or vacationing then ...
    >>>>>
    >>>>> If you submit this with a patch (that includes changes to both
    >>>>> *.R and *.Rd , including some example) as "wishlist" item to R's
    >>>>> bugzilla, I'm willing/happy to check and commit this to R-devel.
    >>>>>
    >>>>> Martin
    >>>>>
    >>>>>
>>>>> On Sat, Jul 13, 2019 at 8:35 AM Gabriel Becker
>>>>> <[hidden email]> wrote:
    >>>>>
    >>>>>>> Hi Michael and Abby,
    >>>>>>>
    >>>>>>> So one thing that could happen that would be backwards
    >>>>>>> compatible (with the exception of something that was an
    >>>>>>> error no longer being an error) is that head and tail could
    >>>>>>> take n as a vector of length length(dim(x)) rather than an
    >>>>>>> integer of length 1, with the default n = 6 being equivalent
    >>>>>>> to n = c(6, dim(x)[2], <...>, dim(x)[k]), at least for
    >>>>>>> the deprecation cycle, if not permanently. Not
    >>>>>>> recycling n would be unexpected given the behavior of
    >>>>>>> many R functions, but would preserve the current behavior
    >>>>>>> while granting more fine-grained control to users who
    >>>>>>> feel they need it.
    >>>>>>>
    >>>>>>> A rapidly thrown-together prototype of such a method for
    >>>>>>> the head of a matrix case is as follows:
    >>>>>>>
    >>>>>>> head2 = function(x, n = 6L, ...) {
    >>>>>>>     indvecs = lapply(seq_along(dim(x)), function(i) {
    >>>>>>>         if(length(n) >= i) { ni = n[i] } else { ni = dim(x)[i] }
    >>>>>>>         ## negative n counts back from the end of dimension i
    >>>>>>>         if(ni < 0L) ni = max(dim(x)[i] + ni, 0L)
    >>>>>>>         else ni = min(ni, dim(x)[i])
    >>>>>>>         seq_len(ni)
    >>>>>>>     })
    >>>>>>>     lstargs = c(list(x), indvecs, drop = FALSE)
    >>>>>>>     do.call("[", lstargs)
    >>>>>>> }
    >>>>>>>
    >>>>>>>
    >>>>>>> > mat = matrix(1:100, 10, 10)
    >>>>>>> > head(mat)
    >>>>>>>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    >>>>>>> [1,]    1   11   21   31   41   51   61   71   81    91
    >>>>>>> [2,]    2   12   22   32   42   52   62   72   82    92
    >>>>>>> [3,]    3   13   23   33   43   53   63   73   83    93
    >>>>>>> [4,]    4   14   24   34   44   54   64   74   84    94
    >>>>>>> [5,]    5   15   25   35   45   55   65   75   85    95
    >>>>>>> [6,]    6   16   26   36   46   56   66   76   86    96
    >>>>>>>
    >>>>>>> > head2(mat)
    >>>>>>>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    >>>>>>> [1,]    1   11   21   31   41   51   61   71   81    91
    >>>>>>> [2,]    2   12   22   32   42   52   62   72   82    92
    >>>>>>> [3,]    3   13   23   33   43   53   63   73   83    93
    >>>>>>> [4,]    4   14   24   34   44   54   64   74   84    94
    >>>>>>> [5,]    5   15   25   35   45   55   65   75   85    95
    >>>>>>> [6,]    6   16   26   36   46   56   66   76   86    96
    >>>>>>>
    >>>>>>> > head2(mat, c(2, 3))
    >>>>>>>      [,1] [,2] [,3]
    >>>>>>> [1,]    1   11   21
    >>>>>>> [2,]    2   12   22
    >>>>>>>
    >>>>>>> > head2(mat, c(2, -9))
    >>>>>>>      [,1]
    >>>>>>> [1,]    1
    >>>>>>> [2,]    2
    >>>>>>>
    >>>>>>> Now one thing to keep in mind here is that I think we'd
    >>>>>>> either a) have to make the non-recycling behavior
    >>>>>>> permanent, or b) have head treat data.frames and matrices
    >>>>>>> differently with respect to the subsets they grab (which
    >>>>>>> strikes me as a *Bad Plan* (tm)).
    >>>>>>>
    >>>>>>> So I don't think the default behavior would ever be
    >>>>>>> mat[1:6, 1:6], not because of backwards compatibility,
    >>>>>>> but because at least in my intuition that is just not
    >>>>>>> what head on a data.frame should do by default, and I
    >>>>>>> think the behaviors for the basic rectangular datatypes
    >>>>>>> should "stick together". I mean, also because of
    >>>>>>> backwards compatibility, but that could *in theory*
    >>>>>>> change across a long enough deprecation cycle, but the
    >>>>>>> conceptually right thing to do with a data.frame probably
    >>>>>>> won't.
    >>>>>>>
    >>>>>>> All of that said, is head(mat, c(6, 6)) really that much
    >>>>>>> easier to type/better than just mat[1:6, 1:6, drop=FALSE]
    >>>>>>> (I know this will behave differently if any of the dims
    >>>>>>> of mat are less than 6, but if so why are you heading it
    >>>>>>> in the first place ;) )? I don't really have a strong
    >>>>>>> feeling on the answer to that.
    >>>>>>>
    >>>>>>> I'm happy to put a patch for head.matrix,
    >>>>>>> head.data.frame, tail.matrix and tail.data.frame, plus
    >>>>>>> documentation, if people on R-core are interested in
    >>>>>>> this.
    >>>>>>>
    >>>>>>> Note, as most here probably know, and as alluded to
    >>>>>>> above, length(n) > 1 for head or tail currently gives an
    >>>>>>> error, so this would be an extension of the existing
    >>>>>>> functionality in the mathematical extension sense, where
    >>>>>>> all existing behavior would remain identical, but the
    >>>>>>> support/valid parameter space would grow.
    >>>>>>>
    >>>>>>> Best, ~G
    >>>>>>>
    >>>>>>>
    >>>>>>> On Fri, Jul 12, 2019 at 4:03 PM Abby Spurdle
    >>>>>>> <[hidden email]> wrote:
    >>>>>>>
    >>>>>>>>> I assume there are lots of backwards-compatibility issues as well
    >>>>>>>>> as valid use cases for this behavior, so I guess defaulting to
    >>>>>>>>> M[1:6, 1:6] is out of the question.
    >>>>>>>>
    >>>>>>>> Agree.
    >>>>>>>>
    >>>>>>>>> Is there any scope for adding a new argument to head.matrix that
    >>>>>>>>> would allow this flexibility?
    >>>>>>>>
    >>>>>>>> I agree with what you're trying to achieve.  However,
    >>>>>>>> I'm not sure this is as simple as you're suggesting.
    >>>>>>>>
    >>>>>>>> What if the user wants "head" in rows but "tail" in
    >>>>>>>> columns.  Or "head" in rows, and both "head" and "tail"
    >>>>>>>> in columns.  With head and tail alone, there's a
    >>>>>>>> combinatorial explosion.
    >>>>>>>>
    >>>>>>>> Also, when using tail on an unnamed matrix, it may be
    >>>>>>>> desirable to name rows and columns.
    >>>>>>>>
    >>>>>>>> And all of this assumes standard matrix objects.  Add in
    >>>>>>>> matrix subclasses and related objects, and things get
    >>>>>>>> more complex still.
    >>>>>>>>
    >>>>>>>> As I suggested in another thread, a few days ago, I'm
    >>>>>>>> planning to write an R package for matrices and
    >>>>>>>> matrix-like objects (possibly extending the Matrix
    >>>>>>>> package), with an initial emphasis on subsetting,
    >>>>>>>> printing and formatting.  So, I'm interested to hear
    >>>>>>>> more suggestions on this topic.
    >>>>>>>>
    >>>
    >>> --
    >>> Hervé Pagès
    >>>
    >>> Program in Computational Biology
    >>> Division of Public Health Sciences
    >>> Fred Hutchinson Cancer Research Center
    >>> 1100 Fairview Ave. N, M1-B514
    >>> P.O. Box 19024
    >>> Seattle, WA 98109-1024
    >>>
    >>> E-mail: [hidden email]
    >>> Phone:  (206) 667-5791
    >>> Fax:    (206) 667-1319

Re: head.matrix can return 1000s of columns -- limit to n or add new argument?

Gabriel Becker-2
Hi Martin et al.

Sorry for not getting back onto this sooner. I've been pretty well buried
under travel plus being sick for a bit, but I will be happy to roll up a
patch for this, including documentation and put it into a wishlist item.

I'll aim to do that at some point next week.

Thanks, Martin, for engaging with us
and being willing to consider the patch.

Best,
~G


Re: head.matrix can return 1000s of columns -- limit to n or add new argument?

Gabriel Becker-2
Hi all,

So I've started working on this and I ran into something that I didn't
know, namely that for x a multi-dimensional (2+) array, head(x) and tail(x)
ignore dimension completely, treat x as an atomic vector, and return an
(unclassed) atomic vector:

> x = array(100, c(4, 5, 5))
> dim(x)
[1] 4 5 5
> head(x, 1)
[1] 100
> class(head(x))
[1] "numeric"


(For a 1d array, it does return another 1d array).

When extending head/tail to understand multiple dimensions as discussed in
this thread, then, should the behavior for 2+d arrays be explicitly
retained, or should head and tail do the analogous thing (with a head(<2d
array>) behaving the same as head(<matrix>), which honestly is what I
expected to already be happening)?

Are people using/relying on this behavior in their code, and if so, why/for
what?
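
To make the two candidate behaviours concrete, they can be written out by hand for a 3-d array. This is only an illustration with explicit indexing; it deliberately does not call head() itself, since what head() should return here is exactly the open question.

```r
x <- array(seq_len(100), c(4, 5, 5))

## (a) Treat x as a plain vector: the dimensions are dropped.
as_vector <- as.vector(x)[seq_len(6)]
dim(as_vector)            # NULL

## (b) The analogue of head(<matrix>): take up to 6 indices along the
## first dimension and keep every other dimension whole.
keep_dims <- x[seq_len(min(6L, dim(x)[1L])), , , drop = FALSE]
dim(keep_dims)
```

Option (b) keeps the number of dimensions of its input, which is what a per-dimension n argument would build on.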

Even more generally, one way forward is to have the default methods check
for dimensions, and use length if it is null:

tail.default <- tail.data.frame <- function(x, n = 6L, ...)
{
    if(any(n == 0))
        stop("n must be non-zero or unspecified for all dimensions")
    if(!is.null(dim(x)))
        dimsx <- dim(x)
    else
        dimsx <- length(x)

    ## this returns a list of vectors of indices in each
    ## dimension, regardless of the length of the n
    ## argument
    sel <- lapply(seq_along(dimsx), function(i) {
        dxi <- dimsx[i]
        ## select all indices (full dim) if not specified
        ni <- if(length(n) >= i) n[i] else dxi
        ## handle negative ns
        ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
        seq.int(to = dxi, length.out = ni)
    })
    args <- c(list(x), sel, drop = FALSE)
    do.call("[", args)
}


I think this precludes the need for a separate data.frame method at all,
actually, though (I would think) tail.data.frame would still be defined and
exported for backwards compatibility. (the matrix method has some extra
bits so my current conception of it is still separate, though it might not
NEED to be).
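
For comparison, a head-side counterpart of the same idea might look as follows. This is a sketch, not the patch itself: `head_nd` is a hypothetical name chosen so that nothing in utils is masked, and it mirrors the tail code above with seq_len() in place of seq.int(to = ...).

```r
head_nd <- function(x, n = 6L, ...) {
    ## fall back to length() for objects without a dim attribute
    dimsx <- if (!is.null(dim(x))) dim(x) else length(x)
    sel <- lapply(seq_along(dimsx), function(i) {
        dxi <- dimsx[i]
        ## dimensions beyond length(n) are kept whole (no recycling)
        ni <- if (length(n) >= i) n[i] else dxi
        ## negative n means "all but the last |n|" along this dimension
        ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
        seq_len(ni)
    })
    do.call("[", c(list(x), sel, drop = FALSE))
}

mat <- matrix(1:100, 10, 10)
dim(head_nd(mat))             # 6 10
dim(head_nd(mat, c(2, 3)))    # 2 3
dim(head_nd(mat, c(2, -9)))   # 2 1
dim(head_nd(iris, c(3, 2)))   # 3 2
head_nd(letters, 3)           # "a" "b" "c"
```

The same function body serves a matrix, a data frame and a plain vector, which is the sense in which a separate data.frame method would no longer be needed.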

The question then becomes, should head/tail always return something with
the same dimensionality (number of dims) it got, or should data.frame and
matrix be special cased in this regard, as they are now?

What are people's thoughts?
~G

Re: head.matrix can return 1000s of columns -- limit to n or add new argument?

Jan Gorecki
Gabriel,
My view is rather radical.

- head/tail should return an object having the same number of dimensions
- data.frame should be a special case
- matrix should be handled as a 2D array

P.S. The idea of accepting the `n` argument as a vector of corresponding
dimensions is a brilliant one.



Re: head.matrix can return 1000s of columns -- limit to n or add new argument?

Martin Maechler
In reply to this post by Gabriel Becker-2
>>>>> Gabriel Becker
>>>>>     on Tue, 29 Oct 2019 12:43:15 -0700 writes:

    > Hi all,
    > So I've started working on this and I ran into something that I didn't
    > know, namely that for x a multi-dimensional (2+) array, head(x) and tail(x)
    > ignore dimension completely, treat x as an atomic vector, and return an
    > (unclassed) atomic vector:

Well, that's  (3+), not "2+" .

But I did write (on Sep 17 in this thread!)

  > The current source for head() and tail() and all their methods
  > in utils is just 83 lines of code  {file utils/R/head.R minus
  > the initial mostly copyright comments}.

and if you've ever looked at these few dozen lines of R code, you'll
have seen that we just added two simple utilities with a few
reasonably simple methods.  To treat non-matrix (i.e. non-2d)
arrays as vectors, is typically not unreasonable in R, but
indeed with your proposals (in this thread), such non-2d arrays
should be treated differently either via new  head.array() /
tail.array() methods ((or -- only if it can be done more nicely -- by
the default method)).

Note however the following  historical quirk :

> sapply(setNames(,1:5), function(K) inherits(array(pi, dim=1:K), "array"))
    1     2     3     4     5
 TRUE FALSE  TRUE  TRUE  TRUE

(Is this something we should consider changing for R 4.0.0 -- to
 have it TRUE also for 2d-arrays aka matrix objects ??)

The consequence of that is that
currently, "often"   foo.matrix is just a copy of foo.array  in
the case the latter exists:
"base" examples: foo in {unique, duplicated, anyDuplicated}.

So I propose you change current  head.matrix and tail.matrix  to
head.array and tail.array
(and then have   head.matrix <- head.array  etc, at least if the
 above quirk must remain, or remains (which I currently guess to
 be the case)).
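
A minimal sketch of that aliasing pattern, using a made-up generic `peek` (hypothetical, chosen so as not to mask utils::head) rather than the real methods:

```r
## Hypothetical generic, used here only to illustrate the aliasing pattern.
peek <- function(x, ...) UseMethod("peek")
peek.default <- function(x, ...) "default method"
peek.array <- function(x, ...)
    sprintf("array method, %d dims", length(dim(x)))
## Without this alias, whether a matrix reaches peek.array depends on
## whether its implicit class includes "array".
peek.matrix <- peek.array

peek(matrix(0, 2, 2))       # "array method, 2 dims"
peek(array(0, c(2, 2, 2)))  # "array method, 3 dims"
peek(1:3)                   # "default method"
```

The alias costs one line and makes the dispatch for 2d arrays independent of the quirk shown above.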




Re: head.matrix can return 1000s of columns -- limit to n or add new argument?

Gabriel Becker-2
Hi Martin,


On Wed, Oct 30, 2019 at 4:30 AM Martin Maechler <[hidden email]>
wrote:

> >>>>> Gabriel Becker
> >>>>>     on Tue, 29 Oct 2019 12:43:15 -0700 writes:
>
>     > Hi all,
>     > So I've started working on this and I ran into something that I didn't
>     > know, namely that for x a multi-dimensional (2+) array, head(x) and tail(x)
>     > ignore dimension completely, treat x as an atomic vector, and return an
>     > (unclassed) atomic vector:
>
> Well, that's  (3+), not "2+" .
>

You're correct, of course. Apologies for that.

>
> But I did write (on Sep 17 in this thread!)
>
>   > The current source for head() and tail() and all their methods
>   > in utils is just 83 lines of code  {file utils/R/head.R minus
>   > the initial mostly copyright comments}.
>
> and if you've ever looked at these few dozen lines of R code, you'll
> have seen that we just added two simple utilities with a few
> reasonably simple methods.  To treat non-matrix (i.e. non-2d)
> arrays as vectors, is typically not unreasonable in R, but
> indeed with your proposals (in this thread), such non-2d arrays
> should be treated differently either via new  head.array() /
> tail.array() methods ((or -- only if it can be done more nicely -- by
> the default method)).
>

I hope you didn't construe my describing surprise (which was honest) as a
criticism. It is just quite literally not what I thought head(array(100, c(25,
2, 2))) would have done, based on what head.matrix does, is all.


>
> Note however the following  historical quirk :
>
> > sapply(setNames(,1:5), function(K) inherits(array(pi, dim=1:K), "array"))
>     1     2     3     4     5
>  TRUE FALSE  TRUE  TRUE  TRUE
>
> (Is this something we should consider changing for R 4.0.0 -- to
>  have it TRUE also for 2d-arrays aka matrix objects ??)
>

That is pretty odd. IMHO it would be quite nice from a design perspective
to fix that, but I do wonder, as I infer you do as well, how much code it
would break.

Changing this would cause problems in any case where a generic has an array
method but no matrix method, as well as in any code that explicitly checks
inherits(x, "array") assuming matrices won't return TRUE, correct? My
intuition is that the former would be pretty rare, though it might be a fun
little problem to figure that out.  The latter is ...probably also fairly
rare? My intuition on that one is less strong though.


>
> The consequence of that is that
> currently, "often"   foo.matrix is just a copy of foo.array  in
> the case the latter exists:
> "base" examples: foo in {unique, duplicated, anyDuplicated}.
>
> So I propose you change current  head.matrix and tail.matrix  to
> head.array and tail.array
> (and then have   head.matrix <- head.array  etc, at least if the
>  above quirk must remain, or remains (which I currently guess to
>  be the case)).
>
>

Absolutely, will do. I'm gratified we're going after the more general
approach. Thanks for working with us on this.

Best,
~G




Re: head.matrix can return 1000s of columns -- limit to n or add new argument?

Pages, Herve
In reply to this post by Martin Maechler
On 10/30/19 04:29, Martin Maechler wrote:

>>>>>> Gabriel Becker
>>>>>>      on Tue, 29 Oct 2019 12:43:15 -0700 writes:
>
>      > Hi all,
>      > So I've started working on this and I ran into something that I didn't
>      > know, namely that for x a multi-dimensional (2+) array, head(x) and tail(x)
>      > ignore dimension completely, treat x as an atomic vector, and return an
>      > (unclassed) atomic vector:
>
> Well, that's  (3+), not "2+" .
>
> But I did write (on Sep 17 in this thread!)
>
>    > The current source for head() and tail() and all their methods
>    > in utils is just 83 lines of code  {file utils/R/head.R minus
>    > the initial mostly copyright comments}.
>
> and if you've ever looked at these few dozen lines of R code, you'll
> have seen that we just added two simple utilities with a few
> reasonably simple methods.  To treat non-matrix (i.e. non-2d)
> arrays as vectors, is typically not unreasonable in R, but
> indeed with your proposals (in this thread), such non-2d arrays
> should be treated differently either via new  head.array() /
> tail.array() methods ((or -- only if it can be done more nicely -- by
> the default method)).
>
> Note however the following  historical quirk :
>
>> sapply(setNames(,1:5), function(K) inherits(array(pi, dim=1:K), "array"))
>      1     2     3     4     5
>   TRUE FALSE  TRUE  TRUE  TRUE
>
> (Is this something we should consider changing for R 4.0.0 -- to
>   have it TRUE also for 2d-arrays aka matrix objects ??)

That would be awesome! More generally I wonder how feasible it would be
to fix all these inheritance quirks where inherits(x, "something"),
is(x, "something"), and is.something(x) disagree. They've been such a
nuisance for so many years...
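
One small case of such a disagreement, as a sketch (it only uses an integer vector, so it should not depend on how the matrix/array question is settled):

```r
x <- 1L
r_inherits <- inherits(x, "numeric")     # FALSE: class(x) is "integer"
r_is_fun   <- is.numeric(x)              # TRUE
r_is       <- methods::is(x, "numeric")  # TRUE: S4-style inheritance
c(inherits = r_inherits, is.numeric = r_is_fun, is = r_is)
```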

Thanks,
H.



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [hidden email]
Phone:  (206) 667-5791
Fax:    (206) 667-1319

Re: head.matrix can return 1000s of columns -- limit to n or add new argument?

Abby Spurdle
On Fri, Nov 1, 2019 at 10:02 AM Pages, Herve <[hidden email]> wrote:
> That would be awesome! More generally I wonder how feasible it would be
> to fix all these inheritance quirks where inherits(x, "something"),
> is(x, "something"), and is.something(x) disagree. They've been such a
> nuisance for so many years...

This matter was raised in March:
https://stat.ethz.ch/pipermail/r-devel/2019-March/077457.html

In principle, I agree.
However, I'm not sure it's possible without causing compatibility problems.
Not to mention all the disagreement about what's the correct approach.

And I should probably apologize for incorrectly suggesting that there
was a non-backward-compatible design flaw...


Re: head.matrix can return 1000s of columns -- limit to n or add new argument?

Peter Dalgaard-2
In reply to this post by Martin Maechler
Hmm, the problem I see here is that these implied classes are all inherently one-off. We also have

> inherits(matrix(1,1,1),"numeric")
[1] FALSE
> is.numeric(matrix(1,1,1))
[1] TRUE
> inherits(1L,"numeric")
[1] FALSE
> is.numeric(1L)
[1] TRUE

and if we start fixing one, we might need to fix all.

For method dispatch, we do have inheritance, e.g.

> foo.numeric <- function(x) x + 1
> foo <- function(x) UseMethod("foo")
> foo(1)
[1] 2
> foo(1L)
[1] 2
> foo(matrix(1,1,1))
     [,1]
[1,]    2
> foo.integer <- function(x) x + 2
> foo(1)
[1] 2
> foo(1L)
[1] 3
> foo(matrix(1,1,1))
     [,1]
[1,]    2
> foo(matrix(1L,1,1))
     [,1]
[1,]    3

but these are not all automatic: "integer" implies "numeric", but "matrix" does not imply "numeric", much less "integer".

Also, we seem to have a rule that inherits(x, c) iff c %in% class(x), which would break -- unless we change class(x) to return the whole set of inherited classes, which I sense that we'd rather not do....
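
A quick way to see that rule in action (the class names `child` and `parent` are made up for illustration):

```r
obj <- structure(list(), class = c("child", "parent"))
candidates <- c("child", "parent", "list")
## inherits() agrees with membership in class(obj) for each candidate
by_inherits <- vapply(candidates, function(cl) inherits(obj, cl), logical(1))
by_class    <- candidates %in% class(obj)
rbind(by_inherits, by_class)
## note "list" is FALSE in both rows, although is.list(obj) is TRUE
```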

-pd

> On 30 Oct 2019, at 12:29 , Martin Maechler <[hidden email]> wrote:
>
> Note however the following  historical quirk :
>
>> sapply(setNames(,1:5), function(K) inherits(array(pi, dim=1:K), "array"))
>    1     2     3     4     5
> TRUE FALSE  TRUE  TRUE  TRUE
>
> (Is this something we should consider changing for R 4.0.0 -- to
> have it TRUE also for 2d-arrays aka matrix objects ??)

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]


Re: head.matrix can return 1000s of columns -- limit to n or add new argument?

Martin Maechler
In reply to this post by Pages, Herve
>>>>> Pages, Herve
>>>>>     on Thu, 31 Oct 2019 21:02:07 +0000 writes:

    > On 10/30/19 04:29, Martin Maechler wrote:
    >>>>>>> Gabriel Becker
    >>>>>>> on Tue, 29 Oct 2019 12:43:15 -0700 writes:
    >>
    >> > Hi all,
    >> > So I've started working on this and I ran into something that I didn't
    >> > know, namely that for x a multi-dimensional (2+) array, head(x) and tail(x)
    >> > ignore dimension completely, treat x as an atomic vector, and return an
    >> > (unclassed) atomic vector:
    >>
    >> Well, that's  (3+), not "2+" .
    >>
    >> But I did write (on Sep 17 in this thread!)
    >>
    >> > The current source for head() and tail() and all their methods
    >> > in utils is just 83 lines of code  {file utils/R/head.R minus
    >> > the initial mostly copyright comments}.
    >>
    >> and if you've ever looked at these few dozen lines of R code, you'll
    >> have seen that we just added two simple utilities with a few
    >> reasonably simple methods.  To treat non-matrix (i.e. non-2d)
    >> arrays as vectors, is typically not unreasonable in R, but
    >> indeed with your proposals (in this thread), such non-2d arrays
    >> should be treated differently either via new  head.array() /
    >> tail.array() methods ((or -- only if it can be done more nicely -- by
    >> the default method)).
    >>
    >> Note however the following  historical quirk :
    >>
    >>> sapply(setNames(,1:5), function(K) inherits(array(pi, dim=1:K), "array"))
    >> 1     2     3     4     5
    >> TRUE FALSE  TRUE  TRUE  TRUE
    >>
    >> (Is this something we should consider changing for R 4.0.0 -- to
    >> have it TRUE also for 2d-arrays aka matrix objects ??)

    > That would be awesome! More generally I wonder how feasible it would be
    > to fix all these inheritance quirks where inherits(x, "something"),
    > is(x, "something"), and is.something(x) disagree. They've been such a
    > nuisance for so many years...

    > Thanks,
    > H.

Thank you Hervé; you are right "in theory", but
no, we don't want to fix _all_ these quirks at the moment
(because we know how much this would break).
Note that ?class does mention S3 and S4, and also you know about
is(.,.)  which is more "rational" than inherits insofar as it
"thinks" the S4 way about inheritance ... but then it has its
surprises, too; e.g., note the result of  is(NULL) .

I really wanted to address the relatively limited case of
{matrix, array} for now.

{{more on this in the subthread Peter opened}}
Martin



Re: head.matrix can return 1000s of columns -- limit to n or add new argument?

Martin Maechler
In reply to this post by Peter Dalgaard-2
>>>>> peter dalgaard
>>>>>     on Thu, 31 Oct 2019 23:04:29 +0100 writes:

    > Hmm, the problem I see here is that these implied classes are all inherently one-off. We also have
    >> inherits(matrix(1,1,1),"numeric")
    > [1] FALSE
    >> is.numeric(matrix(1,1,1))
    > [1] TRUE
    >> inherits(1L,"numeric")
    > [1] FALSE
    >> is.numeric(1L)
    > [1] TRUE

    > and if we start fixing one, we might need to fix all.

I disagree about "fixing all" (see also my reply to Hervé), and
the {"numeric","double","integer"} case is particularly messy,
and I don't want to open that can now.

    > For method dispatch, we do have inheritance, e.g.

    >> foo.numeric <- function(x) x + 1
    >> foo <- function(x) UseMethod("foo")
    >> foo(1)
    > [1] 2
    >> foo(1L)
    > [1] 2
    >> foo(matrix(1,1,1))
    >      [,1]
    > [1,]    2
    >> foo.integer <- function(x) x + 2
    >> foo(1)
    > [1] 2
    >> foo(1L)
    > [1] 3
    >> foo(matrix(1,1,1))
    >      [,1]
    > [1,]    2
    >> foo(matrix(1L,1,1))
    >      [,1]
    > [1,]    3

    > but these are not all automatic: "integer" implies "numeric", but "matrix" does not imply "numeric", much less "integer".

Well, it should not imply that in general:
contrary to mathematics, we also have 'raw', 'character' or 'logical' matrices.


    > Also, we seem to have a rule that inherits(x, c)  iff  c %in% class(x),

Good point, and that's why my usage of  inherits(.,.)  was not
quite to the point.  [OTOH, it was to the point, as indeed according
      to the ?class / ?inherits documentation, S3 method dispatch and
      inherits must be consistent.]

    > which would break -- unless we change class(x) to return the whole set of inherited classes, which I sense that we'd rather not do....

and we have something like that already with  is(.)

Thank you for these important points raised!

Note again that both "matrix" and "array" are special [see ?class] as
being of  __implicit class__  and I am considering that this
implicit class behavior for these two should be slightly changed
such that

  foo <- function(x,...) UseMethod("foo")
  foo.array <- function(x, ...)
           sprintf("array of dim. %s", paste(dim(x), collapse = " x "))

should work for all arrays and not be an exception for 2D arrays :

> foo(array(pi, 1:3))
[1] "array of dim. 1 x 2 x 3"
> foo(array(pi, 1))
[1] "array of dim. 1"
> foo(array(pi, 2:7))
[1] "array of dim. 2 x 3 x 4 x 5 x 6 x 7"
> foo(array(pi, 1:2))
Error in UseMethod("foo") :
  no applicable method for 'foo' applied to an object of class "c('matrix', 'double', 'numeric')"
>

And I think you are spot on: this would mean
that the implicit class
"matrix" should rather become c("matrix", "array").

BTW: The 'Details' section of   ?class   nicely defines things,
     notably the __implicit class__ situation
     (but I think should be improved)  :

     {numbering the paragraphs for reference}

> Details:
>
> 1.   Here, we describe the so called “S3” classes (and methods). For
>      “S4” classes (and methods), see ‘Formal classes’ below.
>
> 2.   Many R objects have a class attribute, a character vector giving
>      the names of the classes from which the object _inherits_.
>      (Functions oldClass and oldClass<- get and set the attribute,
>      which can also be done directly.)
>
> 3.   If the object does not have a class attribute, it has an implicit
>      class, notably ‘"matrix"’, ‘"array"’, ‘"function"’ or ‘"numeric"’
>      or the result of ‘typeof(x)’ (which is similar to ‘mode(x)’), but
>      for type ‘"language"’ and mode ‘"call"’, where the following
>      extra classes exist for the corresponding function calls: if,
>      while, for, =, <-, (, {, call.

So, I think clearly  { for S3, not S4 ! }

  "class attribute" :=  attr(x, "class")

  "implicit class" := the class(x) of R objects that do *not*
           have a class attribute

 
> 4.   Note that NULL objects cannot have attributes (hence not
>      classes) and attempting to assign a class is an error.

The above has one small flaw: "(hence not classes)" is not correct.
Of course   class(NULL) is "NULL" by par. 3's  typeof(x) "rule".
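
For illustration, a minimal check of both statements; the error is caught with tryCatch so that the snippet runs to completion:

```r
cls <- class(NULL)    # "NULL", the same string that typeof(NULL) gives
typ <- typeof(NULL)   # "NULL"
## assigning a class to NULL is an error, as par. 4 says
res <- tryCatch({
    x <- NULL
    class(x) <- "foo"
    "assigned"
}, error = function(e) "error")
c(class = cls, typeof = typ, assignment = res)
```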

> 5a.  When a generic function ‘fun’ is applied to an object with class
>      attribute ‘c("first", "second")’, the system searches for a
>      function called ‘fun.first’ and, if it finds it, applies it to the
>      object.  If no such function is found, a function called
>      ‘fun.second’ is tried.  If no class name produces a suitable
>      function, the function ‘fun.default’ is used (if it exists).
> 5b.  If there is no class attribute, the implicit class is tried, then the
>      default method.
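
A small sketch of the search order described in 5a and 5b; the generic `fun` and the classes "first" and "second" are the placeholder names from the help text, and the methods are defined here only for illustration:

```r
fun <- function(x, ...) UseMethod("fun")
fun.second  <- function(x, ...) "fun.second"
fun.default <- function(x, ...) "fun.default"

obj <- structure(list(), class = c("first", "second"))
r1 <- fun(obj)   # "fun.second": there is no fun.first, so the next class is tried
r2 <- fun(1:3)   # "fun.default": no method for the implicit class yet
fun.integer <- function(x, ...) "fun.integer"
r3 <- fun(1:3)   # "fun.integer": the implicit class is tried before the default
c(r1, r2, r3)
```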

> 6.   The function 'class' prints the vector of names of classes an
>      object inherits from.  Correspondingly, class<- sets the classes
>      an object inherits from.  Assigning NULL removes the class
>      attribute.

["of course", the word  "prints" above should be replaced by "returns" ! ]
     
> 7.   'unclass' returns (a copy of) its argument with its class
>      attribute removed.  (It is not allowed for objects which cannot be
>      copied, namely environments and external pointers.)
     
> 8.   'inherits' indicates whether its first argument inherits from any
>      of the classes specified in the ‘what’ argument.  If which is
>      TRUE then an integer vector of the same length as ‘what’ is
>      returned.  Each element indicates the position in the ‘class(x)’
>      matched by the element of ‘what’; zero indicates no match. If
>      which is FALSE then TRUE is returned by inherits if any of
>      the names in ‘what’ match with any class.

{I had forgotten that the 2nd argument of inherits, 'what', can
 be a vector and about the 'which' argument}
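
A short example of both features, with made-up class names "a" and "b":

```r
obj <- structure(list(), class = c("a", "b"))
## which = TRUE: position in class(obj) of each element of 'what', 0 if absent
pos <- inherits(obj, c("b", "zzz", "a"), which = TRUE)   # 2 0 1
## which = FALSE (the default): TRUE if any element of 'what' matches
any_match <- inherits(obj, c("zzz", "b"))                # TRUE
```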


    >> On 30 Oct 2019, at 12:29 , Martin Maechler <[hidden email]> wrote:
    >>
    >> Note however the following  historical quirk :
    >>
    >>> sapply(setNames(,1:5), function(K) inherits(array(pi, dim=1:K), "array"))
    >> 1     2     3     4     5
    >> TRUE FALSE  TRUE  TRUE  TRUE
    >>
    >> (Is this something we should consider changing for R 4.0.0 -- to
    >> have it TRUE also for 2d-arrays aka matrix objects ??)

    > --
    > Peter Dalgaard, Professor,
    > Center for Statistics, Copenhagen Business School
    > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
    > Phone: (+45)38153501
    > Office: A 4.23
    > Email: [hidden email]  Priv: [hidden email]
