Subsetting row in single column matrix drops names in resulting vector

Subsetting row in single column matrix drops names in resulting vector

Dmitriy Selivanov
Hello. I'm struggling to understand R's subsetting behavior in a couple
of edge cases: subsetting a row in a single-column matrix and subsetting a
column in a single-row matrix. I've read R's docs several times and haven't
found an answer.

Consider the following example:

a = matrix(1:2, nrow = 2, dimnames = list(c("row1", "row2"), c("col1")))
a[1, ]
# 1

It returns an *unnamed* vector `1` where I would expect a named vector. In fact
it returns a named vector when the number of columns is > 1.
The same issue applies to a single-row matrix. Is it a bug? It looks very
counterintuitive.
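
A minimal sketch of the contrast described above, in base R. The two-column matrix `b` and the transposed matrix `r` are added here purely for comparison and are not part of the original post.

```r
# One-column matrix from the post.
a <- matrix(1:2, nrow = 2, dimnames = list(c("row1", "row2"), "col1"))

# Two-column matrix, added only for comparison (an assumption of this sketch).
b <- matrix(1:4, nrow = 2,
            dimnames = list(c("row1", "row2"), c("col1", "col2")))

names(a[1, ])  # NULL: the single element keeps neither dimname
names(b[1, ])  # "col1" "col2": column names survive when ncol > 1

# The mirrored case: one column of a single-row matrix.
r <- t(a)      # 1 x 2 matrix whose only row is named "col1"
names(r[, 1])  # NULL
```

So the names disappear exactly when the selection collapses to a single element that carries both a row name and a column name.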


--
Regards
Dmitriy Selivanov


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Subsetting row in single column matrix drops names in resulting vector

Rui Barradas
Hello,

Use drop = FALSE.

a[1, , drop = FALSE]
#     col1
#row1    1


Hope this helps,

Rui Barradas

At 16:51 on 21/11/2018, Dmitriy Selivanov wrote:

> Hello. I'm struggling to understand R's subsetting behavior in a couple
> of edge cases: subsetting a row in a single-column matrix and subsetting a
> column in a single-row matrix. I've read R's docs several times and haven't
> found an answer.
>
> Consider the following example:
>
> a = matrix(1:2, nrow = 2, dimnames = list(c("row1", "row2"), c("col1")))
> a[1, ]
> # 1
>
> It returns an *unnamed* vector `1` where I would expect a named vector. In fact
> it returns a named vector when the number of columns is > 1.
> The same issue applies to a single-row matrix. Is it a bug? It looks very
> counterintuitive.


Re: Subsetting row in single column matrix drops names in resulting vector

Dmitriy Selivanov
Hi Rui. Thanks for the answer; I'm aware of the drop = FALSE option. Unfortunately
it doesn't resolve the issue: I'm expecting to get a vector, not a matrix.

On Wed, 21 Nov 2018 at 20:54, Rui Barradas <[hidden email]> wrote:

> Hello,
>
> Use drop = FALSE.
>
> a[1, , drop = FALSE]
> #     col1
> #row1    1
>
>
> Hope this helps,
>
> Rui Barradas

Re: Subsetting row in single column matrix drops names in resulting vector

Emil
The problem is that the drop is only applied (or not) after the subsetting, so what R does is:
- Get the subset, which here is a 1 x 1 matrix.
- Only then does it either return that as is (when drop=FALSE), or remove ALL dimensions of extent 1, regardless of whether these are rows or columns (or higher dimensions).
And it can't keep any names, because which name should be returned? The name 'row1' is just as valid as 'col1'.
I guess if we could design everything anew, a solution would be to be able to specify something like a[1,,drop='row'] or a[1,,drop=1] to drop the rows but keep the columns, and get a vector equal to row 'row1' (which in this case just has length 1 and the name 'col1').
That's not how it's designed, but you could use 'adrop()' from the 'abind' package:
abind::adrop(a[1,,drop=FALSE], drop=1) first subsets, then drops the row dimension, so it gives what you're looking for.
Hope this solves your problem.
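
The two steps described above can be sketched in base R, without the 'abind' dependency. The helper `row_as_vector()` is a hypothetical name introduced only for this illustration. It is not part of base R or of 'abind'. It simply drops the row dimension by hand and keeps the column names.

```r
a <- matrix(1:2, nrow = 2, dimnames = list(c("row1", "row2"), "col1"))

# Step 1: the subset itself is a 1 x 1 matrix that still has both dimnames.
s <- a[1, , drop = FALSE]
dim(s)       # 1 1
dimnames(s)  # row name "row1", column name "col1"

# Step 2: default dropping removes every extent-1 dimension at once,
# and with two candidate names neither is kept.
names(a[1, ])  # NULL

# Hypothetical helper (an assumption of this sketch, not a base R function):
# select one row, then keep only the column names on the resulting vector.
row_as_vector <- function(m, i) {
  s <- m[i, , drop = FALSE]
  stopifnot(nrow(s) == 1L)
  setNames(as.vector(s), colnames(s))
}

row_as_vector(a, 1)       # named vector with the single name "col1"
row_as_vector(a, "row2")  # works with a row name as index too
```

Choosing the column names here is a design decision that matches what `a[1, ]` already returns for matrices with more than one column.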

Best regards,
Emil Bode
 

On 21/11/2018, 17:58, "R-devel on behalf of Dmitriy Selivanov" <[hidden email] on behalf of [hidden email]> wrote:

    Hi Rui. Thanks for the answer; I'm aware of the drop = FALSE option. Unfortunately
    it doesn't resolve the issue: I'm expecting to get a vector, not a matrix.
   

Re: Subsetting row in single column matrix drops names in resulting vector

Serguei Sokol
On 22/11/2018 at 14:47, Emil Bode wrote:
> The problem is that the drop is only applied (or not) after the subsetting, so what R does is:
> - Get the subset, which here is a 1 x 1 matrix.
> - Only then does it either return that as is (when drop=FALSE), or remove ALL dimensions of extent 1, regardless of whether these are rows or columns (or higher dimensions).
> And it can't keep any names, because which name should be returned? The name 'row1' is just as valid as 'col1'.
If that is the only reason not to return any name in this case, I could
make a suggestion.
Let it return the name corresponding to the index in the subsetting request,
i.e. for the one-column matrix example it would give

names(a[1,])
#"row1"
names(a[2,])
#"row2"

as the indices 1 and 2 above correspond to rows.

Just my 0.02€
Serguei.




--
Serguei Sokol
Ingenieur de recherche INRA

Cellule mathématiques
LISBP, INSA/INRA UMR 792, INSA/CNRS UMR 5504
135 Avenue de Rangueil
31077 Toulouse Cedex 04

tel: +33 5 62 25 01 27
email: [hidden email]
http://www.lisbp.fr


Re: Subsetting row in single column matrix drops names in resulting vector

Dmitriy Selivanov
In reply to this post by Emil
Emil, thanks for the very nice explanation. I wish base drop had the same
behavior as abind::adrop.


Re: Subsetting row in single column matrix drops names in resulting vector

Radford Neal
In reply to this post by Dmitriy Selivanov
Dmitriy Selivanov ([hidden email]) wrote:

> Consider the following example:
>
> a = matrix(1:2, nrow = 2, dimnames = list(c("row1", "row2"), c("col1")))
> a[1, ]
> # 1
>
> It returns an *unnamed* vector `1` where I would expect a named vector. In fact
> it returns a named vector when the number of columns is > 1.
> The same issue applies to a single-row matrix. Is it a bug? It looks very
> counterintuitive.

This and related issues are addressed in pqR, in the new
release of 2018-11-18.  (See pqR-project.org, and my blog
post at radfordneal.wordpress.com)

The behaviour of a[1,] is unchanged, for backwards compatibility
reasons.  But in pqR one can explicitly mark an argument as
missing using "_".  When an array subscript is missing in this way,
the names will not be dropped in this context even if there is
only one of them.  So a[1,_] will do what you want:

  > a = matrix(1:2, nrow = 2, dimnames = list(c("row1", "row2"), c("col1")))
  > a[1, ]
  [1] 1
  > a[1,_]
  col1
     1

Furthermore, pqR will not drop names when the subscript is a
1D array (i.e., has a length-1 dim attribute) even if it has
length one.  In pqR, sequences that are 1D arrays are easily created
using the .. operator.  So the following works as intended when ..
is used, but not when the old : operator is used:

  > a = matrix(1:4, nrow=2, dimnames=list(c("row1","row2"),c("col1","col2")))
  > n = 2
  > a[1,1:n]
  col1 col2
     1    3
  > a[1,1..n]
  col1 col2
     1    3
  > n = 1
  > a[1,1:n]
  [1] 1
  > a[1,1..n]
  col1
     1

You can read more about this in my blog post at

https://radfordneal.wordpress.com/2016/06/25/fixing-rs-design-flaws-in-a-new-version-of-pqr/

That was written when most of these features were introduced,
though getting your specific example right relies on another
change introduced in the most recent version.

    Radford Neal


Re: Subsetting row in single column matrix drops names in resulting vector

Serguei Sokol
On 27/11/2018 at 01:50, Radford Neal wrote:

> This and related issues are addressed in pqR, in the new
> release of 2018-11-18.  (See pqR-project.org, and my blog
> post at radfordneal.wordpress.com)
>
> The behaviour of a[1,] is unchanged, for backwards compatibility
> reasons.  But in pqR one can explicitly mark an argument as
> missing using "_".  When an array subscript is missing in this way,
> the names will not be dropped in this context even if there is
> only one of them.  So a[1,_] will do what you want:
>
>    > a = matrix(1:2, nrow = 2, dimnames = list(c("row1", "row2"), c("col1")))
>    > a[1, ]
>    [1] 1
>    > a[1,_]
>    col1
>       1
To my mind, it's rather counterintuitive as

> a[2,_]
col1
    2
so a[1,_] and a[2,_] have the same name. To make it intuitive (at least for me ;) )
it should rather return names "row1" and "row2" respectively.

Best,
Serguei.
 


Re: Subsetting row in single column matrix drops names in resulting vector

Radford Neal
In reply to this post by Dmitriy Selivanov
> > The behaviour of a[1,] is unchanged, for backwards compatibility
> > reasons.  But in pqR one can explicitly mark an argument as
> > missing using "_".  When an array subscript is missing in this way,
> > the names will not be dropped in this context even if there is
> > only one of them.  So a[1,_] will do what you want:
> >
> >    > a = matrix(1:2, nrow = 2, dimnames = list(c("row1", "row2"), c("col1")))
> >    > a[1, ]
> >    [1] 1
> >    > a[1,_]
> >    col1
> >       1

> To my mind, it's rather counterintuitive as
>
> > a[2,_]
> col1
>     2
> so a[1,_] and a[2,_] have the same name. To make it intuitive (at least
> for me ;) ) it should rather return names "row1" and "row2" respectively.
>
> Best,
> Serguei.


The aim in designing these features should be to make it easier to
write reliable software, which doesn't unexpectedly fail in edge
cases.

Here, the fact that a is a matrix presumably means that the program is
designed to work for more than one column - in fact, it's likely that
the programmer was mostly thinking of the case where there is more
than one column, and perhaps only testing that case.  But of course
there is usually no reason why one column (or even zero columns) is
impossible.  We want the program to still work in such cases.

When there is more than one column, a[1,] and a[1,_] both produce a
vector with the _column_ names attached, and this is certainly not
going to change (nor should it, unless one wants to change the whole
semantics of matrices so that rows and columns are treated
non-symmetrically, and even then attaching the same row name to all
the elements would be rather strange...).

After v <- a[1,_], the program may well have an expression like v[nc]
where nc is a column name.  We want this to still work if there
happens to be only one column.  That will happen only if a[1,_]
attaches a column name, not a row name, when a has only one column.
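
The failure mode described above can be reproduced in standard R, where `a[1, ]` drops the name in the one-column case. This is only a sketch: `a[1,_]` is pqR syntax and does not parse in standard R, so the hypothetical helper `first_row()` (a name invented for this example) stands in for its name-preserving behaviour. The matrices `wide` and `narrow` are likewise illustrative.

```r
# Hypothetical stand-in for pqR's a[1,_]: always attach the column names.
first_row <- function(m) setNames(m[1, ], colnames(m))

wide <- matrix(1:4, nrow = 2,
               dimnames = list(c("row1", "row2"), c("col1", "col2")))
narrow <- wide[, "col1", drop = FALSE]  # same data, restricted to one column

nc <- "col1"

wide[1, ][nc]          # lookup by column name works with several columns
narrow[1, ][nc]        # NA: the name was dropped, so the lookup fails
first_row(narrow)[nc]  # lookup by column name works in the edge case too
```

Code written and tested only against `wide` would break silently on `narrow` unless the column name is kept, which is the argument for attaching the column name rather than the row name.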

   Radford Neal


Re: Subsetting row in single column matrix drops names in resulting vector

Serguei Sokol
The argument that multi-[column|row] and one-[column|row] matrices should
be treated in the same way with respect to the names kept in the result
sounds good to me. I withdraw my remark.

Serguei.

On 27/11/2018 at 15:48, Radford Neal wrote:

> The aim in designing these features should be to make it easier to
> write reliable software, which doesn't unexpectedly fail in edge
> cases.
>
> Here, the fact that a is a matrix presumably means that the program is
> designed to work for more than one column - in fact, it's likely that
> the programmer was mostly thinking of the case where there is more
> than one column, and perhaps only testing that case.  But of course
> there is usually no reason why one column (or even zero columns) is
> impossible.  We want the program to still work in such cases.
>
> When there is more than one column, a[1,] and a[1,_] both produce a
> vector with the _column_ names attached, and this is certainly not
> going to change (nor should it, unless one wants to change the whole
> semantics of matrices so that rows and columns are treated
> non-symmetrically, and even then attaching the same row name to all
> the elements would be rather strange...).
>
> After v <- a[1,_], the program may well have an expression like v[nc]
> where nc is a column name.  We want this to still work if there
> happens to be only one column.  That will happen only if a[1,_]
> attaches a column name, not a row name, when a has only one column.
>
>     Radford Neal
>

