Subsetting the "ROW"s of an object

hadley wickham
Hi all,

Is there a better way to subset the ROWs (in the sense of NROW) of
a vector, matrix, data frame or array than this?

subset_ROW <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L) {
    x[i]
  } else {
    dims <- rep(list(quote(expr = )), nd - 1L)
    do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE)))
  }
}

subset_ROW(1:10, 4:6)
#> [1] 4 5 6

str(subset_ROW(array(1:10, c(10)), 2:4))
#>  int [1:3(1d)] 2 3 4
str(subset_ROW(array(1:10, c(10, 1)), 2:4))
#>  int [1:3, 1] 2 3 4
str(subset_ROW(array(1:10, c(5, 2)), 2:4))
#>  int [1:3, 1:2] 2 3 4 7 8 9
str(subset_ROW(array(1:10, c(10, 1, 1)), 2:4))
#>  int [1:3, 1, 1] 2 3 4

subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4)
#>   x y
#> 2 2 9
#> 3 3 8
#> 4 4 7

It seems like there should be a way to do this that doesn't require
generating a call with missing arguments, but I can't think of it.
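
To make "a call with missing arguments" concrete, here is a small sketch of what the do.call() line assembles for a 3-d array. It uses as.call() instead of do.call() only so that the call can be inspected before it is evaluated; the names `cl`, and the sample `x` and `i`, are illustrative and not part of the original post.

```r
nd <- 3L  # pretend x is a 3-d array

# quote(expr = ) is the empty symbol that stands for a missing argument
dims <- rep(list(quote(expr = )), nd - 1L)

# Build (but do not yet evaluate) the call that do.call() would run
cl <- as.call(c(list(as.name("["), quote(x), quote(i)), dims, list(drop = FALSE)))
print(cl)

# Evaluating it behaves like writing the literal subscript by hand
x <- array(1:24, c(4, 3, 2))
i <- 2:3
stopifnot(identical(eval(cl), x[i, , , drop = FALSE]))
```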

Thanks!

Hadley

--
http://hadley.nz

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Subsetting the "ROW"s of an object

Iñaki Úcar
The following code seems to work. The only minor drawback is that, for
the last case, the output is not a data frame.

subset_ROW <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L)
    return(x[i])
  xx <- apply(x, 2:nd, `[`, i, drop=FALSE)
  dim(xx) <- c(length(i), dim(x)[-1])
  xx
}
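
The data-frame caveat mentioned above can be checked directly. A minimal sketch, assuming only base R: apply() converts a data frame to a matrix before looping over it, so the result comes back as a matrix. The names `df` and `res` are illustrative.

```r
df <- data.frame(x = 1:10, y = 10:1)

# apply() calls as.matrix() on a data frame, so the data.frame class is lost
res <- apply(df, 2, `[`, 2:4, drop = FALSE)

is.data.frame(res)  # FALSE
is.matrix(res)      # TRUE
dim(res)
```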

Iñaki


Re: Subsetting the "ROW"s of an object

Iñaki Úcar
Sorry, without remnants from other attempts:

subset_ROW <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L)
    return(x[i])
  apply(x, 2:nd, `[`, i, drop=FALSE)
}
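
One thing worth checking with this shorter version is what happens when `i` selects a single row. The sketch below copies the function under the hypothetical name `subset_ROW_apply` and relies on apply()'s simplification rules, under which a result of length one per column comes back as a plain vector; treat it as an observation about base R's apply(), not as part of the original post.

```r
subset_ROW_apply <- function(x, i) {
  nd <- length(dim(x))
  if (nd <= 1L)
    return(x[i])
  apply(x, 2:nd, `[`, i, drop = FALSE)
}

m <- matrix(1:10, nrow = 5, ncol = 2)

# Several rows: the result keeps a row dimension
dim(subset_ROW_apply(m, 2:4))

# A single row: apply() simplifies to a vector, so the dim attribute is gone
dim(subset_ROW_apply(m, 2L))
```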

--
Iñaki Úcar
http://www.enchufa2.es
@Enchufa2


Re: Subsetting the "ROW"s of an object

Michael Lawrence-3
In reply to this post by hadley wickham
There probably should be an abstraction for this. In S4Vectors, we
have extractROWS().

Michael


Re: Subsetting the "ROW"s of an object

Berry, Charles
In reply to this post by hadley wickham


> On Jun 8, 2018, at 8:45 AM, Hadley Wickham <[hidden email]> wrote:
>
> Hi all,
>
> Is there a better way to subset the ROWs (in the sense of NROW) of
> a vector, matrix, data frame or array than this?


You can use TRUE to fill the subscripts for dimensions 2:nd.

>
> subset_ROW <- function(x, i) {
>  nd <- length(dim(x))
>  if (nd <= 1L) {
>    x[i]
>  } else {
>    dims <- rep(list(quote(expr = )), nd - 1L)
>    do.call(`[`, c(list(quote(x), quote(i)), dims, list(drop = FALSE)))
>  }
> }


subset_ROW <- function(x, i) {
    # Start from the call x[i] and append one TRUE per trailing dimension
    mc <- quote(x[i])
    nd <- max(1L, length(dim(x)))
    mc[seq(4, length.out = nd - 1L)] <- rep(list(TRUE), nd - 1L)
    mc[["drop"]] <- FALSE
    eval(mc)
}
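
As a quick check of the idea, a length-1 TRUE in a trailing position selects the same elements as a missing subscript, at least when no dimension has extent zero. A minimal sketch with illustrative names (`with_true`, `with_missing`):

```r
x <- array(1:24, c(4, 3, 2))
i <- 2:3

# A length-1 TRUE is recycled along each trailing dimension, selecting everything
with_true    <- x[i, TRUE, TRUE, drop = FALSE]
with_missing <- x[i, , , drop = FALSE]

stopifnot(identical(with_true, with_missing))
```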

HTH,

Chuck


Re: Subsetting the "ROW"s of an object

hadley wickham
I suspect this will have suboptimal performance since the TRUEs will
get recycled. (Maybe there is, or could be, ALTREP support for
recycling.)

Hadley

--
http://hadley.nz


Re: Subsetting the "ROW"s of an object

Hervé Pagès-2
In reply to this post by Michael Lawrence-3
On 06/08/2018 10:15 AM, Michael Lawrence wrote:
> There probably should be an abstraction for this. In S4Vectors, we
> have extractROWS().

FWIW the code in S4Vectors that does what your subset_ROW() does is:

 
https://github.com/Bioconductor/S4Vectors/blob/04cc9516af986b30445e99fd1337f13321b7b4f6/R/subsetting-utils.R#L466-L476

(This is the default "extractROWS" method.)

Except for the normalization of 'i', it does the same as your
subset_ROW(). I don't know how to do this without generating a call
with missing arguments.

H.

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [hidden email]
Phone:  (206) 667-5791
Fax:    (206) 667-1319


Re: Subsetting the "ROW"s of an object

Hervé Pagès-2
On 06/08/2018 10:32 AM, Hervé Pagès wrote:
> On 06/08/2018 10:15 AM, Michael Lawrence wrote:
>> There probably should be an abstraction for this. In S4Vectors, we
>> have extractROWS().
>
> FWIW the code in S4Vectors that does what your subset_ROW() does is:
>
>
> https://github.com/Bioconductor/S4Vectors/blob/04cc9516af986b30445e99fd1337f13321b7b4f6/R/subsetting-utils.R#L466-L476

Wrong link, sorry. Here is the correct one:

 
https://github.com/Bioconductor/S4Vectors/blob/04cc9516af986b30445e99fd1337f13321b7b4f6/R/subsetting-utils.R#L453-L464

H.


Re: Subsetting the "ROW"s of an object

Hervé Pagès-2
In reply to this post by hadley wickham
Also the TRUEs cause problems if some dimensions are 0:

   > matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]
   Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
     (subscript) logical subscript too long
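
The contrast with a missing subscript can be sketched as follows. This assumes the behaviour shown in the error above (a length-1 TRUE is rejected for a zero-extent dimension); the names `m0`, `res_true` and `res_missing` are illustrative.

```r
m0 <- matrix(raw(0), nrow = 5, ncol = 0)

# TRUE cannot be matched against a zero-length dimension, so this signals an error
res_true <- tryCatch(m0[1:3, TRUE, drop = FALSE], error = function(e) e)
inherits(res_true, "error")

# A missing subscript means "take everything", which is fine even when
# there is nothing to take
res_missing <- m0[1:3, , drop = FALSE]
dim(res_missing)
```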

H.


Re: Subsetting the "ROW"s of an object

Berry, Charles


> On Jun 8, 2018, at 10:37 AM, Hervé Pagès <[hidden email]> wrote:
>
> Also the TRUEs cause problems if some dimensions are 0:
>
>  > matrix(raw(0), nrow=5, ncol=0)[1:3 , TRUE]
>  Error in matrix(raw(0), nrow = 5, ncol = 0)[1:3, TRUE] :
>    (subscript) logical subscript too long

OK. But this is easy enough to handle.
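
The message does not say how it would be handled. One possibility, offered here purely as an assumption, is to fill the trailing subscripts with explicit index vectors from seq_len() instead of TRUE, at the cost of building one index vector per dimension. The function name `subset_ROW_seq` is hypothetical.

```r
subset_ROW_seq <- function(x, i) {
  d <- dim(x)
  if (length(d) <= 1L) {
    return(x[i])
  }
  # One explicit index vector per trailing dimension; seq_len(0) is integer(0),
  # so zero-extent dimensions are handled without recycling
  trailing <- lapply(d[-1L], seq_len)
  do.call(`[`, c(list(x, i), trailing, list(drop = FALSE)))
}

m0 <- matrix(raw(0), nrow = 5, ncol = 0)
dim(subset_ROW_seq(m0, 1:3))

a <- array(1:24, c(4, 3, 2))
stopifnot(identical(subset_ROW_seq(a, 2:3), a[2:3, , , drop = FALSE]))
```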

>
> H.
>
> On 06/08/2018 10:29 AM, Hadley Wickham wrote:
>> I suspect this will have suboptimal performance since the TRUEs will
>> get recycled. (Maybe there is, or could be, ALTREP support for
>> recycling.)
>> Hadley


AFAICS, it is not an issue. Taking

arr <- array(rnorm(2^22),c(2^10,4,4,4))

as a test case

and using a function that will either use the literal code `x[i,,,,drop=FALSE]' or `eval(mc)':

subset_ROW4 <-
     function(x, i, useLiteral=FALSE)
{
    literal <- quote(x[i,,,,drop=FALSE])
    mc <- quote(x[i])
    nd <- max(1L, length(dim(x)))
    mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L)
    mc[["drop"]] <- FALSE
    if (useLiteral)
        eval(literal)
    else
        eval(mc)
 }

I get identical times with

system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),TRUE))

and with

system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),FALSE))

Changing the dimensions to c(2^5, 2^7, 4, 4 ) and running something similar also shows equal times.

Chuck


Re: Subsetting the "ROW"s of an object

hadley wickham
On Fri, Jun 8, 2018 at 11:38 AM, Berry, Charles <[hidden email]> wrote:

> AFAICS, it is not an issue. Taking
>
> arr <- array(rnorm(2^22),c(2^10,4,4,4))
>
> as a test case
>
> and using a function that will either use the literal code `x[i,,,,drop=FALSE]' or `eval(mc)':
>
> subset_ROW4 <-
>      function(x, i, useLiteral=FALSE)
> {
>     literal <- quote(x[i,,,,drop=FALSE])
>     mc <- quote(x[i])
>     nd <- max(1L, length(dim(x)))
>     mc[seq(4,length=nd-1L)] <- rep(TRUE, nd-1L)
>     mc[["drop"]] <- FALSE
>     if (useLiteral)
>         eval(literal)
>     else
>         eval(mc)
>  }
>
> I get identical times with
>
> system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),TRUE))
>
> and with
>
> system.time(for (i in 1:10000) subset_ROW4(arr,seq(1,length=10,by=100),FALSE))

I think that's because you used a relatively low-precision timing
mechanism, and included the index generation in the timing. I see:

arr <- array(rnorm(2^22),c(2^10,4,4,4))
i <- seq(1,length = 10, by = 100)

bench::mark(
  arr[i, TRUE, TRUE, TRUE],
  arr[i, , , ]
)
#> # A tibble: 2 x 1
#>   expression        min    mean   median      max  n_gc
#>   <chr>         <bch:t> <bch:t> <bch:tm> <bch:tm> <dbl>
#> 1 arr[i, TRUE,…   7.4µs  10.9µs  10.66µs   1.22ms     2
#> 2 arr[i, , , ]   7.06µs   8.8µs   7.85µs 538.09µs     2

So not a huge difference, but it's there.
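
For readers without the bench package, the same comparison can be roughly approximated in base R by building the index once, outside the timed code. This is only a sketch: system.time() is much coarser than bench::mark(), the array is smaller than the one above to keep it quick, and the names `n`, `t_true` and `t_missing` are illustrative.

```r
arr <- array(rnorm(2^14), c(2^8, 4, 4, 4))
i <- seq(1, length.out = 10, by = 20)  # built once, outside the timed code

n <- 2000L
t_true    <- system.time(for (k in seq_len(n)) arr[i, TRUE, TRUE, TRUE])
t_missing <- system.time(for (k in seq_len(n)) arr[i, , , ])

# Elapsed seconds for each variant; exact numbers vary by machine and run
c(true = t_true[["elapsed"]], missing = t_missing[["elapsed"]])

# Both forms return the same values, so only the timing can differ
stopifnot(identical(arr[i, TRUE, TRUE, TRUE], arr[i, , , ]))
```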

Hadley


--
http://hadley.nz


Re: Subsetting the "ROW"s of an object

Hervé Pagès-2
A missing subscript is still preferable to a TRUE though because it
carries the meaning "take it all". A TRUE also achieves this but via
implicit recycling. For example x[ , , ] and x[TRUE, TRUE, TRUE]
achieve the same thing (if length(x) != 0) and are both no-ops but
the subsetting code gets a chance to immediately and easily detect
the former as a no-op whereas it will probably not be able to do it
so easily for the latter. So in this case it will most likely generate
a copy of 'x' and fill the new array by taking a full walk on it.

H.


Re: Subsetting the "ROW"s of an object

Berry, Charles
In reply to this post by hadley wickham


> On Jun 8, 2018, at 11:52 AM, Hadley Wickham <[hidden email]> wrote:
> I think that's because you used a relatively low-precision timing
> mechanism, and included the index generation in the timing. I see:
>
> arr <- array(rnorm(2^22),c(2^10,4,4,4))
> i <- seq(1,length = 10, by = 100)
>
> bench::mark(
>  arr[i, TRUE, TRUE, TRUE],
>  arr[i, , , ]
> )
> #> # A tibble: 2 x 1
> #>   expression        min    mean   median      max  n_gc
> #>   <chr>         <bch:t> <bch:t> <bch:tm> <bch:tm> <dbl>
> #> 1 arr[i, TRUE,…   7.4µs  10.9µs  10.66µs   1.22ms     2
> #> 2 arr[i, , , ]   7.06µs   8.8µs   7.85µs 538.09µs     2
>
> So not a huge difference, but it's there.


Funny. I get results similar to yours above, albeit with smaller differences, usually < 5 percent.

But with subset_ROW4 I see no consistent difference.

In this example, it runs faster on average using `eval(mc)' to return the result:

> arr <- array(rnorm(2^22),c(2^10,4,4,4))
> i <- seq(1,length=10,by=100)
> bench::mark(subset_ROW4(arr,i,FALSE), subset_ROW4(arr,i,TRUE))[,1:8]
# A tibble: 2 x 8
  expression                      min     mean   median      max `itr/sec` mem_alloc  n_gc
  <chr>                      <bch:tm> <bch:tm> <bch:tm> <bch:tm>     <dbl> <bch:byt> <dbl>
1 subset_ROW4(arr, i, FALSE)   28.9µs   34.9µs   32.1µs   1.36ms    28686.    5.05KB     5
2 subset_ROW4(arr, i, TRUE)    28.9µs     35µs   32.4µs 875.11µs    28572.    5.05KB     5
>

And on subsequent reps the lead switches back and forth.


Chuck


Re: Subsetting the "ROW"s of an object

hadley wickham
Hmmm, yes, there must be some special case in the C code to avoid
recycling a length-1 logical vector:

dims <- c(4, 4, 4, 1e5)

arr <- array(rnorm(prod(dims)), dims)
dim(arr)
#> [1]      4      4      4 100000
i <- c(1, 3)

bench::mark(
  arr[i, TRUE, TRUE, TRUE],
  arr[i, , , ]
)[c("expression", "min", "mean", "max")]
#> # A tibble: 2 x 4
#>   expression                    min     mean      max
#>   <chr>                    <bch:tm> <bch:tm> <bch:tm>
#> 1 arr[i, TRUE, TRUE, TRUE]   41.8ms   43.6ms   46.5ms
#> 2 arr[i, , , ]               41.7ms   43.1ms   46.3ms



--
http://hadley.nz


Re: Subsetting the "ROW"s of an object

Michael Lawrence-3
Actually, it's sort of the opposite. Everything becomes a sequence of
integers internally, even when the argument is missing. So the same
amount of work is done, basically. ALTREP will let us improve this
sort of thing.
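
As a rough illustration of that equivalence (a sketch only: the small array and the variable names are made up here, and it checks results at the R level rather than what the C code does internally), a missing subscript, a length-one TRUE, and an explicit integer sequence all select the same elements:

```r
# Hypothetical small array; names are illustrative, not from the thread.
arr <- array(seq_len(24), c(4, 3, 2))
i <- c(1, 3)

# Three ways of saying "take everything" along dimensions 2 and 3.
by_missing <- arr[i, , , drop = FALSE]
by_true    <- arr[i, TRUE, TRUE, drop = FALSE]
by_seq     <- arr[i, seq_len(dim(arr)[2]), seq_len(dim(arr)[3]), drop = FALSE]

# All three give the same array.
stopifnot(identical(by_missing, by_true), identical(by_missing, by_seq))
dim(by_missing)
```

This says nothing about speed; it only confirms that the three spellings are interchangeable when no dimension has zero extent.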

Michael


Re: Subsetting the "ROW"s of an object

Hervé Pagès-2
In reply to this post by hadley wickham
The C code for subsetting doesn't need to recycle a logical subscript.
It only needs to walk along it and start again at the beginning of the
vector when it reaches the end. That is not exactly the same as detecting
the "take everything along that dimension" situation, though:
x[TRUE, TRUE, TRUE] triggers the full subsetting machinery, whereas x[]
and x[ , , ] could (and should) easily avoid it.
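
A small sketch of the difference at the R level (illustrative only; the objects x and m are made up here, and the zero-extent case reuses the example reported earlier in this thread):

```r
# A short logical subscript is reused cyclically along the vector.
x <- 1:6
every_other <- x[c(TRUE, FALSE)]

# A missing subscript means "everything", even when the extent is 0.
m <- matrix(raw(0), nrow = 5, ncol = 0)
kept <- m[1:3, , drop = FALSE]
dim(kept)

# A length-one TRUE is a real logical subscript, so it goes through the
# length check; earlier in the thread this was reported to fail with
# "(subscript) logical subscript too long". The outcome is captured here
# rather than assumed.
res <- tryCatch(m[1:3, TRUE, drop = FALSE],
                error = function(e) conditionMessage(e))
```

So TRUE and a missing argument agree only as long as every dimension has at least one element.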

H.

--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [hidden email]
Phone:  (206) 667-5791
Fax:    (206) 667-1319


Re: Subsetting the "ROW"s of an object

Berry, Charles
In reply to this post by hadley wickham


> On Jun 8, 2018, at 1:49 PM, Hadley Wickham <[hidden email]> wrote:
>
> Hmmm, yes, there must be some special case in the C code to avoid
> recycling a length-1 logical vector:


Here is a version that (I think) handles Hervé's issue of arrays having one or more zero-extent dimensions.

subset_ROW <-
    function(x,i)
{
    dims <- dim(x)
    index_list <- which(dims[-1] != 0L) + 3
    mc <- quote(x[i])
    nd <- max(1L, length(dims))
    mc[ index_list ] <- list(TRUE)
    mc[[ nd + 3L ]] <- FALSE
    names( mc )[ nd+3L ] <- "drop"
    eval(mc)
}

Curiously enough the timing is *much* better for this implementation than for the first version I sent.

Constructing a version of `mc' that looks like `x[i,,,,drop=FALSE]' can be done with `alist(a=)' in place of `list(TRUE)' in the earlier version but seems to slow things down noticeably. It requires almost twice (!!) as much time as the version above.

Best,

Chuck

Re: Subsetting the "ROW"s of an object

hadley wickham
On Fri, Jun 8, 2018 at 2:09 PM, Berry, Charles <[hidden email]> wrote:

> Constructing a version of `mc' that looks like `x[i,,,,drop=FALSE]' can be done with `alist(a=)' in place of `list(TRUE)' in the earlier version but seems to slow things down noticeably. It requires almost twice (!!) as much time as the version above.

I think that's probably because alist() is a slow way to generate a
missing symbol:

bench::mark(
  alist(x = ),
  list(x = quote(expr = )),
  check = FALSE
)[1:5]
#> # A tibble: 2 x 5
#>   expression                    min     mean   median      max
#>   <chr>                    <bch:tm> <bch:tm> <bch:tm> <bch:tm>
#> 1 alist(x = )                 2.8µs   3.54µs   3.29µs   34.9µs
#> 2 list(x = quote(expr = ))    169ns 219.38ns    181ns   24.2µs

(note the units)
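
For what it's worth, the two constructions should be interchangeable apart from speed. A minimal sketch (the matrix m and the variable names are invented for illustration) that compares them and then uses the cheaper one to build a call with a missing argument:

```r
# Both should yield a one-element list holding the empty symbol.
from_alist <- alist(x = )
from_quote <- list(x = quote(expr = ))
same <- identical(from_alist, from_quote)

# Build m[2:3, , drop = FALSE] programmatically from its pieces.
m <- matrix(1:6, nrow = 3)
call_rows <- as.call(c(
  list(as.name("["), quote(m), 2:3),
  list(quote(expr = )),          # the missing column subscript
  list(drop = FALSE)
))
rows <- eval(call_rows)
```

Note that the empty symbol is kept inside a list throughout; binding it directly to a variable and then referring to that variable would raise an "argument is missing" error.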

Hadley


--
http://hadley.nz


Re: Subsetting the "ROW"s of an object

Berry, Charles


> On Jun 8, 2018, at 2:15 PM, Hadley Wickham <[hidden email]> wrote:
>
> I think that's probably because alist() is a slow way to generate a
> missing symbol:
>
> bench::mark(
>  alist(x = ),
>  list(x = quote(expr = )),
>  check = FALSE
> )[1:5]
> #> # A tibble: 2 x 5
> #>   expression                    min     mean   median      max
> #>   <chr>                    <bch:tm> <bch:tm> <bch:tm> <bch:tm>
> #> 1 alist(x = )                 2.8µs   3.54µs   3.29µs   34.9µs
> #> 2 list(x = quote(expr = ))    169ns 219.38ns    181ns   24.2µs
>
> (note the units)

Yes. That accounts for about half the difference, and I guess the rest comes from getting rid of seq(). This seems a bit quicker than anything else and satisfies Hervé's objections:

subset_ROW <-
      function(x,i)
  {
      dims <- dim(x)
      nd <- length(dims)
      index_list <- if (nd > 1) 2L + 2L:nd else 0
      mc <- quote(x[i])
      mc[ index_list ] <- list(quote(expr=))
      mc[[ "drop" ]] <- FALSE
      eval(mc)
  }
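
A quick check of that version against the cases from the start of the thread, plus the zero-column matrix (a sketch only: the function is restated so the snippet runs on its own, and the test objects are made up for illustration):

```r
# Restated from the post above so the snippet is self-contained.
subset_ROW <- function(x, i) {
  nd <- length(dim(x))
  # Positions 4..(nd + 2) of the call x[i] hold the trailing subscripts.
  index_list <- if (nd > 1) 2L + 2L:nd else 0
  mc <- quote(x[i])
  mc[index_list] <- list(quote(expr = ))  # fill with missing arguments
  mc[["drop"]] <- FALSE
  eval(mc)
}

v   <- subset_ROW(1:10, 4:6)                             # plain vector
m   <- subset_ROW(array(1:10, c(5, 2)), 2:4)             # matrix
a3  <- subset_ROW(array(1:10, c(10, 1, 1)), 2:4)         # 3-d array
z   <- subset_ROW(matrix(raw(0), nrow = 5, ncol = 0), 1:3)  # zero columns
dfr <- subset_ROW(data.frame(x = 1:10, y = 10:1), 2:4)   # data frame

dim(m); dim(a3); dim(z); nrow(dfr)
```

Because the trailing subscripts are missing rather than TRUE, the zero-column case goes through without the "logical subscript too long" error.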

Chuck

Re: Subsetting the "ROW"s of an object

Hervé Pagès-2
In reply to this post by hadley wickham


On 06/08/2018 02:15 PM, Hadley Wickham wrote:

> I think that's probably because alist() is a slow way to generate a
> missing symbol:
>
> bench::mark(
>    alist(x = ),
>    list(x = quote(expr = )),
>    check = FALSE
> )[1:5]
> #> # A tibble: 2 x 5
> #>   expression                    min     mean   median      max
> #>   <chr>                    <bch:tm> <bch:tm> <bch:tm> <bch:tm>
> #> 1 alist(x = )                 2.8µs   3.54µs   3.29µs   34.9µs
> #> 2 list(x = quote(expr = ))    169ns 219.38ns    181ns   24.2µs
>
> (note the units)

That's a good one. Need to change this in S4Vectors::default_extractROWS()
and other places. Thanks!

H.

--
Hervé Pagès
