Error in unsplit() with tibbles

Mario Annau
Hello,

using the `unsplit()` function with tibbles currently leads to the
following error:

> library(tibble)
> mtcars_tb <- as_tibble(mtcars, rownames = NULL)
> s <- split(mtcars_tb, mtcars_tb$gear)
> unsplit(s, mtcars_tb$gear)
 Error: Must subset rows with a valid subscript vector.
ℹ Logical subscripts must match the size of the indexed input.
x Input has size 15 but subscript `rep(NA, len)` has size 32.
Run `rlang::last_error()` to see where the error occurred.

Tibble seems to (rightly) complain that the logical vector used for
subsetting does not have the same length as the number of rows of the
data frame. Since `NA` is a logical value, the subscript should be changed
to `NA_integer_` in `unsplit()`:

> unsplit
function (value, f, drop = FALSE)
{
    len <- length(if (is.list(f)) f[[1L]] else f)
    if (is.data.frame(value[[1L]])) {
        x <- value[[1L]][rep(NA_integer_, len), , drop = FALSE]
        rownames(x) <- unsplit(lapply(value, rownames), f, drop = drop)
    }
    else x <- value[[1L]][rep(NA, len)]
    split(x, f, drop = drop) <- value
    x
}
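
The proposal above can be tried without touching base R. The following is a minimal sketch under stated assumptions: `unsplit_int_na` is a hypothetical name for a local copy of the function with the change applied, and the non-data-frame branch is switched to `NA_integer_` as well, which goes beyond what this message proposes. It uses a plain data frame and a plain vector, so it only checks that the change keeps the existing round trip intact, not that it fixes the tibble case.

```r
# Hypothetical local copy of unsplit() with integer NA subscripts.
# Not the base R source; the name and the change to the else branch
# are assumptions made for this sketch.
unsplit_int_na <- function(value, f, drop = FALSE) {
  len <- length(if (is.list(f)) f[[1L]] else f)
  if (is.data.frame(value[[1L]])) {
    # One all-NA row per element of f, regardless of nrow(value[[1L]])
    x <- value[[1L]][rep(NA_integer_, len), , drop = FALSE]
    rownames(x) <- unsplit(lapply(value, rownames), f, drop = drop)
  } else {
    x <- value[[1L]][rep(NA_integer_, len)]
  }
  # Fill the NA skeleton group by group
  split(x, f, drop = drop) <- value
  x
}

# Round trip on a plain data frame
s <- split(mtcars, mtcars$gear)
roundtrip <- unsplit_int_na(s, mtcars$gear)

# Round trip on a plain vector
g <- c(1, 2, 1, 2, 1, 2)
vec_roundtrip <- unsplit_int_na(split(1:6, g), g)
```

Comparing `roundtrip$mpg` with `mtcars$mpg`, or `rownames(roundtrip)` with `rownames(mtcars)`, is a quick way to confirm that the rows come back in their original order.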

Cheers,
Mario

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: Error in unsplit() with tibbles

Marc Schwartz via R-devel

> On Nov 21, 2020, at 10:55 AM, Mario Annau <[hidden email]> wrote:

Hi,

Perhaps I am missing something, but if you are using objects, like tibbles, that are intended to be part of another environment, in this case the tidyverse, why would you not manipulate them with the functions that were created specifically for that environment?

I don't use the tidyverse, but it seems to me that expecting base R functions to work with objects not created in base R is problematic, even though, perhaps by coincidence, they may work without adverse effects, as appears to be the case with split().

In other words, you should not, in reality, have had an a priori expectation that split() would work with a tibble either.

Rather than modifying base R functions like unsplit() to be compatible with these third-party objects, as you are suggesting, the burden should be either on you to use the relevant tidyverse functions or on the authors of the tidyverse to supply class methods that provide that functionality.

Regards,

Marc Schwartz

Re: Error in unsplit() with tibbles

Peter Dalgaard-2
In reply to this post by Mario Annau
Yes. Never mind tibbles: the [rep(NA, len), ] construction only happens to work because len will always be >= the number of rows in value[[1L]]. Witness:

> (1:10)[rep(NA, 20)]
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> (1:20)[rep(NA, 10)]
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> (1:20)[rep(NA_integer_, 10)]
 [1] NA NA NA NA NA NA NA NA NA NA
> (1:10)[rep(NA_integer_, 20)]
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
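
The four results above follow from how base R treats logical versus integer subscripts. A minimal sketch that tabulates only the result lengths (the vector `x` and the name `result_lengths` are illustrative, not from the thread):

```r
x <- 1:10

# A bare NA is logical; NA_integer_ is the integer missing value.
na_types <- c(plain = typeof(NA), integer = typeof(NA_integer_))

result_lengths <- c(
  logical_short = length(x[rep(NA, 5)]),           # logical subscript shorter than x
  logical_long  = length(x[rep(NA, 20)]),          # logical subscript longer than x
  integer_short = length(x[rep(NA_integer_, 5)]),  # integer subscript shorter than x
  integer_long  = length(x[rep(NA_integer_, 20)])  # integer subscript longer than x
)

print(na_types)
print(result_lengths)
```

A logical subscript shorter than the vector is recycled to the vector's length, so the result is never shorter than `x`. An integer subscript yields exactly one element per entry. The two agree only when the subscript is at least as long as the input, which is the situation `unsplit()` happens to be in.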

-pd


> On 21 Nov 2020, at 16:55 , Mario Annau <[hidden email]> wrote:

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]

Re: Error in unsplit() with tibbles

Peter Dalgaard-2
In reply to this post by Marc Schwartz via R-devel
I get the sentiment, but this is really just bad coding (on my own part, I suspect), so we might as well just fix it...

-pd

> On 21 Nov 2020, at 17:42 , Marc Schwartz via R-devel <[hidden email]> wrote:

Re: Error in unsplit() with tibbles

Mario Annau-2
Cool - thank you Peter!

@Marc: This is really not a tidyverse vs. base R debate, and I personally
think that the two should work together for the most part. The common
environment is still R. But just to give you the full picture, I also filed
a bug for tibbles (https://github.com/tidyverse/tibble/issues/829). With
these two fixes I think that split/unsplit would work for tibbles, and users
(like me) would not have to care which "environments" they are working
in.
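
On installations where the error from the first message still occurs, one possible workaround is to drop the tibble class from each piece before calling `unsplit()` and convert the result back afterwards. This is a sketch, not something proposed in the thread: it assumes the tibble package is installed (it does nothing otherwise), and the names `plain_pieces` and `restored` are illustrative.

```r
# Workaround sketch; skipped entirely if tibble is not installed.
if (requireNamespace("tibble", quietly = TRUE)) {
  mtcars_tb <- tibble::as_tibble(mtcars, rownames = NULL)
  s <- split(mtcars_tb, mtcars_tb$gear)

  # Convert each piece to a plain data frame so that base R
  # subsetting rules apply inside unsplit().
  plain_pieces <- lapply(s, as.data.frame)

  # Reassemble with base unsplit(), then convert back to a tibble.
  restored <- tibble::as_tibble(unsplit(plain_pieces, mtcars_tb$gear))
}
```

The cost is an extra copy of each piece, which is unlikely to matter for data of this size.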

Cheers,
Mario


On Sat, 21 Nov 2020 at 17:54, Peter Dalgaard <[hidden email]> wrote:

--
Mario Annau
Founder and CEO
Quantargo

Tel: +43 1 348 44 55-11 | [hidden email]
www.quantargo.com

Re: Error in unsplit() with tibbles

Marc Schwartz via R-devel
Hi,

Peter, thanks for the clarification.


Mario, I was not looking to debate the pros and cons of each environment, simply to point out that mutually compatible functionality cannot be expected in general, especially when third-party authors can make structural changes to their objects over time that make them incompatible with base R functions, even if they are compatible today.

That is a key basis for third party packages offering specific class methods, whether S3 or S4, for object classes that are unique to their packages. That approach provides the obvious level of transparency.

For the tidyverse folks to offer variants of split() and unsplit() that have specific methods for tibbles would seem entirely reasonable, presuming that they don't have a philosophical barrier to doing so, in deference to other approaches that do conform to their preferred function syntax.
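
To make the class-method idea concrete, here is a minimal S3 sketch. Everything named in it is hypothetical: base `unsplit()` is an ordinary function rather than an S3 generic, so the sketch defines its own generic `unsplit2()`, and `strict_df` is a made-up data frame subclass standing in for a third-party class. It is not how tibble or any other package actually implements this.

```r
# Hypothetical generic that dispatches on the class of the first piece.
unsplit2 <- function(value, f, ...) UseMethod("unsplit2", value[[1L]])

# Fallback: defer to base unsplit() for everything else.
unsplit2.default <- function(value, f, ...) unsplit(value, f, ...)

# Method for the made-up subclass: strip the class, reuse base
# unsplit() on plain data frames, then restore the class.
unsplit2.strict_df <- function(value, f, ...) {
  cls <- class(value[[1L]])
  plain <- lapply(value, function(piece) {
    class(piece) <- "data.frame"
    piece
  })
  out <- unsplit(plain, f, ...)
  class(out) <- cls
  out
}

df <- data.frame(g = c("a", "b", "a"), v = 1:3)
class(df) <- c("strict_df", "data.frame")

pieces <- split(df, df$g)
restored <- unsplit2(pieces, df$g)
```

With this layout the package that owns the class also owns the compatibility code, which is the transparency argument made above.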

Regards,

Marc


> On Nov 21, 2020, at 12:04 PM, Mario Annau <[hidden email]> wrote: