nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

Gabriel Becker-2
Hi all,

Apologies if this has been asked before (a quick Google search didn't find
it for me), and I know this is a case of behaving as documented, but it's so
unintuitive (to me at least) that I figured I'd bring it up here anyway. I
figure it's probably not going to be changed, but I'm happy to submit a
patch if this is something R-core feels can/should change.

So I recently got bitten by the fact that

> nrow(rbind(character(), character()))

[1] 2


I was checking whether the result of an rbind call had more than one row,
and that unexpectedly returned TRUE, causing all sorts of shenanigans
downstream, as I'm sure you can imagine.
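
For reference, a minimal sketch of the shape involved (the variable name `m` is only for illustration): the result has rows but no columns and no elements, so a check on nrow() alone cannot tell it apart from a matrix that holds data.

```r
# Row-binding two zero-length character vectors
m <- rbind(character(), character())

dim(m)       # 2 rows, 0 columns
length(m)    # 0: the matrix holds no elements at all

# The check that went wrong: TRUE even though there is no data
more_than_one_row <- nrow(m) > 1
more_than_one_row
```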

Now I know that from ?rbind

> For ‘cbind’ (‘rbind’), vectors of zero length (including ‘NULL’)
> are ignored unless the result would have zero rows (columns), for
> S compatibility.  (Zero-extent matrices do not occur in S3 and are
> not ignored in R.)

But there are a couple of things here. First, for the rbind case this
reads as "if there would be zero columns, the vectors will not be
ignored". This wording implies to me that not ignoring the vectors is a
remedy to the "problem" of the potential for a zero-column return, but
that's not the case. The result still has 0 columns; it just does not also
have zero rows. So even if the behavior is not changed, perhaps this
wording can be massaged for clarity?

The other issue, which I admit is likely a problem with my intuition, but
which I don't think I'm alone in having, is that even if I can't have a 0x0
matrix (which is what I'd prefer), I would have expected/preferred a 1x0
matrix. The reasoning is that if we must avoid a 0x0 return value, we
would do the minimum required to avoid it, which is to not ignore the first
length-0 vector, to ensure a non-zero-extent matrix, but then ignore the
remaining ones, as they contain information for 0 new rows.

Of course I can program around this now that I know the behavior, but
again, it's so unintuitive (even for someone with a fairly well developed
intuition for R's sometimes "quirky" behavior) that I figured I'd bring it
up.
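
One possible way to program around it, as a sketch only: `has_multiple_data_rows()` is a hypothetical helper, not part of base R, and it simply requires at least one column in addition to more than one row.

```r
# Hypothetical helper (not part of base R): TRUE only when the matrix has
# more than one row AND at least one column, so a 2 x 0 result does not count.
has_multiple_data_rows <- function(m) {
  is.matrix(m) && nrow(m) > 1L && ncol(m) > 0L
}

empty_case <- has_multiple_data_rows(rbind(character(), character()))  # 2 x 0 input
full_case  <- has_multiple_data_rows(rbind(c("a", "b"), c("c", "d")))  # 2 x 2 input
```

Whether a zero-column matrix should count as "having rows" depends on the calling code; this version treats it as having none.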

Thoughts?

Best,
~G

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

hadley wickham
The existing behaviour seems intuitive to me. I would consider these
invariants for n vectors x_i, each of size m:

* nrow(rbind(x_1, x_2, ..., x_n)) equals n
* ncol(rbind(x_1, x_2, ..., x_n)) equals m

Additionally, wouldn't you expect rbind(x_1[i], x_2[i]) to equal
rbind(x_1, x_2)[, i, drop = FALSE] ?
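
As a sketch, these invariants can be checked for n = 2 under the current behaviour. The subsetting invariant is compared on dimensions only, because rbind() derives row names from the symbols x_1 and x_2 but not from the expressions x_1[i] and x_2[i].

```r
x_1 <- c("a", "b", "c")
x_2 <- c("d", "e", "f")

# n = 2 vectors of size m = 3
m <- rbind(x_1, x_2)
nrow(m) == 2
ncol(m) == 3

# The same invariants at m = 0
e <- rbind(character(), character())
nrow(e) == 2
ncol(e) == 0

# Subsetting invariant with an empty index, compared on dimensions
i <- integer(0)
same_dims <- identical(dim(rbind(x_1[i], x_2[i])), dim(m[, i, drop = FALSE]))
same_dims
```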

Hadley

--
http://hadley.nz

Re: nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

robin hankin-3
Gabriel, you ask an insightful and instructive question. One of R's
great strengths is that we have a forum where this kind of edge-case
can be fruitfully discussed.
My interest in this would be the names of the arguments; in the magic
package I make heavy use of the dimnames of zero-extent arrays.

> rbind(a='x',b='y')
  [,1]
a "x"
b "y"

> rbind(a='x',b=character())
  [,1]
a "x"

> rbind(a=character(),b=character())

a
b

The first and third idioms are fine. The result of the second one, in
which we rbind() a length-one vector to a length-zero vector, is desirable
IMO on the grounds that the content of a two-row matrix cannot be
defined sensibly, so R takes the perfectly reasonable stance of
deciding to ignore the second argument, which carries with it the
implication that the name ('b') be ignored too. If the second
argument *could* be recycled, I would want the name; otherwise I
wouldn't. And this is what R does.
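
The same three cases, looking only at the row names (a small sketch; the variable names r1 to r3 are just for illustration):

```r
# Row names survive only for arguments that contribute a row
r1 <- rownames(rbind(a = "x", b = "y"))                  # both names kept
r2 <- rownames(rbind(a = "x", b = character()))          # "b" is dropped with its vector
r3 <- rownames(rbind(a = character(), b = character()))  # both names kept on a 2 x 0 matrix
```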

best wishes,


Re: nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

Gabriel Becker-2
In reply to this post by hadley wickham
Hi Hadley,

Thanks for the counterpoint. Response below.

On Thu, May 16, 2019 at 1:59 PM Hadley Wickham <[hidden email]> wrote:

> The existing behaviour seems intuitive to me. I would consider these
> invariants for n vectors x_i, each of size m:
>
> * nrow(rbind(x_1, x_2, ..., x_n)) equals n
>

Personally, no, I wouldn't. I would consider m == 0 a degenerate case where
there is no data, and I personally find matrices (or data.frames) with rows
but no columns a very strange concept. The converse is not true: I
understand the utility of columns but no rows, particularly in the
data.frame case, but rows with no columns are observations we didn't
observe anything about. Strange, IMHO.

Also, I know that you said *each with size m*, but the generalization would
be

for n vectors with m = max(length(x_i))
nrow(rbind(x_1, ..., x_n)) = m

And that is the behavior now as documented, but *only* when length(x_i) >0
for all i (or, currently, when m == 0, so all vectors are length 0).

> nrow(rbind(1:5, numeric()))

[1] 1


So that is where I was coming from. Length-zero vectors don't add rows
because they contain no observed information.

I do see where you're coming from, but it does make interrogating
nrow(rbind(x_1, ..., x_n)) NOT mean "give me the number of observations
for which I have data", which is what it means in non-degenerate contexts,
and that seems pretty important too.
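
A sketch of that reading: `n_nonempty()` is a hypothetical helper (not base R) that counts the inputs actually carrying data. It agrees with nrow() in the mixed case and disagrees in the all-empty case.

```r
# Hypothetical helper: number of inputs that actually carry data.
# Assumes every argument is a plain vector, not a matrix.
n_nonempty <- function(...) sum(lengths(list(...)) > 0L)

# Mixed case: nrow() agrees with the count of non-empty vectors
nrow(rbind(1:5, numeric()))        # 1
n_nonempty(1:5, numeric())         # 1

# All-empty case: nrow() reports 2 although no vector carries data
nrow(rbind(numeric(), numeric()))  # 2
n_nonempty(numeric(), numeric())   # 0
```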

Robin does also have an interesting point below about argument names, but
I'll leave that for another mail.

Best,
~G

Re: nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

Gabriel Becker-2
On Thu, May 16, 2019 at 3:47 PM Gabriel Becker <[hidden email]>
wrote:

> Hi Hadley,
>
> Thanks for the counterpoint. Response below.
>
> On Thu, May 16, 2019 at 1:59 PM Hadley Wickham <[hidden email]>
> wrote:
>
>> The existing behaviour seems intuitive to me. I would consider these
>> invariants for n vectors x_i, each of size m:
>>
>> * nrow(rbind(x_1, x_2, ..., x_n)) equals n
>>
>
> Personally, no I wouldn't. I would consider m==0 a degenerate case, where
> there is no data, but I personally find matrices (or data.frames) with rows
> but no columns a very strange concept. The converse is not true, I
> understand the utility of columns but no rows, particularly in the
> data.frame case, but rows with no columns are observations we didn't
> observe anything about. Strange, imho.
>
> Also, I know that you said *each with size m*, but the generalization
> would be
>
> for n vectors with m = max(length(x_i))
> nrow(rbind(x_1, ..., x_n)) = m
>

Ugh, obviously that should say ==n, not =m and then we have
ncol(rbind(x_1, ..., x_n)) == m
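
For vectors of unequal but non-zero length, the corrected statement can be checked with a small sketch (the list `x` is made up for illustration; shorter vectors are recycled to the longest length):

```r
# Three vectors of lengths 4, 2 and 1
x <- list(1:4, 1:2, 1)
n <- length(x)          # number of vectors
m <- max(lengths(x))    # longest length

r <- do.call(rbind, x)
nrow(r) == n   # one row per vector
ncol(r) == m   # one column per element of the longest vector
```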

~G



Re: nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

Pages, Herve
In reply to this post by Gabriel Becker-2
Hi Gabe,

   ncol(data.frame(aa=c("a", "b", "c"), AA=c("A", "B", "C")))
   # [1] 2

   ncol(data.frame(aa="a", AA="A"))
   # [1] 2

   ncol(data.frame(aa=character(0), AA=character(0)))
   # [1] 2

   ncol(cbind(aa=c("a", "b", "c"), AA=c("A", "B", "C")))
   # [1] 2

   ncol(cbind(aa="a", AA="A"))
   # [1] 2

   ncol(cbind(aa=character(0), AA=character(0)))
   # [1] 2

   nrow(rbind(aa=c("a", "b", "c"), AA=c("A", "B", "C")))
   # [1] 2

   nrow(rbind(aa="a", AA="A"))
   # [1] 2

   nrow(rbind(aa=character(0), AA=character(0)))
   # [1] 2

hmmm... not sure why ncol(cbind(aa=character(0), AA=character(0))) or
nrow(rbind(aa=character(0), AA=character(0))) should do anything
different from what they do.

In my experience, and more generally speaking, the desire to treat
0-length vectors as a special case that deviates from the
non-zero-length case has never been productive.

H.


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [hidden email]
Phone:  (206) 667-5791
Fax:    (206) 667-1319

Re: nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

Gabriel Becker-2
Hi Herve,

Inline.



On Thu, May 16, 2019 at 4:45 PM Pages, Herve <[hidden email]> wrote:

> Hi Gabe,
>
>    ncol(data.frame(aa=c("a", "b", "c"), AA=c("A", "B", "C")))
>    # [1] 2
>
>    ncol(data.frame(aa="a", AA="A"))
>    # [1] 2
>
>    ncol(data.frame(aa=character(0), AA=character(0)))
>    # [1] 2
>
>    ncol(cbind(aa=c("a", "b", "c"), AA=c("A", "B", "C")))
>    # [1] 2
>
>    ncol(cbind(aa="a", AA="A"))
>    # [1] 2
>
>    ncol(cbind(aa=character(0), AA=character(0)))
>    # [1] 2
>
>    nrow(rbind(aa=c("a", "b", "c"), AA=c("A", "B", "C")))
>    # [1] 2
>
>    nrow(rbind(aa="a", AA="A"))
>    # [1] 2
>
>    nrow(rbind(aa=character(0), AA=character(0)))
>    # [1] 2
>

Sure, but

> nrow(rbind(aa = c("a", "b", "c"), AA = c("a", "b", "c")))

[1] 2

> nrow(rbind(aa = c("a", "b", "c"), AA = "a"))

[1] 2

> nrow(rbind(aa = c("a", "b", "c"), AA = character()))
[1] 1

So even if I ultimately "lose" this debate (which really wouldn't shock
me; even if R-core did agree with me, there's backwards compatibility to
consider), you have to concede that the current behavior is more
complicated than the above acknowledges.

By the invariants that you and Hadley are advocating, as far as I
understand them, the last call should give 2 rows, one of which is all
NAs, rather than only one row as it currently does (and, I assume,
always has).

So there are two different behavior patterns that could coherently (and
internally consistently) be generalized to apply to the rbind(character(),
character()) case, not just one. I'm making the case that the other one
(that length-0 vectors do not add rows because they don't contain data)
would be equally valid and, to N>1 people, at least equally intuitive.
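
To make the first pattern concrete, here is a sketch of what "one row per argument" could look like in the mixed case. `rbind_keep_empty()` is a hypothetical helper, not base R and not a proposal from this thread; it replaces each zero-length vector with a single NA before binding.

```r
# Hypothetical helper (not base R): replace each zero-length vector by a
# single NA so that every argument contributes a row. Assumes plain vectors.
rbind_keep_empty <- function(...) {
  args <- list(...)
  args[lengths(args) == 0L] <- list(NA)
  do.call(rbind, args)
}

m <- rbind_keep_empty(aa = c("a", "b", "c"), AA = character())
dim(m)       # 2 rows, 3 columns
m["AA", ]    # the row for the empty vector is all NA
```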

Best,
~G

Re: nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

Pages, Herve
On 5/16/19 17:48, Gabriel Becker wrote:

> Sure, but
>
> > nrow(rbind(aa = c("a", "b", "c"), AA = c("a", "b", "c")))
> [1] 2
>
> > nrow(rbind(aa = c("a", "b", "c"), AA = "a"))
> [1] 2
>
> > nrow(rbind(aa = c("a", "b", "c"), AA = character()))
> [1] 1


Ah, I see now.

But:

  > data.frame(aa = c("a", "b", "c"), AA = character())
  Error in data.frame(aa = c("a", "b", "c"), AA = character()) :
    arguments imply differing number of rows: 3, 0

and

  > mapply(`*`, 1:5, integer(0))
  Error in mapply(`*`, 1:5, integer(0)) :
    zero-length inputs cannot be mixed with those of non-zero length

So I would declare rbind(aa = c("a", "b", "c"), AA = character()) inconsistent rather than making the case that rbind(aa = character(), AA = character()) needs to change.
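
The three calls can be compared side by side with a small sketch. `is_error()` is a hypothetical helper that captures the condition instead of stopping; the results assume the R behaviour shown in this thread.

```r
# Hypothetical helper: TRUE if evaluating the expression signals an error
is_error <- function(expr) {
  inherits(tryCatch(expr, error = function(e) e), "error")
}

df_fails     <- is_error(data.frame(aa = c("a", "b", "c"), AA = character()))
mapply_fails <- is_error(mapply(`*`, 1:5, integer(0)))
rbind_fails  <- is_error(rbind(aa = c("a", "b", "c"), AA = character()))

# data.frame() and mapply() refuse the mixed-length input;
# rbind() silently drops the zero-length argument instead
c(df_fails, mapply_fails, rbind_fails)
```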

Cheers,

H.


Re: nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

Jan Gorecki
Hi Gabriel

> Personally, no I wouldn't. I would consider m==0 a degenerate case, where
> there is no data, but I personally find matrices (or data.frames) with rows
> but no columns a very strange concept.

This distinction between matrices and data.frames is the crux in this case.
From the dimensional modelling point of view, a matrix can have non-zero
rows and zero columns, but a data.frame (assuming it maps to a database
table structure) should never have non-zero rows and zero columns.
An issue of this kind was raised before in our issue tracker:
https://github.com/Rdatatable/data.table/issues/2422
You should find that discussion useful.
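
For context, a sketch of what base R itself currently permits. This only illustrates the shapes in question; it says nothing about data.table or about what the modelling view recommends.

```r
# A matrix with rows but no columns
m <- matrix(character(), nrow = 2, ncol = 0)
dim(m)    # 2 0

# Base R also lets a data.frame keep its rows after every column is dropped;
# whether it should is the modelling question raised above
d <- data.frame(x = 1:3)[, FALSE, drop = FALSE]
dim(d)    # 3 0
```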

Best,
Jan Gorecki


On Fri, May 17, 2019 at 8:11 AM Pages, Herve <[hidden email]> wrote:

>
> On 5/16/19 17:48, Gabriel Becker wrote:
>
> Hi Herve,
>
> Inline.
>
>
>
> On Thu, May 16, 2019 at 4:45 PM Pages, Herve <[hidden email]<mailto:[hidden email]>> wrote:
> Hi Gabe,
>
>    ncol(data.frame(aa=c("a", "b", "c"), AA=c("A", "B", "C")))
>    # [1] 2
>
>    ncol(data.frame(aa="a", AA="A"))
>    # [1] 2
>
>    ncol(data.frame(aa=character(0), AA=character(0)))
>    # [1] 2
>
>    ncol(cbind(aa=c("a", "b", "c"), AA=c("A", "B", "C")))
>    # [1] 2
>
>    ncol(cbind(aa="a", AA="A"))
>    # [1] 2
>
>    ncol(cbind(aa=character(0), AA=character(0)))
>    # [1] 2
>
>    nrow(rbind(aa=c("a", "b", "c"), AA=c("A", "B", "C")))
>    # [1] 2
>
>    nrow(rbind(aa="a", AA="A"))
>    # [1] 2
>
>    nrow(rbind(aa=character(0), AA=character(0)))
>    # [1] 2
>
> Sure, but
>
>
> > nrow(rbind(aa = c("a", "b", "c"), AA = c("a", "b", "c")))
>
> [1] 2
>
> > nrow(rbind(aa = c("a", "b", "c"), AA = "a"))
>
> [1] 2
>
> > nrow(rbind(aa = c("a", "b", "c"), AA = character()))
>
> [1] 1
>
>
> Ah, I see now.
>
> But:
>
>   > data.frame(aa = c("a", "b", "c"), AA = character())
>   Error in data.frame(aa = c("a", "b", "c"), AA = character()) :
>     arguments imply differing number of rows: 3, 0
>
> and
>
>   > mapply(`*`, 1:5, integer(0))
>   Error in mapply(`*`, 1:5, integer(0)) :
>     zero-length inputs cannot be mixed with those of non-zero length
>
> So I would declare rbind(aa = c("a", "b", "c"), AA = character()) inconsistent rather than making the case that rbind(aa = character(), AA = character()) needs to change.
>
> Cheers,
>
> H.
>
>
> So even if I ultimately "lose" this debate (which really wouldn't shock me; even if R-core did agree with me, there's backwards compatibility to consider), you have to concede that the current behavior is more complicated than the above acknowledges.
>
> By rights of the invariance that you and Hadley are advocating, as far as I understand it, the last should give 2 rows, one of which is all NAs, rather than giving only one row as it currently does (and, I assume, always has).
>
> So there are two different behavior patterns that could coherently (and internally consistently) be generalized to apply to the rbind(character(), character()) case, not just one. I'm making the case that the other one (that length-0 vectors do not add rows because they don't contain data) would be equally valid and, to N>1 people, at least equally intuitive.
>
> Best,
> ~G
>
> hmmm... not sure why ncol(cbind(aa=character(0), AA=character(0))) or
> nrow(rbind(aa=character(0), AA=character(0))) should do anything
> different from what they do.
>
> In my experience, and more generally speaking, the desire to treat
> 0-length vectors as a special case that deviates from the
> non-zero-length case has never been productive.
>
> H.
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: [hidden email]
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

Abby Spurdle
In reply to this post by Gabriel Becker-2
Herve Pages wrote:

> In my experience, and more generally speaking, the desire to treat
> 0-length vectors as a special case that deviates from the
> non-zero-length case has never been productive.

Good idea.

Gabriel Becker wrote:

> > nrow(rbind(aa = c("a", "b", "c"), AA = character()))
> [1] 1

> By rights of the invariance that you and Hadley are advocating,  as far as
> I understand it, the last should give 2 rows, one of which is all NAs,
> rather than giving only one row as it currently does (and, I assume?,
> always has).

I think, ideally, this example should generate an error or a warning.
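
One way to get such an error today is a thin wrapper around rbind. This is only a sketch: rbind_strict is a hypothetical name, not a base R function, and the error message is borrowed from the mapply() example quoted earlier in the thread. It only looks at the lengths of the arguments, so it is meant for the plain-vector case discussed here.

```r
# Hypothetical helper, not part of base R: refuse to mix zero-length
# inputs with non-zero-length ones instead of silently dropping them.
rbind_strict <- function(...) {
  lens <- lengths(list(...))
  if (any(lens == 0L) && any(lens > 0L)) {
    stop("zero-length inputs cannot be mixed with those of non-zero length")
  }
  rbind(...)
}

# Equal-length inputs behave exactly like rbind().
ok <- rbind_strict(aa = c("a", "b", "c"), AA = c("A", "B", "C"))
nrow(ok)   # 2

# The mixed case now signals an error instead of returning one row.
msg <- tryCatch(
  rbind_strict(aa = c("a", "b", "c"), AA = character()),
  error = function(e) conditionMessage(e)
)
```

The all-zero-length case is deliberately left alone, so rbind_strict(character(), character()) still returns the documented 2 x 0 matrix.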


Re: nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

Martin Maechler
In reply to this post by Gabriel Becker-2
>>>>> Gabriel Becker
>>>>>     on Thu, 16 May 2019 15:47:57 -0700 writes:

    > Hi Hadley,
    > Thanks for the counterpoint. Response below.

    > On Thu, May 16, 2019 at 1:59 PM Hadley Wickham <[hidden email]> wrote:

    >> The existing behaviour seems intuitive to me. I would consider these
    >> invariants for n vector x_i's each with size m:
    >>
    >> * nrow(rbind(x_1, x_2, ..., x_n)) equals n
    >>

    > Personally, no I wouldn't. I would consider m==0 a degenerate case, where
    > there is no data, but I personally find matrices (or data.frames) with rows
    > but no columns a very strange concept. The converse is not true, I
    > understand the utility of columns but no rows, particularly in the
    > data.frame case, but rows with no columns are observations we didn't
    > observe anything about. Strange, imho.

Gabe, here I have to very strongly disagree.

Matrices (and higher-order arrays) are always definitely meant to
behave "symmetrically" / "uniformly" with respect to all of their dimensions.

We (and the S developers before us) have always taken a lot of
care trying to ensure that this is true.

So for the matrix case, if rows and columns behaved differently
that would be a bug "by definition".

Of course there's one place where this uniformity / symmetry
must be violated: in the coercion from and to atomic vectors.
There, 'by column' (generalized for arrays to "earlier dimensions vary faster
than later ones") has been chosen, not least because this had
been adopted for Fortran (first, AFAIK) and all related ABIs
dealing with matrix-vector arithmetic, for very good (numerical,
performance, known convention) reasons that made it possible to know how
fast numerical linear algebra should be implemented.
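
The 'by column' convention is easy to check interactively. A minimal sketch in base R (the object names m and a are only for illustration):

```r
# A matrix is filled from, and flattened to, a vector column by column.
m <- matrix(1:6, nrow = 2)
as.vector(m)   # 1 2 3 4 5 6
m[2, 1]        # 2: the second element sits in row 2 of column 1

# For arrays, earlier dimensions vary faster than later ones.
a <- array(1:24, dim = c(2, 3, 4))
a[2, 1, 1]     # 2: a step of 1 along the first dimension
a[1, 2, 1]     # 3: a step of 2 along the second dimension
a[1, 1, 2]     # 7: a step of 6 along the third dimension
```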

Martin


Re: nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

Gabriel Becker-2
Hi Martin,

Thanks for chiming in. Responses inline.

On Fri, May 17, 2019 at 12:32 AM Martin Maechler <[hidden email]>
wrote:

> Gabe, here I have to very strongly disagree.
>
> Matrices (and higher-order arrays) are always definitely meant to
> behave "symmetrically" / "uniformly" with respect to all of their
> dimensions.
>
> We (and the S developers before us) have always taken a lot of
> care trying to ensure that this is true.
>
> So for the matrix case, if rows and columns behaved differently
> that would be a bug "by definition".
>

I realize now I could have been clearer and more explicit about this, but I
wasn't arguing that the behavior should differ between columns and
rows, just that the behavior in the rows case didn't necessarily make a ton
of sense to me. I was arguing that a change to both rbind and cbind be
considered when all the vectors passed have length zero, not that rbind change
without cbind also changing. I will admit even here to feeling much more
strongly about the data.frame case.

That said, I do see that the cbind/columns argument seems harder (though
not impossible) for me to make. And maybe that's a good enough reason not
to consider such a change because, as I say, I agree the symmetry is
important, and I would (also) want cbind to change the same way rbind did if
such a change happened, and that might bother many more people than the
rbind case would. Maybe not though, based on the other responses in the
thread.

Honestly, the most intuitive thing for me if you rbind or cbind a bunch of
length-zero vectors together would be a 0x0 matrix, at the very least in
the case of unnamed arguments. It's a matrix with 0 elements in it, after
all. It seems perhaps that my intuition is just somewhat non-standard
though.
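
For reference, the shapes currently returned in the cases discussed in this thread can be checked directly. This is a small sketch in base R with illustrative object names. The first three results match the nrow()/ncol() values reported earlier in the thread, and the last line shows that a 0x0 matrix can still be built explicitly.

```r
# All inputs have length zero: no input is ignored.
r2 <- rbind(character(), character())
dim(r2)      # 2 0

c2 <- cbind(character(), character())
dim(c2)      # 0 2

# Mixed lengths: the zero-length vector is ignored.
mixed <- rbind(c("a", "b", "c"), character())
dim(mixed)   # 1 3

# A 0x0 matrix has to be requested explicitly.
z <- matrix(character(), nrow = 0, ncol = 0)
dim(z)       # 0 0
```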


> Of course there's one place where this uniformity / symmetry
> must be violated: in the coercion from and to atomic vectors.
> There, 'by column' (generalized for arrays to "earlier dimensions vary
> faster than later ones") has been chosen, not least because this had
> been adopted for Fortran (first, AFAIK) and all related ABIs
> dealing with matrix-vector arithmetic, for very good (numerical,
> performance, known convention) reasons that made it possible to know how
> fast numerical linear algebra should be implemented.
>

I do understand that, and would never suggest anything that could damage
numerical linear algebra capabilities, in R or more broadly. That said, can
numerical linear algebra routines operate meaningfully in the degenerate
case where one, both, or all dimensions are 0 anyway? Maybe they can; I'd be
somewhat surprised, but it's not my area of expertise.
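
On the narrower question of whether R itself returns something sensible here, a quick check is possible. This sketch only exercises base R's %*% on zero-extent matrices (A, B and P are illustrative names); it says nothing about how the underlying BLAS/LAPACK routines treat such inputs.

```r
A <- matrix(numeric(0), nrow = 3, ncol = 0)   # 3 x 0
B <- matrix(numeric(0), nrow = 0, ncol = 2)   # 0 x 2

# The inner dimension is 0, so every entry of the product is an empty sum.
P <- A %*% B
dim(P)        # 3 2
all(P == 0)   # TRUE
```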

 Best,
~G

>
> Martin
>


Re: nrow(rbind(character(), character())) returns 2 (as documented but very unintuitive, IMHO)

Martin Maechler
>>>>> Gabriel Becker
>>>>>     on Fri, 17 May 2019 01:06:11 -0700 writes:

    > Honestly,  the most intuitive thing for me if you rbind or cbind a bunch of
    > length zero vectors together would be a  0x0 matrix, at  the very least in
    > the non-named arguments case. Its  a matrix with 0 elements in it, after
    > all. It seems perhaps that my intuition  is just somewhat  non-standard
    > though.

I think your "problem" may be that you've not yet appreciated
the importance of {0 x p} and {n x 0} matrices and would
think all of these should be {0 x 0}?

Believe me, we did quite a bit of reasoning at the time, looking at
the associative law, transitivity, etc., which I can't easily
recall, but it has been very beneficial to
deal consistently with n x 0 and 0 x d matrices:
much R code could be simplified, or automagically worked
correctly in edge cases, once such matrices fulfilled
basic consistency identities.
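
A few identities of this kind can be sketched in base R. The ones below are chosen for illustration only; they are not a list taken from the original design discussions, and the object names are arbitrary.

```r
A <- matrix(1:6, nrow = 2)            # 2 x 3
E <- A[integer(0), , drop = FALSE]    # 0 x 3: no rows, three columns
N <- A[, integer(0), drop = FALSE]    # 2 x 0: two rows, no columns

# Row and column counts stay additive under binding.
nrow(rbind(A, E)) == nrow(A) + nrow(E)   # TRUE
ncol(cbind(A, N)) == ncol(A) + ncol(N)   # TRUE

# Per-column summaries still have one entry per column.
colSums(E)    # 0 0 0

# Transposition swaps the extents as usual.
dim(t(E))     # 3 0
```

Because an empty selection keeps its other extent, code that subsets and then binds or summarises needs no special branch for the case where nothing was selected.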

Martin
