dimnames incoherence?

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

dimnames incoherence?

Sokol Serguei
Hi,

I was bitten by a little incoherence in dimnames assignment or may be I
missed some point.
Here is the case. If I assign row names via dimnames(a)[[1]], when
nrow(a)=1 then an error is thrown. But if I do the same when nrow(a) > 1
it's OK. Is one of this case works unexpectedly? Both? Neither?

a=as.matrix(1)
dimnames(a)[[1]]="a" # error: 'dimnames' must be a list

aa=as.matrix(1:2)
dimnames(aa)[[1]]=c("a", "b") # OK

In the second case, dimnames(aa) is not a list (like in the first case)
but it works.
I would expect that the both work or neither.

Your thoughts are welcome.
Best,
Serguei.

PS the same apply for dimnames(a)[[2]]<-.

 > sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Mageia 7

Matrix products: default
BLAS/LAPACK: /home/opt/OpenBLAS/lib/libopenblas_sandybridge-r0.3.6.so

locale:
  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
  [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
  [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets methods
[8] base

other attached packages:
[1] multbxxc_1.0.1            rmumps_5.2.1-11
[3] arrApply_2.1              RcppArmadillo_0.9.800.4.0
[5] Rcpp_1.0.3                slam_0.1-47
[7] nnls_1.4

loaded via a namespace (and not attached):
[1] compiler_3.6.1   tools_3.6.1      codetools_0.2-16

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: dimnames incoherence?

Martin Maechler
>>>>> Serguei Sokol
>>>>>     on Wed, 19 Feb 2020 15:21:21 +0100 writes:

    > Hi,
    > I was bitten by a little incoherence in dimnames assignment or may be I
    > missed some point.
    > Here is the case. If I assign row names via dimnames(a)[[1]], when
    > nrow(a)=1 then an error is thrown. But if I do the same when nrow(a) > 1
    > it's OK. Is one of this case works unexpectedly? Both? Neither?

    > a=as.matrix(1)
    > dimnames(a)[[1]]="a" # error: 'dimnames' must be a list

    > aa=as.matrix(1:2)
    > dimnames(aa)[[1]]=c("a", "b") # OK

    > In the second case, dimnames(aa) is not a list (like in the first case)
    > but it works.
    > I would expect that the both work or neither.

I agree (even though I'm strongly advising people to use '<-'
  instead of '=');
which in this case helps you get the name of the function really
involved:  It is  `dimnames<-`  (which is implemented in C
entirely, for matrices and other arrays).


    > Your thoughts are welcome.

I think we'd be happy if you report this formally on R's
bugzilla - https://bugs.r-project.org/ - as a bug.

  --> https://www.r-project.org/bugs.html

From reading bugs.html you note that you should ask for an account there;
as I'm one of the people who get such request by e-mail, in this
case, I can do it directly (if you confirm you'd want in a
private e-mail).

    > Best,
    > Serguei.

    > PS the same apply for dimnames(a)[[2]]<-.

(of course)

NB *and*  importantly, the buglet is still in current versions of R

Best,
Martin

    >> sessionInfo()
    > R version 3.6.1 (2019-07-05)
    > Platform: x86_64-pc-linux-gnu (64-bit)
    > Running under: Mageia 7

    > Matrix products: default
    > BLAS/LAPACK: /home/opt/OpenBLAS/lib/libopenblas_sandybridge-r0.3.6.so

    > locale:
    >  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
    >  [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
    >  [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
    >  [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C
    >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
    > [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

    > attached base packages:
    > [1] parallel  stats     graphics  grDevices utils     datasets methods
    > [8] base

    > other attached packages:
    > [1] multbxxc_1.0.1            rmumps_5.2.1-11
    > [3] arrApply_2.1              RcppArmadillo_0.9.800.4.0
    > [5] Rcpp_1.0.3                slam_0.1-47
    > [7] nnls_1.4

    > loaded via a namespace (and not attached):
    > [1] compiler_3.6.1   tools_3.6.1      codetools_0.2-16

    > ______________________________________________
    > [hidden email] mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: dimnames incoherence?

R devel mailing list
In reply to this post by Sokol Serguei
How far would you like to go with the automatic creation of dimnames in
nested replacement operations on arrays?  It currently works nicely with [<-
   > a <- array(numeric(), dim=c(2,0,1)); dimnames(a)[3] <- list("One")
   > str(a)
    num[1:2, 0 , 1]
    - attr(*, "dimnames")=List of 3
     ..$ : NULL
     ..$ : NULL
     ..$ : chr "One"

It works most of the time (except for length=1) for [[<-
  > a <- array(numeric(), dim=c(2,0,1)); dimnames(a)[[1]] <- c("X1","X2")
  > a <- array(numeric(), dim=c(2,0,1)); dimnames(a)[[2]] <- character()
  > a <- array(numeric(), dim=c(2,0,1)); dimnames(a)[[3]] <- "Z1"
  Error in dimnames(a)[[3]] <- "Z1" : 'dimnames' must be a list

It does not work at all for names<-.
> a <- array(numeric(), dim=c(2,0,1)); names(dimnames(a)) <- c("X","Y","Z")
Error in names(dimnames(a)) <- c("X", "Y", "Z") :
  attempt to set an attribute on NULL
> a <- array(numeric(), dim=c(2,0,1)); dimnames(a)<-vector("list",3);
names(dimnames(a)) <- c("X","Y","Z")
> str(a)
 num[1:2, 0 , 1]
 - attr(*, "dimnames")=List of 3
  ..$ X: NULL
  ..$ Y: NULL
  ..$ Z: NULL

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Wed, Feb 19, 2020 at 6:24 AM Serguei Sokol <[hidden email]>
wrote:

> Hi,
>
> I was bitten by a little incoherence in dimnames assignment or may be I
> missed some point.
> Here is the case. If I assign row names via dimnames(a)[[1]], when
> nrow(a)=1 then an error is thrown. But if I do the same when nrow(a) > 1
> it's OK. Is one of this case works unexpectedly? Both? Neither?
>
> a=as.matrix(1)
> dimnames(a)[[1]]="a" # error: 'dimnames' must be a list
>
> aa=as.matrix(1:2)
> dimnames(aa)[[1]]=c("a", "b") # OK
>
> In the second case, dimnames(aa) is not a list (like in the first case)
> but it works.
> I would expect that the both work or neither.
>
> Your thoughts are welcome.
> Best,
> Serguei.
>
> PS the same apply for dimnames(a)[[2]]<-.
>
>  > sessionInfo()
> R version 3.6.1 (2019-07-05)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Mageia 7
>
> Matrix products: default
> BLAS/LAPACK: /home/opt/OpenBLAS/lib/libopenblas_sandybridge-r0.3.6.so
>
> locale:
>   [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
>   [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
>   [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets methods
> [8] base
>
> other attached packages:
> [1] multbxxc_1.0.1            rmumps_5.2.1-11
> [3] arrApply_2.1              RcppArmadillo_0.9.800.4.0
> [5] Rcpp_1.0.3                slam_0.1-47
> [7] nnls_1.4
>
> loaded via a namespace (and not attached):
> [1] compiler_3.6.1   tools_3.6.1      codetools_0.2-16
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: dimnames incoherence?

Martin Maechler
In reply to this post by Martin Maechler
>>>>> Martin Maechler
>>>>>     on Wed, 19 Feb 2020 18:06:57 +0100 writes:

>>>>> Serguei Sokol
>>>>>     on Wed, 19 Feb 2020 15:21:21 +0100 writes:

    >> Hi,
    >> I was bitten by a little incoherence in dimnames assignment or may be I
    >> missed some point.
    >> Here is the case. If I assign row names via dimnames(a)[[1]], when
    >> nrow(a)=1 then an error is thrown. But if I do the same when nrow(a) > 1
    >> it's OK. Is one of this case works unexpectedly? Both? Neither?

    >> a=as.matrix(1)
    >> dimnames(a)[[1]]="a" # error: 'dimnames' must be a list

    >> aa=as.matrix(1:2)
    >> dimnames(aa)[[1]]=c("a", "b") # OK

    >> In the second case, dimnames(aa) is not a list (like in the first case)
    >> but it works.
    >> I would expect that the both work or neither.

    > I agree (even though I'm strongly advising people to use '<-'
    > instead of '=');
    > which in this case helps you get the name of the function really
    > involved:  It is  `dimnames<-`  (which is implemented in C
    > entirely, for matrices and other arrays).

As a matter of fact, I wrote too quickly, the culprit here is
the  `[[<-`  function (rather than `dimnames<-`),
which has a special "inconsistency" feature when used to "add to NULL";
almost surely inherited from S,  but I now think we should
consider dropping on the occasion of aiming for  R 4.0.0 :

It's documented in ?Extract  that  length 1  `[[.]]`-assignment works
specially for NULL (and dimnames(.) are NULL here).

Note you need to read and understand one of the tougher sections
in the official  'R Language Definition'  Manual,
section -- 3.4.4 Subset assignment ---
i.e.,
https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Subset-assignment

notably this part:

Nesting of complex assignments is evaluated recursively

     names(x)[3] <- "Three"

is equivalent to

     `*tmp*` <- x
     x <- "names<-"(`*tmp*`, value="[<-"(names(`*tmp*`), 3, value="Three"))
     rm(`*tmp*`)

and then, apply this to our    dimnames(a)[[1]] <- "a"
and so  replace

 -  'names<-' by 'dimnames<-'
 -  '[<-'     by '[[<-'

----------

Here is the rest of my analysis as valid R code
{this is not new, Peter Dalgaard had explained this 10 or 20
 years ago to a mailing list audience IIRC} :

## MM: The problematic behavior (bug ?) is in `[[<-`, not in `dimnames<-` :

`[[<-`(NULL, 1,   "a"     ) # gives  "a"  (*not* a list)
`[[<-`(NULL, 1, c("a","b")) # gives list(c("a","b"))  !!

##==> in C code: in  subassign.c  [ ~/R/D/r-devel/R/src/main/subassign.c ]
##==> function (~ 340 lines)
##            do_subassign2_dflt(SEXP call, SEXP op, SEXP args, SEXP rho)
## has
"
line svn r.  svn auth. c.o.d.e...
---- ------  --------- ----------------------------------------------
1741   4166      ihaka     if (isNull(x)) {
1742  45446     ripley         if (isNull(y)) {
1743  76166       luke             UNPROTECT(2); /* args, y */
1744   4166      ihaka             return x;
1745  45446     ripley         }
1746  35680    murdoch         if (length(y) == 1)
1747  68094       luke             x = allocVector(TYPEOF(y), 0);
1748  24954     ripley         else
1749  68094       luke             x = allocVector(VECSXP, 0);
1750   1820      ihaka     }
---- ------  --------- ----------------------------------------------
"
## so clearly, in case the value is of length 1, no list is created .

## For dimnames<-  Replacing NULL by list()  should be done in both cases , and then things work :
`[[<-`(list(), 1,   "a"     ) # gives list( "a" )
`[[<-`(list(), 1, c("a","b")) # gives list(c("a","b"))  !!

## but the problem here is that  `[[<-` at this time in the game
## does *not* know that it comes from dimnames<- ....

---------------

If we change the behavior  NULL--[[--assignment from

 `[[<-`(NULL, 1, "a" ) # gives  "a"  (*not* a list)

to

 `[[<-`(NULL, 1, "a" ) # gives  list("a")


then we have more consistency there *and* your bug is fixed too.
Of course, in other situations back-compatibility would be
broken as well.

At the moment, I think we (R Core Team) should consider doing
that here.

Martin



    >> Your thoughts are welcome.

    > I think we'd be happy if you report this formally on R's
    > bugzilla - https://bugs.r-project.org/ - as a bug.

    --> https://www.r-project.org/bugs.html

    >> From reading bugs.html you note that you should ask for
    >> an account there;
    > as I'm one of the people who get such request by e-mail,
    > in this case, I can do it directly (if you confirm you'd
    > want in a private e-mail).

    >> Best, Serguei.

    >> PS the same apply for dimnames(a)[[2]]<-.

    > (of course)

    > NB *and* importantly, the buglet is still in current
    > versions of R

    > Best, Martin

    >>> sessionInfo()
    >> R version 3.6.1 (2019-07-05) Platform:
    >> x86_64-pc-linux-gnu (64-bit) Running under: Mageia 7

    >> Matrix products: default BLAS/LAPACK:
    >> /home/opt/OpenBLAS/lib/libopenblas_sandybridge-r0.3.6.so

    >> locale:  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C  [3]
    >> LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8  [5]
    >> LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8  [7]
    >> LC_PAPER=fr_FR.UTF-8       LC_NAME=C  [9]
    >> LC_ADDRESS=C               LC_TELEPHONE=C [11]
    >> LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

    >> attached base packages: [1] parallel  stats     graphics 
    >> grDevices utils     datasets methods [8] base

    >> other attached packages: [1] multbxxc_1.0.1           
    >> rmumps_5.2.1-11 [3] arrApply_2.1             
    >> RcppArmadillo_0.9.800.4.0 [5] Rcpp_1.0.3               
    >> slam_0.1-47 [7] nnls_1.4

    >> loaded via a namespace (and not attached): [1]
    >> compiler_3.6.1   tools_3.6.1      codetools_0.2-16

    >> ______________________________________________
    >> [hidden email] mailing list
    >> https://stat.ethz.ch/mailman/listinfo/r-devel

    > ______________________________________________
    > [hidden email] mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: dimnames incoherence?

R devel mailing list
   If we change the behavior  NULL--[[--assignment from
    `[[<-`(NULL, 1, "a" ) # gives  "a"  (*not* a list)
   to
    `[[<-`(NULL, 1, "a" ) # gives  list("a")
   then we have more consistency there *and* your bug is fixed too.
   Of course, in other situations back-compatibility would be
   broken as well.

Would that change the result of
   L <- list(One=1) ; L$Two[[1]] <- 2
from the current list(One=1,Two=2) to list(One=1, Two=list(2))
and the result of
   F <- 1L ; levels(F)[[1]] <- "one"
from structure(1L, levels="one") to structure(1L, levels=list("one"))?
This change would make L$Name[[1]] <- value act like L$Name$one <- value
in cases when L did not have a component named "Name" and value
had length 1.

I have seen users use [[<- where [<- is more appropriate in cases like
this.  Should there be a way to generate warnings about the change in
behavior as you've done with other syntax changes?

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Wed, Feb 19, 2020 at 12:59 PM Martin Maechler <[hidden email]>
wrote:

> >>>>> Martin Maechler
> >>>>>     on Wed, 19 Feb 2020 18:06:57 +0100 writes:
>
> >>>>> Serguei Sokol
> >>>>>     on Wed, 19 Feb 2020 15:21:21 +0100 writes:
>
>     >> Hi,
>     >> I was bitten by a little incoherence in dimnames assignment or may
> be I
>     >> missed some point.
>     >> Here is the case. If I assign row names via dimnames(a)[[1]], when
>     >> nrow(a)=1 then an error is thrown. But if I do the same when
> nrow(a) > 1
>     >> it's OK. Is one of this case works unexpectedly? Both? Neither?
>
>     >> a=as.matrix(1)
>     >> dimnames(a)[[1]]="a" # error: 'dimnames' must be a list
>
>     >> aa=as.matrix(1:2)
>     >> dimnames(aa)[[1]]=c("a", "b") # OK
>
>     >> In the second case, dimnames(aa) is not a list (like in the first
> case)
>     >> but it works.
>     >> I would expect that the both work or neither.
>
>     > I agree (even though I'm strongly advising people to use '<-'
>     > instead of '=');
>     > which in this case helps you get the name of the function really
>     > involved:  It is  `dimnames<-`  (which is implemented in C
>     > entirely, for matrices and other arrays).
>
> As a matter of fact, I wrote too quickly, the culprit here is
> the  `[[<-`  function (rather than `dimnames<-`),
> which has a special "inconsistency" feature when used to "add to NULL";
> almost surely inherited from S,  but I now think we should
> consider dropping on the occasion of aiming for  R 4.0.0 :
>
> It's documented in ?Extract  that  length 1  `[[.]]`-assignment works
> specially for NULL (and dimnames(.) are NULL here).
>
> Note you need to read and understand one of the tougher sections
> in the official  'R Language Definition'  Manual,
> section -- 3.4.4 Subset assignment ---
> i.e.,
>
> https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Subset-assignment
>
> notably this part:
>
> Nesting of complex assignments is evaluated recursively
>
>      names(x)[3] <- "Three"
>
> is equivalent to
>
>      `*tmp*` <- x
>      x <- "names<-"(`*tmp*`, value="[<-"(names(`*tmp*`), 3, value="Three"))
>      rm(`*tmp*`)
>
> and then, apply this to our    dimnames(a)[[1]] <- "a"
> and so  replace
>
>  -  'names<-' by 'dimnames<-'
>  -  '[<-'     by '[[<-'
>
> ----------
>
> Here is the rest of my analysis as valid R code
> {this is not new, Peter Dalgaard had explained this 10 or 20
>  years ago to a mailing list audience IIRC} :
>
> ## MM: The problematic behavior (bug ?) is in `[[<-`, not in `dimnames<-` :
>
> `[[<-`(NULL, 1,   "a"     ) # gives  "a"  (*not* a list)
> `[[<-`(NULL, 1, c("a","b")) # gives list(c("a","b"))  !!
>
> ##==> in C code: in  subassign.c  [ ~/R/D/r-devel/R/src/main/subassign.c ]
> ##==> function (~ 340 lines)
> ##            do_subassign2_dflt(SEXP call, SEXP op, SEXP args, SEXP rho)
> ## has
> "
> line svn r.  svn auth. c.o.d.e...
> ---- ------  --------- ----------------------------------------------
> 1741   4166      ihaka     if (isNull(x)) {
> 1742  45446     ripley         if (isNull(y)) {
> 1743  76166       luke             UNPROTECT(2); /* args, y */
> 1744   4166      ihaka             return x;
> 1745  45446     ripley         }
> 1746  35680    murdoch         if (length(y) == 1)
> 1747  68094       luke             x = allocVector(TYPEOF(y), 0);
> 1748  24954     ripley         else
> 1749  68094       luke             x = allocVector(VECSXP, 0);
> 1750   1820      ihaka     }
> ---- ------  --------- ----------------------------------------------
> "
> ## so clearly, in case the value is of length 1, no list is created .
>
> ## For dimnames<-  Replacing NULL by list()  should be done in both cases
> , and then things work :
> `[[<-`(list(), 1,   "a"     ) # gives list( "a" )
> `[[<-`(list(), 1, c("a","b")) # gives list(c("a","b"))  !!
>
> ## but the problem here is that  `[[<-` at this time in the game
> ## does *not* know that it comes from dimnames<- ....
>
> ---------------
>
> If we change the behavior  NULL--[[--assignment from
>
>  `[[<-`(NULL, 1, "a" ) # gives  "a"  (*not* a list)
>
> to
>
>  `[[<-`(NULL, 1, "a" ) # gives  list("a")
>
>
> then we have more consistency there *and* your bug is fixed too.
> Of course, in other situations back-compatibility would be
> broken as well.
>
> At the moment, I think we (R Core Team) should consider doing
> that here.
>
> Martin
>
>
>
>     >> Your thoughts are welcome.
>
>     > I think we'd be happy if you report this formally on R's
>     > bugzilla - https://bugs.r-project.org/ - as a bug.
>
>     --> https://www.r-project.org/bugs.html
>
>     >> From reading bugs.html you note that you should ask for
>     >> an account there;
>     > as I'm one of the people who get such request by e-mail,
>     > in this case, I can do it directly (if you confirm you'd
>     > want in a private e-mail).
>
>     >> Best, Serguei.
>
>     >> PS the same apply for dimnames(a)[[2]]<-.
>
>     > (of course)
>
>     > NB *and* importantly, the buglet is still in current
>     > versions of R
>
>     > Best, Martin
>
>     >>> sessionInfo()
>     >> R version 3.6.1 (2019-07-05) Platform:
>     >> x86_64-pc-linux-gnu (64-bit) Running under: Mageia 7
>
>     >> Matrix products: default BLAS/LAPACK:
>     >> /home/opt/OpenBLAS/lib/libopenblas_sandybridge-r0.3.6.so
>
>     >> locale:  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C  [3]
>     >> LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8  [5]
>     >> LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8  [7]
>     >> LC_PAPER=fr_FR.UTF-8       LC_NAME=C  [9]
>     >> LC_ADDRESS=C               LC_TELEPHONE=C [11]
>     >> LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
>
>     >> attached base packages: [1] parallel  stats     graphics
>     >> grDevices utils     datasets methods [8] base
>
>     >> other attached packages: [1] multbxxc_1.0.1
>     >> rmumps_5.2.1-11 [3] arrApply_2.1
>     >> RcppArmadillo_0.9.800.4.0 [5] Rcpp_1.0.3
>     >> slam_0.1-47 [7] nnls_1.4
>
>     >> loaded via a namespace (and not attached): [1]
>     >> compiler_3.6.1   tools_3.6.1      codetools_0.2-16
>
>     >> ______________________________________________
>     >> [hidden email] mailing list
>     >> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>     > ______________________________________________
>     > [hidden email] mailing list
>     > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: dimnames incoherence?

Martin Maechler
>>>>> William Dunlap
>>>>>     on Fri, 21 Feb 2020 14:05:49 -0800 writes:

    > If we change the behavior  NULL--[[--assignment from

    > `[[<-`(NULL, 1, "a" ) # gives  "a"  (*not* a list)

    > to

    > `[[<-`(NULL, 1, "a" ) # gives  list("a")

    > then we have more consistency there *and* your bug is fixed too.
    > Of course, in other situations back-compatibility would be
    > broken as well.

    > Would that change the result of
    > L <- list(One=1) ; L$Two[[1]] <- 2
    > from the current list(One=1,Two=2) to list(One=1, Two=list(2))

    > and the result of
    > F <- 1L ; levels(F)[[1]] <- "one"
    > from structure(1L, levels="one") to structure(1L, levels=list("one"))?

Yes (twice).

This is indeed what happens in current R-devel, as I had
committed the proposition above yesterday.
So R-devel (with svn rev >= 77844 )  does this :

> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L)
list(One = 1, Two = list(2))
>  F <- 1L ; levels(F)[[1]] <- "one" ; dput(F)
structure(1L, .Label = list("one"))
>

but I find that still considerably more logical than current
(pre R-devel) R's

> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L)
list(One = 1, Two = 2)
> L <- list(One=1) ; L$Two[[1]] <- 2:3 ; dput(L)
list(One = 1, Two = list(2:3))
>
> F <- 1L ; levels(F)[[1]] <- "one" ; dput(F)
structure(1L, .Label = "one")
> F <- 1L ; levels(F)[[1]] <- c("one", "TWO") ; dput(F)
structure(1L, .Label = list(c("one", "TWO")))
>


    > This change would make L$Name[[1]] <- value act like L$Name$one <- value
    > in cases when L did not have a component named "Name" and value
    > had length 1.

(I don't entirely get what you mean, but)
indeed,
the  [[<-  assignments will be closer to corresponding $<-  assignments...
which I thought would be another good thing about the change.

    > I have seen users use [[<- where [<- is more appropriate in cases like
    > this.  Should there be a way to generate warnings about the change in
    > behavior as you've done with other syntax changes?

Well, good question.
I'd guess one would get such warnings "all over the place",  and
if a warning is given only once per session it may not be
effective  ... also the warning be confusing to the 99.9% of R users who
don't even get what we are talking about here ;-)

Thank you for your comments.. I did not get too many.

Martin

    > Bill Dunlap
    > TIBCO Software
    > wdunlap tibco.com


    > On Wed, Feb 19, 2020 at 12:59 PM Martin Maechler <[hidden email]>
    > wrote:

    >> >>>>> Martin Maechler
    >> >>>>>     on Wed, 19 Feb 2020 18:06:57 +0100 writes:
    >>
    >> >>>>> Serguei Sokol
    >> >>>>>     on Wed, 19 Feb 2020 15:21:21 +0100 writes:
    >>
    >> >> Hi,
    >> >> I was bitten by a little incoherence in dimnames assignment or may
    >> be I
    >> >> missed some point.
    >> >> Here is the case. If I assign row names via dimnames(a)[[1]], when
    >> >> nrow(a)=1 then an error is thrown. But if I do the same when
    >> nrow(a) > 1
    >> >> it's OK. Is one of this case works unexpectedly? Both? Neither?
    >>
    >> >> a=as.matrix(1)
    >> >> dimnames(a)[[1]]="a" # error: 'dimnames' must be a list
    >>
    >> >> aa=as.matrix(1:2)
    >> >> dimnames(aa)[[1]]=c("a", "b") # OK
    >>
    >> >> In the second case, dimnames(aa) is not a list (like in the first
    >> case)
    >> >> but it works.
    >> >> I would expect that the both work or neither.
    >>
    >> > I agree (even though I'm strongly advising people to use '<-'
    >> > instead of '=');
    >> > which in this case helps you get the name of the function really
    >> > involved:  It is  `dimnames<-`  (which is implemented in C
    >> > entirely, for matrices and other arrays).
    >>
    >> As a matter of fact, I wrote too quickly, the culprit here is
    >> the  `[[<-`  function (rather than `dimnames<-`),
    >> which has a special "inconsistency" feature when used to "add to NULL";
    >> almost surely inherited from S,  but I now think we should
    >> consider dropping on the occasion of aiming for  R 4.0.0 :
    >>
    >> It's documented in ?Extract  that  length 1  `[[.]]`-assignment works
    >> specially for NULL (and dimnames(.) are NULL here).
    >>
    >> Note you need to read and understand one of the tougher sections
    >> in the official  'R Language Definition'  Manual,
    >> section -- 3.4.4 Subset assignment ---
    >> i.e.,
    >>
    >> https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Subset-assignment
    >>
    >> notably this part:
    >>
    >> Nesting of complex assignments is evaluated recursively
    >>
    >> names(x)[3] <- "Three"
    >>
    >> is equivalent to
    >>
    >> `*tmp*` <- x
    >> x <- "names<-"(`*tmp*`, value="[<-"(names(`*tmp*`), 3, value="Three"))
    >> rm(`*tmp*`)
    >>
    >> and then, apply this to our    dimnames(a)[[1]] <- "a"
    >> and so  replace
    >>
    >> -  'names<-' by 'dimnames<-'
    >> -  '[<-'     by '[[<-'
    >>
    >> ----------
    >>
    >> Here is the rest of my analysis as valid R code
    >> {this is not new, Peter Dalgaard had explained this 10 or 20
    >> years ago to a mailing list audience IIRC} :
    >>
    >> ## MM: The problematic behavior (bug ?) is in `[[<-`, not in `dimnames<-` :
    >>
    >> `[[<-`(NULL, 1,   "a"     ) # gives  "a"  (*not* a list)
    >> `[[<-`(NULL, 1, c("a","b")) # gives list(c("a","b"))  !!
    >>
    >> ##==> in C code: in  subassign.c  [ ~/R/D/r-devel/R/src/main/subassign.c ]
    >> ##==> function (~ 340 lines)
    >> ##            do_subassign2_dflt(SEXP call, SEXP op, SEXP args, SEXP rho)
    >> ## has
    >> "
    >> line svn r.  svn auth. c.o.d.e...
    >> ---- ------  --------- ----------------------------------------------
    >> 1741   4166      ihaka     if (isNull(x)) {
    >> 1742  45446     ripley         if (isNull(y)) {
    >> 1743  76166       luke             UNPROTECT(2); /* args, y */
    >> 1744   4166      ihaka             return x;
    >> 1745  45446     ripley         }
    >> 1746  35680    murdoch         if (length(y) == 1)
    >> 1747  68094       luke             x = allocVector(TYPEOF(y), 0);
    >> 1748  24954     ripley         else
    >> 1749  68094       luke             x = allocVector(VECSXP, 0);
    >> 1750   1820      ihaka     }
    >> ---- ------  --------- ----------------------------------------------
    >> "
    >> ## so clearly, in case the value is of length 1, no list is created .
    >>
    >> ## For dimnames<-  Replacing NULL by list()  should be done in both cases
    >> , and then things work :
    >> `[[<-`(list(), 1,   "a"     ) # gives list( "a" )
    >> `[[<-`(list(), 1, c("a","b")) # gives list(c("a","b"))  !!
    >>
    >> ## but the problem here is that  `[[<-` at this time in the game
    >> ## does *not* know that it comes from dimnames<- ....
    >>
    >> ---------------
    >>
    >> If we change the behavior  NULL--[[--assignment from
    >>
    >> `[[<-`(NULL, 1, "a" ) # gives  "a"  (*not* a list)
    >>
    >> to
    >>
    >> `[[<-`(NULL, 1, "a" ) # gives  list("a")
    >>
    >>
    >> then we have more consistency there *and* your bug is fixed too.
    >> Of course, in other situations back-compatibility would be
    >> broken as well.
    >>
    >> At the moment, I think we (R Core Team) should consider doing
    >> that here.
    >>
    >> Martin
    >>
    >>
    >>
    >> >> Your thoughts are welcome.
    >>
    >> > I think we'd be happy if you report this formally on R's
    >> > bugzilla - https://bugs.r-project.org/ - as a bug.
    >>
    --> https://www.r-project.org/bugs.html
    >>
    >> >> From reading bugs.html you note that you should ask for
    >> >> an account there;
    >> > as I'm one of the people who get such request by e-mail,
    >> > in this case, I can do it directly (if you confirm you'd
    >> > want in a private e-mail).
    >>
    >> >> Best, Serguei.
    >>
    >> >> PS the same apply for dimnames(a)[[2]]<-.
    >>
    >> > (of course)
    >>
    >> > NB *and* importantly, the buglet is still in current
    >> > versions of R
    >>
    >> > Best, Martin
    >>
    >> >>> sessionInfo()
    >> >> R version 3.6.1 (2019-07-05) Platform:
    >> >> x86_64-pc-linux-gnu (64-bit) Running under: Mageia 7
    >>
    >> >> Matrix products: default BLAS/LAPACK:
    >> >> /home/opt/OpenBLAS/lib/libopenblas_sandybridge-r0.3.6.so
    >>
    >> >> locale:  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C  [3]
    >> >> LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8  [5]
    >> >> LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8  [7]
    >> >> LC_PAPER=fr_FR.UTF-8       LC_NAME=C  [9]
    >> >> LC_ADDRESS=C               LC_TELEPHONE=C [11]
    >> >> LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
    >>
    >> >> attached base packages: [1] parallel  stats     graphics
    >> >> grDevices utils     datasets methods [8] base
    >>
    >> >> other attached packages: [1] multbxxc_1.0.1
    >> >> rmumps_5.2.1-11 [3] arrApply_2.1
    >> >> RcppArmadillo_0.9.800.4.0 [5] Rcpp_1.0.3
    >> >> slam_0.1-47 [7] nnls_1.4
    >>
    >> >> loaded via a namespace (and not attached): [1]
    >> >> compiler_3.6.1   tools_3.6.1      codetools_0.2-16
    >>
    >> >> ______________________________________________
    >> >> [hidden email] mailing list
    >> >> https://stat.ethz.ch/mailman/listinfo/r-devel
    >>
    >> > ______________________________________________
    >> > [hidden email] mailing list
    >> > https://stat.ethz.ch/mailman/listinfo/r-devel
    >>
    >> ______________________________________________
    >> [hidden email] mailing list
    >> https://stat.ethz.ch/mailman/listinfo/r-devel
    >>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Change 77844 breaking pkgs [Re: dimnames incoherence?]

Martin Maechler
>>>>> Martin Maechler
>>>>>     on Sat, 22 Feb 2020 20:20:49 +0100 writes:

>>>>> William Dunlap
>>>>>     on Fri, 21 Feb 2020 14:05:49 -0800 writes:

    >> If we change the behavior  NULL--[[--assignment from

    >> `[[<-`(NULL, 1, "a" ) # gives  "a"  (*not* a list)

    >> to

    >> `[[<-`(NULL, 1, "a" ) # gives  list("a")

    >> then we have more consistency there *and* your bug is fixed too.
    >> Of course, in other situations back-compatibility would be
    >> broken as well.

    >> Would that change the result of
    >> L <- list(One=1) ; L$Two[[1]] <- 2
    >> from the current list(One=1,Two=2) to list(One=1, Two=list(2))

    >> and the result of
    >> F <- 1L ; levels(F)[[1]] <- "one"
    >> from structure(1L, levels="one") to structure(1L, levels=list("one"))?

    > Yes (twice).

    > This is indeed what happens in current R-devel, as I had
    > committed the proposition above yesterday.
    > So R-devel (with svn rev >= 77844 )  does this :

    >> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L)
    > list(One = 1, Two = list(2))
    >> F <- 1L ; levels(F)[[1]] <- "one" ; dput(F)
    > structure(1L, .Label = list("one"))
    >>

    > but I find that still considerably more logical than current
    > (pre R-devel) R's

    >> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L)
    > list(One = 1, Two = 2)
    >> L <- list(One=1) ; L$Two[[1]] <- 2:3 ; dput(L)
    > list(One = 1, Two = list(2:3))
    >>
    >> F <- 1L ; levels(F)[[1]] <- "one" ; dput(F)
    > structure(1L, .Label = "one")
    >> F <- 1L ; levels(F)[[1]] <- c("one", "TWO") ; dput(F)
    > structure(1L, .Label = list(c("one", "TWO")))
    >>


    >> This change would make L$Name[[1]] <- value act like L$Name$one <- value
    >> in cases when L did not have a component named "Name" and value
    >> had length 1.

    > (I don't entirely get what you mean, but)
    > indeed,
    > the  [[<-  assignments will be closer to corresponding $<-  assignments...
    > which I thought would be another good thing about the change.

    >> I have seen users use [[<- where [<- is more appropriate in cases like
    >> this.  Should there be a way to generate warnings about the change in
    >> behavior as you've done with other syntax changes?

    > Well, good question.
    > I'd guess one would get such warnings "all over the place",  and
    > if a warning is given only once per session it may not be
    > effective  ... also the warning be confusing to the 99.9% of R users who
    > don't even get what we are talking about here ;-)

    > Thank you for your comments.. I did not get too many.

Well, there's one situation where semi-experienced package
authors are bitten by the new R-devel behavior...

I'm seeing a few dozen CRAN packages breaking in R-devel >= r77884.

One case is exactly as you (Bill) mention above: people using
dd[[.]] <- ..   where they should use single [.].

In one package, I see an inefficient for loop over all rows of a
data frame 'dd'

for(i in 1:nrow(dd)) {

 ...

 dd$<nonexisting_column>[[i]] <-  <one character string>

}

This used to work -- as said quite inefficiently:
for i=1 it created the **full** data frame column  and then,
once the column exists, it presumably does assign one entry
after the other...

Now this code breaks (later!) in the package now, because the
new column ends up as a *list* of strings, instead of a vector
of strings.

I think there are quite a few such cases also in other CRAN
packages which now break with the latest R-devel.

Coming back to Bill Dunlap's question: Should we not warn here?
And now when our toplevel list is a data frame, maybe we should
warn indeed, if we can easily limit ourselves to such "bizarre"
ways of growng a data frame  ...


  dd $ foo [[i]] <- vv

<==>

  `*tmp*` <- dd
  dd <- `$<-`(`*tmp*`, value = `[[<-`(`*tmp*`$foo, i, vv))
  rm(`*tmp*`)
 
but then really we have the same problem as previously: The
 `[[<-`(NULL, i, vv)  part does not "know" anything about the
fact that we are in a data frame column creation context.

If the R package author had used  '[i]' instead of '[[i]]'
he|she would have been safe

(as they would be if they worked more efficiently and created
the whole variable as a vector and only then added it to the
data frame ... but then, it seems people want to perpetuate the
claim of R to be slow ... even if it's them who make R run
slowly ... ;-))

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Change 77844 breaking pkgs [Re: dimnames incoherence?]

R devel mailing list
> but then, it seems people want to perpetuate the
> claim of R to be slow

More charitably, I think that the thinking may have been that since x[[i]]
gives you one element of x,
they should use x[[i]]<-value, for scalar i, to stick in one element.

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Sat, Feb 22, 2020 at 12:44 PM Martin Maechler <[hidden email]>
wrote:

> >>>>> Martin Maechler
> >>>>>     on Sat, 22 Feb 2020 20:20:49 +0100 writes:
>
> >>>>> William Dunlap
> >>>>>     on Fri, 21 Feb 2020 14:05:49 -0800 writes:
>
>     >> If we change the behavior  NULL--[[--assignment from
>
>     >> `[[<-`(NULL, 1, "a" ) # gives  "a"  (*not* a list)
>
>     >> to
>
>     >> `[[<-`(NULL, 1, "a" ) # gives  list("a")
>
>     >> then we have more consistency there *and* your bug is fixed too.
>     >> Of course, in other situations back-compatibility would be
>     >> broken as well.
>
>     >> Would that change the result of
>     >> L <- list(One=1) ; L$Two[[1]] <- 2
>     >> from the current list(One=1,Two=2) to list(One=1, Two=list(2))
>
>     >> and the result of
>     >> F <- 1L ; levels(F)[[1]] <- "one"
>     >> from structure(1L, levels="one") to structure(1L,
> levels=list("one"))?
>
>     > Yes (twice).
>
>     > This is indeed what happens in current R-devel, as I had
>     > committed the proposition above yesterday.
>     > So R-devel (with svn rev >= 77844 )  does this :
>
>     >> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L)
>     > list(One = 1, Two = list(2))
>     >> F <- 1L ; levels(F)[[1]] <- "one" ; dput(F)
>     > structure(1L, .Label = list("one"))
>     >>
>
>     > but I find that still considerably more logical than current
>     > (pre R-devel) R's
>
>     >> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L)
>     > list(One = 1, Two = 2)
>     >> L <- list(One=1) ; L$Two[[1]] <- 2:3 ; dput(L)
>     > list(One = 1, Two = list(2:3))
>     >>
>     >> F <- 1L ; levels(F)[[1]] <- "one" ; dput(F)
>     > structure(1L, .Label = "one")
>     >> F <- 1L ; levels(F)[[1]] <- c("one", "TWO") ; dput(F)
>     > structure(1L, .Label = list(c("one", "TWO")))
>     >>
>
>
>     >> This change would make L$Name[[1]] <- value act like L$Name$one <-
> value
>     >> in cases when L did not have a component named "Name" and value
>     >> had length 1.
>
>     > (I don't entirely get what you mean, but)
>     > indeed,
>     > the  [[<-  assignments will be closer to corresponding $<-
> assignments...
>     > which I thought would be another good thing about the change.
>
>     >> I have seen users use [[<- where [<- is more appropriate in cases
> like
>     >> this.  Should there be a way to generate warnings about the change
> in
>     >> behavior as you've done with other syntax changes?
>
>     > Well, good question.
>     > I'd guess one would get such warnings "all over the place",  and
>     > if a warning is given only once per session it may not be
>     > effective  ... also the warning be confusing to the 99.9% of R users
> who
>     > don't even get what we are talking about here ;-)
>
>     > Thank you for your comments.. I did not get too many.
>
> Well, there's one situation where semi-experienced package
> authors are bitten by the new R-devel behavior...
>
> I'm seeing a few dozen CRAN packages breaking in R-devel >= r77884.
>
> One case is exactly as you (Bill) mention above: people using
> dd[[.]] <- ..   where they should use single [.].
>
> In one package, I see an inefficient for loop over all rows of a
> data frame 'dd'
>
> for(i in 1:nrow(dd)) {
>
>  ...
>
>  dd$<nonexisting_column>[[i]] <-  <one character string>
>
> }
>
> This used to work -- as said quite inefficiently:
> for i=1 it created the **full** data frame column  and then,
> once the column exists, it presumably does assign one entry
> after the other...
>
> Now this code breaks (later!) in the package now, because the
> new column ends up as a *list* of strings, instead of a vector
> of strings.
>
> I think there are quite a few such cases also in other CRAN
> packages which now break with the latest R-devel.
>
> Coming back to Bill Dunlap's question: Should we not warn here?
> And now when our toplevel list is a data frame, maybe we should
> warn indeed, if we can easily limit ourselves to such "bizarre"
> ways of growng a data frame  ...
>
>
>   dd $ foo [[i]] <- vv
>
> <==>
>
>   `*tmp*` <- dd
>   dd <- `$<-`(`*tmp*`, value = `[[<-`(`*tmp*`$foo, i, vv))
>   rm(`*tmp*`)
>
> but then really we have the same problem as previously: The
>  `[[<-`(NULL, i, vv)  part does not "know" anything about the
> fact that we are in a data frame column creation context.
>
> If the R package author had used  '[i]' instead of '[[i]]'
> he|she would have been safe
>
> (as they would be if they worked more efficiently and created
> the whole variable as a vector and only then added it to the
> data frame ... but then, it seems people want to perpetuate the
> claim of R to be slow ... even if it's them who make R run
> slowly ... ;-))
>
>

        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel