Bug in the "reformulate" function in stats package

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Bug in the "reformulate" function in stats package

Saren Tasciyan
Hi,

I have found a bug in reformulate function and have a solution for it. I
was wondering, where I can submit it?

Best,

Saren

--
Saren Tasciyan
/PhD Student / Sixt Group/
Institute of Science and Technology Austria
Am Campus 1
3400 Klosterneuburg, Austria


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Bug in the "reformulate" function in stats package

Martin Maechler
>>>>> Saren Tasciyan
>>>>>     on Thu, 28 Mar 2019 17:02:10 +0100 writes:

    > Hi,
    > I have found a bug in reformulate function and have a solution for it. I
    > was wondering, where I can submit it?

    > Best,
    > Saren


Well, you could have given a small reproducible example
depicting the bug, notably when posting here:
Just a prose text with no R code or other technical content is
almost always not really appropriate fo the R-devel mailing list.

Further, in such a case you should google a bit and hopefully
have found
                https://www.r-project.org/bugs.html

which also mention reproducibility (and many more useful things).

Then it also tells you about R's bug repository, also called
"R's bugzilla" at https://bugs.r-project.org/

and if you are diligent (but here, I'd say bugzilla is
(configured?) far from ideal), you'd also find bug PR#17359

   https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17359

which was reported already on Nov 2017 .. and only fixed
yesterday (in the "cleanup old bugs" process that happens
often before the big new spring release of R).

So is your bug the same as that one?

Martin

    > --
    > Saren Tasciyan
    > /PhD Student / Sixt Group/
    > Institute of Science and Technology Austria
    > Am Campus 1
    > 3400 Klosterneuburg, Austria

    > ______________________________________________
    > [hidden email] mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Bug in the "reformulate" function in stats package

Saren Tasciyan
Well, first I can't sign in bugzilla myself, that is why I wrote here
first. Also, I don't know if I have the time at the moment to provide
tests, multiple examples or more. If that is not ok or welcomed, that is
fine, I can come back, whenever I have more time to properly report the bug.

I didn't find the existing bug report, sorry for that.

Yes, it is related. My problem was that I have column names with spaces
and current solution doesn't solve it. I have a solution, which works
for me and maybe also for others.

Either, someone can register me to bugzilla or I can post it here, which
could give some direction to developers. I don't mind whichever is
preferred here.

Best,

Saren


On 29.03.19 09:29, Martin Maechler wrote:

>>>>>> Saren Tasciyan
>>>>>>      on Thu, 28 Mar 2019 17:02:10 +0100 writes:
>      > Hi,
>      > I have found a bug in reformulate function and have a solution for it. I
>      > was wondering, where I can submit it?
>
>      > Best,
>      > Saren
>
>
> Well, you could have given a small reproducible example
> depicting the bug, notably when posting here:
> Just a prose text with no R code or other technical content is
> almost always not really appropriate fo the R-devel mailing list.
>
> Further, in such a case you should google a bit and hopefully
> have found
> https://www.r-project.org/bugs.html
>
> which also mention reproducibility (and many more useful things).
>
> Then it also tells you about R's bug repository, also called
> "R's bugzilla" at https://bugs.r-project.org/
>
> and if you are diligent (but here, I'd say bugzilla is
> (configured?) far from ideal), you'd also find bug PR#17359
>
>     https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17359
>
> which was reported already on Nov 2017 .. and only fixed
> yesterday (in the "cleanup old bugs" process that happens
> often before the big new spring release of R).
>
> So is your bug the same as that one?
>
> Martin
>
>      > --
>      > Saren Tasciyan
>      > /PhD Student / Sixt Group/
>      > Institute of Science and Technology Austria
>      > Am Campus 1
>      > 3400 Klosterneuburg, Austria
>
>      > ______________________________________________
>      > [hidden email] mailing list
>      > https://stat.ethz.ch/mailman/listinfo/r-devel
--
Saren Tasciyan
/PhD Student / Sixt Group/
Institute of Science and Technology Austria
Am Campus 1
3400 Klosterneuburg, Austria



______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Bug in the "reformulate" function in stats package

J C Nash
The main thing is to post the "small reproducible example".

My (rather long term experience) can be written

  if (exists("reproducible example") ) {
     DeveloperFixHappens()
  } else {
     NULL
  }

JN

On 2019-03-29 11:38 a.m., Saren Tasciyan wrote:

> Well, first I can't sign in bugzilla myself, that is why I wrote here first. Also, I don't know if I have the time at
> the moment to provide tests, multiple examples or more. If that is not ok or welcomed, that is fine, I can come back,
> whenever I have more time to properly report the bug.
>
> I didn't find the existing bug report, sorry for that.
>
> Yes, it is related. My problem was that I have column names with spaces and current solution doesn't solve it. I have a
> solution, which works for me and maybe also for others.
>
> Either, someone can register me to bugzilla or I can post it here, which could give some direction to developers. I
> don't mind whichever is preferred here.
>
> Best,
>
> Saren
>
>
> On 29.03.19 09:29, Martin Maechler wrote:
>>>>>>> Saren Tasciyan
>>>>>>>      on Thu, 28 Mar 2019 17:02:10 +0100 writes:
>>      > Hi,
>>      > I have found a bug in reformulate function and have a solution for it. I
>>      > was wondering, where I can submit it?
>>
>>      > Best,
>>      > Saren
>>
>>
>> Well, you could have given a small reproducible example
>> depicting the bug, notably when posting here:
>> Just a prose text with no R code or other technical content is
>> almost always not really appropriate fo the R-devel mailing list.
>>
>> Further, in such a case you should google a bit and hopefully
>> have found
>>         https://www.r-project.org/bugs.html
>>
>> which also mention reproducibility (and many more useful things).
>>
>> Then it also tells you about R's bug repository, also called
>> "R's bugzilla" at https://bugs.r-project.org/
>>
>> and if you are diligent (but here, I'd say bugzilla is
>> (configured?) far from ideal), you'd also find bug PR#17359
>>
>>     https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17359
>>
>> which was reported already on Nov 2017 .. and only fixed
>> yesterday (in the "cleanup old bugs" process that happens
>> often before the big new spring release of R).
>>
>> So is your bug the same as that one?
>>
>> Martin
>>
>>      > --
>>      > Saren Tasciyan
>>      > /PhD Student / Sixt Group/
>>      > Institute of Science and Technology Austria
>>      > Am Campus 1
>>      > 3400 Klosterneuburg, Austria
>>
>>      > ______________________________________________
>>      > [hidden email] mailing list
>>      > https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Bug in the "reformulate" function in stats package

bbolker
  I suspect that the issue is addressed (obliquely) in the examples,
which shows that variables with spaces in them (or otherwise
'non-syntactic', i.e. not satisfying the constraints of legal R symbols)
can be handled by protecting them with backticks  (``)

 ## using non-syntactic names:
     reformulate(c("`P/E`", "`% Growth`"), response = as.name("+-"))

It seems to me there could be room for a *documentation* patch (stating
explicitly that if termlabels has length > 1 its elements are
concatenated with "+", and explicitly stating that non-syntactic names
must be protected with back-ticks).  (There is a little bit of obscurity
in the fact that the elements of termlabels don't have to be
syntactically valid names: many will be included in formulas if they can
be interpreted as *parseable* expressions, e.g. reformulate("x<2"))

  I would be happy to give it a shot if the consensus is that it would
be worthwhile.

   One workaround to the OP's problem is below (may be worth including
as an example in docs)

> z <- c("a variable","another variable")
> reformulate(z)
Error in parse(text = termtext, keep.source = FALSE) :
  <text>:1:6: unexpected symbol
1:  ~ a variable
         ^
> reformulate(sprintf("`%s`",z))
~`a variable` + `another variable`




On 2019-03-29 11:54 a.m., J C Nash wrote:

> The main thing is to post the "small reproducible example".
>
> My (rather long term experience) can be written
>
>   if (exists("reproducible example") ) {
>      DeveloperFixHappens()
>   } else {
>      NULL
>   }
>
> JN
>
> On 2019-03-29 11:38 a.m., Saren Tasciyan wrote:
>> Well, first I can't sign in bugzilla myself, that is why I wrote here first. Also, I don't know if I have the time at
>> the moment to provide tests, multiple examples or more. If that is not ok or welcomed, that is fine, I can come back,
>> whenever I have more time to properly report the bug.
>>
>> I didn't find the existing bug report, sorry for that.
>>
>> Yes, it is related. My problem was that I have column names with spaces and current solution doesn't solve it. I have a
>> solution, which works for me and maybe also for others.
>>
>> Either, someone can register me to bugzilla or I can post it here, which could give some direction to developers. I
>> don't mind whichever is preferred here.
>>
>> Best,
>>
>> Saren
>>
>>
>> On 29.03.19 09:29, Martin Maechler wrote:
>>>>>>>> Saren Tasciyan
>>>>>>>>      on Thu, 28 Mar 2019 17:02:10 +0100 writes:
>>>      > Hi,
>>>      > I have found a bug in reformulate function and have a solution for it. I
>>>      > was wondering, where I can submit it?
>>>
>>>      > Best,
>>>      > Saren
>>>
>>>
>>> Well, you could have given a small reproducible example
>>> depicting the bug, notably when posting here:
>>> Just a prose text with no R code or other technical content is
>>> almost always not really appropriate fo the R-devel mailing list.
>>>
>>> Further, in such a case you should google a bit and hopefully
>>> have found
>>>         https://www.r-project.org/bugs.html
>>>
>>> which also mention reproducibility (and many more useful things).
>>>
>>> Then it also tells you about R's bug repository, also called
>>> "R's bugzilla" at https://bugs.r-project.org/
>>>
>>> and if you are diligent (but here, I'd say bugzilla is
>>> (configured?) far from ideal), you'd also find bug PR#17359
>>>
>>>     https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17359
>>>
>>> which was reported already on Nov 2017 .. and only fixed
>>> yesterday (in the "cleanup old bugs" process that happens
>>> often before the big new spring release of R).
>>>
>>> So is your bug the same as that one?
>>>
>>> Martin
>>>
>>>      > --
>>>      > Saren Tasciyan
>>>      > /PhD Student / Sixt Group/
>>>      > Institute of Science and Technology Austria
>>>      > Am Campus 1
>>>      > 3400 Klosterneuburg, Austria
>>>
>>>      > ______________________________________________
>>>      > [hidden email] mailing list
>>>      > https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>> ______________________________________________
>>> [hidden email] mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Bug in the "reformulate" function in stats package

Martin Maechler
>>>>> Ben Bolker
>>>>>     on Fri, 29 Mar 2019 12:34:50 -0400 writes:

    > I suspect that the issue is addressed (obliquely) in the examples,
    > which shows that variables with spaces in them (or otherwise
    > 'non-syntactic', i.e. not satisfying the constraints of legal R symbols)
    > can be handled by protecting them with backticks  (``)

    > ## using non-syntactic names:
    > reformulate(c("`P/E`", "`% Growth`"), response = as.name("+-"))

    > It seems to me there could be room for a *documentation* patch (stating
    > explicitly that if termlabels has length > 1 its elements are
    > concatenated with "+", and explicitly stating that non-syntactic names
    > must be protected with back-ticks).  (There is a little bit of obscurity
    > in the fact that the elements of termlabels don't have to be
    > syntactically valid names: many will be included in formulas if they can
    > be interpreted as *parseable* expressions, e.g. reformulate("x<2"))

    > I would be happy to give it a shot if the consensus is that it would
    > be worthwhile.

I think it would be worthwhile to add to the docs a bit.

[With currently just your and my vote, we have a 100% consensus
;-)]

Martin

    > One workaround to the OP's problem is below (may be worth including
    > as an example in docs)

    >> z <- c("a variable","another variable")
    >> reformulate(z)
    > Error in parse(text = termtext, keep.source = FALSE) :
    > <text>:1:6: unexpected symbol
    > 1:  ~ a variable
    > ^
    >> reformulate(sprintf("`%s`",z))
    > ~`a variable` + `another variable`




    > On 2019-03-29 11:54 a.m., J C Nash wrote:
    >> The main thing is to post the "small reproducible example".
    >>
    >> My (rather long term experience) can be written
    >>
    >> if (exists("reproducible example") ) {
    >> DeveloperFixHappens()
    >> } else {
    >> NULL
    >> }
    >>
    >> JN
    >>
    >> On 2019-03-29 11:38 a.m., Saren Tasciyan wrote:
    >>> Well, first I can't sign in bugzilla myself, that is why I wrote here first. Also, I don't know if I have the time at
    >>> the moment to provide tests, multiple examples or more. If that is not ok or welcomed, that is fine, I can come back,
    >>> whenever I have more time to properly report the bug.
    >>>
    >>> I didn't find the existing bug report, sorry for that.
    >>>
    >>> Yes, it is related. My problem was that I have column names with spaces and current solution doesn't solve it. I have a
    >>> solution, which works for me and maybe also for others.
    >>>
    >>> Either, someone can register me to bugzilla or I can post it here, which could give some direction to developers. I
    >>> don't mind whichever is preferred here.
    >>>
    >>> Best,
    >>>
    >>> Saren
    >>>
    >>>
    >>> On 29.03.19 09:29, Martin Maechler wrote:
    >>>>>>>>> Saren Tasciyan
    >>>>>>>>>      on Thu, 28 Mar 2019 17:02:10 +0100 writes:
    >>>>      > Hi,
    >>>>      > I have found a bug in reformulate function and have a solution for it. I
    >>>>      > was wondering, where I can submit it?
    >>>>
    >>>>      > Best,
    >>>>      > Saren
    >>>>
    >>>>
    >>>> Well, you could have given a small reproducible example
    >>>> depicting the bug, notably when posting here:
    >>>> Just a prose text with no R code or other technical content is
    >>>> almost always not really appropriate fo the R-devel mailing list.
    >>>>
    >>>> Further, in such a case you should google a bit and hopefully
    >>>> have found
    >>>>         https://www.r-project.org/bugs.html
    >>>>
    >>>> which also mention reproducibility (and many more useful things).
    >>>>
    >>>> Then it also tells you about R's bug repository, also called
    >>>> "R's bugzilla" at https://bugs.r-project.org/
    >>>>
    >>>> and if you are diligent (but here, I'd say bugzilla is
    >>>> (configured?) far from ideal), you'd also find bug PR#17359
    >>>>
    >>>>     https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17359
    >>>>
    >>>> which was reported already on Nov 2017 .. and only fixed
    >>>> yesterday (in the "cleanup old bugs" process that happens
    >>>> often before the big new spring release of R).
    >>>>
    >>>> So is your bug the same as that one?
    >>>>
    >>>> Martin
    >>>>
    >>>>      > --
    >>>>      > Saren Tasciyan
    >>>>      > /PhD Student / Sixt Group/
    >>>>      > Institute of Science and Technology Austria
    >>>>      > Am Campus 1
    >>>>      > 3400 Klosterneuburg, Austria
    >>>>
    >>>>      > ______________________________________________
    >>>>      > [hidden email] mailing list
    >>>>      > https://stat.ethz.ch/mailman/listinfo/r-devel
    >>>>
    >>>> ______________________________________________
    >>>> [hidden email] mailing list
    >>>> https://stat.ethz.ch/mailman/listinfo/r-devel
    >>
    >> ______________________________________________
    >> [hidden email] mailing list
    >> https://stat.ethz.ch/mailman/listinfo/r-devel
    >>

    > ______________________________________________
    > [hidden email] mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Bug in the "reformulate" function in stats package

bbolker

  Proposed patch (I think .txt files work OK as attachments to the list?)

On 2019-04-04 2:21 a.m., Martin Maechler wrote:

>>>>>> Ben Bolker
>>>>>>     on Fri, 29 Mar 2019 12:34:50 -0400 writes:
>
>     > I suspect that the issue is addressed (obliquely) in the examples,
>     > which shows that variables with spaces in them (or otherwise
>     > 'non-syntactic', i.e. not satisfying the constraints of legal R symbols)
>     > can be handled by protecting them with backticks  (``)
>
>     > ## using non-syntactic names:
>     > reformulate(c("`P/E`", "`% Growth`"), response = as.name("+-"))
>
>     > It seems to me there could be room for a *documentation* patch (stating
>     > explicitly that if termlabels has length > 1 its elements are
>     > concatenated with "+", and explicitly stating that non-syntactic names
>     > must be protected with back-ticks).  (There is a little bit of obscurity
>     > in the fact that the elements of termlabels don't have to be
>     > syntactically valid names: many will be included in formulas if they can
>     > be interpreted as *parseable* expressions, e.g. reformulate("x<2"))
>
>     > I would be happy to give it a shot if the consensus is that it would
>     > be worthwhile.
>
> I think it would be worthwhile to add to the docs a bit.
>
> [With currently just your and my vote, we have a 100% consensus
> ;-)]
>
> Martin
>
>     > One workaround to the OP's problem is below (may be worth including
>     > as an example in docs)
>
>     >> z <- c("a variable","another variable")
>     >> reformulate(z)
>     > Error in parse(text = termtext, keep.source = FALSE) :
>     > <text>:1:6: unexpected symbol
>     > 1:  ~ a variable
>     > ^
>     >> reformulate(sprintf("`%s`",z))
>     > ~`a variable` + `another variable`
>
>
>
>
>     > On 2019-03-29 11:54 a.m., J C Nash wrote:
>     >> The main thing is to post the "small reproducible example".
>     >>
>     >> My (rather long term experience) can be written
>     >>
>     >> if (exists("reproducible example") ) {
>     >> DeveloperFixHappens()
>     >> } else {
>     >> NULL
>     >> }
>     >>
>     >> JN
>     >>
>     >> On 2019-03-29 11:38 a.m., Saren Tasciyan wrote:
>     >>> Well, first I can't sign in bugzilla myself, that is why I wrote here first. Also, I don't know if I have the time at
>     >>> the moment to provide tests, multiple examples or more. If that is not ok or welcomed, that is fine, I can come back,
>     >>> whenever I have more time to properly report the bug.
>     >>>
>     >>> I didn't find the existing bug report, sorry for that.
>     >>>
>     >>> Yes, it is related. My problem was that I have column names with spaces and current solution doesn't solve it. I have a
>     >>> solution, which works for me and maybe also for others.
>     >>>
>     >>> Either, someone can register me to bugzilla or I can post it here, which could give some direction to developers. I
>     >>> don't mind whichever is preferred here.
>     >>>
>     >>> Best,
>     >>>
>     >>> Saren
>     >>>
>     >>>
>     >>> On 29.03.19 09:29, Martin Maechler wrote:
>     >>>>>>>>> Saren Tasciyan
>     >>>>>>>>>      on Thu, 28 Mar 2019 17:02:10 +0100 writes:
>     >>>>      > Hi,
>     >>>>      > I have found a bug in reformulate function and have a solution for it. I
>     >>>>      > was wondering, where I can submit it?
>     >>>>
>     >>>>      > Best,
>     >>>>      > Saren
>     >>>>
>     >>>>
>     >>>> Well, you could have given a small reproducible example
>     >>>> depicting the bug, notably when posting here:
>     >>>> Just a prose text with no R code or other technical content is
>     >>>> almost always not really appropriate fo the R-devel mailing list.
>     >>>>
>     >>>> Further, in such a case you should google a bit and hopefully
>     >>>> have found
>     >>>>         https://www.r-project.org/bugs.html
>     >>>>
>     >>>> which also mention reproducibility (and many more useful things).
>     >>>>
>     >>>> Then it also tells you about R's bug repository, also called
>     >>>> "R's bugzilla" at https://bugs.r-project.org/
>     >>>>
>     >>>> and if you are diligent (but here, I'd say bugzilla is
>     >>>> (configured?) far from ideal), you'd also find bug PR#17359
>     >>>>
>     >>>>     https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17359
>     >>>>
>     >>>> which was reported already on Nov 2017 .. and only fixed
>     >>>> yesterday (in the "cleanup old bugs" process that happens
>     >>>> often before the big new spring release of R).
>     >>>>
>     >>>> So is your bug the same as that one?
>     >>>>
>     >>>> Martin
>     >>>>
>     >>>>      > --
>     >>>>      > Saren Tasciyan
>     >>>>      > /PhD Student / Sixt Group/
>     >>>>      > Institute of Science and Technology Austria
>     >>>>      > Am Campus 1
>     >>>>      > 3400 Klosterneuburg, Austria
>     >>>>
>     >>>>      > ______________________________________________
>     >>>>      > [hidden email] mailing list
>     >>>>      > https://stat.ethz.ch/mailman/listinfo/r-devel
>     >>>>
>     >>>> ______________________________________________
>     >>>> [hidden email] mailing list
>     >>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>     >>
>     >> ______________________________________________
>     >> [hidden email] mailing list
>     >> https://stat.ethz.ch/mailman/listinfo/r-devel
>     >>
>
>     > ______________________________________________
>     > [hidden email] mailing list
>     > https://stat.ethz.ch/mailman/listinfo/r-devel
>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

reformulate_diff.txt (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Bug in the "reformulate" function in stats package

Martin Maechler
>>>>> Ben Bolker
>>>>>     on Thu, 4 Apr 2019 12:46:37 -0400 writes:

  > Proposed patch

Thank you Ben!


[the rest is technical nit-picking .. but hopefully interesting
 to the smart R-devel reader base:]

There was a very subtle thinko in your patch which is not easily
diagnosed from R's parse_Rd():

Error in parse_Rd("/u/maechler/R/D/r-devel/R/src/library/stats/man/delete.response.Rd",  :
  Unexpected end of input (in " quoted string opened at delete.response.Rd:78:63)
In addition: Warning message:
In parse_Rd("/u/maechler/R/D/r-devel/R/src/library/stats/man/delete.response.Rd",  :
  newline within quoted string at delete.response.Rd:74

and even I needed more than a minute to find out that the
culprit was that

  reformulate(sprintf("`%s`", x))

is not ok in *.Rd  and must be

  reformulate(sprintf("`\%s`", x))

---------

  > (I think .txt files work OK as attachments to the list?)  

yes, typically -- what really counts is if your e-mail program
marks them with MIME-type 'text/plain'
and most E-mail programs are very "silly" / "safe" nowadays and
don't expect to have smart users  and hence mark (and sometimes
encode) everything unknown as non-text.

Using very old flexible e-mail interfaces such as Emacs VM allow
you to specify the MIME-type in addition to the file *and* it
also proposes smart defaults, I think by using something like
unix 'file' to determine that your 'foo.diff' file is plain text.
{{ .. and we all know that Windows is sillily using file extensions
   to determine file type and only knows  Windows-extensions plus
   those added explicitly by software installed; so nowadays *.rda
   is marked as an Rstudio file ... [argh].
}}

Martin

    > On 2019-04-04 2:21 a.m., Martin Maechler wrote:
    >>>>>>> Ben Bolker
    >>>>>>> on Fri, 29 Mar 2019 12:34:50 -0400 writes:
    >>
    >> > I suspect that the issue is addressed (obliquely) in the examples,
    >> > which shows that variables with spaces in them (or otherwise
    >> > 'non-syntactic', i.e. not satisfying the constraints of legal R symbols)
    >> > can be handled by protecting them with backticks  (``)
    >>
    >> > ## using non-syntactic names:
    >> > reformulate(c("`P/E`", "`% Growth`"), response = as.name("+-"))
    >>
    >> > It seems to me there could be room for a *documentation* patch (stating
    >> > explicitly that if termlabels has length > 1 its elements are
    >> > concatenated with "+", and explicitly stating that non-syntactic names
    >> > must be protected with back-ticks).  (There is a little bit of obscurity
    >> > in the fact that the elements of termlabels don't have to be
    >> > syntactically valid names: many will be included in formulas if they can
    >> > be interpreted as *parseable* expressions, e.g. reformulate("x<2"))
    >>
    >> > I would be happy to give it a shot if the consensus is that it would
    >> > be worthwhile.
    >>
    >> I think it would be worthwhile to add to the docs a bit.
    >>
    >> [With currently just your and my vote, we have a 100% consensus
    >> ;-)]
    >>
    >> Martin
    >>
    >> > One workaround to the OP's problem is below (may be worth including
    >> > as an example in docs)
    >>
    >> >> z <- c("a variable","another variable")
    >> >> reformulate(z)
    >> > Error in parse(text = termtext, keep.source = FALSE) :
    >> > <text>:1:6: unexpected symbol
    >> > 1:  ~ a variable
    >> > ^
    >> >> reformulate(sprintf("`%s`",z))
    >> > ~`a variable` + `another variable`
    >>
    >>
    >>
    >>
    >> > On 2019-03-29 11:54 a.m., J C Nash wrote:
    >> >> The main thing is to post the "small reproducible example".
    >> >>
    >> >> My (rather long term experience) can be written
    >> >>
    >> >> if (exists("reproducible example") ) {
    >> >> DeveloperFixHappens()
    >> >> } else {
    >> >> NULL
    >> >> }
    >> >>
    >> >> JN
    >> >>
    >> >> On 2019-03-29 11:38 a.m., Saren Tasciyan wrote:
    >> >>> Well, first I can't sign in bugzilla myself, that is why I wrote here first. Also, I don't know if I have the time at
    >> >>> the moment to provide tests, multiple examples or more. If that is not ok or welcomed, that is fine, I can come back,
    >> >>> whenever I have more time to properly report the bug.
    >> >>>
    >> >>> I didn't find the existing bug report, sorry for that.
    >> >>>
    >> >>> Yes, it is related. My problem was that I have column names with spaces and current solution doesn't solve it. I have a
    >> >>> solution, which works for me and maybe also for others.
    >> >>>
    >> >>> Either, someone can register me to bugzilla or I can post it here, which could give some direction to developers. I
    >> >>> don't mind whichever is preferred here.
    >> >>>
    >> >>> Best,
    >> >>>
    >> >>> Saren
    >> >>>
    >> >>>
    >> >>> On 29.03.19 09:29, Martin Maechler wrote:
    >> >>>>>>>>> Saren Tasciyan
    >> >>>>>>>>>      on Thu, 28 Mar 2019 17:02:10 +0100 writes:
    >> >>>>      > Hi,
    >> >>>>      > I have found a bug in reformulate function and have a solution for it. I
    >> >>>>      > was wondering, where I can submit it?
    >> >>>>
    >> >>>>      > Best,
    >> >>>>      > Saren
    >> >>>>
    >> >>>>
    >> >>>> Well, you could have given a small reproducible example
    >> >>>> depicting the bug, notably when posting here:
    >> >>>> Just a prose text with no R code or other technical content is
    >> >>>> almost always not really appropriate fo the R-devel mailing list.
    >> >>>>
    >> >>>> Further, in such a case you should google a bit and hopefully
    >> >>>> have found
    >> >>>>         https://www.r-project.org/bugs.html
    >> >>>>
    >> >>>> which also mention reproducibility (and many more useful things).
    >> >>>>
    >> >>>> Then it also tells you about R's bug repository, also called
    >> >>>> "R's bugzilla" at https://bugs.r-project.org/
    >> >>>>
    >> >>>> and if you are diligent (but here, I'd say bugzilla is
    >> >>>> (configured?) far from ideal), you'd also find bug PR#17359
    >> >>>>
    >> >>>>     https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17359
    >> >>>>
    >> >>>> which was reported already on Nov 2017 .. and only fixed
    >> >>>> yesterday (in the "cleanup old bugs" process that happens
    >> >>>> often before the big new spring release of R).
    >> >>>>
    >> >>>> So is your bug the same as that one?
    >> >>>>
    >> >>>> Martin
    >> >>>>
    >> >>>>      > --
    >> >>>>      > Saren Tasciyan
    >> >>>>      > /PhD Student / Sixt Group/
    >> >>>>      > Institute of Science and Technology Austria
    >> >>>>      > Am Campus 1
    >> >>>>      > 3400 Klosterneuburg, Austria
    >> >>>>
    >> >>>>      > ______________________________________________
    >> >>>>      > [hidden email] mailing list
    >> >>>>      > https://stat.ethz.ch/mailman/listinfo/r-devel
    >> >>>>
    >> >>>> ______________________________________________
    >> >>>> [hidden email] mailing list
    >> >>>> https://stat.ethz.ch/mailman/listinfo/r-devel
    >> >>
    >> >> ______________________________________________
    >> >> [hidden email] mailing list
    >> >> https://stat.ethz.ch/mailman/listinfo/r-devel
    >> >>
    >>
    >> > ______________________________________________
    >> > [hidden email] mailing list
    >> > https://stat.ethz.ch/mailman/listinfo/r-devel
    >>
    > x[DELETED ATTACHMENT external: reformulate.diff, plain text]
    > ______________________________________________
    > [hidden email] mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Bug in the "reformulate" function in stats package

Saren Tasciyan
Hi,

Sorry for writing this late, I was very busy. I started this discussion
here. I wish I could write to bugs.r-project.org, but I don't have an
account and I will write here instead.

Meanwhile, I solved my problem with a simpler fix (please see attached
file)/.
/

This requires that term labels are not "ticked". I think this is better,
since it is easier to have column names unticked.

New development function is IMO unnecessarily complicated. It requires
strings to be ticked or as.name(). It is more intuitive to have a vector
of column names.

Best,

Saren


On 05.04.19 09:38, Martin Maechler wrote:

>>>>>> Ben Bolker
>>>>>>      on Thu, 4 Apr 2019 12:46:37 -0400 writes:
>    > Proposed patch
>
> Thank you Ben!
>
>
> [the rest is technical nit-picking .. but hopefully interesting
>   to the smart R-devel reader base:]
>
> There was a very subtle thinko in your patch which is not easily
> diagnosed from R's parse_Rd():
>
> Error in parse_Rd("/u/maechler/R/D/r-devel/R/src/library/stats/man/delete.response.Rd",  :
>    Unexpected end of input (in " quoted string opened at delete.response.Rd:78:63)
> In addition: Warning message:
> In parse_Rd("/u/maechler/R/D/r-devel/R/src/library/stats/man/delete.response.Rd",  :
>    newline within quoted string at delete.response.Rd:74
>
> and even I needed more than a minute to find out that the
> culprit was that
>
>    reformulate(sprintf("`%s`", x))
>
> is not ok in *.Rd  and must be
>
>    reformulate(sprintf("`\%s`", x))
>
> ---------
>
>    > (I think .txt files work OK as attachments to the list?)
>
> yes, typically -- what really counts is if your e-mail program
> marks them with MIME-type 'text/plain'
> and most E-mail programs are very "silly" / "safe" nowadays and
> don't expect to have smart users  and hence mark (and sometimes
> encode) everything unknown as non-text.
>
> Using very old flexible e-mail interfaces such as Emacs VM allow
> you to specify the MIME-type in addition to the file *and* it
> also proposes smart defaults, I think by using something like
> unix 'file' to determine that your 'foo.diff' file is plain text.
> {{ .. and we all know that Windows is sillily using file extensions
>     to determine file type and only knows  Windows-extensions plus
>     those added explicitly by software installed; so nowadays *.rda
>     is marked as an Rstudio file ... [argh].
> }}
>
> Martin
>
>      > On 2019-04-04 2:21 a.m., Martin Maechler wrote:
>      >>>>>>> Ben Bolker
>      >>>>>>> on Fri, 29 Mar 2019 12:34:50 -0400 writes:
>      >>
>      >> > I suspect that the issue is addressed (obliquely) in the examples,
>      >> > which shows that variables with spaces in them (or otherwise
>      >> > 'non-syntactic', i.e. not satisfying the constraints of legal R symbols)
>      >> > can be handled by protecting them with backticks  (``)
>      >>
>      >> > ## using non-syntactic names:
>      >> > reformulate(c("`P/E`", "`% Growth`"), response = as.name("+-"))
>      >>
>      >> > It seems to me there could be room for a *documentation* patch (stating
>      >> > explicitly that if termlabels has length > 1 its elements are
>      >> > concatenated with "+", and explicitly stating that non-syntactic names
>      >> > must be protected with back-ticks).  (There is a little bit of obscurity
>      >> > in the fact that the elements of termlabels don't have to be
>      >> > syntactically valid names: many will be included in formulas if they can
>      >> > be interpreted as *parseable* expressions, e.g. reformulate("x<2"))
>      >>
>      >> > I would be happy to give it a shot if the consensus is that it would
>      >> > be worthwhile.
>      >>
>      >> I think it would be worthwhile to add to the docs a bit.
>      >>
>      >> [With currently just your and my vote, we have a 100% consensus
>      >> ;-)]
>      >>
>      >> Martin
>      >>
>      >> > One workaround to the OP's problem is below (may be worth including
>      >> > as an example in docs)
>      >>
>      >> >> z <- c("a variable","another variable")
>      >> >> reformulate(z)
>      >> > Error in parse(text = termtext, keep.source = FALSE) :
>      >> > <text>:1:6: unexpected symbol
>      >> > 1:  ~ a variable
>      >> > ^
>      >> >> reformulate(sprintf("`%s`",z))
>      >> > ~`a variable` + `another variable`
>      >>
>      >>
>      >>
>      >>
>      >> > On 2019-03-29 11:54 a.m., J C Nash wrote:
>      >> >> The main thing is to post the "small reproducible example".
>      >> >>
>      >> >> My (rather long term experience) can be written
>      >> >>
>      >> >> if (exists("reproducible example") ) {
>      >> >> DeveloperFixHappens()
>      >> >> } else {
>      >> >> NULL
>      >> >> }
>      >> >>
>      >> >> JN
>      >> >>
>      >> >> On 2019-03-29 11:38 a.m., Saren Tasciyan wrote:
>      >> >>> Well, first I can't sign in bugzilla myself, that is why I wrote here first. Also, I don't know if I have the time at
>      >> >>> the moment to provide tests, multiple examples or more. If that is not ok or welcomed, that is fine, I can come back,
>      >> >>> whenever I have more time to properly report the bug.
>      >> >>>
>      >> >>> I didn't find the existing bug report, sorry for that.
>      >> >>>
>      >> >>> Yes, it is related. My problem was that I have column names with spaces and current solution doesn't solve it. I have a
>      >> >>> solution, which works for me and maybe also for others.
>      >> >>>
>      >> >>> Either, someone can register me to bugzilla or I can post it here, which could give some direction to developers. I
>      >> >>> don't mind whichever is preferred here.
>      >> >>>
>      >> >>> Best,
>      >> >>>
>      >> >>> Saren
>      >> >>>
>      >> >>>
>      >> >>> On 29.03.19 09:29, Martin Maechler wrote:
>      >> >>>>>>>>> Saren Tasciyan
>      >> >>>>>>>>>      on Thu, 28 Mar 2019 17:02:10 +0100 writes:
>      >> >>>>      > Hi,
>      >> >>>>      > I have found a bug in reformulate function and have a solution for it. I
>      >> >>>>      > was wondering, where I can submit it?
>      >> >>>>
>      >> >>>>      > Best,
>      >> >>>>      > Saren
>      >> >>>>
>      >> >>>>
>      >> >>>> Well, you could have given a small reproducible example
>      >> >>>> depicting the bug, notably when posting here:
>      >> >>>> Just a prose text with no R code or other technical content is
>      >> >>>> almost always not really appropriate fo the R-devel mailing list.
>      >> >>>>
>      >> >>>> Further, in such a case you should google a bit and hopefully
>      >> >>>> have found
>      >> >>>>         https://www.r-project.org/bugs.html
>      >> >>>>
>      >> >>>> which also mention reproducibility (and many more useful things).
>      >> >>>>
>      >> >>>> Then it also tells you about R's bug repository, also called
>      >> >>>> "R's bugzilla" at https://bugs.r-project.org/
>      >> >>>>
>      >> >>>> and if you are diligent (but here, I'd say bugzilla is
>      >> >>>> (configured?) far from ideal), you'd also find bug PR#17359
>      >> >>>>
>      >> >>>>     https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17359
>      >> >>>>
>      >> >>>> which was reported already on Nov 2017 .. and only fixed
>      >> >>>> yesterday (in the "cleanup old bugs" process that happens
>      >> >>>> often before the big new spring release of R).
>      >> >>>>
>      >> >>>> So is your bug the same as that one?
>      >> >>>>
>      >> >>>> Martin
>      >> >>>>
>      >> >>>>      > --
>      >> >>>>      > Saren Tasciyan
>      >> >>>>      > /PhD Student / Sixt Group/
>      >> >>>>      > Institute of Science and Technology Austria
>      >> >>>>      > Am Campus 1
>      >> >>>>      > 3400 Klosterneuburg, Austria
>      >> >>>>
>      >> >>>>      > ______________________________________________
>      >> >>>>      > [hidden email] mailing list
>      >> >>>>      > https://stat.ethz.ch/mailman/listinfo/r-devel
>      >> >>>>
>      >> >>>> ______________________________________________
>      >> >>>> [hidden email] mailing list
>      >> >>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>      >> >>
>      >> >> ______________________________________________
>      >> >> [hidden email] mailing list
>      >> >> https://stat.ethz.ch/mailman/listinfo/r-devel
>      >> >>
>      >>
>      >> > ______________________________________________
>      >> > [hidden email] mailing list
>      >> > https://stat.ethz.ch/mailman/listinfo/r-devel
>      >>
>      > x[DELETED ATTACHMENT external: reformulate.diff, plain text]
>      > ______________________________________________
>      > [hidden email] mailing list
>      > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
--
Saren Tasciyan
/PhD Student / Sixt Group/
Institute of Science and Technology Austria
Am Campus 1
3400 Klosterneuburg, Austria



______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Bug in the "reformulate" function in stats package

bbolker

  Your file didn't make it through the mailing list (which is quite
restrictive about which types/extensions it will take).

  I appreciate your enthusiasm and persistence for this issue, but I
suspect you may have trouble convincing R-core to adopt your changes --
they are "better", "easier", "more intuitive" for you ... but how sure
are you they are completely backward compatible, have no performance
issues, will not break in unusual cases ... ?

  Hopefully someone here will set up a bugzilla account so you can post
your patch/it can be further discussed there, if you want to purseu this ...

  cheers
    Ben Bolker

On 2019-04-18 7:30 a.m., Saren Tasciyan wrote:

> Hi,
>
> Sorry for writing this late, I was very busy. I started this discussion
> here. I wish I could write to bugs.r-project.org, but I don't have an
> account and I will write here instead.
>
> Meanwhile, I solved my problem with a simpler fix (please see attached
> file)/.
> /
>
> This requires that term labels are not "ticked". I think this is better,
> since it is easier to have column names unticked.
>
> New development function is IMO unnecessarily complicated. It requires
> strings to be ticked or as.name(). It is more intuitive to have a vector
> of column names.
>
> Best,
>
> Saren
>
>
> On 05.04.19 09:38, Martin Maechler wrote:
>>>>>>> Ben Bolker
>>>>>>>      on Thu, 4 Apr 2019 12:46:37 -0400 writes:
>>    > Proposed patch
>>
>> Thank you Ben!
>>
>>
>> [the rest is technical nit-picking .. but hopefully interesting
>>   to the smart R-devel reader base:]
>>
>> There was a very subtle thinko in your patch which is not easily
>> diagnosed from R's parse_Rd():
>>
>> Error in
>> parse_Rd("/u/maechler/R/D/r-devel/R/src/library/stats/man/delete.response.Rd", 
>> :
>>    Unexpected end of input (in " quoted string opened at
>> delete.response.Rd:78:63)
>> In addition: Warning message:
>> In
>> parse_Rd("/u/maechler/R/D/r-devel/R/src/library/stats/man/delete.response.Rd", 
>> :
>>    newline within quoted string at delete.response.Rd:74
>>
>> and even I needed more than a minute to find out that the
>> culprit was that
>>
>>    reformulate(sprintf("`%s`", x))
>>
>> is not ok in *.Rd  and must be
>>
>>    reformulate(sprintf("`\%s`", x))
>>
>> ---------
>>
>>    > (I think .txt files work OK as attachments to the list?)
>>
>> yes, typically -- what really counts is if your e-mail program
>> marks them with MIME-type 'text/plain'
>> and most E-mail programs are very "silly" / "safe" nowadays and
>> don't expect to have smart users  and hence mark (and sometimes
>> encode) everything unknown as non-text.
>>
>> Using very old flexible e-mail interfaces such as Emacs VM allow
>> you to specify the MIME-type in addition to the file *and* it
>> also proposes smart defaults, I think by using something like
>> unix 'file' to determine that your 'foo.diff' file is plain text.
>> {{ .. and we all know that Windows is sillily using file extensions
>>     to determine file type and only knows  Windows-extensions plus
>>     those added explicitly by software installed; so nowadays *.rda
>>     is marked as an Rstudio file ... [argh].
>> }}
>>
>> Martin
>>
>>      > On 2019-04-04 2:21 a.m., Martin Maechler wrote:
>>      >>>>>>> Ben Bolker
>>      >>>>>>> on Fri, 29 Mar 2019 12:34:50 -0400 writes:
>>      >>
>>      >> > I suspect that the issue is addressed (obliquely) in the
>> examples,
>>      >> > which shows that variables with spaces in them (or otherwise
>>      >> > 'non-syntactic', i.e. not satisfying the constraints of
>> legal R symbols)
>>      >> > can be handled by protecting them with backticks  (``)
>>      >>
>>      >> > ## using non-syntactic names:
>>      >> > reformulate(c("`P/E`", "`% Growth`"), response = as.name("+-"))
>>      >>
>>      >> > It seems to me there could be room for a *documentation*
>> patch (stating
>>      >> > explicitly that if termlabels has length > 1 its elements are
>>      >> > concatenated with "+", and explicitly stating that
>> non-syntactic names
>>      >> > must be protected with back-ticks).  (There is a little bit
>> of obscurity
>>      >> > in the fact that the elements of termlabels don't have to be
>>      >> > syntactically valid names: many will be included in formulas
>> if they can
>>      >> > be interpreted as *parseable* expressions, e.g.
>> reformulate("x<2"))
>>      >>
>>      >> > I would be happy to give it a shot if the consensus is that
>> it would
>>      >> > be worthwhile.
>>      >>
>>      >> I think it would be worthwhile to add to the docs a bit.
>>      >>
>>      >> [With currently just your and my vote, we have a 100% consensus
>>      >> ;-)]
>>      >>
>>      >> Martin
>>      >>
>>      >> > One workaround to the OP's problem is below (may be worth
>> including
>>      >> > as an example in docs)
>>      >>
>>      >> >> z <- c("a variable","another variable")
>>      >> >> reformulate(z)
>>      >> > Error in parse(text = termtext, keep.source = FALSE) :
>>      >> > <text>:1:6: unexpected symbol
>>      >> > 1:  ~ a variable
>>      >> > ^
>>      >> >> reformulate(sprintf("`%s`",z))
>>      >> > ~`a variable` + `another variable`
>>      >>
>>      >>
>>      >>
>>      >>
>>      >> > On 2019-03-29 11:54 a.m., J C Nash wrote:
>>      >> >> The main thing is to post the "small reproducible example".
>>      >> >>
>>      >> >> My (rather long term experience) can be written
>>      >> >>
>>      >> >> if (exists("reproducible example") ) {
>>      >> >> DeveloperFixHappens()
>>      >> >> } else {
>>      >> >> NULL
>>      >> >> }
>>      >> >>
>>      >> >> JN
>>      >> >>
>>      >> >> On 2019-03-29 11:38 a.m., Saren Tasciyan wrote:
>>      >> >>> Well, first I can't sign in bugzilla myself, that is why I
>> wrote here first. Also, I don't know if I have the time at
>>      >> >>> the moment to provide tests, multiple examples or more. If
>> that is not ok or welcomed, that is fine, I can come back,
>>      >> >>> whenever I have more time to properly report the bug.
>>      >> >>>
>>      >> >>> I didn't find the existing bug report, sorry for that.
>>      >> >>>
>>      >> >>> Yes, it is related. My problem was that I have column
>> names with spaces and current solution doesn't solve it. I have a
>>      >> >>> solution, which works for me and maybe also for others.
>>      >> >>>
>>      >> >>> Either, someone can register me to bugzilla or I can post
>> it here, which could give some direction to developers. I
>>      >> >>> don't mind whichever is preferred here.
>>      >> >>>
>>      >> >>> Best,
>>      >> >>>
>>      >> >>> Saren
>>      >> >>>
>>      >> >>>
>>      >> >>> On 29.03.19 09:29, Martin Maechler wrote:
>>      >> >>>>>>>>> Saren Tasciyan
>>      >> >>>>>>>>>      on Thu, 28 Mar 2019 17:02:10 +0100 writes:
>>      >> >>>>      > Hi,
>>      >> >>>>      > I have found a bug in reformulate function and
>> have a solution for it. I
>>      >> >>>>      > was wondering, where I can submit it?
>>      >> >>>>
>>      >> >>>>      > Best,
>>      >> >>>>      > Saren
>>      >> >>>>
>>      >> >>>>
>>      >> >>>> Well, you could have given a small reproducible example
>>      >> >>>> depicting the bug, notably when posting here:
>>      >> >>>> Just a prose text with no R code or other technical
>> content is
>>      >> >>>> almost always not really appropriate fo the R-devel
>> mailing list.
>>      >> >>>>
>>      >> >>>> Further, in such a case you should google a bit and
>> hopefully
>>      >> >>>> have found
>>      >> >>>>         https://www.r-project.org/bugs.html
>>      >> >>>>
>>      >> >>>> which also mention reproducibility (and many more useful
>> things).
>>      >> >>>>
>>      >> >>>> Then it also tells you about R's bug repository, also called
>>      >> >>>> "R's bugzilla" at https://bugs.r-project.org/
>>      >> >>>>
>>      >> >>>> and if you are diligent (but here, I'd say bugzilla is
>>      >> >>>> (configured?) far from ideal), you'd also find bug PR#17359
>>      >> >>>>
>>      >> >>>>    
>> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17359
>>      >> >>>>
>>      >> >>>> which was reported already on Nov 2017 .. and only fixed
>>      >> >>>> yesterday (in the "cleanup old bugs" process that happens
>>      >> >>>> often before the big new spring release of R).
>>      >> >>>>
>>      >> >>>> So is your bug the same as that one?
>>      >> >>>>
>>      >> >>>> Martin
>>      >> >>>>
>>      >> >>>>      > --
>>      >> >>>>      > Saren Tasciyan
>>      >> >>>>      > /PhD Student / Sixt Group/
>>      >> >>>>      > Institute of Science and Technology Austria
>>      >> >>>>      > Am Campus 1
>>      >> >>>>      > 3400 Klosterneuburg, Austria
>>      >> >>>>
>>      >> >>>>      > ______________________________________________
>>      >> >>>>      > [hidden email] mailing list
>>      >> >>>>      > https://stat.ethz.ch/mailman/listinfo/r-devel
>>      >> >>>>
>>      >> >>>> ______________________________________________
>>      >> >>>> [hidden email] mailing list
>>      >> >>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>      >> >>
>>      >> >> ______________________________________________
>>      >> >> [hidden email] mailing list
>>      >> >> https://stat.ethz.ch/mailman/listinfo/r-devel
>>      >> >>
>>      >>
>>      >> > ______________________________________________
>>      >> > [hidden email] mailing list
>>      >> > https://stat.ethz.ch/mailman/listinfo/r-devel
>>      >>
>>      > x[DELETED ATTACHMENT external: reformulate.diff, plain text]
>>      > ______________________________________________
>>      > [hidden email] mailing list
>>      > https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>> ______________________________________________
>> [hidden email] mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Bug in the "reformulate" function in stats package

Martin Maechler
>>>>> Ben Bolker
>>>>>     on Thu, 18 Apr 2019 11:51:33 -0400 writes:

    > Your file didn't make it through the mailing list (which is quite
    > restrictive about which types/extensions it will take).

    > I appreciate your enthusiasm and persistence for this issue, but I
    > suspect you may have trouble convincing R-core to adopt your changes --
    > they are "better", "easier", "more intuitive" for you ... but how sure
    > are you they are completely backward compatible, have no performance
    > issues, will not break in unusual cases ... ?

    > Hopefully someone here will set up a bugzilla account so you can post
    > your patch/it can be further discussed there, if you want to purseu this ...
This case has been closed quite a while ago, thank you.

The changes will be in R 3.6.0  that'll come in 8 days, not the
least thanks to Ben's patch (earlier in this thread).

Martin Maechler

    > cheers
    > Ben Bolker

    > On 2019-04-18 7:30 a.m., Saren Tasciyan wrote:
    >> Hi,
    >>
    >> Sorry for writing this late, I was very busy. I started this discussion
    >> here. I wish I could write to bugs.r-project.org, but I don't have an
    >> account and I will write here instead.
    >>
    >> Meanwhile, I solved my problem with a simpler fix (please see attached
    >> file)/.
    >> /
    >>
    >> This requires that term labels are not "ticked". I think this is better,
    >> since it is easier to have column names unticked.
    >>
    >> New development function is IMO unnecessarily complicated. It requires
    >> strings to be ticked or as.name(). It is more intuitive to have a vector
    >> of column names.
    >>
    >> Best,
    >>
    >> Saren
    >>
    >>
    >> On 05.04.19 09:38, Martin Maechler wrote:
    >>>>>>>> Ben Bolker
    >>>>>>>>      on Thu, 4 Apr 2019 12:46:37 -0400 writes:
    >>>    > Proposed patch
    >>>
    >>> Thank you Ben!
    >>>
    >>>
    >>> [the rest is technical nit-picking .. but hopefully interesting
    >>>   to the smart R-devel reader base:]
    >>>
    >>> There was a very subtle thinko in your patch which is not easily
    >>> diagnosed from R's parse_Rd():
    >>>
    >>> Error in
    >>> parse_Rd("/u/maechler/R/D/r-devel/R/src/library/stats/man/delete.response.Rd", 
    >>> :
    >>>    Unexpected end of input (in " quoted string opened at
    >>> delete.response.Rd:78:63)
    >>> In addition: Warning message:
    >>> In
    >>> parse_Rd("/u/maechler/R/D/r-devel/R/src/library/stats/man/delete.response.Rd", 
    >>> :
    >>>    newline within quoted string at delete.response.Rd:74
    >>>
    >>> and even I needed more than a minute to find out that the
    >>> culprit was that
    >>>
    >>>    reformulate(sprintf("`%s`", x))
    >>>
    >>> is not ok in *.Rd  and must be
    >>>
    >>>    reformulate(sprintf("`\%s`", x))
    >>>
    >>> ---------
    >>>
    >>>    > (I think .txt files work OK as attachments to the list?)
    >>>
    >>> yes, typically -- what really counts is if your e-mail program
    >>> marks them with MIME-type 'text/plain'
    >>> and most E-mail programs are very "silly" / "safe" nowadays and
    >>> don't expect to have smart users  and hence mark (and sometimes
    >>> encode) everything unknown as non-text.
    >>>
    >>> Using very old flexible e-mail interfaces such as Emacs VM allow
    >>> you to specify the MIME-type in addition to the file *and* it
    >>> also proposes smart defaults, I think by using something like
    >>> unix 'file' to determine that your 'foo.diff' file is plain text.
    >>> {{ .. and we all know that Windows is sillily using file extensions
    >>>     to determine file type and only knows  Windows-extensions plus
    >>>     those added explicitly by software installed; so nowadays *.rda
    >>>     is marked as an Rstudio file ... [argh].
    >>> }}
    >>>
    >>> Martin
    >>>
    >>>      > On 2019-04-04 2:21 a.m., Martin Maechler wrote:
    >>>      >>>>>>> Ben Bolker
    >>>      >>>>>>> on Fri, 29 Mar 2019 12:34:50 -0400 writes:
    >>>      >>
    >>>      >> > I suspect that the issue is addressed (obliquely) in the
    >>> examples,
    >>>      >> > which shows that variables with spaces in them (or otherwise
    >>>      >> > 'non-syntactic', i.e. not satisfying the constraints of
    >>> legal R symbols)
    >>>      >> > can be handled by protecting them with backticks  (``)
    >>>      >>
    >>>      >> > ## using non-syntactic names:
    >>>      >> > reformulate(c("`P/E`", "`% Growth`"), response = as.name("+-"))
    >>>      >>
    >>>      >> > It seems to me there could be room for a *documentation*
    >>> patch (stating
    >>>      >> > explicitly that if termlabels has length > 1 its elements are
    >>>      >> > concatenated with "+", and explicitly stating that
    >>> non-syntactic names
    >>>      >> > must be protected with back-ticks).  (There is a little bit
    >>> of obscurity
    >>>      >> > in the fact that the elements of termlabels don't have to be
    >>>      >> > syntactically valid names: many will be included in formulas
    >>> if they can
    >>>      >> > be interpreted as *parseable* expressions, e.g.
    >>> reformulate("x<2"))
    >>>      >>
    >>>      >> > I would be happy to give it a shot if the consensus is that
    >>> it would
    >>>      >> > be worthwhile.
    >>>      >>
    >>>      >> I think it would be worthwhile to add to the docs a bit.
    >>>      >>
    >>>      >> [With currently just your and my vote, we have a 100% consensus
    >>>      >> ;-)]
    >>>      >>
    >>>      >> Martin
    >>>      >>
    >>>      >> > One workaround to the OP's problem is below (may be worth
    >>> including
    >>>      >> > as an example in docs)
    >>>      >>
    >>>      >> >> z <- c("a variable","another variable")
    >>>      >> >> reformulate(z)
    >>>      >> > Error in parse(text = termtext, keep.source = FALSE) :
    >>>      >> > <text>:1:6: unexpected symbol
    >>>      >> > 1:  ~ a variable
    >>>      >> > ^
    >>>      >> >> reformulate(sprintf("`%s`",z))
    >>>      >> > ~`a variable` + `another variable`
    >>>      >>
    >>>      >>
    >>>      >>
    >>>      >>
    >>>      >> > On 2019-03-29 11:54 a.m., J C Nash wrote:
    >>>      >> >> The main thing is to post the "small reproducible example".
    >>>      >> >>
    >>>      >> >> My (rather long term experience) can be written
    >>>      >> >>
    >>>      >> >> if (exists("reproducible example") ) {
    >>>      >> >> DeveloperFixHappens()
    >>>      >> >> } else {
    >>>      >> >> NULL
    >>>      >> >> }
    >>>      >> >>
    >>>      >> >> JN
    >>>      >> >>
    >>>      >> >> On 2019-03-29 11:38 a.m., Saren Tasciyan wrote:
    >>>      >> >>> Well, first I can't sign in bugzilla myself, that is why I
    >>> wrote here first. Also, I don't know if I have the time at
    >>>      >> >>> the moment to provide tests, multiple examples or more. If
    >>> that is not ok or welcomed, that is fine, I can come back,
    >>>      >> >>> whenever I have more time to properly report the bug.
    >>>      >> >>>
    >>>      >> >>> I didn't find the existing bug report, sorry for that.
    >>>      >> >>>
    >>>      >> >>> Yes, it is related. My problem was that I have column
    >>> names with spaces and current solution doesn't solve it. I have a
    >>>      >> >>> solution, which works for me and maybe also for others.
    >>>      >> >>>
    >>>      >> >>> Either, someone can register me to bugzilla or I can post
    >>> it here, which could give some direction to developers. I
    >>>      >> >>> don't mind whichever is preferred here.
    >>>      >> >>>
    >>>      >> >>> Best,
    >>>      >> >>>
    >>>      >> >>> Saren
    >>>      >> >>>
    >>>      >> >>>
    >>>      >> >>> On 29.03.19 09:29, Martin Maechler wrote:
    >>>      >> >>>>>>>>> Saren Tasciyan
    >>>      >> >>>>>>>>>      on Thu, 28 Mar 2019 17:02:10 +0100 writes:
    >>>      >> >>>>      > Hi,
    >>>      >> >>>>      > I have found a bug in reformulate function and
    >>> have a solution for it. I
    >>>      >> >>>>      > was wondering, where I can submit it?
    >>>      >> >>>>
    >>>      >> >>>>      > Best,
    >>>      >> >>>>      > Saren
    >>>      >> >>>>
    >>>      >> >>>>
    >>>      >> >>>> Well, you could have given a small reproducible example
    >>>      >> >>>> depicting the bug, notably when posting here:
    >>>      >> >>>> Just a prose text with no R code or other technical
    >>> content is
    >>>      >> >>>> almost always not really appropriate fo the R-devel
    >>> mailing list.
    >>>      >> >>>>
    >>>      >> >>>> Further, in such a case you should google a bit and
    >>> hopefully
    >>>      >> >>>> have found
    >>>      >> >>>>         https://www.r-project.org/bugs.html
    >>>      >> >>>>
    >>>      >> >>>> which also mention reproducibility (and many more useful
    >>> things).
    >>>      >> >>>>
    >>>      >> >>>> Then it also tells you about R's bug repository, also called
    >>>      >> >>>> "R's bugzilla" at https://bugs.r-project.org/
    >>>      >> >>>>
    >>>      >> >>>> and if you are diligent (but here, I'd say bugzilla is
    >>>      >> >>>> (configured?) far from ideal), you'd also find bug PR#17359
    >>>      >> >>>>
    >>>      >> >>>>    
    >>> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17359
    >>>      >> >>>>
    >>>      >> >>>> which was reported already on Nov 2017 .. and only fixed
    >>>      >> >>>> yesterday (in the "cleanup old bugs" process that happens
    >>>      >> >>>> often before the big new spring release of R).
    >>>      >> >>>>
    >>>      >> >>>> So is your bug the same as that one?
    >>>      >> >>>>
    >>>      >> >>>> Martin
    >>>      >> >>>>
    >>>      >> >>>>      > --
    >>>      >> >>>>      > Saren Tasciyan
    >>>      >> >>>>      > /PhD Student / Sixt Group/
    >>>      >> >>>>      > Institute of Science and Technology Austria
    >>>      >> >>>>      > Am Campus 1
    >>>      >> >>>>      > 3400 Klosterneuburg, Austria
    >>>      >> >>>>
    >>>      >> >>>>      > ______________________________________________
    >>>      >> >>>>      > [hidden email] mailing list
    >>>      >> >>>>      > https://stat.ethz.ch/mailman/listinfo/r-devel
    >>>      >> >>>>
    >>>      >> >>>> ______________________________________________
    >>>      >> >>>> [hidden email] mailing list
    >>>      >> >>>> https://stat.ethz.ch/mailman/listinfo/r-devel
    >>>      >> >>
    >>>      >> >> ______________________________________________
    >>>      >> >> [hidden email] mailing list
    >>>      >> >> https://stat.ethz.ch/mailman/listinfo/r-devel
    >>>      >> >>
    >>>      >>
    >>>      >> > ______________________________________________
    >>>      >> > [hidden email] mailing list
    >>>      >> > https://stat.ethz.ch/mailman/listinfo/r-devel
    >>>      >>
    >>>      > x[DELETED ATTACHMENT external: reformulate.diff, plain text]
    >>>      > ______________________________________________
    >>>      > [hidden email] mailing list
    >>>      > https://stat.ethz.ch/mailman/listinfo/r-devel
    >>>
    >>> ______________________________________________
    >>> [hidden email] mailing list
    >>> https://stat.ethz.ch/mailman/listinfo/r-devel
    >>>
    >>> ______________________________________________
    >>> [hidden email] mailing list
    >>> https://stat.ethz.ch/mailman/listinfo/r-devel

    > ______________________________________________
    > [hidden email] mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Reply | Threaded
Open this post in threaded view
|

Re: Bug in the "reformulate" function in stats package

Saren Tasciyan
In reply to this post by bbolker
So here is it as txt file. It is funny that a R file is restricted in
R-devel mailing list.

Anyhow, in this case R-core have a few choices here:

  * ignore my solution
  * show that it is actually bad or worse
  * consider adding it

Considering, that it is a minor change from previous version and doesn't
bother the existing usage, I saw the necessity to submit it here. But
newer solution in the 3.6.0 may solve other problems too. I can't argue
against that. This solves my part of the problem, without affecting
existing usage of the function.

If R-core is hard to convince, because this is just who they are, then I
should consider moving to other platforms. But so far, it seems to me
that they are doing a great job. I don't mind also someone rejecting
this tiny fix I have found, which works for me now. I can only thank for
their time spent considering it.

Actually, I had in mind a more complex but cleaner solution with
recursive functions to implement any kind of reformulation (not only
with "+"). But I simple lack the big picture on R expressions, I need to
read more. Maybe I will come back with that in the future.

Cheers to all,

Saren

On 18.04.19 17:51, Ben Bolker wrote:
>    I appreciate your enthusiasm and persistence for this issue, but I
> suspect you may have trouble convincing R-core to adopt your changes --
> they are "better", "easier", "more intuitive" for you ... but how sure
> are you they are completely backward compatible, have no performance
> issues, will not break in unusual cases ... ?
--
Saren Tasciyan
/PhD Student / Sixt Group/
Institute of Science and Technology Austria
Am Campus 1
3400 Klosterneuburg, Austria



______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

reformulate_solution.txt (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Bug in the "reformulate" function in stats package

Peter Dalgaard-2
I think that, also in R core, it is well recognized that it is unfortunate design that some formula manipulation tools rely on going via textual representation of the entire formula.
I'd be strongly tempted to replace the current reformulate() with something like this

> x <- c("a variable","another variable", "anormalone")
> lhs <- Reduce(function(x,y) bquote(.(x)+.(y)), lapply(x, as.name))
> as.formula(bquote(~.(lhs)))
~`a variable` + `another variable` + anormalone

However, there is a fair amount of conservatism because of the existing code base.
In particular, one needs to watch out for nasty corner cases: E.g., reformulate(c("x","y","x:y")) contains an interaction term, not a regression variable `x:y`. It is not too clear that this is desirable, but it is quite likely that someone's code actually uses it as a feature. Of course, auto-quoting anything that isn't a plain variable name breaks the feature. And there's no progammatic way to tell whether "P/E" is intended as a variable name (price/earnings ratio) or as equal to "P + P:E", so if we want both possibilities there needs to be a way to choose between them.  Which puts us back at square one.

-pd

> On 18 Apr 2019, at 22:21 , Saren Tasciyan <[hidden email]> wrote:
>
> So here is it as txt file. It is funny that a R file is restricted in R-devel mailing list.
>
> Anyhow, in this case R-core have a few choices here:
>
> * ignore my solution
> * show that it is actually bad or worse
> * consider adding it
>
> Considering, that it is a minor change from previous version and doesn't bother the existing usage, I saw the necessity to submit it here. But newer solution in the 3.6.0 may solve other problems too. I can't argue against that. This solves my part of the problem, without affecting existing usage of the function.
>
> If R-core is hard to convince, because this is just who they are, then I should consider moving to other platforms. But so far, it seems to me that they are doing a great job. I don't mind also someone rejecting this tiny fix I have found, which works for me now. I can only thank for their time spent considering it.
>
> Actually, I had in mind a more complex but cleaner solution with recursive functions to implement any kind of reformulation (not only with "+"). But I simple lack the big picture on R expressions, I need to read more. Maybe I will come back with that in the future.
>
> Cheers to all,
>
> Saren
>
> On 18.04.19 17:51, Ben Bolker wrote:
>>   I appreciate your enthusiasm and persistence for this issue, but I
>> suspect you may have trouble convincing R-core to adopt your changes --
>> they are "better", "easier", "more intuitive" for you ... but how sure
>> are you they are completely backward compatible, have no performance
>> issues, will not break in unusual cases ... ?
> --
> Saren Tasciyan
> /PhD Student / Sixt Group/
> Institute of Science and Technology Austria
> Am Campus 1
> 3400 Klosterneuburg, Austria
>
>
> <reformulate_solution.txt>______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel