In praise of "options(warnPartialMatchDollar = TRUE)"

In praise of "options(warnPartialMatchDollar = TRUE)"

Chris Evans
I am just posting this to the list because someone else may one day waste an hour or so because s/he has unknowingly hit a partial match failure using "$". It's my folly that I did but I am surprised that options(warnPartialMatchDollar = TRUE) isn't the default setting.

Here's a bit of reproducible code that shows the challenge.

#rm(list=ls()) ### BEWARE: me making sure environment was clean
set.seed(12345) # get fully reproducible example
nRows <- 100
Sample <- sample(0:1,nRows,replace=TRUE)
data2 <- data.frame(cbind(1:nRows,Sample)) # create data frame
table(data2$Samp) # call which silently achieves partial match
data2$innoccuousname <- factor(data2$Samp,labels=c("Non-clinical","Clinical"),levels=0:1)
str(data2$Samp) # all fine, no apparent destruction of the non-existent vector data2$Samp
data2$SampFac <- factor(data2$Samp,labels=c("Non-clinical","Clinical"),levels=0:1)
str(data2$Samp) # returns NULL because there is no longer a single partial match to "Samp" but no warning!
str(data2$Sample) # but of course, data2$Sample is still there

Because I had used "data2$Samp" all the way through a large file of R (markup) code and hadn't noticed that the variable name in the SPSS file I was reading in had changed from "Samp" to "Sample", I appeared to be destroying data2$Samp.

I have now set options(warnPartialMatchDollar = TRUE) in my Rprofile.site file and am just posting this here in case it helps someone some day.
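
For readers who want to see what the option changes, here is a minimal sketch (not part of the original post; the data frame `df` and the variable names are made up for illustration). It turns the option on, triggers a partial match on purpose, captures the resulting warning, and then restores the previous setting.

```r
# Hypothetical toy data: a single column named "Sample"
df <- data.frame(Sample = c(0, 1, 1))

# Switch the warning on, remembering the previous option values
old <- options(warnPartialMatchDollar = TRUE)

# "Samp" is not a column name, so `$` falls back to partial matching;
# with the option set this raises a warning, which is captured here
msg <- tryCatch(df$Samp, warning = function(w) conditionMessage(w))

# Restore whatever was set before
options(old)

print(msg)
```

To make the setting permanent, the same options() call can go in a startup file such as Rprofile.site.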

Very best all,


Chris


        [[alternative HTML version deleted]]

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: In praise of "options(warnPartialMatchDollar = TRUE)"

David Winsemius

> On Nov 27, 2016, at 7:12 AM, Chris Evans <[hidden email]> wrote:
>
> I am just posting this to the list because someone else may one day waste an hour or so because s/he has unknowingly hit a partial match failure using "$". It's my folly that I did but I am surprised that options(warnPartialMatchDollar = TRUE) isn't the default setting.
>
> Here's a bit of reproducible code that shows the challenge.
>
> #rm(list=ls()) ### BEWARE: me making sure environment was clean
> set.seed(12345) # get fully reproducible example
> nRows <- 100
> Sample <- sample(0:1,nRows,replace=TRUE)
> data2 <- data.frame(cbind(1:nRows,Sample)) # create data frame

Using data.frame( cbind( ...) ) is a predictable method for creating later headaches, and there is no options-warning available. cbind coerces an argument list of vectors to matrix class, thus dropping all attributes (dates, times and factors are all destroyed).
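
A small sketch of that coercion, using made-up vectors (`dt`, `fac`) rather than anything from the original post: the same two vectors are turned into a data frame once through cbind and once directly, and the column classes are compared.

```r
# Hypothetical vectors carrying class attributes
dt  <- as.Date(c("2016-11-27", "2016-11-28"))
fac <- factor(c("Clinical", "Non-clinical"))

# cbind() on plain vectors builds a numeric matrix first,
# so the Date and factor classes are lost before data.frame() sees them
bad <- data.frame(cbind(dt, fac))

# Passing the vectors straight to data.frame() keeps their classes
good <- data.frame(dt, fac)

# First class of each column, as a named character vector
bad_classes  <- vapply(bad,  function(col) class(col)[1], character(1))
good_classes <- vapply(good, function(col) class(col)[1], character(1))

print(bad_classes)
print(good_classes)
```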

> table(data2$Samp) # call which silently achieves partial match
> data2$innoccuousname <- factor(data2$Samp,labels=c("Non-clinical","Clinical"),levels=0:1)
> str(data2$Samp) # all fine, no apparent destruction of the non-existent vector data2$Samp
> data2$SampFac <- factor(data2$Samp,labels=c("Non-clinical","Clinical"),levels=0:1)
> str(data2$Samp) # returns NULL because there is no longer a single partial match to "Samp" but no warning!
> str(data2$Sample) # but of course, data2$Sample is still there
>
> Because I had used "data2$Samp" all the way through a large file of R (markup) code and hadn't noticed that the variable names in the SPSS file I was reading in had changed from "Samp" to "Sample" I appeared to be destroying data2$Samp.
>
> I have now set options(warnPartialMatchDollar = TRUE) in my Rprofile.site file and am just posting this here in case it helps someone some day.

This is one of the reasons many experienced R programmers eschew the use of the "$" function in programming.

The preferred use would be :

data2[['Samp']]

(No partial match.)
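
To illustrate the difference, a minimal sketch with a made-up one-column data frame (not from the thread): `[[` matches names exactly by default, so a misspelt or abbreviated name yields NULL instead of silently picking a longer name.

```r
# Hypothetical toy data: the only column is called "Sample"
df <- data.frame(Sample = c(0, 1, 1))

# Exact matching (the default for `[[`): "Samp" is not a column, so NULL
exact_miss <- df[["Samp"]]

# The full name works as expected
exact_hit <- df[["Sample"]]

# Partial matching is still available, but only when asked for explicitly
partial <- df[["Samp", exact = FALSE]]

print(is.null(exact_miss))
print(exact_hit)
print(partial)
```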

>
> [[alternative HTML version deleted]]

Plain text is generally preferred on R-help but there does not appear to have been a problem in this posting instance.


David Winsemius
Alameda, CA, USA


Re: In praise of "options(warnPartialMatchDollar = TRUE)"

Chris Evans
Was about to reply just to sender but thought there were good byproducts of this so, to all ...

Many thanks Dr. Winsemius, and apologies for the HTML: tired sloppiness.  My bad.

Aha. For the first time I can really see a logic for dataFrame[['variable']] -- thanks for that.  I will have to break myself of the "$" habit.  Pity as it's a lot more keystrokes!

I understand about the nasty asset-stripping side effects of cbind but thought in this situation it would cause no problems. Is the preferred route to turn a first vector into a data frame and then add the others using dataframe[['newVariable']] <- nextVector ?

Very best wishes and thanks again: this is an amazing list,

Chris



Re: In praise of "options(warnPartialMatchDollar = TRUE)"

David Winsemius

> On Nov 27, 2016, at 11:21 AM, Chris Evans <[hidden email]> wrote:
>
> Was about to reply just to sender but thought there were good byproducts of this so, to all ...
>
> Many thanks Dr. Winsemius, and apologies for the HTML: tired sloppiness.  My bad.
>
> Aha. For the first time I can really see a logic for dataFrame[['variable']] -- thanks for that.  I will have to break myself of the "$" habit.  Pity as it's a lot more keystrokes!
>
> I understand about the nasty asset-stripping side effects of cbind but thought in this situation it would cause no problems. Is the preferred route to turn a first vector into a data frame and then add the others using dataframe[['newVariable']] <- nextVector ?

With an existing data frame the use of cbind is safe, because the cbind.data.frame method will not coerce its arguments to matrix class. It is the use of cbind with bare vectors that is the source of danger.

This would be the preferred method of constructing a data.frame from objects with attributes:

dt <- as.Date( 1:10, origin="1970-01-01")
fac <- factor(letters[1:10])
nums <- 10:1

dfrm <- data.frame( dt, fac, nums)

#OR skip the preliminary vector creation and use a named argument list:

dfrm <- data.frame( dt=as.Date( 1:10, origin="1970-01-01"),
                    fac=factor(letters[1:10]),
                    nums=10:1)

The second method avoids leaving loose vectors in the global environment, where they might later be picked up by accident, for example if you write a function that uses an object name matching one of those vectors.


After either method, you can `cbind` to that dfrm object to your heart's content, because the cbind.data.frame method is dispatched.
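
A short sketch of that point, with made-up column names: once a data frame exists, both cbind and `[[<-` add a Date column without stripping its class. The object names (`extra`, `bound`) are illustrative only.

```r
# Start from a data frame built directly, as recommended above
dfrm <- data.frame(dt  = as.Date(1:3, origin = "1970-01-01"),
                   fac = factor(letters[1:3]))

# A further vector with a class attribute (hypothetical)
extra <- as.Date(4:6, origin = "1970-01-01")

# Because the first argument is a data frame, the data frame method of
# cbind is used and the Date class of the new column survives
bound <- cbind(dfrm, extra = extra)

# Adding the column by exact name with `[[<-` keeps the class as well
dfrm[["extra"]] <- extra

print(class(bound$extra))
print(class(dfrm[["extra"]]))
```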

--
David.


David Winsemius
Alameda, CA, USA


Re: In praise of "options(warnPartialMatchDollar = TRUE)"

Bert Gunter-2
...
>
> After either method, you can `cbind` to that dfrm object to your heart's content, because the cbind.data.frame method is dispatched.
>
> --
> David.
>

... But of course, this is unnecessary anyway, as:

"The cbind data frame method is just a wrapper for data.frame(...,
check.names = FALSE). This means that it will split matrix columns in
data frame arguments, and convert character columns to factors unless
stringsAsFactors = FALSE is specified."


Cheers,
Bert


Re: In praise of "options(warnPartialMatchDollar = TRUE)"

David Winsemius

> On Nov 27, 2016, at 2:03 PM, Bert Gunter <[hidden email]> wrote:
>
> ...
>>
>> After either method, you can `cbind` to that dfrm object to your heart's content, because the cbind.data.frame method is dispatched.
>>
>> --
>> David.
>>
>
> ... But of course, this is unnecessary anyway, as:
>
> "The cbind data frame method is just a wrapper for data.frame(...,
> check.names = FALSE). This means that it will split matrix columns in
> data frame arguments, and convert character columns to factors unless
> stringsAsFactors = FALSE is specified."

Good point, especially the reminder about the stringsAsFactors pitfall. The factor call to create my example was superfluous, but if I had included a character column the print method for data frames would not have warned me anyway. Neither does the print method for data.tables. Only the print method for dplyr's tibble objects gives a clear indication of factor-classed columns.
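
One way to check column classes without relying on any print method is to ask for them directly. This is only a sketch with a made-up data frame; note that from R 4.0.0 onward the default is stringsAsFactors = FALSE, so the argument is set explicitly here to reproduce the behaviour discussed in this thread.

```r
# Hypothetical data frame; stringsAsFactors = TRUE mimics the pre-4.0.0 default
df <- data.frame(id = 1:3,
                 label = c("a", "b", "c"),
                 stringsAsFactors = TRUE)

# First class of every column, returned as a named character vector
col_classes <- vapply(df, function(col) class(col)[1], character(1))

print(col_classes)
```

Calling str(df) gives similar information in a more verbose form.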

--
Best;

David Winsemius
Alameda, CA, USA
