Dataframe columns are accessible by incomplete column names, is this a bug?

Yannick.Suter
Hello all
I noticed today that you can access data frame columns by incomplete names. This behavior is really unexpected and led to some errors in my code, so I was wondering whether it is a bug and whether it should be changed in the future.
Here's a working example using the preinstalled "swiss" dataset:

> head(swiss)
             Fertility Agriculture Examination Education Catholic
Courtelary        80.2        17.0          15        12     9.96
Delemont          83.1        45.1           6         9    84.84
Franches-Mnt      92.5        39.7           5         5    93.40
Moutier           85.8        36.5          12         7    33.77
Neuveville        76.9        43.5          17        15     5.16
Porrentruy        76.1        35.3           9         7    90.57
             Infant.Mortality
Courtelary               22.2
Delemont                 22.2
Franches-Mnt             20.2
Moutier                  20.3
Neuveville               20.6
Porrentruy               26.6
> swiss$E
NULL
> swiss$Ex
[1] 15  6  5 12 17  9 16 14 12 16 14 21 14 19 22 18 17 26 31 19 22 14 22 20 12
[26] 14  6 16 25 15  3  7  5 12  7  9  3 13 26 29 22 35 15 25 37 16 22
> swiss$Ed
[1] 12  9  5  7 15  7  7  8  7 13  6 12  7 12  5  2  8 28 20  9 10  3 12  6  1
[26]  8  3 10 19  8  2  6  2  6  3  9  3 13 12 11 13 32  7  7 53 29 29

So to access the column "Examination", I can type any prefix from "Ex" up to the full "Examination" and will always get swiss$Examination.
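The rule behind this can be checked directly. Below is a minimal base R sketch, assuming only the built-in swiss data: it tries every prefix of "Examination" with `[[ ..., exact = FALSE]]` (the form `$` is documented to use), shows why "E" fails, and shows that an inner substring such as "xam" never matches. The `scores` data frame at the end is hypothetical and only illustrates how a column that does not exist can silently resolve to a different one.

```r
swiss <- datasets::swiss  # built-in data set used in this thread

# Try every prefix of "Examination", from "E" up to the full name
prefixes <- substring("Examination", 1, seq_len(nchar("Examination")))
resolves <- vapply(prefixes,
                   function(p) !is.null(swiss[[p, exact = FALSE]]),
                   logical(1))
print(resolves)  # only "E" fails

# "E" is ambiguous: it is a prefix of two column names
names(swiss)[startsWith(names(swiss), "E")]  # "Examination" "Education"

# Matching is by prefix only, so an inner substring does not match
is.null(swiss$xam)  # TRUE

# Hypothetical pitfall: asking for a column that does not exist
scores <- data.frame(score_final = c(90, 75))
scores$score  # returns the score_final column instead of NULL
```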

Thanks for reading, and greetings,
Yannick Suter

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Dataframe columns are accessible by incomplete column names, is this a bug?

Sarah Goslee
Hello Yannick,

That behavior is documented in the help for subsetting ( ?'$' ):

     Both ‘[[’ and ‘$’ select a single element of the list.  The main
     difference is that ‘$’ does not allow computed indices, whereas
     ‘[[’ does.  ‘x$name’ is equivalent to ‘x[["name", exact =
     FALSE]]’.  Also, the partial matching behavior of ‘[[’ can be
     controlled using the ‘exact’ argument.
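The equivalence quoted from the help page, and the effect of the three `exact` settings, can be checked with a short base R sketch on swiss. With `exact = NA`, `[[` still matches partially but signals a warning, which may be a useful middle ground while auditing older code. The last lines illustrate what "computed indices" means: `[[` uses the value of a variable holding the name, while `$` takes the name literally.

```r
swiss <- datasets::swiss

# x$name is documented as x[["name", exact = FALSE]]
stopifnot(identical(swiss$Ex, swiss[["Ex", exact = FALSE]]))

# exact = TRUE (the default for [[): no partial matching at all
is.null(swiss[["Ex"]])  # TRUE

# exact = NA: partial matching still happens, but with a warning
w <- tryCatch(swiss[["Ex", exact = NA]], warning = function(w) w)
print(conditionMessage(w))
val <- suppressWarnings(swiss[["Ex", exact = NA]])  # the Examination column

# Computed index: [[ uses the value of `col`; $ looks for a column named "col"
col <- "Examination"
head(swiss[[col]])
is.null(swiss$col)  # TRUE
```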

You can avoid it by using [[ ]] instead, which matches names exactly by default (exact = TRUE):

> swiss[['Ex']]
NULL
> head(swiss[['Examination']])
[1] 15  6  5 12 17  9

That's one of the major reasons using $ is sometimes discouraged.
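One caveat: with exact matching, a misspelled name gives NULL, which can be just as easy to miss as a partial match. If you would rather get an error, a small accessor can enforce that. This is a hypothetical helper (`get_col` is not a base R function), sketched under the assumption that a missing column should stop the code.

```r
swiss <- datasets::swiss

# Hypothetical helper, not part of base R: exact lookup that fails loudly
get_col <- function(df, name) {
  stopifnot(is.character(name), length(name) == 1)
  if (!name %in% names(df)) {
    stop(sprintf("no column named '%s'", name), call. = FALSE)
  }
  df[[name]]  # [[ defaults to exact = TRUE, so no partial matching here
}

head(get_col(swiss, "Examination"))
res <- tryCatch(get_col(swiss, "Ex"), error = function(e) conditionMessage(e))
print(res)  # "no column named 'Ex'"
```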

Sarah

On Thu, Jul 18, 2019 at 11:38 AM <[hidden email]> wrote:

> I noticed today that you can access dataframe columns by using incomplete names. This is a really unexpected behavior which led to some unexpected errors and I was wondering whether it's a bug or not and whether it should be changed in the future.
>
--
Sarah Goslee (she/her)
http://www.numberwright.com

Re: Dataframe columns are accessible by incomplete column names, is this a bug?

Patrick (Malone Quantitative)
But it's also a convenience feature. Note that $E returned NULL because
it was ambiguous: both "Examination" and "Education" start with "E". By
the time you got to $Ex, the column you were referencing was unambiguous
and you didn't have to type out the whole name. That is useful if you
have very long column names, for example ones imported from a
spreadsheet.

That said, I agree that relying on it can be risky.
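If you want to keep the convenience without the silent risk, base R has an option for it: `options(warnPartialMatchDollar = TRUE)` makes `$` warn whenever it falls back to a partial match (documented under ?options; very old R versions may behave differently for data frames). A minimal sketch that turns the option on, checks both cases, and restores the previous setting:

```r
swiss <- datasets::swiss

old <- options(warnPartialMatchDollar = TRUE)  # remember the previous setting

# A partial name still works, but now signals a warning
w <- tryCatch(swiss$Ex, warning = function(w) w)
print(conditionMessage(w))

# A full, exact name never triggers the warning
exact_ok <- tryCatch({ swiss$Examination; TRUE }, warning = function(w) FALSE)

options(old)  # restore the previous setting
```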

Also, please post to this list in plain text. Your table would
have been much easier to read.

Pat

On Thu, Jul 18, 2019 at 11:56 AM Sarah Goslee <[hidden email]> wrote:

>
> That's one of the major reasons using $ is sometimes discouraged.