Oddity: I seem to have a variable in a dataframe that doesn't show in colnames() - can anyone advise?


Chris Evans
I may be being dopey, I surely am, but I'm baffled by this.  I've been
working, on and off for a few days, in R version 2.13.0 (2011-04-13)
i386-pc-mingw32/i386 (32-bit), running it through ESS.

I've got a dataframe created a couple of days back, during the session:
> dim(AllDat)
[1] 27270    94

I came back this morning, misremembered my variables and thought I
had a variable AllDat$PHQ.  I started using it and everything seemed
fine until I realised that I shouldn't have it (!): the variable
I was thinking of is AllDat$PHQ9, and that's there:
> colnames(AllDat)[grep("PHQ",colnames(AllDat))]
[1] "PHQ9"    "HasPHQ"  "ZeroPHQ"

and, as you can see, no AllDat$PHQ.  But I can do:

> head(table(AllDat$PHQ))
  0   1   2   3   4   5
731 527 764 845 872 915

Ooops ... so AllDat$PHQ _DOES_ exist.  Its contents exactly match
AllDat$PHQ9:
> table(abs(AllDat$PHQ - AllDat$PHQ9))
    0
19032
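
A side note on the comparison above: table() drops NA by default, so a count of 19032 against 27270 rows presumably means the remaining rows have missing values (that is an assumption; the post doesn't say).  A minimal sketch with made-up data, not the poster's AllDat, showing the effect and using identical() to compare whole vectors, NA positions included:

```r
# Toy data (hypothetical); NA marks a missing score.
a <- c(0, 1, 2, NA, 4)
b <- a

# table() silently drops NA, so only the 4 non-missing differences are counted.
diff_counts <- table(abs(a - b))

# useNA = "ifany" makes the missing values visible as their own category.
diff_counts_na <- table(abs(a - b), useNA = "ifany")

# identical() compares the whole vectors, NA positions included.
same <- identical(a, b)
```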

I have searched through my ESS transcript back to the start of the
session and I can't see anywhere I've assigned to AllDat$PHQ (and I've
never used "attach").

However, I guessed that somehow I must have managed to duplicate AllDat in
more than one open environment, so I checked, and I have 16 environments
(I'm sure that's not right terminology, apologies):
> search()
 [1] ".GlobalEnv"        "package:reshape2"
 [3] "package:Hmisc"     "package:survival"
 [5] "package:splines"   "package:nnet"
 [7] "package:MASS"      "package:gdata"
 [9] "package:stats"     "package:graphics"
[11] "package:grDevices" "package:utils"
[13] "package:datasets"  "package:methods"
[15] "Autoloads"         "package:base"

So I try:
> for (i in 1:16) { print(paste("i =", i, exists("AllDat", i, inherits = FALSE))) }
[1] "i = 1 TRUE"
[1] "i = 2 FALSE"
[1] "i = 3 FALSE"
[1] "i = 4 FALSE"
[1] "i = 5 FALSE"
[1] "i = 6 FALSE"
[1] "i = 7 FALSE"
[1] "i = 8 FALSE"
[1] "i = 9 FALSE"
[1] "i = 10 FALSE"
[1] "i = 11 FALSE"
[1] "i = 12 FALSE"
[1] "i = 13 FALSE"
[1] "i = 14 FALSE"
[1] "i = 15 FALSE"
[1] "i = 16 FALSE"

So I don't think I do have two different AllDat dataframes.
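
The same check can be written without hard-coding the number of search-path entries.  This is only a sketch: the AllDat below is a hypothetical stand-in created at top level, and find() is the helper from the utils package that reports where a name is visible on the search path.

```r
# Hypothetical stand-in for the poster's data frame, created at top level.
AllDat <- data.frame(PHQ9 = 1:3)

# One logical per search() entry: TRUE where an object named "AllDat" lives.
hits <- vapply(seq_along(search()),
               function(i) exists("AllDat", where = i, inherits = FALSE),
               logical(1))
names(hits) <- search()

# Names of the search-path entries that hold an "AllDat".
where_found <- names(hits)[hits]

# utils::find() reports the same thing in one call.
found <- find("AllDat")
```

Using seq_along(search()) keeps the loop correct when packages are attached or detached later in the session.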

Can anyone throw light on what's going on?  I have searched archives
etc. but can't think of sensible keywords and so far turned up nothing.
 Happy to be told RTFM or the equivalent but could someone point me to a
specific location?  Also happy to try any diagnostics anyone recommends.

Many thanks in advance,

Chris

--
Chris Evans <[hidden email]> Skype: chris-psyctc
Consultant Psychiatrist in Psychotherapy, Notts. PDD network;
Professor, Psychotherapy, Nottingham University
*If I am writing from one of those roles, it will be clear. Otherwise*
*my views are my own and not representative of those institutions    *
If you have difficulty Emailing me on this address or getting a reply,
send again but cc to:       chris dot evans at nottshc dot nhs dot uk
and to:                     c dot evans at nottingham dot ac dot uk

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Oddity: I seem to have a variable in a dataframe that doesn't show in colnames() - can anyone advise?

Phil Spector
Chris -
     If you check the documentation for the "$" operator,
for example by typing

help("$")

you'll find (among a lot of other information):

     name: A literal character string or a name (possibly backtick
           quoted).  For extraction, this is normally (see under
           ‘Environments’) partially matched to the ‘names’ of the
           object.

So when you use the "$" operator (but not "[" or "[["), partial
matching is performed.  For example:

> x = data.frame(PHQ9=1:10)
> x$PHQ
  [1]  1  2  3  4  5  6  7  8  9 10
> x[,'PHQ']
Error in `[.data.frame`(x, , "PHQ") : undefined columns selected
> x[['PHQ']]
NULL

So if you don't want this "feature", you can use brackets instead
of the dollar sign for extraction.
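
To make the matching rules concrete, here is a small sketch.  The column names are the ones from the original post, but the values and the extra PHQ2 column are made up for illustration.  It shows that "$" matches on the start of a name, that an ambiguous partial match gives NULL, that "[[" only matches partially when asked, and that the warnPartialMatchDollar option can flag partial matches:

```r
# Column names taken from the original post; the values are made up.
x <- data.frame(PHQ9 = 1:3,
                HasPHQ = c(TRUE, FALSE, TRUE),
                ZeroPHQ = c(FALSE, FALSE, TRUE))

# "$" matches on the start of the name, so only PHQ9 is a candidate for "PHQ".
dollar_hit <- x$PHQ

# Add a second column starting with "PHQ": the match is now ambiguous -> NULL.
y <- cbind(x, PHQ2 = 4:6)
dollar_ambiguous <- y$PHQ

# "[[" is exact by default; partial matching has to be requested explicitly.
exact_miss  <- x[["PHQ"]]
partial_hit <- x[["PHQ", exact = FALSE]]

# Ask R to warn whenever "$" falls back on a partial match.
old <- options(warnPartialMatchDollar = TRUE)
warned <- tryCatch({ x$PHQ; FALSE }, warning = function(w) TRUE)
options(old)
```

Setting options(warnPartialMatchDollar = TRUE) in a startup file is one way to catch this kind of slip early without giving up "$" altogether.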

  - Phil Spector
  Statistical Computing Facility
  Department of Statistics
  UC Berkeley
  [hidden email]


In reply to Chris Evans's message of Sun, 29 May 2011.

Re: Oddity: I seem to have a variable in a dataframe that doesn't show in colnames() - can anyone advise?

Chris Evans
I thought it might have been that but stupidly didn't search on "$" and
I can now see that the one partial match I tried would have been
ambiguous so "$" hadn't resolved it.  Patrick Burns tells me I could
have found this in the wonderful "R inferno" and I'm sure I could have,
probably have, read that in other things, perhaps things I read way back.

What a reminder that you/we ... OK, _I_ ... can use R for some 13 years
or so and still not know, or forget, crucial things like that.

This seems to me a good example of the sort of thing that R uses that
can be useful but can also be a trap for part-timers.  I keep getting
tripped up by R moving things to factors and me not realising that, so
now I've opted to go for options(stringsAsFactors=FALSE) so I can force
myself to retain explicit control of that.  I also repeatedly stumble on
aspects of date handling in R.
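
For the factor conversion mentioned above, an alternative to the session-wide option is to pass stringsAsFactors explicitly in each call; since R 4.0.0 the default is FALSE in any case.  A minimal sketch with made-up data:

```r
# Made-up example data; passing stringsAsFactors per call avoids relying
# on a session-wide option.
as_char   <- data.frame(g = c("a", "b", "a"), stringsAsFactors = FALSE)
as_factor <- data.frame(g = c("a", "b", "a"), stringsAsFactors = TRUE)

class(as_char$g)     # "character"
class(as_factor$g)   # "factor"

# The same argument is accepted by read.table() / read.csv().
csv <- read.csv(text = "g,n\na,1\nb,2", stringsAsFactors = FALSE)
```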

Every time I stumble in this sort of way I can see the reasons why R is
designed as it is and the power and efficiency in it and my respect for
the core team grows.  However, it can still make the learning curve
steep & hard.  Thanks to r-help for providing the free tour guide up
Everest (or into the inferno)!

Specific thanks to you both for pointing this one out and apologies if
this is just wasted bandwidth.

Chris

In reply to Phil Spector's message of 29/05/2011 16:06.

