read.table(..., header == FALSE, colClasses = <vector with names attribute>)


Benjamin Tyner
Hello

I noticed that as of R version 3.3.0, this generates a warning:

    > txt <- c("a", "3.14")
    > read.table(file = textConnection(txt), header = FALSE,
    +            colClasses = c(x = "character", y = "numeric"))

The warning is "not all columns named in 'colClasses' exist", and I guess
the change was made in response to this bug report?

    https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16478

Regardless, I am wondering whether it is desirable that, as a result of
the change, the code has become stricter about the presence of a
(formerly) harmless names attribute. Or am I missing something?

Regards

Ben

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: read.table(..., header == FALSE, colClasses = <vector with names attribute>)

Jeff Newmiller
You are constructing the equivalent of a two-line data file, and complaining that it is not being treated as if it were one line. If it really used to accept this silently [skeptical], then I for one am glad it produces a warning now.
--
Sent from my phone. Please excuse my brevity.


Re: read.table(..., header == FALSE, colClasses = <vector with names attribute>)

Benjamin Tyner
Jeff,

Thank you for your reply. The intent was to construct a minimum
reproducible example. The same warning occurs when the 'file' argument
points to a file on disk with a million lines. But you are correct, my
example was slightly malformed and in fact gives an error under R
version 3.2.2. Please allow me to try again; in older versions of R,

    > read.table(file = textConnection("a\t3.14"), header = FALSE,
    +            colClasses = c(x = "character", y = "numeric"), sep = "\t")
      V1   V2
    1  a 3.14

(with no warning). As of version 3.3.0,

    > read.table(file = textConnection("a\t3.14"), header = FALSE,
    +            colClasses = c(x = "character", y = "numeric"), sep = "\t")
      V1   V2
    1  a 3.14
    Warning message:
    In read.table(file = textConnection("a\t3.14"), header = FALSE,  :
      not all columns named in 'colClasses' exist

My intent was not to complain but rather to learn more about best
practices regarding the names attribute.

Regards

Ben
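
When checking behaviour like this across R versions, it can help to capture the returned data frame and any warnings together rather than reading them off the console. The following is a minimal sketch, not part of the original post: `read_with_warnings` is a hypothetical helper name, and the `text =` argument of read.table() is used in place of textConnection() for brevity.

```r
# Hypothetical helper (not part of base R): run read.table() and return
# both its value and the text of any warnings it signalled.
read_with_warnings <- function(...) {
  msgs <- character(0)
  value <- withCallingHandlers(
    read.table(...),
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))  # record the message
      invokeRestart("muffleWarning")         # then carry on with the read
    }
  )
  list(value = value, warnings = msgs)
}

res <- read_with_warnings(
  text = "a\t3.14", header = FALSE, sep = "\t",
  colClasses = c(x = "character", y = "numeric")
)

names(res$value)  # default column names: "V1" "V2"
res$warnings      # on R >= 3.3.0, the 'colClasses' warning quoted above
```

On versions before 3.3.0 the `warnings` element would be expected to be empty, going by the transcript above.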





Re: read.table(..., header == FALSE, colClasses = <vector with names attribute>)

Martin Maechler
>>>>> Benjamin Tyner <[hidden email]>
>>>>>     on Tue, 24 Oct 2017 07:21:33 -0400 writes:

    > Jeff,
    > Thank you for your reply. The intent was to construct a minimum
    > reproducible example. The same warning occurs when the 'file' argument
    > points to a file on disk with a million lines. But you are correct, my
    > example was slightly malformed and in fact gives an error under R
    > version 3.2.2. Please allow me to try again; in older versions of R,

    >    > read.table(file = textConnection("a\t3.14"), header = FALSE,
    > colClasses = c(x = "character", y = "numeric"), sep="\t")
    >      V1   V2
    >    1  a 3.14

    > (with no warning). As of version 3.3.0,

    >    > read.table(file = textConnection("a\t3.14"), header = FALSE,
    > colClasses = c(x = "character", y = "numeric"), sep="\t")
    >      V1   V2
    >    1  a 3.14
    >    Warning message:
    >    In read.table(file = textConnection("a\t3.14"), header = FALSE,  :
    >      not all columns named in 'colClasses' exist

    > My intent was not to complain but rather to learn more about best
    > practices regarding the names attribute.

which is a nice attitude, thank you.

An even shorter MRE (as header=FALSE is default, and the default
sep="" works, too):

> tt <- read.table(textConnection("a 3.14"), colClasses = c(x="character", y="numeric"))
Warning message:
In read.table(file = textConnection("a 3.14"), colClasses = c(x = "character",  :
  not all columns named in 'colClasses' exist
>

The help page (you did read it before posting, didn't you?) says how
'colClasses' should be specified:

    colClasses: character.  A vector of classes to be assumed for the
              columns.  If unnamed, recycled as necessary.  If named, names
              are matched with unspecified values being taken to be ‘NA’.

              Possible values are ..................
              .........

So the 'x' and 'y' names you used are matched against the column
names, which in your case are "V1" and "V2", and so you provoke a
warning.

Once you have read (and understood) the above part of the help
page, it becomes easy, no?

> tt <- read.table(textConnection("a 3.14"), colClasses = c("character","numeric"))
> t2 <- read.table(textConnection("a 3.14"), colClasses=c(x="character",y="numeric"), col.names=c("x","y"))
> t2
  x    y
1 a 3.14
>

i.e., no warning in either of these two cases.

So please, please, PLEASE: at least non-beginners like you *should*
make the effort to read the help pages (and report if they seem
incomplete or otherwise improvable)...

Best,
Martin Maechler
ETH Zurich
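
The matching rule from the help page can be checked with a short script. This is a sketch under assumptions rather than a statement of how read.table() is implemented: it assumes R >= 3.3.0, uses `text =` instead of textConnection(), and `warns` is a hypothetical helper name. The value "007" is chosen so that it is visible whether the requested "character" class was applied.

```r
# Hypothetical helper: TRUE if evaluating `expr` signals a warning.
warns <- function(expr) {
  tryCatch({ expr; FALSE }, warning = function(w) TRUE)
}

# Names 'x' and 'y' do not match the default column names V1 and V2.
unmatched <- warns(
  read.table(text = "007 3.14",
             colClasses = c(x = "character", y = "numeric"))
)

# Names are matched against col.names, so their order does not matter.
ok <- read.table(text = "007 3.14", col.names = c("id", "val"),
                 colClasses = c(val = "numeric", id = "character"))

# Partially named: 'val' is unspecified, so it is taken to be NA and
# its type is guessed from the data.
partial <- read.table(text = "007 3.14", col.names = c("id", "val"),
                      colClasses = c(id = "character"))

unmatched           # TRUE on R >= 3.3.0
ok$id               # "007": kept as text, leading zeros intact
class(partial$val)  # "numeric"
```

The partially named form is handy when only one or two columns, such as identifiers with leading zeros, need a fixed class.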


Re: read.table(..., header == FALSE, colClasses = <vector with names attribute>)

Benjamin Tyner
Yes, it makes sense now; lesson learned. Thank you both! Sometimes it
seems that no matter how good the documentation, some useR will
inevitably (ab)use the code in ways that were never intended by the
authors. Then when the code and/or documentation changes, it is not
always obvious to the useR whether the intent of the authors has
changed, or whether the useR had just been "getting the right answer for
the wrong reason" all along. In this particular case, the change was
documented as stemming from a "new feature" (as opposed to a bugfix or
more stringent argument checking) that might appear not to be fully
backwards compatible. For example, one might have the (apparently) bad
habit of using col.names as a shortcut to rename headers on the fly ...

    > getRversion()
    [1] ‘3.2.2’

    > read.table(textConnection("x y\na 3.14"), header = TRUE,
    +            colClasses = c(x = "character", y = "numeric"),
    +            col.names = c("foo", "bar"))
      foo  bar
    1   a 3.14

but indeed, the names attribute has zero effect on the result:

    > read.table(textConnection("x y\na 3.14"), header = TRUE,
    +            colClasses = c(y = "character", x = "numeric"),
    +            col.names = c("foo", "bar"))
      foo  bar
    1   a 3.14

so I agree it is good that we are checking for that now.

Regards
Ben
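
For the rename-on-the-fly habit described above, one way to stay warning-free on R >= 3.3.0 appears to be naming 'colClasses' after the new names given in 'col.names' rather than after the header in the file. The sketch below rests on that assumption and on the observed behaviour that unmatched names leave the column types to be guessed. The data value "007" is illustrative, not from the thread.

```r
# The file header says x/y, but col.names replaces it with foo/bar,
# so colClasses is keyed by the new names.
renamed <- read.table(text = "x y\n007 3.14", header = TRUE,
                      col.names = c("foo", "bar"),
                      colClasses = c(foo = "character", bar = "numeric"))

# Keyed by the old header names instead: the names match nothing, a
# warning is raised (suppressed here), and the types are guessed.
stale <- suppressWarnings(
  read.table(text = "x y\n007 3.14", header = TRUE,
             col.names = c("foo", "bar"),
             colClasses = c(x = "character", y = "numeric"))
)

names(renamed)    # "foo" "bar"
renamed$foo       # "007"
class(stale$foo)  # expected "integer" on R >= 3.3.0: class not applied
```

So under the newer behaviour the warning flags a real difference: the requested classes are not applied when the names fail to match.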

