read.table with ":" in column names (PR#8511)

read.table with ":" in column names (PR#8511)

peverlorenvanthemaat
Full_Name: emiel ver loren
Version: 2.2.0
OS: Windows XP
Submission from: (NULL) (145.117.31.248)


Dear R-community and developers,

I have been trying to read in a tab-delimited file where the column names and
the row names are of the form "GO:0000051" (gene ontology IDs). When using:

> gomat<-read.table("test.txt")
> colnames(gomat)[1]
[1] "GO.0000051"
> rownames(gomat)[1]
[1] "GO:0000002"

Which means that ":" is transformed into a "." !! This seems like Excel when it
is trying to guess what I am really ment (and turning 1/1/1 into 1-1-2001).

Furthermore, I found the following quite strange as well:

> gomat2<-read.delim2("test.txt",header=FALSE)
> gomat2[1,1:2]
          V1         V2
1 GO:0000051 GO:0000280
>  as.character(gomat2[1,1:2])
[1] "8" "2"
> as.character(gomat2[1,1])
[1] "GO:0000051"

I have found a way to work around it, but I am wondering what's happening....

The tab-delimited file looks like:

GO:0000051 GO:0000280 GO:0000740
GO:0000002 0 0 0
GO:0000004 0 0 0
GO:0000012 0 0 0
GO:0000014 0 0 0
GO:0000015 0 0 0
GO:0000018 0 0 0
GO:0000019 0 0 0

Thanks for helping, and

Emiel

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: read.table with ":" in column names (PR#8511)

Prof Brian Ripley
Please do not report documented behaviour as a bug!
See the 'check.names' argument to read.table.

In your second example you are applying as.character to a data frame, and
you seem not to realize that.  We specifically ask you NOT to use R-bugs
to ask questions.  (What is happening is that you got the internal codes
of the factor columns, which is not what you intended.  If you want
character columns, read them as such.)
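
A minimal sketch of "read them as such", assuming a temporary file that only mimics the layout of the reporter's test.txt (the file contents and the use of tempfile() are illustrative, not taken from the report). as.is = TRUE asks read.table to leave character columns alone instead of converting them to factors; since R 4.0.0 that is also the default behaviour.

```r
# Recreate a small file shaped like the one in the report (abbreviated).
tmp <- tempfile(fileext = ".txt")
writeLines(c("GO:0000051\tGO:0000280\tGO:0000740",
             "GO:0000002\t0\t0\t0",
             "GO:0000004\t0\t0\t0"), tmp)

# header = FALSE turns the GO ids in the first line into data.
# fill = TRUE pads the short first line (read.delim2 does this by default);
# as.is = TRUE keeps text columns as character rather than factor.
gomat2 <- read.table(tmp, sep = "\t", header = FALSE,
                     fill = TRUE, as.is = TRUE)

is.character(gomat2$V1)   # TRUE: a character column, not a factor
gomat2[1, 1]              # "GO:0000051"
unlist(gomat2[1, 1:2])    # the strings themselves, not internal codes
```

With character columns there are no factor codes underneath, so coercing a row no longer exposes them.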

On Fri, 20 Jan 2006 [hidden email] wrote:

> Full_Name: emiel ver loren
> Version: 2.2.0

We do ask you not to send reports on obsolete versions of R.

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Re: read.table with ":" in column names (PR#8511)

Peter Dalgaard
In reply to this post by peverlorenvanthemaat
[hidden email] writes:

> Full_Name: emiel ver loren
> Version: 2.2.0
> OS: Windows XP
> Submission from: (NULL) (145.117.31.248)
>
>
> Dear R-community and developers,
>
> I have been trying to read in a tab-delimited file where the column names and
> the row names are of the form "GO:0000051" (gene ontology IDs). When using:
>
> > gomat<-read.table("test.txt")
> > colnames(gomat)[1]
> [1] "GO.0000051"
> > rownames(gomat)[1]
> [1] "GO:0000002"
>
> Which means that ":" is transformed into a "." !! This seems like Excel when it
> is trying to guess what I am really ment (and turning 1/1/1 into 1-1-2001).

This is what check.names=FALSE is for... (and NOT a bug, please don't
abuse the bug repository, use the mailing lists)
 

> Furthermore, I found the following quite strange as well:
>
> > gomat2<-read.delim2("test.txt",header=FALSE)
> > gomat2[1,1:2]
>           V1         V2
> 1 GO:0000051 GO:0000280
> >  as.character(gomat2[1,1:2])
> [1] "8" "2"
> > as.character(gomat2[1,1])
> [1] "GO:0000051"
>
> I have found a way to work around it, but I am wondering what's happening....

Yes, this is a bit nasty, but... What is happening is similar to this:

> d <- data.frame(a=factor(LETTERS), b=factor(letters))
> d[1,]
  a b
1 A a
> as.character(d[1,])
[1] "1" "1"
> as.character(d[1,1])
[1] "A"
> as.character(d[1,1,drop=F])
[1] "1"

or this:

> l <- list(a=factor("x"),b=factor("y"))
> l
$a
[1] x
Levels: x

$b
[1] y
Levels: y

> as.character(l)
[1] "1" "1"

The thing is that as.character on a list will first coerce factors to
numeric, then numeric to character. I'm not sure whether there could
be a rationale for it, but it isn't S-PLUS compatible (not 6.2.1
anyway, which is the most recent one that I have access to).
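
For anyone who wants the labels rather than the codes, one possible sketch is to apply as.character to each element separately, so that the factor method is dispatched per element. The use of vapply here is just one option, not something proposed in this thread.

```r
l <- list(a = factor("x"), b = factor("y"))

# Coerce element by element: each factor goes through as.character.factor.
labels_l <- vapply(l, as.character, character(1))
labels_l            # named character vector: a = "x", b = "y"

# The same idea for one row of a data frame with factor columns.
d <- data.frame(a = factor(LETTERS), b = factor(letters))
labels_row <- vapply(d[1, ], as.character, character(1))
labels_row          # a = "A", b = "a"
```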


--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - ([hidden email])                  FAX: (+45) 35327907

Re: read.table with ":" in column names (PR#8511)

Roger Bivand
In reply to this post by peverlorenvanthemaat
On Fri, 20 Jan 2006 [hidden email] wrote:

> Full_Name: emiel ver loren
> Version: 2.2.0
> OS: Windows XP
> Submission from: (NULL) (145.117.31.248)
>
>
> Dear R-community and developers,
>
> I have been trying to read in a tab-delimited file where the column names and
> the row names are of the form "GO:0000051" (gene ontology IDs). When using:
>
> > gomat<-read.table("test.txt")
> > colnames(gomat)[1]
> [1] "GO.0000051"
> > rownames(gomat)[1]
> [1] "GO:0000002"
>
> Which means that ":" is transformed into a "." !! This seems like Excel when it
> is trying to guess what I am really ment (and turning 1/1/1 into 1-1-2001).

Wrong.

?read.table says with reference to the check.names = TRUE argument that:

"check.names: logical.  If 'TRUE' then the names of the variables in the
          data frame are checked to ensure that they are syntactically
          valid variable names.  If necessary they are adjusted (by
          'make.names') so that they are, and also to ensure that there
          are no duplicates."

> make.names("GO:0000051")
[1] "GO.0000051"

You can use "GO:0000051" as a column name if quoted, otherwise ":" is an
operator, so the default value of the check.names argument is sound.

If you "ment" to do what you say, you should have set check.names=FALSE.
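
A short sketch of both behaviours, again assuming a temporary file that only mimics the layout of test.txt (the file contents are illustrative). It shows the default name adjustment, the effect of check.names = FALSE, and how a non-syntactic column name then has to be quoted or backticked.

```r
tmp <- tempfile(fileext = ".txt")
writeLines(c("GO:0000051\tGO:0000280\tGO:0000740",
             "GO:0000002\t0\t0\t0",
             "GO:0000004\t0\t0\t0"), tmp)

# Default: names are passed through make.names(), so ":" becomes ".".
gomat_default <- read.table(tmp, sep = "\t")
colnames(gomat_default)[1]      # "GO.0000051"

# check.names = FALSE keeps the header exactly as written in the file.
gomat <- read.table(tmp, sep = "\t", check.names = FALSE)
colnames(gomat)[1]              # "GO:0000051"

# Non-syntactic names must be quoted or backticked when used.
gomat[["GO:0000051"]]
gomat$`GO:0000051`
```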

--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: [hidden email]

Re: read.table with ":" in column names (PR#8511)

Peter Dalgaard
In reply to this post by Prof Brian Ripley
[hidden email] writes:

> On Fri, 20 Jan 2006 [hidden email] wrote:
>
> > Full_Name: emiel ver loren
> > Version: 2.2.0
>
> We do ask you not to send reports on obsolete versions of R.

Well, we might forgive that (please at least check against the current
NEWS file), but

USING FAKE EMAIL ADDRESSES ON BUG REPORTS IS BLOODY UNFORGIVABLE!!!!!

Grrrr....

Re: as.character on list (was read.table with ":" in column names)

Prof Brian Ripley
In reply to this post by Peter Dalgaard
On Fri, 20 Jan 2006, Peter Dalgaard wrote:

[...]

> Yes, this is a bit nasty, but... What is happening is similar to this:
>
>> d <- data.frame(a=factor(LETTERS), b=factor(letters))
>> d[1,]
>  a b
> 1 A a
>> as.character(d[1,])
> [1] "1" "1"
>> as.character(d[1,1])
> [1] "A"
>> as.character(d[1,1,drop=F])
> [1] "1"
>
> or this:
>
>> l <- list(a=factor("x"),b=factor("y"))
>> l
> $a
> [1] x
> Levels: x
>
> $b
> [1] y
> Levels: y
>
>> as.character(l)
> [1] "1" "1"
>
> The thing is that as.character on a list will first coerce factors to
> numeric, then numeric to character.

Nope.  It just coerces an INTSXP to a STRSXP.  as.character (and all other
forms of coercion that I can think of quickly) ignores classes except when
initially dispatching.

Note that these examples are special cases:

> as.character(d[1:2,])
[1] "c(1, 2)" "c(1, 2)"

may also be unexpected but follows from the general (undocumented, I
dare say) rules.
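
The distinction can be seen on a single factor, as a rough illustration: the factor is stored as an integer vector, as.character called on the factor itself dispatches on its class and returns labels, while coercing the underlying integers returns the codes. The variable names below are made up for the example.

```r
f <- factor(c("GO:0000051", "GO:0000280"))

typeof(f)                    # "integer": the storage underneath the class
as.character(f)              # class-aware: "GO:0000051" "GO:0000280"

# With the class out of the picture only the integer codes are left.
codes <- as.character(as.integer(f))
codes                        # "1" "2"
```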

> I'm not sure whether there could be a rationale for it, but it isn't
> S-PLUS compatible (not 6.2.1 anyway, which is the most recent one that I
> have access to).

My S-PLUS deparses:

> l <- list(a=factor("x"),b=factor("y"))
> as.character(l)
[1] "structure(.Data = 1, .Label = \"x\", class = \"factor\")"
[2] "structure(.Data = 1, .Label = \"y\", class = \"factor\")"

which seems no better (and probably worse).

The only other consistent option I can see is for all coercion methods to
dispatch at each element of a recursive object, which I suspect introduces
a considerable overhead for very little gain.

One could perhaps argue for a data.frame method, since coercion operations
on dataframes are rare and that is a case where people get factors where
they wanted character columns.
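
As a sketch of what such per-column coercion might look like from user code, here is a hypothetical helper (df_to_character is an invented name, not an existing or proposed R function) that converts every column with its own as.character method and keeps the data frame structure.

```r
# Hypothetical helper, not part of R: coerce each column separately so that
# factor columns yield their labels instead of their internal codes.
df_to_character <- function(df) {
  stopifnot(is.data.frame(df))
  df[] <- lapply(df, as.character)   # df[] keeps names and row names
  df
}

d <- data.frame(a = factor(LETTERS), b = factor(letters))
d_chr <- df_to_character(d)

first_row <- unlist(d_chr[1, ])
first_row                            # a = "A", b = "a"
```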

Re: as.character on list (was read.table with ":" in column names)

Peter Dalgaard
Prof Brian Ripley <[hidden email]> writes:


> > The thing is that as.character on a list will first coerce factors to
> > numeric, then numeric to character.
>
> Nope.  It just coerces an INTSXP to a STRSXP.  as.character (and all
> other forms of coercion that I can think of quickly) ignores classes
> except when initially dispatching.

OK. I just meant that "de facto" it is like as.character(as.integer(f))
 
> Note that these examples are special cases:
>
> > as.character(d[1:2,])
> [1] "c(1, 2)" "c(1, 2)"
>
> may also be unexpected but follows from the general (undocumented, I
> dare say) rules.

and unlike as.character(as.integer(f)), so I do stand corrected....

> > I'm not sure whether there could be a rationale for it, but it isn't
> > S-PLUS compatible (not 6.2.1 anyway, which is the most recent one
> > that I have access to).
>
> My S-PLUS deparses:
>
> > l <- list(a=factor("x"),b=factor("y"))
> > as.character(l)
> [1] "structure(.Data = 1, .Label = \"x\", class = \"factor\")"
> [2] "structure(.Data = 1, .Label = \"y\", class = \"factor\")"
>
> which seems no better (and probably worse).

Same here. Arguably, we deparse too, we just discard attributes first.
Both S-PLUS and R will do

> as.character(list(a=1:5,b=3))
[1] "c(1, 2, 3, 4, 5)" "3"


> The only other consistent option I can see is for all coercion methods
> to dispatch at each element of a recursive object, which I suspect
> introduces a considerable overhead for very little gain.

Then again maybe not, but it is one of those things which have the
potential to break things in unexpected places if you change it.
 
> One could perhaps argue for a data.frame method, since coercion
> operations on dataframes are rare and that is a case where people get
> factors where they wanted character columns.

Agreed.
