real numeric variable transforms into factor:

Aldi Kraja
Hi
Tested in R 2.8.1 on Windows Vista.
From the FAQ:
http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f
"It may happen that when reading numeric data into R (usually, when
reading in a file), they come in as factors. If f is such a factor
object, you can use as.numeric(as.character(f)) to get the numbers back."

1. Why can this happen? Why does R turn x1, a real numeric variable with
decimal values, into a factor?
2. Doesn't it look strange to get "internal numbers" when one applies
as.numeric(x$x1)?
3. What are the internal numbers mentioned in the FAQ? Why is it necessary
to write as.numeric(as.character(x$x1)) to finally get the numbers I read
with read.table? Are the missing values, shown as dots, what forces R (or
the programmer who wrote read.table) to treat x1 as a factor?

Could whoever maintains read.table improve it so that it recognizes
numbers with decimal places as numeric rather than as factors, and dots
as missing values that are converted to NA?

The data file, saved as text (test.txt):
ob,x1,y1
1,1.1,1/1
2,2.1,1/2
3,3.2,2/2
4,.,0/0
5,4.5,1/1
6,5.1,0/0
7,6.3,1/1
8,.,1/2
Reading it from the d:\test directory:
x <- read.table(file = "d:\\test\\test.txt", header = TRUE, sep = ",")
 > x
  ob  x1  y1
1  1 1.1 1/1
2  2 2.1 1/2
3  3 3.2 2/2
4  4   . 0/0
5  5 4.5 1/1
6  6 5.1 0/0
7  7 6.3 1/1
8  8   . 1/2
 > as.numeric(x$x1)
[1] 2 3 4 1 5 6 7 1

 > as.numeric(as.character(x$x1))
[1] 1.1 2.1 3.2  NA 4.5 5.1 6.3  NA
Warning message:
NAs introduced by coercion
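The "internal numbers" in question 3 are the factor's integer codes: each value is stored as its position in the sorted set of levels, and as.numeric() applied to a factor returns those positions. A minimal R sketch, assuming x1 has already come in as a factor as it did in R 2.8.1 (from R 4.0.0 on, read.table() defaults to stringsAsFactors = FALSE, so the column would arrive as character instead); the vector is built by hand here rather than read from test.txt:

```r
# Sketch only: build the factor directly instead of reading test.txt.
x1 <- factor(c("1.1", "2.1", "3.2", ".", "4.5", "5.1", "6.3", "."))

# The levels are the distinct strings, sorted according to the locale
# (the codes in the post show that "." came first there).
levels(x1)

# as.numeric() on a factor returns these integer codes, i.e. each value's
# position in levels(x1): the FAQ's "internal numbers".
as.integer(x1)

# Convert the level labels to numbers, then index them by the codes.
# "." is not a number, so it becomes NA (the coercion warning is
# suppressed here).
vals <- suppressWarnings(as.numeric(levels(x1))[x1])
vals
# [1] 1.1 2.1 3.2  NA 4.5 5.1 6.3  NA
```

as.numeric(levels(f))[f] is the form the R FAQ describes as slightly more efficient than as.numeric(as.character(f)), because each distinct level is converted only once.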

Thanks,

Aldi

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: real numeric variable transforms into factor:

Marc Schwartz-3
On Apr 17, 2009, at 2:52 PM, Aldi Kraja wrote:

Looks like you are taking data from SAS perhaps, where the missing  
value indicator is '.'.  In R, where type.convert() is used to  
determine the data types for incoming text data, you get:

 > type.convert(".")
[1] .
Levels: .

That is, a factor.

What you want to do is to set the 'na.strings' argument to  
read.table() to '.' rather than the default 'NA', so that the periods  
are interpreted as missing values and set to NA during import. Thus:

# Create from your data in the clipboard (on OS X)
DF <- read.table(pipe("pbpaste"), header = TRUE, sep = ",",
                 na.strings = ".")

 > DF
   ob  x1  y1
1  1 1.1 1/1
2  2 2.1 1/2
3  3 3.2 2/2
4  4  NA 0/0
5  5 4.5 1/1
6  6 5.1 0/0
7  7 6.3 1/1
8  8  NA 1/2

 > str(DF)
'data.frame': 8 obs. of  3 variables:
  $ ob: int  1 2 3 4 5 6 7 8
  $ x1: num  1.1 2.1 3.2 NA 4.5 5.1 6.3 NA
  $ y1: Factor w/ 4 levels "0/0","1/1","1/2",..: 2 3 4 1 2 1 2 3

This now works because:

 > type.convert(".", na.strings = ".")
[1] NA
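To see why a single "." affects the whole column: type.convert() chooses one type for all values of a column, so any entry that is neither a number nor one of the na.strings rules out numeric. A small sketch; the as.is argument is an addition not used in the thread, passed explicitly because its default has changed between R versions (FALSE in R 2.8.1):

```r
# as.is = FALSE reproduces the factor result shown earlier in the thread.
type.convert(".", as.is = FALSE)   # a factor with the single level "."

# One "." is enough to keep a column from being read as numeric:
type.convert(c("1.1", "."), as.is = TRUE)   # character: "1.1" "."

# Declaring "." as a missing-value string lets the column become numeric:
type.convert(c("1.1", "."), na.strings = ".", as.is = TRUE)   # 1.1 NA
```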


See ?read.table and ?type.convert for more information.
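The same fix in a self-contained form: a sketch that embeds the posted test.txt as a string (an assumption, so the code runs without the file or the OS X clipboard) and reads it through textConnection(). stringsAsFactors = TRUE is set explicitly only to reproduce the R 2.8.1 behavior for y1; since R 4.0.0 the default is FALSE and y1 would be read as character.

```r
# The posted test.txt, embedded as a string.
csv <- "ob,x1,y1
1,1.1,1/1
2,2.1,1/2
3,3.2,2/2
4,.,0/0
5,4.5,1/1
6,5.1,0/0
7,6.3,1/1
8,.,1/2"

con <- textConnection(csv)
# na.strings = "." turns the periods into NA, so x1 is read as numeric.
DF <- read.table(con, header = TRUE, sep = ",", na.strings = ".",
                 stringsAsFactors = TRUE)
close(con)  # read.table() leaves an already-open connection open

str(DF)
```

With the real file, pass file = "d:\\test\\test.txt" in place of con.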

HTH,

Marc Schwartz

Re: real numeric variable transforms into factor:

Aldi Kraja
Thank you, Marc, for your detailed and helpful information.

Aldi