Data type in a data frame


Data type in a data frame

asafwe
Hi all,

How does R know to regard a variable as a factor and not a character?
For example, consider the following table:

Observation                Gender                Dosage                Alertness
1                               m                        a                         8
2                               m                        a                        12
3                               m                        a                        13
4                               m                        a                        12
5                               m                        b                         6
6                               m                        b                         7

When read into a data frame, will "m", "a" and "b" be regarded as factors or as characters? How does R decide?

Thanks a lot in advance,

Asaf

Re: Data type in a data frame

arun kirshna
This post has NOT been accepted by the mailing list yet.
Hi,
It depends on how you read the data.
dat1<-read.table(text="
Observation                Gender                Dosage                Alertness
1                               m                        a                         8
2                               m                        a                        12
3                               m                        a                        13
4                               m                        a                        12
5                               m                        b                         6
6                               m                        b                         7
",sep="",header=TRUE)
# By default (in R versions before 4.0.0), you will get factors:
str(dat1)
#'data.frame': 6 obs. of  4 variables:
# $ Observation: int  1 2 3 4 5 6
# $ Gender     : Factor w/ 1 level "m": 1 1 1 1 1 1
# $ Dosage     : Factor w/ 2 levels "a","b": 1 1 1 1 2 2
# $ Alertness  : int  8 12 13 12 6 7

dat1<-read.table(text="
Observation                Gender                Dosage                Alertness
1                               m                        a                         8
2                               m                        a                        12
3                               m                        a                        13
4                               m                        a                        12
5                               m                        b                         6
6                               m                        b                         7
",sep="",header=TRUE,stringsAsFactors=FALSE)
# With stringsAsFactors=FALSE, you get character columns instead:
str(dat1)
#'data.frame': 6 obs. of  4 variables:
 #$ Observation: int  1 2 3 4 5 6
 #$ Gender     : chr  "m" "m" "m" "m" ...
 #$ Dosage     : chr  "a" "a" "a" "a" ...
 #$ Alertness  : int  8 12 13 12 6 7
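If you want each column's type fixed regardless of the stringsAsFactors default of the R version in use, read.table() also accepts a colClasses argument with one class per column ("factor" is an allowed value). A minimal sketch on the same data; the name dat3 and the particular mix of classes are illustrative choices, not part of the original answer:

```r
# Sketch: fix every column's type explicitly via colClasses,
# so the result does not depend on the stringsAsFactors default.
dat3 <- read.table(text = "
Observation Gender Dosage Alertness
1 m a 8
2 m a 12
3 m a 13
4 m a 12
5 m b 6
6 m b 7
", header = TRUE,
  colClasses = c("integer", "character", "factor", "integer"))

# Inspect the class of every column
sapply(dat3, class)
str(dat3)
```

Here Gender stays character while Dosage becomes a factor, showing that the choice can be made per column.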

A.K.

Re: Data type in a data frame

asafwe
So helpful, Arun -- Thank you!

Asaf


Re: Data type in a data frame

Rui Barradas
In reply to this post by asafwe
Hello,

When read into a data.frame, R (before version 4.0.0) defaults to reading
character strings as factors. If you don't want that, use the argument
stringsAsFactors = FALSE. Using your dataset:


dat1 <- read.table(text = "
Observation   Gender  Dosage  Alertness
1             m       a               8
2             m       a              12
3             m       a              13
4             m       a              12
5             m       b               6
6             m       b               7
", header = TRUE)
str(dat1)

dat2 <- read.table(text = "
Observation   Gender  Dosage  Alertness
1             m       a               8
2             m       a              12
3             m       a              13
4             m       a              12
5             m       b               6
6             m       b               7
", header = TRUE, stringsAsFactors = FALSE)
str(dat2)


In R versions before 4.0.0, the default is decided by the following global option, which you can change:

options("stringsAsFactors")
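Because this default changed in R 4.0.0, code that relies on it can behave differently across versions. A version-independent alternative is to pass stringsAsFactors explicitly and then convert only the character columns you want as factors. A minimal sketch; the names dat and is_chr are illustrative:

```r
# Read with character columns kept as character, whatever the R version
dat <- read.table(text = "
Observation Gender Dosage Alertness
1 m a 8
2 m a 12
3 m a 13
4 m a 12
5 m b 6
6 m b 7
", header = TRUE, stringsAsFactors = FALSE)

# Find the character columns and convert only those to factors
is_chr <- vapply(dat, is.character, logical(1))
dat[is_chr] <- lapply(dat[is_chr], factor)

str(dat)
```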

Hope this helps,

Rui Barradas
On 23-10-2012 15:43, asafwe wrote:

> --
> View this message in context: http://r.789695.n4.nabble.com/Data-type-in-a-data-frame-tp4647161.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


Re: Data type in a data frame

William Dunlap
> When read into a data.frame, R defaults to reading character strings as
> factors. If you don't want that, use option stringsAsFactors = FALSE.

This is somewhat tangential, but if you plan on using
  predict(fit,newdata=nd)
after fitting a model like
  fit <- lm(y~x, data=d)
be sure you have converted character columns in nd and d into factors.
Otherwise you are likely to get errors from predict().   You will get
a warning when fitting the model if you use character columns, but
the results are ok until you use predict() on the result.

E.g.,
> d <- data.frame(y=1:10, cGroup=rep(c("A","B","C"),c(3,4,3)), fGroup=factor(rep(c("A","B","C"),c(3,4,3))), stringsAsFactors=FALSE)
> fitChar <- lm(y ~ cGroup - 1, data=d[1:9,])
Warning message:
In model.matrix.default(mt, mf, contrasts) :
  variable 'cGroup' converted to a factor
> fitFactor <- lm(y ~ fGroup - 1, data=d[1:9,])
> coef(fitChar)
cGroupA cGroupB cGroupC
    2.0     5.5     8.5
> coef(fitFactor)
fGroupA fGroupB fGroupC
    2.0     5.5     8.5
> # so far things are ok, but ...
> predict(fitChar, newdata=d[10,])
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
In addition: Warning message:
In model.matrix.default(Terms, m, contrasts.arg = object$contrasts) :
  variable 'cGroup' converted to a factor
> predict(fitFactor, newdata=d[10,])
 10
8.5
> predict(fitChar, newdata=d[c(1,10),])
Error in predict.lm(fitChar, newdata = d[c(1, 10), ]) :
  subscript out of bounds
In addition: Warning message:
In model.matrix.default(Terms, m, contrasts.arg = object$contrasts) :
  variable 'cGroup' converted to a factor
> predict(fitFactor, newdata=d[c(1,10),])
  1  10
2.0 8.5
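Following the advice above, a minimal sketch of the safe pattern: convert the character column to a factor before fitting, and give new data the same factor levels as the training data before calling predict(). The data mirror the example above; the names fit and nd are illustrative. Whether predict() copes with a plain character newdata column on its own may depend on the R version; setting the levels explicitly sidesteps that question.

```r
d <- data.frame(y = 1:10,
                cGroup = rep(c("A", "B", "C"), c(3, 4, 3)),
                stringsAsFactors = FALSE)

# Convert the character column to a factor before fitting
d$cGroup <- factor(d$cGroup)
fit <- lm(y ~ cGroup - 1, data = d[1:9, ])

# New data arriving as plain character: reuse the training levels
nd <- data.frame(cGroup = c("A", "C"), stringsAsFactors = FALSE)
nd$cGroup <- factor(nd$cGroup, levels = levels(d$cGroup))

predict(fit, newdata = nd)
```

With the group-means model above, the predictions are the training means of groups A and C (2.0 and 8.5).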


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf
> Of Rui Barradas
> Sent: Tuesday, October 23, 2012 11:16 AM
> To: asafwe
> Cc: [hidden email]
> Subject: Re: [R] Data type in a data frame