factor level issue after subsetting

factor level issue after subsetting

Schreiber, Stefan
Dear list,

I cannot figure out why, after sub-setting my data, that particular item
which I don't want to plot is still in the newly created subset (please
see example below). R somehow remembers what was in the original data
set. A workaround is exporting and re-importing the new subset. Then it's
all fine, but I don't like this idea and was wondering what I am missing
here?

Thanks!
Stefan

P.S. I am using R 2.13.2 for Mac.

> dat<-read.csv("~/MyFiles/data.csv")
> class(dat$treat)
[1] "factor"
> dat
   treat yield
1   cont  98.7
2   cont  97.2
3   cont  96.1
4   cont  98.1
5     10 103.0
6     10 101.3
7     10 102.1
8     10 101.9
9     30 121.1
10    30 123.1
11    30 119.7
12    30 118.9
13    60 109.9
14    60 110.1
15    60 113.1
16    60 112.3
> plot(dat$treat,dat$yield)
> dat.sub<-dat[which(dat$treat!='cont'),]
> class(dat.sub$treat)
[1] "factor"
> dat.sub
   treat yield
5     10 103.0
6     10 101.3
7     10 102.1
8     10 101.9
9     30 121.1
10    30 123.1
11    30 119.7
12    30 118.9
13    60 109.9
14    60 110.1
15    60 113.1
16    60 112.3
> plot(dat.sub$treat,dat.sub$yield)

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: factor level issue after subsetting

Nordlund, Dan (DSHS/RDA)
> -----Original Message-----
> From: [hidden email] [mailto:r-help-bounces@r-
> project.org] On Behalf Of Schreiber, Stefan
> Sent: Tuesday, November 01, 2011 2:29 PM
> To: [hidden email]
> Subject: [R] factor level issue after subsetting
>
> Dear list,
>
> I cannot figure out why, after sub-setting my data, that particular
> item
> which I don't want to plot is still in the newly created subset (please
> see example below). R somehow remembers what was in the original data
> set.

That is the nature of factors.  Once created, unused levels must be explicitly dropped:

plot(droplevels(dat.sub$treat),dat.sub$yield)
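
To see what "unused levels" means here, a minimal sketch may help. It assumes a small data frame typed in by hand as a stand-in for the poster's data.csv (the values are copied from the post, the construction with factor() and rep() is only for illustration):

```r
# Hypothetical stand-in for data.csv, built by hand from the values in the post.
dat <- data.frame(
  treat = factor(rep(c("cont", "10", "30", "60"), each = 4)),
  yield = c(98.7, 97.2, 96.1, 98.1, 103.0, 101.3, 102.1, 101.9,
            121.1, 123.1, 119.7, 118.9, 109.9, 110.1, 113.1, 112.3)
)

# Drop the control rows; the factor keeps its original set of levels.
dat.sub <- dat[dat$treat != "cont", ]

levels(dat.sub$treat)              # "cont" is still listed as a level
table(dat.sub$treat)               # ...and shows up with a count of zero
levels(droplevels(dat.sub$treat))  # after droplevels() it is gone
```

The zero-count level is what plot() still draws an empty slot for, which is why dropping it before plotting helps.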


Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204



Re: factor level issue after subsetting

Justin Haynes
First of all, the subsetting line is overly complicated.

dat.sub<-dat[dat$treat!='cont',]

will work just fine.  R does exactly what you're describing: it keeps
the full set of levels of the factor.  Removing the 'cont' rows from
the data does not remove the 'cont' level from the factor:

> df<-data.frame(let=factor(sample(letters[1:5],100,replace=T)),num=rnorm(100))
> str(df)
'data.frame': 100 obs. of  2 variables:
 $ let: Factor w/ 5 levels "a","b","c","d",..: 1 5 1 4 3 5 2 2 1 3 ...
 $ num: num  0.224 -0.523 0.974 -0.268 -0.61 ...

> df.sub<-df[df$let!='a',]
> str(df.sub)
'data.frame': 82 obs. of  2 variables:
 $ let: Factor w/ 5 levels "a","b","c","d",..: 5 4 3 5 2 2 3 3 5 3 ...
 $ num: num  -0.523 -0.268 -0.61 -1.383 -0.193 ...

> unique(df.sub$let)
[1] e d c b
Levels: a b c d e

> df.sub$let<-factor(df.sub$let)
> unique(df.sub$let)
[1] e d c b
Levels: e d c b

> str(df.sub$let)
 Factor w/ 4 levels "e","d","c","b": 1 2 3 1 4 4 3 3 1 3 ...
>

By redefining the factor you can eliminate the problem.  The other
option, if you don't want factors to begin with, is:

options(stringsAsFactors=FALSE)  # to set the global option

or

# to set the option locally for this single read.csv call
dat<-read.csv("~/MyFiles/data.csv",stringsAsFactors=FALSE)
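
A rough sketch of the character-column route, assuming a few made-up CSV lines passed through the text= argument instead of the real file (csv_text, dat_chr and sub_chr are names invented for this example). As far as I know, recent R versions (4.0.0 and later) already default to stringsAsFactors=FALSE, so the argument is spelled out here only to make the intent obvious:

```r
# Made-up stand-in for the CSV file; only a handful of rows for brevity.
csv_text <- "treat,yield
cont,98.7
cont,97.2
10,103.0
10,101.3
30,121.1
30,123.1"

# Read the treatment column as plain character strings, not as a factor.
dat_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(dat_chr$treat)     # "character"

# A character vector has no levels, so nothing lingers after subsetting.
sub_chr <- dat_chr[dat_chr$treat != "cont", ]
unique(sub_chr$treat)    # only "10" and "30" remain

# Convert to a factor at the point where one is actually needed (e.g. plotting);
# the levels are then built from the values that are present.
sub_chr$treat <- factor(sub_chr$treat)
levels(sub_chr$treat)
```

The trade-off is that the conversion to a factor has to be done by hand wherever a grouping variable is expected.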



Re: factor level issue after subsetting

Felipe Carrillo
Stefan:
Use the droplevels function...
dat <- read.table(textConnection("
  treat yield
1  cont  98.7
2  cont  97.2
3  cont  96.1
4  cont  98.1
5    10 103.0
6    10 101.3
7    10 102.1
8    10 101.9
9    30 121.1
10    30 123.1
11    30 119.7
12    30 118.9
13    60 109.9
14    60 110.1
15    60 113.1
16    60 112.3"), header=TRUE, stringsAsFactors=TRUE)  # keep treat as a factor (needed on R >= 4.0.0)
dat
plot(dat$treat, dat$yield)
dat.sub <- subset(dat, treat != "cont"); dat.sub
dat.sub <- droplevels(dat.sub)    # drop unused levels
plot(dat.sub$treat, dat.sub$yield)
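
One point that may be worth spelling out: called on a whole data frame, droplevels() works on every factor column at once. The sketch below assumes a made-up data frame with a second factor column (block is hypothetical and not part of the original data) and uses the except argument to leave that column alone:

```r
# Made-up data: treat as in the thread, block is a hypothetical second factor.
df <- data.frame(
  treat = factor(c("cont", "cont", "10", "10", "30", "30")),
  block = factor(c("a", "b", "a", "b", "a", "c")),
  yield = c(98.7, 97.2, 103.0, 101.3, 121.1, 123.1)
)

# Remove the control rows and the single row from block "c".
df.sub <- subset(df, treat != "cont" & block != "c")
sapply(df.sub[c("treat", "block")], nlevels)      # both columns still report 3 levels

# Drop unused levels from all factor columns in one call.
all_dropped <- droplevels(df.sub)
sapply(all_dropped[c("treat", "block")], nlevels) # now 2 levels each

# Drop everywhere except column 2 (block), which keeps its empty level "c".
keep_block <- droplevels(df.sub, except = 2)
levels(keep_block$block)
```

Keeping an empty level on purpose can be handy when several plots or tables should share the same categories.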

Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA
http://www.fws.gov/redbluff/rbdd_jsmp.aspx



Re: factor level issue after subsetting

Schreiber, Stefan
Thanks for the fast response and your comments!

That works perfectly!

Another little mystery solved ;)

Stefan