drop levels problem

drop levels problem

Felipe Carrillo
Hi all:
I am having trouble dropping unused factor levels. I found a few hints online,
but none of them worked. Please consider the dataset below.
I was under the impression that subset(......drop=TRUE) would work, but it
doesn't.

library(ggplot2)
library(Hmisc)

x <- structure(list(first = c(38.2086, 43.1768, 43.146, 41.8044, 42.4232,
46.3646, 38.0813, 40.0745, 40.4889, 38.6246, 40.2826, 41.6056,
34.5353, 40.0768), second = c(43.3295, 42.4326, 38.8994, 37.0894,
42.3218, 46.1726, 39.1206, 41.2072, 42.4874, 40.2657, 38.7766,
40.8822, 42.0165, 49.2055), third = c(42.24, 42.992, 37.7419,
42.3448, 41.9131, 44.385, 42.7811, 44.1963, 40.8088, 43.9634,
38.7079, 38.0791, 44.3136, 39.5333)), .Names = c("first", "second",
"third"), class = "data.frame", row.names = c(NA, -14L))

head(x); str(x)
xmelt <- melt(x)
names(xmelt) <- c("year", "fatPerc")

# Year variable is a factor with three levels
# Subset to plot only 'first' year
firstyear <- subset(xmelt, year == 'first'); str(firstyear)
# Plot showing three levels still after I made the subset
ggplot(firstyear, aes(year, fatPerc)) + geom_boxplot() + geom_jitter()

# Try to drop the levels but dropUnusedLevels() doesn't seem to work here
dropUnusedLevels()
ggplot(firstyear, aes(year, fatPerc)) + geom_boxplot() + geom_jitter()

# code below also should drop levels but it doesn't
# data.frame(lapply(firstyear, function(x) if (is.factor(x)) { factor(x) } else { x }))
str(firstyear)
 
Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA




______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: drop levels problem

Joshua Wiley-2
Hi Felipe,

On Mon, Nov 29, 2010 at 11:01 AM, Felipe Carrillo
<[hidden email]> wrote:
> Hi all:
> I am having trouble dropping levels, got a few hints online without success.
> Please consider the dataset below:
>  I was under the impression that subset(......drop=TRUE) would work but it
> doesn't

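Here drop controls whether a single-column result is simplified from a
data frame to a plain vector, as in:

data.frame(1:10)[, 1]               # simplified to a vector
data.frame(1:10)[, 1, drop = FALSE] # stays a one-column data frame

It has nothing to do with the levels of a factor.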
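
To make the distinction concrete, here is a minimal sketch. The toy data frame df and the names one_col and n_levels are made up for illustration and are not part of the original data. It suggests that drop = TRUE in subset() only removes the data frame wrapper and leaves the factor levels alone.

```r
# Toy data, hypothetical: one factor column with three levels.
df <- data.frame(g = factor(c("a", "b", "c")), v = 1:3)

# drop = TRUE simplifies the one-column result to a bare vector (here a factor)...
one_col <- subset(df, g == "a", select = g, drop = TRUE)

# ...but the unused levels "b" and "c" are still attached to it.
n_levels <- nlevels(one_col)
print(is.data.frame(one_col))
print(levels(one_col))
```
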

>
> library(ggplot2)
> library(Hmisc)
>
> x <- structure(list(first = c(38.2086, 43.1768, 43.146, 41.8044, 42.4232,
> 46.3646, 38.0813, 40.0745, 40.4889, 38.6246, 40.2826, 41.6056,
> 34.5353, 40.0768), second = c(43.3295, 42.4326, 38.8994, 37.0894,
> 42.3218, 46.1726, 39.1206, 41.2072, 42.4874, 40.2657, 38.7766,
> 40.8822, 42.0165, 49.2055), third = c(42.24, 42.992, 37.7419,
> 42.3448, 41.9131, 44.385, 42.7811, 44.1963, 40.8088, 43.9634,
> 38.7079, 38.0791, 44.3136, 39.5333)), .Names = c("first", "second",
> "third"), class = "data.frame", row.names = c(NA, -14L))

Thanks for the nice example!

>
>  head(x);str(x)
> xmelt <- melt(x)
>  names(xmelt) <- c("year","fatPerc")
>
>   # Year variable is a factor with three levels
>  # Subset to plot only 'first' year
> firstyear <- subset(xmelt,year=='first');str(firstyear)
> # Plot showing three levels still after I made the subset
>   ggplot(firstyear,aes(year,fatPerc)) + geom_boxplot() + geom_jitter()

Right, because a factor can legitimately have levels with no
observations, and sometimes those are the most interesting ones (e.g., if
you subset by smoking status and found no instances of lung cancer
among non-smokers; not that extreme, but you get the point).
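
As a small sketch of why an empty level can be worth keeping (the vectors cancer and smoker below are invented toy values, not real data), table() still reports the level with a count of zero after subsetting:

```r
# Invented toy data for illustration only.
cancer <- factor(c("yes", "no", "yes", "no"), levels = c("no", "yes"))
smoker <- c(TRUE, FALSE, TRUE, FALSE)

# Keep only the non-smokers; the "yes" level survives with zero observations.
nonsmokers <- cancer[!smoker]
counts <- table(nonsmokers)
print(counts)
```

Had the unused level been dropped automatically, the zero count for "yes" would simply vanish from the table.
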

>
> # Try to drop the levels but dropUnusedLevels() doesn't seem to work here
>   dropUnusedLevels()

Sorry, I have had some difficulty installing Hmisc on my Linux system
and never got around to working it out, so I cannot test this one.

> ggplot(firstyear,aes(year,fatPerc)) + geom_boxplot() + geom_jitter()
>
> # code below also should drop levels but it doesn't
> #data.frame(lapply(firstyear, function(x) if (is.factor(x)){ factor(x)}
> else{x}))

It would if you assigned the result back to firstyear.  As written, the
call builds the cleaned-up data frame, prints it to the screen, and then
the changed data goes off to oblivion.

firstyear <- data.frame(lapply(firstyear, function(x) if(is.factor(x))
{factor(x)} else {x}))
str(firstyear) # should now just have one level
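
The same point in a smaller sketch, using a made-up factor f rather than the original data (the names f, sub, before and after are hypothetical): calling factor() does not modify its argument, so the result has to be assigned.

```r
# Made-up factor for illustration.
f <- factor(c("a", "b", "c"))
sub <- f[f == "a"]

# Calling factor() without assignment returns a new object and leaves 'sub' alone.
invisible(factor(sub))
before <- nlevels(sub)

# Assigning the result back is what actually drops the unused levels.
sub <- factor(sub)
after <- nlevels(sub)
print(c(before = before, after = after))
```
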

Cheers,

Josh

> str(firstyear)


--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


Re: drop levels problem

Henrique Dallazuanna
In reply to this post by Felipe Carrillo
Take a look at the droplevels() function (R >= 2.12).
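
A minimal sketch of that suggestion, assuming R >= 2.12. To keep it self-contained it builds a cut-down version of xmelt by hand from the first two values of each column in the original post, instead of calling melt():

```r
# Cut-down stand-in for xmelt: two values per year, taken from the posted data.
xmelt <- data.frame(
  year    = factor(rep(c("first", "second", "third"), each = 2)),
  fatPerc = c(38.2086, 43.1768, 43.3295, 42.4326, 42.24, 42.992)
)

# droplevels() on a data frame drops unused levels from every factor column.
firstyear <- droplevels(subset(xmelt, year == "first"))
print(levels(firstyear$year))
```
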



--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


Re: drop levels problem

Felipe Carrillo
In reply to this post by Joshua Wiley-2
Thanks Joshua, I get it now, levels sometimes drive me loco....
 
Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA








Re: drop levels problem

Joshua Wiley-2
In reply to this post by Joshua Wiley-2
Just to follow up on my own post a bit:

xmelt$year[xmelt$year == "first", drop = TRUE]

will do what you want.  I think the reason is that the subset has
multiple columns, not all of which are factors, so the method for '['
that gets used is the data frame one rather than the factor one that
can drop unused levels.  I did not make that clear at all the first
time around (and have probably still butchered it, so some
knowledgeable soul may correct me).  Also, I did get Hmisc installed,
but I think dropUnusedLevels() does not work in this case for a
similar reason.
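
A rough sketch of the difference, using a made-up factor year and data frame df (hypothetical names, not the original xmelt): indexing the factor directly accepts drop = TRUE, while row-subsetting the data frame leaves the levels of its factor column intact.

```r
# Made-up data for illustration.
year <- factor(c("first", "second", "third", "first"))
df <- data.frame(year = year, v = 1:4)

# Indexing the factor itself: drop = TRUE removes the unused levels.
kept    <- year[year == "first"]
dropped <- year[year == "first", drop = TRUE]

# Row-subsetting the data frame: the factor column keeps all three levels.
df_sub <- df[df$year == "first", ]

print(c(kept = nlevels(kept), dropped = nlevels(dropped),
        df_sub = nlevels(df_sub$year)))
```
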

Henrique's solution is, as usual, the shortest :)

Josh

[snip]
