# Collapse factor levels


## Collapse factor levels

I'm sure this is simple enough, but an R site search on my subject terms did not suggest a solution. I have a numeric vector with many values, from which I wish to create a factor having only a few levels. Here is a toy example.

    > x <- 1:10
    > x <- factor(x,levels=1:10,labels=c("A","A","A","B","B","B","C","C","C","C"))
    > x
     [1] A A A B B B C C C C
    Levels: A A A B B B C C C C
    > summary(x)
    A A A B B B C C C C
    3 0 0 3 0 0 4 0 0 0

So, there are clearly still 10 underlying levels. The results I would like to see from printing the value and summary(x) are:

    > x
     [1] A A A B B B C C C C
    Levels: A B C
    > summary(x)
    A B C
    3 3 4

Hopefully this makes sense.

Thanks,

Kevin

--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: [hidden email]  Tel: 416.864.5776  Fax: 416.864.3016

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

## Re: Collapse factor levels

Hi Kevin,

Here are two suggestions:

    # Combination of levels() and table()
    table(levels(x))
    # A B C
    # 3 3 4

    # Or defining a function
    mysummary <- function(x) table(levels(x)) # you can easily improve it :-)
    mysummary(x)
    # A B C
    # 3 3 4

HTH,

Jorge
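One caveat is worth illustrating: `table(levels(x))` tabulates the level labels, not the observations. It returns 3, 3, 4 for the toy factor only because that factor carries one (duplicated) label per observation. The sketch below uses a hypothetical factor `y`, which is not from the thread, where the two counts differ.

```r
# Hypothetical factor (not from the thread): three observations,
# three declared levels, and level "C" never observed.
y <- factor(c("A", "A", "B"), levels = c("A", "B", "C"))

# Tabulates the level labels: each distinct label is counted once.
label_counts <- table(levels(y))

# Tabulates the observations: this is the per-level frequency.
value_counts <- table(y)

print(label_counts)
print(value_counts)
```

For per-level frequencies, `table(y)` or `summary(y)` is the call to rely on.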

## Re: Collapse factor levels

On Nov 1, 2009, at 3:51 PM, Kevin E. Thorpe wrote:

> I'm sure this is simple enough, but an R site search on my subject
> terms did suggest a solution.  I have a numeric vector with many
> values that I wish to create a factor from having only a few levels.
> Here is a toy example.
>
>     > x <- 1:10
>     > x <- factor(x,levels=1:10,labels=c("A","A","A","B","B","B","C","C","C","C"))

You have thusly created a pathological situation. In 2.10.0 this is what you might see:

    > x <- factor(x,levels=1:10,labels=c("A","A","A","B","B","B","C","C","C","C"))
    Warning message:
    In `levels<-`(`*tmp*`, value = c("A", "A", "A", "B", "B", "B", "C",  :
      duplicated levels will not be allowed in factors anymore

What you _should_ have done was:

    x2 <- factor(c("A","A","A","B","B","B","C","C","C","C"))

The usual approach to getting rid of unused factor levels is just to apply the function factor() again without additional arguments.

    > x <- factor(x)  # the "x" was from your code
    Warning message:
    In `levels<-`(`*tmp*`, value = c("A", "A", "A", "B", "B", "B", "C",  :
      duplicated levels will not be allowed in factors anymore
    # but that will be the last time you will see the warning..
    > summary(x)
    A B C
    3 3 4

--
David.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
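As a small illustration of the `factor()` idiom for dropping unused levels, here is a sketch on a hypothetical factor `f` (not from the thread) whose level set is valid but contains one unused level. The `droplevels()` call is included on the assumption that a later version of R than the one discussed here is in use; the names `f`, `f2` and `f3` are illustrative only.

```r
# Hypothetical factor with one unused level ("C").
f <- factor(c("A", "A", "B"), levels = c("A", "B", "C"))
print(summary(f))    # "C" is still reported, with a zero count

# Re-applying factor() rebuilds the level set from the values present.
f2 <- factor(f)
print(levels(f2))
print(summary(f2))

# droplevels() is assumed to be available (it was added to base R after
# the version discussed in this thread); for this case it yields the
# same level set.
f3 <- droplevels(f)
print(levels(f3))
```

Note that this idiom removes unused levels; it does not by itself merge several old levels into one.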

## Re: Collapse factor levels

Kevin E. Thorpe wrote:

> So, there are clearly still 10 underlying levels.  The results I would
> like to see from printing the value and summary(x) are:
>
>     > x
>      [1] A A A B B B C C C C
>     Levels: A B C
>     > summary(x)
>     A B C
>     3 3 4

It's an anomaly inherited from S-PLUS (or so I have been told). Actually, with the current R, you should get a warning:

    > x <- 1:10
    > x <- factor(x,levels=1:10,labels=c("A","A","A","B","B","B","C","C","C","C"))
    Warning message:
    In `levels<-`(`*tmp*`, value = c("A", "A", "A", "B", "B", "B", "C",  :
      duplicated levels will not be allowed in factors anymore

This works (as documented on the help page for levels!):

    > x <- 1:10
    > x <- factor(x,levels=1:10)
    > levels(x) <- c("A","A","A","B","B","B","C","C","C","C")
    > table(x)
    x
    A B C
    3 3 4

--
       O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
      c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
     (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
    ~~~~~~~~~~ - ([hidden email])              FAX: (+45) 35327907
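The help page for `levels` also describes a named-list form of the replacement value, which spells out which old levels map to which new level. The following is a minimal sketch using the same toy data; the names `x1` and `x2` are only for illustration, and the old levels are written as character strings because that is how `factor(1:10)` stores them.

```r
# Toy data from the thread: ten distinct values, one level each.
x <- factor(1:10)

# Form used in the post: one new label per old level, duplicates merge.
x1 <- x
levels(x1) <- c("A", "A", "A", "B", "B", "B", "C", "C", "C", "C")

# Named-list form: each name is a new level, each element lists the
# old levels that should collapse into it.
x2 <- x
levels(x2) <- list(A = c("1", "2", "3"),
                   B = c("4", "5", "6"),
                   C = c("7", "8", "9", "10"))

print(table(x1))
print(table(x2))
```

The list form is longer to type, but it keeps the mapping readable when the old levels are not in a convenient order.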