Collapse factor levels

Kevin E. Thorpe
I'm sure this is simple enough, but an R site search on my subject
terms did not suggest a solution.  I have a numeric vector with many
values, from which I wish to create a factor having only a few levels.
Here is a toy example.

 > x <- 1:10
 > x <- factor(x,levels=1:10,labels=c("A","A","A","B","B","B","C","C","C","C"))
 > x
  [1] A A A B B B C C C C
Levels: A A A B B B C C C C
 > summary(x)
A A A B B B C C C C
3 0 0 3 0 0 4 0 0 0

So, there are clearly still 10 underlying levels.  The results I would
like to see from printing the value and summary(x) are:

 > x
  [1] A A A B B B C C C C
Levels: A B C
 > summary(x)
A B C
3 3 4

Hopefully this makes sense.

Thanks,

Kevin

--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: [hidden email]  Tel: 416.864.5776  Fax: 416.864.3016

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Collapse factor levels

Jorge I Velez
Hi Kevin,

Here are two suggestions:

# Combination of levels() and table()
table(levels(x))
# A B C
# 3 3 4

# Or defining a function
mysummary <- function(x) table(levels(x)) # you can easily improve it :-)
mysummary(x)
# A B C
# 3 3 4
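
A caveat on the suggestion above: table(levels(x)) tabulates the level labels, not the observations, so it matches the desired summary only because each of the ten original levels occurs exactly once in the toy data. A minimal sketch of the difference follows. The vector `v` and factor `f` are made up for illustration and are not from the thread, and the merge uses the levels<- idiom.

```r
# Hypothetical data: the values 1..10 do not each occur exactly once.
v <- c(1, 1, 1, 2, 10)
f <- factor(v, levels = 1:10)

# Merge the ten levels into three by assigning duplicated labels.
levels(f) <- c("A", "A", "A", "B", "B", "B", "C", "C", "C", "C")

obs_counts   <- table(f)          # counts observations per merged level
label_counts <- table(levels(f))  # counts each distinct label once

print(obs_counts)
print(label_counts)
```

When the goal is a per-level count of the data, table(f) or summary(f) is the safer choice.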

HTH,
Jorge



Re: Collapse factor levels

David Winsemius
In reply to this post by Kevin E. Thorpe

On Nov 1, 2009, at 3:51 PM, Kevin E. Thorpe wrote:

> I'm sure this is simple enough, but an R site search on my subject
> terms did not suggest a solution.  I have a numeric vector with many
> values, from which I wish to create a factor having only a few levels.
> Here is a toy example.
>
> > x <- 1:10
> > x <- factor(x,levels=1:10,labels=c("A","A","A","B","B","B","C","C","C","C"))

You have thus created a pathological situation. In R 2.10.0 this is
what you might see:

 > x <- factor(x,levels=1:10,labels=c("A","A","A","B","B","B","C","C","C","C"))
Warning message:
In `levels<-`(`*tmp*`, value = c("A", "A", "A", "B", "B", "B", "C",  :
   duplicated levels will not be allowed in factors anymore

What you _should_ have done was:

  x2 <- factor(c("A","A","A","B","B","B","C","C","C","C"))

The usual approach to getting rid of unused factor levels is just to  
apply the function factor() again without additional arguments.

 > x <- factor(x)  # the "x" was from your code
Warning message:
In `levels<-`(`*tmp*`, value = c("A", "A", "A", "B", "B", "B", "C",  :
   duplicated levels will not be allowed in factors anymore

# but that will be the last time you will see the warning..

 > summary(x)
A B C
3 3 4
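
The same idea can be shown on a factor without duplicated labels. In this small sketch the names `f` and `sub` are invented for the example, and droplevels() is a base R helper that only exists in R versions later than the one discussed here.

```r
f   <- factor(c("A", "B", "C", "A"))
sub <- f[f != "C"]             # subsetting keeps the now-unused level "C"

levels(sub)                    # still lists all three levels

refactored <- factor(sub)      # re-applying factor() drops unused levels
dropped    <- droplevels(sub)  # same result via the dedicated helper

levels(refactored)
levels(dropped)
```

Either call leaves the observations unchanged; only the set of levels shrinks.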

--
David.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT


Re: Collapse factor levels

Peter Dalgaard
In reply to this post by Kevin E. Thorpe
It's an anomaly inherited from S-PLUS (or so I have been told).
Actually, with the current R, you should get a warning:

 > x <- 1:10
 > x <- factor(x,levels=1:10,labels=c("A","A","A","B","B","B","C","C","C","C"))
Warning message:
In `levels<-`(`*tmp*`, value = c("A", "A", "A", "B", "B", "B", "C",  :
   duplicated levels will not be allowed in factors anymore

This works (as documented on the help page for levels!):

 > x <- 1:10
 > x <- factor(x,levels=1:10)
 > levels(x) <- c("A","A","A","B","B","B","C","C","C","C")
 > table(x)
x
A B C
3 3 4
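
Two other spellings of the same grouping may be easier to read when there are many values. This is a sketch rather than part of the original answer: `f1` and `f2` are names chosen for illustration, the list form of levels<- is described on the same help page, and cut() is a swapped-in alternative that bins the numeric values directly instead of relabelling levels.

```r
x <- 1:10

# List form of levels<-: names are the new levels, values the old ones.
f1 <- factor(x, levels = 1:10)
levels(f1) <- list(A = as.character(1:3),
                   B = as.character(4:6),
                   C = as.character(7:10))

# cut() bins the numbers into (0,3], (3,6] and (6,10].
f2 <- cut(x, breaks = c(0, 3, 6, 10), labels = c("A", "B", "C"))

print(table(f1))
print(table(f2))
```

The list form keeps the mapping explicit for arbitrary groupings, while cut() only applies when the groups are contiguous numeric ranges.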


--
    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([hidden email])              FAX: (+45) 35327907

Re: Collapse factor levels

Kevin E. Thorpe
Peter Dalgaard wrote:

> This works (as documented on the help page for levels!):
>
>  > x <- 1:10
>  > x <- factor(x,levels=1:10)
>  > levels(x) <- c("A","A","A","B","B","B","C","C","C","C")
>  > table(x)
> x
> A B C
> 3 3 4
>
>

Thanks.  That's exactly what I need.  I knew it was simple.
I've even used levels() before, but it just didn't occur to
me this time.  I'm clearly not on current R. :-)
When I have some time, I'll upgrade.

Kevin
