Levels in returned data.frame after subset

Levels in returned data.frame after subset

Ulrik Stervbo-2
Dear List,

When I subset a data.frame, the levels are not re-adjusted (see
example). Why is this? Am I missing out on some basic stuff here?

Thanks
Ulrik


> m <- data.frame(gender = c("M", "M","F"), ht = c(172, 186.5, 165), wt = c(91,99, 74))
> dim(m)
[1] 3 3

> levels(m$gender)
[1] "F" "M"

> s <- subset(m, m$gender == "M")
> dim(s)
[1] 2 3

> levels(s$gender)
[1] "F" "M"

> cat <- sapply(s, is.factor); s[cat] <- lapply(s[cat], factor)
> dim(s)
[1] 2 3

> levels(s$gender)
[1] "M"
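
For readers on later versions of R, a minimal sketch of a shorter route. It assumes R 2.12.0 or newer, where base R provides droplevels(); that function is not mentioned in the original post. The stringsAsFactors = TRUE argument is also an addition here: it is only needed on R 4.0.0 and later, where data.frame() no longer converts strings to factors by default.

```r
# Rebuild the example data; stringsAsFactors = TRUE reproduces the
# factor conversion that was the default when the post was written.
m <- data.frame(gender = c("M", "M", "F"),
                ht = c(172, 186.5, 165),
                wt = c(91, 99, 74),
                stringsAsFactors = TRUE)

# Subsetting keeps the full set of levels on the factor column.
s <- subset(m, gender == "M")
levels(s$gender)

# droplevels() removes unused levels from every factor column at once.
s2 <- droplevels(s)
levels(s2$gender)
```

This should give the same result as the sapply()/lapply() line above, without creating a variable named cat that masks the base function of the same name.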

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Levels in returned data.frame after subset

Ista Zahn-2
Hi Ulrik

On Sat, Sep 4, 2010 at 12:52 PM, Ulrik Stervbo <[hidden email]> wrote:
> Dear List,
>
> When I subset a data.frame, the levels are not re-adjusted (see
> example). Why is this? Am I missing out on some basic stuff here?

Only that this issue has come up many times before, and that this list
is archived and searchable. Try

RSiteSearch("subset drop levels", restrict = c("Rhelp10", "Rhelp08", "Rhelp02"))


-Ista

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org


Re: Levels in returned data.frame after subset

Greg Snow-2
In reply to this post by Ulrik Stervbo-2
The advantage of computers is that they do exactly what they are told.
The disadvantage of computers is that they do exactly what they are told.

R is a set of instructions to the computer; those instructions are a combination of what the original programmers wrote and what you write.  Who should make important decisions about the structure of your data?  A group of (admittedly brilliant) programmers who have never seen your data and do not know what questions you are trying to answer, or you (who hopefully know more about your data and questions)?

I don't claim to be more intelligent/knowledgeable than the programmers of R, but I am grateful that they have/had sufficient humility to allow for the possibility that I may actually know something about my data and questions that they don't (or maybe they are just too lazy to do my job for me, but that is also appropriate).

In your example below, why do you care what the levels of gender are after the subset?  Why waste time/effort dropping the levels for a column that by definition only has one value?

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[hidden email]
801.408.8111



Re: Levels in returned data.frame after subset

Ulrik Stervbo-2
In reply to this post by Ista Zahn-2
Thanks for the replies!  Obviously I must have used the wrong search
terms - sorry.

@greg: I care about the levels after the subset because, if they are
not dropped, they still appear in the subsequent heatmap I make
with ggplot (with my real data set, of course). Admittedly I am quite
green, and may do things in a rather silly way - but it works (at
least I think it does).
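
The effect described above can be seen without any plotting. The following is a minimal sketch in base R, not the poster's actual code: it uses table() as a stand-in for any downstream step that enumerates factor levels, and the variable names counts_kept and counts_dropped are made up for illustration. How a given ggplot version treats unused levels is not shown here.

```r
# One-column version of the example; stringsAsFactors = TRUE is needed
# on R >= 4.0.0 to get a factor column.
m <- data.frame(gender = c("M", "M", "F"), stringsAsFactors = TRUE)
s <- subset(m, gender == "M")

# The unused level "F" is still tabulated, with a count of zero.
counts_kept <- table(s$gender)
counts_kept

# Re-creating the factor drops the unused level before tabulating.
counts_dropped <- table(factor(s$gender))
counts_dropped
```

Whether the empty category is wanted depends on the analysis: sometimes a zero count is informative, sometimes it is just clutter in a summary or a plot.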
