
Creating a vector of categories

Christoffer Karlsson
Hi,

I have a data frame that looks something like this:

sex     language   count
male    english    0
male    english    0
female  english    32
male    spanish    154
female  english    11
female  norwegian  7

and so on.
What I want to do is sort these rows into categories, for instance one
category where count >= 0 & count < 10, and so on.

I want my data to turn out looking something like:

male english 0-10 1324
male english 11-20 756
.....
male spanish 0-10 354
...
female english 0-10 1557
...

and so on, where the right-hand column is the number of people in each
category.
Up until now I've been subsetting the data frame into each category and
then counting the number of rows in each subset. However, I now have a large
number of different factor combinations, which makes this process tedious.

Any help would be appreciated!
Chris

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Creating a vector of categories

Jim Lemon
On 03/26/2010 08:41 PM, Christoffer Karlsson wrote:

> What I want to do is to order these in to categories, for instance one
> category where count>=0 & count<10 and so on..

Hi Chris,
As luck would have it, I have been working on a very similar problem,
that of graphically representing multi-level summaries. What you could
do is to create a new factor variable with the "cut" function (say,
"countcut"), then call the "by" function like this:

by(mydf, list(mydf$sex, mydf$language, mydf$countcut), nrow)

You will not get the format you have specified, but you will get the
numbers that can be reformatted.
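
A minimal sketch of this suggestion, assuming a data frame called mydf with the three columns from the original post. The sample values, the break points and the column name countcut are made up for illustration, and nrow is used here to count the rows in each group:

```r
# Hypothetical sample data with the columns from the original post
mydf <- data.frame(
  sex      = c("male", "male", "female", "male", "female", "female"),
  language = c("english", "english", "english", "spanish", "english", "norwegian"),
  count    = c(0, 0, 32, 154, 11, 7)
)

# New factor recording which interval each count falls into.
# right = FALSE makes the intervals closed on the left: [0,10), [10,20), ...
mydf$countcut <- cut(mydf$count, breaks = seq(0, 160, by = 10), right = FALSE)

# Number of rows for every sex / language / interval combination
counts <- by(mydf, list(mydf$sex, mydf$language, mydf$countcut), nrow)

# The result is a three-dimensional array; combinations that never occur are NA
counts
```

The printed result is long because every possible combination is listed, which is why some reformatting is needed afterwards.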

Jim


Re: Creating a vector of categories

PIKAL Petr
In reply to this post by Christoffer Karlsson
Hi

[hidden email] wrote on 26.03.2010 10:41:29:

> What I want to do is to order these in to categories, for instance one
> category where count>=0 & count<10 and so on..

Break your counts into the desired levels with cut(),
see ?cut
cut(1:100, breaks=10)
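
Passing a single number as breaks asks cut() for that many equal-width intervals, whose end points are generally not round numbers. For bins such as 0-9, 10-19 and so on, explicit break points can be given instead. A small sketch, assuming integer counts; the variable names and the label scheme are only an illustration:

```r
counts <- c(0, 0, 32, 154, 11, 7)

# Explicit break points every 10 units, wide enough to cover the largest value
breaks <- seq(0, 160, by = 10)

# One label per interval: "0-9", "10-19", ..., "150-159" (assumes integer counts)
labels <- paste(head(breaks, -1), tail(breaks, -1) - 1, sep = "-")

# right = FALSE gives intervals [0,10), [10,20), ..., so 10 lands in "10-19"
bins <- cut(counts, breaks = breaks, labels = labels, right = FALSE)

# How many values fall into each bin
table(bins)
```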


>
> I want my data to turn out looking something like:
>
> male english 0-10 1324
> male english 11-20 756
> .....
> male spanish 0-10 354
> ...
> female english 0-10 1557
> ...

Then aggregate your data:

with(your.data, aggregate(count, list(sex, language, cutted.count), length))
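
A worked version of the aggregate() call, as a sketch only: the sample values and break points are invented, your.data and cutted.count are the names used above, and the output column names sex, language, bin and n are chosen here for readability.

```r
# Hypothetical sample data with the columns from the original post
your.data <- data.frame(
  sex      = c("male", "male", "female", "male", "female", "female"),
  language = c("english", "english", "english", "spanish", "english", "norwegian"),
  count    = c(0, 0, 32, 154, 11, 7)
)
your.data$cutted.count <- cut(your.data$count,
                              breaks = seq(0, 160, by = 10), right = FALSE)

# Naming the list elements gives readable column names
# instead of Group.1, Group.2, Group.3
result <- with(your.data,
               aggregate(count,
                         list(sex = sex, language = language, bin = cutted.count),
                         length))

# The aggregated column is called "x" when a plain vector is passed in
names(result)[names(result) == "x"] <- "n"

# Sort by sex, then language, then bin to get the layout asked for
result <- result[order(result$sex, result$language, result$bin), ]
result
```

Unlike by(), aggregate() only returns the combinations that actually occur in the data, so there are no empty cells to filter out.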

Regards
Petr




Re: Creating a vector of categories

Sharpie
In reply to this post by Christoffer Karlsson
You can quickly assign a category to each row in your data frame with the cut() function:

  # Sample data in the shape described in the question
  testData <- data.frame(
    sex      = factor(c("male", "male", "female", "male", "female",
                        "female", "male", "female", "male")),
    language = factor(c("english", "english", "english", "spanish", "english",
                        "norwegian", "spanish", "spanish", "english")),
    count    = c(0L, 0L, 32L, 154L, 11L, 7L, 3L, 5L, 2L)
  )

  binMax <- ceiling( max(testData$count) / 10 ) * 10
  binBreaks <- seq( 0, binMax, by = 10 )

  testData$bin <- cut( testData$count, binBreaks, include.lowest = TRUE )

And then as Petr said:

  with( testData, aggregate(count, list(sex, language, bin), length))


Hope this helps!

-Charlie
Charlie Sharpsteen
Undergraduate-- Environmental Resources Engineering
Humboldt State University

Re: Creating a vector of categories

Sharpie
Sharpie wrote
  testData$bin <- cut( testData$count, binBreaks, include.lowest = TRUE )
I also made a slight mistake: you will want to replace include.lowest = TRUE with right = FALSE in the call to cut(), to preserve the greater-than-or-equal boundary at the lower end of each bin.
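
The difference only shows up for values that sit exactly on a break point. A small sketch with made-up values, comparing the two calls; the third call, which combines both arguments, is an addition here and not part of the correction above:

```r
x <- c(0, 9, 10, 11, 20)
breaks <- seq(0, 20, by = 10)

# Default intervals are closed on the right; include.lowest = TRUE only
# rescues the very first break, so 10 still falls into the first bin
a <- cut(x, breaks, include.lowest = TRUE)
levels(a)   # "[0,10]" "(10,20]"

# right = FALSE closes the intervals on the left, so 10 starts the second bin,
# but a value equal to the last break (20 here) becomes NA
b <- cut(x, breaks, right = FALSE)
levels(b)   # "[0,10)" "[10,20)"

# Using both arguments keeps the left-closed bins and closes the last one: [10,20]
d <- cut(x, breaks, right = FALSE, include.lowest = TRUE)
levels(d)   # "[0,10)" "[10,20]"
```

With the bin breaks computed from ceiling(max(count) / 10) * 10, the NA case can only occur when the largest count is an exact multiple of 10, so it may be worth checking for NA values in the new column.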

Sorry if that caused any confusion!

-Charlie
Charlie Sharpsteen
Undergraduate-- Environmental Resources Engineering
Humboldt State University

Re: Creating a vector of categories

Christoffer Karlsson
In reply to this post by PIKAL Petr
Thanks a ton, guys, for your help! It saved me a boatload of time and helped me
develop a much better method of doing these things than I had in the past.
The method I ended up using was to cut up my counts and aggregate my data,
as suggested by Petr.

Thanks!
Chris

On Fri, Mar 26, 2010 at 12:05 PM, Petr PIKAL <[hidden email]> wrote:

> aggregate your data
>
> with(your.data, aggregate(count, list(sex, language, cutted.count),
> length))