
Re-ordering factors


Murray Jorgensen
A small example before I begin my query:

> educ <- read.table(efile, header=TRUE)
> educ
      Education Age_Group Count
1  IncompleteHS     25-34  5416
2  IncompleteHS     35-44  5030
3  IncompleteHS     45-54  5777
4  IncompleteHS     55-64  7606
5  IncompleteHS       >64 13746
6   CompletedHS     25-34 16431
7   CompletedHS     35-44  1855
8   CompletedHS     45-54  9435
9   CompletedHS     55-64  8795
10  CompletedHS       >64  7558
11       Uni1-3     25-34  8555
12       Uni1-3     35-44  5576
13       Uni1-3     45-54  3124
14       Uni1-3     55-64  2524
15       Uni1-3       >64  2503
16        Uni4+     25-34  9771
17        Uni4+     35-44  7596
18        Uni4+     45-54  3904
19        Uni4+     55-64  3109
20        Uni4+       >64  2483
> xtabs(Count ~ Education + Age_Group, data=educ)
              Age_Group
Education        >64 25-34 35-44 45-54 55-64
  CompletedHS   7558 16431  1855  9435  8795
  IncompleteHS 13746  5416  5030  5777  7606
  Uni1-3        2503  8555  5576  3124  2524
  Uni4+         2483  9771  7596  3904  3109

Naturally I would prefer the factor levels in their natural ordering in
the table. I would like to re-order the levels of the factors to achieve
this.

I have tried reorder() in the gdata package:

> ed <- reorder(Education,neworder= c("IncompleteHS","CompletedHS",
+                           "Uni1-3","Uni4+"))
> agrp <- reorder(Age_Group,neworder=
+       c("25-34","35-44","45-54","55-64",">64"))
> xtabs(Count ~ ed + agrp)
              agrp
ed             25-34 35-44 45-54 55-64   >64
  CompletedHS  16431  1855  9435  8795  7558
  IncompleteHS  5416  5030  5777  7606 13746
  Uni1-3        8555  5576  3124  2524  2503
  Uni4+         9771  7596  3904  3109  2483

which works for Age_Group but not for Education.

I have fooled a bit with reorder.factor() but I can't seem to figure out
how to drive it!

Cheers,  Murray Jorgensen
--
Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: [hidden email]                                Fax 7 838 4155
Phone  +64 7 838 4773 wk    Home +64 7 825 0441    Mobile 021 1395 862
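The data above were read from a file (`efile`) that is not shown. This minimal, self-contained sketch assumes the 20 printed rows are the whole data set and rebuilds them in-line. It shows why the table comes out in the "wrong" order: when no levels are given, factor() (and therefore xtabs()) uses the sorted unique values. The position of a label such as ">64" depends on the locale's collation, so the sketch checks only the Education order.

```r
# Rebuild Murray's data in-line (the original read it from a file, 'efile').
educ <- data.frame(
  Education = rep(c("IncompleteHS", "CompletedHS", "Uni1-3", "Uni4+"),
                  each = 5),
  Age_Group = rep(c("25-34", "35-44", "45-54", "55-64", ">64"), times = 4),
  Count = c(5416, 5030, 5777, 7606, 13746,
            16431, 1855, 9435, 8795, 7558,
            8555, 5576, 3124, 2524, 2503,
            9771, 7596, 3904, 3109, 2483),
  stringsAsFactors = FALSE
)

# Without a 'levels' argument, factor() uses the sorted unique values,
# so Education comes out alphabetical rather than in educational order.
print(levels(factor(educ$Education)))

# The Age_Group order depends on the locale's collation rules: ">64" sorted
# first in the original post, but may sort last in other locales (e.g. "C").
print(levels(factor(educ$Age_Group)))

# xtabs() converts character columns with factor(), so it inherits that order.
tab <- xtabs(Count ~ Education + Age_Group, data = educ)
print(tab)
```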

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Re-ordering factors

James Reilly

Using reorder.factor from the stats package seems to work:

educ$ed <- reorder(educ$Education, sort(rep(1:4,5)))
levels(educ$Education)
[1] "CompletedHS"  "IncompleteHS" "Uni1-3"       "Uni4+"
levels(educ$ed)
[1] "IncompleteHS" "CompletedHS"  "Uni1-3"       "Uni4+"
xtabs(Count ~ ed + Age_Group, data=educ)
               Age_Group
ed             25-34 35-44 45-54 55-64   >64
   IncompleteHS  5416  5030  5777  7606 13746
   CompletedHS  16431  1855  9435  8795  7558
   Uni1-3        8555  5576  3124  2524  2503
   Uni4+         9771  7596  3904  3109  2483

James

On 7/10/07 4:52 PM, [hidden email] wrote:

> I have fooled a bit with reorder.factor() but I can't seem to figure out
> how to drive it!
>
> Cheers,  Murray Jorgensen

--
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand
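The following minimal sketch shows what James's reorder() call does. It assumes the rows are laid out as in Murray's listing. stats::reorder() sorts the levels of a factor by a summary (the mean, by default) of a second vector within each level. Giving every row the rank of its category in the desired order therefore puts the levels in that order. The rank vectors `edu_rank` and `age_rank` are illustrative names that do not appear in the original posts. The same trick also handles Age_Group.

```r
educ <- data.frame(
  Education = rep(c("IncompleteHS", "CompletedHS", "Uni1-3", "Uni4+"),
                  each = 5),
  Age_Group = rep(c("25-34", "35-44", "45-54", "55-64", ">64"), times = 4),
  Count = c(5416, 5030, 5777, 7606, 13746,
            16431, 1855, 9435, 8795, 7558,
            8555, 5576, 3124, 2524, 2503,
            9771, 7596, 3904, 3109, 2483),
  stringsAsFactors = FALSE
)

# Hypothetical rank scores: each row gets the position of its category in
# the desired order. rep(1:4, each = 5) equals James's sort(rep(1:4, 5)).
edu_rank <- rep(1:4, each = 5)
age_rank <- rep(1:5, times = 4)

# stats::reorder() sorts the levels by the mean score within each level.
educ$ed   <- reorder(factor(educ$Education), edu_rank)
educ$agrp <- reorder(factor(educ$Age_Group), age_rank)

print(levels(educ$ed))
print(levels(educ$agrp))

tab <- xtabs(Count ~ ed + agrp, data = educ)
print(tab)
```

Because the scores come from row positions, this relies on the data being sorted by category. The explicit `levels =` argument to factor() does not have that dependency.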


Controlling the Output from table()

Öhagen Patrik
In reply to this post by Murray Jorgensen

Dear All,

I would like to have some control over the output from the function table(). I want to control the order of the cells and I want to include empty cells (if any).

Example: I have ordinal data which takes the values 1, 2, 3, ..., N, but I want the output from table to put the frequencies in a different order, say, 3, 7, 1, 4, ..., and I want to include categories with zero frequency.

How is that done?


Cheers, Patrik


Re: Controlling the Output from table()

Prof Brian Ripley
On Sun, 7 Oct 2007, Öhagen Patrik wrote:

> I would like to have some control over the output from the function
> table(). I want to control the order of the cells and I want to include
> empty cells (if any).
>
> Example: I have ordinal data which takes the values 1, 2, 3, ..., N, but I want
> the output from table to put the frequencies in a different order, say,
> 3, 7, 1, 4, ..., and I want to include categories with zero frequency.
>
> How is that done?

See ?tabulate.  You can reorder the result via indexing: I don't follow
what you want and you did not give us an example.  (For example, is
'ordinal data' represented by an ordered factor or by an integer vector?)

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
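This small sketch illustrates the tabulate() suggestion using hypothetical integer-coded data with values from 1 to 7. tabulate() returns a count for every value from 1 to `nbins`, including zeros, and the result can then be put in any order by indexing. The data, `N`, and the order in `wanted` are made up for illustration. Calling table() on a factor whose levels are listed in the desired order gives the same counts.

```r
# Hypothetical integer-coded ordinal data with possible values 1..N.
x <- c(3, 1, 4, 1, 5, 3, 3)
N <- 7

# tabulate() counts each of 1, 2, ..., nbins, so unobserved values get 0.
counts <- tabulate(x, nbins = N)
names(counts) <- seq_len(N)
print(counts)

# Indexing then puts the cells in any (hypothetical) order.
wanted <- c(3, 7, 1, 4, 2, 5, 6)
print(counts[wanted])

# Alternatively, give table() a factor whose levels are in that order;
# empty levels are kept as zero counts.
tab <- table(factor(x, levels = wanted))
print(tab)
```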

Re: Controlling the Output from table()

Öhagen Patrik
Sorry for the vagueness.

I have text data (exam grades, say) for different courses and I want to table them. In my table I want the frequency for the lowest grade first and the other frequencies in increasing (grade) order. Some frequencies might be zero, but I want to include those empty cells in my table.

I hope that makes sense.

Thank you in advance!


Cheers, Patrik

-----Original Message-----
From: Prof Brian Ripley [mailto:[hidden email]]
Sent: 7 October 2007 09:11
To: Öhagen Patrik
Cc: [hidden email]
Subject: Re: [R] Controlling the Output from table()


Re: Re-ordering factors

Peter Dalgaard
In reply to this post by James Reilly
James Reilly wrote:

> Using reorder.factor from the stats package seems to work:
Notice that factor() itself will do it quite happily:

ed <- factor(educ$Education, levels = c("IncompleteHS", "CompletedHS", "Uni1-3", "Uni4+"))

or even, utilizing the fact that the levels appear in the right order in the data to begin with (unique() keeps values in order of first appearance):

> educ$Education <- factor(educ$Education, levels=unique(educ$Education))
> educ$Age_Group <- factor(educ$Age_Group, levels=unique(educ$Age_Group))
> xtabs(Count ~ Education + Age_Group, data=educ)
              Age_Group
Education      25-34 35-44 45-54 55-64   >64
  IncompleteHS  5416  5030  5777  7606 13746
  CompletedHS  16431  1855  9435  8795  7558
  Uni1-3        8555  5576  3124  2524  2503
  Uni4+         9771  7596  3904  3109  2483

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - ([hidden email])                  FAX: (+45) 35327907
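The following self-contained sketch demonstrates both factor() approaches from this reply. It rebuilds the data in-line from Murray's listing, which is an assumption since the original file is not shown. It also shows one pitfall of giving the levels explicitly: any value not listed in `levels` (for example, because of a typo) silently becomes NA.

```r
educ <- data.frame(
  Education = rep(c("IncompleteHS", "CompletedHS", "Uni1-3", "Uni4+"),
                  each = 5),
  Age_Group = rep(c("25-34", "35-44", "45-54", "55-64", ">64"), times = 4),
  Count = c(5416, 5030, 5777, 7606, 13746,
            16431, 1855, 9435, 8795, 7558,
            8555, 5576, 3124, 2524, 2503,
            9771, 7596, 3904, 3109, 2483),
  stringsAsFactors = FALSE
)

# Explicit levels: the factor's levels are exactly these, in this order.
edu_levels <- c("IncompleteHS", "CompletedHS", "Uni1-3", "Uni4+")
ed <- factor(educ$Education, levels = edu_levels)
print(levels(ed))

# unique() keeps values in order of first appearance, so this relies on
# the rows already being in the natural order.
educ$Education <- factor(educ$Education, levels = unique(educ$Education))
educ$Age_Group <- factor(educ$Age_Group, levels = unique(educ$Age_Group))
tab <- xtabs(Count ~ Education + Age_Group, data = educ)
print(tab)

# Pitfall: a value missing from 'levels' (here the typo "Uni4") becomes NA.
bad <- factor(c("Uni1-3", "Uni4+"), levels = c("Uni1-3", "Uni4"))
print(bad)
```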


Re: Controlling the Output from table()

Prof Brian Ripley
In reply to this post by Öhagen Patrik
On Sun, 7 Oct 2007, Öhagen Patrik wrote:

> Sorry for the vagueness.
>
> I have text data (exam grades, say) for different courses and I want to
> table them. In my table I want the frequency for the lowest grade first
> and the other frequencies in increasing (grade) order. Some frequencies
> might be zero but I want to include those empty cells in my table.

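table() does not omit empty cells.  What is likely to be happening is that
the factor you supply (or the one table() creates from your data) has no
levels for the values that do not occur.  The way to do this is to make
your own ordered factor with all the levels you want, in the order you want.  E.g.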

> marks <- c(1,5,3,4,5,6)
> table(ordered(marks, levels=1:6))

1 2 3 4 5 6
1 0 1 1 2 1
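This minimal sketch applies the same idea to the exam-grade case. It assumes a hypothetical letter-grade scale with F lowest and A highest, together with made-up course names. Listing every grade as a level, lowest first, makes table() report the unobserved grades as zero counts in that order. Adding the course as a second factor gives one row per course.

```r
# Hypothetical letter-grade scale, lowest grade first.
grade_scale <- c("F", "E", "D", "C", "B", "A")

# Hypothetical results for two made-up courses.
grades <- c("C", "A", "B", "A", "C", "C")
course <- c("Stats", "Stats", "Maths", "Maths", "Stats", "Maths")

# Declaring every grade as a level keeps unobserved grades as zero counts,
# in the order the levels are listed.
g <- ordered(grades, levels = grade_scale)
tab1 <- table(g)
print(tab1)

# One row per course, same column order.
tab2 <- table(course, grade = g)
print(tab2)
```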


Re: Re-ordering factors

Murray Jorgensen
In reply to this post by Peter Dalgaard
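Thanks to James and Phil and Peter for their helpful suggestions. I think
that I should also point out one way *not* to do the job. Assigning to
levels() renames the existing levels in place rather than reordering them,
so the counts end up under the wrong labels (compare the two tables):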

> xtabs(Count ~ Education + Age_Group, data=educ)
              Age_Group
Education        >64 25-34 35-44 45-54 55-64
  CompletedHS   7558 16431  1855  9435  8795
  IncompleteHS 13746  5416  5030  5777  7606
  Uni1-3        2503  8555  5576  3124  2524
  Uni4+         2483  9771  7596  3904  3109
> levels(educ$Education) <- c("IncompleteHS","CompletedHS",
+                           "Uni1-3","Uni4+")
> levels(educ$Age_Group) <- c("25-34","35-44","45-54","55-64",">64")
> xtabs(Count ~ Education + Age_Group, data=educ)
              Age_Group
Education      25-34 35-44 45-54 55-64   >64
  IncompleteHS  7558 16431  1855  9435  8795
  CompletedHS  13746  5416  5030  5777  7606
  Uni1-3        2503  8555  5576  3124  2524
  Uni4+         2483  9771  7596  3904  3109


Cheers,  Murray
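This minimal sketch contrasts the two operations on the first data row, which is an IncompleteHS record. The data are rebuilt in-line from the listing at the top of the thread. `levels<-` only relabels the existing levels, while `factor(..., levels = ...)` reorders them and leaves the data alone.

```r
educ <- data.frame(
  Education = rep(c("IncompleteHS", "CompletedHS", "Uni1-3", "Uni4+"),
                  each = 5),
  Count = c(5416, 5030, 5777, 7606, 13746,
            16431, 1855, 9435, 8795, 7558,
            8555, 5576, 3124, 2524, 2503,
            9771, 7596, 3904, 3109, 2483),
  stringsAsFactors = FALSE
)
# Factor with default (alphabetical) levels, as read.table() produced by
# default in R versions before 4.0.0.
educ$Education <- factor(educ$Education)

new_order <- c("IncompleteHS", "CompletedHS", "Uni1-3", "Uni4+")

# Wrong: levels<- renames the existing levels in place, so the first level,
# "CompletedHS", is relabelled "IncompleteHS" and vice versa.
relabelled <- educ$Education
levels(relabelled) <- new_order

# Right: factor() with 'levels' reorders the levels and keeps every value.
reordered <- factor(educ$Education, levels = new_order)

# Row 1 of the data is an IncompleteHS record.
print(c(original   = as.character(educ$Education[1]),
        relabelled = as.character(relabelled[1]),
        reordered  = as.character(reordered[1])))
```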
