|
|
Hi,
I want to get a clean, succinct list of all levels for all my factor variables.
I have a dataframe that's something like #1 below. This is just an example subset of my data; my actual dataset has 70 variables. I know how to narrow down my list of variables to just my factor variables by using #2 below (thanks to Bert Gunter). I can also get a list of all levels for all my factor variables using #3 below. But what I want to find out is whether there is a way to get this list in a form similar to what the str function returns: without all the extra spacing and carriage returns. That's what I mean by "clean, succinct list".
BTW I also tried playing around with several of the parameters of the str function itself but could not find a way to get this output.
1. DATAFRAME
> str(mydata)
'data.frame': 11868 obs. of 26 variables:
$ EMPLID : int 431108 32709 19730 10850 48786 2004 237628 558 3423 743175 ...
$ NAME : Factor w/ 6402 levels "Aaron Cathy E",..: 2777 242 161 104 336 4254 1595 1244 3669 4760 ...
$ TRAIN : int 1 1 1 1 1 1 1 1 1 1 ...
$ TARGET : int 0 0 0 0 0 0 0 0 0 0 ...
$ APPT_TYP_CD_LL : Factor w/ 3 levels "FX","IN","IP": 2 2 2 2 2 2 2 2 2 2 ...
$ ORG_NAM_LL : Factor w/ 18 levels "Business","Chief Financial Officer",..: 11 7 7 9 4 4 18 18 8 4 ...
$ NEW_DISCIPLINE : Factor w/ 15 levels "100s","300s",..: 14 6 4 1 11 11 14 2 1 1 ...
$ SERIES : Factor w/ 10 levels "100s","300s",..: 9 6 4 1 9 9 9 2 1 1 ...
$ AGE : int 62 53 46 62 55 59 50 36 34 53 ...
$ SERVICE : int 13 29 16 26 18 9 19 11 8 26 ...
$ AGE_SERVICE : int 75 82 62 87 73 69 69 47 42 79 ...
$ HIEDUCLV : Factor w/ 6 levels "Associate","Bachelor",..: 5 6 6 6 5 2 3 2 2 1 ...
$ GENDER : Factor w/ 2 levels "F","M": 2 2 2 1 2 2 2 2 2 1 ...
$ RETCD : Factor w/ 2 levels "TCP1","TCP2": 2 1 2 2 2 1 1 2 1 2 ...
$ FLSASTATUS : Factor w/ 2 levels "E","N": 1 2 2 1 1 1 1 1 1 1 ...
$ MONTHLY_RT : int 17640 6932 5845 9809 11473 8719 19190 8986 7231 6758 ...
$ RETSTATUSDERIVED: Factor w/ 4 levels "401K","DOUBLE DIPPERS",..: 2 4 3 2 3 4 4 3 4 3 ...
$ ETHNIC_GRP_CD : Factor w/ 8 levels "AMIND","ASIAN",..: 8 8 8 8 8 8 8 8 8 8 ...
$ COMMUTE_BIN : Factor w/ 7 levels "","<15","15 - 24",..: 5 7 2 2 4 3 3 6 3 2 ...
$ EEO_CLASS : Factor w/ 4 levels "M","S1","S2",..: 1 2 4 4 4 4 1 2 4 2 ...
$ WRK_SCHED : Factor w/ 6 levels "12HR","4/10s",..: 3 3 3 3 3 3 3 3 4 4 ...
$ FWT_MAR_STATUS : Factor w/ 2 levels "M","S": 1 1 1 1 2 1 1 1 1 2 ...
$ COVERED_DP : int 2 2 4 0 1 3 1 2 0 0 ...
$ YRS_IN_SERIES : int 13 29 16 26 18 9 19 3 7 26 ...
$ SAVINGS_PCT : int 10 0 6 19 8 0 10 15 15 18 ...
$ Generation : Factor w/ 4 levels "Baby Boomers",..: 1 1 2 1 1 1 1 2 2 1 ...
2. Create mydataF to only include factor variables (and exclude NAME which I am not interested in)
> mydataF<-mydata[,sapply(mydata,function(x)is.factor(x))][,-1]
3. Get a list of all levels
> sapply(mydataF,function(x)levels(x))
$APPT_TYP_CD_LL
[1] "FX" "IN" "IP"
$ORG_NAM_LL
[1] "Business" "Chief Financial Officer" "Chief Information Office" "Computation" "Engineering" "ESH and Quality"
[7] "Facilities and Infrastructure" "Global Security" "NIF" "NO" "Office of the Director" "Operations and Business Office"
[13] "Physical and Life Sciences" "Planning and Financial Services" "ST" "Security Organization" "Strategic Human Resources Mgmt" "WCI"
$NEW_DISCIPLINE
[1] "100s" "300s" "400s" "500s" "600s" "800s" "900s"
[8] "Chem Science" "Engineering" "Life Sciences" "Math Computer Science IT" "Physics" "pre100s" "PSTS Other"
[15] "Re"
$SERIES ......
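Step 2 above drops NAME by position ([,-1]), which relies on NAME being the first factor column. A minimal sketch of a name-based alternative, assuming a toy data frame (the values are made up; only the column names come from the post):

```r
# Toy stand-in for mydata; values are invented for illustration.
mydata <- data.frame(
  EMPLID = 1:4,
  NAME   = factor(c("A", "B", "C", "D")),
  GENDER = factor(c("F", "M", "M", "F")),
  AGE    = c(62L, 53L, 46L, 62L)
)

# Logical flag per column: is it a factor?
is_fac <- vapply(mydata, is.factor, logical(1))

# Exclude NAME by name rather than by position.
keep <- setdiff(names(mydata)[is_fac], "NAME")

# drop = FALSE keeps a data frame even if only one column survives.
mydataF <- mydata[, keep, drop = FALSE]
str(mydataF)
```

With drop = FALSE the later lapply(mydataF, levels) still sees a data frame when only one factor column is left.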
Daniel Lopez
Workforce Analyst
HRIM - Workforce Analytics & Metrics
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
|
|
On Tue, Oct 16, 2012 at 4:19 PM, Lopez, Dan < [hidden email]> wrote:
>> sapply(mydataF,function(x)levels(x))
>
I think you want to unlist() the result of this call.
RMW
|
|
Using unlist() did not produce the result I wanted. I have a dataframe. I tried playing with the parameters of unlist but each time it just tried to return each observation.
unlist(x, recursive = TRUE, use.names = TRUE)
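One possible reading of the result above, assuming unlist() was applied to the data frame itself rather than to the list of levels from step 3. The toy data frame below is hypothetical:

```r
# Hypothetical stand-in for mydataF (not the poster's real data).
toy <- data.frame(
  GENDER = factor(c("M", "F", "M")),
  RETCD  = factor(c("TCP2", "TCP1", "TCP2"))
)

# unlist() on the data frame flattens every cell: one element per observation.
obs <- unlist(toy)
length(obs)

# unlist() on the list of levels gives one named character vector of levels.
levs <- lapply(toy, levels)
flat <- unlist(levs)
print(flat)
```

The second form is flat, but the variable names are fused into element names (GENDER1, GENDER2, ...), so it is still not the one-line-per-factor layout asked for.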
Dan
-----Original Message-----
From: R. Michael Weylandt [mailto: [hidden email]]
Sent: Tuesday, October 16, 2012 8:28 AM
To: Lopez, Dan
Cc: R help ( [hidden email])
Subject: Re: [R] List of Levels for all Factor variables
|
|
Hello,
Is the problem with the "clean" part?
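# stringsAsFactors = TRUE makes X and Y factors in R >= 4.0.0 (it was the default in 2012)
dat <- data.frame(X = sample(letters[1:4], 100, TRUE),
                  Y = sample(LETTERS[1:6], 100, TRUE),
                  Z = factor(rep(1:5, 4)),
                  stringsAsFactors = TRUE)
# one character vector of levels per column
levs <- lapply(dat, levels)
# build one "name : level level ..." string per column
clean <- lapply(seq_along(levs), function(i)
    paste(names(levs)[i], ":", paste(levs[[i]], collapse = " ")))
# print each line; invisible() suppresses the second copy that sapply() would return
invisible(sapply(clean, print))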
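The same idea can be sketched as a small reusable function. level_lines() is a hypothetical name, not part of the reply above; it keeps only factor columns and uses cat() so the lines print without the [1] prefix and quotes:

```r
# level_lines() is a hypothetical helper for illustration only.
level_lines <- function(df) {
  fac <- Filter(is.factor, df)   # keep factor columns only
  vapply(names(fac),
         function(nm) paste0(nm, ": ", paste(levels(fac[[nm]]), collapse = " ")),
         character(1))
}

# Small made-up data frame with two factors and one integer column.
dat <- data.frame(X = factor(c("a", "b", "a")),
                  Y = factor(c("K", "L", "M")),
                  N = 1:3)

out <- level_lines(dat)
cat(out, sep = "\n")
```

Returning the character vector and printing it separately makes it easy to write the same lines to a file with writeLines().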
Hope this helps,
Rui Barradas
On 16-10-2012 16:40, Lopez, Dan wrote:
> Using unlist() did not produce the result I wanted. I have a dataframe. I tried playing with the parameters of unlist but each time it just tried to return each observation.
>
> unlist(x, recursive = TRUE, use.names = TRUE)
|
|
Perfect!
Thank you!
Dan
-----Original Message-----
From: Rui Barradas [mailto: [hidden email]]
Sent: Tuesday, October 16, 2012 9:03 AM
To: Lopez, Dan
Cc: R. Michael Weylandt; R help ( [hidden email])
Subject: Re: [R] List of Levels for all Factor variables
|
|
HI,
You can also try this:
set.seed(1)
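# stringsAsFactors = TRUE keeps col2 a factor in R >= 4.0.0 (it was the default in 2012)
dat1 <- data.frame(col1 = factor(sample(1:25, 10, replace = TRUE)),
                   col2 = sample(letters[1:10], 10, replace = TRUE),
                   col3 = factor(rep(1:5, each = 2)),
                   stringsAsFactors = TRUE)
sapply(lapply(mapply(c, lapply(names(sapply(dat1, levels)), function(x) x), sapply(dat1, levels)),
              function(x) paste(x[1], ":", paste(x[-1], collapse = " "))), print)
# Output as posted in 2012; sample() draws differ for the same seed from R 3.6.0 on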
#[1] "col1 : 2 6 7 10 15 16 17 23 24"
#[1] "col2 : b c d e g h j"
#[1] "col3 : 1 2 3 4 5"
#[1] "col1 : 2 6 7 10 15 16 17 23 24" "col2 : b c d e g h j"
#[3] "col3 : 1 2 3 4 5"
A.K.
|
|
Given dat1, does this work?
> PrintLvls <- function(x) {
+   fac <- x[sapply(x, is.factor)]   # keep only the factor columns
+   print(data.frame(Lvls = sapply(fac, nlevels),
+                    Names = sapply(fac, function(y) paste0(levels(y), collapse = ", "))),
+         right = FALSE)
+ }
> PrintLvls(dat1)
     Lvls Names
col1 9    2, 6, 7, 10, 15, 16, 17, 23, 24
col2 7    b, c, d, e, g, h, j
col3 5    1, 2, 3, 4, 5
It automatically extracts the columns that are factors so it should work on
your original data.frame.
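For factors with many levels (ORG_NAM_LL has 18), a str()-like truncation may be closer to the original request. A sketch under that assumption; level_summary() and max_show are hypothetical names, and the table is returned rather than printed:

```r
# level_summary() is a hypothetical helper for illustration only.
level_summary <- function(df, max_show = 5) {
  fac <- Filter(is.factor, df)   # keep factor columns only
  shown <- vapply(fac, function(f) {
    lv  <- levels(f)
    txt <- paste(head(lv, max_show), collapse = ", ")
    # mark truncated level lists, much as str() does with ",.."
    if (length(lv) > max_show) paste0(txt, ", ...") else txt
  }, character(1))
  data.frame(Lvls  = vapply(fac, nlevels, integer(1)),
             Names = shown,
             stringsAsFactors = FALSE)
}

# Made-up data: A has 8 levels, B has 2, N is not a factor.
dat <- data.frame(A = factor(letters[1:8]),
                  B = factor(rep(c("x", "y"), 4)),
                  N = 1:8)

res <- level_summary(dat, max_show = 3)
print(res, right = FALSE)
```

Because the result is an ordinary data frame, it can be filtered or written out with write.csv() instead of only being printed.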
----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
|
|
Thanks.
Dan
-----Original Message-----
From: arun [mailto: [hidden email]]
Sent: Tuesday, October 16, 2012 10:09 AM
To: Lopez, Dan
Cc: R help; Rui Barradas
Subject: Re: [R] List of Levels for all Factor variables
HI,
You can also try this:
set.seed(1)
dat1 <- data.frame(col1=factor(sample(1:25,10,replace=TRUE)),
                   col2=sample(letters[1:10],10,replace=TRUE),
                   col3=factor(rep(1:5,each=2)))
sapply(lapply(mapply(c, lapply(names(sapply(dat1,levels)), function(x) x), sapply(dat1,levels)),
              function(x) paste(x[1], ":", paste(x[-1], collapse=" "))), print)
#[1] "col1 : 2 6 7 10 15 16 17 23 24"
#[1] "col2 : b c d e g h j"
#[1] "col3 : 1 2 3 4 5"
#[1] "col1 : 2 6 7 10 15 16 17 23 24" "col2 : b c d e g h j"
#[3] "col3 : 1 2 3 4 5"
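The same "name : levels" lines can probably be built with fewer nested calls. The following is only a sketch: dat_demo is a small made-up data frame (not the poster's data), and lvl_lines is a name introduced here for illustration.

```r
# Hypothetical toy data; every column is created as a factor explicitly
dat_demo <- data.frame(
  col1 = factor(c(2, 6, 7, 10, 2)),
  col2 = factor(c("b", "c", "d", "b", "c")),
  col3 = factor(1:5)
)

# One character vector of levels per column
lvls <- lapply(dat_demo, levels)

# Collapse each set of levels into a single string and prefix the column name
lvl_lines <- paste(names(lvls), ":",
                   vapply(lvls, paste, character(1), collapse = " "))

# writeLines() prints one line per column, without the [1] index prefix
writeLines(lvl_lines)
```

Using writeLines() instead of print() avoids the index prefixes and the duplicated return value that sapply(..., print) shows at the console.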
A.K.
|
|
Hi David,
This is perfect.
Thank you very much!
FYI - I tweaked the code you gave me to exclude factor variables with more than 32 levels (based on Random Forest limits). These would be fields like employee names or department names. This is what I used:
PrintLvls2 <- function(x) {
  # keep only factor columns with at most 32 levels
  keep <- sapply(x, function(col) is.factor(col) && nlevels(col) <= 32)
  print(data.frame(Lvls=sapply(x[keep], nlevels),
                   Names=sapply(x[keep], function(y) paste0(levels(y), collapse=", "))),
        right=FALSE)
}
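If the level cap ever needs to change, one option is to make it an argument and return the summary instead of printing it. This is a sketch under assumptions: level_summary, max_levels and the demo data frame are hypothetical names introduced here, not part of the code posted in this thread.

```r
# Hypothetical helper: same idea as PrintLvls2, but the cap is a parameter
# and the summary is returned so it can be printed, filtered, or saved.
level_summary <- function(x, max_levels = 32) {
  keep <- vapply(x, function(col) is.factor(col) && nlevels(col) <= max_levels,
                 logical(1))
  data.frame(
    Lvls  = vapply(x[keep], nlevels, integer(1)),
    Names = vapply(x[keep], function(col) paste(levels(col), collapse = ", "),
                   character(1)),
    stringsAsFactors = FALSE
  )
}

# Made-up example data
demo <- data.frame(
  id     = factor(sprintf("emp%02d", 1:40)),  # 40 levels: dropped by the default cap
  gender = factor(rep(c("F", "M"), 20)),
  age    = 21:60                              # not a factor: always dropped
)

print(level_summary(demo), right = FALSE)
```

Calling level_summary(demo, max_levels = 40) would keep the id column as well.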
Thanks again.
Dan
-----Original Message-----
From: David L Carlson [mailto: [hidden email]]
Sent: Wednesday, October 17, 2012 8:29 AM
To: 'arun'; Lopez, Dan
Cc: 'R help'
Subject: RE: [R] List of Levels for all Factor variables
Given dat1, does this work?
> PrintLvls <- function(x) {print(data.frame(Lvls=sapply(x[sapply(x, is.factor)],
+ nlevels), Names=sapply(x[sapply(x, is.factor)],
+ function(y) paste0(levels(y), collapse=", "))), right=FALSE) }
> PrintLvls(dat1)
     Lvls Names
col1 9    2, 6, 7, 10, 15, 16, 17, 23, 24
col2 7    b, c, d, e, g, h, j
col3 5    1, 2, 3, 4, 5
It automatically extracts the columns that are factors so it should work on your original data.frame.
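The factor-extraction step on its own can be sketched as follows. The data frame demo_mixed and the names is_fac and only_factors are made up for illustration.

```r
# Made-up data frame mixing factor and non-factor columns
demo_mixed <- data.frame(
  id    = 1:4,
  grp   = factor(c("a", "b", "a", "c")),
  score = c(1.5, 2.0, 3.5, 4.0),
  flag  = factor(c("N", "Y", "Y", "N"))
)

# Named logical vector: TRUE for each column that is a factor
is_fac <- sapply(demo_mixed, is.factor)

# Single-bracket indexing without a comma always returns a data.frame,
# even if only one column is selected
only_factors <- demo_mixed[is_fac]
names(only_factors)
```

Indexing with demo_mixed[is_fac] rather than demo_mixed[, is_fac] avoids the result being dropped to a bare vector when only one column is a factor.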
----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
|
|