List of Levels for all Factor variables

Lopez, Dan
Hi,

I want to get a clean succinct list of all levels for all my factor variables.

I have a data frame that looks something like #1 below. This is just an example subset; my actual dataset has 70 variables. I know how to narrow my variables down to just the factor variables by using #2 below (thanks to Bert Gunter), and I can get a list of all levels for all my factor variables using #3 below. What I want to find out is whether there is a way to get this list in a form similar to what the str function returns: without all the extra spacing and carriage returns. That's what I mean by "clean succinct list".

BTW, I also tried several of the parameters of the str function itself but could not find a way to accomplish this.



1. DATAFRAME

> str(mydata)
'data.frame':  11868 obs. of  26 variables:
$ EMPLID          : int  431108 32709 19730 10850 48786 2004 237628 558 3423 743175 ...
$ NAME            : Factor w/ 6402 levels "Aaron Cathy E",..: 2777 242 161 104 336 4254 1595 1244 3669 4760 ...
$ TRAIN           : int  1 1 1 1 1 1 1 1 1 1 ...
$ TARGET          : int  0 0 0 0 0 0 0 0 0 0 ...
$ APPT_TYP_CD_LL  : Factor w/ 3 levels "FX","IN","IP": 2 2 2 2 2 2 2 2 2 2 ...
$ ORG_NAM_LL      : Factor w/ 18 levels "Business","Chief Financial Officer",..: 11 7 7 9 4 4 18 18 8 4 ...
$ NEW_DISCIPLINE  : Factor w/ 15 levels "100s","300s",..: 14 6 4 1 11 11 14 2 1 1 ...
$ SERIES          : Factor w/ 10 levels "100s","300s",..: 9 6 4 1 9 9 9 2 1 1 ...
$ AGE             : int  62 53 46 62 55 59 50 36 34 53 ...
$ SERVICE         : int  13 29 16 26 18 9 19 11 8 26 ...
$ AGE_SERVICE     : int  75 82 62 87 73 69 69 47 42 79 ...
$ HIEDUCLV        : Factor w/ 6 levels "Associate","Bachelor",..: 5 6 6 6 5 2 3 2 2 1 ...
$ GENDER          : Factor w/ 2 levels "F","M": 2 2 2 1 2 2 2 2 2 1 ...
$ RETCD           : Factor w/ 2 levels "TCP1","TCP2": 2 1 2 2 2 1 1 2 1 2 ...
$ FLSASTATUS      : Factor w/ 2 levels "E","N": 1 2 2 1 1 1 1 1 1 1 ...
$ MONTHLY_RT      : int  17640 6932 5845 9809 11473 8719 19190 8986 7231 6758 ...
$ RETSTATUSDERIVED: Factor w/ 4 levels "401K","DOUBLE DIPPERS",..: 2 4 3 2 3 4 4 3 4 3 ...
$ ETHNIC_GRP_CD   : Factor w/ 8 levels "AMIND","ASIAN",..: 8 8 8 8 8 8 8 8 8 8 ...
$ COMMUTE_BIN     : Factor w/ 7 levels "","<15","15 - 24",..: 5 7 2 2 4 3 3 6 3 2 ...
$ EEO_CLASS       : Factor w/ 4 levels "M","S1","S2",..: 1 2 4 4 4 4 1 2 4 2 ...
$ WRK_SCHED       : Factor w/ 6 levels "12HR","4/10s",..: 3 3 3 3 3 3 3 3 4 4 ...
$ FWT_MAR_STATUS  : Factor w/ 2 levels "M","S": 1 1 1 1 2 1 1 1 1 2 ...
$ COVERED_DP      : int  2 2 4 0 1 3 1 2 0 0 ...
$ YRS_IN_SERIES   : int  13 29 16 26 18 9 19 3 7 26 ...
$ SAVINGS_PCT     : int  10 0 6 19 8 0 10 15 15 18 ...
$ Generation      : Factor w/ 4 levels "Baby Boomers",..: 1 1 2 1 1 1 1 2 2 1 ...

2. Create mydataF to only include factor variables (and exclude NAME which I am not interested in)

> mydataF<-mydata[,sapply(mydata,function(x)is.factor(x))][,-1]
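
A hedged sketch of step 2 on toy data, for comparison. The positional form above relies on NAME being the first factor column; here Filter() keeps the factor columns and NAME is dropped by name instead. The data frame toy and its values are invented for illustration and are not the real dataset.

```r
# Toy stand-in for mydata (hypothetical values, not the real dataset)
toy <- data.frame(
  EMPLID = 1:4,
  NAME   = factor(c("Ann", "Bob", "Cy", "Di")),
  GENDER = factor(c("F", "M", "M", "F")),
  AGE    = c(62L, 53L, 46L, 62L)
)

# Filter() keeps the columns for which is.factor() is TRUE and
# returns a data frame even when only one column survives
toyF <- Filter(is.factor, toy)

# Drop NAME by name, so the result does not depend on column order
toyF <- toyF[setdiff(names(toyF), "NAME")]

names(toyF)
```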

3. Get a list of all levels

> sapply(mydataF,function(x)levels(x))

$APPT_TYP_CD_LL

[1] "FX" "IN" "IP"



$ORG_NAM_LL

 [1] "Business"                        "Chief Financial Officer"         "Chief Information Office"        "Computation"                     "Engineering"                     "ESH and Quality"

 [7] "Facilities and Infrastructure"   "Global Security"                 "NIF"          "NO"              "Office of the Director"          "Operations and Business Office"

[13] "Physical and Life Sciences"      "Planning and Financial Services" "ST"   "Security Organization"           "Strategic Human Resources Mgmt"  "WCI"



$NEW_DISCIPLINE

 [1] "100s"                       "300s"                       "400s"                       "500s"                       "600s"                       "800s"                       "900s"

 [8] "Chem  Science"              "Engineering"                "Life Sciences"              "Math  Computer Science  IT" "Physics"                    "pre100s"                    "PSTS Other"

[15] "Re"



$SERIES   ......

Daniel Lopez
Workforce Analyst
HRIM - Workforce Analytics & Metrics



______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: List of Levels for all Factor variables

Michael Weylandt
On Tue, Oct 16, 2012 at 4:19 PM, Lopez, Dan <[hidden email]> wrote:

> 3. Get a list of all levels
>
>> sapply(mydataF,function(x)levels(x))
>

I think you want to unlist() the result of this call.

RMW


Re: List of Levels for all Factor variables

Lopez, Dan
Using unlist() did not produce the result I wanted. I have a dataframe. I tried playing with the parameters of unlist but each time it just tried to return each observation.

unlist(x, recursive = TRUE, use.names = TRUE)

Dan
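
One possible explanation for the result described here (an assumption, since the exact call is not shown): unlist() applied to the data frame itself concatenates the column values, giving one element per observation, whereas unlist() applied to the list of levels gives a named character vector with one element per level. A small sketch with made-up data:

```r
# Made-up data frame with two factor columns
toy <- data.frame(
  GENDER = factor(c("F", "M", "M")),
  RETCD  = factor(c("TCP1", "TCP2", "TCP1"))
)

# unlist() on the data frame: one element per cell (3 rows x 2 columns)
length(unlist(toy))

# unlist() on the list of levels: one element per level,
# named after the column it came from
lv <- unlist(lapply(toy, levels))
lv
```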


Re: List of Levels for all Factor variables

Rui Barradas
Hello,

Is the problem the "clean" part?

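dat <- data.frame(X = sample(letters[1:4], 100, TRUE),
             Y = sample(LETTERS[1:6], 100, TRUE),
             Z = factor(rep(1:5, 4)),
             stringsAsFactors = TRUE)  # since R 4.0.0 strings are not converted to factors by default


# levels of every column, as a named list
levs <- lapply(dat, levels)
# one "name : level level ..." string per column
clean <- lapply(seq_along(levs), function(i)
     paste(names(levs)[i], ":",  paste(levs[[i]], collapse = " ")))

sapply(clean, print)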

Hope this helps,

Rui Barradas
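
As a variation on the code above (a sketch, not part of the original reply), writeLines() prints the same strings without the [1] index or the quotes, and without the extra copy that sapply(clean, print) returns after printing. The data frame toy is made up for illustration.

```r
# Made-up data frame with two factor columns
toy <- data.frame(
  X = factor(c("a", "b", "a", "c")),
  Y = factor(c("A", "A", "B", "B"))
)

levs <- lapply(toy, levels)

# vapply() guarantees exactly one string per column
lines <- vapply(levs, paste, character(1), collapse = " ")
out <- paste(names(levs), ":", lines)

# One plain line per factor, no index prefix and no quotes
writeLines(out)
```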

Re: List of Levels for all Factor variables

Lopez, Dan
Perfect!
Thank you!
Dan



Re: List of Levels for all Factor variables

arun kirshna
Hi,
You can also try this:
set.seed(1)
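dat1 <- data.frame(col1 = factor(sample(1:25, 10, replace = TRUE)),
                   col2 = sample(letters[1:10], 10, replace = TRUE),
                   col3 = factor(rep(1:5, each = 2)),
                   stringsAsFactors = TRUE)  # since R 4.0.0 col2 is a factor only if requested

sapply(lapply(mapply(c, lapply(names(sapply(dat1, levels)), function(x) x),
                     sapply(dat1, levels)),
              function(x) paste(x[1], ":", paste(x[-1], collapse = " "))),
       print)
# Output as originally posted (sample() draws for this seed differ in R >= 3.6.0):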
#[1] "col1 : 2 6 7 10 15 16 17 23 24"
#[1] "col2 : b c d e g h j"
#[1] "col3 : 1 2 3 4 5"
#[1] "col1 : 2 6 7 10 15 16 17 23 24" "col2 : b c d e g h j"         
#[3] "col3 : 1 2 3 4 5"  

A.K.   
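
One caveat not raised in the thread, sketched here under made-up data: sapply() returns a list only when the factors have different numbers of levels. If every factor has the same number of levels, it simplifies the result to a matrix, and code that expects a list can misbehave. lapply() always returns a list.

```r
# Two factors with the same number of levels (hypothetical data)
same <- data.frame(
  a = factor(c("x", "y")),
  b = factor(c("p", "q"))
)

s <- sapply(same, levels)  # simplified to a 2 x 2 character matrix
l <- lapply(same, levels)  # always a list, one element per column

is.matrix(s)
is.list(l)
```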





Re: List of Levels for all Factor variables

David Carlson
Given dat1, does this work?

PrintLvls <- function(x) {
  # keep only the factor columns
  f <- x[sapply(x, is.factor)]
  print(data.frame(Lvls = sapply(f, nlevels),
                   Names = sapply(f, function(y) paste0(levels(y), collapse = ", "))),
        right = FALSE)
}
PrintLvls(dat1)
#      Lvls Names
# col1 9    2, 6, 7, 10, 15, 16, 17, 23, 24
# col2 7    b, c, d, e, g, h, j
# col3 5    1, 2, 3, 4, 5

It automatically extracts the columns that are factors so it should work on
your original data.frame.
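
For a factor with thousands of levels, such as NAME in the original data, a single line listing every level gets very long. A rough sketch of a str-like variant that shows only the first few levels; the function name level_summary and its max_levels argument are invented here, and the data frame toy is made up.

```r
# Hypothetical helper: one string per factor column, listing at most
# max_levels levels and marking any truncation with "..."
level_summary <- function(x, max_levels = 5) {
  f <- Filter(is.factor, x)
  vapply(f, function(y) {
    lv <- levels(y)
    shown <- paste(head(lv, max_levels), collapse = ", ")
    if (length(lv) > max_levels) shown <- paste0(shown, ", ...")
    sprintf("%d levels: %s", length(lv), shown)
  }, character(1))
}

# Made-up data: one non-factor column and two factor columns
toy <- data.frame(
  id  = 1:8,
  grp = factor(letters[1:8]),
  sex = factor(rep(c("F", "M"), 4))
)

res <- level_summary(toy, max_levels = 3)
writeLines(paste(names(res), res, sep = " : "))
```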

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352



> -----Original Message-----
> From: [hidden email] [mailto:r-help-bounces@r-
> project.org] On Behalf Of arun
> Sent: Tuesday, October 16, 2012 12:09 PM
> To: Lopez, Dan
> Cc: R help
> Subject: Re: [R] List of Levels for all Factor variables
>
> HI,
> You can also try this:
> set.seed(1)
> dat1<-
> data.frame(col1=factor(sample(1:25,10,replace=TRUE)),col2=sample(letter
> s[1:10],10,replace=TRUE),col3=factor(rep(1:5,each=2)))
>
> sapply(lapply(mapply(c,lapply(names(sapply(dat1,levels)),function(x)
> x),sapply(dat1,levels)),function(x) paste(x[1],":",paste(x[-
> 1],collapse=" "))),print)
> #[1] "col1 : 2 6 7 10 15 16 17 23 24"
> #[1] "col2 : b c d e g h j"
> #[1] "col3 : 1 2 3 4 5"
> #[1] "col1 : 2 6 7 10 15 16 17 23 24" "col2 : b c d e g h j"
> #[3] "col3 : 1 2 3 4 5"
>
> A.K.
>
>
>
>
> ----- Original Message -----
> From: "Lopez, Dan" <[hidden email]>
> To: "R help ([hidden email])" <[hidden email]>
> Cc:
> Sent: Tuesday, October 16, 2012 11:19 AM
> Subject: [R] List of Levels for all Factor variables
>
> Hi,
>
> I want to get a clean succinct list of all levels for all my factor
> variables.
>
> I have a dataframe that's something like #1 below. This is just an
> example subset of my data and my actual dataset has 70 variables. I
> know how to narrow down my list of variables to just my factor
> variables by using #2 below (thanks to Bert Gunter). I can also get
> list of all levels for all my factor variables using #3 below. But I
> what I want to find out is if there is a way to get this list in a
> similar fashion to what the str function returns: without all the extra
> spacing and carriage returns. That's what I mean by "clean succinct
> list".
>
> BTW I also tried playing around with several of the parameters for the
> str function itself but could not find a way to accomplish what I want
> to accomplish.


Re: List of Levels for all Factor variables

Lopez, Dan
In reply to this post by arun kirshna


Thanks.
Dan


-----Original Message-----
From: arun [mailto:[hidden email]]
Sent: Tuesday, October 16, 2012 10:09 AM
To: Lopez, Dan
Cc: R help; Rui Barradas
Subject: Re: [R] List of Levels for all Factor variables

HI,
You can also try this:
set.seed(1)
dat1<-data.frame(col1=factor(sample(1:25,10,replace=TRUE)),col2=sample(letters[1:10],10,replace=TRUE),col3=factor(rep(1:5,each=2)))

sapply(lapply(mapply(c,lapply(names(sapply(dat1,levels)),function(x) x),sapply(dat1,levels)),function(x) paste(x[1],":",paste(x[-1],collapse=" "))),print)
#[1] "col1 : 2 6 7 10 15 16 17 23 24"
#[1] "col2 : b c d e g h j"
#[1] "col3 : 1 2 3 4 5"
#[1] "col1 : 2 6 7 10 15 16 17 23 24" "col2 : b c d e g h j"
#[3] "col3 : 1 2 3 4 5"

A.K.   
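
The nested sapply/lapply/mapply call above can be hard to follow. A flatter sketch of the same idea (one "name : levels" line per factor column) might look like the following. The data frame dat and the names is_fac and lvl_lines are made up for illustration, and this is only one possible way to write it, not the code from the message above. factor() is called explicitly because, since R 4.0.0, data.frame() no longer converts character columns to factors by default.

```r
# Hypothetical example data; factor() is called explicitly so the result
# does not depend on the stringsAsFactors default of data.frame().
dat <- data.frame(
  size  = factor(c("S", "M", "L", "M")),
  color = factor(c("red", "blue", "red", "green")),
  count = c(3L, 1L, 4L, 1L)
)

# Flag the factor columns, then collapse each column's levels
# into a single "name : levels" string.
is_fac <- vapply(dat, is.factor, logical(1))
lvl_lines <- vapply(
  names(dat)[is_fac],
  function(nm) paste(nm, ":", paste(levels(dat[[nm]]), collapse = " ")),
  character(1)
)

# writeLines() prints one line per factor without the [1] index prefix.
writeLines(lvl_lines)
```

Using writeLines() instead of print() avoids both the index prefixes and the second, combined copy of the output that sapply(..., print) returns.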





Re: List of Levels for all Factor variables

Lopez, Dan
In reply to this post by David Carlson
Hi David,

This is perfect.

Thank you very much!

FYI - I tweaked the code you gave me to exclude factor variables with more than 32 levels (based on Random Forest limits). These would be fields like employee names or department names. This is what I used:
PrintLvls2 <- function(x) {
  print(data.frame(Lvls=sapply(x[sapply(x, function(x) is.factor(x) && length(levels(x))<=32)], nlevels),
                   Names=sapply(x[sapply(x, function(x) is.factor(x) && length(levels(x))<=32)],
                                function(y) paste0(levels(y), collapse=", "))),
        right=FALSE)
}
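
As a possible refinement, the column filter could be computed once and the 32-level cap turned into an argument. The sketch below is only an illustration of that idea: print_levels, max_levels and the demo data frame are hypothetical names, not part of the code in this thread.

```r
# Hypothetical variant: print_levels() and max_levels are illustrative names.
print_levels <- function(x, max_levels = 32) {
  # Compute the column filter once: factors with at most max_levels levels.
  keep <- vapply(x, function(col) is.factor(col) && nlevels(col) <= max_levels,
                 logical(1))
  fac <- x[keep]
  out <- data.frame(
    Lvls  = vapply(fac, nlevels, integer(1)),
    Names = vapply(fac, function(col) paste(levels(col), collapse = ", "),
                   character(1)),
    stringsAsFactors = FALSE
  )
  print(out, right = FALSE)
  invisible(out)  # return the table so it can be reused
}

# Made-up data: 'id' has 40 levels and is dropped by the default cap.
demo <- data.frame(
  id     = factor(sprintf("emp%02d", 1:40)),
  gender = factor(rep(c("F", "M"), 20)),
  age    = 21:60
)
res <- print_levels(demo)
```

Calling print_levels(demo, max_levels = 40) would keep the id column as well, so the cap can be adjusted without editing the function body.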

Thanks again.
Dan


-----Original Message-----
From: David L Carlson [mailto:[hidden email]]
Sent: Wednesday, October 17, 2012 8:29 AM
To: 'arun'; Lopez, Dan
Cc: 'R help'
Subject: RE: [R] List of Levels for all Factor variables

Given dat1, does this work?

> PrintLvls <- function(x) {print(data.frame(Lvls=sapply(x[sapply(x, is.factor)],
+      nlevels), Names=sapply(x[sapply(x, is.factor)],
+      function(y) paste0(levels(y), collapse=", "))), right=FALSE) }
> PrintLvls(dat1)
     Lvls Names                          
col1 9    2, 6, 7, 10, 15, 16, 17, 23, 24
col2 7    b, c, d, e, g, h, j            
col3 5    1, 2, 3, 4, 5                  

It automatically extracts the columns that are factors so it should work on your original data.frame.
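
To illustrate that point, here is a small sketch that runs the function on a data frame mixing factor and integer columns. The data frame mixed is a made-up stand-in for the original data (only the column names are borrowed from it), and the function is restated so the example runs on its own.

```r
# PrintLvls as given above, restated so the example is self-contained.
PrintLvls <- function(x) {
  print(data.frame(Lvls = sapply(x[sapply(x, is.factor)], nlevels),
                   Names = sapply(x[sapply(x, is.factor)],
                                  function(y) paste0(levels(y), collapse = ", "))),
        right = FALSE)
}

# Hypothetical stand-in for mydata: two factors mixed with integer columns.
mixed <- data.frame(
  EMPLID = c(101L, 102L, 103L, 104L),
  GENDER = factor(c("M", "F", "M", "F")),
  AGE    = c(62L, 53L, 46L, 36L),
  RETCD  = factor(c("TCP2", "TCP1", "TCP2", "TCP2"))
)

# print() returns its argument invisibly, so the table can also be captured.
tab <- PrintLvls(mixed)
```

Only GENDER and RETCD should appear as rows of the printed table; the integer columns are skipped by the is.factor filter.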

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352



> -----Original Message-----
> From: [hidden email] [mailto:r-help-bounces@r-project.org] On Behalf Of arun
> Sent: Tuesday, October 16, 2012 12:09 PM
> To: Lopez, Dan
> Cc: R help
> Subject: Re: [R] List of Levels for all Factor variables
>
> HI,
> You can also try this:
> set.seed(1)
> dat1<-data.frame(col1=factor(sample(1:25,10,replace=TRUE)),col2=sample(letters[1:10],10,replace=TRUE),col3=factor(rep(1:5,each=2)))
>
> sapply(lapply(mapply(c,lapply(names(sapply(dat1,levels)),function(x) x),sapply(dat1,levels)),function(x) paste(x[1],":",paste(x[-1],collapse=" "))),print)
> #[1] "col1 : 2 6 7 10 15 16 17 23 24"
> #[1] "col2 : b c d e g h j"
> #[1] "col3 : 1 2 3 4 5"
> #[1] "col1 : 2 6 7 10 15 16 17 23 24" "col2 : b c d e g h j"
> #[3] "col3 : 1 2 3 4 5"
>
> A.K.
