datastructure for multi-choice factors

Jeroen Ooms.
I am working on a system to visualize survey responses. Survey responses typically include factors, numerics, timestamps and text fields, and therefore fit nicely into data frames, which makes them easy to visualize using standard R functions.

However, I am currently working on a survey that also includes questions in which the respondent can check more than one answer on a single multi-choice item. That is, the item behaves like a factor for which each row can hold multiple responses. I am looking for a way to put this into a data frame together with the other questions of the survey.

I considered two workarounds, but both are problematic:

- Column-wise expanding: convert a single multi-choice item into N binary columns, one for every possible response (level), with 1/0 values indicating whether the answer was checked. The problem is that you lose the information that these N columns are in fact one question, and it becomes very hard to visualize this single question.

- Row-wise expanding: convert a single response into N rows, one for every answer checked. The problem is that if the factor is part of the data frame, all of the other items have to be duplicated as well, leading to artificial results.
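To make the two workarounds concrete, here is a minimal sketch on made-up toy data. The object names `lvls`, `wide` and `long` are only for illustration, and this is just one way to build either layout:

```r
# Toy data: one multi-choice item ("residence") per respondent.
lvls <- c("US", "CA", "MX")
name <- c("John", "Mary", "Jennifer", "Neil")
residence <- list("US", c("US", "CA"), "MX", c("MX", "US", "CA"))

# Column-wise expansion: one 0/1 indicator column per level.
# sapply() returns a levels-by-respondents matrix, so transpose it.
indicators <- t(sapply(residence, function(x) as.integer(lvls %in% x)))
wide <- data.frame(name = name, indicators)
names(wide)[-1] <- paste0("residence_", lvls)

# Row-wise expansion: one row per checked answer.
# Every other column (here only 'name') has to be repeated.
n_checked <- sapply(residence, length)
long <- data.frame(
  name = rep(name, times = n_checked),
  residence = factor(unlist(residence), levels = lvls)
)
```

In `wide` nothing records that the three indicator columns belong to one question, and in `long` the four respondents have become seven rows, so any summary of the other columns would count some people more than once.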

I was wondering if there is a more natural datastructure to put a multi-choice item into a dataframe? Some code for illustration:

people <- list(
  name = c("John", "Mary", "Jennifer", "Neil"),
  gender = factor(c("M", "F", "F", "M")),
  age = c(34, 23, 40, 30),
  # one factor per respondent, all sharing the same levels
  residence = lapply(list("US", c("US", "CA"), "MX", c("MX", "US", "CA")),
                     factor, levels = c("US", "CA", "MX"))
)

Re: datastructure for multi-choice factors

Neal Fultz
I had a similar survey, and ended up packing all the checked answers for an
item into a single integer field as bit flags, using the bitops package.
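
A rough sketch of the idea, not the code from that survey. It uses base R's `bitwAnd()` so that it runs without extra packages (the bitops package provides `bitAnd()` for the same purpose). The helper names `encode` and `decode` are made up for illustration:

```r
lvls <- c("US", "CA", "MX")
bits <- as.integer(2^(seq_along(lvls) - 1))   # 1, 2, 4: one bit per level

# Encode a set of checked answers as one integer:
# the bit for a level is set if that level was checked.
encode <- function(x) as.integer(sum(bits[match(unique(x), lvls)]))

# Decode by testing each level's bit in turn.
decode <- function(code) {
  is_set <- vapply(bits, function(b) bitwAnd(as.integer(code), b) != 0L, logical(1))
  lvls[is_set]
}

people <- data.frame(
  name = c("John", "Mary", "Jennifer", "Neil"),
  residence = sapply(list("US", c("US", "CA"), "MX", c("MX", "US", "CA")), encode)
)

decode(people$residence[2])   # "US" "CA"
```

The item then occupies a single ordinary integer column, at the cost of needing `decode()` (or a bit test per level) before plotting or tabulating. Decoding returns the answers in level order, not in the order they were checked.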

On Thu, Jul 7, 2011 at 4:18 AM, jeroen00ms <[hidden email]> wrote:
> [...]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.