# Approach for Storing Result Data


Hi all,

Today I have a more general question about how to store the values that come out of analysing multiple variables. My task is to compare distributions in a universe with the distributions among the respondents, using a whole bunch of variables. The comparison shall be done on relative frequencies (proportions).

I was thinking about the structure I should store the results in and came up with the following:

```r
library(ggplot2)
library(stringi)

# Result data frame
# Some sort of tidy data set where each value is stored as an identity.
# This way all values for all variables could be stored in
# one unique data structure.
# If an additional variable were added for the name of the research,
# one could also build a result data set across surveys.
# Values for measure could be "number" for 'raw' values or
# "freq" for frequencies/counts.
# Values for unit could be "n" for 'numbers' and
# "%" for percentages.
d_test <- data.frame(
    group = rep(c("Universe", "Respondents"), each = 16),
    variable = rep("State", 32),
    value = rep(c(11.3, 12.7, 3.3, 5, 0.6, 8.1, 6.2, 5.8,
                  6.4, 14.5, 8.3, 0.3, 3.8, 2.5, 8.1, 3), 2),
    label = rep(c("Baden-Wuerttemberg",
                  "Bayern",
                  "Berlin",
                  "Brandenburg",
                  "Bremen",
                  "Hamburg",
                  "Hessen",
                  "Mecklenburg-Vorpommern",
                  "Niedersachsen",
                  "Nordrhein-Westfalen",
                  "Rheinland-Pfalz",
                  "Saarland",
                  "Sachsen",
                  "Sachsen-Anhalt",
                  "Schleswig-Holstein",
                  "Thueringen"), 2),
    measure = rep("freq", 32),
    unit = rep("%", 32),
    stringsAsFactors = FALSE
)

# This way the variables can be selected using simple
# value selection from base R functionality.
data <- d_test[d_test$variable == "State", ]

# And plot results for every variable.
ggplot(
  data = data,
  aes(
    x = label,
    y = value,
    fill = group)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_discrete(name = stringi::stri_trans_totitle(names(data)[1])) +
  scale_x_discrete(name = data$variable[1]) +
  # value is numeric, so the y axis takes a continuous scale
  scale_y_continuous(name = data$unit[1])
```

The reporting / presentation is done in R Markdown. I would load the result data set once at the beginning and run the comparisons as plots on each variable named in the result data set under "variable".

If I follow this approach for my customer relationship survey, do you think I would face drawbacks or run into serious trouble?

I am interested in your opinion and open to other approaches and suggestions.

Kind regards

Georg
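To make the comparison on proportions concrete, here is a minimal sketch of how the long format could be used to put the two groups side by side for each variable, in base R only. The data frame `results` is a made-up miniature of the structure above (the values are invented for illustration), and `compare_groups()` is a hypothetical helper, not a function from any package.

```r
# Hypothetical miniature of the long-format result set; values are invented.
results <- data.frame(
  group    = rep(c("Universe", "Respondents"), each = 3),
  variable = rep("State", 6),
  label    = rep(c("Bayern", "Berlin", "Hessen"), 2),
  value    = c(50, 20, 30, 45, 25, 30),
  measure  = rep("freq", 6),
  unit     = rep("%", 6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for one variable, place both groups side by side
# and compute the difference in percentage points
# (respondents minus universe).
compare_groups <- function(d, var) {
  d <- d[d$variable == var & d$unit == "%", ]
  universe    <- d[d$group == "Universe",    c("label", "value")]
  respondents <- d[d$group == "Respondents", c("label", "value")]
  wide <- merge(universe, respondents, by = "label",
                suffixes = c("_universe", "_respondents"))
  wide$diff <- wide$value_respondents - wide$value_universe
  wide
}

# One comparison table per variable named in the result set.
comparisons <- lapply(
  split(results, results$variable),
  function(d) compare_groups(d, d$variable[1])
)

comparisons[["State"]]
```

Because every variable lives in the same columns, the same loop would cover additional variables without further changes; only rows would be added to the result set.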