Approach for Storing Result Data

G.Maubach
Hi All,

Today I have a more general question about how to store the different
values that result from analysing multiple variables.

My task is to compare distributions in a universe with the distributions
among the respondents for a whole bunch of variables. The comparison is to
be done on relative frequencies (proportions).
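
As a rough illustration of what a comparison on relative frequencies could look like, here is a minimal sketch. The counts and the category names A, B, C are invented for the example and are not taken from the survey.

```r
# Hypothetical counts, invented for illustration only
universe    <- c(A = 500, B = 300, C = 200)
respondents <- c(A = 40,  B = 35,  C = 25)

# Relative frequencies in percent
p_universe    <- 100 * prop.table(universe)
p_respondents <- 100 * prop.table(respondents)

# Difference in percentage points (respondents minus universe)
diff_pp <- p_respondents - p_universe
round(diff_pp, 1)
```

A positive difference would suggest that a category is over-represented among the respondents relative to the universe.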

I was thinking about the structure I should store the results in and came
up with the following:

-- cut --

library(ggplot2)
library(stringi)

# Result data frame
# A tidy (long-format) data set in which
# each value is stored in its own row.
# This way all values for all variables can be stored in
# one single data structure.
# If an additional variable were added for the name of the
# study, one could also build a result data set across
# surveys.
# Values for measure could be "number" for 'raw' values or
# "freq" for frequencies/counts.
# Values for unit could be "n" for 'numbers' and
# "%" for percentages.
d_test <- data.frame(
    group = rep(c("Universe", "Respondents"), each = 16),
    variable = rep("State", 32),
    value = rep(c(11.3,
                    12.7,
                    3.3,
                    5,
                    0.6,
                    8.1,
                    6.2,
                    5.8,
                    6.4,
                    14.5,
                    8.3,
                    0.3,
                    3.8,
                    2.5,
                    8.1,
                    3), 2),
    label = rep(c("Baden-Wuerttemberg",
                "Bayern",
                "Berlin",
                "Brandenburg",
                "Bremen",
                "Hamburg",
                "Hessen",
                "Mecklenburg-Vorpommern",
                "Niedersachsen",
                "Nordrhein-Westfalen",
                "Rheinland-Pfalz",
                "Saarland",
                "Sachsen",
                "Sachsen-Anhalt",
                "Schleswig-Holstein",
                "Thueringen"),2),
    measure = rep("freq", 32),
    unit = rep("%", 32),
    stringsAsFactors = FALSE
)

# This way the variables can be selected using simple
# value selection from Base R functionality.
data <- d_test[d_test$variable == "State" ,]

# And plot results for every variable.
ggplot(
  data = data,
  aes(
    x = label,
    y = value,
    fill = group)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  # Legend title: first column name ("group") in title case
  scale_fill_discrete(name = stringi::stri_trans_totitle(names(data)[1])) +
  scale_x_discrete(name = data$variable[1]) +
  # value is numeric, so the y axis needs a continuous scale
  scale_y_continuous(name = data$unit[1])

-- cut --

The reporting / presentation is done in R Markdown. I would load the
result data set once at the beginning and run the comparisons as plots
for each variable named in the result data set under "variable".
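
The per-variable plotting described above might be sketched as follows. The data frame `results`, its values, the second variable "Size" and the helper `plot_variable()` are all hypothetical and only serve the illustration; the column names follow the structure of `d_test`.

```r
library(ggplot2)

# Hypothetical result data in the long format described above;
# the values are made up for illustration.
results <- data.frame(
  group    = rep(c("Universe", "Respondents"), each = 2, times = 2),
  variable = rep(c("State", "Size"), each = 4),
  label    = c("Bayern", "Berlin", "Bayern", "Berlin",
               "Small", "Large", "Small", "Large"),
  value    = c(80, 20, 75, 25, 60, 40, 55, 45),
  unit     = "%",
  stringsAsFactors = FALSE
)

# Hypothetical helper: one dodged bar chart for the rows of one variable
plot_variable <- function(d) {
  ggplot(d, aes(x = label, y = value, fill = group)) +
    geom_bar(stat = "identity", position = "dodge") +
    labs(x = d$variable[1], y = d$unit[1], fill = "Group")
}

# Split the result data by variable and build one plot per piece
plots <- lapply(split(results, results$variable), plot_variable)

# In an R Markdown chunk, printing each element renders one plot per variable
for (p in plots) print(p)
```

Splitting by `variable` means a new variable only has to be added to the result data set; the reporting code would not need to change.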

If I follow this approach for my customer relationship survey, do you think
I would face drawbacks or run into serious trouble?

I am interested in your opinion and open to other approaches and
suggestions.

Kind regards

Georg

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: Approach for Storing Result Data

Bert Gunter-2
This does not appear to be a legitimate topic for r-help: it is
not a consulting service. Please see the posting guide.

Of course, others may disagree and reply. Wouldn't be the first time I'm wrong.

-- Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Wed, Mar 8, 2017 at 7:27 AM,  <[hidden email]> wrote:

Re: Approach for Storing Result Data

Jeff Newmiller
In reply to this post by G.Maubach
Seems pretty normal, except that a one-by-one lookup process usually gets old eventually. Comparing results is much easier if you merge the study data with the lookup data all at once and then use aggregate() (or any of the numerous equivalents from contributed packages) to collect results, or distinguish the groups by color, line type, panel, etc. in graphical presentations with lattice or ggplot2.
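
A minimal sketch of the merge-then-aggregate idea, assuming respondent-level data with a coded state variable and a separate lookup table of labels. The data frames `study` and `lookup` and their columns are hypothetical and not from the original post.

```r
# Hypothetical respondent-level data: one row per respondent
study <- data.frame(
  id         = 1:6,
  state_code = c(1, 1, 2, 2, 2, 3)
)

# Hypothetical lookup table mapping codes to labels
lookup <- data.frame(
  state_code = c(1, 2, 3),
  label      = c("Bayern", "Berlin", "Hessen"),
  stringsAsFactors = FALSE
)

# Merge once, instead of looking up labels one by one
merged <- merge(study, lookup, by = "state_code")

# Count respondents per label, then convert the counts to percentages
counts <- aggregate(id ~ label, data = merged, FUN = length)
names(counts)[names(counts) == "id"] <- "n"
counts$value <- 100 * counts$n / sum(counts$n)
counts
```

The resulting table has one row per label and could be stacked with the corresponding universe figures to form the long result data set discussed in the thread.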
--
Sent from my phone. Please excuse my brevity.

On March 8, 2017 7:27:08 AM PST, [hidden email] wrote:

Reply: Re: Approach for Storing Result Data

G.Maubach
In reply to this post by Bert Gunter-2
Hi Gunter,
Hi Jeff,
Hi Readers,

Many thanks for your replies.

My question seems to be a little off topic because it is not about using
the programming language itself but about how to use it in an analytics
context. It is about processes and approaches for doing things in R from a
conceptual point of view. That is a subject I don't see discussed in the
community, but it would help me a lot to improve my work.

Do you know a place where these things are discussed?

Kind regards

Georg



From:    Jeff Newmiller <[hidden email]>
To:      [hidden email], [hidden email],
Date:    08.03.2017 17:54
Subject: Re: [R] Approach for Storing Result Data



Seems pretty normal, except that a one-by-one lookup process usually
gets old eventually. Comparing results is much easier if you merge the
study data with the lookup data all at once and then use aggregate() (or
any of the numerous equivalents from contributed packages) to collect
results, or distinguish the groups by color, line type, panel, etc. in
graphical presentations with lattice or ggplot2.



From:    Bert Gunter <[hidden email]>
To:      [hidden email],
Cc:      R-help <[hidden email]>
Date:    08.03.2017 17:43
Subject: Re: [R] Approach for Storing Result Data



This does not appear to be a legitimate topic for r-help: it is
not a consulting service. Please see the posting guide.

Of course, others may disagree and reply. Wouldn't be the first time I'm
wrong.

-- Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

