barplot that displays sums of values of 2 y columns grouped by different variables

kenneth dyson
I am trying to create a barplot displaying the sums of 2 columns of data
grouped by a variable. The data is set up like this:

"city" "n" "y"
mon 100 200
tor 209 300
edm 98 87
mon 20 76
tor 50 96
edm 62 27

The resulting plot should have city as the x-axis and 2 bars per city: 1
representing the sum of "n" in that city, the other the sum of "y" in
that city.

If possible, could the sum also be shown in each bar as a label?

I aggregated the data into sums like this:

sum_data <- aggregate(. ~ City,data=raw_data,sum)

This gave me the sums per city as I wanted, but for some reason 1 of the
cities is missing in the output.
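
One possible cause of the missing city, offered only as a guess because raw_data is not shown in the thread: the formula interface of aggregate() defaults to na.action = na.omit, so any row with an NA in any column is dropped before summing. If every row for one city has an NA somewhere, that city vanishes from the result. The sketch below uses made-up data (the NA values are hypothetical, not from the post) to show the effect and one way around it.

```r
# Hypothetical data: same layout as the post, with NAs added for "edm"
raw_data <- data.frame(
  City = c("mon", "tor", "edm", "mon", "tor", "edm"),
  n    = c(100, 209, NA, 20, 50, NA),
  y    = c(200, 300, 87, 76, 96, 27)
)

# Default na.action = na.omit: both "edm" rows contain an NA and are
# removed before summing, so "edm" is absent from the result
dropped <- aggregate(. ~ City, data = raw_data, FUN = sum)

# Keep every row (na.pass) and let sum() skip the NAs instead
kept <- aggregate(. ~ City, data = raw_data, FUN = sum,
                  na.rm = TRUE, na.action = na.pass)

print(dropped)
print(kept)
```

If the real data has no missing values, the cause lies elsewhere; checking table(raw_data$City) and colSums(is.na(raw_data)) would be a reasonable first step.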

Using this code for the plot:

ggplot(sum_data,aes(x = City,y = n)) + geom_bar(aes(fill = y),stat =
"identity",position = "dodge")

gave me a bar plot with one bar per city showing the sum of y as a color
gradient. This is not what I expected given the "dodge" setting in geom_bar.

Thanks.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: barplot that displays sums of values of 2 y columns grouped by different variables

Eric Berger
'position="dodge"' has no effect in the plot because there is only one bar
for each city on the x-axis. The bars do not need to be moved to avoid each other. The
'aes(fill=y)' is specifying that you want the color gradient to capture the
sums in the 'y' variable. You might be better off to use 'no' and 'yes'
rather than 'n' and 'y' to avoid confusion. Then you would see that the
statement would be 'aes(fill=yes)'. Summary: the height of each bar
represents the sum of the 'no' for that city, and the color of each bar
represents the sum of the 'yes' for that city. Your code is fine, unless
that is not what you were trying to do.
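
To make the point concrete, here is a small sketch that counts the bars ggplot2 actually draws. The sums are computed by hand from the sample data in the original post; the names sum_long and response are illustrative and not from the thread. It compares the original call with a long layout that has one row per bar.

```r
library(ggplot2)

# Sums per city, worked out from the sample data in the original post
sum_data <- data.frame(city = c("edm", "mon", "tor"),
                       n = c(160, 120, 259),
                       y = c(114, 276, 396))

# Original call: fill is mapped to a numeric column, so there is still
# only one bar per city and dodge has nothing to separate
p_wide <- ggplot(sum_data, aes(x = city, y = n)) +
  geom_bar(aes(fill = y), stat = "identity", position = "dodge")

# Long layout (illustrative names): one row per bar, with a discrete
# column saying which sum the row holds
sum_long <- data.frame(
  city     = rep(sum_data$city, times = 2),
  response = rep(c("n", "y"), each = 3),
  value    = c(sum_data$n, sum_data$y)
)
p_long <- ggplot(sum_long, aes(x = city, y = value, fill = response)) +
  geom_bar(stat = "identity", position = "dodge")

nrow(layer_data(p_wide))  # number of bars in the original plot
nrow(layer_data(p_long))  # number of bars with the long layout
```

With a discrete fill there are two groups at each city, which is what gives position = "dodge" something to place side by side.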

HTH,
Eric


On Mon, Jan 15, 2018 at 6:59 PM, kenneth dyson <[hidden email]> wrote:

> I am trying to create a barplot displaying the sums of 2 columns of data
> grouped by a variable. [...]

Re: barplot that displays sums of values of 2 y columns grouped by different variables

Eric Berger
https://stackoverflow.com/questions/25070547/ggplot-side-by-side-geom-bar

On Mon, Jan 15, 2018 at 9:39 PM, Kenneth Dyson <[hidden email]> wrote:

> Hi Eric,
>
> Thanks for the detailed response.
> This is not exactly what I want to do but is close.
> I want 2 bars for each city, 1 with the sum for "yes", the other, beside
> it, with the sum for "no".
>
> Am I way off track with my method here?
>
> Thanks,
> Ken
>
> On Jan 15, 2018, at 14:34, Eric Berger <[hidden email]> wrote:
>>
>> 'position="dodge"' has no effect in the plot [...]

Re: barplot that displays sums of values of 2 y columns grouped by different variables

Jim Lemon-4
Hi Kenneth,
I don't know about ggplot, but perhaps this will help:

kddf<-read.table(text="city n y
mon 100 200
tor 209 300
edm 98 87
mon 20 76
tor 50 96
edm 62 27",
header=TRUE,stringsAsFactors=FALSE)
library(plotrix)
barpos<-barp(t(kddf[,2:3]),names.arg=kddf[,1],xlab="City",ylab="Sum",
 main="Sums of values of 2 y columns",col=2:3)
legend(5,300,c("Sum of n","Sum of y"),fill=2:3)
barlabels(barpos$x,barpos$y)
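
Note that the call above plots the six raw rows, so each city appears twice. A minimal base-R sketch of the summed version follows, assuming the same sample data; it swaps in barplot() and text() from base graphics rather than plotrix, and the names kdsum and heights are illustrative only.

```r
kddf <- read.table(text = "city n y
mon 100 200
tor 209 300
edm 98 87
mon 20 76
tor 50 96
edm 62 27",
header = TRUE, stringsAsFactors = FALSE)

# Sum n and y within each city
kdsum <- aggregate(cbind(n, y) ~ city, data = kddf, FUN = sum)

# barplot() expects one column per city and one row per series
heights <- t(as.matrix(kdsum[, c("n", "y")]))
colnames(heights) <- kdsum$city

# beside = TRUE draws the two sums next to each other within each city;
# the extra ylim headroom leaves space for the labels
barpos <- barplot(heights, beside = TRUE, col = 2:3,
                  xlab = "City", ylab = "Sum",
                  ylim = c(0, max(heights) * 1.15),
                  legend.text = c("Sum of n", "Sum of y"),
                  args.legend = list(x = "topleft"))

# Write each sum just above its bar
text(barpos, heights, labels = heights, pos = 3)
```

barplot() returns the bar midpoints in the same shape as heights, which is why the two matrices can be passed straight to text().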

Jim



Re: barplot that displays sums of values of 2 y columns grouped by different variables

Jeff Newmiller
It is not generally advisable to get too fancy with stat functions in
ggplot... things can easily get more complicated than ggplot is ready to
handle when it comes to calculations. It is better to create data that
corresponds directly to the graphical representations you are mapping
them to.

Read [1] for more on this philosophy.

[1] H. Wickham, Tidy Data, Journal of Statistical Software, vol. 59, no.
10, pp. 1-23, Sep. 2014. http://www.jstatsoft.org/v59/i10/

#---
library(ggplot2) # ggplot
library(dplyr)   # `%>%`, group_by, summarise
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(tidyr)   # gather

dta <- read.table( text =
"city n y
mon 100 200
tor 209 300
edm 98 87
mon 20 76
tor 50 96
edm 62 27
", header = TRUE )

dta2 <- (   dta
         %>% group_by( city )
         %>% summarise( n = sum( n )
                      , y = sum( y )
                      )
         %>% gather( Response, value, -city )
         )

ggplot( dta2, aes( x=city, y=value, fill = Response ) ) +
     geom_bar( stat="identity", position="dodge" )

#' ![](https://i.imgur.com/cosFf3B.png)
#---
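
The original question also asked for the sum to be shown as a label on each bar, which the example above leaves out. One way this might be done, as a sketch under assumptions rather than part of the posted answer: geom_text() with the same dodge width as the bars. The sketch also uses pivot_longer(), which tidyr now recommends in place of gather(), and geom_col(), which is shorthand for geom_bar(stat = "identity"). The vjust value is an arbitrary choice.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

dta <- read.table(text =
"city n y
mon 100 200
tor 209 300
edm 98 87
mon 20 76
tor 50 96
edm 62 27
", header = TRUE)

# One row per bar: city, which sum it is (Response), and the sum itself
dta2 <- dta %>%
  group_by(city) %>%
  summarise(n = sum(n), y = sum(y)) %>%
  pivot_longer(c(n, y), names_to = "Response", values_to = "value")

# Bars and labels share the same dodge width so each label sits over its bar
dodge <- position_dodge(width = 0.9)

p <- ggplot(dta2, aes(x = city, y = value, fill = Response)) +
  geom_col(position = dodge) +
  geom_text(aes(label = value, group = Response),
            position = dodge, vjust = -0.3)
print(p)
```

The explicit group = Response in geom_text() is there so the labels are dodged by the same grouping as the bars.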

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<[hidden email]>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k


Re: barplot that displays sums of values of 2 y columns grouped by different variables

kenneth dyson
Thanks everyone.

Got it to work like this, if anyone is interested:

Import the data with readr, taking in only the columns that have numeric
values ("n" and "y") and the column with the groups ("city").

Aggregate the data by the group ("city") so that each variable has a sum:

sum_data <- aggregate(. ~ City__c, data = raw_data, sum)

Reshape the data to long format, so that the column names "n" and "y"
become the values of a single column (there will be 3 columns: "city",
"variable", "value"):

library(reshape2)
library(ggplot2)
sums <- melt(sum_data)

Plot using ggplot:

ggplot(sums, aes(x = city, y = value, fill = variable, ymax = 1000)) +
  geom_bar(stat = "identity", width = .8, position = "dodge")
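
For anyone wanting to run these steps end to end, here is a self-contained sketch using the sample data from the start of the thread. It is an approximation of the recipe above, not the exact code that was run: raw_data is built with read.table() rather than readr, the grouping column is assumed to be named city throughout, id.vars is given explicitly to melt(), and ymax is left out to keep the sketch minimal.

```r
library(reshape2)
library(ggplot2)

# Sample data from the original post (stand-in for the real raw_data)
raw_data <- read.table(text = "city n y
mon 100 200
tor 209 300
edm 98 87
mon 20 76
tor 50 96
edm 62 27", header = TRUE)

# Sum every other column within each city
sum_data <- aggregate(. ~ city, data = raw_data, FUN = sum)

# Long format: columns city, variable ("n" or "y"), value (the sum)
sums <- melt(sum_data, id.vars = "city")

p <- ggplot(sums, aes(x = city, y = value, fill = variable)) +
  geom_bar(stat = "identity", width = 0.8, position = "dodge")
print(p)
```

Naming id.vars avoids relying on melt() to guess which column identifies the groups.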


On 2018-01-15 6:01 PM, Jeff Newmiller wrote:

> It is not generally advisable to get too fancy with stat functions in
> ggplot... things can easily get more complicated than ggplot is ready
> to handle when it comes to calculations. [...]