adding the mean and standard deviation to boxplots

adding the mean and standard deviation to boxplots

Tom Cohen-2
Dear list,
   
  How can I add the mean and standard deviation to each of the boxplots, using the example provided in the help page for the boxplot function?
 
boxplot(len ~ dose, data = ToothGrowth,
        boxwex = 0.25, at = 1:3 - 0.2,
        subset = supp == "VC", col = "yellow",
        main = "Guinea Pigs' Tooth Growth",
        xlab = "Vitamin C dose mg",
        ylab = "tooth length", ylim = c(0, 35), yaxs = "i")
boxplot(len ~ dose, data = ToothGrowth, add = TRUE,
        boxwex = 0.25, at = 1:3 + 0.2,
        subset = supp == "OJ", col = "orange")
legend(2, 9, c("Ascorbic acid", "Orange juice"),
       fill = c("yellow", "orange"))

Thanks for any help,
  Tom

       
______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: adding the mean and standard deviation to boxplots

Gabor Grothendieck
Not precisely what you asked for but see the notch= argument to boxplot
for a graphic measure of variability.  If you simply wish to print certain
statistics below the numbers already on the X axis then see:

https://stat.ethz.ch/pipermail/r-help/2008-January/152994.html
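
As a minimal sketch of the notch= suggestion (the choice of the VC subset and the plot title are only for illustration): the notches extend roughly 1.58 * IQR / sqrt(n) either side of the median, and boxplot() returns their limits in the conf component. With only 10 animals per group, R may warn that some notches extend beyond the hinges.

```r
# Notched boxplots for the ascorbic acid (VC) subset of ToothGrowth
bx <- boxplot(len ~ dose, data = ToothGrowth,
              subset = supp == "VC", notch = TRUE, col = "yellow",
              main = "Guinea Pigs' Tooth Growth (VC)",
              xlab = "Vitamin C dose mg", ylab = "tooth length")

# Lower (row 1) and upper (row 2) notch limits, one column per dose
bx$conf
```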

On Feb 4, 2008 10:41 AM, Tom Cohen <[hidden email]> wrote:

> Dear list,
>
>  How can I add the mean and standard deviation to each of the boxplots using the example provided  in the boxplot function?
>

Re: adding the mean and standard deviation to boxplots

HBaize
In reply to this post by Tom Cohen-2
There are many ways to do it. The following will place a blue point on each boxplot at the mean, then print the mean at the bottom of the plot. In some plots I've gone too far and included median points and values as well. You could also put a 95% CI on the same plot, but it might get too "busy."


# Plot boxplot of vitamin C subset
bx <- boxplot(len ~ dose, data = ToothGrowth,
        boxwex = 0.25, at = 1:3 - 0.2,
        subset = supp == "VC", col = "yellow",
        main = "Guinea Pigs' Tooth Growth",
        xlab = "Vitamin C dose mg",
        ylab = "tooth length", ylim = c(0, 35), yaxs = "i")

# keep the group positions (1, 2, 3)
at <- seq_along(bx$names)

# find means, plot as points
SubTc <- subset(ToothGrowth, supp == "VC")
means <- tapply(SubTc$len, SubTc$dose, mean, na.rm = TRUE)
points(at - 0.2, means, pch = 19, col = "blue")

# write mean values
text(at - 0.1, 1, labels = formatC(means, format = "f", digits = 1),
    pos = 2, cex = 1, col = "red")

# Orange juice subset
boxplot(len ~ dose, data = ToothGrowth, add = TRUE,
        boxwex = 0.25, at = 1:3 + 0.2,
        subset = supp == "OJ", col = "orange")

# find means, plot as points
SubTo <- subset(ToothGrowth, supp == "OJ")
meano <- tapply(SubTo$len, SubTo$dose, mean, na.rm = TRUE)
points(at + 0.2, meano, pch = 19, col = "blue")

# write mean values
text(at + 0.3, 1, labels = formatC(meano, format = "f", digits = 1),
    pos = 2, cex = 1, col = "orange")

legend(2, 9, c("Ascorbic acid", "Orange juice"),
       fill = c("yellow", "orange"))
------------------------------      
HTH, I'm sure my R code is not as efficient as it could be.

Harold Baize
Butte County Department of Behavioral Health
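
The original question also asked for the standard deviation. One possible way to show it, sketched here with base graphics only, is to draw the mean as a point and mean +/- 1 SD as a capped bar using arrows(). The helper name add_mean_sd is hypothetical, and placing the bars at the box centres is just one layout choice.

```r
# Means and standard deviations per dose for each supplement
vc <- subset(ToothGrowth, supp == "VC")
oj <- subset(ToothGrowth, supp == "OJ")
m_vc <- tapply(vc$len, vc$dose, mean)
s_vc <- tapply(vc$len, vc$dose, sd)
m_oj <- tapply(oj$len, oj$dose, mean)
s_oj <- tapply(oj$len, oj$dose, sd)

# The two boxplots from the boxplot() help page example
boxplot(len ~ dose, data = ToothGrowth,
        boxwex = 0.25, at = 1:3 - 0.2,
        subset = supp == "VC", col = "yellow",
        main = "Guinea Pigs' Tooth Growth",
        xlab = "Vitamin C dose mg",
        ylab = "tooth length", ylim = c(0, 35), yaxs = "i")
boxplot(len ~ dose, data = ToothGrowth, add = TRUE,
        boxwex = 0.25, at = 1:3 + 0.2,
        subset = supp == "OJ", col = "orange")

# Hypothetical helper: a point at the mean plus a capped bar for mean +/- 1 SD
add_mean_sd <- function(x, m, s, col = "blue") {
  points(x, m, pch = 19, col = col)
  arrows(x, m - s, x, m + s, angle = 90, code = 3, length = 0.05, col = col)
}
add_mean_sd(1:3 - 0.2, m_vc, s_vc)
add_mean_sd(1:3 + 0.2, m_oj, s_oj)
```

Replacing s_vc and s_oj with standard errors or CI half-widths would give the other kinds of error bars in the same way.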



Re: adding the mean and standard deviation to boxplots

Felipe Carrillo
In reply to this post by Tom Cohen-2
Tom:
You can do this with ggplot2. The code below puts the 95% CI,
a smooth line, and the mean (blue point) on the same plot.
Felipe

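library(ggplot2)  # mean_cl_normal also requires the Hmisc package to be installed

r <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
  theme(panel.background = element_rect(fill = "cornsilk"),
        # color the axis labels
        axis.text = element_text(size = 10, colour = "red"))

r + geom_boxplot(aes(colour = supp)) +
  # dotted red line through the group means, with a 95% CI band
  # (use size = 1 instead of linewidth = 1 before ggplot2 3.4.0)
  stat_summary(aes(group = supp), fun.data = mean_cl_normal, colour = "red",
               geom = "smooth", se = TRUE, linetype = 3, linewidth = 1) +
  # blue point at each group mean
  stat_summary(aes(group = supp), fun.data = mean_cl_normal, colour = "blue",
               geom = "point", size = 4)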



Felipe D. Carrillo
  Fishery Biologist
  US Fish & Wildlife Service
  California, USA



Re: adding the mean and standard deviation to boxplots

Fernando Marmolejo-Ramos
In reply to this post by HBaize
Dear users

This is a message I was directing to Harold Baize but because I pressed the wrong button the message got lost grrrr!!!

So I’m doing it all over again:

Let's suppose I have three batches of data:

a <- rnorm(50,2500,300)
b <- rnorm(50,3500,250)
c <- rnorm(50,4000,200)

# Now I want to plot them as boxplots and violin plots
require(vioplot)
vioplot(a, b, c, horizontal = TRUE, col = "white")
boxplot(a, b, c, horizontal = TRUE, col = "white")

As we know, a boxplot shows the least and greatest values, the lower and upper quartiles, the median, and outliers (when present).

However, for some data the MEAN matters more than the MEDIAN. Also, it is more relevant to show ERROR BARS instead of quartiles.

So, how could I draw (for the batches of data I introduced above)…

1. a boxplot showing the MEAN and the SD instead of the lower/upper quartile?
2. a boxplot showing the MEAN and the STANDARD ERROR OF THE MEAN instead of the lower/upper quartile?
3. a boxplot showing the MEAN and the 95% CI instead of the lower/upper quartile?

(I think in all these cases it is preferable to keep the line that shows the LEAST and the GREATEST VALUES.)

In other words, the ERROR BARS (95% CI, SD, SE) proposed here would take the place of the boxes usually used to represent the lower/upper quartiles.

Now, the big question: can all this jazz be implemented in violin plots as well?

How could that be done?

Cheers,

Fernando
Re: adding the mean and standard deviation to boxplots

jholtman
?bxp

This is the underlying routine called by boxplot, and you can supply
your own values for the five statistics that define a boxplot.
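
A minimal sketch of that idea, assuming the three batches from the question: boxplot(plot = FALSE) builds the list that bxp() expects, and the five rows of its stats matrix are then overwritten so that the box spans mean +/- 1 SD, the middle line is the mean, and the whiskers reach the minimum and maximum. The names batches and z are illustrative only, and the rows should stay in increasing order (here that holds because the extremes lie well beyond one SD from the mean).

```r
set.seed(1)
batches <- list(a = rnorm(50, 2500, 300),
                b = rnorm(50, 3500, 250),
                c = rnorm(50, 4000, 200))

# Let boxplot() build the structure without drawing anything
z <- boxplot(batches, plot = FALSE)

# Replace the five defining statistics, one column per batch:
# minimum, mean - SD, mean, mean + SD, maximum
z$stats <- sapply(batches, function(x) {
  m <- mean(x)
  s <- sd(x)
  c(min(x), m - s, m, m + s, max(x))
})

# The whiskers now reach the extremes, so no separate outlier points
z$out <- numeric(0)
z$group <- numeric(0)

bxp(z, horizontal = TRUE, main = "Box = mean +/- 1 SD, whiskers = range")
```

Swapping s for the standard error, or for qt(0.975, length(x) - 1) times the standard error, would give the SE and 95% CI variants.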

On Wed, Jul 23, 2008 at 2:24 AM, Fernando Marmolejo-Ramos
<[hidden email]> wrote:

> So, how could I see (for the batches of data I introduced above)…
>
> 1.      a boxplot showing the MEAN and the SD instead of the lower/upper
> quartile?
> 2.      a boxplot showing the MEAN and the STANDARD ERROR OF THE MEAN instead of
> the lower/upper quartile?
> 3.      a boxplot showing the MEAN and the 95% CI instead of the lower/upper
> quartile?



--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

Re: adding the mean and standard deviation to boxplots

John Kane-2
In reply to this post by Fernando Marmolejo-Ramos
This thread may help
http://www.nabble.com/adding-the-mean-and-standard-deviation-to-boxplots-td15271398.html


Re: adding the mean and standard deviation to boxplots

HBaize
In reply to this post by Fernando Marmolejo-Ramos
Fernando,
I don't have time to do all that you asked, but here is some code that makes violin plots with mean, median, and 95% CI. I like this plot very much, even if boxplot purists think it is horrible :-)

I think the boxplot was developed before we had computing power. Now we can show the detail of the distribution easily. This code uses the "UsingR" package written by John Verzani.

Real R wizards will find my code to be crude. It could be done with more elegance, but it works :-)
Note that I varied the sample size to show difference in 95% CI.

HTH

-------------------------------

## Create three random data vectors
a <- rnorm(25,2500,300)
b <- rnorm(50,3500,250)
c <- rnorm(100,4000,200)


## Convert data vectors to dataframes
adf <- data.frame(Group = " A ", Measure = a)
bdf <- data.frame(Group = " B ", Measure = b)
cdf <- data.frame(Group = " C ", Measure = c)

## Combine into a dataframe using rbind
abcData <- rbind(adf, bdf, cdf)
attach(abcData)

## load the UsingR library for violin plots

library(UsingR)

## Run boxplot to find statistics, but don't draw the boxplots
S <- boxplot(Measure ~ Group, plot=FALSE)

## Draw violin plots
simple.violinplot(Measure ~ Group,
                col = "lightblue")

## Add title
title(main="Just Random Test Data",
      sub="A, B, & C",
      cex.main = 1.5,
      cex.sub = 1.3)


## Define locations for additional chart elements
at <- seq_along(S$names)

## Draw filled dark green squares at the median values

points(at, S$stats[3, ], pch = 22, cex = 1.2, bg = "darkgreen")

## Get Group means and plot them using a diamond plot symbol
##    IMPORTANT -- must add the missing values removal: na.rm=TRUE
##                         if there is any missing data.

means <- by(Measure, Group, mean, na.rm=TRUE)
points(at,means, pch = 23, cex = 1.2, bg = "red")

##-  Get CIs -##
## create standard error function--
se <- function(x) {
         y <- x[!is.na(x)]
         sqrt(var(as.vector(y))/length(y))
    }

## create length function for non-missing values
lngth <- function(x){
            y <- x[!is.na(x)]
            length(y)
    }

## Compute vectors of standard error and n
Hse <- by(Measure,Group,se)
Hn  <- by(Measure,Group,lngth)

## compute 95% CIs and store in vectors
civ.u <- means + qt(.975, df=Hn-1) * Hse # Upper bound CI
civ.l <- means + qt(.025, df=Hn-1) * Hse # Lower bound CI


## Draw CI, first vertical line, then upper and lower horizontal
segments(at, civ.u, at, civ.l, lty = "solid", lwd = 2, col = "red")
segments(at - 0.1, civ.u, at + 0.1, civ.u, lty = "solid", lwd =2,col = "red")
segments(at - 0.1, civ.l, at + 0.1, civ.l, lty = "solid", lwd =2,col = "red")


## Draw Mean values to the left edge of each violinplot
text(at - 0.1, means, labels = formatC(means, format = "f", digits = 1),
    pos = 2, cex = 1, col = "red")

## Draw Median values to the right edge of each violinplot
text(at + 0.1, S$stats[3, ], labels = formatC(S$stats[3, ],
     format = "f", digits = 1), pos = 4, cex = 1, col = "darkgreen")

## Print "n" under the name of measure
mtext(S$n, side = 1, at = at, cex=.75, line = 2.5)

## End

--------------------------------------------
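
For the three kinds of error bar asked about (SD, standard error of the mean, and 95% CI), the limits can be tabulated in one pass. The sketch below uses base R only and the same unequal sample sizes as above; the helper name mean_intervals and the layout of the plot are assumptions for illustration. The same arrows() calls could be overlaid on a violin plot or boxplot drawn at positions 1, 2, 3.

```r
set.seed(42)
batches <- list(A = rnorm(25, 2500, 300),
                B = rnorm(50, 3500, 250),
                C = rnorm(100, 4000, 200))

# Hypothetical helper: mean with SD, SE and 95% CI limits for one vector
mean_intervals <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  m <- mean(x)
  s <- sd(x)
  se <- s / sqrt(n)
  half <- qt(0.975, df = n - 1) * se   # half-width of the 95% CI
  c(n = n, mean = m,
    sd.lo = m - s,    sd.hi = m + s,
    se.lo = m - se,   se.hi = m + se,
    ci.lo = m - half, ci.hi = m + half)
}

# One row per group
tab <- t(sapply(batches, mean_intervals))
print(round(tab, 1))

# Draw the means with the three kinds of bar side by side
at <- seq_along(batches)
plot(at, tab[, "mean"], pch = 19, xaxt = "n",
     xlim = c(0.5, length(batches) + 0.5), ylim = range(unlist(batches)),
     xlab = "Group", ylab = "Measure")
axis(1, at = at, labels = names(batches))
arrows(at - 0.15, tab[, "sd.lo"], at - 0.15, tab[, "sd.hi"],
       angle = 90, code = 3, length = 0.05, col = "black")
arrows(at, tab[, "ci.lo"], at, tab[, "ci.hi"],
       angle = 90, code = 3, length = 0.05, col = "red")
arrows(at + 0.15, tab[, "se.lo"], at + 0.15, tab[, "se.hi"],
       angle = 90, code = 3, length = 0.05, col = "blue")
legend("topleft", c("SD", "95% CI", "SE"),
       col = c("black", "red", "blue"), lty = 1, bty = "n")
```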


