How to compare stacked histograms/datasets

Atulkakrana
Hello All,

I have a couple of stacked histograms which I need to compare/evaluate for similarity or difference. Example of stacked histogram: http://r.789695.n4.nabble.com/file/n4635668/Selection_011.png

I believe that, rather than evaluating the histograms themselves, it will be easier to work with the dataset used to plot them, which is in this format:

RED PURPLE BLUE GREY YELLOW
22.0640569395 16.9483985765 0 60.987544484 0
8.1850533808 8.8523131673 0 82.962633452 0
6.8505338078 6.8950177936 0.756227758 85.4982206406 0.5338078292
6.7615658363 5.2491103203 1.6459074733 86.3434163701 0.6672597865
5.8274021352 7.384341637 2.1352313167 84.653024911 1.1565836299
7.8736654804 6.628113879 1.5569395018 83.9412811388 1.2010676157
7.1619217082 8.1850533808 1.2455516014 83.4074733096 1.3790035587
5.5604982206 10.2758007117 1.0676156584 83.0960854093 1.0231316726
7.1174377224 7.6067615658 0.7117437722 84.5640569395 0.756227758
7.8736654804 3.9590747331 0.6672597865 87.5 0.3113879004
7.6512455516 7.8736654804 0.5338078292 83.9412811388 0.5338078292
7.6067615658 8.9857651246 1.4679715302 81.9395017794 0.3558718861
8.9412811388 8.0071174377 1.3790035587 81.6725978648 0.5782918149
19.0836298932 9.2081850534 2.1352313167 69.5729537367 1.3790035587
14.9911032028 11.0765124555 3.2028469751 70.7295373665 1.0676156584
15.3914590747 10.8985765125 3.024911032 70.6850533808 1.2900355872
17.4822064057 12.5444839858 2.4911032028 67.4822064057 1.334519573
15.8362989324 13.0338078292 2.0017793594 69.128113879 1.334519573
17.037366548 10.4537366548 2.4021352313 70.1067615658 1.2010676157
20.2846975089 10.0088967972 0 69.706405694 1.0676156584
28.7366548043 12.6334519573 0 58.6298932384 0

Is there any way I can compare such datasets from multiple experiments (n=8) and show visually (in a plot) whether these datasets are in consensus or differ from each other?

Awaiting reply,

Atul

Re: How to compare stacked histograms/datasets

Joshua Wiley-2
Hi,

Probably easier to work with the raw data, but whatever.  If your data
is in a data frame, dat,

## create row index (one row per position)
dat$x <- seq_len(nrow(dat))

## load packages
library(ggplot2)
library(reshape2)

## melt the data frame to long format ("long dat", ldat for short)
ldat <- melt(dat, id.vars = "x")

## plot each variable against position
ggplot(ldat, aes(x, value, colour = variable)) + geom_line()

## the variables are not really on the same scale, so we could
## standardize each column to mean 0 and variance 1 first
dat2 <- as.data.frame(scale(dat))
## remake the index so it is not scaled
dat2$x <- seq_len(nrow(dat2))

ldat2 <- melt(dat2, id.vars = "x")
ggplot(ldat2, aes(x, value, colour = variable)) + geom_line()

which yields the attached PDF (maybe scrubbed on the official list as
most file extensions are, but should go through to you personally via
gmail).  I'm not sure it's the greatest approach ever, but it gives
you a sense if they go up and down together or at different points.
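
In case it helps, here is a minimal sketch of one way the posted table could be turned into a data frame named dat. It uses only the first three rows of the table from the original post, and reading the data from an inline string is my assumption; in practice you would probably read your own file with read.table() instead.

```r
## Sketch only: the first three rows of the posted table, read from a string.
txt <- "RED PURPLE BLUE GREY YELLOW
22.0640569395 16.9483985765 0 60.987544484 0
8.1850533808 8.8523131673 0 82.962633452 0
6.8505338078 6.8950177936 0.756227758 85.4982206406 0.5338078292"

## header = TRUE takes the colour names from the first line as column names
dat <- read.table(text = txt, header = TRUE)

## row index, one per position
dat$x <- seq_len(nrow(dat))

str(dat)
```

With all 21 rows read in the same way, dat has the shape the plotting code expects.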

Cheers,

Josh

On Fri, Jul 6, 2012 at 1:55 PM, Atulkakrana <[hidden email]> wrote:

> Hello All,
>
> I have a couple of stacked histograms which I need to compare/evaluate for
> similarity or difference.
> http://r.789695.n4.nabble.com/file/n4635668/Selection_011.png


--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

plots.pdf (9K) Download Attachment

Re: How to compare stacked histograms/datasets

Atulkakrana
Hello Joshua,

Thanks for taking the time to help me with this problem. Actually, the comparison is to be done between two (if possible, more than two) datasets and not within a dataset. Each dataset holds 5 variables (i.e. Red, Purple, Blue, Grey and Yellow) for 21 different positions, i.e. 1-21n. So we have 5 values for each of the 21 positions, which together make a single dataset or stacked histogram (see the plot in the original post).

Initially I was comparing datasets by plotting a stacked histogram for each and analyzing them visually, but that doesn't give a statistical idea of how similar or different the datasets are. Therefore, I want to evaluate the datasets in order to quantify their difference/similarity. The end result would be a plot showing the similarity/difference among two or more datasets.

Example datasets: http://pastebin.com/iYj1RNvt

Can the method you explained be applied to multiple datasets? Can a qqplot be obtained in such a case?

Awaiting your reply

Thanks

Atul

Re: How to compare stacked histograms/datasets

Joshua Wiley-2
Hi,

Sure, you could do a qqplot for each variable between two datasets.
In a 2d graph, it will be hard to reasonably compare more than 2
datasets (you can put many such graphs on a single page, but it would
be pairwise sets of comparisons, I think).  Perhaps you could plot
multiple qqplots on top of each other, varying the points by colour for
the different data sets?
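
As a rough illustration of the pairwise idea, here is a minimal base-R sketch. The data frames dat_a, dat_b and dat_c and the helper make_fake() are made-up placeholders filled with random numbers (not the data from the pastebin link); only the five column names and the 21 positions come from the thread.

```r
## Placeholder data: random 21 x 5 tables whose rows sum to 100.
set.seed(1)
vars <- c("RED", "PURPLE", "BLUE", "GREY", "YELLOW")
make_fake <- function() {
  m <- matrix(runif(21 * 5), nrow = 21, dimnames = list(NULL, vars))
  as.data.frame(100 * m / rowSums(m))
}
dat_a <- make_fake()
dat_b <- make_fake()
dat_c <- make_fake()

## One QQ plot per variable, dataset A against dataset B.
op <- par(mfrow = c(2, 3))
for (v in vars) {
  qqplot(dat_a[[v]], dat_b[[v]], main = v,
         xlab = "dataset A", ylab = "dataset B")
  abline(0, 1, lty = 2)  # reference line y = x
}
par(op)

## Overlay for one variable: B and C against A, coloured by dataset.
qb <- qqplot(dat_a$RED, dat_b$RED, plot.it = FALSE)
qc <- qqplot(dat_a$RED, dat_c$RED, plot.it = FALSE)
plot(qb, col = "blue", ylim = range(qb$y, qc$y),
     xlab = "dataset A", ylab = "other dataset", main = "RED")
points(qc, col = "red")
abline(0, 1, lty = 2)
legend("topleft", legend = c("B", "C"), col = c("blue", "red"), pch = 1)
```

Points lying close to the dashed line would suggest that the variable has a similar distribution in the two datasets. Note that a QQ plot compares sorted values, so it ignores which position each value came from.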

I have not seen anything like this before, so I suppose it depends
what helps you understand your data.
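
If the position matters, one option might be to extend the line plots from my earlier message to several datasets, with one panel per variable and one line per dataset. This is only a sketch: the list named datasets and the helper make_fake() are hypothetical and hold random placeholder numbers, not your data, and I use base graphics (matplot) here rather than ggplot2.

```r
## Placeholder data: three made-up datasets, 21 positions x 5 variables.
set.seed(2)
vars <- c("RED", "PURPLE", "BLUE", "GREY", "YELLOW")
make_fake <- function() {
  m <- matrix(runif(21 * 5), nrow = 21, dimnames = list(NULL, vars))
  as.data.frame(100 * m / rowSums(m))
}
datasets <- list(A = make_fake(), B = make_fake(), C = make_fake())

op <- par(mfrow = c(2, 3))
for (v in vars) {
  ## 21 x 3 matrix: one column per dataset for this variable
  m <- sapply(datasets, function(d) d[[v]])
  matplot(1:21, m, type = "l", lty = 1, col = 1:3,
          xlab = "position", ylab = "value", main = v)
}
## use the sixth panel for a shared legend
plot.new()
legend("center", legend = names(datasets), col = 1:3, lty = 1)
par(op)
```

Lines that track each other across positions would point towards consensus between the datasets for that variable.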

Cheers,

Josh

On Sat, Jul 7, 2012 at 3:25 PM, Atulkakrana <[hidden email]> wrote:

> Can the method you explained be applied to multiple datasets? Can a
> qqplot be obtained in such a case?

