|
Is there an easy way to "thin" a lattice plot? I often create plots from
large data sets and use the "pdf" command to save them to a file, but the resulting files can be huge, because every point in the underlying data set is rendered in the plot, even though it isn't possible to see that much detail. For example:

    require(Hmisc)
    x <- rnorm(1e6)

    pdf("test.pdf")
    Ecdf(x)
    dev.off()

The resulting PDF file is 31 MB. Is there any easy way to get a smaller PDF file without having to manually prune the data set?

Thanks.

- Elliot

--
Elliot Joel Bernstein, Ph.D. | Research Associate | FDO Partners, LLC
134 Mount Auburn Street | Cambridge, MA | 02138
Phone: (617) 503-4619 | Email: [hidden email]

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
|
On Jul 30, 2012, at 2:13 PM, Elliot Joel Bernstein wrote:

> Is there any easy way to get a smaller pdf
> file without having to manually prune the dataset?

There are plotting routines that display the density of distributions. I use hexbin fairly frequently, but that is for 2D plots. If you want the ECDF of a 1D vector, you could use cumsum() on the output of hist(), or use quantile(), with suitable arguments (breaks for hist(), probs for quantile()) to control the degree of aggregation. Either of these yields an 8 KB file on my machine.

> library(lattice)
> h <- hist(x, plot = FALSE)
> pdf("test.pdf")
> xyplot( c(0, cumsum(h$counts)) / length(x) ~ h$breaks )
> dev.off()
quartz
     2
> pdf("test.pdf")
> xyplot( (0:100)/100 ~ quantile(x, probs = (0:100)/100) )
> dev.off()
quartz
     2

David Winsemius, MD
Alameda, CA, USA
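A minimal base-R sketch of the two aggregation ideas David describes (cumulative histogram counts, or sample quantiles), assuming the goal is just a small table of (x, F(x)) pairs that can then be handed to xyplot() or plot(). The helpers ecdf_from_hist() and ecdf_from_quantile() are hypothetical names for illustration, not functions from any package.

```r
# Hypothetical helpers: summarize a large vector into a few ECDF points
# before plotting, so the PDF stores tens of points instead of millions.

# Histogram approach: cumulative bin counts divided by n, evaluated at the
# bin boundaries. There is one more boundary than there are bins, hence
# the leading 0.
ecdf_from_hist <- function(x, breaks = "Sturges") {
  h <- hist(x, breaks = breaks, plot = FALSE)
  data.frame(x = h$breaks, prob = c(0, cumsum(h$counts)) / length(x))
}

# Quantile approach: sample quantiles at evenly spaced probabilities.
ecdf_from_quantile <- function(x, n = 101) {
  p <- seq(0, 1, length.out = n)
  data.frame(x = unname(quantile(x, probs = p)), prob = p)
}

set.seed(1)
x <- rnorm(1e6)
h_pts <- ecdf_from_hist(x)
q_pts <- ecdf_from_quantile(x)
nrow(q_pts)  # 101 points instead of 1e6

# Either summary can then be plotted, e.g.
# library(lattice); xyplot(prob ~ x, data = q_pts, type = "s")
```

The histogram version gives exact ECDF values at the break points; the quantile version instead fixes the number of plotted points directly, which is usually easier to reason about.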
|
You might also check ?pdf on your system (in particular its compress argument). On Windows the default is for compression. Your code creates a 186 KB file, although it is slow to load, reflecting the overhead of decompressing the file.

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

> -----Original Message-----
> From: [hidden email] [mailto:r-help-bounces@r-project.org] On Behalf Of David Winsemius
> Sent: Monday, July 30, 2012 5:47 PM
> To: Elliot Joel Bernstein
> Cc: [hidden email]
> Subject: Re: [R] Thinning Lattice Plot
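To see how much of the size difference comes from compression alone, here is a sketch that writes the same data to two PDFs with pdf()'s compress argument switched on and off. It uses a base-graphics step plot as a stand-in for Ecdf() (an assumption, to avoid depending on Hmisc), and pdf_size() is a hypothetical helper, not part of R.

```r
# Hypothetical helper: size in bytes of a PDF containing a step plot of x.
pdf_size <- function(x, compress) {
  f <- tempfile(fileext = ".pdf")
  on.exit(unlink(f))           # remove the temporary file afterwards
  pdf(f, compress = compress)  # compress = TRUE uses Flate compression
  plot(sort(x), ppoints(length(x)), type = "s",
       xlab = "x", ylab = "Empirical CDF")
  dev.off()
  file.info(f)$size
}

set.seed(1)
x <- rnorm(1e5)   # smaller than 1e6 to keep the example quick
sizes <- c(compressed   = pdf_size(x, compress = TRUE),
           uncompressed = pdf_size(x, compress = FALSE))
sizes
```

Compression only changes how the file is stored; a viewer still has to draw every point, which is consistent with the slow loading described above.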
|
On Tue, Jul 31, 2012 at 2:43 AM, Elliot Joel Bernstein
<[hidden email]> wrote:
> For example:
>
> require(Hmisc)
> x <- rnorm(1e6)
>
> pdf("test.pdf")
> Ecdf(x)
> dev.off()

(This is not a lattice plot, BTW.)

> The resulting pdf files is 31MB.

Hmm, for me it's 192K. Perhaps you have not bothered to update R recently.

> Is there any easy way to get a smaller pdf
> file without having to manually prune the dataset?

In general, as David noted, you need to do some sort of data summarization: great if tools are available to do that, otherwise do it yourself. In this case, for example, it seems reasonable to do

    Ecdf(quantile(x, probs = ppoints(500, a = 1)))

If you don't want to do this yourself, ecdfplot() in latticeExtra will allow

    library(latticeExtra)
    ecdfplot(x, f.value = ppoints(500, a = 1))

-Deepayan
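As a quick sanity check on the ppoints(500, a = 1) choice, the sketch below (base R only; the sample size of 1e5 is an assumption to keep it fast) confirms that those probabilities run from 0 to 1 and measures how far the full-data ECDF, evaluated at the 500 retained quantiles, strays from the nominal probabilities.

```r
set.seed(1)
x <- rnorm(1e5)

p <- ppoints(500, a = 1)   # (0:499)/499: 500 probabilities from 0 to 1
q <- quantile(x, probs = p, names = FALSE)

# Full-data ECDF evaluated at the 500 retained points
Fn <- ecdf(x)
max_err <- max(abs(Fn(q) - p))
max_err   # tiny: on the order of 1/length(x)
```

So plotting only these 500 quantiles loses essentially nothing visible, while the plot stores 500 points rather than all of x.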
|
Thanks everyone for your replies. I didn't know about the ecdfplot
function, so I'll start using that instead of Ecdf. Why is Ecdf not a lattice plot? The result certainly looks like other lattice plots, and the arguments are similar to those of other lattice plots. In fact, internally it seems to just call the "histogram" function with a different prepanel and panel function. Is it not considered a lattice plot only because it isn't part of the lattice package?

Thanks.

- Elliot

On Tue, Jul 31, 2012 at 2:32 AM, Deepayan Sarkar <[hidden email]> wrote:
> (This is not a lattice plot, BTW.)
|
Well, yes.
Terminology-wise, I guess one could say that it's a trellis plot in the Hmisc package. But I'd agree that this is nitpicking.

-- Bert

On Tue, Jul 31, 2012 at 6:13 AM, Elliot Joel Bernstein <[hidden email]> wrote:
> Is it not considered a lattice plot only because it isn't part of
> the lattice package?

--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
|
On Tue, Jul 31, 2012 at 6:43 PM, Elliot Joel Bernstein
<[hidden email]> wrote:
> Is it not considered a lattice plot only because it isn't part of the lattice
> package?

Of course not. What you are saying is a valid description of the Ecdf.formula() method, which definitely produces a lattice plot (or trellis plot if you prefer). However, the example you gave, namely,

    x <- rnorm(1e6)
    Ecdf(x)

ends up calling Ecdf.default(), which is very much a traditional graphics function. I should add that this is for Hmisc 3.9-2, and I don't know whether the behaviour is different with other versions.

Note that Ecdf() has more features than ecdfplot(); in particular, it allows weights.

-Deepayan
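The distinction Deepayan draws is ordinary S3 dispatch on the class of the first argument: a numeric vector goes to the default method, a formula to the formula method. A toy generic makes this concrete; which_method() and its methods are hypothetical stand-ins, not Hmisc code.

```r
# Hypothetical generic used only to illustrate S3 dispatch; not Hmisc code.
which_method <- function(x, ...) UseMethod("which_method")
which_method.default <- function(x, ...) {
  "default method (traditional graphics, like Ecdf.default)"
}
which_method.formula <- function(x, ...) {
  "formula method (trellis object, like Ecdf.formula)"
}

x <- rnorm(10)
which_method(x)    # numeric vector -> default method
which_method(~ x)  # one-sided formula -> formula method

# With Hmisc loaded, methods(Ecdf) lists the methods it actually provides.
```

This is why Ecdf(x) and Ecdf(~ x) can produce quite different kinds of output even though they are called through the same function name.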
|
I see. I typically use a (one-sided) formula as the first argument to Ecdf,
but didn't even think about that distinction when putting together this example. Thanks again for your help.

- Elliot
