Display time of PDF plots

Rich Shepard
   This may be an inappropriate forum for this question. If so, please point
me in a better direction.

   A current project includes scatter plots with thousands of points. Saved
as PDF files they display slowly using a pdf viewer or when included in the
PDF output of a LaTeX document.

   Is there a process by which these plots can be 'thinned' so they show the
same overall patterns but with fewer points so they display more quickly?

   Rasterizing them to .jpg files using 'convert' allows them to load
immediately, but the bit-mapped resolution is, of course, much lower than
the vector PDF format.

Rich

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Display time of PDF plots

Bert Gunter-2
1. Plot a random sample of the points (e.g. of rows of a matrix/dataframe
containing "x" and "y" columns)

2. See the hexbin package

3. Check out the Graphics task view on CRAN:
https://cran.r-project.org/web/views/Graphics.html
(though it may be somewhat dated by now)

4. Internet search:  e.g. on "display scatterplots with thousands of
points"
typical hit:
https://stackoverflow.com/questions/7714677/scatterplot-with-too-many-points

5. Search/Post on stats.stackexchange.com instead.
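
Suggestion 1 can be sketched in base R as follows. The data are simulated and the sample size of 2,000 is an arbitrary assumption, not something from the original project; the point is only that a random subset of rows keeps the overall pattern while the PDF holds far fewer drawing operations.

```r
# Illustrative data only: 50,000 simulated points standing in for the real plot
set.seed(123)
n <- 50000
dat <- data.frame(x = rnorm(n))
dat$y <- 0.5 * dat$x + rnorm(n)

# Keep a random sample of rows (2,000 is an arbitrary choice; tune to taste)
keep <- sample(nrow(dat), size = 2000)
thin <- dat[keep, ]

# Write both versions to temporary PDFs so their sizes can be compared
full_pdf <- tempfile(fileext = ".pdf")
thin_pdf <- tempfile(fileext = ".pdf")

pdf(full_pdf)
plot(dat$x, dat$y, pch = 20, cex = 0.3)
invisible(dev.off())

pdf(thin_pdf)
plot(thin$x, thin$y, pch = 20, cex = 0.3)
invisible(dev.off())

# File sizes in bytes; the thinned PDF should be much smaller
file.size(full_pdf)
file.size(thin_pdf)
```

Random thinning can hide sparse outlying points, so it is worth comparing the thinned plot against the full one before relying on it.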

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


Re: Display time of PDF plots

Rich Shepard
On Mon, 3 Sep 2018, Bert Gunter wrote:

> 1. Plot a random sample of the points (e.g. of rows of a matrix/dataframe
> containing "x" and "y" columns)
>
> 2. See the hexbin package
>
> 3. Check out the graphics taskview on cran:
> https://cran.r-project.org/web/views/Graphics.html
> (though it may be somewhat dated by now)
>
> 4. Internet search:  e.g. on "display scatterplots with thousands of
> points"
> typical hit:
> https://stackoverflow.com/questions/7714677/scatterplot-with-too-many-points
>
> 5. Search/Post on stats.stackexchange.com instead.

Bert,

   I did a web search without finding useful information. Probably not the
best search terms.

   Will implement your suggestions.

Thanks,

Rich

Re: Display time of PDF plots

David Carlson
In reply to this post by Rich Shepard
If the plot is being displayed on a monitor, it is being bitmapped to the resolution of the display device regardless of how you save it. Most computer monitors are about 100dpi.

If the problem is that the points are overprinting, Bert's suggestion to use hexbin() is the way to go.

If the points are not substantially overprinting, you could just save the plot in raster format using an LZW-compressed tiff() or png() at the maximum likely resolution of the display device (take zooming into account by going up to 600dpi or 1200dpi, for example). Don't use jpg since it is lossy and you will get halos when you zoom in.

You can always preserve a vector version for publication. If you have Adobe Acrobat (not Reader), you can Save As Other | Image | tiff (or png) and set the resolution before exporting.
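
A minimal sketch of writing the raster versions directly from R rather than converting a PDF afterwards. The data are simulated, and the 6 x 4 inch size and 600 dpi resolution are assumptions chosen for illustration; the calls are guarded because png and tiff support depends on how R was built.

```r
# Simulated stand-in data (not the original project's)
set.seed(1)
x <- rnorm(20000)
y <- x + rnorm(20000)

png_file  <- tempfile(fileext = ".png")
tiff_file <- tempfile(fileext = ".tiff")

# PNG is lossless, so points keep crisp edges when zooming
if (capabilities("png")) {
  png(png_file, width = 6, height = 4, units = "in", res = 600)
  plot(x, y, pch = ".")
  invisible(dev.off())
}

# TIFF with LZW compression, also lossless
if (capabilities("tiff")) {
  tiff(tiff_file, width = 6, height = 4, units = "in", res = 600,
       compression = "lzw")
  plot(x, y, pch = ".")
  invisible(dev.off())
}
```

Rendering at the target resolution in R avoids the softness that comes from rasterizing an existing PDF at a low default density.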

----------------------------
David L. Carlson
Department of Anthropology
Texas A&M University


Re: Display time of PDF plots

Rich Shepard
On Mon, 3 Sep 2018, David L Carlson wrote:

> If the plot is being displayed on a monitor, it is being bitmapped to the
> resolution of the display device regardless of how you save it. Most
> computer monitors are about 100dpi.

David,

   I'm looking at the report on the monitor. I suspect that most readers
will, too. But, some will print it.

> If the problem is that the points are overprinting, Bert's suggestion to
> use hexbin() is the way to go.

   Most look like overprints, but at the top there are discrete print
characters.

> If the points are not substantially overprinting, you could just save the
> plot in raster format using an LZW-compressed tiff() or png() at the
> maximum likely resolution of the display device (take zooming into account
> by going up to 600dpi or 1200dpi, for example). Don't use jpg since it is
> lossy and you will get halos when you zoom in.

   I used convert to produce .png images but, of course, bit-maps of plots
and text are less sharp than are vector images.

> You can always preserve a vector version for publication. If you have
> Adobe Acrobat (not Reader), you can Save As Other | Image | tiff (or png)
> and set the resolution before exporting.

   'convert', the ImageMagick tool, does this, too.

Thanks,

Rich

Re: Display time of PDF plots

Paul Murrell-2
In reply to this post by Bert Gunter-2
Hi

Another option is to just rasterize the points (but leave the rest of
the plot vector).  See ...

https://www.stat.auckland.ac.nz/~paul/Reports/rasterize/rasterize.html
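
One way to approximate this idea in base graphics is sketched below; it is not necessarily the approach taken in the linked report. The helper name plot_rasterized_points() is hypothetical, the data are simulated, and the sketch assumes the CRAN 'png' package is installed to read the bitmap back in. Axes, box, and labels are drawn as vector graphics; only the point cloud is pasted in as a bitmap.

```r
# Hypothetical helper: vector axes and labels, bitmap point layer.
# Assumes the CRAN 'png' package is available.
plot_rasterized_points <- function(x, y, file, res = 300) {
  if (!requireNamespace("png", quietly = TRUE)) {
    stop("this sketch needs the 'png' package to read the bitmap back in")
  }
  pdf(file, width = 7, height = 5)
  pdf_dev <- dev.cur()

  # Draw an empty plot first: axes, box and labels stay vector
  plot(x, y, type = "n")
  usr <- par("usr")   # user coordinates of the plot region
  pin <- par("pin")   # plot region size in inches

  # Render only the points to a margin-free PNG with the same extent and shape
  tmp <- tempfile(fileext = ".png")
  png(tmp, width = pin[1], height = pin[2], units = "in", res = res,
      bg = "transparent")
  par(mar = c(0, 0, 0, 0))
  plot(x, y, xlim = usr[1:2], ylim = usr[3:4], xaxs = "i", yaxs = "i",
       axes = FALSE, ann = FALSE, pch = 20, cex = 0.3)
  dev.off()

  # Back on the PDF device, paste the bitmap over the plot region
  dev.set(pdf_dev)
  rasterImage(png::readPNG(tmp), usr[1], usr[3], usr[2], usr[4])
  box()
  dev.off(pdf_dev)
  invisible(file)
}

# Usage with simulated data
set.seed(7)
x <- rnorm(30000)
y <- x + rnorm(30000)
out <- tempfile(fileext = ".pdf")
if (requireNamespace("png", quietly = TRUE) && capabilities("png")) {
  plot_rasterized_points(x, y, out)
}
```

The resulting PDF contains one image plus a handful of vector elements, so viewers no longer have to draw every point individually.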

Paul

Re: Display time of PDF plots

Rich Shepard
On Tue, 4 Sep 2018, Paul Murrell wrote:

> Another option is to just rasterize the points (but leave the rest of the
> plot vector). See ...
> https://www.stat.auckland.ac.nz/~paul/Reports/rasterize/rasterize.html

Paul,

   Thanks very much for the suggestion and URL.

Regards,

Rich

Re: Display time of PDF plots

Rich Shepard
In reply to this post by Rich Shepard
On Mon, 3 Sep 2018, Sorkin, John wrote:

> Might it help to take a random subset of the data and plot the sub set? If
> the relation is linear you could include a regression line obtained from
> the entire data set

John,

   I'll definitely explore this option. Thanks for the idea.
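
A rough sketch of that idea, using simulated data with an assumed linear relation (the real data may well not be linear): the regression is fitted to every observation, but only a random subset of 1,000 points is drawn.

```r
# Simulated data with a known linear relation (illustrative only)
set.seed(2018)
n <- 20000
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n)

fit <- lm(y ~ x)          # fitted on ALL points
idx <- sample(n, 1000)    # indices of the subset that is actually plotted

out <- tempfile(fileext = ".pdf")
pdf(out)
plot(x[idx], y[idx], pch = 20, cex = 0.4, xlab = "x", ylab = "y")
abline(fit, col = "red", lwd = 2)   # line reflects the full data set
invisible(dev.off())
```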

Regards,

Rich

Re: Display time of PDF plots

Rich Shepard
In reply to this post by Rich Shepard
On Mon, 3 Sep 2018, Rich Shepard wrote:

> Is there a process by which these plots can be 'thinned' so they show the
> same overall patterns but with fewer points so they display more quickly?

Bert/Paul/David/John:

   Thanks very much for the suggestions. I think an appropriate way to
illustrate the patterns is to plot the median and maximum for each month
(for all sites). That's the important information and plotting each daily
point over 13 years obscures that information.

   The dataframe is structured this way:

str(rainfall)
'data.frame': 113569 obs. of  6 variables:
  $ name    : chr  "Headworks Portland Water" "Headworks Portland Water" "Headworks Portland Water" "Headworks Portland Water" ...
  $ easting : num  2370575 2370575 2370575 2370575 2370575 ...
  $ northing: num  199338 199338 199338 199338 199338 ...
  $ elev    : num  228 228 228 228 228 228 228 228 228 228 ...
  $ sampdate: Date, format: "2005-01-01" "2005-01-02" ...
  $ prcp    : num  0.59 0.08 0.1 0 0 0.02 0.05 0.1 0 0.02 ...

   There are probably multiple ways of extracting the monthly median and
maximum 'prcp' and I don't know how to identify the appropriate one. Is
there a task view for this type of data manipulation? I've not before done
anything like this and would appreciate a pointer to where I start to learn.
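
As one possible starting point, a base R sketch of the monthly median and maximum across all sites using tapply(). The toy data frame below only mimics the 'sampdate' and 'prcp' columns shown above; its site names and values are invented for illustration.

```r
# Toy stand-in for 'rainfall' (invented values; two years of daily data)
set.seed(42)
dates <- seq(as.Date("2005-01-01"), as.Date("2006-12-31"), by = "day")
rainfall <- data.frame(
  name     = rep(c("Site A", "Site B"), each = length(dates)),
  sampdate = rep(dates, times = 2),
  prcp     = round(rexp(2 * length(dates), rate = 5), 2)
)

# Group by calendar month (all sites pooled) and summarise
mon    <- format(rainfall$sampdate, "%Y-%m")
pr.med <- tapply(rainfall$prcp, mon, median)
pr.max <- tapply(rainfall$prcp, mon, max)
mondt  <- as.Date(paste0(names(pr.med), "-01"))  # first of each month

# Two dozen points per series instead of thousands of daily values
out <- tempfile(fileext = ".pdf")
pdf(out, width = 7, height = 4)
plot(mondt, as.numeric(pr.max), type = "b", pch = 16,
     ylim = c(0, max(pr.max)), xlab = "Month", ylab = "Precipitation")
lines(mondt, as.numeric(pr.med), type = "b", pch = 1, lty = 2)
legend("topright", legend = c("monthly maximum", "monthly median"),
       pch = c(16, 1), lty = c(1, 2))
invisible(dev.off())
```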

Regards,

Rich

Re: Display time of PDF plots

R help mailing list-2
(this is somewhat a change of subject from the original question)

Rich, there are functions such as aggregate() in base R. There are also many options in CRAN packages.

But I tend to have difficulty getting them to do exactly what I want, and usually end up rolling my own.

The idea is to split the data into groups by station and month, then calculate summary stats for each group, then recombine into a new data frame.

## untested with your data, but this kind of approach works well for me
## note that this code assumes easting, northing, and elevation are in fact unique within each group
## if they are not, you will get an ERROR

## add a 'month' variable
raindf <- rainfall
raindf$mon <- format(raindf$sampdate,'%Y-%m')
 
mysum <- function(df) {
  data.frame( name=unique(df$name),
              easting=unique(df$easting),
              northing=unique(df$northing),
              elev=unique(df$elev),
              mon=unique(df$mon),
              pr.med=median(df$prcp),
              pr.max=max(df$prcp) )
}

tmpdf <- split(raindf, paste(raindf$name, raindf$mon) )

## at this point, you can check your summary stats function with, for example,
mysum(tmpdf[[1]])
mysum(tmpdf[[2]])

## when satisfied with mysum(), do this
tmpsum <- lapply(tmpdf, mysum)

## recombine
rain.by.mon <- do.call(rbind, tmpsum)

## might still want to create a numeric month to facilitate plotting
## or maybe assign each month to the first of the month, or the 15th, or end or whatever makes sense
rain.by.mon$mondt <- as.Date(paste0(rain.by.mon$mon,'-1'))
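
For comparison, the aggregate() route mentioned at the top of this message might look like the sketch below. The tiny data frame is invented so the result can be checked by hand; with the real 'rainfall' data only the aggregate() and do.call() lines would be needed. Unlike the split/lapply version, this sketch carries only 'name' and 'mon' through to the result.

```r
# Invented toy data: two stations, two days in each of two months
rainfall <- data.frame(
  name     = rep(c("A", "B"), each = 4),
  sampdate = as.Date(rep(c("2005-01-01", "2005-01-02",
                           "2005-02-01", "2005-02-02"), times = 2)),
  prcp     = c(0.1, 0.3, 0, 0.2, 1, 3, 2, 2),
  stringsAsFactors = FALSE
)

raindf <- rainfall
raindf$mon <- format(raindf$sampdate, "%Y-%m")

# One row per station and month; FUN returns two values per group,
# which aggregate() stores as a two-column matrix named 'prcp'
agg <- aggregate(prcp ~ name + mon, data = raindf,
                 FUN = function(v) c(med = median(v), max = max(v)))

# Flatten the matrix column into ordinary columns prcp.med and prcp.max
agg <- do.call(data.frame, agg)
agg
```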




--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
Lab cell 925-724-7509
 
 
