How to create a readable plot in R with 10000+ values in a dataframe

Rit
How can I create a readable and legible plot in R with 10k+ values? I have a
data frame with 17298 records. There are two columns: Machine Name (character)
and Region (character). I want to create a readable plot with Region on the x
axis and Machine Name on the y axis. How do I do that using ggplot or any other
way? Please help.

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: How to create a readable plot in R with 10000+ values in a dataframe

Duncan Murdoch
On 23/07/2020 2:11 p.m., Ritwik Mohapatra wrote:
> How to create a readable and legible plot in R with 10k+ values.I have a
> dataframe with 17298 records.There are two columns:Machine Name(Character)
> and Region(Character).So i want to create a readable plot with region in x
> axis and machine name in y axis.How do i do that using ggplot or any other
> way.Please help.

Can you point to the URL of a plot online that is similar to what you
want?  I can't imagine a way to show 17298 character records in a graph
in any useful way.

Duncan Murdoch

Re: How to create a readable plot in R with 10000+ values in a dataframe

Martin Maechler
>>>>> Ritwik Mohapatra
>>>>>     on Thu, 23 Jul 2020 23:41:57 +0530 writes:

    > How to create a readable and legible plot in R with 10k+ values.I have a
    > dataframe with 17298 records.There are two columns:Machine Name(Character)
    > and Region(Character).So i want to create a readable plot with region in x
    > axis and machine name in y axis.How do i do that using ggplot or any other
    > way.Please help.

Good answers to this question will depend very much on how many
'Machine' and 'Region' levels there are.

(And this is a case where, in my opinion, it would be *MUCH* more
 useful to have 'factor' instead of 'character', if only so that

  str(<data>)
or
  summary(<data>)

would give useful/relevant information.)
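A minimal sketch of this point, assuming the 15 sample rows Ritwik posted later in the thread stand in for the full data (the data frame name `machines` is hypothetical). With character columns, summary() reports only length and class; after converting the columns to factors, str() and summary() show the number of levels and the counts per level:

```r
# Hypothetical `machines` data frame: the 15 sample rows Ritwik posted,
# kept as character columns, as in his original data.
machines <- data.frame(
  Machine.Name = c(
    "0460-EPBS1.sga-res.com", "04821-EABS1.sga-res.com",
    "10429-EDABS1.sga-res.com", "1042619-ESWEBS1.sga-res.com",
    "ABE-L-98769.europe.shell.com", "AB-L-98769.europe.shell.com",
    "AB-L-98769.europe.shell.com", "ABE-L-98769.europe.shell.com (2)",
    "ABE-L-98769.europe.shell.com (2)", "ABE-L-98840.europe.shell.com",
    "AB-L-98840.europe.shell.com", "ABE-L-98840.europe.shell.com",
    "AB-L-98854.europe.shell.com", "ABE-L-98854.europe.shell.com",
    "ABE-L-98862.europe.shell.com"),
  Region = c(rep("Europe", 4), "Americas", "APAC", "Europe", "Americas",
             "Europe", "Americas", "APAC", "Europe", "Americas", "Europe",
             "Americas"),
  stringsAsFactors = FALSE
)

summary(machines)   # character columns: only Length / Class / Mode

# Convert every column to a factor; `machines[] <-` keeps the data.frame class.
machines[] <- lapply(machines, factor)

str(machines)       # now shows how many levels each column has
summary(machines)   # now shows the counts per level
nlevels(machines$Machine.Name)
nlevels(machines$Region)
```

On the sample this gives 12 distinct machine names and 3 regions; on the full data, these two numbers are exactly what decides which kind of plot is feasible.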

--
One possibility for a somewhat cute plot is a "good ole"
sunflower plot (base graphics, but the idea should be easily
transferable to grid-based graphics such as ggplot2):

  help(sunflowerplot)
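A sketch of a sunflower plot with base graphics. Plotting Region against every machine name would still need one label per machine, so this sketch assumes a coarser, hypothetical grouping: the hostname "domain" (the text after the first dot), computed from the 15 sample rows Ritwik posted later in the thread. The name `machines` and the `Domain` column are illustrative assumptions, not part of the original data:

```r
# Hypothetical `machines` data frame: the 15 sample rows Ritwik posted.
machines <- data.frame(
  Machine.Name = c(
    "0460-EPBS1.sga-res.com", "04821-EABS1.sga-res.com",
    "10429-EDABS1.sga-res.com", "1042619-ESWEBS1.sga-res.com",
    "ABE-L-98769.europe.shell.com", "AB-L-98769.europe.shell.com",
    "AB-L-98769.europe.shell.com", "ABE-L-98769.europe.shell.com (2)",
    "ABE-L-98769.europe.shell.com (2)", "ABE-L-98840.europe.shell.com",
    "AB-L-98840.europe.shell.com", "ABE-L-98840.europe.shell.com",
    "AB-L-98854.europe.shell.com", "ABE-L-98854.europe.shell.com",
    "ABE-L-98862.europe.shell.com"),
  Region = c(rep("Europe", 4), "Americas", "APAC", "Europe", "Americas",
             "Europe", "Americas", "APAC", "Europe", "Americas", "Europe",
             "Americas"),
  stringsAsFactors = FALSE
)

# Hypothetical grouping (an assumption, not something from the thread):
# drop a trailing " (2)"-style suffix, then keep the part of the hostname
# after the first dot as a coarse "domain".
host <- sub(" \\([0-9]+\\)$", "", machines$Machine.Name)
machines$Domain <- factor(sub("^[^.]*\\.", "", host))
machines$Region <- factor(machines$Region)

# sunflowerplot() needs numeric coordinates, so plot the factor codes and
# relabel the axes. Each distinct (x, y) pair gets one point; a pair that
# occurs k > 1 times is drawn as a "sunflower" with k petals.
res <- sunflowerplot(as.integer(machines$Region), as.integer(machines$Domain),
                     xlab = "Region", ylab = "", xaxt = "n", yaxt = "n",
                     xlim = c(0.5, nlevels(machines$Region) + 0.5),
                     ylim = c(0.5, nlevels(machines$Domain) + 0.5))
axis(1, at = seq_len(nlevels(machines$Region)), labels = levels(machines$Region))
axis(2, at = seq_len(nlevels(machines$Domain)), labels = levels(machines$Domain))

res$number  # how many rows fall on each plotted point
```

In the sample, the four Region/Domain pairs occur 4, 5, 2 and 4 times, so the plot shows four sunflowers instead of 15 overlapping points.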


Martin Maechler
ETH Zurich

Re: How to create a readable plot in R with 10000+ values in a dataframe

Rit
Hi All,

These are the two pieces of code I have used so far:

library(ggplot2)

# Point size scaled by the number of rows at each Region/Machine.Name pair
ggplot(df3_machine_region, aes(Region, Machine.Name)) +
  geom_count()

# Randomly jittered points, coloured by Region
ggplot(df3_machine_region, aes(Region, Machine.Name)) +
  geom_jitter(aes(colour = Region))

I have to present the plot to my stakeholders, which is why it needs to be
readable and legible.

There would be approximately 10k+ values (at most) for the machine and region
combinations.

I have attached the output plots for your reference. Please find below a
snapshot of the data.

|Machine.Name|Region|
|---|---|
|0460-EPBS1.sga-res.com|Europe|
|04821-EABS1.sga-res.com|Europe|
|10429-EDABS1.sga-res.com|Europe|
|1042619-ESWEBS1.sga-res.com|Europe|
|ABE-L-98769.europe.shell.com|Americas|
|AB-L-98769.europe.shell.com|APAC|
|AB-L-98769.europe.shell.com|Europe|
|ABE-L-98769.europe.shell.com (2)|Americas|
|ABE-L-98769.europe.shell.com (2)|Europe|
|ABE-L-98840.europe.shell.com|Americas|
|AB-L-98840.europe.shell.com|APAC|
|ABE-L-98840.europe.shell.com|Europe|
|AB-L-98854.europe.shell.com|Americas|
|ABE-L-98854.europe.shell.com|Europe|
|ABE-L-98862.europe.shell.com|Americas|

Regards,
Ritwik

On Fri, Jul 24, 2020 at 6:05 PM Martin Maechler <[hidden email]>
wrote:

Attachments: 1st Plot.png (39K), 2nd Plot.png (47K)

Re: How to create a readable plot in R with 10000+ values in a dataframe

Jim Lemon
Hi Ritwik,
I haven't seen any further answers to your request, so I'll make a
suggestion. I don't think there is any sensible way to illustrate that
many data points on a single plot. I would try to segment the data by
machine type or similar and produce a number of separate plots.
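One way to follow this suggestion is to page through the machines: every plot keeps the same Region axis but shows only a limited number of machine names. A minimal sketch with base graphics, assuming the 15 sample rows Ritwik posted stand in for the full data; `machines`, `paginate()` and `page_size` are hypothetical names, and with the full data a much larger page size would be needed:

```r
# Hypothetical `machines` data frame: the 15 sample rows Ritwik posted.
machines <- data.frame(
  Machine.Name = c(
    "0460-EPBS1.sga-res.com", "04821-EABS1.sga-res.com",
    "10429-EDABS1.sga-res.com", "1042619-ESWEBS1.sga-res.com",
    "ABE-L-98769.europe.shell.com", "AB-L-98769.europe.shell.com",
    "AB-L-98769.europe.shell.com", "ABE-L-98769.europe.shell.com (2)",
    "ABE-L-98769.europe.shell.com (2)", "ABE-L-98840.europe.shell.com",
    "AB-L-98840.europe.shell.com", "ABE-L-98840.europe.shell.com",
    "AB-L-98854.europe.shell.com", "ABE-L-98854.europe.shell.com",
    "ABE-L-98862.europe.shell.com"),
  Region = c(rep("Europe", 4), "Americas", "APAC", "Europe", "Americas",
             "Europe", "Americas", "APAC", "Europe", "Americas", "Europe",
             "Americas"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: give each distinct machine name a page number so
# that no page holds more than `page_size` machines, then split the rows.
paginate <- function(df, page_size = 40) {
  ms <- sort(unique(df$Machine.Name))
  page <- ceiling(match(df$Machine.Name, ms) / page_size)
  split(df, page)
}

pages <- paginate(machines, page_size = 5)  # 5 only because the sample is tiny
regions <- sort(unique(machines$Region))

# One small Region x Machine.Name plot per page, same x axis on every page.
op <- par(mar = c(3, 16, 2, 1))
for (i in seq_along(pages)) {
  p <- pages[[i]]
  ms <- sort(unique(p$Machine.Name))
  plot(match(p$Region, regions), match(p$Machine.Name, ms), pch = 16,
       xlim = c(0.5, length(regions) + 0.5), ylim = c(0.5, length(ms) + 0.5),
       xaxt = "n", yaxt = "n", xlab = "", ylab = "",
       main = sprintf("Machines, page %d of %d", i, length(pages)))
  axis(1, at = seq_along(regions), labels = regions)
  axis(2, at = seq_along(ms), labels = ms, las = 1, cex.axis = 0.7)
}
par(op)
```

Paging by machine name is just one possible segmentation; grouping by machine type, as suggested above, would replace the page number with that grouping variable.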

Jim

On Fri, Jul 24, 2020 at 11:34 PM Ritwik Mohapatra <[hidden email]> wrote:

Re: How to create a readable plot in R with 10000+ values in a dataframe

Carlos Ortega
Hello Ritwik,

There is another possibility.

You can count (cross-tabulate) the number of elements for each Region and Machine
(with the table() function) and represent this table with the geom_tile() function.
With this you will get the equivalent of a heatmap, which will give you a good
sense of which combination of Region/Machine prevails.
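A minimal sketch of this suggestion, using the 15 sample rows Ritwik posted (the data frame name `machines` is hypothetical). table() counts the rows per Region/Machine.Name pair, as.data.frame() turns the table into long format, and geom_tile() draws one tile per cell; the plotting step runs only if ggplot2 is installed:

```r
# Hypothetical `machines` data frame: the 15 sample rows Ritwik posted.
machines <- data.frame(
  Machine.Name = c(
    "0460-EPBS1.sga-res.com", "04821-EABS1.sga-res.com",
    "10429-EDABS1.sga-res.com", "1042619-ESWEBS1.sga-res.com",
    "ABE-L-98769.europe.shell.com", "AB-L-98769.europe.shell.com",
    "AB-L-98769.europe.shell.com", "ABE-L-98769.europe.shell.com (2)",
    "ABE-L-98769.europe.shell.com (2)", "ABE-L-98840.europe.shell.com",
    "AB-L-98840.europe.shell.com", "ABE-L-98840.europe.shell.com",
    "AB-L-98854.europe.shell.com", "ABE-L-98854.europe.shell.com",
    "ABE-L-98862.europe.shell.com"),
  Region = c(rep("Europe", 4), "Americas", "APAC", "Europe", "Americas",
             "Europe", "Americas", "APAC", "Europe", "Americas", "Europe",
             "Americas"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per Region x Machine.Name cell, with its count
# in the Freq column (0 where the combination never occurs).
counts <- as.data.frame(table(Region = machines$Region,
                              Machine.Name = machines$Machine.Name))

# Draw the table as a heatmap with geom_tile(), if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(counts, aes(x = Region, y = Machine.Name, fill = Freq)) +
    geom_tile(colour = "white") +
    scale_fill_gradient(low = "grey95", high = "steelblue") +
    labs(fill = "Count")
  print(p)
}
```

With thousands of machine names the y axis labels would again be unreadable, so in practice this is probably most useful after grouping or filtering the machines.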

Here you can get an example of how to use it:

   - https://www.r-graph-gallery.com/79-levelplot-with-ggplot2.html

And, just in case you have to represent numeric values (a numeric scatter
plot), there is an excellent way to graph that with this package, without
leaving the ggplot ecosystem:

https://github.com/LKremer/ggpointdensity
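For the numeric case, a sketch with ggpointdensity, assuming both ggplot2 and ggpointdensity are installed (the plot is skipped otherwise); the data below are random numbers generated only for illustration, since the thread's own data are character:

```r
# Hypothetical numeric data: 10000 random (x, y) pairs.
set.seed(42)
pts <- data.frame(x = rnorm(10000), y = rnorm(10000))

# geom_pointdensity() colours each point by the number of neighbouring
# points, so dense regions stay visible despite heavy overplotting.
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggpointdensity", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(pts, aes(x, y)) +
    ggpointdensity::geom_pointdensity() +
    scale_colour_viridis_c()
  print(p)
}
```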

Thanks,
Carlos Ortega.

On Wed, Jul 29, 2020 at 11:31 AM Jim Lemon <[hidden email]> wrote:

Re: How to create a readable plot in R with 10000+ values in a dataframe

Dr Eberhard Lisse
I always find two things helpful

1) RTFM

2) Asking myself what information do I want to convey
   before thinking about how to do that.

From the message below, I cannot understand what you want to tell
your audience.

I don't think it's helpful trying to read 17298 names on a
plot, so maybe show the counts by region, perhaps with another
grouping.

From the data sample in another post, one could maybe group/count
the host(names) and then plot them on a world map with a colour
scale showing the numbers.
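A minimal sketch of counting by region with one extra grouping, assuming the 15 sample rows Ritwik posted stand in for the full data. The `machines` name and the `Domain` grouping (hostname text after the first dot) are hypothetical choices made for illustration:

```r
# Hypothetical `machines` data frame: the 15 sample rows Ritwik posted.
machines <- data.frame(
  Machine.Name = c(
    "0460-EPBS1.sga-res.com", "04821-EABS1.sga-res.com",
    "10429-EDABS1.sga-res.com", "1042619-ESWEBS1.sga-res.com",
    "ABE-L-98769.europe.shell.com", "AB-L-98769.europe.shell.com",
    "AB-L-98769.europe.shell.com", "ABE-L-98769.europe.shell.com (2)",
    "ABE-L-98769.europe.shell.com (2)", "ABE-L-98840.europe.shell.com",
    "AB-L-98840.europe.shell.com", "ABE-L-98840.europe.shell.com",
    "AB-L-98854.europe.shell.com", "ABE-L-98854.europe.shell.com",
    "ABE-L-98862.europe.shell.com"),
  Region = c(rep("Europe", 4), "Americas", "APAC", "Europe", "Americas",
             "Europe", "Americas", "APAC", "Europe", "Americas", "Europe",
             "Americas"),
  stringsAsFactors = FALSE
)

# Hypothetical grouping (an assumption): strip a " (2)"-style suffix and
# keep the part of the hostname after the first dot.
host <- sub(" \\([0-9]+\\)$", "", machines$Machine.Name)
machines$Domain <- sub("^[^.]*\\.", "", host)

by_region <- table(Region = machines$Region)
by_region_domain <- table(Domain = machines$Domain, Region = machines$Region)
by_region
by_region_domain

# Stacked bars: one bar per region, split by the extra grouping.
barplot(by_region_domain, legend.text = TRUE,
        xlab = "Region", ylab = "Number of records")
```

A handful of bars like these stays readable no matter how many machines there are, because only the counts are drawn, not the names.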


el

On 2020-07-23 20:11 , Ritwik Mohapatra wrote:
> How to create a readable and legible plot in R with 10k+ values.I have a
> dataframe with 17298 records.There are two columns:Machine Name(Character)
> and Region(Character).So i want to create a readable plot with region in x
> axis and machine name in y axis.How do i do that using ggplot or any other
> way.Please help.


--
If you want to email me, replace nospam with el

Re: How to create a readable plot in R with 10000+ values in a dataframe

Abby Spurdle
On Sat, Jul 25, 2020 at 12:40 AM Martin Maechler
<[hidden email]> wrote:
> Good answers to this question will depend very much on how many
> 'Machine' and 'Region' levels there are.

I second that.
And unless I missed something, the OP hasn't answered this question as such.
But "10k+" combinations does imply around 100 levels each.

Another important question is: are the combinations unique or not?
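This question can be answered directly in R. A minimal sketch, assuming the 15 sample rows Ritwik posted stand in for the full data (the name `machines` is hypothetical):

```r
# Hypothetical `machines` data frame: the 15 sample rows Ritwik posted.
machines <- data.frame(
  Machine.Name = c(
    "0460-EPBS1.sga-res.com", "04821-EABS1.sga-res.com",
    "10429-EDABS1.sga-res.com", "1042619-ESWEBS1.sga-res.com",
    "ABE-L-98769.europe.shell.com", "AB-L-98769.europe.shell.com",
    "AB-L-98769.europe.shell.com", "ABE-L-98769.europe.shell.com (2)",
    "ABE-L-98769.europe.shell.com (2)", "ABE-L-98840.europe.shell.com",
    "AB-L-98840.europe.shell.com", "ABE-L-98840.europe.shell.com",
    "AB-L-98854.europe.shell.com", "ABE-L-98854.europe.shell.com",
    "ABE-L-98862.europe.shell.com"),
  Region = c(rep("Europe", 4), "Americas", "APAC", "Europe", "Americas",
             "Europe", "Americas", "APAC", "Europe", "Americas", "Europe",
             "Americas"),
  stringsAsFactors = FALSE
)

key <- machines[, c("Machine.Name", "Region")]
n_dup <- sum(duplicated(key))   # rows repeating an earlier combination
n_combos <- nrow(unique(key))   # distinct Machine.Name/Region pairs
cat("duplicated rows:", n_dup, "\ndistinct combinations:", n_combos, "\n")

# In how many regions does each machine name appear?
regions_per_machine <- tapply(machines$Region, machines$Machine.Name,
                              function(r) length(unique(r)))
table(regions_per_machine)
```

In the sample every Machine.Name/Region pair is distinct, and three of the twelve machine names appear in two regions.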

It would be possible to create either (approximately):
    a 100x100 heatmap of boolean values, for unique combinations, or
    a 100x100 heatmap of counts (or density), for non-unique combinations.

But unless there's some meaningful order to the levels, the resulting
plot may end up looking like a $3 pizza.
I'm unable to comment on its possible exploratory value, but I doubt that
this is a good approach for presentation purposes.
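A sketch of such a heatmap in base graphics, assuming that ordering the levels by frequency counts as a "meaningful order" (one possible choice among many) and using the 15 sample rows Ritwik posted in place of the full data; `machines` and `by_freq()` are hypothetical names:

```r
# Hypothetical `machines` data frame: the 15 sample rows Ritwik posted.
machines <- data.frame(
  Machine.Name = c(
    "0460-EPBS1.sga-res.com", "04821-EABS1.sga-res.com",
    "10429-EDABS1.sga-res.com", "1042619-ESWEBS1.sga-res.com",
    "ABE-L-98769.europe.shell.com", "AB-L-98769.europe.shell.com",
    "AB-L-98769.europe.shell.com", "ABE-L-98769.europe.shell.com (2)",
    "ABE-L-98769.europe.shell.com (2)", "ABE-L-98840.europe.shell.com",
    "AB-L-98840.europe.shell.com", "ABE-L-98840.europe.shell.com",
    "AB-L-98854.europe.shell.com", "ABE-L-98854.europe.shell.com",
    "ABE-L-98862.europe.shell.com"),
  Region = c(rep("Europe", 4), "Americas", "APAC", "Europe", "Americas",
             "Europe", "Americas", "APAC", "Europe", "Americas", "Europe",
             "Americas"),
  stringsAsFactors = FALSE
)

# Order levels by frequency, most common first (an assumption about what a
# "meaningful order" might be), so busy rows and columns sit together.
by_freq <- function(x) factor(x, levels = names(sort(table(x), decreasing = TRUE)))
machines$Region <- by_freq(machines$Region)
machines$Machine.Name <- by_freq(machines$Machine.Name)

# Count matrix: rows = machines, columns = regions (0 = combination absent).
m <- table(machines$Machine.Name, machines$Region)

# Base-graphics heatmap; image() expects z with dim length(x) by length(y).
op <- par(mar = c(3, 16, 1, 1))
image(x = seq_len(ncol(m)), y = seq_len(nrow(m)), z = t(unclass(m)),
      col = gray.colors(max(m) + 1, start = 0.95, end = 0.3),
      axes = FALSE, xlab = "", ylab = "")
axis(1, at = seq_len(ncol(m)), labels = colnames(m))
axis(2, at = seq_len(nrow(m)), labels = rownames(m), las = 1, cex.axis = 0.7)
par(op)
```

For unique combinations the matrix holds only 0s and 1s, which gives the boolean heatmap; with repeated combinations the same code shades cells by count.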

If the goal was some sort of ranking, a textual summary may work better...?
Or you could plot relevant subsets of the data...

Re: How to create a readable plot in R with 10000+ values in a dataframe

Jim Lemon
Hi Ritwik,
Carlos made an excellent suggestion and there are at least two ways to
plot "machine" and "region" as the cells in a 2D matrix and then add
two more variables (say count and price) as the attributes of each
cell. Is the data you are using publicly available? If so, a
demonstration of this would not be difficult to program.

Jim

On Fri, Jul 24, 2020 at 9:55 PM Ritwik Mohapatra <[hidden email]> wrote:

Re: How to create a readable plot in R with 10000+ values in a dataframe

Rit
Hi All,

Thanks for all the suggestions and help. I have gone for simpler plots with
fewer values for the demonstration, which served the purpose.

Regards,
Ritwik

On Thu, 30 Jul, 2020, 22:00 Dr Eberhard Lisse, <[hidden email]> wrote:
