how to show percentage of individuals for two groups on histogram?

anikaM
Hello,

I have a data frame like this:
> head(a)
         FID   IID FLASER PLASER DIABDUR HBA1C ESRD   pheno
1 fam1000-03 G1000      1      1      38  10.2    1 control
2 fam1001-03 G1001      1      1      15   7.3    1 control
3 fam1003-03 G1003      1      2      17   7.0    1    case
4 fam1005-03 G1005      1      1      36   7.7    1 control
5 fam1009-03 G1009      1      1      23   7.6    1 control
6 fam1052-03 G1052      1      1      32   7.3    1 control

> dim(a)
[1] 1698    8

I am plotting a histogram with:
ggplot(a, aes(x=HBA1C, fill=pheno)) + geom_histogram(binwidth=.5,
position="dodge")

There are 848 individuals with "case" in the pheno column and 892 with
"control".

I would like the y-axis to show, instead of the count, the percentage of
individuals within each group ("case" or "control").

Please advise,
Ana

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: how to show percentage of individuals for two groups on histogram?

anikaM
The result would basically look something like the attached plot, or like
an overlay of those two plots.


[Attachment: Screen Shot 2020-05-21 at 5.49.37 PM.png (70K)]

Re: how to show percentage of individuals for two groups on histogram?

Jim Lemon-4
Hi Ana,
My apologies for the pedestrian graphics, but it may help.

# a bit of fake data
aafd<-data.frame(FID=paste0("fam",1000:2739),
 IID=paste0("G",1000:2739),FLASER=rep(1,1740),
 PLASER=c(rep(1,892),rep(2,848)),
 DIABDUR=sample(10:50,1740,TRUE),
 HBAIC=rnorm(1740,mean=7.45,sd=2),ESRD=rep(1,1740),
 pheno=c(rep("control",892),rep("case",848)))
par(mfrow=c(2,1))
casepct<-table(cut(aafd$HBAIC[aafd$pheno=="case"],breaks=0:15))
controlpct<-table(cut(aafd$HBAIC[aafd$pheno=="control"],breaks=0:15))
par(mar=c(0,4,1,2))
barpos=barplot(100*casehist,names.arg=names(casepct),col="orange",
 space=0,ylab="Percentage",xaxt="n",ylim=c(0,25))
text(mean(barpos),23,
 "Cases: n=848, nulls=26, median=7.3, mean=7.45, sd=1.96")
box()
par(mar=c(3,4,0,2))
barplot(100*controlhist,names.arg=names(controlpct),
 space=0,ylab="Percentage",col="orange",ylim=c(0,25))
text(mean(barpos),23,
 "Controls: n=892, nulls=7, median=7.3, mean=7.45, sd=1.12")
box()

Jim

Re: how to show percentage of individuals for two groups on histogram?

Jim Lemon-4
Hi Ana,
Just noticed a typo from a hasty cut-paste. Two lines should read:

casehist<-table(cut(aafd$HBAIC[aafd$pheno=="case"],breaks=0:15))
controlhist<-table(cut(aafd$HBAIC[aafd$pheno=="control"],breaks=0:15))

Jim

Re: how to show percentage of individuals for two groups on histogram?

Eric Berger
Hi Ana,
This is a very common question about ggplot.
A quick search turns up lots of hits that answer your question. Here
are a couple:
https://community.rstudio.com/t/trouble-scaling-y-axis-to-percentages-from-counts/42999
https://stackoverflow.com/questions/3695497/show-instead-of-counts-in-charts-of-categorical-variables

From reading those discussions, the following should work (untested):

ggplot(a, aes(x = HBA1C, fill = pheno)) +
  geom_histogram(aes(y = stat(density)), binwidth = 0.5) +
  scale_y_continuous(labels = scales::percent_format())
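
A note on what the density scale gives: a density histogram is scaled so that bar height times bin width sums to 1, so with a bin width of 0.5 the bar heights are twice the proportions rather than the proportions themselves. A minimal base-R sketch of that relationship, on made-up values (`x` and `breaks` are illustrative names, not the poster's data):

```r
# Made-up HbA1c-like values; purely illustrative.
set.seed(42)
x <- rnorm(500, mean = 7.5, sd = 1.5)

# Bin with width 0.5 without drawing anything.
breaks <- seq(floor(min(x)), ceiling(max(x)), by = 0.5)
h <- hist(x, breaks = breaks, plot = FALSE)

# density = count / (n * bin width), so density * width is the
# proportion of observations in each bin.
prop_from_density <- h$density * diff(h$breaks)
prop_from_counts  <- h$counts / sum(h$counts)

# The two routes to the per-bin proportion agree.
all.equal(prop_from_density, prop_from_counts)
```

Multiplying the density by the bin width is therefore one way to get from a density scale to a proportion scale.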

HTH,
Eric


Re: how to show percentage of individuals for two groups on histogram?

anikaM
In reply to this post by Jim Lemon-4
Hi Jim,

Thank you so much for getting back to me. I tried your code and got the
attached plot. I think the issue is in calculating the percentage per
group (cases or controls):

par(mfrow=c(2,1))
casehist<-table(cut(a$HBA1C[a$pheno=="case"],breaks=0:15))
controlhist<-table(cut(a$HBA1C[a$pheno=="control"],breaks=0:15))

par(mar=c(0,4,1,2))
barpos=barplot(100*casehist,names.arg=names(casehist),col="orange",
               space=0,ylab="Percentage",xaxt="n",ylim=c(0,25))
text(mean(barpos),23,
     "Cases: n=848, nulls=26, median=7.3, mean=7.45, sd=1.96")
box()
par(mar=c(3,4,0,2))
barplot(100*controlhist,names.arg=names(controlhist),
        space=0,ylab="Percentage",col="orange",ylim=c(0,25))
text(mean(barpos),23,
     "Controls: n=892, nulls=7, median=7.3, mean=7.45, sd=1.12")
box()

I can send you the whole dataset if you would like to try with it.
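
The bars in the code above are 100 times the raw bin counts, which is why they do not read as percentages; a per-group percentage divides each group's counts by that group's own total. A minimal sketch of that calculation on made-up data (`hba1c` and `pheno` below are stand-ins for `a$HBA1C` and `a$pheno`, not the real values):

```r
# Made-up data standing in for a$HBA1C and a$pheno; purely illustrative.
set.seed(1)
hba1c <- c(rnorm(892, mean = 7.3, sd = 1.1), rnorm(848, mean = 7.5, sd = 2))
pheno <- rep(c("control", "case"), times = c(892, 848))

# Counts per bin and group: rows are bins, columns are groups.
counts <- table(cut(hba1c, breaks = 0:15), pheno)

# Divide each column by its own total to get per-group percentages.
pct <- 100 * prop.table(counts, margin = 2)

# Each group's percentages add up to 100.
colSums(pct)
```

Each column of `pct` could then be passed to `barplot()` in place of `100*casehist` and `100*controlhist`.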

[Attachment: Screen Shot 2020-05-22 at 9.42.01 AM.png (117K)]

Re: how to show percentage of individuals for two groups on histogram?

anikaM
In reply to this post by Eric Berger
Hi Eric,

Thank you for getting back to me. I tried those solutions, but they
don't give percentages per group. If I run

ggplot(data = subset(a, !is.na(pheno)), aes(x = HBA1C, fill = pheno)) +
  geom_histogram(aes(y = stat(density)), binwidth = 0.5) +
  scale_y_continuous(labels = scales::percent_format())

I get the attached plot, whereas my results should look more like the
plot here:
https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/variable.cgi?study_id=phs000018.v2.p1&phv=19980&phd=154&pha=2864&pht=62&phvf=&phdf=&phaf=&phtf=&dssp=1&consent=&temp=1


[Attachment: Screen Shot 2020-05-22 at 9.42.21 AM.png (76K)]

Re: how to show percentage of individuals for two groups on histogram?

Jim Lemon-4
In reply to this post by anikaM
Hi Ana,
I think this is what you want in the panel style of plot. Let me know
if not, or if I have calculated the wrong percentages. The overlaid
histograms definitely use a different calculation.

amsdf<-read.table("pheno_m1_plot",header=TRUE,stringsAsFactors=FALSE)
dim(amsdf)
# find the right breaks for your "cut"
casen<-table(cut(amsdf$HBA1C[amsdf$pheno==2],breaks=3:14))
controln<-table(cut(amsdf$HBA1C[amsdf$pheno==1],breaks=3:14))
# save yourself some typing
HBA1C2<-amsdf$HBA1C[amsdf$pheno==2]
HBA1C1<-amsdf$HBA1C[amsdf$pheno==1]
ncases<-length(HBA1C2)
ncontrols<-length(HBA1C1)
split.screen(matrix(c(0,1,0.6,1,0,1,0,0.6),nrow=2,byrow=TRUE))
par(mar=c(0,4,1,2))
barpos=barplot(100*casen/ncases,names.arg=NA,col="orange",
 space=0,ylab="Percentage",xaxt="n",ylim=c(0,27))
case_text<-sprintf(
 "Cases: n=%d, nulls=%d, median=%.1f, mean=%.1f, sd=%.1f",
 length(HBA1C2),sum(is.na(HBA1C2)),round(median(HBA1C2,na.rm=TRUE),1),
 round(mean(HBA1C2,na.rm=TRUE),1),round(sd(HBA1C2,na.rm=TRUE),1))
text(mean(barpos),25,case_text)
box()
screen(2)
par(mar=c(4,4,0,2))
barplot(100*controln/ncontrols,names.arg=NA,
 space=0,ylab="Percentage",col="orange",ylim=c(0,34))
control_text<-sprintf(
 "Controls: n=%d, nulls=%d, median=%.1f, mean=%.1f, sd=%.1f",
 length(HBA1C1),sum(is.na(HBA1C1)),round(median(HBA1C1,na.rm=TRUE),1),
 round(mean(HBA1C1,na.rm=TRUE),1),round(sd(HBA1C1,na.rm=TRUE),1))
text(mean(barpos),32,control_text)
box()
library(plotrix)
staxlab(1,at=barpos,labels=names(casen))
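
One detail in the code above is worth checking against the intended definition of "percentage": `length()` counts missing values, so the bars are percentages of everyone in the group, nulls included, and they sum to less than 100 when HBA1C has missing values. A minimal sketch of the two possible denominators, on a made-up vector (`x` is hypothetical, not from the data set):

```r
# Made-up values with two missing entries; purely illustrative.
x <- c(5.2, 6.8, 7.1, 7.4, 8.9, NA, NA, 10.3)

# table() drops the NA level by default, so only 6 values are binned.
binned <- table(cut(x, breaks = 3:14))

n_all     <- length(x)       # 8: counts the NAs
n_present <- sum(!is.na(x))  # 6: non-missing values only

sum(100 * binned / n_all)      # 75: bars cover only those with a value
sum(100 * binned / n_present)  # 100
```

Which denominator is appropriate depends on whether the nulls should count as part of the group.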

Jim

On Sat, May 23, 2020 at 9:01 AM Ana Marija <[hidden email]> wrote:

>
> Hi Jim,
>
> My data is attached. It is most kind of you for looking into this!
>
> Cheers,
> Ana
>
> On Fri, May 22, 2020 at 5:49 PM Jim Lemon <[hidden email]> wrote:
> >
> > Hi Ana,
> > As I had very little idea what your data looked like, what I made up
> > obviously didn't fit in the plot that well. If you can send the data I
> > can make a better attempt. The other thing is whether you want a plot
> > with two adjacent panels (what I sent) or overlaid histograms (what
> > Eric sent). Let me know.
> >
> > Jim

[Attachment: ams1.png (24K)]