# Compute the Gini coefficient


## Compute the Gini coefficient

Hello,

I would like to build a Lorenz curve and calculate a Gini coefficient in order to find how many parasites the top 20% most infected hosts support.

Here is my data set:

Number of parasites per host:

    parasites = c(0,1,2,3,4,5,6,7,8,9,10)

Number of hosts associated with each number of parasites given above:

    hosts = c(18,20,28,19,16,10,3,1,0,0,0)

To represent the Lorenz curve, I manually calculated the cumulative percentage of parasites and hosts:

    cumul_parasites <- cumsum(parasites)/max(cumsum(parasites))
    cumul_hosts <- cumsum(hosts)/max(cumsum(hosts))
    plot(cumul_hosts, cumul_parasites, type= "l")

From this Lorenz curve, how can I calculate the Gini coefficient with the function "gini" in R (package reldist), given that the vector "hosts" is not a vector of weights?

Thank you very much for your help.
Have a nice day
Marine

[hidden email] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

## Re: Compute the Gini coefficient

> On 30 Mar 2016, at 02:53, Marine Regis <[hidden email]> wrote:
>
> Hello,
>
> I would like to build a Lorenz curve and calculate a Gini coefficient in order to find how many parasites the top 20% most infected hosts support.
>
> Here is my data set:
>
> Number of parasites per host:
>     parasites = c(0,1,2,3,4,5,6,7,8,9,10)
>
> Number of hosts associated with each number of parasites given above:
>     hosts = c(18,20,28,19,16,10,3,1,0,0,0)
>
> To represent the Lorenz curve:
> I manually calculated the cumulative percentage of parasites and hosts:
>
>     cumul_parasites <- cumsum(parasites)/max(cumsum(parasites))
>     cumul_hosts <- cumsum(hosts)/max(cumsum(hosts))
>     plot(cumul_hosts, cumul_parasites, type= "l")

Your values in hosts are frequencies, so each parasite count has to be weighted by the number of hosts that carry it. You need to calculate

    cumul_hosts = cumsum(hosts)/sum(hosts)
    cumul_parasites = cumsum(hosts*parasites)/sum(hosts*parasites)

The Lorenz curve starts at (0,0), so to draw it, you need to extend these vectors:

    cumul_hosts = c(0,cumul_hosts)
    cumul_parasites = c(0,cumul_parasites)
    plot(cumul_hosts,cumul_parasites,type="l")

The Gini coefficient can be calculated as

    library(reldist)
    gini(parasites,hosts)

If you want to check, you can "recreate" the original data (number of parasites for each host) with

    num_parasites = rep(parasites,hosts)

and

    gini(num_parasites)

will also give you the Gini coefficient you want.

> From this Lorenz curve, how can I calculate the Gini coefficient with the function "gini" in R (package reldist), given that the vector "hosts" is not a vector of weights?
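As a cross-check that does not depend on reldist, here is a minimal base-R sketch of the same steps. It also works out the share of parasites carried by the 20% most infected hosts, which was the original goal. The names `p`, `L`, `gini_manual`, `n_top` and `top20_share` are illustrative and not from the thread. The trapezoid formula assumes `parasites` is sorted in increasing order, as it is here.

```r
# Frequency table from the thread
parasites <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
hosts     <- c(18, 20, 28, 19, 16, 10, 3, 1, 0, 0, 0)

# Lorenz curve points, starting at (0, 0)
p <- c(0, cumsum(hosts) / sum(hosts))                          # cumulative share of hosts
L <- c(0, cumsum(hosts * parasites) / sum(hosts * parasites))  # cumulative share of parasites

# Gini = 1 - 2 * (area under the Lorenz curve), with the area
# computed by the trapezoid rule over consecutive curve points
gini_manual <- 1 - sum(diff(p) * (head(L, -1) + tail(L, -1)))

# Share of parasites carried by the 20% most infected hosts
num_parasites <- rep(parasites, hosts)        # one entry per host
n_top <- round(0.2 * length(num_parasites))   # 23 of the 115 hosts
top20_share <- sum(sort(num_parasites, decreasing = TRUE)[seq_len(n_top)]) /
  sum(num_parasites)

print(gini_manual)   # about 0.397
print(top20_share)   # 111/272, about 0.408
```

If reldist is installed, `gini(parasites, hosts)` should agree with `gini_manual` up to floating-point error, since both measure the area between the diagonal and the same piecewise-linear Lorenz curve.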