Compute the Gini coefficient

Compute the Gini coefficient

Marine
Hello,

I would like to build a Lorenz curve and calculate a Gini coefficient in order to find out how many parasites the 20% most heavily infected hosts carry.

Here is my data set:

Number of parasites per host:
parasites = c(0,1,2,3,4,5,6,7,8,9,10)

Number of hosts associated with each number of parasites given above:
hosts = c(18,20,28,19,16,10,3,1,0,0,0)

To represent the Lorenz curve:
I manually calculated the cumulative percentage of parasites and hosts:

cumul_parasites <- cumsum(parasites)/max(cumsum(parasites))
cumul_hosts <- cumsum(hosts)/max(cumsum(hosts))
plot(cumul_hosts, cumul_parasites, type= "l")

From this Lorenz curve, how can I calculate the Gini coefficient with the function "gini" in R (package reldist), given that the vector "hosts" is not a vector of weights?

Thank you very much for your help.
Have a nice day
Marine



______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Compute the Gini coefficient

Erich Neuwirth

> On 30 Mar 2016, at 02:53, Marine Regis <[hidden email]> wrote:
>
> Hello,
>
> I would like to build a Lorenz curve and calculate a Gini coefficient in order to find out how many parasites the 20% most heavily infected hosts carry.
>
> Here is my data set:
>
> Number of parasites per host:
> parasites = c(0,1,2,3,4,5,6,7,8,9,10)
>
> Number of hosts associated with each number of parasites given above:
> hosts = c(18,20,28,19,16,10,3,1,0,0,0)
>
> To represent the Lorenz curve:
> I manually calculated the cumulative percentage of parasites and hosts:
>
> cumul_parasites <- cumsum(parasites)/max(cumsum(parasites))
> cumul_hosts <- cumsum(hosts)/max(cumsum(hosts))
> plot(cumul_hosts, cumul_parasites, type= "l")

Your values in hosts are frequencies, so each parasite count has to be weighted by the number of hosts that carry it. You need to calculate

cumul_hosts = cumsum(hosts)/sum(hosts)
# sum(hosts*parasites) is the total number of parasites over all hosts
cumul_parasites = cumsum(hosts*parasites)/sum(hosts*parasites)

The Lorenz curve starts at (0,0), so to draw it you need to extend these vectors:

cumul_hosts = c(0,cumul_hosts)
cumul_parasites = c(0,cumul_parasites)

plot(cumul_hosts,cumul_parasites,type="l")
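
The steps above can be collected into one small function. This is only a sketch in base R: the helper name lorenz_points and the axis labels are made up for illustration, and it assumes the values may arrive unsorted, so it orders them first.

```r
# Grouped data from the original post
parasites <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
hosts     <- c(18, 20, 28, 19, 16, 10, 3, 1, 0, 0, 0)

# Hypothetical helper: Lorenz curve points from values and their frequencies
lorenz_points <- function(values, freq) {
  ord    <- order(values)   # the curve needs the values in increasing order
  values <- values[ord]
  freq   <- freq[ord]
  data.frame(
    p = c(0, cumsum(freq) / sum(freq)),                   # cumulative share of hosts
    L = c(0, cumsum(freq * values) / sum(freq * values))  # cumulative share of parasites
  )
}

lc <- lorenz_points(parasites, hosts)

plot(lc$p, lc$L, type = "l",
     xlab = "Cumulative share of hosts",
     ylab = "Cumulative share of parasites")
abline(0, 1, lty = 2)   # line of perfect equality, for reference
```

The curve should start at (0,0), end at (1,1) and stay on or below the dashed diagonal; if it does not, the frequencies were probably not applied to the parasite counts.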


The Gini coefficient can be calculated as
library(reldist)
gini(parasites,hosts)
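
If you want a cross-check that does not depend on any package, the Gini coefficient can also be read off the Lorenz curve as one minus twice the area under it. The following is a minimal base R sketch using the trapezoid rule; gini_grouped is a made-up name, and I am assuming the uncorrected (population) form of the coefficient, so a result from a package that applies a small-sample correction could differ slightly.

```r
# Grouped data from the original post
parasites <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
hosts     <- c(18, 20, 28, 19, 16, 10, 3, 1, 0, 0, 0)

# Hypothetical helper: Gini coefficient from values and their frequencies
gini_grouped <- function(values, freq) {
  ord    <- order(values)
  values <- values[ord]
  freq   <- freq[ord]
  p <- c(0, cumsum(freq) / sum(freq))                   # cumulative share of hosts
  L <- c(0, cumsum(freq * values) / sum(freq * values)) # cumulative share of parasites
  # Trapezoid rule: area under the piecewise-linear Lorenz curve
  area <- sum(diff(p) * (head(L, -1) + tail(L, -1)) / 2)
  1 - 2 * area
}

g <- gini_grouped(parasites, hosts)
print(g)
```

Because all hosts within one group have the same parasite count, the Lorenz curve is exactly linear between the group boundaries, so the trapezoid rule introduces no approximation error here.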


If you want to check, you can "recreate" the original data (number of parasites for each host) with

num_parasites = rep(parasites,hosts)

and
gini(num_parasites)

will also give you the Gini coefficient you want.
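
The recreated per-host data also give a direct route to the figure asked for in the original question, the share of parasites carried by the 20% most infected hosts. This is a sketch in base R under one assumption of mine: "top 20%" is taken to mean the ceiling(0.2 * n) hosts with the highest counts. The names top_fraction, n_top and share_top20 are illustrative.

```r
# Grouped data from the original post
parasites <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
hosts     <- c(18, 20, 28, 19, 16, 10, 3, 1, 0, 0, 0)

# One entry per host: its number of parasites
num_parasites <- rep(parasites, hosts)
n_hosts       <- length(num_parasites)

# Assumption: the "top 20%" is rounded up to a whole number of hosts.
# round() guards against floating-point noise before taking the ceiling.
top_fraction <- 0.20
n_top        <- ceiling(round(top_fraction * n_hosts, 10))

# Parasite counts of the most infected hosts
top_counts <- sort(num_parasites, decreasing = TRUE)[seq_len(n_top)]

# Share of all parasites carried by those hosts
share_top20 <- sum(top_counts) / sum(num_parasites)
print(share_top20)
```

Ties at the cut-off do not matter for the result, because hosts that are tied carry the same number of parasites whichever of them is counted in the top group.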




Re: Compute the Gini coefficient

Achim Zeileis-4
On Wed, 30 Mar 2016, Erich Neuwirth wrote:

> Your values in hosts are frequencies. So you need to calculate
>
> cumul_hosts = cumsum(hosts)/sum(hosts)
> cumul_parasites = cumsum(hosts*parasites)/sum(hosts*parasites)

That's what I thought as well, but Marine explicitly said that 'hosts'
is _not_ a vector of weights. Hence I was confused about what this would actually mean.

Using the "ineq" package you can also do
library(ineq)
plot(Lc(parasites, hosts))


Re: Compute the Gini coefficient

Marine
Hello,

Thank you very much for your help.

How can I draw a Lorenz curve with several replications?

Here is an example with 4 replications:

hosts=c(23,31,19,10,7,7,3,
        39,40,8,3,6,2,2,
        47,17,8,10,6,11,1,
        30,30,10,0,15,15,0)
parasites=rep(seq(from=0,to=6,by=1),4)
replications=c(rep(1,7),rep(2,7),rep(3,7),rep(4,7))
test <- cbind(parasites,hosts,replications)

Should I calculate the average frequency of hosts (the mean over replications) and then calculate the cumulative percentage of hosts from that average frequency?

Thank you very much for your time.
Have a nice day.
Marine
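
One possible way to approach the replication question, offered as a sketch rather than a settled answer: draw one Lorenz curve per replication to see the spread, and overlay a single curve computed from the frequencies summed over replications. Averaging the frequencies instead of summing them gives the same curve, because the mean is just the sum divided by the number of replications and that constant cancels when the shares are normalised. The helper lorenz_points and the plotting choices are my own; a data frame is used here instead of cbind so that the columns can be addressed by name.

```r
# Data from the post above: 4 replications, parasite counts 0 to 6
hosts <- c(23, 31, 19, 10, 7, 7, 3,
           39, 40, 8, 3, 6, 2, 2,
           47, 17, 8, 10, 6, 11, 1,
           30, 30, 10, 0, 15, 15, 0)
parasites    <- rep(0:6, times = 4)
replications <- rep(1:4, each = 7)
test <- data.frame(parasites, hosts, replications)

# Hypothetical helper: Lorenz curve points from values and their frequencies
lorenz_points <- function(values, freq) {
  ord    <- order(values)
  values <- values[ord]
  freq   <- freq[ord]
  data.frame(
    p = c(0, cumsum(freq) / sum(freq)),                   # cumulative share of hosts
    L = c(0, cumsum(freq * values) / sum(freq * values))  # cumulative share of parasites
  )
}

# One curve per replication
curves <- lapply(split(test, test$replications),
                 function(d) lorenz_points(d$parasites, d$hosts))

# Pooled curve: host frequencies summed over replications for each parasite count
pooled_hosts <- tapply(test$hosts, test$parasites, sum)
pooled <- lorenz_points(as.numeric(names(pooled_hosts)), as.vector(pooled_hosts))

# Dashed diagonal first, then the individual curves, then the pooled curve
plot(0:1, 0:1, type = "l", lty = 2,
     xlab = "Cumulative share of hosts",
     ylab = "Cumulative share of parasites")
for (k in seq_along(curves)) {
  lines(curves[[k]]$p, curves[[k]]$L, col = "grey50")
}
lines(pooled$p, pooled$L, lwd = 2)
```

Whether a pooled curve is meaningful depends on whether the replications can be treated as samples from the same host population; if they cannot, the per-replication curves are probably the safer thing to report.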
________________________________________
From: Achim Zeileis <[hidden email]>
Sent: Wednesday, 30 March 2016 12:05
To: Erich Neuwirth
Cc: Marine Regis; [hidden email]
Subject: Re: [R] Compute the Gini coefficient
