

Hello,
I am trying to perform a Kolmogorov–Smirnov test to assess the difference between a distribution and samples drawn proportionally to size of different sizes. I managed to compute the Kolmogorov–Smirnov distance but I am lost with the pvalue. I have looked into the ks.test function unsuccessfully. Can anyone help me with computing pvalues for a twotailed test?
Below a simplified version of my code.
Thanks in advance.
Gianluca
library(spatstat)
#reference distribution
d_1 < sort(rpois(1000, 500))
p_1 < d_1/sum(d_1)
m_1 < data.frame(d_1, p_1)
#data frame to store the values of the siumation
d_stat < data.frame(1:1000, NA, NA)
names(d_stat) < c("sample_size", "ks_distance", "p_value")
#simulation
for (i in 1:1000) {
#sample from the reference distribution
m_2 <m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
m_2 <m_2[order(m_2$d_1),]
d_2 < m_2$d_1
p_2 < m_2$p_1
#weighted ecdf for the reference distribution and the sample
f_d_1 < ewcdf(d_1, normalise=F)
f_d_2 < ewcdf(d_2, 1/p_2, normalise=F, adjust=1/length(d_2))
#kolmogorovsmirnov distance
d_stat[i,2] < max(abs(f_d_1(d_2)  f_d_2(d_2)))
}
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Hello,
I don't have the algorithms at hand but the KS statistic calculation is
more complicated than your max/abs difference.
Anyway, why not use ks.test? it's not that difficult:
set.seed(1234)
#reference distribution
d_1 < sort(rpois(1000, 500))
p_1 < d_1/sum(d_1)
m_1 < data.frame(d_1, p_1)
#data frame to store the values of the simulation
d_stat < data.frame(1:1000, NA, NA)
names(d_stat) < c("sample_size", "ks_distance", "p_value")
#simulation
for (i in 1:1000) {
#sample from the reference distribution
m_2 <m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
d_2 < m_2$d_1
ht < ks.test(d_1, d_2)
#kolmogorovsmirnov distance
d_stat[i, 2] < ht$statistic
d_stat[i, 3] < ht$p.value
}
hist(d_stat[, 2])
hist(d_stat[, 3])
Note that d_2 is not sorted, but the results are equal in the sense of
function identical(), meaning they are *exactly* the same. Why shouldn't
they?
Hope this helps,
Rui Barradas
Às 17:06 de 05/09/19, Boo G. escreveu:
> Hello,
>
> I am trying to perform a Kolmogorov–Smirnov test to assess the difference between a distribution and samples drawn proportionally to size of different sizes. I managed to compute the Kolmogorov–Smirnov distance but I am lost with the pvalue. I have looked into the ks.test function unsuccessfully. Can anyone help me with computing pvalues for a twotailed test?
>
> Below a simplified version of my code.
>
> Thanks in advance.
> Gianluca
>
>
> library(spatstat)
>
> #reference distribution
> d_1 < sort(rpois(1000, 500))
> p_1 < d_1/sum(d_1)
> m_1 < data.frame(d_1, p_1)
>
> #data frame to store the values of the siumation
> d_stat < data.frame(1:1000, NA, NA)
> names(d_stat) < c("sample_size", "ks_distance", "p_value")
>
> #simulation
> for (i in 1:1000) {
> #sample from the reference distribution
> m_2 <m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
> m_2 <m_2[order(m_2$d_1),]
> d_2 < m_2$d_1
> p_2 < m_2$p_1
>
> #weighted ecdf for the reference distribution and the sample
> f_d_1 < ewcdf(d_1, normalise=F)
> f_d_2 < ewcdf(d_2, 1/p_2, normalise=F, adjust=1/length(d_2))
>
> #kolmogorovsmirnov distance
> d_stat[i,2] < max(abs(f_d_1(d_2)  f_d_2(d_2)))
> }
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] mailing list  To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.
>
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Thanks for your reply, Rui.
I don’t think that I can use directly the ks.test because I have a weighted sample (see m_2 <m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]) and I want to account for that. That’s why I am trying to compute everything manually.
Also, if you look at the results of the ks.test in your simulation, you will notice that the pvalue always implies that the sample is always (even with same size = 1) drawn form the same distribution. This looks suspicious to me.
What are your thoughts?
>
> On 5 Sep 2019, at 20:29, Rui Barradas < [hidden email]> wrote:
>
> Hello,
>
> I don't have the algorithms at hand but the KS statistic calculation is more complicated than your max/abs difference.
>
> Anyway, why not use ks.test? it's not that difficult:
>
>
> set.seed(1234)
> #reference distribution
> d_1 < sort(rpois(1000, 500))
> p_1 < d_1/sum(d_1)
> m_1 < data.frame(d_1, p_1)
>
> #data frame to store the values of the simulation
> d_stat < data.frame(1:1000, NA, NA)
> names(d_stat) < c("sample_size", "ks_distance", "p_value")
>
> #simulation
> for (i in 1:1000) {
> #sample from the reference distribution
> m_2 <m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
> d_2 < m_2$d_1
>
> ht < ks.test(d_1, d_2)
> #kolmogorovsmirnov distance
> d_stat[i, 2] < ht$statistic
> d_stat[i, 3] < ht$p.value
> }
>
> hist(d_stat[, 2])
> hist(d_stat[, 3])
>
>
> Note that d_2 is not sorted, but the results are equal in the sense of function identical(), meaning they are *exactly* the same. Why shouldn't they?
>
> Hope this helps,
>
> Rui Barradas
>
>
> Às 17:06 de 05/09/19, Boo G. escreveu:
>> Hello,
>> I am trying to perform a Kolmogorov–Smirnov test to assess the difference between a distribution and samples drawn proportionally to size of different sizes. I managed to compute the Kolmogorov–Smirnov distance but I am lost with the pvalue. I have looked into the ks.test function unsuccessfully. Can anyone help me with computing pvalues for a twotailed test?
>> Below a simplified version of my code.
>> Thanks in advance.
>> Gianluca
>> library(spatstat)
>> #reference distribution
>> d_1 < sort(rpois(1000, 500))
>> p_1 < d_1/sum(d_1)
>> m_1 < data.frame(d_1, p_1)
>> #data frame to store the values of the siumation
>> d_stat < data.frame(1:1000, NA, NA)
>> names(d_stat) < c("sample_size", "ks_distance", "p_value")
>> #simulation
>> for (i in 1:1000) {
>> #sample from the reference distribution
>> m_2 <m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
>> m_2 <m_2[order(m_2$d_1),]
>> d_2 < m_2$d_1
>> p_2 < m_2$p_1
>> #weighted ecdf for the reference distribution and the sample
>> f_d_1 < ewcdf(d_1, normalise=F)
>> f_d_2 < ewcdf(d_2, 1/p_2, normalise=F, adjust=1/length(d_2))
>> #kolmogorovsmirnov distance
>> d_stat[i,2] < max(abs(f_d_1(d_2)  f_d_2(d_2)))
>> }
>> [[alternative HTML version deleted]]
>> ______________________________________________
>> [hidden email] mailing list  To UNSUBSCRIBE and more, see
>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Frhelp&data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7C0c709068527c41e062dd08d7322f0d72%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&sdata=y9jfixyNiroKwKZEJj0owuCcWoeFQKZdaG9WLe2xHQ8%3D&reserved=0>> PLEASE do read the posting guide https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.Rproject.org%2Fpostingguide.html&data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7C0c709068527c41e062dd08d7322f0d72%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&sdata=7n0doy4P1S1TpApX1zpUborAnUnxuOxYtn%2FQ%2BtVztGM%3D&reserved=0>> and provide commented, minimal, selfcontained, reproducible code.
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Hello,
I'm sorry, but apparently I missed the point of your problem.
Please do not take my previous answer seriously.
But you can use ks.test, just in a different way than what I wrote
previously.
Corrected code:
#simulation
for (i in 1:1000) {
#sample from the reference distribution
m_2 <m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
m_2 <m_2[order(m_2$d_1),]
d_2 < m_2$d_1
p_2 < m_2$p_1
#weighted ecdf for the reference distribution and the sample
f_d_1 < ewcdf(d_1, normalise=F)
f_d_2 < ewcdf(d_2, 1/p_2, normalise=F, adjust=1/length(d_2))
#kolmogorovsmirnov distance
x < f_d_1(d_2)
y < f_d_2(d_2)
ht < ks.test(x, y)
d_stat[i, 2] < ht$statistic
d_stat[i, 3] < ht$p.value
}
Hope this helps,
Rui Barradas
Às 19:29 de 05/09/19, Rui Barradas escreveu:
> Hello,
>
> I don't have the algorithms at hand but the KS statistic calculation is
> more complicated than your max/abs difference.
>
> Anyway, why not use ks.test? it's not that difficult:
>
>
> set.seed(1234)
> #reference distribution
> d_1 < sort(rpois(1000, 500))
> p_1 < d_1/sum(d_1)
> m_1 < data.frame(d_1, p_1)
>
> #data frame to store the values of the simulation
> d_stat < data.frame(1:1000, NA, NA)
> names(d_stat) < c("sample_size", "ks_distance", "p_value")
>
> #simulation
> for (i in 1:1000) {
> #sample from the reference distribution
> m_2 <m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
> d_2 < m_2$d_1
>
> ht < ks.test(d_1, d_2)
> #kolmogorovsmirnov distance
> d_stat[i, 2] < ht$statistic
> d_stat[i, 3] < ht$p.value
> }
>
> hist(d_stat[, 2])
> hist(d_stat[, 3])
>
>
> Note that d_2 is not sorted, but the results are equal in the sense of
> function identical(), meaning they are *exactly* the same. Why shouldn't
> they?
>
> Hope this helps,
>
> Rui Barradas
>
>
> Às 17:06 de 05/09/19, Boo G. escreveu:
>> Hello,
>>
>> I am trying to perform a Kolmogorov–Smirnov test to assess the
>> difference between a distribution and samples drawn proportionally to
>> size of different sizes. I managed to compute the Kolmogorov–Smirnov
>> distance but I am lost with the pvalue. I have looked into the
>> ks.test function unsuccessfully. Can anyone help me with computing
>> pvalues for a twotailed test?
>>
>> Below a simplified version of my code.
>>
>> Thanks in advance.
>> Gianluca
>>
>>
>> library(spatstat)
>>
>> #reference distribution
>> d_1 < sort(rpois(1000, 500))
>> p_1 < d_1/sum(d_1)
>> m_1 < data.frame(d_1, p_1)
>>
>> #data frame to store the values of the siumation
>> d_stat < data.frame(1:1000, NA, NA)
>> names(d_stat) < c("sample_size", "ks_distance", "p_value")
>>
>> #simulation
>> for (i in 1:1000) {
>> #sample from the reference distribution
>> m_2 <m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
>> m_2 <m_2[order(m_2$d_1),]
>> d_2 < m_2$d_1
>> p_2 < m_2$p_1
>>
>> #weighted ecdf for the reference distribution and the sample
>> f_d_1 < ewcdf(d_1, normalise=F)
>> f_d_2 < ewcdf(d_2, 1/p_2, normalise=F, adjust=1/length(d_2))
>>
>> #kolmogorovsmirnov distance
>> d_stat[i,2] < max(abs(f_d_1(d_2)  f_d_2(d_2)))
>> }
>>
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> [hidden email] mailing list  To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/rhelp>> PLEASE do read the posting guide
>> http://www.Rproject.org/postingguide.html>> and provide commented, minimal, selfcontained, reproducible code.
>>
>
> ______________________________________________
> [hidden email] mailing list  To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/rhelp> PLEASE do read the posting guide
> http://www.Rproject.org/postingguide.html> and provide commented, minimal, selfcontained, reproducible code.
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Hello again.
I have tied this before but I see two problems:
1) According to the documentation I could read (including the ks.test code), the ks statistic would be max(abs(x  y)) and if you plot this for very low sample sizes you can actually see that this make sense. The results of ks.test(x, y) yields very different values.
2) Also in this case the pvalues don’t make much sense, according to my previous interpretation.
Again, I could be wrong in my interpretation.
On 5 Sep 2019, at 20:46, Rui Barradas < [hidden email]<mailto: [hidden email]>> wrote:
Hello,
I'm sorry, but apparently I missed the point of your problem.
Please do not take my previous answer seriously.
But you can use ks.test, just in a different way than what I wrote previously.
Corrected code:
#simulation
for (i in 1:1000) {
#sample from the reference distribution
m_2 <m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
m_2 <m_2[order(m_2$d_1),]
d_2 < m_2$d_1
p_2 < m_2$p_1
#weighted ecdf for the reference distribution and the sample
f_d_1 < ewcdf(d_1, normalise=F)
f_d_2 < ewcdf(d_2, 1/p_2, normalise=F, adjust=1/length(d_2))
#kolmogorovsmirnov distance
x < f_d_1(d_2)
y < f_d_2(d_2)
ht < ks.test(x, y)
d_stat[i, 2] < ht$statistic
d_stat[i, 3] < ht$p.value
}
Hope this helps,
Rui Barradas
Às 19:29 de 05/09/19, Rui Barradas escreveu:
Hello,
I don't have the algorithms at hand but the KS statistic calculation is more complicated than your max/abs difference.
Anyway, why not use ks.test? it's not that difficult:
set.seed(1234)
#reference distribution
d_1 < sort(rpois(1000, 500))
p_1 < d_1/sum(d_1)
m_1 < data.frame(d_1, p_1)
#data frame to store the values of the simulation
d_stat < data.frame(1:1000, NA, NA)
names(d_stat) < c("sample_size", "ks_distance", "p_value")
#simulation
for (i in 1:1000) {
#sample from the reference distribution
m_2 <m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
d_2 < m_2$d_1
ht < ks.test(d_1, d_2)
#kolmogorovsmirnov distance
d_stat[i, 2] < ht$statistic
d_stat[i, 3] < ht$p.value
}
hist(d_stat[, 2])
hist(d_stat[, 3])
Note that d_2 is not sorted, but the results are equal in the sense of function identical(), meaning they are *exactly* the same. Why shouldn't they?
Hope this helps,
Rui Barradas
Às 17:06 de 05/09/19, Boo G. escreveu:
Hello,
I am trying to perform a Kolmogorov–Smirnov test to assess the difference between a distribution and samples drawn proportionally to size of different sizes. I managed to compute the Kolmogorov–Smirnov distance but I am lost with the pvalue. I have looked into the ks.test function unsuccessfully. Can anyone help me with computing pvalues for a twotailed test?
Below a simplified version of my code.
Thanks in advance.
Gianluca
library(spatstat)
#reference distribution
d_1 < sort(rpois(1000, 500))
p_1 < d_1/sum(d_1)
m_1 < data.frame(d_1, p_1)
#data frame to store the values of the siumation
d_stat < data.frame(1:1000, NA, NA)
names(d_stat) < c("sample_size", "ks_distance", "p_value")
#simulation
for (i in 1:1000) {
#sample from the reference distribution
m_2 <m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
m_2 <m_2[order(m_2$d_1),]
d_2 < m_2$d_1
p_2 < m_2$p_1
#weighted ecdf for the reference distribution and the sample
f_d_1 < ewcdf(d_1, normalise=F)
f_d_2 < ewcdf(d_2, 1/p_2, normalise=F, adjust=1/length(d_2))
#kolmogorovsmirnov distance
d_stat[i,2] < max(abs(f_d_1(d_2)  f_d_2(d_2)))
}
[[alternative HTML version deleted]]
______________________________________________
[hidden email]<mailto: [hidden email]> mailing list  To UNSUBSCRIBE and more, see
https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Frhelp&data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7C4b4a253a1edc4960297f08d732316627%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&sdata=ZntDJlPtp%2Bu%2FeO7xNZLbUQgLwpvS1M%2FUVNwovp%2FZPmA%3D&reserved=0PLEASE do read the posting guide https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.Rproject.org%2Fpostingguide.html&data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7C4b4a253a1edc4960297f08d732316627%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&sdata=YKZNBeGme6V9QGyV2%2F150H6rZnLy9bX7xly%2BQEf6O14%3D&reserved=0and provide commented, minimal, selfcontained, reproducible code.
______________________________________________
[hidden email]<mailto: [hidden email]> mailing list  To UNSUBSCRIBE and more, see
https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Frhelp&data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7C4b4a253a1edc4960297f08d732316627%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&sdata=ZntDJlPtp%2Bu%2FeO7xNZLbUQgLwpvS1M%2FUVNwovp%2FZPmA%3D&reserved=0PLEASE do read the posting guide https://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.Rproject.org%2Fpostingguide.html&data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7C4b4a253a1edc4960297f08d732316627%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&sdata=YKZNBeGme6V9QGyV2%2F150H6rZnLy9bX7xly%2BQEf6O14%3D&reserved=0and provide commented, minimal, selfcontained, reproducible code.
[[alternative HTML version deleted]]
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Hello,
Inline.
Às 20:09 de 05/09/19, Boo G. escreveu:
> Hello again.
>
> I have tied this before but I see two problems:
>
> 1) According to the documentation I could read (including the ks.test
> code), the ks statistic would be max(abs(x  y)) and if you plot this
> for very low sample sizes you can actually see that this make sense. The
> results of ks.test(x, y) yields very different values.
The problem is that the distribution of Dn is very difficult to compute.
From the reference [1] in the help page ?ks.test:
Kolmogorov's goodnessoffit measure, Dn , for a sample CDF has
consistently been set aside for methods such as the D+n or Dn of
Smirnov, primarily, it seems, because of the difficulty of computing the
distribution of Dn . As far as we know, no easy way to compute that
distribution has ever been provided in the 70+ years since Kolmogorov's
fundamental paper. We provide one here, a C procedure that provides
Pr(Dn < d) with 1315 digit accuracy for n ranging from 2 to at least 16000.
That is why I used ks.test and its Dn and pvalues. Note that n >= 2,
size = 1 is not covered (pvalue == 1).
Also, the pvalues distribution seem to become closer to a uniform with
increasing sizes. Try
hist(d_stat[801:1000, 3])
[1] https://www.jstatsoft.org/article/view/v008i18Hope this helps,
Rui Barradas
>
> 2) Also in this case the pvalues don’t make much sense, according to my
> previous interpretation.
>
> Again, I could be wrong in my interpretation.
>
>> On 5 Sep 2019, at 20:46, Rui Barradas < [hidden email]
>> <mailto: [hidden email]>> wrote:
>>
>> Hello,
>>
>> I'm sorry, but apparently I missed the point of your problem.
>> Please do not take my previous answer seriously.
>>
>> But you can use ks.test, just in a different way than what I wrote
>> previously.
>>
>> Corrected code:
>>
>>
>> #simulation
>> for (i in 1:1000) {
>> #sample from the reference distribution
>> m_2 <m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
>> m_2 <m_2[order(m_2$d_1),]
>> d_2 < m_2$d_1
>> p_2 < m_2$p_1
>>
>> #weighted ecdf for the reference distribution and the sample
>> f_d_1 < ewcdf(d_1, normalise=F)
>> f_d_2 < ewcdf(d_2, 1/p_2, normalise=F, adjust=1/length(d_2))
>>
>> #kolmogorovsmirnov distance
>> x < f_d_1(d_2)
>> y < f_d_2(d_2)
>> ht < ks.test(x, y)
>> d_stat[i, 2] < ht$statistic
>> d_stat[i, 3] < ht$p.value
>> }
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> Às 19:29 de 05/09/19, Rui Barradas escreveu:
>>> Hello,
>>> I don't have the algorithms at hand but the KS statistic calculation
>>> is more complicated than your max/abs difference.
>>> Anyway, why not use ks.test? it's not that difficult:
>>> set.seed(1234)
>>> #reference distribution
>>> d_1 < sort(rpois(1000, 500))
>>> p_1 < d_1/sum(d_1)
>>> m_1 < data.frame(d_1, p_1)
>>> #data frame to store the values of the simulation
>>> d_stat < data.frame(1:1000, NA, NA)
>>> names(d_stat) < c("sample_size", "ks_distance", "p_value")
>>> #simulation
>>> for (i in 1:1000) {
>>> #sample from the reference distribution
>>> m_2 <m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
>>> d_2 < m_2$d_1
>>> ht < ks.test(d_1, d_2)
>>> #kolmogorovsmirnov distance
>>> d_stat[i, 2] < ht$statistic
>>> d_stat[i, 3] < ht$p.value
>>> }
>>> hist(d_stat[, 2])
>>> hist(d_stat[, 3])
>>> Note that d_2 is not sorted, but the results are equal in the sense
>>> of function identical(), meaning they are *exactly* the same. Why
>>> shouldn't they?
>>> Hope this helps,
>>> Rui Barradas
>>> Às 17:06 de 05/09/19, Boo G. escreveu:
>>>> Hello,
>>>>
>>>> I am trying to perform a Kolmogorov–Smirnov test to assess the
>>>> difference between a distribution and samples drawn proportionally
>>>> to size of different sizes. I managed to compute the
>>>> Kolmogorov–Smirnov distance but I am lost with the pvalue. I have
>>>> looked into the ks.test function unsuccessfully. Can anyone help me
>>>> with computing pvalues for a twotailed test?
>>>>
>>>> Below a simplified version of my code.
>>>>
>>>> Thanks in advance.
>>>> Gianluca
>>>>
>>>>
>>>> library(spatstat)
>>>>
>>>> #reference distribution
>>>> d_1 < sort(rpois(1000, 500))
>>>> p_1 < d_1/sum(d_1)
>>>> m_1 < data.frame(d_1, p_1)
>>>>
>>>> #data frame to store the values of the siumation
>>>> d_stat < data.frame(1:1000, NA, NA)
>>>> names(d_stat) < c("sample_size", "ks_distance", "p_value")
>>>>
>>>> #simulation
>>>> for (i in 1:1000) {
>>>> #sample from the reference distribution
>>>> m_2 <m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
>>>> m_2 <m_2[order(m_2$d_1),]
>>>> d_2 < m_2$d_1
>>>> p_2 < m_2$p_1
>>>>
>>>> #weighted ecdf for the reference distribution and the sample
>>>> f_d_1 < ewcdf(d_1, normalise=F)
>>>> f_d_2 < ewcdf(d_2, 1/p_2, normalise=F, adjust=1/length(d_2))
>>>>
>>>> #kolmogorovsmirnov distance
>>>> d_stat[i,2] < max(abs(f_d_1(d_2)  f_d_2(d_2)))
>>>> }
>>>>
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> [hidden email] <mailto: [hidden email]> mailing list 
>>>> To UNSUBSCRIBE and more, see
>>>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Frhelp&data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7C4b4a253a1edc4960297f08d732316627%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&sdata=ZntDJlPtp%2Bu%2FeO7xNZLbUQgLwpvS1M%2FUVNwovp%2FZPmA%3D&reserved=0>>>> PLEASE do read the posting
>>>> guidehttps://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.Rproject.org%2Fpostingguide.html&data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7C4b4a253a1edc4960297f08d732316627%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&sdata=YKZNBeGme6V9QGyV2%2F150H6rZnLy9bX7xly%2BQEf6O14%3D&reserved=0
>>>> and provide commented, minimal, selfcontained, reproducible code.
>>>>
>>> ______________________________________________
>>> [hidden email] <mailto: [hidden email]>mailing list  To
>>> UNSUBSCRIBE and more, see
>>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Frhelp&data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7C4b4a253a1edc4960297f08d732316627%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&sdata=ZntDJlPtp%2Bu%2FeO7xNZLbUQgLwpvS1M%2FUVNwovp%2FZPmA%3D&reserved=0>>> PLEASE do read the posting
>>> guidehttps://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.Rproject.org%2Fpostingguide.html&data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7C4b4a253a1edc4960297f08d732316627%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&sdata=YKZNBeGme6V9QGyV2%2F150H6rZnLy9bX7xly%2BQEf6O14%3D&reserved=0
>>> and provide commented, minimal, selfcontained, reproducible code.
>
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Hello and thanks for your patience.
As far as I understand, the paper of Marsiglia and colleagues refers to CDF samples (i.e. from a hypothetical distribution — e.g. a Poisson), while I have an ECDF sample (i.e. (pseudo)observed data — e.g. rpois(1000, 500). In my study, I am actually comparing the statistical distribution of people per hectares in a region (~50,000 observations) with samples from that distribution.
Agree that we cannot consider at pvalues for low sample size. Still, somewhere above 100, I expect to see pvalues between 0 and 0.05 (rejecting the fact that the sample comes from the reference distribution). Ideally, I would try the procedure suggested by Marsiglia to compute pvalues but this is far beyond my coding skills.
> On 5 Sep 2019, at 21:21, Rui Barradas < [hidden email]> wrote:
>
> Hello,
>
> Inline.
>
> Às 20:09 de 05/09/19, Boo G. escreveu:
>> Hello again.
>> I have tied this before but I see two problems:
>> 1) According to the documentation I could read (including the ks.test code), the ks statistic would be max(abs(x  y)) and if you plot this for very low sample sizes you can actually see that this make sense. The results of ks.test(x, y) yields very different values.
>
> The problem is that the distribution of Dn is very difficult to compute. From the reference [1] in the help page ?ks.test:
>
> Kolmogorov's goodnessoffit measure, Dn , for a sample CDF has consistently been set aside for methods such as the D+n or Dn of Smirnov, primarily, it seems, because of the difficulty of computing the distribution of Dn . As far as we know, no easy way to compute that distribution has ever been provided in the 70+ years since Kolmogorov's fundamental paper. We provide one here, a C procedure that provides Pr(Dn < d) with 1315 digit accuracy for n ranging from 2 to at least 16000.
>
>
> That is why I used ks.test and its Dn and pvalues. Note that n >= 2, size = 1 is not covered (pvalue == 1).
>
> Also, the pvalues distribution seem to become closer to a uniform with increasing sizes. Try
>
> hist(d_stat[801:1000, 3])
>
>
> [1] https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.jstatsoft.org%2Farticle%2Fview%2Fv008i18&data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7Cfebe94a441914aa58a9908d732363a52%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&sdata=JBHkkXn6G4oLQZCV7HoqBLO4a3sMixTa16kOVFwXPlY%3D&reserved=0>
>
> Hope this helps,
>
> Rui Barradas
>
>> 2) Also in this case the pvalues don’t make much sense, according to my previous interpretation.
>> Again, I could be wrong in my interpretation.
>>> On 5 Sep 2019, at 20:46, Rui Barradas < [hidden email] <mailto: [hidden email]>> wrote:
>>>
>>> Hello,
>>>
>>> I'm sorry, but apparently I missed the point of your problem.
>>> Please do not take my previous answer seriously.
>>>
>>> But you can use ks.test, just in a different way than what I wrote previously.
>>>
>>> Corrected code:
>>>
>>>
>>> #simulation
>>> for (i in 1:1000) {
>>> #sample from the reference distribution
>>> m_2 <m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
>>> m_2 <m_2[order(m_2$d_1),]
>>> d_2 < m_2$d_1
>>> p_2 < m_2$p_1
>>>
>>> #weighted ecdf for the reference distribution and the sample
>>> f_d_1 < ewcdf(d_1, normalise=F)
>>> f_d_2 < ewcdf(d_2, 1/p_2, normalise=F, adjust=1/length(d_2))
>>>
>>> #kolmogorovsmirnov distance
>>> x < f_d_1(d_2)
>>> y < f_d_2(d_2)
>>> ht < ks.test(x, y)
>>> d_stat[i, 2] < ht$statistic
>>> d_stat[i, 3] < ht$p.value
>>> }
>>>
>>>
>>> Hope this helps,
>>>
>>> Rui Barradas
>>>
>>> Às 19:29 de 05/09/19, Rui Barradas escreveu:
>>>> Hello,
>>>> I don't have the algorithms at hand but the KS statistic calculation is more complicated than your max/abs difference.
>>>> Anyway, why not use ks.test? it's not that difficult:
>>>> set.seed(1234)
>>>> #reference distribution
>>>> d_1 < sort(rpois(1000, 500))
>>>> p_1 < d_1/sum(d_1)
>>>> m_1 < data.frame(d_1, p_1)
>>>> #data frame to store the values of the simulation
>>>> d_stat < data.frame(1:1000, NA, NA)
>>>> names(d_stat) < c("sample_size", "ks_distance", "p_value")
>>>> #simulation
>>>> for (i in 1:1000) {
>>>> #sample from the reference distribution
>>>> m_2 <m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
>>>> d_2 < m_2$d_1
>>>> ht < ks.test(d_1, d_2)
>>>> #kolmogorovsmirnov distance
>>>> d_stat[i, 2] < ht$statistic
>>>> d_stat[i, 3] < ht$p.value
>>>> }
>>>> hist(d_stat[, 2])
>>>> hist(d_stat[, 3])
>>>> Note that d_2 is not sorted, but the results are equal in the sense of function identical(), meaning they are *exactly* the same. Why shouldn't they?
>>>> Hope this helps,
>>>> Rui Barradas
>>>> Às 17:06 de 05/09/19, Boo G. escreveu:
>>>>> Hello,
>>>>>
>>>>> I am trying to perform a Kolmogorov–Smirnov test to assess the difference between a distribution and samples drawn proportionally to size of different sizes. I managed to compute the Kolmogorov–Smirnov distance but I am lost with the pvalue. I have looked into the ks.test function unsuccessfully. Can anyone help me with computing pvalues for a twotailed test?
>>>>>
>>>>> Below a simplified version of my code.
>>>>>
>>>>> Thanks in advance.
>>>>> Gianluca
>>>>>
>>>>>
>>>>> library(spatstat)
>>>>>
>>>>> #reference distribution
>>>>> d_1 < sort(rpois(1000, 500))
>>>>> p_1 < d_1/sum(d_1)
>>>>> m_1 < data.frame(d_1, p_1)
>>>>>
>>>>> #data frame to store the values of the siumation
>>>>> d_stat < data.frame(1:1000, NA, NA)
>>>>> names(d_stat) < c("sample_size", "ks_distance", "p_value")
>>>>>
>>>>> #simulation
>>>>> for (i in 1:1000) {
>>>>> #sample from the reference distribution
>>>>> m_2 <m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
>>>>> m_2 <m_2[order(m_2$d_1),]
>>>>> d_2 < m_2$d_1
>>>>> p_2 < m_2$p_1
>>>>>
>>>>> #weighted ecdf for the reference distribution and the sample
>>>>> f_d_1 < ewcdf(d_1, normalise=F)
>>>>> f_d_2 < ewcdf(d_2, 1/p_2, normalise=F, adjust=1/length(d_2))
>>>>>
>>>>> #kolmogorovsmirnov distance
>>>>> d_stat[i,2] < max(abs(f_d_1(d_2)  f_d_2(d_2)))
>>>>> }
>>>>>
>>>>>
>>>>> [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> [hidden email] <mailto: [hidden email]> mailing list  To UNSUBSCRIBE and more, see
>>>>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Frhelp&data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7Cfebe94a441914aa58a9908d732363a52%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&sdata=mZZ05M0Qrjy8EelRpDiKpozWtLbF2kSGG%2BjzlAYyzf4%3D&reserved=0>>>>> PLEASE do read the posting guidehttps://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.Rproject.org%2Fpostingguide.html&data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7C4b4a253a1edc4960297f08d732316627%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&sdata=YKZNBeGme6V9QGyV2%2F150H6rZnLy9bX7xly%2BQEf6O14%3D&reserved=0
>>>>> and provide commented, minimal, selfcontained, reproducible code.
>>>>>
>>>> ______________________________________________
>>>> [hidden email] <mailto: [hidden email]>mailing list  To UNSUBSCRIBE and more, see
>>>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Frhelp&data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7Cfebe94a441914aa58a9908d732363a52%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&sdata=mZZ05M0Qrjy8EelRpDiKpozWtLbF2kSGG%2BjzlAYyzf4%3D&reserved=0>>>> PLEASE do read the posting guidehttps://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.Rproject.org%2Fpostingguide.html&data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7C4b4a253a1edc4960297f08d732316627%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&sdata=YKZNBeGme6V9QGyV2%2F150H6rZnLy9bX7xly%2BQEf6O14%3D&reserved=0
>>>> and provide commented, minimal, selfcontained, reproducible code.
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.


Hello,
Yesterday wasn't one of my days.
The main problem I'm seeing is that the KS statistic is meant for
continuous data and you have counts data assumed to follow a Poisson
distribution. This might explain the nonsense results you are getting
from ks.test.
Have you considered a chisquared GOF test?
Hope this helps,
Rui Barradas
Às 20:51 de 05/09/19, Boo G. escreveu:
> Hello and thanks for your patience.
>
> As far as I understand, the paper of Marsiglia and colleagues refers to CDF samples (i.e. from a hypothetical distribution — e.g. a Poisson), while I have an ECDF sample (i.e. (pseudo)observed data — e.g. rpois(1000, 500). In my study, I am actually comparing the statistical distribution of people per hectares in a region (~50,000 observations) with samples from that distribution.
>
> Agree that we cannot consider at pvalues for low sample size. Still, somewhere above 100, I expect to see pvalues between 0 and 0.05 (rejecting the fact that the sample comes from the reference distribution). Ideally, I would try the procedure suggested by Marsiglia to compute pvalues but this is far beyond my coding skills.
>
>
>
>
>> On 5 Sep 2019, at 21:21, Rui Barradas < [hidden email]> wrote:
>>
>> Hello,
>>
>> Inline.
>>
>> Às 20:09 de 05/09/19, Boo G. escreveu:
>>> Hello again.
>>> I have tied this before but I see two problems:
>>> 1) According to the documentation I could read (including the ks.test code), the ks statistic would be max(abs(x  y)) and if you plot this for very low sample sizes you can actually see that this make sense. The results of ks.test(x, y) yields very different values.
>>
>> The problem is that the distribution of Dn is very difficult to compute. From the reference [1] in the help page ?ks.test:
>>
>> Kolmogorov's goodnessoffit measure, Dn , for a sample CDF has consistently been set aside for methods such as the D+n or Dn of Smirnov, primarily, it seems, because of the difficulty of computing the distribution of Dn . As far as we know, no easy way to compute that distribution has ever been provided in the 70+ years since Kolmogorov's fundamental paper. We provide one here, a C procedure that provides Pr(Dn < d) with 1315 digit accuracy for n ranging from 2 to at least 16000.
>>
>>
>> That is why I used ks.test and its Dn and pvalues. Note that n >= 2, size = 1 is not covered (pvalue == 1).
>>
>> Also, the pvalues distribution seem to become closer to a uniform with increasing sizes. Try
>>
>> hist(d_stat[801:1000, 3])
>>
>>
>> [1] https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.jstatsoft.org%2Farticle%2Fview%2Fv008i18&data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7Cfebe94a441914aa58a9908d732363a52%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&sdata=JBHkkXn6G4oLQZCV7HoqBLO4a3sMixTa16kOVFwXPlY%3D&reserved=0>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>>> 2) Also in this case the pvalues don’t make much sense, according to my previous interpretation.
>>> Again, I could be wrong in my interpretation.
>>>> On 5 Sep 2019, at 20:46, Rui Barradas < [hidden email] <mailto: [hidden email]>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I'm sorry, but apparently I missed the point of your problem.
>>>> Please do not take my previous answer seriously.
>>>>
>>>> But you can use ks.test, just in a different way than what I wrote previously.
>>>>
>>>> Corrected code:
>>>>
>>>>
>>>> #simulation
>>>> for (i in 1:1000) {
>>>> #sample from the reference distribution
>>>> m_2 <m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
>>>> m_2 <m_2[order(m_2$d_1),]
>>>> d_2 < m_2$d_1
>>>> p_2 < m_2$p_1
>>>>
>>>> #weighted ecdf for the reference distribution and the sample
>>>> f_d_1 < ewcdf(d_1, normalise=F)
>>>> f_d_2 < ewcdf(d_2, 1/p_2, normalise=F, adjust=1/length(d_2))
>>>>
>>>> #kolmogorovsmirnov distance
>>>> x < f_d_1(d_2)
>>>> y < f_d_2(d_2)
>>>> ht < ks.test(x, y)
>>>> d_stat[i, 2] < ht$statistic
>>>> d_stat[i, 3] < ht$p.value
>>>> }
>>>>
>>>>
>>>> Hope this helps,
>>>>
>>>> Rui Barradas
>>>>
>>>> Às 19:29 de 05/09/19, Rui Barradas escreveu:
>>>>> Hello,
>>>>> I don't have the algorithms at hand but the KS statistic calculation is more complicated than your max/abs difference.
>>>>> Anyway, why not use ks.test? it's not that difficult:
>>>>> set.seed(1234)
>>>>> #reference distribution
>>>>> d_1 < sort(rpois(1000, 500))
>>>>> p_1 < d_1/sum(d_1)
>>>>> m_1 < data.frame(d_1, p_1)
>>>>> #data frame to store the values of the simulation
>>>>> d_stat < data.frame(1:1000, NA, NA)
>>>>> names(d_stat) < c("sample_size", "ks_distance", "p_value")
>>>>> #simulation
>>>>> for (i in 1:1000) {
>>>>> #sample from the reference distribution
>>>>> m_2 <m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
>>>>> d_2 < m_2$d_1
>>>>> ht < ks.test(d_1, d_2)
>>>>> #kolmogorovsmirnov distance
>>>>> d_stat[i, 2] < ht$statistic
>>>>> d_stat[i, 3] < ht$p.value
>>>>> }
>>>>> hist(d_stat[, 2])
>>>>> hist(d_stat[, 3])
>>>>> Note that d_2 is not sorted, but the results are equal in the sense of function identical(), meaning they are *exactly* the same. Why shouldn't they?
>>>>> Hope this helps,
>>>>> Rui Barradas
>>>>> Às 17:06 de 05/09/19, Boo G. escreveu:
>>>>>> Hello,
>>>>>>
>>>>>> I am trying to perform a Kolmogorov–Smirnov test to assess the difference between a distribution and samples drawn proportionally to size of different sizes. I managed to compute the Kolmogorov–Smirnov distance but I am lost with the pvalue. I have looked into the ks.test function unsuccessfully. Can anyone help me with computing pvalues for a twotailed test?
>>>>>>
>>>>>> Below a simplified version of my code.
>>>>>>
>>>>>> Thanks in advance.
>>>>>> Gianluca
>>>>>>
>>>>>>
>>>>>> library(spatstat)
>>>>>>
>>>>>> #reference distribution
>>>>>> d_1 < sort(rpois(1000, 500))
>>>>>> p_1 < d_1/sum(d_1)
>>>>>> m_1 < data.frame(d_1, p_1)
>>>>>>
>>>>>> #data frame to store the values of the siumation
>>>>>> d_stat < data.frame(1:1000, NA, NA)
>>>>>> names(d_stat) < c("sample_size", "ks_distance", "p_value")
>>>>>>
>>>>>> #simulation
>>>>>> for (i in 1:1000) {
>>>>>> #sample from the reference distribution
>>>>>> m_2 <m_1[(sample(nrow(m_1), size=i, prob=p_1, replace=F)),]
>>>>>> m_2 <m_2[order(m_2$d_1),]
>>>>>> d_2 < m_2$d_1
>>>>>> p_2 < m_2$p_1
>>>>>>
>>>>>> #weighted ecdf for the reference distribution and the sample
>>>>>> f_d_1 < ewcdf(d_1, normalise=F)
>>>>>> f_d_2 < ewcdf(d_2, 1/p_2, normalise=F, adjust=1/length(d_2))
>>>>>>
>>>>>> #kolmogorovsmirnov distance
>>>>>> d_stat[i,2] < max(abs(f_d_1(d_2)  f_d_2(d_2)))
>>>>>> }
>>>>>>
>>>>>>
>>>>>> [[alternative HTML version deleted]]
>>>>>>
>>>>>> ______________________________________________
>>>>>> [hidden email] <mailto: [hidden email]> mailing list  To UNSUBSCRIBE and more, see
>>>>>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Frhelp&data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7Cfebe94a441914aa58a9908d732363a52%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&sdata=mZZ05M0Qrjy8EelRpDiKpozWtLbF2kSGG%2BjzlAYyzf4%3D&reserved=0>>>>>> PLEASE do read the posting guidehttps://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.Rproject.org%2Fpostingguide.html&data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7C4b4a253a1edc4960297f08d732316627%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&sdata=YKZNBeGme6V9QGyV2%2F150H6rZnLy9bX7xly%2BQEf6O14%3D&reserved=0
>>>>>> and provide commented, minimal, selfcontained, reproducible code.
>>>>>>
>>>>> ______________________________________________
>>>>> [hidden email] <mailto: [hidden email]>mailing list  To UNSUBSCRIBE and more, see
>>>>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Frhelp&data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7Cfebe94a441914aa58a9908d732363a52%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&sdata=mZZ05M0Qrjy8EelRpDiKpozWtLbF2kSGG%2BjzlAYyzf4%3D&reserved=0>>>>> PLEASE do read the posting guidehttps://eur03.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.Rproject.org%2Fpostingguide.html&data=01%7C01%7Cgianluca.boo%40soton.ac.uk%7C4b4a253a1edc4960297f08d732316627%7C4a5378f929f44d3ebe89669d03ada9d8%7C0&sdata=YKZNBeGme6V9QGyV2%2F150H6rZnLy9bX7xly%2BQEf6O14%3D&reserved=0
>>>>> and provide commented, minimal, selfcontained, reproducible code.
>
______________________________________________
[hidden email] mailing list  To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/rhelpPLEASE do read the posting guide http://www.Rproject.org/postingguide.htmland provide commented, minimal, selfcontained, reproducible code.

