odd behaviour in quantreg::rq

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

odd behaviour in quantreg::rq

Dylan Beaudette
Hi,

I am trying to use quantile regression to perform weighted-comparisons of the
median across groups. This works most of the time, however I am seeing some
odd output in summary(rq()):

Call: rq(formula = sand ~ method, tau = 0.5, data = x, weights =
area_fraction)
Coefficients:
                   Value    Std. Error t value  Pr(>|t|)
(Intercept)        45.44262  3.64706   12.46007  0.00000
methodmukey-HRU     0.00000  4.67115    0.00000  1.00000
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When I do not include the weights, I get something a little closer to a
weighted comparison of means, along with an error message:

Call: rq(formula = sand ~ method, tau = 0.5, data = x)
Coefficients:
                   Value    Std. Error t value  Pr(>|t|)
(Intercept)        44.91579  2.46341   18.23318  0.00000
methodmukey-HRU     9.57601  9.29348    1.03040  0.30380
Warning message:
In rq.fit.br(x, y, tau = tau, ...) : Solution may be nonunique


I have noticed that the error message goes away when specifying method='fn' to
rq(). An example is below. Could this have something to do with replication
in the data?


# example:
library(quantreg)

# load data
x <- read.csv(url('http://169.237.35.250/~dylan/temp/test.csv'))

# with weights
summary(rq(sand ~ method, data=x, weights=area_fraction, tau=0.5), se='ker')

# without weights
# note error message
summary(rq(sand ~ method, data=x, tau=0.5), se='ker')

# without weights, no error message
summary(rq(sand ~ method, data=x, tau=0.5, method='fn'), se='ker')

--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: odd behaviour in quantreg::rq

RKoenker
Admittedly this seemed quite peculiar....  but if you look at the  
entrails
of the following code you will see that with the weights the first and
second levels of your x$method variable have the same (weighted) median
so the contrast that you are estimating SHOULD be zero.  Perhaps
there is something fishy about the data construction that would have
allowed us to anticipate this?  Regarding the "fn" option, and the
non-uniqueness warning,  this is covered in the (admittedly obscure)
faq on quantile regression available at:

        http://www.econ.uiuc.edu/~roger/research/rq/FAQ

# example:
library(quantreg)

# load data
x <- read.csv(url('http://169.237.35.250/~dylan/temp/test.csv'))

# with weights
summary(rq(sand ~ method, data=x, weights=area_fraction, tau=0.5),  
se='ker')

#Reproduction with more convenient notation:

X0 <- model.matrix(~method, data = x)
y <- x$sand
w <- x$area_fraction
f0 <- summary(rq(y ~ X0 - 1, weights = w),se = "ker")

#Second reproduction with orthogonal design:

X1 <- model.matrix(~method - 1, data = x)
f1 <- summary(rq(y ~ X1 - 1, weights = w),se = "ker")

#Comparing f0 and f1 we see that they are consistent!!  How can that  
be??
#Since the columns of X1 are orthogonal estimation of the 3 parameters  
are separable
#so we can check to see whether the univariate weighted medians are  
reproducible.

s1 <- X1[,1] == 1
s2 <- X1[,2] == 1
g1 <- rq(y[s1] ~ X1[s1,1] - 1, weights = w[s1])
g2 <- rq(y[s2] ~ X1[s2,2] - 1, weights = w[s2])

#Now looking at the g1 and g2 objects we see that they are equal and  
agree with f1.


url:    www.econ.uiuc.edu/~roger            Roger Koenker
email    [hidden email]            Department of Economics
vox:     217-333-4558                University of Illinois
fax:       217-244-6678                Urbana, IL 61801



On Jun 30, 2009, at 3:54 PM, Dylan Beaudette wrote:

> Hi,
>
> I am trying to use quantile regression to perform weighted-
> comparisons of the
> median across groups. This works most of the time, however I am  
> seeing some
> odd output in summary(rq()):
>
> Call: rq(formula = sand ~ method, tau = 0.5, data = x, weights =
> area_fraction)
> Coefficients:
>                   Value    Std. Error t value  Pr(>|t|)
> (Intercept)        45.44262  3.64706   12.46007  0.00000
> methodmukey-HRU     0.00000  4.67115    0.00000  1.00000
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> When I do not include the weights, I get something a little closer  
> to a
> weighted comparison of means, along with an error message:
>
> Call: rq(formula = sand ~ method, tau = 0.5, data = x)
> Coefficients:
>                   Value    Std. Error t value  Pr(>|t|)
> (Intercept)        44.91579  2.46341   18.23318  0.00000
> methodmukey-HRU     9.57601  9.29348    1.03040  0.30380
> Warning message:
> In rq.fit.br(x, y, tau = tau, ...) : Solution may be nonunique
>
>
> I have noticed that the error message goes away when specifying  
> method='fn' to
> rq(). An example is below. Could this have something to do with  
> replication
> in the data?
>
>
> # example:
> library(quantreg)
>
> # load data
> x <- read.csv(url('http://169.237.35.250/~dylan/temp/test.csv'))
>
> # with weights
> summary(rq(sand ~ method, data=x, weights=area_fraction, tau=0.5),  
> se='ker')
>
> # without weights
> # note error message
> summary(rq(sand ~ method, data=x, tau=0.5), se='ker')
>
> # without weights, no error message
> summary(rq(sand ~ method, data=x, tau=0.5, method='fn'), se='ker')
>
> --
> Dylan Beaudette
> Soil Resource Laboratory
> http://casoilresource.lawr.ucdavis.edu/
> University of California at Davis
> 530.754.7341
>
> ______________________________________________
> [hidden email] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: odd behaviour in quantreg::rq

Dylan Beaudette
Thanks Roger. Your comments were very helpful. Unfortunately, each of
the 'groups' in this example are derived from the same set of data, two of
which were subsets-- so it is not that unlikely that the weighted medians
were the same in some cases.

This all leads back to an operation attempting to answer the question:

Of the 2 subsetting methods, which one produces a distribution most like the
complete data set? Since the distributions are not normal, and there are
area-weights involved others on the list suggested quantile-regression. For a
more complete picture of how 'different' the distributions are, I have tried
looking at the differences between weighted quantiles: (0.1, 0.25, 0.5, 0.75,
0.9) as a more complete 'description' of each distribution.

I imagine that there may be a better way to perform this comparison...

Cheers,
Dylan


On Tuesday 30 June 2009, roger koenker wrote:

> Admittedly this seemed quite peculiar....  but if you look at the
> entrails
> of the following code you will see that with the weights the first and
> second levels of your x$method variable have the same (weighted) median
> so the contrast that you are estimating SHOULD be zero.  Perhaps
> there is something fishy about the data construction that would have
> allowed us to anticipate this?  Regarding the "fn" option, and the
> non-uniqueness warning,  this is covered in the (admittedly obscure)
> faq on quantile regression available at:
>
> http://www.econ.uiuc.edu/~roger/research/rq/FAQ
>
> # example:
> library(quantreg)
>
> # load data
> x <- read.csv(url('http://169.237.35.250/~dylan/temp/test.csv'))
>
> # with weights
> summary(rq(sand ~ method, data=x, weights=area_fraction, tau=0.5),
> se='ker')
>
> #Reproduction with more convenient notation:
>
> X0 <- model.matrix(~method, data = x)
> y <- x$sand
> w <- x$area_fraction
> f0 <- summary(rq(y ~ X0 - 1, weights = w),se = "ker")
>
> #Second reproduction with orthogonal design:
>
> X1 <- model.matrix(~method - 1, data = x)
> f1 <- summary(rq(y ~ X1 - 1, weights = w),se = "ker")
>
> #Comparing f0 and f1 we see that they are consistent!!  How can that
> be??
> #Since the columns of X1 are orthogonal estimation of the 3 parameters
> are separable
> #so we can check to see whether the univariate weighted medians are
> reproducible.
>
> s1 <- X1[,1] == 1
> s2 <- X1[,2] == 1
> g1 <- rq(y[s1] ~ X1[s1,1] - 1, weights = w[s1])
> g2 <- rq(y[s2] ~ X1[s2,2] - 1, weights = w[s2])
>
> #Now looking at the g1 and g2 objects we see that they are equal and
> agree with f1.
>
>
> url:    www.econ.uiuc.edu/~roger            Roger Koenker
> email    [hidden email]            Department of Economics
> vox:     217-333-4558                University of Illinois
> fax:       217-244-6678                Urbana, IL 61801
>
> On Jun 30, 2009, at 3:54 PM, Dylan Beaudette wrote:
> > Hi,
> >
> > I am trying to use quantile regression to perform weighted-
> > comparisons of the
> > median across groups. This works most of the time, however I am
> > seeing some
> > odd output in summary(rq()):
> >
> > Call: rq(formula = sand ~ method, tau = 0.5, data = x, weights =
> > area_fraction)
> > Coefficients:
> >                   Value    Std. Error t value  Pr(>|t|)
> > (Intercept)        45.44262  3.64706   12.46007  0.00000
> > methodmukey-HRU     0.00000  4.67115    0.00000  1.00000
> >  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > When I do not include the weights, I get something a little closer
> > to a
> > weighted comparison of means, along with an error message:
> >
> > Call: rq(formula = sand ~ method, tau = 0.5, data = x)
> > Coefficients:
> >                   Value    Std. Error t value  Pr(>|t|)
> > (Intercept)        44.91579  2.46341   18.23318  0.00000
> > methodmukey-HRU     9.57601  9.29348    1.03040  0.30380
> > Warning message:
> > In rq.fit.br(x, y, tau = tau, ...) : Solution may be nonunique
> >
> >
> > I have noticed that the error message goes away when specifying
> > method='fn' to
> > rq(). An example is below. Could this have something to do with
> > replication
> > in the data?
> >
> >
> > # example:
> > library(quantreg)
> >
> > # load data
> > x <- read.csv(url('http://169.237.35.250/~dylan/temp/test.csv'))
> >
> > # with weights
> > summary(rq(sand ~ method, data=x, weights=area_fraction, tau=0.5),
> > se='ker')
> >
> > # without weights
> > # note error message
> > summary(rq(sand ~ method, data=x, tau=0.5), se='ker')
> >
> > # without weights, no error message
> > summary(rq(sand ~ method, data=x, tau=0.5, method='fn'), se='ker')
> >

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: odd behaviour in quantreg::rq

RKoenker
It's not clear to me whether you are looking for an exploratory tool
or something more like formal inference.  For the former, it seems
that estimating a few weighted quantiles would be quite useful.  at  
least
it is rather Tukeyesque.  While I'm appealing to authorities, I can't
resist recalling that Galton's "invention of correlation" paper:  Co-
relations
and their measurement, Proceedings of the Royal Society, 1888-89,
used medians and interquartile ranges.


url:    www.econ.uiuc.edu/~roger            Roger Koenker
email    [hidden email]            Department of Economics
vox:     217-333-4558                University of Illinois
fax:       217-244-6678                Urbana, IL 61801



On Jul 1, 2009, at 2:48 PM, Dylan Beaudette wrote:

> Thanks Roger. Your comments were very helpful. Unfortunately, each of
> the 'groups' in this example are derived from the same set of data,  
> two of
> which were subsets-- so it is not that unlikely that the weighted  
> medians
> were the same in some cases.
>
> This all leads back to an operation attempting to answer the question:
>
> Of the 2 subsetting methods, which one produces a distribution most  
> like the
> complete data set? Since the distributions are not normal, and there  
> are
> area-weights involved others on the list suggested quantile-
> regression. For a
> more complete picture of how 'different' the distributions are, I  
> have tried
> looking at the differences between weighted quantiles: (0.1, 0.25,  
> 0.5, 0.75,
> 0.9) as a more complete 'description' of each distribution.
>
> I imagine that there may be a better way to perform this comparison...
>
> Cheers,
> Dylan
>
>
> On Tuesday 30 June 2009, roger koenker wrote:
>> Admittedly this seemed quite peculiar....  but if you look at the
>> entrails
>> of the following code you will see that with the weights the first  
>> and
>> second levels of your x$method variable have the same (weighted)  
>> median
>> so the contrast that you are estimating SHOULD be zero.  Perhaps
>> there is something fishy about the data construction that would have
>> allowed us to anticipate this?  Regarding the "fn" option, and the
>> non-uniqueness warning,  this is covered in the (admittedly obscure)
>> faq on quantile regression available at:
>>
>> http://www.econ.uiuc.edu/~roger/research/rq/FAQ
>>
>> # example:
>> library(quantreg)
>>
>> # load data
>> x <- read.csv(url('http://169.237.35.250/~dylan/temp/test.csv'))
>>
>> # with weights
>> summary(rq(sand ~ method, data=x, weights=area_fraction, tau=0.5),
>> se='ker')
>>
>> #Reproduction with more convenient notation:
>>
>> X0 <- model.matrix(~method, data = x)
>> y <- x$sand
>> w <- x$area_fraction
>> f0 <- summary(rq(y ~ X0 - 1, weights = w),se = "ker")
>>
>> #Second reproduction with orthogonal design:
>>
>> X1 <- model.matrix(~method - 1, data = x)
>> f1 <- summary(rq(y ~ X1 - 1, weights = w),se = "ker")
>>
>> #Comparing f0 and f1 we see that they are consistent!!  How can that
>> be??
>> #Since the columns of X1 are orthogonal estimation of the 3  
>> parameters
>> are separable
>> #so we can check to see whether the univariate weighted medians are
>> reproducible.
>>
>> s1 <- X1[,1] == 1
>> s2 <- X1[,2] == 1
>> g1 <- rq(y[s1] ~ X1[s1,1] - 1, weights = w[s1])
>> g2 <- rq(y[s2] ~ X1[s2,2] - 1, weights = w[s2])
>>
>> #Now looking at the g1 and g2 objects we see that they are equal and
>> agree with f1.
>>
>>
>> url:    www.econ.uiuc.edu/~roger            Roger Koenker
>> email    [hidden email]            Department of Economics
>> vox:     217-333-4558                University of Illinois
>> fax:       217-244-6678                Urbana, IL 61801
>>
>> On Jun 30, 2009, at 3:54 PM, Dylan Beaudette wrote:
>>> Hi,
>>>
>>> I am trying to use quantile regression to perform weighted-
>>> comparisons of the
>>> median across groups. This works most of the time, however I am
>>> seeing some
>>> odd output in summary(rq()):
>>>
>>> Call: rq(formula = sand ~ method, tau = 0.5, data = x, weights =
>>> area_fraction)
>>> Coefficients:
>>>                  Value    Std. Error t value  Pr(>|t|)
>>> (Intercept)        45.44262  3.64706   12.46007  0.00000
>>> methodmukey-HRU     0.00000  4.67115    0.00000  1.00000
>>>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>
>>> When I do not include the weights, I get something a little closer
>>> to a
>>> weighted comparison of means, along with an error message:
>>>
>>> Call: rq(formula = sand ~ method, tau = 0.5, data = x)
>>> Coefficients:
>>>                  Value    Std. Error t value  Pr(>|t|)
>>> (Intercept)        44.91579  2.46341   18.23318  0.00000
>>> methodmukey-HRU     9.57601  9.29348    1.03040  0.30380
>>> Warning message:
>>> In rq.fit.br(x, y, tau = tau, ...) : Solution may be nonunique
>>>
>>>
>>> I have noticed that the error message goes away when specifying
>>> method='fn' to
>>> rq(). An example is below. Could this have something to do with
>>> replication
>>> in the data?
>>>
>>>
>>> # example:
>>> library(quantreg)
>>>
>>> # load data
>>> x <- read.csv(url('http://169.237.35.250/~dylan/temp/test.csv'))
>>>
>>> # with weights
>>> summary(rq(sand ~ method, data=x, weights=area_fraction, tau=0.5),
>>> se='ker')
>>>
>>> # without weights
>>> # note error message
>>> summary(rq(sand ~ method, data=x, tau=0.5), se='ker')
>>>
>>> # without weights, no error message
>>> summary(rq(sand ~ method, data=x, tau=0.5, method='fn'), se='ker')
>>>

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: odd behaviour in quantreg::rq

Dylan Beaudette
On Wednesday 01 July 2009, roger koenker wrote:

> It's not clear to me whether you are looking for an exploratory tool
> or something more like formal inference.  For the former, it seems
> that estimating a few weighted quantiles would be quite useful.  at
> least
> it is rather Tukeyesque.  While I'm appealing to authorities, I can't
> resist recalling that Galton's "invention of correlation" paper:  Co-
> relations
> and their measurement, Proceedings of the Royal Society, 1888-89,
> used medians and interquartile ranges.
>

Thanks Roger.

We are interested in a formal inference type of comparison. I have found that
weighted density plots to be a helpful exploratory graphic-- but ultimately
we would like to (as objectively as possible) determine if distributions are
different.

Unfortunately the distributions are very poorly behaved. Once such example can
be seen here:
http://169.237.35.250/~dylan/temp/ACTIVITY-method_comparision.pdf

I was imagine using something like the number of 'significant' differences in
n-quantiles would be used to determine which treatment group was most
different than the control group.

Cheers,
Dylan


> url:    www.econ.uiuc.edu/~roger            Roger Koenker
> email    [hidden email]            Department of Economics
> vox:     217-333-4558                University of Illinois
> fax:       217-244-6678                Urbana, IL 61801
>
> On Jul 1, 2009, at 2:48 PM, Dylan Beaudette wrote:
> > Thanks Roger. Your comments were very helpful. Unfortunately, each of
> > the 'groups' in this example are derived from the same set of data,
> > two of
> > which were subsets-- so it is not that unlikely that the weighted
> > medians
> > were the same in some cases.
> >
> > This all leads back to an operation attempting to answer the question:
> >
> > Of the 2 subsetting methods, which one produces a distribution most
> > like the
> > complete data set? Since the distributions are not normal, and there
> > are
> > area-weights involved others on the list suggested quantile-
> > regression. For a
> > more complete picture of how 'different' the distributions are, I
> > have tried
> > looking at the differences between weighted quantiles: (0.1, 0.25,
> > 0.5, 0.75,
> > 0.9) as a more complete 'description' of each distribution.
> >
> > I imagine that there may be a better way to perform this comparison...
> >
> > Cheers,
> > Dylan
> >
> > On Tuesday 30 June 2009, roger koenker wrote:
> >> Admittedly this seemed quite peculiar....  but if you look at the
> >> entrails
> >> of the following code you will see that with the weights the first
> >> and
> >> second levels of your x$method variable have the same (weighted)
> >> median
> >> so the contrast that you are estimating SHOULD be zero.  Perhaps
> >> there is something fishy about the data construction that would have
> >> allowed us to anticipate this?  Regarding the "fn" option, and the
> >> non-uniqueness warning,  this is covered in the (admittedly obscure)
> >> faq on quantile regression available at:
> >>
> >> http://www.econ.uiuc.edu/~roger/research/rq/FAQ
> >>
> >> # example:
> >> library(quantreg)
> >>
> >> # load data
> >> x <- read.csv(url('http://169.237.35.250/~dylan/temp/test.csv'))
> >>
> >> # with weights
> >> summary(rq(sand ~ method, data=x, weights=area_fraction, tau=0.5),
> >> se='ker')
> >>
> >> #Reproduction with more convenient notation:
> >>
> >> X0 <- model.matrix(~method, data = x)
> >> y <- x$sand
> >> w <- x$area_fraction
> >> f0 <- summary(rq(y ~ X0 - 1, weights = w),se = "ker")
> >>
> >> #Second reproduction with orthogonal design:
> >>
> >> X1 <- model.matrix(~method - 1, data = x)
> >> f1 <- summary(rq(y ~ X1 - 1, weights = w),se = "ker")
> >>
> >> #Comparing f0 and f1 we see that they are consistent!!  How can that
> >> be??
> >> #Since the columns of X1 are orthogonal estimation of the 3
> >> parameters
> >> are separable
> >> #so we can check to see whether the univariate weighted medians are
> >> reproducible.
> >>
> >> s1 <- X1[,1] == 1
> >> s2 <- X1[,2] == 1
> >> g1 <- rq(y[s1] ~ X1[s1,1] - 1, weights = w[s1])
> >> g2 <- rq(y[s2] ~ X1[s2,2] - 1, weights = w[s2])
> >>
> >> #Now looking at the g1 and g2 objects we see that they are equal and
> >> agree with f1.
> >>
> >>
> >> url:    www.econ.uiuc.edu/~roger            Roger Koenker
> >> email    [hidden email]            Department of Economics
> >> vox:     217-333-4558                University of Illinois
> >> fax:       217-244-6678                Urbana, IL 61801
> >>
> >> On Jun 30, 2009, at 3:54 PM, Dylan Beaudette wrote:
> >>> Hi,
> >>>
> >>> I am trying to use quantile regression to perform weighted-
> >>> comparisons of the
> >>> median across groups. This works most of the time, however I am
> >>> seeing some
> >>> odd output in summary(rq()):
> >>>
> >>> Call: rq(formula = sand ~ method, tau = 0.5, data = x, weights =
> >>> area_fraction)
> >>> Coefficients:
> >>>                  Value    Std. Error t value  Pr(>|t|)
> >>> (Intercept)        45.44262  3.64706   12.46007  0.00000
> >>> methodmukey-HRU     0.00000  4.67115    0.00000  1.00000
> >>>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >>>
> >>> When I do not include the weights, I get something a little closer
> >>> to a
> >>> weighted comparison of means, along with an error message:
> >>>
> >>> Call: rq(formula = sand ~ method, tau = 0.5, data = x)
> >>> Coefficients:
> >>>                  Value    Std. Error t value  Pr(>|t|)
> >>> (Intercept)        44.91579  2.46341   18.23318  0.00000
> >>> methodmukey-HRU     9.57601  9.29348    1.03040  0.30380
> >>> Warning message:
> >>> In rq.fit.br(x, y, tau = tau, ...) : Solution may be nonunique
> >>>
> >>>
> >>> I have noticed that the error message goes away when specifying
> >>> method='fn' to
> >>> rq(). An example is below. Could this have something to do with
> >>> replication
> >>> in the data?
> >>>
> >>>
> >>> # example:
> >>> library(quantreg)
> >>>
> >>> # load data
> >>> x <- read.csv(url('http://169.237.35.250/~dylan/temp/test.csv'))
> >>>
> >>> # with weights
> >>> summary(rq(sand ~ method, data=x, weights=area_fraction, tau=0.5),
> >>> se='ker')
> >>>
> >>> # without weights
> >>> # note error message
> >>> summary(rq(sand ~ method, data=x, tau=0.5), se='ker')
> >>>
> >>> # without weights, no error message
> >>> summary(rq(sand ~ method, data=x, tau=0.5, method='fn'), se='ker')



--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|

Re: odd behaviour in quantreg::rq

Dylan Beaudette
On Wednesday 01 July 2009, roger koenker wrote:

> url:    www.econ.uiuc.edu/~roger            Roger Koenker
> email    [hidden email]            Department of Economics
> vox:     217-333-4558                University of Illinois
> fax:       217-244-6678                Urbana, IL 61801
>
> On Jul 1, 2009, at 4:52 PM, Dylan Beaudette wrote:
> > On Wednesday 01 July 2009, roger koenker wrote:
> >> It's not clear to me whether you are looking for an exploratory tool
> >> or something more like formal inference.  For the former, it seems
> >> that estimating a few weighted quantiles would be quite useful.  at
> >> least
> >> it is rather Tukeyesque.  While I'm appealing to authorities, I can't
> >> resist recalling that Galton's "invention of correlation" paper:  Co-
> >> relations
> >> and their measurement, Proceedings of the Royal Society, 1888-89,
> >> used medians and interquartile ranges.
> >
> > Thanks Roger.
> >
> > We are interested in a formal inference type of comparison. I have
> > found that
> > weighted density plots to be a helpful exploratory graphic-- but
> > ultimately
> > we would like to (as objectively as possible) determine if
> > distributions are
> > different.
> >
> > Unfortunately the distributions are very poorly behaved. Once such
> > example can
> > be seen here:
> > http://169.237.35.250/~dylan/temp/ACTIVITY-method_comparision.pdf
> >
> > I was imagine using something like the number of 'significant'
> > differences in
> > n-quantiles would be used to determine which treatment group was most
> > different than the control group.
>
> sure, you could do that but to be formal and account for multiple
> testing you
> are going to have some headaches.  The much maligned two sample KS
> tests might be a worthwhile alternative to consider.

Right... forgot about that. OK. I had considered the KS test, but have not yet
been able to identify how one could include weights... There appear to be
methods listed on the internet, but I haven't seen one implemented in R.

Still looking...

Dylan



>
> > Cheers,
> > Dylan
> >
> >> url:    www.econ.uiuc.edu/~roger            Roger Koenker
> >> email    [hidden email]            Department of Economics
> >> vox:     217-333-4558                University of Illinois
> >> fax:       217-244-6678                Urbana, IL 61801
> >>
> >> On Jul 1, 2009, at 2:48 PM, Dylan Beaudette wrote:
> >>> Thanks Roger. Your comments were very helpful. Unfortunately, each
> >>> of
> >>> the 'groups' in this example are derived from the same set of data,
> >>> two of
> >>> which were subsets-- so it is not that unlikely that the weighted
> >>> medians
> >>> were the same in some cases.
> >>>
> >>> This all leads back to an operation attempting to answer the
> >>> question:
> >>>
> >>> Of the 2 subsetting methods, which one produces a distribution most
> >>> like the
> >>> complete data set? Since the distributions are not normal, and there
> >>> are
> >>> area-weights involved others on the list suggested quantile-
> >>> regression. For a
> >>> more complete picture of how 'different' the distributions are, I
> >>> have tried
> >>> looking at the differences between weighted quantiles: (0.1, 0.25,
> >>> 0.5, 0.75,
> >>> 0.9) as a more complete 'description' of each distribution.
> >>>
> >>> I imagine that there may be a better way to perform this
> >>> comparison...
> >>>
> >>> Cheers,
> >>> Dylan
> >>>
> >>> On Tuesday 30 June 2009, roger koenker wrote:
> >>>> Admittedly this seemed quite peculiar....  but if you look at the
> >>>> entrails
> >>>> of the following code you will see that with the weights the first
> >>>> and
> >>>> second levels of your x$method variable have the same (weighted)
> >>>> median
> >>>> so the contrast that you are estimating SHOULD be zero.  Perhaps
> >>>> there is something fishy about the data construction that would
> >>>> have
> >>>> allowed us to anticipate this?  Regarding the "fn" option, and the
> >>>> non-uniqueness warning,  this is covered in the (admittedly
> >>>> obscure)
> >>>> faq on quantile regression available at:
> >>>>
> >>>> http://www.econ.uiuc.edu/~roger/research/rq/FAQ
> >>>>
> >>>> # example:
> >>>> library(quantreg)
> >>>>
> >>>> # load data
> >>>> x <- read.csv(url('http://169.237.35.250/~dylan/temp/test.csv'))
> >>>>
> >>>> # with weights
> >>>> summary(rq(sand ~ method, data=x, weights=area_fraction, tau=0.5),
> >>>> se='ker')
> >>>>
> >>>> #Reproduction with more convenient notation:
> >>>>
> >>>> X0 <- model.matrix(~method, data = x)
> >>>> y <- x$sand
> >>>> w <- x$area_fraction
> >>>> f0 <- summary(rq(y ~ X0 - 1, weights = w),se = "ker")
> >>>>
> >>>> #Second reproduction with orthogonal design:
> >>>>
> >>>> X1 <- model.matrix(~method - 1, data = x)
> >>>> f1 <- summary(rq(y ~ X1 - 1, weights = w),se = "ker")
> >>>>
> >>>> #Comparing f0 and f1 we see that they are consistent!!  How can
> >>>> that
> >>>> be??
> >>>> #Since the columns of X1 are orthogonal estimation of the 3
> >>>> parameters
> >>>> are separable
> >>>> #so we can check to see whether the univariate weighted medians are
> >>>> reproducible.
> >>>>
> >>>> s1 <- X1[,1] == 1
> >>>> s2 <- X1[,2] == 1
> >>>> g1 <- rq(y[s1] ~ X1[s1,1] - 1, weights = w[s1])
> >>>> g2 <- rq(y[s2] ~ X1[s2,2] - 1, weights = w[s2])
> >>>>
> >>>> #Now looking at the g1 and g2 objects we see that they are equal
> >>>> and
> >>>> agree with f1.
> >>>>
> >>>>
> >>>> url:    www.econ.uiuc.edu/~roger            Roger Koenker
> >>>> email    [hidden email]            Department of Economics
> >>>> vox:     217-333-4558                University of Illinois
> >>>> fax:       217-244-6678                Urbana, IL 61801
> >>>>
> >>>> On Jun 30, 2009, at 3:54 PM, Dylan Beaudette wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I am trying to use quantile regression to perform weighted-
> >>>>> comparisons of the
> >>>>> median across groups. This works most of the time, however I am
> >>>>> seeing some
> >>>>> odd output in summary(rq()):
> >>>>>
> >>>>> Call: rq(formula = sand ~ method, tau = 0.5, data = x, weights =
> >>>>> area_fraction)
> >>>>> Coefficients:
> >>>>>                 Value    Std. Error t value  Pr(>|t|)
> >>>>> (Intercept)        45.44262  3.64706   12.46007  0.00000
> >>>>> methodmukey-HRU     0.00000  4.67115    0.00000  1.00000
> >>>>>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >>>>>
> >>>>> When I do not include the weights, I get something a little closer
> >>>>> to a
> >>>>> weighted comparison of means, along with an error message:
> >>>>>
> >>>>> Call: rq(formula = sand ~ method, tau = 0.5, data = x)
> >>>>> Coefficients:
> >>>>>                 Value    Std. Error t value  Pr(>|t|)
> >>>>> (Intercept)        44.91579  2.46341   18.23318  0.00000
> >>>>> methodmukey-HRU     9.57601  9.29348    1.03040  0.30380
> >>>>> Warning message:
> >>>>> In rq.fit.br(x, y, tau = tau, ...) : Solution may be nonunique
> >>>>>
> >>>>>
> >>>>> I have noticed that the error message goes away when specifying
> >>>>> method='fn' to
> >>>>> rq(). An example is below. Could this have something to do with
> >>>>> replication
> >>>>> in the data?
> >>>>>
> >>>>>
> >>>>> # example:
> >>>>> library(quantreg)
> >>>>>
> >>>>> # load data
> >>>>> x <- read.csv(url('http://169.237.35.250/~dylan/temp/test.csv'))
> >>>>>
> >>>>> # with weights
> >>>>> summary(rq(sand ~ method, data=x, weights=area_fraction, tau=0.5),
> >>>>> se='ker')
> >>>>>
> >>>>> # without weights
> >>>>> # note error message
> >>>>> summary(rq(sand ~ method, data=x, tau=0.5), se='ker')
> >>>>>
> >>>>> # without weights, no error message
> >>>>> summary(rq(sand ~ method, data=x, tau=0.5, method='fn'), se='ker')
> >



--
Dylan Beaudette
Soil Resource Laboratory
http://casoilresource.lawr.ucdavis.edu/
University of California at Davis
530.754.7341

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.