Siegel nonparametric regression / mblm package


Siegel nonparametric regression / mblm package

Marco Besozzi
I used the "galton" data set included in the package "psych". With the
package "mblm" I fitted the Theil-Sen nonparametric regression and the
Siegel nonparametric regression, and compared them with the ordinary
least-squares regression line.
The standard regression and the Theil-Sen regression are practically
identical, but the Siegel regression seems to have a bias that I cannot
understand. May I ask for a possible explanation? Could the bias be related
to the number of ties in the data set? Here are the code and the image.

Best regards.

Marco Besozzi
# Theil-Sen and Siegel nonparametric regression with package mblm,
# compared with ordinary least squares (parametric) regression
# on the galton data set included in the package psych
#
library(psych)
library(mblm)
attach(galton)
#
reglin_yx <- lm(child ~ parent, data=galton) # ordinary least squares (parametric) regression
a_yx <- reglin_yx$coefficients[1] # intercept a
b_yx <- reglin_yx$coefficients[2] # slope b
#
regnonTS <- mblm(child ~ parent, data=galton, repeated=FALSE) # Theil-Sen nonparametric regression (wait a few minutes!)
a_TS <- regnonTS$coefficients[1] # intercept a
b_TS <- regnonTS$coefficients[2] # slope b
#
regnonS <- mblm(child ~ parent, data=galton, repeated=TRUE) # Siegel nonparametric regression
a_S <- regnonS$coefficients[1] # intercept a
b_S <- regnonS$coefficients[2] # slope b
#
# xy plot of data and regression lines
#
windows() # open a new graphics window (Windows-only; use dev.new() on other platforms)
plot(parent, child, xlim=c(60,80), ylim=c(60,80), pch=1,
     xlab="Parent height (inch)", ylab="Child height (inch)",
     main="Regression lines comparison", cex.main=0.9) # data plot
abline(a_yx, b_yx, col="green", lty=1) # ordinary least squares (parametric) regression line
abline(a_TS, b_TS, col="blue", lty=1)  # Theil-Sen nonparametric regression line
abline(a_S, b_S, col="red", lty=1)     # Siegel nonparametric regression line
legend(60, 80, legend=c("Ordinary least squares regression",
                        "Theil-Sen nonparametric regression",
                        "Siegel nonparametric regression"),
       col=c("green", "blue", "red"), lty=1, cex=0.8) # add a legend (lty matched to the lines drawn)
#
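For reference, what `repeated=TRUE` asks mblm to compute is Siegel's repeated-median slope: for each point, the median of its pairwise slopes to all other points, and then the median of those per-point medians. A minimal base-R sketch on synthetic data (the function name `siegel_slope` and the simulated data are illustrative, not part of mblm) should agree closely with mblm's fitted slope:

```r
# Siegel repeated-median slope, computed directly in base R.
siegel_slope <- function(x, y) {
  n <- length(x)
  med_i <- sapply(seq_len(n), function(i) {
    s <- (y[-i] - y[i]) / (x[-i] - x[i])  # pairwise slopes from point i
    median(s, na.rm = TRUE)               # NaN from tied (x, y) pairs is dropped
  })
  median(med_i)                           # median of the per-point medians
}

set.seed(1)
x <- rnorm(100)
y <- 2 + 0.5 * x + rnorm(100, sd = 0.2)
siegel_slope(x, y)   # close to the true slope 0.5
```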

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[Attachment: Siegel.PNG (108K)]

Re: Siegel nonparametric regression / mblm package

Roger Koenker-3
My first thought was also that this was an artifact of the ties, but dithering the data
n <- length(child)
child <- child + runif(n,-.5,.5)
parent <- parent + runif(n,-.5,.5)

and rerunning yields the same discrepancy between the Siegel and other fits. Curiously, both
lmsreg and ltsreg from MASS produce lines that are more steeply sloped than those
of the other methods.  Since I stupidly forgot to set.seed(), YMMV.
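With a seed the dithering step above becomes reproducible. Since the galton data requires the psych package, rounded normal heights stand in for it in this self-contained sketch; the dithering itself is exactly the `runif(n, -.5, .5)` step quoted above:

```r
set.seed(2019)                           # fix the seed so the jitter is reproducible
parent <- round(rnorm(200, 68, 1.8))     # stand-in for galton$parent (rounding creates ties)
child  <- round(rnorm(200, 68, 2.5))     # stand-in for galton$child
sum(duplicated(parent))                  # many tied parent values before dithering
n <- length(child)
child  <- child  + runif(n, -0.5, 0.5)   # dither: add uniform noise within the rounding width
parent <- parent + runif(n, -0.5, 0.5)
sum(duplicated(parent))                  # 0: ties are (almost surely) broken
```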


Re: Siegel nonparametric regression / mblm package

Roger Koenker-3
A quick look at the code for Siegel in mblm reveals that it is extremely inefficient, but it seems to be correct.
One “explanation” for this behavior, presuming that we haven’t overlooked something more basic, is that such
high-breakdown estimators sacrifice some efficiency; that is to say, they are more variable than other methods
when the data are well behaved, and of course, the Galton data are famously “almost Gaussian”.
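The efficiency loss is easy to see in a small Monte Carlo sketch (base R only; `rm_slope` is an illustrative from-scratch repeated-median slope, not mblm's own code): on clean Gaussian data the Siegel slope is noticeably more variable than the OLS slope, even though both are centered on the truth.

```r
# Repeated-median (Siegel) slope, written out directly.
rm_slope <- function(x, y) {
  median(sapply(seq_along(x), function(i)
    median((y[-i] - y[i]) / (x[-i] - x[i]))))
}

set.seed(1)
sims <- replicate(300, {
  x <- rnorm(40)
  y <- 1 + 0.5 * x + rnorm(40)           # well-behaved Gaussian data
  c(ols    = unname(coef(lm(y ~ x))[2]),
    siegel = rm_slope(x, y))
})
apply(sims, 1, sd)   # the siegel sd clearly exceeds the ols sd
```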

> On Feb 11, 2019, at 12:47 PM, Marco Besozzi <[hidden email]> wrote:
>
> Thank you very much for your reply.
> If I have understood correctly, then unfortunately I have lost the only idea I had...
> Do you think a problem in the R algorithm used in the mblm package for Siegel regression is possible?
> And do you know whether Siegel regression is available in a different package? I was unable to find it.
> Thanks again!
> Best regards.
>
> P.S.: sorry for my bad English...

Re: Siegel nonparametric regression / mblm package

Marco Besozzi
Thanks a lot!

