R-squared with Intercept set to 0 (zero) for linear regression in R is incorrect

Pamela Krone-Davis
Hi,

I have been using lm in R to do a linear regression and find the slope
coefficients and the value of R-squared.  The R-squared value reported by R
(R^2 = 0.9558) is very different from the R-squared value I get when I fit
the same equation in Excel (R^2 = 0.328).  I manually computed R-squared and
the Excel value is correct.  My code for the determination of R^2 in R is
shown below.  When I do not set 0 as the intercept, the R^2 value is the same
in R and Excel.  In both cases the slope coefficients from R and from Excel
are identical.

k is a data frame with two columns.

    M1 = lm(k[,1]~k[,2] + 0)     ## set intercept to 0 and get different R^2 values in R and Excel
    M2 = lm(k[,1]~k[,2])
    sumM1 = summary(M1)
    sumM2 = summary(M2)    ## get same value as Excel when intercept is not set to 0

Below is what R returns for sumM1:

lm(formula = k[, 1] ~ k[, 2] + 0)

Residuals:
      Min        1Q    Median        3Q       Max
-0.057199 -0.015857  0.003793  0.013737  0.056178

Coefficients:
       Estimate Std. Error t value Pr(>|t|)
k[, 2]  1.05022    0.04266   24.62   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02411 on 28 degrees of freedom
Multiple R-squared: 0.9558,     Adjusted R-squared: 0.9543
F-statistic: 606.2 on 1 and 28 DF,  p-value: < 2.2e-16

This is how I computed R^2 manually.  The value returned coincides with
the value from Excel:

#### trying to figure out why the R^2 for R and Excel are so different.
     sqerr = (k[,1] - predict(M1))^2
     sqtot = (k[,1] - mean(k[,1]))^2

     R2 = 1 -  sum(sqerr)/sum(sqtot)     ## for 1D get 0.328   same as Excel value

I am very puzzled by this.  How does R compute the value for R^2 in this
case? Did I write the lm call incorrectly?

Thanks
Pam

PS  In case you are interested, the data I am using for the two columns is
below.

> k[,1]
 [1] 0.17170228 0.10881539 0.11843669 0.11619201 0.08441067 0.09424441
0.04782264 0.09526496 0.11596476 0.10323453 0.06487894 0.08916484
0.06358752 0.07945473
[15] 0.11213532 0.06531185 0.11503484 0.13679548 0.13762677 0.13126827
0.12350649 0.12842441 0.13075654 0.15026602 0.14536351 0.07841638
0.08419016 0.11995240
[29] 0.14425678

> k[,2]
 [1] 0.11 0.10 0.11 0.10 0.10 0.09 0.10 0.09 0.09 0.11 0.09 0.10 0.09 0.10
0.09 0.10 0.10 0.10 0.11 0.10 0.11 0.11 0.12 0.13 0.15 0.10 0.09 0.11 0.12


--
Pam Krone-Davis
Project Research Assistant and Grant Manager
PO Box 22122
Carmel, CA 93922
(831)582-3684 (o)
(831)324-0391 (h)


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: R-squared with Intercept set to 0 (zero) for linear regression in R is incorrect

William Dunlap
What does Excel give for the following data, where the by-hand formula
you gave is obviously wrong?
   > x <- c(1, 2, 3)
   > y <- c(13.1, 11.9, 11.0)
   > M1 <- lm(y~x+0)
   > sqerr <- (y- predict(M1)) ^ 2
   > sqtot <- (y - mean(y)) ^ 2
   > 1 - sum(sqerr)/sum(sqtot)
  [1] -37.38707

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



Re: R-squared with Intercept set to 0 (zero) for linear regression in R is incorrect

William Dunlap
You might want to look at
   http://support.microsoft.com/kb/214230
entitled
   Incorrect output is returned when you use the Linear Regression (LINEST) function in Excel

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



Re: R-squared with Intercept set to 0 (zero) for linear regression in R is incorrect

John Sorkin
In reply to this post by Pamela Krone-Davis
Pamela,
R-squared with a zero intercept and with a non-zero intercept can be very different, because the regression lines you get with and without a zero intercept can be very different. Have you plotted your data, plot(k[,2],k[,1]), to see whether a zero intercept is reasonable for your data? Have you drawn the regression lines from your two models and compared them with the plot of your data?
John
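
A minimal sketch of the check John suggests, assuming the two columns from the original post are copied into plain vectors (the names x and y are illustrative and do not appear in the thread). It draws the data with both axes extended to 0, then overlays the through-origin fit and the fit with an intercept:

```r
# Data copied from the original post: y is k[,1], x is k[,2].
# The vector names x and y are illustrative, not from the thread.
y <- c(0.17170228, 0.10881539, 0.11843669, 0.11619201, 0.08441067, 0.09424441,
       0.04782264, 0.09526496, 0.11596476, 0.10323453, 0.06487894, 0.08916484,
       0.06358752, 0.07945473, 0.11213532, 0.06531185, 0.11503484, 0.13679548,
       0.13762677, 0.13126827, 0.12350649, 0.12842441, 0.13075654, 0.15026602,
       0.14536351, 0.07841638, 0.08419016, 0.11995240, 0.14425678)
x <- c(0.11, 0.10, 0.11, 0.10, 0.10, 0.09, 0.10, 0.09, 0.09, 0.11, 0.09, 0.10,
       0.09, 0.10, 0.09, 0.10, 0.10, 0.10, 0.11, 0.10, 0.11, 0.11, 0.12, 0.13,
       0.15, 0.10, 0.09, 0.11, 0.12)

M1 <- lm(y ~ x + 0)   # regression through the origin
M2 <- lm(y ~ x)       # regression with an intercept

# Extend both axes down to 0 so the origin is visible in the plot.
plot(x, y, xlim = c(0, max(x)), ylim = c(0, max(y)),
     xlab = "k[,2]", ylab = "k[,1]")
abline(a = 0, b = coef(M1)[["x"]], lty = 2)   # through-origin fit (dashed)
abline(M2, lty = 1)                            # fit with intercept (solid)
legend("topleft", legend = c("through origin", "with intercept"),
       lty = c(2, 1))
```

If the two lines diverge noticeably over the range of the data, that is a hint that the two models describe the data differently, whatever their R^2 values say.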


Re: R-squared with Intercept set to 0 (zero) for linear regression in R is incorrect

William Dunlap
In reply to this post by William Dunlap
While excluding the intercept may make sense, your formula for r^2 assumes
that there was an intercept (that is why mean(y)  is in your expression for
sqtot).

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
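
To make the point about mean(y) concrete, here is a small sketch using the three-point example from earlier in the thread. It computes the same residual sum of squares and divides it by two different totals: the centered one (which compares the fit to an intercept-only model) and the uncentered one (which compares the fit to a model that always predicts 0). The variable names rss, tss_centered and tss_uncentered are illustrative assumptions, not names used by anyone in the thread.

```r
# Three-point example from earlier in the thread.
x <- c(1, 2, 3)
y <- c(13.1, 11.9, 11.0)
M1 <- lm(y ~ x + 0)                      # regression through the origin

rss <- sum(residuals(M1)^2)              # residual sum of squares

# Centered total: the baseline is the intercept-only model, y ~ 1.
tss_centered <- sum((y - mean(y))^2)
# Uncentered total: the baseline is the model that always predicts 0.
tss_uncentered <- sum(y^2)

# The by-hand formula from the original post; the thread reports
# -37.38707 for this quantity on these data.
r2_centered <- 1 - rss / tss_centered
# The same formula with the baseline that matches a no-intercept model.
r2_uncentered <- 1 - rss / tss_uncentered

print(c(centered = r2_centered, uncentered = r2_uncentered))
```

A through-origin fit is not guaranteed to do better than the sample mean, so the centered version can go negative. The uncentered version stays between 0 and 1 because a least-squares fit through the origin never does worse than predicting 0 everywhere.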

From: Pamela Krone-Davis [mailto:[hidden email]]
Sent: Friday, July 13, 2012 10:32 AM
To: William Dunlap
Subject: Re: [R] R-squared with Intercept set to 0 (zero) for linear regression in R is incorrect

Hi William,

Thanks for getting back to me.  When I use the values you provided, it would not make sense to set an intercept of 0.  For my data, 0 does make sense as an intercept.  When I do not set a 0 intercept using your data points, I get the same value for R-squared in R and in Excel and manually.

Thanks
Pam


Re: R-squared with Intercept set to 0 (zero) for linear regression in R is incorrect

William Dunlap
S+ and, I assume, R compute r^2 when there is no intercept as
   sum(fitted(M1)^2) / sum(y^2)
where M1 is the fitted model and y the response.

See, for example,
   http://web.ist.utl.pt/~ist11038/compute/errtheory/,regression/regrthroughorigin.pdf
for the derivation of this formula.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
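
The formula above can be checked against summary() directly. The sketch below is an assumption-laden illustration rather than R's actual source: r2_by_hand is a hypothetical helper that picks the centered or uncentered model sum of squares depending on whether the fitted model has an intercept, and it handles only unweighted lm fits. It reuses the three-point example from earlier in the thread.

```r
# Hypothetical helper (not part of R): R^2 for an unweighted lm fit,
# choosing the baseline according to whether the model has an intercept.
r2_by_hand <- function(fit) {
  f <- fitted(fit)
  has_intercept <- attr(terms(fit), "intercept") == 1
  # Model sum of squares: centered with an intercept, uncentered without.
  mss <- if (has_intercept) sum((f - mean(f))^2) else sum(f^2)
  rss <- sum(residuals(fit)^2)
  mss / (mss + rss)
}

x <- c(1, 2, 3)
y <- c(13.1, 11.9, 11.0)
M1 <- lm(y ~ x + 0)   # no intercept
M2 <- lm(y ~ x)       # with intercept

# Each comparison should print TRUE.
print(all.equal(r2_by_hand(M1), summary(M1)$r.squared))
print(all.equal(r2_by_hand(M2), summary(M2)$r.squared))
# Without an intercept this is the same as sum(fitted^2) / sum(y^2),
# because fitted values and residuals are orthogonal.
print(all.equal(r2_by_hand(M1), sum(fitted(M1)^2) / sum(y^2)))
```

The two R^2 values answer different questions, so comparing the number for M1 with the number for M2 says little about which model fits better.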

From: Pamela Krone-Davis [mailto:[hidden email]]
Sent: Friday, July 13, 2012 11:05 AM
To: William Dunlap
Subject: Re: [R] R-squared with Intercept set to 0 (zero) for linear regression in R is incorrect

Thanks William,

I have actually tried a couple of different formulas for determining R^2.  The second formula does return a different value and assumes no intercept.  The second formula is attached in the image.

However, when I use the manual formula for R^2 that is shown, I get an R^2 value that matches the second formula when I don't set the intercept.  So I think the formula I showed works for both cases.

M1 = lm(k[,1]~k[,2] + 0)     ## set intercept to 0
M2 = lm(k[,1]~k[,2])

sqerrM2 = (k[,1] - predict(M2))^2
sqtotM2 = (k[,1] - mean(k[,1]))  ^2
sqerrM1 = (k[,1] - predict(M1))^2
sqtotM1 = (k[,1] - mean(k[,1]))  ^2
R2M1 = 1 -  sum(sqerrM1)/sum(sqtotM1)     ##  get 0.328   same as excel value
R2M2 = 1 -  sum(sqerrM2)/sum(sqtotM2)    ## same as Excel 0.408 and as R in this case

> R2M1
[1] 0.3284381
> R2M2
[1] 0.4083052

How does R compute the R-squared value?

Thanks
Pam
