Different result of multiple regression in R and SPSS


Different result of multiple regression in R and SPSS

J.
Hi, I am trying to do a multiple regression analysis that has one nominal variable (gender) and three numeric variables as independent variables and one numeric variable as a dependent variable.

So, I got a formula like this (in R):

summary(out.3 <- lm(scale(DV) ~ gender + scale(IV.1) + scale(IV.2) + scale(IV.3)))

After running the analysis, I compared the R output with the SPSS output and found that the results are different!
I found that R and SPSS give exactly the same output when every variable is numeric; however, whenever I include the "gender (0/1)" variable in the equation, the results become different.

I guess that SPSS automatically treats gender as a numeric variable and standardizes it when running the analysis. So, I tried changing "gender" to a numeric variable and reran the analysis, but the results were still not identical.

What is the problem here and what is the right way to do this analysis?
Thanks,

Jay Yang

Re: Different result of multiple regression in R and SPSS

Bert Gunter
Answer: Contrasts, i.e. the parameterization of the categorical variable(s) df.

?contrasts may be of some help, but you really need to do some
background studying of the linear models principles involved. Googling
may provide tutorials. Also searching the mail archives, e.g.:

https://stat.ethz.ch/pipermail/r-help/2009-February/187479.html

-- Bert
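
As a rough illustration of what "contrasts" means here, the following sketch prints the coding R would use for a two-level factor. The `gender` vector and its level names are made up for the example; they are not the poster's data.

```r
# Hypothetical two-level factor standing in for the poster's gender variable
gender <- factor(c("f", "m", "m", "f"))

# Contrast functions R currently applies to unordered and ordered factors
getOption("contrasts")

# Coding matrices for a two-level factor
contr.treatment(2)  # dummy (0/1) coding, R's default for unordered factors
contr.sum(2)        # sum-to-zero (1/-1) coding

# Design-matrix column that lm() would build under each coding
mm_treatment <- model.matrix(~ gender,
                             contrasts.arg = list(gender = "contr.treatment"))
mm_sum <- model.matrix(~ gender,
                       contrasts.arg = list(gender = "contr.sum"))
mm_treatment[, 2]  # 0 for "f", 1 for "m"
mm_sum[, 2]        # 1 for "f", -1 for "m"
```

The two design matrices span the same column space, so the fitted model is the same; only the meaning of the gender coefficient changes.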




--
"Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions."

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Different result of multiple regression in R and SPSS

J.
Thanks for the answer.

However, I am still curious about which result I should use: the result from R or the one from SPSS?
Why are the results from the two programs different?

Jay

Re: Different result of multiple regression in R and SPSS

Dimitri Liakhovitski-2
In reply to this post by Bert Gunter
I don't think SPSS does anything with the variables you enter there.
Have you entered it as numeric?
Have you entered gender as numeric in R?




--
Dimitri Liakhovitski
marketfusionanalytics.com


Re: Different result of multiple regression in R and SPSS

David Winsemius
In reply to this post by J.

On Jul 19, 2011, at 6:29 PM, J. wrote:

> Thanks for the answer.
>
> However, I am still curious about which result I should use? The  
> result from
> R or the one from SPSS?

It is becoming apparent that you do not know how to use the results  
from either system. The progress of science would be safer if you get  
some advice from a person that knows what they are doing.

> Why the results from two programs are different?

Different parametrizations. If I had to guess I would bet that the
gender coefficient in R is exactly twice that of the one from SPSS.
They are probably both correct in the context of their respective
codings.
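
The factor-of-two guess can be checked on simulated data. This is only a sketch: the data, seed, and effect sizes are invented, and it compares R's treatment (0/1) coding with a 1/-1 coding set explicitly, without claiming anything about what SPSS does internally.

```r
set.seed(1)
n <- 200
gender <- factor(sample(c("f", "m"), n, replace = TRUE))
x <- rnorm(n)
y <- 1 + 0.8 * (gender == "m") + 0.5 * x + rnorm(n)

# Same model, two codings of the same factor
fit_treatment <- lm(y ~ gender + x,
                    contrasts = list(gender = "contr.treatment"))
fit_sum <- lm(y ~ gender + x, contrasts = list(gender = "contr.sum"))

b_treatment <- coef(fit_treatment)[["genderm"]]  # difference m minus f
b_sum <- coef(fit_sum)[["gender1"]]              # deviation of f from the midpoint

c(treatment = b_treatment, sum = b_sum)
all.equal(b_treatment, -2 * b_sum)                 # twice as large, opposite sign
all.equal(fitted(fit_treatment), fitted(fit_sum))  # identical fitted values
```

Both fits describe the same regression surface; the gender coefficient differs only because it answers a different question under each coding.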

--
David Winsemius, MD
West Hartford, CT


Re: Different result of multiple regression in R and SPSS

Bert Gunter
On Tue, Jul 19, 2011 at 3:45 PM, David Winsemius <[hidden email]> wrote:
>
> On Jul 19, 2011, at 6:29 PM, J. wrote:
>
>> Thanks for the answer.
>>

#########################
>> However, I am still curious about which result I should use? The result
>> from
>> R or the one from SPSS?
>
> It is becoming apparent that you do not know how to use the results from
> either system. The progress of science would be safer if you get some advice
> from a person that knows what they are doing.
##########################
I nominate this for an R fortune.

-- Bert


Re: Different result of multiple regression in R and SPSS

J.
In reply to this post by David Winsemius
@Dimitri: I tried entering it as numeric and still got the same outcome. I still wonder whether there is any way to get the same result from both programs.
@David, Bert: Yes, I found that the gender coefficient in R is exactly twice the one from SPSS. I need to study parametrization.
Thanks,

Jay

Re: Different result of multiple regression in R and SPSS

Mike Marchywka
In reply to this post by David Winsemius






I guess I would also suggest, again, run some samples with known data sets
and see what you get (RSSWKDSASWYG). You would want to do this anyway if you want to ensure
your real data is being used reasonably. You still need to have some way to check the
opinion you get from the expert mentioned above, and known data will help there too. A factor
of 2 often shows up from just looking at pictures once you have some intuition. I've
often been wrong on intuition, but chasing it down and proving it helps you learn a lot :)
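
One way to "run some samples with known data sets" is to simulate data from coefficients you choose yourself and check that the fit recovers them. The sketch below assumes made-up variable names and true values; the same data could then be exported and run through the other program for comparison.

```r
set.seed(42)
n <- 5000
gender <- rbinom(n, 1, 0.5)  # 0/1 dummy variable
iv1 <- rnorm(n)

# Coefficients chosen by us, so the "right answer" is known in advance
true_coef <- c(intercept = 2, gender = 1.5, iv1 = -0.7)
dv <- true_coef[["intercept"]] + true_coef[["gender"]] * gender +
  true_coef[["iv1"]] * iv1 + rnorm(n, sd = 0.5)

fit <- lm(dv ~ gender + iv1)
est <- coef(fit)

# Side-by-side comparison of the true and estimated coefficients
round(cbind(true = unname(true_coef), estimate = unname(est)), 3)
```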





Re: Different result of multiple regression in R and SPSS

Spencer Graves-2
In reply to this post by Bert Gunter

None of us ever know what we're doing at some level.  We often think we
do, and sometimes we get results more in spite of what we've done than
because of it.  That of course increases our confidence and encourages
us to repeat mistakes in contexts where we might not be so lucky.


Spencer



--
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567
web:  www.structuremonitoring.com


Re: Different result of multiple regression in R and SPSS

J.
In reply to this post by Dimitri Liakhovitski-2
I finally got the same result by converting the "gender" variable to numeric and standardizing it.
I guess SPSS automatically does the same thing when running the analysis.
But it is still not clear to me how to interpret a "standardized categorical (dummy coded)" variable.
I'd rather stick with R.
Thanks for all the comments and advice.
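
For what it is worth, the effect of standardizing a 0/1 dummy can be worked out directly: the coefficient is multiplied by the standard deviation of the dummy. The sketch below uses invented data and helper names (`dv_z`, `gender_z`, `iv1_z`). With a roughly balanced 0/1 variable that standard deviation is close to 0.5, which may be why a factor of about two showed up.

```r
set.seed(7)
n <- 300
gender <- rbinom(n, 1, 0.5)  # numeric 0/1 dummy
iv1 <- rnorm(n)
dv <- 0.6 * gender + 0.4 * iv1 + rnorm(n)

# Standardize up front so the coefficient names stay simple
dv_z <- as.numeric(scale(dv))
iv1_z <- as.numeric(scale(iv1))
gender_z <- as.numeric(scale(gender))

fit_raw <- lm(dv_z ~ gender + iv1_z)    # dummy left as 0/1
fit_std <- lm(dv_z ~ gender_z + iv1_z)  # dummy standardized as well

b_raw <- coef(fit_raw)[["gender"]]
b_std <- coef(fit_std)[["gender_z"]]

sd(gender)                            # close to 0.5 for a balanced dummy
all.equal(b_std, b_raw * sd(gender))  # standardized = raw * sd(gender)
```

Read this way, the coefficient on the 0/1 dummy is the expected difference between the two groups (in standard deviations of the outcome), while the standardized version is that difference rescaled by how evenly the groups are split.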

Jay

Re: Different result of multiple regression in R and SPSS

Daniel Malter
First, it would have helped if you had posted the actual results so we could see how far off they are (and, more specifically, by which factor).

Second, given your epiphany, you will find that this is exactly what David (and others before him) said or suggested. It is not about standardizing a nominal variable, which in theory you cannot do. It is about how the programs encode nominal variables by default.

Daniel


Re: Different result of multiple regression in R and SPSS

Heinz Tuechler
In reply to this post by Spencer Graves-2


Wise!

Heinz



Re: Different result of multiple regression in R and SPSS

Bert Gunter
In reply to this post by J.
On Tue, Jul 19, 2011 at 4:19 PM, J. <[hidden email]> wrote:
> @Dimitri: I tried to enter it as numeric and still got the same outcome. I
> still wonder if there is any way to get the same result from both programs.

There is. ?C ?contrasts

But of course you must do your homework to understand how to use
these. (See the quote in my signature).
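
A minimal sketch of the two help pages mentioned above, on invented data: `contrasts<-` changes the coding stored on the factor, while `C()` sets it inside the model formula. Both calls below request 1/-1 (sum) coding; whether that matches a given SPSS run is an assumption to verify, not something this example shows.

```r
set.seed(3)
d <- data.frame(
  gender = factor(sample(c("f", "m"), 100, replace = TRUE)),
  x = rnorm(100)
)
d$y <- 0.5 * (d$gender == "m") + 0.3 * d$x + rnorm(100)

# Option 1: set the coding on the factor itself with contrasts<-
d_sum <- d
contrasts(d_sum$gender) <- contr.sum(2)
fit_a <- lm(y ~ gender + x, data = d_sum)

# Option 2: set the coding inside the formula with C()
fit_b <- lm(y ~ C(gender, contr.sum) + x, data = d)

coef(fit_a)
coef(fit_b)  # same values, different coefficient labels
```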

-- Bert






--
"Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions."

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


Re: Different result of multiple regression in R and SPSS

David Winsemius
In reply to this post by J.

On Jul 19, 2011, at 7:19 PM, J. wrote:

> @Dimitri: I tried to enter it as numeric and still got the same outcome. I
> still wonder if there is any way to get the same result from both programs.
> @David, Bert: Yes, I found that the gender coefficient in R is exactly twice
> that of the one from SPSS. Need to study on parametrization.

Yes. I tested my own advice and did a google search with "different  
parametrization of dummy variables spss and r".
1)
http://support.spss.com/productsext/spss/documentation/statistics/articles/catreg3.htm
2)
http://www.thejuliagroup.com/blog/?p=1531
3)
I'm not sure it should be on a publicly accessible site, but Google
links to a pdf of the full text of "Data Analysis and Graphics Using R
– an Example-Based Approach: Third Edition" by Maindonald & Braun

http://lib.dnu.dp.ua:8001/l/%D0%9A%D0%BE%D0%BF%D1%8C%D1%8E%D1%82%D0%B5%D1%80%D1%8B%D0%98%D1%81%D0%B5%D1%82%D0%B8/%D0%9F%D0%BE%D0%BF%D1%83%D0%BB%D1%8F%D1%80%D0%BD%D1%8B%D0%B5%20%D0%BF%D1%80%D0%BE%D0%B3%D1%80%D0%B0%D0%BC%D0%BC%D1%8B/S-PLUS%20R/Data%20Analysis%20and%20Graphics%20Using%20R%203rd%20Edition.pdf

And chapter 7 would be where to look.

Bottom line: you should only be looking at coefficient values when you
know the coding of your factors. You cannot interpret the coefficients
of an SPSS run as differences between males and females, because they
are based on a -1 vs. 1 coding, what in R are called sum contrasts
(contr.sum). R uses a default of treatment contrasts (0 versus 1,
contr.treatment) but will offer sum contrasts if asked nicely. (And you
should never interpret "main effects" coefficients when you are using
interactions in models. Always use predictions in that instance.)
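
The point about interactions can be sketched as follows, again with invented data. Under treatment coding the coefficient labelled x is the slope for the reference level; under sum coding it is the average of the two slopes. Predictions, by contrast, do not depend on the coding.

```r
set.seed(11)
n <- 400
d <- data.frame(
  gender = factor(sample(c("f", "m"), n, replace = TRUE)),
  x = rnorm(n)
)
d$y <- 1 + 0.5 * (d$gender == "m") + 0.4 * d$x +
  0.6 * (d$gender == "m") * d$x + rnorm(n)

fit_trt <- lm(y ~ gender * x, data = d,
              contrasts = list(gender = "contr.treatment"))
fit_sum <- lm(y ~ gender * x, data = d,
              contrasts = list(gender = "contr.sum"))

# The "main effect" of x means different things under the two codings
coef(fit_trt)[["x"]]  # slope of x for level "f"
coef(fit_sum)[["x"]]  # average of the slopes for "f" and "m"

# Predictions on a small grid are the same either way
new_d <- expand.grid(gender = factor(c("f", "m")), x = c(-1, 0, 1))
p_trt <- predict(fit_trt, newdata = new_d)
p_sum <- predict(fit_sum, newdata = new_d)
all.equal(p_trt, p_sum)
```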


David Winsemius, MD
West Hartford, CT
