Converting factors back to numbers. Trouble with SPSS import data

Converting factors back to numbers. Trouble with SPSS import data

PaulJohnson32gmail
I'm using Fedora Core 4, R-2.2.

The basic question is: can one recover the numerical values used in
SPSS after importing data into R with read.spss from the foreign
library?  Here's why I ask.

My colleague sent an SPSS data set. I must replicate some results she
calculated in SPSS and one problem is that the numbers used in SPSS
for variable values are not easily recovered in R.

I'm comparing 2 imported datasets, "eldat" (read.spss without conversion
to factors) and "eldatfac" (read.spss with conversion to factors).

If I bring in the data without conversion to factors:

library(foreign)
eldat <- read.spss("18CitySCBSsorted.sav", use.value.labels=F,
                        to.data.frame=T)

I can see the variable HAPPY is coded 0, 1, 2, 3.  Those are the
numbers that SPSS uses as contrast values when it runs a regression with
HAPPY.

In contrast, I can allow R to translate the variables that have only a few
value labels into factors.

library(foreign)
eldatfac <- read.spss("18CitySCBSsorted.sav",
                      max.value.labels=7, to.data.frame=T)

Consider the first 50 observations on the variable HAPPY

> f<- eldatfac$HAPPY[1:50]
> f
 [1] Happy          Happy          Very happy     Happy          Very happy
 [6] Very happy     Happy          Very happy     Happy          Very happy
[11] Happy          Happy          Not very happy Very happy     Very happy
[16] Happy          Happy          Very happy     Happy          Happy
[21] Not very happy Happy          Happy          Very happy     Happy
[26] Happy          Happy          Happy          Happy          Happy
[31] Happy          Happy          Happy          Happy          Happy
[36] Happy          Very happy     Very happy     Happy          Very happy
[41] Very happy     Very happy     Happy          Very happy     Very happy
[46] Happy          Happy          Happy          Very happy     Very happy
6 Levels: Not happy at all Not very happy Happy Very happy ... Refused

> levels(f)
[1] "Not happy at all" "Not very happy"   "Happy"            "Very happy"
[5] "Don't know"       "Refused"


I need the numerical values back in order to have a regression like
SPSS.  Isn't this what ?factor says one ought to do? Why are these all
missing?

> as.numeric(levels(f))[f]
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[26] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA


> as.numeric(f)
 [1] 3 3 4 3 4 4 3 4 3 4 3 3 2 4 4 3 3 4 3 3 2 3 3 4 3 3 3 3 3 3 3 3 3 3 3 3 4 4
[39] 3 4 4 4 3 4 4 3 3 3 4 4

Comparing against the "as.numeric" output from the unconverted variable,
I can see the values differ by exactly one.

> g <- eldat$HAPPY[1:50]
> as.numeric(g)
 [1] 2 2 3 2 3 3 2 3 2 3 2 2 1 3 3 2 2 3 2 2 1 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 3 3
[39] 2 3 3 3 2 3 3 2 2 2 3 3

I'm more worried about the kinds of variables that are coded
irregularly 1, 3, 7, 11 in the SPSS scheme.
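
For irregularly coded variables, one way to see how the two imports line up is to pair them row by row, since both come from the same file in the same order. The sketch below uses made-up stand-ins: codes, labs, and the labels Low/Mid/High/Top are hypothetical, not the real eldat and eldatfac columns.

```r
# Hypothetical stand-ins for one variable read both ways:
# 'codes' plays the role of the numeric import, 'labs' the factor import.
codes <- c(1, 3, 7, 11, 3, 7, 1)
labs  <- factor(c("Low", "Mid", "High", "Top", "Mid", "High", "Low"),
                levels = c("Low", "Mid", "High", "Top"))

# Collect the distinct (label, code) pairs that actually occur.
pairs <- unique(data.frame(label = as.character(labs), code = codes,
                           stringsAsFactors = FALSE))

# Each label should map to exactly one code; stop if it does not.
stopifnot(!anyDuplicated(pairs$label))

# Named lookup vector: label -> original code.
lookup <- setNames(pairs$code, pairs$label)

# Recover the original codes from the factor alone.
recovered <- unname(lookup[as.character(labs)])
recovered
```

Only labels that actually occur in the data end up in the lookup, and missing values would need separate handling.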

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: Converting factors back to numbers. Trouble with SPSS import data

Robert Baer
Quoted directly from the FAQ (although, granted, I need to look this up over
and over myself.  Would that it had an easily remembered wrapper function):

7.10 How do I convert factors to numeric?

It may happen that when reading numeric data into R (usually, when reading
in a file), they come in as factors. If f is such a factor object, you can
use

     as.numeric(as.character(f))

to get the numbers back. More efficient, but harder to remember, is

     as.numeric(levels(f))[as.integer(f)]

In any case, do not call as.numeric() or their likes directly for the task
at hand (as as.numeric() or unclass() give the internal codes).



Re: Converting factors back to numbers. Trouble with SPSS import data

PaulJohnson32gmail
On 2/19/06, Robert W. Baer, Ph.D. <[hidden email]> wrote:

> Quoted directly from the FAQ (although granted I need to look this up over
> and over, myself.  Would that it had an easily remembered wrapper function):
> 7.10 How do I convert factors to numeric?
> It may happen that when reading numeric data into R (usually, when reading
> in a file), they come in as factors. If f is such a factor object, you can
> use
>
>      as.numeric(as.character(f))
> to get the numbers back. More efficient, but harder to remember, is
>
>      as.numeric(levels(f))[as.integer(f)]

I don't think I have that problem described in the FAQ.  I've had that
before, though.

Observe. Here's the original thing:

> eldatfac$HAPPY[1:10]
 [1] Happy      Happy      Very happy Happy      Very happy Very happy
 [7] Happy      Very happy Happy      Very happy
6 Levels: Not happy at all Not very happy Happy Very happy ... Refused

Here's the result of the first thing you cite from the FAQ

> as.numeric(as.character(eldatfac$HAPPY))[1:10]
 [1] NA NA NA NA NA NA NA NA NA NA
Warning message:
NAs introduced by coercion

Here's the second thing from the FAQ

> as.numeric(levels(eldatfac$HAPPY))[as.integer(eldatfac$HAPPY)]
 [1] NA NA NA NA NA NA NA NA NA NA
Warning message:
NAs introduced by coercion

What am I missing here?

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas


Re: Converting factors back to numbers. Trouble with SPSS import data

Duncan Murdoch

You're right, you have a different problem.  The FAQ is talking about
the situation where the data in a file is numeric but is read as a
factor, perhaps because of typos in one or two values.

In your case, levels(eldatfac$HAPPY) will tell you the correspondence
between R's internal numbers and labels.  It's not the same coding that SPSS
uses; as far as I know, that coding is lost at this point.  You'll need
to work out the coding you want to use and apply it yourself.  For example,
if the first four codes should be 0:3 and the others NA, you could use

encoding <- c(0:3, NA, NA)
encoding[as.integer(eldatfac$HAPPY)]
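
The same idea can be keyed by label instead of by position, which may be less fragile if the level order ever changes. This is only a sketch: f is a small hypothetical stand-in for eldatfac$HAPPY, and treating "Don't know" and "Refused" as NA is an assumption carried over from the example above.

```r
# Hypothetical stand-in for eldatfac$HAPPY, using the levels shown in the thread.
happy_levels <- c("Not happy at all", "Not very happy", "Happy",
                  "Very happy", "Don't know", "Refused")
f <- factor(c("Happy", "Very happy", "Not very happy", "Refused"),
            levels = happy_levels)

# Codes keyed by label; NA marks labels treated as missing (an assumption).
spss_codes <- c("Not happy at all" = 0, "Not very happy" = 1,
                "Happy" = 2, "Very happy" = 3,
                "Don't know" = NA, "Refused" = NA)

# Look up each observation's code by its label, not by its position.
happy_num <- unname(spss_codes[as.character(f)])
happy_num
```

For a variable coded 1, 3, 7, 11, the same pattern applies with those values in the lookup vector.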

Duncan Murdoch


Re: Converting factors back to numbers. Trouble with SPSS import data

Thomas Lumley
In reply to this post by PaulJohnson32gmail
On Sun, 19 Feb 2006, Paul Johnson wrote:

> I'm using Fedora Core 4, R-2.2.
>
> The basic question is: can one recover the numerical values used in
> SPSS after importing data into R with read.spss from the foreign
> library?  Here's why I ask.
>
> My colleague sent an SPSS data set. I must replicate some results she
> calculated in SPSS and one problem is that the numbers used in SPSS
> for variable values are not easily recovered in R.
>
> I'm comparing 2 imported datasets, "eldat" (read.spss with No
> convert-to-factors) and
> "eldatfac" (read.spss with convert-to-factors)
>
> If I bring in the data without conversion to factors:
>
> library(foreign)
> eldat <- read.spss("18CitySCBSsorted.sav", use.value.labels=F,
>                        to.data.frame=T)
>
> I can see the variable HAPPY is coded 0, 1, 2, 3.  Those are the
> numbers that SPSS
> uses as contrast values when it runs a regression with HAPPY.

So, bring in the data without conversion to factors.

Factors in R are not just labels for arbitrary numeric variables. They are a special type of variable for categorical data that happen to be implemented with the numbers 1,2,3,...

If that isn't what you want, don't use factors. read.spss will still return all the labels as attributes of the returned data frame.
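
A sketch of how the numeric codes and the labels can be kept side by side. It assumes the label table is available as a named vector (label names, code values) in an attribute called "value.labels"; check attributes() on your own import to see what is really there. The vector below is built by hand, and the codes 8 and 9 for "Don't know" and "Refused" are hypothetical.

```r
# Hand-built stand-in for a column imported without conversion to factors.
happy <- c(2, 2, 3, 1, 0, 8)
attr(happy, "value.labels") <- c("Refused" = 9, "Don't know" = 8,
                                 "Very happy" = 3, "Happy" = 2,
                                 "Not very happy" = 1, "Not happy at all" = 0)

# Pull out the label table and order it by code.
vl <- sort(attr(happy, "value.labels"))

# Build a labelled factor for display, leaving the numeric codes untouched.
happy_fac <- factor(happy, levels = vl, labels = names(vl))

# Keep both versions: numbers for the regression, labels for tables and plots.
dat <- data.frame(HAPPY = as.vector(happy), HAPPY_label = happy_fac)
dat
```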



> In contrast,  allow R to translate the variables with a few value
> labels into factors.
>
> library(foreign)
> eldatfac <- read.spss("18CitySCBSsorted.sav",
> max.value.labels=7,to.data.frame=T)
>
> Consider the first 50 observations on the variable HAPPY
>
>> f<- eldatfac$HAPPY[1:50]
>> f
> [1] Happy          Happy          Very happy     Happy          Very happy
> [6] Very happy     Happy          Very happy     Happy          Very happy
> [11] Happy          Happy          Not very happy Very happy     Very happy
> [16] Happy          Happy          Very happy     Happy          Happy
> [21] Not very happy Happy          Happy          Very happy     Happy
> [26] Happy          Happy          Happy          Happy          Happy
> [31] Happy          Happy          Happy          Happy          Happy
> [36] Happy          Very happy     Very happy     Happy          Very happy
> [41] Very happy     Very happy     Happy          Very happy     Very happy
> [46] Happy          Happy          Happy          Very happy     Very happy
> 6 Levels: Not happy at all Not very happy Happy Very happy ... Refused
>
>> levels(f)
> [1] "Not happy at all" "Not very happy"   "Happy"            "Very happy"
> [5] "Don't know"       "Refused"
>
>
> I need the numerical values back in order to have a regression like
> SPSS.  Isn't this what ?factor says one ought to do? Why are these all
> missing?
>
>> as.numeric(levels(f))[f]
> [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> [26] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

No, this is not what ?factor says you should do.  This is what you do if your levels are numbers (in character form) and you want those numbers. "Happy" is not a number.
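
To illustrate the difference with small made-up factors (f_num and f_lab are hypothetical, not taken from the data set):

```r
# Levels that are numbers in character form: the idiom recovers the numbers.
f_num <- factor(c("11", "3", "7", "3", "1"))
recovered <- as.numeric(levels(f_num))[as.integer(f_num)]
recovered

# The internal codes are something else entirely (positions in levels()).
codes <- as.integer(f_num)
codes

# Levels that are text labels: there is no number to recover, so every
# level becomes NA (the coercion warning is suppressed here on purpose).
f_lab <- factor(c("Happy", "Very happy", "Happy"))
lab_result <- suppressWarnings(as.numeric(levels(f_lab)))[as.integer(f_lab)]
lab_result
```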


>> as.numeric(f)
> [1] 3 3 4 3 4 4 3 4 3 4 3 3 2 4 4 3 3 4 3 3 2 3 3 4 3 3 3 3 3 3 3 3 3 3 3 3 4 4
> [39] 3 4 4 4 3 4 4 3 3 3 4 4
>
> Comparing against the "as.numeric" output from the unconverted factor,
> I can see the levels are just one digit different.

Yes, because SPSS used the codes 0,1,2,3 and R uses 1,2,3,4.  You could just subtract 1 if you want the numbers to be smaller by 1.


>> g <- eldat$HAPPY[1:50]
>> as.numeric(g)
> [1] 2 2 3 2 3 3 2 3 2 3 2 2 1 3 3 2 2 3 2 2 1 2 2 3 2 2 2 2 2 2 2 2 2 2 2 2 3 3
> [39] 2 3 3 3 2 3 3 2 2 2 3 3
>
> I'm more worried about the kinds of variables that are coded
> irregularly 1, 3, 7, 11 in the SPSS scheme.
>

If you want to keep the numeric values, don't change them to factors. That's why there is an option.
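
The choice matters for the regression itself. A minimal sketch with simulated data (happy, y, and the coefficients are invented for illustration, not results from the SPSS file):

```r
# Simulated predictor coded 0..3 and a response that depends on it linearly.
set.seed(1)
happy <- rep(0:3, each = 25)
y <- 1 + 0.5 * happy + rnorm(100)

# Numeric predictor: an intercept and a single slope.
fit_num <- lm(y ~ happy)

# Factor predictor: an intercept plus one dummy coefficient per non-reference level.
fit_fac <- lm(y ~ factor(happy))

n_num <- length(coef(fit_num))
n_fac <- length(coef(fit_fac))
c(numeric = n_num, factor = n_fac)
```

With the numeric version the spacing of the codes (0, 1, 2, 3 versus 1, 3, 7, 11) changes the fitted slope; with the factor version it does not.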


     -thomas

Thomas Lumley
Assoc. Professor, Biostatistics
University of Washington, Seattle
[hidden email]
