Problem while working with SPSS data

Arun.stat
Dear all R users,

I ran into a strange problem while working with SPSS data.

I wrote the following:

library(foreign)
data.original = as.data.frame(read.spss(file="c:/Program Files/SPSS/Employee data.sav"))

data = as.data.frame(cbind(data.original$MINORITY, data.original$EDUC,
                           data.original$PREVEXP, data.original$JOBCAT,
                           data.original$GENDER))
colnames(data) = c('MINORITY', 'EDUC', 'PREVEXP', 'JOBCAT', 'GENDER')

head( data.original)

  ID GENDER       BDATE EDUC   JOBCAT SALARY SALBEGIN JOBTIME PREVEXP MINORITY
1  1   <NA> 11654150400   15  Manager  57000    27000      98     144       No
2  2   <NA> 11852956800   16 Clerical  40200    18750      98      36       No
3  3   <NA> 10943337600   12 Clerical  21450    12000      98     381       No
4  4   <NA> 11502518400    8 Clerical  21900    13200      98     190       No
5  5   <NA> 11749363200   15 Clerical  45000    21000      98     138       No
6  6   <NA> 11860819200   15 Clerical  32100    13500      98      67       No

 head( data)
  V1 V2  V3 V4 V5
1  1  5 144  4 NA
2  1  6  36  2 NA
3  1  3 381  2 NA
4  1  2 190  2 NA
5  1  5 138  2 NA
6  1  5  67  2 NA


Here the values of variable "V2" come out as 5, 6, 3, ... when they should
be 15, 16, 12, ...

Can anyone tell me why I got that?

My second question: why are the values of "GENDER" in "data.original" all
NA? Is there any way to get the actual values, i.e. "m" and "f"?

Thanks
Arun

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: Problem while working with SPSS data

Chuck Cleland
Arun Kumar Saha wrote:

> Here the values of variable "V2" come out as 5, 6, 3, ... when they should
> be 15, 16, 12, ...
>
> Can anyone tell me why I got that?

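  Your use of cbind() converted the factors to numeric.  cbind() builds a
matrix, and a factor placed in a matrix is reduced to its internal integer
level codes, so you see the position of each level (5, 6, 3, ...) rather
than its label (15, 16, 12, ...).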
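
  The effect can be reproduced without the SPSS file.  What follows is only a sketch on toy data: the vectors and their level sets are made up for illustration (chosen so the codes match the ones in the question) and are not read from "Employee data.sav".

```r
# Toy factors standing in for EDUC and MINORITY (hypothetical values and levels)
educ     <- factor(c(15, 16, 12, 8, 15, 15), levels = c(0, 8, 12, 14, 15, 16))
minority <- factor(rep("No", 6), levels = c("No", "Yes"))

# cbind() returns a matrix, so each factor is replaced by its integer codes
codes <- cbind(educ, minority)
print(codes)

# Going through the labels recovers the original numbers from a factor
educ_num <- as.numeric(as.character(educ))
print(educ_num)

# Selecting columns from a data frame keeps the factors intact
kept <- data.frame(educ, minority)[, c("educ", "minority")]
str(kept)
```

  In the original code, indexing the data frame directly, e.g. data.original[, c("MINORITY", "EDUC", "PREVEXP", "JOBCAT", "GENDER")], should likewise avoid the conversion.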

> My second question: why are the values of "GENDER" in "data.original" all
> NA? Is there any way to get the actual values, i.e. "m" and "f"?

  Gender is of type "string" in the SPSS file, which seems to cause a
problem when read.spss() tries to apply the SPSS value labels.  You might
set the use.value.labels argument to FALSE.

df <- read.spss(file="c:/Program Files/SPSS/Employee data.sav",
                to.data.frame=TRUE, use.value.labels=FALSE)

summary(df)
       ID        GENDER      BDATE                EDUC
 Min.   :  1.0   f:216   Min.   :1.093e+10   Min.   : 8.00
 1st Qu.:119.3   m:258   1st Qu.:1.153e+10   1st Qu.:12.00
 Median :237.5           Median :1.197e+10   Median :12.00
 Mean   :237.5           Mean   :1.180e+10   Mean   :13.49
 3rd Qu.:355.8           3rd Qu.:1.208e+10   3rd Qu.:15.00
 Max.   :474.0           Max.   :1.225e+10   Max.   :21.00
                         NA's   :1.000e+00

     JOBCAT          SALARY          SALBEGIN        JOBTIME
 Min.   :1.000   Min.   : 15750   Min.   : 9000   Min.   :63.00
 1st Qu.:1.000   1st Qu.: 24000   1st Qu.:12488   1st Qu.:72.00
 Median :1.000   Median : 28875   Median :15000   Median :81.00
 Mean   :1.411   Mean   : 34420   Mean   :17016   Mean   :81.11
 3rd Qu.:1.000   3rd Qu.: 36938   3rd Qu.:17490   3rd Qu.:90.00
 Max.   :3.000   Max.   :135000   Max.   :79980   Max.   :98.00

    PREVEXP          MINORITY
 Min.   :  0.00   Min.   :0.0000
 1st Qu.: 19.25   1st Qu.:0.0000
 Median : 55.00   Median :0.0000
 Mean   : 95.86   Mean   :0.2194
 3rd Qu.:138.75   3rd Qu.:0.0000
 Max.   :476.00   Max.   :1.0000

  If you want to retain the labels for all of the variables and get
around the problem with gender, you might do this:

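# Read once with value labels (GENDER comes back as NA) ...
df1 <- read.spss(file="c:/Program Files/SPSS/Employee data.sav",
                 to.data.frame=TRUE, use.value.labels=TRUE)

# ... and once without, so GENDER keeps its raw "m"/"f" values
df2 <- read.spss(file="c:/Program Files/SPSS/Employee data.sav",
                 to.data.frame=TRUE, use.value.labels=FALSE)

# Drop the NA-only GENDER from df1 and merge the good one back in by ID
new.df <- merge(df1[,!names(df1) %in% "GENDER"], df2[,c("ID","GENDER")])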

head(new.df)
  ID       BDATE EDUC   JOBCAT SALARY SALBEGIN JOBTIME PREVEXP
1  1 11654150400   15  Manager  57000    27000      98     144
2  2 11852956800   16 Clerical  40200    18750      98      36
3  3 10943337600   12 Clerical  21450    12000      98     381
4  4 11502518400    8 Clerical  21900    13200      98     190
5  5 11749363200   15 Clerical  45000    21000      98     138
6  6 11860819200   15 Clerical  32100    13500      98      67
  MINORITY GENDER
1       No      m
2       No      m
3       No      f
4       No      f
5       No      m
6       No      m
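
  Because both data frames come from the same file and so have their rows in the same order, copying the column across directly should give the same result as the merge while leaving GENDER in its original position.  A sketch on toy data frames; the names labelled, raw and fixed and all values are made up for illustration:

```r
# Toy stand-ins for the two reads of the same file (hypothetical values)
labelled <- data.frame(
  ID     = 1:4,
  GENDER = factor(rep(NA, 4), levels = c("Female", "Male")),  # labels failed
  JOBCAT = factor(c("Manager", "Clerical", "Clerical", "Clerical"))
)
raw <- data.frame(
  ID     = 1:4,
  GENDER = factor(c("m", "m", "f", "f")),
  JOBCAT = c(3, 1, 1, 1)
)

# Guard against misaligned rows before copying a column by position
stopifnot(identical(labelled$ID, raw$ID))

fixed <- labelled
fixed$GENDER <- raw$GENDER  # overwrite only the problem column
print(fixed)
```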

> Thanks
> Arun

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894