memisc package: imported variables from SPSS have got the wrong measurement

Marion Wenty-2
Dear list members,

I have got another problem. I imported an SPSS file with the memisc package
using the following commands:

library(memisc)

mz <- spss.system.file("myspssfile.sav")

mz <- subset(mz, select = c(
  bsex, balt, xurb, dtaet, kartab, bgeb, boseit, bgeblan, xnuts2, kausb, xerwstat,
  asbper, asbhh, ajahr, aquartal, bstaat, xwieoft, gew1, apkz, bpkzm, bpkzv))

Afterwards I checked the measurements of the variables, and they are right
for most of them (e.g. the variable containing the sex of a person is
"nominal" and the variable containing the year is "interval"). For two of
the variables the measurement is not OK, though. They contain only numbers
in the SPSS file (e.g. the age of a person), and no NAs, but have got the
measurement "nominal"!

Does anyone know why this is happening?

Thank you very much for your help in advance!

Marion

--
Mag.a Marion Wenty
Research Associate
Institut für Kinderrechte und Elternbildung
Ballgasse 2, 6. Stock
1010 Wien

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: memisc package: imported variables from SPSS have got the wrong measurement

David Winsemius

On Jul 25, 2012, at 9:48 AM, Marion Wenty wrote:

> Dear list members,
>
> I have got another problem. I imported an SPSS file with the Memisc package
> using the following commands:
>
> mz <- spss.system.file("myspssfile.sav")
>
> mz <- subset(mz,select=c(
> bsex,balt,xurb,dtaet,kartab,bgeb,boseit,bgeblan,xnuts2,kausb,xerwstat,
> asbper,asbhh,ajahr,aquartal,bstaat,xwieoft,gew1,apkz,bpkzm,bpkzv))

The memisc package help file for spss.system.file() (actually labeled
"importers") says that there is an S4 method for "subset", but there does
not seem to be a separate page describing its behavior or return value.

>
> Afterwards I checked the measurements of the variables

What does that phrase mean? What code did you actually use?

> and they are all
> right for most of them (e.g. the variable containing the sex of a person is
> "nominal" and the variable containing the year is "interval"). For two of
> the variables the measurement is not o.k, though. They exclusively contain
> numbers in the SPSS file (e.g. the age of a person) - and no NAs - but have
> got the measurement "nominal"!

Hard to say. There is no R storage mode that is called "nominal".
Perhaps it is some sort of memisc-specific terminology? Or perhaps something
in your SPSS dataset, to which you have not provided access? Or the
default setting for the spss.system.file access method? I know that
the Hmisc package's describe() function will report a variable as
though it were categorical if there are only 8 or fewer unique values.

After looking at further help pages and trying the help pages code, I  
am guessing that some of my puzzlement might be answered by reading  
the vignettes, but you can do that yourself.

Here's a hackish guess ... try:

?measurement
measurement(mz$age_variable) <- "interval"
# where age_variable is the unstated item in that "select" list

--

David Winsemius, MD


Re: memisc package: imported variables from SPSS have got the wrong measurement

Paul Bivand
David's suggestion,

?measurement
measurement(mz$age_variable) <- "interval"
# where age_variable is the unstated item in that "select" list

is what I use in similar circumstances.

The cause seems to be the SPSS users' habit of setting value labels on
various categories of user-missing values. A survey will commonly have
no actual missings (in SPSS, system-missing) but -9 as 'refused', -8 as
'not contactable' and so on. The importer picks up the existence of
value labels and sets the measurement to "nominal", which gets
transformed into a factor in R usage. read.spss from the foreign package
would also be likely to read these in as factors.

For analysis purposes, these values would probably be set to NA, but it
may be important to record that you made that change.
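
A minimal sketch of the situation described above, assuming the memisc package is installed. The toy variable and its "filter" label are hypothetical stand-ins for the real SPSS data; as.item(), missing.values(), measurement() and data.set() are memisc functions. It builds a numeric variable that carries a value label for a user-missing code, prints the measurement level memisc assigns to it, then declares the code as missing and sets the measurement to "interval".

```r
library(memisc)  # assumes the memisc package is installed

# Hypothetical toy variable: ages, with -3 as a labelled "filter" code
age <- as.item(c(34, -3, 27, 45), labels = c("filter" = -3))

# Inspect the measurement level chosen for the labelled variable
print(measurement(age))

# Declare -3 as a user-missing value and treat the variable as numeric
missing.values(age) <- -3
measurement(age) <- "interval"

# Convert to an ordinary data frame for analysis
df <- as.data.frame(data.set(age = age))
print(df)
```

Declaring the code as a missing value before converting keeps the recoding explicit in the script, which is one way of recording that the change was made.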

On 26 July 2012 01:35, David Winsemius <[hidden email]> wrote:

> Here's a hackish guess ... try:
>
> ?measurement
> measurement(mz$age_variable) <- "interval"
> # where age_variable is the unstated item in that "select" list

Re: memisc package: imported variables from SPSS have got the wrong measurement

Marion Wenty-2
Dear David and Paul,

Thank you for your answers!

Yes, Paul, I checked the SPSS file: the numerical variables that had no
value labels assigned to them in SPSS got the right measurement in R after
importing, namely "interval". The two variables that had the measurement
"nominal" in R, although they contain only numbers, have value labels
assigned to them in SPSS, namely -3 for a filter. So now I know why this is
happening, thank you very much!

David, I used:


sapply(mz, measurement)


to check the measurements. "Nominal" and "interval" are in fact special
memisc terminology - the first for categorical variables, the second for
numeric variables.


Thanks for the suggestion of using


measurement(mz$age_variable) <- "interval"


to change the measurement. It is in accordance with R logic, so it is
probably always worth trying something like this, even without knowing
the package very well.
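
Putting the check and the fix together, here is a rough sketch, assuming memisc is installed. The data set and its variable names are hypothetical stand-ins for the imported file. It lists the measurement level of every variable with the sapply() call from above and then resets only the variable known to be numeric, using the within() method that memisc provides for data sets.

```r
library(memisc)  # assumes the memisc package is installed

# Hypothetical stand-in for the imported data set
ds <- data.set(
  sex  = as.item(c(1, 2, 2, 1), labels = c(male = 1, female = 2)),
  age  = as.item(c(34, -3, 51, 27), labels = c(filter = -3)),
  year = c(2011, 2011, 2012, 2012)
)

# List the measurement level of every variable
print(sapply(ds, measurement))

# Reset only the variable known to be numeric
ds <- within(ds, {
  measurement(age) <- "interval"
})

print(measurement(ds$age))
```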

Cheers,
Marion





