Wishlist: merge and subset to keep attributes (PR#8658)

Frank Harrell
When importing data from SPSS, it is a nice feature of the package foreign
that it allows one (option use.value.labels=F) to work with the original SPSS
codes while keeping the value labels as information in an attribute.
Unfortunately, these attributes disappear after merging or subsetting.
The code below illustrates the problem: Variable time originally has value
labels that are gone after merging or subsetting.

It would be very helpful if this could be changed.

With kind regards, Ulrike
-------------------------------

Ulrike - see the spss.get, label, contents, and describe functions in
the Hmisc package.

--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: Wishlist: merge and subset to keep attributes (PR#8658)

Ulrike Grömping-2
> When importing data from SPSS, it is a nice feature of the package foreign
> that it allows one (option use.value.labels=F) to work with the original SPSS
> codes while keeping the value labels as information in an attribute.
> Unfortunately, these attributes disappear after merging or subsetting.
> The code below illustrates the problem: Variable time originally has value
> labels that are gone after merging or subsetting.
>
> It would be very helpful if this could be changed.
>
> With kind regards, Ulrike
> -------------------------------
>
> Ulrike - see the spss.get, label, contents, and describe functions in
> the Hmisc package.
>
> --
> Frank E Harrell Jr   Professor and Chair           School of Medicine
>                       Department of Biostatistics   Vanderbilt University
------- End of Original Message -------

For the sake of completeness of the thread in R-devel:
After a longer offline exchange, Frank and I have agreed that Hmisc's spss.get
currently offers nothing beyond read.spss from package foreign when it comes
to using both the original codes and the value labels from SPSS files. Having
both is desirable when working with large datasets from well-documented
studies: these often require filtering rules based on the original codes,
while one still wants to preserve the annotation given by the value labels.

The solution from package foreign: The option "use.value.labels=F" prevents
SPSS factors (with codes and value labels) from being read into R as factors.
Instead, codes are read as numeric values, and the value labels are preserved
by assigning an attribute "value.labels" to each such variable. My issue is
that these attributes are lost when subsetting or merging such datasets. I
have no idea how difficult it is to get this changed; if it is doable without
too much hassle, it would be great.
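
A minimal sketch of the behaviour described above, using a simulated column
rather than a real SPSS file. The data frame, the variable names and the
labels are invented for illustration; only the attribute name "value.labels"
is taken from the description of read.spss in this message.

```r
# Simulated stand-in for a variable read with use.value.labels = FALSE:
# numeric codes plus a "value.labels" attribute (names and values invented).
df <- data.frame(id = 1:4, time = c(1, 2, 1, 2))
attr(df$time, "value.labels") <- c(before = 1, after = 2)

# The labels are present on the full column.
attr(df$time, "value.labels")

# Row subsetting on the original codes: the attribute does not survive.
sub <- df[which(df$time == 1), ]
is.null(attr(sub$time, "value.labels"))

# Merging with a second (invented) data frame loses the attribute as well.
other  <- data.frame(id = 1:4, group = c("a", "a", "b", "b"))
merged <- merge(df, other, by = "id")
is.null(attr(merged$time, "value.labels"))
```

Selecting whole columns without a row index (for example df["time"]) leaves
the column objects untouched, so the loss shows up only once rows are
selected.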

And by the way - not mentioned in my wish - read.spss also assigns the
attribute "variable.labels" to the dataset itself. This attribute is
currently also lost when merging or subsetting.
(Here, spss.get from Hmisc works differently by assigning each variable a
class and a label attribute which are preserved. I have the suspicion that
this makes spss.get substantially slower than read.spss; on the other hand,
it makes it easier to use these labels in annotation.)

With kind regards, Ulrike

Re: Wishlist: merge and subset to keep attributes (PR#8658)

Peter Dalgaard
"Ulrike Grömping" <[hidden email]> writes:

> > When importing data from SPSS, it is a nice feature of the package foreign
> > that it allows one (option use.value.labels=F) to work with the original
> > SPSS codes while keeping the value labels as information in an attribute.
> > Unfortunately, these attributes disappear after merging or subsetting.
> > The code below illustrates the problem: Variable time originally has value
> > labels that are gone after merging or subsetting.
> >
> > It would be very helpful if this could be changed.
> >
> > With kind regards, Ulrike
> > -------------------------------
> >
> > Ulrike - see the spss.get, label, contents, and describe functions in
> > the Hmisc package.
> >
> > --
> > Frank E Harrell Jr   Professor and Chair           School of Medicine
> >                       Department of Biostatistics   Vanderbilt University
> ------- End of Original Message -------
>
> For the sake of completeness of the thread in R-devel:
> After a longer offline exchange, Frank and I have agreed that Hmisc's spss.get
> currently offers nothing beyond read.spss from package foreign when it comes
> to using both the original codes and the value labels from SPSS files. Having
> both is desirable when working with large datasets from well-documented
> studies: these often require filtering rules based on the original codes,
> while one still wants to preserve the annotation given by the value labels.
>
> The solution from package foreign: The option "use.value.labels=F" prevents
> SPSS factors (with codes and value labels) from being read into R as factors.
> Instead, codes are read as numeric values, and the value labels are preserved
> by assigning an attribute "value.labels" to each such variable. My issue is
> that these attributes are lost when subsetting or merging such datasets. I
> have no idea how difficult it is to get this changed; if it is doable without
> too much hassle, it would be great.

I don't think this is possible. It is happening at the level of "[",
which strips most attributes. Try for instance

x <- 1:4
attr(x, "foo") <- "bar"
x      # prints the values together with attr(,"foo")
x[1]   # the "foo" attribute is gone after indexing

It's a bit unclear to me why this is so, but e.g. dimension attributes
do fairly obviously need to be removed.

It's the sort of thing where you're bound to discover just how much
code is relying on the current behaviour (quite possibly unwittingly)
if you try to change it.

In general it is not a good idea to change language semantics for
everyone in all contexts, just because someone is unhappy with the
behaviour in one particular context...

If you want different behaviour for a limited scope, you probably need
to do it Frank's way: by defining a class and an indexing method for
it. Or copy over the attributes as needed.
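
Both routes can be sketched roughly as follows. This is an illustration under
stated assumptions, not the Hmisc implementation: the class name "keeplabels"
and the helpers as_keeplabels() and restore_value_labels() are hypothetical,
and the data are simulated. Only the attribute name "value.labels" comes from
the thread.

```r
# Route 1: a class with its own indexing method (hypothetical names).
as_keeplabels <- function(x) {
  class(x) <- c("keeplabels", oldClass(x))
  x
}

# Subset as usual, then put the labels and the class back on the result.
`[.keeplabels` <- function(x, ...) {
  out <- NextMethod("[")
  attr(out, "value.labels") <- attr(x, "value.labels")
  oldClass(out) <- oldClass(x)
  out
}
# Register explicitly so that dispatch also works when this code is
# evaluated somewhere other than the global environment.
registerS3method("[", "keeplabels", `[.keeplabels`)

df <- data.frame(id = 1:4, time = c(1, 2, 1, 2))
attr(df$time, "value.labels") <- c(before = 1, after = 2)
df$time <- as_keeplabels(df$time)

sub <- df[which(df$time == 1), ]
attr(sub$time, "value.labels")   # labels are still attached

# Route 2: copy the attributes over after the fact (hypothetical helper).
restore_value_labels <- function(to, from) {
  for (nm in intersect(names(to), names(from))) {
    vl <- attr(from[[nm]], "value.labels")
    if (!is.null(vl)) attr(to[[nm]], "value.labels") <- vl
  }
  to
}

plain <- data.frame(id = 1:4, time = c(1, 2, 1, 2))
attr(plain$time, "value.labels") <- c(before = 1, after = 2)
fixed <- restore_value_labels(plain[plain$id > 2, ], plain)
attr(fixed$time, "value.labels")
```

The class route changes behaviour only for columns that opt in, which keeps
the semantics of "[" untouched for everyone else. A class used in earnest
would probably need further methods (printing, for instance); the copy
helper avoids that at the price of one extra call after each subset or merge.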
 
> And by the way - not mentioned in my wish - read.spss also assigns the
> attribute "variable.labels" to the dataset itself. This attribute is
> currently also lost when merging or subsetting.
> (Here, spss.get from Hmisc works differently by assigning each variable a
> class and a label attribute which are preserved. I have the suspicion that
> this makes spss.get substantially slower than read.spss; on the other hand,
> it makes it easier to use these labels in annotation.)


--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - ([hidden email])                  FAX: (+45) 35327907
