R and SAS proc format

R and SAS proc format

lamack lamack
Dear all, Is there an R equivalent to SAS's proc format?

Best regards

J. Lamack

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: R and SAS proc format

Frank Harrell
lamack lamack wrote:
> Dear all, Is there an R equivalent to SAS's proc format?
>
> Best regards
>
> J. Lamack

Fortunately not.  SAS is one of the few large systems that does not
implicitly support value labels and that separates label information
from the database [I can't count the number of times someone has sent me
a SAS dataset and forgotten to send the PROC FORMAT value labels].  See
the factor function for information about how R does this.
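
As a minimal sketch of what the factor approach looks like (the variable name, codes and labels below are invented for illustration), the value labels are stored inside the object rather than in a separate catalog:

```r
# Hypothetical coded variable: 1 = low, 2 = medium, 3 = high
dose_code <- c(1, 3, 2, 2, 1)

# factor() stores the value labels inside the object itself
dose <- factor(dose_code, levels = c(1, 2, 3),
               labels = c("low", "medium", "high"))

levels(dose)   # the labels travel with the variable
table(dose)    # tabulations use the labels directly
```

Saving or passing on `dose` carries the labels along, so there is nothing extra to remember to send.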

Frank Harrell

--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University

Re: R and SAS proc format

bogdan romocea
In reply to this post by lamack lamack
See ?cut for continuous variables, and ?factor, ?levels for the others.
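
For the continuous case, a small sketch with cut() might look like the following; the ages, break points and labels are assumptions made up for the example, roughly in the spirit of a SAS range format:

```r
# Hypothetical ages to be grouped into bands
age <- c(5, 17, 18, 42, 65, 80)

# right = FALSE makes the intervals closed on the left:
# [0,18), [18,65), [65,Inf)
age_group <- cut(age,
                 breaks = c(0, 18, 65, Inf),
                 labels = c("child", "adult", "senior"),
                 right = FALSE)

data.frame(age, age_group)
```

The result is an ordinary factor, so levels() and table() work on it as usual.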


> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of lamack lamack
> Sent: Tuesday, March 06, 2007 12:49 PM
> To: [hidden email]
> Subject: [R] R and SAS proc format
>
> Dear all, Is there an R equivalent to SAS's proc format?
>
> Best regards
>
> J. Lamack

Re: R and SAS proc format

Ulrike Groemping
In reply to this post by lamack lamack
The down side to R's factor solution:
The numerical values of factors are always 1 to number of levels. Thus, it can be tough and requires great care to work with studies that have both numerical values different from this and value labels. This situation is currently not well-supported by R.
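
A short sketch of the issue, using made-up study codes (0, 5, 9) and labels: once the data are a factor, the internal integer codes no longer match the original values, and getting them back needs an explicit lookup.

```r
# Hypothetical study codes that do not run from 1 to the number of levels
status_code <- c(0, 9, 5, 0, 9)

status <- factor(status_code, levels = c(0, 5, 9),
                 labels = c("none", "partial", "refused"))

as.integer(status)   # internal codes 1 3 2 1 3, not the original 0 9 5 0 9

# One possible workaround: keep a separate lookup from label to original code
code_of <- c(none = 0, partial = 5, refused = 9)
unname(code_of[as.character(status)])   # recovers 0 9 5 0 9
```

The lookup vector `code_of` is an illustrative name; it has to be maintained by hand alongside the factor.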

Regards, Ulrike

P.S.: I fully agree with Frank regarding the annoyance one sometimes encounters with formats in SAS!

lamack lamack wrote
Dear all, Is there an R equivalent to SAS's proc format?

Best regards

J. Lamack


Re: R and SAS proc format

John Kane-2
In reply to this post by lamack lamack

--- lamack lamack <[hidden email]> wrote:

> Dear all, Is there an R equivalent to SAS's proc
> format?

What does the SAS PROC FORMAT do?


Re: R and SAS proc format

Frank Harrell
In reply to this post by Ulrike Groemping
Ulrike Grömping wrote:
> The down side to R's factor solution:
> The numerical values of factors are always 1 to number of levels. Thus, it
> can be tough and requires great care to work with studies that have both
> numerical values different from this and value labels. This situation is
> currently not well-supported by R.

You can add an attribute to a variable.  In the sas.get function in the
Hmisc package for example, when importing SAS variables that have PROC
FORMAT value labels, an attribute 'sas.codes' keeps the original codes;
these can be retrieved using sas.codes(variable name).  This could be
done outside the SAS import context also.
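
Outside the SAS import context, a rough base-R sketch of the same idea could look like this. The attribute name 'orig.codes' and the helper orig_codes() are hypothetical and are not the ones Hmisc uses:

```r
# Hypothetical codes and labels
code <- c(10, 20, 20, 30)
x <- factor(code, levels = c(10, 20, 30), labels = c("A", "B", "C"))

# Attach the original codes, one per level, as an extra attribute
attr(x, "orig.codes") <- c(10, 20, 30)

# Hypothetical helper mapping each observation back to its original code
orig_codes <- function(f) attr(f, "orig.codes")[as.integer(f)]

orig_codes(x)   # 10 20 20 30
```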

Frank


Re: R and SAS proc format

Jason Barnhart
In reply to this post by John Kane-2

----- Original Message -----
From: "John Kane" <[hidden email]>
To: "lamack lamack" <[hidden email]>; <[hidden email]>
Sent: Tuesday, March 06, 2007 2:13 PM
Subject: Re: [R] R and SAS proc format


>
> --- lamack lamack <[hidden email]> wrote:
>
>> Dear all, Is there an R equivalent to SAS's proc
>> format?
>
> What does the SAS PROC FORMAT do?

It formats or reformats data in the SAS system.

It looks like this:

    proc format; value kanefmt 1='A' 2='B' 3='C' 4='X' 5='Throw me out'; run;
    data temp; do i=1 to 10; kanevar=put(i,kanefmt.); output; end; run;
    proc print; run;

And produces this:

Obs     i      kanevar
  1     1    A
  2     2    B
  3     3    C
  4     4    X
  5     5    Throw me out
  6     6               6
  7     7               7
  8     8               8
  9     9               9
 10    10              10


But it is more robust than what is shown here.
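
For comparison, a rough R analogue of the example above might be sketched as follows. The helper put_fmt() is a hypothetical name, and the fall-through for unlabelled values is written by hand to mimic what put() does here:

```r
# Named character vector playing the role of the SAS format 'kanefmt'
kanefmt <- c("1" = "A", "2" = "B", "3" = "C", "4" = "X", "5" = "Throw me out")

# Hypothetical helper mimicking put(i, kanefmt.): values without a label
# fall through as their own text
put_fmt <- function(x, fmt) {
  out <- unname(fmt[as.character(x)])
  ifelse(is.na(out), as.character(x), out)
}

temp <- data.frame(i = 1:10)
temp$kanevar <- put_fmt(temp$i, kanefmt)
print(temp)
```

Because `kanefmt` is an ordinary object, the same mapping can be applied to any number of variables.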




Re: R and SAS proc format

Ulrike Grömping-2
In reply to this post by Frank Harrell
>> The down side to R's factor solution:
>> The numerical values of factors are always 1 to number of levels. Thus, it
>> can be tough and requires great care to work with studies that have both
>> numerical values different from this and value labels. This situation is
>> currently not well-supported by R.
>>
>> Regards, Ulrike
>>
>> P.S.: I fully agree with Frank regarding the annoyance one sometimes
>> encounters with formats in SAS!

> You can add an attribute to a variable.  In the sas.get function in the
> Hmisc package for example, when importing SAS variables that have PROC
> FORMAT value labels, an attribute 'sas.codes' keeps the original codes;
> these can be retrieved using sas.codes(variable name).  This could be
> done outside the SAS import context also.
>
> Frank
Frank,

are these attributes preserved when merging or subsetting a data frame?
Are they used in R packages other than Hmisc and Design (e.g. in a simple table request)?

If this is the case, my wishlist items 8658 and 8659 (http://bugs.r-project.org/cgi-bin/R/wishlist?id=8658;user=guest, http://bugs.r-project.org/cgi-bin/R/wishlist?id=8659;user=guest) can be closed.
Otherwise, I maintain the opinion that there are workarounds but that R is not satisfactorily able to handle this type of data.

Regards, Ulrike


Re: R and SAS proc format

Frank Harrell
Ulrike Grömping wrote:

> Frank,
>
> are these attributes preserved when merging or subsetting a data frame?
> Are they used in R packages other than Hmisc and Design (e.g. in a
> simple table request)?

No; one would need to add functions like those that are used by the Hmisc
label or impute functions.  And they are not used outside Hmisc/Design.
In fact I have little need for them, as I always find the final labels
to be the key to analysis.
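
A small base-R sketch of the loss on subsetting, with a made-up attribute name ('orig.codes') and made-up data:

```r
x <- factor(c("A", "B", "C", "A"))
attr(x, "orig.codes") <- c(10, 20, 30)   # hypothetical attribute name

names(attributes(x))        # includes "orig.codes"
names(attributes(x[1:2]))   # "orig.codes" is gone after subsetting

# The same happens when rows of a data frame are selected
df <- data.frame(id = 1:4)
df$x <- x
is.null(attr(df[df$id <= 2, "x"], "orig.codes"))   # TRUE: dropped here too
```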

>
> If this is the case, my wishlist items 8658 and 8659
> (http://bugs.r-project.org/cgi-bin/R/wishlist?id=8658;user=guest,
> http://bugs.r-project.org/cgi-bin/R/wishlist?id=8659;user=guest) can be
> closed.
> Otherwise, I maintain the opinion that there are workarounds but that R
> is not satisfactorily able to handle this type of data.

R gives the framework for doing this elegantly but the user has an
overhead of implementing new methods for such attributes.
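
To give an idea of that overhead, here is a minimal sketch of one such method. The class name 'coded', the constructor and the attribute name are all assumptions for illustration; a usable implementation would also need methods for merging, combining, printing and so on.

```r
# 'coded' is a hypothetical class name used only for this sketch
coded <- function(codes, levels, labels) {
  f <- factor(codes, levels = levels, labels = labels)
  attr(f, "orig.codes") <- levels
  class(f) <- c("coded", class(f))
  f
}

# Subsetting method: let the factor method do the work, then restore extras
`[.coded` <- function(x, ...) {
  y <- NextMethod("[")
  attr(y, "orig.codes") <- attr(x, "orig.codes")
  class(y) <- class(x)
  y
}

x <- coded(c(10, 30, 20, 10), levels = c(10, 20, 30),
           labels = c("A", "B", "C"))
attr(x[2:3], "orig.codes")   # 10 20 30, preserved after subsetting
```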

Cheers
Frank

Re: R and SAS proc format

Peter Dalgaard
In reply to this post by Jason Barnhart
Jason Barnhart wrote:

>> What does the SAS PROC FORMAT do?
>
> It formats or reformats data in the SAS system.

Slightly more precisely: It creates user-defined formats, which are
subsequently associated with variables and used for reading, printing,
tabulating, and analyzing data. It is akin to R's factor()
constructions, but not quite. For one thing, SAS's formats are separate
entities - the same format can be used for many variables, whereas R's
factors have the formatting coded as a part of the object. For related
reasons, a variable in SAS can have more distinct values than there are
value labels for, etc.
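
Both differences can be sketched in R with invented data. Here one set of levels and labels is kept as a separate object and applied to two variables, and a value without a label turns into NA instead of passing through:

```r
# A "format" kept as a separate object and reused (hypothetical names)
yesno_levels <- c(1, 2)
yesno_labels <- c("yes", "no")

survey <- data.frame(q1 = c(1, 2, 2, 9), q2 = c(2, 2, 1, 1))

# Apply the same levels/labels to several variables at once
survey[c("q1", "q2")] <- lapply(survey[c("q1", "q2")], factor,
                                levels = yesno_levels, labels = yesno_labels)

survey$q1   # the unlabelled value 9 becomes NA, unlike in SAS
```

After the conversion each column carries its own copy of the labels, so later changes to `yesno_labels` do not propagate.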


--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - ([hidden email])                  FAX: (+45) 35327907

Re: R and SAS proc format

Carlos J. Gil Bellosta
On 3/7/07, Peter Dalgaard <[hidden email]> wrote:

> Slightly more precisely: It creates user-defined formats, which are
> subsequently associated with variables and used for reading, printing,
> tabulating, and analyzing data. It is akin to R's factor()
> constructions, but not quite. For one thing, SAS's formats are separate
> entities - the same format can be used for many variables, whereas R's
> factors have the formatting coded as a part of the object. For related
> reasons, a variable in SAS can have more distinct values than there are
> value labels for, etc.

Also, SAS formats are used as a (somewhat cumbersome) replacement for
"dictionary" data structures. Starting from SAS 9.1 (I believe), "hash
tables" can be used within data steps for the same purpose (albeit
still cumbersome).

In this regard, not only factors but also lists could be a replacement
for them. They can be used as a way to get key-value mappings.
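
As a minimal sketch of a list used this way (the keys and values are invented for the example):

```r
# Hypothetical key-value mapping held in a named list
capital <- list(Denmark = "Copenhagen", Spain = "Madrid", Brazil = "Brasilia")

capital[["Spain"]]                 # look up one key
capital[["Chile"]] <- "Santiago"   # add a key
names(capital)                     # list the keys
"Peru" %in% names(capital)         # test for membership

# Look up several keys at once
unlist(capital[c("Denmark", "Brazil")])
```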

These key-value mappings (I mean, this kind of data structure) are
very handy tools. I have used both factors and lists as a kind of
"ad hoc" replacement for these data structures. Hasn't anybody
considered the possibility of having these data structures implemented
in R with a more Python-like or Java-like look and feel?
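
For what it is worth, base R environments already behave much like a mutable hash table, although the syntax is not especially Python-like. A small sketch, with illustrative keys and values:

```r
# An environment used as a mutable hash table
h <- new.env(hash = TRUE)

assign("apple", 3, envir = h)      # insert
h[["pear"]] <- 5                   # alternative insert syntax
h[["apple"]] <- h[["apple"]] + 1   # update in place

get("apple", envir = h)                       # current value for a key
exists("kiwi", envir = h, inherits = FALSE)   # membership test
sort(ls(h))                                   # all keys
rm("pear", envir = h)                         # delete a key
```

Unlike lists, environments are modified in place, so a function can update the table without returning a copy.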

Regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com
