Reading sas7bdat files directly

Reading sas7bdat files directly

Alex Bryant
Hi,   I have a need to process (in real-time) a large number of .sas7bdat files from within R.  The problem is I don't want to convert these files to .xpt (transport) every time.  So just checking if anyone has a (viable) way to read .sas7bdat files directly into R?

Thank You.

//----------------------------------
// Alex Bryant
// Software Developer
// Integrated Clinical systems
// 908-996-7208


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Reading sas7bdat files directly

Jorge I Velez
Hi Alex,

Perhaps the read.ssd function in the foreign package might do what you want.
See [1] for details.

HTH,
Jorge

[1]  http://cran.r-project.org/web/packages/foreign/index.html



Re: Reading sas7bdat files directly

Duncan Murdoch
In reply to this post by Alex Bryant
Alex Bryant wrote:
> Hi,   I have a need to process (in real-time) a large number of .sas7bdat files from within R.  The problem is I don't want to convert these files to .xpt (transport) every time.  So just checking if anyone has a (viable) way to read .sas7bdat files directly into R?

SAS now advertises some sort of R support (see
http://support.sas.com/rnd/app/studio/Rinterface2.html), so maybe you
could get SAS to convert them to a native R format. I think you won't
find a way for R to read a SAS proprietary format.

Duncan Murdoch


Re: Reading sas7bdat files directly

David Winsemius
In reply to this post by Alex Bryant

On Feb 4, 2010, at 5:31 PM, Alex Bryant wrote:

> Hi,   I have a need to process (in real-time) a large number  
> of .sas7bdat files from within R.  The problem is I don't want to  
> convert these files to .xpt (transport) every time.  So just  
> checking if anyone has a (viable) way to read .sas7bdat files  
> directly into R?

I believe (on the basis of what is written on the NCHS/NHANES website)
that if you are a Windows user, which I am only under duress, you can
get at such data with a free product that SAS makes available:

http://www.sas.com/apps/demosdownloads/sassysview_PROD_8.2_sysdep.jsp?packageID=000176

Whether that product can be called as a system()  executable with  
arguments, I have no idea.

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT


Re: Reading sas7bdat files directly

Frank Harrell
David Winsemius wrote:

>
> On Feb 4, 2010, at 5:31 PM, Alex Bryant wrote:
>
>> Hi,   I have a need to process (in real-time) a large number of
>> .sas7bdat files from within R.  The problem is I don't want to convert
>> these files to .xpt (transport) every time.  So just checking if
>> anyone has a (viable) way to read .sas7bdat files directly into R?
>
> I believe (on the basis of what is written on the NCHS/NHANES website)
> that if you are a Windows user, which I am only under duress, you can
> get at such data with a free product that SAS makes available:
>
> http://www.sas.com/apps/demosdownloads/sassysview_PROD_8.2_sysdep.jsp?packageID=000176 
>
>
> Whether that product can be called as a system()  executable with
> arguments, I have no idea.
>

The SAS viewer has a pretty pathetic data export feature that won't even
  produce valid csv files if you have any unmatched quotes in character
variables.  I don't think it's very good about exporting metadata either.

Frank

--
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                      Department of Biostatistics   Vanderbilt University


Re: Reading sas7bdat files directly

Annoyia Mouse
If you don't have SAS and still need to read or write sas7bdat files: there is the "World Programming System" (WPS) (commercial software).
http://www.teamwpc.co.uk/home/

Re: Reading sas7bdat files directly

Chris Long
In reply to this post by Alex Bryant
Hi Alex,

I'm not an R user but I found your question during a general Google search re: SAS7BDAT files.  You might like to try my free 'dsread' utility, which will convert most Windows-format SAS7BDAT files to CSV.  It's command-line based, so it can easily be called from other code with relevant parameters.

I'd be keen to hear any feedback on this utility from you and other R users (I'm a SAS guy myself).

Cheers,

Chris.

Re: Reading sas7bdat files directly

Chris Long
I suppose a link would have added usefulness:

http://www.oview.co.uk/dsread

Chris.
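
For R users, calling dsread from R might look like the sketch below. It assumes the executable is reachable as dsread.exe and uses only the -v (CSV) flag mentioned in this thread; the function names and the example path are made up for illustration.

```r
# Build the command line for dsread. Only the -v (CSV) flag seen in this
# thread is used; the executable name and file path are placeholders.
dsread_command <- function(sas_file, exe = "dsread.exe", flags = "-v") {
  paste(shQuote(exe), paste(flags, collapse = " "), shQuote(sas_file))
}

# Read one dataset through a pipe, so no intermediate CSV file is written.
read_sas7bdat_via_dsread <- function(sas_file, ...) {
  read.csv(pipe(dsread_command(sas_file, ...)), stringsAsFactors = FALSE)
}

# Inspect the command that would be run (no dsread needed for this line).
cmd <- dsread_command("C:/data/demo.sas7bdat")
print(cmd)

# With dsread installed and a real file in place:
# d <- read_sas7bdat_via_dsread("C:/data/demo.sas7bdat")
```

Because the executable name is quoted as a single token, running it through a launcher such as wine would need a small change to how the command is assembled.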

Re: Reading sas7bdat files directly

Frank Harrell
Chris Long wrote:
> I suppose a link would have added usefulness:
>
> http://www.oview.co.uk/dsread
>
> Chris.

As dsread seems to work perfectly under wine on Ubuntu linux (and quite
quickly), it could be quite valuable to many of us.  Thanks for posting
this and for developing dsread!  If time allows I'll write a function to
add to the Hmisc package that runs dsread to output the SAS dataset
metadata, then runs dsread again to read the data, adding back metadata
such as variable labels.
Frank
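
The two-pass idea described here (one run for metadata, one for data) could end in a step like the following. This is only a sketch: the metadata layout (columns name and label) is an assumption, not dsread's actual contents format, and apply_labels is a hypothetical helper, not an Hmisc function. It stores each label in the "label" attribute, which is where Hmisc keeps variable labels.

```r
# Attach variable labels from a metadata table to the matching columns.
# Assumed metadata layout: one row per variable, columns 'name' and 'label'.
apply_labels <- function(data, meta) {
  for (v in intersect(names(data), meta$name)) {
    attr(data[[v]], "label") <- meta$label[match(v, meta$name)]
  }
  data
}

# Stand-ins for the two dsread runs (data pass and metadata pass).
data <- data.frame(AGE = c(34, 51), SEX = c("F", "M"),
                   stringsAsFactors = FALSE)
meta <- data.frame(name  = c("AGE", "SEX"),
                   label = c("Age in years", "Sex of subject"),
                   stringsAsFactors = FALSE)

labelled <- apply_labels(data, meta)
attr(labelled$AGE, "label")
```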

--
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                      Department of Biostatistics   Vanderbilt University


Re: Reading sas7bdat files directly

Chris Long
No problem, Frank, I'm glad that you think it will be useful.

I will be making changes to dsread in the coming weeks so you may want to hold off with your helper function in case my changes break it (the formatting of the variable metadata listing may well change).  Re: the metadata, feel free to suggest an alternate output format that would make this a cleaner process.  For example, I have in mind to allow the -c (contents) and -v (CSV) flags together, in which case the dataset metadata will be output in CSV format.

Any other suggestions or comments are welcomed.

Chris.

Re: Reading sas7bdat files directly

Frank Harrell
Chris Long wrote:

> No problem, Frank, I'm glad that you think it will be useful.
>
> I will be making changes to dsread in the coming weeks so you may want to
> hold off with your helper function in case my changes break it (the
> formatting of the variable metadata listing may well change).  Re: the
> metadata, feel free to suggest an alternate output format that would make
> this a cleaner process.  For example, I have in mind to allow the -c
> (contents) and -v (CSV) flags together, in which case the dataset metadata
> will be output in CSV format.
>
> Any other suggestions or comments are welcomed.
>
> Chris.

Thanks very much for your note Chris.  Having an option to output the
metadata in a standard csv format as well as the nice table format you
provide already (either one or the other) will be good.  If you allow
flags (one could be -o) to specify the names of 2 different output files
when STDOUT is not being used, that will expedite things.

As sas7bdat files are unbelievably inefficient storage-wise (I'm looking
at an example where the file is 525K and the bzip2'd version is 29K) it
will be great if you can handle compressed sas7bdat files too.

Thanks for the excellent work,
Frank


--
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                      Department of Biostatistics   Vanderbilt University


Re: Reading sas7bdat files directly

Chris Long
dsread (http://www.oview.co.uk/dsread) was updated yesterday, to include various new features as suggested here and elsewhere.  Of particular interest might be:

- you can now use the /c and /v options together to get dataset contents in CSV format for easier importing;

- there is now a /l option for lossless representation of numerics in the output.  Numerics will appear as (eg) '0x000000000000f03f' giving the exact hex value of each of the eight bytes making up the IEEE float value.

Please continue to report bugs or feature requests here or on the above web page.

Thanks,

Chris.
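
A hex string in this form can be turned back into a double in R without any rounding. A minimal sketch, assuming every value is written as 0x followed by 16 hex digits with the least significant byte first, as in the example above (hex_le_to_double is an illustrative name):

```r
# Convert strings such as "0x000000000000f03f" (eight bytes, least
# significant byte first) into R doubles.
hex_le_to_double <- function(hex) {
  hex <- sub("^0[xX]", "", hex)           # drop the 0x prefix
  stopifnot(all(nchar(hex) == 16))
  vapply(hex, function(h) {
    # Split into eight 2-character byte strings and convert to raw bytes.
    pairs <- substring(h, seq(1, 15, by = 2), seq(2, 16, by = 2))
    bytes <- as.raw(strtoi(pairs, base = 16L))
    readBin(bytes, what = "double", n = 1, size = 8, endian = "little")
  }, numeric(1), USE.NAMES = FALSE)
}

vals <- hex_le_to_double(c("0x000000000000f03f", "0x0000000000000040"))
print(vals)
```

Letting readBin handle the byte order avoids reversing the string by hand.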

Re: Reading sas7bdat files directly

Roger DeAngelis(xlr82sas)
Hi All,

The hack below might help R users get going with Chris's DSREAD. I have not had a chance to look at Monday's version of DSREAD yet; can't wait.

Note that Duncan Murdoch was most gracious to supply me with an R function to translate floats in 16-char hex to R floats.

Chris, your utility solves the 200-byte, 8-char name and potential precision
errors that occur with other methods of transferring SAS datasets to perl and R.
Thanks.

Importing SAS datasets (sas7bdat) into R
(32-bit Windows 2000, 32-bit SAS 9.2 and
32-bit R version 2.9.0 (2009-04-17))

Here is what I want to accomplish. The doubles below were transferred
from SAS to R.
They are exactly the same in R and SAS memory, bit for bit.


  R internal         SAS internal
  (16-char hex)      (16-char hex)


3FFAAAAAAAAAAAAB  3FFAAAAAAAAAAAAB
4002AAAAAAAAAAAB  4002AAAAAAAAAAAB
400D555555555555  400D555555555555
3FF6666666666666  3FF6666666666666
3FFCCCCCCCCCCCCD  3FFCCCCCCCCCCCCD
400199999999999A  400199999999999A
4004CCCCCCCCCCCD  4004CCCCCCCCCCCD
3FF4924924924925  3FF4924924924925
3FF9249249249249  3FF9249249249249
3FFDB6DB6DB6DB6E  3FFDB6DB6DB6DB6E
4001249249249249  4001249249249249
3FF2E8BA2E8BA2E9  3FF2E8BA2E8BA2E9
3FF5D1745D1745D1  3FF5D1745D1745D1
3FF8BA2E8BA2E8BA  3FF8BA2E8BA2E8BA
3FFBA2E8BA2E8BA3  3FFBA2E8BA2E8BA3
3FF2762762762762  3FF2762762762762
3FF4EC4EC4EC4EC5  3FF4EC4EC4EC4EC5
3FF7627627627627  3FF7627627627627
3FF9D89D89D89D8A  3FF9D89D89D89D8A
1.7976931348623E  1.7976931348623E
0010000000000000  0010000000000000


I don't believe this high-accuracy transfer is possible with any
other method except ODBC, but SAS ODBC is unsatisfactory for me. If you
use CSV with the maximum assured decimal precision (15 significant
digits?), the CSV decimal numbers will only approximate the double floats.

I consider the CSV to be corrupt if the relative or absolute difference
between the decimal CSV numbers and the in-memory floats is greater than
10^-12.  There are two sources of error: first the SAS floats are rounded
and converted to decimal, then the rounded decimal approximations are
converted into R floats.
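
The bit-for-bit point is easy to reproduce in R alone. The sketch below writes 5/3 with 15 significant digits, as a CSV field might hold it, reads it back, and compares bit patterns; double_to_hex is a helper name chosen here, not part of any package.

```r
# Big-endian hex dump of one double, in the same style as the listing above.
double_to_hex <- function(x) {
  toupper(paste(as.character(writeBin(x, raw(), endian = "big")),
                collapse = ""))
}

x        <- 5 / 3
csv_text <- sprintf("%.15g", x)   # text a 15-digit CSV field would hold
from_csv <- as.numeric(csv_text)  # value R gets back from that text

print(csv_text)
print(double_to_hex(x))           # 3FFAAAAAAAAAAAAB
print(double_to_hex(from_csv))    # a different bit pattern
print(identical(x, from_csv))     # FALSE
```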


Status of   R internal         CSV
Csv         (16-char hex)      decimal


Csv corrupt 3FFAAAAAAAAAAAAB   1.66666666666667    >10^-12 different
Csv corrupt 4002AAAAAAAAAAAB   2.33333333333333
Csv corrupt 400D555555555555   3.66666666666667
Csv OK      3FF6666666666666   1.4
Csv OK      3FFCCCCCCCCCCCCD   1.8
Csv OK      400199999999999A   2.2
Csv OK      4004CCCCCCCCCCCD   2.6
Csv corrupt 3FF4924924924925   1.28571428571429
Csv corrupt 3FF9249249249249   1.57142857142857
Csv corrupt 3FFDB6DB6DB6DB6E   1.85714285714286
Csv corrupt 4001249249249249   2.14285714285714
Csv corrupt 3FF2E8BA2E8BA2E9   1.18181818181818
Csv corrupt 3FF5D1745D1745D1   1.36363636363636
Csv corrupt 3FF8BA2E8BA2E8BA   1.54545454545455
Csv corrupt 3FFBA2E8BA2E8BA3   1.72727272727273
Csv corrupt 3FF2762762762762   1.15384615384615
Csv corrupt 3FF4EC4EC4EC4EC5   1.30769230769231
Csv corrupt 3FF7627627627627   1.46153846153846
Csv corrupt 3FF9D89D89D89D8A   1.61538461538462
Csv corrupt 1.7976931348623E   1.7976931348623E+308
Csv corrupt 0010000000000000   2.2250738585072E-308


Background


  1. Provide absolutely lossless transfer of character (max 32767 bytes
     per character variable) and numeric data from SAS to R.
     Since SAS has only two datatypes, this code should be exhaustive.


  2. This code is useful because:
     a. The SAS ODBC driver requires the user not only to have
        SAS but also to bring up a SAS session, and
        the session has to be closed manually. (SAS issue, not a
        foreign issue)
     b. The foreign package also requires interaction with SAS. (SAS
        issue)
     c. SASxport only supports 8-character SAS names and a max of
        200-byte character values. (This is a SAS issue, not a SASxport
        issue)
     d. SASxport creates floating point doubles that have an 8-bit
        exponent and 56-bit mantissa, while IEEE is 11-bit exponent and
        53-bit mantissa (sometimes defined slightly differently depending
        on where you count the sign bit). This results in the loss of
        some very small and very large numbers. (SAS issue, not a
        SASxport issue)


  3. How this code overcomes the issues above, for import only.


     You need the dsread executable from the previous message. Also, the
input SAS dataset must hold 16-character hex representations of the floats.
I am working with the developer to see what we can do about this.
He will make it an invocation option to do the hex conversion
for numerics.
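
To make point 2d concrete: the transport format stores numbers as IBM hexadecimal floats rather than IEEE doubles. The sketch below decodes one such value in R, assuming the usual IBM layout (1 sign bit, 7-bit base-16 exponent biased by 64, 56-bit fraction). ibm_to_double is an illustrative helper, not a SASxport or foreign function, and it ignores the transport format's own missing-value codes.

```r
# Decode one 8-byte IBM hexadecimal float given as a big-endian raw vector.
# The lowest fraction bits may be rounded, since an IEEE double keeps
# only 53 significant bits.
ibm_to_double <- function(bytes) {
  stopifnot(is.raw(bytes), length(bytes) == 8)
  b <- as.integer(bytes)
  sign     <- if (b[1] >= 128) -1 else 1   # top bit of the first byte
  exponent <- (b[1] %% 128) - 64           # power of 16, bias 64
  fraction <- sum(b[2:8] / 256^(1:7))      # value in [0, 1)
  sign * fraction * 16^exponent
}

one      <- ibm_to_double(as.raw(c(0x41, 0x10, 0, 0, 0, 0, 0, 0)))
minus100 <- ibm_to_double(as.raw(c(0xC2, 0x64, 0, 0, 0, 0, 0, 0)))
print(c(one, minus100))

# Rough upper limits: IBM floats stay just below 16^63 (about 7.2e75),
# far short of the IEEE double maximum.
ibm_limit  <- 16^63
ieee_limit <- .Machine$double.xmax
print(c(ibm_limit, ieee_limit))
```

The gap between the two limits is why constant('big') in the test data cannot survive a trip through a transport file.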


Here is the R code, run inside a SAS datastep. I can actually interact
with the output of the R code in the same datastep. It is also possible
to run perl, SAS procs and other SAS languages in the same datastep.
(Note the input pipe: no physical CSV file is produced.)


If there is interest I can provide the code that executes R.


/* R program held in one SAS string. Every R statement ends with ';'
   so it does not rely on line breaks surviving inside the string. */
data _null_;
  length pgm $1250;
  pgm=compbl("
  library (SASxport);
  library (foreign);
  hexdigits <- function(s) {;
      digits <- 0:15;
      names(digits) <- c(0:9, LETTERS[1:6]);
      digits[strsplit(s, '')[[1]]];
  };
  bytes <- function(s) {;
      digits <- matrix(hexdigits(s), ncol=2, byrow=TRUE);
      digits;
      as.raw(digits %*% c(16,1));
  };
  todouble <- function(bytes) {;
      con <- rawConnection(bytes);
      val <- readBin(con, 'double', endian='big');
      close(con);
      val;
  };
  x <-c(1:21);
  rc<-c(1:21);
  ln<-c(1:21);
  z<-read.table(pipe('C:\\tip\\dsread.exe -v C:\\tip\\fix.sas7bdat'),header=TRUE,sep=',',colClasses='character');
  st<-z$STR;
  lin<-z$LIN;
  d<-as.numeric(z$DECIMAL_REPRESENTATION);
  h<-as.character(z$HEXIDECIMAL_REPRESENTATION);
  for ( i in 1:21 ) {;
    x[i]  <- todouble(bytes(h[i]));
    rc[i] <- if (((abs( x[i] - d[i] )       > 1E-12 )) ||
             (abs((x[i] - d[i])/x[i] ) > 1E-12 )) 0 else 1;
    ln[i] <- nchar(st[i], type = 'bytes');
  };
  R_ntrnl    <-h ;
  SASntrnl   <-h ;
  R_deciml   <-sprintf('%.14e',x);
  SAS_deciml <-sprintf('%.14e',x);
  Csv_stmat  <-z$DECIMAL_UNTOUCHED;
  Corrupt    <-rc;
  datfrm     <-data.frame(R_ntrnl ,SASntrnl ,R_deciml ,SAS_deciml ,Csv_stmat ,Corrupt,ln,lin);
  write.xport(datfrm,file='C:\\utl\\datfrm.xpt',autogen.formats=FALSE);
  ");
  /* rxeq and getxpt are not shown in this post: rxeq presumably runs the
     R program and getxpt presumably reads datfrm.xpt back into SAS. */
  call rxeq(pgm);
  call getxpt('datfrm');
run;


SAS code to create fix.sas7bdat


options xsync xwait;run;
%let fac=1000;
data "c:\tip\fix.sas7bdat"(drop=prime nonprime byt);
  retain byt 0  str;
  length str $%eval(&fac * 32);
  do prime=3,5,7,11,13;
    do nonprime=2,4,6,8;
      byt+&fac;
      str=repeat(byte(64+byt/&fac),byt);
      decimal_representation    =nonprime/prime+1;
      hexidecimal_representation=put(decimal_representation,hex16.);
      decimal_untouched         =cats(put(round(decimal_representation,1e-14),best32.));
      lin=length(str);
      if decimal_representation ne 3 then output;
    end;
  end;
  decimal_representation    =constant('big');
  hexidecimal_representation=put(constant('big'),e20.);
  decimal_untouched         =cats(put(decimal_representation,e20.));
  str=repeat('@',%eval(&fac * 30));
  lin=length(str);
  output;
  decimal_representation    =constant('small');
  hexidecimal_representation=put(constant('small'),hex16.);
  decimal_untouched         =cats(put(decimal_representation,e20.));
  str=repeat('@',%eval(&fac * 32));
  lin=length(str);
  output;
  format _numeric_ e20.;
run;



Re: Reading sas7bdat files directly

Nordlund, Dan (DSHS/RDA)
The announcement that Chris Long made about a HEX output method for the dsread utility was to output the hex representation in "little endian" byte order, so the above routines will need to take that into account.

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA  98504-5204


Re: Reading sas7bdat files directly

Roger DeAngelis(xlr82sas)
Hi,

  It looks like we may need to swap bytes (little-endian to big-endian). I will look into it tonight.

  As a side note, SAS reserves 28 floats for missing values. It should be easy to convert these to NaN on input to R.

You can test this in SAS by converting the 16-char hex floats to ieee8. in SAS and doing a put. The result will be A, B...Z, . and _.

SAS code that produced the listing is below.

Here are the floats that map to the 28 missing values in SAS

A  FFFFFD0000000000
B  FFFFFC0000000000
C  FFFFFB0000000000
D  FFFFFA0000000000
E  FFFFF90000000000
F  FFFFF80000000000
G  FFFFF70000000000
H  FFFFF60000000000
I  FFFFF50000000000
J  FFFFF40000000000
K  FFFFF30000000000
L  FFFFF20000000000
M  FFFFF10000000000
N  FFFFF00000000000
O  FFFFEF0000000000
P  FFFFEE0000000000
Q  FFFFED0000000000
R  FFFFEC0000000000
S  FFFFEB0000000000
T  FFFFEA0000000000
U  FFFFE90000000000
V  FFFFE80000000000
W  FFFFE70000000000
X  FFFFE60000000000
Y  FFFFE50000000000
Z  FFFFE40000000000
_  FFFFFF0000000000
.  FFFFFE0000000000

data mis;
retain A .A B .B C .C D .D E .E F .F G .G H .H I .I J .J K .K L .L M .M
       N .N O .O P .P Q .Q R .R S .S T .T U .U V .V W .W X .X Y .Y Z .Z
      _ ._ DOT .;
array mis[28] A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _ DOT;
  do idx=1 to 28;
     hex=put(mis[idx],ieee8.);   /* 8 raw bytes of the IEEE double */
     xeh=put(hex,$hex16.);       /* the same bytes as 16 hex characters */
     put @1 mis[idx] @6 xeh;
  end;
run;
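
On the R side, the special-missing code has to be captured before the bytes are converted, because every one of these patterns becomes a NaN. A rough sketch, assuming the hex strings are big-endian as in the listing above (sas_missing_code is a made-up name):

```r
# Map the big-endian hex patterns listed above to SAS missing-value codes.
# Any other pattern returns NA_character_.
sas_missing_code <- function(hex) {
  hex <- toupper(hex)
  is_missing <- grepl("^FFFF[0-9A-F]{2}0{10}$", hex)
  third <- strtoi(substr(hex, 5, 6), base = 16L)   # third byte selects the code
  code <- rep(NA_character_, length(hex))
  code[is_missing & third == 0xFF] <- "_"
  code[is_missing & third == 0xFE] <- "."
  letter <- is_missing & third >= 0xE4 & third <= 0xFD
  code[letter] <- LETTERS[0xFD - third[letter] + 1]  # FD is A, E4 is Z
  code
}

codes <- sas_missing_code(c("FFFFFD0000000000", "FFFFE40000000000",
                            "FFFFFF0000000000", "FFFFFE0000000000",
                            "3FF6666666666666"))
print(codes)

# Once converted to a double, the code is no longer visible to is.na().
dot_bytes <- as.raw(strtoi(substring("FFFFFE0000000000",
                                     seq(1, 15, by = 2), seq(2, 16, by = 2)),
                           base = 16L))
dot_value <- readBin(dot_bytes, what = "double", endian = "big")
print(is.na(dot_value))   # TRUE
```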


Re: Reading sas7bdat files directly

Chris Long
The dsread output is little-endian, as that's the native format for
floats on the Wintel platform.  The byte order should stay the same if
converting directly to a float, using a data structure like (C/C++):

union {
     char bytes[8];
     double value;
} u;

If reading the values with a SAS HEX informat, the bytes will need to be
reversed.  It's obviously trivial for me to add an endian-ness option,
I'll do that later....

Chris.



Re: Reading sas7bdat files directly

Ted
In reply to this post by Frank Harrell
Just wondering if the proposed loop for the metadata on this conversion was implemented by any chance...before I go trying the Long routine...