importing from Stata

Dimitri Szerman-2
Hi,

I have a new job, and everyone here uses Stata. I won't give up on R,
but I must learn how to exchange data between the two programs better.
I am now focusing on importing data from Stata into R, and I must confess
that I am a bit disappointed with the read.dta function from the foreign
package, because it typically happens that

(i) I get a big R file (for example, a 15 MB Stata file became a 42 MB R
file; after cleanup.import() from the Hmisc package, it dropped to 35 MB,
but that's still more than 2x the original Stata file), which, in turn, I
suspect is due to the fact that

(ii) factors are created using Stata labels as levels.

I wonder if

(i) there is a way of forcing each variable to be numeric or integer,
keeping its original values (instead of "Stata labels" as "R
levels"). Or,

(ii) someone has written other functions to carry out this task.

I'd appreciate any suggestions on how to import from Stata to R more
efficiently.
Thanks in advance,

Dimitri

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: importing from Stata

Fox, John
Dear Dimitri,

I don't have a solution for your problem, but the factor levels aren't
the source of it. A factor is stored as an integer vector with a
"levels" attribute (try, e.g., unclassing the factor), so the level
names are not repeated.
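
A minimal sketch of that check, using a made-up factor (the labels and
the length are arbitrary, not taken from the data discussed here):

```r
# Hypothetical factor: two long labels repeated many times
f <- factor(rep(c("strongly disagree", "strongly agree"), times = 50000))

typeof(f)           # storage type underlying the factor
levels(f)           # each label is stored once, in the "levels" attribute
head(unclass(f))    # the data themselves are integer codes

# Memory used by the factor versus a bare integer vector of the same length
size_factor  <- object.size(f)
size_integer <- object.size(as.integer(f))
print(size_factor)
print(size_integer)
```

The two sizes should differ only by the small overhead of the
attributes, however long the labels are.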

Regards,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
--------------------------------

Re: importing from Stata

Thomas Lumley
In reply to this post by Dimitri Szerman-2
On Mon, 16 Jan 2006, Dimitri Joe wrote:

>
> (i) I get a big R file (for example, a 15 MB Stata file became a 42 MB R
> file; after cleanup.import() from the Hmisc package, it dropped to 35 MB,
> but that's still more than 2x the original Stata file), which, in turn, I
> suspect is due to the fact that
>
> (ii) factors are created using Stata labels as levels.

Your suspicion is wrong.

A more likely explanation is that Stata uses single-precision floating
point by default and can use 1-byte and 2-byte integers. R uses double
precision floating point and four-byte integers.
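
The R side of this can be checked roughly as follows. This is only a
sketch: the vector length is arbitrary, and the Stata widths in the
comments are the ones stated above, not measured here.

```r
n <- 100000

x_int <- integer(n)   # R integer vector: 4 bytes per element
x_dbl <- numeric(n)   # R double vector: 8 bytes per element

bytes_int <- as.numeric(object.size(x_int))
bytes_dbl <- as.numeric(object.size(x_dbl))

# Bytes per element, ignoring the small fixed header of each vector
bytes_int / n
bytes_dbl / n

# Implied growth on import:
#   Stata 1-byte integer -> R 4-byte integer: about 4x
#   Stata 2-byte integer -> R 4-byte integer: about 2x
#   Stata 4-byte float   -> R 8-byte double:  about 2x
```

A file that more than doubles in size, as in the 15 MB example, is
consistent with a mix of float and small-integer variables.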


> I wonder if
>
> (i) there isn't a way of forcing each variable to be numeric or integer,
> keeping its original values (instead of "Stata labels" as "R
> levels"). Or,

Yes. If you read the help page for read.dta() it tells you how.
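
The argument in question is presumably convert.factors. A minimal
sketch, assuming the foreign package is installed; the data frame and
the temporary file are made up so that the example is self-contained:

```r
library(foreign)

# Hypothetical data, written out as a Stata file with value labels
d <- data.frame(id  = 1:4,
                sex = factor(c("male", "female", "female", "male")))
path <- tempfile(fileext = ".dta")
write.dta(d, path)

# Default: labelled Stata variables come back as R factors
with_labels <- read.dta(path)

# convert.factors = FALSE keeps the underlying numeric codes instead
codes_only <- read.dta(path, convert.factors = FALSE)

str(with_labels$sex)
str(codes_only$sex)
```

Note that this changes how the values are presented, not how much
memory they need, since a factor is already stored as integer codes.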

  -thomas

Thomas Lumley Assoc. Professor, Biostatistics
[hidden email] University of Washington, Seattle

Re: importing from Stata

Ronnie Babigumira
To add to an already clear explanation (a comment on precision in Stata): by default, Stata stores numbers as floats
(also known as single precision or 4-byte reals). One way to check this is to save a small subset of your data
with all numbers as doubles in Stata and see how the size of the new Stata file compares with the file you create in R.

(A section on this can be found in the Stata user manual 13.10)

