Importing data using Foreign


Importing data using Foreign

Elham Daadmehr
Hi all,

I have a simple problem: I am stuck using SPSS data (.sav) that I imported
with "read.spss".
I imported the data (z) without any problem. After importing, the first column
doesn't contain any NA, but when I take a subset of it (like:
z[z[,8]=="11"|z[,8]=="12"|z[,8]=="14",]), lots of NAs appear (even in the
first column).

The (.sav) file is the output of Compustat (WRDS).

It is terrible, I can't find the mistake.

Thank you in advance for your help,
Elham


______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: Importing data using Foreign

Eric Berger
Hi Elham,
You are not giving us much to go on here.
Show us the commands that (a) confirm there are no NA's in the first column
of z
and (b) output a row of z that has an NA in the first column.
Here's how one might do this:
(a) sum(is.na(z[,1]))
(b) z[ match(TRUE, z[,8] %in% c("11","12","14")), ]

Eric



Re: Importing data using Foreign

Elham Daadmehr
Thanks for your reply.

You're right, here is what I did:

> library(foreign)
> sz201401=read.spss("/Users/e.daadmehr/Desktop/Term/LastLast/untitled folder/2014/1.sav", to.data.frame=TRUE)
Warning message:
In read.spss("/Users/e.daadmehr/Desktop/Term/LastLast/untitled folder/2014/1.sav",  :
  /Users/e.daadmehr/Desktop/Term/LastLast/untitled folder/2014/1.sav: Compression bias (0) is not the usual value of 100
> z =sz201401
> is.list(z)
[1] TRUE
> z=as.data.frame(z)
> is.data.frame(z)
[1] TRUE
> z=z[,-c(10)]
> sum(is.na(z[,1]))
[1] 0
> z1=z[z[,8]=="11"|z[,8]=="12"|z[,8]=="14",]
> sum(is.na(z1[,1]))
[1] 399


My file is not compressed.


Thank you in advance,

Elham




Re: Importing data using Foreign

Peter Dalgaard-2
In reply to this post by Elham Daadmehr
Offhand, I suspect that the NAs are in the 8th column.
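A minimal sketch of that suspicion, using a small made-up data frame rather than the Compustat file (the column names `id` and `sic` and all values are hypothetical, and the filter column is column 2 here instead of column 8). A missing value in the column used for filtering gives an all-NA row in the result, even though the first column itself has no NA:

```r
# Toy data: the first column has no NA, the filter column has two.
z <- data.frame(id  = 1:6,
                sic = c("11", NA, "12", "20", NA, "14"),
                stringsAsFactors = FALSE)

sum(is.na(z[, 1]))   # missing values in the first column
sum(is.na(z[, 2]))   # missing values in the filter column

# Same pattern as in the original post: the comparison is NA wherever sic is NA.
keep <- z[, 2] == "11" | z[, 2] == "12" | z[, 2] == "14"
keep

# Each NA in the logical index produces a row filled with NA.
z1 <- z[keep, ]
z1
sum(is.na(z1[, 1]))  # equals the number of NAs in `keep`
```

On the real data, the analogous check would be sum(is.na(z[,8])).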


--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email]  Priv: [hidden email]


Re: Importing data using Foreign

Eric Berger
Good point! :-)



Re: Importing data using Foreign

Elham Daadmehr
Thanks guys, but I'm a bit confused. The input is the first column (z[,1]
and z1[,1]).
How is it possible that a subset of a non-NA vector contains NA?


Re: Importing data using Foreign

Eric Berger
> c(1:3)[c(1,NA,3)]
[1]  1 NA  3
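The example above uses a numeric index, whereas the subsetting in the original post uses a logical one. As a small sketch (the names `x`, `by_position` and `by_logical` are only for illustration), an NA in a logical index leaves the same kind of NA placeholder in the result:

```r
x <- 1:3

by_position <- x[c(1, NA, 3)]         # numeric index with a missing position
by_logical  <- x[c(TRUE, NA, TRUE)]   # logical index with a missing condition

by_position
by_logical
identical(by_position, by_logical)    # both keep an NA in the second slot
```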



Re: Importing data using Foreign

Peter Dalgaard-2
In reply to this post by Elham Daadmehr
It is because, where the condition is NA, R cannot know whether you want that element or not.

It is a bit more obvious with integer indexing, as in color[race]: if race is NA you don't know what color to put in, but the result should be the same length as race.

With logical indices, the behaviour is a bit annoying, but it ultimately follows from the coercion rules. You might think that NA could simply be treated as FALSE (and the subset() function does just that), but then x[NA] would differ from x[as.integer(NA)], because a plain NA is of mode "logical", the lowest in the coercion hierarchy.

-pd
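A short sketch of the behaviour described above, together with a few common ways to avoid the NA placeholders. The vectors `x`, `cond` and `codes` are made up for illustration. which() and %in% are not mentioned in this message (%in% appears earlier in the thread); they are shown here only as alternatives that, like subset(), do not propagate NA:

```r
x    <- c(10, 20, 30)
cond <- c(TRUE, NA, FALSE)

with_na   <- x[cond]          # the NA condition leaves an NA placeholder
via_which <- x[which(cond)]   # which() keeps only the positions that are TRUE
via_sub   <- subset(x, cond)  # subset() treats an NA condition as FALSE

with_na
via_which
via_sub

# A plain NA is logical and is recycled over x; an integer NA is a single index.
len_logical_na <- length(x[NA])
len_integer_na <- length(x[NA_integer_])
c(len_logical_na, len_integer_na)

# %in% never returns NA, so it is safe to use directly as a row filter.
codes   <- c("11", NA, "20")
matches <- codes %in% c("11", "12", "14")
matches
```

Applied to the data in this thread, that would suggest something like z[z[,8] %in% c("11","12","14"), ], which skips the rows where column 8 is missing instead of returning them as NA rows.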


Re: Importing data using Foreign

Elham Daadmehr
Thanks a lot. I’ve got it just now.
