princomp() with missing values in panel data?

ivo welch-2
dear R wizards:  the good news is that I know how to omit missing
observations and run a principal components analysis.

  p= princomp( na.omit( dataset ) )
  p$scores[ ,1]  # the first factor

(where dataset contains missing values;  incidentally, princomp(retailsmall,
na.action=na.omit) does not work for me, so I must be doing something wrong,
here.)  the bad news is that I would like NA observations to be retained as
NA, so that I can reinsert the factors into the data set:
  dataset$first.factor = p$scores[,1]
there must be an elegant way of doing this.  help appreciated.

may I humbly suggest that in linear models, it would be intuitive if the
default would be for NA's to be ignored in the model computations, and that
the functions residuals and fitted (and similar, such as scores() ) to
understand when a particular obs num should be NA?

help, as always, appreciated.

sincerely,

/ivo welch

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: princomp() with missing values in panel data?

Prof Brian Ripley
See ?na.exclude (on the same page as na.omit)

On Mon, 16 Jan 2006, ivo welch wrote:

> dear R wizards:  the good news is that I know how to omit missing
> observations and run a principal components analysis.
>
>  p= princomp( na.omit( dataset ) )
>  p$scores[ ,1]  # the first factor
>
> (where dataset contains missing values;  incidentally, princomp(retailsmall,
> na.action=na.omit) does not work for me, so I must be doing something wrong,
> here.)

See ?princomp: only the formula method has an na.action argument.
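If you want to stay with the default (matrix/data frame) method, which has no na.action argument, the bookkeeping can be done by hand. This is only a minimal sketch, not part of the original reply; the data frame below is made up for illustration, and `ok` is a helper name introduced here.

```r
# Hypothetical data: 10 rows, two numeric columns, row 5 entirely missing.
set.seed(1)
dataset <- data.frame(x1 = rnorm(10), x2 = rnorm(10))
dataset[5, ] <- NA

ok <- complete.cases(dataset)      # TRUE for rows without any NA
p  <- princomp(dataset[ok, ])      # default method, complete rows only

dataset$first.factor <- NA_real_             # start with NA in every row
dataset$first.factor[ok] <- p$scores[, 1]    # fill in the rows that were used
```

Row 5 of dataset$first.factor stays NA, so the scores line up with the original rows.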

> the bad news is that I would like NA observations to be retained as
> NA, so that I can reinsert the factors into the data set:
>  dataset$first.factor = p$scores[,1]
> there must be an elegant way of doing this.  help appreciated.
>
> may I humbly suggest that in linear models, it would be intuitive if the
> default would be for NA's to be ignored in the model computations, and that
> the functions residuals and fitted (and similar, such as scores() ) to
> understand when a particular obs num should be NA?

There is no function scores().
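For linear models, residuals() and fitted() already pad with NA when the model is fitted with na.action = na.exclude. A small sketch on made-up data (the data frame `d` and the object names are assumptions for illustration only):

```r
# Hypothetical data with one missing response value in row 3.
set.seed(2)
d <- data.frame(x = rnorm(10), y = rnorm(10))
d$y[3] <- NA

fit_omit    <- lm(y ~ x, data = d, na.action = na.omit)
fit_exclude <- lm(y ~ x, data = d, na.action = na.exclude)

length(residuals(fit_omit))      # 9: the NA row is dropped
length(residuals(fit_exclude))   # 10: the NA row is kept as NA

# With na.exclude the results can go straight back into the data frame.
d$res <- residuals(fit_exclude)
d$fit <- fitted(fit_exclude)
```

Both fits estimate the same coefficients; only the extractor functions behave differently.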

> help, as always, appreciated.
>
> sincerely,
>
> /ivo welch

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: princomp() with missing values in panel data?

ivo welch-2
thank you.  I am still not sure how to get the scores in princomp, though:

ds= as.data.frame( cbind(rnorm(10),rnorm(10)) )
names(ds)=c("x1","x2")
ds[5,]=c(NA,NA)
pc= princomp( formula = ~ ds$x1 + ds$x2, na.action=na.omit)
ds$pc1 = pc$scores[,1]  #<-- error, scores has 9 obs, ds has 10 obs

is there an elegant method to do this, or do I need to learn how to operate
with pc$loadings?  (may I also humbly suggest that the default behavior of
$scores should be to contain NA in row 5?)

Incidentally, R is a lot cleverer than I understand.  pc$loadings by itself
gives me wonderfully intuitive output, with names, text, different
components---but I can still use pc$loadings[,2].  I presume that the array
operator on the "loadings" object is overloaded.  very nice.
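On the last point, a quick way to look at what pc$loadings is. As far as I can tell it is an ordinary numeric matrix carrying the class attribute "loadings", and the pretty output comes from a print method rather than from a special `[`. The sketch below uses made-up data; `ld` and `second` are names introduced here.

```r
set.seed(4)
ds <- data.frame(x1 = rnorm(10), x2 = rnorm(10))   # hypothetical data
pc <- princomp(ds)

ld <- pc$loadings
class(ld)                  # "loadings"
is.matrix(unclass(ld))     # TRUE: a plain numeric matrix underneath

# Look up a "[" method for this class; optional = TRUE avoids an error
# when none is registered.
getS3method("[", "loadings", optional = TRUE)

second <- ld[, 2]          # ordinary matrix indexing: the second column
```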


Re: princomp() with missing values in panel data?

Henric Nilsson
ivo welch said the following on 2006-01-18 14:56:

> thank you.  I am still not sure how to get the scores in princomp, though:
>
> ds= as.data.frame( cbind(rnorm(10),rnorm(10)) )
> names(ds)=c("x1","x2")
> ds[5,]=c(NA,NA)
> pc= princomp( formula = ~ ds$x1 + ds$x2, na.action=na.omit)
> ds$pc1 = pc$scores[,1]  #<-- error, scores has 9 obs, ds has 10 obs
>
> is there an elegant method to do this, or do I need to learn how to operate

Prof Ripley told you how to do it: `na.action = na.exclude'.
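Applied to the example above, a sketch could look like this. It also passes the data frame through the data argument instead of writing ds$x1 in the formula; that is a style choice made here, not something the thread requires.

```r
set.seed(3)
ds <- data.frame(x1 = rnorm(10), x2 = rnorm(10))
ds[5, ] <- NA                       # row 5 is entirely missing

# Formula method with na.exclude: the NA row is left out of the fit,
# but the scores are padded back to the original number of rows.
pc <- princomp(~ x1 + x2, data = ds, na.action = na.exclude)

nrow(pc$scores)                     # 10
ds$pc1 <- pc$scores[, 1]            # row 5 of pc1 is NA
```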

> with pc$loadings?  (may I also humbly suggest that the default behavior of
> $scores should be to contain NA in row 5?)

Choosing sensible defaults in the case of NAs is a tricky business.

Personally, I'd like the default to be `na.fail', so that I don't miss
out on NAs if I've been sloppy while screening the data. Generally, just
ignoring missings and analysing the data as if it were complete may lead
to seriously biased results.

> Incidentally, R is a lot cleverer than I understand.  pc$loadings by itself

Sometimes R is almost too clever, and I end up feeling humiliated when
finding out that I'm the stupid one... ;-)


HTH,
Henric



> gives me wonderfully intuitive output, with names, text, different
> components---but I can still use pc$loadings[,2].  I presume that the array
> operator on the "loadings" object is overloaded.  very nice.

Re: princomp() with missing values in panel data?

Prof Brian Ripley
On Thu, 19 Jan 2006, Henric Nilsson wrote:

> ivo welch said the following on 2006-01-18 14:56:
>
>> thank you.  I am still not sure how to get the scores in princomp, though:
>>
>> ds= as.data.frame( cbind(rnorm(10),rnorm(10)) )
>> names(ds)=c("x1","x2")
>> ds[5,]=c(NA,NA)
>> pc= princomp( formula = ~ ds$x1 + ds$x2, na.action=na.omit)
>> ds$pc1 = pc$scores[,1]  #<-- error, scores has 9 obs, ds has 10 obs
>>
>> is there an elegant method to do this, or do I need to learn how to operate
>
> Prof Ripley told you how to do it: `na.action = na.exclude'.
>
>> with pc$loadings?  (may I also humbly suggest that the default behavior of
>> $scores should be to contain NA in row 5?)
>
> Choosing sensible defaults in the case of NAs is a tricky business.
>
> Personally, I'd like the default to be `na.fail', so that I don't miss
> out on NAs if I've been sloppy while screening the data. Generally, just
> ignoring missings and analysing the data as if it were complete may lead
> to seriously biased results.

I tend to agree (and so does S).  You can achieve this with
options(na.action=na.fail), almost everywhere.
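A small sketch of what that setting does to a model fit, on made-up data. The option is given here as the string "na.fail", the form shown in ?options; the error is caught only so that the example runs to the end, and the previous setting is restored afterwards.

```r
old <- options(na.action = "na.fail")   # remember the previous setting

# Hypothetical data with one missing predictor value.
d <- data.frame(x = c(1, 2, NA, 4), y = c(2.1, 3.9, 6.2, 8.1))

# lm() now stops instead of silently dropping the incomplete row.
result <- tryCatch(lm(y ~ x, data = d),
                   error = function(e) conditionMessage(e))
result                                   # the error message from na.fail

options(old)                             # restore the previous na.action
```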

--
Brian D. Ripley,                  [hidden email]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
