handling NA by mean replacement

handling NA by mean replacement

Julie Bernauer
Hello

I am sorry for such a stupid question. Suppose I have a table of data with a
lot of NAs, and I want to replace each NA with the mean of its column (computed
before any replacement). How can I do that efficiently?

Thanks in advance,

Julie

--
Julie Bernauer
Yeast Structural Genomics
http://www.genomics.eu.org

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: handling NA by mean replacement

Bert Gunter
Lots of other folks will give you the simple answer (hint: see ?'[' and ?is.na)
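A minimal sketch of the "simple answer" that hint points at. It assumes the table is a data frame whose columns are all numeric; the helper name `fill_with_mean` and the toy data are hypothetical, for illustration only.

```r
# Hypothetical helper: replace the NAs in one numeric vector with the mean
# of its non-missing values.
fill_with_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Toy data frame with a few NAs (all columns numeric).
dat <- data.frame(a = c(1, NA, 3), b = c(NA, 10, 20))

# lapply() works column by column; assigning into dat[] keeps the data frame
# structure (names, class) instead of returning a plain list.
dat[] <- lapply(dat, fill_with_mean)
dat
```

Note that a column that is entirely NA would be filled with NaN, because the mean of zero observed values is undefined.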

Yours is one of those "iceberg" questions: two-thirds hidden underwater.

Two points:

Point 1: Generally you **don't have to do such replacement** as most of R's
functions have a na.rm or na.action argument (unfortunately, for historical
reasons, the argument names and meanings aren't consistent) that does
basically what you want anyway.
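A short base-R illustration of that point, using made-up toy data. It also shows the naming inconsistency: summary functions use `na.rm`, modelling functions use `na.action`, and `cor()` uses `use`.

```r
x <- c(2, 4, NA, 6)

# Summary functions: na.rm = TRUE drops the NAs before computing.
m <- mean(x, na.rm = TRUE)   # 4
s <- sum(x, na.rm = TRUE)    # 12

# Modelling functions: na.action decides what happens to incomplete rows.
# na.omit drops them before fitting (it is also the usual default,
# see getOption("na.action")).
d <- data.frame(y = c(1.0, 2.1, 2.9, NA, 5.2), x = c(1, 2, 3, 4, NA))
fit <- lm(y ~ x, data = d, na.action = na.omit)
nobs(fit)                    # 3 complete rows were used

# na.exclude also fits on the 3 complete rows, but pads residuals and
# fitted values with NA so they line up with the original 5 rows.
fit_ex <- lm(y ~ x, data = d, na.action = na.exclude)
length(residuals(fit_ex))    # 5

# cor() uses yet another argument name for the same idea.
r <- cor(d$y, d$x, use = "complete.obs")
```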

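Point 2: Doing what you ask is probably a bad idea, as it creates mythical
degrees of freedom (the filled-in values are counted as if they had been
observed) and biases results (e.g. variances and correlations are understated),
which gives wrong statistical answers.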
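A small simulation (made-up normal data, base R only) makes the "mythical degrees of freedom" concrete. Mean replacement leaves the mean unchanged, but the imputed values contribute zero to the sum of squares while still counting towards n. The sample variance therefore shrinks by exactly (n_obs - 1)/(n - 1), and a naive standard error treats all 100 values as real observations.

```r
set.seed(1)
x <- rnorm(100, mean = 10, sd = 2)
x[sample(100, 30)] <- NA             # make 30 of the 100 values missing

x_imp <- x
x_imp[is.na(x_imp)] <- mean(x, na.rm = TRUE)

# Spread of the observed values vs. the mean-filled copy.
sd_obs <- sd(x, na.rm = TRUE)
sd_imp <- sd(x_imp)

# Naive standard errors of the mean: n = 70 observed vs. a "pretend" n = 100.
se_obs <- sd_obs / sqrt(sum(!is.na(x)))
se_imp <- sd_imp / sqrt(length(x_imp))

# The variance is deflated by (70 - 1) / (100 - 1).
var(x_imp) / var(x, na.rm = TRUE)
```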

As a general matter, handling missing values "correctly" is a difficult
statistical issue that you may want to avoid if you can (R has plenty of
packages that can deal with it, but it requires background expertise).
Honestly, I'm not sure "if you can" makes any sense here (how do you know?),
but let's just say that I think your potential for mischief is reduced if
you use R's built-in arguments for ignoring missing values rather than imputing
them naively.

Having said that, I believe that clustering procedures, for example, may not
permit this (but they have built-in missing-value imputation capabilities of
their own, do they not?), so you may have to impute. In this case, try to do so
wisely (e.g. via multiple imputation?).
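Multiple imputation is easiest to see on a toy problem. The sketch below uses base R, simulated data, and made-up names. It fills the NAs m times with random draws instead of one fixed value, then pools the m estimates of the mean with Rubin's rules, so the between-imputation spread is added to the standard error. The imputation model is deliberately simplified: it is a normal model that does not draw the standard deviation. In practice, a dedicated multiple-imputation package would be the safer choice.

```r
set.seed(42)
x <- rnorm(50, mean = 5, sd = 1)
x[sample(50, 15)] <- NA

obs   <- x[!is.na(x)]
n_obs <- length(obs)
n_mis <- sum(is.na(x))
m     <- 20                          # number of imputed data sets

est        <- numeric(m)             # per-imputation estimate of the mean
var_within <- numeric(m)             # per-imputation squared standard error

for (i in seq_len(m)) {
  # Simplified draw: perturb the mean to reflect its uncertainty, then add
  # noise. A proper MI would also draw the sd (and use a real model).
  mu_i <- rnorm(1, mean(obs), sd(obs) / sqrt(n_obs))
  xi <- x
  xi[is.na(xi)] <- rnorm(n_mis, mu_i, sd(obs))
  est[i] <- mean(xi)
  var_within[i] <- var(xi) / length(xi)
}

# Rubin's rules: pooled estimate and total variance.
q_bar   <- mean(est)
W       <- mean(var_within)          # average within-imputation variance
B       <- var(est)                  # between-imputation variance
T_total <- W + (1 + 1 / m) * B
c(estimate = q_bar, se = sqrt(T_total))
```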

Perhaps this will stimulate real experts to offer you some advice. Good
luck.

Cheers,
Bert
 
Bert Gunter
Genentech

> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of Julie Bernauer
> Sent: Monday, January 30, 2006 8:50 AM
> To: [hidden email]
> Subject: [R] handling NA by mean replacement

Re: handling NA by mean replacement

Gabor Grothendieck
In reply to this post by Julie Bernauer
Don't know about efficiency, but here is one way:

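# test data
A <- matrix(1:54, ncol=6)
A[3,3] <- A[6,6] <- A[5,6] <- NA

# replace each NA in a column by the mean of that column's non-NA values
f <- function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x)
# apply f to every column (MARGIN = 2); the result is a new matrix
apply(A, 2, f)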
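Since the question asked about efficiency, here is a vectorized alternative sketch in base R. It has not been benchmarked here. It computes all column means in one `colMeans()` call and fills every NA at once through matrix indexing, instead of calling `ifelse()` once per column. The name `A_filled` is just for illustration.

```r
# Same test data as above.
A <- matrix(1:54, ncol = 6)
A[3, 3] <- A[6, 6] <- A[5, 6] <- NA

col_means <- colMeans(A, na.rm = TRUE)   # one mean per column, NAs dropped
idx <- which(is.na(A), arr.ind = TRUE)   # (row, col) position of every NA
A_filled <- A
A_filled[idx] <- col_means[idx[, "col"]] # fill each NA with its column's mean
A_filled
```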

On 1/30/06, Julie Bernauer <[hidden email]> wrote:

Re: handling NA by mean replacement

Sean Davis
In reply to this post by Julie Bernauer
You might also want to look at the "impute" package on CRAN.
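The main function of the impute package, `impute.knn()`, does k-nearest-neighbour imputation. The sketch below is a simplified base-R illustration of that general idea only. It does not reproduce that package's algorithm or defaults, and both the helper `knn_impute_sketch` and the toy matrix are hypothetical. Each missing entry is filled with the average of that column over the k rows closest to the incomplete row.

```r
# Hypothetical helper, for illustration only: fill each NA in row i with the
# average of that column over the k rows closest to row i.
knn_impute_sketch <- function(X, k = 3) {
  X_out <- X
  for (i in which(rowSums(is.na(X)) > 0)) {
    # Distance to every other row, using only the columns observed in both
    # rows (mean squared difference); self and rows sharing nothing get Inf.
    d <- sapply(seq_len(nrow(X)), function(j) {
      shared <- !is.na(X[i, ]) & !is.na(X[j, ])
      if (j == i || !any(shared)) return(Inf)
      mean((X[i, shared] - X[j, shared])^2)
    })
    for (col in which(is.na(X[i, ]))) {
      # Candidate neighbours must actually have a value in this column.
      cand <- which(!is.na(X[, col]) & is.finite(d))
      if (length(cand) == 0) next
      nn <- cand[order(d[cand])][seq_len(min(k, length(cand)))]
      X_out[i, col] <- mean(X[nn, col])
    }
  }
  X_out
}

X <- rbind(c(1.0, 1.0, 1.0),
           c(1.1, 1.2,  NA),
           c(5.0, 5.0, 5.0),
           c(1.2, 0.9, 1.1),
           c(5.1,  NA, 4.9),
           c(4.8, 5.2, 5.0))
X_filled <- knn_impute_sketch(X, k = 2)
X_filled
```

Unlike column-mean replacement, the filled value depends on which rows the incomplete row resembles. Row 2 is filled from the "small" rows and row 5 from the "large" ones.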

Sean


On 1/30/06 11:50 AM, "Julie Bernauer" <[hidden email]> wrote:

Re: handling NA by mean replacement

James Reilly
In reply to this post by Bert Gunter

Here are a couple of documents that make much the same point (e.g. "mean
value imputation is not recommended"), and discuss several alternatives.

http://nces.ed.gov/statprog/2002/appendixb3.asp
http://www2.chass.ncsu.edu/garson/pa765/missing.htm

I think we'd need more information on the context to provide any real
advice. Another possible source of help is the Impute mailing list:
http://lists.utsouthwestern.edu/mailman/listinfo/impute

Cheers,
James
--
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand

On 31/01/2006 6:20 a.m., Berton Gunter wrote:
