Quantcast

Filtering out bad data points

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Filtering out bad data points

Robert A'gata
Hi,

I always have a question about how to do this best in R. I have a data
frame and a set of criteria to filter points out. My procedure is to
always locate indices of those points, check if index vector length is
greater than 0 or not and then remove them. Meaning

dftest <- data.frame(x=rnorm(100),y=rnorm(100));
qtile <- quantile(dftest$x,probs=c(0.05,0.95));
badIdx <- which((dftest$x < qtile[1]) | (dftest$x > qtile[2]));
if (length(badIdx) > 0) {
    dftest <- dftest[-idx,];
}

My question is that is there a more streamlined way to achieve this? Thank you.

Cheers,

Robert

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Filtering out bad data points

Bill.Venables
You could use a function to do the job:

withinRange <- function(x, r = quantile(x, c(0.05, 0.95)))
    x >= r[1] & x <= r[2]

dtest2 <- subset(dftest, withinRange(x))

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Robert A'gata
Sent: Tuesday, 10 May 2011 10:57 AM
To: [hidden email]
Subject: [R] Filtering out bad data points

Hi,

I always have a question about how to do this best in R. I have a data
frame and a set of criteria to filter points out. My procedure is to
always locate indices of those points, check if index vector length is
greater than 0 or not and then remove them. Meaning

dftest <- data.frame(x=rnorm(100),y=rnorm(100));
qtile <- quantile(dftest$x,probs=c(0.05,0.95));
badIdx <- which((dftest$x < qtile[1]) | (dftest$x > qtile[2]));
if (length(badIdx) > 0) {
    dftest <- dftest[-idx,];
}

My question is that is there a more streamlined way to achieve this? Thank you.

Cheers,

Robert

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Loading...