How to detect and exclude outliers in R?

vikrant
Suppose I am reading data from a file and the data contains some outliers. I want to know whether it is possible in R to automatically detect outliers in a dataset and remove them.
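One deliberately simple way to do the "automatic" part in base R is boxplot.stats(), which flags points lying more than 1.5 times the interquartile range beyond the box hinges, the same rule boxplot() uses to draw outlying points. The sketch below assumes the data have already been read into a data frame with a numeric column; the names `df` and `value` and the numbers are made up for illustration. This is only one of many possible definitions of an outlier.

```r
# Hypothetical example data; in practice this would come from read.csv() etc.
df <- data.frame(id = 1:8, value = c(2, 3, 3, 4, 5, 5, 6, 40))

# boxplot.stats() applies the boxplot rule: points more than coef * IQR
# (default coef = 1.5) beyond the hinges are returned in $out
out <- boxplot.stats(df$value)$out

# Keep every row whose value was not flagged
df_clean <- df[!(df$value %in% out), ]
```

Note that `%in%` drops every row carrying a flagged value, so repeated copies of the same extreme value are all removed.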

Re: How to detect and exclude outliers in R?

Daniel Nordlund-2
> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]]
> On Behalf Of vikrant
> Sent: Monday, January 18, 2010 10:09 PM
> To: [hidden email]
> Subject: [R] How to detect and exclude outliers in R?
>
>
> Suppose I am reading data from a file and the data contains some outliers.
> I want to know if it is possible in R to automatically detect outliers in a
> dataset and remove them
> --

You will need to provide more information. What is your definition of an outlier?  And, why should those data be removed?

Daniel Nordlund
Bothell, WA USA
 

______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
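To illustrate why the definition matters, the sketch below applies two common rules to the same made-up sample of eight values: the classical z-score with a cutoff of 3, and a robust z-score built from the median and mad() with a cutoff of 3.5 (both cutoffs are conventions, assumed here). The classical rule flags nothing, because the gross error inflates the standard deviation it is measured against; in a sample of size n, no point can have |z| larger than (n - 1)/sqrt(n), which is about 2.47 for n = 8. The robust rule flags the value 1000.

```r
# Hypothetical sample: seven similar values and one gross error
x <- c(10, 11, 9, 10, 12, 11, 10, 1000)

# Definition 1: classical z-score (mean and SD), flag |z| > 3
z <- (x - mean(x)) / sd(x)
flag_z <- abs(z) > 3

# Definition 2: robust z-score (median and MAD), flag |z| > 3.5
robust_z <- (x - median(x)) / mad(x)
flag_robust <- abs(robust_z) > 3.5

which(flag_z)       # no points flagged
which(flag_robust)  # only the 8th value (1000) is flagged
```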

Re: How to detect and exclude outliers in R?

GlenB
In reply to this post by vikrant
What makes an outlier an outlier depends on the model. A highly discrepant observation under one model is entirely typical under another.

Even given a model, criteria for what constitutes an outlier vary by application area and user.

Even given all of that, exclusion is only one of many possible actions.

Can you be more specific about your model for the data?
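A small sketch of the point that an outlier is only an outlier relative to a model: in the made-up data below, the value y = 8.0 at x = 8 sits in the middle of the marginal distribution of y, so the boxplot rule applied to y alone does not flag it, yet it is far off the straight-line relationship followed by the other points. Under a simple linear model fitted with lm(), its externally studentized residual from rstudent() is by far the largest. Both the data and the choice of a linear model are assumptions made for illustration.

```r
# Made-up data: y is roughly 2 * x, except for the point at x = 8
x <- 1:10
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 14.1, 8.0, 18.2, 19.9)

# Marginal view of y alone: the boxplot rule flags nothing
marginal_out <- boxplot.stats(y)$out

# Model-based view: studentized residuals from a straight-line fit
fit <- lm(y ~ x)
r <- rstudent(fit)
suspect <- which.max(abs(r))  # position of the most discrepant point
```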

Re: How to detect and exclude outliers in R?

milton ruser
In reply to this post by vikrant
Hi V.S.,


Did you search the R repositories for this issue before asking?
Maybe not. RSiteSearch("outliers")
Best,

milton



On Tue, Jan 19, 2010 at 1:08 AM, vikrant <[hidden email]> wrote:

>
> Suppose I am reading data from a file and the data contains some outliers.
> I want to know if it is possible in R to automatically detect outliers in a
> dataset and remove them

PCA scores and loadings

Paul Dennis-3
In reply to this post by GlenB

Dear R users group

I have performed PCA using the function rda in vegan and then used plot(pcaobject).  I have a couple of questions:

1) The default plot shows the individual sites (black) and the variables (red). What I want, however, is a plot showing the mean of each site group, with bidirectional error bars displaying the standard deviation for that group (and with the variables still plotted in the background)...

2) ...I know how to do this by exporting the scores and loadings to Excel and then drawing the graphs in Excel or SigmaPlot; however, I then have an issue with the scaling of the loadings (i.e., the values are so small that they are bunched up at the origin). So here is my second question: can I multiply the loadings by a constant to display them in my plot, and if so, what is the convention for doing this?

Many thanks

Paul
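For question 1, a minimal base-R sketch. It swaps in stats::prcomp() and the built-in iris data (Species standing in for the site groups) in place of vegan's rda(), so it runs without extra packages; with vegan, the site and variable scores would instead come from scores() on the fitted object, and its `scaling` argument is worth checking first because it changes the relative size of the two sets of scores. For question 2, the sketch multiplies all loadings by one constant, chosen (as an assumption, for display only) so that the longest loading arrow is as long as the largest absolute site score. A common multiplier changes the arrows' lengths but not their directions or relative lengths, so the plot axes should not be read as loading units.

```r
# Stand-in data (assumption): iris measurements, with Species as the groups
pca <- prcomp(iris[, 1:4], scale. = TRUE)
site_scores <- pca$x[, 1:2]        # row scores on PC1 and PC2
loadings <- pca$rotation[, 1:2]    # variable loadings on PC1 and PC2
groups <- iris$Species

# Mean and standard deviation of the scores within each group
grp_mean <- aggregate(site_scores, by = list(group = groups), FUN = mean)
grp_sd <- aggregate(site_scores, by = list(group = groups), FUN = sd)

# One constant for all loadings: longest arrow = largest absolute score
k <- max(abs(site_scores)) / max(sqrt(rowSums(loadings^2)))
loadings_scaled <- loadings * k

# Empty frame wide enough for both the scores and the rescaled arrows
plot(NA, xlim = range(site_scores[, 1], loadings_scaled[, 1]),
     ylim = range(site_scores[, 2], loadings_scaled[, 2]),
     xlab = "PC1", ylab = "PC2")

# Variables first, in grey, so they sit in the background
arrows(0, 0, loadings_scaled[, 1], loadings_scaled[, 2],
       col = "grey60", length = 0.08)
text(loadings_scaled[, 1], loadings_scaled[, 2],
     labels = rownames(loadings_scaled), col = "grey60", pos = 3)

# Group means with +/- 1 SD error bars in both directions
arrows(grp_mean$PC1 - grp_sd$PC1, grp_mean$PC2,
       grp_mean$PC1 + grp_sd$PC1, grp_mean$PC2,
       angle = 90, code = 3, length = 0.05)
arrows(grp_mean$PC1, grp_mean$PC2 - grp_sd$PC2,
       grp_mean$PC1, grp_mean$PC2 + grp_sd$PC2,
       angle = 90, code = 3, length = 0.05)
points(grp_mean$PC1, grp_mean$PC2, pch = 19)
text(grp_mean$PC1, grp_mean$PC2, labels = grp_mean$group, pos = 4)
```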

Re: How to detect and exclude outliers in R?

Eik Vettorazzi
In reply to this post by vikrant

library(fortunes)  # fortune() comes from the CRAN package "fortunes"
fortune("outlier")

vikrant wrote:
> Suppose I am reading data from a file and the data contains some outliers. I
> want to know if it is possible in R to automatically detect outliers in a
> dataset and remove them
>  

--
Eik Vettorazzi
Institut für Medizinische Biometrie und Epidemiologie
Universitätsklinikum Hamburg-Eppendorf

Martinistr. 52
20246 Hamburg

T ++49/40/7410-58243
F ++49/40/7410-57790

Re: How to detect and exclude outliers in R?

Guy Green
In reply to this post by vikrant
I had a similar problem. In my case, I had a large table of data and wanted to find and exclude a single huge value in one column (i.e., remove the entire row). There were thousands of rows of data, and this single value was more than 3x the next-largest value and at least 30x the typical value. I wanted to see what effect removing that one data point had, without having to change the underlying data.
 
This finds and removes that one value. I assume it could be repeated to remove more values based on predefined criteria:
 
First, load the "outliers" package.

library(outliers)
# "target_column" is a placeholder for the name of the column being checked
outlier_tf = outlier(data_full$target_column, logical = TRUE)
# This gives a logical vector that is FALSE everywhere except at the outlier (as defined in the package documentation: "Finds value with largest difference between it and sample mean, which can be an outlier"). That value is returned as TRUE.
find_outlier = which(outlier_tf)
# This finds the row number of the outlier, i.e. the position of the TRUE value within "outlier_tf".
data_new = data_full[-find_outlier, ]
# This creates a new dataset from the old data, removing the one row that contains the outlier.

Guy

vikrant wrote
Suppose I am reading data from a file and the data contains some outliers. I want to know if it is possible in R to automatically detect outliers in a dataset and remove them
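The outliers-package recipe above can also be tried in base R. The sketch below uses a made-up data frame (`data_full`, `id` and `target_column` are hypothetical names) and a helper that picks the value farthest from the column mean, which is the value outliers::outlier() reports, ties aside. To follow up on the idea of repeating the removal under a pre-set criterion, drop_outliers() repeats the single-row removal until the most extreme remaining value no longer has a robust z-score (distance from the median in units of mad()) above 3.5; that stopping rule is a hypothetical choice, not something the outliers package prescribes. The loop also checks for an empty index before using negative indexing, because `df[-integer(0), ]` returns zero rows rather than all of them.

```r
# Hypothetical data: 'target_column' stands in for the column being checked
data_full <- data.frame(
  id = 1:10,
  target_column = c(12, 15, 11, 14, 13, 16, 12, 900, 14, 250)
)

# Position of the value farthest from the column mean (NAs ignored); this is
# the value outliers::outlier() flags, ties aside
most_extreme <- function(x) which.max(abs(x - mean(x, na.rm = TRUE)))

# Single removal, equivalent in effect to the outliers-package recipe
data_new <- data_full[-most_extreme(data_full$target_column), ]

# Repeated removal with a hypothetical stopping rule: stop once the most
# extreme remaining value is within 'cutoff' robust z units of the median
drop_outliers <- function(df, col, cutoff = 3.5) {
  repeat {
    x <- df[[col]]
    i <- most_extreme(x)
    if (length(i) == 0) break  # nothing left to test (e.g. all NA)
    robust_z <- abs(x[i] - median(x, na.rm = TRUE)) / mad(x, na.rm = TRUE)
    if (is.na(robust_z) || robust_z <= cutoff) break
    df <- df[-i, , drop = FALSE]  # i is non-empty here, so -i is safe
  }
  df
}

data_trimmed <- drop_outliers(data_full, "target_column")
```

On this toy data the loop removes the rows holding 900 and then 250, and stops at 16, whose robust z-score is about 1.1.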