> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]]
> On Behalf Of vikrant
> Sent: Monday, January 18, 2010 10:09 PM
> To: [hidden email]
> Subject: [R] How to detect and exclude outliers in R?
> Suppose I am reading data from a file and the data contains some outliers.
> I want to know if it is possible in R to automatically detect outliers in a
> dataset and remove them.
You will need to provide more information. What is your definition of an outlier? And, why should those data be removed?
I have performed PCA using the function rda in vegan and then used plot(pcaobject). I have a couple of questions:
1) The default plot shows the individual sites (black) and the variables (red). What I want however is a plot showing the mean of site groups with bidirectional error bars displaying the standard deviation for those groups (with the variables still plotted in the background)...
2) ...I know how to do this by exporting the scores and loadings to Excel and then using Excel or SigmaPlot to draw the graphs. However, I then have an issue with the scaling of the loadings (i.e. the values are so small that they are bunched up at the origin), so here is my second question: can I multiply the loadings by a constant to display them in my plot and, if yes, what is the convention for doing this?
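For what it is worth, both parts can probably be done inside R without exporting anything. The following is only a rough sketch under assumptions: it uses base R's prcomp() and the built-in iris data as stand-ins for the rda() object and the site groups, and the names groups and arrow_const are made up for illustration. The multiplier is an arbitrary display choice here, not an established convention, so it should be reported alongside the plot.

```r
# Sketch only: prcomp() and iris stand in for the rda() result and the real sites.
pca    <- prcomp(iris[, 1:4], scale. = TRUE)
scores <- pca$x[, 1:2]      # site (row) scores on the first two axes
groups <- iris$Species      # hypothetical grouping factor for the sites

# Mean and standard deviation of the scores per group, on each axis
grp_mean <- apply(scores, 2, function(s) tapply(s, groups, mean))
grp_sd   <- apply(scores, 2, function(s) tapply(s, groups, sd))

# Loadings multiplied by an arbitrary constant so the arrows are visible
arrow_const <- 2            # assumption: chosen by eye
loads <- pca$rotation[, 1:2] * arrow_const

lims <- range(grp_mean - grp_sd, grp_mean + grp_sd, loads * 1.1)
plot(grp_mean, xlim = lims, ylim = lims, asp = 1, pch = 19,
     xlab = "PC1", ylab = "PC2")

# Horizontal, then vertical error bars: group mean +/- 1 SD
arrows(grp_mean[, 1] - grp_sd[, 1], grp_mean[, 2],
       grp_mean[, 1] + grp_sd[, 1], grp_mean[, 2],
       angle = 90, code = 3, length = 0.05)
arrows(grp_mean[, 1], grp_mean[, 2] - grp_sd[, 2],
       grp_mean[, 1], grp_mean[, 2] + grp_sd[, 2],
       angle = 90, code = 3, length = 0.05)

# Variables drawn in red, as in the default plot
arrows(0, 0, loads[, 1], loads[, 2], col = "red", length = 0.08)
text(loads * 1.1, labels = rownames(loads), col = "red")
```

With a vegan object, the site and variable scores would come from scores() on the rda() result rather than from pca$x and pca$rotation; the plotting steps should carry over unchanged.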
I had a similar problem. In my case, I had a large table of data and wanted to find and exclude a single huge value in one column (i.e. remove the entire row). There were thousands of rows of data, and this single value was more than 3x the next value, and at least 30x the typical value. I wanted to see what the effect of removing that one datapoint was, without having to change the underlying data.
This finds & removes that one value. I assume it could be repeated to get rid of more values based on pre-defined criteria:
First, load the "outliers" package:
library(outliers)
# Returns a logical vector that is FALSE everywhere except at the outlier, which the package documentation defines as the "value with largest difference between it and sample mean, which can be an outlier". Because "target column" contains a space, it is accessed with [[ ]] rather than $.
outlier_tf <- outlier(data_full[["target column"]], logical = TRUE)
# Row position of the TRUE value within outlier_tf.
find_outlier <- which(outlier_tf)
# New dataset with the outlier's entire row removed; data_full itself is left unchanged.
data_new <- data_full[-find_outlier, ]
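If several values need to go at once based on a pre-defined criterion, one possible approach is sketched below. It swaps in a different technique from the one above: the 1.5 * IQR boxplot rule in base R, not the outliers package. The data frame, the column name value and the 1.5 multiplier are assumptions for illustration, and whether this rule is a sensible definition of an outlier depends entirely on the data.

```r
# Hypothetical data: 'value' stands in for the target column.
set.seed(1)
data_full <- data.frame(id = 1:101,
                        value = c(rnorm(100, mean = 10), 300))

# Flag values more than 1.5 * IQR below the first or above the third quartile
q      <- quantile(data_full$value, c(0.25, 0.75))
fence  <- 1.5 * diff(q)
is_out <- data_full$value < q[1] - fence | data_full$value > q[2] + fence

# Logical indexing keeps every row when nothing is flagged, whereas
# data_full[-which(is_out), ] would return zero rows in that case.
data_new <- data_full[!is_out, ]
```

As before, data_full is left untouched, so results with and without the flagged rows can be compared directly.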