# removing outlier

## removing outlier

Hey,

I want to remove outliers, so I tried to do this:

    # 1 define mean and sd
    sd.AT_ZU_SPAET <- sd(AT_ZU_SPAET)
    mitt.AT_ZU_SPAET <- mean(AT_ZU_SPAET)
    #
    sd.Anzahl_BAF <- sd(Anzahl_BAF)
    mitt.Anzahl_BAF <- mean(Anzahl_BAF)
    #
    sd.Änderungsintervall <- sd(Änderungsintervall)
    mitt.Änderungsintervall <- mean(Änderungsintervall)
    #
    # 2 identify outliers
    DA[ abs(AT_ZU_SPAET - mitt.AT_ZU_SPAET) > (3 * sd.AT_ZU_SPAET), ]
    DA[ abs(Anzahl_BAF - mitt.Anzahl_BAF) > (3 * sd.Anzahl_BAF), ]
    DA[ abs(Änderungsintervall - mitt.Änderungsintervall) > (3 * sd.Änderungsintervall), ]
    #
    # 3 remove outliers
    AT_ZU_SPAET.clean <- DA[ (abs(AT_ZU_SPAET - mitt.AT_ZU_SPAET) < (3 * sd.AT_ZU_SPAET)), ]
    Anzahl_BAF.clean <- DA[ (abs(Anzahl_BAF - mitt.Anzahl_BAF) < (3 * sd.Anzahl_BAF)), ]
    Änderungsintervall.clean <- DA[ (abs(Änderungsintervall - mitt.Änderungsintervall) < (3 * sd.Änderungsintervall)), ]

My problem is that I am only able to remove the outliers of one column of my table, but I want to remove the outliers of every column of the table.

Could anybody help me?

## Re: removing outlier

 Hi Juli,

What you can do is to make your outlier remover into a function like this:

remove_outlier_by_sd<-function(x,nsd=3) {
 meanx<-mean(x,na.rm=TRUE)
 sdx<-sd(x,na.rm=TRUE)
 return(x[abs(x-meanx) < nsd*sdx])  # keep values less than nsd standard deviations from the mean
}

Then apply the function to your data frame ("table")

newDA<-sapply(DA,remove_outlier_by_sd)

newDA will be a list, as it is likely that its elements will be of
different lengths. You may be told that you really shouldn't remove
outliers and should learn to love them instead, but I will leave that
to others.
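
A minimal sketch of how this behaves on a small made-up data frame. The data and the names `toy` and `cleaned` are assumptions for illustration, not Juli's data. The function is repeated here so the block runs on its own, with the mean stored in `meanx` and used under that name. The sketch uses `lapply()` so the result is always a list; `sapply()` would simplify to a matrix if every column happened to keep the same number of values.

```r
remove_outlier_by_sd <- function(x, nsd = 3) {
  meanx <- mean(x, na.rm = TRUE)
  sdx <- sd(x, na.rm = TRUE)
  # keep values less than nsd standard deviations away from the mean
  x[abs(x - meanx) < nsd * sdx]
}

# Hypothetical toy data: column a has one extreme value, column b has none
toy <- data.frame(a = c(rep(10, 19), 1000),
                  b = seq_len(20))

cleaned <- lapply(toy, remove_outlier_by_sd)

# Number of values left per column: a keeps 19, b keeps all 20
lengths(cleaned)
```

Because the two elements now differ in length, they can no longer be put back into one data frame row by row, which is the issue raised in the follow-up.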

Jim

On Sat, Sep 12, 2015 at 12:15 AM, Juli <[hidden email]> wrote:

> Hey,
>
> I want to remove outliers, so I tried to do this:
>
> # 1 define mean and sd
> sd.AT_ZU_SPAET <- sd(AT_ZU_SPAET)
> mitt.AT_ZU_SPAET <- mean(AT_ZU_SPAET)
> #
> sd.Anzahl_BAF <- sd(Anzahl_BAF)
> mitt.Anzahl_BAF <- mean(Anzahl_BAF)
> #
> sd.Änderungsintervall <- sd(Änderungsintervall)
> mitt.Änderungsintervall <- mean(Änderungsintervall)
> #
> # 2 identify outliers
> DA[ abs(AT_ZU_SPAET - mitt.AT_ZU_SPAET) > ( 3 * sd.AT_ZU_SPAET)  , ]
> DA[ abs(Anzahl_BAF - mitt.Anzahl_BAF) > ( 3 * sd.Anzahl_BAF)  , ]
> DA[ abs(Änderungsintervall - mitt.Änderungsintervall) > ( 3 *
> sd.Änderungsintervall)  , ]
> #
> # 3 remove outliers
> AT_ZU_SPAET.clean <- DA[ (abs(AT_ZU_SPAET - mitt.AT_ZU_SPAET) <
> (3*sd.AT_ZU_SPAET)), ]
> Anzahl_BAF.clean <- DA[ (abs(Anzahl_BAF - mitt.Anzahl_BAF) <
> (3*sd.Anzahl_BAF)), ]
> Änderungsintervall.clean <- DA[ (abs(Änderungsintervall -
> mitt.Änderungsintervall) <
> (3*sd.Änderungsintervall)), ]
>
> My problem is that I am only able to remove the outliers of one column of
> my table, but I want to remove the outliers of every column of the table.
>
> Could anybody help me?
## Re: removing outlier

Hi Jim,

thank you for your help. :)

My point is that there are outliers and I don't really know how to deal with them. I need the data frame for a regression, and I have often read that only a few outliers can change the results very much. In addition, the regression diagnostics didn't give me the best results.

Yes, and I know it's not the core of statistics to work in a way that gets you the results you would like to have ;). So what is your suggestion?

And if I remove the outliers, my problem is that, as you said, the columns differ in length. I need the data frame for a regression, so can I remove the whole column, or is there a call to exclude the data?

Juli
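
One possible way to keep the data rectangular for a regression, sketched here under assumptions rather than taken from the thread: flag outliers per column, then drop every row that has an outlier in any column. The helper name `is_outlier_by_sd` and the toy data are hypothetical, and the sketch assumes all columns are numeric.

```r
# Hypothetical helper: TRUE where a value lies nsd or more standard
# deviations away from the column mean
is_outlier_by_sd <- function(x, nsd = 3) {
  abs(x - mean(x, na.rm = TRUE)) >= nsd * sd(x, na.rm = TRUE)
}

# Made-up data: y has an extreme value in row 20, x1 in row 1
toy <- data.frame(y  = c(1:19, 500),
                  x1 = c(-400, 2:20))

# Logical matrix with one column of flags per variable
flags <- sapply(toy, is_outlier_by_sd)

# Keep only the rows in which no variable is flagged
keep <- rowSums(flags, na.rm = TRUE) == 0
toy_clean <- toy[keep, ]

nrow(toy_clean)  # 18 rows remain: rows 1 and 20 are dropped
```

The cleaned data frame can then be passed to `lm()` through its `data` argument as usual. An alternative with the same effect would be to set the flagged values to `NA`, since `lm()` drops incomplete rows under the default `na.action` setting. Whether removing the points is appropriate at all is a separate statistical question, as Jim hints.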
## Re: removing outlier

 If this mailing list accepted formatted submissions I would have used the trèsModernSarcastic font for my first sentence. Failing the availability of that mode of communication I am (top) posting through Nabble (perhaps)  in "Comic Sans".