# removing outlier

7 messages

## removing outlier

Hey,

I want to remove outliers, so I tried to do this:

    # 1 define mean and sd
    sd.AT_ZU_SPAET <- sd(AT_ZU_SPAET)
    mitt.AT_ZU_SPAET <- mean(AT_ZU_SPAET)
    #
    sd.Anzahl_BAF <- sd(Anzahl_BAF)
    mitt.Anzahl_BAF <- mean(Anzahl_BAF)
    #
    sd.Änderungsintervall <- sd(Änderungsintervall)
    mitt.Änderungsintervall <- mean(Änderungsintervall)
    #
    # 2 identify outliers
    DA[ abs(AT_ZU_SPAET - mitt.AT_ZU_SPAET) > (3 * sd.AT_ZU_SPAET), ]
    DA[ abs(Anzahl_BAF - mitt.Anzahl_BAF) > (3 * sd.Anzahl_BAF), ]
    DA[ abs(Änderungsintervall - mitt.Änderungsintervall) > (3 * sd.Änderungsintervall), ]
    #
    # 3 remove outliers
    AT_ZU_SPAET.clean <- DA[ (abs(AT_ZU_SPAET - mitt.AT_ZU_SPAET) < (3 * sd.AT_ZU_SPAET)), ]
    Anzahl_BAF.clean <- DA[ (abs(Anzahl_BAF - mitt.Anzahl_BAF) < (3 * sd.Anzahl_BAF)), ]
    Änderungsintervall.clean <- DA[ (abs(Änderungsintervall - mitt.Änderungsintervall) < (3 * sd.Änderungsintervall)), ]

My problem is that I am only able to remove the outliers of one column of my table, but I want to remove the outliers of every column of the table.

Could anybody help me?

## Re: removing outlier

Hi Juli,

What you can do is to make your outlier remover into a function like this:

    remove_outlier_by_sd <- function(x, nsd = 3) {
      meanx <- mean(x, na.rm = TRUE)
      sdx <- sd(x, na.rm = TRUE)
      # keep only the values within nsd standard deviations of the mean
      return(x[abs(x - meanx) < nsd * sdx])
    }

Then apply the function to your data frame ("table"):

    newDA <- sapply(DA, remove_outlier_by_sd)

newDA will be a list, as it is likely that its elements will be of different lengths. You may be told that you really shouldn't remove outliers and should learn to love them, but I will leave that to others.

Jim

On Sat, Sep 12, 2015 at 12:15 AM, Juli <[hidden email]> wrote:
> [...]
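To make the "different lengths" point concrete, here is a minimal sketch that runs the function from the reply on a small made-up data frame. The data frame `toy` and its columns `a` and `b` are hypothetical and not from the thread; `lapply` is used in place of `sapply` only so that the result is always a list, whatever the element lengths turn out to be.

```r
# Function from the reply above, with the mean stored in `meanx`.
remove_outlier_by_sd <- function(x, nsd = 3) {
  meanx <- mean(x, na.rm = TRUE)
  sdx <- sd(x, na.rm = TRUE)
  x[abs(x - meanx) < nsd * sdx]
}

# Hypothetical toy data: column `a` holds one extreme value, column `b` none.
toy <- data.frame(
  a = c(rep(10, 19), 1000),
  b = 1:20
)

# Each column is filtered independently, so the results can differ in length.
cleaned <- lapply(toy, remove_outlier_by_sd)

# Number of values kept per column.
lengths(cleaned)
```

Because each column loses a different number of values, the result can no longer be stored as a rectangular data frame, which matters if the data are needed for a regression.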

## Re: removing outlier

Hi Jim,

thank you for your help. :)

My point is that there are outliers and I don't really know how to deal with them. I need the data frame for a regression, and I have often read that even a few outliers can change the results considerably. In addition, the regression diagnostics didn't give me the best results. Yes, and I know it's not the core of statistics to work in a way that gets you the results you would like to have ;).

So what is your suggestion?

And if I remove the outliers, my problem is that, as you said, the columns differ in length. I need the data frame for a regression, so can I remove the whole column or is there a call to exclude the data?

JULI
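One possible way to keep the data rectangular, sketched here under assumptions rather than taken from the thread, is to drop whole rows in which any column lies more than three standard deviations from its column mean. The helper `within_sd`, the data frame `toy` and its columns `x` and `y` are hypothetical; all columns are assumed to be numeric, and missing values are treated as "not an outlier". Whether outliers should be removed at all is a separate question that this sketch does not settle.

```r
# TRUE for values within `nsd` standard deviations of the column mean.
# Assumption: missing values are kept rather than flagged as outliers.
within_sd <- function(x, nsd = 3) {
  ok <- abs(x - mean(x, na.rm = TRUE)) < nsd * sd(x, na.rm = TRUE)
  ok | is.na(x)
}

# Hypothetical toy data standing in for DA; row 20 is extreme in `x`.
toy <- data.frame(
  x = c(1:19, 1000),
  y = 2 * (1:20) + rep(c(0.5, -0.5), 10)
)

# One logical flag per row: TRUE only if every column is within bounds.
keep <- Reduce(`&`, lapply(toy, within_sd))

# Drop the flagged rows; all columns keep the same length.
toy_clean <- toy[keep, , drop = FALSE]

# Alternatively, leave the data frame untouched and let lm() exclude the rows.
fit <- lm(y ~ x, data = toy, subset = keep)
```

The `subset` argument of `lm()` leaves the original data frame intact, so the excluded rows stay available for inspection.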

## Re: removing outlier


## Re: removing outlier


## Re: removing outlier

If this mailing list accepted formatted submissions I would have used the trèsModernSarcastic font for my first sentence. Failing the availability of that mode of communication I am (top) posting through Nabble (perhaps) in "Comic Sans".