detect multivariate outliers with aq.plot {mvoutlier} high dimensions


monaR
Hi,
I have a species abundance data set, CommData, with n = 40 samples and p = 107 species:
Sample     Species A  Species B  Species C  Species D  ….
411_2010          40         20          0          0
412_2010          30         20          0          0
413_2010           0          0          0          0
414_2010           0         10          0          0
415_2010          20          0          0          0
418_2010           0          0          0          0
419_2010           0          0          0          0
421_2010         160         40          0         10
….

I am trying to find outliers based on the Mahalanobis distance with the package {mvoutlier}. Calling aq.plot(CommData) gives the error "Error in covMcd(x, alpha = quan) : n <= p -- you can't be serious!"
So I tried pcout(CommData), which is supposed to work in high dimensions, but it fails with the error "More than 50% equal values in one or more variables!"

Can this be fixed? Any idea how I can find outliers in my multidimensional data?
Thanks a lot for any help!!
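
For reference, the second error can be explored on the excerpt above. The following is only a rough diagnostic in base R, assuming that the message refers to columns in which a single value (here mostly 0) makes up more than half of the observations. The column names SpeciesA to SpeciesD are placeholders for the real species names, and the MAD check at the end is a guess at what triggers the message, not something confirmed from the package source.

```r
# Excerpt of CommData as posted (8 of 40 samples, 4 of 107 species)
CommData <- data.frame(
  SpeciesA = c(40, 30, 0, 0, 20, 0, 0, 160),
  SpeciesB = c(20, 20, 0, 10, 0, 0, 0, 40),
  SpeciesC = c(0, 0, 0, 0, 0, 0, 0, 0),
  SpeciesD = c(0, 0, 0, 0, 0, 0, 0, 10),
  row.names = c("411_2010", "412_2010", "413_2010", "414_2010",
                "415_2010", "418_2010", "419_2010", "421_2010")
)

# Share of the most frequent value in each column
share_modal <- sapply(CommData, function(x) max(table(x)) / length(x))

# Columns in which more than half of the values are identical
dominated <- names(share_modal)[share_modal > 0.5]
print(round(share_modal, 3))
print(dominated)

# Assumption: a robust scale estimate such as the MAD is zero for such columns
zero_mad <- names(CommData)[sapply(CommData, mad) == 0]
print(zero_mad)
```

With many rare species, a large part of the 107 columns is likely to be flagged this way, so dropping or aggregating very rare species before calling pcout() may be worth trying.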
Re: detect multivariate outliers with aq.plot {mvoutlier} high dimensions

Mark Difford
On Jul 31, 2013 @ 12:45pm MonaR wrote:

> I try to find outliers based on the Mahalanobis distance with the package {mvoutlier}. I get an error using
> aq.plot(CommData): "Error in covMcd(x, alpha = quan) : n <= p -- you can't be serious!"

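Basically, you don't have enough cases/samples to get even close to doing this: with n = 40 samples and p = 107 species, the covariance matrix that the Mahalanobis distance relies on is singular, which is what the covMcd error is complaining about. If I were you I would look at partial least squares methods, which are designed to handle many variables and few cases/samples. That is, if you have to remove outliers at all, which usually is not advised.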
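
To illustrate why n <= p is a problem, here is a minimal sketch in base R. It uses simulated data with the same shape as CommData (random normal values, not real abundances), and the tolerance 1e-8 is an arbitrary choice for counting non-zero eigenvalues.

```r
set.seed(1)                        # reproducible toy data
n <- 40                            # samples, as in CommData
p <- 107                           # species, as in CommData
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

S <- cov(X)                        # p x p sample covariance matrix

# Numerical rank of S: number of eigenvalues clearly above zero
ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
rank_S <- sum(ev > 1e-8)

print(dim(S))                      # 107 x 107
print(rank_S)                      # cannot exceed n - 1
print(rank_S < p)                  # TRUE means S is singular
```

After centring, n samples span at most n - 1 dimensions, so the rank of S cannot reach p. A singular S has no inverse, and the Mahalanobis distance is defined through that inverse.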

Regards, Mark.
Mark Difford (Ph.D.)
Research Associate
Botany Department
Nelson Mandela Metropolitan University
Port Elizabeth, South Africa
Re: detect multivariate outliers with aq.plot {mvoutlier} high dimensions

monaR
OK, thanks :) Maybe for my purpose a visual evaluation is also fine. But in general I imagine it happens quite often in ecological data that people have, e.g., more species than sites.
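
One possible way to do such a visual check is a plain PCA score plot, sketched below in base R on the excerpt from the first post. The log1p transform, dropping constant species, and the use of prcomp() are assumptions made for illustration, not something recommended in this thread, and SpeciesA to SpeciesD are placeholder names.

```r
# First eight samples and four species from the posted excerpt
X <- matrix(
  c(40, 30, 0, 0, 20, 0, 0, 160,   # Species A
    20, 20, 0, 10, 0, 0, 0, 40,    # Species B
    0, 0, 0, 0, 0, 0, 0, 0,        # Species C
    0, 0, 0, 0, 0, 0, 0, 10),      # Species D
  ncol = 4,
  dimnames = list(
    c("411_2010", "412_2010", "413_2010", "414_2010",
      "415_2010", "418_2010", "419_2010", "421_2010"),
    c("SpeciesA", "SpeciesB", "SpeciesC", "SpeciesD")
  )
)

X <- log1p(X)                      # dampen the influence of large counts
keep <- apply(X, 2, var) > 0       # species that vary across samples
X <- X[, keep, drop = FALSE]

pca <- prcomp(X, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]             # sample scores on the first two axes

# Label each sample so isolated points can be identified by eye
plot(scores, type = "n", xlab = "PC1", ylab = "PC2")
text(scores, labels = rownames(scores))
```

Samples that sit far away from the main cloud in such a plot are candidates for a closer look. Unlike the covariance-based methods, prcomp() itself runs when there are more species than sites.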