During data collection and analysis, it is often necessary to identify, and possibly remove, outliers. An objective method for identifying the outliers to be removed is often critical. Many automated outlier detection methods exist; however, many are limited by distributional assumptions or require pre-defined upper and lower boundaries within which the data should fall. If the distribution of the data is known, that distribution can aid in finding outliers. Often, however, the distribution is not known, or the experimenter does not want to assume a particular one. In addition, there may not be enough information about a data set to determine reliable upper and lower boundaries. For these cases, an outlier detection method was developed that uses the empirical data and is based upon Chebyshev’s inequality. This method also allows multiple outliers to be detected at once, not just one at a time.
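To illustrate the underlying idea, the following is a minimal sketch of a distribution-free outlier check built directly on Chebyshev’s inequality, which states that for any distribution with finite variance, P(|X − μ| ≥ kσ) ≤ 1/k². It is not the procedure from the paper, whose details are not given in this abstract. The function name `chebyshev_outliers`, the tail-probability parameter `p`, and the sample data are assumptions made for illustration only.

```python
import math
import statistics


def chebyshev_outliers(values, p=0.1):
    """Flag values outside mean +/- k*std, with k chosen so that 1/k**2 == p.

    Hypothetical helper: a single-pass illustration of Chebyshev's
    inequality, not the method described in the cited paper.
    """
    k = 1.0 / math.sqrt(p)            # Chebyshev: P(|X - mu| >= k*sigma) <= 1/k^2 = p
    mean = statistics.fmean(values)   # empirical mean
    std = statistics.stdev(values)    # empirical (sample) standard deviation
    lower, upper = mean - k * std, mean + k * std
    return [v for v in values if v < lower or v > upper]


# Made-up sample data: 20 readings near 10, plus one extreme value.
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9,
         10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 10.0]
data = clean + [100.0]

print(chebyshev_outliers(data, p=0.1))
```

Because the mean and standard deviation in this sketch are computed from data that still contain the outliers, extreme values inflate the bounds and can mask one another, so a very small `p` may flag nothing in a small sample. Any practical use would need to address that limitation.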
Revised: February 1, 2006
Published: May 12, 2005
Citation
Amidan B.G., T.A. Ferryman, and S.K. Cooley. 2005. Data Outlier Detection using the Chebyshev Theorem. In 2005 IEEE Aerospace Conference, 1-6. Manhattan Beach, California: IEEE Conference Publications. PNWD-SA-6701.