Peer-Reviewed

Discordancy in Reduced Dimensions of Outliers in High-Dimensional Datasets: Application of an Updating Formula

Published: 2 April 2013
Abstract

In multivariate outlier studies, the sum of squares and cross-products (SSCP) matrix is an important property of the data matrix. For example, the widely used Mahalanobis distance and Wilks' ratio both make use of SSCP matrices. One of the SSCP matrices involved in outlier studies is the matrix for the set of multiple outliers in the data. In this paper, an explicit expression for this matrix is derived. It is then shown that, in general, the discordancy of multiple outliers is preserved along Multiple-Outlier Displaying Components, which have much lower dimension than the original high-dimensional dataset.
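
To illustrate how the Mahalanobis distance and Wilks' ratio both rest on SSCP matrices, the following is a minimal sketch for the single-outlier case only. It is not the explicit multiple-outlier expression derived in the paper. It assumes the standard rank-one update A_(i) = A - (n/(n-1)) d d', where A is the full-data SSCP matrix and d is the deviation of observation i from the sample mean, which gives |A_(i)|/|A| = 1 - (n/(n-1)) d' A^{-1} d. The function names and the bivariate toy data are hypothetical and chosen for illustration.

```python
# Sketch only: single suspected outlier, bivariate data, standard library only.
# Function names and data are illustrative assumptions, not taken from the paper.

def mean(rows):
    """Column means of a list of observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def sscp(rows):
    """Sum of squares and cross-products matrix about the sample mean."""
    m = mean(rows)
    p = len(m)
    a = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - m[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                a[i][j] += d[i] * d[j]
    return a

def det2(a):
    """Determinant of a 2 x 2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inv2(a):
    """Inverse of a 2 x 2 matrix."""
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

def wilks_ratio_direct(rows, i):
    """|A_(i)| / |A|, recomputing the SSCP matrix without observation i."""
    rest = rows[:i] + rows[i + 1:]
    return det2(sscp(rest)) / det2(sscp(rows))

def wilks_ratio_updated(rows, i):
    """Same ratio from the full-data SSCP matrix alone (rank-one update)."""
    n = len(rows)
    m = mean(rows)
    a_inv = inv2(sscp(rows))
    d = [rows[i][j] - m[j] for j in range(2)]
    # Quadratic form d' A^{-1} d; this is the squared Mahalanobis
    # distance of observation i divided by (n - 1).
    q = sum(d[r] * a_inv[r][c] * d[c] for r in range(2) for c in range(2))
    return 1.0 - n / (n - 1) * q

# Made-up toy data: five roughly collinear points and one distant point.
data = [(2.0, 3.1), (2.5, 3.9), (3.1, 4.8), (3.6, 5.2), (4.0, 6.1), (9.0, 1.0)]
ratios = [wilks_ratio_updated(data, i) for i in range(len(data))]
# The smallest ratio marks the most discordant observation.
suspect = min(range(len(data)), key=lambda i: ratios[i])
```

The updated form needs only the full-data SSCP matrix, so every observation can be screened without recomputing the matrix for each deletion. Small ratios indicate discordancy.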

Published in American Journal of Theoretical and Applied Statistics (Volume 2, Issue 2)
DOI 10.11648/j.ajtas.20130202.14
Page(s) 29-37
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2013. Published by Science Publishing Group

Keywords

Outlier Detection, Discordancy, Updating Formula, Outlier Displaying Components

Cite This Article
  • APA Style

    B. K. Nkansah, B. K. Gordor. (2013). Discordancy in Reduced Dimensions of Outliers in High-Dimensional Datasets: Application of an Updating Formula. American Journal of Theoretical and Applied Statistics, 2(2), 29-37. https://doi.org/10.11648/j.ajtas.20130202.14

Author Information
  • B. K. Nkansah, Department of Mathematics and Statistics, Cape Coast, Ghana

  • B. K. Gordor, Department of Mathematics and Statistics, Cape Coast, Ghana
