LU Zhimao, LIU Chen, ZHANG Qi, Massinanke Sambourou, FAN Dongmei. Super Large Data Sets Clustering by Means Radial Compression[J]. Chinese Journal of Electronics, 2013, 22(2): 335-340.

Super Large Data Sets Clustering by Means Radial Compression

Funds: This work is supported in part by the National Natural Science Foundation of China (No.60603092, No.60903082, No.60975042) and the Research Fund for the Doctoral Program of Higher Education of China (No.20070217043).
  • Received Date: 2012-06-01
  • Revised Date: 2012-06-01
  • Publish Date: 2013-04-25

Abstract: Clustering analysis is an effective technique for exploratory data analysis and has been widely applied to a variety of tasks. Many classical clustering algorithms perform well when their prerequisites are met, but few of them scale to very large data sets (VLDS). In this study, a novel means radial compression clustering method is proposed to handle VLDS. First, the concept of means radial compression is defined to describe the theoretical model. Next, mean merging is defined, and it is proved that mean merging is an efficient way to implement means radial compression. Then, each member is assigned to a suitable cluster according to the minimum distance between that member and the cluster centers found by means radial compression clustering. Experimental results show that the means radial compression algorithm, with a time complexity of O(n), can produce better solutions than well-known clustering algorithms such as K-means clustering, affinity propagation clustering and hierarchical clustering.
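The final assignment step described in the abstract (each member goes to the cluster whose center is nearest) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: Euclidean distance is assumed, and `merge_means` is a hypothetical helper that shows only the standard count-weighted way to combine two group means into the mean of their union. The paper's own definition of mean merging is not reproduced on this page.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def merge_means(m1: Sequence[float], n1: int,
                m2: Sequence[float], n2: int) -> Tuple[Point, int]:
    """Hypothetical helper: combine the means of two disjoint groups
    (sizes n1 and n2) into the mean of their union, weighting by count.
    Not the paper's exact mean-merging rule, which is not given here."""
    if len(m1) != len(m2):
        raise ValueError("means must have the same dimension")
    total = n1 + n2
    if total <= 0:
        raise ValueError("group sizes must sum to a positive count")
    merged = tuple((n1 * a + n2 * b) / total for a, b in zip(m1, m2))
    return merged, total


def assign_to_nearest(points: Sequence[Sequence[float]],
                      centers: Sequence[Sequence[float]]) -> List[int]:
    """Return, for each point, the index of its nearest center.
    Assumes Euclidean distance; ties go to the lowest center index.
    Cost is O(n * k) for n points and k centers."""
    labels = []
    for p in points:
        best = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        labels.append(best)
    return labels


if __name__ == "__main__":
    centers = [(0.0, 0.0), (10.0, 10.0)]
    points = [(1.0, 0.5), (9.0, 11.0), (0.2, -0.3)]
    print(assign_to_nearest(points, centers))          # [0, 1, 0]
    print(merge_means((0.0, 0.0), 2, (3.0, 6.0), 1))   # ((1.0, 2.0), 3)
```

For a fixed number of centers k, the assignment pass is linear in the number of members n, since each member is compared once against every center.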