WANG Shuliang, WANG Dakui, LI Caoyuan, LI Yan, DING Gangyi. Clustering by Fast Search and Find of Density Peaks with Data Field[J]. Chinese Journal of Electronics, 2016, 25(3): 397-402. doi: 10.1049/cje.2016.05.001

Clustering by Fast Search and Find of Density Peaks with Data Field

doi: 10.1049/cje.2016.05.001
Funds:  This work is supported by the National Natural Science Foundation of China (No.61173061, No.61472039, No.71201120) and the Doctoral Fund of Higher Education (No.20121101110036).
  • Corresponding author: WANG Dakui is a Ph.D. candidate at Wuhan University in China. His research interests include data field and data mining. (Email: dkwang2013@whu.edu.cn)
  • Received Date: 2015-04-08
  • Rev Recd Date: 2015-06-03
  • Publish Date: 2016-05-10
  • The clustering algorithm "Clustering by fast search and find of density peaks" finds cluster centers quickly. Its accuracy, however, depends heavily on a threshold, and no efficient way was given to select a suitable value: the threshold was suggested to be estimated from empirical experience. A new way is proposed to extract the optimal threshold automatically from the original dataset by using the potential entropy of the data field. For any dataset to be clustered, the threshold can then be calculated objectively from the data instead of being estimated empirically. Comparative experiments show that the algorithm obtains better clustering results with the threshold derived from the data field than with the threshold chosen from empirical experience.
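
The idea of choosing a parameter by minimizing the potential entropy of a data field can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the potential function, so a Gaussian-like form with impact factor `sigma` is assumed here, and the names `potential_entropy` and `select_sigma` are hypothetical.

```python
import numpy as np


def potential_entropy(points, sigma):
    """Shannon entropy of the normalized data-field potentials for one sigma."""
    # Pairwise squared Euclidean distances between all points.
    diffs = points[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diffs ** 2, axis=-1)
    # Assumed Gaussian-like potential: each point sums the influence of all points
    # (including itself, so every potential is at least 1).
    potential = np.exp(-sq_dist / sigma ** 2).sum(axis=1)
    p = potential / potential.sum()
    return float(-np.sum(p * np.log(p)))


def select_sigma(points, candidates):
    """Return the candidate sigma with the smallest potential entropy."""
    entropies = [potential_entropy(points, s) for s in candidates]
    return candidates[int(np.argmin(entropies))], entropies


# Toy data: two Gaussian blobs of different size (purely illustrative).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(60, 2)),
    rng.normal(3.0, 0.3, size=(20, 2)),
])
candidates = np.linspace(0.05, 5.0, 50)
best_sigma, entropies = select_sigma(X, candidates)
print("selected sigma:", best_sigma)
```

With a very small or very large `sigma` all potentials become nearly equal and the entropy approaches its maximum, log n, so the minimum lies at an intermediate scale. How the selected `sigma` is converted into the clustering threshold is not stated in the abstract and is left out of this sketch.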
  • A. Rodriguez and A. Laio, "Clustering by fast search and find of density peaks", Science, Vol.344, No.6191, pp.1492-1496, 2014.
    United Nations Global Pulse, Big Data for Development: Challenges & Opportunities, http://unglobalpulse.org/, 2012.
    C. Seife, "Big data: The revolution is digitized", Nature, Vol.518, pp.480-481, 2014.
    L. Einav and J. Levin, "Economics in the age of big data", Science, Vol.346, No.6210, pp.715, 2014.
    E.E. Schadt, M.D. Linderman, J. Sorenson, L. Lee and G.P. Nolan, "Computational solutions to large-scale data management and analysis", Nature Reviews Genetics, Vol.11, pp.647-657, 2010.
    S.L. Wang, W.Y. Gan, D.Y. Li and D.R. Li, "Data field for hierarchical clustering", International Journal of Data Warehousing and Mining, Vol.7, No.2, pp.43-63, 2011.
    A. Rajaraman and J.D. Ullman, Mining of Massive Datasets, Cambridge University Press, London, UK, 2011.
    R. Xu and D. Wunsch, "Survey of clustering algorithms", IEEE Transactions on Neural Networks, Vol.16, No.3, pp.645-678, 2005.
    C.C. Aggarwal and C.K. Reddy, Data Clustering: Algorithms and Applications, CRC Press, New York, USA, 2014.
    D.R. Li, S.L. Wang, D.Y. Li, Spatial Data Mining Theories and Applications (second edition), Science Press, Beijing, China, 2013.
    R. Ng and J. Han, "CLARANS: A method for clustering objects for spatial data mining", IEEE Transactions on Knowledge and Data Engineering, Vol.14, No.5, pp.1003-1016, 2002.
    G. Karypis, et al., "Chameleon: Hierarchical clustering using dynamic modeling", Computer, Vol.32, No.8, pp.68-75, 1999.
    M. Ester, et al., "A density-based algorithm for discovering clusters in large spatial databases with noise", Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, pp.226-231, 1996.
    F. Murtagh and P. Contreras, "Algorithms for hierarchical clustering: An overview", WIREs Data Mining and Knowledge Discovery, Vol.2, No.1, pp.86-97, 2012.
    H.N. Yuan, S.L. Wang, Y. Li, and J.H. Fan, "Feature selection with data field", Chinese Journal of Electronics, Vol.23, No.4, pp.661-665, 2014.
    G. Sheikholeslami, et al., "WaveCluster: A multi-resolution clustering approach for very large spatial databases", Proceedings of the 24th International Conference on Very Large Databases, pp.428-439, 1998.
    G.J. McLachlan and T. Krishnan, EM Algorithm and Extensions, Wiley, New York, 1997.
    T. Zhang, "BIRCH: An efficient data clustering method for very large databases", Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pp.103-114, 1996.
    R.K. Zalik, "An efficient k-means clustering algorithm", Pattern Recognition Letters, Vol.29, pp.1385-1391, 2008.
    S.L. Wang and Y. Chen, "HASTA: A hierarchical-grid clustering algorithm with data field", International Journal of Data Warehousing and Mining, Vol.10, No.2, pp.39-54, 2014.
    S.L. Wang, J.H. Fan, M. Fang, and H.N. Yuan, "HGCUDF: Hierarchical grid clustering using data field", Chinese Journal of Electronics, Vol.23, No.1, pp.37-42, 2014.
    S.L. Wang, Y. Li, W. Tu and P. Wang, "Automatic quantitative analysis and localization of protein expression with GDF", International Journal of Data Mining and Bioinformatics, Vol.10, No.3, pp.300-314, 2014.
    S.L. Wang and H.N. Yuan, "Spatial data mining: A perspective of big data", International Journal of Data Warehousing and Mining, Vol.10, No.4, pp.50-70, 2014.
    I. Bárány and V. Vu, "Central limit theorems for Gaussian polytopes", Annals of Probability (Institute of Mathematical Statistics), Vol.35, No.4, pp.1593-1621, 2007.
    S.L. Wang, D.K. Wang, C.Y. Li, Y. Li, "Comment on Clustering by fast search and find of density peaks", arXiv: CoRR abs/1501.04267, 2015.