Volume 32 Issue 6
Nov.  2023
Turn off MathJax
Article Contents
ZHANG Chunhao, XIE Bin, ZHANG Yiran, “Reverse-Nearest-Neighbor-Based Clustering by Fast Search and Find of Density Peaks,” Chinese Journal of Electronics, vol. 32, no. 6, pp. 1341-1354, 2023, doi: 10.23919/cje.2022.00.165
Citation: ZHANG Chunhao, XIE Bin, ZHANG Yiran, “Reverse-Nearest-Neighbor-Based Clustering by Fast Search and Find of Density Peaks,” Chinese Journal of Electronics, vol. 32, no. 6, pp. 1341-1354, 2023, doi: 10.23919/cje.2022.00.165

Reverse-Nearest-Neighbor-Based Clustering by Fast Search and Find of Density Peaks

doi: 10.23919/cje.2022.00.165
Funds:  This work was supported by the National Natural Science Foundation of China (62076088) and the Technological Innovation Foundation of Hebei Normal University (L2020K09)
More Information
  • Author Bio:

    Chunhao ZHANG received the B.S. degree in information and computational science from Hebei University of Engineering in 2019. He is now an M.S. candidate of Hebei Normal University. His research interests include machine learning and data mining. (Email: zhangchunhao_hebtu@163.com)

    Bin XIE (corresponding author) received the B.S. degree in computational mathematics from Jilin University in 1998. He received the M.S. degree in computational mathematics from Jilin University in 2004. He received the Ph.D. degree in applied mathematics from Hebei Normal University in 2011. His research interests include granular computing, machine learning and approximate reasoning. (Email: xiebin_hebtu@126.com)

    Yiran ZHANG received the B.E. degree in software engineering from Hebei Normal University in 2019. She is now an M.S. candidate of Hebei Normal University. Her research interests include machine learning and data mining. (Email: zhangyiran19@163.com)

  • Received Date: 2022-06-08
  • Accepted Date: 2022-09-26
  • Available Online: 2022-10-09
  • Publish Date: 2023-11-05
  • Clustering by fast search and find of density peaks (CFSFDP) has the advantages of a novel idea, easy implementation, and efficient clustering. It has been widely recognized in various fields since it was proposed in Science in 2014. The CFSFDP algorithm also has certain limitations, such as non-unified sample density metrics defined by cutoff distance, the domino effect for the assignment of remaining samples triggered by unstable assignment strategy, and the phenomenon of picking wrong density peaks as cluster centers. We propose reverse-nearest-neighbor-based clustering by fast search and find of density peaks (RNN-CFSFDP) to avoid these shortcomings. We redesign and unify the sample density metric by introducing reverse nearest neighbor. The newly defined local density metric and the K-nearest neighbors of each sample are combined to make the assignment process more robust and alleviate the domino effect. A cluster fusion algorithm is proposed, which further alleviates the domino effect and effectively avoids the phenomenon of picking wrong density peaks as cluster centers. Experimental results on publicly available synthetic data sets and real-world data sets show that in most cases, the proposed algorithm is superior to or at least equivalent to the comparative methods in clustering performance. The proposed algorithm works better on manifold data sets and uneven density data sets.
  • loading
  • [1]
    Z. W. Gu, P. Li, X. Lang, et al., “A multi-granularity density peak clustering algorithm based on variational mode decomposition,” Chinese Journal of Electronics, vol.30, no.4, pp.658–668, 2021. doi: 10.1049/cje.2021.03.001
    [2]
    Y. Oktar and M. Turkan, “A review of sparsity-based clustering methods,” Signal Processing, vol.148, pp.20–30, 2018. doi: 10.1016/j.sigpro.2018.02.010
    [3]
    M. S. Chang, L. H. Chen, L. J. Hung, et al., “Exact algorithms for problems related to the densest k-set problem,” Information Processing Letters, vol.114, no.9, pp.510–513, 2014. doi: 10.1016/j.ipl.2014.04.009
    [4]
    Y. Shi, Z. S. Chen, Z. Q. Qi, et al., “A novel clustering-based image segmentation via density peaks algorithm with mid-level feature,” Neural Computing and Applications, vol.28, no.S1, pp.29–39, 2017. doi: 10.1007/s00521-016-2300-1
    [5]
    Y. Wang, “Survey on deep multi-modal data analytics: collaboration, rivalry, and fusion,” ACM Transactions on Multimedia Computing, Communications, and Applications, vol.17, no.1s, article no.10, 2021. doi: 10.1145/3408317
    [6]
    J. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, pp.281–297, 1967.
    [7]
    J. C. Bezdek, R. Ehrlich, and W. Full, “FCM: the fuzzy c-means clustering algorithm,” Computers & Geosciences, vol.10, no.2-3, pp.191–203, 1984. doi: 10.1016/0098-3004(84)90020-7
    [8]
    W. Wang, J. Yang, and R. R. Muntz, “STING: A statistical information grid approach to spatial data mining,” in Proceedings of the 23rd International Conference on Very Large Data Bases, San Francisco, CA, United States, pp.186–195, 1997.
    [9]
    S. Guha, R. Rastogi, and K. Shim, “Cure: An efficient clustering algorithm for large databases,” Information Systems, vol.26, no.1, pp.35–58, 2001. doi: 10.1016/S0306-4379(01)00008-4
    [10]
    X. F. Wang and Y. F. Xu, “Fast clustering using adaptive density peak detection,” Statistical Methods in Medical Research, vol.26, no.6, pp.2800–2811, 2017. doi: 10.1177/0962280215609948
    [11]
    J. Xu, G. Y. Wang, and W. H. Deng, “DenPEHC: Density peak based efficient hierarchical clustering,” Information Sciences, vol.373, pp.200–218, 2016. doi: 10.1016/j.ins.2016.08.086
    [12]
    M. Ester, H. P. Kriegel, J. Sander, et al., “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, pp.226–231, 1996.
    [13]
    B. X. Zhao, S. L. Wang, and C. L. Liu, “STATE: A clustering algorithm focusing on edges instead of centers,” Chinese Journal of Electronics, vol.30, no.5, pp.902–908, 2021. doi: 10.1049/cje.2021.07.001
    [14]
    Y. Wang, W. J. Zhang, L. Wu, et al., “Iterative views agreement: An iterative low-rank based structured optimization method to multi-view spectral clustering,” in Proceedings of the 25th International Joint Conference on Artificial Intelligence, New York, NY, USA, pp.2153–2159, 2016.
    [15]
    A. Rodriguez and A. Laio, “Clustering by fast search and find of density peaks,” Science, vol.344, no.6191, pp.1492–1496, 2014. doi: 10.1126/science.1242072
    [16]
    J. Y. Xie, H. C. Gao, and W. X. Xie, “K-nearest neighbors optimized clustering algorithm by fast search and finding the density peaks of a dataset,” Scientia Sinica:Informationis, vol.46, no.2, pp.258–280, 2016. (in Chinese) doi: 10.1360/N112015-00135
    [17]
    T. F. Xu and J. H. Jiang, “A Graph Adaptive Density Peaks Clustering algorithm for automatic centroid selection and effective aggregation,” Expert Systems with Applications, vol.195, article no.116539, 2022. doi: 10.1016/J.ESWA.2022.116539
    [18]
    X. Xu, S. F. Ding, H. Xu, et al., “A feasible density peaks clustering algorithm with a merging strategy,” Soft Computing, vol.23, no.13, pp.5171–5183, 2019. doi: 10.1007/s00500-018-3183-0
    [19]
    J. Y. Xie, H. C. Gao, W. X. Xie, et al., “Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors,” Information Sciences, vol.354, pp.19–40, 2016. doi: 10.1016/j.ins.2016.03.011
    [20]
    J. H. Jiang, Y. J. Chen, X. Q. Meng, et al., “A novel density peaks clustering algorithm based on k nearest neighbors for improving assignment process,” Physica A:Statistical Mechanics and its Applications, vol.523, pp.702–713, 2019. doi: 10.1016/j.physa.2019.03.012
    [21]
    R. Zhang, T. Du, S. N. Qu, et al., “Adaptive density-based clustering algorithm with shared KNN conflict game,” Information Sciences, vol.565, pp.344–369, 2021. doi: 10.1016/j.ins.2021.02.017
    [22]
    Y. H. Liu, Z. M. Ma, and F. Yu, “Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy,” Knowledge-Based Systems, vol.133, pp.208–220, 2017. doi: 10.1016/j.knosys.2017.07.010
    [23]
    L. Bai, X. Q. Cheng, J. Y. Liang, et al., “Fast density clustering strategies based on the k-means algorithm,” Pattern Recognition, vol.71, pp.375–386, 2017. doi: 10.1016/j.patcog.2017.06.023
    [24]
    A. Bryant and K. Cios, “RNN-DBSCAN: a density-based clustering algorithm using reverse nearest neighbor density estimates,” IEEE Transactions on Knowledge and Data Engineering, vol.30, no.6, pp.1109–1121, 2018. doi: 10.1109/tkde.2017.2787640
    [25]
    M. Chen, L. J. Li, B. Wang, et al., “Effectively clustering by finding density backbone based-on kNN,” Pattern Recognition, vol.60, pp.486–498, 2016. doi: 10.1016/j.patcog.2016.04.018
    [26]
    Y. W. Chen, L. L. Shen, C. M. Zhong, et al., “Survey on density peak clustering algorithm,” Journal of Computer Research and Development, vol.57, no.2, pp.378–394, 2020. (in Chinese) doi: 10.7544/issn1000-1239.2020.20190104
    [27]
    L. M. Fu and E. Medico, “FLAME, A novel fuzzy clustering method for the analysis of DNA microarray data,” BMC Bioinformatics, vol.8, article no.3, 2007. doi: 10.1186/1471-2105-8-3
    [28]
    H. Chang and D. Y. Yeung, “Robust path-based spectral clustering,” Pattern Recognition, vol.41, no.1, pp.191–203, 2008. doi: 10.1016/j.patcog.2007.04.010
    [29]
    A. Gionis, H. Mannila, and P. Tsaparas, “Clustering aggregation,” ACM Transactions on Knowledge Discovery from Data, vol.1, no.1, pp.4–es, 2007. doi: 10.1145/1217299.1217303
    [30]
    A. K. Jain and M. H. C. Law, “Data clustering: A user’s dilemma,” in Proceedings of the 1st International Conference on Pattern Recognition and Machine Intelligence, Kolkata, India, pp.1–10, 2005.
    [31]
    C. J. Veenman, M. J. T. Reinders, and E. Backer, “A maximum variance cluster algorithm,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.24, no.9, pp.1273–1280, 2002. doi: 10.1109/TPAMI.2002.1033218
    [32]
    Q. S. Zhu, J. Feng, and J. L. Huang, “Natural neighbor: A self-adaptive neighborhood method without parameter K,” Pattern Recognition Letters, vol.80, pp.30–36, 2016. doi: 10.1016/j.patrec.2016.05.007
    [33]
    C. T. Zahn, “Graph-theoretical methods for detecting and describing gestalt clusters,” IEEE Transactions on Computers, vol.C-20, no.1, pp.68–86, 1971. doi: 10.1109/T-C.1971.223083
    [34]
    S. F. Ding, M. J. Du, T. F. Sun, et al., “An entropy-based density peaks clustering algorithm for mixed type data employing fuzzy neighborhood,” Knowledge-Based Systems, vol.133, pp.294–313, 2017. doi: 10.1016/j.knosys.2017.07.027
    [35]
    L. Hubert and P. Arabie, “Comparing partitions,” Journal of Classification, vol.2, no.1, pp.193–218, 1985. doi: 10.1007/BF01908075
    [36]
    R. Liu, H. Wang, and X. M. Yu, “Shared-nearest-neighbor-based clustering by fast search and find of density peaks,” Information Sciences, vol.450, pp.200–226, 2018. doi: 10.1016/j.ins.2018.03.031
  • 加载中

Catalog

    通讯作者: 陈斌, bchen63@163.com
    • 1. 

      沈阳化工大学材料科学与工程学院 沈阳 110142

    1. 本站搜索
    2. 百度学术搜索
    3. 万方数据库搜索
    4. CNKI搜索

    Figures(6)  / Tables(6)

    Article Metrics

    Article views (629) PDF downloads(51) Cited by()
    Proportional views
    Related

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return