Reverse-Nearest-Neighbor-Based Clustering by Fast Search and Find of Density Peaks

ZHANG Chunhao; XIE Bin; ZHANG Yiran

doi:10.23919/cje.2022.00.165

Volume 32 Issue 6

Nov. 2023

Turn off MathJax

Article Contents

Article Navigation > Chinese Journal of Electronics > 2023 > 32(6): 1341-1354

ZHANG Chunhao, XIE Bin, ZHANG Yiran, “Reverse-Nearest-Neighbor-Based Clustering by Fast Search and Find of Density Peaks,” Chinese Journal of Electronics, vol. 32, no. 6, pp. 1341-1354, 2023, doi: 10.23919/cje.2022.00.165

Citation:

ZHANG Chunhao, XIE Bin, ZHANG Yiran, “Reverse-Nearest-Neighbor-Based Clustering by Fast Search and Find of Density Peaks,” Chinese Journal of Electronics, vol. 32, no. 6, pp. 1341-1354, 2023, doi: 10.23919/cje.2022.00.165

Citation:

PDF( 3468 KB)

Reverse-Nearest-Neighbor-Based Clustering by Fast Search and Find of Density Peaks

doi: 10.23919/cje.2022.00.165

ZHANG Chunhao^1
,,
XIE Bin^{1, 2, 3
,},
ZHANG Yiran^1
,

1.
College of Computer and Cyber Security, Hebei Normal University, Shijiazhuang 050024, China
2.
Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics and Data Security, Hebei Normal University, Shijiazhuang 050024, China
3.
Hebei Provincial Key Laboratory of Network and Information Security, Hebei Normal University, Shijiazhuang 050024, China

Funds: This work was supported by the National Natural Science Foundation of China (62076088) and the Technological Innovation Foundation of Hebei Normal University (L2020K09)

More Information

Author Bio:
Chunhao ZHANG received the B.S. degree in information and computational science from Hebei University of Engineering in 2019. He is now an M.S. candidate of Hebei Normal University. His research interests include machine learning and data mining. (Email: zhangchunhao_hebtu@163.com)

Bin XIE (corresponding author) received the B.S. degree in computational mathematics from Jilin University in 1998. He received the M.S. degree in computational mathematics from Jilin University in 2004. He received the Ph.D. degree in applied mathematics from Hebei Normal University in 2011. His research interests include granular computing, machine learning and approximate reasoning. (Email: xiebin_hebtu@126.com)

Yiran ZHANG received the B.E. degree in software engineering from Hebei Normal University in 2019. She is now an M.S. candidate of Hebei Normal University. Her research interests include machine learning and data mining. (Email: zhangyiran19@163.com)
Received Date: 2022-06-08
Accepted Date: 2022-09-26

Available Online: 2022-10-09

Publish Date: 2023-11-05

Abstract

Abstract

Clustering by fast search and find of density peaks (CFSFDP) has the advantages of a novel idea, easy implementation, and efficient clustering. It has been widely recognized in various fields since it was proposed in Science in 2014. The CFSFDP algorithm also has certain limitations, such as non-unified sample density metrics defined by cutoff distance, the domino effect for the assignment of remaining samples triggered by unstable assignment strategy, and the phenomenon of picking wrong density peaks as cluster centers. We propose reverse-nearest-neighbor-based clustering by fast search and find of density peaks (RNN-CFSFDP) to avoid these shortcomings. We redesign and unify the sample density metric by introducing reverse nearest neighbor. The newly defined local density metric and the K-nearest neighbors of each sample are combined to make the assignment process more robust and alleviate the domino effect. A cluster fusion algorithm is proposed, which further alleviates the domino effect and effectively avoids the phenomenon of picking wrong density peaks as cluster centers. Experimental results on publicly available synthetic data sets and real-world data sets show that in most cases, the proposed algorithm is superior to or at least equivalent to the comparative methods in clustering performance. The proposed algorithm works better on manifold data sets and uneven density data sets.
- Density peaks,
- Reverse nearest neighbor,
- Clustering,
- Cluster fusion

FullText(HTML)

References(36)

References

[1]	Z. W. Gu, P. Li, X. Lang, et al., “A multi-granularity density peak clustering algorithm based on variational mode decomposition,” Chinese Journal of Electronics, vol.30, no.4, pp.658–668, 2021. doi: 10.1049/cje.2021.03.001
[2]	Y. Oktar and M. Turkan, “A review of sparsity-based clustering methods,” Signal Processing, vol.148, pp.20–30, 2018. doi: 10.1016/j.sigpro.2018.02.010
[3]	M. S. Chang, L. H. Chen, L. J. Hung, et al., “Exact algorithms for problems related to the densest k-set problem,” Information Processing Letters, vol.114, no.9, pp.510–513, 2014. doi: 10.1016/j.ipl.2014.04.009
[4]	Y. Shi, Z. S. Chen, Z. Q. Qi, et al., “A novel clustering-based image segmentation via density peaks algorithm with mid-level feature,” Neural Computing and Applications, vol.28, no.S1, pp.29–39, 2017. doi: 10.1007/s00521-016-2300-1
[5]	Y. Wang, “Survey on deep multi-modal data analytics: collaboration, rivalry, and fusion,” ACM Transactions on Multimedia Computing, Communications, and Applications, vol.17, no.1s, article no.10, 2021. doi: 10.1145/3408317
[6]	J. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, pp.281–297, 1967.
[7]	J. C. Bezdek, R. Ehrlich, and W. Full, “FCM: the fuzzy c-means clustering algorithm,” Computers & Geosciences, vol.10, no.2-3, pp.191–203, 1984. doi: 10.1016/0098-3004(84)90020-7
[8]	W. Wang, J. Yang, and R. R. Muntz, “STING: A statistical information grid approach to spatial data mining,” in Proceedings of the 23rd International Conference on Very Large Data Bases, San Francisco, CA, United States, pp.186–195, 1997.
[9]	S. Guha, R. Rastogi, and K. Shim, “Cure: An efficient clustering algorithm for large databases,” Information Systems, vol.26, no.1, pp.35–58, 2001. doi: 10.1016/S0306-4379(01)00008-4
[10]	X. F. Wang and Y. F. Xu, “Fast clustering using adaptive density peak detection,” Statistical Methods in Medical Research, vol.26, no.6, pp.2800–2811, 2017. doi: 10.1177/0962280215609948
[11]	J. Xu, G. Y. Wang, and W. H. Deng, “DenPEHC: Density peak based efficient hierarchical clustering,” Information Sciences, vol.373, pp.200–218, 2016. doi: 10.1016/j.ins.2016.08.086
[12]	M. Ester, H. P. Kriegel, J. Sander, et al., “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, pp.226–231, 1996.
[13]	B. X. Zhao, S. L. Wang, and C. L. Liu, “STATE: A clustering algorithm focusing on edges instead of centers,” Chinese Journal of Electronics, vol.30, no.5, pp.902–908, 2021. doi: 10.1049/cje.2021.07.001
[14]	Y. Wang, W. J. Zhang, L. Wu, et al., “Iterative views agreement: An iterative low-rank based structured optimization method to multi-view spectral clustering,” in Proceedings of the 25th International Joint Conference on Artificial Intelligence, New York, NY, USA, pp.2153–2159, 2016.
[15]	A. Rodriguez and A. Laio, “Clustering by fast search and find of density peaks,” Science, vol.344, no.6191, pp.1492–1496, 2014. doi: 10.1126/science.1242072
[16]	J. Y. Xie, H. C. Gao, and W. X. Xie, “K-nearest neighbors optimized clustering algorithm by fast search and finding the density peaks of a dataset,” Scientia Sinica:Informationis, vol.46, no.2, pp.258–280, 2016. (in Chinese) doi: 10.1360/N112015-00135
[17]	T. F. Xu and J. H. Jiang, “A Graph Adaptive Density Peaks Clustering algorithm for automatic centroid selection and effective aggregation,” Expert Systems with Applications, vol.195, article no.116539, 2022. doi: 10.1016/J.ESWA.2022.116539
[18]	X. Xu, S. F. Ding, H. Xu, et al., “A feasible density peaks clustering algorithm with a merging strategy,” Soft Computing, vol.23, no.13, pp.5171–5183, 2019. doi: 10.1007/s00500-018-3183-0
[19]	J. Y. Xie, H. C. Gao, W. X. Xie, et al., “Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors,” Information Sciences, vol.354, pp.19–40, 2016. doi: 10.1016/j.ins.2016.03.011
[20]	J. H. Jiang, Y. J. Chen, X. Q. Meng, et al., “A novel density peaks clustering algorithm based on k nearest neighbors for improving assignment process,” Physica A:Statistical Mechanics and its Applications, vol.523, pp.702–713, 2019. doi: 10.1016/j.physa.2019.03.012
[21]	R. Zhang, T. Du, S. N. Qu, et al., “Adaptive density-based clustering algorithm with shared KNN conflict game,” Information Sciences, vol.565, pp.344–369, 2021. doi: 10.1016/j.ins.2021.02.017
[22]	Y. H. Liu, Z. M. Ma, and F. Yu, “Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy,” Knowledge-Based Systems, vol.133, pp.208–220, 2017. doi: 10.1016/j.knosys.2017.07.010
[23]	L. Bai, X. Q. Cheng, J. Y. Liang, et al., “Fast density clustering strategies based on the k-means algorithm,” Pattern Recognition, vol.71, pp.375–386, 2017. doi: 10.1016/j.patcog.2017.06.023
[24]	A. Bryant and K. Cios, “RNN-DBSCAN: a density-based clustering algorithm using reverse nearest neighbor density estimates,” IEEE Transactions on Knowledge and Data Engineering, vol.30, no.6, pp.1109–1121, 2018. doi: 10.1109/tkde.2017.2787640
[25]	M. Chen, L. J. Li, B. Wang, et al., “Effectively clustering by finding density backbone based-on kNN,” Pattern Recognition, vol.60, pp.486–498, 2016. doi: 10.1016/j.patcog.2016.04.018
[26]	Y. W. Chen, L. L. Shen, C. M. Zhong, et al., “Survey on density peak clustering algorithm,” Journal of Computer Research and Development, vol.57, no.2, pp.378–394, 2020. (in Chinese) doi: 10.7544/issn1000-1239.2020.20190104
[27]	L. M. Fu and E. Medico, “FLAME, A novel fuzzy clustering method for the analysis of DNA microarray data,” BMC Bioinformatics, vol.8, article no.3, 2007. doi: 10.1186/1471-2105-8-3
[28]	H. Chang and D. Y. Yeung, “Robust path-based spectral clustering,” Pattern Recognition, vol.41, no.1, pp.191–203, 2008. doi: 10.1016/j.patcog.2007.04.010
[29]	A. Gionis, H. Mannila, and P. Tsaparas, “Clustering aggregation,” ACM Transactions on Knowledge Discovery from Data, vol.1, no.1, pp.4–es, 2007. doi: 10.1145/1217299.1217303
[30]	A. K. Jain and M. H. C. Law, “Data clustering: A user’s dilemma,” in Proceedings of the 1st International Conference on Pattern Recognition and Machine Intelligence, Kolkata, India, pp.1–10, 2005.
[31]	C. J. Veenman, M. J. T. Reinders, and E. Backer, “A maximum variance cluster algorithm,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.24, no.9, pp.1273–1280, 2002. doi: 10.1109/TPAMI.2002.1033218
[32]	Q. S. Zhu, J. Feng, and J. L. Huang, “Natural neighbor: A self-adaptive neighborhood method without parameter K,” Pattern Recognition Letters, vol.80, pp.30–36, 2016. doi: 10.1016/j.patrec.2016.05.007
[33]	C. T. Zahn, “Graph-theoretical methods for detecting and describing gestalt clusters,” IEEE Transactions on Computers, vol.C-20, no.1, pp.68–86, 1971. doi: 10.1109/T-C.1971.223083
[34]	S. F. Ding, M. J. Du, T. F. Sun, et al., “An entropy-based density peaks clustering algorithm for mixed type data employing fuzzy neighborhood,” Knowledge-Based Systems, vol.133, pp.294–313, 2017. doi: 10.1016/j.knosys.2017.07.027
[35]	L. Hubert and P. Arabie, “Comparing partitions,” Journal of Classification, vol.2, no.1, pp.193–218, 1985. doi: 10.1007/BF01908075
[36]	R. Liu, H. Wang, and X. M. Yu, “Shared-nearest-neighbor-based clustering by fast search and find of density peaks,” Information Sciences, vol.450, pp.200–226, 2018. doi: 10.1016/j.ins.2018.03.031