Citation: GUANG Mingjian, YAN Chungang, LIU Guanjun, et al., “A Novel Neighborhood-Weighted Sampling Method for Imbalanced Datasets,” Chinese Journal of Electronics, vol.31, no.5, pp.969–979, 2022. doi: 10.1049/cje.2021.00.121
[1] F. Zhang, G. Liu, Z. Li, et al., “GMM-based undersampling and its application for credit card fraud detection,” in Proc. of International Joint Conference on Neural Networks, Budapest, Hungary, pp.1–8, 2019.
[2] L. Zheng, G. Liu, C. Yan, et al., “Transaction fraud detection based on total order relation and behavior diversity,” IEEE Transactions on Computational Social Systems, vol.5, no.3, pp.796–806, 2018. doi: 10.1109/TCSS.2018.2856910
[3] C. Jiang, J. Song, G. Liu, et al., “Credit card fraud detection: A novel approach using aggregation strategy and feedback mechanism,” IEEE Internet of Things Journal, vol.5, no.5, pp.3637–3647, 2018. doi: 10.1109/JIOT.2018.2816007
[4] Z. Li, M. Huang, G. Liu, et al., “A hybrid method with dynamic weighted entropy for handling the problem of class imbalance with overlap in credit card fraud detection,” Expert Systems with Applications, vol.175, pp.1–10, 2021.
[5] C. Yang, G. Liu, C. Yan, et al., “A clustering-based flexible weighting method in AdaBoost and its application to transaction fraud detection,” Science China Information Sciences, vol.64, no.12, pp.1–11, 2021.
[6] L. Zheng, G. Liu, C. Yan, et al., “Improved TrAdaBoost and its application to transaction fraud detection,” IEEE Transactions on Computational Social Systems, vol.7, no.5, pp.1304–1316, 2020. doi: 10.1109/TCSS.2020.3017013
[7] Z. Li, G. Liu, and C. Jiang, “Deep representation learning with full center loss for credit card fraud detection,” IEEE Transactions on Computational Social Systems, vol.7, no.2, pp.569–579, 2020. doi: 10.1109/TCSS.2020.2970805
[8] S. Xuan, G. Liu, Z. Li, et al., “Random forest for credit card fraud detection,” in Proc. of IEEE 15th International Conference on Networking, Sensing and Control (ICNSC), Zhuhai, China, pp.1–6, 2018.
[9] Q. Yang and X. Wu, “10 challenging problems in data mining research,” International Journal of Information Technology and Decision Making, vol.5, no.4, pp.597–604, 2006. doi: 10.1142/S0219622006002258
[10] C. Seiffert, T. M. Khoshgoftaar, J. V. Hulse, et al., “RUSBoost: A hybrid approach to alleviating class imbalance,” IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, vol.40, no.1, pp.185–197, 2010.
[11] N. V. Chawla, A. Lazarevic, L. O. Hall, et al., “SMOTEBoost: Improving prediction of the minority class in boosting,” in Proc. of European Conference on Principles of Data Mining and Knowledge Discovery, Berlin, Heidelberg, Germany, pp.107–119, 2003.
[12] V. López, A. Fernández, S. García, et al., “An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics,” Information Sciences, vol.250, pp.113–141, 2013. doi: 10.1016/j.ins.2013.07.007
[13] J. Błaszczyński and J. Stefanowski, “Neighbourhood sampling in bagging for imbalanced data,” Neurocomputing, vol.150, pp.529–542, 2015. doi: 10.1016/j.neucom.2014.07.064
[14] S. Barua, M. M. Islam, X. Yao, et al., “MWMOTE-majority weighted minority oversampling technique for imbalanced data set learning,” IEEE Transactions on Knowledge and Data Engineering, vol.26, no.2, pp.405–425, 2014. doi: 10.1109/TKDE.2012.232
[15] M. Bader-El-Den, E. Teitei, and T. Perry, “Biased random forest for dealing with the class imbalance problem,” IEEE Transactions on Neural Networks and Learning Systems, vol.30, no.7, pp.2163–2172, 2019. doi: 10.1109/TNNLS.2018.2878400
[16] L. Breiman, “Bagging predictors,” Machine Learning, vol.24, no.2, pp.123–140, 1996.
[17] I. Tomek, “An experiment with the edited nearest-neighbor rule,” IEEE Transactions on Systems, Man, and Cybernetics, vol.6, no.6, pp.448–452, 1976.
[18] J. Laurikkala, “Improving identification of difficult small classes by balancing class distribution,” in Proc. of Artificial Intelligence in Medicine in Europe, Berlin, Heidelberg, Germany, pp.63–66, 2001.
[19] N. V. Chawla, K. W. Bowyer, L. O. Hall, et al., “SMOTE: Synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, vol.16, pp.321–357, 2002.
[20] A. Fernández, S. Garcia, F. Herrera, et al., “SMOTE for learning from imbalanced data: Progress and challenges, marking the 15-year anniversary,” Journal of Artificial Intelligence Research, vol.61, pp.863–905, 2018. doi: 10.1613/jair.1.11192
[21] H. Han, W. Y. Wang, and B. H. Mao, “Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning,” in Proc. of International Conference on Intelligent Computing, Berlin, Heidelberg, Germany, pp.878–887, 2005.
[22] H. He, Y. Bai, E. A. Garcia, et al., “ADASYN: Adaptive synthetic sampling approach for imbalanced learning,” in Proc. of 2008 IEEE International Joint Conference on Neural Networks, Hong Kong, China, pp.1322–1328, 2008.
[23] Y. Zhai, N. Ma, D. Ruan, et al., “An effective over-sampling method for imbalanced data sets classification,” Chinese Journal of Electronics, vol.20, no.3, pp.489–494, 2011.
[24] J. A. Sáez, J. Luengo, J. Stefanowski, et al., “SMOTE-IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering,” Information Sciences, vol.291, pp.184–203, 2015. doi: 10.1016/j.ins.2014.08.051
[25] R. E. Schapire, “The strength of weak learnability,” Machine Learning, vol.5, no.2, pp.197–227, 1990.
[26] M. Galar, A. Fernandez, E. Barrenechea, et al., “A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol.42, no.4, pp.463–484, 2012.
[27] T. M. Khoshgoftaar, J. V. Hulse, and A. Napolitano, “Comparing boosting and bagging techniques with noisy and imbalanced data,” IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, vol.41, no.3, pp.552–568, 2010.
[28] S. Kumar, S. K. Biswas, and D. Devi, “TLUSBoost algorithm: A boosting solution for class imbalance problem,” Soft Computing, vol.23, no.21, pp.10755–10767, 2019. doi: 10.1007/s00500-018-3629-4
[29] S. Wang and X. Yao, “Diversity analysis on imbalanced data sets by using ensemble models,” in Proc. of 2009 IEEE Symposium on Computational Intelligence and Data Mining, Nashville, TN, USA, pp.324–331, 2009.
[30] X. Y. Liu, J. Wu, and Z. H. Zhou, “Exploratory undersampling for class-imbalance learning,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol.39, no.2, pp.539–550, 2009. doi: 10.1109/TSMCB.2008.2007853
[31] S. Hido, H. Kashima, and Y. Takahashi, “Roughly balanced bagging for imbalanced data,” Statistical Analysis and Data Mining: The ASA Data Science Journal, vol.2, no.5, pp.412–426, 2009.
[32] D. L. Wilson, “Asymptotic properties of nearest neighbor rules using edited data,” IEEE Transactions on Systems, Man, and Cybernetics, vol.2, no.3, pp.408–421, 1972.
[33] D. R. Wilson and T. R. Martinez, “Improved heterogeneous distance functions,” Journal of Artificial Intelligence Research, vol.6, pp.1–34, 1997. doi: 10.1613/jair.346
[34] B. W. Matthews, “Comparison of the predicted and observed secondary structure of T4 phage lysozyme,” Biochimica et Biophysica Acta (BBA) - Protein Structure, vol.405, no.2, pp.442–451, 1975. doi: 10.1016/0005-2795(75)90109-9
[35] D. Chicco and G. Jurman, “The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation,” BMC Genomics, vol.21, no.1, pp.1–13, 2020. doi: 10.1186/s12864-019-6419-1
[36] H. B. He and Y. Q. Ma, Imbalanced Learning: Foundations, Algorithms, and Applications, Hoboken, NJ, USA: John Wiley & Sons, pp.61–82, 2013.
[37] G. Lemaître, F. Nogueira, and C. K. Aridas, “Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning,” The Journal of Machine Learning Research, vol.18, no.1, pp.559–563, 2017.
[38] J. Alcalá-Fdez, A. Fernández, J. Luengo, et al., “KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework,” Journal of Multiple-Valued Logic and Soft Computing, vol.17, pp.255–287, 2011.
[39] I. Mukherjee and R. E. Schapire, “A theory of multiclass boosting,” Journal of Machine Learning Research, vol.14, pp.437–497, 2013.
[40] C. T. Su and Y. H. Hsiao, “An evaluation of the robustness of MTS for imbalanced data,” IEEE Transactions on Knowledge and Data Engineering, vol.19, no.10, pp.1321–1332, 2007. doi: 10.1109/TKDE.2007.190623
[41] D. J. Drown, T. M. Khoshgoftaar, and N. Seliya, “Evolutionary sampling and software quality modeling of high-assurance systems,” IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, vol.39, no.5, pp.1097–1107, 2009. doi: 10.1109/TSMCA.2009.2020804
[42] S. García, A. Fernández, and F. Herrera, “Enhancing the effectiveness and interpretability of decision tree and rule induction classifiers with evolutionary training set selection over imbalanced problems,” Applied Soft Computing, vol.9, no.4, pp.1304–1314, 2009. doi: 10.1016/j.asoc.2009.04.004
[43] |
Pedregosa, Fabian, Varoquaux, et al., “Scikit-learn: Machine learning in python,” Journal of Machine Learning Research, vol.18, pp.2825–2830, 2011.
|
[44] F. Wilcoxon, “Individual comparisons by ranking methods,” Biometrics Bulletin, vol.1, no.6, pp.80–83, 1945.
[45] J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” The Journal of Machine Learning Research, vol.7, pp.1–30, 2006.