A Region-Based Analysis for the Feature Concatenation in Deep Forests

LYU Shen-Huan; CHEN Yi-He; ZHOU Zhi-Hua

doi:10.1049/cje.2022.00.178

Volume 31 Issue 6

Nov. 2022

LYU Shen-Huan, CHEN Yi-He, ZHOU Zhi-Hua, “A Region-Based Analysis for the Feature Concatenation in Deep Forests,” Chinese Journal of Electronics, vol. 31, no. 6, pp. 1072-1080, 2022, doi: 10.1049/cje.2022.00.178

1. National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China

Funds: This work was supported by the National Natural Science Foundation of China (61921006)

Author Bio:
Shen-Huan LYU received the B.E. degree in statistics from the University of Science and Technology of China. He is a Ph.D. candidate at Nanjing University. His research interests include machine learning and data mining. (Email: lvsh@lamda.nju.edu.cn)

Yi-He CHEN received the B.E. degree from the School of Computer Science and Technology, Southeast University, China, and the M.S. degree from Nanjing University, China. His research interests include machine learning and data mining. (Email: chenyh@lamda.nju.edu.cn)

Zhi-Hua ZHOU (corresponding author) received the Ph.D. degree in computer science from Nanjing University, Nanjing, China, in 2000. Currently, he is a Professor at Nanjing University, China. His research interests mainly include artificial intelligence, machine learning, and data mining. (Email: zhouzh@lamda.nju.edu.cn)
Received Date: 2022-06-27
Accepted Date: 2022-08-31

Available Online: 2022-10-24

Publish Date: 2022-11-05

Abstract

Deep forest is a tree-based deep model built from non-differentiable modules and trained without backpropagation. Although deep forests have achieved considerable success in a variety of tasks, feature concatenation, a key ingredient of forest representation learning, still lacks theoretical understanding. In this paper, we study how feature concatenation influences predictive performance. To enable such theoretical study, we present the first mathematical formulation of feature concatenation based on a two-stage structure, which regards the splits along new features as a region selector and the splits along raw features as a region classifier. Furthermore, we prove a region-based generalization bound for feature concatenation, which reveals the trade-off between the Rademacher complexities of the two-stage structure and the fraction of instances that are correctly classified in the selected region. As a consequence, we show that, compared with prediction-based feature concatenation (PFC), interaction-based feature concatenation (IFC) has the advantage of obtaining richer regions through distributed representation and alleviating the risk of overfitting in local regions. Experiments confirm our theoretical results.
Keywords:
- Deep forest
- Overfitting
- Generalization bound
- Representation learning
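
To make the idea of prediction-based feature concatenation concrete, the following is a minimal sketch, not the authors' implementation. It assumes binary labels, one-split trees (stumps) in place of full forests, and in-sample class probabilities; deep forests normally generate these probabilities by cross-validation. All function names (`fit_stump`, `stump_proba`, `concat_predictions`) and the toy data are hypothetical.

```python
from typing import List, Optional, Tuple

# (feature index, threshold, P(y=1) in left leaf, P(y=1) in right leaf)
Stump = Tuple[int, float, float, float]


def fit_stump(X: List[List[float]], y: List[int], feature: int) -> Stump:
    """Fit a one-split tree on a single feature, minimising training errors."""
    values = sorted({row[feature] for row in X})
    best: Optional[Tuple[int, float, float, float]] = None
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [label for row, label in zip(X, y) if row[feature] <= thr]
        right = [label for row, label in zip(X, y) if row[feature] > thr]
        # Errors made when each leaf predicts its majority class.
        errors = (min(sum(left), len(left) - sum(left))
                  + min(sum(right), len(right) - sum(right)))
        if best is None or errors < best[0]:
            best = (errors, thr, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        # Constant feature: no split is possible, so both leaves share the prior.
        prior = sum(y) / len(y)
        return feature, values[0], prior, prior
    return feature, best[1], best[2], best[3]


def stump_proba(stump: Stump, row: List[float]) -> float:
    """Return the estimated P(y=1) of the leaf that `row` falls into."""
    feature, thr, p_left, p_right = stump
    return p_left if row[feature] <= thr else p_right


def concat_predictions(X: List[List[float]], stumps: List[Stump]) -> List[List[float]]:
    """Append each learner's prediction to the raw features (PFC-style)."""
    return [row + [stump_proba(s, row) for s in stumps] for row in X]


# Toy data (assumed): the label equals the first raw feature.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 0, 1, 1]

# Layer 1: one stump per raw feature.
layer1 = [fit_stump(X, y, f) for f in range(2)]

# Feature concatenation: raw features followed by layer-1 predictions.
augmented = concat_predictions(X, layer1)

# Layer 2: a split on the first *new* feature (index 2).
layer2 = fit_stump(augmented, y, 2)
predictions = [int(stump_proba(layer2, row) >= 0.5) for row in augmented]
```

In the terminology of the abstract, a second-layer split on a new feature (index 2 or 3 here) plays the role of a region selector, while a split on a raw feature (index 0 or 1) plays the role of a region classifier.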
