DeepLogic: Priority Testing of Deep Learning Through Interpretable Logic Units

LIN Chenhao; ZHANG Xingliang; SHEN Chao

doi:10.23919/cje.2022.00.451

Volume 33 Issue 4

Jul. 2024


Chenhao LIN, Xingliang ZHANG, and Chao SHEN, “DeepLogic: Priority Testing of Deep Learning Through Interpretable Logic Units,” Chinese Journal of Electronics, vol. 33, no. 4, pp. 948–964, 2024. doi: 10.23919/cje.2022.00.451


1. Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China

Author Bio:
Chenhao LIN received the B.E. degree in automation from Xi’an Jiaotong University, Xi’an, China, in 2011, the M.S. degree in electrical engineering from Columbia University, New York, USA, in 2013, and the Ph.D. degree from The Hong Kong Polytechnic University, Hong Kong, China, in 2018. He is currently a Research Fellow at Xi’an Jiaotong University. His research interests include artificial intelligence security, adversarial attack and robustness, identity authentication, and pattern recognition. (Email: linchenhao@xjtu.edu.cn)

Xingliang ZHANG received the B.E. degree from the Information Engineering University, Zhengzhou, China. He is currently pursuing the M.S. degree in cyberspace security with Xi’an Jiaotong University, Xi’an, China. His current research interest focuses on artificial intelligence security. (Email: zhangxliang@stu.xjtu.edu.cn)

Chao SHEN received the B.S. degree in automation and Ph.D. degree in control theory and control engineering from Xi’an Jiaotong University, Xi’an, China, in 2007 and 2014, respectively. He is currently a Professor with the Faculty of Electronic and Information Engineering, Xi’an Jiaotong University. His current research interests include AI security, insider/intrusion detection, behavioral biometrics, and measurement and experimental methodology. (Email: chaoshen@xjtu.edu.cn)
Corresponding author: Email: chaoshen@xjtu.edu.cn
Received Date: 2022-12-27
Accepted Date: 2023-06-05

Available Online: 2023-08-19

Publish Date: 2024-07-05

Abstract

With the increasing deployment of deep learning-based systems in a wide range of scenarios, it is becoming important to test and evaluate deep learning models thoroughly in order to improve their interpretability and robustness. Recent studies have proposed various criteria and strategies for deep neural network (DNN) testing. However, these approaches rarely test the robustness of DNN models effectively, and they lack interpretability. This paper proposes a new priority testing criterion, called DeepLogic, to analyze the robustness of DNN models from the perspective of model interpretability. We first define the neural units in a DNN with the highest average activation probability as “interpretable logic units”. We then conduct adversarial attacks and analyze the changes in these units to evaluate the model’s robustness. After that, the interpretable logic units of the inputs are taken as context attributes, and the probability distribution of the model’s softmax layer is taken as internal attributes, to establish a comprehensive test prioritization framework. The context and internal attributes are fused by weighting, and the test cases are sorted according to the resulting priority. Experimental results on four popular DNN models using eight testing metrics show that DeepLogic significantly outperforms existing state-of-the-art methods.
- Deep learning testing
- Interpretable logic units
- Adversarial test
- Model interpretability
- Defect detection
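
The prioritization idea summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the choices below are assumptions made only for illustration. The context attribute is approximated by the Jaccard distance between an input's top-k most activated units and a reference set for its predicted class, the internal attribute by the Gini impurity of the softmax output, and the two are fused with a single hypothetical weight. All function names and parameters are hypothetical.

```python
from typing import List, Sequence, Set


def top_k_units(mean_activations: Sequence[float], k: int) -> Set[int]:
    """Return the indices of the k units with the highest average activation.

    Stand-in for selecting "interpretable logic units" (assumed selection rule).
    """
    order = sorted(range(len(mean_activations)),
                   key=lambda i: mean_activations[i], reverse=True)
    return set(order[:k])


def context_score(input_units: Set[int], class_units: Set[int]) -> float:
    """Jaccard distance between an input's units and its class reference units.

    Assumption: a larger deviation from the reference units is more suspicious.
    """
    union = input_units | class_units
    if not union:
        return 0.0
    return 1.0 - len(input_units & class_units) / len(union)


def internal_score(softmax: Sequence[float]) -> float:
    """Gini impurity of the softmax distribution (assumed uncertainty measure)."""
    return 1.0 - sum(p * p for p in softmax)


def prioritize(context: Sequence[float], internal: Sequence[float],
               weight: float = 0.5) -> List[int]:
    """Fuse both scores with a weight and return test indices, highest priority first."""
    fused = [weight * c + (1.0 - weight) * s for c, s in zip(context, internal)]
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


if __name__ == "__main__":
    # Hypothetical per-class reference units from average activations.
    reference = top_k_units([0.9, 0.1, 0.8, 0.2], k=2)
    # Two hypothetical test inputs: per-unit activations and softmax outputs.
    activations = [[0.7, 0.1, 0.9, 0.0], [0.1, 0.9, 0.2, 0.8]]
    softmaxes = [[0.98, 0.01, 0.01], [0.40, 0.35, 0.25]]
    ctx = [context_score(top_k_units(a, 2), reference) for a in activations]
    itn = [internal_score(p) for p in softmaxes]
    print(prioritize(ctx, itn))
```

In this sketch, an input whose active units differ from the reference units and whose softmax output is spread across classes is ranked ahead of a confidently classified input with matching units. The default weight of 0.5 is arbitrary and would need tuning in practice.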
