Volume 30 Issue 6
Nov.  2021
PENG Pai, ZHU Fei, LIU Quan, ZHAO Peiyao, WU Wen. Achieving Safe Deep Reinforcement Learning via Environment Comprehension Mechanism[J]. Chinese Journal of Electronics, 2021, 30(6): 1049-1058. doi: 10.1049/cje.2021.07.025

Achieving Safe Deep Reinforcement Learning via Environment Comprehension Mechanism

doi: 10.1049/cje.2021.07.025

This work is supported by the National Natural Science Foundation of China (No.61303108), Suzhou Key Industries Technological Innovation-Prospective Applied Research Project (No.SYG201804), and a project funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions.

  • Received Date: 2020-06-05
  • Revised Date: 2021-05-20
  • Available Online: 2021-09-23
  • Publish Date: 2021-11-05
  • Deep reinforcement learning (DRL), which combines deep learning with reinforcement learning, has recently achieved great success. In some cases, however, agents may reach worthless and dangerous states during the learning process, causing the task to fail. To address this problem, we propose an algorithm, referred to as the Environment comprehension mechanism (ECM), that enables deep reinforcement learning to make safer decisions. ECM perceives hidden dangerous situations by analyzing objects and comprehending the environment, so that the agent systematically bypasses inappropriate actions through constraints set up dynamically according to the state. ECM calculates the gradient of the states in the Markov tuple, sets up boundary conditions, and generates a rule that controls the direction of the agent so as to skip unsafe states. ECM can be applied to basic deep reinforcement learning algorithms to guide action selection. Experimental results show that the algorithm improves the safety and stability of control tasks.
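To make the abstract's description concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of an ECM-style safety filter: it estimates the state's direction of change as a finite difference over consecutive states in the Markov tuple, checks a boundary condition on a monitored state variable, and when the boundary would be crossed it skips the greedy action in favour of the next-best one. The function name, the single monitored dimension, and the fallback rule are all illustrative assumptions.

```python
import numpy as np

def ecm_safe_action(q_values, state, prev_state, danger_threshold, danger_dim=0):
    """Hypothetical ECM-style safety filter (illustrative sketch).

    Estimates the direction of state change as a finite difference,
    checks a boundary condition on one monitored state variable, and
    steers the agent away from unsafe states by skipping the greedy
    action when the boundary would be crossed.
    """
    gradient = state - prev_state  # crude "gradient" of the state over the Markov tuple
    # Boundary condition: would the monitored variable drift past the threshold?
    approaching = state[danger_dim] + gradient[danger_dim] > danger_threshold
    order = np.argsort(q_values)[::-1]  # action indices, best Q-value first
    if not approaching:
        return int(order[0])  # safe: take the greedy action
    # Rule: bypass the greedy action and take the runner-up instead.
    return int(order[1]) if len(order) > 1 else int(order[0])

# Usage: the monitored variable is at 0.9 and rising by 0.1 per step,
# so the predicted next value (1.0) exceeds the 0.95 boundary and the
# greedy action (index 0) is skipped in favour of the runner-up (index 1).
q = np.array([2.0, 1.5, 0.3])
a = ecm_safe_action(q, np.array([0.9, 0.0]), np.array([0.8, 0.0]), 0.95)
# With a static, safe state the greedy action is kept.
b = ecm_safe_action(q, np.array([0.5, 0.0]), np.array([0.5, 0.0]), 0.95)
```

In this sketch the filter wraps action selection only, so it can sit on top of any value-based learner (e.g. a DQN-style agent) without changing its training loop, which mirrors the abstract's claim that ECM guides action selection for basic deep reinforcement learning algorithms.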