CHENG Yuhu, FENG Huanting, WANG Xuesong. Actor-Critic Learning Based on Adaptive Importance Sampling[J]. Chinese Journal of Electronics, 2010, 19(4): 583-588.

Actor-Critic Learning Based on Adaptive Importance Sampling

  • Received Date: 2010-01-01
  • Rev Recd Date: 2010-03-01
  • Publish Date: 2010-11-25
  • For on-policy Actor-critic (AC) reinforcement learning, sampling is time-consuming and expensive. To reuse previously collected samples efficiently and to reduce large estimation variance, an off-policy AC learning algorithm based on an Adaptive importance sampling (AIS) technique is proposed. The Critic estimates the value function using least-squares temporal difference with eligibility traces together with the AIS technique. To control the trade-off between the bias and the variance of the policy-gradient estimate, a flattening factor is introduced into the importance weight in the AIS. The value of the flattening factor is determined automatically from the samples and policies by an importance-weight cross-validation method. Based on the policy gradient estimated by the Critic, the Actor updates the policy parameter so as to obtain an optimal control policy. Simulation results on a queuing problem show that AC learning based on AIS achieves good and stable learning performance and converges quickly.
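As a minimal sketch of the flattening idea described in the abstract, the snippet below assumes that the flattened weight is the ordinary importance ratio (target policy probability over behavior policy probability) raised to a power nu in [0, 1]. The function names, the symbol nu, and the sample numbers are hypothetical and chosen only for illustration; this is not the authors' implementation, and the cross-validation step that selects the factor is not shown.

```python
def flattened_weights(target_probs, behavior_probs, nu):
    """Return importance ratios raised to the flattening factor nu.

    Assumption: the flattening factor acts as an exponent on the
    ratio pi_target(a|s) / pi_behavior(a|s), with nu in [0, 1].
    """
    if not 0.0 <= nu <= 1.0:
        raise ValueError("nu must lie in [0, 1]")
    if len(target_probs) != len(behavior_probs):
        raise ValueError("probability lists must have equal length")
    return [(p / q) ** nu for p, q in zip(target_probs, behavior_probs)]


def weighted_mean(values, weights):
    """Importance-weighted sample average of `values`."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have equal length")
    return sum(w * v for w, v in zip(weights, values)) / len(values)


# Hypothetical data: probabilities of the sampled actions under the
# target policy and under the behavior policy that collected them,
# plus the reward observed for each sample.
target = [0.5, 0.2, 0.9]
behavior = [0.25, 0.4, 0.9]
rewards = [1.0, 0.0, 2.0]

for nu in (0.0, 0.5, 1.0):
    w = flattened_weights(target, behavior, nu)
    print(nu, w, weighted_mean(rewards, w))
```

Under this assumption, nu = 0 makes every weight equal to 1, so the mismatch between the two policies is ignored (low variance, but biased), while nu = 1 recovers plain importance sampling (the mismatch is fully corrected, at the cost of higher variance). Intermediate values trade one against the other.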
