Citation: CHEN Xingguo, GAO Yang, FAN Shunguo, "Temporal Difference Learning with Piecewise Linear Basis", Chinese Journal of Electronics, Vol.23, No.1, pp.49-54, 2014.
[1] R. Sutton and A. Barto, Reinforcement Learning: An Introduction, Cambridge, MA: MIT Press, 1998.
[2] S. Džeroski, L. De Raedt et al., "Relational reinforcement learning", Machine Learning, Vol.43, No.1, pp.7-52, 2001.
[3] F. Wang, N. Jin, D. Liu and Q. Wei, "Adaptive dynamic programming for finite-horizon optimal control of discrete-time nonlinear systems with ε-error bound", IEEE Transactions on Neural Networks, Vol.22, No.1, pp.1-13, 2011.
[4] Y. Cheng, X. Wang and Y. Zhang, "A Bayesian reinforcement learning algorithm based on abstract states for elevator group scheduling systems", Chinese Journal of Electronics, Vol.19, No.3, pp.394-398, 2010.
[5] X. Xu, D. Hu and X. Lu, "Kernel-based least squares policy iteration for reinforcement learning", IEEE Transactions on Neural Networks, Vol.18, No.4, pp.973-992, 2007.
R. Sutton,"Learning to predict by the methods of temporal differences", Machine Learning, Vol.3, No.1, pp.9-44, 1988.
|
[7] J. Boyan, "Technical update: Least-squares temporal difference learning", Machine Learning, Vol.49, No.2, pp.233-246, 2002.
[8] A. Geramifard, M. Bowling and R. Sutton, "Incremental least-squares temporal difference learning", in Proceedings of the 21st AAAI Conference on Artificial Intelligence, Boston, Massachusetts, USA, pp.356-361, 2006.
[9] R. Sutton et al., "A convergent O(n) algorithm for off-policy temporal difference learning with linear function approximation", in Advances in Neural Information Processing Systems 21, Vancouver, B.C., Canada, pp.1609-1616, 2008.
[10] R. Sutton, H. Maei et al., "Fast gradient descent methods for temporal difference learning with linear function approximation", in Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada, pp.993-1000, 2009.
[11] J. Tsitsiklis and B. Van Roy, "An analysis of temporal difference learning with function approximation", IEEE Transactions on Automatic Control, Vol.42, No.5, pp.674-690, 1997.
[12] R. Schoknecht, "Optimality of reinforcement learning algorithms with linear function approximation", in Advances in Neural Information Processing Systems 15, Vancouver, B.C., Canada, pp.1555-1562, 2002.
[13] B. Scherrer, "Should one compute the temporal difference fix point or minimize the Bellman residual? The unified oblique projection view", in Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, pp.959-966, 2010.
[14] M. Ghavamzadeh, A. Lazaric et al., "LSTD with random projections", in Advances in Neural Information Processing Systems 23, Vancouver, B.C., Canada, pp.721-729, 2010.
[15] D. Bertsekas, "Temporal difference methods for general projected equations", IEEE Transactions on Automatic Control, Vol.56, No.9, pp.2128-2139, 2011.
[16] R. Parr, C. Painter-Wakefield, L. Li and M. Littman, "Analyzing feature generation for value function approximation", in Proceedings of the 24th International Conference on Machine Learning, Corvallis, Oregon, USA, pp.737-744, 2007.
[17] M. Loth et al., "Sparse temporal difference learning using LASSO", in IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Honolulu, Hawaii, USA, pp.352-359, 2007.
[18] J. Kolter and A. Ng, "Regularization and feature selection in least-squares temporal difference learning", in Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada, pp.521-528, 2009.
[19] M. Hoffman et al., "Regularized least squares temporal difference learning with nested ℓ2 and ℓ1 penalization", in Proceedings of the European Workshop on Reinforcement Learning, Athens, Greece, pp.102-114, 2011.
[20] M. Ghavamzadeh et al., "Finite sample analysis of Lasso-TD", in Proceedings of the 28th International Conference on Machine Learning, Bellevue, Washington, USA, pp.1177-1184, 2011.
[21] M. Geist et al., "A Dantzig selector approach to temporal difference learning", in Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, pp.1399-1406, 2012.
[22] C. Phua and R. Fitch, "Tracking value function dynamics to improve reinforcement learning with piecewise linear function approximation", in Proceedings of the 24th International Conference on Machine Learning, Corvallis, Oregon, USA, pp.751-758, 2007.
[23] L. Zhao and J. Ding, "Least squares approximation to lognormal sum distribution via piecewise linear functions", in Proceedings of the 4th IEEE Conference on Industrial Electronics and Applications, Xi'an, China, pp.1324-1329, 2009.
[24] D. Bertsekas and J. Tsitsiklis, Neuro-Dynamic Programming, Belmont, MA: Athena Scientific, 1996.
[25] A. Dutech et al., "Reinforcement learning benchmarks and bake-offs II", in Workshop of the 17th Conference on Advances in Neural Information Processing Systems, Vancouver, B.C., Canada, 2005.
[26] N. Ferns, P. Castro, D. Precup and P. Panangaden, "Methods for computing state similarity in Markov decision processes", in Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence, Cambridge, MA, USA, pp.174-181, 2006.
[27] N. Ferns, P. Panangaden and D. Precup, "Bisimulation metrics for continuous Markov decision processes", SIAM Journal on Computing, Vol.40, No.6, pp.1662-1714, 2011.