[1] 阮晓钢, 黄静, 范青武, 等. 一种基于操作条件反射原理的学习模型[J]. 控制与决策, 2014, 29(6): 1016-1020.
Ruan Xiaogang, Huang Jing, Fan Qingwu, et al. A learning model based on operant conditioning principles[J]. Control and Decision, 2014, 29(6): 1016-1020.
[2] 郜园园, 阮晓钢, 宋洪军. 操作条件反射学习自动机及其在机器人平衡控制中的应用[J]. 控制与决策, 2013, 28(6): 930-934.
Gao Yuanyuan, Ruan Xiaogang, Song Hongjun. Operant conditioning learning automaton and its application to robot balance control[J]. Control and Decision, 2013, 28(6): 930-934.
[3] Cutsuridis V, Taylor J G. A cognitive control architecture for the perception-action cycle in robots and agents[J]. Cognitive Computation (S1866-9956), 2013, 5(3): 383-395.
[4] 林歆悠, 薛瑞, 孙冬野. SPHEB基于动态规划的规则控制策略研究[J]. 系统仿真学报, 2013, 25(5): 1077-1082.
Lin Xinyou, Xue Rui, Sun Dongye. Rule-based strategy from dynamic programming for novel series-parallel hybrid electric city bus[J]. Journal of System Simulation, 2013, 25(5): 1077-1082.
[5] 方啸, 郑德忠. 基于自适应动态规划算法的小车自主导航控制策略设计[J]. 燕山大学学报, 2014, 38(1): 57-65.
Fang Xiao, Zheng Dezhong. Control strategy design for car autonomous navigation using adaptive dynamic programming[J]. Journal of Yanshan University, 2014, 38(1): 57-65.
[6] Ruan Xiaogang, Chen Jing, Yu Naigong. Thalamic cooperation between cerebellum and basal ganglia based on a new tropism-based action-dependent heuristic programming method[J]. Neurocomputing (S0925-2312), 2012, 93: 27-40.
[7] 沈晶, 顾国昌, 刘海波. 分层强化学习研究综述[J]. 模式识别与人工智能, 2005, 18(5): 574-581.
Shen Jing, Gu Guochang, Liu Haibo. A survey of hierarchical reinforcement learning[J]. Pattern Recognition and Artificial Intelligence, 2005, 18(5): 574-581.
[8] Doya K. Reinforcement learning in continuous time and space[J]. Neural Computation (S0899-7667), 2000, 12(1): 219-245.
[9] He H, Ni Z, Fu J. A three-network architecture for on-line learning and optimization based on adaptive dynamic programming[J]. Neurocomputing (S0925-2312), 2012, 78(1): 3-13.
[10] 杜治, 苏宇, 彭昌勇, 等. 基于多层次启发式动态规划算法的电力系统动态等值[J]. 电力系统保护与控制, 2016, 44(17): 1-9.
Du Zhi, Su Yu, Peng Changyong, et al. Dynamic equivalent of power system based on goal representation heuristic dynamic programming algorithm[J]. Power System Protection and Control, 2016, 44(17): 1-9.
[11] 沈郁, 陈伟彪, 姚伟, 等. 采用新型自适应动态规划算法的柔性直流输电附加阻尼控制[J]. 电网技术, 2016, 40(12): 3768-3774.
Shen Yu, Chen Weibiao, Yao Wei, et al. Supplementary damping control of VSC-HVDC transmission system using a novel heuristic dynamic programming[J]. Power System Technology, 2016, 40(12): 3768-3774.
[12] Liu B, Li S, Lou Y, et al. RETRACTED ARTICLE: A hierarchical learning architecture with multiple-goal representations and multiple timescale based on approximate dynamic programming[J]. Neural Computing and Applications (S0941-0643), 2013, 22(6): 1257.
[13] Liu D, Xiong X, Zhang Y. Action-dependent adaptive critic designs[C]// Proceedings of the International Joint Conference on Neural Networks. 2001: 990-995.
[14] Si J, Barto A G, Powell W B, et al. Handbook of Learning and Approximate Dynamic Programming[M]. New York: Wiley, 2004.
[15] Kobayashi Y, Inoue Y, Yamamoto M, et al. Contribution of pedunculopontine tegmental nucleus neurons to performance of visually guided saccade tasks in monkeys[J]. Journal of Neurophysiology (S0022-3077), 2002, 88(2): 715-731.
[16] Kobayashi Y, Okada K I. Reward prediction error computation in the pedunculopontine tegmental nucleus neurons[J]. Annals of the New York Academy of Sciences (S0077-8923), 2007, 1104(1): 310-323.
[17] Botvinick M M, Niv Y, Barto A C. Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective[J]. Cognition (S0010-0277), 2009, 113(3): 262-280.