TY - GEN
T1 - Value Function Dynamic Estimation in Reinforcement Learning based on Data Adequacy
AU - Gao, Huifan
AU - Pan, Yinghui
AU - Tang, Jing
AU - Zeng, Yifeng
AU - Chai, Peihua
AU - Cao, Langcai
N1 - Funding Information: This work is supported in part by the National Natural Science Foundation of China (Grant Nos. 61772442, 61836005, 11671335, 61562033, and 61806089).
PY - 2020/7/3
Y1 - 2020/7/3
AB - In recent years, reinforcement learning has played an important role in the study of decision problems in computer games. To better estimate the value function with limited computational resources, this paper proposes a dynamic value function estimation method based on data adequacy. Because the states in the MDP model vary in complexity, the proposed method estimates the value function dynamically, unlike traditional methods, which use a fixed estimation scheme. Using the PigChase challenge of the Malmo project, launched by Microsoft in 2017, we compare the new method with existing techniques. Experimental results show that the proposed algorithm outperforms traditional algorithms.
KW - confidence interval
KW - dynamic programming
KW - probability distribution
KW - q-learning
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85090892837&partnerID=8YFLogxK
U2 - 10.1145/3409501.3409517
DO - 10.1145/3409501.3409517
M3 - Conference contribution
AN - SCOPUS:85090892837
T3 - ACM International Conference Proceeding Series
SP - 204
EP - 208
BT - Proceedings of the 2020 4th High Performance Computing and Cluster Technologies Conference, HPCCT 2020 and 3rd International Conference on Big Data and Artificial Intelligence, BDAI 2020
PB - ACM
T2 - 4th High Performance Computing and Cluster Technologies Conference, HPCCT 2020 and the 3rd International Conference on Big Data and Artificial Intelligence, BDAI 2020
Y2 - 3 July 2020 through 6 July 2020
ER -