Value Function Dynamic Estimation in Reinforcement Learning based on Data Adequacy

Huifan Gao, Yinghui Pan, Jing Tang, Yifeng Zeng, Peihua Chai, Langcai Cao

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

3 Citations (Scopus)

Abstract

In recent years, reinforcement learning has played an important role in the study of decision problems in computer games. To address the problem of how to estimate the value function more accurately with limited computational resources, this paper proposes a dynamic value function estimation method based on data adequacy. Because the complexity of individual states in an MDP model varies, the proposed method estimates the value function dynamically, in contrast to the fixed value function estimation used in traditional methods. Using the PigChase challenge, launched by Microsoft in 2017 as part of the Malmo project, we compare the new method with existing techniques. Experimental results show that the proposed algorithm outperforms the traditional algorithms it was compared against.

Original language: English
Title of host publication: Proceedings of the 2020 4th High Performance Computing and Cluster Technologies Conference, HPCCT 2020 and 3rd International Conference on Big Data and Artificial Intelligence, BDAI 2020
Publisher: ACM
Pages: 204-208
Number of pages: 5
ISBN (Electronic): 9781450375603
Publication status: Published, 3 Jul 2020
Event: 4th High Performance Computing and Cluster Technologies Conference, HPCCT 2020 and the 3rd International Conference on Big Data and Artificial Intelligence, BDAI 2020, Qingdao, Online, China
Duration: 3 Jul 2020 to 6 Jul 2020

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 4th High Performance Computing and Cluster Technologies Conference, HPCCT 2020 and the 3rd International Conference on Big Data and Artificial Intelligence, BDAI 2020
Country/Territory: China
City: Qingdao, Online
Period: 3/07/20 to 6/07/20

Keywords

  • confidence interval
  • dynamic programming
  • probability distribution
  • q-learning
  • reinforcement learning

