Cooperative Multiagent Learning and Exploration With Min-Max Intrinsic Motivation

Yaqing Hou, Jie Kang, Haiyin Piao, Yifeng Zeng, Yew-Soon Ong, Yaochu Jin, Qiang Zhang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)
28 Downloads (Pure)

Abstract

In the field of multiagent reinforcement learning (MARL), the ability to effectively explore unknown environments and collect the information and experiences most beneficial for policy learning is a critical research area. However, existing work often struggles with the uncertainties caused by state changes and the inconsistencies between agents' local observations and global information, which pose significant challenges to coordinated exploration among multiple agents. To address this issue, this article proposes a novel MARL exploration method with Min-Max intrinsic motivation (E2M) that promotes the learning of joint agent policies by introducing surprise minimization and social influence maximization. Since agents are subject to unstable state changes in the environment, we introduce surprise minimization by computing state entropy to encourage the agents to seek more stable and familiar situations. This method estimates surprise from low-dimensional representations of states obtained from random encoders. Furthermore, to prevent surprise minimization from leading to overly conservative policies, we introduce the mutual information between agents' behaviors as a social influence term. By maximizing social influence, the agents are encouraged to interact, facilitating the emergence of cooperative behavior. The performance of our proposed E2M is evaluated across a range of popular StarCraft II and Multiagent MuJoCo tasks. Comprehensive results demonstrate its effectiveness in enhancing the cooperative capability of multiple agents.
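The sketch below is an illustrative toy only, not the authors' implementation. It assumes a hypothetical intrinsic-reward module combining the two ideas named in the abstract: a fixed random linear encoder producing low-dimensional state representations, a k-nearest-neighbor state-entropy estimate used as a "surprise" signal to be minimized, and a simple plug-in estimate of the mutual information between two agents' discrete actions as a social-influence term to be maximized. All class and parameter names (e.g., MinMaxIntrinsicReward, alpha, beta, k) are invented for illustration.

```python
# Hedged sketch of a Min-Max intrinsic reward (surprise minimization +
# social-influence maximization). Not the paper's code; names are hypothetical.
import numpy as np


class MinMaxIntrinsicReward:
    def __init__(self, state_dim, latent_dim=8, k=5, n_actions=5, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed (untrained) random encoder: a random projection of the raw state.
        self.W = rng.normal(size=(state_dim, latent_dim)) / np.sqrt(state_dim)
        self.k = k
        self.latents = []                                     # buffer of encoded states
        self.joint_counts = np.ones((n_actions, n_actions))   # Laplace-smoothed counts

    def encode(self, state):
        return np.asarray(state) @ self.W

    def surprise(self, state):
        """k-NN particle estimate of state entropy: a larger distance to the
        k-th nearest stored latent indicates a less familiar (more surprising)
        state. The Min part of the objective minimizes this term."""
        z = self.encode(state)
        self.latents.append(z)
        if len(self.latents) <= self.k:
            return 0.0
        dists = np.linalg.norm(np.stack(self.latents[:-1]) - z, axis=1)
        return float(np.log(np.sort(dists)[self.k - 1] + 1e-8))

    def social_influence(self, a_i, a_j):
        """Plug-in mutual information between two agents' actions from a
        running joint-count table. The Max part maximizes this term."""
        self.joint_counts[a_i, a_j] += 1
        p = self.joint_counts / self.joint_counts.sum()
        pi = p.sum(axis=1, keepdims=True)
        pj = p.sum(axis=0, keepdims=True)
        return float(np.sum(p * np.log(p / (pi * pj))))

    def intrinsic_reward(self, state, a_i, a_j, alpha=0.1, beta=0.1):
        # Min-Max shaping: reward low surprise and high social influence.
        return -alpha * self.surprise(state) + beta * self.social_influence(a_i, a_j)


# Toy usage with random states and actions from two agents.
module = MinMaxIntrinsicReward(state_dim=16)
for _ in range(100):
    s = np.random.randn(16)
    r_int = module.intrinsic_reward(s, np.random.randint(5), np.random.randint(5))
```

In practice such an intrinsic term would be added to the environment reward during MARL training; the paper itself evaluates E2M on StarCraft II and Multiagent MuJoCo tasks rather than this toy setting.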

Original language: English
Pages (from-to): 2852-2864
Number of pages: 13
Journal: IEEE Transactions on Cybernetics
Volume: 55
Issue number: 6
Early online date: 18 Apr 2025
DOIs
Publication status: Published - 1 Jun 2025

Keywords

  • Cooperation
  • exploration
  • multiagent reinforcement learning (MARL)
  • multiagent systems
