TY - JOUR
T1 - Cooperative Multiagent Learning and Exploration With Min-Max Intrinsic Motivation
AU - Hou, Yaqing
AU - Kang, Jie
AU - Piao, Haiyin
AU - Zeng, Yifeng
AU - Ong, Yew-Soon
AU - Jin, Yaochu
AU - Zhang, Qiang
PY - 2025/6/1
Y1 - 2025/6/1
N2 - In multiagent reinforcement learning (MARL), the ability to effectively explore unknown environments and collect the information and experiences most beneficial for policy learning is a critical research area. However, existing work often struggles with the uncertainties caused by state changes and the inconsistencies between agents' local observations and global information, which pose significant challenges to coordinated exploration among multiple agents. To address this issue, this article proposes a novel MARL exploration method with Min-Max intrinsic motivation (E2M) that promotes the learning of joint agent policies by introducing surprise minimization and social influence maximization. Because agents are subject to unstable state changes in the environment, we introduce surprise minimization, computed from state entropy, to encourage the agents to seek more stable and familiar situations. Surprise is estimated from low-dimensional state representations obtained from random encoders. Furthermore, to prevent surprise minimization from yielding overly conservative policies, we introduce the mutual information between agents' behaviors as a measure of social influence. By maximizing social influence, the agents are encouraged to interact, facilitating the emergence of cooperative behavior. The performance of the proposed E2M is evaluated across a range of popular StarCraft II and Multiagent MuJoCo tasks. Comprehensive results demonstrate its effectiveness in enhancing the cooperative capability of multiple agents.
AB - In multiagent reinforcement learning (MARL), the ability to effectively explore unknown environments and collect the information and experiences most beneficial for policy learning is a critical research area. However, existing work often struggles with the uncertainties caused by state changes and the inconsistencies between agents' local observations and global information, which pose significant challenges to coordinated exploration among multiple agents. To address this issue, this article proposes a novel MARL exploration method with Min-Max intrinsic motivation (E2M) that promotes the learning of joint agent policies by introducing surprise minimization and social influence maximization. Because agents are subject to unstable state changes in the environment, we introduce surprise minimization, computed from state entropy, to encourage the agents to seek more stable and familiar situations. Surprise is estimated from low-dimensional state representations obtained from random encoders. Furthermore, to prevent surprise minimization from yielding overly conservative policies, we introduce the mutual information between agents' behaviors as a measure of social influence. By maximizing social influence, the agents are encouraged to interact, facilitating the emergence of cooperative behavior. The performance of the proposed E2M is evaluated across a range of popular StarCraft II and Multiagent MuJoCo tasks. Comprehensive results demonstrate its effectiveness in enhancing the cooperative capability of multiple agents.
KW - Cooperation
KW - exploration
KW - multiagent reinforcement learning (MARL)
KW - multiagent systems
UR - http://www.scopus.com/inward/record.url?scp=105002843274&partnerID=8YFLogxK
U2 - 10.1109/TCYB.2025.3557694
DO - 10.1109/TCYB.2025.3557694
M3 - Article
AN - SCOPUS:105002843274
SN - 2168-2267
VL - 55
SP - 2852
EP - 2864
JO - IEEE Transactions on Cybernetics
JF - IEEE Transactions on Cybernetics
IS - 6
ER -