Cooperative Multi-Agent Reinforcement Learning With Approximate Model Learning

Park, Young Joon; Lee, Young Jae; Kim, Seoung Bum

doi:10.1109/ACCESS.2020.3007219

Authors: Park, Young Joon; Lee, Young Jae; Kim, Seoung Bum

Issue Date: 2020

Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

Keywords: Reinforcement learning; model-free method; multi-agent system; multi-agent cooperation; actor-critic method; deterministic policy gradient

Citation: IEEE ACCESS, v.8, pp.125389-125400

Indexed: SCIE; SCOPUS

Journal Title: IEEE ACCESS

Volume: 8

Start Page: 125389

End Page: 125400

URI: https://scholar.korea.ac.kr/handle/2021.sw.korea/58989

DOI: 10.1109/ACCESS.2020.3007219

ISSN: 2169-3536

Abstract: In multi-agent reinforcement learning, it is essential for agents to learn a communication protocol in order to optimize collaboration policies and to mitigate unstable learning. Existing methods based on actor-critic networks address the communication problem among agents. However, these methods have difficulty improving sample efficiency and learning robust policies, because the dynamics and nonstationarity of the environment are hard to capture as the policies of other agents change. We propose a method for learning cooperative policies in multi-agent environments by considering the communications among agents. The proposed method consists of recurrent neural network-based actor-critic networks and deterministic policy gradients to centrally train decentralized policies. The actor networks let the agents communicate through forward and backward paths and determine subsequent actions. The critic network helps to train the actor networks by sending gradient signals to the actors according to their contribution to the global reward. To address partial observability and unstable learning, we propose using auxiliary prediction networks to approximate the state transitions and the reward function. We used multi-agent environments to evaluate the proposed method by comparing it with existing multi-agent reinforcement learning methods in terms of both learning efficiency and goal achievement in the test phase. The results demonstrate that the proposed method outperformed the alternatives.
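Illustrative note: the abstract mentions auxiliary prediction networks that approximate state transitions and the reward function. The following is a minimal sketch of that general idea only, not the authors' architecture. It assumes a hypothetical one-dimensional toy environment (`true_step`) and swaps the paper's neural networks for a plain linear predictor trained by stochastic gradient descent; all names, coefficients, and hyperparameters are illustrative assumptions.

```python
import random


def true_step(s, a):
    """Hypothetical toy dynamics, used only to generate training data."""
    next_s = 0.9 * s + 0.5 * a
    reward = 0.2 * s - 0.1 * a
    return next_s, reward


class AuxiliaryModel:
    """Linear predictor of (next state, reward) from (state, action).

    Assumption: a linear model stands in for the auxiliary prediction
    networks described in the abstract.
    """

    def __init__(self):
        self.w_state = [0.0, 0.0]   # weights on (s, a) for the next state
        self.w_reward = [0.0, 0.0]  # weights on (s, a) for the reward

    def predict(self, s, a):
        pred_s = self.w_state[0] * s + self.w_state[1] * a
        pred_r = self.w_reward[0] * s + self.w_reward[1] * a
        return pred_s, pred_r

    def update(self, s, a, next_s, reward, lr=0.05):
        """One SGD step on 0.5 * (squared transition error + squared reward error)."""
        pred_s, pred_r = self.predict(s, a)
        err_s = pred_s - next_s
        err_r = pred_r - reward
        for i, x in enumerate((s, a)):
            self.w_state[i] -= lr * err_s * x
            self.w_reward[i] -= lr * err_r * x
        # Return the summed squared error measured before the update.
        return err_s ** 2 + err_r ** 2


def train(steps=2000, seed=0):
    """Fit the auxiliary model on randomly sampled (state, action) pairs."""
    rng = random.Random(seed)
    model = AuxiliaryModel()
    losses = []
    for _ in range(steps):
        s, a = rng.uniform(-1, 1), rng.uniform(-1, 1)
        next_s, reward = true_step(s, a)
        losses.append(model.update(s, a, next_s, reward))
    return model, losses
```

Calling `train()` returns the fitted model and the per-step losses. In a full actor-critic setup, a prediction loss of this kind would typically be added to the main training objective as an auxiliary term; how the paper combines the terms is not stated in this record.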

Appears in Collections: College of Engineering > School of Industrial and Management Engineering > 1. Journal Articles

Related Researcher

KIM, Seoung Bum: College of Engineering (School of Industrial and Management Engineering)