The Mutual Support Between Smart Grids and Smart Transportation
With AlphaGo’s victory, the launch of driverless taxis, and an AI agent winning a silver medal at the International Mathematical Olympiad (IMO), the rapid development of artificial intelligence is changing people’s lives and repeatedly reshaping our understanding of learning. However, intelligent agents still generally fall short in environmental perception and in executing responses, so two questions have become key challenges for the large-scale application of AI: how multiple agents can learn to collaborate efficiently, and how AI technology can move from theory to practice.
A research project titled “Efficient Reinforcement Learning for Large-Scale Multi-Agent Systems,” led by Professor Yaodong Yang’s research group at the Institute for Artificial Intelligence, Peking University, in collaboration with the School of Engineering and the School of Computer Science and Technology at Peking University and with King’s College London, has been published in *Nature Machine Intelligence*, a top-tier academic journal in artificial intelligence. This work achieves, for the first time, efficient decentralized collaborative decision-making in multi-agent systems, significantly improving the scalability and applicability of artificial intelligence methods in large-scale multi-agent systems.
1. Exploring cooperation among intelligent agents through reinforcement learning
In plain terms, “Efficient Reinforcement Learning for Large-Scale Multi-Agent Systems” means extending the reinforcement learning techniques used by AlphaGo from single-agent systems to multi-agent systems, which is a significant advancement. “This means we can now control multiple agents simultaneously, for example, in applications such as traffic lights, power grids, and autonomous vehicles. These systems involve the collaborative operation of numerous agents. The social impact of this work lies in the fact that it is the first multi-agent reinforcement learning application led by Chinese researchers, breaking the previous monopoly of Western institutions in this field,” explained Yang Yaodong, the corresponding author of the paper and a researcher at the Institute for Artificial Intelligence, Peking University.
As an efficient learning paradigm, reinforcement learning is widely used in the field of artificial intelligence and has given rise to many well-known game-playing systems, such as AlphaStar in StarCraft, AI opponents in Honor of Kings, and AlphaGo. The explosive popularity of ChatGPT, which was trained with reinforcement learning from human feedback, further demonstrates the importance of reinforcement learning and its core position in the development of artificial intelligence.
“Researching reinforcement learning in multi-agent systems is a complex problem because when multiple agents learn together, they engage in complex interactions such as cooperation, competition, and strategic play. Our goal is to explore how each agent can effectively learn and form policies in a multi-agent environment,” said Yang Yaodong. He explained that whereas previous studies often dealt with only a few agents, the research team extended the approach to real-world scenarios with hundreds or even thousands of agents. The challenge arises because the space of possible interactions among agents grows exponentially with the number of agents, leading to the so-called “curse of dimensionality.” “To address this problem, we designed an internal representation structure for multi-agent systems that allows the multi-agent problem to be scaled up effectively, providing a new solution for reinforcement learning in large-scale agent systems.”
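The scaling problem described above can be made concrete with a little arithmetic. The sketch below is only an illustration, not the paper's method: it assumes each agent chooses from a fixed number of discrete actions, and it compares the number of joint actions a centralized controller would face with the number for one agent and a few neighbors. The function names and the numbers are hypothetical.

```python
def joint_action_count(num_agents: int, actions_per_agent: int) -> int:
    """Joint actions a single centralized controller would have to consider."""
    return actions_per_agent ** num_agents


def neighborhood_action_count(num_neighbors: int, actions_per_agent: int) -> int:
    """Joint actions over one agent and its direct neighbors only."""
    return actions_per_agent ** (num_neighbors + 1)


if __name__ == "__main__":
    # Illustrative numbers only: 4 actions per agent, 2 neighbors per agent.
    for n in (2, 10, 100):
        print(n, joint_action_count(n, 4), neighborhood_action_count(2, 4))
```

With 4 actions per agent, 10 agents already yield 1,048,576 joint actions, while a neighborhood of three agents stays at 64 however large the whole system becomes.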
The algorithm reportedly reduces interaction costs significantly without sacrificing decision-making performance, thereby improving scalability in large-scale systems. The paper’s first author, Chengdong Ma, a doctoral student at the Institute for Artificial Intelligence at Peking University, explained: “This algorithm decouples the global dynamic characteristics of the system, enabling each agent to independently learn local dynamic characteristics and decentralized policies, transforming complex large-scale multi-agent decision-making problems into more easily solvable optimization problems.” Multiple test results show that this method scales efficiently to complex systems containing hundreds or thousands of agents, advancing the application of artificial intelligence algorithms at scale.
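The idea of each agent learning only local dynamics can be illustrated with a toy example. The sketch below is not the published algorithm. It assumes a hypothetical ring of binary-state agents whose true update rule depends only on each agent's two neighbors, and it lets every agent fill in a small lookup table from its own neighborhood observations. All names, the environment, and the update rule are assumptions made for illustration.

```python
import random
from typing import Dict, List, Tuple

Neighborhood = Tuple[int, int, int]  # (left neighbor, own state, right neighbor)


def true_step(state: List[int]) -> List[int]:
    """Hypothetical environment: each agent becomes the XOR of its two ring neighbors."""
    n = len(state)
    return [state[(i - 1) % n] ^ state[(i + 1) % n] for i in range(n)]


def observe(state: List[int], i: int) -> Neighborhood:
    """What agent i can see: only itself and its two neighbors."""
    n = len(state)
    return (state[(i - 1) % n], state[i], state[(i + 1) % n])


def learn_local_models(
    num_agents: int, num_samples: int, seed: int = 0
) -> List[Dict[Neighborhood, int]]:
    """Each agent records, from local observations only, its own next state."""
    rng = random.Random(seed)
    models: List[Dict[Neighborhood, int]] = [{} for _ in range(num_agents)]
    for _ in range(num_samples):
        state = [rng.randint(0, 1) for _ in range(num_agents)]
        next_state = true_step(state)
        for i in range(num_agents):
            models[i][observe(state, i)] = next_state[i]
    return models


def predict(models: List[Dict[Neighborhood, int]], state: List[int]) -> List[int]:
    """Compose the agents' local predictions; unseen patterns fall back to the current state."""
    return [models[i].get(observe(state, i), state[i]) for i in range(len(state))]


if __name__ == "__main__":
    models = learn_local_models(num_agents=12, num_samples=200)
    state = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1]
    print(predict(models, state) == true_step(state))
    print(max(len(m) for m in models))  # table size per agent, at most 8
```

Each agent's table has at most 8 entries regardless of how many agents there are, whereas a single global table over 12 binary agents would need up to 4,096 entries. Real systems have continuous states and noisy dynamics, so a learned function approximator would stand in for the lookup table.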
2. The difficulty of AI collaboration lies in the word “many”
With the development of artificial intelligence, more and more AI systems are being applied in various fields. These systems are often not isolated entities. In complex real-world environments, how can they coexist harmoniously? Enabling each agent to make efficient decentralized collaborative decisions in a large-scale environment involving hundreds of agents, without relying on global information, has become key to advanced applications of artificial intelligence.
Yang Yaodong explained that the highlight of the research is a method that works even when the communication network does not allow global communication. The method uses the network structure to connect a large number of agents, in a way similar to the “six degrees of separation” theory (i.e., any two strangers can be connected through a chain of acquaintances involving no more than six people): a few intermediate nodes are enough to connect an agent to a much wider network.
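The “six degrees of separation” analogy can be illustrated with a small graph experiment. The sketch below is an assumption-laden toy, not the communication scheme from the paper: it builds a hypothetical ring of 20 agents that can talk only to their immediate neighbors, measures the largest number of hops between any two agents with breadth-first search, then adds two long-range links and measures again. The graph, the links, and the function names are chosen only for illustration.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[int, Set[int]]


def ring_graph(n: int) -> Graph:
    """Each agent is linked only to its two immediate neighbors."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def add_links(graph: Graph, links: Iterable[Tuple[int, int]]) -> None:
    """Add undirected long-range links between pairs of agents."""
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)


def hops_from(graph: Graph, source: int) -> Dict[int, int]:
    """Breadth-first search: fewest hops from source to every reachable agent."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def max_hops(graph: Graph) -> int:
    """Largest hop count between any pair of agents (the graph diameter)."""
    return max(max(hops_from(graph, s).values()) for s in graph)


if __name__ == "__main__":
    graph = ring_graph(20)
    print("ring only:", max_hops(graph))
    add_links(graph, [(0, 10), (5, 15)])  # two hypothetical intermediate links
    print("with two long-range links:", max_hops(graph))
```

On the plain ring the farthest pair of agents is 10 hops apart. Adding just two long-range links lowers that worst case, which is the intuition behind relying on a few intermediate nodes instead of global communication.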
“Simply put, it’s a ‘world model,’ predicting your predictions,” Yang Yaodong added. He also stated that capturing global information in a localized way assists the decision-making process. This method not only solves the problem of how to achieve effective collaboration under limited communication conditions but also provides a new perspective for understanding and optimizing information dissemination and decision-making in decentralized systems.
“Because the interaction costs between each control unit and between the control unit and the environment are very high, and such interactions carry certain security risks (such as in large power grid systems), these systems often have objective communication limitations (such as long communication distances, privacy risks in global communication, and energy consumption limitations), making it difficult for control units to achieve global information exchange. These challenges hinder the expansion and application of artificial intelligence decision-making algorithms in large-scale systems. Against this backdrop, there is an urgent need for an algorithm that can help large-scale multi-agent systems achieve efficient, low-interaction-cost, and decentralized decision-making capabilities. This is the challenge that this research has overcome,” said Yang Yaodong.
According to reports, this research has already been applied in the fields of smart transportation (OTTAI-ITS) and smart energy. “Smart grids are relatively simple to implement because their settings can be determined by humans, while autonomous driving involves more interactions between intelligent agents and social issues. Through effective decoupling, smart grid management, autonomous driving, and travel efficiency have achieved approximately 30% improvement in energy utilization and 50% reduction in energy consumption, respectively, while maintaining performance, significantly improving overall efficiency,” said Yang Yaodong. He added that although autonomous driving currently faces many challenges, the trend toward greater intelligence will greatly improve traffic efficiency and reduce accident rates. Furthermore, heterogeneous intelligent agents make it easier for different kinds of agents to work together, which will be crucial to the future development and widespread adoption of intelligent systems.