Systems Engineering and Electronics ›› 2022, Vol. 44 ›› Issue (7): 2311-2318. doi: 10.12305/j.issn.1001-506X.2022.07.28

• Communications and Networks •

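Network routing optimization approach based on deep reinforcement learning

Lingyu MENG1,2, Bingli GUO1,2,*, Wen YANG1,2, Xinwei ZHANG1,2, Zuoqing ZHAO1,2, Shanguo HUANG1,2

  1. School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2. State Key Laboratory of Information Photonics and Optical Communication, Beijing 100876, China
  • Received: 2021-06-29 Online: 2022-06-22 Published: 2022-06-28
  • Contact: Bingli GUO
  • About the authors:
    Lingyu MENG (孟泠宇, born 1996), male, master's student. Research interests: deep reinforcement learning and data center network resource optimization.
    Bingli GUO (郭秉礼, born 1982), male, associate professor, master's supervisor, Ph.D. Research interests: optical interconnection network technologies for data centers and high-performance computing, and control technologies for space-ground integrated networks.
    Wen YANG (杨雯, born 1995), female, master's student. Research interests: reinforcement learning algorithms and performance optimization of data center optical networks.
    Xinwei ZHANG (张欣伟, born 1998), male, master's student. Research interests: data center optical interconnects.
    Zuoqing ZHAO (赵柞青, born 1994), male, Ph.D. student. Research interests: data center optical interconnect architectures and scheduling algorithms.
    Shanguo HUANG (黄善国, born 1978), male, professor, doctoral supervisor, Ph.D. Research interests: multidimensional optical switching, and optical network theory and technology.
  • Supported by:
    National Natural Science Foundation of China (61771074); National Key Research and Development Program of China (2018YFB1801702)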

Abstract:

To address routing optimization under different network loads on the same network topology, two optimization methods based on deep reinforcement learning are proposed that assign routes according to the current network traffic state. Through iterative interaction between a network simulation system and the deep reinforcement learning model, network routing is continuously trained and optimized with respect to the distribution of traffic relationships. The deep deterministic policy gradient (DDPG) algorithm is adapted and improved so that it is better suited to the network routing optimization problem. In addition, a new link weight construction strategy is designed, which uses network traffic to construct the state elements fed to the neural network; this preprocessing of the raw data improves the learning efficiency of the neural network and greatly improves the stability of model training. Furthermore, for high-dimensional, large-scale networks, the continuous action space is discretized, which reduces the complexity of the action space and speeds up model convergence. Experimental results show that the proposed methods adapt to changing traffic and link states, make model training more stable, and improve network performance.

Key words: deep reinforcement learning, routing optimization, deep deterministic policy gradient (DDPG) algorithm
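
The abstract does not say how the continuous action space is discretized. The following is only a minimal sketch of one common way to do it, assuming the actor emits one continuous value per link in [-1, 1] (for example from a tanh output layer) and that each value is binned into one of a few integer link weights. The function name `discretize_action` and the parameters `levels`, `low`, and `high` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def discretize_action(raw_action: Sequence[float], levels: int = 5,
                      low: float = -1.0, high: float = 1.0) -> List[int]:
    """Bin each continuous per-link output into an integer weight in 1..levels.

    Assumption (not from the paper): one actor output per link, nominally
    in [low, high]; out-of-range values are clipped before binning.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    if high <= low:
        raise ValueError("high must be greater than low")

    weights = []
    for value in raw_action:
        clipped = min(max(value, low), high)        # keep value inside [low, high]
        fraction = (clipped - low) / (high - low)   # rescale to [0, 1]
        # Equal-width bins; the upper edge falls into the last bin.
        bin_index = min(int(fraction * levels), levels - 1)
        weights.append(bin_index + 1)               # link weights start at 1
    return weights


if __name__ == "__main__":
    # Three links, five weight levels: the action space has 5**3 joint choices
    # instead of a continuum.
    print(discretize_action([-1.0, 0.0, 1.0], levels=5))
```

Under these assumptions, the resulting integer weights could be handed to a shortest-path routine in the network simulator, so the agent chooses among a finite set of weight settings rather than searching a continuous space.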

CLC Number: