Network routing optimization approach based on deep reinforcement learning

Systems Engineering and Electronics, 2022, Vol. 44, Issue 7: 2311-2318. doi: 10.12305/j.issn.1001-506X.2022.07.28

Lingyu MENG^1,2, Bingli GUO^1,2,*, Wen YANG^1,2, Xinwei ZHANG^1,2, Zuoqing ZHAO^1,2, Shanguo HUANG^1,2

1. School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
2. State Key Laboratory of Information Photonics and Optical Communication, Beijing 100876, China

Received: 2021-06-29 Online: 2022-06-22 Published: 2022-06-28
Contact: Bingli GUO
About the authors:
Lingyu MENG (born 1996), male, master's student; research interests: deep reinforcement learning and data center network resource optimization.
Bingli GUO (born 1982), male, associate professor, master's supervisor, Ph.D.; research interests: optical interconnect network technologies for data centers and high-performance computing, and space-ground integrated network control technologies.
Wen YANG (born 1995), female, master's student; research interests: reinforcement learning algorithms and data center optical network performance optimization.
Xinwei ZHANG (born 1998), male, master's student; research interest: data center optical interconnects.
Zuoqing ZHAO (born 1994), male, doctoral student; research interests: data center optical interconnect architectures and scheduling algorithms.
Shanguo HUANG (born 1978), male, professor, doctoral supervisor, Ph.D.; research interests: multi-dimensional optical switching, and optical network theory and technology.
Funding:
National Natural Science Foundation of China (61771074); National Key Research and Development Program of China (2018YFB1801702)

Abstract:

To address routing optimization under varying network loads on a fixed network topology, two optimization methods based on deep reinforcement learning are proposed that assign routes according to the current network traffic state. Through iterative interaction between a network simulation system and the deep reinforcement learning model, network routing is continuously trained and optimized with respect to the distribution of traffic relationships. The deep deterministic policy gradient (DDPG) algorithm is improved so that it is better suited to network routing optimization. In addition, a new link weight construction strategy is designed that uses network traffic to construct the input state elements of the neural network; preprocessing the raw data in this way improves the learning efficiency of the neural network and greatly improves the stability of the trained model. For the continuous action space of high-dimensional, large-scale networks, the action space is discretized, which effectively reduces its complexity and speeds up model convergence. Experimental results show that the proposed optimization methods can adapt to changing traffic and link states, enhance the stability of model training, and improve network performance.
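The abstract does not spell out how the action space is discretized. As a minimal sketch of the general idea only, and not the authors' scheme, the Python snippet below snaps each component of a continuous action vector (interpreted here, as an assumption, as per-link weights in [0, 1]) to the nearest of a small number of evenly spaced levels. The function name `discretize_action` and the parameters `low`, `high`, and `levels` are hypothetical.

```python
import numpy as np


def discretize_action(action, low=0.0, high=1.0, levels=5):
    """Snap each continuous action component to the nearest of `levels`
    evenly spaced values in [low, high].

    Hypothetical helper for illustration; the paper's actual
    discretization scheme is not given in the abstract.
    Returns (discrete_values, level_indices).
    """
    if levels < 2:
        raise ValueError("levels must be at least 2")
    # Keep the raw actor output inside the valid range first.
    clipped = np.clip(np.asarray(action, dtype=float), low, high)
    step = (high - low) / (levels - 1)
    # Index of the nearest level for every component.
    idx = np.rint((clipped - low) / step).astype(int)
    return low + idx * step, idx


# Example: four link weights produced by a continuous-action actor.
values, idx = discretize_action([0.03, 0.49, 0.62, 0.97], levels=5)
```

With `levels` values per link instead of a continuum, the agent explores a finite grid of weight settings, which is the kind of complexity reduction the abstract refers to; the right number of levels is a tuning choice that this sketch does not settle.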

Key words: deep reinforcement learning, routing optimization, deep deterministic policy gradient (DDPG) algorithm

CLC number: TN256

Lingyu MENG, Bingli GUO, Wen YANG, Xinwei ZHANG, Zuoqing ZHAO, Shanguo HUANG. Network routing optimization approach based on deep reinforcement learning[J]. Systems Engineering and Electronics, 2022, 44(7): 2311-2318.



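Table 1
Traffic intensity (% of total network load)	Test outliers, model without preprocessing (count)	Test outliers, model with preprocessing (count)
25.0	25	14
37.5	15	8
50.0	5	7
62.5	14	4
75.0	11	2
87.5	10	5
100.0	3	2
112.5	2	0
125.0	8	1
Total	91	43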

