Deep RL agents for TE
I. Preface
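This paper was published at ICNP 2020. The authors apply deep reinforcement learning to the traffic engineering (TE) problem. Their approach achieves global optimization across multi-region networks and adapts to high-dimensional, dynamically changing networks. Before reading this paper I had never studied reinforcement learning in depth, and afterwards I wished I had come across it sooner. People often talk about multi-agent collaboration, yet I had never realized that the "agent" in reinforcement learning carries exactly that meaning of an intelligent agent, which was a real gap on my part. I previously tried combining genetic algorithms with neural networks to solve certain control problems, and this paper gave me some new ideas for that work.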
II. Overview of the Paper
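Approaches to the multi-region TE problem fall roughly into two categories: traditional model-based routing and data-driven routing. The authors' method belongs to the second category and combines reinforcement learning (RL) with deep neural networks (DNNs). (I am not taking sides here on which approach is better; I simply describe how the authors solve the problem.) To handle multiple regions, the authors assign two agents to each region: a T-agent responsible for terminal demand and an O-agent responsible for outgoing demand. Terminal demand is demand whose destination node lies in the current region; outgoing demand is demand whose destination node lies in another region. Both agents take edge utilization as input instead of the traffic matrix (TM) used by conventional RL approaches, which speeds up convergence. The T-agent's reward function depends only on the current region, whereas the O-agent's reward depends on both the current region and other regions, because outgoing demand can cause congestion in neighboring regions. Separating T-agents from O-agents reduces the communication overhead between regions.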
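To make the terminal/outgoing distinction concrete, here is a minimal Python sketch of how one region could partition the demands it sees between its two agents. It assumes a hypothetical demand record `(ingress, destination, volume)`, where `ingress` is the node at which the demand enters or originates in the region, and a known `node_region` mapping from node to region. These names are illustrative only and are not the paper's interface.

```python
from typing import Dict, List, Tuple

# Hypothetical demand record: (ingress node in this region, destination node, volume).
Demand = Tuple[str, str, float]


def classify_demands(demands: List[Demand], node_region: Dict[str, str],
                     region: str) -> Tuple[List[Demand], List[Demand]]:
    """Split the demands entering `region` into terminal and outgoing sets."""
    terminal: List[Demand] = []
    outgoing: List[Demand] = []
    for ingress, dst, volume in demands:
        if node_region[ingress] != region:
            continue  # enters elsewhere: handled by another region's agents
        if node_region[dst] == region:
            terminal.append((ingress, dst, volume))  # destination here: T-agent
        else:
            outgoing.append((ingress, dst, volume))  # destination elsewhere: O-agent
    return terminal, outgoing
```

Under this sketch, traffic that arrives from a neighboring region and is destined for this one enters through one of this region's border nodes, so it is counted as terminal demand.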
III. Algorithm Design
To shrink the decision space, the authors precompute forwarding paths and distinguish mice flows from elephant flows. Both techniques show up clearly in the T-agent and O-agent designs described below.
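Precomputing forwarding paths means an agent only chooses among a few fixed candidates per ingress-egress pair instead of over all possible routes. The sketch below enumerates up to K loop-free paths with the fewest hops using a best-first search over partial paths. Hop count as the path metric and this particular search are assumptions for illustration; the paper's exact path-selection procedure is not described in this post. The search is fine for small regions but its cost grows quickly on large, dense graphs.

```python
import heapq
from typing import Dict, List


def k_shortest_paths(adj: Dict[str, List[str]], src: str, dst: str,
                     k: int = 3) -> List[List[str]]:
    """Return up to k loop-free src->dst paths in order of increasing hop count."""
    heap = [(0, [src])]  # (hops so far, partial path)
    found: List[List[str]] = []
    while heap and len(found) < k:
        hops, path = heapq.heappop(heap)
        node = path[-1]
        if node == dst:
            found.append(path)  # popped in nondecreasing hop order
            continue
        for nxt in adj.get(node, []):
            if nxt not in path:  # keep paths loop-free
                heapq.heappush(heap, (hops + 1, path + [nxt]))
    return found
```

With P ingress-egress pairs and K candidates each, the agent's action shrinks to P x K split ratios, which is why a small K such as 3 keeps training tractable.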
T-agent
- input

- When an edge fails, its edge utilization is set to 0.
- action
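- Mice flows and elephant flows are handled differently: mice flows use static routing (ECMP), and the agent learns and adjusts only elephant flows. K forwarding paths are precomputed for each ingress-egress node pair (K = 3 works well; K > 3 is computationally expensive). What the T-agent learns is the traffic split ratio across these paths.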
- reward

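The T-agent's input and action can be sketched as follows. Several assumptions are made for illustration: the policy network emits one raw score per candidate path, which a softmax turns into split ratios; flows at or above a hypothetical `threshold` volume count as elephants; mice flows are split evenly over the shortest candidate paths as a stand-in for ECMP; and `placeholder_reward` (negative maximum link utilization, a common TE objective) is used only because the paper's actual reward formula is not reproduced in this post.

```python
import math
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]
Pair = Tuple[str, str]


def softmax(scores: List[float]) -> List[float]:
    """Turn raw policy outputs into positive split ratios that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def add_on_paths(paths: List[List[str]], volume: float, ratios: List[float],
                 load: Dict[Edge, float]) -> None:
    """Place volume * ratio of traffic on every edge of each candidate path."""
    for path, ratio in zip(paths, ratios):
        for u, v in zip(path, path[1:]):
            load[(u, v)] = load.get((u, v), 0.0) + volume * ratio


def t_agent_route(demands: List[Tuple[str, str, float]],
                  paths: Dict[Pair, List[List[str]]],
                  scores: Dict[Pair, List[float]],
                  threshold: float) -> Dict[Edge, float]:
    """Route terminal demand; only elephant flows follow the learned ratios."""
    load: Dict[Edge, float] = {}
    for ingress, egress, volume in demands:
        cands = paths[(ingress, egress)]
        if volume >= threshold:
            ratios = softmax(scores[(ingress, egress)])  # elephant: learned split
        else:
            # mice: ECMP-like equal split over the shortest candidate paths
            shortest = min(len(p) for p in cands)
            n = sum(1 for p in cands if len(p) == shortest)
            ratios = [1.0 / n if len(p) == shortest else 0.0 for p in cands]
        add_on_paths(cands, volume, ratios, load)
    return load


def edge_utilization(load: Dict[Edge, float], capacity: Dict[Edge, float],
                     failed: Set[Edge]) -> Dict[Edge, float]:
    """Agent input: per-edge utilization, reported as 0 on failed edges."""
    return {e: 0.0 if e in failed else load.get(e, 0.0) / c
            for e, c in capacity.items()}


def placeholder_reward(util: Dict[Edge, float]) -> float:
    """Assumed reward: negative maximum link utilization in the region."""
    return -max(util.values(), default=0.0)
```

Keeping mice flows on fixed ECMP-style splits means the policy only has to output ratios for the comparatively few elephant flows, which is the decision-space reduction the post describes.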
O-agent
- input

- action
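- The O-agent does not distinguish mice flows from elephant flows, because a mice flow that crosses several regions can also cause congestion in several regions. It likewise uses a set of precomputed forwarding paths, and what it decides is how to forward traffic to the next egress node.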
- reward

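A corresponding sketch of the O-agent follows. As assumptions for illustration: each outgoing demand is a hypothetical `(ingress, destination region, volume)` record, the policy produces one score per candidate egress (border) node toward the destination region, and every outgoing flow is routed by the policy regardless of size. The reward reflects the post's point that the O-agent also accounts for other regions, but its exact form here (local maximum utilization plus a hypothetical weight `beta` times the worst neighboring-region utilization) is a placeholder, not the paper's formula.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Turn raw policy outputs into positive split ratios that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def o_agent_split(outgoing: List[Tuple[str, str, float]],
                  egress_candidates: Dict[str, List[str]],
                  scores: Dict[Tuple[str, str], List[float]]) -> Dict[str, float]:
    """Traffic handed to each egress (border) node toward other regions.

    Every outgoing flow follows the policy: there is no mice/elephant split.
    """
    per_egress: Dict[str, float] = {}
    for ingress, dst_region, volume in outgoing:
        cands = egress_candidates[dst_region]
        ratios = softmax(scores[(ingress, dst_region)])
        for node, ratio in zip(cands, ratios):
            per_egress[node] = per_egress.get(node, 0.0) + volume * ratio
    return per_egress


def o_agent_reward(local_util: Dict[str, float],
                   neighbor_utils: List[Dict[str, float]],
                   beta: float = 0.5) -> float:
    """Hypothetical reward: local congestion plus weighted neighbor congestion."""
    local = max(local_util.values(), default=0.0)
    neighbor = max((max(u.values(), default=0.0) for u in neighbor_utils),
                   default=0.0)
    return -(local + beta * neighbor)
```

Because only this reward needs information from neighboring regions, the T-agent can be trained purely on local state, which matches the post's remark that separating the two agents reduces inter-region communication.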
IV. Simulation
- First, we use a measured topology called Telstra (AS 1221) obtained from the Rocketfuel project [29]. The network nodes are scattered across Australia. We consider each state or territory of Australia as a region and ignore the regions with few nodes. Thus we obtain five regions. We also remove the nodes whose degree is no larger than one, which does not affect the evaluation of routings [30]. (These are isolated or unimportant nodes; I still need to check reference [30].) Particularly, the reduced Telstra topology contains 38 nodes and 152 edges.
- Second, we use a real topology obtained from Google cloud [31]. Particularly, we consider three regions: Europe, Asia, and North America, and there are a total of 44 nodes and 160 edges.
- Third, we use a large-scale synthetic topology whose region-level topology is a 2D 4×4 grid. Thus there are 16 regions in total. We use BRITE [32] to generate each region’s topology randomly. In particular, each region’s topology contains 10 to 15 nodes, and the link density (the ratio of link number divided by node number) is set to 2 (i.e., 20 to 30 pairs of edges in one region) according to our analysis of many available topologies [33] [29]. For any two adjacent regions, we generate 2 to 4 pairs of edges by selecting border nodes in each region randomly. Particularly, we use a synthetic topology (named as BRITE) with 204 nodes and 964 edges.
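As a rough consistency check on the synthetic topology's numbers, the arithmetic below counts the adjacent region pairs in a 4×4 grid and derives the node and edge ranges implied by the stated parameters. It assumes that a "pair of edges" means two directed edges, that the reported 964 edges are directed, and that intra-region edge pairs are approximately 2 × node count; BRITE itself is not invoked. Under these assumptions, both 204 nodes and 964 edges fall within the implied ranges.

```python
from itertools import product


def grid_adjacent_pairs(rows: int = 4, cols: int = 4):
    """Adjacent region pairs in a rows x cols grid (4-neighborhood)."""
    pairs = []
    for r, c in product(range(rows), range(cols)):
        if r + 1 < rows:
            pairs.append(((r, c), (r + 1, c)))
        if c + 1 < cols:
            pairs.append(((r, c), (r, c + 1)))
    return pairs


pairs = grid_adjacent_pairs()
regions = 4 * 4
# 10 to 15 nodes per region
node_range = (10 * regions, 15 * regions)
# link density 2: intra-region edge pairs ~ 2 x node count (204 nodes reported)
intra_pairs = 2 * 204
# 2 to 4 inter-region edge pairs for each adjacent region pair
inter_range = (2 * len(pairs), 4 * len(pairs))
# assume each "pair of edges" is two directed edges
directed_range = tuple(2 * (intra_pairs + x) for x in inter_range)

print(len(pairs))       # 24
print(node_range)       # (160, 240)
print(directed_range)   # (912, 1008)
```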
MRTE is compared against three baselines: HPR, ECMP, and TRPO.

As the comparison shows, the algorithm performs remarkably well.
