A Recursive DRL-based Resource Allocation Method for Multibeam Satellite Communication Systems

MENG Haowei; XIN Ning; QIN Hao; ZHAO Di

doi:10.23919/cje.2022.00.135

Haowei MENG, Ning XIN, Hao QIN, et al., “A Recursive DRL-based Resource Allocation Method for Multibeam Satellite Communication Systems,” Chinese Journal of Electronics, vol. 33, no. 2, article no. , 2024 doi: 10.23919/cje.2022.00.135

MENG Haowei^1, XIN Ning^2, QIN Hao^1, ZHAO Di^1

1. the State Key Laboratory of Integrated Services Networks, Xidian University, 710071, China
2. the China Academy of Space Technology, Institute of Telecommunication Satellite, Beijing 100094, China

Funds: This work was supported by the National Natural Science Foundation of China (62071354), the Key Research and Development Program of Shaanxi (2022ZDLGY05-08), and the ISN State Key Laboratory.

Author Bio:
Haowei MENG received the B.S. degree in Communication Engineering from Zhengzhou University, Henan, China, in 2020. He is currently working towards his master’s degree at Xidian University, Xi’an, China. His research interests include wireless resource management and reinforcement learning. (Email: 20011210418@stu.xidian.edu.cn)

Ning XIN received the Ph.D. degree from CAS in 2014, M.S. degree from Naval Aeronautical Engineering Institute in 2007, and B.S. degree from Yantai University in 2004. He is a researcher in the Institute of Telecommunication Satellite, China Academy of Space Technology, Beijing, China. His research interests are spacecraft design and satellite payload design. (Email: xinning7@sina.com)

Hao QIN (corresponding author) received the B.S., M.S., and Ph.D. degrees in Communication and Information systems from Xidian University, Xi’an, China, in 1996, 1999, and 2004, respectively. In 2004, he joined the School of Telecommunications Engineering, Xidian University, where he is currently an Associate Professor of communications and information systems. His research interests include wireless communications and satellite communications. (Email: hqin@mail.xidian.edu.cn)

Di ZHAO received the B.S. degree in Communication Engineering from Shandong Normal University, Jinan, China, in 2018. She is currently working towards her Ph.D. degree at Xidian University, Xi’an, China. Her research interests include wireless resource management, satellite communications, and reinforcement learning in wireless networks. (Email: dzhao_1@stu.xidian.edu.cn)
Received Date: 2022-05-17
Accepted Date: 2023-06-20

Available Online: 2023-07-20

Abstract

Optimization-based radio resource management (RRM) has shown significant performance gains on high-throughput satellites (HTSs). However, as the number of allocable on-board resources increases, traditional RRM methods become difficult to apply in real satellite systems because of their high computational complexity. Deep reinforcement learning (DRL) is a promising solution to the resource allocation problem because it is model-free. Nevertheless, the action space faced by DRL grows exponentially with the communication scale, which leads to an excessive exploration cost. In this paper, we propose a recursive frequency resource allocation algorithm based on long short-term memory (LSTM) and proximal policy optimization (PPO), called PPO-RA-LOOP, where RA stands for resource allocation and LOOP indicates that the algorithm outputs actions recursively. Specifically, the PPO algorithm uses an LSTM network to recursively generate sub-actions for the frequency resource allocation of each beam, which significantly reduces the action space. In addition, the LSTM-based recursive architecture allows PPO to allocate the next frequency resource better by using the already generated sub-actions as prior knowledge, which reduces the complexity of the neural network. Simulation results show that PPO-RA-LOOP achieves higher spectral efficiency and system satisfaction than other frequency allocation algorithms.
Keywords:
- High-throughput satellites
- Proximal policy optimization
- Deep reinforcement learning
- Long short-term memory
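
The action-space argument in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the beam count, the number of frequency choices per beam, and the scoring function `avoid_previous` are hypothetical, and the scoring function is only a toy stand-in for the LSTM policy. The sketch compares the size of a joint action space with the number of outputs needed when one sub-action is emitted per beam, and shows how each sub-action can be conditioned on the ones already generated.

```python
import math
import random


def joint_action_space(num_beams: int, choices_per_beam: int) -> int:
    """Number of actions when a single action fixes every beam at once."""
    return choices_per_beam ** num_beams


def recursive_outputs(num_beams: int, choices_per_beam: int) -> int:
    """Total policy outputs when one sub-action is emitted per beam."""
    return num_beams * choices_per_beam


def sample_recursively(num_beams, choices_per_beam, score_fn, rng):
    """Emit one sub-action per beam, conditioning each on the earlier ones."""
    sub_actions = []
    for beam in range(num_beams):
        # score_fn is a hypothetical stand-in for the recurrent policy:
        # it sees the beam index, a candidate choice, and the history so far.
        scores = [score_fn(beam, c, sub_actions) for c in range(choices_per_beam)]
        # Softmax over this beam's candidates only (shifted for stability).
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        choice = rng.choices(range(choices_per_beam), weights=weights, k=1)[0]
        sub_actions.append(choice)
    return sub_actions


def avoid_previous(beam, choice, history):
    """Toy prior (assumption): discourage reusing the previous beam's choice."""
    return -5.0 if history and history[-1] == choice else 0.0


if __name__ == "__main__":
    print(joint_action_space(10, 4))  # 4**10 = 1048576
    print(recursive_outputs(10, 4))   # 10 * 4 = 40
    print(sample_recursively(10, 4, avoid_previous, random.Random(0)))
```

With these assumed sizes, a joint policy would have to rank over a million actions, while the recursive form makes ten decisions over four candidates each. The history argument passed to the scoring function plays the role that the generated sub-actions play as prior knowledge in the abstract.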
