Abstract: Deep reinforcement learning is one of the emerging technologies in artificial intelligence. It combines the powerful feature-extraction capability of deep learning with the decision-making capability of reinforcement learning, realizing an end-to-end framework from perceptual input to decision output; it has strong learning ability and a wide range of applications. However, existing research has shown that deep reinforcement learning contains security vulnerabilities and is susceptible to adversarial-example attacks. To improve the robustness of deep reinforcement learning and enable the secure application of such systems, this paper comprehensively surveys existing work on deep reinforcement learning methods, adversarial attacks, defense methods, and security analysis, and summarizes the open problems and future trends in deep reinforcement learning security, aiming to provide a foundation for related security research and engineering applications.
Deep Reinforcement Learning (DRL) solutions are becoming pervasive at the edge of the network, as they enable autonomous decision-making in dynamic environments. However, to adapt to an ever-changing environment, a DRL solution implemented on an embedded device has to continue taking occasional exploratory actions even after initial convergence. In other words, the device must occasionally take random actions and update the value function, i.e., re-train the Artificial Neural Network (ANN), to ensure its performance remains optimal. Unfortunately, embedded devices often lack the processing power and energy required to train the ANN. The energy aspect is particularly challenging when the edge device is powered solely by Energy Harvesting (EH). To overcome this problem, we propose a two-part algorithm in which the DRL process is trained at the sink, and the weights of the fully trained underlying ANN are periodically transferred to the EH-powered embedded device that takes the actions. Using an EH-powered sensor, a real-world measurement dataset, and optimizing for the Age of Information (AoI) metric, we demonstrate that such a DRL solution can operate without any performance degradation, with only a few ANN updates per day.
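The split described above — training at the sink, acting greedily on the constrained device — can be illustrated with a minimal, hypothetical sketch. The toy environment, tabular Q-learning (standing in for the ANN), and all names here are illustrative assumptions, not the paper's actual system:

```python
import random

N_STATES, N_ACTIONS = 4, 2  # toy problem size (illustrative)

def step(state, action):
    """Toy environment: action 1 advances the state; reaching the
    last state yields reward 1 (purely illustrative dynamics)."""
    nxt = min(state + action, N_STATES - 1)
    return nxt, 1.0 if nxt == N_STATES - 1 else 0.0

def train_at_sink(episodes=2000, alpha=0.1, gamma=0.9, eps=0.2):
    """Sink-side training: exploration and value updates happen here,
    where processing power and energy are not constrained."""
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(10):
            # epsilon-greedy exploration, done only at the sink
            a = (random.randrange(N_ACTIONS) if random.random() < eps
                 else max(range(N_ACTIONS), key=lambda x: q[s][x]))
            s2, r = step(s, a)
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

class EdgeDevice:
    """EH-powered device: no training on board, only greedy action
    selection from the most recently transferred weights."""
    def __init__(self):
        self.q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

    def receive_weights(self, q):
        # Periodic transfer from the sink (e.g., a few times per day).
        self.q = [row[:] for row in q]

    def act(self, state):
        return max(range(N_ACTIONS), key=lambda a: self.q[state][a])

random.seed(0)
device = EdgeDevice()
device.receive_weights(train_at_sink())
print(device.act(0))  # greedy action under the transferred values
```

Between transfers the device never updates its values, so its per-decision cost is a table lookup; the energy-hungry exploration and re-training stay at the sink.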