WU Guangyu, GU Jiangchun. Remote Interference Source Localization: A Multi-UAV-Based Cooperative Framework[J]. Chinese Journal of Electronics, 2022, 31(3): 442-455. DOI: 10.1049/cje.2021.00.310

Remote Interference Source Localization: A Multi-UAV-Based Cooperative Framework

Funds: This work was supported by the National Key Scientific Instrument and Equipment Development Project (61827801).
  • Author Bio:

    WU Guangyu: is currently pursuing the M.S. degree with the Department of Computer Science and Technology, University of Science and Technology of China, Hefei, China. He received the B.S. degree with the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China. His research interests include machine learning, mobile computing, and the Internet of things. (Email: gywu@mail.ustc.edu.cn)

    GU Jiangchun: (corresponding author) received the B.S. degree in electronic and information engineering from Xidian University, Xi’an, China, in 2018, and the M.S. degree in information and communication engineering from the College of Communications Engineering, Army Engineering University of PLA, Nanjing, China, in 2020, where he is currently pursuing the Ph.D. degree. His research interests include UAV communications, convex optimization techniques, and reinforcement learning. (Email: gujiangchungjc@sina.com)

  • Received Date: August 28, 2021
  • Accepted Date: February 09, 2022
  • Available Online: March 04, 2022
  • Published Date: May 04, 2022
  • Interference source localization with high accuracy and time efficiency is of crucial importance for protecting spectrum resources. Owing to the flexibility of unmanned aerial vehicles (UAVs), exploiting UAVs to locate interference sources has attracted intensive research interest. Off-the-shelf UAV-based schemes locate the interference source by having the UAV keep searching until it arrives at the target, which degrades the time efficiency of localization. To balance the accuracy and the efficiency of searching and localization, this paper proposes a multi-UAV-based cooperative framework along with its detailed scheme, where search and remote localization are iteratively performed with a swarm of UAVs. For searching, a low-complexity Q-learning algorithm is proposed to decide the direction of flight in every time interval for each UAV. In the following remote localization phase, a fast Fourier transformation based location prediction algorithm is proposed to estimate the location of the interference source by fusing the search results of different UAVs in different time intervals. Numerical results reveal that the proposed scheme outperforms the state-of-the-art schemes in terms of the accuracy, robustness and time efficiency of localization.
  • With the development of wireless communication technology and the Internet of things (IoT), the number of frequency-using devices, such as wireless communication terminals and intelligent network terminals, is growing rapidly[1-3]. However, due to the openness of spectrum access[4], the number of interference sources that illegally occupy spectrum resources is also growing rapidly and has had grave implications in many fields[5], such as broadcast channels. Therefore, the demand for an efficient and accurate method to search for and locate interference sources is soaring[6,7]. However, owing to the lack of a priori knowledge of the interference and the unknown, dynamic surroundings, e.g., random background noise, traditional interference source localization methods need a fine-grained search in a huge space, which results in inefficient localization[8-10].

    Fortunately, unmanned aerial vehicle (UAV) based interference source localization is a promising way to tackle this issue owing to UAVs’ unique characteristics[11]. As illustrated in Fig.1, compared with ground-based methods, UAVs can be deployed almost anywhere and at any time, because they can move flexibly at higher altitudes and are therefore less affected by obstacles. Moreover, the localization can be more accurate and reliable, since signal processing devices and sensors carried by UAVs, e.g., electronic scanning antennas, suffer less multipath interference. Hence, UAVs are capable of performing effective interference source localization. Therefore, in this paper, we consider a scenario where UAVs are applied to locate the interference source. In order to achieve efficient data acquisition, each UAV is equipped with an electronic scanning antenna that can measure the power of received signals in various horizontal and vertical directions.

    Figure  1.  Comparisons between the UAV-based interference source localization and the ground-based interference source localization

    However, UAV-based interference source localization still faces several challenges. Firstly, since UAVs are powered by batteries with limited energy, a single UAV cannot cover a long-range localization task or carry out complex computations[12]. Secondly, a fully autonomous localization method is needed: owing to the complex flying environment and the dynamic frequency environment, it is difficult for ground controllers to adjust the UAVs’ flying directions manually in time. Thirdly, no a priori knowledge of the environment or the interference source can be acquired before the localization task. Therefore, it remains challenging to achieve UAV-based interference source localization with a large coverage area, high accuracy and low energy cost.

    Recently, some works have focused on UAV-based interference source localization. However, they are unsuitable for remote interference source localization, since they fail to simultaneously meet the demands of a large coverage area, high accuracy and low energy cost. Moreover, many methods require the UAVs to approach the interference source, which wastes energy.

    Against this background, this paper proposes a novel multi-UAV-based cooperative framework for effective interference source localization. Based on the proposed framework, a novel collaborative search and localization (CSL) scheme is proposed. The framework allows synthesizing individual decisions within a swarm of UAVs to address the mentioned challenges while the CSL scheme uses multimodal reinforcement learning and fast Fourier transformation (FFT) to achieve effective remote localization. The contributions of this paper are listed as follows.

    1) We propose a novel multi-UAV-based cooperative framework to achieve accurate and energy-efficient interference source localization in a large area. In order to benefit from the cooperation of multiple UAVs, the framework divides the localization problem into two alternately performed phases: an RL-based searching phase and a localization phase which can achieve remote location prediction. Meanwhile, we formulate the search phase as a novel multimodal Markov decision process (M-MDP), in which we consider the dynamic environment encountered in reality.

    2) A novel CSL scheme based on a low-complexity Q-learning algorithm and an FFT based location prediction algorithm is proposed for a swarm of UAVs to locate an interference source. For the search phase, a low-complexity Q-learning algorithm is proposed. During this phase, each UAV in the swarm can decide its own direction of searching through the proposed Q-learning algorithm. In the localization phase, a novel FFT-based location prediction algorithm is proposed to find out the potential location of the interference source, by comprehensively denoising different UAVs’ predictions in different time intervals.

    3) In-depth simulation results are presented to demonstrate the performance of the proposed method. The effectiveness of the proposed CSL scheme is confirmed based on the visualization result of UAVs’ trajectories and the predicted locations. Numerical results show that compared to the baseline schemes, the proposed scheme is more time-efficient and can achieve higher accuracy. It is also shown that the proposed scheme can accurately localize the interference source under low signal to noise ratio (SNR).

    The remainder of this paper is organized as follows. The recent works are discussed in Section II. Section III presents the system model and problem formulation. In Section IV, we describe the proposed CSL scheme in detail. Simulation results are presented and analyzed in Section V. Conclusions are drawn in Section VI.

    Recently, UAVs have been studied in many scenarios, such as disaster rescue and UAV-based communication[13,14]. Meanwhile, UAV-based anti-interference technologies have been studied owing to UAVs’ unique abilities. In Ref.[14], the authors proposed an interference source localization method that uses a UAV with an angle of arrival (AOA) array antenna and a ground-based beacon transmitter. However, when there are no reference stations on the ground, unknown changes in the antenna element positions will influence the accuracy of localization. Since a single UAV cannot cover a long-range localization task due to its energy limitations, attempts were made to employ multiple UAVs for collaborative search and localization of interference sources. In Ref.[15], a received signal strength (RSS) based localization method using multiple UAVs is proposed. However, such a method is applicable only when the transmit power and propagation parameters of the interference source are known.

    Since the interference source in the real world is often random, little a priori knowledge of the interference source and the environment can be acquired before the localization begins. Therefore, methods that do not require a priori knowledge have been studied, including pre-path methods and reinforcement learning methods. In Ref.[16], the authors proposed a pre-path method where the search area is divided into different cells and the UAV selects the optimal one as the interference source’s location. However, pre-path methods are not suitable for large-area search: since the interference source only exists in a small area, they produce a large number of redundant waypoints.

    Instead of pre-path planning, reinforcement learning[17,18] is able to exploit samples and function approximation to optimize performance in dynamic environments. Recently, some studies have investigated reinforcement-learning-based interference source localization with UAVs[19,20]. In these works, a single UAV is exploited to locate the interference source in unknown dynamic environments. Such reinforcement-learning-based searching methods require the UAV to keep searching until it arrives at the target. They can achieve high localization accuracy but heavily increase energy consumption while degrading time efficiency. Multiple UAVs can extend the searching range through collaborative search and localization[21,22].

    In Ref.[21], the authors propose a collaborative search and localization method based on reinforcement learning and a clustering algorithm. Such a method can achieve time-efficient localization but has lower accuracy than single-UAV-based methods. In Ref.[23], higher accuracy is achieved with a deep reinforcement learning algorithm. However, due to the high computational cost of deep reinforcement learning, such a method is not suitable for UAVs with limited computing equipment and power. Therefore, a multi-UAV collaborative search and localization approach that is both accurate and energy-efficient in complex environments is the focus of this paper[24].

    In this paper, we aim at addressing the problem of multi-UAV-based efficient interference source localization without a priori knowledge of the interference source model or the noise model. As shown in Fig.2, a multi-UAV-based cooperative framework for remote interference source localization is proposed, where $N$ UAVs, denoted as $\mathcal{N}=\{1,2,3,\ldots,N\}$, are employed as a swarm to collaboratively search for and locate an interference source on the ground whose location is $p_T=(x_T,y_T,0)$.

    Figure  2.  The multi-UAV-based cooperative framework for remote interference source localization

    In the proposed framework, each UAV is equipped with an electronic scanning antenna for interference source localization. In order to improve efficiency, the swarm of UAVs conducts reinforcement-learning-based searching and remote localization iteratively. Specifically, the localization process contains $m$ time intervals, denoted $\mathcal{M}=\{1,2,3,\ldots,m\}$, and each time interval contains three slots, namely, data acquisition (slot 1), action selection (slot 2) and location prediction (slot 3). The reinforcement-learning-based searching is carried out in slot 2 and remote localization is carried out in slot 3. In order to achieve remote localization in slot 3, a cluster head UAV is designated to collect the search results of all other UAVs, compute the localization result and decide whether the localization has succeeded.

    In the studied scenario, each UAV is equipped with a three-dimensional electronic scanning antenna to measure the power of received signals from horizontal as well as vertical directions. Each electronic scanning antenna is able to measure $u$ horizontal directions $\{\theta_1,\theta_2,\theta_3,\ldots,\theta_{u-1},\theta_u\}$ and $v$ vertical directions $\{\varphi_1,\varphi_2,\varphi_3,\ldots,\varphi_{v-1},\varphi_v\}$, as shown in Fig.3.

    Figure  3.  The detection range of an electronic scanning antenna

    Therefore, the power measured by UAV $U_i$ can be defined as:

    $$D^{(i)}_{k}=g\left(O^{(i)}_{\theta_k,\varphi_j}\right),\quad k\in\{1,2,3,\ldots,u\},\ j\in\{1,2,3,\ldots,v\} \tag{1}$$

    where $O^{(i)}_{\theta_k,\varphi_j}$ is the raw data directly sensed by the antenna at horizontal angle $\theta_k$ and vertical angle $\varphi_j$, and $g(\cdot)$ is a function which turns the raw data $O^{(i)}$ into the preprocessed data $D^{(i)}$ for ease of further inference.

    The time-varying coordinate of a certain UAV $U_i$ at time interval $j-1$ can be designated as $(x^{(i)}_{j-1},y^{(i)}_{j-1},z^{(i)}_{j-1})$. The initial coordinate of a certain UAV $U_i$ $(i\in\mathcal{N})$ can be written as $(x^{(i)}_0,y^{(i)}_0,z^{(i)}_0)$. We assume that each UAV flies in a straight line between two adjacent time intervals and stays at a constant altitude $z^{(i)}_0$. Hence, the position of $U_i$ at time interval $j$ can be expressed as

    $$x^{(i)}_j=x^{(i)}_{j-1}+l^{(i)}_j\cos\lambda^{(i)}_j \tag{2}$$
    $$y^{(i)}_j=y^{(i)}_{j-1}+l^{(i)}_j\sin\lambda^{(i)}_j \tag{3}$$
    $$z^{(i)}_j=z^{(i)}_0 \tag{4}$$

    where $l^{(i)}_j$ represents the horizontal distance between the horizontal coordinates $(x^{(i)}_{j-1},y^{(i)}_{j-1})$ and $(x^{(i)}_j,y^{(i)}_j)$, and $\lambda^{(i)}_j$ is the angle between the x-axis and the direction in which UAV $U_i$ flies. In this paper, we assume that $l^{(i)}_j$ is equal to $l$, $\forall i,j$.
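    The position update of Eqs.(2) to (4) can be sketched in a few lines. This is a minimal illustration rather than the authors’ implementation; the function name `step_position` and its argument names are hypothetical, and angles are assumed to be in radians.

```python
import math

def step_position(x_prev, y_prev, z0, step_len, heading):
    """Advance a UAV by one time interval along a straight line.

    heading is the angle (radians) between the x-axis and the flight
    direction; the altitude is held at its initial value z0.
    """
    x = x_prev + step_len * math.cos(heading)  # Eq. (2)
    y = y_prev + step_len * math.sin(heading)  # Eq. (3)
    return x, y, z0                            # Eq. (4): constant altitude
```

    For example, a UAV at (0, 0, 200) that flies 100 m with heading π/2 ends up at approximately (0, 100, 200).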

    Therefore, the action each UAV needs to choose is its own flight direction. At time interval $j$, UAV $U_i$ aims at choosing the optimal action based on the received power set $D^{(i)}_j$, which can be expressed as:

    $$a^{i}_{j}=f_r\left(D^{(i)}_j,s^{(i)}_j,\theta^{i}_{j}\right) \tag{5}$$

    where $a^{i}_{j}\in[1,u]$ and $s^{(i)}_{j}\in[1,u]$ represent the action and state of UAV $U_i$, respectively, and $\theta^{i}_{j}$ denotes the parameters of the reinforcement learning method $f_r$ corresponding to $U_i$. The action selects the direction of flight from a set of $u$ possible flight directions $\{\theta_1,\theta_2,\ldots,\theta_u\}$.

    As is shown in Fig.4, since UAVs conduct 3D sensing, the maximum-power direction corresponding to the current flight direction can be achieved. The extension of such direction will intersect with the interference source’s plane. The intersection essentially approximates a possible position of the interference source. In this paper, we study the scenario where there is only one interference source. Intuitively, in the ideal case, the intersection represents the interference source’s location. Therefore, the intersections of different UAVs can be used for remote location prediction.

    Figure  4.  Intersection of UAV’s trajectory at time interval j

    After the flight direction $\eta_i$ is chosen at time interval $j$, UAV $U_i$ flies in this direction for a distance of $l$. The maximum power value received on horizontal angle $\eta_i$ can be expressed by $\omega_i$. Moreover, since the intersection and the interference source are in the same plane, the intersection can be expressed in a 2D coordinate form for ease of computation. Therefore, the intersection $p^{(i)}_j$ of the interference source’s plane and the maximum power’s direction at time interval $j$ can be obtained by

    $$p^{(i)}_j=\left(z^{(i)}_0\tan\eta_i\cos\omega_i+x^{(i)}_{j-1},\ z^{(i)}_0\tan\eta_i\sin\omega_i+y^{(i)}_{j-1}\right) \tag{6}$$

    However, due to noise, the multi-path effect, etc., there can be a large number of intersections, and they may deviate considerably from the interference source. In order to reduce the deviation and make a precise prediction, the intersections of different UAVs’ trajectories and time intervals are considered together. Therefore, the precise prediction $\hat{p}$ is obtained from the location set $L=\{p^{(i)}_j\}\ (i,k\in[1,n],\ i<k,\ \forall j)$, given as:

    $$\hat{p}=f_l(L) \tag{7}$$

    where $L$ is the intersection set, $f_l$ is the prediction process and $\hat{p}$ is the predicted location.

    In the proposed framework, the problem of locating an interference source is divided into two parts: the searching phase and the localization phase. The searching phase aims at finding the optimal trajectory for each UAV, while the localization phase aims at obtaining the optimal predicted location based on the UAVs’ trajectories.

    In the search phase, changes in the measured power of the UAVs’ received signals can indicate whether the distance between the UAV and the interference source is reduced. Therefore, for each UAV, in order to find the position of the interference source, the direction of flight in each time interval can be selected according to the measured power. Such a search problem is equivalent to maximizing the expected long-term measured power, which can be formulated as the discounted sum of all future rewards (i.e. measured power) at the current time interval $t$. Considering the dynamic environment, such a problem can be modeled as a multimodal Markov decision process (M-MDP) problem, which aims at finding the optimal policy $\pi^{(i)}$ for each UAV $U_i$. Different from the MDP problem, which is a tuple $\langle S,A,r,P,\gamma\rangle$[25,26], the M-MDP problem consists of six elements $\langle S,A,r,E,P,\gamma\rangle$ as follows:

    • $S$ is the state set;
    • $A$ is the action set;
    • $r$ is the reward function;
    • $E$ is the modality set;
    • $P$ is the transition probability matrix;
    • $\gamma\in[0,1]$ is a discount factor.

    For the interference source localization problem, the UAVs attempt to take more accurate actions when approaching the target for higher RSS. Under this circumstance, the state set and the action set can be adjusted to remove redundant states and actions. Therefore, the modality set is introduced. For a certain time interval, the modality can be calculated from the long-term state memory $M(S_1,\ldots,S_{t-1})$, given as:

    $$e_t=\arg\max_{e}P\left[e\mid M(S_1,\ldots,S_{t-1})\right],\quad \text{s.t. } e_i\in E \tag{8}$$

    where, for a given time interval $i$, the UAV’s modality is the one with the maximum probability of transferring to $e$ given the state memory.

    When $e_t$ is defined, its corresponding state set and action set can be expressed as $S_{e_t}\subseteq S$ and $A_{e_t}\subseteq A$, respectively. Therefore, as shown in Fig.5, the search space is reduced and the policy $\pi_e$ has a higher probability of selecting the optimal action. Meanwhile, the reward function can also be adjusted to $r^{(i)}_e$.

    Figure  5.  The reduced search space of the proposed M-MDP

    The goal of the agent is to maximize the expectation of the discounted sum of all future measured power under different modalities, given as

    $$\mathrm{P1}:\ \max_{\pi^{(i)}}\ \mathbb{E}_{\pi^{(i)}}\left\{\lim_{T\to\infty}\sum_{t=0}^{T}\gamma^{t}\, r^{(i)}_e\left(s^{(i)}_e,a^{(i)}_e\right)\right\} \tag{9}$$

    where the policy $\pi^{(i)}$ indicates UAV $U_i$’s mapping from its state space $S^{(i)}$ to its action space $A^{(i)}$ under different modalities, and $s^{(i)}_e\in S^{(i)}_e$ and $a^{(i)}_e\in A^{(i)}_e$ denote the state and action selected at the current modality $e\in E$, respectively. $r^{(i)}_e$ is the reward function constrained to the current modality.

    The localization phase is equivalent to minimizing the horizontal distance between the predicted location and the actual location, given as:

    $$\mathrm{P2}:\ \min\ \left\|\hat{p}-p_T\right\|_2^2 \tag{10}$$

    where $p_T$ is the horizontal coordinate of the interference source’s location and $\hat{p}$ is the predicted location.

    Based on the proposed multi-UAV-based remote localization framework, we design the CSL scheme, where the search phase is based on a low-complexity Q-learning algorithm while the localization phase is achieved by an FFT-based location prediction algorithm.

    As demonstrated in Fig.6, in the search phase of every iteration (i.e. each time interval), each UAV individually decides the direction of flight in the next time interval by performing reinforcement learning. To find the potential position of the interference source, the intersections of the UAVs’ trajectories are processed by the FFT-based clustering algorithm during the localization phase. The FFT-based clustering algorithm first reduces the noise among the candidate intersections by FFT and then calculates the predicted location based on the denoised intersections. The search phase and the localization phase are repeated until the estimated position meets the terminating condition.

    Figure  6.  The flowchart of the proposed collaborative search and localization scheme

    In order to achieve efficient reinforcement-learning-based search, a low-complexity Q-learning algorithm is proposed based on the M-MDP problem. As shown in Fig.7, the proposed algorithm consists of the data acquisition unit, the modality recognition unit, the reward function unit, the Q-table update unit and the action selection unit. In order to achieve low-complexity and highly efficient searching, the data acquisition unit and the modality recognition unit determine the current modality $e$ to dynamically adjust the following units.

    Figure  7.  The architecture of the proposed low-complexity Q-learning algorithm

    1) Data acquisition unit

    Considering the random disturbance of the power of received signals, for each direction, the antenna performs sampling $N$ times and the acquired data can be defined as

    $$D^{(i)}_{j,k}=\frac{1}{N}\sum_{c=1}^{N}\frac{O^{(i)}_{\theta_k,\varphi_j}(c)}{\max\left(O^{(i)}(c)\right)} \tag{11}$$

    where the raw data $O^{(i)}_{\theta_k,\varphi_j}$ acquired by the antenna is normalized to give the data acquired by the UAV. $D^{(i)}_{j,k}$ is the acquired data at the horizontal angle $\theta_k$ and the vertical angle $\varphi_j$ in the current state.
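    As a rough illustration of Eq.(11), the sketch below normalizes each of the $N$ sweeps by its own peak and then averages them. It is only a sketch under stated assumptions: the function name `acquire` is hypothetical, each sweep is assumed to be a nested list indexed as `[horizontal][vertical]`, and every sweep is assumed to contain a positive peak.

```python
def acquire(samples):
    """Average N peak-normalized sweeps into one u x v data matrix.

    samples: list of N sweeps; each sweep is a u x v nested list of raw
    power readings (assumed layout: sweep[horizontal][vertical]).
    """
    n = len(samples)
    u, v = len(samples[0]), len(samples[0][0])
    data = [[0.0] * v for _ in range(u)]
    for sweep in samples:
        peak = max(max(row) for row in sweep)  # max(O(c)) for this sweep
        for k in range(u):
            for j in range(v):
                data[k][j] += sweep[k][j] / peak / n
    return data
```

    Because each sweep is divided by its own maximum, every entry of the result lies in [0, 1] when the readings are non-negative.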

    2) Modality recognition unit

    Based on the acquired data, the modality of each UAV can be determined. Although the environment changes dynamically in the searching phase, the noise always lies within a fixed interval, while the signal power of the interference source grows as the distance gets smaller. Thus, the different modalities are distinguished by the UAV’s acquired data, given as:

    $$e=E(m),\quad \text{s.t. } h(m)\le\frac{\mathrm{Var}\left(D^{(i)}\right)}{\mathbb{E}\left(D^{(i)}\right)}<h(m+1) \tag{12}$$

    where $\mathbb{E}(\cdot)$ indicates the average function and $\mathrm{Var}(\cdot)$ is the variance function. The current modality $e$ is defined by the threshold function $h$, which maps the index of the modality to the variation coefficient of the acquired data (i.e., $\mathrm{Var}(D^{(i)})/\mathbb{E}(D^{(i)})$).
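    The thresholding in Eq.(12) can be sketched as a lookup in a sorted list of thresholds. This is an illustrative sketch only: the function names and the threshold values in the usage note are hypothetical, and the ratio is computed as variance over mean, following the wording of the text.

```python
import bisect
import statistics

def variation_coefficient(data):
    """Ratio of variance to mean over all entries of a nested list."""
    flat = [x for row in data for x in row]
    mean = statistics.fmean(flat)
    if mean == 0:
        return 0.0  # assumption: treat an all-zero sweep as zero variation
    return statistics.pvariance(flat) / mean

def recognize_modality(data, thresholds):
    """Return m such that h(m) <= ratio < h(m+1).

    thresholds: ascending list [h(1), h(2), ...] (hypothetical values).
    A result of 0 means the ratio lies below h(1).
    """
    return bisect.bisect_right(thresholds, variation_coefficient(data))
```

    With thresholds `[0.0, 0.1, 0.5]`, a flat sweep falls into modality 1, while a sweep with one strongly dominant direction falls into the last modality.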

    3) States and actions of UAV

    In the proposed low-complexity Q-learning algorithm, the action $a^{(i)}\in\{1,2,3,\ldots,u\}$ selects the direction of flight from a set of $u$ possible flight directions corresponding to the detection directions of the antenna shown in Fig.3. The state $s^{(i)}$ is the direction selected by UAV $U_i$ in the previous time interval, i.e., state $s^{(i)}$ is the action $a^{(i)}$ of the last time interval, given as

    $$s^{(i)}=a^{(i)} \tag{13}$$

    4) Reward function controlled by multimodal recognition unit

    In the original Q-learning algorithm, the update function of the reward table is fixed based on the state and action. However, when the change of modality is considered, the original update function of the reward needs to be adjusted. Specifically, the update range is adjusted based on the current modality, given as

    $$\mu=u\,\varepsilon_e \tag{14}$$

    where $\mu$ is the update range and $\varepsilon_e$ is a discount factor based on the current modality. Therefore, higher efficiency is achieved by partially updating the reward function. Furthermore, in this paper, the improved reward function is given as:

    $$r^{(i)}_e\left(s^{(i)},a^{(i)}\right)=\begin{cases}\dfrac{\max\left(D^{(i)}_{a^{(i)},:}\right)}{\max\left(D^{(i)}_{:,:}\right)}, & a^{(i)}\in\left[a^{(i-1)}-\dfrac{\mu_e}{2},\ a^{(i-1)}+\dfrac{\mu_e}{2}\right]\\[2ex] 0, & \text{otherwise}\end{cases} \tag{15}$$

    where, if the potential action $a^{(i)}$ is within the update range, the reward $r^{(i)}_e(s^{(i)},a^{(i)})$ equals the ratio of the maximum power in the current horizontal direction, $\max(D^{(i)}_{a^{(i)},:})$, to the maximum power over all directions, $\max(D^{(i)}_{:,:})$. The update range is centered on the current state, since it is the most likely direction of the interference source.
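    A minimal sketch of the windowed reward in Eq.(15) follows. It is an illustration under assumptions, not the authors’ code: the name `reward_row` is hypothetical, directions are indexed from 0, the window is centered on the current state as the text describes, and no wrap-around of the direction index is applied.

```python
def reward_row(data, state, update_range):
    """Rewards for all u candidate actions given the current state.

    data: u x v matrix of normalized power (data[horizontal][vertical]).
    Actions outside [state - update_range/2, state + update_range/2]
    receive zero reward.
    """
    u = len(data)
    global_peak = max(max(row) for row in data)  # max over all directions
    half = update_range / 2
    rewards = [0.0] * u
    for a in range(u):
        if state - half <= a <= state + half:
            rewards[a] = max(data[a]) / global_peak  # row peak / global peak
    return rewards
```

    A smaller update range leaves more entries at zero, which is how the partial update reduces the work done per time interval.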

    5) Q-learning update and action selection

    In this paper, in order to accelerate the convergence of Q-learning, we simultaneously update the action values $Q^{(i)}(s^{(i)},:)$ of all actions for a given state $s^{(i)}$. Hence, different from conventional Q-learning, the update rule is given as

    $$Q^{(i)}\left(s^{(i)},:\right)\leftarrow Q^{(i)}\left(s^{(i)},:\right)+\alpha\left[r^{(i)}\left(s^{(i)},:\right)+\gamma Q^{(i)}\left(s'^{(i)},:\right)-Q^{(i)}\left(s^{(i)},:\right)\right] \tag{16}$$

    where $Q^{(i)}(s^{(i)},:)$ collects the action values ranging from $Q^{(i)}(s^{(i)},\theta_1)$ to $Q^{(i)}(s^{(i)},\theta_u)$, which are the action values for the current state $s^{(i)}$ and all possible actions. Similarly, $Q^{(i)}(s'^{(i)},:)$ collects the qualities of actions w.r.t. the previous state $s'^{(i)}$.

    Once the action values $Q^{(i)}(s^{(i)},:)$ are updated, each UAV selects the optimal action by performing:

    $$\hat{a}^{(i)}=\arg\max_{a^{(i)}}Q^{(i)}\left(s^{(i)},\left[\max\left(s^{(i)}-\frac{\mu_e}{2},0\right):\min\left(s^{(i)}+\frac{\mu_e}{2},u\right)\right]\right) \tag{17}$$

    where the selection range corresponds to the update range in (16).
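    The row-wise update and the windowed action selection can be sketched as below. This is one possible reading, not a definitive implementation: the function names are hypothetical, the bootstrapped term is assumed to come from the Q-table row of the previous state, directions are indexed from 0, and the update range is assumed to be an integer.

```python
def update_q_row(q, state, prev_state, rewards, alpha, gamma):
    """Update every action value of the current state in one pass.

    q: u x u Q-table as a nested list; rewards: length-u reward row.
    Assumption: the discounted term uses the row of the previous state.
    """
    q[state] = [
        q_sa + alpha * (r + gamma * q_prev - q_sa)
        for q_sa, r, q_prev in zip(q[state], rewards, q[prev_state])
    ]

def select_action(q, state, update_range):
    """Pick the best action inside the window centered on the state."""
    u = len(q)
    half = update_range // 2
    lo = max(state - half, 0)
    hi = min(state + half, u - 1)
    return max(range(lo, hi + 1), key=lambda a: q[state][a])
```

    Updating the whole row at once means a single sweep of the antenna informs the value of every candidate direction, rather than only the one that was flown.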

    After action $\hat{a}^{(i)}$ is chosen at time interval $j$, each UAV flies in its direction $\theta_{\hat{a}^{(i)}}$ for a distance of $l_e$, which is defined by the current modality. The vertical direction of maximum power $\varphi^{(i)}_j$ can then be obtained, given as:

    $$\varphi^{(i)}_j=\arg\max\left(D^{(i)}_{\hat{a}^{(i)},:}\right)\cdot\frac{\pi}{2v} \tag{18}$$

    where $\pi/2v$ is the sampling interval on the vertical angles. The intersections can be obtained based on (6), given as

    $$p^{(i)}_j=\left(z^{(i)}_0\tan\varphi^{(i)}_j\cos\theta_{\hat{a}^{(i)}}+x^{(i)}_{j-1},\ z^{(i)}_0\tan\varphi^{(i)}_j\sin\theta_{\hat{a}^{(i)}}+y^{(i)}_{j-1}\right) \tag{19}$$

    The intersection is then added to the intersection set $L=\{p^{(i)}_j\}\ (i\in[1,n],\ \forall j)$. The proposed low-complexity Q-learning algorithm is summarized in Algorithm 1.
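    The geometry of Eqs.(18) and (19) can be sketched as follows. This is a hedged illustration: the function name `intersection` is hypothetical, the vertical index is taken as 0-based (so that the largest angle stays below π/2 and the tangent remains finite), and the vertical angle is assumed to be measured from the downward vertical.

```python
import math

def intersection(power_row, theta, x_prev, y_prev, z0):
    """Project the strongest vertical direction onto the ground plane.

    power_row: v power values along the chosen horizontal direction.
    theta: chosen horizontal flight direction in radians.
    """
    v = len(power_row)
    k = max(range(v), key=lambda j: power_row[j])  # argmax, 0-based
    phi = k * math.pi / (2 * v)                    # Eq. (18)
    ground_range = z0 * math.tan(phi)              # horizontal distance
    return (ground_range * math.cos(theta) + x_prev,  # Eq. (19), x part
            ground_range * math.sin(theta) + y_prev)  # Eq. (19), y part
```

    For instance, with four vertical samples peaking at index 2, the angle is π/4, so a UAV at 200 m altitude places the intersection roughly 200 m ahead along its heading.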

    Algorithm 1 The proposed collaborative localization scheme

    1: Initialize the number of UAVs $n$;
    2: Initialize flight direction set $d=\{1,2,3,\ldots,u\}$;
    3: Initialize learning rate $\alpha$, discount factor $\gamma$ and step length $l$;
    4: Set iteration number $j=1$, operation command $\mathrm{flag}_0=\mathrm{false}$;
    5: For $i=1$ to $n$
    6:     Initialize UAV $U_i$’s position $(x^{(i)}_0,y^{(i)}_0,z^{(i)}_0)$;
    7:     Obtain current environment by Eq.(11);
    8:     Obtain current modality $e^{(i)}$ by Eq.(12);
    9:     Initialize $Q^{(i)}_e(:,:)$ and $r^{(i)}_e(:,:)$ to 0 (i.e. a zero matrix);
    10:    Select an initial state $s^{(i)}_0=\mathrm{randi}(d)$;
    11: End For
    12: Repeat
    13:    For $i=1$ to $n$
    14:        $e_0=e^{(i)}$;
    15:        Obtain current modality $e^{(i)}$ by Eq.(12);
    16:        If $e_0\neq e^{(i)}$
    17:            Update $\mu_i$, $l^{(i)}_e$;
    18:        End If
    19:        Obtain measured values and update reward table $r^{(i)}_e$ according to Eq.(15);
    20:        Update action values according to Eq.(16);
    21:        Select action by Eq.(17);
    22:        Update current state $s^{(i)}\leftarrow\hat{a}^{(i)}$;
    23:        Update $U_i$’s position according to Eq.(2) to Eq.(4);
    24:        Compute $U_i$’s intersection according to Eq.(18) and Eq.(19);
    25:    End For
    26:    $j=j+1$;
    27:    Obtain the intersection set $L$;
    28:    Perform FFT-based location prediction (shown in Algorithm 2);
    29:    Obtain operation command $\mathrm{flag}_0$;
    30: Until $\mathrm{flag}_0=1$

    As shown in Fig.8, we propose the FFT-based location prediction algorithm to achieve efficient remote localization of the interference source. The FFT-based location prediction algorithm contains three units. The direct localization unit directly presents a prediction based on the current result of the low-complexity Q-learning algorithm. The FFT-based denoising unit aims at reducing the bias in the sequence of predictions produced by the former unit. The stability-based stopping strategy decides whether the proposed CSL scheme terminates.

    Figure  8.  The architecture of the proposed FFT-based location prediction algorithm

    1) Direct localization based on searching result

    The proposed FFT-based location prediction algorithm for UAV $U_i$ is performed when its current reward $r^{(i)}_e(s^{(i)},:)$ satisfies a constraint on the variation coefficient:

    $$\frac{\mathrm{Var}\left(r^{(i)}_e\left(s^{(i)},:\right)\right)}{\mathbb{E}\left(r^{(i)}_e\left(s^{(i)},:\right)\right)}>\lambda \tag{20}$$

    Since the reward $r^{(i)}_e(s^{(i)},a^{(i)})\in[0,1]$, the variation coefficient $\mathrm{Var}(r^{(i)}_e(s^{(i)},:))/\mathbb{E}(r^{(i)}_e(s^{(i)},:))\in[0,1]$ holds. Therefore, given a small value of $\lambda\in[0,1]$, a certain $r^{(i)}_e(s^{(i)},:)$ satisfying (20) means that $r^{(i)}(s^{(i)}_j,:)$ suffers less noise or multi-path interference. As a consequence, the direction of flight (i.e. action $\hat{a}^{(i)}$) obtained by the reinforcement-learning-based searching algorithm can be aligned with the direction of the interference. Therefore, at a certain time interval $j$, the intersection $p^{(i)}_j$ is reliable for localization only when $r^{(i)}_e(s^{(i)},a^{(i)})$ satisfies (20).

    In each time interval, an estimate of the position of the interference source is obtained. At a certain time interval $j$, the estimate can be obtained by

    $$p_j=\frac{1}{|C|}\sum_{i\in C}p^{(i)}_j \tag{21}$$

    where $C$ denotes the set of all intersections that satisfy Eq.(20) at time interval $j$, and $p_j$ is the location directly predicted by the swarm of UAVs.
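    The reliability filter of Eq.(20) and the averaging of Eq.(21) can be sketched together. This is a sketch under assumptions: the function name `direct_estimate` is hypothetical, the ratio is computed as population variance over mean (following the wording of the text), and `None` is returned when no UAV passes the filter.

```python
import statistics

def direct_estimate(intersections, reward_rows, lam):
    """Average the intersections of UAVs whose reward row is reliable.

    intersections: list of (x, y), one per UAV.
    reward_rows: list of reward rows, one per UAV, entries in [0, 1].
    lam: reliability threshold (lambda in Eq. (20)).
    """
    reliable = []
    for point, row in zip(intersections, reward_rows):
        mean = statistics.fmean(row)
        # Eq. (20): keep the UAV only if its reward row varies enough
        if mean > 0 and statistics.pvariance(row) / mean > lam:
            reliable.append(point)
    if not reliable:
        return None
    xs = [p[0] for p in reliable]
    ys = [p[1] for p in reliable]
    return (statistics.fmean(xs), statistics.fmean(ys))  # Eq. (21)
```

    A flat reward row, where every direction looks equally good, carries no directional information and is therefore discarded by the filter.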

    2) FFT-based denoising of the predicted locations

    Due to noise, the multi-path effect, etc., the direct prediction is only an approximate representation of the interference source’s location. It includes a certain disturbance and can be expressed as

    $$p_j=\left(x_{p_T}+x^{b}_{j},\ y_{p_T}+y^{b}_{j}\right) \tag{22}$$

    where $x^{b}_{j}$ and $y^{b}_{j}$ denote the disturbance on the X axis and the Y axis, respectively. In this way, predictions can be improved by reducing the disturbance on the X axis and the Y axis. Then, a queue $K=\{p_{j-\chi+1},p_{j-\chi+2},\ldots,p_j\}$ containing $\chi$ elements is defined and the disturbance is reduced by FFT[27,28]. Since the disturbance on the X axis and that on the Y axis are mutually orthogonal, the reduction of disturbance is divided into two parts, denoted as the X-axis reduction and the Y-axis reduction, respectively, to get higher accuracy. Each part contains an FFT process to convert the prediction queue from the time domain to the frequency domain. A cut-off number is defined to filter exceptional data, and an inverse FFT (IFFT) process is conducted to recover the queue from the frequency domain. The FFT process can be expressed as:

    $$x^{F}_{k}=\sum_{n=0}^{\chi}x_{p_{j-n+1}}e^{-\mathrm{j}2\pi k/N} \tag{23}$$
    $$y^{F}_{k}=\sum_{n=0}^{\chi}y_{p_{j-n+1}}e^{-\mathrm{j}2\pi k/N} \tag{24}$$

    where $x_{p_{j-n+1}}$ denotes the abscissa of $p_{j-n+1}$ and $y_{p_{j-n+1}}$ denotes its ordinate. $(x^{F}_{k},y^{F}_{k})$ denotes the transformed coordinate. Let $X^{F}=\{x^{F}_{1},x^{F}_{2},\ldots,x^{F}_{\chi}\}$ and $Y^{F}=\{y^{F}_{1},y^{F}_{2},\ldots,y^{F}_{\chi}\}$; the cut-off numbers of the X-axis reduction and the Y-axis reduction are denoted as $\frac{1}{4}\max(|X^{F}|)$ and $\frac{1}{4}\max(|Y^{F}|)$, respectively. $X^{F}$ and $Y^{F}$ are then filtered by their cut-off numbers. For $x^{F}_{k}\in X^{F}$, the filtered data $\tilde{x}^{F}_{k}$ can be expressed as

    $$\tilde{x}^{F}_{k}=\begin{cases}x^{F}_{k}, & x^{F}_{k}>\frac{1}{4}\max\left(|X^{F}|\right)\\ 0, & \text{otherwise}\end{cases} \tag{25}$$

    Similarly, for $y^{F}_{k}\in Y^{F}$, the filtered data $\tilde{y}^{F}_{k}$ can be expressed as

    $$\tilde{y}^{F}_{k}=\begin{cases}y^{F}_{k}, & y^{F}_{k}>\frac{1}{4}\max\left(|Y^{F}|\right)\\ 0, & \text{otherwise}\end{cases} \tag{26}$$

    Afterwards, the recovery process is achieved by IFFT and can be expressed as

    $$\tilde{x}_{p_{j-n+1}}=\frac{1}{\chi}\sum_{k=0}^{\chi}\tilde{x}^{F}_{k}e^{\mathrm{j}2\pi k/N} \tag{27}$$
    $$\tilde{y}_{p_{j-n+1}}=\frac{1}{\chi}\sum_{k=0}^{\chi}\tilde{y}^{F}_{k}e^{\mathrm{j}2\pi k/N} \tag{28}$$

    where $\tilde{p}_{j-n+1}=(\tilde{x}_{p_{j-n+1}},\tilde{y}_{p_{j-n+1}})$ is the denoised coordinate. The denoised intersection queue $\tilde{K}$ is then obtained, where $\tilde{K}=\{\tilde{p}_{j-\chi+1},\tilde{p}_{j-\chi+2},\ldots,\tilde{p}_j\}$. We then compute the centroid of $\tilde{K}$ by performing

    $$p^{c}_{j}=\frac{1}{\chi}\sum_{n=1}^{\chi}\tilde{p}_{j-n+1} \tag{29}$$

    where $p^{c}_{j}$ denotes the center of $\tilde{K}$ at time interval $j$. Hence, $p^{c}_{j}$ is the predicted location at time interval $j$.
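    The transform, cut-off and recovery steps can be illustrated with a short sketch. Several choices here are assumptions rather than the authors’ method: a naive textbook discrete Fourier transform stands in for an FFT routine so that the example needs no external library, the cut-off is applied to the magnitude of each frequency component, and all function names are hypothetical.

```python
import cmath

def dft(seq):
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n = len(seq)
    return [sum(seq[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]

def idft(spec):
    """Inverse transform; returns the real part of each sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

def denoise_axis(values):
    """Zero every component below a quarter of the largest magnitude."""
    spec = dft(values)
    cutoff = 0.25 * max(abs(c) for c in spec)
    kept = [c if abs(c) > cutoff else 0 for c in spec]
    return idft(kept)

def denoised_centroid(queue):
    """Denoise x and y separately, then average the recovered queue."""
    xs = denoise_axis([p[0] for p in queue])
    ys = denoise_axis([p[1] for p in queue])
    return (sum(xs) / len(xs), sum(ys) / len(ys))
```

    For a queue whose coordinates alternate by a small amount around a fixed point, the alternating component falls below the cut-off and the recovered queue collapses onto that point.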

    3) The stability-based stopping strategy

    As shown in P2 (see Eq.(10)), the ideal stopping strategy is to estimate the deviation between the predicted location and the location of the interference source. However, since the location of the interference source is unknown, we select $p_j^c$ as the final predicted location of the interference source when $P_j^T$ converges over time intervals. In this work, if the standard deviation $\sigma$ of the differences among the latest three predicted coordinates $\{p_j^c,p_{(j-1)}^c,p_{(j-2)}^c\}$ satisfies

    $$\sigma(\{p_j^c-p_{(j-1)}^c,\;p_j^c-p_{(j-2)}^c,\;p_{(j-1)}^c-p_{(j-2)}^c\})\le\beta \tag{30}$$

    then the collaborative search and localization terminate, and $p_j^c$ is output as the predicted location of the interference source. Meanwhile, the stop sign $flag_0$ is set to 1 and broadcast to the other UAVs. When the UAVs receive $flag_0=1$, they terminate their search phase and the cluster head UAV sends the predicted location to the ground center.
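
    A minimal sketch of the stopping test in Eq.(30) is given below. It assumes that each difference between two predicted coordinates is measured by its Euclidean length and that the population standard deviation is used; neither choice is stated in the text. The function name `should_stop` is hypothetical, and the value of $\beta$ must be supplied by the caller.

```python
import math
import statistics


def should_stop(predictions, beta):
    """Return True when the latest three predicted locations have stabilised.

    predictions: list of (x, y) centroids, oldest first.
    beta: stability threshold (same length unit as the coordinates).
    """
    if len(predictions) < 3:
        return False  # Eq.(30) needs three predictions
    p2, p1, p0 = predictions[-3:]  # p0 is the newest prediction
    gaps = [math.dist(p0, p1), math.dist(p0, p2), math.dist(p1, p2)]
    return statistics.pstdev(gaps) <= beta
```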

    The proposed FFT-based location prediction is summarized in Algorithm 2.

    Algorithm 2 The proposed FFT-based location prediction

    1: Obtain the result of the low-complexity Q-learning: intersection set $L$;
    2: Obtain the current iteration number $j$ and the operation command $flag_0$;
    3: Initialize $flag_1=0$, $C=\varnothing$;
    4: For $i=1$ to $n$
    5:     If $p_i^{(j)}$ satisfies Eq.(20) Then
    6:         $C=C\cup i$;
    7:         $flag_1=1$;
    8:     End If
    9: End For
    10: If $flag_1$ Then
    11:     Compute $p_j$ by Eq.(21);
    12:     Pop $p_{j-\chi}$ from $K$ and push $p_j$;
    13:     Transform $K$ into the frequency domain by Eq.(23) and Eq.(24);
    14:     Filter the transformed data by Eq.(25) and Eq.(26);
    15:     Obtain the denoised queue $\hat{K}$ by Eq.(27) and Eq.(28);
    16:     Pop $p_{j-\chi}$ from $K$ and push $p_j$;
    17:     Compute the centroid of $\hat{K}$, $p_j^c$, by Eq.(29);
    18: End If
    19: If $p_j^c$ satisfies Eq.(30) Then
    20:     $flag_0=1$;
    21: End If
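
    The pop-and-push step of Algorithm 2 keeps the queue at a fixed length $\chi$. One possible way to express it, assuming a double-ended queue with a maximum length (the helper names `make_queue` and `push_prediction` are hypothetical), is:

```python
from collections import deque

CHI = 150  # queue length chi used in the simulation settings


def make_queue(chi=CHI):
    """Fixed-length queue K; appending to a full deque drops the oldest entry."""
    return deque(maxlen=chi)


def push_prediction(queue, point):
    """Push the new prediction p_j; the oldest one is popped automatically when full."""
    queue.append(point)
    return len(queue) == queue.maxlen  # True once K holds chi elements
```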

    1) Simulation settings

    In the simulations, we consider one interference source located at (5000, 2885, 0) (m) with a transmit power of 20 W and a swarm of three UAVs. The initial positions of the UAVs are set as (0, 0, 200) (m), (5100, 8660, 190) (m) and (10000, 0, 190) (m), respectively. Each UAV is equipped with an electronic scanning antenna that can explore $u=36$ directions, with an interval of $\pi/18$ between adjacent directions. The antennas share the same radiation characteristic, which is given by

    F(θ,φ)=cos(πcos2φ+π2)sin(φ+π2)cos(π4sin(θπ2)sin(φ+π2)+π4) (31)

    Then, the receive gain $G_R(\theta)$ of each electronic scanning antenna for the horizontal angle $\theta$ and elevation angle $\varphi$ is represented as follows:

    $$G_R(\theta)=\frac{4\pi\eta F^2(\theta)}{\int_0^{\frac{\pi}{2}}\int_0^{2\pi}F^2(\theta)\sin\varphi\,\mathrm{d}\varphi\,\mathrm{d}\theta} \tag{32}$$

    where $\theta\in[0,2\pi)$ and the antenna efficiency $\eta=1$. Thus, the received power at each antenna can be obtained as

    $$O(\theta)=\frac{P_TG_TG_R(\theta)\lambda^2}{(4\pi)^2d^2L}+n^2 \tag{33}$$

    where $P_T$ and $G_T$ represent the transmit power of the interference source and the transmit antenna gain, respectively. The wavelength $\lambda$ is set as 3 m; the loss factor $L$ is set as 1; $n^2$ stands for the power of the noise signal, which is a random variable with an average value of −38 dBm.
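
    To give a feel for the magnitudes in Eq.(33), the sketch below evaluates the noise-free received power for the first UAV at its initial position. The gains $G_T=G_R=1$ are assumptions made only for this illustration (the text does not fix them), and the helper names are hypothetical.

```python
import math


def received_power_w(p_t, g_t, g_r, wavelength, distance, loss=1.0, noise_w=0.0):
    """Eq.(33): free-space received power in watts plus an additive noise power."""
    signal = p_t * g_t * g_r * wavelength**2 / ((4 * math.pi) ** 2 * distance**2 * loss)
    return signal + noise_w


def watts_to_dbm(p_w):
    """Convert a power in watts to dBm."""
    return 10 * math.log10(p_w * 1000)


def dbm_to_watts(p_dbm):
    """Convert a power in dBm to watts."""
    return 10 ** (p_dbm / 10) / 1000


# Distance from UAV 1 at (0, 0, 200) m to the source at (5000, 2885, 0) m.
d = math.dist((0, 0, 200), (5000, 2885, 0))
p_rx = received_power_w(p_t=20, g_t=1, g_r=1, wavelength=3, distance=d)
```

    Under these assumed unit gains, `watts_to_dbm(p_rx)` is roughly −44.7 dBm, i.e. below the −38 dBm average noise power at the starting position.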

    In Algorithm 1, three modalities $e\in E=\{1,2,3\}$ are defined, given as

    $$e=\begin{cases}1,&\frac{\sigma(D^{(i)})}{E(D^{(i)})}<0.3\\2,&0.3\le\frac{\sigma(D^{(i)})}{E(D^{(i)})}<0.7\\3,&\frac{\sigma(D^{(i)})}{E(D^{(i)})}\ge0.7\end{cases} \tag{34}$$

    where $\sigma(D_0)/E(D_0)$ is a constant indicating the maximum initial variation coefficient of the swarm of UAVs. As $e$ increases, the received signal power of the UAV also increases, indicating that the UAV is approaching the interference source. Therefore, the discount factor of the update range in Eq.(14) can be written as

    $$\varepsilon_e=\begin{cases}1,&e=1\\0.7,&e=2\\0.5,&e=3\end{cases} \tag{35}$$

    which means that the UAVs explore fewer directions when getting closer to the interference source, in order to balance the searching time and accuracy. Meanwhile, the step length $l_e$ of the UAV is given by

    $$l_e=\begin{cases}10,&e=1\\20,&e=2\\15,&e=3\end{cases} \tag{36}$$
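
    The modality selection and the two lookup rules of Eqs.(34) to (36) can be sketched as below. The sketch takes the variation coefficient to be the standard deviation divided by the mean and treats $D^{(i)}$ as a plain list of received powers; both are assumptions for illustration, and the names `modality`, `DISCOUNT` and `STEP_LENGTH` are hypothetical.

```python
import statistics

DISCOUNT = {1: 1.0, 2: 0.7, 3: 0.5}  # Eq.(35): discount factor per modality
STEP_LENGTH = {1: 10, 2: 20, 3: 15}  # Eq.(36): step length per modality


def modality(received_powers):
    """Eq.(34): map the variation coefficient of the received powers to e."""
    cv = statistics.pstdev(received_powers) / statistics.fmean(received_powers)
    if cv < 0.3:
        return 1
    if cv < 0.7:
        return 2
    return 3
```

    For example, `modality([1, 2, 3])` gives a variation coefficient of about 0.41 and therefore modality 2, for which `DISCOUNT[2]` is 0.7 and `STEP_LENGTH[2]` is 20.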

    In the FFT-based location prediction, $\lambda$ in Eq.(20) is set as 0.2, and the length $\chi$ of queue $K$ is set as 150.

    2) Performance metrics

    We propose the current reward ratio (CRR) to measure the convergence of different schemes, given as

    $$\mathrm{CRR}=\frac{r_e^{(i)}(s^{(i)},a^{(i)})}{\max(r_e^{(i)}(s^{(i)},:))} \tag{37}$$

    where $\max(r_e^{(i)}(s^{(i)},:))$ denotes the maximum reward value at the current state. Meanwhile, we assume that each UAV in the swarm consumes 1 unit of power per meter when flying and 5 units of power per second when conducting inference. The total power $P_{all}$ consumed by a UAV is given as

    $$P_{all}=l_{all}+0.5t_{in} \tag{38}$$

    where $l_{all}$ is the total flight length and $t_{in}$ is the total inference time.
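
    Both metrics are simple to compute. The following sketch mirrors Eq.(37) and Eq.(38) as written; the function names are hypothetical, and the reward list is assumed to hold one positive reward per candidate action at the current state.

```python
def current_reward_ratio(rewards_at_state, action):
    """Eq.(37): reward of the chosen action divided by the best reward at this state.

    Assumes the maximum reward is non-zero.
    """
    return rewards_at_state[action] / max(rewards_at_state)


def total_power(flight_length_m, inference_time_s, inference_coeff=0.5):
    """Eq.(38): P_all = l_all + 0.5 * t_in, with the coefficient taken from the equation."""
    return flight_length_m + inference_coeff * inference_time_s
```

    A CRR of 1 means the UAV picked the action with the highest reward at that state.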

    In this paper, we compare our proposed methods with three benchmark schemes, namely the SCAN scheme[16], the directional Q-learning scheme[20] and the RL-TC scheme[21].

    The SCAN method is a state-of-the-art (SOTA) pre-path scheme. As shown in Fig.9, the SCAN scheme divides the region into equal cells, and the center of each cell is taken as a waypoint. The UAV observes the environment at the waypoints and determines whether the interference source lies in the corresponding cell. In this paper, the SCAN scheme first divides the area into 200×200 cells. When the UAV finds the optimal cell, the SCAN scheme then divides that cell into 20×20 sub-cells. The UAV reports the optimal sub-cell as the location of the interference source.

    Figure  9.  The SCAN method’s search policy
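
    The waypoint construction of the SCAN benchmark can be illustrated with a short sketch. The rectangular bounds and the row-major visiting order are assumptions made here for illustration, and `cell_centers` is a hypothetical helper.

```python
def cell_centers(x_min, y_min, x_max, y_max, n_cols, n_rows):
    """Split a rectangle into n_cols x n_rows equal cells and return their centers."""
    dx = (x_max - x_min) / n_cols
    dy = (y_max - y_min) / n_rows
    return [
        (x_min + (c + 0.5) * dx, y_min + (r + 0.5) * dy)
        for r in range(n_rows)
        for c in range(n_cols)
    ]
```

    Calling the helper with 200 columns and 200 rows yields the coarse waypoints; calling it again on the bounds of the best cell with 20 columns and 20 rows yields the sub-cell waypoints.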

    The directional Q-learning scheme achieves single-UAV-based interference source localization based on reinforcement learning. The authors modified the rule for updating the quality of the state-action combinations so that multiple potential flight directions of the UAV can be evaluated simultaneously.

    The RL-TC scheme achieves remote interference source localization based on multiple UAVs. In the search phase, it uses a low-complexity RSS-based reinforcement learning method to decrease the computational time. In the localization phase, a two-stage clustering algorithm is used to reduce the effect of the singular points.

    Fig.10 depicts the UAVs' trajectories achieved by different approaches. The SCAN method can locate the interference source but has the longest path. Both the proposed algorithm and the RL-TC scheme can locate the interference source from a distance. However, the proposed algorithm enables the UAVs to realize localization from a distance of 2.9 km, while the RL-TC scheme requires the UAVs to get closer and still achieves lower localization accuracy. The directional Q-learning algorithm achieves higher accuracy at the cost of time efficiency, since it requires the UAVs to approach the interference source.

    Figure  10.  UAVs’ trajectories achieved by different approaches

    Fig.11 investigates the convergence of different reinforcement-learning-based schemes. Compared with the other RL-based methods, the proposed algorithm has the highest convergence speed: it converges after 600 time intervals and locates the interference source after 640 time intervals. It can therefore be concluded that the proposed algorithm selects the optimal action at each time interval more stably, since the modality recognition unit enables the UAVs to focus on the most probable directions of the interference source based on their previous knowledge. The directional Q-learning algorithm has convergence performance similar to that of the RL-TC scheme (about 800 time intervals), since the search phases of the two schemes use similar algorithms. However, the RL-TC scheme takes fewer time intervals to locate the interference source than the directional Q-learning scheme owing to its remote localization ability. Although the RL-TC scheme and the proposed scheme take a similar number of time intervals to locate the interference source after the RL methods converge, the proposed scheme reduces the computational complexity of the localization scheme from $O(n^2)$[29] in the RL-TC scheme to $O(n\log(n))$[30].

    Figure  11.  The convergence performance of different schemes

    As shown in Fig.12, we evaluate the relationship between the number of the UAVs' flight directions and the localization accuracy and efficiency. Nineteen different direction numbers are evaluated, i.e., Ref.[18]. Since the proposed scheme predicts the location from long-term search results of different UAVs, the localization accuracy is not heavily influenced by the number of flight directions, and the bias remains within 30 m. However, the efficiency is heavily influenced by the number of flight directions. While the UAVs need only 403 time intervals to locate the interference source when they can fly in 36 directions, they can consume up to 758 time intervals when the direction number decreases. It can be concluded that, when the search phase is coarse-grained, i.e., the flight directions are fewer, the localization phase needs more search data to achieve accurate localization and therefore consumes more time intervals.

    Figure  12.  The influence of the number of the flight directions on the efficiency and accuracy

    Fig.13 investigates the effectiveness of the proposed FFT-based location prediction algorithm. The singular points in the direct predictions of the UAV swarm are effectively reduced by FFT. The proposed prediction algorithm achieves high accuracy, with the localization bias within 25 m.

    Figure  13.  The effectiveness of the proposed FFT-based location prediction (the noise is −28 dBm)

    Table 1 compares the time efficiency and power efficiency of different schemes under a high SNR. Compared with the RL-based schemes, the SCAN scheme has the shortest inference time per time interval, needing only 0.01 seconds, but consumes the most power owing to its longest flight distance. Lacking the ability of remote localization, the directional Q-learning scheme takes up to 1267 time intervals to locate the interference source. The RL-TC scheme consumes the most inference time owing to the high computational complexity of its two-stage clustering algorithm. However, since the RL-TC scheme can locate the interference source from a distance, its total power consumption is less than that of the directional Q-learning scheme. The proposed scheme needs more inference time per interval than the directional Q-learning scheme but achieves the lowest total time consumption and power consumption owing to the lower computational complexity of the FFT-based remote localization algorithm.

    Table  1.  The efficiency of different reinforcement-learning-based schemes under the noise of −28 dBm

    | Scheme | Inference time per interval (s) | Total time intervals | Total power consumption |
    |---|---|---|---|
    | The SCAN scheme | 0.010 | 885 | 175268.50 |
    | The directional Q-learning scheme | 0.022 | 1267 | 38428.11 |
    | The RL-TC scheme | 0.041 | 814 | 24920.61 |
    | The proposed scheme | 0.034 | 633 | 19957.83 |
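
    The figures in Table 1 can be combined to compare total inference time and relative power savings. The sketch below assumes that total inference time equals the per-interval inference time multiplied by the number of time intervals, which is an interpretation rather than a value reported in the table; the names `TABLE_1`, `total_inference_time` and `power_saving` are hypothetical.

```python
# (inference time per interval in s, total time intervals, total power consumption)
TABLE_1 = {
    "SCAN": (0.010, 885, 175268.50),
    "Directional Q-learning": (0.022, 1267, 38428.11),
    "RL-TC": (0.041, 814, 24920.61),
    "Proposed": (0.034, 633, 19957.83),
}


def total_inference_time(name):
    """Per-interval inference time multiplied by the number of intervals."""
    per_interval, intervals, _ = TABLE_1[name]
    return per_interval * intervals


def power_saving(name, baseline):
    """Fractional reduction in total power consumption relative to a baseline scheme."""
    return 1 - TABLE_1[name][2] / TABLE_1[baseline][2]
```

    Under that assumption, the proposed scheme spends about 21.5 s on inference in total, against about 27.9 s for the directional Q-learning scheme, and uses roughly 20% less power than the RL-TC scheme.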

    We compare the localization accuracy of different schemes under different noise conditions in Fig.14. The SCAN method achieves the most stable performance under all SNRs owing to the pre-path planning, while the localization biases of the RL-based methods decrease as the SNR increases. The directional Q-learning scheme has the highest accuracy since it approaches the interference source to realize the localization. Since the proposed scheme employs the 3D single feature and has better convergence performance, it achieves localization accuracy similar to that of the directional Q-learning scheme when the noise is below −27 dBm, while the accuracy difference increases when the noise exceeds −28 dBm. The RL-TC scheme achieves the lowest accuracy because it achieves remote localization by the 2D trajectories, which vary under low-SNR conditions.

    Figure  14.  The localization bias of different schemes under different noise conditions

    In this paper, a multi-UAV-based remote localization framework is proposed for interference source localization. To achieve higher efficiency, the localization of an interference source is divided into two procedures. Based on the proposed framework, a CSL scheme is proposed that simultaneously achieves efficient searching and accurate remote localization. In the searching phase, each UAV independently explores the environment and efficiently searches for the interference source using the proposed low-complexity Q-learning algorithm. The results of each UAV are collaboratively analyzed in the localization procedure. In the localization phase, a novel FFT-based location prediction algorithm is presented to achieve accurate localization. The simulation results confirm the time efficiency and the localization accuracy of the proposed scheme.

    In future work, the multi-UAV-based collaborative search and localization will be further improved, and scenarios with multiple UAV swarms and multiple interference sources will be studied.

  • [1]
    W. Saad, M. Bennis, and M. Chen, “A vision of 6G wireless systems: Applications, trends, technologies, and open research problems,” IEEE Network, vol.34, no.3, pp.134–142, 2020.
    [2]
    E. Khorov, A. Krasilov, I. Selnitskiy, et al., “A framework to maximize the capacity of 5G systems for ultra-reliable low-latency communications,” IEEE Transactions on Mobile Computing, vol.20, no.6, pp.2111–2123, 2021.
    [3]
    Q. Wu, G. Ding, Y. Xu, et al., “Cognitive internet of things: A new paradigm beyond connection,” IEEE Internet of Things Journal, vol.1, no.2, pp.129–143, 2014.
    [4]
    Y. Xu, A. Anpalagan, Q. Wu, et al., “Decision-theoretic distributed channel selection for opportunistic spectrum access: Strategies, challenges and solutions,” IEEE Communications Surveys & Tutorials, vol.15, no.4, pp.1689–1713, 2013.
    [5]
    J. Q. Wu, G. Ding, J. Wang, et al., “Spatial-temporal opportunity detection for spectrum-heterogeneous cognitive radio networks: Two-dimensional sensing,” IEEE Transactions on Wireless Communications, vol.12, no.2, pp.516–526, 2013.
    [6]
    L. Wang and Y. Huang, “UAV-based estimation of direction of arrival: An approach based on image processing,” 2020 International Conference on Wireless Communications and Signal Processing (WCSP), Nanjing, China, pp.1165–1169, 2020.
    [7]
    F. Lemic, J. Büsch, M. Chwalisz, et al., “Infrastructure for benchmarking RF-based indoor localization under controlled interference,” The International Conference on Ubiquitous Positioning, Indoor Navigation and Location-Based Services (UPINLBS'14), Corpus Christi, Texas, USA, pp.26–35, 2014.
    [8]
    J. Schuette, B. Fell, J. Chapin, et al., “Performance of RF mapping using opportunistic distributed devices,” 2015 IEEE Military Communications Conference, Tampa, FL, USA, DOI: 10.1109/MILCOM.2015.7357677, 2015.
    [9]
    Q. Wu, F. Shen, Z. Wang, et al., “3D spectrum mapping based on ROI-driven UAV deployment,” IEEE Network, vol.34, no.5, pp.24–31, 2020.
    [10]
    G. Sun and J. van de Beek, “Simple distributed interference source localization for radio environment mapping,” 2010 IFIP Wireless Days, Venice, Italy, DOI: 10.1109/WD.2010.5657755, 2010.
    [11]
    G. Ding, Q. Wu, L. Zhang, et al., “An amateur drone surveillance system based on the cognitive Internet of things,” IEEE Communications Magazine, vol.56, no.1, pp.29–35, 2018.
    [12]
    D. J. Pack, P. DeLima, G. J. Toussaint, et al., “Cooperative control of UAVs for localization of intermittently emitting mobile targets,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol.39, no.4, pp.959–970, 2009.
    [13]
    N. Zhao, W. Lu, M. Sheng, et al., “UAV-assisted emergency networks in disasters,” IEEE Wireless Communications, vol.26, no.1, pp.45–51, 2019.
    [14]
    X. Chen, M. Sheng, N. Zhao, et al., “UAV-relayed covert communication towards a flying warden,” IEEE Transactions on Communications, vol.69, no.11, pp.7659–7672, 2021.
    [15]
    H. Tsuji, D. Gray, M. Suzuki, et al., “Radio location estimation experiment using array antennas for high altitude platforms,” in Proc. of 2007 IEEE 18th International Symposium on Personal, Indoor and Mobile Radio Communications, Athens, Greece, pp.1–5, 2007.
    [16]
    F. B. Sorbelli, S. K. Das, C. M. Pinotti, et al., “Range based algorithms for precise localization of terrestrial objects using a drone,” Pervasive Mobile Comput., vol.48, pp.20–42, 2018. DOI: 10.1016/j.pmcj.2018.05.007
    [17]
    H. Bayerlein, P. De Kerret, and D. Gesbert, “Trajectory optimization for autonomous flying base station via reinforcement learning,” 2018 IEEE 19th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Kalamata, Greece, pp.1–5, 2018.
    [18]
    J. Gu, H. Wang, G. Ding, et al., “UAV-enabled mobile radiation source tracking with deep reinforcement learning,” 2020 International Conference on Wireless Communications and Signal Processing (WCSP), Nanjing, China, pp.672–678, 2020.
    [19]
    C. H. Liu, X. Ma, X. Gao, et al., “Distributed energy-efficient multi-UAV navigation for long-term communication coverage by deep reinforcement learning,” IEEE Transactions on Mobile Computing, vol.19, no.6, pp.1274–1285, 2020.
    [20]
    S. Wu, “Illegal radio station localization with UAV-based Q-learning,” China Communications, vol.15, no.12, pp.122–131, 2018.
    [21]
    G. Wu, “UAV-Based Interference Source Localization: A Multimodal Q-Learning Approach,” IEEE Access, vol.7, pp.137982–137991, 2019.
    [22]
    J. Tisdale, A. Ryan, Z. Kim, et al., “A multiple UAV system for vision-based search and localization,” 2008 American Control Conference, Seattle, Washington, USA, pp.1985–1990, 2008.
    [23]
    K. Maeda, S. Doki, Y. Funabora, et al., “Flight path planning of multiple UAVs for robust localization near infrastructure facilities,” IECON 2018 - 44th Annual Conference of the IEEE Industrial Electronics Society, Washington, DC, USA, pp.2522–2527, 2018.
    [24]
    Y.-J. Chen, D.-K. Chang, and C. Zhang, “Autonomous tracking using a swarm of UAVs: A constrained multi-agent reinforcement learning approach,” IEEE Transactions on Vehicular Technology, vol.69, no.11, pp.13702–13717, 2020.
    [25]
    W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality, John Wiley & Sons, Inc., 2011.
    [26]
    M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley & Sons, Inc., 1995.
    [27]
    A. Wahbi, A. Roukhe, and L. Hlou, “Enhancing the quality of voice communications by acoustic noise cancellation (ANC) using a low cost adaptive algorithm based Fast Fourier Transform (FFT) and circular convolution,” 2014 9th International Conference on Intelligent Systems: Theories and Applications (SITA-14), Rabat, Morocco, DOI: 10.1109/SITA.2014.6847310, 2014.
    [28]
    S. S. Agaian, M.-C. Chen, and C. L. P. Chen, “Noise reduction algorithms using Fibonacci Fourier transforms,” 2008 IEEE International Conference on Systems, Man and Cybernetics, Singapore, pp.1048–1052, 2008.
    [29]
    M. K. Pakhira, “A linear time-complexity k-means algorithm using cluster shifting,” 2014 International Conference on Computational Intelligence and Communication Networks, Bhopal, India, pp.1047–1051, 2014.
    [30]
    B. Farhang-Boroujeny and Y. C. Lim, “A comment on the computational complexity of sliding FFT,” IEEE Transactions on Circuits and Systems Ⅱ: Analog and Digital Signal Processing, vol.39, no.12, pp.875–876, 1992.