### 2.1 System model

Consider the system illustrated in Fig. 1, which comprises a legitimate pair of ground nodes, the transmitter Alice (A) and the receiver Bob (B), who establish an open wireless link to send private information from A to B. Both nodes are confined to a circular area S of radius \(R_A\) centred at A. An illegitimate node Eve (E) is also present within S and attempts to intercept the information exchanged over the wireless medium. E is assumed to be a passive eavesdropper whose exact position within S and available resources are unknown. Without loss of generality, A is located at the origin of coordinates (0, 0, 0) and B is located along the x-axis at \((d_{\mathrm {AB}},0,0)\). To improve the secrecy performance of the system, *N* UAVs, \(\{\mathrm {J_i}\}_{i\in \{1,\ldots ,N\}}\), are deployed to act as friendly jammers by emitting pseudorandom noise isotropically in order to prevent E from intercepting the information. The jammers are positioned at a common height \(z_{\mathrm {J}}\) on a circular orbit of radius \(R_{\mathrm {J}}\) around A, at angular positions \(\theta _{\mathrm {J_i}}\) with \(i\in \{1,\ldots ,N\}\). We assume that the estimate of the radial position of B with respect to A is unreliable; thus, we model the estimated distance between A and B as a Gaussian random variable whose mean is the actual distance \(d_{\mathrm {AB}}\) (i.e. the estimate is unbiased) and whose standard deviation is a given uncertainty \(\sigma _{\mathrm {AB}}\), \({\widehat{d}}_{\mathrm {AB}} \sim {\mathcal {N}}(d_{\mathrm {AB}},\,\sigma _{\mathrm {AB}}^{2}),\) where \({\widehat{d}}_{\mathrm {AB}}\) is the estimate of the distance between A and B.
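As a minimal sketch of the geometry described above, the following Python snippet draws a distance estimate \({\widehat{d}}_{\mathrm {AB}}\) and computes the position of one jammer and its distances to a ground node. The function names are hypothetical, and measuring \(\theta _{\mathrm {J_i}}\) from the positive x-axis is an assumption made here for illustration.

```python
import math
import random

def sample_distance_estimate(d_ab, sigma_ab, rng):
    """Draw one estimate of the A-B distance from N(d_ab, sigma_ab^2)."""
    return rng.gauss(d_ab, sigma_ab)

def jammer_position(theta_j, r_j, z_j):
    """Cartesian coordinates of a UAV on the orbit of radius r_j at height z_j.

    Assumption: theta_j is measured from the positive x-axis, with A at the origin.
    """
    return (r_j * math.cos(theta_j), r_j * math.sin(theta_j), z_j)

def distances_to_ground_node(jammer, node_xy):
    """Return (horizontal distance, 3D distance) from a UAV to a ground node."""
    r_ju = math.hypot(jammer[0] - node_xy[0], jammer[1] - node_xy[1])
    d_ju = math.hypot(r_ju, jammer[2])
    return r_ju, d_ju
```

The horizontal and 3D distances returned by `distances_to_ground_node` correspond to the quantities that the air-to-ground channel model needs for each UAV and ground node pair.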

#### 2.1.1 Ground channels

There are two ground channels to consider between ground nodes, one between A and B and the other between A and E. Both channels are considered to undergo Rayleigh fading and are subject to additive white Gaussian noise (AWGN) with mean power \(N_0\). Then, the corresponding channel coefficients are \(h_{\mathrm {AB}}\) and \(h_{\mathrm {AE}}\), and the respective channel gains are \(|h_{\mathrm {AB}}|^2\) and \(|h_{\mathrm {AE}}|^2\). For a node \(\mathrm {U}\in \{\mathrm {B},\mathrm {E}\}\), the channel coefficient \(h_{\mathrm {AU}}\) is an independent complex circularly symmetric Gaussian random variable with a channel gain of \(g_{\mathrm {AU}} = |h_{\mathrm {AU}}|^2\) with a scale parameter of \(\Omega _{\mathrm {AU}} = {\mathbb {E}}\left[ |h_{\mathrm {AU}}|^2\right] =\gamma _{\mathrm {A}}d_{\mathrm {AU}}^{-\alpha _G}\), where \(d_{\mathrm {AU}}\) is the distance between A and node U, \(\alpha _G\) is the path loss exponent for the ground links and \(\gamma _{\mathrm {A}}\) is the transmit SNR of A given by \(\gamma _{\mathrm {A}}=P_{\mathrm {A}}/N_0\) with \(P_{\mathrm {A}}\) as the transmit power of A.

#### 2.1.2 Air-to-ground channels

There are two air-to-ground channels for each UAV jammer, one between the UAV and B and the other between the UAV and E. The channel coefficients for those links are given by \(\mathtt {h}_{\mathrm {J_iU}}\), with \(\mathrm {U}\in \{\mathrm {B},\mathrm {E}\}\) and \(i\in \{1,\ldots ,N\}.\)

The propagation path loss for the A2G channels presents a contribution from a LoS component and a non-LoS (NLoS) component, where the contribution of each component to the overall path loss is determined by the probabilities \(P_{\mathrm {LoS}}\) and \(P_{\mathrm {NLoS}}\), respectively [20]. These probabilities are functions of the UAV position with respect to the ground node of interest U and are given by [20]

$$\begin{aligned} P_{\text {LoS}}&= \frac{1}{1 + \psi \exp \left( -\omega \left[ \frac{180}{\pi }\tan ^{-1}\left( \frac{{z_{\mathrm {J}}}}{r_{\mathrm {J_iU}}}\right) -\psi \right] \right) }, \end{aligned}$$

(1)

$$\begin{aligned} P_{\text {NLoS}}&= 1 - P_{\text {LoS}}, \end{aligned}$$

(2)

where \(\psi\) and \(\omega\) are environmental constants [21, 22] and \(r_{\mathrm {J_iU}}\) is the distance between node U and the projection of the *i*th UAV on the ground plane.
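The probabilities in (1) and (2) can be evaluated with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and any numerical values passed for \(\psi\) and \(\omega\) are placeholders chosen by the caller, not values taken from this work.

```python
import math

def p_los(z_j, r_ju, psi, omega):
    """Eq. (1): LoS probability for a UAV at height z_j and horizontal distance r_ju."""
    # Elevation angle in degrees, i.e. (180 / pi) * atan(z_j / r_ju).
    elevation_deg = math.degrees(math.atan2(z_j, r_ju))
    return 1.0 / (1.0 + psi * math.exp(-omega * (elevation_deg - psi)))

def p_nlos(z_j, r_ju, psi, omega):
    """Eq. (2): complement of the LoS probability."""
    return 1.0 - p_los(z_j, r_ju, psi, omega)
```

Using `atan2` instead of a plain division keeps the expression well defined when the UAV is directly above the node (\(r_{\mathrm {J_iU}}=0\)), where the elevation angle is 90 degrees.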

The path loss of each component is given by

$$\begin{aligned} L^{\text {LoS}}_{\mathrm {J_iU}}&= \xi _{\mathrm {LoS}}d_{\mathrm {J_iU}}^{\alpha _J} \end{aligned}$$

(3)

$$\begin{aligned} L^{\text {NLoS}}_{\mathrm {J_iU}}&= \xi _{\mathrm {NLoS}}d_{\mathrm {J_iU}}^{\alpha _J} \end{aligned}$$

(4)

where \(\alpha _J\) is the path loss exponent for the A2G links and \(\xi _{\mathrm {LoS}}\) and \(\xi _{\mathrm {NLoS}}\) are the attenuation factors for the LoS and the NLoS links, respectively. It is also assumed that the LoS channel undergoes Rician fading with channel coefficient \(\mathtt {h}_{\mathrm {J_iU}}^{\mathrm {LoS}}\) and channel gain given by \(g_{\mathrm {J_iU}}^{\mathrm {LoS}}=|\mathrm {h}_{\mathrm {J_iU}}^{\mathrm {LoS}}|^2\), with a scale parameter of \(\Omega _{\mathrm {J_iU}}^{\mathrm {LoS}}={\mathbb {E}}[|\mathrm {h}_{\mathrm {J_iU}}^{\mathrm {LoS}}|^2] = \gamma _{\mathrm {J}_i}P_{\mathrm {LoS}}(L_{\mathrm {J_iU}}^{\mathrm {LoS}})^{-1}\) and shape parameter of \(K_{\mathrm {J_iU}}\), where \(\gamma _{\mathrm {J}_i}\) is the transmit SNR of UAV \(\mathrm {J}_i\), \(\gamma _{\mathrm {J}_i}=P_{\mathrm {J}_i}/N_0\) and \(P_{\mathrm {J}_i}\) is the transmit power, with a total jamming SNR of \(\gamma _{\mathrm {T}}=\sum _i\gamma _{\mathrm {J}_i}\). The NLoS component undergoes Rayleigh fading with channel gain \(g_{\mathrm {J_iU}}^{\mathrm {NLoS}} = |\mathrm {h}_{\mathrm {J_iU}}^{\mathrm {NLoS}}|^2\), with a scale parameter of \(\Omega _{\mathrm {J_iU}}^{\mathrm {NLoS}} = {\mathbb {E}}[|\mathrm {h}_{\mathrm {J_iU}}^{\mathrm {NLoS}}|^2] = \gamma _{\mathrm {J}_i}P_{\mathrm {NLoS}}(L_{\mathrm {J_iU}}^{\mathrm {NLoS}})^{-1}\). Considering that, the average channel gain can be expressed as

$$\begin{aligned} g_{\mathrm {J_iU}}&= g_{\mathrm {J_iU}}^{\mathrm {LoS}} + g_{\mathrm {J_iU}}^{\mathrm {NLoS}}. \end{aligned}$$

(5)
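To make the role of (3) and (4) concrete, the sketch below computes the scale parameters \(\Omega _{\mathrm {J_iU}}^{\mathrm {LoS}}\) and \(\Omega _{\mathrm {J_iU}}^{\mathrm {NLoS}}\) from the path losses and the LoS probability. The function name and argument names are hypothetical; the LoS probability is passed in as a number so that the snippet stays self-contained.

```python
def a2g_scale_parameters(gamma_j, prob_los, d_ju, alpha_j, xi_los, xi_nlos):
    """Return (Omega_LoS, Omega_NLoS) for one UAV-to-ground link.

    gamma_j  : transmit SNR of the UAV
    prob_los : LoS probability of the link, a number in [0, 1]
    d_ju     : 3D distance between the UAV and the ground node
    """
    loss_los = xi_los * d_ju ** alpha_j     # Eq. (3)
    loss_nlos = xi_nlos * d_ju ** alpha_j   # Eq. (4)
    omega_los = gamma_j * prob_los / loss_los
    omega_nlos = gamma_j * (1.0 - prob_los) / loss_nlos
    return omega_los, omega_nlos
```

For instance, with `gamma_j=100`, `prob_los=0.5`, `d_ju=10`, `alpha_j=2`, `xi_los=1` and `xi_nlos=10`, the path losses are 100 and 1000, so the two scale parameters are 0.5 and 0.05.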

#### 2.1.3 Signal analysis

For the communication process, A sends a symbol *x* with mean power \({\mathbb {E}}\left[ |x|^2\right] = 1,\) while the UAVs send pseudorandom symbols \(s_i\) with mean power \({\mathbb {E}}\left[ |s_i|^2\right] = 1\), with \(i\in \{1,\ldots ,N\}\). We consider a common noise level with power \({\mathbb {E}}\left[ |w|^2\right] = N_0\) at every node in the system. Thus, the received signal at both B and E is, respectively, given by

$$\begin{aligned} y_{\mathrm {U}}&= h_{\mathrm {AU}}x + \sum _{i=1}^N h_{\mathrm {J_iU}}s_i + w, \end{aligned}$$

(5)

with \(\mathrm {U}\in \{\mathrm {B},\mathrm {E}\}\). Then, the instantaneous received signal-to-interference-plus-noise ratio (SINR) at node U can be expressed as

$$\begin{aligned} \gamma _{\mathrm {U}} = \frac{g_{\mathrm {AU}}}{1+\sum _{i=1}^Ng_{\mathrm {J_iU}}}, \end{aligned}$$

(6)

For the particular case with no UAV jammers, the sum in (6) vanishes and the SINR values at B and E reduce to \(\gamma _{\mathrm {B}} = g_{\mathrm {AB}}\) and \(\gamma _{\mathrm {E}} = g_{\mathrm {AE}}\), respectively, since the transmit SNR \(\gamma _{\mathrm {A}}\) is already included in the scale parameter \(\Omega _{\mathrm {AU}}\).

### 2.2 Performance analysis

As previously mentioned, E is located within a circular area S around A, but no further knowledge of its exact position is assumed, i.e. E can be at any point inside S. Therefore, to evaluate the secrecy performance of the proposed system, we consider the area-based secrecy metrics proposed in [17], namely jamming coverage (JC) and jamming efficiency (JE), and a new hybrid metric, the WSC, introduced in [18]. The definition of these metrics is based on the SOP, which is derived for the proposed system as described below.

#### 2.2.1 Secrecy outage probability

For the definition of the area-based secrecy metrics, we consider first the SOP [6] defined as

$$\begin{aligned} {\mathrm {SOP}}= \Pr \left[ C_S < R_S \right] , \end{aligned}$$

(7)

where \(R_S\) is the chosen rate for a secrecy code and \(C_S\) is the secrecy capacity, which for our system is given by

$$\begin{aligned} C_S&= \left[ C_{\mathrm {B}} - C_{\mathrm {E}}\right] ^+= \left[ \log _2\left( \frac{1 + \gamma _{\mathrm {B}}}{1 + \gamma _{\mathrm {E}}} \right) \right] ^+, \end{aligned}$$

(8)

where \(C_{\mathrm {B}}\) and \(C_{\mathrm {E}}\) are the capacities of the channels between A and B and between A and E, respectively, with \([X]^+=\max [X,0]\), which tells us that if the capacity of the illegitimate channel is greater than the capacity of the legitimate channel, no secrecy can be achieved.
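As a small illustration of (7) and (8), the snippet below evaluates the secrecy capacity for a given pair of instantaneous SINR values and checks the outage event. The function names are hypothetical.

```python
import math

def secrecy_capacity(gamma_b, gamma_e):
    """Eq. (8): [log2((1 + gamma_b) / (1 + gamma_e))]^+ in bits per channel use."""
    return max(math.log2((1.0 + gamma_b) / (1.0 + gamma_e)), 0.0)

def is_secrecy_outage(gamma_b, gamma_e, rate_s):
    """Outage event inside the probability in Eq. (7): C_S < R_S."""
    return secrecy_capacity(gamma_b, gamma_e) < rate_s
```

With \(\gamma _{\mathrm {B}}=3\) and \(\gamma _{\mathrm {E}}=1\) the ratio inside the logarithm is 2, so the secrecy capacity is 1 bit per channel use; swapping the two values gives zero.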

#### 2.2.2 Secrecy improvement metric

This metric quantifies the improvement in the secrecy performance of the proposed system, measured in terms of the SOP, that is attained by introducing the friendly jamming from the UAVs. It is given by [17]

$$\begin{aligned} \Delta = \frac{ {\mathrm {SOP}}_{\mathrm {NJ}}}{ {\mathrm {SOP}}_{\mathrm {J}}}, \end{aligned}$$

(9)

where the SOP subscript identifies whether the SOP is computed with (J) or without (NJ) friendly jamming. Thus, \(\Delta >1\) implies that the presence of the UAV jammers reduces the SOP, while \(\Delta <1\) implies that it increases the SOP.

For mathematical tractability purposes, in [18] we proposed an analogous secrecy improvement metric that provides the same general idea with the criteria of secrecy achievement (\(1-{\mathrm {SOP}}\)) instead of SOP, thus given by

$$\begin{aligned} {\overline{\Delta }} = \frac{ 1-{\mathrm {SOP}}_{\mathrm {J}}}{ 1-{\mathrm {SOP}}_{\mathrm {NJ}}}. \end{aligned}$$

(10)
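The two improvement metrics in (9) and (10) can be compared directly. The sketch below uses hypothetical function names and assumes SOP values strictly between 0 and 1 so that both ratios are defined.

```python
def secrecy_improvement(sop_nj, sop_j):
    """Eq. (9): ratio of the SOP without jamming to the SOP with jamming."""
    return sop_nj / sop_j

def secrecy_improvement_bar(sop_nj, sop_j):
    """Eq. (10): ratio of secrecy achievement (1 - SOP) with and without jamming."""
    return (1.0 - sop_j) / (1.0 - sop_nj)
```

Both ratios exceed 1 exactly when \({\mathrm {SOP}}_{\mathrm {J}} < {\mathrm {SOP}}_{\mathrm {NJ}}\), so the two metrics agree on where jamming helps even though their magnitudes differ. For example, \({\mathrm {SOP}}_{\mathrm {NJ}}=0.5\) and \({\mathrm {SOP}}_{\mathrm {J}}=0.25\) give \(\Delta =2\) and \({\overline{\Delta }}=1.5\).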

The SOP without jamming term, \({\mathrm {SOP}}_{\mathrm {NJ}},\) is obtained in closed form in [18] as

$$\begin{aligned} {\mathrm {SOP}}_{\mathrm {NJ}} = 1 - e^{-\frac{1}{\Omega _{\mathrm {AB}}}\left( 2^{R_S}-1\right) }\left( \frac{1}{2^{R_S}\left( \frac{\Omega _{\mathrm {AE}}}{\Omega _{\mathrm {AB}}}\right) +1}\right) , \end{aligned}$$

(11)

while the SOP including jamming, \({\mathrm {SOP}}_{\mathrm {J}}\), is obtained as in Proposition 1.
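The closed form in (11) can be checked against a simple Monte Carlo simulation. The sketch below is illustrative only; the function names, the number of trials and the seed are arbitrary choices. It relies on the fact that, without jammers, the SINRs at B and E are exponentially distributed with means \(\Omega _{\mathrm {AB}}\) and \(\Omega _{\mathrm {AE}}\).

```python
import math
import random

def sop_no_jamming(omega_ab, omega_ae, rate_s):
    """Closed-form SOP without jamming, Eq. (11)."""
    t = 2.0 ** rate_s
    return 1.0 - math.exp(-(t - 1.0) / omega_ab) / (t * omega_ae / omega_ab + 1.0)

def sop_no_jamming_mc(omega_ab, omega_ae, rate_s, trials=200_000, seed=0):
    """Monte Carlo estimate of Pr[gamma_B < 2^R_S (1 + gamma_E) - 1]."""
    rng = random.Random(seed)
    t = 2.0 ** rate_s
    outages = 0
    for _ in range(trials):
        gamma_b = rng.expovariate(1.0 / omega_ab)  # exponential with mean omega_ab
        gamma_e = rng.expovariate(1.0 / omega_ae)  # exponential with mean omega_ae
        if gamma_b < t * (1.0 + gamma_e) - 1.0:
            outages += 1
    return outages / trials
```

The two functions should agree up to the sampling error of the simulation, which shrinks with the square root of the number of trials.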

### Proposition 1

The SOP in the presence of *N* UAV jammers \({\mathrm {SOP}}_{\mathrm {J}}\) for the proposed system is given by

$$\begin{aligned} {\mathrm {SOP}}_{\mathrm {J}} = \int _0^\infty F_{\gamma _\mathrm {B}}(2^{R_S}(1+x)-1)f_{\gamma _{\mathrm {E}}}(x)dx. \end{aligned}$$

(12)

where \(F_{\gamma _\mathrm {B}}(\cdot )\) is the CDF of the SINR at B, \(\gamma _\mathrm {B},\) and \(f_{\gamma _{\mathrm {E}}}(\cdot )\) is the PDF of the SINR at E, \(\gamma _\mathrm {E}\), which are, respectively, expressed as

$$\begin{aligned}&F_{\gamma _{\mathrm {U}}}(x) = 1 - e^{-{\widehat{x}}} e^{\sum _{i=1}^N\left( \frac{\eta _i}{\eta _i+{\widehat{x}}} -1\right) K_{\mathrm {J_iU}}}\prod _{i=1}^N\left( \frac{\eta _i}{\eta _i+{\widehat{x}}} \right) \end{aligned}$$

(13)

$$\begin{aligned}&f_{\gamma _{\mathrm {U}}}(x) =\frac{1}{\Omega _{\mathrm {AU}}} e^{-{\widehat{x}}} e^{ \sum _{i=1}^N \left( \frac{\eta _i}{\eta _i+{\widehat{x}}} -1\right) K_{\mathrm {J_iU}}} \left( 1 + \sum _{i=1}^N\frac{1}{\eta _i + {\widehat{x}}}\left( 1 + \frac{\eta _i K_{\mathrm {J_iU}}}{\eta _i + {\widehat{x}}}\right) \right) \cdot \\&\quad \prod _{i=1}^N\left( \frac{\eta _i}{\eta _i+{\widehat{x}}} \right) , \end{aligned}$$

(14)

with \(\mathrm {U}\in \{\mathrm {B},\mathrm {E}\}\), \({\widehat{x}}=\frac{x}{\Omega _{\mathrm {AU}}}\) and

$$\begin{aligned} \eta _i = \frac{1+K_{\mathrm {J_iU}}}{\Omega _{\mathrm {J_iU}}}. \end{aligned}$$

(15)

The SOP in (12) can be extended to the channel in (5), with both LoS and NLoS components, by considering \(g_{\mathrm {J_iU}} = g_{\mathrm {J_iU}}^{\mathrm {LoS}} + g_{\mathrm {J_iU}}^{\mathrm {NLoS}}\), which doubles the number of terms in the sums and products in (13) and (14). The Rayleigh NLoS terms are obtained from the Rician expressions by setting the shape parameters to zero, \(K_{\mathrm {J_iU}}^{\mathrm {NLoS}}=0\), which gives \(\eta _i^{\mathrm {NLoS}} = (\Omega _{\mathrm {J_iU}}^{\mathrm {NLoS}})^{-1}\).
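Since (12) is a single integral of known functions, it can be evaluated numerically. The following sketch implements (13) and (14) and integrates (12) with a midpoint rule after the substitution \(x = u/(1-u)\). The function names, the integration rule and the number of steps are choices made here for illustration and are not prescribed by the proposition. Separate lists of \(\eta _i\) and \(K_{\mathrm {J_iU}}\) are passed for B and E because both depend on the ground node.

```python
import math

def sinr_cdf(x, omega_a, etas, ks):
    """Eq. (13): CDF of the SINR at a ground node."""
    xh = x / omega_a
    value = math.exp(-xh)
    for eta, k in zip(etas, ks):
        ratio = eta / (eta + xh)
        value *= ratio * math.exp((ratio - 1.0) * k)
    return 1.0 - value

def sinr_pdf(x, omega_a, etas, ks):
    """Eq. (14): PDF of the SINR at a ground node."""
    xh = x / omega_a
    value = math.exp(-xh) / omega_a
    bracket = 1.0
    for eta, k in zip(etas, ks):
        ratio = eta / (eta + xh)
        value *= ratio * math.exp((ratio - 1.0) * k)
        bracket += (1.0 + eta * k / (eta + xh)) / (eta + xh)
    return value * bracket

def sop_jamming(rate_s, omega_ab, etas_b, ks_b, omega_ae, etas_e, ks_e, steps=20000):
    """Eq. (12), integrated over u in (0, 1) with x = u / (1 - u)."""
    t = 2.0 ** rate_s
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) / steps
        x = u / (1.0 - u)
        jacobian = 1.0 / (1.0 - u) ** 2
        total += (sinr_cdf(t * (1.0 + x) - 1.0, omega_ab, etas_b, ks_b)
                  * sinr_pdf(x, omega_ae, etas_e, ks_e) * jacobian)
    return total / steps
```

A useful sanity check is to pass empty jammer lists: the products and sums then disappear and the result should match the closed form in (11).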

### Proof

Let us consider first the case with 2 UAVs and LoS connection between the UAVs and the ground nodes. Under these conditions, \(g_{\mathrm {J_iU}} = g_{\mathrm {J_iU}}^{\mathrm {LoS}}\) and \(\Omega _{\mathrm {J_iU}} = \Omega _{\mathrm {J_iU}}^{\mathrm {LoS}}\). For that case, the PDF and CDF of the effective A2G channel gains \(g_{\mathrm {J_iU}}\) are given by [23]

$$\begin{aligned} f_{g_{\mathrm {J_iU}}}(x)&= \frac{1+K_{\mathrm {J_iU}}}{\Omega _{\mathrm {J_iU}}}e^{-K_{\mathrm {J_iU}} - \frac{1+K_{\mathrm {J_iU}}}{\Omega _{\mathrm {J_iU}}} x }I_0\left( \sqrt{\frac{4K_{\mathrm {J_iU}}(K_{\mathrm {J_iU}}+1)}{\Omega _{\mathrm {J_iU}}}x}\right) \end{aligned}$$

(16)

$$\begin{aligned} F_{g_{\mathrm {J_iU}}}(x)&= 1 - Q_1\left[ \sqrt{2K_{\mathrm {J_iU}}}, \sqrt{\frac{2(K_{\mathrm {J_iU}}+1)}{\Omega _{\mathrm {J_iU}}}x}\right] \end{aligned}$$

(17)

where \(I_0(\cdot )\) is the zero-order modified Bessel function of first kind and \(Q_1[\cdot ]\) is the Marcum-Q function of order 1. Additionally, the PDF and CDF of the ground channels \(g_{\mathrm {AU}}\) are given by

$$\begin{aligned} f_{g_{\mathrm {AU}}}(x)&= \frac{1}{\Omega _{\mathrm {AU}}}e^{-\frac{1}{\Omega _{\mathrm {AU}}}x} \end{aligned}$$

(18)

$$\begin{aligned} F_{g_{\mathrm {AU}}}(x)&=1 - e^{-\frac{1}{\Omega _{\mathrm {AU}}}x} \end{aligned}$$

(19)

Therefore, the CDF of \(\gamma _{\mathrm {U}}\) is obtained as

$$\begin{aligned} F_{\gamma _{\mathrm {U}}}(x)&= \mathrm {Pr}\left[ \frac{g_{\mathrm {AU}}}{1+g_{\mathrm {J_1U}} +g_{\mathrm {J_2U}} }<x\right] \\&=\mathrm {Pr}\left[ g_{\mathrm {AU}} < x(1+g_{\mathrm {J_1U}}+g_{\mathrm {J_2U}}) \right] \\&=\int _0^\infty \int _0^\infty F_{g_{\mathrm {AU}}}\left( x(1+y+z) \right) f_{g_{\mathrm {J_1U}}}(y)f_{g_{\mathrm {J_2U}}}(z)dydz, \end{aligned}$$

(20)

while the PDF is derived from the CDF as

$$\begin{aligned} f_{\gamma _{\mathrm {U}}}(x)&=\frac{d}{dx}F_{\gamma _{\mathrm {U}}}(x) \\&=\int _0^\infty \int _0^\infty \frac{d}{dx}F_{g_{\mathrm {AU}}}\left( x(1+y+z) \right) f_{g_{\mathrm {J_1U}}}(y)f_{g_{\mathrm {J_2U}}}(z)dydz \\&=\int _0^\infty \int _0^\infty (1+y+z)f_{g_{\mathrm {AU}}}\left( x(1+y+z) \right) f_{g_{\mathrm {J_1U}}}(y)f_{g_{\mathrm {J_2U}}}(z)dydz. \end{aligned}$$

(21)

To simplify the notation, in the following steps \(g_A\) is used for \(g_{\mathrm {AU}}\), \(\Omega _A\) for \(\Omega _{\mathrm {AU}}\), \(g_i\) for \(g_{\mathrm {J_iU}}\), \(K_i\) for \(K_{\mathrm {J_iU}}\) and \(\Omega _i\) for \(\Omega _{\mathrm {J_iU}}\). Thus, by considering [24, 8.445], the term \(I_0(\cdot )\) in (16) can be rewritten through its series representation as

$$\begin{aligned} I_0\left( \sqrt{\frac{4K_i(K_i+1)}{\Omega _i}x}\right)&= \sum _{n=0}^{\infty }\frac{1}{n!\Gamma (n+1)2^{2n}}\left( \left( \frac{4K_i(1+K_i)}{\Omega _i}x \right) ^{1/2} \right) ^{2n} \\&= \sum _{n=0}^{\infty }\frac{1}{n!^2}\left( \frac{K_i(1+K_i)}{\Omega _i}\right) ^nx^n, \end{aligned}$$

(22)

then, by defining \(\eta _i {\mathop {=}\limits ^{\Delta }} \tfrac{1+K_i}{\Omega _i}\), (16) can be rewritten as

$$\begin{aligned} f_{g_i}(x) = e^{-K_i}\sum _{n=0}^{\infty } \frac{K_i^n}{n!^2} \eta _i^{n+1}e^{-\eta _ix}x^n. \end{aligned}$$

(23)

Moreover, evaluating the CDF in (19) at \(x(1+y+z)\), with \({\widehat{x}}=x/\Omega _A\), gives

$$\begin{aligned} F_{g_A}(x(1+y+z)) = 1 - e^{-{\widehat{x}}} e^{-{\widehat{x}}y}e^{-{\widehat{x}}z}. \end{aligned}$$

(24)

Then, by plugging (24) and (23) into (20) we obtain

$$\begin{aligned} F_{\gamma _{\mathrm {U}}}&= e^{-K_1-K_2}({\mathcal {I}}_1 - {\mathcal {I}}_2), \end{aligned}$$

(25)

where \({\mathcal {I}}_1\) and \({\mathcal {I}}_2\) are given by

$$\begin{aligned} {\mathcal {I}}_1 = \int _{0}^{\infty }\int _{0}^{\infty } \left( \sum _{n=0}^{\infty } \frac{K_1^n}{n!^2} \eta _1^{n+1}e^{-\eta _1y}y^n\right) \left( \sum _{m=0}^{\infty } \frac{K_2^m}{m!^2} \eta _2^{m+1}e^{-\eta _2z}z^m\right) dydz, \end{aligned}$$

(26)

and

$$\begin{aligned} {\mathcal {I}}_2 = e^{-{\widehat{x}}} \int _{0}^{\infty }\int _{0}^{\infty } \Bigg (\sum _{n=0}^{\infty } \frac{K_1^n}{n!^2} \eta _1^{n+1}&e^{-(\eta _1+{\widehat{x}})y}y^n \Bigg )\cdot \\&\left( \sum _{m=0}^{\infty } \frac{K_2^m}{m!^2} \eta _2^{m+1}e^{-(\eta _2+{\widehat{x}})z}z^m\right) dydz. \end{aligned}$$

(27)

By considering [24, 3.326.2], each individual integral in \({\mathcal {I}}_1\) can be solved as

$$\begin{aligned} \int _{0}^{\infty } \left( \sum _{n=0}^{\infty } \frac{K_i^n}{n!^2} \eta _i^{n+1}e^{-\eta _iy}y^n\right) dy&= \sum _{n=0}^{\infty } \frac{K_i^n}{n!^2} \eta _i^{n+1}\int _{0}^{\infty }e^{-\eta _iy}y^ndy \\&= \sum _{n=0}^{\infty } \frac{K_i^n}{n!^2} \eta _i^{n+1} \frac{n!}{\eta _i^{n+1}} \\&= \sum _{n=0}^{\infty } \frac{K_i^n}{n!} \\&= e^{K_i}, \end{aligned}$$

(28)

and the same reasoning is applied for each individual integral in \({\mathcal {I}}_2\), which can be solved as

$$\begin{aligned} \int _{0}^{\infty } \left( \sum _{n=0}^{\infty } \frac{K_i^n}{n!^2} \eta _i^{n+1}e^{-(\eta _i+{\widehat{x}})y}y^n\right) dy&= \sum _{n=0}^{\infty } \frac{K_i^n}{n!^2} \eta _i^{n+1}\int _{0}^{\infty }e^{-(\eta _i+{\widehat{x}})y}y^ndy \\&=\sum _{n=0}^{\infty } \frac{K_i^n}{n!^2} \eta _i^{n+1} \frac{n!}{(\eta _i+{\widehat{x}})^{n+1}} \\&= \left( \frac{\eta _i}{\eta _i+{\widehat{x}}}\right) \sum _{n=0}^{\infty }\left( \frac{\eta _iK_i}{\eta _i+{\widehat{x}}}\right) ^n\frac{1}{n!} \\&= \left( \frac{\eta _i}{\eta _i+{\widehat{x}}}\right) e^{\frac{\eta _iK_i}{\eta _i+{\widehat{x}}}}. \end{aligned}$$

(29)

Then, by replacing (28) in (26) and (29) in (27), \({\mathcal {I}}_1\) and \({\mathcal {I}}_2\) can be, respectively, expressed as

$$\begin{aligned} {\mathcal {I}}_1&= e^{K_1+K_2} \end{aligned}$$

(30)

$$\begin{aligned} {\mathcal {I}}_2&= e^{-{\widehat{x}}} \left( \frac{\eta _1}{\eta _1+{\widehat{x}}}\right) \left( \frac{\eta _2}{\eta _2+{\widehat{x}}}\right) e^{\left( \frac{\eta _1}{\eta _1+{\widehat{x}}}\right) K_1+\left( \frac{\eta _2}{\eta _2+{\widehat{x}}}\right) K_2}. \end{aligned}$$

(31)

Finally, (25) can be expressed as

$$\begin{aligned} F_{\gamma _{\mathrm {U}}}(x) = 1 - e^{-{\widehat{x}}}\left( \frac{\eta _1}{\eta _1+{\widehat{x}}} \right) \left( \frac{\eta _2}{\eta _2+{\widehat{x}}} \right) e^{\left( \frac{\eta _1}{\eta _1+{\widehat{x}}} -1\right) K_1 + \left( \frac{\eta _2}{\eta _2+{\widehat{x}}} -1\right) K_2}. \end{aligned}$$

(32)

To compute the PDF in (21), a process similar to that of the CDF calculation is followed, by considering that

$$\begin{aligned} \int _0^{\infty } e^{-(\eta _i+{\widehat{x}}) y} y^{n+1} \mathrm{{d}}y&= \frac{(n+1)!}{(\eta _i+{\widehat{x}})^{n+2}} \\&= \left( \frac{n+1}{\eta _i+{\widehat{x}}} \right) \frac{n!}{(\eta _i+{\widehat{x}})^{n+1}}, \end{aligned}$$

(33)

and

$$\begin{aligned} \sum _{n=0}^{\infty } \left( \frac{\eta _i K_i}{\eta _i + {\widehat{x}}} \right) ^n \frac{n+1}{n!} = \left( 1 + \frac{\eta _i K_i}{\eta _i + {\widehat{x}}} \right) e^{\frac{\eta _i K_i}{\eta _i + {\widehat{x}}}}. \end{aligned}$$

(34)

Thus, the PDF can be obtained as

$$\begin{aligned} f_{\gamma _{\mathrm {U}}}(x)&=\frac{1}{\Omega _A} e^{-{\widehat{x}}}\left( \frac{\eta _1}{\eta _1+{\widehat{x}}} \right) \left( \frac{\eta _2}{\eta _2+{\widehat{x}}} \right) e^{\left( \frac{\eta _1}{\eta _1+{\widehat{x}}} -1\right) K_1 + \left( \frac{\eta _2}{\eta _2+{\widehat{x}}} -1\right) K_2}\cdot \\&\left( 1 + \frac{1}{\eta _1 + {\widehat{x}}}\left( 1 +\frac{\eta _1 K_1}{\eta _1 + {\widehat{x}}}\right) + \frac{1}{\eta _2 + {\widehat{x}}}\left( 1 +\frac{\eta _2 K_2}{\eta _2 + {\widehat{x}}}\right) \right) . \end{aligned}$$

(35)

It is worthwhile to note that the integrals in (26) and (27) can be separated into independent terms for each UAV. Therefore, the CDF and PDF for the general case of *N* UAVs can be obtained as in (13) and (14), respectively.

Then, the SOP is calculated as

$$\begin{aligned} {\mathrm {SOP}}&= \mathrm {Pr}[C_S< R_S] \\&= \mathrm {Pr}\left[ \frac{1+\gamma _{\mathrm {B}}}{1+\gamma _{\mathrm {E}}}< 2^{R_S}\right] \\&= \mathrm {Pr}\left[ \gamma _{\mathrm {B}} < 2^{R_S}(1+\gamma _{\mathrm {E}})-1\right] \\&= \int _0^\infty F_{\gamma _\mathrm {B}}(2^{R_S}(1+x)-1)f_{\gamma _{\mathrm {E}}}(x)\mathrm{{d}}x. \end{aligned}$$

(36)

\(\square\)
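The two-UAV CDF in (32) can be cross-checked by simulation. The sketch below samples the ground channel gain as an exponential variable and each A2G gain as the squared magnitude of a complex Gaussian with a deterministic LoS component, which is one standard way of generating a Rician power gain with shape parameter \(K\) and mean \(\Omega\). All function names, the number of trials and the seed are illustrative assumptions.

```python
import math
import random

def rician_power_gain(rng, k, omega):
    """Sample |h|^2 for a Rician channel with shape parameter k and mean power omega."""
    los = math.sqrt(k * omega / (k + 1.0))
    sigma = math.sqrt(omega / (2.0 * (k + 1.0)))
    re = los + rng.gauss(0.0, sigma)
    im = rng.gauss(0.0, sigma)
    return re * re + im * im

def cdf_two_jammers(x, omega_a, k1, omega1, k2, omega2):
    """Eq. (32): CDF of the SINR with two LoS jammers."""
    xh = x / omega_a
    value = math.exp(-xh)
    for k, omega in ((k1, omega1), (k2, omega2)):
        eta = (1.0 + k) / omega
        ratio = eta / (eta + xh)
        value *= ratio * math.exp((ratio - 1.0) * k)
    return 1.0 - value

def empirical_cdf(x, omega_a, k1, omega1, k2, omega2, trials=100_000, seed=1):
    """Fraction of simulated SINR samples that fall below x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        g_a = rng.expovariate(1.0 / omega_a)
        g_1 = rician_power_gain(rng, k1, omega1)
        g_2 = rician_power_gain(rng, k2, omega2)
        if g_a / (1.0 + g_1 + g_2) < x:
            hits += 1
    return hits / trials
```

Agreement between `cdf_two_jammers` and `empirical_cdf`, up to sampling error, gives some confidence that the series manipulation in the proof was carried out consistently.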

#### 2.2.3 Weighted secrecy coverage

As mentioned before, we assume no knowledge of the position of E, other than it is located inside the circular region S within a radius \(R_A\) from A, so we analyse the secrecy performance of the proposed system in terms of the area-based metrics in [17], the jamming coverage (JC) and the jamming efficiency (JE). Both metrics indicate how the presence of the UAV jammers affects the secrecy performance inside S.

For the JC, consider that E is located at a single point within the area S, where a certain \({\overline{\Delta }}\) value can be calculated, and we are interested in the points that lead to \({\overline{\Delta }}>1\). Then, the JC is the integral over the area where \({\overline{\Delta }}>1\), expressed as

$$\begin{aligned} \text {JC} = \iint _{{\overline{\Delta }}>1} d{S_{\mathrm {E}}}, \end{aligned}$$

(37)

where the \(d{S_{\mathrm {E}}}\) term indicates an integral over the positions of E over the whole area S. To illustrate this concept, Fig. 2 shows a simplified overview of the system as a heatmap of \({\overline{\Delta }}\) over the whole area S. The JC would be the total area where \({\overline{\Delta }}>1\), which is enclosed by the yellow line surrounding the UAVs and A.

On the other hand, JE measures the average improvement in the secrecy over the whole area S:

$$\begin{aligned} \text {JE} = \frac{1}{|S|}\iint _{S}{\overline{\Delta }} \mathrm{{d}}{S_{\mathrm {E}}}, \end{aligned}$$

(38)

where |*S*| is the area of the region *S*.

Note that JC gives a measure of the area within S where an improvement on the secrecy performance of the system is obtained due to the UAV jammers, while JE gives a measure of the average improvement in the secrecy performance over the area *S*, if *E* were located at a random point.

To gain further insight into the effective jamming coverage, in [18] we proposed a hybrid metric, the WSC, which accounts for both the area over which secrecy is improved and the average secrecy improvement over the whole area *S*. The WSC is given by

$$\begin{aligned} \text {WSC }&= \left( \iint _{{\overline{\Delta }}>1} \mathrm{{d}}S_{\mathrm {E}}\right) \left( \frac{1}{|\mathrm {S}|}\iint _{\mathrm {S}}{\overline{\Delta }} \mathrm{{d}}S_{\mathrm {E}}\right) . \end{aligned}$$

(39)
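The area integrals in (37) to (39) can be approximated on a grid of candidate positions for E. The sketch below is a simple illustration and not the evaluation procedure used in this work: `delta_bar` is a hypothetical callable returning \({\overline{\Delta }}\) for E at \((x, y)\), A is assumed to be at the origin, and \(|S|\) is approximated by the total area of the grid cells whose centres fall inside the disc.

```python
import math

def area_metrics(delta_bar, radius, grid=200):
    """Approximate (JC, JE, WSC) over a disc of the given radius centred at A."""
    step = 2.0 * radius / grid
    cell = step * step
    covered = 0.0   # area where delta_bar > 1, Eq. (37)
    weighted = 0.0  # integral of delta_bar over S
    area = 0.0      # numerical estimate of |S|
    for i in range(grid):
        x = -radius + (i + 0.5) * step
        for j in range(grid):
            y = -radius + (j + 0.5) * step
            if x * x + y * y > radius * radius:
                continue  # cell centre outside S
            d = delta_bar(x, y)
            area += cell
            weighted += d * cell
            if d > 1.0:
                covered += cell
    jc = covered
    je = weighted / area          # Eq. (38)
    return jc, je, jc * je        # WSC, Eq. (39)
```

As a toy check, a `delta_bar` that equals 2 on the half-disc \(x>0\) and 0.5 elsewhere, on a unit disc, should give a JC close to \(\pi /2\), a JE of 1.25 and a WSC equal to their product.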

### 2.3 Positioning optimisation

In this section, we consider joint optimisation of the 3D positioning of the UAVs (common height, common orbit radius and angles around A) and the power allocation between the UAVs in order to maximise the WSC, given a relative position of B with respect to A, which is characterised by \(d_{\mathrm {AB}}\). Thus, the optimisation problem is formulated as

$$\begin{aligned}&\!\max _{{\Omega =\{ \{\theta _{\mathrm {J_i}}\}_{i\in \{1,\ldots ,N\}},\{\gamma _{\mathrm {J_i}}\}_{i\in \{1,\ldots ,N\}},{z_{\mathrm {J}}},R_{\mathrm {J}}\}}}&\mathrm {WSC}(\Omega ,d_{\mathrm {AB}}) \end{aligned}$$

(40a)

$$\begin{aligned}&\text {subject to}&{ 0\le \theta _{\mathrm {J_i}}\le 2\pi \quad ,\quad \forall i\in \{1,\ldots ,N\} } \end{aligned}$$

(40b)

$$\begin{aligned}&&{ \gamma _{\mathrm {J_i}} \ge 0\quad ,\quad \forall i\in \{1,\ldots ,N\} } \end{aligned}$$

(40c)

$$\begin{aligned}&&{ \sum _{i=1}^N \gamma _{\mathrm {J_i}} \le \gamma _{\mathrm {T}}, } \end{aligned}$$

(40d)

$$\begin{aligned}&&{{z_{\mathrm {MIN}}}} \le {z_{\mathrm {J}}} \le {z_{\mathrm {MAX}}}, \end{aligned}$$

(40e)

$$\begin{aligned}&&0 \le R_{\mathrm {J}} \le R_{\mathrm {MAX}}, \end{aligned}$$

(40f)

where \({z_{\mathrm {MIN}}}\) is the minimum flying height, \({z_{\mathrm {MAX}}}\) is the maximum allowed flying height for the UAVs, \(R_{\mathrm {MAX}}\) is the limit of the orbit radius around A, which is the radius of *S*, and \(\gamma _{\mathrm {T}}\) is the maximum jamming transmit SNR from all UAVs.

To simplify the optimisation problem in (40), we consider some trends regarding the angular positioning and the jamming power allocation that were observed in [18] for the case of two UAVs. In that work, it was found that locating both UAVs symmetrically behind the line between A and B leads to the optimal performance; thus, this trend is generalised to the *N* UAVs case by considering a single opening angle \(\theta _{\mathrm {J}}\) between any pair of adjacent, symmetrically located UAVs, as shown in Fig. 1. It was also proved that the WSC is maximised by an equal power allocation among the friendly jammers, which is likewise generalised to the *N* UAV case.

Under these observations, the optimisation problem in (40) can be reformulated as

$$\begin{aligned}&\!\max _{\Omega =\{\theta _{\mathrm {J}},{z_{\mathrm {J}}},R_{\mathrm {J}}\}}&\qquad&\mathrm {WSC}(\Omega ,d_{\mathrm {AB}}) \end{aligned}$$

(41a)

$$\begin{aligned}&\text {subject to}&{ 0\le \theta _{\mathrm {J}}\le \frac{2\pi }{N-1}, } \end{aligned}$$

(41b)

$$\begin{aligned}&&{ {z_{\mathrm {MIN}}} }\le {z_{\mathrm {J}}} \le {z_{\mathrm {MAX}}}, \end{aligned}$$

(41c)

$$\begin{aligned}&&0 \le R_{\mathrm {J}} \le R_{\mathrm {MAX}}, \end{aligned}$$

(41d)

where only three optimisation variables are considered, namely the opening angle \(\theta _{\mathrm {J}}\), the UAV common height \(z_{\mathrm {J}}\) and the UAV surveillance orbit radius \(R_{\mathrm {J}}\).

#### 2.3.1 Reinforcement learning-based positioning

Given that the estimate of the distance from A to B is unreliable, the optimisation problem in (41) cannot be solved directly. To account for the stochastic nature of the distance estimate \({\widehat{d}}_{\mathrm {AB}}\), we consider a coordinate-descent-based [25] iterative scheme that employs an RL approach to ascertain the optimum positioning of the UAVs around A. In particular, we model this problem as a multi-armed bandit (MAB) problem, by treating the discretised values of the positioning variables as the arms (actions) and the WSC value obtained at each step as the reward. In the following, we briefly introduce the basis of the MAB problem and some relevant RL concepts needed to explain our approach.

*Multi-Armed Bandit Problem* [26] An MAB problem consists of an agent (bandit) which has to choose at each time step among a set of actions (arms) to obtain rewards. At each step, the chosen action provides a reward, which is a random variable with a given distribution per action. The goal of the agent is to maximise the reward obtained over time. This is done by keeping estimates of the expected reward of each action and choosing the action with the highest estimate, which is called *exploitation*. However, it is also of interest to keep trying other actions in order to refine their estimates, which is called *exploration*. The action chosen at each step is determined by a policy, which sets the balance between exploration and exploitation. An illustrative example of this learning process is shown in Fig. 3.

Considering the optimisation problem in (41), we have three positioning variables, the opening angle of adjacent UAVs behind A (\(\theta _{\mathrm {J}}\)), the common height of the UAVs (\(z_{\mathrm {J}}\)) and the orbit radius of the UAVs around A (\(R_{\mathrm {J}}\)). Each variable is separated into its own RL process, independent of the other two. For each positioning variable, we define its possible actions as a range of values the variable can take, which are given by the constraints in (41), and a discretised number of actions per variable (\(N_{\theta }\), \({N_{z}}\), \(N_{R}\)). Each action of a variable has a reward distribution, which corresponds to the distribution of WSC values obtained by performing that action. The goal is to be able to estimate with high accuracy which of the actions has the greatest expected reward. At each step, one of the actions is chosen following a policy and the received WSC reward is processed to contribute for the estimation of the expected reward (WSC) for said action.

To simplify the computations, we perform three separate RL processes, one for each positioning variable with its own action range discretisation. The RL loops for the three variables are repeated back to back, alternating between the variables.

For each RL step of a given positioning variable, an assumption needs to be made regarding the other two positioning variables. The natural choice is to select them in a *greedy* fashion, i.e. to use the values of the other two positioning variables that are estimated thus far to lead to the highest reward. This implies that, for any of the positioning variables, the RL process is non-stationary, since the values of the other positioning variables, which are considered part of the environment, change during the process. To account for this non-stationarity, consider the following generic estimate update rule [26]:

$$\begin{aligned} Q_{n+1} = Q_n + \alpha _n \left[ R_n - Q_n \right] , \end{aligned}$$

(42)

where \(Q_n\) is a generic action reward estimate at time *n*, \(R_n\) is the observed reward at time *n* and \(\alpha _n\) is the so-called step size at time *n*, which controls the contribution of the observed data to the estimate at time *n*. If all observed rewards are to contribute evenly to the estimate, the step size is set to \(\alpha _n = 1/n\), which yields the sample average. However, in a non-stationary environment, it is preferable to give a higher weight to new observations than to past ones, so that the RL process is more sensitive to environmental changes. To accomplish this, we set \(\alpha _n=\alpha\) for all *n*, where \(\alpha\) is a constant such that \(0<\alpha <1\) [26].
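The effect of the step size in (42) can be seen in a short experiment. In the sketch below, the function name and the reward sequence are made up for illustration: the reward jumps from 1 to 3 halfway through, mimicking a change in the environment.

```python
def update_estimate(q, reward, step_size):
    """Eq. (42): move the estimate towards the observed reward."""
    return q + step_size * (reward - q)

# Hypothetical non-stationary reward sequence: 50 rewards of 1, then 50 rewards of 3.
rewards = [1.0] * 50 + [3.0] * 50

q_average = 0.0   # step size 1/n (sample average)
q_constant = 0.0  # constant step size alpha = 0.2
for n, reward in enumerate(rewards, start=1):
    q_average = update_estimate(q_average, reward, 1.0 / n)
    q_constant = update_estimate(q_constant, reward, 0.2)
```

After the loop, `q_average` equals the overall mean of the rewards (2.0), whereas `q_constant` has almost completely forgotten the first half and sits very close to 3, the current reward level. This is the behaviour that motivates the constant step size in a non-stationary setting.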

Regarding the policy to be used, we consider the upper confidence bound (UCB) policy [26] that is described next:

*Upper Confidence Bound* The action chosen at each step is determined both by the estimated value of the action thus far (greedy) and by how often that action has been chosen in the past. This rule is given by [26]

$$\begin{aligned} A_t = {\mathop {{{\,\mathrm{arg\,max}\,}}}\limits _{a}}\left[ Q_t(a)+c\sqrt{\frac{\ln (t)}{N_t(a)}}\right] \end{aligned}$$

(43)

where \(N_t(a)\) is the number of times the action *a* has been chosen up to time *t* and *c* is a constant parameter that controls the degree of exploration. With this policy, exploration continues over time in favour of the less frequently chosen actions. Its intensity is controlled by the constant *c*, which has to be set according to the desired degree of exploration and the expected reward values.
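A minimal sketch of the selection rule in (43) is given below. The function name is hypothetical, and selecting any action that has never been tried before applying the formula is an assumption made here to avoid dividing by \(N_t(a)=0\), which (43) leaves undefined.

```python
import math

def ucb_action(q_values, counts, t, c):
    """Return the index of the action chosen by the UCB rule in Eq. (43)."""
    # Assumption: untried actions are selected first, in index order.
    for action, count in enumerate(counts):
        if count == 0:
            return action
    scores = [q + c * math.sqrt(math.log(t) / n) for q, n in zip(q_values, counts)]
    return max(range(len(scores)), key=scores.__getitem__)
```

With `c = 0` the rule reduces to the greedy choice, while a larger `c` makes rarely chosen actions more attractive even when their current estimate is lower.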

#### 2.3.2 Positioning learning block

RL loops will be employed over the positional variables of the UAVs in order to iteratively reach the optimum values in a coordinate descent fashion [25]. This processing is performed at A that has a global understanding of the system, and it transmits the positional information to the UAVs for physical adjustment. However, the transmission frequency of positional information to the UAVs is a concern, since every time this information is received, the UAVs are compelled to adjust their position, thus entailing energy consumption. If this occurs after each RL step of each variable, the movement of the UAVs may be unnecessarily erratic (given the randomness of the estimate and the discretisation level of the variable domains), consume a high amount of energy from the UAVs over time and introduce a substantial amount of delay, given that A needs to receive an acknowledgement (ACK) from the UAVs alerting that the required new position has been assumed before starting another RL step.

Thus, we propose a time frame-based scheme that splits a given time range, which we name a positioning learning block (PLB), into individual slots, namely RL slots (RLSs) and positioning slots, as shown in Fig. 4. A PLB comprises *nRLS* consecutive RLSs and a single positioning slot at the end of it. At the beginning of an RLS, a \({\widehat{d}}_{\mathrm {AB}}\) estimate is obtained and used in the rest of the slot, where a single RL step is performed for each of the positioning variables (\(\theta _{\mathrm {J}}\), \(z_{\mathrm {J}}\), \(R_{\mathrm {J}}\)), one after another. Each RL step assumes a greedy positioning from the other variables.

For the duration of the RLSs, A performs internal processing of the RL steps, and at the positioning slot, A chooses the greedy actions from the three positioning variables and transmits this information to the UAVs. Then, the UAVs assume their new positions based on this information and send an ACK signal to A, which, upon reception, starts another PLB as shown in Fig. 4. Therefore, we define an off-policy scheme, where we employ a greedy policy at the positioning slots, and a UCB policy at the RLSs.

Given this approach, the energy consumed by each UAV at each positioning slot comprises the energy needed to receive the positioning instructions from A (\(E_\mathrm{RX}\)), the energy needed to manoeuvre to its new position (\(E_\mathrm{Mov}\)) and the energy needed to send an ACK back to A (\(E_\mathrm{ACK}\)). This energy is given by

$$\begin{aligned} E&= E_\mathrm{RX} + E_\mathrm{ACK} + E_\mathrm{Mov} \end{aligned}$$

(44)

$$\begin{aligned}&= E_\mathrm{RX} + E_\mathrm{ACK} + \Delta t_v P_\mathrm{Mov}, \end{aligned}$$

(45)

where \(P_\mathrm{Mov}\) is the power needed by the UAV to manoeuvre and \(\Delta t_v\) is the time it takes the UAV to perform this change in position. Assuming that the UAV changes its position by assuming its new angle, height and radius in that order, \(\Delta t_v\) is given by

$$\begin{aligned} \Delta t_v&= \frac{1}{v_{\mathrm {J}}}\left( |\Delta s| + |\Delta {z_{\mathrm {J}}}| + |\Delta R_{\mathrm {J}}| \right) \end{aligned}$$

(46)

$$\begin{aligned}&= \frac{1}{v_{\mathrm {J}}}\left( \frac{1}{2}R_{\mathrm {J}_0} |\Delta \theta _{\mathrm {J}}| + |\Delta {z_{\mathrm {J}}}| + |\Delta R_{\mathrm {J}}| \right) , \end{aligned}$$

(47)

where \(v_{\mathrm {J}}\) is the manoeuvring speed of the UAV (assumed constant throughout the flight), \(\Delta \theta _{\mathrm {J}}\), \(\Delta {z_{\mathrm {J}}}\) and \(\Delta R_{\mathrm {J}}\) are the angle, height and radius variations, and \(R_{\mathrm {J}_0}\) is the initial UAV radius value.
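The energy model in (44) to (47) translates into a short calculation. The function names and the example numbers below are hypothetical; units are assumed to be consistent (for example metres, metres per second, watts and joules).

```python
def repositioning_time(v_j, r_j0, d_theta, d_z, d_r):
    """Eq. (47): time needed to change angle, height and radius, in that order."""
    return (0.5 * r_j0 * abs(d_theta) + abs(d_z) + abs(d_r)) / v_j

def positioning_slot_energy(e_rx, e_ack, p_mov, v_j, r_j0, d_theta, d_z, d_r):
    """Eq. (45): energy spent by one UAV in a positioning slot."""
    return e_rx + e_ack + p_mov * repositioning_time(v_j, r_j0, d_theta, d_z, d_r)
```

For example, with a speed of 10, an initial radius of 100, an angle change of 0.2 rad, a height change of -5 and a radius change of 15, the path length is 10 + 5 + 15 = 30 and the manoeuvre takes 3 time units. With a manoeuvring power of 200 and reception and ACK energies of 1 and 0.5, the slot energy is 601.5.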

#### 2.3.3 MAB-based WSC improvement UAV positioning algorithm

The concepts defined so far have the main goal of establishing the optimal position for the *N* UAV jammers in order to maximise the WSC, while A sends out information to B over the wireless medium. In Algorithm 1, we present the process followed by the proposed algorithm, where the variables in brackets (\([\theta _{\mathrm {J}}]\), \([{z_{\mathrm {J}}}]\), \([R_{\mathrm {J}}]\)) represent the arrays of action-value estimates for each of the variables.

Algorithm 1 provides a description of the processes depicted in Fig. 4 over time. In this algorithm, one MAB step is carried out in each RLS for every positioning variable, sequentially and with the UCB action-choosing policy, over a number of PLBs. The algorithm refines its action estimates for each of the positioning variables in each RLS, adapting to the changes in the other positioning variables and allowing the UAVs to take positions that increase the WSC at the end of each PLB. Thus, the WSC of the system is expected to approach the optimum as more PLBs are completed.
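The overall procedure can be sketched in Python as follows. This is a reconstruction from the description of the PLB structure in the text, not a transcription of Algorithm 1. The function names, default parameter values, the `wsc` callable (a stand-in supplied by the caller for the actual WSC computation), the dictionary keys `"theta"`, `"z"` and `"r"`, and the initialisation of each estimate with its first observed reward are all assumptions made for illustration.

```python
import math
import random

def greedy(q, n):
    """Index of the best-estimated action among those tried so far."""
    tried = [a for a in range(len(q)) if n[a] > 0] or [0]
    return max(tried, key=q.__getitem__)

def ucb_pick(q, n, t, c):
    """UCB rule of Eq. (43); untried actions are taken first (assumption)."""
    for a, count in enumerate(n):
        if count == 0:
            return a
    return max(range(len(q)),
               key=lambda a: q[a] + c * math.sqrt(math.log(t) / n[a]))

def run_plbs(wsc, grids, d_ab, sigma_ab, n_plb=20, n_rls=10,
             alpha=0.1, c=1.0, seed=0):
    """Return the greedy position chosen at the end of each PLB.

    wsc(theta, z, r, d_hat) : caller-supplied reward function (stand-in for the WSC)
    grids                   : dict with keys "theta", "z", "r" mapping to value lists
    """
    rng = random.Random(seed)
    names = ("theta", "z", "r")
    q = {v: [0.0] * len(grids[v]) for v in names}   # action-value estimates
    n = {v: [0] * len(grids[v]) for v in names}     # action counts
    steps = {v: 0 for v in names}
    positions = []
    for _ in range(n_plb):
        for _ in range(n_rls):
            d_hat = rng.gauss(d_ab, sigma_ab)       # one distance estimate per RLS
            for v in names:                         # one RL step per variable
                steps[v] += 1
                a = ucb_pick(q[v], n[v], steps[v], c)
                # Greedy values for the other variables, UCB choice for this one.
                choice = {u: grids[u][greedy(q[u], n[u])] for u in names}
                choice[v] = grids[v][a]
                reward = wsc(choice["theta"], choice["z"], choice["r"], d_hat)
                if n[v][a] == 0:
                    q[v][a] = reward                # assumption: first reward initialises
                else:
                    q[v][a] += alpha * (reward - q[v][a])   # Eq. (42), constant step
                n[v][a] += 1
        # Positioning slot: greedy actions are sent to the UAVs.
        positions.append({u: grids[u][greedy(q[u], n[u])] for u in names})
    return positions
```

The function returns one position per PLB, which is the information A would transmit to the UAVs in each positioning slot. Whether and how quickly these positions settle depends on the reward function, the discretisation, and the choice of `alpha` and `c`.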