Primary User Emulation and Jamming Attack Detection in Cognitive Radio via Sparse Coding

Cognitive radio is an intelligent and adaptive radio that improves spectrum utilization through opportunistic spectrum sharing. However, it is inherently vulnerable to primary user emulation and jamming attacks that degrade spectrum utilization. In this paper, an algorithm for detecting primary user emulation and jamming attacks in cognitive radio is proposed. The proposed algorithm is based on the sparse coding of the compressed received signal over a channel-dependent dictionary. More specifically, the convergence patterns in sparse coding over such a dictionary are used to distinguish between a spectrum hole, a legitimate primary user, and an emulator or a jammer. Decision-making is carried out as a machine learning-based classification operation. Extensive numerical experiments show the effectiveness of the proposed algorithm in detecting the aforementioned attacks with high success rates, as validated by confusion matrices. Besides, the proposed algorithm is shown to be superior to energy detection-based machine learning techniques in terms of receiver operating characteristics curves and the areas under these curves.


Introduction
Due to the rapid growth in wireless technology and services, the scarcity of the wireless spectrum has become a major problem [1]. To meet the requirements of future wireless networks and to alleviate this spectrum scarcity problem, cognitive radio (CR) is one of the most promising solutions. CR allows spectrum sharing between the primary users (PUs) and secondary users (SUs). More specifically, it enables SUs to opportunistically utilize empty spectrum bands without harming the PUs by following these steps: i) determining whether the channel is occupied or not, ii) choosing the best part of the spectrum based on their quality of service (QoS) requirements, iii) coordinating with other users to access the spectrum, and iv) leaving the channel whenever a PU starts to transmit its data [2].
Although CR is a promising solution to address the spectrum shortage problem, it is inherently vulnerable to both traditional and new security threats [3]. This is due to the wireless nature and unique characteristics of CR. Traditional security threats include eavesdropping, spoofing, and jamming attacks [4], while new security threats include spectrum sensing data falsification (SSDF) and primary user emulation attack (PUEA) [3,5].
arXiv:2006.09231v1 [eess.SP] 16 Jun 2020
An eavesdropper tries to "hear" the secret communication between legitimate nodes, while a spoofer can modify, intercept, and replace the messages exchanged between the legitimate parties. On the other hand, a jammer can generate intentional interference signals to degrade the quality of communication for both PUs and SUs. A jammer can also prevent an SU from efficiently utilizing the white spaces of the spectrum by causing false alarms regarding spectrum occupancy [4].
In an SSDF attack, an illegitimate node provides false sensing information to degrade the performance of collaborative spectrum sensing, in which multiple CRs interact to improve sensing performance in fading environments. On the other hand, PUEA is based on emulating the characteristics of PU transmission to deceive the SUs about spectrum occupancy. A PUEA prevents the SUs from utilizing existing spectrum holes and can even cause interference to the PUs in some cases [6].
Common PUEA types include malicious and selfish attacks. A malicious attacker's objective is to degrade the CRs' performance by preventing them from opportunistically exploiting the spectrum. In particular, a malicious attacker disrupts the operation of the CR network: it can stop CRs from sensing and can also force them to vacate spectrum they are already using. On the other hand, a selfish attacker aims to monopolize spectrum by preventing other secondary users from using it. More specifically, it focuses on increasing its own spectrum usage at the expense of the overall fairness of the system.
The focus of this work is to detect false alarms regarding spectrum occupancy that are caused by illegitimate nodes. An illegitimate node can transmit a signal similar to that of a PU, in which case it is considered a primary user emulator (PUE), or it can send an unstructured signal, which is considered a jamming attack. Several solutions for illegitimate node detection have been proposed in the literature. For instance, the energy detection (ED) algorithm can decide on the source of a signal by comparing its power level with a pre-defined threshold [7]. In [8], the authors presented a Markov random field-based belief propagation framework with ED for PUEA detection. First, SUs employ the energy-based algorithm and calculate belief values about the real source of the signal. Afterwards, the belief values are shared among different users. Finally, the average belief value is compared with a pre-defined threshold. If the average is less than the threshold, the signal source is assumed to be fake; otherwise, it is assumed to be real. These approaches are simple; however, they have been shown to produce high false alarm rates. Cross-layer techniques are also effective for illegitimate node detection. In [9], the authors proposed a cross-layer approach for jamming attack and PUEA detection in CR networks that uses information from physical-layer spectrum sensing, statistical analysis of routing information, and prior knowledge about PUs. Although this technique is effective for detecting PUEA and jamming attacks, analyzing and comparing information from the physical and network layers incurs excessive overhead.
The wireless channel and the inherent physical characteristics of communication devices are also effective for illegitimate node detection [10][11][12][13]. For instance, wireless channel-based detection schemes are proposed in [10][11][12] for PUEA detection. These techniques rely on the fact that the channels between different transmitter-receiver pairs differ because of spatial decorrelation. In [13], the inherent physical-layer features of devices arising from hardware impairments are exploited for PUEA detection. Nevertheless, these techniques require excessive software and hardware overhead for their implementation.
Localization-based detection is also popular for PUEA detection. The basic idea is to infer the position of the signal's source from the received signal and compare it with a database of known locations of legitimate PUs. However, maintaining such a database is not feasible in all scenarios [14,15]. Similarly, the authors in [16] used a time difference of arrival-based position estimation approach for PUEA detection. However, this requires strict synchronization between the receiver and the transmitter.
Machine learning (ML)-based solutions have also received considerable attention for CR security. In [17], an anomaly detection framework for CR networks based on the characteristics of radio propagation is proposed. However, it does not consider specific attacks and is designed only for detecting general anomalies. In [18], the authors proposed a technique based on support vector data description (SVDD) and zoom fast Fourier transform (zoom FFT). In the first step, the pilot and symbol rate are estimated using zoom FFT. Afterwards, the SVDD classifier constructs a boundary around the PU objects, which is used to distinguish between the PU and the PUE. However, this method does not perform well in low signal-to-noise ratio (SNR) conditions. Furthermore, the method fails when the PUE is extremely intelligent (i.e., when the only information unknown to the PUE is the channel). In [19], the authors proposed an ML-based algorithm for PUEA detection that exploits signal strength and boundaries around the position of the PU. This method has low complexity, but it suffers from performance degradation.
Recently, compressive sensing (CS)-based approaches have been applied to spectrum sensing, where CS offers several benefits. For example, it can alleviate the need for high sampling rate analog-to-digital converters [20][21][22], which reduces the overall complexity, energy consumption, and memory requirements. Following its success in various application areas [20], CS has been applied to the problem of PUEA detection. Works along this line include PUEA detection based on CS and received signal strength [23]. This approach requires multiple sensors throughout the network, which increases the overall complexity. Another example exploits belief propagation and CS for PUEA detection [24]; however, it requires a centralized node for its implementation. In [25], the authors proposed an algorithm for jamming attack detection in wide-band CR. In the first step, CS is performed to estimate the wide-band spectrum. Afterwards, an ED algorithm is applied to identify the occupied spectrum sub-bands. Lastly, the waveform parameters of the sub-bands are compared with a known user database to determine whether a jamming attack is present. However, this method also requires database management.
In this paper, we propose an algorithm for PUEA and jamming attack detection in narrow-band spectrum that uses the convergence patterns of sparse coding over a channel-dependent sampled dictionary. This convergence is characterized by the decay rate of the sparse coding residual signal energy. The proposed algorithm does not require a centralized node or strict synchronization between the transmitter and the receiver. Moreover, it does not require information from multiple sensors. Furthermore, it eliminates the need for estimating the sparse coding error tolerance or the sparsity level, as is typically required in CS-based approaches, because sparse recovery in the proposed algorithm is used only to reveal the energy convergence rate rather than to reconstruct the signal accurately. The main contributions of this paper are as follows:
• First, the decay pattern of sparse coding is used for PUEA detection. This is achieved by exploiting the convergence patterns of sparse coding over a PU channel-dependent dictionary. These patterns guide the identification of a spectrum hole, a PU, and a PUE through ML approaches.
• Second, jamming attack detection is also performed based on the decay pattern of sparse coding. Here, the idea is that noise and jamming signals are not compressible because they are unstructured. Hence, residual energy decay patterns obtained with a channel-dependent dictionary, together with the non-compressible nature of jamming signals, are used for efficient jamming attack detection via ML classification.
The rest of this paper is organized as follows. Preliminary information and the system model are presented in Section 2. Section 3 presents the proposed algorithm, and the complexity analysis is presented in Section 4. Section 5 presents the simulation results and discussions. Finally, the paper is concluded in Section 6.
Notation: Upper-case bold-faced, lower-case bold-faced, and lower-case plain letters represent matrices, vectors, and scalars, respectively. The symbols $\|\cdot\|_0$ and $\|\cdot\|_2$ denote the number of nonzero elements and the $\ell_2$-norm of a vector, respectively. The symbols $\langle\cdot,\cdot\rangle$, $(\cdot)^{\dagger}$, and $\mathbb{C}$ represent the inner product, the Moore-Penrose pseudoinverse, and the complex number field, respectively.

Preliminaries and System Model
This section reviews background information related to CS, sparse recovery, and ML approaches, and then presents the system model.

Compressive Sensing and Sparse Recovery
Using a random sensing matrix, CS merges data measurement and compression into a unified operation. CS applies to compressible signals, i.e., either the explicitly sparse signals, or the ones admitting sparsity in a certain domain [26].
Let us assume a signal vector $\mathbf{y} \in \mathbb{C}^N$. A compressed version of $\mathbf{y}$ can be obtained by applying a measurement matrix $\mathbf{\Phi} \in \mathbb{C}^{M \times N}$ as $\mathbf{y}_c = \mathbf{\Phi}\mathbf{y}$, where $M \ll N$. Hence, the dimensionality is reduced from $N$ to $M$. A high-dimensional version of the original signal can be reconstructed from this low-dimensional measurement via sparse recovery [26].
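As a minimal sketch of the measurement step $\mathbf{y}_c = \mathbf{\Phi}\mathbf{y}$, the snippet below draws an i.i.d. complex Gaussian $\mathbf{\Phi}$ (a common choice of random sensing matrix; the distribution used in the paper is not stated here) and compresses a toy tone. The helper name `compress` and the sizes $N = 256$, $M = 64$ are illustrative assumptions, not the values in Table 1.

```python
import numpy as np

def compress(y, M, seed=0):
    """Return (y_c, Phi) with y_c = Phi @ y for a random complex Gaussian Phi of shape (M, N)."""
    rng = np.random.default_rng(seed)
    N = y.shape[0]
    # i.i.d. CN(0, 1/M) entries: a common random sensing matrix (assumed, not from the paper)
    Phi = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2 * M)
    return Phi @ y, Phi

N, M = 256, 64                                   # illustrative sizes with M << N
y = np.exp(2j * np.pi * 0.05 * np.arange(N))     # toy narrow-band tone
y_c, Phi = compress(y, M)
print(y_c.shape)  # (64,)
```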
More generally, let us assume that a signal $\mathbf{y}$ admits sparse coding over a dictionary $\mathbf{D} \in \mathbb{C}^{N \times K}$. The signal can then be represented in terms of $\mathbf{D}$ as $\mathbf{y} \approx \mathbf{D}\mathbf{w}$, where $\mathbf{w} \in \mathbb{C}^K$ is a sparse coefficient vector. The calculation of $\mathbf{w}$ can be cast as follows:
$$\min_{\mathbf{w}} \|\mathbf{y} - \mathbf{D}\mathbf{w}\|_2^2 \quad \text{subject to} \quad \|\mathbf{w}\|_0 \le S, \tag{1}$$
where $S$ denotes the sparsity level of the signal. Sparse recovery of this form is NP-hard; however, sparse recovery methods offer efficient approximate solutions. As shown in (1), the $\ell_0$ pseudo-norm is the principled measure of the sparsity level, but its minimization is combinatorial and computationally intractable. Therefore, only approximate solutions to $\ell_0$ minimization are used in practice, such as the matching pursuit and orthogonal matching pursuit (OMP) approaches. Alternatively, the $\ell_0$ minimization can be relaxed to $\ell_1$ norm minimization, which is a convex surrogate for sparsity. Unlike $\ell_0$ minimization, $\ell_1$ minimization is convex and can be solved by linear programming, which significantly reduces the computational complexity of sparse coding. However, $\ell_1$ minimization requires information about the noise level of the signal being recovered. Thus, in this work, we adopt approximate $\ell_0$ minimization through the OMP algorithm.
The intrinsic sparsity of a signal can be revealed by a suitable dictionary. This dictionary can be formed of fixed basis functions such as the Fourier basis, Gabor functions, wavelets, and contourlets. Alternatively, it can be learned from data. In this setting, a dictionary is obtained by training over a set of training signals $\mathbf{Y} \in \mathbb{C}^{N \times L}$ [28]. This dictionary learning process can be formulated as
$$\min_{\mathbf{D},\mathbf{W}} \sum_{l=1}^{L} \|\mathbf{w}_l\|_0 \quad \text{subject to} \quad \|\mathbf{y}_l - \mathbf{D}\mathbf{w}_l\|_2 \le \epsilon, \quad l = 1, \dots, L, \tag{2}$$
where $\mathbf{y}_l$ and $\mathbf{w}_l$ are the $l$-th columns of $\mathbf{Y}$ and $\mathbf{W}$, and $\epsilon$ represents the error tolerance. Since this problem is intractable and nonconvex, most dictionary learning algorithms alternate iteratively between a sparse representation stage and a dictionary update stage. For example, the K-SVD algorithm [28] is one of the most widely used dictionary learning algorithms.
The above-mentioned dictionary learning process is computationally demanding. Therefore, efficient alternatives to classical dictionary learning are needed for CR-related applications [21]. In this context, sampled dictionaries are an efficient alternative. A sampled dictionary is obtained by picking a set of randomly selected data vectors that serve for sparse coding, without applying an expensive learning process. This reduces the computational complexity at the cost of a tolerable loss in the representational power of the dictionary. In [22], the use of sampled dictionaries is justified for representing data points that belong to a specific class and therefore share a general similarity. Similarly, sampled dictionaries are used in this work to represent signals.
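A sampled dictionary can be sketched in a few lines: pick $K$ columns at random from a pool of data vectors and normalize them, with no learning stage at all. The function name `sampled_dictionary`, the random pool, and the unit-norm normalization (a usual convention for pursuit algorithms) are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np

def sampled_dictionary(data, K, seed=0):
    """Build a dictionary from K randomly chosen columns of `data` (shape (N, L), K <= L).

    No training is performed; columns are only scaled to unit l2 norm.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(data.shape[1], size=K, replace=False)   # distinct random columns
    D = data[:, idx]
    return D / np.linalg.norm(D, axis=0, keepdims=True)

# Hypothetical pool of 1000 complex data vectors of length 64
rng = np.random.default_rng(1)
pool = rng.standard_normal((64, 1000)) + 1j * rng.standard_normal((64, 1000))
D = sampled_dictionary(pool, K=128)
print(D.shape)  # (64, 128)
```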

Residual Components in Pursuit Sparse Coding
A widely used sparse representation algorithm is OMP. This algorithm iteratively obtains the coefficients of a sparse coefficient vector ($\mathbf{w}$). In particular, each iteration identifies the location and adjusts the value of a nonzero element in $\mathbf{w}$. This is achieved by selecting one atom (column) from the dictionary $\mathbf{D}$ and adjusting its respective weight. Note that the proposed algorithm is not limited to OMP; it can be implemented with any sparse recovery algorithm [27]. We use OMP because it is computationally efficient and simple.
To implement the above-explained atom selection and coefficient update processes, algorithms such as OMP define a so-called residual signal $\mathbf{r}$. Conceptually, $\mathbf{r}$ represents the signal portions that have not yet been represented by the selected dictionary atoms. Hence, sparse coding initializes $\mathbf{r}$ with the signal itself, as $\mathbf{r} \leftarrow \mathbf{y}$. In the first iteration, the sparse representation algorithm loops through all dictionary atoms and selects the one most similar to the current residual $\mathbf{r}$. Once this atom is selected, its weight is calculated (OMP recomputes the weights of all selected atoms by least squares). The next residual is then calculated by subtracting the resulting sparse approximation from the signal. This residual is treated as a new signal for which another dictionary atom is selected and another coefficient is calculated, and the process continues until a halting condition is met.
The key point in the above-explained sparse coding approach is that the energy of the residual decreases as sparse coding progresses. Intuitively, this is because, as more atoms are selected, more signal portions are removed from the residual. For a signal that the dictionary represents well, this decrease is rapid.
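The residual loop described above can be sketched with a plain NumPy OMP that records $\|\mathbf{r}\|_2$ after every iteration. This is a minimal illustration, not the authors' code: the function name `omp_residual_profile` and the fixed iteration count are assumptions. The recorded norms are non-increasing because each least-squares fit is taken over a support that only grows.

```python
import numpy as np

def omp_residual_profile(y, D, n_iter):
    """Orthogonal matching pursuit that records ||r||_2 after each iteration (n_iter >= 1).

    y: (M,) signal; D: (M, K) dictionary with unit-norm columns.
    Returns (w, norms), where norms[t] is the residual norm after iteration t + 1.
    """
    y = np.asarray(y, dtype=complex)
    r = y.copy()                                   # residual initialized with the signal
    support, norms = [], []
    for _ in range(n_iter):
        k = int(np.argmax(np.abs(D.conj().T @ r)))  # atom most correlated with the residual
        if k not in support:
            support.append(k)
        Ds = D[:, support]
        coef, *_ = np.linalg.lstsq(Ds, y, rcond=None)  # re-fit all selected weights
        r = y - Ds @ coef                          # signal portion not yet represented
        norms.append(np.linalg.norm(r))
    w = np.zeros(D.shape[1], dtype=complex)
    w[support] = coef
    return w, np.array(norms)

rng = np.random.default_rng(0)
D = rng.standard_normal((32, 64)) + 1j * rng.standard_normal((32, 64))
D /= np.linalg.norm(D, axis=0, keepdims=True)
y = D[:, [3, 17]] @ np.array([1.0, 0.5])           # toy 2-sparse signal
w, norms = omp_residual_profile(y, D, n_iter=8)
```

For a signal that is exactly sparse in `D`, the profile typically collapses toward zero after a few iterations, whereas a noise-like input decays slowly; this contrast is what the proposed features exploit.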

Machine Learning for Classification
The success of ML algorithms in many application areas, such as computer vision, fingerprint identification, image processing, and speech recognition, has made them appealing for wireless communication [29]. ML algorithms fall into three categories: supervised, unsupervised, and reinforcement learning. Supervised learning algorithms are widely used for classification problems when the number of classes is known and the class labels of the training samples are available.
Among the many supervised learning algorithms, the feed-forward neural network has received growing interest in classification problems since it can recognize classes accurately and quickly [30]. This network can have a single layer or multiple layers. Although single-layer networks are computationally efficient, they can only be used for simple problems. Alternatively, multi-layer networks, which include one or more hidden layers, can be used. Even though these networks increase the computational complexity, they are able to solve more complex problems. Besides the number of layers, the number of neurons in the hidden layers also affects accuracy and complexity. Therefore, it is important to set these hyperparameters carefully. Moreover, both complexity and accuracy can be improved by feature extraction based on domain knowledge. Along this line, CS is used to extract features in this work with the aim of improving the performance of the ML classifier.

System Model
The system model is intended to characterize the existence of legitimate and illegitimate source nodes. Thus, it consists of a PU node, an SU node, and an illegitimate node, as presented in Fig. 1. In this setting, an SU node opportunistically exploits the spectrum in the presence of an illegitimate node that can launch either a PUEA or a jamming attack. A jammer transmits a random signal, whereas a PU transmits a structured signal and a PUE transmits a structured signal that mimics that of the legitimate PU.
We can represent the transmitted signal as $\mathbf{x} = \mathbf{A}\mathbf{s}$, where $\mathbf{A}$ is an $N \times N$ coefficient matrix with entries $a_{i,j}$, $i, j = 1, \dots, N$, and $\mathbf{s} = [s_1(t), \dots, s_N(t)]^T$ represents the transmitted data vector. Each coordinate of $\mathbf{s}$ is given as $s_o(t) = \sum_{k=-\infty}^{\infty} d_k\, u(t - kT_s)\, e^{j2\pi f_{c,o} t}$, where $T_s$ is the symbol duration, $f_{c,o}$ represents the center frequency, $d_k$ represents the digitally modulated data symbols, $u(t)$ represents the pulse shaping filter, and $o = 1, 2, \dots, N$.
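To make the signal model concrete, the sketch below samples components $s_o(t)$ with a rectangular pulse $u(t)$ and QPSK symbols $d_k$, then forms $\mathbf{x} = \mathbf{A}\mathbf{s}$ for $N = 2$ components. The pulse shape, the modulation, the sampling rate, the center frequencies, the random $\mathbf{A}$, and the helper name `modulated_component` are all assumptions chosen for illustration.

```python
import numpy as np

def modulated_component(symbols, sps, fc, fs):
    """Sampled s_o(t) = sum_k d_k u(t - k T_s) exp(j 2 pi f_c,o t).

    symbols: complex data symbols d_k; sps: samples per symbol (T_s * fs);
    fc: center frequency f_c,o in Hz; fs: sampling rate in Hz.
    u(t) is assumed to be a rectangular pulse of duration T_s.
    """
    baseband = np.repeat(symbols, sps)          # each d_k held for one symbol period
    t = np.arange(baseband.size) / fs
    return baseband * np.exp(2j * np.pi * fc * t)

rng = np.random.default_rng(0)
# Unit-magnitude QPSK symbols (assumed modulation)
d = (rng.choice([-1.0, 1.0], 16) + 1j * rng.choice([-1.0, 1.0], 16)) / np.sqrt(2)
fs = 8e3
s = np.vstack([modulated_component(d, 8, fc, fs) for fc in (1e3, 2e3)])  # rows are s_o(t)
A = rng.standard_normal((2, 2))                 # hypothetical N x N coefficient matrix
x = A @ s                                       # x = A s
```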
The signal received from any transmitting node can be written as
$$\mathbf{y} = \mathbf{h} * \mathbf{x} + \mathbf{n}, \tag{3}$$
where $*$ denotes convolution, $\mathbf{h}$ is the multipath Rayleigh fading channel between the transmitter-receiver pair, and $\mathbf{n}$ is additive white Gaussian noise. Owing to spatial decorrelation, the channels between different transmitter-receiver pairs are assumed to be different [31].
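A minimal simulation of (3) is sketched below, assuming an $L$-tap channel with i.i.d. complex Gaussian taps (so each tap magnitude is Rayleigh distributed) and AWGN scaled to a target SNR. Drawing a fresh, independent channel for the illegitimate node mirrors the spatial-decorrelation assumption. The tap count, the equal power profile, and the helper names `rayleigh_taps` and `receive` are illustrative.

```python
import numpy as np

def rayleigh_taps(L, rng):
    """L-tap channel with i.i.d. CN(0, 1/L) taps, so each tap magnitude is Rayleigh."""
    return (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2 * L)

def receive(x, h, snr_db, rng):
    """y = h * x + n with complex AWGN scaled to the requested per-sample SNR (dB)."""
    faded = np.convolve(x, h)[: x.size]           # multipath: linear convolution, truncated
    p_noise = np.mean(np.abs(faded) ** 2) / 10 ** (snr_db / 10)
    n = np.sqrt(p_noise / 2) * (rng.standard_normal(x.size) + 1j * rng.standard_normal(x.size))
    return faded + n

rng = np.random.default_rng(0)
x = np.exp(2j * np.pi * 0.1 * np.arange(256))     # toy structured transmit signal
h_pu = rayleigh_taps(4, rng)                      # channel from the legitimate PU
h_i = rayleigh_taps(4, rng)                       # independent draw: illegitimate node's channel
y_pu = receive(x, h_pu, snr_db=10, rng=rng)       # PU observation
y_pue = receive(x, h_i, snr_db=10, rng=rng)       # same waveform through a different channel
```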

The Proposed Algorithm for PUEA and Jamming Attack Detection
The objective of this work is to differentiate between the following hypotheses:
$$
\begin{aligned}
H_0 &: \mathbf{y} = \mathbf{n},\\
H_1 &: \mathbf{y} = \mathbf{h}_{PU} * \mathbf{x}_s + \mathbf{n},\\
H_2 &: \mathbf{y} = \mathbf{h}_i * \mathbf{x}_s + \mathbf{n},\\
H_3 &: \mathbf{y} = \mathbf{h}_i * \mathbf{x}_n + \mathbf{n},
\end{aligned}
\tag{4}
$$
where $\mathbf{n}$ is additive white Gaussian noise and $\mathbf{y}$ is the received signal. Besides, $\mathbf{h}_{PU}$ denotes the channel corresponding to the legitimate PU, $\mathbf{h}_i$ is the channel corresponding to the PUE or jammer, $\mathbf{x}_n$ represents the (unstructured) jamming signal, and $\mathbf{x}_s$ is a structured signal. In this work, two goals are set. The first is to detect PUEA, i.e., to differentiate between the $H_0$, $H_1$, and $H_2$ hypotheses. The second is to detect jamming attacks, i.e., to differentiate between $H_0$, $H_1$, and $H_3$. To meet these goals, a compressed version of the received signal is obtained via CS, and its sparse coding is calculated with respect to a PU channel-dependent dictionary $\mathbf{D}_{PU}$. As detailed in Section 2, sparse coding iteratively minimizes the energy of a residual ($\|\mathbf{r}\|_2$). For each iteration, we calculate the value of $\|\mathbf{r}\|_2$. Then, we quantify the rate of its decay using its absolute gradient ($|G|$). Note that the speed of this decay depends on how well the received signal matches the dictionary.
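The decay-rate feature can be sketched as the absolute gradient of the residual-norm profile. Here `np.gradient` (central differences in the interior, one-sided differences at the ends) stands in for the gradient operator; whether the paper uses exactly this discretization is an assumption, and the two profiles below are toy numbers, not measured data.

```python
import numpy as np

def decay_features(residual_norms):
    """Return the residual-norm profile ||r||_2 and its absolute gradient |G|."""
    r = np.asarray(residual_norms, dtype=float)
    return r, np.abs(np.gradient(r))   # |G| has the same length as r

# Toy profiles (illustrative only)
_, g_fast = decay_features([1.0, 0.4, 0.16, 0.064])   # signal well matched to the dictionary
_, g_slow = decay_features([1.0, 0.9, 0.81, 0.729])   # noise-like signal
```

The faster-decaying profile yields larger $|G|$ values at every iteration, which is the kind of separation a classifier can learn from.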
The convergence profile of this residual, or of its gradient, versus iteration can be used to distinguish between the aforementioned hypotheses. The idea behind this approach is that unstructured signals (noise and jamming) are not compressible, while structured signals are. Hence, different signals have different $\|\mathbf{r}\|_2$ and $|G|$ profiles that help to distinguish between the hypotheses. By the same logic, different signals exhibit different patterns depending on the similarity between the dictionary atoms and the signals. In other words, the residual energy patterns show how well the dictionary atoms can represent a signal accurately and sparsely, which also helps to distinguish between the hypotheses. Intuitively speaking, a signal that is compressible in the given dictionary decays faster than other signals. Thus, if the dictionary is channel-dependent, the channel also shapes the $\|\mathbf{r}\|_2$ and $|G|$ patterns, which can additionally be used to differentiate between the hypotheses.
To this end, we analyze the usefulness of $\|\mathbf{r}\|_2$ and $|G|$ in distinguishing between the hypotheses in (4) with the following test. We use a test set of $10^3$ quadruplets of synthetically generated received signals ($\mathbf{y}$) corresponding to the hypotheses $H_0$, $H_1$, $H_2$, and $H_3$, respectively. In other words, the first signal is mere noise, the second is the signal received from the legitimate PU, the third is a PUE signal that mimics the PU signal, and the fourth is an unstructured jamming signal. These signals are generated as described in Section 5.
For each quadruplet, we calculate a PU channel-dependent dictionary ($\mathbf{D}_{PU}$) based on the known PU channel ($\mathbf{h}_{PU}$). In this work, the channel-dependent dictionary is obtained by convolving a set of randomly selected data vectors ($\mathbf{X}$) with the channel corresponding to the legitimate PU. Formally, $\mathbf{D}_{PU} = \mathbf{h}_{PU} * \mathbf{X}$, where $*$ denotes convolution. Afterwards, we perform iterative sparse coding on a compressed version of each signal in the quadruplet over $\mathbf{D}_{PU}$ while recording $\|\mathbf{r}\|_2$ at each iteration. Next, we calculate the absolute gradient $|G|$ of each resulting residual-norm vector.
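The construction $\mathbf{D}_{PU} = \mathbf{h}_{PU} * \mathbf{X}$ can be sketched column by column: convolve each randomly selected data vector with the PU channel and keep the first $N$ samples. The truncation, the unit-norm normalization, and the optional mapping through a sensing matrix $\mathbf{\Phi}$ (so that atoms live in the same space as the compressed signals) are assumptions, as is the function name `channel_dependent_dictionary`.

```python
import numpy as np

def channel_dependent_dictionary(X, h, Phi=None):
    """D_PU = h * X applied column-wise; optionally mapped to Phi @ D_PU.

    X: (N, K) randomly selected data vectors; h: channel impulse response.
    """
    N, K = X.shape
    D = np.empty((N, K), dtype=complex)
    for k in range(K):
        D[:, k] = np.convolve(X[:, k], h)[:N]    # linear convolution, truncated to N samples
    if Phi is not None:
        D = Phi @ D                              # hypothetical: atoms in the compressed domain
    return D / np.linalg.norm(D, axis=0, keepdims=True)

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 128)) + 1j * rng.standard_normal((64, 128))  # sampled data
h_pu = (rng.standard_normal(4) + 1j * rng.standard_normal(4)) / np.sqrt(8)
D_pu = channel_dependent_dictionary(X, h_pu)
```

Because every atom carries the PU channel's signature, a signal that passed through $\mathbf{h}_{PU}$ is expected to be represented with fewer atoms than the same waveform through $\mathbf{h}_i$.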
The average values of $\|\mathbf{r}\|_2$ and $|G|$ in the above test are presented in Fig. 2. This figure shows that one can differentiate between the four hypotheses based on $|G|$ and $\|\mathbf{r}\|_2$ using ML approaches. For example, the gradient under $H_1$ (Fig. 2 (f)) decays faster than those under $H_0$, $H_2$, and $H_3$ (Fig. 2 (e), Fig. 2 (g), and Fig. 2 (h), respectively). The reason for this faster decay is that the received signal under $H_1$ (corresponding to the PU) is the only one that is compressible in the given dictionary.
Based on the above discussion, we present the proposed algorithm, which is divided into two main stages. The first is a classifier training stage, which uses a comprehensive set of training signals. We can either concatenate $\|\mathbf{r}\|_2$ and its absolute gradient $|G|$ into a unified feature vector or use them separately as classification features. These features are used to form the training data sets $\mathbf{f}_0^i$, $\mathbf{f}_1^i$, $\mathbf{f}_2^i$, and $\mathbf{f}_3^i$, corresponding to the hypotheses $H_0$, $H_1$, $H_2$, and $H_3$, respectively. In the case of PUEA detection, the training set contains $\mathbf{f}_0^i$, $\mathbf{f}_1^i$, and $\mathbf{f}_2^i$, whereas in the case of jammer detection, the training set contains $\mathbf{f}_0^i$, $\mathbf{f}_1^i$, and $\mathbf{f}_3^i$, corresponding to the hypotheses $H_0$, $H_1$, and $H_3$, respectively. Afterwards, these training vectors, along with their class labels, are fed to the ML training stage, where a classifier model is trained accordingly. The workflow of the training set preparation stage is depicted in Fig. 3-(a). In this figure, $\mathbf{Y}_n^i$ represents the set of compressed received signals $\mathbf{y}_0^i$, $\mathbf{y}_1^i$, $\mathbf{y}_2^i$ for PUEA detection or $\mathbf{y}_0^i$, $\mathbf{y}_1^i$, $\mathbf{y}_3^i$ for jamming attack detection. Similarly, $\mathbf{F}_n^i$ represents the set of training vectors. After classifier training, the testing stage represents the run-time operation of the proposed algorithm, as explained in Fig. 3-(b). For each incoming test signal $\mathbf{y}$, sparse coding is performed over $\mathbf{D}_{PU}$ and a feature vector $\mathbf{f}$ is obtained. Afterwards, $\mathbf{f}$ is fed into the learned classifier, which decides on the hypothesis corresponding to the current signal of interest. An analysis of this idea is provided in the Appendix.
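The training-set preparation can be sketched as follows, assuming residual-norm profiles have already been obtained by sparse coding over $\mathbf{D}_{PU}$. Each profile becomes the concatenated feature $[\|\mathbf{r}\|_2, |G|]$ and is paired with an integer class label (e.g., 0, 1, 2 for $H_0$, $H_1$, $H_2$ in PUEA detection). The function names, the label encoding, and the toy profiles are illustrative assumptions.

```python
import numpy as np

def feature_vector(residual_norms):
    """Concatenate ||r||_2 per iteration with its absolute gradient |G| (length 2M)."""
    r = np.asarray(residual_norms, dtype=float)
    return np.concatenate([r, np.abs(np.gradient(r))])

def build_training_set(profiles_by_class):
    """Stack feature vectors into F and collect integer labels.

    profiles_by_class: dict mapping class label -> list of residual-norm profiles.
    """
    F, labels = [], []
    for label, profiles in profiles_by_class.items():
        for p in profiles:
            F.append(feature_vector(p))
            labels.append(label)
    return np.vstack(F), np.array(labels)

profiles = {
    0: [[1.0, 0.98, 0.97], [1.0, 0.97, 0.95]],   # H0: noise, slow decay (toy numbers)
    1: [[1.0, 0.30, 0.10]],                      # H1: legitimate PU, fast decay
    2: [[1.0, 0.70, 0.55]],                      # H2: PUE through a different channel
}
F, labels = build_training_set(profiles)
```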

Complexity Analysis
In this section, we roughly quantify the computational complexity of the proposed algorithm, which is dominated by the sparse coding and ML stages.
The computational complexity of OMP at the $k$-th iteration is $O(MK + KS + KS^2 + S^3)$, while its overall complexity is $O(MKS + KS^2 + KS^3 + S^4)$, where $S$ represents the sparsity level [32]. Thus, the overall computational complexity of sparse coding with a sparsity level of $M$ is $O(KM^2 + KM^2 + KM^3 + M^4)$, which simplifies to $O(2KM^2 + KM^3 + M^4)$. Note that sparse coding is used during both the training and the testing phases of the proposed algorithm.
The computational complexity of ML is divided into two main stages: training and testing. The training complexity of a two-layer neural network per sample is $O(e(lk + ml))$, where $e$ denotes the number of epochs, and $k$, $l$, and $m$ represent the numbers of neurons in the input, hidden, and output layers, respectively. The total complexity of the training stage is $O(ep(lk + ml))$ for $p$ training samples. Moreover, the computational complexity of training per sample is roughly double that of testing per sample [33]. Note that $k = 2M$, since $\|\mathbf{r}\|_2$ and $|G|$ are concatenated into a unified feature vector in the simulations.
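Both cost expressions can be evaluated numerically as a back-of-the-envelope check. The sketch confirms that substituting $S = M$ into the OMP cost gives $2KM^2 + KM^3 + M^4$, and evaluates the training cost with $k = 2M$, 64 hidden and 3 output neurons, and 4000 training samples per class. Taking $K = 400$ follows the $100 \times 400$ dictionary, while the epoch count $e = 100$ is a hypothetical value. The results are operation-count orders, not measured runtimes.

```python
def omp_cost(M, K, S):
    """Order-of-magnitude OMP cost MKS + KS^2 + KS^3 + S^4 for S iterations."""
    return M * K * S + K * S**2 + K * S**3 + S**4

def nn_training_cost(e, p, k, l, m):
    """Order-of-magnitude training cost e * p * (l*k + m*l) of a two-layer network."""
    return e * p * (l * k + m * l)

M, K = 100, 400
sparse_cost = omp_cost(M, K, S=M)                   # sparsity level set to M
assert sparse_cost == 2 * K * M**2 + K * M**3 + M**4
train_cost = nn_training_cost(e=100, p=3 * 4000, k=2 * M, l=64, m=3)  # e = 100 is hypothetical
print(sparse_cost, train_cost)  # 508000000 15590400000
```

With these numbers, the $KM^3$ term (400 million) dominates the sparse coding cost.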

Results and Discussion
This section presents numerical experiments that assess the performance of the proposed algorithm and compare it with the ED approach.

Parameter Setting
The simulations are conducted with different modulation settings based on the system model specifications presented in Section 2. The modulation types used include quadrature amplitude modulation (QAM), pulse amplitude modulation (PAM), frequency-shift keying (FSK), and phase-shift keying (PSK). Moreover, the proposed algorithm uses a $100 \times 400$ dictionary. For each received signal, a channel realization [34] is generated for the PU, and uncorrelated channel realizations are generated for the illegitimate node based on the channel decorrelation concept [31]. The assumed model of $\mathbf{h}_{PU}$ is $\mathbf{h}_{PU} = \rho\mathbf{h} + (1 - \rho)$, where $\rho$ is the correlation factor and $\mathbf{h}$ is a Rayleigh fading channel [35]. The details of the simulation parameters are presented in Table 1.
We use a standard two-layer feed-forward network [30] that consists of a hidden layer and an output layer with sigmoid activation functions. The number of hidden neurons is set to 64, while the number of output neurons equals the number of elements in the target vector, which is 3 (the number of classes in PUEA or jamming attack detection). For PUEA detection, the vectors $\mathbf{f}_0^i$, $\mathbf{f}_1^i$, and $\mathbf{f}_2^i$ are used for training. For jamming attack detection, $\mathbf{f}_0^i$, $\mathbf{f}_1^i$, and $\mathbf{f}_3^i$ are used as input vectors. The residual energy decay vector $\|\mathbf{r}\|_2$ and the gradient vector $|G|$ are used as features. Here, both $\|\mathbf{r}\|_2$ and $|G|$ have dimension $1 \times M$; therefore, the concatenated feature vector has dimension $1 \times 2M$.
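A minimal NumPy sketch of such a network is given below: $2M$ inputs, 64 sigmoid hidden units, and 3 sigmoid outputs, trained by full-batch gradient descent. The loss (binary cross-entropy on one-hot targets), learning rate, epoch count, initialization, and class name `TwoLayerNet` are assumptions, since the training algorithm is not specified here; the toy clusters stand in for real feature vectors.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TwoLayerNet:
    """Feed-forward net: n_in inputs -> n_hidden sigmoid units -> n_out sigmoid outputs."""

    def __init__(self, n_in, n_hidden=64, n_out=3, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((n_in, n_hidden)) / np.sqrt(n_in)
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.standard_normal((n_hidden, n_out)) / np.sqrt(n_hidden)
        self.b2 = np.zeros(n_out)

    def forward(self, F):
        H = sigmoid(F @ self.W1 + self.b1)          # hidden-layer activations
        return H, sigmoid(H @ self.W2 + self.b2)    # output-layer activations

    def fit(self, F, labels, epochs=1000, lr=0.5):
        T = np.eye(self.W2.shape[1])[labels]        # one-hot targets
        n = F.shape[0]
        for _ in range(epochs):
            H, Y = self.forward(F)
            dZ2 = (Y - T) / n                       # gradient of BCE w.r.t. output pre-activation
            dZ1 = (dZ2 @ self.W2.T) * H * (1 - H)   # back-propagate through hidden sigmoid
            self.W2 -= lr * H.T @ dZ2
            self.b2 -= lr * dZ2.sum(axis=0)
            self.W1 -= lr * F.T @ dZ1
            self.b1 -= lr * dZ1.sum(axis=0)
        return self

    def predict(self, F):
        return np.argmax(self.forward(F)[1], axis=1)

# Toy data: three well-separated clusters of 2M-dimensional "feature vectors"
rng = np.random.default_rng(1)
M = 4
centers = 3.0 * np.eye(3, 2 * M)
labels = np.repeat(np.arange(3), 100)
F = centers[labels] + 0.5 * rng.standard_normal((300, 2 * M))
net = TwoLayerNet(n_in=2 * M).fit(F, labels)
acc = np.mean(net.predict(F) == labels)
```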
We use 4000 samples from each class in the training stage for all cases, and 1000 samples from each class in the testing stage for each SNR value. The neural network is trained over SNR values ranging from −5 dB to 15 dB in steps of 5 dB.

Performance Analysis
This section presents the performance analysis of the proposed algorithm in terms of confusion matrices, receiver operating characteristics (ROC) curves, and the area under the ROC curve (AUROC). For the jamming detection scenario, it is assumed that the illegitimate node broadcasts non-structured signals. For the PUEA scenario, it is assumed that the PUE signal's parameters are identical to those of the PU signal.
Confusion matrices are often used to examine classification performance. They present the numbers of both correctly and incorrectly classified observations: the diagonal elements give the number of correctly classified observations, while the off-diagonal elements give the number of incorrectly classified ones. Table 2 presents the confusion matrices for PUEA detection for different $M$ and SNR values, where $M$ is the number of samples in the compressed received signal. Table 2 shows that the overall performance of the proposed algorithm is satisfactory for PUEA detection, especially at high SNR. Moreover, the performance improves as $M$ increases. Table 3 presents the confusion matrices for jamming detection for different $M$ and SNR values. As in the PUEA case, the classification accuracy of the proposed algorithm improves with increasing $M$ and SNR.
It is also observed from Tables 2 and 3 that jammer detection outperforms PUEA detection. This is because jammer detection benefits from both the non-compressible nature of the jamming signal and the channel-dependent dictionary, whereas PUEA detection benefits only from the channel-dependent dictionary.
In classification, if a signal belongs to class $i$ and is correctly classified as belonging to that class, it is counted as a true positive ($TP$). If it is wrongly classified as belonging to a different class $j$, it is counted as a false negative ($FN$). If, however, the signal does not belong to class $i$ but is wrongly classified as such, it is counted as a false positive ($FP$). Finally, if it does not belong to class $i$ and is not classified as class $i$, it is a true negative ($TN$). The true positive rate ($TPR$), or recall, is then defined as $TPR = TP/(TP + FN)$, whereas the false positive rate ($FPR$) is defined as $FPR = FP/(FP + TN)$. ROC curves and AUROC values show the capability of a classifier to distinguish between different classes. An ROC curve is plotted with the $TPR$ on the vertical axis and the $FPR$ on the horizontal axis. Ideally, the $TPR$ equals 1 and the $FPR$ equals 0. Generally speaking, the closer the ROC curve is to the top-left corner, the better the performance. Similarly, higher AUROC values indicate better performance. In this work, there are three classes ($H_0$, $H_1$, $H_2$ or $H_0$, $H_1$, $H_3$), and the ROC curve for each class is plotted separately. Fig. 4 and Fig. 5 compare the performance of the proposed algorithm with that of the ED-based ML algorithm for PUEA and jamming attack detection, respectively. In the case of ED-based ML, the energy of the received signal is used to detect the different hypotheses with an ML structure similar to the one used for the proposed algorithm. It is observed from Fig. 4 and Fig. 5 that the proposed algorithm outperforms the ED-based algorithm by 2.24 % in the case of PUEA detection and 6.88 % in the case of jamming attack detection in terms of AUROC values. This is because the energy patterns in the residual and gradient vectors enhance the detection capability of the proposed algorithm compared to the ED-based algorithm.
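The one-vs-rest definitions above can be checked on a confusion matrix, and the AUROC of one class can be computed from classifier scores. The sketch below uses the rank (Mann-Whitney) form of the AUROC, which equals the area under the empirical ROC curve with ties counted as one half. The function names and the example matrix are illustrative, not results from Tables 2 and 3.

```python
import numpy as np

def one_vs_rest_rates(C):
    """Per-class TPR and FPR from a confusion matrix C (rows: true class, cols: predicted)."""
    C = np.asarray(C, dtype=float)
    tp = np.diag(C)
    fn = C.sum(axis=1) - tp        # class-i samples predicted as another class
    fp = C.sum(axis=0) - tp        # other-class samples predicted as class i
    tn = C.sum() - tp - fn - fp
    return tp / (tp + fn), fp / (fp + tn)

def auroc(scores, is_positive):
    """AUROC as the probability that a positive outscores a negative (ties count 1/2)."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(is_positive, dtype=bool)
    pos, neg = s[y], s[~y]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

C = [[90, 5, 5],
     [10, 80, 10],
     [0, 20, 80]]
tpr, fpr = one_vs_rest_rates(C)
```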
From an ML point of view, a trained model (the classifier in this work) should not memorize the inputs used in its training. To examine this property of the trained classifier in the proposed algorithm, Fig. 6 shows the training and test losses versus epochs for PUEA detection when $M = 100$. This figure shows that the training loss converges to the test loss. These results indicate the absence of overfitting, thereby validating the generalizability of the proposed model; in other words, the trained model does not memorize the training data. Note that the loss graph is included only for the $M = 100$ case of PUEA detection to avoid repetition; similar behavior is observed in the loss graphs for the other values of $M$ and for the jammer case.
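One simple way to quantify the convergence of the training and test losses discussed above is to look at the gap between them over the last few epochs, together with the trend of the test loss. The sketch below is an illustrative assumption rather than the procedure used to produce Fig. 6; `generalization_gap` and its `tail` parameter are hypothetical, and no decision threshold is implied.

```python
import numpy as np

def generalization_gap(train_loss, test_loss, tail=5):
    """Return (mean test-minus-train gap, test-loss slope) over the last `tail` epochs.

    A small gap with a flat or decreasing test loss is consistent with the
    absence of overfitting; a test loss that rises while the training loss
    keeps falling is the usual warning sign.
    """
    train = np.asarray(train_loss, dtype=float)
    test = np.asarray(test_loss, dtype=float)
    if train.shape != test.shape or train.size < tail:
        raise ValueError("need equal-length histories with at least `tail` epochs")
    gap = test[-tail:] - train[-tail:]
    # Least-squares slope of the test loss per epoch over the tail window.
    test_slope = np.polyfit(np.arange(tail), test[-tail:], 1)[0]
    return float(gap.mean()), float(test_slope)
```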

Conclusions
In this paper, the convergence patterns of sparse recovery are exploited for PUEA and jamming attack detection. Sparse recovery was conducted over a dictionary that depends on the legitimate PU channel. Consequently, the signal from the legitimate node converges smoothly compared to the signal from an illegitimate node. Essentially, this is because the legitimate signal is the only one that is compressible in the domain defined by this sparsifying dictionary. Besides, the non-compressible nature of a jamming signal under sparse coding over the PU channel-dependent dictionary was exploited to detect jamming attacks. The detection was carried out using ML-based classification. Numerical experiments showed the effectiveness of the proposed algorithm and its superior performance compared to ED-based ML algorithms. These results were validated in terms of confusion matrices, ROC curves, and AUROC values as quality metrics. In terms of AUROC values, the proposed algorithm outperformed the ED-based algorithm by 2.24 % in the case of PUEA detection and by 6.88 % in the case of jamming attack detection.

Appendix: Residual Energy Gradient Decay Analysis
The proposed algorithm is based on the convergence patterns in the sparse coding of the compressed received signal. More specifically, it uses a channel-dependent dictionary to expose the different characteristics of the gradients and residuals, which are then used to detect PUEA and jamming attacks.
In this work, we employ the computationally efficient OMP for sparse coding. For simplicity, let us focus on its first iteration. At the start of the first OMP iteration, the zero-th residual $r_0$ is initialized to the signal itself. OMP then chooses the atom $d$ of the given dictionary $D_{PU}$ that has the strongest similarity to $r_0$. This similarity is characterized by the projection onto each atom, $E = dd^{\dagger}$. The updated residual after the selection of the atom can be given as

$$r_1 = r_0 - E r_0 .$$

For simplicity, the least-squares refinement of OMP is ignored. With each iteration, the residual energy decreases, and the pattern of the concatenated residual energies $\|r_k\|_2^2$ is used for classification. To this end, the first element of the gradient vector $G$ can be represented as

$$G(1) = \|r_1\|_2^2 - \|r_0\|_2^2 .$$

Using this gradient magnitude property, we can differentiate the cases $H_0$, $H_1$, $H_2$, and $H_3$. The general received signal can be given as $y = hx + n$, and we can write $G(1)$ as

$$G(1) = \|y - Ey\|_2^2 - \|y\|_2^2 .$$
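To make the residual energies and the gradient vector concrete, the following NumPy sketch runs plain OMP (keeping the least-squares refinement) and records $\|r_k\|_2^2$ at every iteration. It assumes unit-norm atoms in a generic dictionary `D` standing in for $D_{PU}$; the function name `omp_residual_energies` is hypothetical. In the first iteration, the least-squares fit over a single unit-norm atom coincides with the projection $E r_0$, so `G[0]` equals $-\|Ey\|_2^2$, matching the simplified analysis in this appendix.

```python
import numpy as np

def omp_residual_energies(D, y, n_iter):
    """Run plain OMP and record the residual energy ||r_k||_2^2 per iteration.

    Returns (energies, G) with energies[0] = ||y||^2 and
    G[k-1] = energies[k] - energies[k-1], so G[0] plays the role of G(1).
    Assumes the columns (atoms) of D have unit norm.
    """
    y = np.asarray(y)
    r = y.copy()                       # r_0 is initialized to the signal itself
    support = []
    energies = [np.vdot(r, r).real]
    for _ in range(n_iter):
        # Select the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.conj().T @ r)))
        if k not in support:
            support.append(k)
        Ds = D[:, support]
        # Least-squares refinement over all selected atoms (standard OMP).
        coef, *_ = np.linalg.lstsq(Ds, y, rcond=None)
        r = y - Ds @ coef
        energies.append(np.vdot(r, r).real)
    energies = np.array(energies)
    return energies, np.diff(energies)
```

The residual energies are non-increasing because each iteration fits over a support that never shrinks, so every entry of `G` is non-positive.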
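From the properties of the projection, we know that $\langle En, n\rangle = \langle En, En\rangle = \|En\|_2^2$. Hence, for $H_0$, where $y = n$, (8) can be written as

$$G(1)_{H_0} = \|n - En\|_2^2 - \|n\|_2^2 = -\|En\|_2^2 .$$

Following the same logic, $G(1)$ for $H_1$, $H_2$ and $H_3$ can be expressed as

$$G(1) = \langle hx + n - E(hx + n),\; hx + n - E(hx + n)\rangle - \langle hx + n,\; hx + n\rangle = a - b - c + d - a ,$$

where $a$, $b$, $c$ and $d$ are defined next. Specifically, $a$ can be written as

$$a = \langle hx + n, hx + n\rangle = \langle hx, hx\rangle + \langle hx, n\rangle + \langle n, hx\rangle + \langle n, n\rangle = \langle hx, hx\rangle + 2\langle hx, n\rangle + \langle n, n\rangle .$$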
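The projection identity and the resulting $H_0$ expression can be checked numerically. The sketch below assumes real-valued signals and a single randomly drawn atom `d` (both hypothetical), and builds `E` as `d` times its Moore-Penrose pseudo-inverse, which is our reading of $dd^{\dagger}$.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 32
d = rng.standard_normal((M, 1))          # a single hypothetical atom
E = d @ np.linalg.pinv(d)                # projection onto span{d}
n = rng.standard_normal(M)               # H0: the received signal is noise only

# E is symmetric and idempotent, hence <En, n> = <En, En> = ||En||_2^2.
lhs = (E @ n) @ n
rhs = (E @ n) @ (E @ n)

# First gradient element under H0: ||n - En||^2 - ||n||^2, expected to equal -||En||^2.
G1 = np.sum((n - E @ n) ** 2) - np.sum(n ** 2)
print(lhs, rhs, G1)
```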
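Assuming that the noise is independent of $hx$, so that $\langle hx, n\rangle = 0$ (which holds in expectation for zero-mean noise), we can write (11) as

$$a = \langle hx, hx\rangle + \langle n, n\rangle .$$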
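The assumption $\langle hx, n\rangle = 0$ holds on average rather than for every realization. The following Monte Carlo sketch, with a hypothetical fixed real-valued $hx$ and zero-mean Gaussian noise, shows that the exact $a$ differs from the simplified one by exactly $2\langle hx, n\rangle$, and that this cross term averages out over many noise draws.

```python
import numpy as np

rng = np.random.default_rng(1)
M, trials = 64, 20000
hx = rng.standard_normal(M)                      # fixed, hypothetical signal component
n = 0.5 * rng.standard_normal((trials, M))       # zero-mean noise, independent of hx

cross = n @ hx                                   # <hx, n> for each noise draw
a_exact = np.sum((hx + n) ** 2, axis=1)          # a = <hx + n, hx + n>
a_approx = hx @ hx + np.sum(n ** 2, axis=1)      # a after dropping the cross terms
a_err = a_exact - a_approx                       # equals 2 <hx, n> per draw

print("single-draw std of <hx, n>:", cross.std())
print("mean of <hx, n> over draws:", cross.mean())
```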
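Similarly, applying the same assumption to the cross terms and using $\langle u, Ev\rangle = \langle Eu, v\rangle$ and $E^2 = E$, the terms $b = \langle hx + n, E(hx + n)\rangle$ and $c = \langle E(hx + n), hx + n\rangle$ reduce to

$$b = c = \|Ehx\|_2^2 + \|En\|_2^2 .$$

Lastly, $d$ can be given as

$$d = \langle E(hx + n), E(hx + n)\rangle = \|Ehx\|_2^2 + \|En\|_2^2 .$$

Based on (12), (13), (14) and (15), and making the appropriate substitutions, $G(1)$ can be written as

$$G(1) = \left(\langle hx, hx\rangle + \langle n, n\rangle\right) - 2\left(\|Ehx\|_2^2 + \|En\|_2^2\right) + \left(\|Ehx\|_2^2 + \|En\|_2^2\right) - \left(\langle hx, hx\rangle + \langle n, n\rangle\right).$$

Finally, the generic expression of the gradient magnitude for hypotheses $H_1$, $H_2$ and $H_3$ can be expressed as

$$G(1)_{H_1,H_2,H_3} = -\|Ehx\|_2^2 - \|En\|_2^2 ,$$

where $h$ corresponds to $h_{PU}$ in the case of the PU or $h_i$ in the case of the PUE or jammer, as explained in (4). Moreover, $x$ is structured in the case of the PU and the PUE, while it is unstructured in the case of a jamming attack.

Figure 4: ROC curves for PUEA detection performance analysis. Comparison of the proposed algorithm with the ED-based ML algorithm for PUEA detection using ROC curves.

Figure 5: ROC curves for jamming attack detection performance analysis. Comparison of the proposed algorithm with the ED-based ML algorithm for jamming attack detection using ROC curves.

Figure 6: Model loss graph for the PUEA detection when $M = 100$.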
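A toy illustration of why $G(1)$ separates the hypotheses: when the received signal is compressible in the dictionary, the selected atom captures most of its energy, so $|G(1)|/\|y\|_2^2$ is close to 1, whereas an unstructured jamming signal leaves most of its energy in the residual. The sketch below is an idealized assumption, not the experimental setup of this paper: `D` is a random unit-norm dictionary standing in for $D_{PU}$, the PU signal is taken to be exactly one scaled atom, the amplitudes and noise level are arbitrary, and the PUE case (a different channel $h_i$) is not modeled. The helper `first_gradient` is hypothetical.

```python
import numpy as np

def first_gradient(D, y):
    """G(1) = ||r_1||^2 - ||r_0||^2 = -||E y||^2 for the best unit-norm atom."""
    k = int(np.argmax(np.abs(D.T @ y)))
    return -float((D[:, k] @ y) ** 2)

rng = np.random.default_rng(2)
M, N, sigma = 64, 128, 0.1
D = rng.standard_normal((M, N))
D /= np.linalg.norm(D, axis=0)            # hypothetical stand-in for D_PU (unit-norm atoms)

noise = sigma * rng.standard_normal(M)
y_h0 = noise                              # H0: spectrum hole, noise only
y_pu = 8.0 * D[:, 5] + noise              # PU: idealized, exactly 1-sparse in D
y_jam = rng.standard_normal(M) + noise    # jammer: unstructured, not sparse in D

for name, y in [("H0", y_h0), ("PU", y_pu), ("jammer", y_jam)]:
    g1 = first_gradient(D, y)
    print(f"{name}: G(1) = {g1:.3f}, |G(1)|/||y||^2 = {-g1 / (y @ y):.3f}")
```

Here the raw value of $G(1)$ separates the noise-only case from the signal-bearing cases, while the normalized value separates the compressible PU signal from the unstructured jammer.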