
An automatic modulation classification network for IoT terminal spectrum monitoring under zero-sample situations


Relying on powerful computing resources, a large number of Internet of Things (IoT) sensors are deployed in various locations to sense the environment in which we live. However, the proliferation of IoT devices has led to the misuse of spectrum resources, and many IoT devices occupy frequency bands without permission. As a consequence, spectrum regulation has become an essential part of IoT development. Automatic modulation classification (AMC) is a spectrum monitoring task that senses the electromagnetic space and is carried out under non-cooperative communication. Deep learning (DL)-based methods are data-driven and generally require large amounts of training data. Under some non-cooperative communication scenarios, however, it is challenging to collect wireless signal data directly. How can a DL-based algorithm complete the inference task under zero-sample conditions? In this paper, a signal zero-shot learning network (SigZSLNet) is proposed for AMC under zero-sample situations. Semantic descriptions and the corresponding semantic vectors are designed to generate the feature vectors of the modulated signals, and the generated feature vectors act as the training data of the zero-sample classes. The experimental results demonstrate the effectiveness of the proposed SigZSLNet: the accuracy with one unseen class and with two unseen classes exceeds 90% and 76%, respectively. We also visualize the generated feature vectors and the intermediate layer outputs of the model.


The rapid development of the Internet of Things (IoT) and mobile networks aims to meet users' demands for performance, speed, seamless connectivity, intelligence, and portability [1–6]. As communication technology continues to evolve to meet application needs [7–9], the 6th Generation Mobile Communication Technology (6G) is the next generation of telecommunication standards, intended to serve ever-growing intelligent communication applications. The IoT is a potential candidate for leveraging the resources of communication networks. It is estimated that over 50 billion devices are currently connected to the internet, and these IoT devices sense the surrounding environment to provide better services. Spectrum is the most valuable resource in wireless networks, and the growing number of IoT end devices may lead to spectrum resource abuse. Regulating the spectrum has therefore become an important task, and it must be carried out under non-cooperative communication. Consequently, it is imperative to classify the modulation types of received signals under non-cooperative conditions. Automatic modulation classification (AMC) has been proposed for this purpose; it contributes to signal recognition, threat assessment, and spectrum monitoring [10].

Driven by the urgent need for spectrum regulation in IoT, AMC has attracted much attention in recent years. Conventional modulation recognition algorithms can be divided into likelihood-based (LB) methods and feature-based (FB) methods [11]. LB methods determine the modulation mode of the received signal by comparing likelihoods and infer the label with a Bayesian decision rule. FB methods regard AMC as a pattern recognition problem; they are less sensitive to model uncertainties but yield suboptimal classification accuracy. These algorithms extract statistical features [12–14], instantaneous time features [15], and wavelet features [16] from the original modulated signal and feed them to machine learning classifiers such as Support Vector Machine (SVM), LightGBM, Decision Tree (DTree), and K-Nearest Neighbors (KNN) [17–20] to infer the class. However, LB methods usually have considerable computational complexity, rely on prior knowledge, are sensitive to channel noise, and are poorly robust. FB methods depend on expert-designed features and achieve suboptimal classification accuracy. Accordingly, both LB and FB methods are poorly adapted to non-cooperative communication environments.

Recently, Convolutional Neural Networks (CNNs) have been successfully applied to AMC by learning valid representations from complex data [21]. Reference [22] applied a CNN to AMC and obtained high classification accuracy, showing that CNN-based methods outperform traditional feature-based methods. Reference [23] employed CNN, residual network (ResNet), Inception network, and Convolutional Long Short Term Memory Deep Neural Network (CLDNN) models to process modulated signals and demonstrated that temporal features are effective in improving classification accuracy. Reference [24] considered feature interactions and combined the advantages of CNN and Long Short Term Memory (LSTM) to solve the AMC problem, proposing a CNN-LSTM dual-stream structure to explore feature interactions and the spatial-temporal properties of the original signal. The signal constellation was applied to AMC to improve classification accuracy in [25], and, for better robustness and inference, a grid constellation matrix-based algorithm was considered in [26]. For few-sample conditions, several researchers have made explorations. Bu et al. [27] proposed an adversarial transfer learning architecture (ATLA) to reduce the discrepancy caused by different sampling rates and achieved favorable classification results while training on 10% of the dataset. Li et al. [28] exploited a capsule network (CapsNet) to extract features from signals and obtained promising inference results with only 3% of the training data. Nevertheless, all of these schemes assume that every class in the training set has data, whether sufficient or scarce. They cannot handle the extreme class-imbalance case in which the training set contains no data at all for some classes.

Data augmentation based on Generative Adversarial Networks (GAN) [29] is a possible way to address missing data for some classes: the generator generates specified data to fool the discriminator, while the discriminator attempts to distinguish real data from generated data. In [30], the unstable convergence of GAN was addressed by WGAN, which provides an efficient approximation of the Wasserstein distance. WGAN-GP [31] replaced weight clipping with a gradient penalty to alleviate vanishing gradients. Note that GAN-based data augmentation methods still require data of the corresponding classes for training and augmentation, so they are no longer applicable under zero-sample conditions. For classes whose data are entirely missing, zero-sample learning schemes typically rely on expert-set linguistic descriptions, transfer learning, or matrix transformations [32, 33]. However, linear matrix transformation methods have particularly low inference accuracy in complex data scenarios. Furthermore, no expert semantic dataset exists for modulated signals, so methods based on semantic space mapping lack modulation semantics to map from.

In this paper, to better regulate the spectrum usage of IoT devices, a novel signal zero-shot learning network (SigZSLNet) is proposed to solve the zero-sample problem in AMC described above. The mapping relations between classes are established through semantic space mapping, and expert-set linguistic descriptions are constructed for the modulation modes. A GAN drives the generation of unseen classes, which enriches the training set. The contributions of this article are summarized as follows:

  1. The semantic descriptions of different modulated signals are designed from their properties, and the semantic vectors are obtained with the generation module of the semantic attribute vector.

  2. A WGAN-GP-based method is employed to generate the feature vectors of the modulated signals under the guidance of the semantic vectors. The WGAN-GP module then yields a complete training dataset that includes both real and synthetic feature vectors.

  3. The complete training dataset is fed to the classifier to obtain the AMC estimate. Experimental results indicate that the proposed SigZSLNet effectively solves the zero-sample problem in AMC.

The remainder of this paper is organized as follows. Section II introduces the motivation and the description of the model. Section III presents the experiments on the various groups and analyzes the results in detail. Section IV discusses the results. Finally, Section V concludes the article.



With the rapid development of 5G technology, more and more sensors and devices are connected to the internet, and data are transmitted over wired and wireless networks, satellite networks, etc. As the number of participating end devices grows, the information received by the central server becomes increasingly complex, and as the wireless bands become more crowded, monitoring and managing wireless signals become critical and challenging. As shown in Fig. 1, this paper considers wireless spectrum management scenarios in which the receiver has no prior knowledge of the modulated signal at the transmitter. Accordingly, the classes of the modulated signal at the receiver may differ from those of the pre-collected data, and the details of the received signal must also be learned. The task of non-cooperative communication is to recognize the modulation types of wireless signals in such scenarios. Classes whose training data are collected in advance are called seen classes, and classes whose training data are not available are called unseen classes. Thus, the available classes of the modulated signal at the receiver are \(C_{\rm seen} = \{c_1, c_2, \ldots , c_{m-k}\}\), and the unavailable classes are \(C_{\rm unseen} = \{c_{m-k+1}, c_{m-k+2}, \ldots , c_{m}\}\). The wireless signal at the receiver can be represented as

$$\begin{aligned} \begin{aligned} R(t) = s(t) * G(t) + n(t), \end{aligned} \end{aligned}$$

where s(t) is the modulated signal from the transmitter, G(t) is the channel (impulse response), \(*\) denotes convolution, and n(t) is Additive White Gaussian Noise (AWGN). The goal of this work is to train the classifier using only the data of the seen classes, such that it can infer both seen and unseen classes:

$$\begin{aligned} \begin{aligned} L= f(T_{\rm seen}, T_{\rm unseen}| C_{\rm seen}), \end{aligned} \end{aligned}$$

where \(T_{\rm seen}\) and \(T_{\rm unseen}\) are the test signal data of the seen and unseen classes, respectively, and L is the label of the test data.
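To make the received-signal model concrete, the following NumPy sketch builds one baseband QPSK frame, convolves it with a short multipath channel, and adds complex AWGN at a chosen SNR. The frame length, the three-tap channel `h`, and the helper names `qpsk_symbols` and `received_signal` are hypothetical illustrations; the paper's dataset is generated in MATLAB with settings not reproduced here.

```python
import numpy as np

def qpsk_symbols(n, rng):
    """Random unit-energy QPSK symbols."""
    bits = rng.integers(0, 2, size=(n, 2))
    return ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)

def received_signal(s, h, snr_db, rng):
    """R = s * G + n: discrete convolution with channel taps h, plus complex AWGN."""
    x = np.convolve(s, h)[: len(s)]                   # s(t) * G(t), truncated to the frame length
    sig_power = np.mean(np.abs(x) ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))    # SNR = P_signal / P_noise
    noise = np.sqrt(noise_power / 2) * (
        rng.standard_normal(len(x)) + 1j * rng.standard_normal(len(x))
    )
    return x + noise

rng = np.random.default_rng(0)
s = qpsk_symbols(1024, rng)
h = np.array([1.0, 0.3 - 0.2j, 0.1j])                  # hypothetical 3-tap channel G
r = received_signal(s, h, snr_db=6, rng=rng)
iq = np.stack([r.real, r.imag])                        # 2 x 1024 IQ frame, the usual CNN input layout
```

Stacking the real and imaginary parts as two rows gives the in-phase/quadrature layout commonly fed to CNN-based AMC models.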

Fig. 1 AMC application scenarios in the Internet of Things

Fig. 2 Block diagram of the proposed SigZSLNet

Fig. 3 The generation module of the semantic attribute vector. The left part is the convolutional module and the right part is the convolutional encoding module

Fig. 4 The network architecture of SigZSLNet. WGAN-GP generates the feature data of the unseen (missing) classes; D is the discriminator and G is the generator. The classifier consists of a fully connected layer with 128 neurons and a softmax layer with 7 classes

Common DL-based methods [21] can classify received data when an adequate training set is available. To make a DL-based method work under zero-sample conditions, this paper mainly addresses two questions: how to accurately generate data for the missing classes \(C_{\rm unseen}\) with WGAN-GP, and how to perform AMC on the basis of the seen data and the generated data.


Model description

The overall process of the proposed method is shown in Fig. 2. It consists of three subsystems: semantic vector generation, feature generation, and classification.

The first subsystem has two inputs: the wireless signals (seen classes) and the semantic descriptions (seen and unseen classes). In step 1, the seen signal data and the semantic descriptions of the seen classes are used to train the generation module of the semantic attribute vector, which then maps the semantic descriptions of the unseen classes to their semantic vectors. In step 2, the original wireless signals of the seen classes are passed through the pre-trained convolutional neural network to yield the CNN feature vectors of the seen classes. In step 3, the semantic vectors from step 1 and the CNN features from step 2 are used to generate CNN features of the unseen classes. Finally, in step 4, the CNN features of the seen classes and the generated feature vectors of the unseen classes form the training set of the classification subsystem.
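Step 4 amounts to concatenating two labeled sets. A minimal sketch, assuming the trained generator is available as a callable (`generate` is a hypothetical placeholder, replaced by random noise in the toy usage) and that each unseen class is represented by a single semantic vector:

```python
import numpy as np

def build_training_set(seen_feats, seen_labels, unseen_sem, unseen_labels, generate, n_per_class):
    """Merge real seen-class CNN features with synthetic unseen-class features (step 4).

    generate: hypothetical stand-in for the trained generator, mapping a
              (batch, sem_dim) array of semantic vectors to (batch, 128) features.
    """
    feats, labels = [seen_feats], [seen_labels]
    for sem, label in zip(unseen_sem, unseen_labels):
        batch = np.repeat(sem[None, :], n_per_class, axis=0)  # same semantic vector for every sample
        feats.append(generate(batch))
        labels.append(np.full(n_per_class, label))
    return np.concatenate(feats), np.concatenate(labels)

# Toy usage: 6 seen classes (labels 1-6) with real features, class 0 unseen.
rng = np.random.default_rng(0)
seen_x = rng.standard_normal((60, 128))
seen_y = np.repeat(np.arange(1, 7), 10)

def placeholder_generator(batch):
    # Placeholder only: returns noise instead of trained-generator output.
    return rng.standard_normal((batch.shape[0], 128))

x_train, y_train = build_training_set(seen_x, seen_y, np.eye(8)[:1], [0], placeholder_generator, 10)
```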

The generation module of the semantic attribute vector is shown in Fig. 3, and the network architecture of the proposed SigZSLNet is shown in Fig. 4. As Fig. 4 shows, SigZSLNet contains three modules: the CNN module, the WGAN-GP module, and the classifier module.

Semantic generation

As shown in Fig. 3, the generation module of the semantic attribute vector contains a convolutional part and an encoding part: the convolutional part extracts features from the signal data, and the convolutional encoding part encodes the semantic descriptions and extracts semantic features. The layers of the semantic generation module and their output dimensions are listed in Table 1. Each modulated signal is associated with a description of its properties. In this paper, the property descriptions of the different modulated signals are quantified with one-hot encoding, and the quantified results form class-specific semantic vectors that guide the WGAN-GP to generate class-specific data. After optimization of the loss function, a semantic feature vector is obtained that serves as the input to the generator.
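The one-hot quantification step might look like the sketch below. The attribute vocabulary (`family`, `order`) is a hypothetical stand-in for the descriptions in Table 3, which are not reproduced in this text; the sketch covers only the one-hot encoding, not the learned encoding module that refines it.

```python
import numpy as np

# Hypothetical attribute vocabulary; Table 3 defines the actual descriptions.
ATTRIBUTES = {
    "family": ["PSK", "ASK", "QAM"],
    "order": [2, 4, 16, 32, 64],
}

DESCRIPTIONS = {
    "BPSK": {"family": "PSK", "order": 2},
    "QPSK": {"family": "PSK", "order": 4},
    "2ASK": {"family": "ASK", "order": 2},
    "4ASK": {"family": "ASK", "order": 4},
    "16QAM": {"family": "QAM", "order": 16},
    "32QAM": {"family": "QAM", "order": 32},
    "64QAM": {"family": "QAM", "order": 64},
}

def semantic_vector(desc):
    """Concatenate one one-hot block per attribute into a single vector."""
    blocks = []
    for name, values in ATTRIBUTES.items():
        block = np.zeros(len(values))
        block[values.index(desc[name])] = 1.0
        blocks.append(block)
    return np.concatenate(blocks)

vectors = {mod: semantic_vector(d) for mod, d in DESCRIPTIONS.items()}
```

Because unseen classes only need a description, a vector such as `vectors["4ASK"]` can be built even when no 4ASK signal was ever collected.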

Table 1 Layers of semantic generation module and output dimensions of each layer

The compatibility between the semantic descriptions and the signals is optimized with the joint embedding loss [34], formulated as

$$\begin{aligned} \begin{aligned} L_A&= \frac{1}{N} \sum _{n=1}^N \left[ l_s(s_n, t_n, y_n)+l_t(s_n, t_n, y_n)\right] \\&= \frac{1}{N} \sum _{n=1}^N \Big[ {{\rm max}}\big(0,\ \Delta (y_n,y) + \mathbb {E}[F(s_n, t)-F(s_n,t_n)]\big)\\&\quad + {{\rm max}}\big(0,\ \Delta (y_n,y)+\mathbb {E}[F(s,t_n)-F(s_n,t_n)]\big)\Big] \end{aligned} \end{aligned}$$
$$\begin{aligned} \begin{aligned} \Delta (a,b) = {\left\{ \begin{array}{ll} 1, &{} \text{ if } ~a \ne b \\ 0, &{} \text{ if } ~a = b \end{array}\right. } \end{aligned} \end{aligned}$$

where N denotes the total number of signal-text pairs; \(l_s\) and \(l_t\) are the signal-side and semantic-side losses, respectively; \(\Delta\) signifies the 0–1 loss; \(y_n\) is the class label of the n-th pair and y denotes a candidate class label; \(s_n\) and \(t_n\) are the n-th signal and text description, while s and t range over the signals and text descriptions over which the expectation is taken; \(\mathbb {E}[\cdot ]\) denotes the mean operation; and \(F(s, t)= e(s)^\mathsf {T} \varphi (t)\) is the compatibility function, where e(s) is the encoder of the input signals and \(\varphi (t)\) is the encoder of the input attribute descriptions. The semantic generation module is summarized in Algorithm 1, where \(C_{\rm seen}\) is the signal data of the seen classes, \(D_{\rm seen}\) is the semantic description of \(C_{\rm seen}\), \(D_{\rm unseen}\) is the semantic description of the unseen classes, and \(S_{\rm seen}\) and \(S_{\rm unseen}\) are the output semantic vectors of the seen and unseen classes.
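A NumPy sketch of the joint embedding loss follows, under several assumptions: the encoders e(·) and φ(·) have already produced the embeddings, each class has exactly one text description, the expectation over signals is the mean over the signals of the candidate class, and each max(0, ·) term is maximized over candidate classes y, as in common structured joint-embedding formulations. The function name `joint_embedding_loss` is illustrative.

```python
import numpy as np

def joint_embedding_loss(sig_emb, txt_emb, labels):
    """Symmetric hinge loss L_A over the compatibility F(s, t) = e(s)^T phi(t).

    sig_emb: (N, d) signal embeddings e(s_n)
    txt_emb: (C, d) one description embedding phi(t) per class (assumption)
    labels:  (N,) integer class labels y_n
    """
    num_classes = txt_emb.shape[0]
    n_idx = np.arange(len(labels))
    scores = sig_emb @ txt_emb.T                        # F(s_n, t_c), shape (N, C)
    true_score = scores[n_idx, labels]                  # F(s_n, t_n)
    classes = np.arange(num_classes)
    delta = (classes[None, :] != labels[:, None]).astype(float)  # 0-1 loss Delta(y_n, y)

    # l_s: signal s_n against the description of every candidate class y
    l_s = np.maximum(0.0, delta + scores - true_score[:, None]).max(axis=1)

    # l_t: description t_n against the mean score of the signals of each class y
    mean_score = np.full((len(labels), num_classes), -np.inf)  # absent classes contribute 0
    for c in classes:
        members = labels == c
        if members.any():
            mean_score[:, c] = scores[members][:, labels].mean(axis=0)
    l_t = np.maximum(0.0, delta + mean_score - true_score[:, None]).max(axis=1)

    return float(np.mean(l_s + l_t))

labels = np.array([0, 1, 2, 1])
aligned = joint_embedding_loss(5 * np.eye(3)[labels], np.eye(3), labels)   # well-separated embeddings
collapsed = joint_embedding_loss(np.zeros((4, 3)), np.eye(3), labels)     # uninformative embeddings
```

Well-separated embeddings satisfy the unit margin on both sides and give zero loss, whereas collapsed embeddings are penalized by the full margin in both \(l_s\) and \(l_t\).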

CNN module

The CNN module, which produces the feature vectors, consists of five convolutional layers and one fully connected layer. Each convolutional layer comprises a convolution with a stride of 1 × 1 and a kernel of 1 × 8, the activation function \(f_{\rm relu} = \max (0, x)\), and max pooling with a stride of 1 × 2 and a kernel of 1 × 2. The fully connected layer outputs a 128-dimensional vector. The details are given in Table 2. The main role of the CNN module is to extract spatial features from modulated signal data. Because this role is generic, an open-source pre-trained CNN model with a convolutional module can serve as the CNN module of the proposed SigZSLNet. The pre-trained weights of the model proposed in [22] are available on GitHub, so we use this model as the CNN module of SigZSLNet and take the output of its first fully connected layer as the feature vector of the signal.
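One convolutional layer of the CNN module, as described above, can be sketched in NumPy as follows. The 'same' padding, the 16 channels per layer, the linear fully connected output, and the 128-sample input frame are assumptions made for illustration (Table 2 is not reproduced here); the pre-trained model of [22] may differ.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_block(x, w, b):
    """1x8 convolution (stride 1, 'same' padding), ReLU, then 1x2 max pooling (stride 2).

    x: (C_in, L) input; w: (C_out, C_in, 8) kernels; b: (C_out,) biases.
    """
    k = w.shape[2]
    left = (k - 1) // 2
    xp = np.pad(x, ((0, 0), (left, k - 1 - left)))         # keeps length L after convolution
    windows = sliding_window_view(xp, k, axis=1)            # (C_in, L, k)
    y = np.einsum("clk,ock->ol", windows, w) + b[:, None]    # cross-correlation, as in DL frameworks
    y = np.maximum(y, 0.0)                                   # ReLU
    L = y.shape[1] - y.shape[1] % 2
    return y[:, :L].reshape(y.shape[0], L // 2, 2).max(axis=2)  # 1x2 max pooling, stride 2

def cnn_features(iq, params, fc_w, fc_b):
    """Five conv blocks, flatten, one fully connected layer -> 128-d feature vector."""
    h = iq
    for w, b in params:
        h = conv_block(h, w, b)
    return fc_w @ h.ravel() + fc_b

rng = np.random.default_rng(0)
channels = [2, 16, 16, 16, 16, 16]     # hypothetical channel widths
params = [(rng.normal(0, 0.1, (co, ci, 8)), np.zeros(co))
          for ci, co in zip(channels[:-1], channels[1:])]
iq = rng.standard_normal((2, 128))     # hypothetical 128-sample IQ frame
flat = 16 * (128 // 2 ** 5)            # each block halves the length: 128 -> 4
fc_w, fc_b = rng.normal(0, 0.1, (128, flat)), np.zeros(128)
feature = cnn_features(iq, params, fc_w, fc_b)
```

Each pooling step halves the time axis, so five layers shrink a frame by a factor of 32 before the fully connected layer maps it to 128 dimensions.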

Table 2 Layers of SigZSLNet and the output dimensions of each layer

GAN module

The GAN module generates the feature vectors of the unseen (missing) classes and consists of a generator (G) and a discriminator (D). The generator consists of two fully connected layers, with 256 neurons in the hidden layer and 128 outputs, and uses the \(leaky\_relu\) activation in each layer. The discriminator also consists of two fully connected layers, the first with 256 neurons and the \(leaky\_relu\) activation and the second without an activation function. Table 2 shows the detailed structure of WGAN-GP.
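The layer sizes above translate into the forward-pass sketch below. The semantic-vector size, the noise size, the concatenation of noise with the semantic vector, the leaky_relu slope of 0.2, the initialization, and the single-unit discriminator output (usual for a WGAN critic) are all assumptions, not details confirmed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def dense(n_in, n_out):
    # Small Gaussian initialization (assumption; not stated in the paper)
    return rng.normal(0.0, 0.05, (n_in, n_out)), np.zeros(n_out)

SEM_DIM, NOISE_DIM, FEAT_DIM = 8, 16, 128   # SEM_DIM and NOISE_DIM are hypothetical

G1, G2 = dense(SEM_DIM + NOISE_DIM, 256), dense(256, FEAT_DIM)
D1, D2 = dense(FEAT_DIM, 256), dense(256, 1)

def generator(sem):
    """Semantic vector + Gaussian noise -> 256 (leaky_relu) -> 128 (leaky_relu)."""
    z = rng.standard_normal((sem.shape[0], NOISE_DIM))
    h = leaky_relu(np.concatenate([sem, z], axis=1) @ G1[0] + G1[1])
    return leaky_relu(h @ G2[0] + G2[1])

def discriminator(feat):
    """128-d feature -> 256 (leaky_relu) -> unbounded critic score (no activation)."""
    h = leaky_relu(feat @ D1[0] + D1[1])
    return (h @ D2[0] + D2[1]).ravel()

sem = np.zeros((4, SEM_DIM))
sem[:, 3] = 1.0                  # four copies of one semantic vector
fake = generator(sem)            # (4, 128) synthetic feature vectors
scores = discriminator(fake)     # (4,) critic scores
```

The noise input lets one semantic vector yield many distinct feature vectors, which is what allows several synthetic samples per unseen class.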

Fig. 5 The test accuracy comparisons under various SNRs (Group 1 and Group 2 have one class missing, while Group 3 and Group 4 have two classes missing)

Fig. 6 The visualized feature vectors: a the synthetic feature vector, b the real feature vector

Fig. 7 The confusion matrix comparisons of different groups at SNR = 6 dB

Fig. 8 The t-SNE visualization of Group 4 under different SNRs

The input to the generator is the semantic attribute vector of a modulated signal, produced by the semantic generation module. The input to the discriminator is a 128-dimensional feature vector, either a real one from the CNN module or a synthetic one from the generator. The generator and the discriminator play a min–max game: the generator tries to generate, from the semantic attribute vector, data that the discriminator judges as real, while the discriminator tries to distinguish the real feature data from the CNN module from the synthetic data produced by the generator. After sufficient training, the discriminator can hardly distinguish the generated feature vectors from real ones, so the generated feature data can substitute for the missing real data. We train with the Wasserstein GAN with gradient penalty (WGAN-GP) objective proposed by Gulrajani et al. [35], expressed as

$$\begin{aligned} \begin{aligned} \min \limits _{G} \max \limits _{D} L_{GAN}=&\ E_{x \sim p_{r}}[D(x)]-E_{x \sim p_{g}} [D(x)] \\&- \lambda E_{\hat{x} \sim P_{\hat{x}}}\big[ (\parallel \nabla _{\hat{x}}D(\hat{x}) \parallel _{2}-1 )^{2}\big] \end{aligned} \end{aligned}$$

where \(x\sim p_{r}\) denotes real data, \(x \sim p_{g}\) denotes data produced by the generator, and \(\hat{x} \sim P_{\hat{x}}\) denotes points sampled uniformly along straight lines between pairs of real and generated samples. \((\parallel \nabla _{\hat{x}}D(\hat{x}) \parallel _{2}-1 )^{2}\) is the gradient penalty, and \(\lambda\) is the penalty coefficient, with the default \(\lambda =10\). Algorithm 2 describes the generation process for the missing signal classes, where \(f_G(\cdot )\) denotes the generator, \(f_D(\cdot )\) the discriminator, \(f_{CNN}(\cdot )\) the CNN part, and \(y_{\rm unseen}\) the generated feature vectors of the unseen classes.
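The gradient penalty requires the gradient of D at interpolated points. For a two-layer leaky_relu critic matching the discriminator description above, that gradient has a closed form, so the standard WGAN-GP losses can be illustrated in plain NumPy. The critic maximizes E[D(real)] − E[D(fake)] minus the penalty, which is the same as minimizing the quantity returned by `critic_loss`. The critic shape, the leaky_relu slope, and all function names are assumptions; a practical implementation would use a framework's automatic differentiation (the paper uses TensorFlow).

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def critic(x, W1, b1, w2, b2, alpha=0.2):
    """D(x): one leaky_relu hidden layer, then a linear scalar output per row of x."""
    return leaky_relu(x @ W1 + b1, alpha) @ w2 + b2

def critic_input_grad(x, W1, b1, w2, alpha=0.2):
    """Closed-form gradient of D with respect to each input row, shape (batch, dim)."""
    slope = np.where(x @ W1 + b1 > 0, 1.0, alpha)   # derivative of leaky_relu
    return (slope * w2) @ W1.T

def critic_loss(real, fake, params, lam=10.0, rng=None):
    """-E[D(real)] + E[D(fake)] + lam * E[(||grad D(x_hat)||_2 - 1)^2], minimized by the critic."""
    W1, b1, w2, b2 = params
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.uniform(size=(real.shape[0], 1))
    x_hat = eps * real + (1.0 - eps) * fake          # random points between real/fake pairs
    grad_norm = np.linalg.norm(critic_input_grad(x_hat, W1, b1, w2), axis=1)
    penalty = np.mean((grad_norm - 1.0) ** 2)
    return (-np.mean(critic(real, W1, b1, w2, b2))
            + np.mean(critic(fake, W1, b1, w2, b2))
            + lam * penalty)

def generator_loss(fake, params):
    """-E[D(G(a))], minimized by the generator."""
    return -np.mean(critic(fake, *params))

rng = np.random.default_rng(1)
params = (rng.normal(0, 0.1, (128, 256)), rng.normal(0, 0.1, 256), rng.normal(0, 0.1, 256), 0.0)
real, fake = rng.normal(size=(4, 128)), rng.normal(size=(4, 128))  # stand-ins for real/generated features
d_loss = critic_loss(real, fake, params, rng=rng)
g_loss = generator_loss(fake, params)
```

Setting `lam=0.0` recovers the plain WGAN critic loss, which is a convenient way to isolate the effect of the penalty term.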

Classifier module

The classifier module contains a fully connected layer and a Softmax layer. The fully connected layer has 128 neurons and uses the PReLU activation function, and Softmax is the output function of the last layer. They are formulated, respectively, as

$$\begin{aligned} \begin{aligned} f_{\rm PReLU}(x_i) = {\left\{ \begin{array}{ll} x_i, &{} \text{ if } x_i > 0 \\ \lambda x_i, &{} \text{ if } x_i\le 0 \end{array}\right. } \end{aligned} \end{aligned}$$
$$\begin{aligned} \begin{aligned} f_{\rm Softmax}(x_i) = \frac{e^{x_i}}{\sum _{j}e^{x_j}} \end{aligned} \end{aligned}$$

where \(\lambda \in (0,1)\) is a slope parameter learned by backpropagation, which adjusts to the most appropriate value.

The training stage trains the classifier to classify both seen and unseen classes. First, semantic attribute vectors are generated from the semantic descriptions, which yields the semantic vectors of the seen and unseen classes. Then, WGAN-GP is trained with the seen classes and their semantic vectors, after which the generator produces synthetic feature vectors for the unseen classes from their semantic attribute vectors. Finally, both the real and the synthetic feature vectors are fed to the classifier module, whose fully connected and Softmax layers produce the AMC prediction. The final loss function is formulated as

$$\begin{aligned} \begin{aligned} L_{C} = h(y, f(x_{\rm seen}, x_{\rm unseen}\ |\ \theta )) \end{aligned} \end{aligned}$$

where h is the cross entropy, defined as \(h(p,q) =- \sum _{i} p_i \log (q_i)\); y denotes the true label and f the classifier; \(\theta\) denotes the weights of the fully connected layer; \(x_{\rm seen}\) is the CNN feature of the seen classes; and \(x_{\rm unseen}\) is the synthetic feature of the unseen classes.
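The classifier and its loss can be sketched as follows. The extra 128 → 7 projection before Softmax is an assumption inferred from the 7-class output in Fig. 4, the PReLU slope is a single shared scalar here, and the random arrays stand in for real and generated 128-dimensional features.

```python
import numpy as np

def prelu(x, a):
    """PReLU: x for x > 0, a * x otherwise (a is learned during training)."""
    return np.where(x > 0, x, a * x)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)        # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def classifier(feat, W1, b1, a, W2, b2):
    """128-d feature -> FC(128, PReLU) -> FC(7) (assumed projection) -> softmax probabilities."""
    return softmax(prelu(feat @ W1 + b1, a) @ W2 + b2)

def cross_entropy(y_onehot, probs, eps=1e-12):
    """h(p, q) = -sum_i p_i log q_i, averaged over the batch."""
    return float(-np.mean(np.sum(y_onehot * np.log(probs + eps), axis=1)))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.05, (128, 128)), np.zeros(128)
W2, b2 = rng.normal(0, 0.05, (128, 7)), np.zeros(7)
a = 0.25                                        # hypothetical initial PReLU slope
x_seen = rng.standard_normal((6, 128))          # stand-ins for real CNN features
x_unseen = rng.standard_normal((2, 128))        # stand-ins for generated features
x = np.concatenate([x_seen, x_unseen])
y = np.eye(7)[[1, 2, 3, 4, 5, 6, 0, 0]]         # class 0 plays the unseen class here
probs = classifier(x, W1, b1, a, W2, b2)
loss = cross_entropy(y, probs)
pred = probs.argmax(axis=1)                     # test stage: pick the most probable class
```

At test time only the CNN module and this classifier are needed; the generator is used solely to build the training set.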

The test stage determines the category to which a received modulated signal belongs. In an end-to-end AMC system, a signal received by the receiver may belong to either a seen or an unseen class. Its feature vector is extracted by the CNN module and then passed to the classifier module to obtain the inference result.


Dataset and settings

The experimental dataset of in-phase and quadrature (IQ) samples is generated with MATLAB 2019b. It contains seven modulation modes, {BPSK, QPSK, 2ASK, 4ASK, 16QAM, 32QAM, 64QAM}, with SNRs ranging from 0 to 10 dB in 1 dB steps. For each class and each SNR there are 400 modulated signals, of which 300 are used for training and the remaining 100 for testing. In total, the training set contains 23,100 modulated signals and the test set contains 7,700. The semantic description of each modulation mode is shown in Table 3, where “statistical peaks” is the number of peaks of the modulated signal.
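The stated set sizes follow directly from the settings (7 classes × 11 SNR levels × 300 training or 100 test signals), as this short check shows:

```python
classes = ["BPSK", "QPSK", "2ASK", "4ASK", "16QAM", "32QAM", "64QAM"]
snrs = list(range(0, 11))          # 0 to 10 dB in 1 dB steps: 11 levels
per_class_snr, train_per = 400, 300

train_total = len(classes) * len(snrs) * train_per
test_total = len(classes) * len(snrs) * (per_class_snr - train_per)
print(train_total, test_total)     # 23100 7700
```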

Table 3 The semantic description of each category

Four groups of seen classes are considered in the experiments, as presented in Table 4. In Group 1, the BPSK class is unavailable for training, so the data of the seen classes {QPSK, 2ASK, 4ASK, 16QAM, 32QAM, 64QAM} and the semantic descriptions of all classes are used to generate the unseen class. Group 2 is set up similarly. Groups 3 and 4 are tested with two unseen classes. In Group 3, {QPSK, 4ASK} are unavailable, and {BPSK, 2ASK, 16QAM, 32QAM, 64QAM} are used to obtain the unseen classes. In Group 4, {4ASK, 64QAM} are unavailable.

Table 4 The seen and unseen classes of each group

The attribute feature vectors are normalized in this paper to accelerate training and help prevent overfitting. Gaussian noise is also added to the input of the generator, because Gaussian data are easier to map into the CNN feature distribution. In addition, the harmonic mean of the seen-class and unseen-class accuracies is used to reflect the inference ability, defined as

$$\begin{aligned} \begin{aligned} H = \frac{2su}{s+u} \end{aligned} \end{aligned}$$

where s denotes the average classification accuracy on the seen classes and u that on the unseen classes. SigZSLNet is implemented in TensorFlow.
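The harmonic mean rewards balanced performance: it drops sharply when either accuracy is low, so a classifier that ignores the unseen classes cannot score well. A minimal sketch with illustrative accuracies (not results from this paper):

```python
def harmonic_mean(s, u):
    """H = 2su / (s + u); defined as 0.0 when both accuracies are zero."""
    return 0.0 if s + u == 0 else 2 * s * u / (s + u)

balanced = harmonic_mean(0.9, 0.7)     # about 0.7875, below the arithmetic mean of 0.8
lopsided = harmonic_mean(0.95, 0.0)    # 0.0: perfect seen accuracy cannot hide u = 0
```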


Table 5 compares the performance of SigZSLNet across the groups. Groups 1 and 2 each have one missing class, while Groups 3 and 4 each have two missing classes. The recognition accuracy in Table 5 is the average of five experiments. The convolutional part of a pre-trained model can be employed as the CNN part of the proposed SigZSLNet; in the experiments, we use the ResNet pre-trained on rml2018.a [22] as the CNN module. As shown in Table 5, the average classification accuracy of SigZSLNet exceeds 76%. In particular, the accuracy of Groups 1 and 2 exceeds 85%, which indicates that SigZSLNet can effectively perform AMC under zero-sample situations. Compared with Groups 1 and 2, however, the performance of Groups 3 and 4 declines: as the number of missing categories increases, the quality of the data generated for them decreases.

Table 5 The performance comparison among various methods

We attribute this decline to the limited generation ability of the generator's two fully connected layers. Each class occupies its own region of the embedding space, and it is difficult for a generative model to map to several different distributions at the same time. In addition, modulated signals are highly sensitive to the signal-to-noise ratio. Under low SNR conditions, the signal is strongly disturbed and its features are not salient, so the features learned by the generator are not salient either. In our experiments, low-SNR and high-SNR data were mixed, and the model was unable to learn a valid feature representation. Consequently, the recognition accuracy falls along with the data generation quality. Overall, Table 5 suggests that the proposed SigZSLNet is an AMC scheme applicable to realistic zero-sample scenarios.

The test accuracy under various SNRs is shown in Fig. 5, where Groups 1–4 are configured as in Table 5. In Fig. 5, the recognition accuracy first rises with SNR and then declines slightly when the SNR exceeds 6 dB, which indicates that SNR is an important factor in AMC recognition accuracy. In particular, the average classification accuracy of all groups exceeds 85% for SNRs from 6 dB to 9 dB, which we attribute to the GAN generating feature vectors of more balanced quality in this range. In addition, as discussed for Table 5, the recognition accuracy of SigZSLNet gradually deteriorates as the number of unseen classes increases.

In Fig. 6, 50 synthetic feature vectors and 50 real feature vectors of randomly chosen 2ASK signals are visualized. Although the synthetic and real feature vectors differ in detail, they are highly similar, especially in their overall waveform variation trend. This suggests that SigZSLNet can generate realistic feature vectors for zero-sample classes.

As a supplement, Fig. 7 compares the confusion matrices of the groups at SNR = 6 dB, where the BPSK signal is missing in Group 1 and present in Groups 2–4. Fig. 7 shows that the classification accuracy for BPSK is clearly higher in Group 1 than in Groups 2–4, which indicates that the synthetic BPSK feature vectors generated by the GAN module are of higher quality for classification than the real feature vectors produced by the CNN module. This shows that the proposed SigZSLNet can effectively fill in missing classes and improve AMC accuracy under zero-sample situations. Regarding computational complexity, the generator and discriminator each consist of two fully connected layers, and the WGAN module and the classifier together contain 91,009 parameters, which lets the network converge easily. In training, the total number of training samples is 20,900, and each epoch takes 0.628 seconds.

In Fig. 8, we visualize the feature vectors before they enter the classifier module. The 128-dimensional features are reduced to two dimensions with the t-SNE algorithm [36]. Figures 8a–d show the features of the seen and unseen classes of Group 4 at different SNRs. As the previous results show, the classifier is most accurate at an SNR of about 6 dB. In the visualization at 6 dB, the samples of each class cluster together and the classes are easy to distinguish. At SNRs of 0, 4, and 10 dB, the reduced features of BPSK and QPSK are mixed and hard to distinguish, whereas the generated data of the unseen categories 4ASK and 64QAM remain distinguishable. We therefore conclude that accuracy peaks at 6 dB because at the other SNRs the CNN feature vectors of BPSK and QPSK are easily confused, a consequence of the low-quality feature vectors output by the pre-trained CNN.


Simulation results show that the proposed SigZSLNet can generate data for missing modulation classes to complete the training set and thereby solve the zero-sample problem in AMC. In the seven-class classification task, the accuracy exceeded 76% both when one category was missing and when two categories were missing. However, the results also show that the generated data lack diversity and correspond to a single SNR only; the model needs further improvement to generate data across a rich range of SNRs. In addition, the open-source CNN model, trained on only 24 categories of modulated signals, has poor robustness, which limits its performance in our experiments. In future work, we will collect rich modulated signal data to build a dataset similar to ImageNet [37] and COCO [38] and train more robust pre-trained models.


The increasing number of IoT devices means that more traffic will occupy the scarce available spectrum in the future, so regulating and recognizing observed signals becomes extremely important. However, in complex electromagnetic environments, some classes of modulated signals cannot be collected in advance to train the classifier, which calls for a way to recognize signals under zero-sample conditions. In this paper, we propose SigZSLNet to implement AMC under zero-sample conditions. Based on the semantic feature vectors, the feature vectors of the missing modulated signals are generated with WGAN-GP, which greatly improves the classification accuracy of the unseen classes.

The experimental results on the various groups validate the effectiveness of the proposed SigZSLNet. When one modulation class is missing, the proposed method achieves an average accuracy of over 85% on the missing class, corresponding to an accuracy of over 76% for classification over the seen and unseen classes. When two modulation classes are missing, it achieves an average accuracy of more than 69% on the missing classes, corresponding to an accuracy of more than 76% for the seven-class task. We visualized the generated feature vectors for comparison, and we visualized the clustering of each category with the PCA and t-SNE algorithms to demonstrate the validity of the generated data. In conclusion, the experimental results show that the proposed method effectively addresses the AMC task of spectrum resource management for IoT terminal devices.

Availability of data and materials

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.



Abbreviations

AMC: Automatic modulation classification
IoT: Internet of things
SigZSLNet: Signal zero-shot learning network
GAN: Generative adversarial network
6G: The 6th generation mobile communication technology
CNN: Convolutional neural network
DL: Deep learning
CLDNN: Convolutional long short-term memory deep neural network
ATLA: Adversarial transfer learning architecture
CapsNet: Capsule network
SVM: Support vector machine
DT: Decision tree
KNN: K-nearest neighbors
I/Q: In-phase and quadrature
BPSK: Binary phase shift keying
ASK: Amplitude shift keying
QPSK: Quadrature phase shift keying
QAM: Quadrature amplitude modulation
SNR: Signal-to-noise ratio
PReLU: Parametric rectified linear unit
t-SNE: t-distributed stochastic neighbor embedding


  1. C. Yuanhao, L. Fan, J. Xiaojun, M. Junsheng, Integrating sensing and communications for ubiquitous IoT: applications, trends and challenges. arXiv preprint arXiv:2104.11457 (2021)

  2. M. Junsheng, J. Xiaojun, Z. Yangying, G. Yi, Z. Ronghui, Z. Fangpei, Machine learning-based 5G RAN slicing for broadcasting services. IEEE Trans. Broadcast. (2021)

  3. Z. Ronghui, J. Xiaojun, W. Sheng, J. Chunxiao, M. Junsheng, Y.F. Richard, Device-free wireless sensing for human detection: the deep learning perspective. IEEE Internet Things J. 8(4), 2517–2539 (2021)

  4. W. Zijie, L. Rongke, L. Qirui, S. John, T.K. Michel, Energy-efficient data collection and device positioning in UAV-assisted IoT. IEEE Internet Things J. 7(2), 1122–1139 (2020)

  5. L. Qiang, S. Songlin, R. Bo, K. Michel, Intelligent reflective surface based 6G communications for sustainable energy infrastructure. IEEE Wirel. Commun. 28(6), 49–55 (2021)

  6. Z. Ronghui, J. Chunxiao, W. Sheng, Z. Quan, J. Xiaojun, M. Junsheng, Wi-Fi sensing for joint gesture recognition and human identification from few samples in human-computer interaction. IEEE J. Sel. Areas Commun.

  7. C. Na, R. Bo, Z. Xinran, K. Michel, Scalable and flexible massive MIMO precoding for 5G H-CRAN. IEEE Wirel. Commun. 24(1), 46–52 (2017)

  8. C. Lei, F.Y. Richard, J. Hong, R. Bo, L. Xi, C.M.L. Victor, Green full-duplex self-backhaul and energy harvesting small cell networks with massive MIMO. IEEE J. Sel. Areas Commun. 34(12), 3709–3724 (2016)

  9. N. Ahasanun, K. Michel, R. Bo, Fountain coded cooperative communications for LTE-A connected heterogeneous M2M network. IEEE Access 4, 5280–5292 (2016)

  10. M. Junsheng, J. Xiaojun, H. Hai, G. Ning, Subspace-based method for spectrum sensing with multiple users over fading channel. IEEE Commun. Lett. 22(4), 848–851 (2018)

  11. O.A. Dobre, M. Oner, S. Rajan, R. Inkol, Cyclostationarity-based robust algorithms for QAM signal identification. IEEE Commun. Lett. 16(1), 12–15 (2012)

  12. S. Wei, Feature space analysis of modulation classification using very high order statistics. IEEE Commun. Lett. 17(9), 1688–1691 (2013)

  13. B. David, S. Balu, A hybrid ICA-SVM approach to continuous phase modulation recognition. IEEE Signal Process. Lett. 16(5), 402–405 (2009)

  14. H. Sai, Y. Yuanyuan, W. Zhiqing, F. Zhiyong, P. Zhang, Low complexity automatic modulation classification based on order-statistics. IEEE Trans. Wirel. Commun. 16, 400–411 (2017)

  15. E.E. Azzouz, A.K. Nandi, Automatic identification of digital modulation types. Signal Process. 47(1), 55–69 (1995)

  16. M. Stephane, Z. Sifen, Characterization of signals from multiscale edges. IEEE Trans. Pattern Anal. Mach. Intell. 14(7), 710–732 (1992)

  17. C. Corinna, V. Vladimir, Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)

  18. B. Leo, Random forests. Mach. Learn. 45(1), 5–32 (2001)

  19. T. Cover, P. Hart, Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13(1), 21–27 (1967)

  20. N. Friedman, D. Geiger, M. Goldszmidt, Bayesian network classifiers. Mach. Learn. 29(2), 131–163 (1997)

  21. L. Yann, B. Yoshua, H. Geoffrey, Deep learning. Nature 521(7553), 436–444 (2015)

  22. O. Tim, R. Tamoghna, T.C. Charles, Over-the-air deep learning based radio signal classification. IEEE J. Sel. Top. Signal Process. 12(1), 168–179 (2018)

  23. W. Nathan, O. Timothy, Deep architectures for modulation recognition. In: IEEE International Symposium on Dynamic Spectrum Access Networks, pp. 1–6 (2017)

  24. Z. Zufan, L. Hao, W. Chun, G. Chengquan, X. Yong, Automatic modulation classification using CNN-LSTM based dual-stream structure. IEEE Trans. Veh. Technol. 69(11), 13521–13531 (2020)

  25. W. Yu, L. Miao, Y. Jie, G. Guan, Data-driven deep learning for automatic modulation recognition in cognitive radios. IEEE Trans. Veh. Technol. 68(4), 4074–4077 (2019)

  26. H. Sai, J. Yizhou, G. Yue, F. Zhiyong, Z. Ping, Automatic modulation classification using contrastive fully convolutional network. IEEE Wirel. Commun. Lett. 8(4), 1044–1047 (2019)

  27. B. Ke, H. Yuan, J. Xiaojun, H. Jindong, Adversarial transfer learning for deep learning based automatic modulation classification. IEEE Signal Process. Lett. 27, 880–884 (2020)

  28. L. LiXin, H. Junsheng, C. Qianqian, Automatic modulation recognition: a few-shot learning method based on the capsule network. IEEE Trans. Broadcast. 99, 1–1 (2020)

  29. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets. Neural Inf. Process. Syst. 27, 2672–2680 (2014)

  30. A. Martin, C. Soumith, B. Léon, Wasserstein GAN (2017)

  31. G. Ishaan, A. Faruk, A. Martin, D. Vincent, C. Aaron, Improved training of Wasserstein GANs. Neural Inf. Process. Syst. 30, 5767–5777 (2017)

  32. F. Yanwei, H. Timothy, X. Tao, G. Shaogang, Transductive multi-view zero-shot learning. IEEE Trans. Pattern Anal. Mach. Intell. 37, 2332–2345 (2015)

  33. R.-P. Bernardino, T. Philip, An embarrassingly simple approach to zero-shot learning, vol. 37, pp. 2152–2161 (2015)

  34. R. Scott, A. Zeynep, L. Honglak, S. Bernt, Learning deep representations of fine-grained visual descriptions, pp. 49–58 (2016)

  35. G. Ishaan, A. Faruk, A. Martin, D. Vincent, C. Aaron, Improved training of Wasserstein GANs (2017)

  36. M. Laurens, H. Geoffrey, Visualizing data using t-SNE. J. Mach. Learn. Res. 9(86), 2579–2605 (2008)

  37. D. Jia, D. Wei, S. Richard, L. Lijia, L. Kai, L. Feifei, ImageNet: a large-scale hierarchical image database, pp. 248–255 (2009)

  38. T. Lin, M. Michael, S. Belongie, Microsoft COCO: common objects in context, pp. 740–755 (2014)


This research was conducted solely by the authors and received no funding.

Author information




Q.Z was in charge of the main theoretical analysis, algorithm design, experimental simulation, and paper writing. XJ.J and Q.Z contributed to the paper writing. FP.Z and RH.Z contributed to the theoretical analysis and offered suggestions on the paper's organization. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xiaojun Jing.

Ethics declarations

Competing interests

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Zhou, Q., Zhang, R., Zhang, F. et al. An automatic modulation classification network for IoT terminal spectrum monitoring under zero-sample situations. J Wireless Com Network 2022, 25 (2022).




Keywords

  • Internet of things
  • Automatic modulation classification
  • Zero-shot learning
  • Generative adversarial network