# Research on modulation recognition with ensemble learning

Tong Liu^{1}, Yanan Guan^{1} and Yun Lin^{1}

**2017**:179

https://doi.org/10.1186/s13638-017-0949-5

© The Author(s). 2017

**Received:** 21 July 2017

**Accepted:** 8 September 2017

**Published:** 3 November 2017

## Abstract

Modulation scheme recognition occupies a crucial position in civil and military applications. In this paper, we present the boosting algorithm as an ensemble framework to achieve higher accuracy than a single classifier. To evaluate the effect of the boosting algorithm, eight common communication signals are to be identified, and five kinds of entropy are extracted as the training vector. The AdaBoost algorithm based on decision trees is then used to confirm the idea of boosting. The results show that AdaBoost is consistently the superior classifier, while the decision tree, as a weak estimator, is barely satisfactory. In addition, the performance of three different boosting members is compared experimentally. Gradient boosting behaves better than AdaBoost, and xgboost offers the best cost performance.

## 1 Introduction

The rapid progress of radio technology has influenced many fields, such as communication reconnaissance and anti-reconnaissance. The communication modulation scheme is one of the most important technologies in communication reconnaissance and anti-reconnaissance, and it has been widely used in military and civil fields [4–9]. Hence, automatic modulation recognition (AMR) of digital signals deserves more attention.

The central task of AMR is to extract representative parameters or features that define the type of a received unknown signal. Over the past decade, approaches to AMR have mainly fallen into two categories [4, 8–10]. The first is the decision-theoretic method, which applies statistical computation to the digital signal and then casts AMR as a hypothesis test in probability space, using thresholds to obtain the recognition result. However, it has obvious drawbacks: it requires many signal parameters and has high algorithmic complexity. The second approach is pattern recognition. It can be regarded as a mapping from the time-series signal to a feature space, so that recognition depends only on the feature parameters. Compared with the former, pattern recognition has the advantage of easy engineering implementation and a wide range of applications. Generally, a large training set is used to train the classifier, and a much smaller set, called the testing set, is then used to evaluate its performance. These are the primary steps of pattern recognition. However, a critical issue remains: how to choose the classifier. A superior classifier improves the overall recognition result, while a poor one degrades the classification performance. A number of classifiers have been published, but their results are barely satisfactory at low signal-to-noise ratios (SNRs). To improve on this, we use ensemble algorithms instead of a single classifier. Such algorithms, for example bagging and boosting, have shown significant advantages over a single classifier [1–3, 33–37].

The aim of this paper is to survey the performance of different boosting algorithms. Among the various algorithms, AdaBoost, gradient boosting, and extreme gradient boosting are selected. Ensemble algorithms can improve the recognition result by combining a series of base estimators (classifiers). Here, all boosting methods are based on decision trees so that their performance can be compared. The data set is composed of five different information entropies instead of conventional features.

The rest of this paper is organized as follows. Section 2 covers feature extraction. In that part, features are extracted for eight common digital signals: 2ASK, 2FSK, BPSK, 4ASK, 4FSK, QPSK, 16QAM, and 64QAM. Power spectrum entropy, wavelet energy entropy, singular spectrum entropy, sample entropy, and Renyi entropy compose the input data set. Section 3 presents the ensemble learning methodology. Section 4 presents and discusses the experiments. The summary is given in Section 5.

## 2 Feature extraction

Different types of signal affect military and civilian applications differently, and identifying communication signals precisely requires informative features. Many such features are available, including amplitude, phase, frequency, high-order cumulants, and cyclic spectral features [11–17]. Over time, feature extraction has focused not only on time-frequency analysis but also on entropy features [18–21, 38, 39]. The concept of entropy comes from information theory, where it measures the uncertainty of random events. It can be used to measure the uncertainty and complexity of the distribution of signal states. The higher the entropy, the less stable the signal. Because entropy distinguishes signals by their capacity to carry information, it is suitable for AMR. In our work, five different entropies are chosen to represent the signal. Their principles are described below.

### 2.1 Power spectrum entropy

Consider a discrete signal sequence {*x*_{i}, *i* = 1, 2, …, *N*}, which is converted to *X*(*e*^{jw}) by the fast Fourier transform (FFT). The power spectral density *S*(*e*^{jw}) can then be expressed as

\[ S\left({e}^{jw}\right)=\frac{1}{N}{\left|X\left({e}^{jw}\right)\right|}^2 \qquad (1) \]

*S*(*e*^{jw}) is the distribution of power in the frequency domain. Normalizing *S*(*e*^{jw}) gives

\[ {p}_k=\frac{S\left({e}^{j{w}_k}\right)}{\sum_{i=1}^NS\left({e}^{j{w}_i}\right)} \qquad (2) \]

\[ H=-\sum_{k=1}^N{p}_k{\mathrm{log}}_2{p}_k \qquad (3) \]

In (3), *p*_{k} represents the ratio of the *k*th frequency component to the whole spectrum and *H* is the power spectrum entropy. The smaller *H* is, the more the power is concentrated at the main frequency point.
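The steps of this subsection can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the 1/N scaling of the power spectrum, the base-2 logarithm, and the function name `power_spectrum_entropy` are assumptions made for the example.

```python
import numpy as np

def power_spectrum_entropy(x):
    """Shannon entropy (base 2) of the normalized power spectrum of x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)                 # X(e^{jw}) sampled at N frequency bins
    psd = np.abs(spectrum) ** 2 / n          # assumed scaling: S = |X|^2 / N
    p = psd / psd.sum()                      # p_k: share of bin k in the total power
    p = p[p > 0]                             # 0 * log(0) is taken as 0
    return float(-np.sum(p * np.log2(p)))

t = np.arange(256)
tone = np.sin(2 * np.pi * 16 * t / 256)      # power concentrated in two bins
noise = np.random.default_rng(0).standard_normal(256)
h_tone = power_spectrum_entropy(tone)
h_noise = power_spectrum_entropy(noise)
```

A pure tone puts its power into two mirrored FFT bins, so its entropy stays low, whereas white noise spreads power over all bins and gives a much larger value.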

### 2.2 Singular spectrum entropy

Singular spectrum entropy is likewise denoted by *H*. Let the length of the analysis window be *M*, so that the sequence *x*_{i} can be segmented into *N* − *M* vectors. The matrix **A** formed from these segments is given by (4). Decomposing **A** yields the signal information *δ*_{i} on the *N* − *M* basis vectors, which reflects the length of the projection of the signal onto each basis vector.

Singular spectrum entropy reflects the degree of order of the signal information distribution. If *H* is relatively high, the signal order has a higher level.
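A possible NumPy sketch of this feature is given below. It assumes a delay of one sample between successive segments, normalizes the singular values to sum to one, and uses a base-2 logarithm; these choices and the name `singular_spectrum_entropy` are illustrative assumptions, not details confirmed by the text.

```python
import numpy as np

def singular_spectrum_entropy(x, window):
    """Entropy of the normalized singular values of the trajectory matrix of x."""
    x = np.asarray(x, dtype=float)
    rows = x.size - window                       # N - M segments, as in the text
    # Trajectory matrix: row i holds x[i], ..., x[i + window - 1] (delay of 1 assumed).
    a = np.array([x[i:i + window] for i in range(rows)])
    delta = np.linalg.svd(a, compute_uv=False)   # singular values delta_i
    p = delta / delta.sum()                      # assumed normalization of delta_i
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

t = np.arange(256)
sine = np.sin(2 * np.pi * t / 32)
noise = np.random.default_rng(0).standard_normal(256)
h_sine = singular_spectrum_entropy(sine, window=8)
h_noise = singular_spectrum_entropy(noise, window=8)
```

The trajectory matrix of a single sinusoid has only two significant singular values, so `h_sine` cannot exceed about one bit, while the noise value approaches the maximum of three bits for a window of eight samples.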

### 2.3 Wavelet energy entropy

The above equation represents the wavelet transform of the signal *f*(*t*). The wavelet transform handles high-dimensional signals well, whereas the classical analysis method, the FFT, is applied to one-dimensional signals.

At scale *j*, the FFT is applied to the wavelet signal to give **X**(*k*), from which the ratios *p*_{k} are obtained.

So, the wavelet energy entropy is \( H=-\sum_{k=1}^N{p}_k{\mathrm{log}}_2{p}_k \).
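One way to read this procedure is sketched below. The wavelet family and scale are not specified in this text, so the example assumes a Haar wavelet, takes the detail coefficients at a chosen level as the wavelet signal, and forms *p*_{k} from the normalized squared FFT magnitudes. The helper names `haar_detail` and `wavelet_energy_entropy` are hypothetical.

```python
import numpy as np

def haar_detail(x, level):
    """Detail coefficients of a Haar wavelet decomposition at the given level."""
    approx = np.asarray(x, dtype=float)
    detail = approx
    for _ in range(level):
        approx = approx[: approx.size // 2 * 2]          # drop a trailing odd sample
        pairs = approx.reshape(-1, 2)
        detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
        approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    return detail

def wavelet_energy_entropy(x, level=1):
    """Base-2 entropy of the normalized power spectrum of the wavelet signal."""
    coeffs = haar_detail(x, level)
    power = np.abs(np.fft.fft(coeffs)) ** 2              # |X(k)|^2 of the wavelet signal
    p = power / power.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

x = np.random.default_rng(1).standard_normal(256)
h = wavelet_energy_entropy(x, level=2)                   # 64 coefficients at level 2
```

With 64 coefficients at level 2, the entropy is bounded above by log2(64) = 6 bits.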

### 2.4 Renyi entropy

For a two-dimensional distribution *f*(*x*, *y*), the Shannon entropy and the *α*-order Renyi entropy are defined on that distribution.

Here, \( {SPWVD}_{g,h}\left(t,f\right)=\underset{-\infty }{\overset{\infty }{\int }}\underset{-\infty }{\overset{\infty }{\int }}s\left(t-u+\tau /2\right){s}^{\ast}\left(t-u-\tau /2\right)h\left(\tau \right)g(u){e}^{-j2\pi f\tau } d\tau du \), which is obtained by smoothing in the variables *t* and *f* with the window functions *g*(*u*) and *h*(*τ*). SPWVD is the smoothed pseudo Wigner-Ville distribution, a member of Cohen's class; it is used because of the cross terms in the Wigner-Ville distribution (WVD).
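The entropy step itself is short once a time-frequency distribution is available. The sketch below does not compute the SPWVD; it takes any non-negative two-dimensional array as input and assumes the common form of the order-*α* Renyi entropy, log2 of the sum of *p*^*α* divided by (1 − *α*), after normalizing the array to unit sum. The default *α* = 3 and taking absolute values of the input are assumptions for illustration.

```python
import numpy as np

def renyi_entropy(tfd, alpha=3.0):
    """Order-alpha Renyi entropy (base 2) of a two-dimensional distribution."""
    p = np.abs(np.asarray(tfd, dtype=float))   # assumed: negative values are rectified
    p = p / p.sum()                            # normalize to unit total mass
    if np.isclose(alpha, 1.0):                 # alpha -> 1 gives the Shannon entropy
        nz = p[p > 0]
        return float(-np.sum(nz * np.log2(nz)))
    return float(np.log2(np.sum(p ** alpha)) / (1.0 - alpha))

uniform = np.ones((4, 4))                      # energy spread over 16 cells
peak = np.zeros((4, 4))
peak[1, 2] = 1.0                               # energy in a single cell
```

A distribution spread evenly over 16 cells gives 4 bits for any order, and a single concentrated cell gives 0, which is why the measure tracks how complex a time-frequency image is.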

### 2.5 Sample entropy

The distance between the *i*th signal segment *X*(*i*) and each of the others *X*(*j*) is calculated first: *d*[*X*(*i*), *X*(*j*)] = max {|*x*(*i* + *k*) − *x*(*j* + *k*)|}, taken over *k* = 0, …, *m* − 1. Note that *X*(*i*) is composed of *m* consecutive samples of the signal *x*_{i}. Next, a threshold *r* is set, and *φ*^{m}(*r*) is computed from the ratios \( {C}_i^m(r) \).

\( {C}_i^m(r) \) denotes the ratio of the number of segments whose maximum distance to *X*(*i*) is smaller than the threshold to the total number of segments.
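As a concrete illustration, the widely used definition of sample entropy can be written in plain Python as follows. It is a sketch under stated assumptions: self-matches are excluded, the result is −ln(A/B) where B and A count matching template pairs of length *m* and *m* + 1, and the defaults `m=2` and `r=0.2` are illustrative values not taken from the text.

```python
import math
import random

def sample_entropy(x, m=2, r=0.2):
    """Sample entropy -ln(A/B) using the maximum (Chebyshev) distance."""
    n = len(x)

    def count_matches(length):
        # The same n - m starting points are used for both template lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):       # skip self-matches
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist < r:
                    count += 1
        return count

    b = count_matches(m)        # pairs that match over m samples
    a = count_matches(m + 1)    # pairs that still match over m + 1 samples
    if a == 0 or b == 0:
        return float("inf")
    return -math.log(a / b)

rng = random.Random(0)
noisy = [rng.random() for _ in range(100)]
flat = [1.0] * 20
```

A constant sequence is perfectly predictable and yields zero, while an irregular sequence yields a clearly positive value. In practice the threshold is often set relative to the standard deviation of the signal.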

Having described the principle of these features, we use an experimental result to show which types of digital signal each feature discriminates well. A boxplot is used to show the distribution of the data; the ability of the features to extract signal information is presented in the experiment section.

## 3 Classifiers

Boosting originates from the probably approximately correct (PAC) learning model [25]. The concepts of weak and strong learning were proposed by Valiant and Kearns. If the error rate of an algorithm is only slightly below 0.5, meaning that its accuracy is just better than random guessing, the algorithm is considered a weak learner [23, 24]. The next question is how to boost weak learners into strong learners. A polynomial-time boosting method was proposed by Schapire in 1989 [26], and it is the prototype of the boosting algorithm. In recent years, boosting has become popular among classifiers. As a machine learning method, ensemble boosting is devoted to finding rough rules of thumb rather than a single highly accurate prediction rule. In particular, its strengths are resistance to overfitting and a high probability of correct classification in high-dimensional spaces. In this paper, we employ a boosting classifier instead of a single classifier to achieve high accuracy in pattern recognition.

### 3.1 AdaBoost

Boosting is a family of algorithms. In 1995, a modified boosting method, adaptive boosting (AdaBoost), was introduced by Freund and Schapire [27]. AdaBoost is one of the best-known members of the family, and it aims at transforming weak learners into strong ones. An easy way to understand AdaBoost is as a linear combination of these weak learners, or estimators.

In the above process, *w*_{ij} denotes the weight of the *j*th sample in the *i*th round.

*D* is the set of sample weights. AdaBoost combines a series of estimators linearly, and *α*_{m} is the weight, or coefficient, of the *m*th estimator. From the equality, it is clear that *α*_{m} decreases as the error *e*_{m} increases. *Z*_{m} is the normalization factor for *w*_{ij}, which makes the weights a probability distribution. In addition, *w*_{ij} is updated according to the result of the previous round, which is where the word adaptive comes from. The weak learners are therefore not independent of one another but closely linked.
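The roles of the sample weights, *α*_{m}, *e*_{m}, and *Z*_{m} can be seen in a small sketch of the standard two-class AdaBoost procedure. It assumes labels in {−1, +1}, one-dimensional inputs, and threshold stumps as weak learners, with *α*_{m} = ½ ln((1 − *e*_{m}) / *e*_{m}); the eight-class problem of this paper would need a multi-class variant, and the function names are hypothetical.

```python
import math

def fit_stump(xs, ys, weights):
    """Best threshold stump on 1-D inputs: (weighted error, threshold, polarity)."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds=5):
    n = len(xs)
    weights = [1.0 / n] * n                       # D_1: uniform sample weights
    ensemble = []                                 # (alpha_m, threshold, polarity)
    for _ in range(rounds):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)     # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)   # alpha_m falls as e_m rises
        preds = [polarity if x >= threshold else -polarity for x in xs]
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
        z = sum(weights)                          # Z_m: normalization factor
        weights = [w / z for w in weights]
        ensemble.append((alpha, threshold, polarity))
    return ensemble

def predict(ensemble, x):
    score = sum(a * (pol if x >= thr else -pol) for a, thr, pol in ensemble)
    return 1 if score >= 0 else -1

xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost(xs, ys, rounds=3)
```

No single stump can separate this toy data set, but three weighted stumps classify all ten points correctly, because each round raises the weights of the samples the previous stump got wrong.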

The performance of an ensemble method is closely related to its weak learners. In practice, AdaBoost with decision trees has been called the best off-the-shelf classifier [28]. Therefore, the whole simulation is based on a single weak classifier, the decision tree.

### 3.2 Gradient boosting

Another stagewise boosting member is gradient boosting (GB), derived by Friedman [29, 30]. The principal idea of gradient boosting is to construct each new model along the negative gradient of the loss function of the model from the previous iterations. In machine learning, the loss function is the key quantity to minimize; it expresses the discrepancy between prediction and target. The smaller the loss, the higher the precision. If the loss declines steadily over the iterations, the model is changing in a favorable direction, and the negative gradient of the loss function gives that direction.

The goal is to find an approximation \( \widehat{F} \) of the function *F*(*x*). Here, the loss function is defined as *L*(*y*, *F*(*x*)).

*F* denotes a linear combination of weak learners *G*_{i}(*x*) with weights *γ*_{i}, and \( \widehat{F} \) minimizes the value of the loss function on the input vectors. The algorithm starts by initializing a constant function *F*_{0}(*x*) and then adds one weak learner per iteration.

As with AdaBoost, if the decision tree is selected as the estimator, the algorithm becomes the gradient boosting decision tree (GBDT), an outstanding classifier that can be applied in many fields.
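The stagewise idea can be illustrated with a short sketch for squared loss, where the negative gradient is simply the residual *y* − *F*(*x*). The example assumes one-dimensional inputs, regression stumps as weak learners, and a fixed shrinkage factor `learning_rate` in place of fitted weights *γ*_{i}; all names are hypothetical and the setup is far simpler than a full GBDT.

```python
def fit_regression_stump(xs, residuals):
    """Least-squares stump on 1-D inputs: (threshold, left value, right value)."""
    best = None
    for threshold in sorted(set(xs))[1:]:          # both sides stay non-empty
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    return best[1:]

def gradient_boost(xs, ys, rounds=50, learning_rate=0.1):
    f0 = sum(ys) / len(ys)                         # F_0: constant minimizing squared loss
    preds = [f0] * len(xs)
    stumps = []
    for _ in range(rounds):
        # Negative gradient of (y - F)^2 / 2 with respect to F is the residual y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, lv, rv = fit_regression_stump(xs, residuals)
        preds = [p + learning_rate * (lv if x < threshold else rv)
                 for x, p in zip(xs, preds)]
        stumps.append((threshold, lv, rv))
    return f0, stumps, learning_rate

def gb_predict(model, x):
    f0, stumps, lr = model
    return f0 + lr * sum(lv if x < thr else rv for thr, lv, rv in stumps)

xs = list(range(10))
ys = [0.0] * 5 + [1.0] * 5
model = gradient_boost(xs, ys)
```

Each round removes a tenth of the remaining residual on this step-shaped target, so after 50 rounds the predictions are within about 0.003 of the true values.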

As mentioned at the beginning of this section, the boosting family has many member algorithms, of which gradient boosting and AdaBoost are two common ones. Viewed abstractly, both are solved with the help of a convex loss function, but gradient boosting admits more types of loss function. Moreover, GB can handle both regression and classification. For classification, log loss is the usual objective function for GB, while AdaBoost uses exponential loss. The fundamental difference lies in how each method identifies the shortcomings of the current model: AdaBoost uses the misclassifications to adjust the weights of the weak learners, whereas GB uses the negative gradient to improve the model.

### 3.3 Extreme gradient boosting

In recent years, data mining and data analysis have become hot topics with the rise of AlphaGo, and terms such as big data and artificial intelligence are everywhere. The boosting family has also evolved over time. A novel boosting method, extreme gradient boosting, or simply xgboost, has come to prominence on Kaggle.

Xgboost, an implementation of GBDT, offers a novel end-to-end tree boosting system [31, 32]. The algorithm has advantages in distributed computing, in handling sparse data, and in avoiding overfitting. In other words, the amount of computation is greatly reduced and the split direction is learned automatically. To counter overfitting, regularization terms are appended to the objective.

We take *l* as the training loss function as above, and *L* is the actual loss function optimized by the xgboost method. The other notations are consistent with those of the boosting methods described earlier.

*G* is the weak estimator (a decision tree) and *F* represents the prediction. Moreover, the complexity of the decision trees, *Ω*(*G*_{m}), is added to the loss to construct the objective function. The regularization term *Ω*(*G*_{m}) is defined as

\[ \Omega \left({G}_m\right)=\gamma T+\frac{1}{2}\lambda \sum_{j=1}^T{w}_j^2 \]

*T* is the number of leaves of the decision tree and the sum of *w*_{j}^{2} is the squared L2 norm of the leaf scores. *γ* is the threshold that controls the splitting of nodes, and *λ* is the coefficient that prevents overfitting, which is a distinctive characteristic of xgboost. More details are given in reference [31]. The objective is then approximated by a second-order expansion of the loss.

From the expansion, the other main variables are \( {g}_i={\partial}_{F^{m-1}}l\left({y}_i,{F}_i^{m-1}\right) \) and \( {f}_i={\partial}_{F^{m-1}}^2l\left({y}_i,{F}_i^{m-1}\right) \), the first and second derivatives of the loss function.
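To show how *g*_{i}, *f*_{i}, *λ*, and *γ* interact, the sketch below evaluates the closed-form leaf score and split gain given in the xgboost paper [31]: the optimal leaf score is −Σ*g* / (Σ*f* + *λ*), and a split is worth making only when its gain, after subtracting *γ*, is positive. The function names and the toy gradients (squared loss with all predictions at zero, so *g* = *F* − *y* and *f* = 1) are illustrative assumptions.

```python
def leaf_weight(grads, hess, lam):
    """Optimal leaf score w* = -sum(g) / (sum(f) + lambda)."""
    return -sum(grads) / (sum(hess) + lam)

def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Reduction of the regularized objective when one leaf is split in two."""
    def score(g, h):
        return sum(g) ** 2 / (sum(h) + lam)
    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma

# Targets y = [1, 1, -1, -1] with current prediction F = 0 under squared loss.
g_left, h_left = [-1.0, -1.0], [1.0, 1.0]
g_right, h_right = [1.0, 1.0], [1.0, 1.0]

w_left = leaf_weight(g_left, h_left, lam=1.0)
gain_cheap = split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0)
gain_costly = split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=2.0)
```

With *λ* = 1 the left leaf score is shrunk from 1 to 2/3, and raising *γ* from 0 to 2 turns the gain of the same split from 4/3 to −2/3, so the split would be rejected. This is how the two regularization parameters limit tree complexity.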

## 4 Experiments

After the methodology, this part presents an AMR experiment to evaluate the ensemble methods. We consider eight modulation schemes: 2ASK, 2FSK, BPSK, 4ASK, 4FSK, QPSK, 16QAM, and 64QAM. For every signal, the sampling rate is 16 kHz and the carrier frequency is 4 kHz. The other parameters are the number of symbols (125), the symbol rate (1000 symbols per second), and the signal length (2000 samples). The SNR ranges from −10 to 20 dB in steps of 3 dB.
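For orientation, the stated parameters can be turned into a test signal as follows. The sketch generates only BPSK, and it assumes rectangular pulses, a real passband carrier, and additive white Gaussian noise scaled to the requested SNR; none of these details, nor the function name `bpsk_with_noise`, is specified in the text.

```python
import math
import random

FS = 16_000          # sampling rate in Hz
FC = 4_000           # carrier frequency in Hz
SYMBOL_RATE = 1_000  # symbols per second
N_SYMBOLS = 125

def bpsk_with_noise(snr_db, rng):
    """One BPSK burst of 2000 samples with Gaussian noise at the given SNR."""
    samples_per_symbol = FS // SYMBOL_RATE                 # 16 samples per symbol
    bits = [rng.choice((-1.0, 1.0)) for _ in range(N_SYMBOLS)]
    clean = [bits[n // samples_per_symbol] * math.cos(2 * math.pi * FC * n / FS)
             for n in range(N_SYMBOLS * samples_per_symbol)]
    signal_power = sum(s * s for s in clean) / len(clean)
    noise_power = signal_power / 10 ** (snr_db / 10)       # SNR = P_signal / P_noise
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in clean]

snr_grid = list(range(-10, 21, 3))                         # -10, -7, ..., 20 dB
signal = bpsk_with_noise(snr_grid[0], random.Random(0))
```

The parameters are mutually consistent: 125 symbols at 16 samples per symbol give the stated length of 2000 samples.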

The data set is divided into training data and testing data. The training set contains 8000 samples at every SNR, while the testing set contains 4000 samples. Every modulation scheme contributes 1500 samples at each SNR.

### 4.1 AdaBoost with weak learners

The ultimate goal is to build a combined system that improves on the performance of each weak classifier. In this part, the experiment compares a single classifier with the ensemble. The decision tree (DT) is the only candidate weak learner. The result is shown in the graph.

However, the above picture shows a worrying phenomenon. When the SNR reaches 5 dB, the DT's performance drops drastically. We must therefore consider whether the cause lies in the features or in the base classifier. The following picture explains the reason.

The confusion matrix is used to show the details of classification. Each column denotes the real target, while each row denotes the predicted label. It is primarily a visualization tool for comparing the classification results with the actual values.
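A confusion matrix with this orientation (rows for predicted labels, columns for real targets) can be built with a few lines of Python. The labels below are made-up toy values chosen only to demonstrate the layout; they are not results from the experiment.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count matrix with one row per predicted label and one column per true label."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(true_labels, predicted_labels):
        matrix[index[guess]][index[truth]] += 1    # row: predicted, column: true
    return matrix

classes = ["2FSK", "QPSK", "16QAM"]
truth = ["2FSK", "QPSK", "QPSK", "16QAM", "16QAM", "2FSK"]     # hypothetical targets
guess = ["2FSK", "2FSK", "QPSK", "2FSK", "16QAM", "2FSK"]      # hypothetical predictions
matrix = confusion_matrix(truth, guess, classes)
accuracy = sum(matrix[i][i] for i in range(len(classes))) / len(truth)
```

Off-diagonal entries in the 2FSK row count signals of other types that were labeled 2FSK, which is the kind of error discussed for QPSK and QAM in this section.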

The left matrix shows the accuracy of a decision tree with 4 layers. For each signal, there are 500 samples to test the model. The model identifies the digital signals adequately except for QPSK, 16QAM, and 64QAM. It confuses QPSK and QAM with 2FSK, assigning them the wrong label, i.e., 2FSK. Moreover, none of them escapes this misclassification, which means that the five entropy features no longer work: they have lost their distinguishing characteristics.

### 4.2 Comparison of boosting

We can conclude that the AdaBoost algorithm successfully converts the base learner into a strong one. Like other algorithms, boosting is a family of methods. Many variants have been proposed by researchers over the past decades. Two popular tree-based algorithms stand out among these boosting members. The next experiment compares these popular algorithms with the classical one.

Recognition probability of the different ensemble learning classifiers

SNR | AdaBoost | GBDT | Xgboost
---|---|---|---
−10 dB | 0.783 | 0.8501 | 0.8465
−7 dB | 0.894 | 0.928 | 0.928
−4 dB | 0.993 | 0.993 | 0.993
−3 dB | 1.000 | 1.000 | 1.000
−1 dB | 1.000 | 1.000 | 1.000
2 dB | 1.000 | 1.000 | 1.000
5 dB | 1.000 | 1.000 | 1.000
8 dB | 1.000 | 1.000 | 1.000
11 dB | 1.000 | 1.000 | 1.000
14 dB | 1.000 | 1.000 | 1.000
17 dB | 1.000 | 1.000 | 1.000
20 dB | 1.000 | 1.000 | 1.000

As before, the confusion matrix is used to show the classification details. We want to analyze the probability of correct classification for every signal.

## 5 Conclusions

To enhance the probability of recognizing digital communication signals, this paper introduces ensemble learning based on boosting. All three boosting algorithms obtain higher accuracy than the weak classifier. First, five different information entropies of the communication signals are extracted as the input training data set of the classifiers. A boxplot is used to show the distribution of the features, and it also reveals a similarity among QPSK, 16QAM, and 64QAM. The experiment then compares the AdaBoost algorithm with the decision tree algorithm at different tree depths. The result shows that AdaBoost improves the performance of the decision tree even though the entropy features work badly when the SNR is over 2 dB. Finally, a further experiment is conducted to confirm the properties of each boosting member. The table of recognition results shows that gradient boosting is slightly superior to classical AdaBoost. The state-of-the-art boosting algorithm, xgboost, may be more suitable for modulation scheme classification, with fewer iterations and higher precision.

## Declarations

### Acknowledgements

This work is supported by the National Natural Science Foundation of China (61301095), the Key Development Program of Basic Research of China (JCKY2013604B001), and the Fundamental Research Funds for the Central Universities (GK2080260148 and HEUCF1508). We thank the reviewers for their very useful discussions.

### Funding

The research is funded by the International Exchange Program of Harbin Engineering University for Innovation-oriented Talents Cultivation.

### Authors’ contributions

The authors have contributed jointly to all parts of the preparation of this manuscript, and all authors read and approved the final manuscript.

### Competing interests

The authors declare that they have no competing interests.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## References

1. E Bauer, R Kohavi. An empirical comparison of voting classification algorithms: bagging, boosting, and variants. Mach. Learn. **36**(1), 105–139 (1999)
2. J Bergstra, N Casagrande, D Erhan, et al. Aggregate features and AdaBoost for music classification. Mach. Learn. **65**(2–3), 473–484 (2006)
3. P Viola, M Jones. *Fast and Robust Classification Using Asymmetric AdaBoost and a Detector Cascade. Advances in Neural Information Processing Systems* (2002), pp. 1311–1318
4. A Hossen, F Al-Wadahi, JA Jervase. Classification of modulation signals using statistical signal characterization and artificial neural networks. Eng. Appl. Artif. Intell. **20**(4), 463–472 (2007)
5. E Azzouz, AK Nandi. Automatic Modulation Recognition of Communication Signals. J Franklin. Instit. **46**(4), 431–436 (1998)
6. D Grimaldi, S Rapuano, L De Vito. An automatic digital modulation classifier for measurement on telecommunication networks. IEEE Trans. Instrum. Meas. **56**(5), 1711–1720 (2007)
7. OA Dobre, A Abdi, Y Bar-Ness, et al. Survey of automatic modulation classification techniques: classical approaches and new trends. IET Commun. **1**(2), 137–156 (2007)
8. JL Xu, W Su, M Zhou. Likelihood-ratio approaches to automatic modulation classification. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. **41**(4), 455–469 (2011)
9. A Swami, BM Sadler. Hierarchical digital modulation classification using cumulants. IEEE Trans. Commun. **48**(3), 416–429 (2000)
10. Z Tian, Y Tafesse, BM Sadler. Cyclic feature detection with sub-Nyquist sampling for wideband spectrum sensing. IEEE. J. Top. Signal. Process. **6**(1), 58–69 (2012)
11. OA Dobre, Y Bar-Ness, W Su. Higher-order cyclic cumulants for high order modulation classification. Military Communications Conference, 2003. MILCOM'03. 2003 IEEE. IEEE. **1**, 112–117 (2003)
12. CM Spooner. Classification of co-channel communication signals using cyclic cumulants. Signals, Systems and Computers, 1995. 1995 Conference Record of the Twenty-Ninth Asilomar Conference on IEEE. **1**, 531–536 (1995)
13. E Like, VD Chakravarthy, P Ratazzi, et al. Signal classification in fading channels using cyclic spectral analysis. EURASIP J. Wirel. Commun. Netw. **2009**(1), 879812 (2009)
14. A Polydoros, K Kim. On the detection and classification of quadrature digital modulations in broad-band noise. IEEE Trans. Commun. **38**(8), 1199–1211 (1990)
15. LV Dominguez, JMP Borrallo, JP Garcia, et al. A general approach to the automatic classification of radiocommunication signals. Signal Process. **22**(3), 239–250 (1991)
16. OA Dobre, M Oner, S Rajan, et al. Cyclostationarity-based robust algorithms for QAM signal identification. IEEE Commun. Lett. **16**(1), 12–15 (2012)
17. WA Gardner. The spectral correlation theory of cyclostationary time-series. Signal Process. **11**(1), 13–36 (1986)
18. SU Pawar, JF Doherty. Modulation recognition in continuous phase modulation using approximate entropy. IEEE. Trans. Inf. Forensics. Secur. **6**(3), 843–852 (2011)
19. SM Pincus. Approximate entropy as a measure of system complexity. Proc. Natl. Acad. Sci. **88**(6), 2297–2301 (1991)
20. H Kantz, T Schreiber. Nonlinear time series analysis. Technometrics **43**(4), 491 (1999)
21. S Kadambe, Q Jiang. Classification of modulation of signals of interest. Digital Signal Processing Workshop, 2004 and the 3rd IEEE Signal Processing Education Workshop. 2004 IEEE 11th. IEEE, 226–230 (2004)
22. RG Baraniuk, P Flandrin, AJEM Janssen, et al. Measuring time-frequency information content using the Rényi entropies. IEEE Trans. Inf. Theory **47**(4), 1391–1409 (2001)
23. M Kearns. Learning Boolean Formulae or Finite Automata is as Hard as Factoring. Technical Report TR-14-88, Harvard University Aiken Computation Laboratory (1988)
24. M Kearns, L Valiant. Cryptographic limitations on learning Boolean formulae and finite automata. JACM **41**(1), 67–95 (1994)
25. LG Valiant. A theory of the learnable. Commun. ACM **27**(11), 1134–1142 (1984)
26. RE Schapire. The strength of weak learnability. Mach. Learn. **5**(2), 197–227 (1990)
27. Y Freund, RE Schapire. *A Decision-Theoretic Generalization of On-line Learning and an Application to Boosting. European Conference on Computational Learning Theory* (Springer, Berlin, Heidelberg, 1995), pp. 23–37
28. L Breiman. Bagging predictors. Mach. Learn. **24**(2), 123–140 (1996)
29. JH Friedman. Greedy function approximation: a gradient boosting machine. Ann. Stat., 1189–1232 (2001)
30. JH Friedman. Stochastic gradient boosting. Comput. Stat. Data. Anal. **38**(4), 367–378 (2002)
31. T Chen, C Guestrin. XGBoost: A Scalable Tree Boosting System (ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2016), pp. 785–794
32. L Torlay, M Perrone-Bertolotti, E Thomas, et al. Machine learning–XGBoost analysis of language networks to classify patients with epilepsy. Brain. Informatics. **11**, 1–11 (2017)
33. Y Freund, RE Schapire. Experiments with a new boosting algorithm. Icml **96**, 148–156 (1996)
34. Y Freund, R Iyer, RE Schapire, et al. An efficient boosting algorithm for combining preferences. J. Mach. Learn. Res. **4**, 933–969 (2003)
35. H Drucker, R Schapire, P Simard. *Improving Performance in Neural Networks Using a Boosting Algorithm*, Advances in neural information processing systems (1993), pp. 42–49
36. NC Oza. Online bagging and boosting. Systems, man and cybernetics, 2005 IEEE international conference on IEEE. **3**, 2340–2345 (2005)
37. MC Tu, D Shin, D Shin. Effective diagnosis of heart disease through bagging approach. Biomedical Engineering and Informatics, 2009. BMEI'09. 2nd International Conference on IEEE. 1–4 (2009)
38. J Li, Y Li, Y Lin. The application of entropy analysis in radiation source feature extraction. J Projectiles Rockets Missiles Guid **31**(5), 155–160 (2011)
39. ZY He, YM Cai, QQ Qian. A study of wavelet entropy theory and its application in electric power system fault detection. Proc CSEE **5**, 006 (2005)
40. J Li, Y Ying. Radar signal recognition algorithm based on entropy theory. Systems and Informatics (ICSAI), 2014 2nd International Conference on IEEE. 718–723 (2014)