Non-commutative large entries for cognitive radio applications

Cognitive radio has been proposed as a solution to the underutilization of the radio spectrum. Indeed, measurements have shown that large portions of frequency bands are not efficiently assigned, since large pieces of bandwidth are unused or only partially used. In the last decade, studies in different areas, such as signal processing, random matrix theory, information theory, and game theory, have brought us to the current state of cognitive radio research. These theoretical advancements represent a solid base for practical applications and further developments. However, open questions still need to be answered. In this study, free probability theory, through the free deconvolution technique, is used to attack the problem of retrieving useful information from the network with a finite number of observations. Free deconvolution, based on the moments method, has proven to be a helpful approach to this problem. After giving the general idea of free deconvolution for known models, we show how the moments method works in the case where scalar random variables are considered. Since, in general, more complex systems are involved, the parameters of interest are no longer scalar random variables but random vectors and random matrices. Random matrices are non-commutative operators with respect to the matrix product, and they can be considered elements of what is called a non-commutative probability space. Therefore, we focus on the case where random matrices are considered. Concepts from combinatorics, such as non-crossing and crossing partitions, are useful tools to express the moments of Gaussian and Vandermonde matrices, respectively. Our analysis and simulation results show that the free deconvolution framework can be used for studying relevant information in cognitive radio, such as power detection and user detection.


Introduction
Recent studies [1] have shown that future communication systems should be designed to adapt to their environment in order to tackle the underutilization of a precious resource such as the radio spectrum. Measurements have shown that large portions of frequency bands are not efficiently used, that is, for most of the time, large pieces of bandwidth are unoccupied or only partially occupied [2]. A possible solution, introduced by Mitola [3,4], is represented by cognitive networks, which can be thought of as self-learning, adaptive, and intelligent networks. In cognitive networks, unlicensed (secondary) systems improve spectral efficiency by sensing the environment and opportunistically filling the discovered spectrum holes (or white spaces) of licensed (primary) systems, which have the exclusive right to operate in a certain spectrum band [5]. The current development of microelectronics allows us to suppose that these wireless systems, for which spectrum utilization will play a key role, will be realized in the near future. These systems provide an efficient utilization of the radio spectrum based on the understanding-by-building methodology, learning from the environment and adapting their parameters to statistical variations in the input stimuli [6].
The current development in cognitive radio research is the result of a multidisciplinary effort that allows us to analyze different aspects of cognitive radio. We identify signal processing, game theory, information theory, and random matrix theory, among others, as enabling areas for the development of cognitive radio.
Signal processing plays a major role in designing cognitive wireless networks, especially in spectrum sensing to identify spectrum opportunities and in the design of cognitive spectrum access to exploit the identified spectrum holes. We refer to spectrum sensing as the process where devices look for a signal in the presence of noise in a given frequency band. Several digital signal processing techniques, such as matched filtering, energy detection, and cyclostationary feature detection, are analyzed in [7,8] to improve radio sensitivity and detect the presence of primary users. In [9], it is proved that the energy detector is an efficient spectrum sensing technique when the secondary user has limited information on the primary user's waveform, i.e., only the power of the local noise is known. The authors of [10] formulate the spectrum sensing problem as a nonlinear optimization problem, minimizing the interference to the primary user while meeting the requirement of opportunistic spectrum utilization. Cooperation between users follows from two constraints: (1) secondary users should not interfere with the primary transmissions, and they should be able to detect the primary signal even if decoding the signal is impossible [9]; (2) secondary users are in general not aware of the exact transmission scheme used by primary users. Cooperation among all cognitive users operating in the same band reduces the detection time and increases the overall agility with which cognitive users are able to shift bands [11-13]. Cooperation is designed in [14] as joint detection among all the cooperating users and in [15] as a fusion center that makes the final decision about the occupancy of the band by fusing the decisions made by all cooperating users. In [16], cooperation is analyzed for the scenario with partial channel state information (CSI) at the secondary users.
From a game-theoretic point of view, spectrum sharing may be considered as a competition. The importance of studying cognitive radio networks in a game-theoretic framework is manifold. By modeling dynamic spectrum sharing between users as a game, users' behaviors and actions can be analyzed in a formalized structure, where the theoretical results of game theory can be fully applied [17,18]. The optimization of spectrum usage is generally a multiobjective optimization problem, which is very difficult to analyze and to solve. Moreover, game theory provides us with game models that predict the convergence and stability of networks [19]. In [20], a game-theoretic adaptive channel allocation scheme is proposed for cognitive radio networks. In particular, a game is formulated to analyze the selfish and cooperative behaviors of the players. The players of this game are the wireless nodes, and their strategies are defined in terms of channel selection. In [21], the convergence dynamics of different types of games in cognitive radio systems are studied. Then, a game-theoretic framework is proposed for distributed power control to achieve agility in spectrum usage in a cognitive radio network.
Information theory is used to characterize the achievable rates in a cognitive radio network under different assumptions on how the secondary systems interfere with the primary ones. A fundamental understanding of the capacity of cognitive systems is provided in [22-26]. Using recent results on random matrix theory, the authors of [27,28] propose a new method for signal detection in cognitive radio, based on the eigenvalues of the covariance matrix of the received signal at the secondary users. In [29], a spectrum sensing technique is proposed that relies on multiple receivers to infer the structure of the received signals using random matrix theory. The authors show that their technique is quite robust and does not require knowledge of the signal or noise statistics. These methods do not require any prior information on the primary signal or on the noise power. In [30,31], two hypothesis tests for detecting the presence of an unknown transmitter using several sensors are proposed, and random matrix theory is used to provide the error associated with both tests.
A crucial point in the development of cognitive radio is understanding how much can be inferred about the network from just a few observations. In the current study, we use free probability theory, through the concept of free deconvolution, to handle the problem of retrieving useful information from the network with a limited number of observations. Free deconvolution, based on the moments method, has proven to be an interesting tool to attack this problem.
In cognitive random networks, devices are autonomous and should take optimal decisions based on their sensing capabilities. We are particularly interested in measures such as capacity, signal-to-noise ratio, and estimation of the signal power. Such measures are usually related to the eigenvalues of the channel matrix and not to its specific structure (the eigenvectors). The fact that the spectrum of a stationary process is related to the information measure of the underlying process dates back to Kolmogorov [32]. The entropy rate of a stationary Gaussian stochastic process can be expressed by
$$H = \log(\pi e) + \frac{1}{2\pi}\int_{-\pi}^{\pi} \log S(f)\, df,$$
where $S$ is the spectral density of the considered process. Therefore, a complete characterization of the information contained in the process is available when the autocorrelation of the process is known. Moreover, the authors of [33,34] have shown that the entropy rate is also related to the minimum mean squared error (MMSE) of the best estimator of the process given the infinite past. In wireless communication, this means that it is possible to retrieve one quantity from the other, especially as many receivers incorporate an MMSE component. In the discrete case, the entropy rate per dimension (or differential entropy) of a Gaussian stochastic process $x_i$ of size $n$ is given by
$$H = \log(\pi e) + \frac{1}{n}\log\det(R) = \log(\pi e) + \frac{1}{n}\sum_{i=1}^{n}\log\lambda_i,$$
where $R = E[x_i x_i^H]$ is the covariance and $\lambda_i$ are its eigenvalues. Knowledge of these eigenvalues provides us with the information on Gaussian networks. In fact, in order to estimate the rate, and by extension the capacity (which is the difference between two differential entropies) or any other measure involving performance criteria, one needs to compute the eigenvalues of the covariance. For a number of observations $K$ of the vector $x_i$, $i = 1, \ldots, K$, the covariance $R$ is usually estimated by
$$\hat{R} = \frac{1}{K}\sum_{i=1}^{K} x_i x_i^H = R^{1/2} S S^H R^{1/2}, \qquad (1)$$
where $S = [s_1, \ldots, s_K]$ is an $n \times K$ matrix with i.i.d. zero-mean Gaussian entries of variance $1/K$.
In cognitive random networks, the number of samples $K$ is of the same order as $n$, due to the high mobility of the network and to the fact that the statistics are considered to be the same only within a window of $K$ samples. Because of this, classical asymptotic signal processing techniques are no longer efficient, since they require a number of samples $K \gg n$. Therefore, our main problem consists in retrieving information within a window of limited samples. In this sense, free probability theory, through the concept of free deconvolution, is a very appealing framework for the study of cognitive networks. The main advantage of the free deconvolution framework is that it provides us with helpful techniques to obtain useful information from a finite number of observations. The deconvolution framework comes here from the fact that we would like to invert Equation (1) and express $R$ with respect to $\hat{R}$, since we only have access to the sample covariance matrix. This is not possible in general; however, one can compute the eigenvalues of $R$ knowing only the eigenvalues of $\hat{R}$.
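The effect of a window with $K$ of the same order as $n$ can be illustrated numerically. The following is a minimal sketch under stated assumptions: the true covariance is taken to be $R = I$, the entries are complex Gaussian, and the function name and matrix sizes are chosen here only for illustration. For this model the expected normalized second moment of the sample covariance is $1 + n/K$, whereas the true second moment of $R$ is 1, so the bias does not vanish unless $K \gg n$.

```python
import random


def sample_covariance_second_moment(n, K, seed=0):
    """Return (1/n) Tr(R_hat^2) for R_hat = S S^H, i.e. for true covariance R = I.

    S is n x K with i.i.d. zero-mean complex Gaussian entries of variance 1/K
    (illustrative assumption: circularly symmetric entries).
    """
    rng = random.Random(seed)
    sigma = (1.0 / (2.0 * K)) ** 0.5  # std of the real and imaginary parts
    S = [[complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(K)]
         for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i, n):
            # Entry (i, j) of the Hermitian matrix S S^H.
            r_ij = sum(a * b.conjugate() for a, b in zip(S[i], S[j]))
            weight = 1.0 if i == j else 2.0  # count (i, j) and (j, i) together
            total += weight * abs(r_ij) ** 2
    return total / n  # Tr(R_hat^2) equals the sum of |R_hat[i][j]|^2


# Few observations (K = n) versus many observations (K = 20 n).
m2_few = sample_covariance_second_moment(n=40, K=40)
m2_many = sample_covariance_second_moment(n=40, K=800)
print(m2_few, m2_many)
```

With $K = n$ the second moment is close to 2 rather than 1, which is the kind of distortion that deconvolution is meant to undo.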
In the following, the general idea of free deconvolution is presented. We show how the moments method works in the case where scalar random variables are considered. However, since in practical situations systems are more complex, the parameters of interest are no longer scalar random variables and need to be represented by random matrices. Therefore, we analyze the case where random matrices are considered. We analyze the moments method for matrices that exhibit the freeness property, and we show that it can be used to propose algorithmic methods to compute the moments of finite Gaussian random matrices. Moreover, we analyze the case of matrices for which freeness does not hold: Vandermonde, Hankel, and Toeplitz matrices. Finally, we present applications showing how the moments method can be used for studying cognitive radio: power detection, user detection, etc. In the last section, we discuss our results, presenting conclusions and open problems.

Information plus noise model
The example given in (1) is rarely met in practice in wireless communication, since the transmitted signal $s_i$ is usually distorted by a medium, giving $m_i = f(s_i)$ with $f$ any function, and the received signal $y_i$ is altered by some additive noise $n_i$. We consider a finite number $K$ of observations of the following $n \times 1$ received signal, known as the Information plus Noise model,
$$y_i = m_i + n_i,$$
which can be rewritten in matrix form, stacking all observations, as
$$Y = M + N,$$
with $M$ and $N$ $K \times n$ independent random matrices. We are interested in retrieving information about the transmitted signal from the received signal, more explicitly in obtaining the eigenvalues of $MM^H$ from the eigenvalues of $YY^H$ and $NN^H$. This is exactly the goal of deconvolution.
In more general terms, the idea of deconvolution is related to the following problem [35]. Given $A$ and $B$, two $n \times n$ independent square complex Hermitian (or real symmetric) random matrices: (1) Can one derive the eigenvalue distribution of $A$ from those of $A + B$ and $B$? If feasible in the large-$n$ limit, this operation is named additive free deconvolution. (2) Can one derive the eigenvalue distribution of $A$ from those of $AB$ and $B$? If feasible in the large-$n$ limit, this operation is named multiplicative free deconvolution.
The techniques generally used to compute the operation of deconvolution in the large-$n$ limit are the moments method [35] and the Stieltjes transform method [36]. Each of these methods has its advantages and its drawbacks. The moments method only works for measures with moments and characterizes the convolution only by giving its moments, but it is easily implementable and, in many applications, one needs only a subset of the moments, depending on the number of parameters to be estimated. The Stieltjes transform method, instead, works for any measure and allows, when computations are possible, the densities to be recovered. Unfortunately, this method works only in very few cases, since the necessary operations are almost always impossible to implement in practice. Moreover, combining patterns of matrices naturally leads to more complex equations for the Stieltjes transform, and the method can only be applied in the large-$n$ limit.
We analyze the concept of free deconvolution based on the moments method, which uses the empirical moments of the eigenvalue distribution of random matrices to obtain information about the eigenvalues. The moments method has proven to be a fruitful technique, in both the asymptotic and the finite setting, for computing deconvolution, for the simplest patterns (sums and products), and for products of many independent matrices.

Scalar case
We start by showing how the moments method works in the scalar case. We consider two independent random variables $X$ and $Y$, and $Z = X + Y$. We are interested in retrieving the distribution of $X$ knowing the distributions of $Z$ and $Y$. The idea is to consider the moment generating function
$$M_X(t) = E[e^{tX}],$$
which, by independence, satisfies $M_Z(t) = M_X(t) M_Y(t)$, so that $M_X(t) = M_Z(t)/M_Y(t)$. The knowledge of $M_X(t)$ gives us the distribution of the random variable $X$. However, it is not always easy to recover the distribution of $X$ from $M_X(t)$. Another approach is to express independence in terms of moments or cumulants. The cumulants are given by the derivatives (at zero) of the function $g_X(t) = \log E[e^{tX}]$. We denote by $c_n$ the cumulant of order $n$:
$$c_n(X) = \frac{\partial^n}{\partial t^n}\Big|_{t=0} \log E[e^{tX}].$$
The main advantage of using cumulants is that, for independent variables $X$ and $Y$, $g_{X+Y}(t) = g_X(t) + g_Y(t)$, which means that
$$c_n(X + Y) = c_n(X) + c_n(Y).$$
Moments, denoted by $m_n(X) = E[X^n]$, and cumulants of a random variable can be deduced from each other by the moment-cumulant formula
$$c_n(X) = m_n(X) - \sum_{i=1}^{n-1} \binom{n-1}{i-1} c_i(X)\, m_{n-i}(X). \qquad (4)$$
Therefore, to obtain the distribution of $X$ from those of $X + Y$ and $Y$, one can compute the cumulants of $X$ by the formula $c_n(X) = c_n(X + Y) - c_n(Y)$ and then deduce the moments of $X$ from its cumulants.
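The scalar procedure can be sketched in a few lines. This is a minimal illustration, assuming the standard recursion $m_n = \sum_{i=1}^{n} \binom{n-1}{i-1} c_i m_{n-i}$ with $m_0 = 1$; the function names are hypothetical, and the Poisson moments in the usage example are chosen only because all cumulants of a Poisson variable equal its rate.

```python
from math import comb


def moments_to_cumulants(m):
    """m[k-1] = E[X^k] for k = 1..len(m); returns the cumulants c_1..c_len(m)."""
    mom = [1] + list(m)  # mom[k] = m_k, with m_0 = 1
    c = []
    for n in range(1, len(m) + 1):
        lower = sum(comb(n - 1, i - 1) * c[i - 1] * mom[n - i] for i in range(1, n))
        c.append(mom[n] - lower)
    return c


def cumulants_to_moments(c):
    """Inverse of moments_to_cumulants, using the same recursion."""
    mom = [1]
    for n in range(1, len(c) + 1):
        mom.append(sum(comb(n - 1, i - 1) * c[i - 1] * mom[n - i]
                       for i in range(1, n + 1)))
    return mom[1:]


def additive_deconvolution(m_sum, m_y):
    """Moments of X from the moments of X + Y and of Y (X and Y independent)."""
    c_sum = moments_to_cumulants(m_sum)
    c_y = moments_to_cumulants(m_y)
    return cumulants_to_moments([a - b for a, b in zip(c_sum, c_y)])


# Usage: Z = X + Y with X ~ Poisson(2) and Y ~ Poisson(1), so Z ~ Poisson(3).
m_z = [3, 12, 57, 309]   # first four moments of Poisson(3)
m_y = [1, 2, 5, 15]      # first four moments of Poisson(1)
m_x = additive_deconvolution(m_z, m_y)
print(m_x)  # first four moments of Poisson(2)
```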
In the multiplicative case, we consider $X$ and $Y$ independent random variables, and we are interested in retrieving the distribution of $X$ from those of $XY$ and $Y$. In this case, the problem can be easily solved since $E[(XY)^n] = E[X^n]\,E[Y^n]$. Therefore, using the moments approach, we can compute the moments of $X$ as $m_n(X) = m_n(XY)/m_n(Y)$ whenever $m_n(Y) \neq 0$.
The moments method for scalar random variables is very straightforward; in general, however, we face more complex situations. The generalization to multi-user multi-antenna communication systems has dramatically changed the nature of wireless communication problems. Furthermore, multi-dimensional stochastic problems need to be solved, since cognitive devices are required to be simultaneously smarter and able to collaborate with one another. The random parameters in these problems are no longer scalar random variables but potentially vectors and matrices. The computation of deconvolution for random matrices is more complex than in the scalar case, and it is explained in the following.

Historical perspective
The origin of the moments approach for the derivation of the eigenvalue distribution of random matrices dates back to the work of Wigner [37]. Wigner was interested in the energy levels of nuclei (the positively charged central core of an atom). These energy levels are linked to the Hamiltonian operator by the Schrödinger equation. The fact that these energy levels can be represented as the eigenvalues of the matrix representation of this operator led Wigner to replace the exact matrix by a random matrix having the same properties. In most cases, one can consider a Hermitian random matrix of the form
$$H = \frac{1}{\sqrt{n}}\,(h_{ij})_{1 \le i,j \le n},$$
where the upper-diagonal elements are i.i.d. and generated with a binomial distribution. His study revealed that, as the dimension of the matrix increases, the eigenvalues become more and more predictable, irrespective of the exact realization of the matrix (see Figure 1). The idea to show this is to compute, as the dimension increases, the moments of the matrix $H$, that is, the normalized traces of its powers. Consider the empirical eigenvalue distribution $dF_n(\lambda) = \frac{1}{n}\sum_{i=1}^{n} \delta(\lambda - \lambda_i)$; then the moments of the eigenvalue distribution of $H$ are given by
$$m_k^n = \frac{1}{n}\mathrm{Tr}(H^k) = \frac{1}{n}\sum_{i=1}^{n} \lambda_i^k = \int x^k\, dF_n(x), \qquad k = 1, 2, \ldots$$
The traces above can be computed, as the dimension increases, using combinatorial tools. It turns out that all odd moments converge to zero, whereas the even moments converge to the Catalan numbers, $m_{2k}^n \to \frac{1}{k+1}\binom{2k}{k}$. The only distribution which has all its odd moments null and its even moments equal to the Catalan numbers is the semi-circular law (see Figure 1), given by
$$f(x) = \frac{1}{2\pi}\sqrt{4 - x^2},$$
with $|x| \le 2$. In this way, the moments approach is shown to be a useful method for computing the eigenvalue distribution of classical known matrices.
When more than one matrix is considered, the concept of asymptotic freeness [38] allows us to compute the eigenvalue distribution of sums and products of random matrices.
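The convergence of the even moments to the Catalan numbers can be checked on a single realization. The sketch below is illustrative only: it assumes a real symmetric matrix with zero diagonal and off-diagonal entries equal to $\pm 1/\sqrt{n}$ with equal probability, which is one possible reading of the model above, and the function name is hypothetical.

```python
import random


def wigner_even_moments(n, seed=0):
    """Return ((1/n) Tr(H^2), (1/n) Tr(H^4)) for one symmetric sign matrix H."""
    rng = random.Random(seed)
    scale = n ** -0.5
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Assumption: entries are +1 or -1 with probability 1/2, scaled by 1/sqrt(n).
            H[i][j] = H[j][i] = rng.choice((-1.0, 1.0)) * scale
    # H2 = H * H (plain triple loop; fine for small n).
    H2 = [[sum(H[i][k] * H[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    m2 = sum(H2[i][i] for i in range(n)) / n
    # H2 is symmetric, so Tr(H^4) = Tr(H2 * H2) is the sum of its squared entries.
    m4 = sum(x * x for row in H2 for x in row) / n
    return m2, m4


m2, m4 = wigner_even_moments(60)
print(m2, m4)  # close to the Catalan numbers 1 and 2
```

For this particular model the second moment equals $(n-1)/n$ exactly, and the fourth moment approaches 2 with a correction of order $1/n$.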

Free probability framework
Free probability theory [38] was introduced by Voiculescu in the 1980s in order to attack some problems related to operator algebras, and it can be considered as a generalization of classical probability theory to non-commutative algebras. The analogy between the concept of freeness and independence in classical probability allows us to work with non-commutative operators such as matrices, which can be considered elements of what is called a non-commutative probability space. The algebra of Hermitian random matrices is a particular case of such a probability space, for which the random variables, i.e., the random matrices, do not commute with respect to the matrix product.
Definition 3.1 A non-commutative probability space $(\mathcal{A}, \phi)$ consists of a unital non-commutative algebra $\mathcal{A}$ over $\mathbb{C}$ and a unital linear functional
$$\phi : \mathcal{A} \to \mathbb{C}, \qquad \phi(1) = 1.$$
The elements of $\mathcal{A}$ are called non-commutative random variables. In our case, $\mathcal{A}$ will consist of $n \times n$ matrices or random matrices. For matrices, $\phi$ will be the normalized trace defined for any $A \in \mathcal{A}$ by
$$\mathrm{tr}(A) = \frac{1}{n}\mathrm{Tr}(A) = \frac{1}{n}\sum_{i=1}^{n} a_{ii},$$
while for random matrices, $\phi$ will be the linear functional $\tau$ defined by
$$\tau(A) = E[\mathrm{tr}(A)] = \frac{1}{n}\sum_{i=1}^{n} E[a_{ii}].$$
Given $A$ and $B$, $n \times n$ Hermitian and asymptotically free random matrices whose eigenvalue distributions converge to some probability measures $\mu_A$ and $\mu_B$, respectively, the eigenvalue distributions of $A + B$ and $AB$ converge to probability measures which depend only on $\mu_A$ and $\mu_B$, called the additive and multiplicative free convolution, and denoted by $\mu_A ⊞ \mu_B$ and $\mu_A ⊠ \mu_B$, respectively.
Additive free deconvolution: The additive free deconvolution of a measure $\rho$ by a measure $\nu$ is (when it exists) the only measure $\mu$ such that $\rho = \mu ⊞ \nu$. In this case, $\mu$ is denoted by $\mu = \rho ⊟ \nu$.
Multiplicative free deconvolution: The multiplicative free deconvolution of a measure $\rho$ by a measure $\nu$ is (when it exists) the only measure $\mu$ such that $\rho = \mu ⊠ \nu$. In this case, $\mu$ is denoted by $\mu = \rho ⧄ \nu$.
For a given $n \times n$ random matrix $A$, the $p$-th moment of $A$ is defined, if it exists, as
$$m_A^{n,p} = E[\mathrm{tr}(A^p)] = \int \lambda^p\, d\rho_n(\lambda),$$
where $\rho_n(\lambda) = E\left[\frac{1}{n}\sum_{i=1}^{n} \delta(\lambda - \lambda_i)\right]$ is the associated empirical mean measure, and $\lambda_i$ are the eigenvalues of $A$. The idea of additive and multiplicative free deconvolution stems from the fact that, in the asymptotic case,
$$\mu_{A+B} = \mu_A ⊞ \mu_B, \qquad \mu_{AB} = \mu_A ⊠ \mu_B,$$
which means that we can express the moments of $A + B$ and the moments of $AB$ as a function of the moments of $A$ and the moments of $B$. In other words, the distributions of $A + B$ and of $AB$ depend only on the marginal distributions of $A$ and $B$.
Even if matrices with finite dimensions are not free, the free probability framework, based on the moments, can still be used to propose an algorithmic method to compute these operations for finite-size matrices. Indeed, as $n \to \infty$, the moment $m_A^{n,p}$ converges almost surely to an analytical expression $m_A^p$ that depends only on some specific parameters of $A$ (such as the distribution of its entries). Therefore, in the finite setting one is still able, by recursion, to express all the moments of $A$ with respect only to the moments of $A + B$ and $B$, or $AB$ and $B$.
We will give a characterization of free deconvolution in terms of free cumulants, which are polynomials in the moments that behave nicely with respect to freeness. The nomenclature comes from classical probability theory, where the corresponding objects are well known. There exists a combinatorial description of the classical cumulants, which depends on partitions of sets. In the same way, free cumulants can also be described combinatorially; the only difference from the classical case is the replacement of partitions by the so-called non-crossing partitions [39].
Definition 3.3 A partition $\pi$ of a set $\{1, 2, \ldots, n\}$ is a decomposition into subsets $V_i$, $\pi = \{V_1, \ldots, V_r\}$, such that
$$V_i \neq \emptyset, \qquad V_i \cap V_j = \emptyset \ (i \neq j), \qquad \bigcup_{i=1}^{r} V_i = \{1, 2, \ldots, n\}.$$
The set of all partitions of $\{1, 2, \ldots, n\}$ is denoted by $P(n)$, and the $V_i$ are called the blocks of $\pi$. For a given random variable $X$, the relationship between moments and cumulants given in (4) can be combinatorially expressed by
$$m_n(X) = \sum_{\pi \in P(n)} c_\pi(X), \qquad c_\pi(X) = \prod_{i=1}^{|\pi|} c_{|\pi_i|}(X),$$
where $\pi = \{\pi_1, \ldots, \pi_{|\pi|}\}$.
Definition 3.4 A partition $\pi$ of $\{1, \ldots, n\}$ is non-crossing if, whenever we have four numbers $1 \le i < k < j < l \le n$ such that $i$ and $j$ are in the same block and $k$ and $l$ are in the same block, we also have that $i$, $j$, $k$, $l$ belong to the same block.
We denote by $NC(n)$ the set of non-crossing partitions of $\{1, \ldots, n\}$. If the condition of Definition 3.4 is violated, i.e., two distinct blocks interlace, we call $\pi$ a crossing partition. Examples of non-crossing and crossing partitions are given in Figures 2 and 3.
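The two definitions can be made concrete by brute-force enumeration for small $n$. This is a minimal sketch with hypothetical function names; it is meant for illustration and is not efficient. The number of partitions of $\{1, \ldots, n\}$ is the Bell number, and the number of non-crossing ones is the Catalan number, which is what the usage lines print.

```python
def set_partitions(n):
    """Yield every partition of {1, ..., n} as a list of blocks (lists of ints)."""
    def extend(k, blocks):
        if k > n:
            yield [list(b) for b in blocks]
            return
        for b in blocks:           # put k into an existing block
            b.append(k)
            yield from extend(k + 1, blocks)
            b.pop()
        blocks.append([k])         # or open a new block containing k
        yield from extend(k + 1, blocks)
        blocks.pop()
    yield from extend(1, [])


def is_noncrossing(partition):
    """Check Definition 3.4: no i < k < j < l with i~j, k~l in two different blocks."""
    block_of = {x: idx for idx, block in enumerate(partition) for x in block}
    n = len(block_of)
    for i in range(1, n + 1):
        for k in range(i + 1, n + 1):
            for j in range(k + 1, n + 1):
                for l in range(j + 1, n + 1):
                    if (block_of[i] == block_of[j]
                            and block_of[k] == block_of[l]
                            and block_of[i] != block_of[k]):
                        return False
    return True


bell = [sum(1 for _ in set_partitions(n)) for n in range(1, 6)]
catalan = [sum(1 for p in set_partitions(n) if is_noncrossing(p)) for n in range(1, 6)]
print(bell)     # sizes of P(1), ..., P(5)
print(catalan)  # sizes of NC(1), ..., NC(5)
```

For $n = 4$ the only crossing partition is $\{\{1, 3\}, \{2, 4\}\}$.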
The computation of free deconvolution by the moments method is based on the moment-cumulant formula, which gives a relation between the moments $m_A^p \equiv m_{\mu_A}^p$ and the free cumulants $\kappa_A^p \equiv \kappa_{\mu_A}^p$ of a matrix $A$, where $\mu_A$ is the associated measure (here the superscript $p$ is an index, not a power). It turns out that the cumulants are much easier to compute, thanks also to the concept of non-crossing partitions. The moment-cumulant formula says that
$$m_A^p = \sum_{\pi = \{V_1, \ldots, V_k\} \in NC(p)} \ \prod_{i=1}^{k} \kappa_A^{|V_i|}, \qquad (6)$$
where $|V_i|$ is the cardinality of the block $V_i$. From (6) it follows that the first $p$ cumulants can be computed from the first $p$ moments, and vice versa.
The following characterization allows us to compute easily the additive free convolution using free cumulants.
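Theorem 3.5 [38] Given $A$ and $B$ asymptotically free random matrices, $\mu_A ⊞ \mu_B$ is the only law such that, for all $p \ge 1$,
$$\kappa_{\mu_A ⊞ \mu_B}^p = \kappa_{\mu_A}^p + \kappa_{\mu_B}^p.$$
Hence, the deconvolution of $\mu_{A+B}$ by $\mu_B$, denoted by $\mu_{A+B} ⊟ \mu_B$, is characterized by the fact that, for all $p \ge 1$,
$$\kappa_{\mu_{A+B} ⊟ \mu_B}^p = \kappa_{\mu_{A+B}}^p - \kappa_{\mu_B}^p.$$
The implementation of additive free deconvolution is based on the following steps: for the two matrices $A + B$ and $B$, we first compute the free cumulants from the moments; we then subtract them to obtain the free cumulants of $A$; finally, using the relation between cumulants and moments, we obtain the moments of $A$ and hence information about the distribution of its eigenvalues.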
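These steps can be sketched as follows. The code is a minimal illustration with hypothetical function names. Instead of summing over non-crossing partitions directly, it uses an equivalent recursive form of the free moment-cumulant relation, $m_p = \sum_{k=1}^{p} \kappa_k\, [z^{p-k}]\,(1 + m_1 z + m_2 z^2 + \cdots)^k$, which is a choice made here for brevity. The usage example relies on the fact that a semicircular law of variance $v$ has moments $0, v, 0, 2v^2, 0, 5v^3$ and a single non-zero free cumulant $\kappa_2 = v$.

```python
def _power_coefficients(moments, p):
    """Return {k: coefficient of z^(p-k) in (1 + m_1 z + m_2 z^2 + ...)^k}, k = 1..p."""
    series = ([1] + list(moments) + [0] * p)[:p + 1]
    power = [1] + [0] * p          # series ** 0, truncated at degree p
    coeffs = {}
    for k in range(1, p + 1):
        new = [0] * (p + 1)
        for i, a in enumerate(power):
            if a:
                for j in range(p + 1 - i):
                    new[i + j] += a * series[j]
        power = new                # series ** k, truncated at degree p
        coeffs[k] = power[p - k]
    return coeffs


def moments_to_free_cumulants(m):
    kappa = []
    for p in range(1, len(m) + 1):
        coeffs = _power_coefficients(m, p)
        kappa.append(m[p - 1] - sum(kappa[k - 1] * coeffs[k] for k in range(1, p)))
    return kappa


def free_cumulants_to_moments(kappa):
    m = []
    for p in range(1, len(kappa) + 1):
        # Only m_1..m_{p-1} are needed here, because the degree is at most p - 1.
        coeffs = _power_coefficients(m, p)
        m.append(sum(kappa[k - 1] * coeffs[k] for k in range(1, p + 1)))
    return m


def additive_free_deconvolution(m_sum, m_b):
    """Moments of A from the moments of A + B and of B (A and B assumed free)."""
    k_sum = moments_to_free_cumulants(m_sum)
    k_b = moments_to_free_cumulants(m_b)
    return free_cumulants_to_moments([x - y for x, y in zip(k_sum, k_b)])


# Usage: semicircular of variance 3 "minus" semicircular of variance 2.
m_a = additive_free_deconvolution([0, 3, 0, 18, 0, 135], [0, 2, 0, 8, 0, 40])
print(m_a)  # moments of the standard semicircular law
```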
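The moments method, in the multiplicative case, is based on the relation between the moments $m_A^p$ and the coefficients $s_A^p$ of the S-transform $S_{\mu_A}(z) = \sum_{p \ge 1} s_A^p z^{p-1}$ of the measure associated with $A$:
$$m_A^1 s_A^1 = 1, \qquad m_A^p = \sum_{k=1}^{p+1} s_A^k \sum_{\substack{p_1, \ldots, p_k \ge 1 \\ p_1 + \cdots + p_k = p+1}} m_A^{p_1} \cdots m_A^{p_k}.$$
Hence, we can compute multiplicative free convolution by the following characterization.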
Theorem 3.6 [38] Given $A$ and $B$ asymptotically free random matrices, $\mu_A ⊠ \mu_B$ is the only law such that
$$S_{\mu_A ⊠ \mu_B}(z) = S_{\mu_A}(z)\, S_{\mu_B}(z),$$
where $S_\mu$ denotes the S-transform of the measure $\mu$. The multiplicative free deconvolution of $\mu_{AB}$ by $\mu_B$, denoted by $\mu_{AB} ⧄ \mu_B$, is characterized by
$$S_{\mu_{AB} ⧄ \mu_B}(z) = \frac{S_{\mu_{AB}}(z)}{S_{\mu_B}(z)},$$
that is, by the corresponding relation between the coefficients of the S-transforms for all $p \ge 1$.
Even though freeness usually does not hold for finite matrices, the moments method can still be used to propose algorithmic methods to compute their moments. Focusing on the study of random matrices in the finite case, the authors of [40] were able to derive the explicit series expansion of the eigenvalue distribution of various models, namely non-central Wishart distributions as well as one-sided correlated zero-mean Wishart distributions. In particular, they proposed a general finite-dimensional statistical inference framework based on the moments method, which takes a set of moments as input and produces sets of moments as output, with the dimensions of the matrices considered finite. They focus on the finite Gaussian case. The formulas for the moments presented in their contributions have been generated by iterations through partitions and permutations and concepts from combinatorics.
The first and simplest result concerns the moments of a product of a deterministic matrix and a Wishart matrix. Let $n$, $N$ be positive integers, $X$ an $n \times N$ standard complex Gaussian matrix, and $D$ a (deterministic) $n \times n$ matrix. Denoting the moments $D_p = \mathrm{tr}(D^p)$ and $M_p = E[\mathrm{tr}((D \frac{1}{N} X X^H)^p)]$ for any positive integer $p$, Theorem 1 in [40] allows us to express the moments $M_p$ in terms of the moments $D_p$. In particular, the first three moments can be written as
$$M_1 = D_1, \qquad M_2 = D_2 + c D_1^2, \qquad M_3 = (1 + N^{-2}) D_3 + 3c D_1 D_2 + c^2 D_1^3,$$
where $c = n/N$. By a simple recursion, we can express $D_p$ from $M_p$.
For the first three moments these recursions become
$$D_1 = M_1, \qquad D_2 = M_2 - c M_1^2, \qquad D_3 = (1 + N^{-2})^{-1} (M_3 - 3c M_1 M_2 + 2c^2 M_1^3).$$
Considering the sum of a deterministic $n \times N$ matrix $D$ and an $n \times N$ standard complex Gaussian matrix $X$, in accordance with [40, Theorem 2], for any positive integer $p$ the moments $M_p = E[\mathrm{tr}((\frac{1}{N}(D + X)(D + X)^H)^p)]$ can be expressed in terms of the moments $D_p = \mathrm{tr}((\frac{1}{n} D D^H)^p)$. In this case also, by a simple recursion, one can express $D_p$ from $M_p$. It is clear how the operation of deconvolution can be viewed as operating on the moments: explicit expressions for the moments of the Gram matrices associated with our models (sum or product of a deterministic matrix and a complex standard Gaussian matrix) are found, and are expressed in terms of the moments of the matrices involved. Hence, deconvolution means expressing the moments, in this case of the deterministic matrices, as a function of the moments of the Gram matrices.
Similar results are found when the Gaussian matrices are assumed to be square and self-adjoint. The implementation of the results is also able to generate the moments of many types of combinations of independent Gaussian and Wishart random matrices.
The algorithms are based on iterations through partitions and permutations, and they yield rather complex expressions. However, Matlab code based on concepts such as partitions and permutations has been generated in [41] in order to implement the above results.
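For the product model, convolution and deconvolution on the first three moments can be sketched as below. The closed forms used in the code are an assumption stated here for the complex Gaussian case: $M_1 = D_1$, $M_2 = D_2 + cD_1^2$, $M_3 = (1 + N^{-2})D_3 + 3cD_1D_2 + c^2D_1^3$ with $c = n/N$; for $D = I$ they reduce to the Wishart moments $1$, $1 + c$, $1 + 3c + c^2 + N^{-2}$. The function names and the example matrix are hypothetical, and exact rational arithmetic is used only to make the round trip easy to check.

```python
from fractions import Fraction


def product_moments(d1, d2, d3, n, N):
    """First three moments M_p of D (1/N) X X^H from D_p = tr(D^p) (assumed formulas)."""
    c = Fraction(n, N)
    m1 = d1
    m2 = d2 + c * d1 ** 2
    m3 = (1 + Fraction(1, N * N)) * d3 + 3 * c * d1 * d2 + c ** 2 * d1 ** 3
    return m1, m2, m3


def deconvolve_product(m1, m2, m3, n, N):
    """Invert product_moments: recover D_1, D_2, D_3 from M_1, M_2, M_3."""
    c = Fraction(n, N)
    d1 = m1
    d2 = m2 - c * m1 ** 2
    d3 = (m3 - 3 * c * m1 * m2 + 2 * c ** 2 * m1 ** 3) / (1 + Fraction(1, N * N))
    return d1, d2, d3


# Usage: D = diag(9/4, 1/4), so D_p is the average of the p-th powers.
eigs = [Fraction(9, 4), Fraction(1, 4)]
d = tuple(sum(x ** p for x in eigs) / len(eigs) for p in (1, 2, 3))
m = product_moments(*d, n=2, N=8)
recovered = deconvolve_product(*m, n=2, N=8)
print(m, recovered)
```

In practice the $M_p$ would be estimated from observations, so the recovered $D_p$ are only as accurate as those estimates.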
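Once the moments are known, the Newton-Girard formulas [42] can be used to retrieve the eigenvalues from the moments. These formulas state a relationship between the elementary symmetric polynomials
$$\Pi_1(\lambda_1, \ldots, \lambda_n) = \lambda_1 + \cdots + \lambda_n, \quad \Pi_2(\lambda_1, \ldots, \lambda_n) = \sum_{1 \le i < j \le n} \lambda_i \lambda_j, \quad \ldots, \quad \Pi_n(\lambda_1, \ldots, \lambda_n) = \lambda_1 \cdots \lambda_n,$$
and the sums of powers of their variables
$$S_p(\lambda_1, \ldots, \lambda_n) = \sum_{i=1}^{n} \lambda_i^p,$$
through the recurrence relation
$$(-1)^m m\, \Pi_m(\lambda_1, \ldots, \lambda_n) + \sum_{k=1}^{m} (-1)^{k+m} S_k(\lambda_1, \ldots, \lambda_n)\, \Pi_{m-k}(\lambda_1, \ldots, \lambda_n) = 0, \qquad (10)$$
with $\Pi_0 = 1$. If the sums of powers $S_p(\lambda_1, \ldots, \lambda_n)$ are known for $1 \le p \le n$, the relation (10) can be used to compute the elementary symmetric polynomials $\Pi_m(\lambda_1, \ldots, \lambda_n)$ for $1 \le m \le n$. Therefore, the characteristic polynomial (whose roots are the eigenvalues of the associated matrix) can be fully characterized, since its coefficient of degree $n - k$ is given by $(-1)^k \Pi_k(\lambda_1, \ldots, \lambda_n)$. In this way, the entire characteristic polynomial can be computed, and the eigenvalues can be found.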
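A minimal sketch of this last step is given below, with hypothetical function names. It takes the power sums $S_p$ (that is, $n$ times the normalized moments), applies the recurrence to obtain the $\Pi_m$, and returns the coefficients of the characteristic polynomial; the eigenvalues are then its roots, which can be obtained with any polynomial root finder. The example uses the two values 9/4 and 1/4 purely as an illustration.

```python
from fractions import Fraction


def elementary_symmetric(power_sums):
    """Return [Pi_0, Pi_1, ..., Pi_n] from the power sums S_1, ..., S_n."""
    e = [1]
    for m in range(1, len(power_sums) + 1):
        # Newton-Girard: m * Pi_m = sum_{k=1..m} (-1)^(k-1) * Pi_{m-k} * S_k
        acc = sum((-1) ** (k - 1) * e[m - k] * power_sums[k - 1]
                  for k in range(1, m + 1))
        e.append(acc / m)
    return e


def characteristic_polynomial(power_sums):
    """Coefficients of x^n, x^(n-1), ..., x^0; the one of degree n-k is (-1)^k Pi_k."""
    e = elementary_symmetric(power_sums)
    return [(-1) ** k * e_k for k, e_k in enumerate(e)]


def evaluate(coeffs, x):
    """Horner evaluation of the polynomial with the given coefficients at x."""
    value = 0
    for a in coeffs:
        value = value * x + a
    return value


# Usage: power sums of the eigenvalues 9/4 and 1/4.
S = [Fraction(5, 2), Fraction(41, 8)]
coeffs = characteristic_polynomial(S)
print(coeffs)  # x^2 - (5/2) x + 9/16
print(evaluate(coeffs, Fraction(9, 4)), evaluate(coeffs, Fraction(1, 4)))
```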

Non free case
In recent works, deconvolution based on the moments method has been analyzed when $n \to \infty$ for some particular matrices $A$ and $B$, for instance when $A$ is a random Vandermonde matrix and $B$ is a deterministic diagonal matrix [43], or when $A$ and $B$ are two independent random Vandermonde matrices [44]. The authors of [43] developed analytical methods for finding the moments of random Vandermonde matrices with entries on the unit circle and provided explicit expressions for the moments of the Gram matrices associated with the models considered. These explicit expressions are useful for performing deconvolution. In these cases, the moments technique has been shown to be very appealing and powerful for deriving the exact asymptotic moments of "non-free matrices". This type of matrix occurs in cognitive radio [45].
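Definition 4.1 An $N \times M$ random Vandermonde matrix with entries on the unit circle has the form
$$V = \frac{1}{\sqrt{N}} \begin{pmatrix} 1 & \cdots & 1 \\ e^{-j\omega_1} & \cdots & e^{-j\omega_M} \\ \vdots & \ddots & \vdots \\ e^{-j(N-1)\omega_1} & \cdots & e^{-j(N-1)\omega_M} \end{pmatrix},$$
where the phases $\omega_1, \ldots, \omega_M$ are i.i.d. random variables in $[0, 2\pi]$.
The asymptotic behaviour of random Vandermonde matrices is analyzed when $N$ and $M$ are large, i.e., when both go to infinity at a given ratio $M/N \to c$, with $c$ constant. The scaling factor $1/\sqrt{N}$ and the assumption that the entries lie on the unit circle guarantee that the analysis yields a limiting asymptotic behaviour.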
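As a small numerical illustration of this scaling, the sketch below builds random Vandermonde matrices with uniform i.i.d. phases and averages the normalized second moment of the Gram matrix $V^H V$. For uniform phases a direct computation gives the exact expectation $1 + (M-1)/N$, which tends to $1 + c$. The function names, the sizes, and the number of trials are illustrative assumptions.

```python
import cmath
import math
import random


def vandermonde(N, M, rng):
    """N x M Vandermonde matrix with entries exp(-j r w_m) / sqrt(N), uniform phases."""
    omegas = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(M)]
    scale = N ** -0.5
    return [[scale * cmath.exp(-1j * r * w) for w in omegas] for r in range(N)]


def gram_second_moment(V):
    """Return (1/M) Tr((V^H V)^2), computed as the sum of |(V^H V)[k][l]|^2 over M."""
    M = len(V[0])
    cols = list(zip(*V))
    total = 0.0
    for k in range(M):
        for l in range(M):
            g = sum(a.conjugate() * b for a, b in zip(cols[k], cols[l]))
            total += abs(g) ** 2
    return total / M


def average_second_moment(N, M, trials, seed=0):
    rng = random.Random(seed)
    return sum(gram_second_moment(vandermonde(N, M, rng)) for _ in range(trials)) / trials


avg = average_second_moment(N=16, M=16, trials=40)
print(avg, 1 + 15 / 16)  # empirical average versus the exact expectation
```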
Definition 4.2 For $\rho = \{W_1, \ldots, W_r\} \in P(n)$, we define
$$K_{\rho,\omega,N} = \frac{1}{N^{n+1-|\rho|}}\, E\left[\prod_{k=1}^{n} \frac{1 - e^{jN(\omega_{b(k-1)} - \omega_{b(k)})}}{1 - e^{j(\omega_{b(k-1)} - \omega_{b(k)})}}\right], \qquad (11)$$
where $\omega_{W_1}, \ldots, \omega_{W_{|\rho|}}$ are i.i.d. (indexed by the blocks of $\rho$) with the same distribution as $\omega$, and where $b(k)$ is the block of $\rho$ which contains $k$ (with the cyclic convention $b(0) = b(n)$). If the limit
$$K_{\rho,\omega} = \lim_{N \to \infty} K_{\rho,\omega,N} \qquad (12)$$
exists, we call it a Vandermonde mixed moment expansion coefficient.
These quantities do not behave exactly as cumulants, but rather as weights which tell us how a partition in the moment formula we present should be weighted. In this respect, the formulas presented for the moments are different from the classical or free moment-cumulant formulas, since these do not perform this weighting. The limits $K_{\rho,\omega}$ may not always exist, and necessary and sufficient conditions for their existence seem to be hard to find. In [43], it has been proved that the limit in (12) exists if the density of $\omega$ is continuous. The calculation is based on combinatorial computations using crossing partitions, since the matrices are not free.
The fact that all moments exist is not enough to guarantee that there exists a limit probability measure having these moments. However, it is proved in [46] that Carleman's condition is satisfied, so that the limiting moments determine a unique probability measure.
The uniform phase distribution plays an important role for Vandermonde matrices.
Theorem 4.4 For Vandermonde matrices with uniform phase distribution $u$, the Vandermonde mixed moment expansion coefficient $K_{\rho,u}$ exists for all $\rho$. Moreover, $K_{\rho,u}$ satisfies the following properties
The importance of the uniform phase distribution is also expressed by the following theorem.
The behaviour of Vandermonde matrices is different when the density of $\omega$ has singularities, and it depends on the growth rate of the density near the singular points. Likewise, for the case of generalized Vandermonde matrices, whose columns do not consist of uniformly distributed powers, it is possible to define mixed moment expansion coefficients, but the formulas are more complex.
When many independent Vandermonde matrices are considered, the following relations hold.
Theorem 4.6 If V 1 , V 2 ,... are independent Vandermonde matrices with the same phase distribution ω with continuous density, then {2, 4,... }}, and "≤"denotes the refinement order, i.e., any block of r is contained within a block of s.
If V 1 , V 2 ,... are independent Vandermonde matrices with different phase distribution ω i with continuous density p ω i , Equation (16) still holds with K r,ω replaced by In [44], the authors generalize the above results replacing convergence in distribution with almost sure convergence in distribution.
Such matrices are applied to cognitive radio in [45], where the authors consider a scenario in which a primary and a secondary user wish to communicate with their corresponding receivers simultaneously over frequency-selective channels. Under the realistic assumptions that the primary user is ignorant of the secondary user's presence and that the secondary transmitter has no side information about the primary's message, the authors propose a Vandermonde precoder that cancels the interference from the secondary user by exploiting the redundancy of a cyclic prefix.

Toeplitz and Hankel matrices
The same strategy used to compute the moments of Vandermonde matrices can be used for Hankel and Toeplitz matrices.

Definition 4.7 We define a Toeplitz matrix
where X_i are i.i.d., real-valued random variables with unit variance.
Theorem 4.8 [44] We denote by M_i the asymptotic moments of order 2i of a Toeplitz matrix T, then we have
Similar results hold for Hankel matrices.
Definition 4.9 We define a Hankel matrix
where X_i are i.i.d., real-valued random variables with unit variance.
Theorem 4.10 [44] We denote by M_i the asymptotic moments of order 2i of a Hankel matrix H, then we have
Toeplitz and Hankel matrices are structured matrices used for compressive wide-band spectrum sensing schemes [47,48] and for direction of arrival estimation [49].
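The even moments of such matrices can be approximated numerically for a finite dimension n. The sketch below is an illustration under stated assumptions: it takes the symmetric Toeplitz matrix with entries X_{|i-j|}/√n and the Hankel matrix with entries X_{i+j}/√n, with standard Gaussian X_i; the normalization, the Gaussian choice, and the function names are assumptions for illustration and may differ from the definitions above.

```python
import numpy as np

def toeplitz_sym(x):
    """Symmetric Toeplitz matrix with entry (i, j) = x[|i - j|] / sqrt(n)."""
    n = len(x)
    idx = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    return x[idx] / np.sqrt(n)

def hankel(x):
    """Hankel matrix with entry (i, j) = x[i + j] / sqrt(n); x has length 2n - 1."""
    n = (len(x) + 1) // 2
    idx = np.arange(n)[:, None] + np.arange(n)[None, :]
    return x[idx] / np.sqrt(n)

def even_moments(build, n_inputs, order, trials, rng):
    """Monte Carlo estimate of tr_n(A^(2i)), i = 1..order, for A = build(x)."""
    acc = np.zeros(order)
    for _ in range(trials):
        A = build(rng.standard_normal(n_inputs))
        eig = np.linalg.eigvalsh(A)  # A is real symmetric in both cases
        acc += np.array([np.mean(eig ** (2 * i)) for i in range(1, order + 1)])
    return acc / trials

rng = np.random.default_rng(1)
n = 200
M_toeplitz = even_moments(toeplitz_sym, n, order=3, trials=20, rng=rng)
M_hankel = even_moments(hankel, 2 * n - 1, order=3, trials=20, rng=rng)
print(M_toeplitz, M_hankel)
```

With unit-variance entries and this normalization, the second moment has expectation 1 for both structures; the estimates of the higher moments carry finite-n and Monte Carlo errors and should be read as approximations only.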

Power estimation
We consider a multi-user MIMO system where the received signal can be expressed by
where W, P, s_i, and n_i are, respectively, the N × K channel gain matrix, the K × K diagonal power matrix due to the different distances from which the users emit, the K × 1 vector of signals, and the N × 1 vector representing the noise with variance s. In particular, W, s_i, and n_i are independent standard complex Gaussian matrices and vectors. We are interested in estimating the power with which the users send information from M observations of the vector y_i, during which the channel gain matrix stays constant. We consider the 2 × 2 matrix
and we first apply additive deconvolution, and then multiplicative deconvolution twice (each application takes care of one Gaussian matrix). We estimate the eigenvalues of P from an increasing number L of observations of the matrix Y = [y_1, ..., y_M] of received signals (we average across several block-fading channels). Hence, we estimate the moments of the matrix P based only on the moments of the matrix YY^H. Knowing the moments of P, we can estimate its eigenvalues using the Newton-Girard formulas. As L increases, the predicted eigenvalues get closer to the true eigenvalues of P. Figure 4 illustrates the estimation of the eigenvalues for up to L = 1000 observations. The actual powers are 2.25 and 0.25, and the variance s of the noise is assumed to be equal to 0.1.
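The last step, going from the moments of P to its eigenvalues, can be sketched as follows. This is a minimal illustration of the Newton-Girard identities, assuming that the moments are the normalized traces m_k = tr_K(P^k) and that exactly K moments are available for a K × K matrix; the function name is hypothetical.

```python
import numpy as np

def eigenvalues_from_moments(moments):
    """Recover K eigenvalues from the first K normalized moments m_k = tr_K(P^k)."""
    K = len(moments)
    p = [K * m for m in moments]   # power sums p_k = sum_i lambda_i^k
    e = [1.0]                      # elementary symmetric polynomials, e_0 = 1
    for k in range(1, K + 1):
        # Newton-Girard: k * e_k = sum_{i=1..k} (-1)^(i-1) * e_{k-i} * p_i
        s = sum((-1) ** (i - 1) * e[k - i] * p[i - 1] for i in range(1, k + 1))
        e.append(s / k)
    # Characteristic polynomial: sum_k (-1)^k * e_k * x^(K-k)
    coeffs = [(-1) ** k * e[k] for k in range(K + 1)]
    # Real parts only: with noisy moments the roots may come out complex.
    return np.sort(np.roots(coeffs).real)

powers = np.array([2.25, 0.25])    # the actual powers used in Figure 4
moments = [np.mean(powers ** k) for k in (1, 2)]
recovered = eigenvalues_from_moments(moments)
print(recovered)
```

Here the exact moments of the two powers are fed in, so the eigenvalues are recovered up to rounding. With moments estimated by deconvolution, the roots inherit the estimation error, which is why the prediction improves as L grows.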

Understanding the network in a finite time
In cognitive MIMO networks, one must learn and control the "black box" (for instance the wireless channel) with multiple inputs and multiple outputs within a fraction of time and with finite energy. The fraction of time constraint is due to the fact that the channel (black box) changes over time. Of particular interest is the estimation of the capacity within the window of observation.
Let y be the output vector, and x and n the input signal and the noise vector, respectively, so that
In the Gaussian case, the rate is given by
where R_Y is the covariance of the output signal and R_N is the covariance of the noise. Therefore, one can
In the simulation, we have taken n as an i.i.d. standard Gaussian vector of dimension 2 and
Considering L observations of (21), we stack the observations as columns in a compound matrix to get an unbiased estimate of the moments of R_Y. In Figure 5, we can observe the convergence of the estimated capacity to the true one.
In Figure 6, we estimate the eigenvalues of the matrix R Y versus the number of observations L. Once again, we observe the convergence of the estimated eigenvalues to the true eigenvalues.
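The need for a correction when only L observations are available can be seen on the second moment alone. The sketch below is an illustration and not the deconvolution procedure used for the figures: it assumes zero-mean circular complex Gaussian observations with a hypothetical covariance having eigenvalues 3 and 1, and it uses the standard complex Wishart identities E[tr_N(W^2)] = m_2 + c m_1^2 and E[(tr_N W)^2] = m_1^2 + m_2/(N L), with c = N/L, to remove the bias of the plug-in estimate.

```python
import numpy as np

rng = np.random.default_rng(2)
eig_true = np.array([3.0, 1.0])   # hypothetical eigenvalues of R_Y (N = 2)
N, L, trials = eig_true.size, 20, 4000
c = N / L

# `trials` independent N x L observation matrices with covariance diag(eig_true),
# assuming circular complex Gaussian entries.
shape = (trials, N, L)
G = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
Y = np.sqrt(eig_true)[None, :, None] * G
W = Y @ Y.conj().transpose(0, 2, 1) / L   # sample covariance matrices

m1_hat = np.trace(W, axis1=1, axis2=2).real / N
m2_naive = np.trace(W @ W, axis1=1, axis2=2).real / N
# Bias removal based on the two Wishart identities quoted in the lead-in.
m2_corrected = (m2_naive - c * m1_hat ** 2) / (1.0 - 1.0 / L ** 2)

m2_true = np.mean(eig_true ** 2)
print(m2_true, m2_naive.mean(), m2_corrected.mean())
```

Averaged over many trials, the plug-in second moment overshoots the true value by about c m_1^2, while the corrected estimate does not. Higher moments require the full moment relations rather than this single identity.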

Users detection
We consider M mobile users, each with a single antenna, communicating with a base station equipped with N receiving antennas, arranged as a uniform linear array (ULA). The N × 1 received signal at the base station is given by
where x is the M × 1 input signal transmitted by the M users, satisfying E[xx^H] = I_M, and n is the additive Gaussian noise such that E[nn^H] = σ^2 I_N. We suppose that the components of x and n are independent. The matrix P = diag(p_1, ..., p_M) represents the power with which the users send information. In the case of a line of sight between the mobile users and the base station, the N × M matrix V has the following form
we can have access to the sample covariance matrix
For estimation of the number of users M, we assume that the power distribution of P is known. Based on this knowledge, we are able to estimate the number of users in the system: thanks to the moments method, it is possible to obtain the moments of the sample covariance matrix in (25) from the moments of the power matrix P.
Proposition 5.1 [43] Given the phase distribution ω and its density function p_ω, we define
Denoting the moments of P and the moments of the sample covariance matrix W, respectively, by
then
W_1 = c_2 P_1 + σ^2
W_2 = c_2 P_2 + (c_2^2 I_2 + c_2 c_3)(P_1)^2 + 2σ^2 (c_2 + c_3) P_1 + σ^4 (1 + c_1)
Knowing the matrix P, we can compute the moments P_i. From the moments P_i, using the above expressions, it is possible to get the moments W_i of the sample covariance matrix. We consider some candidate values of the number M of users. The estimate of M is chosen as the one that minimizes the sum of the squared errors between the moments W_i and the moments of the observed sample covariance matrix.
In Figures 7 and 8 we have taken N = 100, M = 30, σ = √0.1, the distance d = 1 and the wavelength l = 2d. The entries of the matrix P take the values 0.5, 1.5, and 2 with equal probability; therefore, only the first three moments are considered. We see that in Figure 8 the approximation is better even for a small number of observations.

Wavelength detection
In cognitive radio, mobile users are interested in understanding which transmission band is occupied. Considering the model (23), we estimate the wavelength l using the moments method. As before, we consider some realizations of the sample covariance matrix and we express its moments as a function of the moments of the power matrix, which is supposed to be known. In Figure 9, we consider K = 10, L = 30, N = 100, and σ = √0.1, in addition to l = 2, d = 1, α = π/4. Candidate values of the wavelength are taken in the interval [0.5, 4] with step 0.1, and the estimated value is chosen as the one that minimizes the sum of the squared errors, over the first three moments, between the moments W_i and the moments of the observed sample covariance matrix.
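The selection rule used here, and for the number of users in the previous section, is a grid search over candidate values. The sketch below shows only that rule. The function `toy_predict` is a made-up stand-in for the moment formulas and is not the model of this paper; in practice it would be replaced by the expressions giving W_1, W_2, W_3 for each candidate.

```python
import numpy as np

def select_by_moment_matching(candidates, predict_moments, observed_moments):
    """Return the candidate whose predicted moments are closest in squared error."""
    observed = np.asarray(observed_moments, dtype=float)
    errors = [
        np.sum((np.asarray(predict_moments(c), dtype=float) - observed) ** 2)
        for c in candidates
    ]
    return candidates[int(np.argmin(errors))]

def toy_predict(wavelength):
    """Hypothetical map from a candidate to three moments (illustration only)."""
    return [wavelength, wavelength ** 2 + 1.0, wavelength ** 3 + 3.0 * wavelength]

# Grid on [0.5, 4] with step 0.1, as in the text.
candidates = np.arange(0.5, 4.0 + 1e-9, 0.1)
# "Observed" moments: toy prediction at 2.0 plus a small fixed perturbation.
observed = np.array(toy_predict(2.0)) + np.array([0.01, -0.02, 0.03])
estimate = select_by_moment_matching(candidates, toy_predict, observed)
print(estimate)
```

Because the rule only compares moments, its resolution is limited by the grid step and by how strongly the first three moments depend on the parameter being estimated.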

Conclusions and open problems
In the last decade, researchers and practitioners have devised cognitive radio as a possible solution to the problem of underutilization of the radio spectrum. The theoretical advancements in cognitive radio research have set up a solid base for practical applications and further developments. However, open questions remain to be answered. In particular, in the current work, we use free probability theory, through the concept of free deconvolution, to attack the problem of retrieving useful information from the network with a limited number of observations. Free deconvolution, based on the moments method, has proved to be an interesting tool to tackle this problem. First, we show how the moments method works in the case where scalar random variables are considered. Then, since practical systems are so complex that the parameters of interest must be represented by vectors and matrices and cannot be modeled by scalar random variables, we analyze the case where random matrices are considered. We propose an algorithmic method to compute the moments of various models such as Gaussian and Vandermonde matrices. Matlab code for cognitive radio has been developed to implement this method. In the applications, the free deconvolution framework can be used for retrieving relevant information such as the power with which users send information, the number of users, etc.
We have analyzed how the free deconvolution framework works for random matrices and how random matrices behave differently depending on their structure. Different directions of research can be followed in this framework. In the Vandermonde matrix model, the deconvolution techniques have been performed taking into account only diagonal matrices. It would be interesting to address the case of general deterministic matrices, so that correlation between users can be considered. Knowledge of this correlation could be a relevant element to improve the cooperation among the users in a cognitive system.
The extension of free deconvolution techniques to more general functions of matrices is a hard task. The difficulty is that, up to now, there is no general hypothesis that guarantees the applicability of free deconvolution to any random matrix. Such an extension could take into account more general models that represent more realistic situations.
As a future perspective, we would also like to carry out a second-order analysis. The study of the covariance matrices can improve the accuracy of the estimates obtained within the free deconvolution framework.