A unified message-passing algorithm for MIMO-SDMA in software-defined radio

This paper presents a novel software radio implementation for joint channel estimation, data decoding, and noise variance estimation in multiple-input multiple-output (MIMO) space division multiple access (SDMA). In contrast to many other iterative solutions, the proposed receiver is derived within the theoretical framework of a unified message-passing algorithm that combines belief propagation (BP) and the mean field (MF) approximation on the corresponding factor graph. Under appropriate conditions, the algorithm minimizes the region-based variational free energy of the system and hence converges to a fixed point. As a use case, we consider the high-rate packet-oriented IEEE 802.11n standard. Our receiver is implemented on a software-defined radio platform dubbed MIMONet, composed of a GNU radio software component and a universal software radio peripheral (USRP). The receiver was evaluated in real indoor environments. The results show that, once synchronization issues are properly addressed, the BP-MF receiver provides a substantial performance improvement over a conventional receiver in real-world settings as well. This improvement comes at the expense of an increase in running time by a factor of up to 87. Therefore, the trade-off between communication performance and receiver complexity should be carefully evaluated in practical settings.


Introduction
Multiple-input multiple-output (MIMO) technology is popular in wireless communications due to the increased spectrum efficiency brought by the use of multiple antennas in transmission, reception, or both. A further performance improvement is possible when MIMO technology is used in combination with orthogonal frequency division multiplexing (OFDM) modulation, namely, when different streams of information bits are modulated on orthogonal subcarriers.
The basic role of the receiver is to decode the information bits from the received signal, which is affected by various unknown factors such as the channel response and receiver noise. Receivers were originally designed to process the received signal in a cascaded fashion, starting with synchronization, then channel estimation, equalization, and finally decoding. Building on the intuition that (soft) information generated by one module can be re-used as a refined input to other modules, a multitude of receiver structures performing iterative, "turbo"-like processing have been proposed in the literature. Reported results show substantial performance gains compared to non-iterative architectures. However, separate design of the individual modules cannot guarantee global optimality or convergence. Moreover, it is not clear what type of information the modules should exchange and how to combine and process it.
In recent years, several works have looked at the receiver design from the perspective of Bayesian inference on graphical models [1]. The use of formal frameworks for approximate inference allows for a principled design of iterative receiver structures. Among the various algorithms for approximate inference [1], the belief propagation (BP) algorithm [2] (also referred to as the sum-product algorithm [3]) is the most celebrated one. The algorithm operates by passing messages on the so-called factor graph, which represents the factorization of the probabilistic model of the communication system. The algorithm computes exact marginal probability density functions (pdfs) only when the factor graph is cycle free; otherwise, it outputs approximations of the marginal pdfs. Still, it was shown to work well in many graphs with cycles, which led to its widespread use in digital communications [3,4]. However, BP usually yields intractable computations in probabilistic models with mixed discrete and continuous variables (see, e.g., [5]), which is also the case with our MIMO model.
Different from BP, the mean field (MF) approximation [1] is a variational inference method that has been successfully applied to continuous probabilistic models. The algorithm also admits a message-passing formulation [6] and typically has simple closed-form message computations, especially for conjugate-exponential models. Its main drawback is that, for some models, the provided solution is not accurate enough, due to the underlying approximation of the joint posterior pdf (see [1] for more details). When the probabilistic model contains both continuous and discrete variables and the dependencies between them are both deterministic and stochastic, it is advantageous to apply the BP and MF algorithms in those parts of the factor graph where they are most suitable. For this purpose, we employ the recently proposed unifying inference framework that combines BP and MF [7]. In a nutshell, the factor graph is divided into two parts, the BP part and the MF part, where BP-like and MF-like messages are computed, respectively. The framework states clear message computation rules, including specific expressions for the messages to be passed between the two parts. The unified nature of the algorithm resides in the fact that it iteratively optimizes a single objective function. Having derived a BP-MF-based MIMO-space division multiple access (SDMA) receiver in [8], we want to adapt and test it in real channels using software-defined radio (SDR).
SDR systems are becoming commonly used in the wireless networking research community due to their flexibility in rearranging different communication architectures with limited effort. Three key virtues of SDR are reconfigurability, intelligence, and flexibility [9]. To reduce hardware integration costs and, at the same time, to increase flexibility in implementing the physical layer, SDR systems run functional modules (such as synchronization, modulation/demodulation, coding/decoding, interleaving/de-interleaving, and channel parameter estimation) in software that is executed on personal computers. SDR technology therefore facilitates the implementation of reconfigurable radio systems in which the parameters of the aforementioned modules can be selected dynamically. The literature reports a number of SDR test-beds designed to test network-level protocols. Popular among them are the Wireless Open Access Research Platform (WARP) [10,11], developed at Rice University, and the Microsoft Research Software Radio (MS-SORA) [12]. Hydra [13], developed at the University of Texas at Austin, is an SDR test-bed comprising radio software by GNU (recursive acronym for "GNU's Not Unix") and the universal software radio peripheral (USRP) by Ettus Research™ [14]. The media access control (MAC) layer and the physical (PHY) layer design of Hydra implement the IEEE 802.11 distributed coordination function and a 2 × 2 MIMO-OFDM scheme based on the IEEE 802.11a/g standard, respectively. Since then, USRP hardware and GNU radio software have been used to implement and test a series of heuristic receivers at the PHY [15][16][17]. Other USRP/GNU radio implementations test a rate adaptation technique [18] and a random access protocol for MIMO networks [19]. To the best of our knowledge, there is only one publication related to a USRP hardware/GNU radio software implementation based on a theoretical framework: the expectation-maximization (EM) algorithm with a BP maximization step has been used in the context of OFDM physical-layer network coding (PNC) systems for phase tracking and single-user channel decoding [20].
This paper assesses the real performance of a MIMO-SDMA receiver performing joint multi-user data decoding, multi-channel estimation, and noise variance estimation (JDE), implemented on a self-made USRP/GNU radio test-bed dubbed MIMONet [21]. In contrast to many existing receiver implementations, ours is based on principled design, namely the combined BP-MF message-passing framework [7], where the virtues of BP and MF are kept but their respective drawbacks are avoided. Furthermore, the paper examines synchronization of (i) the USRP hardware, (ii) PHY burst and carrier at sample rate, and (iii) the CPU cores to process the individual data streams in parallel.
The rest of the paper is organized as follows. Section 2 presents an overview of the system under consideration, focusing on the PHY and MAC. As an example, we consider the high-rate packet-oriented IEEE 802.11n standard [22]. Section 3 presents the test-bed setup and addresses three levels of synchronization. Section 4 applies the combined BP-MF message passing to JDE. Section 5 evaluates the performance of the proposed scheme in real environments. A conventional MIMO receiver comprising a MIMO zero-forcing channel estimator and a maximum-likelihood sequence decoder is used as a benchmarking reference. We investigate the bit-error rate (BER), packet-error rate (PER), and the execution time per uncoded bit in both line-of-sight (LOS) and non-line-of-sight (NLOS) conditions. The results of our experiments clearly demonstrate the trade-off between receiver complexity and performance: if BP-MF is executed with a single iteration, its performance is worse than that of the conventional receiver. However, the performance of BP-MF improves drastically with the number of iterations: with five iterations, it is already consistently better than the conventional receiver, and a performance gain of up to 4 dB can be achieved after about 20 iterations, when BP-MF has converged.
Notation: In the following, $(\cdot)^{\dagger}$, $\Re(\cdot)$, and $\Im(\cdot)$ denote the conjugate transpose and the real and imaginary parts of a complex argument, respectively. The symbol $\angle$ denotes the argument of a complex number. The symbol $\mathrm{diag}\{\cdot\}$ denotes a square matrix with the argument along its main diagonal. The Hadamard product of two vectors is denoted by $\odot$. Moreover, $\|\cdot\|$ is the two-norm of the argument. Vectors and matrices in the frequency domain (time domain) are represented by boldface lowercase and uppercase Latin (Greek) letters, respectively, unless otherwise stated. The notation $\mathrm{col}\{\cdot\}$ represents the column vector with the elements in the argument as its entries. The symbol $\mathbf{0}_{m \times n}$ denotes the $m \times n$-dimensional all-zero matrix, whereas $\mathbf{I}_n$ represents the identity matrix of size $n \times n$. The pdf of a multivariate complex Gaussian random vector with mean $\mu$ and covariance matrix $\Sigma$ is denoted by $\mathcal{CN}(\cdot; \mu, \Sigma)$. The pdf of a Gamma distribution with shape $a$ and rate $b$ is denoted by $\mathrm{Ga}(\cdot; a, b)$.

System description
We consider a packet-oriented multi-stream MIMO-OFDM WLAN system with $N_T$ transmit (Tx) antennas and $N_R$ receive (Rx) antennas that implements multiple parallel, spatially segregated channels. As a working example, we refer to the IEEE 802.11n standard [22]. Each channel may support a separate data stream $k \in [1, K]$. Without loss of generality, we assume that antenna $k \in [1, N_T]$ is transmitting while antenna $r \in [1, N_R]$ is receiving. The $k$th information stream $\{u_k[i]\}$ comprises $L_u \in \mathbb{N}$ bits per frame. The information bit sequence $u_k$ is forward error correction (FEC) encoded with rate $R$, fed into the symbol interleaver of stream $k$, serial-to-parallel converted, and mapped onto a modulation alphabet of size $S_k = 2^{M_k}$, $M_k \in \mathbb{N}$; the resulting symbols modulate $N_a$ active carriers out of $N$. To ease channel estimation, $N_p$ pilot tones with indices in $\mathcal{P} \subset \{(N - N_a - N_p)/2 + 1 : (N + N_a + N_p)/2 + 1\}$ are multiplexed with the data. Specifically, the pilot sequence of stream $k$, $k \in [1, K]$, is repeatedly taken from the $k$th row of the $K \times K$ Walsh-Hadamard matrix, where $K$ is a power of two. Figure 1 illustrates the data and pilot placement.
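As an illustration of the pilot construction, the following minimal sketch builds a Hadamard matrix by the Sylvester recursion and repeats one row to fill a pilot sequence. The natural (Sylvester) row ordering, the 0-based stream index, and the function names `walsh_hadamard` and `pilot_sequence` are assumptions made for this example; the row ordering and the exact repetition pattern used in the test-bed are not specified in the text.

```python
def walsh_hadamard(order):
    """Return the K x K Hadamard matrix (K = 2**order) in Sylvester ordering.

    Assumption: natural (Sylvester) row ordering; the source does not state
    which ordering of the Walsh-Hadamard rows is used.
    """
    rows = [[1]]
    for _ in range(order):
        # H_2K = [[H_K, H_K], [H_K, -H_K]]
        rows = [r + r for r in rows] + [r + [-x for x in r] for r in rows]
    return rows


def pilot_sequence(stream, order, length):
    """Repeat row `stream` (0-based, hypothetical convention) up to `length` pilots."""
    row = walsh_hadamard(order)[stream]
    return [row[i % len(row)] for i in range(length)]


# Two streams (K = 2): the pilot sequences are orthogonal over one period.
p0 = pilot_sequence(0, 1, 4)
p1 = pilot_sequence(1, 1, 4)
cross_correlation = sum(a * b for a, b in zip(p0, p1))
print(p0, p1, cross_correlation)
```

The orthogonality of the rows is what allows a receiver to separate the per-stream channel contributions at the pilot positions.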
The composite data vector is multiplied by the $N$-point Fourier matrix $F \in \mathbb{C}^{N \times N}$, fed into a permutation matrix in $\mathbb{B}^{(N+P) \times N}$ that adds a cyclic prefix of length $P$, and sent over the channel. Without loss of generality, the relative propagation delays caused by the channel are incorporated in the channel impulse response (CIR). In receiver $r$, after analog-to-digital conversion, the equivalent time-discrete observation vector is given by (1), where $T$ denotes the OFDM symbol time duration. The Toeplitz channel matrix $A_{rk}(\tau_r) \in \mathbb{C}^{(N+P) \times (N+P)}$ is constituted of the static discrete-delay CIR $\alpha_{rk} \in \mathbb{C}^{L}$, $L \le P$. The distortion matrix $E(e) \in \mathbb{C}^{(N+P) \times (N+P)}$ accounts for the frequency offset (FO) of the (common) local reference oscillator. It has the form (2), with the phase $\varphi \triangleq \varphi(e) = 2\pi e / N$, where $e \in [-0.5, 0.5]$ is the FO normalized to the subcarrier spacing. The matrix $T_r(\tau_r) \in \mathbb{B}^{(N+P) \times (N+P)}$, containing the time offset (TO) $\tau_r \in \mathbb{N}_0$, is given by (3). This TO is mainly caused by the signal buffers in the receiving USRPs. Finally, the entries of the vector $\omega_r \in \mathbb{C}^{N}$ are independent circularly symmetric Gaussian random variables with variance $\sigma_w^2$. From (1), (2), and (3), the received unsynchronized time-discrete signal vector is FO-corrected, fed into the permutation matrix $P_r \in \mathbb{B}^{N \times (N+P)}$ to remove the cyclic prefix, Fourier-transformed, and finally TO-corrected by the diagonal matrix $D(\tau_r) \in \mathbb{C}^{N \times N}$ with the $m$th diagonal entry $D_{m,m}(\tau_r) = \exp\{j 2\pi \tau_r (m - 1)/N\}$, $m = 1, \ldots, N$, to yield the frequency-discrete vector in (4).
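The TO correction in the frequency domain can be illustrated with a small sketch. It assumes that the residual time offset stays within the cyclic prefix, so that it appears as a cyclic shift of the OFDM symbol; the direct DFT and the function names `dft` and `to_correct` are illustrative choices, and the subcarrier index is 0-based here (the text uses $m - 1$ with $m = 1, \ldots, N$).

```python
import cmath


def dft(x):
    """Direct N-point DFT (O(N^2)); adequate for a small illustration."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * m * t / n) for t in range(n))
        for m in range(n)
    ]


def to_correct(spectrum, tau):
    """Multiply subcarrier m by exp(j*2*pi*tau*m/N), i.e. apply diag D(tau)."""
    n = len(spectrum)
    return [
        spectrum[m] * cmath.exp(2j * cmath.pi * tau * m / n) for m in range(n)
    ]


# A symbol delayed cyclically by tau samples (assumption: offset within the CP).
x = [1 + 0j, 2 + 0j, 0j, -1 + 0j, 0.5j, 0j, 0j, 0j]
tau = 3
delayed = [x[(t - tau) % len(x)] for t in range(len(x))]

reference = dft(x)
recovered = to_correct(dft(delayed), tau)
max_error = max(abs(a - b) for a, b in zip(recovered, reference))
print(max_error)
```

A delay of $\tau$ samples multiplies subcarrier $m$ by $\exp\{-j 2\pi \tau m / N\}$, so the diagonal correction with the opposite phase sign restores the undelayed spectrum.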

Probabilistic model of the MIMO system
Suppose that exact estimates of the TO and FO are available at the receiver. Then, the signal vector in (4) simplifies to (5), where the components of $h_{rk} \triangleq W \alpha_{rk}$, with $W \in \mathbb{C}^{N \times L}$ denoting the truncated Fourier matrix, are samples of the frequency response of the channel between transmitter $k$ and receiver $r$. For the design of the receiver, we assume that the different channels are a priori mutually independent. Thus, we consider a prior pdf that factorizes over the channel vectors $h_{rk}$, where each factor is modeled as a complex Gaussian prior pdf with zero mean and the covariance given in (6). While this assumption does not take spatial correlation into account, it leads to lower-complexity channel estimation [8]. We thus prefer low complexity over performance of the algorithm. In (5), the frequency-domain noise vector $w_r$ has the pdf $p(w_r) = \mathcal{CN}(w_r; 0, \lambda_r^{-1} I_N)$, with $\lambda_r \triangleq 1/N_0$ being the noise precision.
As design criterion, we use the bit-wise MAP decision rule, which minimizes the BER. That is, the decoded value of the $i$th information bit of the $k$th stream is given by (7), where $p(u_k(i) \mid Y)$ is the marginal posterior pdf of bit $u_k(i)$ and $Y$ contains the observation vectors of all receive antennas referring to all OFDM symbols of a packet. The marginal posterior pdf required in (7) is computed from the joint posterior pdf of the variables in the probabilistic model of the system by marginalizing out all variables but the bit of interest. Collecting all unknown system variables (channel responses, noise precisions, data symbols, and information sequences) in a single set, invoking Bayes' rule, and using the system assumptions, the joint posterior pdf can be written as in (8). In the following, we introduce the notation for the factors in (8) and give their functional form. We define the "observation" factors $f_{Y_r}$ to be the likelihood of the corresponding variables, which follows from (5). The prior pdf of the noise precision at the $r$th receiver is represented by a dedicated prior factor. We choose a gamma prior pdf (i.e., a conjugate prior for the likelihood $f_{Y_r}$) and set its parameters to be non-informative. For the prior pdf of the channel vector $h_{rk}$, we write $f_{H_{rk}}(h_{rk}) \triangleq p(h_{rk})$, with $p(h_{rk})$ given by (6). A further factor stands for the deterministic operations of coding, interleaving, and modulation mapping performed in transmitter $k$. Finally, the last factor is the prior probability mass function of the $i$th information bit of transmitter $k$. We assume a uniform prior, i.e., the bit values are a priori equally probable.
It is helpful to visualize the probabilistic dependencies between the system variables by representing the factorization of the joint posterior pdf in a factor graph [3]. With the above definitions, the factor graph representing (8) is depicted in Fig. 2. Now that we have defined the probabilistic model, it is important to note that exact marginalization of (8) requires the evaluation of high-dimensional integrals that do not admit closed-form expressions. That is, direct marginalization is computationally intractable. Therefore, we use an approximate inference framework to compute estimates of the marginal pdfs of the variables, called beliefs. The MAP decision in (7) is then applied to the beliefs $b(u_k(i)) \approx p(u_k(i) \mid Y)$ of the information bits.
Fig. 2 Factor graph representation of the pdf factorization in (8)

Physical layer convergence protocol
To facilitate synchronization and automatic gain control, a preamble of length $L_i$ symbols, with $L_i \bmod N = 0$, is prepended to each OFDM packet.
The preamble follows the IEEE 802.11n high-throughput (HT) greenfield format without a legacy-compatible part, as sketched in Fig. 3. Notice that neither coding nor scrambling is applied to generate the preamble.
The offset binary phase-shift keying (BPSK)-modulated short training field (STF) of the $k$th Tx antenna comprises a sequence of identical training symbols, each of length $N/4$, extending over two OFDM symbols [22, Section 20.3.9]. The periodic structure of the STF is ideally suited for FO estimation. A subsequent BPSK-modulated first long training field (LTF), composed of two identical training symbols each of length $N$, assists the receiver in estimating the TO and the CIR of the channel between Tx antenna $k$ and Rx antenna $r$. The following legacy signal field (SIG) carries information on the HT packet format. Additional $N_T - 1$ high-throughput LTFs in $\mathbb{C}^{N_T (N+P)}$ are based on the same long training symbol as the first LTF in the preamble.
Fig. 3 Greenfield preamble structure adopted from the IEEE 802.11n standard [22]
For the other Tx antennas $k' \neq k$, a cyclic shift is applied to the above preamble structure to prevent unintentional beamforming when similar signals are transmitted on different spatial streams [22].
The receiver is unsynchronized and does not know the channel coefficients and the data sequences.

Test-bed setup and synchronization
We consider an $N_T \times N_R$ MIMO communication link between one Tx node and one Rx node, comprising two host personal computers (PCs) and $N_T + N_R$ USRPs, as illustrated in Fig. 4. Each node was realized with the components given in Table 1. At the hardware level of the SDR, each USRP contains a bank of ADC/DACs, a wideband radio front end, and a vertical antenna. At the software level of the SDR, digital signal processing is distributed between internal field-programmable gate arrays (FPGAs) and an external host PC.
The open source software framework GNU radio under GNU general public license (GPL) was adopted to realize the transceiver chain depicted in Fig. 4. The choice of GNU radio was motivated by its scalability, its flexibility in setting the signal processing components, and its wide user base.
Three different levels of synchronization are needed to realize MIMO communications on computer-hosted hardware: (i) synchronization of the USRPs; (ii) burst synchronization at sample level; (iii) synchronization of the CPU cores to process the individual data streams simultaneously.
In the sequel, we describe how we addressed synchronization at each level.

USRP hardware synchronization
To enable MIMO communications, the transceiver must incorporate the following two functionalities: (i) each USRP requires clock synchronization to derive the local oscillator frequency and timing synchronization to align the analog-to-digital converter/digital-to-analog converter (ADC/DAC) samples; (ii) all CPU cores have to align the digital signal streams in frequency [23] by means of the 10 MHz reference and in time by means of the pulse-per-second (PPS) timing reference. Our synchronizer, listed in Table 1, derives these references from GPS signals. In this way, PPS signals can be derived with an accuracy better than ±50 ns.
When two USRPs are located next to each other, the internal 10 MHz /1 PPS reference of one USRP can be used to synchronize the other with a MIMO cable.
Next, we present a low-complexity closed-form solution to joint ML time-offset and fractional frequency-offset estimation.

Synchronizer design
Let us return to the specific model in (1). Notice that the likelihood function in (11) exhibits a unique maximum, in contrast to that in [24].
A valid burst synchronization is achieved if $\theta$ is contained in the admissible region. Due to the model assumptions, the log-likelihood function in (11) takes the form (12). The covariance matrix is given by $E\{\omega \omega^{\dagger}\} = N_0 I$, where $I$ is the identity matrix. Expanding (12), we get (13). The log-likelihood function in (12) can first be maximized w.r.t. $e$, leading to (14). The integer part $I$ of the FO $e$ cannot be resolved at this stage. When, however, an external oscillator controls the clocks of the USRPs, $I = 0$, and $e$ is confined to the range $[-0.5, +0.5]$. Plugging (14) into (13), the log-likelihood function in (12) can now be maximized w.r.t. $\tau_1, \ldots, \tau_{N_R}$. Following this approach, we obtain (15). Inspecting (15), it can be seen that the TO of the $r$th receive stream is independent of the other $r - 1$ receive streams, which results in (16). Substituting (16) into (14) yields (17). The above burst synchronization algorithm is implemented in GNU radio software at the host PC.
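To illustrate how a repeated training symbol constrains the fractional FO, the sketch below uses the classic correlation between two identical consecutive symbols of length $N$. This is a textbook estimator shown for illustration only; it is not claimed to be the joint ML solution in (14) to (17), and the function name `estimate_fo`, the noise-free signal, and the test sequence are assumptions of this example.

```python
import cmath
import math


def estimate_fo(samples, period):
    """Estimate the FO normalized to the subcarrier spacing.

    Assumes samples[n + period] equals samples[n] up to the FO-induced phase
    rotation 2*pi*e (two identical training symbols of length `period` = N).
    The estimate is unambiguous only for e in (-0.5, 0.5).
    """
    corr = sum(
        samples[n + period] * samples[n].conjugate() for n in range(period)
    )
    return cmath.phase(corr) / (2 * math.pi)


N = 16
e_true = 0.2
# Arbitrary unit-modulus training symbol (hypothetical, not the 802.11n LTF).
base = [cmath.exp(2j * math.pi * (n * n) / N) for n in range(N)]
tx = base + base
# Apply the per-sample phase ramp phi(e) = 2*pi*e/N of the system model.
rx = [s * cmath.exp(2j * math.pi * e_true * n / N) for n, s in enumerate(tx)]

e_hat = estimate_fo(rx, N)
print(e_hat)
```

Because the two symbols are $N$ samples apart, the accumulated phase difference is $2\pi e$, which is why only the fractional part of the FO within $[-0.5, +0.5]$ can be resolved from this correlation.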

The resulting synchronized observation vector $y[n, i]$ is post-processed by MATLAB®.

Core synchronization
GNU radio 3.6.0 has an incorporated thread-per-block scheduler that allows each signal processing block in the flow graph to run in an independent thread. The thread associated with a block loops until the GNU radio code is terminated. In each loop, the thread calls the block's executor. If the block has available space in its output buffer and sufficient data in its input buffer, the executor requests signal processing on that block and then informs the neighboring block(s) about its new status. Thus, all blocks in the flow graph process incoming data chunk by chunk [25].
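The thread-per-block idea can be mimicked in a few lines of plain Python. The sketch below is a conceptual analogue only: it does not use the GNU radio API, and the names `run_block`, `run_flow_graph`, and `STOP` are hypothetical. Bounded queues stand in for the input and output buffers, so a block waits when its input is empty or its output is full.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the stream


def run_block(work, inbox, outbox):
    """Loop of one block thread: wait for a chunk, process it, pass it on."""
    while True:
        chunk = inbox.get()          # blocks until input data is available
        if chunk is STOP:
            outbox.put(STOP)
            return
        outbox.put(work(chunk))      # blocks while the output buffer is full


def run_flow_graph(chunks, works, depth=4):
    """Run a linear chain of blocks, one thread per block, and collect output."""
    queues = [queue.Queue(maxsize=depth) for _ in range(len(works) + 1)]

    def source():
        for chunk in chunks:
            queues[0].put(chunk)
        queues[0].put(STOP)

    threads = [threading.Thread(target=source)]
    threads += [
        threading.Thread(target=run_block, args=(w, queues[i], queues[i + 1]))
        for i, w in enumerate(works)
    ]
    for t in threads:
        t.start()

    out = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        out.append(item)
    for t in threads:
        t.join()
    return out


result = run_flow_graph(
    [[1, 2], [3, 4], [5]],
    [lambda c: [2 * v for v in c], lambda c: [v + 1 for v in c]],
)
print(result)
```

Each queue has a single producer and a single consumer, so the chunk order is preserved even though the blocks run concurrently.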

Iterative channel estimation and data decoding
In this section, we describe our proposed MIMO receiver algorithm, which recovers the information bits of the K data streams.The various receiver tasks-channel estimation, MIMO detection, and decoding-are jointly designed by formulating the bit recovery process as Bayesian inference on the probabilistic model of the underlying OFDM system.The resulting algorithm iteratively computes and passes messages on the factor graph representing the probabilistic model.After a fixed number of iterations (tunable parameter), the algorithm returns the most probable configuration of the bits transmitted in the K data streams, along with estimates of other unknown quantities, such as the channel responses and the noise power.

Application to MIMO receiver design
The factor graph in Fig. 2 is split into the MF and BP parts by taking into account the functional forms of the factors and the specificities of the MF and BP algorithms. The observation factors $f_{Y_r}$, the noise-precision prior factors, and the channel prior factors $f_{H_{rk}}$, together with all variable nodes connected to them, are placed in the MF part, as they form a conjugate-exponential model. Given that BP has successfully been used for demapping and decoding, the rest of the factor nodes and the variable nodes connected to them represent the BP part.

Computation of messages and beliefs
The belief of each of the variables approximates the variable's posterior pdf. In the forthcoming computations, the following statistics will occur:
• The mean and variance of the belief of each data symbol represent the (soft) estimate and uncertainty, respectively, of the symbol on the $i$th subcarrier of the $n$th OFDM symbol transmitted by the $k$th antenna. Note that for pilot subcarriers (i.e., $i \in \mathcal{P}$), the soft estimate equals the known pilot symbol and the uncertainty is zero.
• The mean and covariance matrix of the belief $b(h_{rk})$ of the respective vector of channel weights; the mean is denoted by $\hat{h}_{rk} \triangleq \langle h_{rk} \rangle_{b(h_{rk})}$.
• The mean $\hat{\lambda}_r \triangleq \langle \lambda_r \rangle_{b(\lambda_r)}$ represents the estimate of the precision of the $r$th receiver's noise.

Channel estimation
Obtaining the beliefs $b(h_{rk})$ corresponds to channel estimation and requires computing the MF messages related to the channel vectors. It can readily be shown that the message from the observation node $f_{Y_r}$ has a Gaussian form whose mean and covariance represent the estimates and their uncertainty when taking into account only the observations from the $n$th OFDM symbol and no prior information. Using the fact that multiplying Gaussian pdfs results in a Gaussian pdf, the belief $b(h_{rk})$, obtained by combining these messages with the prior, is again Gaussian and is fully described by its mean and covariance matrix.
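The combination of Gaussian messages can be illustrated for a single scalar channel coefficient. The sketch below is a simplification under stated assumptions: the receiver works with vectors $h_{rk}$ and covariance matrices, whereas this example treats one complex coefficient with scalar variances, and the function name `combine_gaussians` is hypothetical.

```python
def combine_gaussians(means, variances):
    """Product of scalar complex Gaussian pdfs in the same variable.

    The product is proportional to a Gaussian whose precision is the sum of
    the individual precisions and whose mean is the precision-weighted
    average of the individual means.
    """
    precision = sum(1.0 / v for v in variances)
    weighted_sum = sum(m / v for m, v in zip(means, variances))
    return weighted_sum / precision, 1.0 / precision


# Two per-OFDM-symbol estimates of the same coefficient with equal uncertainty.
mean, variance = combine_gaussians([1 + 1j, 3 + 1j], [0.5, 0.5])
print(mean, variance)

# Including a zero-mean prior pulls the estimate toward zero.
prior_mean, prior_variance = 0j, 1.0
post_mean, post_variance = combine_gaussians(
    [prior_mean, 1 + 1j, 3 + 1j], [prior_variance, 0.5, 0.5]
)
print(post_mean, post_variance)
```

The combined variance is always smaller than the smallest individual variance, which reflects how each additional OFDM symbol refines the channel estimate.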

Estimation of the noise precision
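The message $m^{\mathrm{MF}}_{f_{Y_r} \to \lambda_r}(\lambda_r)$ from the observation factor node is found to be proportional to a gamma pdf. Given that the prior is a non-informative gamma pdf, the belief of the noise precision at receiver $r$ equals the message $m^{\mathrm{MF}}_{f_{Y_r} \to \lambda_r}(\lambda_r)$. Therefore, $b(\lambda_r)$ is a gamma pdf whose rate is determined by the expected residual energy of the observations at receiver $r$, and the estimate $\hat{\lambda}_r$ of the noise precision is the mean of this belief.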
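A simplified numerical sketch of this step follows. It assumes that the residuals between the observations and their reconstruction are available as complex samples and, unlike the full MF update, it ignores the additional terms due to channel and symbol uncertainty; the function name `noise_precision_estimate` is hypothetical. For $M$ complex Gaussian observations, the likelihood is proportional to $\lambda^{M} \exp(-\lambda \beta)$ with $\beta$ the residual energy, i.e., to a gamma pdf with shape $M + 1$ and rate $\beta$.

```python
def noise_precision_estimate(residuals):
    """Mean of the gamma belief Ga(lambda; M + 1, beta) for M complex residuals.

    Assumptions: flat prior on the precision, and residual energy computed
    from point estimates only (uncertainty terms are neglected).
    """
    shape = len(residuals) + 1
    rate = sum(abs(r) ** 2 for r in residuals)
    return shape / rate


residuals = [1 + 0j, 0 + 1j, -1 + 0j]
lambda_hat = noise_precision_estimate(residuals)
print(lambda_hat, 1.0 / lambda_hat)  # precision estimate and implied noise power
```

For a large number of observations, the estimate approaches the inverse of the average residual power, as one would expect from a sample-variance estimator.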

MIMO detection and decoding
The messages from the observation nodes are given by (20). Note that the right-hand side of (20) is evaluated at the symbol constellation points. When normalized, those discrete messages carry extrinsic information on the different constellation points. These messages are passed to the BP part. They represent the input to the demappers and decoders, which compute messages related to the coded bits and information bits using BP. For example, applying BP to decode convolutional codes is equivalent to using the BCJR algorithm [26]. The messages returned by the BP part represent approximations of the a posteriori probabilities (APPs) of the symbols. These values are further passed to the MF part. The decoders also output BP messages toward the information bits. Given the prior pdfs, the beliefs of the information bits are then obtained from these messages.
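The evaluation of a Gaussian-shaped message at the constellation points, and the soft symbol statistics derived from a discrete distribution, can be sketched as follows. The exact expression (20) is not reproduced here: the Gaussian form in an observation `obs` with variance `obs_var`, the QPSK alphabet, and the function names are assumptions of this example. In the receiver, the soft mean and variance are computed from the decoder's APPs rather than from the extrinsic message alone.

```python
import math

# Unit-energy QPSK alphabet (assumption; Table 2 defines the actual settings).
QPSK = [complex(a, b) / math.sqrt(2) for a in (1, -1) for b in (1, -1)]


def symbol_probabilities(obs, obs_var, constellation=QPSK):
    """Evaluate exp(-|s - obs|^2 / obs_var) at each point and normalize."""
    log_w = [-abs(s - obs) ** 2 / obs_var for s in constellation]
    top = max(log_w)                      # subtract the maximum for stability
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def soft_symbol(probs, constellation=QPSK):
    """Mean (soft estimate) and variance (uncertainty) of a discrete belief."""
    mean = sum(p * s for p, s in zip(probs, constellation))
    second_moment = sum(p * abs(s) ** 2 for p, s in zip(probs, constellation))
    return mean, second_moment - abs(mean) ** 2


# An uninformative observation gives a zero soft symbol with unit variance.
flat = symbol_probabilities(0j, 1.0)
flat_mean, flat_var = soft_symbol(flat)

# A reliable observation close to one point concentrates the probability mass.
sharp = symbol_probabilities(QPSK[0], 0.01)
sharp_mean, sharp_var = soft_symbol(sharp)
print(flat_mean, flat_var, sharp_mean, sharp_var)
```

The zero-mean, unit-variance case corresponds to the initialization in which nothing is yet known about the data symbols.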

Outline of the iterative algorithm
We now define the iterative algorithm by specifying a schedule for the message computations.
The BP-MF algorithm needs to be initialized. First, the algorithm sets initial values for the noise precision estimates $\hat{\lambda}_r$. Then, the beliefs of those subvectors of $h_{rk}$ that correspond to the pilot indices $\mathcal{P}$ are computed successively for each $k$. Next, with $\hat{h}^{\mathrm{obs}}_{rk}[i] = 0$ and the associated precision initialized accordingly, the beliefs of the channel vectors are computed. Having obtained the initial estimates of the beliefs of the channel weights and noise precision, the algorithm then performs MIMO detection with the initial parameter setting $\hat{d}_k[n, i] = 0$ and an initial symbol variance, a scheme which resembles successive interference cancelation. The initial stage ends with demapping and decoding.
During subsequent iterations, messages for soft mapping, channel and noise precision estimation, MIMO detection, and demapping and decoding are computed.
After convergence, the information bits are determined by taking hard decisions based on the beliefs $b(u_k(i))$.

Performance evaluation in real environments
The performance of the BP-MF iterative receiver was experimentally evaluated on the premises of the Istituto di Informatica e Telematica (IIT) at the National Research Council (CNR), Pisa, Italy. The building is constructed of concrete with steel reinforcement and has wooden doors. The measurements were conducted on the first floor, in the Algorithms and Computational Mathematics laboratory of the institute. A map of this area is given in Fig. 5. Test cases include (i) a LOS link in an office environment, comprising windows, office furniture, and computers, and (ii) a non-LOS link crossing office rooms, bathrooms, and a corridor. Communication is based on the IEEE 802.11n standard for OFDM-MIMO [22]. The settings of the OFDM-SDMA system emulated on the test-bed are outlined in Table 2. The number of counted error events per measurement point is large enough to produce sufficiently tight confidence intervals. Hence, the confidence intervals are omitted in subsequent plots.
The BP-MF receiver was benchmarked against a (low-complexity) conventional MIMO receiver, composed of a linear MIMO channel filter using pilot-based channel estimates and a bank of individual ML sequence decoders. Specifically, the MIMO channel estimate is based on the least squares technique, given the Walsh-Hadamard pilot matrix and the synchronized observation matrix $y$. The composite MIMO channel estimate at all active tones, $\hat{H}$, is obtained by piecewise linear interpolation. The decorrelating MIMO multiuser detector outputs the signal $x \triangleq \hat{H}^{-1} y$.
In real environments, the SNR at the individual slicer is unknown. To obtain an estimate of the SNR nonetheless, we use the following histogram technique. To construct a histogram from the per-packet SNR values $\gamma_{b,k}[n]$, $n \in [1, Q]$, the data are split into bins of width 0.5 dB. Each bin contains the number of values in the data set that fall within that bin. For the sake of a fair comparison, both the conventional MIMO and the BP-MF receivers compute the SNR in the same way.
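The binning step can be sketched in a few lines. The left-closed bin edges aligned to multiples of 0.5 dB, the function name `snr_histogram`, and the sample values are assumptions of this example; the text only fixes the bin width.

```python
import math
from collections import Counter


def snr_histogram(snr_db, bin_width=0.5):
    """Count SNR values (in dB) per bin, keyed by the lower bin edge.

    Assumption: bins are [k * bin_width, (k + 1) * bin_width) for integer k.
    """
    counts = Counter(math.floor(v / bin_width) * bin_width for v in snr_db)
    return dict(sorted(counts.items()))


# Hypothetical per-packet SNR estimates in dB.
histogram = snr_histogram([1.1, 1.3, 1.6, 2.0, 2.4])
print(histogram)
```

Packets that fall into the same bin are then pooled when computing the BER and PER for that quantized SNR value.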
Experiments were conducted during nights or weekends, to avoid as much as possible dynamic interference with employees' movements and devices.

NLOS scenario
We first present the experimental results obtained in NLOS conditions. Without loss of generality, we focus on stream 1. Surprisingly, the BP-MF receiver was initially not able to decode the individual data streams at the output of the MIMO synchronizer (4), although it had done so in synthetic channels [8]. Closer examination revealed that the real channel is strongly correlated in space, a property that had not been accounted for in the underlying channel model of the BP-MF receiver. When it is included in the system model, spatial channel correlation can be jointly estimated and exploited to improve the accuracy of the estimates of the other system parameters and vice versa [8]. As this estimation problem is outside the scope of the paper, spatial correlation is subsequently suppressed by an equalizer with transfer function $\hat{H}^{-1}$ prior to MIMO reception. The same approach is pursued in the conventional MIMO receiver.
Figure 6 shows the bit-error rate (BER) vs. the quantized packet SNR per bit, with the number of iterations $i$ as a parameter. The payload size is $L_u = 512$ bytes. Inspecting Fig. 6, we see that with increasing iteration index $i$, the BER performance keeps improving until a minimum of the variational free energy is found. In both the low and the high SNR regimes, the BP-MF algorithm converges to a fixed point after around $i = 17$ iterations. After convergence, the proposed BP-MF receiver outperforms the conventional receiver over the entire SNR range, with a gain of up to 4 dB. This gain is achieved despite the mismatch between the real propagation conditions and those assumed by the model used to derive the BP-MF algorithm.
The packet-error rate (PER) vs. $[\gamma_b]$, with the number of iterations as a parameter, is reported in Fig. 7 for a packet length of $L_u = 512$ bytes. To achieve a typical target PER of 0.1, the BP-MF algorithm requires $[\gamma_b] \approx 2.1$ dB after convergence. The conventional MIMO receiver, in contrast, meets the target PER only at $[\gamma_b] \approx 5.4$ dB, leaving a gap of approximately 3.3 dB. The performance gap between the BP-MF and conventional MIMO receivers can be as high as 4 dB, depending on the target PER value.
With increasing payload size $L_u$, the PER performance of the BP-MF receiver degrades, as shown in Fig. 8. This is mainly because the individual bit errors are independent, so that the packet-error rate is $1 - (1 - \mathrm{BER})^{L_u}$, which grows with the number of bits $L_u$ in the packet. The target PER of 0.1 is met at $[\gamma_b] \approx 1.9$ (4.1) dB for $L_u = 96$ ($L_u = 1032$) bytes, corresponding to an SNR gap of 2.2 dB for a roughly tenfold increase in payload size. As also shown in Fig. 8, the (one-stage) conventional receiver, in contrast, has difficulties in handling large packet sizes.
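The dependence of the PER on the payload size under the independence assumption can be checked numerically. The sketch below assumes that the exponent counts payload bits (8 per byte) and uses a hypothetical BER value of $10^{-4}$ purely for illustration; it is not a measured operating point.

```python
def packet_error_rate(ber, payload_bytes):
    """PER = 1 - (1 - BER)^L for L independent bit errors (L in bits)."""
    bits = 8 * payload_bytes
    return 1.0 - (1.0 - ber) ** bits


ber = 1e-4  # hypothetical bit-error rate
per_small = packet_error_rate(ber, 96)
per_large = packet_error_rate(ber, 1032)
print(per_small, per_large)
```

At the same BER, the longer packet is far more likely to contain at least one bit error, so a higher SNR is needed to reach the same target PER.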

LOS scenario
We now discuss the experimental results obtained in LOS conditions. From Fig. 9, it can be seen that the BER curve of the BP-MF receiver is confined to a narrow range around BER = 0.2 for small SNR values up to $[\gamma_b] = 2$ dB, followed by a waterfall region for higher SNR values. Convergence is achieved after approximately $i = 10$ iterations. Generally speaking, the performance of the BP-MF receiver in the LOS condition is inferior to that in the NLOS condition, mainly because the number of degrees of freedom of the channel that can be exploited by the receiver is higher in the latter condition. The BP-MF receiver converges faster in LOS than in NLOS, because its convergence speed is inversely proportional to the number of degrees of freedom. The BP-MF receiver performs roughly 2 dB better than the conventional MIMO receiver over the entire SNR range.
The behavior of the PER curve in Fig. 10 is similar to that of the BER curve in Fig. 9. The waterfall region starts around $[\gamma_b] = 2$ dB. The target PER = 0.1 is met at $[\gamma_b] = 3.5$ dB, i.e., 1.4 dB higher than in NLOS. Notice that with increasing payload size, the PER performance of the BP-MF receiver still improves. However, the target of PER = 0.1 is met at $[\gamma_b] \approx 3.1$ (6.8) dB for $L_u = 96$ ($L_u = 1032$) bytes. The gap in SNR is now 3.7 dB (Fig. 11).
Finally, Fig. 12 shows the execution time per uncoded bit of the BP-MF receiver normalized to that of the conventional receiver. Payload sizes of $L_u = 96$ and $L_u = 1032$ bytes are considered. As the BP-MF receiver operates at symbol level, its per-bit execution time is roughly independent of the payload size; we therefore conclude that the per-bit computational complexity is linear in the number of iterations plus some offset. After the first iteration, the BP-MF algorithm has already required ten times more time than the conventional MIMO receiver, so the offset is ten. After convergence, typically at $i = 17$ iterations, the ratio is as high as 87.
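The two reported operating points (a ratio of 10 after the first iteration and 87 at $i = 17$) are enough to sketch the linear running-time model. The snippet below is a minimal illustration, assuming the ratio grows exactly linearly between these two points; the function names and the notion of a running-time budget are hypothetical and not part of the receiver implementation. Values at intermediate iterations are interpolations under this assumption, not measurements.

```python
import math

# Reported operating points (normalized per-bit execution time).
OFFSET = 10.0          # ratio after the first iteration
RATIO_AT_CONV = 87.0   # ratio at convergence
I_CONV = 17            # typical iteration count at convergence

# Assumed constant cost added by each iteration after the first one.
SLOPE = (RATIO_AT_CONV - OFFSET) / (I_CONV - 1)


def runtime_ratio(i: int) -> float:
    """Modeled running-time ratio after i >= 1 iterations."""
    if i < 1:
        raise ValueError("i must be at least 1")
    return OFFSET + SLOPE * (i - 1)


def max_iterations_within(budget: float) -> int:
    """Largest iteration count whose modeled ratio stays within budget."""
    if budget < OFFSET:
        return 0  # even a single iteration exceeds the budget
    return math.floor((budget - OFFSET) / SLOPE) + 1


if __name__ == "__main__":
    print(f"assumed cost per extra iteration: {SLOPE:.4f}")
    for budget in (20.0, 50.0, 87.0):
        print(f"budget {budget:5.1f} -> at most {max_iterations_within(budget)} iterations")
```

A model of this kind can help choose an iteration count when the receiver must trade decoding performance against a running-time constraint.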

Discussion and conclusions
In this paper, we investigated for the first time the trade-off between complexity, running time, and performance for an advanced, iterative MIMO-SDMA receiver operating in real-world conditions. The receiver was derived within a unified message-passing framework that combines belief propagation and the mean-field approximation. At each iteration, messages related to the channel parameters and noise precision are passed from the mean-field part to the belief-propagation part of the factor graph for the belief of the data, and vice versa. The latter part represents the probabilistic model of the communication system. The proposed receiver was implemented in USRP/GNU radio.
The study showed that substantial performance improvements with respect to a conventional receiver can be achieved (ranging from 2 to 4 dB depending on packet size and LOS conditions), but these benefits come at the cost of a per-bit decoding running time that increases linearly with the number of performed iterations. If full convergence (i.e., best performance) is required, the decoding running time can be as much as two orders of magnitude larger than that of a conventional receiver. However, substantial performance improvements can also be achieved with a smaller number of iterations, especially in a rich scattering environment (NLOS scenario). In summary, our study clearly shows that in practical settings the trade-off between receiver complexity, running time, and performance should be carefully evaluated to strike the best compromise between these metrics. The results presented in this paper should be considered a first step towards understanding the feasibility of deploying advanced, iterative MIMO-SDMA receivers in real-world conditions. Future work includes considering higher-order modulations, as well as more complex MIMO configurations, including distributed MIMO channels.

Fig. 4 A 2 × 2 MIMO-SDMA link with one transmitting node and one receiving node

where R is the data rate. When two subsequent training symbols are identical, the received vector $\zeta[\tau_r]$ at antenna $r$, $r \in [1, N_R]$, and its time-shifted version $\zeta[\tau_r + (N + P)T]$ are related by

$$\zeta[\tau_r + (N + P)T] = \exp\{j 2\pi e (N + P)/N\}\, \zeta[\tau_r]. \tag{10}$$

To obtain an ML-based synchronizer that handles frame and frequency synchronization, one might choose the parameter vector $\theta$ and the observation vector $Z$ as, respectively, $\theta = \{\tau_1, \ldots, \tau_{N_R}, e\}$ and $Z = \mathrm{col}\{Z_1, \ldots, Z_{N_R}\}$ with $Z_r = \zeta[\tau_r + (N + P)T] - \exp\{j 2\pi e (N + P)/N\}\, \zeta[\tau_r]$. The latter choice accounts for the time periodicity of the preamble. The synchronizer seeks to find the joint estimate on the respective symbols obtained from the decoders and soft mappers. The symbol beliefs b

Fig. 5 Map of the IIT department at CNR, Pisa. a LOS scenario. b NLOS scenario

Fig. 6 BER performance of the BP-MF receiver in an NLOS condition for different values of the number of iterations ($L_u = 512$ bytes)

Fig. 7 PER performance of the BP-MF receiver in an NLOS condition for different values of the number of iterations ($L_u = 512$ bytes)

Fig. 9 BER performance of the BP-MF receiver in a quasi-LOS condition for different values of the number of iterations ($L_u = 512$ bytes)

Fig. 11 PER performance of the BP-MF receiver in a quasi-LOS condition for different payload sizes ($i = 10$ iterations)

Table 2
Parameter settings of the considered OFDM-SDMA system