Adaptive antenna selection and Tx/Rx beamforming for large-scale MIMO systems in 60 GHz channels

We consider a large-scale MIMO system operating in the 60 GHz band that employs beamforming for high-speed data transmission. We assume that the number of RF chains is smaller than the number of antennas, which motivates the use of antenna selection to exploit the beamforming gain afforded by the large-scale antenna array. However, at the receiver only a linear combination of the receive antenna outputs is available; together with the large dimension of the MIMO system, this constraint makes it challenging to devise an efficient antenna selection algorithm. By exploiting the strong line-of-sight property of 60 GHz channels, we propose an iterative antenna selection algorithm based on discrete stochastic approximation that can quickly lock onto a near-optimal antenna subset. Moreover, given a selected antenna subset, we propose an adaptive transmit and receive beamforming algorithm based on the stochastic gradient method that makes use of a low-rate feedback channel to inform the transmitter about the selected beams. Simulation results show that both the proposed antenna selection and the adaptive beamforming techniques exhibit fast convergence and near-optimal performance.


Introduction
60 GHz millimeter-wave communication has received significant recent attention and is considered a promising technology for short-range broadband wireless transmission with data rates up to multiple gigabits per second [1][2][3][4]. Wireless communications around 60 GHz possess several advantages, including a large, clean, unlicensed bandwidth (up to 7 GHz), compact transceiver size due to the short wavelength, and less interference owing to high atmospheric absorption. Standardization activities have been ongoing for 60 GHz Wireless Personal Area Networks (WPAN) [5] (i.e., IEEE 802.15) and Wireless Local Area Networks (WLAN) [6] (i.e., IEEE 802.11). The key physical layer characteristics of this system include a large-scale MIMO system (e.g., 32 × 32) and the use of both transmit and receive beamforming techniques.
To reduce the hardware complexity, the number of radio-frequency (RF) chains employed (consisting of amplifiers, AD/DA converters, mixers, etc.) is typically smaller than the number of antenna elements, and antenna selection is used to fully exploit the beamforming gain afforded by the large-scale MIMO antennas. Although various schemes for antenna selection exist in the literature [7][8][9][10], they all assume that the MIMO channel matrix is known or can be estimated. In the 60 GHz WPAN system under consideration, however, the receiver has no access to such a channel matrix, because the received signals are combined in the analog domain prior to digital baseband by the analog beamformer or phase shifter [11]; it can only access the scalar output of the receive beamformer. Hence, it becomes a challenging problem to devise an antenna selection method based on this scalar alone rather than on the channel matrix. By exploiting the strong line-of-sight property of the 60 GHz channel, we propose a low-complexity iterative antenna selection technique based on the Gerschgorin circle and the stochastic approximation algorithm. Given the selected antenna subset, we also propose a stochastic gradient-based adaptive transmit and receive beamforming algorithm that makes use of a low-rate feedback channel to inform the transmitter about the selected beam.
* Correspondence: wangx@ee.columbia.edu. 3 Electrical Engineering Department, Columbia University, New York, NY, 10027, USA. Full list of author information is available at the end of the article.
The remainder of this paper is organized as follows. The system under consideration and the problems of antenna selection and beamformer adaptation are described in Section 2. The proposed antenna selection algorithm is developed in Section 3. The proposed adaptive transmit and receive beamforming algorithm is presented in Section 4. Simulation results are provided in Section 5. Finally, Section 6 concludes the paper.

System description and problem formulation
Consider a typical indoor communication scenario and a MIMO system with $N_t$ transmit and $N_r$ receive antennas, all with omni-directional patterns, operating in the 60 GHz band. Radio wave propagation at 60 GHz exhibits a strong line-of-sight (LOS) component as well as multi-cluster multi-path components because of the high path loss and the inability of diffusion [3,4]. Such a near-optical propagation characteristic also suggests a 3-D ray-tracing technique for channel modeling (see Figure 1), which is detailed in [12]. In our analysis, the transceiver can be any device defined in IEEE 802.15.3c [5] or 802.11ad [6], located at an arbitrary position within the room. For each location, possible rays in the LOS path and up to second-order reflections from the walls, ceiling, and floor are traced for the links between the transmit and receive antennas. In particular, the impulse response for one link is expressed in terms of the inter-cluster parameters, namely the amplitude, delay, and departure and arrival angles (in azimuth and elevation) of ray cluster i, together with a term describing the constitution of each cluster by the rays therein, which is governed by the intra-cluster parameters of the kth ray in cluster i. Some inter-cluster parameters are location related, e.g., the severe path loss in the cluster amplitude; some are random variables, e.g., the reflection loss, which is typically modeled as a truncated log-normal random variable with mean and variance associated with the reflection order [12] if linear polarization is assumed for each antenna. Besides, most intra-cluster parameters are randomly generated. On the other hand, owing to the short wavelength, it is reasonable to assume that the size of the antenna array is much smaller than the size of the communication area, which leads to similar geographic information for all links. This model naturally accounts for the strong and near-deterministic LOS component and the independent realizations from reflection paths in modeling the overall channel response.
In OFDM-based systems, the narrowband subchannels are assumed to be flat fading. Thus, the equivalent channel matrix between the transmitter and receiver, $H = [h_{ij}]$ with $i = 1, 2, \ldots, N_r$ and $j = 1, 2, \ldots, N_t$, is given by (3), where the entry $h_{ij}$ denotes the channel response between transmitter j and receiver i obtained by aggregating all $N_{\mathrm{rays}}$ traced rays between them at the delay of the LOS component, $\tau_0$, and $\alpha_{ij}^{(\ell)}$ is the amplitude of the ℓth ray in the corresponding link. Analytically, we can further separate the channel matrix in (3) into $H_{\mathrm{LOS}}$ and $H_{\mathrm{NLOS}}$, accounting for the LOS and non-LOS components, respectively, as in (4), where the Rician K-factor indicates the relative strength of the LOS component.
We assume that the numbers of transmit and receive antennas, i.e., $N_t$ and $N_r$, are large. However, the numbers of available RF chains at the transmitter and receiver, $n_t$ and $n_r$, are such that $n_t \ll N_t$ and/or $n_r \ll N_r$. Hence, we need to choose a subset of $n_t \times n_r$ transmit and receive antennas out of the original $N_t \times N_r$ MIMO system and employ these selected antennas for data transmission (see Figure 2). Denote by ω the set of indices corresponding to the chosen $n_t$ transmit antennas and $n_r$ receive antennas, and by $H_\omega$ the submatrix of the original MIMO channel matrix H corresponding to the chosen antennas.
For data transmission over the chosen MIMO system $H_\omega$, a transmit beamformer $w = [w_1, w_2, \ldots, w_{n_t}]^T$, with $\|w\| = 1$, is employed. The received signal is then given by
$$ r = \sqrt{\rho}\, H_\omega w s + n, \qquad (5) $$
where s is the transmitted data symbol; $\rho = E_s/(n_t N_0)$ is the system signal-to-noise ratio (SNR) at each receive antenna; $E_s$ and $N_0$ are the symbol energy and noise power density, respectively; and $n \sim \mathcal{CN}(0, I)$ is the additive white Gaussian noise vector. At the receiver, a receive beamformer $u = [u_1, u_2, \ldots, u_{n_r}]^T$, with $\|u\| = 1$, is applied to the received signal r to obtain
$$ y(\omega, w, u) = u^H r = \sqrt{\rho}\, u^H H_\omega w s + u^H n. \qquad (6) $$
For a given antenna subset ω and known channel matrix $H_\omega$, the optimal transmit beamformer w and receive beamformer u, in the sense of maximum received SNR, are given by the right and left singular vectors of $H_\omega$ corresponding to the principal singular value $\sigma_1(H_\omega)$, respectively. The optimal antenna subset $\hat\omega$ is then given by the antennas whose corresponding channel submatrix has the largest principal singular value. Letting S be a set each element of which corresponds to a particular choice of $n_t$ transmit antennas and $n_r$ receive antennas, we have
$$ \hat\omega = \arg\max_{\omega \in S} \sigma_1(H_\omega). \qquad (7) $$
One variation of the above antenna selection problem is that, instead of the numbers of available RF chains $(n_t, n_r)$, we are given a minimum performance requirement, e.g., $\sigma_1 \ge \nu$. The problem is then to find the antenna subset of minimum size whose performance meets the requirement.
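As a minimal illustration of the signal model described above, the following sketch evaluates the scalar receive beamformer output $y = u^H(\sqrt{\rho}\, H_\omega w s + n)$ for a toy 2 × 2 submatrix. The helper names (`matvec`, `inner`, `beamformer_output`) and the toy channel are hypothetical and chosen only for illustration; the noise vector is passed in explicitly so the example is deterministic.

```python
import math

def matvec(H, x):
    """Multiply a matrix H (list of rows) by a vector x."""
    return [sum(h * xj for h, xj in zip(row, x)) for row in H]

def inner(u, r):
    """Hermitian inner product u^H r."""
    return sum(ui.conjugate() * ri for ui, ri in zip(u, r))

def beamformer_output(H_sub, w, u, s, rho, noise):
    """Scalar output y = u^H (sqrt(rho) * H_sub w s + n)."""
    Hw = matvec(H_sub, w)
    r = [math.sqrt(rho) * x * s + n for x, n in zip(Hw, noise)]
    return inner(u, r)

# Toy example (assumed values): diagonal channel, beams aligned with the
# strongest mode, noise switched off.
H_sub = [[1 + 0j, 0j], [0j, 0.5 + 0j]]
w = [1 + 0j, 0j]          # unit-norm transmit beamformer
u = [1 + 0j, 0j]          # unit-norm receive beamformer
y = beamformer_output(H_sub, w, u, s=1 + 0j, rho=4.0, noise=[0j, 0j])
```

In this noiseless toy case the output magnitude equals $\sqrt{\rho}$ times the principal singular value of the submatrix, which is the quantity the antenna selection stage tries to maximize.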

Problem statement
Our problem is to compute the optimal antenna set $\hat\omega$ and the corresponding transmit and receive beamformers w and u for a ray-traced MIMO channel realization H. However, for the system under consideration, H is not available to us; we only have access to the receive beamformer output $y(\omega, w, u)$. This makes the straightforward approach of computing the singular value decomposition (SVD) of $H_\omega$ to obtain the beamformers impossible. Furthermore, the brute-force approach to antenna selection in (7) involves an exhaustive search over $\binom{N_t}{n_t}\binom{N_r}{n_r}$ possible antenna subsets, which is computationally expensive.
In this paper, we propose a two-stage solution to the above problem of joint antenna selection and transmit-receive beamformer adaptation. In the first stage, we employ a discrete stochastic approximation algorithm to perform antenna selection. By setting the transmit and receive beamformers to some specific values, this method computes a bound on the principal singular value of $H_\omega$ corresponding to the current antenna subset ω, and then iteratively updates ω until it converges.
Once the antenna subset ω is selected, in the second stage we iteratively update the transmit and receive beamformers w and u using a stochastic gradient algorithm. At each iteration, some feedback bits are transmitted from the receiver to the transmitter via a low-rate feedback channel to inform the transmitter about the updated transmit beamformer.
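To give a sense of scale for the exhaustive search, the short sketch below counts the candidate subsets for the configuration used later in the simulations ($N_t = 32$, $N_r = 10$, $n_t = n_r = 10$). The function name `num_antenna_subsets` is a hypothetical helper.

```python
from math import comb

def num_antenna_subsets(N_t, n_t, N_r, n_r):
    """Number of ways to pick n_t of N_t transmit and n_r of N_r receive antennas."""
    return comb(N_t, n_t) * comb(N_r, n_r)

size = num_antenna_subsets(32, 10, 10, 10)
print(size)  # 64512240
```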
In the next two sections, we discuss the detailed algorithms for antenna selection and beamformer adaptation, respectively.
Antenna selection using stochastic approximation and Gerschgorin circle

The stochastic approximation algorithm
As mentioned earlier, we can only observe $y(\omega, w, u)$ in (6), which is a noisy function of the channel submatrix $H_\omega$. On the other hand, the objective function to be maximized for antenna selection is the principal singular value of $H_\omega$, as in (7). If we can find a function φ(·) of y that is an unbiased estimate of $\sigma_1(H_\omega)$, then we can rewrite the antenna selection problem (7) as
$$ \hat\omega = \arg\max_{\omega \in S} E\{\phi(y(\omega, w, u))\}. \qquad (8) $$
In [10], the stochastic approximation method is introduced to solve problems of the form (8). The basic idea is to generate a sequence of estimates of the optimal antenna subset, where each new estimate is obtained from the previous one by moving a small step in a good direction towards the global optimizer. Through the iterations, the global optimizer is found by maintaining an occupation probability vector π, which holds an estimate of the occupation probability of each state (i.e., antenna subset). Under certain conditions, such an algorithm converges to the state that has the largest occupation probability in π. Compared with the exhaustive search approach, more computations are performed on the "promising" candidates, that is, the better candidates are evaluated more often than the others.
Due to the potentially large search space in the present problem, which not only limits the convergence speed but also makes it difficult to maintain the occupation probability vector, the algorithms in [10] can become inefficient. Here, we propose a modified version of the stochastic approximation algorithm that is more efficient to implement and, more importantly, fits naturally with a procedure for estimating the principal singular value of $H_\omega$ based only on the receive beamformer output $y(\omega, w, u)$.
Specifically, we start with an initial antenna subset $\omega^{(0)}$ and an occupation probability vector $\pi^{(0)} = [\omega^{(0)}, 1]^T$, which has only one element, with the first entry serving as the index of the antenna subset and the other entry indicating the corresponding occupation probability. We divide each iteration into $n_t + n_r$ sub-iterations, and in each sub-iteration we replace one antenna in the current subset ω with a randomly selected antenna outside ω, resulting in a new subset $\tilde\omega$ that differs from ω by one element. By comparing their corresponding objective functions, the better of the two subsets is retained and the occupation probability vector is updated accordingly. This procedure is repeated until all $n_t + n_r$ antennas are updated.
Instead of keeping records for all candidates, we dynamically allocate and maintain a record in π for the new subset found in each iteration. If a subset already has a record in π, the corresponding occupation probability is updated. Otherwise, a new element is appended to π with the subset index and its occupation probability. Such a dynamic scheme avoids a huge memory requirement, since in practice only a small fraction of all possible subsets is typically visited.
We replace the selected subset with the current subset if the current subset has a larger occupation probability in π; otherwise, the selected subset is kept unchanged. This completes one iteration.
In general, convergence is achieved when the number of iterations goes to infinity. In practice, when one subset is selected in a large number of consecutive iterations, say 100, the algorithm is regarded as convergent and terminated, and the last selected subset is taken as the global (sub)optimizer. Since most of the evaluations and decisions are made at the receiver, a low-rate and error-free feedback channel is assumed to coordinate the transmitter via feedback information. In each sub-iteration, the transmitter should know in advance which transmit antennas remain in the current subset (i.e., $\omega^{(n)}$) from the last sub-iteration (because the current subset might have been changed in the previous sub-iteration), so that it can generate a new subset by replacing one of them with a random transmit antenna outside $\omega^{(n)}$. Without feedback, an invalid situation might occur in which a transmit antenna that is already assigned to one RF chain in the current subset is selected again for another RF chain. In other words, feedback is necessary only in sub-iterations where the transmit antennas of the current subset changed during the update in the previous sub-iteration. This implies that the amount of feedback is rather limited.
The modified stochastic approximation algorithm for antenna selection is summarized in Algorithm 1. In what follows, we discuss the form of the objective function φ(·) in (8) and its calculation.
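The neighborhood search with a dynamically grown record described above can be sketched as follows. This is a simplified illustration, not the paper's Algorithm 1: it draws from a single antenna pool rather than separate transmit and receive pools, it uses a deterministic toy objective instead of the noisy estimate φ(·), and it replaces the occupation probability update (not reproduced in this text) with a plain visit count, which is an assumption. The names `select_subset` and `visits` are hypothetical.

```python
import random

def select_subset(antennas, n_sel, objective, n_iter, rng):
    """One-swap neighborhood search with a dynamically allocated visit record."""
    current = rng.sample(antennas, n_sel)
    visits = {frozenset(current): 1}   # record grows only for subsets actually visited
    selected = frozenset(current)
    for _ in range(n_iter):
        for k in range(n_sel):         # one sub-iteration per antenna position
            outside = [a for a in antennas if a not in current]
            candidate = list(current)
            candidate[k] = rng.choice(outside)   # differs from current by one element
            if objective(candidate) > objective(current):
                current = candidate              # keep the better subset
            key = frozenset(current)
            visits[key] = visits.get(key, 0) + 1 # assumed stand-in for the pi update
        # Selection step: adopt the current subset if it has been occupied more often.
        if visits[frozenset(current)] > visits[selected]:
            selected = frozenset(current)
    return selected, visits

# Toy objective (assumed): antenna index doubles as its gain.
rng = random.Random(0)
best, visits = select_subset(list(range(8)), 3, lambda s: sum(s), 500, rng)
```

Only the subsets that are actually visited get an entry in `visits`, which mirrors the memory argument made in the text: the record stays far smaller than the full search space.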

Estimating the principal singular value using Gerschgorin circle
The Gerschgorin circle theorem [13] gives a region in the complex plane within which all the eigenvalues of a square matrix lie. In this section, we show that a good approximation to the largest eigenvalue can be calculated as long as the Rician K-factor is high enough. By calculating the G-circles, a simple estimator φ(·) of the objective function in (8) is developed and employed in the stochastic approximation algorithm for antenna selection, i.e., Algorithm 1.
Denote the channel submatrix of the selected antenna subset by $H_\omega = [h_1, h_2, \ldots, h_{n_t}]$, where $h_k \in \mathbb{C}^{n_r \times 1}$ is the SIMO channel between the kth transmit antenna and the $n_r$ receive antennas in the subset ω. The correlation matrix of $H_\omega$ is then
$$ R_\omega = H_\omega^H H_\omega. \qquad (9) $$
Denote the eigenvalues of $R_\omega$ in descending order as $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{n_t}$. Then, according to the Gerschgorin circle theorem [13], each of these $n_t$ eigenvalues lies in at least one of the following circles
$$ \left| \lambda - \|h_k\|^2 \right| \le \varrho_k, \quad k = 1, \ldots, n_t, \qquad (10) $$
with the radius of the kth circle being
$$ \varrho_k = \sum_{j \ne k} \left| h_k^H h_j \right|. \qquad (11) $$
The above $n_t$ circles are centered on the positive real axis. Since the correlation matrix $R_\omega$ is positive semi-definite, all eigenvalues are located on the nonnegative real axis within these circles, as illustrated in Figure 3. Note that from (10) and (11), a circle with a larger center coordinate implies a larger channel gain for the corresponding transmit antenna, and a circle with a smaller radius implies a smaller channel correlation between the corresponding antenna and the other selected antennas. As seen from Figure 3, the right-most point among the $n_t$ circles is an upper bound for all eigenvalues, and this point can be used as the estimate of the largest eigenvalue of $R_\omega$. That is,
$$ \lambda_1 \le B_1 \triangleq \max_{1 \le k \le n_t} \left( \|h_k\|^2 + \sum_{j \ne k} \left| h_k^H h_j \right| \right). \qquad (12) $$
Since the principal singular value $\sigma_1$ of $H_\omega$ is related to $\lambda_1$ through $\lambda_1 = \sigma_1^2$, we can rewrite (7) as
$$ \hat\omega = \arg\max_{\omega \in S} \lambda_1(R_\omega). \qquad (13) $$
Note that $B_1$ is the maximum over the $\ell_1$ norms of the rows of $R_\omega$. In particular, letting $R_\omega = [r_{ij}]$, we have
$$ B_1 = \max_{1 \le i \le n_t} \sum_{j=1}^{n_t} |r_{ij}|. \qquad (14) $$
Next, we prove a lemma that provides a useful bound on $B_1$ and $\lambda_1$.
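The Gerschgorin bound can be checked numerically on a small example. The sketch below forms $R = H^H H$, computes $B_1$ as the largest row $\ell_1$ norm, and compares it with the largest eigenvalue obtained by power iteration. The function names and the toy 2 × 2 matrix are assumptions made for illustration only.

```python
def gram(H):
    """R = H^H H for H given as a list of rows (n_r x n_t)."""
    n_r, n_t = len(H), len(H[0])
    return [[sum(H[i][a].conjugate() * H[i][b] for i in range(n_r))
             for b in range(n_t)] for a in range(n_t)]

def gerschgorin_upper_bound(R):
    """B_1: right-most point of the Gerschgorin circles, i.e. max row l1 norm."""
    return max(sum(abs(x) for x in row) for row in R)

def largest_eigenvalue(R, iters=500):
    """Power iteration for the largest eigenvalue of a Hermitian PSD matrix."""
    n = len(R)
    v = [1.0 + 0j] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) ** 2 for x in w) ** 0.5
        if norm == 0:
            return 0.0
        v = [x / norm for x in w]
        lam = norm          # ||R v|| for unit v approaches lambda_1
    return lam

H = [[2 + 0j, 1 + 0j], [0j, 1 + 0j]]   # toy channel submatrix (assumed)
R = gram(H)                            # [[4, 2], [2, 2]]
B1 = gerschgorin_upper_bound(R)        # max(4 + 2, 2 + 2) = 6
lam1 = largest_eigenvalue(R)           # 3 + sqrt(5), about 5.236
```

Here the bound is loose by roughly 15%; it tightens as the off-diagonal correlations shrink relative to the diagonal gains.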
Lemma 1 For any semi-unitary matrix $U \in \mathbb{C}^{n_r \times r}$ such that $U^H U = I$, the bound (15) holds. To prove the lemma, we define $\tilde R_\omega = H_\omega^H U U^H H_\omega$ and let $\tilde R_\omega = [\tilde r_{ij}]$. We first note the chain of inequalities $B_1 \ge \lambda_1(R_\omega) \ge \lambda_1(\tilde R_\omega)$,
where the last inequality follows upon noting the positive semi-definite ordering $R_\omega \succeq \tilde R_\omega$. Next, we let $\|\tilde R_\omega\|_F \triangleq \sqrt{\mathrm{tr}(\tilde R_\omega \tilde R_\omega)}$ denote the Frobenius norm of $\tilde R_\omega$. Then, since the rank of $\tilde R_\omega$ is no greater than $\min\{n_t, r\}$, it can be readily verified that
$$ \lambda_1(\tilde R_\omega) \ge \frac{\|\tilde R_\omega\|_F}{\sqrt{\min\{n_t, r\}}}. \qquad (18) $$
Further, we have (19). Combining (18) with (19), we have the desired result. In our problem, only the receive beamformer output $y(\omega, w, u)$ in (6) is available. We obtain an approximation to the lower bound on $B_1$ and $\lambda_1$ given in the right-hand side (RHS) of (15) in the following way. For each transmit antenna in the subset ω, $k = 1, \ldots, n_t$, we set the transmit and receive beamformers as $w = e_k$ and $u = \frac{1}{\sqrt{n_r}} \mathbf{1}$, respectively, where $e_k$ is a length-$n_t$ column vector of all zeros except for the kth entry, which is one, and $\mathbf{1}$ is a length-$n_r$ column vector of all ones. The transmitted symbol is set as s = 1. Then, by (5)-(6), the corresponding receive beamformer output is given by (see Note 1)
$$ y(k) = \frac{1}{\sqrt{n_r}} \mathbf{1}^T h_k + u^H n. \qquad (20) $$
We now use the expression in (21) to approximately lower bound $B_1$ and $\lambda_1$.
Substituting (20) into (21), we obtain (22). Note that in the noiseless case, $B_2$ in (22) is equal to $\hat B_2$, defined in (23). Then, using Lemma 1 and its proof, we see that $\hat B_2$ is indeed a lower bound on $B_1$ as well as on $\lambda_1(R_\omega)$.
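For the single-beam probe used here ($r = 1$, $U = u$), one way to see why a sum of probe powers lower-bounds the largest eigenvalue is the short derivation below. It is a self-contained check under the stated assumptions, and it does not claim to reproduce the exact expressions in (21)-(23); the vector $g$ is notation introduced only for this sketch.

```latex
% Assumption: r = 1, U = u with \|u\| = 1, noiseless probing.
\begin{align*}
\tilde R_\omega &= H_\omega^H u\, u^H H_\omega = g\, g^H,
  \qquad g \triangleq H_\omega^H u, \quad g_k = h_k^H u, \\
\lambda_1(\tilde R_\omega) &= \|g\|^2 = \sum_{k=1}^{n_t} \left| u^H h_k \right|^2
  \quad \text{(rank-one matrix)}, \\
R_\omega - \tilde R_\omega &= H_\omega^H \left( I - u u^H \right) H_\omega \succeq 0
  \;\Longrightarrow\;
  \sum_{k=1}^{n_t} \left| u^H h_k \right|^2 \le \lambda_1(R_\omega) \le B_1 .
\end{align*}
```

Each term $|u^H h_k|^2$ is the noiseless power of the probe output obtained with $w = e_k$, so the sum can be formed from scalar beamformer outputs alone.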
In order to mitigate the noise, for each transmit beamformer $e_k$ we make multiple, say M, transmissions and denote the corresponding receive beamformer outputs as $y(k)^{(m)}$, $m = 1, \ldots, M$. A smoothed version of the estimator β(k), denoted $\tilde\beta(k)$, is then given by (24), and the final estimator of the lower bound on the principal eigenvalue of $R_\omega$, denoted $\tilde B_2$, is given by (25). It is easily seen that both the 1st-order and 2nd-order noise terms are averaged out in $\tilde B_2$, so that the limit in (26) holds as $M \to \infty$. Recall that in the stochastic approximation algorithm for antenna selection, at each iteration we sequentially update the transmit and receive antennas and compute the corresponding objective functions. The above approach for calculating the objective function fits naturally in this framework, since for each transmit antenna candidate we only need to transmit a pilot signal from it and then compute the corresponding $\tilde\beta(k)$. The complete antenna selection algorithm is summarized in Algorithm 1.
Remark-1: We note that a typical scenario at 60 GHz has a strongly LOS channel with $K \gg 1$ and one dominant path, so that $H_{\mathrm{LOS}} = a b^H$ is a rank-one matrix. Moreover, in many applications it is feasible to retain all receive antenna elements, so that the task reduces to selecting the optimal transmit antenna subset. In this case, neglecting $H_{\mathrm{NLOS}}$ and the background noise (which holds for $K, M \gg 1$), it can be verified that the transmit antenna subset that maximizes $\tilde B_2$ also results in the largest eigenvalue. In particular,
$$ \arg\max_{\omega \in S} \tilde B_2(\omega) \approx \arg\max_{\omega \in S} \lambda_1(R_\omega), $$
where we use $\tilde B_2(\omega)$ to denote $\tilde B_2$ evaluated for a particular subset ω, and where the approximation becomes exact in the limit of large K and M.
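The claim in Remark-1 can be checked on a toy rank-one channel. The sketch below assumes the noiseless probe metric takes the form $\sum_k |u^H h_k|^2$ with $u = \frac{1}{\sqrt{n_r}}\mathbf{1}$ (an assumption; the paper's exact estimator is given in its (23)-(25)), keeps all receive antennas, and compares the best transmit subset under this metric with the best subset under the true largest eigenvalue. The vectors `a` and `b` are arbitrary illustrative values, and the check requires $u^H a \ne 0$.

```python
from itertools import combinations

# Rank-one LOS channel H = a b^H (toy values, assumed).
a = [1 + 0j, 1j, 0.5 + 0j]                      # receive-side vector, n_r = 3
b = [0.2 + 0j, 1.0 + 0j, 0.7 + 0j, 0.1 + 0.3j]  # transmit-side vector, N_t = 4
n_r = len(a)
u = [1 / n_r ** 0.5] * n_r                      # equal-weight receive beamformer (real, so u^H = u^T)

def column(k):
    """k-th column of a b^H, i.e. the SIMO channel of transmit antenna k."""
    return [ai * b[k].conjugate() for ai in a]

def probe_metric(subset):
    """Assumed noiseless metric: sum over k of |u^H h_k|^2."""
    return sum(abs(sum(ui * hi for ui, hi in zip(u, column(k)))) ** 2
               for k in subset)

def largest_eig(subset):
    """For a rank-one submatrix, lambda_1 = ||a||^2 * sum_k |b_k|^2."""
    return sum(abs(x) ** 2 for x in a) * sum(abs(b[k]) ** 2 for k in subset)

subsets = list(combinations(range(len(b)), 2))  # choose n_t = 2 of 4
best_by_probe = max(subsets, key=probe_metric)
best_by_eig = max(subsets, key=largest_eig)
```

Both rankings are proportional to $\sum_{k \in \omega} |b_k|^2$ in the rank-one case, which is why they agree; with a non-negligible NLOS component the agreement is only approximate.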

Adaptive Tx/Rx beamforming with low-rate feedback
Once the antenna subset ω is chosen, the transmit and receive beamformers w and u are computed. As mentioned in Section 2, w and u should be chosen to maximize the received SNR or, alternatively, to maximize the power of the receive beamformer output in (6), $|y(\omega, w, u)|^2$, i.e.,
$$ (\hat w, \hat u) = \arg\max_{\|w\| = 1,\ \|u\| = 1} |y(\omega, w, u)|^2. $$
Since the channel matrix $H_\omega$ is not available, we resort to a simple stochastic gradient method for updating the beamformers.

Stochastic gradient algorithm for beamformer update
The algorithm for the beamformer update is a generalization of [14] and is described as follows. At each iteration, given the current beamformers (w, u), we generate $K_t$ perturbation vectors for the transmit beamformer, $p_j \sim \mathcal{CN}(0, I)$, $j = 1, \ldots, K_t$, and $K_r$ perturbation vectors for the receive beamformer, $q_i \sim \mathcal{CN}(0, I)$, $i = 1, \ldots, K_r$. Then, for each of the normalized perturbed transmit-receive beamformer pairs
$$ \left( \frac{w + \beta p_j}{\|w + \beta p_j\|},\ \frac{u + \beta q_i}{\|u + \beta q_i\|} \right), \qquad (32) $$
where β is a step-size parameter, the corresponding received output power $|y|^2$ is measured; the effective channel gain $|u^H H_\omega w|^2$ can be used as a performance metric independent of transmit power. Finally, the beamformers are updated using the perturbation vector pair that gives the largest output power at the receiver. The transmitter is informed about the selected perturbation vector by a $\lceil \log_2 K_t \rceil$-bit message from the receiver. The algorithm is regarded as convergent, and the iteration terminates, when the performance metric fluctuates below a tolerance threshold. The procedure is summarized in Algorithm 2.

Algorithm 1 Stochastic approximation algorithm for antenna selection

...
Replace the kth element in $\omega^{(n)}$ by a randomly selected antenna that is not in $\omega^{(n)}$ to obtain a new subset $\tilde\omega^{(n)}$ that differs from $\omega^{(n)}$ by only one element;
For a newly selected transmit antenna, transmit pilot signals from it and obtain the received signals $\{y(k)^{(m)}, m = 1, \ldots, M\}$;
For a newly selected receive antenna, sequentially transmit pilot signals from all transmit antennas and obtain the received signals;
Recalculate the objective function $\phi(\tilde\omega^{(n)})$ using (24)-(25).
...

SELECTION:
If $\pi^{(n)}(\omega^{(n)}) > \pi^{(n)}(\hat\omega)$ Then
Set $\hat\omega = \omega^{(n)}$;
EndIf
$\omega^{(n+1)} = \omega^{(n)}$; $\pi^{(n+1)} = \pi^{(n)}$;
EndFor
Algorithm 2 Stochastic gradient algorithm for beamformer update

INITIALIZATION:
Initialize $w^{(0)}$ and $u^{(0)}$
For n = 0, 1, ...

PROBING:
Generate $K_t$ and $K_r$ new beamformer vectors using (32) based on $w^{(n)}$ and $u^{(n)}$, respectively;
Evaluate the received power $|y|^2$ for each one of the $K_t K_r$ perturbed beamformer pairs;

UPDATE AND FEEDBACK:
Let $p_{j^*}$ and $q_{i^*}$ be the perturbation vectors that give the largest received power;
Feed back the index of the best transmit perturbation vector using $\lceil \log_2 K_t \rceil$ bits;
Update the beamformers: $w^{(n+1)} = \dfrac{w^{(n)} + \beta p_{j^*}}{\|w^{(n)} + \beta p_{j^*}\|}$, $u^{(n+1)} = \dfrac{u^{(n)} + \beta q_{i^*}}{\|u^{(n)} + \beta q_{i^*}\|}$;
EndFor
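The probing and update steps can be sketched in a few lines. The example below is a noiseless illustration under several assumptions: it measures the effective gain $|u^H H w|^2$ directly instead of a noisy $|y|^2$, it uses the conservative variant that keeps the current beamformers among the candidates (so the gain never decreases), and the toy channel, step size, and function names are chosen only for illustration.

```python
import math
import random

def normalize(v):
    n = sum(abs(x) ** 2 for x in v) ** 0.5
    return [x / n for x in v]

def gain(H, w, u):
    """Effective channel gain |u^H H w|^2."""
    Hw = [sum(h * x for h, x in zip(row, w)) for row in H]
    return abs(sum(ui.conjugate() * r for ui, r in zip(u, Hw))) ** 2

def perturb(v, beta, rng):
    """Normalized perturbed beamformer (v + beta p) / ||v + beta p||, p ~ CN(0, I)."""
    s = 0.5 ** 0.5   # per-component std so that each entry has unit variance
    p = [complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in v]
    return normalize([x + beta * pi for x, pi in zip(v, p)])

def adapt(H, K_t, K_r, beta, n_iter, rng):
    n_r, n_t = len(H), len(H[0])
    w = normalize([1 + 0j] * n_t)
    u = normalize([1 + 0j] * n_r)
    history = [gain(H, w, u)]
    for _ in range(n_iter):
        # Conservative update (assumption): candidate 0 is the current beamformer.
        W = [w] + [perturb(w, beta, rng) for _ in range(K_t - 1)]
        U = [u] + [perturb(u, beta, rng) for _ in range(K_r - 1)]
        j, i = max(((j, i) for j in range(K_t) for i in range(K_r)),
                   key=lambda ji: gain(H, W[ji[0]], U[ji[1]]))
        w, u = W[j], U[i]    # only the index j needs to be fed back to the transmitter
        history.append(gain(H, w, u))
    return w, u, history

H = [[2 + 0j, 1 + 0j], [0j, 1 + 0j]]          # toy channel, sigma_1^2 = 3 + sqrt(5)
K_t, K_r = 4, 4
feedback_bits = math.ceil(math.log2(K_t))     # bits needed to signal the chosen index
w, u, history = adapt(H, K_t, K_r, beta=0.2, n_iter=50, rng=random.Random(1))
```

Because the unperturbed pair is always among the candidates, `history` is non-decreasing and is bounded above by the squared principal singular value of the channel; in the aggressive variant this monotonicity would not be guaranteed.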

Implementation issues
We next discuss some implementation issues related to the above stochastic gradient algorithm for beamformer update.

Initialization
A good initialization can considerably speed up the convergence of the above stochastic gradient algorithm compared with random initialization. For the application considered in this paper, recall that the channel consists of a deterministic LOS component $H_{\mathrm{LOS}}$ and a random component. When the K-factor is high, the LOS component mostly determines the largest singular mode. Hence, we can initialize the transmit and receive beamformers as the right and left singular vectors of $H_{\mathrm{LOS}}$, respectively, which we call a hot start.

Parameterization
Since both w and u have unit norm, we can represent them using spherical coordinates. Consider $w = [w_1, w_2, \ldots, w_{n_t}]^T$ as an example. The expanded real vector $v = [\mathrm{Re}\{w^T\}, \mathrm{Im}\{w^T\}]^T$ is equivalent to a point on the surface of the $2n_t$-dimensional unit sphere. Thus, v can be parameterized by a $(2n_t - 1)$-dimensional vector ψ as follows [15]: $v_1 = \cos\psi_1$, . . .
Given the vector v or, equivalently, ψ, to obtain a new perturbed weight vector near v we can set an arbitrarily small ε > 0 and generate i.i.d. random variables $\{\delta_i\}_{i=1}^{2n_t - 2}$, which are uniformly distributed within $[-\frac{\varepsilon}{2}, \frac{\varepsilon}{2}]$, and another independent uniform random variable $\delta_{2n_t - 1} \in [-\varepsilon, \varepsilon]$. Then, new parameters $\hat\psi_i$ are obtained within predefined boundaries $[a_i, b_i]$. As a result, the uniform search for a better weight vector is confined to a fixed space defined by $[a_i, b_i]$, $1 \le i \le 2n_t - 1$, and the range of the perturbation depends on the definition of $\{\delta_i\}$. For example, given a hot start, the current weight vector may be very close to the optimizer, and it is then necessary to set a smaller search region and a finer perturbation.
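A possible realization of this angle-based parameterization is sketched below. It assumes the standard hyperspherical convention ($v_1 = \cos\psi_1$, $v_2 = \sin\psi_1\cos\psi_2$, and so on, with the last component a product of sines), which is consistent with the first component given in the text but is otherwise an assumption. Clamping the perturbed angles to $[a_i, b_i]$ is likewise an assumed way of respecting the boundaries, and all function names are hypothetical.

```python
import math
import random

def angles_to_unit_vector(psi):
    """Map 2n-1 angles to a point on the unit sphere in R^(2n) (assumed convention)."""
    v, s = [], 1.0
    for ang in psi:
        v.append(s * math.cos(ang))
        s *= math.sin(ang)
    v.append(s)
    return v

def to_complex_weights(v):
    """Undo v = [Re(w), Im(w)]: first half real parts, second half imaginary parts."""
    n = len(v) // 2
    return [complex(v[i], v[n + i]) for i in range(n)]

def perturb_angles(psi, eps, bounds, rng):
    """Add uniform perturbations; the last angle gets the wider range [-eps, eps]."""
    out = []
    last = len(psi) - 1
    for i, (ang, (lo, hi)) in enumerate(zip(psi, bounds)):
        half = eps if i == last else eps / 2
        out.append(min(hi, max(lo, ang + rng.uniform(-half, half))))  # assumed clamping
    return out

psi = [0.3, 1.2, 2.0]                         # n_t = 2, so 2*n_t - 1 = 3 angles
bounds = [(0.0, math.pi), (0.0, math.pi), (0.0, 2 * math.pi)]
v = angles_to_unit_vector(psi)
w = to_complex_weights(v)
psi_new = perturb_angles(psi, 0.1, bounds, random.Random(0))
w_new = to_complex_weights(angles_to_unit_vector(psi_new))
```

Working in angles keeps every candidate on the unit sphere by construction, so no explicit renormalization is needed after a perturbation.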

Parallel reception
Since at each iteration the best beamformer pair is chosen out of $K_t K_r$ combinations based on the corresponding output powers $|y|^2$, it would require $K_t K_r$ transmissions. In practice, instead of switching among different receive beamformers and making the corresponding transmissions for each transmit beamformer, we can set up $K_r$ parallel receive beamformers to obtain $K_r$ receiver outputs simultaneously. Then, only $K_t$ transmissions are needed for each iteration.

Conservative update
If all $K_t + K_r$ candidate beamformers at each iteration are generated anew, the algorithm is termed aggressive. On the other hand, a conservative strategy keeps the best transmit and receive beamformers from the previous iteration and generates $K_t - 1$ new transmit and $K_r - 1$ new receive beamformers for the current iteration. With a fixed step size and a single feedback bit, the advantage of the aggressive update is quicker convergence. With multiple feedback bits, however, this advantage is less significant. Therefore, the conservative update is preferable for finer performance upon convergence.

Simulation results
We consider an empty conference room with dimensions 4 m (L) × 3 m (W) × 3 m (H), in which a large-scale MIMO system with $N_t = 32$ transmit and $N_r = 10$ receive antennas operating in the 60 GHz band is randomly located. All the antennas are omni-directional with 20 dBi gain and vertical linear polarization. There are 10 available RF chains at both the transmitter and the receiver, i.e., $n_t = 10$ and $n_r = 10$. To generate the channel realizations, 3-D ray tracing is performed between the transceivers using the inter- and intra-cluster parameters specified for the conference room scenario in [12]. From the result of ray tracing, the $N_r \times N_t = 10 \times 32$ channel matrix is assembled using (3). The channel remains static during antenna selection and beamformer update. Note that the channels simulated in the sequel are covered by Remark-1 in Section 3.2. Also, an OFDM-based PHY is used as suggested in [5], where 512 subchannels divide a total bandwidth of 2.16 GHz. The default system SNR is assumed to be ρ = 60 dB. The insertion loss in signal power due to the switches between the RF chains and antennas is modeled as an extra 5 dB increase in noise figure.

Performance of antenna selection with fixed size
The performance of Algorithm 1 for antenna selection in a single run is shown in Figure 4. Both the G-circle estimates $\tilde B_2$ given by (24)-(25) and the actual largest eigenvalues of the selected antenna subsets are plotted for the first 200 iterations as a zoomed-in view. The number of transmissions for obtaining the smoothed estimate in (24) is M = 20. Since the search space is quite large, i.e., $\binom{32}{10} = 64{,}512{,}240$, in the same figure we also plot the largest eigenvalues of the best and the worst subsets among 1,000 randomly selected antenna subsets. Moreover, the single-run performance of the antenna selection algorithm in [10] is also shown. In Figure 5, the average performance over 100 runs of the above schemes is plotted over a larger span of iterations. Several observations are in order. First, the G-circle estimates are quite close to the actual largest eigenvalues, which validates the use of the G-circle as a metric for antenna selection in strong line-of-sight channels. Second, Algorithm 1 has a much faster convergence rate than the algorithm in [10], which at each iteration picks the next candidate subset randomly and independently of the current subset, whereas Algorithm 1 searches for the next candidate subset in the neighborhood of the current subset. Third, Algorithm 1 can lock onto a near-optimal antenna subset very quickly, e.g., in 10-20 iterations, and it significantly outperforms the exhaustive search over a large number (e.g., 1,000) of subsets.
Performance of antenna selection with variable size
Figure 6 shows the performance of the adaptive antenna selection given a minimum requirement, and [...] all of them meet the requirement, i.e., $\lambda_1 \ge 0.05$, we back up the current parameters (i.e., current iteration number, selected subset, probability vector, etc.), then terminate the current iteration and set $n_t \leftarrow n_t - 1$. If the condition is met again, a new backup is performed that simply replaces the previous one. As shown in Figures 6 and 7, $n_t$ keeps decreasing until the selected subsets do not meet the requirement for a number of iterations, e.g., 50, which means that the last $n_t$ is the desired minimum size $n_t^*$. Therefore, by restoring the last backup data, the terminated iteration in Algorithm 1 is resumed until the optimal antenna subset of size $n_t^*$ is found. In Figure 6, we show both the G-circle estimates and the exact largest eigenvalues of the selected subsets. Since the estimate provides a lower bound on both the largest eigenvalue and the G-circle bound, a margin should be taken into consideration when setting the minimum performance requirement, in order to guarantee that the actual performance of the selected subset meets the requirement with the minimum number of selected antennas.
Figure 8 shows one run of Algorithm 2 for adaptive transmit and receive beamforming upon a selected channel submatrix. The numbers of perturbations at the transmitter and receiver are $K_t = 16$ and $K_r = 16$, respectively; hence, the number of feedback bits is $\log_2 16 = 4$. The conservative update with step size 0.05 is used. The performance of Algorithm 2 with a random initialization and with a hot start is plotted, as well as the exact largest eigenvalue of the channel obtained by SVD. It is seen that the hot start can significantly speed up the convergence. In Figure 9, we compare the performance of Algorithm 2 with different numbers of feedback bits, i.e., $K_t = 2, 4, 8, 16$, and fixed $K_r = 16$. It is seen that by employing more feedback bits, the convergence rate can be significantly increased. Similar behavior can be seen if we fix $K_t$ and vary $K_r$.

Overall performance of adaptive antenna selection and beamforming
The effective channel gain, $|u^H H_\omega w|^2$, is a metric indicating the overall performance obtained by combining adaptive antenna selection with beamforming. In this simulation, the transceiver is dropped independently at 100 random locations in the room with a minimum distance of 30 cm, and we generate the channel realizations therein using the 3-D ray-tracing technique. Running the proposed adaptive algorithms for these drops, Figure 10 shows the resulting performance at different system SNRs. For comparison, the non-adaptive solutions, i.e., the best out of 1,000 random subsets and SVD, are also investigated. We have several observations. First, for both beamforming algorithms (Algorithm 2 and SVD), Algorithm 1 outperforms the best out of 1,000 random subsets in the high SNR region, but its performance is inferior at lower SNR. This is because, when the SNR is low, accuracy and reliability cannot be guaranteed in estimating the objective function value and ranking the subsets, which prevents the adaptive algorithms from converging to better solutions. Second, for the same reason, Algorithm 2 is inferior to SVD at lower SNR but approaches SVD at high SNR under both antenna selection strategies. This implies that the accuracy of the objective function estimate is a key factor that largely affects the convergence and overall performance. From (24), we see that it is feasible to increase M in order to guarantee the estimation accuracy and maintain the overall performance in the low SNR region.

Conclusions
We have proposed a sequential antenna selection algorithm and an adaptive transmit/receive beamforming algorithm for large-scale MIMO systems in the 60 GHz band. One constraint of the system under consideration is that the receiver can only access a linear combination of the receive antenna outputs, which makes traditional antenna selection schemes based on the channel matrix inapplicable. The proposed antenna selection method uses a bound on the largest singular value of the channel matrix based on the Gerschgorin circle. The method is particularly useful over the 60 GHz channel, which has a strong line-of-sight component, and it employs a discrete stochastic approximation technique to quickly lock onto a near-optimal antenna subset. We have also proposed an adaptive joint transmit and receive beamforming technique based on the stochastic gradient method that makes use of a low-rate feedback channel to inform the transmitter about the selected beam. Simulation results show that both the proposed antenna selection and the adaptive beamforming techniques exhibit fast convergence and near-optimal performance.

Note 1: In obtaining (20), without loss of generality, we have absorbed ρ into H.