Spatial attention and quantization-based contrastive learning framework for mmWave massive MIMO beam training
EURASIP Journal on Wireless Communications and Networking volume 2023, Article number: 69 (2023)
Abstract
Deep learning (DL)-based beam training schemes have been exploited to improve spectral efficiency through fast optimal beam selection in millimeter-wave (mmWave) massive multiple-input multiple-output (MIMO) systems. To achieve high prediction accuracy, these DL models rely on training with a large amount of labeled environmental measurements, such as mmWave channel state information (CSI). However, demanding a large volume of ground-truth labels for beam training is inefficient and often infeasible in practical mmWave massive MIMO systems because of the high labeling cost and the expertise required. Meanwhile, a complex environment critically degrades the continuous output of beam training. In this paper, we propose a novel contrastive learning framework, named self-enhanced quantized phase-based transformer network (SEQPTNet), for reliable beam training with only a small fraction of the CSI dataset labeled. We first develop a quantized phase-based transformer network (QPTNet) with a hierarchical structure that explores the essential features from the frequency and spatial views and quantizes the environmental components with a latent beam codebook to achieve a robust representation. Next, we design SEQPTNet, which comprises self-enhanced pretraining and supervised beam training. SEQPTNet is pretrained on unlabeled CSI using the contrastive information between the target user and other users, and the pretrained model then serves as the initialization for fine-tuning with a reduced volume of labeled CSI. Finally, the experimental results show that, with 5% labeled data, the proposed framework improves beam prediction accuracy and data rates compared to existing solutions. The proposed framework further enhances flexibility and relaxes the dependence of practical beam training on the quantity of label information.
1 Introduction
Millimeter-wave (mmWave) massive multiple-input multiple-output (MIMO) is one of the most critical technologies in fifth/sixth-generation (5G/6G) wireless communication systems because it provides the spectrum and spatial resources needed for high transmission rates [1,2,3]. The short wavelength of mmWave signals permits a massive number of antenna elements to be integrated into equipment of limited size at both the base station (BS) and user equipment (UE) sides [4]. In addition, the massive MIMO array can compensate for the more severe path loss of mmWave signals through highly directional beamforming, leading to stronger coverage, larger data rates, and improved reliability [5, 6].
Generally, mmWave massive MIMO arrays adaptively transmit signals via directional beams according to the wireless environment state [7,8,9,10,11]. To transmit beams with maximum gain, beam training is essential to identify the line-of-sight (LOS) or dominant channel path, yielding the optimal beam pair for the transceivers [7]. In practical beam training, candidate beams are defined by a finite-size codebook covering the intended angle range and are exhaustively searched to determine the optimal beam pair [8]. However, mmWave massive MIMO beam training is challenging because the large codebook size results in high computational overhead. To reduce the training overhead, [9] proposes hierarchical multi-resolution codebook solutions, where a low-resolution sub-codebook detects the candidate transmitting direction and the high-resolution codebook confirms the optimal beam pair. Another efficient beam training approach is interactive beam search: the schemes in [10] and [11] detect the direction of the LOS/dominant path from the mmWave channel estimate and select the optimal beam pairs.
The performance of beam training schemes, including alignment accuracy and overhead, depends heavily on the codebook design. The literature has shown that an adaptive hierarchical codebook can decide the codewords based on previous beam training results, with multiple main lobes covering a spatial region for one or more UEs [12]. The scheme in [13] efficiently generates the hierarchical codebook by jointly exploiting sub-arrays with partially active antenna elements. An adaptive and sequential alignment scheme was proposed in [14], demonstrating the relation between fast search time and the probability of error in the acquired beam directions through the extrinsic Jensen–Shannon divergence. The study in [15] developed a fast beam-sweeping algorithm based on compressive sensing (CS) to determine the minimum number of measurements required.
1.1 Related work
The conventional schemes can satisfy user demands but can hardly learn from experience to further improve their ability. Deep learning (DL) has recently elevated wireless systems research and beam training to new heights [16, 17]. These heuristic proposals depend on well-labeled channel state information (CSI), paired with the aligned gain of a predefined codebook, to construct a supervised learning (SL) framework. In this framework, the optimal beam can be predicted directly from the large volume of labeled knowledge, which reduces the training overhead and the effects of noise [18,19,20,21,22,23]. A beam selection scheme based on deep neural networks (DNN) was proposed in [18] that recognized the desired beam from the elementary relation between position and received signals with low alignment overhead. Meanwhile, the potential of DL to predict the optimal mmWave beam and correct the blockage status from sub-6 GHz channel information was demonstrated in [19], reducing beam training overhead and achieving reliable communication. The work in [20] trained a beamforming prediction network (BPNet) using supervised and unsupervised learning methods to optimize power allocation and predict virtual uplink beamforming (VUB), improving computational efficiency. An online learning-based training strategy in [21] utilized a large-volume DNN to obtain the offline model parameters and fine-tuned the DNN model according to the extra CSI measured in real time. An adaptive beam training scheme for calibration was proposed in [22] that estimated approximate CSI features with a convolutional neural network (CNN) and determined the optimal beam through the self-criticism of a long short-term memory (LSTM) network. Moreover, a non-deterministic beam training scheme was proposed in [23] that developed a binary coding scheme to represent the valid CSI and reduce the effect of noise.
Although DL-based studies have obtained impressive results, severe multipath interference may impair prediction accuracy when the angles of the paths in a local cluster are close to that of the dominant channel path.
CSI is informative for deciding the optimal beam because it reveals the dominant path during the training process. In [24], the authors proposed decomposing the multipath into several virtual components and using a hierarchical search to recover the dominant CSI for beam training. However, the estimation performance relies heavily on the training penalty. DL-based beamspace channel estimation was proposed in [25] to directly estimate the beamspace from the received signal, eliminating the need for a time-consuming beamforming process. A channel signature-based hybrid precoding design was proposed in [26] that uses a DNN to estimate the channel and perform hybrid precoding with low computational complexity. Furthermore, a dual timescale variational framework for mmWave beam training [27] addressed the beamforming direction in real time by training a deep recurrent variational autoencoder that takes into account both the historical channel information and the current channel conditions. In [28] and [29], LSTM is shown to further improve beam training by exploiting the implicit channel signature. The work in [28] inferred the optimal beam directions at a target BS in future time slots from the historical channel features of a mobile UE. The work in [30] indicated that spatial attention beam training can improve transmission reliability, and that the associative LSTM encoder extracts explicit channel features to improve training ability.
The existing solutions that exploit DL models for beam training share the following limitations. Firstly, most existing solutions perform inefficient feature extraction for frequency- and spatial-domain CSI. Direct vectorization of frequency and spatial information leads to a coarse representation. It is thus hard for DNN-based beam training to extract the CSI feature information effectively, resulting in low learning efficiency and poor beam prediction performance. Meanwhile, CNN-based models show strong ability in two-dimensional local feature extraction but suffer from a limited global view of the beamspace. In addition, sequence modeling can capture the relation between features from the frequency and spatial domains but can hardly learn over an extensive range. Secondly, environmental measurements can affect the continuous output of the dense network. The continuous representation of CSI is sensitive to noise and channel variation, resulting in incorrect beam prediction. Finally, the existing SL approaches require all CSI data to be labeled. However, labeling a large volume of CSI is unrealistic in practical mmWave massive MIMO systems because of the high labeling cost and the expertise required. Although CSI is easy to obtain, labeling the true optimal beam for rapidly changing CSI is impractical. Therefore, the shortage of labeled CSI can constrain the performance of existing DL-based beam training schemes.
1.2 Motivation
Typically, the CSI of massive MIMO contains the frequency dynamic response that exhibits spatial fluctuations in interconnected power grids. Because we concentrate on self-enhanced pretraining and SL with limited labeled data, it is critical to align the relation of the dominant paths across different subcarriers so that beneficial features are well explored. Since the training signals result from the transmitted signals and the propagation environment, tracking the UE movement and capturing local feature pairs have achieved promising results [22, 30]. However, these methods are incapable of capturing a global view of the spatial and frequency features. Consequently, we build on the benefits of existing works to develop a hierarchical DL architecture with two levels, where the first level tracks the spatial variation on different subcarriers and feeds the second level, which explores the global view for efficient beam training.
Moreover, the complexity and diversity of the wireless environment are challenging issues for DL-based beam training. Continuous feature extraction results in an uncontrollable representation that is sensitive to random disturbance terms. Existing works address phase quantization mainly in three ways: by designing with real-valued phase shifts and then applying quantization [31], by constructing an analog beamforming codebook [32], and by nonlinear mapping into binary phase quantization [33]. These approaches lack flexibility and robustness to noise and channel variations. To address this problem, we develop a configurable codebook that quantizes the continuous spatial feature while satisfying an equal AoD distribution over the categorical beams. Specifically, discrete codewords can suppress the effects of noise and channel variations and reflect the dominant factor of the CSI. Thus, we can obtain controllable quantized results from the codebook to improve the robustness of the hierarchical DL architecture.
DL has been suggested as a promising approach to address the nonlinear relationship between CSI and optimal beam prediction. As indicated in [21, 30], DL-integrated beam prediction is typically performed under an SL framework with perfect CSI annotation. However, it is challenging to acquire exhaustive annotation of massive CSI for DL-based beam training in realistic mmWave massive MIMO systems, which leads to high labeling costs and expertise requirements. The work in [34] proposes an unsupervised method that performs CSI reconstruction and accomplishes online SL beam training with a large dataset of labeled CSI. This approach is inefficient when labeled CSI is limited and, because each wireless environment is unique, it lacks extendability. We shed light on a novel contrastive learning framework that works with limited labeled CSI to mitigate the expertise requirements. Specifically, we leverage the uniqueness of the CSI at different UE locations to pretrain the model by identifying the contrastive information between the target UE and others.
1.3 Contributions
This paper proposes a novel contrastive learning framework, named self-enhanced quantized phase-based transformer network (SEQPTNet), for mmWave massive MIMO systems to enable reliable beam training from CSI measurements with limited labels. We first design a hierarchical DL architecture, named quantized phase-based transformer network (QPTNet), which sequentially extracts frequency- and spatial-domain features to enable effective representation. To obtain a reliable spatial representation, we also develop a latent beam codebook that aligns similar phase features of the frequency domain to explore the environmental components. Then, we propose the contrastive learning framework SEQPTNet as an extension of the QPTNet architecture. The SEQPTNet framework detects the relationship between the global beam signature and the distinctive CSI corresponding to user locations using contrastive environmental prediction. By leveraging the contrastive information of the unlabeled CSI, SEQPTNet is pretrained to serve as the initialization for fine-tuning with a limited amount of labels, which yields more accurate predictions. The main contributions of this paper can be summarized as follows:

We propose QPTNet, a hierarchical DL architecture with two levels, to enable effective representation of CSI in the frequency and spatial domains. The first level uses a gated recurrent unit (GRU) to model and extract the dependencies among cluster information in the frequency domain. The second level utilizes the spatial attention mechanism to extract a global beam signature based on the results of quantization.

We design a codebook-based phase quantization to explore the complicated environmental components for reliable spatial representation. This quantization method matches and aggregates similar phase features with a codeword, improving the robustness of the features against noise in the CSI. It converts the prior knowledge of continuous spatial features extracted from the frequency domain into the posterior knowledge of categorical beams.

We develop a contrastive learning framework, SEQPTNet, that benefits from the hierarchical QPTNet and the codebook-based phase quantization. This enhanced model further improves beam training accuracy with limited labeled CSI. To the best of our knowledge, this is the first study that introduces contrastive learning in beam training applications. SEQPTNet offers two benefits based on contrastive environmental prediction. Firstly, it can be pretrained without any label information by detecting the relationship between the global beam feature and positive/negative samples. Secondly, the similarity between a positive sample and the beam signature can effectively capture the spatial dynamic changes under long inter-frequency spans. SEQPTNet preserves the benefits of QPTNet and reduces the labeling cost.
The rest of the paper is organized as follows: Sect. 2 introduces the mmWave massive MIMO system model and the problem formulation. Section 3 explains the principle of spatial attention-associated beam prediction. Section 4 develops the hierarchical feature extraction scheme QPTNet and discusses the details of the codebook-based phase quantization. Section 5 proposes the contrastive learning framework SEQPTNet and the procedure of self-enhanced pretraining. We then present the simulation results in Sect. 6, followed by a conclusion of this study in Sect. 7.
Notations: \(\varvec{A}\) is a matrix; \(\varvec{a}\) is a vector; \(a\) is a scalar; \((\cdot )^{T}\) and \((\cdot )^{H}\) denote transpose and conjugate transpose, respectively, while \(|\cdot |\) denotes the magnitude operator. \(\mathfrak {R}(\cdot )\) and \(\mathfrak {I}(\cdot )\) denote the real and imaginary parts of a complex number, respectively. \(\mathcal{C}\mathcal{N}(0,\varvec{\Sigma })\) represents the zero-mean complex Gaussian distribution with covariance matrix \(\varvec{\Sigma }\); \(\sigma (\cdot )\) denotes the activation function of the neural network.
2 Methods/experimental
The purpose of this study is to tackle the annotation cost and spectral efficiency problems of mmWave massive MIMO systems with a novel contrastive learning framework, SEQPTNet. We consider a system in which a BS equipped with a massive MIMO array communicates with a UE in a complex environment with limited label information. Taking advantage of the contrastive structure, the proposed SEQPTNet framework pretrains with unlabeled data and transfers the model to a hierarchical QPTNet with spatial attention-based feature extraction. The proposed SEQPTNet can improve learning efficiency, preserve the stability of feature extraction, reduce the expertise required, and train with limited annotations. To analyze the performance of the framework experimentally, we generate the dataset with DeepMIMO and evaluate the proposed and competing methods in terms of training loss, success rate, and achievable rate.
3 System model and problem formulation
It is worth emphasizing that our proposal can be extended to various communication scenarios. First, the proposed contrastive learning framework can be developed for multi-user hybrid beamforming design, where the beam direction is unique to each user, which reduces the effects of multi-user interference, and where the labeling cost is likewise intractable. Second, the proposed contrastive learning framework can be applied in harsh wireless environments, e.g., the reconfigurable intelligent surface (RIS) scenario, where the beam direction of each user is aligned with the dominant AoD of the RIS, while the others are treated as negatives.
The overview of the proposed beam training scheme is illustrated in Fig. 1. The proposed DL-integrated beam prediction is typically performed at the BS side, where ample computational resources ensure fast prediction responses. For analytical simplicity, we consider downlink transmission from an mmWave massive MIMO BS to a single-antenna UE. For a two-dimensional (2D) mmWave channel where only azimuth angles are considered at both the BS and the UE, the Saleh–Valenzuela channel model is typically adopted, which can be formulated as
where \(L\), \(\beta _{l}\), \(\theta _{l}\), and \(\phi _{l}\) denote the number of channel paths and the channel gain, angle-of-arrival (AoA), and angle-of-departure (AoD) of the \(l\)th channel path, respectively.
Since the first channel path, corresponding to the LOS path, is typically dominant, recognizing the LOS path can improve the coverage of mmWave signals [30]. Although the number of resolvable channel paths is much smaller than the number of BS antennas, i.e., \(L \ll N_{t}\), it is still challenging to efficiently distinguish the LOS path because of the limited scattering of mmWave channels and the non-line-of-sight (NLOS) paths [35, 36]. The AoD and AoA of the \(l\)th path can be defined as \(\phi _{l} = 2d_{t}\sin \Phi _{l}/\lambda\) and \(\theta _{l} = 2d_{r}\sin \Theta _{l}/\lambda\), where \(\Phi _{l}\) and \(\Theta _{l}\) are the physical AoD and AoA of the \(l\)th path (LOS or NLOS), respectively; \(\lambda\) denotes the wavelength; and \(d_{t}=d_{r}=\lambda /2\) are the antenna spacings at the BS and the UE. In particular, both \(\Theta _{l}\) and \(\Phi _{l}\) are uniformly distributed within \([-\frac{\pi }{2},\frac{\pi }{2}]\). The transmit and receive array steering vectors can be expressed as
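A minimal numerical sketch of this channel model follows. It assumes the widely used uniform-linear-array steering vector \(\varvec{a}(N,u)=\frac{1}{\sqrt{N}}[1,e^{j\pi u},\dots ,e^{j\pi (N-1)u}]^{T}\), a \(\sqrt{N_{t}N_{r}/L}\) normalization, and unit-variance complex Gaussian path gains; these are common conventions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def steering_vector(n_ant, u):
    """ULA steering vector for normalized angle u = 2*d*sin(angle)/lambda (d = lambda/2)."""
    k = np.arange(n_ant)
    return np.exp(1j * np.pi * k * u) / np.sqrt(n_ant)

def sv_channel(n_t, n_r, n_paths, rng):
    """Draw one narrowband Saleh-Valenzuela channel of shape (n_r, n_t)."""
    # Physical angles are uniform in [-pi/2, pi/2], as stated in the text.
    aod = rng.uniform(-np.pi / 2, np.pi / 2, n_paths)
    aoa = rng.uniform(-np.pi / 2, np.pi / 2, n_paths)
    # Assumption: unit-variance complex Gaussian path gains.
    beta = (rng.standard_normal(n_paths)
            + 1j * rng.standard_normal(n_paths)) / np.sqrt(2)
    H = np.zeros((n_r, n_t), dtype=complex)
    for l in range(n_paths):
        a_r = steering_vector(n_r, np.sin(aoa[l]))
        a_t = steering_vector(n_t, np.sin(aod[l]))
        H += beta[l] * np.outer(a_r, a_t.conj())  # rank-one term per path
    # Assumption: sqrt(N_t * N_r / L) normalization.
    return np.sqrt(n_t * n_r / n_paths) * H

rng = np.random.default_rng(0)
H = sv_channel(n_t=64, n_r=4, n_paths=3, rng=rng)
```

Because each path contributes a rank-one term, the resulting matrix has rank at most \(L\), which is the sparsity that beam training exploits.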
With the mmWave channel matrix \(\varvec{H}\) given in (1), the received signal can be described as
where \(P\), \(\varvec{\omega }\in {\mathbb {C}}^{N_{r}\times {1}}\), and \(\varvec{f}\in {\mathbb {C}}^{N_{t}\times {1}}\) denote the transmit power, combiner, and beamformer, respectively. \(x\) is the transmitted data with unit power, i.e., \(\vert x \vert =1\), while \(\varvec{n} \sim \mathcal{C}\mathcal{N}(\varvec{0},\sigma ^{2}\varvec{I}_{N_{r}})\) denotes the additive white Gaussian noise (AWGN) vector with power \(\sigma ^{2}\). Typically, the beamformer and combiner neither increase nor decrease the power gain, i.e., \(\Vert \varvec{\omega }\Vert _{2}=\Vert \varvec{f}\Vert _{2}=1\). The achievable rate can be described by
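For orientation, the received signal and the achievable rate are commonly written in the following standard form, which is assumed here as a sketch consistent with the definitions above. The effective noise variance remains \(\sigma ^{2}\) because \(\Vert \varvec{\omega }\Vert _{2}=1\).

```latex
y = \sqrt{P}\,\varvec{\omega}^{H}\varvec{H}\varvec{f}\,x + \varvec{\omega}^{H}\varvec{n},
\qquad
R = \log_{2}\!\left(1 + \frac{P\,\lvert \varvec{\omega}^{H}\varvec{H}\varvec{f} \rvert^{2}}{\sigma^{2}}\right)
```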
To obtain the maximum achievable rate for a given \(\varvec{H}\), we conventionally perform beam training to find the optimal beam pair by optimizing \(\varvec{f}\) and \(\varvec{\omega }\) before data transmission (as shown in Fig. 1). This optimization problem can be formulated over the predefined codebooks \({\mathcal {F}}\) and \({\mathcal {W}}\) as the following equation:
Practically, it is impossible to directly reveal the ideal pair \(\varvec{f}^{op}\) and \(\varvec{\omega }^{op}\), since it is hard to handle the three independent matrices in (4).
A straightforward, exhaustive solution of (6) is to enumerate all candidate codewords of \(\varvec{f}\) and \(\varvec{\omega }\) through beam sweeping and select the pair with the largest achievable rate; an overview is shown in Fig. 1. Conventionally, the beamformer \(\varvec{f}\) is drawn from a predefined codebook \({\mathcal {F}}\triangleq \{\varvec{f}_{n}, n=1,2,\dots , N_{t}\}\) that includes \(N_{t}\) codewords corresponding to different AoDs at the inherent transmitting spatial resolution. Likewise, the combiner is drawn from \({\mathcal {W}}\triangleq \{\varvec{\omega }_{m},m=1,2,\dots ,N_{r}\}\), which includes \(N_{r}\) codewords corresponding to different AoAs at the inherent receiving spatial resolution. For each beam training test, the BS selects a codeword from \({\mathcal {F}}\) as the beamformer, which is paired with a combiner selected from \({\mathcal {W}}\) at the UE side. Generally, the discrete Fourier transform (DFT) codebook is a feasible option for the candidate beamformer \(\varvec{f}_{n}\) and combiner \(\varvec{\omega }_{m}\):
where \(\xi _{{t},n}\) and \(\xi _{{r},m}\) are the beam directions of the \(n\)th candidate beam at the BS and the \(m\)th candidate receive beam at the UE side. To span the whole angular domain at both the BS and the UE, \(\xi _{{t},n}\) and \(\xi _{{r},m}\) can be uniformly sampled in \((-\Xi _{t},\Xi _{t})\) and \((-\Xi _{r},\Xi _{r})\), i.e.,
To find the optimal solution of (6), the search space has size \(K \triangleq N_{t}N_{r}\), while the candidate beam pair can be denoted as
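The exhaustive search over all \(K = N_{t}N_{r}\) beam pairs can be sketched as follows. The DFT sampling grid, the rate expression \(\log _{2}(1+P|\varvec{\omega }^{H}\varvec{H}\varvec{f}|^{2}/\sigma ^{2})\), and the helper names are assumptions made for illustration. The toy channel is a single path aligned with known codewords, so the expected winner is known in advance.

```python
import numpy as np

def dft_codebook(n_ant):
    """Columns are unit-norm DFT beams; the uniform grid in (-1, 1) is an assumption."""
    k = np.arange(n_ant)[:, None]
    xi = (2 * np.arange(n_ant)[None, :] + 1 - n_ant) / n_ant
    return np.exp(1j * np.pi * k * xi) / np.sqrt(n_ant)

def beam_sweep(H, F, W, power=1.0, noise_var=1.0):
    """Test every (combiner, beamformer) pair and return the best indices and all rates."""
    gain = np.abs(W.conj().T @ H @ F) ** 2          # shape (N_r, N_t)
    rate = np.log2(1.0 + power * gain / noise_var)  # assumed achievable-rate form
    m, n = np.unravel_index(np.argmax(rate), rate.shape)
    return (int(m), int(n)), rate

n_t, n_r = 16, 4
F, W = dft_codebook(n_t), dft_codebook(n_r)
# Toy single-path channel aligned with combiner 2 and beamformer 5.
H = np.sqrt(n_t * n_r) * np.outer(W[:, 2], F[:, 5].conj())
best, rate = beam_sweep(H, F, W)
```

The cost of this search grows with \(N_{t}N_{r}\), which is the overhead that learning-based prediction aims to avoid.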
To evaluate the performance of beam training, the success rate is regarded as an important criterion in [13]. A trial whose selected index corresponds to the largest achievable rate is treated as successful; otherwise, the trial is considered to have failed. Hence, the success rate \(\gamma\) can be defined as the ratio of the number of successful trials \(N_\mathrm{{{Suc}}}\) to the total number of trials \(N_\mathrm{{{Tot}}}\) as the following equation
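Read literally, this verbal definition corresponds to the ratio below. It is written here as a plain fraction; whether the source additionally expresses it as a percentage is not stated.

```latex
\gamma = \frac{N_{\mathrm{Suc}}}{N_{\mathrm{Tot}}}
```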
This paper proposes a novel contrastive learning-based SEQPTNet for reliable beam training with low labeling cost and expertise requirements. Generally, the prediction of the optimal mmWave transmitting beam is performed at the BS side, and the same method can be easily extended to predict the optimal receiving beam. Considering severe multipath interference and inconsistent dominant beam prediction, we propose to adopt the phase quantization of periodically estimated CSI to reflect the relation of the cluster AoDs/AoAs of the UE and predict the optimal mmWave beam when mmWave beam training is required. Since the mmWave channels are considered to have identical LOS AoD/AoA and NLOS cluster AoDs/AoAs, for the AoDs we can rewrite the received signal (4) at the UE side as
where \(\varvec{n}_{eq}\) denotes the equivalent noise. Substituting (1) into (13) yields
We can quantify the correlation between the mmWave beam in (7) and the array steering vector in (2) as
where \(\psi _{n}=\sin \xi _{{t},n}-\sin \phi _\mathrm{{{LOS}}}\). The quantity \(q_{n}(N_{t},\phi _\mathrm{{{LOS}}})\) captures the relation between the beam direction \(\xi _{{t,n}}\) and the LOS direction \(\phi _\mathrm{{{LOS}}}\), and is regarded as a quantization error [12]. However, multipath interference and the approximation of \(\phi _\mathrm{{{LOS}}}\) may lead to inaccurate predictions.
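The dependence of this correlation on \(\psi _{n}\) can be checked numerically. The sketch below assumes that the correlation is the magnitude of the inner product between a DFT beam and the LOS steering vector, which for a half-wavelength uniform linear array reduces to the Dirichlet-kernel form \(\frac{1}{N_{t}}\left| \sin (N_{t}\pi \psi _{n}/2)/\sin (\pi \psi _{n}/2)\right|\). The function names are hypothetical, and the exact expression used in the source may differ.

```python
import numpy as np

def steering_vector(n_ant, u):
    """Unit-norm ULA steering vector for normalized angle u (half-wavelength spacing)."""
    k = np.arange(n_ant)
    return np.exp(1j * np.pi * k * u) / np.sqrt(n_ant)

def beam_correlation(n_t, xi, phi_los):
    """|f^H a| for a beam pointing at xi and a LOS path at phi_los (radians)."""
    f = steering_vector(n_t, np.sin(xi))
    a = steering_vector(n_t, np.sin(phi_los))
    return np.abs(np.vdot(f, a))  # vdot conjugates its first argument

def dirichlet_form(n_t, xi, phi_los):
    """Closed form of the same correlation in terms of psi = sin(xi) - sin(phi_los)."""
    psi = np.sin(xi) - np.sin(phi_los)
    if np.isclose(psi, 0.0):
        return 1.0  # perfectly aligned beam
    return np.abs(np.sin(n_t * np.pi * psi / 2) / (n_t * np.sin(np.pi * psi / 2)))

q_direct = beam_correlation(64, 0.30, 0.32)
q_closed = dirichlet_form(64, 0.30, 0.32)
```

The correlation equals one only when the beam direction matches the LOS direction, and it falls off quickly as \(|\psi _{n}|\) grows, which is why a small angular mismatch already costs beam gain for large \(N_{t}\).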
Since the number of candidate beams is finite, mmWave beam training is a multi-class classification task based on the beam codebook \({\mathcal {F}}\). Mathematically, the prediction is represented by the probability outputs of the training function \({\mathcal {Q}}(\cdot )\) as
where \(P_{n}\) is the predicted probability, and the index \({n}^{*}\) of the optimal beam is the one with the maximum probability in the output, given the parameters \(\varvec{W}\).
In this paper, we propose a novel spatial attention and quantization-based contrastive learning framework to achieve high learning efficiency with lower labeled data requirements and greater robustness. The proposed SEQPTNet framework includes two stages:
Training stage: We divide the training stage into two phases: self-enhanced pretraining and supervised training. Self-enhanced pretraining is an unsupervised procedure in which we use only the CSI as training samples to acquire knowledge of the environment variations through the contrastive learning framework. This phase concentrates on modeling the relation between the CSI features and their global beam signature to enhance the uniqueness of the features of the current user. In supervised training, a limited number of training samples are collected from the CSI and the transmitting beam information. The index of the optimal mmWave transmitting beam, decided by beam sweeping, is used as the classification label, and the corresponding CSI is used as the input. Different from existing works, we fine-tune the model from an initialization that already carries the knowledge of environmental variations acquired in the pretraining phase.
Predicting stage: When mmWave beam training is required, the BS predicts the optimal beam from the received signals using the well-trained model. With our proposal, the labeling cost and the expertise required become flexible, and beam prediction remains reliable under environmental variation.
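The pretraining phase can be illustrated with a generic contrastive objective. The sketch below uses the InfoNCE loss with cosine similarity, one common choice in contrastive learning. The description above does not specify the loss used by SEQPTNet, so the loss form, the temperature value, and the function name are all assumptions made for illustration.

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor (d,), one positive (d,), and K negatives (K, d)."""
    def unit(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    a, p, n = unit(anchor), unit(positive), unit(negatives)
    # Cosine similarities scaled by the temperature; the positive comes first.
    logits = np.concatenate(([a @ p], n @ a)) / temperature
    logits -= logits.max()  # numerical stability
    return float(-logits[0] + np.log(np.sum(np.exp(logits))))

rng = np.random.default_rng(0)
signature = rng.standard_normal(16)                      # stands in for a beam signature
same_user = signature + 0.05 * rng.standard_normal(16)   # feature of the target UE
other_users = rng.standard_normal((8, 16))               # features of other UEs
loss_aligned = info_nce(signature, same_user, other_users)
```

Minimizing such a loss pulls the representation of the target UE toward its own signature and pushes it away from those of other UEs, without using any beam label.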
For clarity, we first describe the spatial attention-associated beam prediction of the proposed hierarchical QPTNet. Then, we present the details of the codebook-based phase quantization for the spatial features and the procedure of QPTNet, including the first level for tracking the UE at different locations, the quantization, and the beam prediction decided by spatial attention. We finally introduce the proposed contrastive learning framework SEQPTNet and summarize the procedure of self-enhanced pretraining.
4 Spatial attention-associated beam prediction
This section presents the efficient spatial attention-associated beam prediction depicted in Fig. 2. Efficient perception of the environment from the observed CSI can significantly improve beam prediction accuracy. With the attention mechanism, we can infer the optimal beam response from implicit directions through its attention scores [37]. Moreover, the attention mechanism is also good at simultaneously capturing features from all implicit directions to form a comprehensive beam signature.
Beam gain generation module: Since the initial spatial-frequency channel measurement \(\varvec{H}\) is complex-valued, we first convert \(\varvec{H}\) into a real-valued \(\varvec{\tilde{H}}\) and normalize it by the maximum amplitude of its elements:
Then, the real-valued \(\varvec{\tilde{H}}\) concatenates the real and imaginary components in the spatial domain so that the beam gain can be generated from both simultaneously. Finally, the modified input channel can be denoted as \(\varvec{\tilde{H}} \in {\mathbb {R}}^{N_{f}\times {2N_{t}}}\).
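This preprocessing step can be sketched as follows, assuming the CSI for one UE is stored as a complex array with one row per subcarrier and one column per BS antenna. The function name and the order of the two operations are illustrative assumptions.

```python
import numpy as np

def preprocess_csi(H):
    """Map complex CSI of shape (N_f, N_t) to a real array of shape (N_f, 2 * N_t)."""
    H = H / np.max(np.abs(H))                        # normalize by the largest amplitude
    return np.concatenate([H.real, H.imag], axis=1)  # stack along the spatial axis

rng = np.random.default_rng(0)
H = rng.standard_normal((32, 64)) + 1j * rng.standard_normal((32, 64))
H_tilde = preprocess_csi(H)  # N_f = 32 subcarriers, N_t = 64 antennas
```

After normalization every entry lies in \([-1, 1]\), which keeps the input scale independent of the absolute path loss.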
To generate the transmitting beam gain in each antenna direction, we apply a linear projection of \(\varvec{\tilde{H}}\) onto the antenna space of dimension \(N_{t}\) through an embedding layer. To preserve the relative distance among the antennas, distinct beam gain embedding parameters are employed together with the linear projection results. The transmitting beam gains of \(\varvec{\tilde{H}}\) can be defined as
where \(\texttt {embedding}(\cdot )\) is a linear projection layer of dimension \(2N_{t} \times N_{t}\), and \(\varvec{B}_{g} \in {\mathbb {R}}^{N_{f}\times {N_{t}}}\).
Beam direction tagging module: To mark the inherent direction of the beamforming gains, we attach a beam direction tag \(\varvec{B}_{t}\) to each transmitting beam gain in (18), which is generated through a linear projection \(\varvec{B}_{t} \in {\mathbb {R}}^{N_{f}\times {N_{t}}}\) of each transmit antenna index. Combining this with (18), the input of the transformer encoder is \(\varvec{{X}}=\varvec{B}_{g}+\varvec{B}_{t}\).
Stacked attention module: The transformer encoder enables the spatial-frequency feature to learn an essential representation by applying the stacked attention module. In particular, the attention mechanism [37] can capture the relation to the specific LOS/dominant path direction while diminishing the effects of NLOS/subordinate paths by traversing all transmit antennas with the beam direction query matrix \(\varvec{Q}\), the beam gain key matrix \(\varvec{K}\), and the corresponding scoring value matrix \(\varvec{V}\). The stacked attention module devotes more focus to mutually important alignment degrees, derived from the coherence between the candidate beam gains and the beam directions of the local antennas, and learns which specific AoD information is more competitive than others, given the limited number of scatterers of the LOS and NLOS paths. Then, the query matrix \(\varvec{Q}\), key matrix \(\varvec{K}\), and value matrix \(\varvec{V}\) are generated by wide fully connected (FC) layers from the input signal \(\varvec{X}\) as
where \(\varvec{W}^{q_{i}},\varvec{W}^{k_{i}},\varvec{W}^{v_{i}}\) are the linear projection layers in the \(i\)th attention module. The attention operation can be viewed as a scaled coherence function (Fig. 2) that maps a beam direction query and a set of candidate beam gain pairs to a dependency relation. Specifically, we compute the dot product of the beam direction query matrix \(\varvec{Q}\) with the beam gain key matrix \(\varvec{K}\) and apply a \(\texttt {softmax}(\cdot )\) activation to obtain the pairing probabilities on the scoring value matrix \(\varvec{V}\), so that the beam gains can be precisely aligned with the LOS direction. Then, the output can be computed by
The context of the candidate pairs of beam gains and directions is extracted by the forward stacked attention process and weighted by the relevance score given by the softmax probability. Moreover, the close collaboration between beam gains and directions can effectively improve beam training performance.
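A single attention step of this kind can be sketched with the standard scaled dot-product form \(\texttt {softmax}(\varvec{Q}\varvec{K}^{T}/\sqrt{d_{k}})\varvec{V}\) from [37]. The single head, the random weights, the choice of treating the \(N_{f}\) rows of \(\varvec{X}\) as tokens, and the function names are assumptions made for illustration; the actual layer sizes are not given in this section.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, Wq, Wk, Wv):
    """One scaled dot-product attention step on tokens X of shape (tokens, features)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # pairing probabilities per query
    return scores @ V, scores

rng = np.random.default_rng(0)
n_f, n_t = 32, 64
X = rng.standard_normal((n_f, n_t))  # stands in for B_g + B_t
Wq, Wk, Wv = (rng.standard_normal((n_t, n_t)) / np.sqrt(n_t) for _ in range(3))
out, scores = attention(X, Wq, Wk, Wv)
```

Each row of `scores` sums to one, so every output token is a weighted average of the value vectors, with the weights given by the query-key coherence.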
Output module: Different from the dense FC layer-based prediction scheme, global average pooling (GAP) [38] is introduced to average the tokens produced by the stacked attention operations over the candidate beams, which can be written as
where \(\varvec{c}_{d} \in {\mathbb {R}}^{N_{t}\times {1}}, d = 1,2, \cdots , D\) is the output of the stacked attention module, and the beam signature vector is \(\varvec{c}_{t} \in {\mathbb {R}}^{N_{t}\times {1}}\). One advantage of GAP over the FC layer is that it summarizes the aligned information between beam gains and transmitting directions through an average; thus, it is more robust to spatial translations of the fused local features of gains and directions. In addition, the GAP layer has no extra parameters to optimize, which helps avoid overfitting. The resulting vector \(\varvec{c}_{t}\) is fed directly into the \(\texttt {softmax}(\cdot )\) layer, and the optimal beam is chosen as the one with the maximum probability under the global beam signature \(\varvec{c}_{t}\)
where \(\varvec{p}\) is the predicted probability and \(p_{n}\) is the element of \(\varvec{p}\). To train our model, the crossentropy loss is applied as the evaluation metric for the classification problem. The crossentropy loss can be expressed as
where \(y_{c}\) is the actual optimal mmWave beam described by onehot encoding as the classification label. If label c is identical to the optimal beam, \(y_{c}=1\); otherwise, \(y_{c}=0\).
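The output module described above can be sketched as follows. The sketch assumes that GAP is the element-wise mean of the D token vectors, that the beam is chosen as the arg max of the softmax output, and that the cross-entropy with a one-hot label reduces to the negative log-probability of the optimal beam. The function names and toy values are hypothetical.

```python
import math

def global_average_pooling(tokens):
    """Element-wise mean of D token vectors, each of length N_t."""
    D = len(tokens)
    return [sum(tok[n] for tok in tokens) / D for n in range(len(tokens[0]))]

def softmax(vec):
    """Numerically stable softmax over a list of floats."""
    m = max(vec)
    exps = [math.exp(x - m) for x in vec]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, optimal_beam):
    """Cross-entropy against a one-hot label: -log p[optimal_beam]."""
    return -math.log(p[optimal_beam])

# Hypothetical example: D = 3 tokens over N_t = 4 candidate beams.
tokens = [[2.0, 0.0, 1.0, 0.0],
          [4.0, 0.0, 1.0, 0.0],
          [3.0, 0.0, 1.0, 0.0]]
c_t = global_average_pooling(tokens)                 # global beam signature
p = softmax(c_t)                                     # predicted probabilities
best_beam = max(range(len(p)), key=p.__getitem__)    # index of the largest p_n
loss = cross_entropy(p, optimal_beam=0)
```

Because the pooling step has no trainable weights, only the attention modules upstream are updated by the loss, which matches the parameter-free property of GAP noted above.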
5 QPTNet-associated beam training
This section presents the hierarchical QPTNet for efficiently processing the CSI. QPTNet has two hierarchical levels and a codebook-based phase quantization procedure. The first level is an autoregressive encoder based on GRU, and the second level is a spatial attention encoder based on the transformer. The output of the GRU encoder can be quantized with a generated beam codebook. Finally, the spatial attention encoder extracts a global beam signature. The overview of QPTNet is illustrated in Fig. 3.
5.1 Codebook-based phase quantization
To analyze the components of the propagation environment, we attempt to capture the implicit relation of cluster AoDs by a discrete phase-based quantization method. We define a latent beam codebook \({\mathcal {G}}\) including \(N_{CB}\) codewords \(\varvec{g}_{i} \in {\mathbb {C}}^{N_{t} \times 1}, i = 1,2, \cdots , N_{CB}\) (i.e., \(N_{CB}\)-way categorical beams). The latent beamspace is initialized with randomly sampled angles \({\bar{\phi }}_{i} \in [-\frac{\pi }{2},\frac{\pi }{2}]\), and the normalized spatial frequency \({u}_{i}\) is defined as
where \(u_{i} \in [-1/2,1/2]\) for \(\lambda /2\) element spacing. Intuitively, the beam vector can be generated by the DFT of \(\varvec{u}\) at points separated by \(1/N_{CB}\). Note that \(N_{CB} \ge N_{t}\) and there are \(N_{CB}\) categorical beam vectors \(\varvec{g}_{i}\). Thus, the latent beamspace can be described as
Since we only consider the spatial power spectrum, the elements of (25) are adjustable with the network training.
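The codebook initialization described above can be sketched as follows. The sketch assumes the common ULA relations \(u_{i} = (d/\lambda )\sin {\bar{\phi }}_{i}\) with \(d = \lambda /2\) and a unit-norm steering vector with entries \(e^{j2\pi n u_{i}}/\sqrt{N_{t}}\). The sign convention, the normalization, and the names `spatial_frequency`, `beam_vector`, and `init_codebook` are assumptions rather than the paper's definitions.

```python
import cmath
import math
import random

def spatial_frequency(phi, spacing_in_wavelengths=0.5):
    """Normalized spatial frequency u = (d / lambda) * sin(phi)."""
    return spacing_in_wavelengths * math.sin(phi)

def beam_vector(u, n_t):
    """Unit-norm ULA response for spatial frequency u (assumed convention)."""
    scale = 1.0 / math.sqrt(n_t)
    return [scale * cmath.exp(2j * math.pi * n * u) for n in range(n_t)]

def init_codebook(n_cb, n_t, seed=0):
    """Sample n_cb angles in [-pi/2, pi/2] and build one codeword per angle."""
    rng = random.Random(seed)
    angles = [rng.uniform(-math.pi / 2, math.pi / 2) for _ in range(n_cb)]
    codebook = [beam_vector(spatial_frequency(phi), n_t) for phi in angles]
    return angles, codebook

# Hypothetical sizes chosen so that N_CB >= N_t.
angles, codebook = init_codebook(n_cb=16, n_t=8)
```

In a trainable implementation, the codeword entries would be treated as learnable parameters after this initialization, in line with the statement that the elements of (25) are adjustable during network training.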
As shown in Fig. 3(a) and (b), the GRU encoder yields a continuous spatial feature matrix \(\varvec{Z}\) with column vectors \(\varvec{z}_{l} \in {\mathbb {C}}^{N_{t} \times 1}, l = 1,2, \cdots , N_{f}\). Next, we quantize the continuous feature \(\varvec{z}_{l}\) into the categorical beam \(\varvec{g}_{i^{*}}\) based on the Euclidean distance \(d_{l,i}\). The categorical beam index is expressed as
The minimal \(d_{l,i^{*}}\) ensures that the continuous spatial features corresponding to the categorical beams are within a neighboring zone of the latent beamspace. Meanwhile, the distance calculation efficiently reflects the relation of cluster AoDs to represent the components of the environment. The quantization approach also performs clustering by mapping each feature to the nearest codeword. The posterior categorical beam distribution \(p(\varvec{\hat{z}}_{l}= \varvec{g}_{i} \mid \varvec{\tilde{H}},\varvec{\bar{\phi }}_{i})\) is formulated as
During the forward pass, we define \(\varvec{\hat{Z}} = [\varvec{\hat{z}}_{1}, \varvec{\hat{z}}_{2},\cdots , \varvec{\hat{z}}_{N_{f}}]\) to represent the discrete angular beam responses. The vectors \(\varvec{\hat{z}}_{l} = \varvec{g}_{i^{*}}\) are then selected corresponding to the approximate AoDs \(\varvec{\bar{\phi }}_{i}\). Different from the setup in Section 3, we acquire the optimal beam prediction based on the attention scoring results of \(\varvec{\hat{Z}}\) to enhance the ability of environmental representation.
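The nearest-codeword quantization described above can be sketched as follows. The sketch assumes that \(i^{*}\) minimizes the Euclidean distance between \(\varvec{z}_{l}\) and the codewords and that the posterior is one-hot at \(i^{*}\), as in the vector-quantization scheme of [39]. The toy codebook, the features, and the helper names are hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two complex vectors of equal length."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))

def quantize(z, codebook):
    """Return (i_star, g_i_star): the index and codeword nearest to z."""
    distances = [euclidean_distance(z, g) for g in codebook]
    i_star = min(range(len(codebook)), key=distances.__getitem__)
    return i_star, codebook[i_star]

def one_hot_posterior(i_star, n_cb):
    """Assumed posterior: probability 1 on the nearest codeword, 0 elsewhere."""
    return [1.0 if i == i_star else 0.0 for i in range(n_cb)]

# Hypothetical codebook with N_CB = 3 codewords of length N_t = 2.
codebook = [[1 + 0j, 0j], [0j, 1 + 0j], [0.5 + 0.5j, 0.5 - 0.5j]]
# Hypothetical continuous features z_l for N_f = 2 subcarriers.
Z = [[0.9 + 0.1j, 0.1j], [0.4 + 0.6j, 0.5 - 0.4j]]

indices = [quantize(z, codebook)[0] for z in Z]
Z_hat = [codebook[i] for i in indices]  # discrete beam responses
posteriors = [one_hot_posterior(i, len(codebook)) for i in indices]
```

Mapping every feature to its closest codeword is also what makes the step behave like clustering: all features that fall in the same neighborhood of the latent beamspace share one categorical beam.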
5.2 Hierarchical learning procedure
The proposed QPTNet contains two hierarchical levels for extracting the features from frequency and spatial domains. For the first level, we apply a GRU encoder to extract the relations between two domains and the spatial feature \(\varvec{Z}\). We employ the GRU module as the instant feature extractor due to its high training efficiency and similarity to the LSTM network in temporal sequence learning. The GRU encoder synchronizes the CSI estimation through its gate mechanism, which controls the spatial feature across subcarriers. In processing \(\varvec{\tilde{H}}\) at each subcarrier index, the GRU encoder takes in the channel delay response at the current subcarrier as well as the shared hidden state and output from the previous subcarrier step \((\varvec{z}_{l-1},\varvec{\Theta })\) and updates its hidden state \(\varvec{\Theta }\) at the current subcarrier index, resulting in a new \(\varvec{z}_{l}\). The process can be described by
where \({\mathbb {P}}(\cdot )\) represents the conditional probability distribution over the subcarriers. Once the continuous spatial feature \(\varvec{z}_{l}\) is extracted, we quantize it by a latent beam codebook to perceive the environmental component. According to (26) and (27), we perform a nearest neighbor search to decide the posterior latent beam vector related to the discrete angular index \(\varvec{\bar{\phi }}_{i^{*}}\).
The input to the decoder corresponds to the selected latent beam matrix \(\varvec{\hat{Z}}\). We reconstruct the channel matrix based on \(\varvec{\hat{Z}}\) to guarantee the consistency between the latent beamspace and channel space. The reconstruction loss is formulated as
where \({\mathcal {F}}_{enc}\) and \({\mathcal {F}}_{dec}\) are the GRU encoder and decoder mappings, respectively, and the reconstructed channel is \(\varvec{\hat{H}}= {\mathcal {F}}_{dec}(\varvec{\hat{Z}})\). The forward computation pipeline can be regarded as a regular autoencoder with a specific nonlinearity that corresponds to the 1-of-\(N_{CB}\) latent beam vectors. The optimization of QPTNet follows the backpropagation scheme in [39]. To make sure the encoder commits to a latent beamspace and its output does not grow, we add a commitment loss. Thus, the total training loss becomes
where \(sg[\cdot ]\) denotes the stop-gradient operator, which is the identity in the forward computation and has zero partial derivatives, \(\varvec{G}= [\varvec{g}_{1},\varvec{g}_{2},\cdots ,\varvec{g}_{N_{CB}}]\), and \(\beta\) is a hyperparameter. The first term is the beam prediction loss in (23); its optimization does not change the latent beam codebook because of the nonlinearity of quantization. The second term is the CSI reconstruction loss to update the latent beam codebook. The third term utilizes the \(l_{2}\) norm loss to move the latent beam codebook vectors close to the encoder output \({\mathcal {F}}_{enc}(\varvec{\tilde{H}})\), and the fourth term ensures that the encoder output commits to the selected codeword.
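For orientation, a total loss consistent with the four terms described above can be written as follows, assuming that it mirrors the vector-quantization objective of [39]. The symbol \({\mathcal {L}}_{CE}\) for the beam prediction loss in (23), the squared \(l_{2}\) norms, and the unit weights on the second and third terms are assumptions of this sketch.

```latex
\[
{\mathcal {L}} = {\mathcal {L}}_{CE}
  + \big\Vert \varvec{\tilde{H}} - {\mathcal {F}}_{dec}(\varvec{\hat{Z}}) \big\Vert _{2}^{2}
  + \big\Vert sg[{\mathcal {F}}_{enc}(\varvec{\tilde{H}})] - \varvec{G} \big\Vert _{2}^{2}
  + \beta \big\Vert {\mathcal {F}}_{enc}(\varvec{\tilde{H}}) - sg[\varvec{G}] \big\Vert _{2}^{2}
\]
```

In the third and fourth terms, \(\varvec{G}\) is understood as the codewords selected for each encoder output. The two terms are equal in value in the forward pass and differ only in which argument receives gradients: the third updates the codebook, and the fourth updates the encoder.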
6 SEQPTNet-associated beam training
This section proposes SEQPTNet to provide an unsupervised pretraining strategy for improving the performance of QPTNet given a small training dataset. In SEQPTNet, we construct positive and negative samples based on the anchor operation. Then, the pretraining process is completed by detecting the relationship among a positive sample of the target UE, negative samples of other UEs, and the global beam signature with the contrastive environmental prediction. Therefore, SEQPTNet can constantly perceive the environmental component under a contrastive learning framework to further improve the performance of beam training. Figure 4 describes the procedure of self-enhanced pretraining in SEQPTNet.
To pretrain the proposed SEQPTNet under the contrastive learning framework, we present an anchor operation to acquire the positive and negative samples. Specifically, we obtain forward latent beam features by splitting the outputs of quantization at the anchor t and explore the global beam signature \(\varvec{c}_{t}\) based on attention scoring evaluation. Meanwhile, the reconstructed channel data streams recovered by the GRU decoder are divided into contrastive samples of length k, in which the positive sample is determined by the CSI of the target UE location while negative samples are determined by the others. SEQPTNet enhances the similarity between the target UE and the prediction of \(\varvec{c}_{t}\) while making the other UE responses dissimilar through the contrastive environmental prediction. Benefiting from this distinction, we can facilitate latent angular learning within the latent beams corresponding to the dominant path, discard the noise, and enhance the representation of the environmental component.
Considering the mmWave massive MIMO system, the amplitudes and phases of the received signals suffer from spatial differences due to a significant fraction of propagation delay [40]. To achieve robustness, the proposed SEQPTNet compensates for the effects of propagation delay by separately dealing with the selected latent beams. For notational convenience, we denote the quantization output \(\varvec{\hat{Z}}\) as the global view. We select an anchor t and define \(\varvec{\hat{Z}}_{\le t}\) as the forward component. Taking inspiration from recent advancements [41], we introduce a token \(\varvec{\hat{z}}_{0}\) at the beginning of \(\varvec{\hat{Z}}_{\le t}\) and feed them to the spatial attention-associated beam prediction to identify the beam signature vector \(\varvec{c}_{t}\). We then predict the k reconstructed channels by a linear mapping \(\varvec{\bar{H}}_{t+k} = \varvec{W}_{t+k} \varvec{c}_{t}\), where \(\varvec{W}_{t+k}\) denotes the weight of the linear mapping \({\mathcal {T}}\). If the large-scale dense channel signatures can be successfully predicted, it indicates that the beam signature vector \(\varvec{c}_{t}\) contains a global and robust view of the propagation environment. By combining the positive and negative samples, the loss function of contrastive environmental prediction is
where \(\varvec{\bar{H}}_\mathrm{{{neg,l}}}\) denotes negative samples taken from other channel data, and the positive sample is \(\varvec{\hat{H}}_{t+k}\). Considering the latent beam codebook updating, the total pretraining loss is expressed as
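The contrastive environmental prediction described above can be sketched as follows. The sketch assumes an InfoNCE-style loss in the spirit of contrastive predictive coding [41], with a dot-product score between the prediction \(\varvec{W}_{t+k} \varvec{c}_{t}\) and each sample. The scoring function, the real-valued toy vectors, and the names `linear_map` and `info_nce` are assumptions and may differ from the authors' loss.

```python
import math

def linear_map(W, c):
    """Predict a channel signature from the beam signature: W @ c."""
    return [sum(w * x for w, x in zip(row, c)) for row in W]

def dot(a, b):
    """Dot product used here as the (assumed) similarity score."""
    return sum(x * y for x, y in zip(a, b))

def info_nce(prediction, positive, negatives):
    """Negative log-softmax score of the positive among all samples."""
    scores = [dot(prediction, positive)] + [dot(prediction, n) for n in negatives]
    m = max(scores)
    # Log-sum-exp with the maximum subtracted for numerical stability.
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denominator - scores[0]

# Hypothetical beam signature c_t and mapping weights W_{t+k}.
c_t = [1.0, 0.5]
W = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
prediction = linear_map(W, c_t)

positive = [1.0, 1.0, 1.5]                        # target-UE sample
negatives = [[-1.0, 0.5, 0.0], [0.0, -1.0, 0.2]]  # samples from other UEs

loss_aligned = info_nce(prediction, positive, negatives)
# Treating a negative as the positive should yield a larger loss.
loss_swapped = info_nce(prediction, negatives[0], [positive, negatives[1]])
```

Minimizing such a loss raises the score of the target-UE sample relative to the other-UE samples, which is the mechanism by which the beam signature is pushed to capture a distinct view of the environment.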
The pretraining process is illustrated in Algorithm 1.
Once the pretraining process is completed, we transfer the parameters of the GRU encoder/decoder and spatial attention to the QPTNet for supervised beam training. According to (23) and (32), the total beam training loss of the proposed SEQPTNet can be described as
Benefiting from the pretrained model, the network can achieve high performance with a small training dataset. It reduces the preprocessing effort of labeling the actual optimal beams, lowers training overhead, and improves adaptability to environmental variation.
7 Results and discussion
7.1 Simulation setup
We evaluate the training and predicting performances of the proposed QPTNet and SEQPTNet on three tasks: training performance, success rate, and achievable rate. We consider publicly available evaluation scenarios from the DeepMIMO dataset [42] for data generation. The scenario is constructed using the 3D ray-tracing software Wireless InSite [43], which captures the channel dependence on frequency and spatial domains. The mmWave signal is available at 28 GHz in an outdoor scenario, and we consider a single BS with ULA massive MIMO located at BS 3 of the scenario. We collect the mmWave channel data by the DeepMIMO generator and describe the details in Table 1. The main parameters of the proposed framework are listed in Table 2.
7.2 Training performance
We first compare the training performance of QPTNet and SEQPTNet with different sizes of latent beam codebook options, and then, we evaluate the proposed schemes against existing related works in [16, 29].
Figure 5 presents the training performance of QPTNet and SEQPTNet for different latent beam codebook options. The results indicate that the latent beam codebook with a spatial resolution of 256 yields the smallest training loss for both QPTNet and SEQPTNet. Additionally, SEQPTNet outperforms QPTNet for every latent beam codebook option, achieving a much lower training loss curve. These results highlight the importance of a higher spatial resolution in precisely quantizing the continuous channel features into categorical beams to achieve better beam training performance and faster convergence. They also show that the contrastive learning framework-based SEQPTNet can improve the learning efficiency by consistently interacting with the propagation environment and accounting for the mmWave channel fluctuation in the reidentification of QPTNet. Therefore, the higher spatial resolution option and the contrastive environmental prediction can enhance training performance and learning efficiency. Consequently, we set the latent beam codebook option \(N_{CB}=256\) for the following results.
Figure 6 illustrates the training comparisons among the proposed SEQPTNet, QPTNet, spatial attention-associated beam prediction, DNN [21], and LSTM attention [30]. The training loss performance indicates that SEQPTNet achieves the fastest convergence with a loss value of 0.12, outperforming LSTM attention (0.15), DNN (1.8), spatial attention (0.31), and QPTNet (0.4). The learning efficiency of SEQPTNet is the best among the compared schemes, benefiting from its competitive initial loss value (0.78) and smooth convergence. Looking closely at QPTNet and spatial attention, the latent beams can quantify the components of the propagation environment and significantly improve the accuracy of beam prediction, leading to faster convergence. However, the latent beam codebook updated by the mmWave channel reconstruction restricts the learning efficiency of QPTNet because of the effects of noise and channel variations.
7.3 Success rate performance
To assess robustness and training overhead, we evaluate the beam prediction performance in terms of the success rate under SNRs ranging from 0 to 20 dB and with various training dataset sizes.
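As a reading aid for the success rate values reported in this subsection, the following minimal sketch assumes that the success rate is the fraction of test samples whose predicted beam index equals the actual optimal beam index. This definition, the function name `success_rate`, and the toy indices are assumptions made for illustration.

```python
def success_rate(predicted_beams, optimal_beams):
    """Fraction of samples whose predicted beam index matches the optimal one."""
    if len(predicted_beams) != len(optimal_beams):
        raise ValueError("prediction and label lists must have equal length")
    if not optimal_beams:
        raise ValueError("at least one sample is required")
    hits = sum(p == o for p, o in zip(predicted_beams, optimal_beams))
    return hits / len(optimal_beams)

# Hypothetical beam indices for five test samples: three of five match.
rate = success_rate([3, 7, 7, 1, 0], [3, 7, 2, 1, 4])
```

Under this reading, a success rate of 0.69 would mean that the optimal beam is selected for about 69 out of every 100 test samples.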
Figure 7 shows the average success rate of the proposed SEQPTNet, QPTNet, and related works with different training data sizes. The proposed SEQPTNet achieves a success rate of around 0.69 with only 1% training data and rises to 0.91 given 30% training data. To illustrate the advantage of the proposed phase quantization scheme, we apply the DNN instead of the spatial attention-associated beam prediction, named QPDNN. The results demonstrate that the unsupervised pretraining strategy can perceive informative environmental components to distinguish the implicit representations of AoDs so that SEQPTNet can promote learning ability with a small training dataset. Moreover, the proposed contrastive learning framework can enhance the global beam signature with multipath interference to improve reliability. Compared with the LSTM attention, the proposed SEQPTNet obtains almost the same performance with only 5% data, which improves learning efficiency by six times. The result shows the potential to use less training time and to reduce the effort of labeling the actual optimal beams, enabling a flexible beam training process.
To further investigate the robustness and lower training overhead, we evaluate the average success rate of the proposed SEQPTNet using 50% of the training dataset. The results are presented in Fig. 8, which shows the success rate performance of the proposed SEQPTNet (50% training data), QPTNet, spatial attention-associated beam prediction, DNN, and LSTM attention. At 0 dB SNR, the proposed SEQPTNet achieves a success rate of 0.58, compared with success rates of 0.42, 0.43, 0.26, and 0.39 for LSTM attention, DNN, spatial attention, and the proposed QPTNet, respectively. The proposed SEQPTNet exhibits robustness against noise at low SNR levels owing to its contrastive learning framework. Comparing QPTNet with LSTM attention and spatial attention, the performance of QPTNet is sensitive at low SNR levels because the latent beam codebook is updated through the reconstruction of the mmWave channel. This makes QPTNet incorrectly quantize the spatial feature, resulting in fuzzy environmental components.
Fig. 9 illustrates the effectiveness of the proposed codebook-based phase quantization. We compare the success rate performance of different latent beam codebook options and sigmoid-based quantization [33]. The baseline is the proposed SEQPTNet without quantization. The proposed SEQPTNet obtains a higher success rate and faster convergence than the sigmoid-based method and the baseline method. The sigmoid-based quantization in [33] maps each frequency information into the range between 0 and 1 to establish a phase relationship. Because the sigmoid is continuous and monotonically increasing, its output can hardly determine a discrete phase option. Unlike the sigmoid-based quantization, the proposed codebook-based method leads to a rich diversity of beam features corresponding to the discrete phases. Moreover, the proposed quantization method can boost robustness (smaller error band) by increasing the size of the codebook.
In Table 3, we compare the complexity of SEQPTNet, QPTNet, and related works. SEQPTNet obtains the best success rate and achievable rate among the compared schemes. Although the achievable rates of QPTNet and SEQPTNet are almost identical, the success rate of SEQPTNet is higher. We observe an improvement of \(2.6 \%\), owing to the contrastive environmental prediction that enables it to update the latent beam codebook efficiently. The number of parameters of SEQPTNet is lower than those of LSTM attention [30], spatial attention [37], and the DNN [21], saving \(18\%\), \(28\%\), and \(82\%\) of the parameters, respectively. The results demonstrate that the proposed contrastive learning framework can improve DL performance with low overhead. Additionally, contrastive environmental prediction is beneficial for improving the learning efficiency by distinguishing the target UE from others, maintaining a low parameter overhead and a better global beam signature representation. Consequently, the proposed contrastive learning framework promotes learning efficiency while avoiding high complexity.
Table 4 investigates the effects of spatial attention modules for beam prediction. We compare the complexity, success rate, and achievable rate of different spatial attention options. According to the results, the success rate can be improved by \(20\%\) when the number of attention modules is larger than 1. Increasing the number of attention modules further does not significantly improve the accuracy of beam prediction but noticeably increases the overhead. Therefore, we adopt two spatial attention modules as the optimal option for SEQPTNet.
7.4 Achievable rate performance
The proposed SEQPTNet can obtain extra performance by pretraining without label information and achieve a high transmission rate with a reduced amount of labeled CSI. To evaluate the training overhead and robustness in practical systems, we plot the achievable rate for different training dataset sizes and their performance under different SNR levels.
In Table 5, we compare the achievable rate of QPTNet and SEQPTNet for different latent beam codebook options. The achievable rates of SEQPTNet and QPTNet are close but still higher than that of QPTNet without noise effects. Although the performance of QPTNet and SEQPTNet is almost identical, SEQPTNet is more robust. We observe an improvement of 0.41 at 10 dB SNR, owing to the contrastive environmental prediction that enables it to update the latent beam codebook efficiently. The results demonstrate that a larger spatial resolution can comprehensively quantify environmental variations to improve beam prediction accuracy. Additionally, contrastive environmental prediction is beneficial for improving robustness by the inherent representation of the global beam signature among different UEs.
To illustrate the low training overhead of the proposed solutions, Fig. 10 shows the achievable rate, as defined in (5), of the proposed SEQPTNet, QPTNet, and DNN approaches for different dataset sizes. Note that the dot-dash line in Fig. 10 represents the upper bound on the performance of the given system and channel models. The horizontal axis represents the training data size, and the achievable rate results are calculated with \(1\%, 5\%, 10\%, 30\%, 50\%, 70\%, 100\%\) training data. The results of SEQPTNet and QPTNet are close to the upper bound with only 10% training data, indicating that the proposed solutions allow the BS to apply the desired beam with an efficient training scheme. SEQPTNet can successfully predict the optimal beam based on the contrastive learning framework with only \(1\%\) training data at the BS. This clearly illustrates the ability of the self-enhancement to efficiently represent the desired angular sector with positive and negative samples, achieving negligible training overhead. Moreover, the proposed solutions are flexible in customizing the beam solution with small training data. Therefore, approaching this bound illustrates the optimality of the proposed solutions.
To verify the robustness of the proposed SEQPTNet and QPTNet, we investigate the achievable rate at different SNR levels, as shown in Fig. 11. The dot-dash line represents the upper bound performance, i.e., the optimal rate without noise for the given system and channel models. The results show that SEQPTNet and QPTNet are consistently better than the simple DNN beam training approach. Benefiting from the contrastive environmental prediction to update the latent beam codebook, SEQPTNet can achieve a higher rate than QPTNet at low SNR levels. Both proposed beam training schemes are near the optimal rate when the SNR is higher than 15 dB. This illustrates that the hierarchical strategy and the quantization can deal with the frequency and spatial channel efficiently, and that the self-enhancement can provide a better perception of the complicated propagation environment through the contrastive learning framework. Moreover, the contrastive environmental prediction can improve the robustness relying on the distinct global beam signature by considering the similarity among the global beam signature and positive/negative samples.
8 Conclusion
This paper proposed a novel framework, SEQPTNet, for beam training based on spatial attention and quantization, utilizing the contrastive environmental prediction to detect the relationship between the beam signature of the target UE and others. The proposed hierarchical scheme QPTNet quantized the continuous spatial features into controllable latent beam responses within a wide range of discrete phases, achieving higher robustness. The proposed contrastive learning framework SEQPTNet enhanced the capability of identifying the beam signature by contrastive environmental prediction with a small fraction of the labeled CSI dataset. The proposed SEQPTNet permitted DL-integrated beam training to adapt flexibly to wireless environments with low labeling costs. The simulation results showed that the proposed SEQPTNet framework can achieve a data rate of 7.01 bps/Hz with only \(5\%\) labeled data, which represents \(95.1\%\) of the performance utilizing the full-size dataset. The proposed SEQPTNet framework outperformed existing DL-based schemes, obtaining higher capacity with lower expertise requirements and highly reliable performance for mmWave massive MIMO systems.
Availability of data and materials
The millimeter-wave massive MIMO dataset is generated from DeepMIMO, a public framework for generating large-scale MIMO data based on accurate Remcom 3D ray-tracing.
Abbreviations
5G/6G: Fifth/sixth generation
mmWave: Millimeter wave
MIMO: Multiple-input multiple-output
BS: Base station
UE: User equipment
CSI: Channel state information
LOS: Line-of-sight
NLOS: Non-line-of-sight
AoA: Angle-of-arrival
AoD: Angle-of-departure
RF: Radio frequency
CS: Compressive sensing
DL: Deep learning
SL: Supervised learning
DNN: Deep neural network
BPNet: Beam prediction network
VUB: Virtual uplink beamforming
LSTM: Long short-term memory
GRU: Gated recurrent unit
QPTNet: Quantized phase-based transformer network
SEQPTNet: Self-enhanced QPTNet
RIS: Reconfigurable intelligent surface
AWGN: Additive white Gaussian noise
DFT: Discrete Fourier transform
FC: Fully connected
GAP: Global average pooling
References
T.S. Rappaport, Y. Xing, G.R. MacCartney, A.F. Molisch, E. Mellios, J. Zhang, Overview of millimeter wave communications for fifth-generation (5G) wireless networks with a focus on propagation models. IEEE Trans. Antennas Propag. 65(12), 6213–6230 (2017). https://doi.org/10.1109/TAP.2017.2734243
M. Alsabah, M.A. Naser, B.M. Mahmmod, S.H. Abdulhussain, M.R. Eissa, A. Al-Baidhani, N.K. Noordin, S.M. Sait, K.A. Al-Utaibi, F. Hashim, 6G wireless communications networks: a comprehensive survey. IEEE Access 9, 148191–148243 (2021). https://doi.org/10.1109/ACCESS.2021.3124812
T.S. Rappaport, S. Sun, R. Mayzus, H. Zhao, Y. Azar, K. Wang, G.N. Wong, J.K. Schulz, M. Samimi, F. Gutierrez, Millimeter wave mobile communications for 5G cellular: it will work! IEEE Access 1, 335–349 (2013). https://doi.org/10.1109/ACCESS.2013.2260813
J. Hoydis, S. ten Brink, M. Debbah, Massive MIMO in the UL/DL of cellular networks: how many antennas do we need? IEEE J. Sel. Areas Commun. 31(2), 160–171 (2013). https://doi.org/10.1109/JSAC.2013.130205
J. Mo, B.L. Ng, S. Chang, P. Huang, M.N. Kulkarni, A. Alammouri, J.C. Zhang, J. Lee, W.J. Choi, Beam codebook design for 5G mmWave terminals. IEEE Access 7, 98387–98404 (2019). https://doi.org/10.1109/ACCESS.2019.2930224
F.A. Pereira de Figueiredo, An overview of massive MIMO for 5G and 6G. IEEE Lat Am Trans 20(6), 931–940 (2022). https://doi.org/10.1109/TLA.2022.9757375
J. Li, Y. Niu, H. Wu, B. Ai, S. Chen, Z. Feng, Z. Zhong, N. Wang, Mobility support for millimeter wave communications: opportunities and challenges. IEEE Commun. Surv. Tutor. (2022)
Y. Li, J.G. Andrews, F.h. Baccelli, Design and analysis of initial access in millimeter wave cellular networks. IEEE Trans. Wireless Commun. 16(10), 6409–6425 (2017). https://doi.org/10.1109/TWC.2017.2723468
J. Kim, A.F. Molisch, Fast millimeter-wave beam training with receive beamforming. J. Commun. Netw. 16(5), 512–522 (2014). https://doi.org/10.1109/JCN.2014.000090
M.E. Eltayeb, A. Alkhateeb, R.W. Heath, T.Y. Al-Naffouri, Opportunistic beam training with hybrid analog/digital codebooks for mmWave systems. In: Proc. IEEE GlobalSIP, pp. 315–319 (2015). https://doi.org/10.1109/GlobalSIP.2015.7418208
J. Wang, Z. Lan, C. Pyo, T. Baykas, C. Sum, M.A. Rahman, J. Gao, R. Funada, F. Kojima, H. Harada, S. Kato, Beam codebook based beamforming protocol for multi-Gbps millimeter-wave WPAN systems. IEEE J. Sel. Areas Commun. 27(8), 1390–1399 (2009). https://doi.org/10.1109/JSAC.2009.091009
C. Qi, K. Chen, O.A. Dobre, G.Y. Li, Hierarchical codebook-based multiuser beam training for millimeter wave massive MIMO. IEEE Trans. Wireless Commun. 19(12), 8142–8152 (2020). https://doi.org/10.1109/TWC.2020.3019523
Z. Xiao, T. He, P. Xia, X.G. Xia, Hierarchical codebook design for beamforming training in millimeter-wave communication. IEEE Trans. Wireless Commun. 15(5), 3380–3392 (2016). https://doi.org/10.1109/TWC.2016.2520930
S.E. Chiu, N. Ronquillo, T. Javidi, Active learning and CSI acquisition for mmWave initial alignment. IEEE J. Sel. Areas Commun. 37(11), 2474–2489 (2019). https://doi.org/10.1109/JSAC.2019.2933967
I. Aykin, M. Krunz, Efficient beam sweeping algorithms and initial access protocols for millimeter-wave networks. IEEE Trans. Wireless Commun. 19(4), 2504–2514 (2020). https://doi.org/10.1109/TWC.2020.2965926
A. Alkhateeb, S. Alex, P. Varkey, Y. Li, Q. Qu, D. Tujkovic, Deep learning coordinated beamforming for highly-mobile millimeter wave systems. IEEE Access 6, 37328–37348 (2018)
H. Echigo, Y. Cao, M. Bouazizi, T. Ohtsuki, A deep learning-based low overhead beam selection in mmWave communications. IEEE Trans. Veh. Technol. 70(1), 682–691 (2021). https://doi.org/10.1109/TVT.2021.3049380
S. Rezaie, C.N. Manchón, E. de Carvalho, Location- and orientation-aided millimeter wave beam selection using deep learning. In: Proc. IEEE ICC (2020). https://doi.org/10.1109/ICC40277.2020.9149272
M. Alrabeiah, A. Alkhateeb, Deep learning for mmWave beam and blockage prediction using sub-6 GHz channels. IEEE Trans. Commun. 68(9), 5504–5518 (2020)
H. Huang, Y. Peng, J. Yang, W. Xia, G. Gui, Fast beamforming design via deep learning. IEEE Trans. Veh. Technol. 69(1), 1065–1069 (2019)
C. Qi, Y. Wang, G.Y. Li, Deep learning for beam training in millimeter wave massive MIMO systems. IEEE Trans. Wireless Commun. (2020). https://doi.org/10.1109/TWC.2020.3024279
K. Ma, D. He, H. Sun, Z. Wang, S. Chen, Deep learning assisted calibrated beam training for millimeter-wave communication systems. IEEE Trans. Commun. 69(10), 6706–6721 (2021). https://doi.org/10.1109/TCOMM.2021.3098683
H.Jia, Liu, Z. Wang, N. Chen, M. Okada, Nondeterministic sparse feature learning for reliable beam prediction in mmWave massive MIMO systems. Proc. IEEE PIMRC (2022). https://doi.org/10.1109/PIMRC54779.2022.9977796
Z. Xiao, H. Dong, L. Bai, P. Xia, X.G. Xia, Enhanced channel estimation and codebook design for millimeter-wave communication. IEEE Trans. Veh. Technol. 67(10), 9393–9405 (2018). https://doi.org/10.1109/TVT.2018.2854369
J. Palacios, D. De Donno, J. Widmer, Tracking mmwave channel dynamics: Fast beam training strategies under mobility. In: Proc. IEEE INFOCOM, pp. 1–9 (2017). https://doi.org/10.1109/INFOCOM.2017.8056991
W. Ma, C. Qi, Z. Zhang, J. Cheng, Sparse channel estimation and hybrid precoding using deep learning for millimeter wave massive MIMO. IEEE Trans. Commun. 68(5), 2838–2849 (2020). https://doi.org/10.1109/TCOMM.2020.2974457
M. Hussain, N. Michelusi, Learning and adaptation for millimeter-wave beam tracking and training: A dual timescale variational framework. IEEE J. Sel. Areas Commun. 40(1), 37–53 (2022). https://doi.org/10.1109/JSAC.2021.3126086
S. Chen, Z. Jiang, S. Zhou, Z. Niu, Time-sequence channel inference for beam alignment in vehicular networks. In: Proc. IEEE GlobalSIP, pp. 1199–1203 (2018). https://doi.org/10.1109/GlobalSIP.2018.8646413
H. Jia, N. Chen, M. Okada, Memory shared spatial attention neural network for reliable beam prediction. In: Proc. IEEE GCCE, pp. 107–108 (2022). https://doi.org/10.1109/GCCE56475.2022.10014249
L. Dai, X. Gao, S. Han, I. Chih-Lin, X. Wang, Beamspace channel estimation for millimeter-wave massive MIMO systems with lens antenna array. In: Proc. IEEE ICCC, pp. 1–6 (2016). https://doi.org/10.1109/ICCChina.2016.7636854
H. Hojatian, V.N. Ha, J. Nadal, J.F. Frigon, F. Leduc-Primeau, RSSI-based hybrid beamforming design with deep learning. In: Proc. IEEE ICC, pp. 1–6 (2020). https://doi.org/10.1109/ICC40277.2020.9149321
K. Chen, J. Yang, Q. Li, X. Ge, Subarray hybrid precoding for massive MIMO systems: a CNN-based approach. IEEE Commun. Lett. 25(1), 191–195 (2021). https://doi.org/10.1109/LCOMM.2020.3022898
H. Hojatian, J. Nadal, J.F. Frigon, F. Leduc-Primeau, Flexible unsupervised learning for massive MIMO subarray hybrid beamforming. In: Proc. IEEE GLOBECOM, pp. 3833–3838 (2022)
H. Hojatian, J. Nadal, J.F. Frigon, F. Leduc-Primeau, Unsupervised deep learning for massive MIMO hybrid beamforming. IEEE Trans. Wirel. Commun. 20(11), 7086–7099 (2021). https://doi.org/10.1109/TWC.2021.3080672
C. Gustafson, K. Haneda, S. Wyne, F. Tufvesson, On mmWave multipath clustering and channel modeling. IEEE Trans. Antennas Propag. 62(3), 1445–1455 (2014). https://doi.org/10.1109/TAP.2013.2295836
T.S. Rappaport, Y. Xing, G.R. MacCartney, A.F. Molisch, E. Mellios, J. Zhang, Overview of millimeter wave communications for fifth-generation (5G) wireless networks with a focus on propagation models. IEEE Trans. Antennas Propag. 65(12), 6213–6230 (2017). https://doi.org/10.1109/TAP.2017.2734243
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need. In: Proc. NeurIPS, pp. 5998–6008 (2017)
M. Lin, Q. Chen, S. Yan, Network in network. arXiv preprint arXiv:1312.4400 (2013)
A. Van Den Oord, O. Vinyals, et al., Neural discrete representation learning. In: Proc. NeuralIPS, vol. 30 (2017)
B. Wang, F. Gao, S. Jin, H. Lin, G.Y. Li, S. Sun, T.S. Rappaport, Spatialwideband effect in massive MIMO with application in mmWave systems. IEEE Commun. Mag. 56(12), 134–141 (2018). https://doi.org/10.1109/MCOM.2018.1701051
A.v.d. Oord, Y. Li, O. Vinyals, Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018)
A. Alkhateeb, DeepMIMO: A generic deep learning dataset for millimeter wave and massive MIMO applications. CoRR arxiv: abs/1902.06435 (2019)
Remcom: Wireless insite, https://www.remcom.com/wirelessinsiteempropagationsoftware/ (July, 2023)
Acknowledgements
This work was supported in part by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Number 23K16870 and the Hirose Foundation.
Contributions
All the authors contributed equally to data collection, processing, experiments and article writing. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interest
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Jia, H., Chen, N., Urakami, T. et al. Spatial attention and quantization-based contrastive learning framework for mmWave massive MIMO beam training. J Wireless Com Network 2023, 69 (2023). https://doi.org/10.1186/s13638-023-02277-w
DOI: https://doi.org/10.1186/s13638-023-02277-w
Keywords
 MmWave
 Massive MIMO
 Deep learning
 Spatial attention
 Feature quantization
 Contrastive learning