Enhancing energy efficiency for cellular-assisted vehicular networks by online learning-based mmWave beam selection
EURASIP Journal on Wireless Communications and Networking volume 2022, Article number: 1 (2022)
Abstract
Millimeter wave (mmWave) technology has been regarded as a feasible approach for future vehicular communications. Nevertheless, high path loss and penetration loss pose severe challenges for mmWave communications. These problems can be mitigated by directional communication, which is not easy to achieve in highly dynamic vehicular communications. Existing works address the beam alignment problem by designing online learning-based mmWave beam selection schemes, which adapt well to highly dynamic vehicular scenarios. However, such works focus on network throughput rather than network energy efficiency and thus ignore energy consumption. Therefore, we propose an Energy efficiency-based FML (EFML) scheme to compensate for this shortfall. In EFML, the energy consumption is reduced as far as possible under the premise of meeting the basic data rate requirements of vehicle users, and users in close proximity that request the same content can be organized into the same receiving group to share the same mmWave beam. The simulation results demonstrate that, compared with the baseline method with the best energy efficiency, the proposed EFML improves energy efficiency by 17–41% in different scenarios.
Introduction
Connected vehicles and Cooperative Intelligent Transport Systems (C-ITS) will depend on Vehicle-to-Everything (V2X) communications to improve traffic safety, driving efficiency, and infotainment experience. The Long Term Evolution (LTE) network and the Fifth-Generation (5G) cellular network have been widely recognized as the main infrastructure to adapt to the characteristics of connected vehicles, such as high mobility, high time sensitivity, and high transmission reliability. With the rapid growth of connected vehicles and vehicle users’ continuous pursuit of experience quality, higher requirements are put forward for network capacity. Although the sub-6 GHz frequency band adopted in the LTE network cannot meet the increasing capacity demand, it has a longer communication distance. In contrast, the millimeter wave (mmWave) frequency band adopted in the 5G system can support higher transmission capacity, but it has a shorter communication distance. Therefore, the 5G system and the LTE network complement each other and are widely recognized as the main driving forces for fully supporting all the V2X applications.
Nevertheless, mmWave signals are characterized by high path loss and penetration loss, which can be alleviated by directional communication. The quality of directional mmWave communications depends on whether and to what extent the beams at both ends of the communication are aligned [1]. The high mobility of connected vehicles raises the complexity of directional mmWave communications because it forces alignment operations to be performed frequently, resulting in high overhead. For the commonly used beam alignment mechanisms (e.g., beamforming training, beam tracking, and beam selection), a promising option is to always select a reasonable beam pair for a mmWave link in time. However, as the number of users and beams grows, the search space for selecting a reasonable beam for the communication link increases, which in turn increases the delay of establishing a mmWave communication link. Moreover, because beam selection involves many communication parameters, purely mathematical modeling is very complex. Some researchers have therefore turned to machine learning, where beam selection methods based on online learning [2, 3] can be used without prior training and are well suited to dynamic vehicular networks.
The existing online learning-based mmWave beam selection methods are based on the assumption that the performance of a specific beam is similar in similar contexts. Each mmWave Small Base Station (mmSBS) can select a set of appropriate beams by self-exploration, learning, and adapting to the dynamic communication environment. Specifically, a mmSBS can learn independently from its previous decisions and their relationship to the available vehicular contexts, and thus becomes aware of the performance of each mmWave beam. In the Fast Machine Learning (FML) scheme in [2], the beam selection problem is modeled as a contextual Multi-Armed Bandit (MAB) problem and the mmSBS can learn the data rate of the chosen beam by the FML without requiring a training process. However, the assumption in FML that mmWave beams do not overlap limits the available beam width and azimuth range. In the Improved Fast Machine Learning (IFML) scheme in [3], the beams are allowed to overlap and the virtual beam concept is proposed. Unlike the number of Radio Frequency (RF) chains, which is limited by form factor and manufacturing cost [4], a virtual beam is only a region that limits energy propagation and its shape is usually a cone (or sector). Due to the abundant frequency band resources of mmWave, it is easy to provide different frequency bands for a mmSBS. That is, the number of virtual beams can be infinite at each mmSBS. When an RF chain and a specific channel are assigned to a virtual beam, this virtual beam becomes an actual available physical beam and its use is not limited by beam overlap. When overlapping beams share the same channel, the corresponding transmission powers should be adjusted to control the mutual interference. However, specific solutions for power adjustment are not covered in [3], which simply considers a set of discrete power values.
Furthermore, in the existing related works, the performance measure of each beam is the amount of data received by the vehicle rather than the energy efficiency. Because energy consumption is not taken into account, this measure does not suit the development demand of green communication. Also, the authors in [3] do not consider aggregating users in close proximity who request the same data content (e.g., the latest traffic congestion information, real-time high-definition electronic maps, current events, and news) into a multicast group, so resources may be consumed repeatedly to send the same content. Therefore, to address the above problems, we propose an Energy efficiency-based FML (EFML) scheme and list the main contributions as follows.

1. Different from the existing online learning-based mmWave beam selection schemes, which aim to maximize the overall aggregated received data, our scheme aims at enhancing the energy efficiency of cellular-assisted vehicular networks.

2. In our scheme, the transmission power of each mmSBS is allowed to be adjusted as long as the energy efficiency can be improved. Therefore, the energy consumption can be reduced as far as possible under the premise of meeting the basic data rate requirements of vehicle users.

3. To further reduce energy consumption and save communication resources, the users requesting the same data content in close proximity are organized into the same receiving group to share the same mmWave beam and reduce the occupation of RF chains.

4. The simulation results show that, compared with the existing online learning-based mmWave beam selection schemes, the EFML scheme substantially improves the energy efficiency and the amount of data of cellular-assisted vehicular networks at the cost of more system overhead. However, after a period of sufficient online learning, there is no difference in the cost of updating the beam performance of the system.
In the rest of the paper, the related works are presented in Sect. 2. The system model and the details of the EFML algorithm are addressed in Sects. 3 and 4, respectively. Simulation results are discussed in Sect. 5. Finally, we conclude this paper in Sect. 6. Furthermore, for the convenience of readers, the main notations of this paper can be found in Table 1.
Related work
There have been many solutions to the problem of beam selection in traditional networks (e.g., the works in [5, 6]). However, they need complex transceiver links and accurate location information and thus cause high overhead and delay. Unlike the above works in the sub-6 GHz bands, there are many works on beam selection in mmWave networks. The authors in [7] proposed a mmWave beam selection method based on deep learning which utilizes the channel characteristics of the sub-6 GHz band to solve the mmWave beam selection problem, while the authors in [8] presented a beam alignment algorithm based on machine learning for the beam management problems in mmWave massive Multiple-Input Multiple-Output (MIMO) networks. The author in [9] proposed an iterative order minimum optimization training scheme based on the simulated beam selection of machine learning. The above schemes require a large number of prior data samples and beam training processes.
In [10], the authors studied the problem of multiple RF chains of mmWave transceivers in mobile mmWave communication systems and developed a codebook-based beam tracking strategy, which shows that the performance of the beam tracking strategy can be improved by optimizing the transmit power of the training beam. The authors in [11] gave an overview of current beam management approaches based on 5G standardization, including some of the major challenges and future trends for mmWave communications in current 5G New Radio (NR) standards. By analyzing the average search delay of two different mmWave network models, the authors in [12] found that the average number of searches is related to the number of search sectors. The authors in [13] designed a low-cost beamforming module-assisted hybrid architecture and proposed a fast beam training method. The authors in [14] studied the sensitivity of the beam stability selected by the base station. By observing different operating frequencies, dynamic channel characteristics, and different user mobility, they found that the perceived time-of-stay of the beam will be affected by beam management parameterization.
The authors in [15] leveraged the advantage of the mmWave characteristics in ultra-dense networks and proposed a method for joint optimization and resource allocation between base stations and users. Specifically, they aimed at maximizing user throughput in the system while also considering fairness. To reduce the overhead and complexity of the wireless backhaul and access process, the authors in [16] proposed a hybrid beamforming multi-stage design scheme based on channel feedback. To improve the efficiency of user-provided networks through resource allocation of links, the authors in [17] proposed a joint incentive and resource allocation algorithm, which considered the restriction of network resources, the incentive system, and user fairness. Moreover, to alleviate the overload problem of cellular networks and save cellular network resources, the authors in [18] proposed a traffic offloading method through opportunistic mobile networks. Also, the authors in [19] proposed an incentive mechanism based on delay constraint and reverse auction to stimulate WiFi access points to participate in the data offloading process. To further reduce the traffic burden of cellular networks and the cost of content service providers, the authors in [20] proposed a new method based on incentive drive and a deep Q-network, which considered the incentive mechanism and content caching strategy to improve the offloading performance.
To reduce the overhead of establishing a mmWave link in vehicle-to-everything networks, the authors in [21] proposed a beam training method based on the assistance of out-of-band information. The authors in [22] proposed a beamforming scheme based on deep learning for high-mobility mmWave systems. To further reduce the beamforming overhead of the mmWave system, the authors in [23] proposed an intelligent prediction beam alignment algorithm from the Medium Access Control (MAC) layer of the mmWave vehicle system. The authors in [24] proposed a machine learning approach based on situational awareness to predict mmWave beams. Specifically, this approach learns beam information from past observations including the position of the vehicles and the optimal beam. The authors in [25] proposed a neural network-based algorithm for beam alignment in vehicular networks. However, this scheme needs to learn more information about the channel state and can only select the best beam direction for a single user. Considering the propagation characteristics of mmWave in 5G vehicular networks, the authors in [26] proposed a simulated annealing-based beam management model to improve the effective communication of the system. The challenges of mmWave communication for vehicular networks are also investigated in [27, 28].
Moreover, MAB is a classic and general online learning method and has been used to solve various problems in wireless communication networks [29]. The author in [30] developed an equivalent structured MAB model to solve the beam alignment problem in the mmWave system. However, this method requires an exhaustive search for beam alignment between transceivers, which causes great system overhead due to the large search space. The authors in [2] proposed FML to address the context-aware beam selection issue. Specifically, they modeled the problem of beam selection as a contextual MAB problem and proved the convergence of FML. However, they consider only one-dimensional contextual information, and only one vehicle can be served within the beam range.
The authors in [31] modeled the problem of beam selection as a contextual combinatorial MAB problem with delayed feedback and Quality of Service (QoS) constraints and proposed an online learning algorithm that achieves a good balance between satisfying the performance guarantee of the system and maximizing the network capacity. However, since this prediction mechanism requires the view information of the source mmWave base station, it will cause greater system overhead. In addition, the fast mobility of the vehicle scenarios is also a big challenge to this prediction method. The authors in [32] developed an online learning algorithm for beam selection by using the MAB framework that requires learning rough beam orientation in the predefined codebook.
To reduce the time consumed in beam training, the authors in [33] proposed a beam selection scheme based on deep learning, which realized low-delay and high-speed communication by reducing the number of measurements. In [34], the authors proposed low-cost joint designs of digital filters and analog beam selection, which achieved higher network sum-rates than the benchmark without joint design. Due to the high path loss and penetration loss, it is not easy to establish and track beams in mmWave vehicular communications. The authors in [35] proposed a beam selection method based on integrated learning classification to determine the beam pairs suitable for mmWave vehicular communication, which used the position and type of the receiving vehicle and its neighboring vehicles. The authors in [36] designed a location-based beam prediction and selection technology to maximize the achievable rate in mmWave cellular systems, which leveraged machine learning tools to deal with blockages. With the social information and context of vehicles and passengers, the author in [37] proposed a two-layer online learning algorithm for fast and effective beam allocation for mmWave base stations. However, the goal of the above studies is to maximize the achievable rate or increase the system capacity.
The authors in [3] proposed an online learning-based algorithm for mmWave beam selection to improve the network capacity of vehicular communication systems. Furthermore, the algorithm selects a more appropriate beam direction and beam width for the mmWave base station by setting and learning more dimensions of context information. However, all of the above studies consider only maximizing the system throughput or achieving the maximum network rate and do not consider power adjustment to reduce energy consumption. Overall, the work in [3] is the most relevant to ours, but there is still room for improvement in IFML due to the problems discussed above. For example, power adjustment is not described in detail, and only unicast communication scenarios are considered. The main motivation for this paper is therefore to consider user multicast groups and power adjustment requirements for green and energy-saving communications.
System model
Network architecture
An integrated mmWave/sub-6 GHz cellular network is considered in this paper, in which several mmSBSs are deployed within the coverage area of an LTE eNB. As shown in Fig. 1, a mmSBS can communicate with its associated LTE eNB by a wired or wireless backhaul link. Each vehicle is equipped with two kinds of radio interfaces, where an LTE interface is used to keep a connection to the LTE eNB, and a mmWave interface is adopted for high-speed data transmission. From a theoretical point of view, an infinite number of virtual beams can be programmed per mmSBS, and the beam width of each beam can be set between 0° and 360°. Beams are allowed to overlap, and the number of RF chains is much smaller than the number of virtual beams due to the manufacturing cost and the limitation of form factor.
All the vehicles are grouped according to how close they are to each other and whether they request the same data content. In this paper, each mmSBS provides service only at the granularity of vehicle groups. Even if there is only one vehicle in the vehicular network, it must form a vehicle group by itself. Each vehicle group has a unique identifier, and the other parameters associated with this group include the number of vehicles, the identifiers of vehicles, the requested data content, the central coordinates of the distribution of vehicles within the group, and the identifier of the vehicle farthest from the target mmSBS within the group.
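The group record described above can be sketched in a few lines of Python. This is only an illustration: the grid-cell proximity rule, the `cell_size` parameter, and the names `Vehicle` and `build_groups` are assumptions made for the example, since the text does not specify how "close proximity" is decided.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Vehicle:
    vid: str        # vehicle identifier
    x: float        # position in metres
    y: float
    content: str    # identifier of the requested data content


def build_groups(vehicles, sbs_xy, cell_size=50.0):
    """Group vehicles that request the same content and share a grid cell.

    The grid cell stands in for "close proximity" (an assumption).
    A lone vehicle still forms a group of size one.
    """
    buckets = defaultdict(list)
    for v in vehicles:
        cell = (math.floor(v.x / cell_size), math.floor(v.y / cell_size))
        buckets[(v.content, cell)].append(v)

    groups = []
    ordered = sorted(buckets.items(), key=lambda kv: kv[0])
    for gid, (key, members) in enumerate(ordered):
        content, _cell = key
        cx = sum(v.x for v in members) / len(members)
        cy = sum(v.y for v in members) / len(members)
        # The farthest vehicle from the mmSBS defines the group distance.
        farthest = max(
            members,
            key=lambda v: math.hypot(v.x - sbs_xy[0], v.y - sbs_xy[1]),
        )
        groups.append({
            "group_id": gid,
            "size": len(members),
            "vehicle_ids": [v.vid for v in members],
            "content": content,
            "center": (cx, cy),
            "farthest_vehicle": farthest.vid,
        })
    return groups
```

Any other clustering rule (for example a distance threshold between vehicles) could replace the grid cell without changing the shape of the group record.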
The maximum number of RF chains at a mmSBS determines the maximum number of vehicle groups that this mmSBS can simultaneously serve. If both the number of vehicle groups in the coverage of a mmSBS and the number of its virtual beams exceed the number of its RF chains, the mmSBS should select the best subset of beam-power pairs in order to provide the best system performance. To reach that target, we formulate each mmSBS’s beam selection as a MAB problem, where each mmSBS can identify the subset of best beams with the matching transmission power values over time. According to the description in [38], the decision maker in a MAB problem has to choose a subset of actions with unknown expected rewards to maximize the reward over time. Unknown actions must be explored, but those which have already generated high rewards should also be exploited, so dealing with the exploration vs. exploitation dilemma is a challenging problem.
Problem statement
Like the work in [3], the number of virtual beams at a mmSBS is not limited in this paper. Unlike the work in [3], however, the transmission power of each mmSBS is allowed to be adjusted. Therefore, besides preserving the coverage division in [3], we must also focus on how to find an appropriate transmission power for each mmSBS from the set of available transmission powers.
Firstly, for the purpose of reducing the search time of the online learning process, each mmSBS’s coverage area is divided into L non-overlapping sectors (e.g., L = 4 in Fig. 1), where there are no more than \(M^{l}\) virtual beams for the lth sector (\(l \in \left\{ {1, \ldots , L} \right\}\)) and these virtual beams are allowed to overlap. In the lth sector, each mmSBS uses a set \({\mathcal{M}}^{l}\) and a set \(\aleph^{l}\), which include \(M^{l} = \left| {{\mathcal{M}}^{l} } \right|\) virtual beams and \(N^{l} = \left| {\aleph^{l} } \right|\) transmission power levels, respectively. Therefore, there are \(M^{l} \times N^{l}\) beam-power pairs for the lth sector.
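As a small illustration of this bookkeeping, the following Python sketch maps an angle to one of L equal, non-overlapping sectors and enumerates the beam-power pairs of one sector as a Cartesian product. The equal-angle sector split, the tuple encoding of a beam as (direction, width), and the function names are assumptions for the example only.

```python
from itertools import product


def sector_of(angle_deg, num_sectors):
    """Return the 1-based sector index l for an angle, assuming L equal sectors."""
    width = 360.0 / num_sectors
    return int((angle_deg % 360.0) // width) + 1


def beam_power_pairs(beams, powers):
    """All M^l x N^l beam-power pairs of one sector."""
    return list(product(beams, powers))


# Hypothetical sector: three overlapping virtual beams given as
# (direction in degrees, width in degrees) and two power levels in dBm.
beams = [(0, 30), (15, 30), (45, 60)]
powers = [20, 30]
pairs = beam_power_pairs(beams, powers)   # 3 * 2 = 6 pairs
```

With these illustrative values the search space per sector is six pairs; the point of the sector split is that each such product stays small even when the total number of virtual beams is large.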
For each sector, the mmSBS can choose a subset of no more than n beam-power pairs to serve no more than n (\(n < M^{l} \times N^{l}\), \(\forall l \in \left\{ {1, \ldots , L} \right\}\)) vehicle groups simultaneously, where n is bounded by the maximum number of RF chains at the mmSBS.
From the perspective of all the sectors, the n beam-power pairs in the same sector may not be the best n beam-power pairs overall. Thus, the mmSBS first chooses no more than n best beam-power pairs from every sector separately, as candidates for no more than n vehicle groups in that sector. Then, it chooses no more than n best beam-power pairs from all the chosen beam-power pairs to provide them to no more than n vehicle groups in the whole coverage area of the mmSBS.
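The two-stage choice described above (top n per sector, then top n overall) can be sketched as follows. The dictionary layout and the name `select_pairs` are illustrative assumptions; the scores stand for whatever performance estimate the mmSBS currently holds for each pair.

```python
import heapq


def select_pairs(estimates_by_sector, n):
    """Two-stage selection of at most n beam-power pairs.

    estimates_by_sector: {sector: {(beam, power): estimated_performance}}
    Returns a list of (sector, (beam, power)), best first.
    """
    shortlisted = []
    # Stage 1: keep at most n best pairs inside each sector.
    for sector, estimates in estimates_by_sector.items():
        top = heapq.nlargest(n, estimates.items(), key=lambda kv: kv[1])
        shortlisted.extend((score, sector, pair) for pair, score in top)
    # Stage 2: keep at most n best pairs across all sectors.
    final = heapq.nlargest(n, shortlisted, key=lambda item: item[0])
    return [(sector, pair) for _score, sector, pair in final]
```

Because stage 1 keeps up to n candidates from every sector, stage 2 cannot miss a pair that belongs to the overall top n.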
The LTE eNB is capable of providing the vehicle group context information to the mmSBS. With the help of the LTE eNB, a vehicle group will know the location of the mmSBS and the beam-power pair chosen for it. Figure 2 shows the whole process of information interaction between vehicles, the LTE eNB, and mmSBSs. When a vehicle wants to communicate with a mmSBS via a mmWave link, it first sends a registration request message (refer to “1: registration request” in Fig. 2) to the LTE eNB, with which it keeps continuous connectivity via its LTE interface. This registration request message contains the vehicle’s velocity, location, and requested data content.
The LTE eNB may receive a large number of registration requests from the vehicles in the service areas it covers. So, it will periodically analyze and handle the received registration requests based on certain policies, where the interval between successive processing operations can be adjusted according to the delay requirements of registration requests. If the interval is longer, the number of registration requests processed at a single time period may be larger, but the response per vehicle will be slower, and vice versa.
Once the time to process the registration requests arrives, the LTE eNB first analyzes the received registration requests to build the vehicle groups one by one according to the vehicles’ velocity, location, and requested data content, and then sends a potential mmSBS a mmWave service request message (refer to “2: service request” in Fig. 2). This message contains each vehicle group’s identifier, each vehicle’s cellular system identifier in a vehicle group, the identifier of the vehicle farthest from the target mmSBS within a vehicle group, and the expected direction of arrival at the mmSBS.
By using the EFML scheme, the mmSBS will respond to the LTE eNB’s mmWave service request (refer to “3: service response” in Fig. 2) with the chosen beam-power pairs. Upon receipt of the service response from the mmSBS, the LTE eNB will send each vehicle in each vehicle group a registration response message about the mmSBS (refer to “4: registration response” in Fig. 2). This message contains the mmSBS’s location and the chosen beam-power pairs.
Once a vehicle in the vehicle group reaches the covered area, it sends the mmSBS an association request to start the association process, and then it receives an association response from the mmSBS (refer to “5: association” in Fig. 2). Then, each vehicle obtains the Channel State Information (CSI) by analyzing the association response message from the mmSBS and feeds the CSI back to the mmSBS. Based on the CSI feedback from all the vehicles in a group, the mmSBS can know whether the beam-power pair it chooses can meet the data rate requirement of the vehicle with the worst channel quality in the group.
After the association operations, the mmSBS starts the data transfer process (refer to “6: communication” in Fig. 2). If the data transfer is successful, it receives acknowledgments of the transferred data frames, and no other feedback is required. If any vehicle in the vehicle group cannot detect the mmSBS within the chosen beam-power pair, it will send feedback to the LTE eNB (refer to “7: service feedback” in Fig. 2). Finally, in order to help the mmSBS make better decisions in the future, the LTE eNB will forward the feedback to the mmSBS (refer to “8: service feedback” in Fig. 2).
The selection results of beam-power pairs should be adjusted in time to serve the most suitable set of vehicle groups, so each mmSBS uses a discrete time setting, where system time is divided into time slices with equal length and denoted as t (t ∈ {1, …, T}).
When each time slice t passes, all the selection results of beam-power pairs will be updated. If the time slice is shorter, the selection results of beam-power pairs are updated in a more timely manner, but the system overhead is higher. Thus, finding a reasonable tradeoff is critical, and one option is to determine the specific value through experience. The detailed process of selection and update for beam-power pairs is described below.

1. At the first time slot of each time slice t, a set \(g_{t}^{l} = \{ g_{t,i}^{l} \mid i \in \left\{ {1, \ldots ,G_{t}^{l} } \right\}\}\) of vehicle groups will be registered in the lth (\(l \in \left\{ {1, \ldots ,L} \right\}\)) sector of the mmSBS via the LTE eNB, in which \(G_{t}^{l}\) is the number of vehicle groups and meets \(G_{t}^{l} = \left| {g_{t}^{l} } \right| \ge n\). The parameter n is determined by the maximum number of RF chains that the mmSBS can support, so it also represents the maximum number of vehicle groups that can simultaneously obtain downlink transmission services in the entire coverage of the mmSBS.
As mentioned above, the mmSBS obtains the information about the group context \(o_{t,i}^{l}\) of each incoming vehicle group \(g_{t,i}^{l}\). Each group context \(o_{t,i}^{l}\) is described by D context dimensions and is thus a D-dimensional vector. The set of group contexts \({\text{\rm O}}_{{G_{t}^{l} }} = \{ o_{t,i}^{l} = \left\langle {o_{t,i}^{l,1} , \ldots ,o_{t,i}^{l,D} } \right\rangle \mid i \in \left\{ {1, \ldots ,G_{t}^{l} } \right\}\}\) is acquired by the mmSBS after the first time slot of each time slice t. In this paper, we only consider vehicle group distance (which is determined by the vehicle farthest from the target mmSBS within the group) and direction of arrival as the context for a vehicle group, so the context vector is two-dimensional (i.e., D = 2).

2. The mmSBS chooses a subset of no more than n best beam-power pairs from the lth sector, in which the set of chosen beam-power pairs in each time slice t is indicated as \({\mathcal{B}}{\text{\rm P}}_{t}^{l} = \{ \left\langle {b_{t,j}^{l} ,p_{t,j}^{l} } \right\rangle \mid j \in \left\{ {1, \ldots ,n} \right\}, n < M^{l} \cdot N^{l} \}\). After this, it reselects no more than n beam-power pairs from \(\bigcup\nolimits_{l = 1}^{L} {{\mathcal{B}}{\text{\rm P}}_{t}^{l} }\) to serve no more than n vehicle groups within the mmSBS’s coverage area. Finally, no more than n vehicle groups in \(\bigcup\nolimits_{l = 1}^{L} {g_{t}^{l} }\) are selected to accept service, and each vehicle of each selected vehicle group is informed about the chosen beam-power pair by the associated LTE eNB via its LTE interface.

3. When any vehicle of a chosen vehicle group (e.g., \(g_{t,i}^{l}\)) reaches the expected coverage of the mmSBS, it receives data from this mmSBS and reports the reception back to it. The mmSBS only observes the amount of data successfully received by the vehicle with the worst channel quality within each group, and regards it as the amount of data \(r_{{\left\langle {b_{t,j}^{l} ,p_{t,j}^{l} } \right\rangle }} \left( {o_{t,i}^{l} } \right)\) that the vehicle group \(g_{t,i}^{l}\) successfully receives via the chosen beam-power pair \(\left\langle {b_{t,j}^{l} ,p_{t,j}^{l} } \right\rangle\), until the time slice t is over or the vehicle with the worst channel quality in the group is no longer covered by its beam.
The amount of data \(r_{{\left\langle {b_{t,j}^{l} ,p_{t,j}^{l} } \right\rangle }} \left( {o_{t,i}^{l} } \right)\) is bounded by \(r_{{{\text{max}}}}\), the maximum amount of data that can be received by the vehicle with the worst channel quality in the group. The contact time and the Shannon theorem can be employed to estimate \(r_{{{\text{max}}}}\). The contact time is the time during which the mmSBS can send data to the vehicle with the worst channel quality in the group; it is bounded by the coverage area of the chosen beam-power pair and depends on the vehicle speed, beam direction, beam width, and transmission power.
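A rough numerical sketch of the \(r_{{{\text{max}}}}\) estimate is given below. It assumes a straight path of known length through the beam footprint, a constant speed, and a constant SNR for the worst-channel vehicle; all of these simplifications, and the function and parameter names, are assumptions for illustration rather than the model used in the paper.

```python
import math


def contact_time_s(coverage_length_m, speed_mps):
    """Time the vehicle spends inside the beam footprint (straight path assumed)."""
    if speed_mps <= 0:
        raise ValueError("speed must be positive")
    return coverage_length_m / speed_mps


def r_max_bits(contact_time, bandwidth_hz, snr_linear, slice_s):
    """Upper bound on received data from the Shannon capacity.

    The effective time is capped by the time slice, because observation
    stops when the slice ends or the vehicle leaves the beam.
    """
    effective_time = min(contact_time, slice_s)
    capacity_bps = bandwidth_hz * math.log2(1.0 + snr_linear)
    return effective_time * capacity_bps
```

For instance, a 100 m footprint crossed at 20 m/s gives a 5 s contact time; with a 2 s time slice, 1 GHz of bandwidth and a linear SNR of 3, the bound is 2 s × 1 GHz × 2 bit/s/Hz = 4 Gbit.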
In this paper, the performance of the chosen beam-power pair \(\left\langle {b_{t,j}^{l} ,p_{t,j}^{l} } \right\rangle \in {\mathcal{B}}{\text{\rm P}}_{t}^{l}\) for the vehicle group with the context \(o_{t,i}^{l} \in {\text{\rm O}}_{{G_{t}^{l} }}\) during the time slice t is estimated by (1).
This performance measure in (1) is also approximated as the energy efficiency of the vehicle group with the context \(o_{t,i}^{l} \in {\text{\rm O}}_{{G_{t}^{l} }}\) when this vehicle group gets the beam-power pair \(\left\langle {b_{t,j}^{l} ,p_{t,j}^{l} } \right\rangle\) during the time slice t.
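To make the notion of energy efficiency concrete, the sketch below uses the common bits-per-joule reading: received data divided by the transmit energy spent. This is an assumption about the general form only and should not be read as the paper's expression (1); the power model ignores circuit power, and the function names are hypothetical.

```python
def dbm_to_watt(power_dbm):
    """Convert a transmit power from dBm to watts."""
    return 10.0 ** ((power_dbm - 30.0) / 10.0)


def energy_efficiency(bits_received, tx_power_w, duration_s):
    """Bits delivered per joule of transmit energy (illustrative metric)."""
    energy_j = tx_power_w * duration_s
    if energy_j <= 0:
        raise ValueError("energy must be positive")
    return bits_received / energy_j
```

Under this reading, a lower power level that still delivers the same amount of data to the worst-channel vehicle scores higher, which is the behaviour a power-adjusting scheme would want to reward.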
We consider \(e_{{\left\langle {b_{t,j}^{l} ,p_{t,j}^{l} } \right\rangle }} \left( {o_{t,i}^{l} } \right)\) as a random variable, and denote its expected value as \(\tilde{e}_{{\left\langle {b_{t,j}^{l} ,p_{t,j}^{l} } \right\rangle }} \left( {o_{t,i}^{l} } \right)\), which is also seen as the expected performance of the beam-power pair \(\left\langle {b_{t,j}^{l} ,p_{t,j}^{l} } \right\rangle\) under the group context \(o_{t,i}^{l}\).
The goal of the mmSBS in choosing a subset of the beam-power pairs is to maximize the expected energy efficiency over a subset of vehicle groups. In other words, its goal is to maximize the average expected beam-power pair performance. We denote the optimal subset of beam-power pairs in the time slice t of the lth sector of the mmSBS as \({\mathcal{B}}{\text{\rm P}}_{t}^{l*} \left( {{\text{\rm O}}_{{G_{t}^{l} }} } \right) = \{ \left\langle {b_{t,j}^{l*} ,p_{t,j}^{l*} } \right\rangle \left( {{\text{\rm O}}_{{G_{t}^{l} }} } \right) \mid j \in \left\{ {1, \ldots ,n} \right\}, n < M^{l} \cdot N^{l} \}\), which depends on \({\text{\rm O}}_{{G_{t}^{l} }} = \{ o_{t,i}^{l} = \left\langle {o_{t,i}^{l,1} , \ldots ,o_{t,i}^{l,D} } \right\rangle \mid i \in \left\{ {1, \ldots ,G_{t}^{l} } \right\}\}\), and its n beam-power pairs formally satisfy the formula (2).
At the beginning of system initialization, if the mmSBS already knew the expected beam-power pair performance \(\tilde{e}_{{\left\langle {b_{t,j}^{l} ,p_{t,j}^{l} } \right\rangle }} \left( {o_{t,i}^{l} } \right)\) for each vehicle group context \(o_{t,i}^{l} \in {\text{\rm O}}_{{G_{t}^{l} }}\) and each beam-power pair \(\left\langle {b_{t,j}^{l} ,p_{t,j}^{l} } \right\rangle \in {\mathcal{M}}^{l} \times \aleph^{l}\), it would be easy to choose the optimal subset of beam-power pairs for each set of arriving vehicle groups in the lth sector by (2). As shown in the formula (3), the average energy efficiency can be obtained through the total amount of data expected to be received for all time slices.
Usually, the mmSBS has no information about the communication environment, so it must learn the expected beam-power pair performance \(\tilde{e}_{{\left\langle {b_{t,j}^{l} ,p_{t,j}^{l} } \right\rangle }} \left( {o_{t,i}^{l} } \right)\) over time for each vehicle group \(g_{t,i}^{l}\) with the group context \(o_{t,i}^{l}\).
That is, in order to learn these performances, the mmSBS must explore different beam-power pairs for different group contexts over time.
Also, it should exploit the beam-power pairs proved to have good performance. Thus, the mmSBS must make a tradeoff between exploring the beam-power pairs with unknown performance and exploiting those with known high performance.
Next, we elaborate on the EFML scheme, in which the best n beam-power pairs are selected from \(\bigcup\nolimits_{l = 1}^{L} {{\mathcal{M}}^{l} \times \aleph^{l} }\) based on the incoming vehicle groups with the contexts \(\bigcup\nolimits_{l = 1}^{L} {{\text{\rm O}}_{{G_{t}^{l} }} }\) in each time slice. The EFML’s choice depends on the historical record of the beam-power pairs chosen in previous time slices and the corresponding observed beam-power pair performance values. For any set of vehicle groups with any group contexts in the lth sector, the expected average energy efficiency of this sector is estimated by the formula (4).
We regard the regret of learning as the expected difference in the average energy efficiency achieved by a vehicle group and by the learning algorithm. According to (3) and (4), it can be estimated as the formula (5).
The energy efficiency-based FML
Each mmSBS executes the EFML scheme independently. To begin with, the context space of the vehicle groups in each sector of the mmSBS is evenly divided into context subspaces of the same size. Then, the EFML learns the performance of the different beam-power pairs separately in each subspace. In each time slice, the EFML executes either an exploration action or an exploitation action, depending on the control function of the system and the contexts of the arriving vehicle groups. In an exploration action, the scheme randomly chooses a subset of beam-power pairs. In an exploitation action, it chooses the beam-power pairs that performed best in the previous time slices. Finally, by observing the average energy efficiency achieved by the vehicle groups in its coverage area, the EFML scheme obtains performance estimates for the chosen beam-power pairs. Thus, the algorithm learns the performance of each beam-power pair under each vehicle group context over time.
The pseudocode of the EFML scheme is listed in Algorithms 1–3. In lines 1–5 of Algorithm 1, the EFML evenly divides the vehicle group context space \(\bigcup\nolimits_{l = 1}^{L} {{\text{\rm O}}_{{G_{t}^{l} }} }\) into \(L \cdot {\text{\rm O}}_{T}\) D-dimensional subspaces of equal size, in which \(L\) and \({\text{\rm O}}_{T}\) are input constants. Then, the EFML scheme initializes a counter \(N_{{\left\langle {b,p} \right\rangle ,s}}^{l}\) for each beam-power pair \(\left\langle {b,p} \right\rangle \in \bigcup\nolimits_{l = 1}^{L} {{\mathcal{M}}^{l} \times \aleph^{l} }\) and each subspace \(s \in \bigcup\nolimits_{l = 1}^{L} {{\text{\rm O}}_{{G_{t}^{l} }} }\). These counters record how many vehicle groups with a certain group context entered the coverage area of the mmSBS in previous time slices in which the mmSBS had chosen a certain beam-power pair.
The counter \(N_{{\left\langle {b,p} \right\rangle ,s}}^{l} \left( t \right)\) represents the total number of vehicle groups with the group context in subspace s that entered the coverage area of the mmSBS whenever the beampower pair \(\left\langle {b,p} \right\rangle\) had been chosen in any of the time slices 1, …, t − 1 of any of the sectors 1, …, L. Moreover, Algorithm 1 also initializes each estimator \(\hat{e}_{{\left\langle {b,p} \right\rangle ,s}}^{l}\) for each beampower pair \(\left\langle {b,p} \right\rangle \in \bigcup\nolimits_{l = 1}^{L} {{\mathcal{M}}^{l} \times \aleph^{l} }\) and each subspace \(s \in \bigcup\nolimits_{l = 1}^{L} {{\text{\rm O}}_{{G_{t}^{l} }} }\), which represents the estimated performance of beampower pair \(\left\langle {b,p} \right\rangle\) for the vehicle groups with the group contexts in subspace s.
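The partition and initialization steps described above can be illustrated with a minimal sketch. It assumes a uniform grid over a two-dimensional context (arrival direction, distance); the bounds, the 6 × 3 split that yields 18 cells, and the names `subspace_index`, `counters` and `estimates` are hypothetical and are not taken from Algorithm 1.

```python
from collections import defaultdict

def subspace_index(context, lower, upper, bins):
    """Map a D-dimensional context to the index tuple of its grid cell."""
    index = []
    for value, lo, hi, n in zip(context, lower, upper, bins):
        # Normalise to [0, 1] and clamp so the upper edge falls in the last cell.
        frac = (value - lo) / (hi - lo)
        index.append(min(n - 1, max(0, int(frac * n))))
    return tuple(index)

# Hypothetical sector: arrival direction in [0, 90] degrees, distance in [0, 150] m,
# split 6 x 3 to give 18 equal-size subspaces (an assumed layout).
LOWER, UPPER, BINS = (0.0, 0.0), (90.0, 150.0), (6, 3)

# One counter and one estimator per (beam-power pair, subspace), all starting at 0.
counters = defaultdict(int)
estimates = defaultdict(float)

s = subspace_index((47.0, 120.0), LOWER, UPPER, BINS)
print(s)  # (3, 2)
```

Keying both tables by the pair and the cell keeps the learning for each subspace separate, which is the property the scheme relies on.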
In each time slice t, the EFML scheme observes the vehicle group contexts \(\bigcup\nolimits_{l = 1}^{L} {{\text{\rm O}}_{{G_{t}^{l} }} }\) of the \(\sum\nolimits_{l = 1}^{L} {G_{t}^{l} }\) incoming vehicle groups. In lines 3–4 of Algorithm 2, for each vehicle group context \(o_{t,i}^{l}\), the EFML scheme determines the subspace to which the context belongs, i.e., it finds \(s_{t,i}^{l} \in \bigcup\nolimits_{l = 1}^{L} {{\text{\rm O}}_{{G_{t}^{l} }} }\) with \(o_{t,i}^{l} \in s_{t,i}^{l}\). Based on the set \(\bigcup\nolimits_{l = 1}^{L} {{\text{G}}{\mathcal{H}}_{t}^{l} } : = \bigcup\nolimits_{l = 1}^{L} {\left\{ {s_{t,i}^{l} \mid i \in \left\{ {1, \ldots ,G_{t}^{l} } \right\}} \right\}}\) of subspaces, the EFML scheme determines the set \({\mathcal{B}}{\text{\rm P}}_{{{\text{G}}{\mathcal{H}}_{t}^{l} }}^{ue} \left( t \right)\) of under-explored beam-power pairs (see line 5 in Algorithm 2) by the formula (6).
In (6), \(K:\left\{ {1, \ldots ,T} \right\} \mapsto {\mathbb{R}}\) is a deterministic, monotonically increasing control function, which is used to decide whether to execute an exploration process or an exploitation process. The control function \(K\left( t \right)\) should be chosen so that the EFML scheme achieves good expected performance with respect to its regret. Theorem 1 in [2] provides an appropriate choice for the control function, and it is also described in [3]. For the convenience of readers, we repeat it as follows.
Theorem 1
(Bound for R(T)) For the lth sector, let \(K\left( t \right) = t^{{\frac{2\alpha }{{3\alpha + D}}}} \log \left( t \right)\) and \({\rm O}_{T} = T^{{\frac{1}{3\alpha + D}}}\). If the EFML scheme is executed by adopting these parameters and if Assumption 1 is true, the leading order of the regret R(T) is \(O\left( {nG_{t}^{l} r_{{{\text{max}}}} M^{l} N^{l} T^{{\frac{2\alpha + D}{{3\alpha + D}}}} \log \left( T \right)} \right)\).
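As a rough numerical check of these parameter choices, the sketch below plugs in the values that the simulation section adopts (\(\alpha = 0.34\), D = 2, 18 subspaces per sector). The paper does not state at which t or with which logarithm base its quoted value of \(K(t)\) is obtained; evaluating at t = 3 with a base-2 logarithm is an assumption made here only because it reproduces the quoted figure.

```python
import math

alpha, D = 0.34, 2   # values adopted in the simulation settings
subspaces = 18       # number of subspaces per sector

# The subspace count equals T^(1/(3*alpha + D)), so T = subspaces^(3*alpha + D).
T = subspaces ** (3 * alpha + D)

def control(t, log=math.log2):
    """K(t) = t^(2*alpha/(3*alpha + D)) * log(t); the base-2 log is an assumption."""
    return t ** (2 * alpha / (3 * alpha + D)) * log(t)

print(round(T, -2))          # on the order of 6000
print(round(control(3), 2))  # 2.03
```

With a natural logarithm the same expression gives a smaller value, so the base matters when comparing against the reported number.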
A detailed proof of Theorem 1 is given in [2], where the parameters differ slightly from ours but are essentially the same. For the reader's convenience, we also repeat Assumption 1 here.
Assumption 1
There is \(\alpha > 0\) and \(\beta > 0\) so that for all \(\left\langle {b,p} \right\rangle \in {\mathcal{M}}^{l} \times \aleph^{l}\) and for all \(x,y \in {\text{\rm O}}_{{G_{t}^{l} }}\) in the lth sector, it is true that \(\left| {\tilde{e}_{{\left\langle {b,p} \right\rangle }} \left( x \right) - \tilde{e}_{{\left\langle {b,p} \right\rangle }} \left( y \right)} \right| \le \beta \left\| {x - y} \right\|^{\alpha }\), where \(\left\| \cdot \right\|\) represents the Euclidean norm in \({\mathbb{R}}^{D}\).
For Assumption 1, although the parameters are somewhat different from those described in [2, 3], they are essentially the same. The lines 6–14 in Algorithm 2 show that the EFML scheme will execute the exploration process if there are under-explored beam-power pairs. If the number \(u^{l} \left( t \right): = \left| {{\mathcal{B}}{\text{\rm P}}_{{{\text{G}}{\mathcal{H}}_{t}^{l} }}^{ue} \left( t \right)} \right|\) of under-explored beam-power pairs is at least n, the EFML scheme randomly chooses n beam-power pairs from them. If the number \(u^{l} \left( t \right)\) of under-explored beam-power pairs is less than n, the EFML scheme chooses all the \(u^{l} \left( t \right)\) under-explored beam-power pairs. Furthermore, it chooses the \((n - u^{l} \left( t \right))\) beam-power pairs \(\left\langle {\hat{b}_{{1,{\text{G}}{\mathcal{H}}_{t}^{l} }}^{l} \left( t \right), \hat{p}_{{1,{\text{G}}{\mathcal{H}}_{t}^{l} }}^{l} \left( t \right)} \right\rangle\), …, \(\left\langle {\hat{b}_{{n - u^{l} ,{\text{G}}{\mathcal{H}}_{t}^{l} }}^{l} \left( t \right),\hat{p}_{{n - u^{l} ,{\text{G}}{\mathcal{H}}_{t}^{l} }}^{l} \left( t \right)} \right\rangle\) from \({\mathcal{M}}^{l} \times \aleph^{l} \backslash {\mathcal{B}}{\text{\rm P}}_{{{\text{G}}{\mathcal{H}}_{t}^{l} }}^{ue} \left( t \right)\), which satisfy the formula (7).
In (7), j = 1, …, \((n - u^{l} \left( t \right))\). As shown in lines 15–17 in Algorithm 2, the EFML scheme will conduct an exploitation action when there are no under-explored beam-power pairs, and it will choose the n beam-power pairs \(\left\langle {\hat{b}_{{1,{\text{G}}{\mathcal{H}}_{t}^{l} }}^{l} \left( t \right),\hat{p}_{{1,{\text{G}}{\mathcal{H}}_{t}^{l} }}^{l} \left( t \right)} \right\rangle\), …, \(\left\langle {\hat{b}_{{n,{\text{G}}{\mathcal{H}}_{t}^{l} }}^{l} \left( t \right),\hat{p}_{{n,{\text{G}}{\mathcal{H}}_{t}^{l} }}^{l} \left( t \right)} \right\rangle\) from \({\mathcal{M}}^{l} \times \aleph^{l}\), which satisfy the formula (8).
In (8), j = 1, …, n. After choosing the n beampower pairs from each sector respectively, the EFML scheme will reselect no more than n beampower pairs from all the chosen beampower pairs of all the sectors, as described in the lines 8–18 in Algorithm 1. After this, the EFML scheme observes the received data of each vehicle group \(g_{t,i}^{l}\) with the context \(o_{t,i}^{l} \in s_{t,i}^{l} \in {\text{\rm O}}_{{G_{t}^{l} }}\) in each beampower \(\left\langle {b_{t,j}^{l} ,p_{t,j}^{l} } \right\rangle\), and estimates the energy efficiency of each vehicle group \(g_{t,i}^{l}\) according to the formula (1) (see lines 2–3 in Algorithm 3).
Based on these observations and estimations, the EFML scheme updates its internal counters (see lines 4–13 in Algorithm 3), where the weight coefficient \(\zeta\) denotes the contribution of the recently observed beampower pair performance to the current updated beampower pair performance, and usually depends on the empirical values with respect to the communication performance of the system.
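The selection and update steps of this subsection can be sketched as follows. Formulas (6)–(8) and Algorithm 3 are not reproduced in this text, so the sketch rests on three assumptions: a pair counts as under-explored when its counter in some observed subspace does not exceed \(K(t)\), exploitation ranks pairs by their mean estimate over the observed subspaces, and \(\zeta\) enters as an exponential weighting of the newest observation. The function names are hypothetical.

```python
import random

def select_pairs(pairs, subspaces, counters, estimates, k_t, n, rng=random):
    """Pick n beam-power pairs for the subspaces observed in this time slice."""
    # Assumed form of (6): under-explored if some counter is at most K(t).
    under = [bp for bp in pairs
             if any(counters.get((bp, s), 0) <= k_t for s in subspaces)]
    if len(under) >= n:
        return rng.sample(under, n)  # explore n random under-explored pairs

    # Assumed form of (7)/(8): rank the remaining pairs by mean estimate.
    def score(bp):
        total = sum(estimates.get((bp, s), 0.0) for s in subspaces)
        return total / max(len(subspaces), 1)

    rest = sorted((bp for bp in pairs if bp not in under), key=score, reverse=True)
    return under + rest[:n - len(under)]

def update(counters, estimates, bp, s, observed, zeta=0.5):
    """Assumed update: blend the new observation into the estimate with weight zeta."""
    key = (bp, s)
    counters[key] = counters.get(key, 0) + 1
    estimates[key] = (1 - zeta) * estimates.get(key, 0.0) + zeta * observed

# Toy run with three (beam width, power) pairs and one observed subspace.
pairs, cell = [(30, 0.1), (60, 0.5), (90, 1.0)], (0, 0)
counters = {(bp, cell): 10 for bp in pairs}
estimates = {(pairs[0], cell): 1.0, (pairs[1], cell): 3.0, (pairs[2], cell): 2.0}
print(select_pairs(pairs, [cell], counters, estimates, k_t=2.0, n=2))
# [(60, 0.5), (90, 1.0)]
```

When every counter exceeds the control value the call reduces to pure exploitation, which is why the toy run returns the two pairs with the highest estimates.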
Performance evaluation
Simulation settings
Figure 3 shows the simulation scenario adopted in this paper, where the mmSBS’s coverage is partitioned into four sectors (i.e., L = 4) of equal size, and the sectors differ in road length and road distribution. Each sector is divided equally along two dimensions: the mmSBS is first taken as the central point for an equiangular division, and then as the starting point for an equal-length division. Based on this spatial division principle, besides paying attention to the data content requested by the vehicles, the LTE eNB should also place the registered vehicles located in the same angle range into the same group, where the group member furthest from the mmSBS determines how far the group is from the mmSBS. For simplicity and without loss of generality, the distance from the mmSBS is classified into three levels: Near, Moderate, and Far. The group distance from the mmSBS is regarded as ‘Near’ if it is less than \(R_{N}\), and as ‘Moderate’ if it is at least \(R_{N}\) but less than \(R_{M}\). Otherwise, it is regarded as ‘Far’.
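The distance classification can be written down directly. The following is a minimal sketch in which the thresholds and vehicle coordinates are hypothetical, the mmSBS is placed at the origin, and the function name is chosen only for illustration.

```python
import math

def group_distance_level(members, r_near, r_moderate, origin=(0.0, 0.0)):
    """Classify a vehicle group by its furthest member's distance from the mmSBS."""
    furthest = max(math.dist(origin, m) for m in members)
    if furthest < r_near:
        return "Near"
    if furthest < r_moderate:
        return "Moderate"
    return "Far"

# Hypothetical thresholds in metres; the actual R_N and R_M are simulation parameters.
print(group_distance_level([(10, 5), (40, 30)], r_near=30, r_moderate=60))  # Moderate
```

Using the furthest member means a single distant vehicle moves the whole group to a farther level, which matches the rule stated above.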
As in [3], seven types of beams with different beam widths (i.e., from 30° to 90° with a step size of 10°) are adopted in the EFML scheme for each sector, with one beam per type. Thus, \(M^{l}\) is equal to 7. In addition, we set \(N^{l}\) to 10, which means that the discrete transmission power values range from 0.1 to 1 W with a step size of 0.1 W. In the simulations, a time slice is set to a fixed length of time by the mmSBS. For each vehicle group, its two-dimensional context vector consists of the arrival direction and the vehicle group distance (i.e., D = 2). The arrival direction of a vehicle group is defined as the angle between the positive X-axis of the plane coordinate system with the mmSBS as the origin and the line connecting the center point of the vehicle group with the mmSBS.
We denote the number of two-dimensional subspaces in each sector of the mmSBS by \({\text{\rm O}}_{T}\) and set \({\text{\rm O}}_{T} = 18\). Furthermore, the parameter \(\alpha\) and the length of a time slice \(t\) are set to \(\alpha = 0.34\) and \(t = 3 {\text{s}}\), respectively. Thus, according to Theorem 1, the time horizon \(T\) is approximately 6000, and the value of the control function \(K\left( t \right)\) is approximately 2.03. We implement our simulation environment in the event-based network simulator OMNeT++, together with the vehicular network simulation framework Veins and the road traffic simulator SUMO. In addition, we consider vehicle speeds between 5 and 10 m/s. The mmWave channel propagation model in our simulation is formulated as follows.
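To make the size of the action set concrete, the sketch below enumerates the beam-power pairs of one sector from the stated widths and power levels and computes an arrival direction. The helper name `arrival_direction` and the sample coordinates are hypothetical.

```python
import math
from itertools import product

beam_widths = list(range(30, 91, 10))                  # 30..90 degrees, 7 widths
powers = [round(0.1 * k, 1) for k in range(1, 11)]     # 0.1..1.0 W, 10 levels
beam_power_pairs = list(product(beam_widths, powers))  # candidates per sector

def arrival_direction(group_center, mmsbs=(0.0, 0.0)):
    """Angle in degrees, in [0, 360), from the positive X-axis to the group centre."""
    dx, dy = group_center[0] - mmsbs[0], group_center[1] - mmsbs[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0

print(len(beam_power_pairs))           # 70
print(arrival_direction((0.0, 25.0)))  # 90.0
```

Adding the power dimension multiplies the number of candidates per sector by ten compared with a beam-only scheme, which is the source of the larger learning space discussed in the results.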
where \(p_{i}^{{\text{r}}}\) is the received power of the vehicle i while \(p_{i}^{{\text{t}}}\) is the transmission power of the mmSBS, \(G_{i}^{{\text{t}}}\) and \(G_{i}^{{\text{r}}}\) are the directional transmission gain and directional reception gain between the vehicle i and the mmSBS, respectively, and \(G_{i}^{{\text{c}}}\) is the channel gain between the vehicle i and the mmSBS. The estimation of the gain parameters in the above channel model can be found in [39, 40]. But for the convenience of readers, they are briefly stated as follows. When a mmSBS selects a certain beampower pair to the vehicle i, the transmission gain and the reception gain of this mmWave channel can be estimated by
and
In (10) and (11), \(\theta_{i}^{{\text{t}}}\) denotes the transmitting beam width of the mmSBS, while \(\theta_{i}^{{\text{r}}}\) is the receiving beam width of the vehicle i. In addition, \(\theta_{i}^{{\text{t}}}\) and \(\theta_{i}^{{\text{r}}}\) is the gain of the main lobe, while \(\xi\) represents the gain in the side lobe and \(0 < \xi { } \ll 1\). \(\varphi_{i}^{{\text{t}}}\) denotes the angle between the line connecting the mmSBS with the vehicle i and the center line of the transmitting beam of the mmSBS, and \(\varphi_{i}^{{\text{r}}}\) is the angle between the line connecting the vehicle i with the mmSBS and the center line of the receiving beam of the vehicle i. According to [41], the channel gain \(G_{i}^{{\text{c}}}\) can be given by
where δ(·) denotes the Dirac delta function, \(\chi_{i}^{{\text{c}}}\) is the amplitude of the path from the vehicle i to the mmSBS, and \(\tau_{i}\) is the propagation delay of the path from the vehicle i to the mmSBS, which can be estimated by the following expression.
In (13), \(c\) and \(d_{i}\) are the speed of light and the distance of the path from the vehicle i to the mmSBS, respectively. Wireless signals propagate over Line-of-Sight (LOS) and Non-Line-of-Sight (NLOS) paths. When there are buildings and plants between the transmitter and the receiver, the NLOS path suffers from problems such as high path loss, reflection and penetration loss. Here, we consider only one reflection on a given path. According to [41], the amplitudes of the LOS and NLOS paths can be estimated as follows.
where \(\lambda\) denotes the wavelength of the mmWave in this simulation, which can be estimated by \(\lambda = c/f_{{\text{c}}}\) and \(f_{{\text{c}}}\) is the carrier frequency. \(\partial\) is the reflection coefficient of the path between the vehicle and the mmSBS.
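Two of the quantities in this channel model follow from elementary relations: the wavelength \(\lambda = c/f_{\text{c}}\) and, reading the description of (13) as distance divided by the speed of light, the propagation delay \(\tau_{i} = d_{i}/c\). The sketch below evaluates both; the 60 GHz carrier and the 100 m path length are hypothetical example values, not the simulation parameters.

```python
C = 299_792_458.0  # speed of light in m/s

def wavelength(carrier_hz):
    """lambda = c / f_c, in metres."""
    return C / carrier_hz

def propagation_delay(distance_m):
    """tau = d / c, in seconds (assumed reading of formula (13))."""
    return distance_m / C

print(f"{wavelength(60e9) * 1e3:.2f} mm")          # 5.00 mm
print(f"{propagation_delay(100.0) * 1e9:.1f} ns")  # 333.6 ns
```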
Simulation schemes and performance metrics
The EFML scheme is most similar to the works in [2, 3]. However, the work in [2] only considers a one-dimensional context vector and unicast communication scenarios. Although the work in [3] considers a two-dimensional context vector, it uses the road identifier and the direction of arrival as the context instead of the arrival direction and the vehicle group distance. Furthermore, the work in [3] aims at maximizing network throughput and does not consider the power adjustment of the base station. Thus, to compare taking the energy efficiency as the optimization goal with taking the amount of received data as the optimization goal under the same context, we design comparison schemes based on the works in [2, 3], which are called VFML and NFML for convenience. The VFML and the NFML schemes retain the core ideas of [2] and [3], respectively (i.e., the optimization target is the amount of data received, the transmission power is fixed, and the vehicle group mode is not applied), but are otherwise the same as our EFML scheme.
Also, we adopt a User Experience Quality Assurance (UEQA) scheme for the vehicle group as another comparison for our EFML scheme. Based on the approximate estimation formula \(f\left( {\gamma_{i} } \right) = 1 - e^{{ - 0.5\gamma_{i} }}\) in [42], the bit transmission success rate from the mmSBS to a vehicle can be easily estimated if the signal-to-noise ratio (SNR) \(\gamma_{i}\) at this vehicle is known. If we know the receiving bit error rate (BER) threshold \({\text{BER}}_{i}^{{{\text{th}}}}\) of each vehicle, we can estimate the corresponding SNR threshold value \(\gamma_{i}^{{{\text{th}}}}\) by letting \({\text{BER}}_{i}^{{{\text{th}}}}\) be equal to \(e^{{ - 0.5\gamma_{i}^{{{\text{th}}}} }}\), which can be expressed by
To ensure that the BER level of each vehicle’s receiving data from the mmSBS is not more than \({\text{BER}}_{i}^{{{\text{th}}}}\), the transmission power of the mmSBS should not be lower than the transmission power threshold \(p_{i}^{th}\), which is estimated as follows.
In (16), \(W\) and \(N_{0}\) represent the bandwidth of the mmWave band and the background noise power spectral density, respectively. In other words, when the mmSBS adopts \(p_{i}^{{{\text{th}}}}\) to send data to the vehicle i, the bit transmission success rate from the mmSBS to the vehicle i can be expressed by
Thus, combined with Shannon theorem, the energy efficiency of the UEQA from the mmSBS to the vehicle i can be expressed by
The energy efficiency of a vehicle group is determined by the member with the minimum energy efficiency of bit transmission among all the members of the group. The performance metrics adopted in the simulation experiments are the energy efficiency, the online learning cost, the cumulative received data, and the aggregate received data. The energy efficiency of the EFML, the VFML and the NFML is defined in the formula (1), while that of the UEQA is defined in the formula (18). The online learning cost is defined as the number of exploration rounds that a scheme needs to achieve a certain percentage of the performance of the optimal solution, where all exploration operations in one discrete time slice are regarded as one round of exploration. The cumulative received data is defined as the amount of data received by all the vehicles during the time horizon T, while the aggregate received data is defined as the amount of data received by all the vehicles during a time slice. Unless otherwise stated below, the values of the remaining simulation parameters are listed in Table 2.
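Two steps of the UEQA description lend themselves to a short worked example: inverting \({\text{BER}}_{i}^{{\text{th}}} = e^{-0.5\gamma_{i}^{{\text{th}}}}\) for the SNR threshold, and taking the minimum member efficiency as the group efficiency. The sketch below is illustrative only; the BER threshold and the efficiency values are hypothetical, and the SNR is treated as a linear ratio rather than in dB.

```python
import math

def success_rate(snr):
    """f(gamma) = 1 - exp(-0.5 * gamma), with gamma as a linear ratio."""
    return 1.0 - math.exp(-0.5 * snr)

def snr_threshold(ber_threshold):
    """Solve BER_th = exp(-0.5 * gamma_th) for gamma_th."""
    return -2.0 * math.log(ber_threshold)

def group_energy_efficiency(member_efficiencies):
    """A group's energy efficiency is that of its worst member."""
    return min(member_efficiencies)

gamma_th = snr_threshold(1e-3)                   # hypothetical BER threshold
print(round(gamma_th, 2))                        # 13.82
print(group_energy_efficiency([4.2, 3.1, 5.0]))  # 3.1
```

Feeding the threshold back into `success_rate` returns one minus the BER threshold, which is a quick consistency check on the inversion.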
Analysis of simulation results
We evaluate the performance metrics of the EFML scheme compared with the benchmark schemes such as the VFML, the NFML scheme and the UEQA scheme. In Figs. 4, 5 and 6, we investigate the impact of the number of vehicles in the simulation area on the performance metrics, in which no more than 6 selected beampower pairs are employed simultaneously in each time slice and the number of vehicles ranges from 35 to 95 with the step size of 15.
As shown in Fig. 4, the cumulative received data increases with the number of vehicles in the simulation area. The reason is that the fewer the vehicles, the less contextual information is available in the system, which ultimately leads to a poorer learning effect, so the mmSBS is unlikely to accurately select the beam-power pair that maximizes the received data. In contrast, an increase in the number of vehicles means that more contextual information can be provided, which is conducive to a better learning effect. In this case, the system is more likely to precisely select the beam-power pair that maximizes the received data.
It can also be observed in Fig. 4 that the EFML outperforms the VFML, the NFML and the UEQA in the cumulative received data. There are two reasons for this. First, compared with the NFML and the VFML, the EFML can serve each vehicle group with the same request (i.e., multicast), whereas the NFML focuses on unicast communication (i.e., only one member of a vehicle group can be served) and the VFML only considers a one-dimensional context. Second, the UEQA only considers the power that meets the worst SNR within the vehicle group, while the EFML allows the power to be adjusted while satisfying the BER threshold of the system. That is, in addition to finding a more appropriate beam orientation and beam width for the vehicle groups, it can also provide a more suitable power to reduce the power consumption.
Figure 5 shows the online learning costs of the different schemes under different numbers of vehicles with the same simulation configuration as in Fig. 4. We can observe from Fig. 5 that the online learning cost, i.e., the number of exploration rounds, decreases with the number of vehicles in the simulation area. The main reason is that as more vehicles enter the system in a scheduling period, the number of corresponding context subspaces rises, so the performance of each beam-power pair can be detected in more subspaces. If there is no historical performance data or the recorded historical data is not sufficient, the exploration should start as soon as possible. Therefore, more beams can be scheduled for detection in a scheduling time slice, which speeds up the detection of the performance of each beam-power pair in each context subspace and thus effectively reduces the number of exploration rounds.
In Fig. 5, we can also see that the cost of the online learning of the EFML is higher than that of VFML, NFML, and UEQA. This is because these comparison schemes do not pay attention to the dimension of power and the VFML only considers onedimensional context (i.e., the direction of vehicle arrival). In other words, the EFML has a greater learning space than other schemes that do not take into account transmission power adjustment. Therefore, the EFML needs to spend a higher online learning cost than other schemes to select a more appropriate transmission power for each mmSBS.
In Fig. 6, we can see that as the number of vehicles in the simulation area increases, the energy efficiency of the network also increases. The main reason is that, when the number of concurrent beams is limited, more vehicles mean a greater probability of selecting vehicles with a relatively high data rate and relatively low power consumption. Moreover, Fig. 6 shows that the energy efficiency achieved by the EFML is better than that achieved by the VFML, the NFML and the UEQA. This can be explained in two ways. Firstly, the VFML and the NFML adopt a fixed transmission power, while the UEQA only considers meeting the worst SNR in a certain vehicle group. That is, the adjustability of the transmission power is not considered in these schemes. In contrast, in addition to meeting the BER threshold for the vehicles within the group, the EFML can select a more appropriate transmission power for each mmSBS to improve the energy efficiency of the system. Secondly, in the EFML, the vehicles with the same content request and in close proximity are grouped together to share the same mmWave beam-power pair, which saves power and improves the energy efficiency of the system.
In Figs. 7, 8 and 9, we investigate the impact of the number of selected beampower pairs per time slice on the performance metrics, in which the number of vehicles in the simulation area is set to 65 and the number of selected beampower pairs per time slice varies from 2 to 6 with the step size of 1. In Fig. 7, it can be seen that as the number of beampower pairs that can be used concurrently in each time slice increases, the cumulative received data of all schemes also increases. This is because the greater number of selected beampower pairs that can be used concurrently means that the more vehicles can be served at the same time. We can also find from Fig. 7 that cumulative received data achieved by the EFML is higher than that achieved by the other algorithms VFML, NFML and UEQA. The explanation of the difference of cumulative received data between different schemes is similar to that of the result in Fig. 4.
From Fig. 8, we observe that the number of exploration rounds decreases with the number of selected beampower pairs per time slice. The more beampower pairs that can be used simultaneously in each time slice, the more beampower pairs with unknown or uncertain performance information that can be explored at the same time slice. Therefore, when the number of context spaces of the system is fixed, the number of exploration rounds will decrease as the number of selected beampower pairs per time slice increases. It can also be seen that the cost of online learning of the EFML is higher than that of other schemes, and the explanation of the difference among different schemes is similar to that of the results in Fig. 5.
Figure 9 shows that the energy efficiency slightly decreases with the number of selected beampower pairs per time slice. As mentioned earlier, in any exploitation process, the EFML will select the beampower pairs that have been proved to have the best performance in the previous time slices. In this case, the average energy efficiency will be higher if a smaller number of optimal beampower pairs are selected. We can also see from Fig. 9 that the energy efficiency achieved by the EFML is higher than that achieved by the VFML, the NFML and the UEQA, and the explanation for this difference is similar to that of the results in Fig. 6.
In Figs. 10, 11 and 12, we investigate the effect of the thermal noise power density on the performance metrics, in which the number of selected beam-power pairs per time slice is set to 6, the number of vehicles in the simulation area is set to 65, and the thermal noise power density ranges from −170 to −150 dBm/Hz in steps of 5 dBm/Hz. It is clear from Fig. 10 that the cumulative received data decreases as the thermal noise power density increases. This is because the SNR of any receiver falls as the thermal noise density rises and, by the Shannon theorem, the data rate decreases with the SNR.
Furthermore, we can see from Fig. 10 that the cumulative received data of the UEQA is almost unaffected when the thermal noise power density is relatively small. This is because the UEQA adjusts the transmission power to maintain a certain SNR that meets the BER threshold of the system according to the formula (16), and thus the cumulative received data remains almost unchanged. However, when the thermal noise power density is too large, the transmitter may not meet the BER threshold of the system even if the transmission power is adjusted to the maximum value. In that case, the cumulative received data decreases as the thermal noise power density increases. It can also be seen from Fig. 10 that the cumulative received data of the EFML is higher than that of the VFML, the NFML and the UEQA, for reasons similar to those given for Fig. 4.
It can be seen from Fig. 11 that the thermal noise density has almost no effect on the cost of online learning. This is because the size of the context space required by the online learning algorithm does not change with the thermal noise density. Moreover, the explanation of the difference in the online learning costs among the different schemes is similar to that of the results in Fig. 5.
From Fig. 12, we can observe that the network energy efficiency declines significantly as the thermal noise density increases. This decrease has two causes. On the one hand, the results in Fig. 10 show that the cumulative received data decreases as the thermal noise power density increases. On the other hand, no matter how the environmental noise changes, the VFML and the NFML always maintain a fixed transmission power. However, as the thermal noise density of the system increases, the EFML and the UEQA must increase the transmission power of each mmSBS to ensure that each vehicle satisfies the SNR threshold corresponding to the BER threshold, which reduces the network energy efficiency. Fig. 12 also shows a difference in the network energy efficiency among the different schemes, and the explanation of this difference is similar to that of the results in Fig. 6.
In Figs. 13, 14 and 15, we analyze the performance metrics achieved by the schemes over a time horizon of 6000 time slices, in which the number of selected beam-power pairs per time slice is 6 and the number of vehicles in the simulation area is set to 65. Figure 13 shows the aggregate received data achieved by the different schemes over the time horizon of 6000 time slices. Specifically, it shows that the EFML achieves better performance than the other schemes after the 2200th time slice. The fluctuations in the graph are caused by the number of vehicle groups, the contact time and the speed of each vehicle.
We can see from Fig. 13 that the aggregate received data per time slice achieved by the EFML begins to show an upward trend after the 1300th time slice and becomes higher than that of the UEQA after the 2200th time slice. This is because the context space of the EFML is larger than that of the VFML, the NFML and the UEQA. Due to the insufficient online learning before the 1300th time slice, most of the beam-power pairs allocated by the EFML to the vehicles are selected randomly. Moreover, the VFML, the NFML and the UEQA may have entered the exploitation phase while the EFML is still in the exploration phase. So, the received data of the EFML may not be as good as that of the other schemes before the 1300th time slice. However, after a period of sufficient online learning, the EFML can choose a set of beam-power pairs, including beam directions, beam widths and transmission powers, that is more reasonable than the choices of the other schemes.
By using the same simulation parameters as in Fig. 13, Fig. 14 shows the number of exploration operations per time slice over the time horizon of 6000 time slices. We can observe from Fig. 14 that the number of exploration operations in each time slice will decrease as the number of time slices increases. This is because the number of underexplored beampower pairs decreases in the system over time. It can also be seen from Fig. 14 that the EFML requires more time slices to explore beam performance than the other schemes. Since EFML considers the power dimension, its context subspace is larger than that of other schemes. That is, it will take a longer time for the EFML to enter the exploitation phase.
As can be seen from Fig. 15, after a certain number of time slices, the energy efficiency per time slice achieved by the EFML is higher than that of the UEQA. The main reason is as mentioned above: since the EFML has a larger context space than the UEQA, it needs more time slices to explore beam performance. Once the learning is sufficient, the EFML can adjust and select a better power for the mmSBS to reduce the power consumption and improve the energy efficiency of the system. Also, we see that the VFML and the NFML are always less energy efficient than the other schemes over the time horizon of 6000 time slices. This is because they always select the maximum transmission power for each mmSBS without reasonable power adjustment.
In Figs. 16, 17 and 18, we investigate the impact of learning information space size on the performance metrics of EFML and NFML, in which “EFML 7 × 2 × 18”, “EFML 7 × 1 × 18”, “NFML 7 × 2 × 10” and “NFML 7 × 1 × 10” represent the different learning information space used by the EFML and NFML respectively. "EFML 7 × 2 × 18" means that each sector in the EFML scheme adopts the beams with seven different widths, the number of beams at each width type is 2, and the contextual subspace of each sector is 18. It is obvious that “EFML 7 × 1 × 18”, “NFML 7 × 2 × 10” and “NFML 7 × 1 × 10” have the similar meanings. Figures 16, 17 and 18 show the cumulative received data, online learning cost, and energy efficiency of EFML and NFML with different learning information space sizes under different numbers of vehicles.
It can be seen from Figs. 16 and 18 that, in terms of cumulative received data and energy efficiency, a scheme with a large learning information space outperforms the same scheme with a small one. This indicates that performance can be improved by increasing the number of beams of the same width and by refining the granularity of the subspace partition, provided that beam overlap is allowed. In other words, more beams and a finer context division help the mmSBS flexibly select more suitable power-beam pairs for the vehicle groups. The EFML also performs better than the NFML because it considers multicast communication to vehicle groups and can therefore serve more vehicle users when beam overlap is allowed and more RF chains are available. Moreover, without changing the core idea of the NFML algorithm, the improvement obtained by expanding its learning information space is small, which further illustrates the advantage of the proposed scheme.
Combined with Fig. 17, we can also see that although “EFML 7 × 1 × 18” performs slightly worse than “EFML 7 × 2 × 18”, the online learning cost of the latter is much higher than that of the former. This means that “EFML 7 × 1 × 18” has a higher performance-to-cost ratio. The online learning cost of the EFML is also higher than that of the NFML, and the explanation is similar to that given for Fig. 5.
Conclusions
In this paper, we proposed the EFML scheme, which is based on MAB theory, to improve network energy efficiency in cellular-assisted vehicular networks. By reducing energy consumption as far as possible while still meeting the basic data rate requirements of vehicle users, the EFML scheme avoids unnecessary power consumption. By grouping users in close proximity that request the same data content into the same receiving group, the EFML scheme saves mmWave beams and reduces the occupation of RF chains. The simulation results show that, compared with the existing online learning-based mmWave beam selection schemes, the EFML scheme improves both the energy efficiency and the amount of received data in cellular-assisted vehicular networks, at the cost of more system overhead. However, after a certain number of time slices, the beam performance update cost of the EFML scheme no longer differs from that of the comparison schemes.
Methods/experimental
The simulation scenario is shown in Fig. 3, where the coverage radius of each mmSBS is 100 m and the coverage area of each mmSBS is partitioned into four sectors (i.e., L = 4). Each sector is divided equally along two dimensions: the mmSBS is first taken as the central point for an equiangular division, and it is then used as the starting point for an equal-length division. We implement the simulation environment with the event-based network simulator OMNeT++, together with the vehicular network simulation framework Veins and the road traffic simulator SUMO. Vehicle speeds are between 5 and 10 m/s. To compare energy efficiency as the optimization goal against the amount of received data as the optimization goal under the same context, we design two comparison schemes, called NFML and VFML for convenience. We also adopt the UEQA scheme as a further comparison for our EFML scheme.
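The two-step partition described above can be sketched as a function that maps a vehicle position to a sector and a context subspace. This is a minimal illustration, not the simulation code: the position is assumed to be given in metres relative to the mmSBS, and the numbers of angular slices and radial rings per sector (`angular_bins`, `radial_bins`) are hypothetical parameters, since this section does not state how the subspaces are split between the two dimensions.

```python
import math


def locate(x, y, radius=100.0, sectors=4, angular_bins=5, radial_bins=2):
    """Map a position relative to the mmSBS to (sector, subspace index).

    Returns None when the position lies outside the coverage radius.
    """
    r = math.hypot(x, y)
    if r > radius:
        return None

    # Step 1: equiangular division around the mmSBS.
    angle = math.atan2(y, x) % (2 * math.pi)
    sector_width = 2 * math.pi / sectors
    sector = min(int(angle // sector_width), sectors - 1)
    within = angle - sector * sector_width
    a_bin = min(int(within / (sector_width / angular_bins)), angular_bins - 1)

    # Step 2: equal-length division along the distance from the mmSBS.
    r_bin = min(int(r / (radius / radial_bins)), radial_bins - 1)

    return sector, a_bin * radial_bins + r_bin


if __name__ == "__main__":
    for position in [(10.0, 0.0), (0.0, 60.0), (-30.0, -30.0), (200.0, 0.0)]:
        print(position, "->", locate(*position))
```

With these hypothetical defaults each sector holds 5 × 2 = 10 subspaces. The `min` clamps keep positions that lie exactly on the outer boundary inside the last bin.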
Availability of data and materials
Not applicable.
Abbreviations
CITS: Cooperative Intelligent Transport System
V2X: Vehicle to Everything
LTE: Long Term Evolution
5G: Fifth-Generation
mmWave: Millimeter Wave
mmSBS: MmWave Small Base Station
FML: Fast Machine Learning
IFML: Improved Fast Machine Learning
RF: Radio Frequency
EFML: Energy efficiency-based FML
MIMO: Multi-input multi-output
NR: New Radio
MAC: Multiple Access Control
MAB: Multi-Armed Bandit
QoS: Quality of service
CSI: Channel State Information
LOS: Line-of-Sight
NLOS: Non-Line-of-Sight
UEQA: User Experience Quality Assurance
BER: Bit error rate
SNR: Signal–noise ratio
References
 1.
A. Alkhateeb, Y.H. Nam, M.S. Rahman, C. Zhang, R. Heath, Initial beam association in millimeter wave cellular systems: analysis and design insights. IEEE Trans. Wirel. Commun. 16(5), 2807–2821 (2017)
 2.
G.H. Sim, S. Klos, A. Asadi, A. Klein, M. Hollick, An online contextaware machine learning algorithm for 5G mmWave vehicular communications. IEEE/ACM Trans. Netw. 26(6), 2487–2500 (2018)
 3.
J.S. Gui, Y. Liu, X.H. Deng, B. Liu, Network capacity optimization for cellularassisted vehicular systems by online learningbased mmWave beam selection. Wirel. Commun. Mobile Comput. 2021, Article ID 8876186, 26 pages (2021)
 4.
S. Han, C.L. I, Z. Xu, C. Rowell, Largescale antenna systems with hybrid analog and digital beamforming for millimeter wave 5G. IEEE Commun. Mag. 53(1), 186–194 (2015)
 5.
P. Kela, M. Costa, J. Turkka, M. Koivisto, J. Werner, A. Hakkarainen, M. Valkama, R. Jantti, K. Leppanen, Location based beamforming in 5G ultradense networks, in Proceedings of IEEE 84th Vehicular Technology Conference (VTCFall), Montreal, QC, Canada (2016), pp. 1–7
 6.
P. Kela, M. Costa, K. Leppänen, R. Jäntti, Locationaware beamformed downlink control channel for ultradense networks, in Proceedings of IEEE Conference on Standards for Communications and Networking (CSCN), Helsinki, Finland (2017), pp. 7–11
 7.
M.S. Sim, Y.G. Lim, S.H. Park, L.L. Dai, C.B. Chae, Deep learningbased mmWave beam selection for 5G NR/6G with sub6 GHz channel information: algorithms and prototype validation. IEEE Access 8(1), 51634–51646 (2020)
 8.
W.Y. Ma, C.H. Qi, G.Y. Li, Machine learning for beam alignment in millimeter wave massive MIMO. IEEE Wirel. Commun. Lett. 9(6), 875–878 (2020)
 9.
Y. Yang, Y. He, D. He, Z. Gao, Y. Luo, Machine learning based analog beam selection for 5G mmWave small cell networks, in Proceedings of IEEE Globecom Workshops (GC Wkshps), Waikoloa, HI, US (2019), pp. 1–5
 10.
D.Y. Zhang, A. Li, M. Shirvanimoghaddam, P. Cheng, Y.H. Li, B. Vucetic, Codebookbased training beam sequence design for millimeterwave tracking systems. IEEE Trans. Wirel. Commun. 18(11), 5333–5349 (2019)
 11.
Y.N.R. Li, B. Gao, X.D. Zhang, K.B. Huang, Beam management in millimeterwave communications for 5G and beyond. IEEE Access 8(1), 13282–13293 (2020)
 12.
J.C. Fan, L.Y. Han, X.M. Luo, J. Huang, Delay analysis and optimization of beam scanningbased user discovery in millimeter wave systems. IEEE Access 8(1), 25075–25083 (2020)
 13.
J. Yang, S. Jin, C.K. Wen, X. Yang, M. Matthaiou, Fast beam training architecture for hybrid mmWave transceivers. IEEE Trans. Veh. Technol. 69(3), 2700–2715 (2020)
 14.
F. Fernandes, C. Rom, J. Harrebek, G. Berardinelli, Beam management in mmWave 5G NR: an intracell mobility study, in Proceedings of IEEE 93rd Vehicular Technology Conference (VTC2021Spring), Helsinki, Finland (2021), pp. 1–7
 15.
H.T. Nguyen, H. Murakami, K. Nguyen, K. Ishizu, W.J. Hwang, Joint user association and power allocation for millimeterwave ultradense networks. Mob. Netw. Appl. 25(1), 274–284 (2020)
 16.
G. Kwon, H. Park, Joint user association and beamforming design for millimeter wave UDN with wireless backhaul. IEEE J. Sel. Areas Commun. 37(12), 2653–2668 (2019)
 17.
Y. Liu, A. Tang, X. Wang, Joint incentive and resource allocation design for user provided network under 5G integrated access and backhaul networks. IEEE Trans. Netw. Sci. Eng. 7(2), 673–685 (2020)
 18.
H. Zhou, X. Chen, S.B. He, C.S. Zhu, V.C.M. Leung, Freshnessaware seed selection for offloading cellular traffic through opportunistic mobile networks. IEEE Trans. Wirel. Commun. 19(4), 2658–2669 (2020)
 19.
H. Zhou, X. Chen, S.B. He, J.M. Chen, J. Wu, DRAIM: A novel delayconstraint and reverse auctionbased incentive mechanism for WiFi offloading. IEEE J. Sel. Areas Commun. 38(4), 711–722 (2020)
 20.
H. Zhou, T. Wu, H.J. Zhang, J. Wu, Incentivedriven deep reinforcement learning for content caching and D2D offloading. IEEE J. Sel. Areas Commun. 39(8), 2445–2460 (2021)
 21.
A. Ali, N. GonzálezPrelcic, R.W. Heath, Millimeter wave beam selection using outofband spatial information. IEEE Trans. Wirel. Commun. 17(2), 1038–1052 (2018)
 22.
A. Ahmed, A. Sam, V. Paul, L. Ying, Q. Qu, T. Djordje, Deep learning coordinated beamforming for highlymobile millimeter wave systems. IEEE Access 6(1), 37328–37348 (2018)
 23.
I. Mavromatis, A. Tassi, R.J. Piechocki, A. Nix, MmWave system for future ITS: a MAClayer approach for V2X beam steering, in Proceedings of IEEE 86th Vehicular Technology Conference (VTCFall), Toronto, ON, Canada (2017), pp. 1–6
 24.
Y.Y. Wang, M. Narasimha, R. Heath, MmWave beam prediction with situational awareness: a machine learning approach, in Proceedings of IEEE 19th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Kalamata, Greece (2018), pp. 1–5
 25.
S.B. Chen, Z.Y. Jiang, S.Z. Zhou, Z.S. Niu, Time sequence channel inference for beam alignment in vehicular networks, in Proceedings of IEEE Global Conference on Signal and Information Processing (GlobalSIP), Anaheim, CA, USA (2018), pp. 1199–1203
 26.
R. Benelmir, S. Bitam, A. Mellouk, Simulated annealingbased beam management for 5G vehicular networks, in Proceedings of IEEE 22nd International Conference on High Performance Switching and Routing (HPSR), Paris, France (2021), pp. 1–4
 27.
K.Z. Ghafoor, L. Kong, S. Zeadally, A.S. Sadiq, G. Epiphaniou, M. Hammoudeh, A.K. Bashir, S. Mumtaz, Millimeter wave communication for internet of vehicles: status, challenges, and perspectives. IEEE Internet Things J. 7(9), 8525–8546 (2020)
 28.
S. Gyawali, S. Xu, Y. Qian, R.Q. Hu, Challenges and solutions for cellular based V2X communications. IEEE Commun. Surv. Tutor. 23(1), 222–255 (2021)
 29.
S. Maghsudi, E. Hossain, Multiarmed bandits with application to 5G small cells. IEEE Wirel. Commun. 23(3), 64–73 (2016)
 30.
M. Hashemi, A. Sabharwal, C.E. Koksal, N.B. Shroff, Efficient beam alignment in millimeter wave systems using contextual bandits, in Proceedings of IEEE Conference on Computer Communications (INFOCOM), Honolulu, HI, USA (2018), pp. 1–9
 31.
X.T. Li, R.T. Zhou, Y. Zhang, L. Jiao, Z.P. Li, Smart vehicular communication via 5G mmWaves. Comput. Netw. 172, 107173 (2020)
 32.
V. Va, T. Shimizu, G. Bansal, R. Heath, Online learning for positionaided millimeter wave beam training. IEEE Access 7(1), 30507–30526 (2019)
 33.
H. Echigo, Y. Cao, M. Bouazizi, T. Ohtsuki, A deep learningbased low overhead beam selection in mmwave communications. IEEE Trans. Veh. Technol. 70(1), 682–691 (2021)
 34.
C.M. Yetis, E. Björnson, P. Giselsson, Joint analog beam selection and digital beamforming in millimeter wave cellfree massive mimo systems. IEEE Open J. Commun. Soc. 2, 1647–1662 (2021)
 35.
Y. Wang, A. Klautau, M. Ribero, A.C.K. Soong, R.W. Heath, Mmwave vehicular beam selection with situational awareness using machine learning. IEEE Access 7(1), 87479–87493 (2019)
 36.
M. Saquib Khan, Q. Sultan, Y. Soo Cho, Position and machine learningaided beam prediction and selection technique in millimeterwave cellular system, in 2020 International Conference on Information and Communication Technology Convergence (ICTC), 2020, pp. 603–605
 37.
D. Li, S. Wang, H. Zhao, X. Wang, Contextandsocialaware online beam selection for mmwave vehicular communications. IEEE Internet Things J. 8(10), 8603–8615 (2021)
 38.
P. Auer, N. CesaBianchi, P. Fischer, Finitetime analysis of the multiarmed bandit problem. Mach. Learn. 47(2), 235–256 (2002)
 39.
O. Semiari, W. Saad, M. Bennis, Z. Dawy, Interoperator resource management for millimeter wave multihop backhaul networks. IEEE Trans. Wirel. Commun. 16(8), 5258–5272 (2017)
 40.
Q. Xue, X. Fang, C.X. Wang, Beamspace SUMIMO for future millimeter wave wireless communications. IEEE J. Sel. Areas Commun. 35(7), 1564–1575 (2017)
 41.
P. Liu, J. Blumenstein, N.S. Perovic, M. di Renzo, A. Springer, Performance of generalized spatial modulation MIMO over measured 60GHz indoor channels. IEEE Trans. Commun. 66(1), 133–148 (2018)
 42.
H.L. Ren, M.Q.H. Meng, Modeling of joint topology control and power scheduling for wireless heterogeneous sensor networks. IEEE Trans. Autom. Sci. Eng. 6(4), 610–625 (2009)
Acknowledgements
The authors would like to acknowledge the anonymous reviewers for their thoughtful comments.
Funding
This work was supported in part by the National Natural Science Foundation of China (61873352).
Author information
Contributions
JSG presented the scheme and designed the experiments. YL did the experiments, analyzed the data, and explained the simulation results. JSG drafted this paper. Both authors read and approved the final manuscript.
Authors’ information
Jinsong Gui received a PhD from Central South University, China, in 2008. He is currently a Professor in the School of Computer Science and Engineering, Central South University, China. Yao Liu is currently a Master's student in the School of Computer Science and Engineering, Central South University, China.
Ethics declarations
Competing interests
The authors declare that there is no conflict of interest regarding the publication of this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Gui, J., Liu, Y. Enhancing energy efficiency for cellular-assisted vehicular networks by online learning-based mmWave beam selection. J Wireless Com Network 2022, 1 (2022). https://doi.org/10.1186/s13638-021-02080-5
DOI: https://doi.org/10.1186/s13638-021-02080-5
Keywords
 Energy efficiency
 Vehicular networks
 Fast machine learning
 Millimeter wave
 Beam selection