
A modified LSTM with QoS aware hybrid AVO algorithm to enhance resource allocation in D2D communication


In communication technologies, device-to-device (D2D) communication is essential for resource management and power control, which are major research concerns nowadays. D2D resource allocation involves dividing vital resources, such as time, power, and spectrum, among several devices. Each device can connect to other devices via one or more frequency channels. D2D communication shares the cellular user resources, while signal power transmission causes interference to the users who share the same channel. So, there is a need to control the power of the D2D device to prevent interference. For proper power control and optimization of multi-channel D2D communication, which is a challenging task, we proposed a deep learning approach incorporating a hybrid resource allocation framework. This framework aims to increase the sum rate of D2D user equipment (DUE) while considering quality of service (QoS) factors like limiting interference to cellular user equipment (CUE) and guaranteeing individual DUE rates above a certain threshold. The proposed resource allocation scheme combines two methods, namely a metaheuristic hybrid particle swarm Cauchy approach to African vulture optimization (HPSCAV) and a modified long short-term memory (MLSTM) based approach. The HPSCAV scheme helps to ensure that the QoS constraints are met, while the MLSTM-based approach is utilized for efficient resource allocation by optimizing the power and improving it with HPSCAV. Simulation results validate that the proposed model achieved better performance in various metrics such as system capacity, power consumption, spectral efficiency (SE), and energy efficiency (EE).

1 Introduction

Every new technological advancement impacts how people interact with one another and share information, especially in mobile computing and wireless communication. Wireless technology has advanced from the first generation (1G) to the fifth generation (5G) during the past few decades, and 5G has now begun to spread around the globe. 5G and beyond 5G (B5G) networks are expected to handle data rates thousands of times higher than those of the previous generation, to be ten times more energy and spectrum efficient, and to have a latency of less than one millisecond. 5G makes use of a variety of technologies to meet these demands. Heterogeneous networks (HetNets), massive multiple-input multiple-output (massive MIMO), device-to-device (D2D) communication, millimeter wave (mmWave), and cognitive radio networks (CRNs) are some of these technologies [1, 2].

D2D communication has emerged as a promising technology for enabling direct communication between nearby devices without relying on cellular infrastructure. This technology has gained significant attention recently due to its potential to enhance network efficiency, increase spectrum utilization, and reduce power consumption [3, 4]. One of the critical challenges in D2D communication is the efficient allocation of resources, such as spectrum, power, and time, among devices to maximize system performance. Resource allocation in D2D communication is a complex problem due to the dynamic nature of the wireless environment and the need to balance conflicting objectives [5, 6]. For example, allocating spectrum resources must be optimized to minimize interference between D2D and cellular users while ensuring that D2D users have sufficient bandwidth to achieve their desired data rates.

Similarly, power allocation must be optimized to ensure that devices communicate reliably while minimizing energy consumption [7]. Efficient resource allocation in D2D communication can bring several benefits. First, it can enhance network capacity and increase overall throughput by enabling devices to share resources effectively. Second, it can improve network coverage and reliability by allowing the devices to communicate directly with each other, bypassing the cellular infrastructure. Third, optimizing power resources can reduce energy consumption and increase battery life. Fourth, it can enable new applications and services, such as peer-to-peer file sharing, multimedia streaming, and collaborative computing. Several approaches have been proposed in the literature to achieve efficient resource allocation in D2D communication. One common approach is to use centralized algorithms, where a central controller is responsible for managing the allocation of resources [8, 9]. In this approach, devices communicate with the central controller to request resources and receive their instructions. While centralized algorithms can effectively optimize resource allocation, they suffer from several drawbacks, including high latency, scalability issues, and the need for a reliable backhaul connection. In distributed algorithms, devices collaborate to allocate resources in a decentralized manner [10, 11]. In this approach, devices communicate directly with each other to negotiate resource allocation and make decisions based on local information. Distributed algorithms can be more scalable and robust than centralized algorithms, but can also be more complex to design and implement. Machine learning techniques have also been proposed for resource allocation in D2D communication [12, 13]. Machine learning algorithms are used to learn the optimal resource allocation policies based on historical data and feedback from the network [14, 15]. 
Machine learning techniques can effectively handle the complex and dynamic nature of D2D communication, but they also require significant computational resources and training data. Overall, the efficient allocation of resources is critical for realizing the full potential of D2D communication [16, 17]. As the demand for wireless connectivity grows, developing practical resource allocation algorithms that enable efficient and reliable D2D communication is becoming increasingly important. While centralized, distributed, and machine learning approaches have advantages and disadvantages, combining these approaches may be necessary to achieve optimal resource allocation in D2D communication.

2 Methods/experimental

The primary objective of D2D resource allocation is to make use of limited resources to improve overall system performance. One significant challenge in D2D communication is controlling co-tier and cross-tier interference in the cellular network. The other significant challenge is the effective use of power resources, which can reduce energy consumption and increase battery life. To achieve efficient resource allocation and address the issue of interference and power reduction in this article, we have considered interference, power, and data rate as constraints. We have used HPSCAV, a metaheuristic-based optimization technique, to optimize the D2D node. This optimized node is fed as input to the deep learning modified long short-term memory (MLSTM) model, which allocates the resource effectively. As interference and power are controlled in the D2D network, we have achieved better signal-to-interference-plus-noise ratio (SINR), enhanced the system capacity, and reduced the energy consumption of the overall system. From the simulation results, it is clear that the proposed method not only achieved better system capacity, but also improved spectral and energy efficiency compared with existing algorithms.

3 Related works

Song et al. [18] have investigated resource allocation for the D2D communications system, which includes both the uplink and the downlink. A simultaneous uplink and downlink resource allocation approach is presented that assures the signal-to-interference-plus-noise ratio (SINR) of cellular users and D2D pairs while maximizing system capacity. In this work, the author has not considered the metaheuristic approach, which does not guarantee the optimal solution. Cicalo and Tralli [19] have proposed a joint efficient admission control (AC) and radio resource allocation (RRA) method to improve the quality of service (QoS) of the network. The suggested AC method is computationally intensive and might not be scalable for big networks. Further suggested RRA methods will not converge to the best global solution. Le et al. [20] have proposed a joint resource allocation problem of user clustering, power control, and D2D mode selection to increase network throughput. The proposed system ignores the impact of interference, and networks with a high density of user equipment may find the suggested strategy unsuitable. Nouri et al. [21] proposed an iterative search algorithm to achieve the best solution under energy and delay restrictions. Regarding limitations, the author has not considered intercell interference; it may be a problem when small cells of mm Wave are deployed in dense networks. He has also not explained how these techniques impact the QoS. Eslami et al. [22] have proposed the fractional frequency reuse (FFR) method to reduce interference in heterogeneous networks and also performed optimal power control and admission control for the users to maximize the sum rate. Due to not considering the metaheuristic approach, the author cannot guarantee a global optimal solution. Guo et al. 
[23] have examined the energy efficiency (EE) of cellular networks that support D2D communication from the viewpoint of user fairness and proposed a Lagrangian decomposition-based (LDB) method to enhance the EE in D2D users. The system’s complexity increases as the number of users increases in the network, leading to system capacity degradation. Hao et al. [24] proposed a two-stage iterative algorithm to optimize the EE, spectral efficiency (SE), and queuing delay jointly. As the number of users increases, complexity increases, which leads to the undesired system performance. Ma et al. [25] proposed a centralized and distributed relay selection and power allocation algorithm to reduce the total transmit power and improve the system throughput. The problem of relay selection and power allocation increases with the users and impacts QoS. Sanusi et al. [26] proposed a priced differencing acceptance algorithm to improve D2D user equipment (DUE) access rate and throughput with reduced signaling overhead. Still, it does not go into specific implementation details or provide an in-depth performance evaluation of the discussed approaches. Mohammed et al. [27] presented a non-cooperative game theory (NCG) approach for resource allocation to increase D2D pairs’ EE. The complexity of the game theory approach is high, so it may not be suitable for large networks. Hou et al. [28] proposed a resource allocation algorithm based on D2D communication mode selection. The algorithm achieved the goal of allocating the best communication mode and resources for users with the maximum throughput; this work did not consider mobility or interference. The algorithm is assessed for single cells. Noor Mohammed et al. [29] proposed dynamic sectorization and parallel processing techniques to improve the probability of successful transmission and SINR and, thereby, improve the capacity of the D2D network. 
Here, the author has not used any optimization or described the control of the power mechanism. Lie et al. [30] proposed a D2D resource allocation and power control (DRAPC) framework to increase signal quality and degree of resource sharing. In this framework, the author assumed that all user equipment's (UEs) transmit with equal power. However, in reality, this is not the case. Interference between D2D links is also not taken into account. Zhang et al. [31] proposed a deep deterministic policy gradient (DDPG) reinforcement learning method for improving the EE in a D2D heterogeneous network. The proposed algorithm is computationally expensive and not applicable in real time; the impact of interference between D2D users is not considered, and the proposed approach assumes that user locations and channel conditions are static. Shi et al. [32] proposed a Stackelberg game (SG)-guided multi-agent deep reinforcement learning (MADRL) approach that allows D2D users to make smart power control and channel allocation decisions in a distributed manner. Here, the author assumed the network was fixed, and the SG framework assumed the evolved NodeB (eNodeB) had full information about the network state and the actions of D2D pairs. Hamdi et al. [33] proposed the Dinkelbach, Hungarian and conjugate gradient methods to maximize EE for mobile devices in energy harvesting systems with D2D offloading capabilities. The proposed algorithm assumes that the energy harvesting process is perfect, meaning there is no energy loss during harvesting. This is not true; there may be energy losses due to inefficiencies in the harvesting devices. Abohashish et al. [34] proposed the unmanned aerial vehicle trajectory optimization (UAV-TO) technique based on reinforcement learning to enhance EE for numerous UEs and maximize the utilization of network resources. The proposed scheme assumes that the UAV knows the channel conditions and the users’ locations perfectly. 
This assumption may not hold in practice, as the UAV may not have complete information about the network environment, and the proposed scheme does not consider the impact of interference between the UAV and other users in the network. Rajkumar and Mohammed [35] proposed the sequential best throughput seek algorithm (SBTSA) to provide the best throughput to D2D pairs without affecting the QoS of the cellular user equipment (CUE). The SBTSA algorithm does not consider the impact of channel dynamics, which can significantly affect the performance of D2D communication because the interference between D2D pairs and CUEs varies with the channel conditions. For a mobile edge computing (MEC) system based on non-orthogonal multiple access (NOMA), the authors of [36] provided a dynamic optimization model whose goal is to optimize the total EE while satisfying the necessary QoS requirements. The paper also proposes a computational partitioning technique to boost the overall throughput of mobile computing services. One limitation is that the resulting optimization problem is non-convex and therefore typically difficult to solve. The authors of [37] suggested an energy-saving resource allocation method for transmission in uplink–downlink decoupled NOMA heterogeneous networks (HetNets). Subchannel allocation, user association, and power allocation make up the proposed scheme. The strategy assumes that the base station (BS) has full channel state information (CSI). Since CSI is seldom perfect in reality, unsatisfactory performance might occur. The authors of [38] proposed a dynamic optimization strategy to reduce the energy consumption of 5G heterogeneous networks while preserving the necessary capacity and coverage. The proposed method optimizes small-cell switching, power consumption, and carrier allocation for energy efficiency. 
It also proposes a multi-hop backhauling strategy to effectively utilize the existing infrastructure of small-cell networks for simultaneous dual-hop transmissions. The proposed model does not account for the effect of interference. Interference may seriously impair the functioning of heterogeneous cellular networks.

3.1 Motivation and contribution

3.1.1 Motivation

The literature survey shows that most researchers have focused on conventional mechanisms and a few on game theory approaches. With conventional techniques, researchers have focused on enhancing the system throughput, energy efficiency, and transmission power and on minimizing interference. The game theory methods concentrate on battery life, throughput, and energy efficiency. However, these methods have no training phase or unified response and carry some degree of uncertainty. So, in this work, we focus on a metaheuristic algorithm, which has a training phase and provides more accurate results with less computational complexity than conventional and game theory approaches.

In this paper, we proposed a novel hybrid particle swarm Cauchy approach to African vulture (HPSCAV) optimization with a combination of deep learning MLSTM model and a metaheuristic approach for resource allocation in cellular networks.

3.1.2 Contributions

  • A metaheuristic HPSCAV optimization algorithm is considered. This algorithm provides efficient solutions to complex optimization problems. Here, we formulate an objective function that evaluates the sum rate of the DUEs, the interference at the CUE, and the individual DUE rates. Next, we enforce the QoS constraints, namely limiting interference and maintaining individual rates. The algorithm is further enhanced so that these QoS constraints are met.

  • Once the QoS constraints are met, the deep learning MLSTM technique is used to control the power and perform the resource allocation.

  • The combination of the HPSCAV optimization and the MLSTM based approach is unique. This mechanism provides flexibility and efficiency compared to conventional methods; this hybrid framework allows for a more comprehensive and effective solution to D2D communication by simultaneously optimizing power and data rate and minimizing interference.

  • Extensive simulation results demonstrate significant performance improvements in system capacity, power consumption, SE, and EE compared to existing methods. The proposed model enables effective resource allocation with optimal power while maintaining QoS.

The rest of this research paper is organized as follows. Section 4 covers the system model, Sect. 5 presents the results and discussions, and Sect. 6 discusses the conclusion and future scope.

4 System model

Figure 1 shows the system model for D2D communication. It consists of an eNodeB placed at the center, with the CUEs and DUEs deployed randomly around it. A DUE shares the CUE resource block when the channel is free. If many users try to use the same resource block, interference occurs, making the network vulnerable. We consider multi-channel D2D communications in cellular networks, where D2D pairs can share CUE resources if the total interference to the CUE is less than a predetermined threshold. The sets of D2D pairs and channels are represented by \({\mathbb{M}}\) and \({\mathbb{N}}\), with \(\left| {\mathbb{M}} \right| = M\) and \(\left| {\mathbb{N}} \right| = N\), respectively. The transmit powers of the CUE and the ith D2D pair on the shared channel n are represented as \(po_{C}^{n}\) and \(po_{i}^{n}\), respectively. The bandwidth and noise spectral density are represented by \(BW\) and \(NS_{0}\), respectively.

Fig. 1 System model of D2D communication

The gain of the channel between the ith D2D transmitter and the jth D2D receiver is labeled as \(h_{i,j}^{n}\). Similarly, the channel gain between the ith D2D transmitter and eNodeB is labeled as \(h_{i,0}^{n}\).

The data rate of DUE is denoted as \(Dr_{i}\), and it is represented as

$$Dr_{i} \left( {\overrightarrow {po} } \right) = \mathop \sum \limits_{{n \in {\mathbb{N}}}} BW\log_{2} \left( {1 + \frac{{h_{i,i}^{n} po_{i}^{n} }}{{NS_{0} BW + \mathop \sum \nolimits_{{l \in {\mathbb{M}}\backslash \left\{ i \right\}}} h_{l,i}^{n} po_{l}^{n} + h_{0,i}^{n} po_{C}^{n} }}} \right)$$

where \(\overrightarrow {po} = \left\{ {po_{1}^{1} ,po_{1}^{2} , \ldots , po_{M}^{N} } \right\}\).
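As a minimal sketch of how the data rate in (1) can be evaluated numerically, the plain-Python function below computes the rate of one D2D pair. The function name `d2d_rate`, the nested-list layout of the gains and powers, and the reading of \(h_{0,i}^{n}\) as the gain from the CUE to the ith D2D receiver are assumptions made for illustration; they are not taken from the paper.

```python
import math


def d2d_rate(i, h, po, h_cue, po_cue, bw, ns0):
    """Rate of D2D pair i, summed over all channels as in (1).

    Assumed (hypothetical) data layout:
      h[l][j][n]  : gain from D2D transmitter l to D2D receiver j on channel n
      po[l][n]    : transmit power of D2D pair l on channel n
      h_cue[j][n] : gain from the CUE to D2D receiver j on channel n
      po_cue[n]   : CUE transmit power on channel n
    """
    num_pairs = len(po)
    rate = 0.0
    for n in range(len(po_cue)):
        # Interference from the other D2D pairs sharing channel n.
        interference = sum(
            h[l][i][n] * po[l][n] for l in range(num_pairs) if l != i
        )
        # Interference from the cellular user on the same channel.
        interference += h_cue[i][n] * po_cue[n]
        sinr = h[i][i][n] * po[i][n] / (ns0 * bw + interference)
        rate += bw * math.log2(1.0 + sinr)
    return rate
```

With a single pair, a single channel, unit gain, unit power, and unit noise power, the SINR is 1 and the rate equals the bandwidth.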

For effective resource allocation in the uplink cellular network, the goal is to maximize the DUE data rate while keeping the interference caused by DUE transmissions below the threshold \(I_{{{\text{th}}}}\) and ensuring that each DUE’s data rate is not less than \(Dr_{{{\text{th}}}}\).

The optimization problem can be formulated as

$$\begin{array}{*{20}l} {\mathop {{\text{maximize}}}\limits_{{0 \preceq \overrightarrow {po} }} } & {\mathop \sum \limits_{{i \in {\mathbb{M}}}} Dr_{i} \left( {\overrightarrow {po} } \right)} & {} \\ {{\text{s.t.}}} & {\mathop \sum \limits_{{n \in {\mathbb{N}}}} po_{i}^{n} \le po_{\max } } & {\forall i \in {\mathbb{M}}} \\ {} & {\mathop \sum \limits_{{i \in {\mathbb{M}}}} h_{i,0}^{n} po_{i}^{n} \le I_{{{\text{th}}}} } & {\forall n \in {\mathbb{N}}} \\ {} & {Dr_{{{\text{th}}}} \le Dr_{i} \left( {\overrightarrow {po} } \right)} & {\forall i \in {\mathbb{M}}} \\ \end{array}$$

The first constraint represents the maximum transmission power (\(po_{\max }\)) of the DUE, while the second constraint limits the interference caused to the CUE. The third constraint ensures the minimum data rate of the DUEs. When the number of D2D pairs is large, the non-convexity of problem (2) makes it very difficult to find the optimal solution analytically in a short computation time. To address this, a resource allocation strategy based on MLSTM can provide a near-optimal solution in a short period. The QoS constraints \(\mathop \sum \limits_{{i \in {\mathbb{M}}}} h_{i,0}^{n} po_{i}^{n} \le I_{{{\text{th}}}}\) and \(Dr_{{{\text{th}}}} \le Dr_{i } \left( {\overrightarrow {po} } \right)\) must be enforced explicitly because they are often violated; when they are violated under highly dense network conditions, the network becomes unsuitable for communication.
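To make the three constraints concrete, the following sketch reports which of them a candidate allocation violates. It is only an illustration: the function name `violated_constraints`, the list layout, and the choice to pass precomputed per-pair rates are assumptions, not part of the proposed scheme.

```python
def violated_constraints(po, h_bs, rates, po_max, i_th, dr_th):
    """Return the names of the constraints that an allocation violates.

    Assumed (hypothetical) data layout:
      po[i][n]   : transmit power of D2D pair i on channel n
      h_bs[i][n] : gain from D2D transmitter i to the eNodeB on channel n
      rates[i]   : data rate of D2D pair i, computed beforehand
    """
    violated = []
    # First constraint: total power of each pair must not exceed po_max.
    if any(sum(row) > po_max for row in po):
        violated.append("power")
    # Second constraint: aggregate interference on each channel <= i_th.
    num_channels = len(po[0])
    for n in range(num_channels):
        interference = sum(h_bs[i][n] * po[i][n] for i in range(len(po)))
        if interference > i_th:
            violated.append("interference")
            break
    # Third constraint: every pair must reach the minimum rate dr_th.
    if any(rate < dr_th for rate in rates):
        violated.append("rate")
    return violated
```

An empty list means the allocation is feasible for all three constraints.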

4.1 Scheme of hybrid resource allocation

Figure 2 represents the long short-term memory (LSTM) neural network model, which consists of two modules, each built from dense (fully connected) layers. The normalized channel gain and the normalized transmit power are given as input to both modules, and the module outputs are multiplied by the maximum power.

Fig. 2 Basic LSTM feed-forward neural network

The hybrid resource allocation scheme combines two methods: the LSTM-based approach \(\left( {\overrightarrow {po}_{l} } \right)\) and the metaheuristic method (\(\overrightarrow {po}_{C}\)). It adaptively selects one of these methods depending on the system’s requirements. To identify \(\overrightarrow {po}_{l}\), an LSTM structure consisting of two separate LSTM modules is used. The inputs to each LSTM module are the normalized channel gain and the CUE’s normalized transmit power. The normalized channel gain is \(\hat{h}_{i,j }^{n} = \frac{{\log_{10} (h_{i,j}^{n} ) - \mu_{{\hat{h}}} }}{{\sigma_{{\hat{h}}} }}\), and the CUE’s normalized transmit power is \(\frac{{po_{C}^{n} }}{{po_{\max } }}\). Here, \(\mu_{{\hat{h}}} = {\mathbb{E}}_{{h_{i,j}^{n} }} \left[ {\log_{10} h_{i,j}^{n} } \right],\;{\text{and}}\;\sigma_{{\hat{h}}} = \sqrt {{\mathbb{E}}_{{h_{i,j}^{n} }} \left[ {\left( {\log_{10} (h_{i,j}^{n} ) - \mu_{{\hat{h}}} } \right)^{2} } \right]}\). The first LSTM module determines the normalized total transmit power of each D2D pair, \(\frac{{\mathop \sum \nolimits_{{n \in {\mathbb{N}}}} po_{i}^{n} }}{{po_{\max } }}\). The second LSTM module determines the proportion of transmit power assigned to each channel, \(\frac{{po_{i}^{n} }}{{\mathop \sum \nolimits_{{n \in {\mathbb{N}}}} po_{i}^{n} }}\). The LSTM-based resource allocation strategy, \(\overrightarrow {po}_{l}\), is obtained by multiplying the outputs of both LSTM modules by \(po_{\max }\). The LSTM modules are made up of multiple dense (fully connected) layers. The input, weight, and bias of the ith dense layer are represented as \(in_{i}\), \(wi_{i}\), and \(bi_{i}\), respectively, and the layer output is obtained by computing \(wi_{i} in_{i} + bi_{i}\). The output of these layers is forwarded through a “Leaky rectified linear unit (Leaky ReLU)” layer, which filters out any negative values. The Leaky ReLU layer takes \(in_{r}\) as input and outputs \(\left[ {in_{r} } \right]^{ + } = {\text{max}}\left( {in_{r} ,0} \right)\).
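The input normalization and the way the two module outputs are recombined into a power allocation can be sketched as follows. This is an illustrative sketch only: the function names, the list layout, and the use of the sample mean and standard deviation in place of the expectations are assumptions, and the network itself is not modeled.

```python
import math


def normalize_gains(gains):
    """Standardize log10 channel gains to zero mean and unit variance.

    Assumes the gains are positive and not all identical (sigma > 0).
    """
    logs = [math.log10(g) for g in gains]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs))
    return [(x - mu) / sigma for x in logs]


def combine_outputs(total_fraction, channel_share, po_max):
    """Recombine the two module outputs into per-channel powers.

    total_fraction[i]   : module 1 output, total power of pair i over po_max
    channel_share[i][n] : module 2 output, share of pair i's power on channel n
    """
    return [
        [po_max * frac * share for share in shares]
        for frac, shares in zip(total_fraction, channel_share)
    ]
```

If every `total_fraction[i]` lies in [0, 1] and each row of `channel_share` sums to 1, the per-pair total power cannot exceed `po_max`.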

In (2), the first constraint \(\sum\nolimits_{{n \in {\mathbb{N}}}} {po_{i}^{n} \le po_{\max } }\) is always satisfied by the LSTM structure because the output of the sigmoid layer lies between 0 and 1 and the outputs of each softmax block sum to 1. The output layer of LSTM module 2 uses a softmax activation function \(\frac{{e^{{y_{j} }} }}{{\mathop \sum \nolimits_{j} e^{{y_{j} }} }}\) to convert its input \(y_{j}\) into a probability distribution over multiple classes. In contrast to LSTM module 1, the output of LSTM module 2 is composed of M softmax blocks, each with N outputs, so the softmax layer has a total of \(M \times N\) outputs. The output of the ith softmax block represents the share of transmit power of the ith D2D pair over the N channels. The training method used for this LSTM is based on unsupervised learning, which means that the LSTM can approach the optimal solution without relying on labeled data. This makes the training process easier than supervised learning. The LSTM can approximate the optimal solution based on the input data sample. During training, the network’s parameters are updated using the loss function (3). The loss function (\(Lo\)) contains three controlling parameters, \(\lambda_{1}\), \(\lambda_{2}\), and \(\lambda_{3}\), all of which are positive, and the hyperbolic tangent function \(tanh\left( \cdot \right)\), i.e., \(tanh\left( {in} \right) = \frac{{1 - e^{ - 2in} }}{{1 + e^{ - 2in} }}\).

$$Lo = \lambda_{1} \mathop \sum \limits_{{i \in {\mathbb{M}}}} Dr_{i} \left( {\overrightarrow {po} } \right) + \lambda_{2} \mathop \sum \limits_{{n \in {\mathbb{N}}}} tanh\left( {\frac{{\left[ {\mathop \sum \nolimits_{{l \in {\mathbb{M}}}} h_{l,0}^{n} po_{l}^{n} - I_{{{\text{th}}}} } \right]^{ + } }}{{I_{{{\text{th}}}} }}} \right) + \lambda_{3} \left( {\frac{{\left[ {Dr_{{{\text{th}}}} - { }Dr_{i} \left( {\overrightarrow {po} } \right)} \right]^{ + } }}{{Dr_{{{\text{th}}}} }}} \right)$$

The loss function \(Lo\) is used to update the parameters of the LSTM so as to maximize the sum rate of the DUEs, \(\sum\nolimits_{{i \in {\mathbb{M}}}} {Dr_{i} \left( {\overrightarrow {po} } \right)}\), while ensuring that the interference at the CUEs, \(\sum\nolimits_{{l \in {\mathbb{M}}}} {h_{l,0}^{n} po_{l}^{n} }\), is below the threshold \(I_{{{\text{th}}}}\) and that \(Dr_{i} \left( {\overrightarrow {po} } \right)\) is larger than \(Dr_{{{\text{th}}}}\). A larger \(\lambda_{1}\) emphasizes maximizing the sum rate, a larger \(\lambda_{2}\) emphasizes limiting interference, and a larger \(\lambda_{3}\) emphasizes meeting the minimum rate requirement. The operator \([ \cdot ]^{ + }\) ensures that the second and third terms, which relate to the constraints, no longer affect the loss value once the constraints have been fulfilled. The use of \(tanh\left( \cdot \right)\) ensures that the loss function does not grow too large.
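A hedged sketch of how such a loss could be evaluated is shown below. Two points are assumptions made here and not statements about the authors' implementation: the sum-rate term enters with a negative sign, because a loss is minimized during training while the sum rate is to be maximized, and the rate penalty is summed over all D2D pairs. The function names and argument layout are hypothetical.

```python
import math


def positive_part(x):
    """The [.]^+ operator: zero once a constraint is satisfied."""
    return max(x, 0.0)


def training_loss(rates, interference, i_th, dr_th, lam1, lam2, lam3):
    """Penalized loss in the spirit of (3).

    rates[i]        : data rate of D2D pair i
    interference[n] : aggregate D2D interference at the CUE on channel n
    ASSUMPTIONS: negative sign on the sum rate; rate penalty summed over i.
    """
    sum_rate = sum(rates)
    # Bounded penalty for exceeding the interference threshold per channel.
    interference_penalty = sum(
        math.tanh(positive_part(i_n - i_th) / i_th) for i_n in interference
    )
    # Penalty for pairs that fall short of the minimum rate.
    rate_penalty = sum(positive_part(dr_th - r) / dr_th for r in rates)
    return -lam1 * sum_rate + lam2 * interference_penalty + lam3 * rate_penalty
```

When both QoS constraints hold, the two penalty terms vanish and the loss reduces to the weighted negative sum rate.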

The training of the LSTM also uses dropout, which randomly ignores the outputs of hidden nodes, to regularize the learned parameters and prevent overfitting. Despite achieving near-optimal performance, as demonstrated by simulation results, the second and third constraints of (2), i.e., the QoS constraints, can still be violated in a non-negligible percentage of cases. To satisfy the QoS constraints efficiently, a resource allocation strategy that combines the results of the metaheuristic HPSCAV scheme and the MLSTM-based scheme (\(\overrightarrow {po}_{l}\)) is considered. The HPSCAV scheme assumes that each DUE allocates the same transmit power (\(po_{{{\text{HPSCAV}}}}\)) on every channel, resulting in \(\overrightarrow {po}_{C} = po_{{{\text{HPSCAV}}}} \cdot 1_{MN}\), where \(1_{in}\) denotes the all-ones vector of length \(in\).

An optimization problem can be used to determine the best value of \(po_{{{\text{HPSCAV}}}}\) as per (4),

$$\begin{array}{*{20}l} {\mathop {{\text{maximize}}}\limits_{{0 \le po_{{{\text{HPSCAV}}}} \le \frac{{po_{\max } }}{N}}} } \hfill & {\mathop \sum \limits_{{i \in {\mathbb{M}}}} Dr_{i} \left( {\overrightarrow {po}_{C} } \right)} \hfill & {} \hfill \\ {{\text{s.t.}}} \hfill & {\mathop \sum \limits_{{i \in {\mathbb{M}}}} h_{i,0}^{n} po_{{{\text{HPSCAV}}}} \le I_{{{\text{th}}}} } \hfill & {\forall n \in {\mathbb{N}}} \hfill \\ {} \hfill & {Dr_{{{\text{th}}}} \le Dr_{i} \left( {\overrightarrow {po}_{C} } \right)} \hfill & {\forall i \in {\mathbb{M}}.} \hfill \\ \end{array}$$

A low-computation exhaustive search can be used to find the optimal solution of (4), as it involves only one optimization parameter, \(po_{{{\text{HPSCAV}}}}\). The resource allocation strategy, \(\overrightarrow {po}^{*}\), uses \(\overrightarrow {po}_{C}\) instead of \(\overrightarrow {po}_{l}\) when either QoS constraint is not met by the LSTM-based allocation. The formulation of \(\overrightarrow {po}^{*}\) is shown in (5),

$$\overrightarrow {po}^{*} = \left\{ {\begin{array}{*{20}l} {\overrightarrow {po}_{l} ,} \hfill & {{\text{for}}\;\mathop \sum \limits_{{i \in {\mathbb{M}}}} Dr_{i} \left( {\overrightarrow {po}_{C} } \right) \le f\left( {\overrightarrow {po}_{l} } \right)} \hfill \\ {\overrightarrow {po}_{C} ,} \hfill & {{\text{otherwise}}} \hfill \\ \end{array} } \right.$$
$${\text{where}}\quad f\left( {\overrightarrow {po}_{l} } \right) = \mathop \sum \limits_{{i \in {\mathbb{M}}}} Dr_{i} \left( {\overrightarrow {po}_{l} } \right) \cdot \mathop \prod \limits_{{n \in {\mathbb{N}}}} {\mathbb{I}}_{{\mathop \sum \nolimits_{{l \in {\mathbb{M}}}} h_{l,0}^{n} \hat{p}_{l}^{n} \le I_{{{\text{th}}}} }} \mathop \prod \limits_{{i \in {\mathbb{M}}}} {\mathbb{I}}_{{Dr_{{{\text{th}}}} \le Dr_{i} \left( {\overrightarrow {po}_{l} } \right)}}$$

In (6), \({\mathbb{I}}_{in}\) is the indicator function, which equals 1 if the condition \(in\) holds and 0 otherwise, and \(\hat{p}_{l}^{n}\) represents the transmit power of the lth D2D pair assigned to channel \(n\) by the LSTM-based scheme.
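The one-dimensional search for the equal-power level and the fallback rule between the two allocations can be sketched as follows. The grid resolution `steps`, the function names, and the callables `objective`, `feasible`, and `sum_rate` are placeholders introduced for illustration; how the rates and constraints are evaluated is left to the caller.

```python
def search_equal_power(objective, feasible, po_max, num_channels, steps=100):
    """Grid search over [0, po_max / N] for a scalar equal-power level.

    objective(p) : sum rate achieved when every pair uses power p per channel
    feasible(p)  : True if p satisfies both QoS constraints
    Returns the best feasible level, or None if no grid point is feasible.
    """
    best_p, best_value = None, float("-inf")
    upper = po_max / num_channels
    for k in range(steps + 1):
        p = upper * k / steps
        if feasible(p) and objective(p) > best_value:
            best_p, best_value = p, objective(p)
    return best_p


def select_allocation(po_l, po_c, sum_rate, feasible):
    """Keep the learned allocation po_l unless it is infeasible or worse.

    f(po_l) is the sum rate of po_l multiplied by indicator functions,
    so it drops to zero as soon as po_l violates a QoS constraint.
    """
    f_l = sum_rate(po_l) if feasible(po_l) else 0.0
    return po_l if sum_rate(po_c) <= f_l else po_c
```

Because only one scalar is searched, the cost of the grid search grows linearly with `steps`.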

Spectral efficiency (SE) of the system is expressed using (7)

$$\eta_{{{\text{SE}}}} = \frac{{\mathop \sum \nolimits_{{n \in {\mathbb{N}}}} BW\log_{2} \left( {1 + \frac{{h_{i,i}^{n} po_{i}^{n} }}{{NS_{0} BW + \mathop \sum \nolimits_{{l \in {\mathbb{M}}\backslash \{ i\} }} h_{l,i}^{n} po_{l}^{n} + h_{0,i}^{n} po_{C}^{n} }}} \right)}}{BW}$$

Energy efficiency (EE) of the system is calculated using (8)

$$\eta_{{{\text{EE}}}} = \frac{{\mathop \sum \nolimits_{{n \in {\mathbb{N}}}} \log_{2} \left( {1 + \frac{{h_{i,i}^{n} po_{i}^{n} }}{{NS_{0} BW + \mathop \sum \nolimits_{{l \in {\mathbb{M}}\backslash \{ i\} }} h_{l,i}^{n} po_{l}^{n} + h_{0,i}^{n} po_{C}^{n} }}} \right)}}{{po_{i}^{n} }}$$

4.2 Hybrid Particle Swarm Cauchy Approach to African Vulture Optimization (HPSCAV)

The HPSCAV algorithm combines the particle swarm optimization (PSO) and African vulture optimization (AVO) metaheuristic algorithms and is used for resource allocation in D2D communication. Combining the principles of PSO, the Cauchy method, and AVO promotes a balance between exploration and exploitation. HPSCAV is inspired by vultures’ hunting behavior and aims to prevent entrapment in local optima, enhancing the algorithm’s ability to find globally optimal solutions [39]. Using the HPSCAV algorithm for resource allocation in D2D communication makes it possible to optimize network performance and improve SE, EE, and system capacity.

4.2.1 Initialization stage

From Fig. 1, nodes are randomly selected, and the fitness value of every node is calculated based on (2). The node with the best fitness value is termed the first-best vulture (DUE node) and is assigned to group 1; the node with the second-best fitness value is termed the second-best vulture and is assigned to group 2. Depending on their fitness value and position, the remaining nodes move toward the respective group, as given by (9). At this stage, the population is dispersed over the entire search area using (10),

$$W\left( j \right) = \left\{ {\begin{array}{*{20}l} {best_{1 } } \hfill & {{\text{if}}\;b_{i} = a_{1} } \hfill \\ {best_{2 } } \hfill & {{\text{if}}\;b_{i} = a_{2} } \hfill \\ \end{array} } \right.$$
$${\text{Position}} = rand\left( {N_{p} ,1} \right)*\left( {ub - lb} \right) + lb$$

Here, \(a_{1}\) and \(a_{2}\) are the probability factors for choosing the first-best and second-best vulture, respectively, each ranging from 0 to 1; \(b_{i}\) is obtained using a roulette wheel strategy; \(lb\) and \(ub\) are the lower and upper limits; \(N_{p}\) is the size of the vulture population; and Position denotes the solution.
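The initialization in (9) and (10) can be sketched in Python. This is a minimal illustration, not the authors' implementation: each position is assumed to be a single scalar (for example, a transmit power between `lb` and `ub`), `toy_fitness` is a hypothetical stand-in for the fitness in (2), which is not reproduced in this section, and the function names are illustrative.

```python
import random


def init_population(n_p, lb, ub, rng):
    """Eq. (10): Position = rand(N_p, 1) * (ub - lb) + lb, one scalar per node."""
    return [rng.random() * (ub - lb) + lb for _ in range(n_p)]


def two_best(positions, fitness):
    """Rank nodes by fitness and return (best_1, best_2).

    Assumption: a larger fitness value is better.
    """
    ranked = sorted(positions, key=fitness, reverse=True)
    return ranked[0], ranked[1]


def toy_fitness(p):
    """Hypothetical stand-in for the fitness in (2); peaks at p = 0.5."""
    return -(p - 0.5) ** 2


rng = random.Random(0)  # fixed seed so the sketch is repeatable
positions = init_population(n_p=6, lb=0.0, ub=1.0, rng=rng)
best_1, best_2 = two_best(positions, toy_fitness)
```

The two returned positions play the roles of \(best_{1}\) and \(best_{2}\) in (9); the remaining nodes would then be attracted to one of them.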

4.2.2 Fitness calculation

The fitness \(f_{j}\) of each DUE node in the population is calculated at each iteration to obtain the best solution among the DUE nodes for both the first and second groups. The best solution for each group is selected with the roulette wheel in (11), whose probability values lie within [0, 1].

$$b_{i} = \frac{{ f_{j} }}{{\mathop \sum \nolimits_{j = 1}^{n} f_{j} }}$$
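As a small illustration of (11), the sketch below normalizes fitness values into selection probabilities and then draws one index with a roulette wheel. It assumes non-negative fitness values with a positive sum; the function names are hypothetical.

```python
import random


def selection_probabilities(fitness_values):
    """Eq. (11): b_i = f_j / sum of all f_j."""
    total = sum(fitness_values)
    return [f / total for f in fitness_values]


def roulette_pick(probabilities, rng):
    """Return an index drawn with probability proportional to its share."""
    r = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probabilities):
        cumulative += p
        if r < cumulative:
            return index
    return len(probabilities) - 1  # guard against floating-point rounding


probs = selection_probabilities([3.0, 1.0])
chosen = roulette_pick(probs, random.Random(1))
```

With fitness values 3 and 1, the first node is selected three times as often as the second on average.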

4.2.3 Behavior of vulture

The weakest vultures/DUE nodes, those starving, are aggressive and seek food near the most robust node because they need more energy to conduct a proper search.

$$FC = \left( {2*rand_{1} + 1} \right)*s*\left( {1 - \frac{it}{{maxIT}}} \right) + u*( \sin \left( {\frac{\pi }{2}* \frac{it}{{maxIT}}} \right) + \cos \left( {\frac{\pi }{2}*\frac{it}{{maxIT}}} \right) - 1)$$

In (12), the starvation rate of the hungry vultures is denoted as \(FC\), and the current and total number of iterations are denoted as \(it\) and \(maxIT\), respectively. The variable \(s\) is a random number between −1 and 1 that changes at each iteration, and \(u\) is a random number between −2 and 2.
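The starvation rate can be evaluated directly from its definition. The sketch below is a plain transcription of the \(FC\) expression with the random quantities passed in as arguments so that the result is reproducible; the function name is hypothetical.

```python
import math


def starvation_rate(it, max_it, rand1, s, u):
    """FC = (2*rand1 + 1)*s*(1 - it/maxIT) + u*(sin(a) + cos(a) - 1), a = (pi/2)*it/maxIT."""
    progress = it / max_it
    angle = math.pi / 2.0 * progress
    decay = (2.0 * rand1 + 1.0) * s * (1.0 - progress)
    wobble = u * (math.sin(angle) + math.cos(angle) - 1.0)
    return decay + wobble


fc_start = starvation_rate(it=0, max_it=100, rand1=0.5, s=1.0, u=2.0)
fc_end = starvation_rate(it=100, max_it=100, rand1=0.5, s=1.0, u=2.0)
```

At the first iteration the trigonometric term vanishes and \(FC\) reduces to \((2 \cdot rand_{1} + 1) \cdot s\); at the last iteration both terms vanish, so \(FC\) shrinks toward zero as the search proceeds.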

4.2.4 Exploration phase

Vultures/DUE nodes look for solutions in random areas in two ways. The parameter \(b_{1}\), in the range [0, 1], determines which of the two, (13) or (14), is used: a random number \(rand_{b1}\) is drawn and compared with \(b_{1}\). When \(rand_{b1} \ge b_{1}\), the vulture searches close to one of the best outcomes using (13); when \(rand_{b1} < b_{1}\), it looks for a solution in a new and remote area of the environment using (14).

$$V\left( {j + 1} \right) = W\left( j \right) - \left( {\left| {Y*W\left( j \right) - V\left( j \right)} \right|*FC} \right)$$
$$V\left( {j + 1} \right) = W\left( j \right) - FC + rand_{2} *\left( {u_{b} - l_{b} } \right)*rand_{3} + l_{b}$$

where \(V\left( {j + 1} \right)\) denotes the vulture location in the next iteration; the same notation is used in (15).
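The two exploration updates can be written as one-line functions. The sketch below transcribes them for scalar positions. The coefficient \(Y\) is not defined in this section, so it is passed in as an argument, and the function names are hypothetical.

```python
def explore_near_best(w, v, fc, y):
    """V(j+1) = W(j) - |Y*W(j) - V(j)| * FC: search close to a best solution."""
    return w - abs(y * w - v) * fc


def explore_remote(w, fc, rand2, rand3, lb, ub):
    """V(j+1) = W(j) - FC + rand2*(ub - lb)*rand3 + lb: jump to a remote area."""
    return w - fc + rand2 * (ub - lb) * rand3 + lb


near = explore_near_best(w=1.0, v=0.5, fc=2.0, y=1.0)
remote = explore_remote(w=1.0, fc=0.5, rand2=0.5, rand3=0.5, lb=0.0, ub=2.0)
```

The first update stays within a distance proportional to \(FC\) of the chosen best node, while the second adds a random offset drawn from the whole range between the lower and upper limits.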

4.2.5 Exploitation phase

The exploitation phase has two stages, each with two approaches. The attributes b2 and b3, with values in [0, 1], are used to choose one of the approaches in each stage. The first stage of DUE node exploitation occurs if \(FC\) is greater than 0.5 but less than 1 (competing over food), and the second stage of DUE node exploitation occurs when \(FC\) is below 0.5.

In the first-stage updates, \(V\left( {j + 1} \right)\) is the updated location of the node, \(W\left( j \right)\) is one of the best solutions, and \(rand_{5}\) and \(rand_{6}\) are random numbers in the range [0, 1] that scale the cosine and sine terms, respectively.

$$V\left( {j + 1} \right) = \left| {Y*W\left( j \right) - V\left( j \right)} \right|*\left( {FC + rand_{4} } \right) - o_{d}$$

where \(W\left( j \right)\) denotes one of the top DUE node solutions, \(V\left( j \right)\) is the current location, \(rand_{4}\) is a random number in the range [0, 1], and \(o_{d}\) is the distance from the DUE node to the best DUE node of one of the two groups. A siege flight (15) is chosen if b2 ≥ rand b2, and a rotating flight (16) is chosen if b2 < rand b2.

$$V\left( {j + 1} \right) = W\left( j \right) - \left( {M_{1} + M_{2} } \right)$$


$$M_{1} = W\left( j \right)* \frac{{rand_{5} *V\left( j \right)}}{2\pi }*{\text{cos}}\left( {V\left( j \right)} \right)$$
$$M_{2} = W\left( j \right)* \frac{{rand_{6} *V\left( j \right)}}{2\pi }*\sin \left( {V\left( j \right)} \right)$$
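The first-stage exploitation updates can be sketched for scalar positions as follows. This is an illustrative transcription of the siege-flight and rotating-flight expressions, not the authors' code: \(Y\) and \(o_{d}\) are passed in as arguments because their computation is not spelled out here, and the function names are hypothetical.

```python
import math


def siege_flight(w, v, fc, y, rand4, o_d):
    """V(j+1) = |Y*W(j) - V(j)| * (FC + rand4) - o_d."""
    return abs(y * w - v) * (fc + rand4) - o_d


def rotating_flight(w, v, rand5, rand6):
    """V(j+1) = W(j) - (M1 + M2), with cosine and sine spiral terms."""
    m1 = w * (rand5 * v / (2.0 * math.pi)) * math.cos(v)
    m2 = w * (rand6 * v / (2.0 * math.pi)) * math.sin(v)
    return w - (m1 + m2)


siege = siege_flight(w=1.0, v=0.5, fc=0.6, y=1.0, rand4=0.4, o_d=0.5)
spiral = rotating_flight(w=1.0, v=math.pi, rand5=1.0, rand6=1.0)
```

In the rotating flight with \(V(j) = \pi\), the cosine term contributes −0.5 and the sine term is numerically zero, so the node moves from 1.0 to about 1.5.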

PSO is an iterative numerical optimization approach used to improve node solutions. The Cauchy-based PSO (CPSO) [40] is a variation of PSO that produces new solutions using the Cauchy distribution. This distribution is used to generate random values that reflect the current search state, making the optimization process more robust and effective, especially in the presence of noise or in high-dimensional optimization problems.

$$V\left( {j + 1} \right) = \frac{{D_{1 } + D_{2} }}{2}$$
$$D_{1} = best_{1} - \frac{{best_{1} *V\left( j \right)}}{{best_{1} - V\left( j \right)^{2} }}*FC + Cauchy()$$
$$D_{2} = best_{2} - \frac{{best_{2} *V\left( j \right)}}{{best_{2} - V\left( j \right)^{2} }}*FC + Cauchy()$$

Using (20) and (21), the long-tailed Cauchy mutation helps trapped nodes escape from local optima and discover new regions in the network. The Cauchy mutation \(Cauchy()\) is a random number drawn from the Cauchy distribution with scale parameter \(t = 1\). Overall, HPSCAV improves the performance of the optimization process on complex problems by combining the advantages of AVO, PSO, and Cauchy mutation. The ability of the algorithm to avoid local optima and locate high-quality solutions is enhanced by using the Cauchy mutation with a long tail and transforming optimized node information to a new search space of the network.
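A minimal sketch of the Cauchy-mutated update for scalar positions is given below. The Cauchy sample is generated with the standard inverse-CDF method, which is an implementation choice assumed here rather than one stated in the text; the noise terms are passed into the update so that the arithmetic can be checked, and the function names are hypothetical.

```python
import math
import random


def cauchy_sample(rng, scale=1.0):
    """Draw from a zero-centred Cauchy distribution via the inverse CDF."""
    return scale * math.tan(math.pi * (rng.random() - 0.5))


def cauchy_update(best1, best2, v, fc, noise1, noise2):
    """V(j+1) = (D1 + D2) / 2 with D_k = best_k - best_k*V/(best_k - V^2)*FC + noise_k.

    Note: the expression is undefined when best_k equals V squared.
    """
    d1 = best1 - (best1 * v) / (best1 - v ** 2) * fc + noise1
    d2 = best2 - (best2 * v) / (best2 - v ** 2) * fc + noise2
    return (d1 + d2) / 2.0


rng = random.Random(0)
noisy = cauchy_update(2.0, 3.0, 1.0, 0.5, cauchy_sample(rng), cauchy_sample(rng))
noise_free = cauchy_update(2.0, 3.0, 1.0, 0.5, 0.0, 0.0)
```

Without noise the example gives \(D_{1} = 1\) and \(D_{2} = 2.25\), so the new position is 1.625; the heavy tails of the Cauchy samples occasionally move the node far from this value, which is what lets it leave a local optimum.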

In plain African vulture optimization, the best optimized node may be missed because of the local optima problem. We therefore add a Cauchy mutation to African vulture optimization in (20) and (21), which avoids this problem and helps to find the best node. This best node is used as input to MLSTM for further processing. The expressions from (20) onward are modified for the proposed algorithm.

Here, \(D_{1}\) and \(D_{2}\) denote DUE node motion, and \(best_{1}\) and \(best_{2}\) are the first-best and second-best nodes of the two groups in the current iteration. \(FC\) is the calculated starvation rate, \(V\left( j \right)\) is the current location, and \(V\left( {j + 1} \right)\) is the updated vulture location. A Levy flight further enhances the algorithm according to (22),

$$V\left( {j + 1} \right) = W\left( j \right) - \left| {o_{d} *FC*LevyF\left( q \right)} \right|$$
$$\text{where} \ LevyF\left( q \right) = 0.01* \frac{u*\sigma }{{\left| v \right|^{{\frac{1}{\beta }}} }}$$
$$\sigma = \left( {\frac{{\Gamma \left( {1 + \beta } \right)*\sin \left( {\frac{\pi \beta }{2}} \right)}}{{\Gamma \left( {\frac{1 + \beta }{2}} \right)*\beta *2^{{\frac{\beta - 1}{2}}} }}} \right)^{{\frac{1}{\beta }}}$$

where \(u\) and \(v\) are random numbers in the range [0, 1], \(\Gamma \left( \cdot \right)\) is the gamma function, and \(\beta\) is a predetermined constant with a default value of 1.5. The HPSCAV algorithm should be tuned to balance exploitation and exploration of the solution space and to avoid premature convergence to sub-optimal solutions.
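The Levy-flight update can be sketched with the Python standard library. The sketch assumes the standard Mantegna form of \(\sigma\), in which the function written as \(r(\cdot)\) is read as the gamma function, and it draws \(v\) from (0, 1] to avoid a division by zero; both points are assumptions of this example, and the function names are hypothetical.

```python
import math
import random


def levy_sigma(beta=1.5):
    """Mantegna scale factor (assumed form) for a given beta."""
    numerator = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    denominator = (math.gamma((1.0 + beta) / 2.0) * beta
                   * 2.0 ** ((beta - 1.0) / 2.0))
    return (numerator / denominator) ** (1.0 / beta)


def levy_step(rng, beta=1.5):
    """LevyF = 0.01 * u * sigma / |v|^(1/beta), with u in [0, 1) and v in (0, 1]."""
    u = rng.random()
    v = 1.0 - rng.random()  # shifted to (0, 1] so the division is always defined
    return 0.01 * u * levy_sigma(beta) / abs(v) ** (1.0 / beta)


def levy_update(w, o_d, fc, levy):
    """V(j+1) = W(j) - |o_d * FC * LevyF|."""
    return w - abs(o_d * fc * levy)


sigma = levy_sigma(1.5)
step = levy_step(random.Random(0))
moved = levy_update(w=1.0, o_d=2.0, fc=0.5, levy=0.1)
```

For the default \(\beta = 1.5\), this form of \(\sigma\) evaluates to roughly 0.697.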

The HPSCAV then uses (13) to (22) to compute the updated best optimized DUE node position in terms of power, and the data rate of the nodes for the entire process is determined. Here, the solution indicates the optimized values concerning the power and data rate of DUEs.

4.3 Workflow of the proposed model

Figure 3 shows the workflow of the proposed model. After initialization, the nodes are divided into two groups. A node enters the exploration phase when its FC value is greater than 1, and its position is updated based on (13) and (14). If the FC value is less than 1, the exploitation phase starts. Here, the FC value is checked again: if it is greater than 0.5, node positions are updated using (15) and (16); otherwise, the nodes are updated using (17) and (18). The optimized node is then given as input to the MLSTM model for resource allocation. Notations used for the essential parameters are listed in Table 1.
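The branching in this workflow can be summarized by a small dispatch function. The sketch below uses the thresholds 1 and 0.5 from the text. It compares the magnitude of FC, as in the original AVO formulation, which is an assumption because this section does not say whether the sign of FC is considered; values exactly on a threshold fall into the lower branch, which is also a choice made only for this example.

```python
def select_phase(fc):
    """Map a starvation rate to the phase used for the position update."""
    magnitude = abs(fc)  # assumption: the thresholds apply to |FC|
    if magnitude > 1.0:
        return "exploration"
    if magnitude > 0.5:
        return "exploitation_stage_1"
    return "exploitation_stage_2"


phases = [select_phase(fc) for fc in (1.8, -1.2, 0.7, 0.3)]
```

A caller would then apply the exploration updates, the first-stage exploitation updates, or the second-stage (Cauchy and Levy) updates according to the returned label.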

Fig. 3 Workflow of the proposed model

Table 1 Notations used for the essential parameters

4.4 Modified long short-term memory (MLSTM)

The proposed MLSTM is an LSTM model to which the hybrid particle swarm Cauchy approach, an optimization algorithm based on African vulture optimization and the Cauchy distribution, is applied. The proposed model employs the MLSTM to make decisions regarding optimal resource allocation.

Figure 4 illustrates the training process of LSTM. LSTM is a type of recurrent neural network (RNN) that can effectively capture long-term temporal dependencies in sequential data. LSTMs are designed to overcome the problems of traditional RNNs.

Fig. 4 Training process of LSTM

The LSTM’s forward training process is modeled by (25) to (29),

$$s_{t} = \sigma \left( {V_{f} \cdot \left[ {u_{t - 1} , y_{t} } \right] + d_{f} } \right)$$
$$j_{t} = \sigma \left( {V_{i} \cdot \left[ {u_{t - 1} , y_{t} } \right] + d_{i} } \right)$$
$$B_{t} = s_{t} *B_{t - 1} + j_{t} *\tanh \left( {V_{c} \cdot \left[ {u_{t - 1} , y_{t} } \right] + d_{b} } \right)$$
$$Q_{t} = \sigma \left( {V_{q} \cdot \left[ {u_{t - 1} , y_{t} } \right] + d_{q} } \right)$$
$$u_{t} = Q_{t} *\tanh \left( {B_{t} } \right)$$

where \(s_{t}\), \(j_{t}\), and \(Q_{t}\) are the activations of the forget, input, and output gates, respectively; \(B_{t}\) and \(u_{t}\) are the activation vectors of each cell and each memory block, respectively; and \(V\) and \(d\) denote the corresponding weight matrices and bias vectors.
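One forward step of the LSTM equations can be sketched in plain Python. This is a minimal illustration, not the MLSTM itself: vectors are lists, weight matrices are lists of rows, \(s_{t}\) is treated as the forget gate because it multiplies \(B_{t-1}\), and a separate input-gate weight `V_i` is assumed to pair with the bias \(d_{i}\). The parameter dictionary and function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, bias, inputs):
    """Compute V . [u_{t-1}, y_t] + d for a weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def lstm_step(params, u_prev, b_prev, y_t):
    """Return (u_t, B_t) for one time step."""
    z = u_prev + y_t  # list concatenation [u_{t-1}, y_t]
    s_t = [sigmoid(a) for a in affine(params["V_f"], params["d_f"], z)]  # forget gate
    j_t = [sigmoid(a) for a in affine(params["V_i"], params["d_i"], z)]  # input gate
    cand = [math.tanh(a) for a in affine(params["V_c"], params["d_b"], z)]
    b_t = [s * b + j * c for s, b, j, c in zip(s_t, b_prev, j_t, cand)]  # cell state
    q_t = [sigmoid(a) for a in affine(params["V_q"], params["d_q"], z)]  # output gate
    u_t = [q * math.tanh(b) for q, b in zip(q_t, b_t)]                   # hidden state
    return u_t, b_t


# One hidden unit and one input feature, with all weights and biases set to zero.
zero_row = [[0.0, 0.0]]
params = {"V_f": zero_row, "d_f": [0.0], "V_i": zero_row, "d_i": [0.0],
          "V_c": zero_row, "d_b": [0.0], "V_q": zero_row, "d_q": [0.0]}
u_t, b_t = lstm_step(params, u_prev=[0.0], b_prev=[2.0], y_t=[1.0])
```

With zero weights every gate equals 0.5 and the candidate value is 0, so the cell state halves from 2.0 to 1.0 and the hidden state becomes \(0.5 \tanh(1)\).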


The entire population of DUE nodes is divided into two groups. First, two random DUE nodes are selected, and their fitness values are calculated using (2). The node with the best fitness value is termed the first-best node and assigned to group 1, and the second node is termed the second-best node and assigned to group 2. In the same way, the fitness value is calculated for all nodes, and the value of each node is updated. The next step depends on the starvation rate FC of a node. If FC is greater than one, the node enters the exploration phase, which means that the search for the optimized node continues. If FC is less than one, the node enters the exploitation phase, in which the DUE node passes its information to the nearest neighboring node. After all the nodes have been checked, the optimized position of the DUE node is given as input to MLSTM, which calculates the fitness value based on the loss function. The node for which MLSTM yields a low loss value is chosen as the best node. MLSTM then performs the resource allocation, and the process continues until the end of the iterations. The algorithm is intended to optimize resource allocation in a D2D communication system by maximizing data rate, enhancing energy efficiency, and improving the system’s capacity.

4.5 Analysis of computational and space complexity

(1) Time complexity: The time complexity of the MLSTM model is shown in (30)

$$c_{c} = O\left( {T * L * n_{{s_{t} }} *n_{{u_{t} }} * 4 + n_{{u_{t} }} * n_{{u_{t} }} * 4 + n_{{u_{t} }} *n_{{Q_{t} }} + n_{{u_{t} }} * 3} \right) + O\left( {N*\left( {T + T*D} \right)} \right)$$

Here, AVO is treated as a first-order technique whose computational complexity consists of three fundamental processes: initialization, fitness evaluation, and updating of the DUE node position. In the network, initializing the DUE nodes costs O(N), and searching for the best DUE node and updating the best DUE node vector cost O(T*N) and O(T*N*D), respectively, where T is the number of iterations and D is the dimension. Because the optimized node is the input to the MLSTM model, the time complexity of MLSTM is lower than that of the LSTM model.

(2) Space complexity: The MLSTM model has n D2D pairs, and its space complexity is shown in Eqs. (31) and (32)

$${\text{Space complexity}} = O(\log n)$$

as we are using T iterations and the number of D2D pairs is n, the space complexity becomes

$${\text{Space complexity}} = O(T * \log n).$$

5 Results and discussion

Table 2 shows the simulation parameters of the system model.

Table 2 Simulation parameters

Figure 5 plots uplink channel capacity (bps) versus transmit power (W). At the same transmit power, say 0.5 W, the proposed model shows higher channel capacity than the existing models, namely the autonomous power efficient resource allocation algorithm (APERAA), AVO, and CPSO. Because the DUE nodes are optimized under the interference minimization constraint, the number of redundant DUE nodes is smallest, so the uplink system capacity improves with respect to transmit power.

Fig. 5 Uplink channel capacity versus transmit power

Figure 6 compares SE (b/s/Hz) and transmit power (W). As the transmission power increases, the SE also increases because of the optimized transmit power and minimized interference of the DUE nodes in the network.

Fig. 6 Spectral efficiency versus transmit power

In Fig. 7, a comparison of energy efficiency (b/J) and transmit power is shown. As transmission power increases, energy efficiency also increases because transmission power affects the SINR of the received signal. As SINR increases, the receiver can decode the signal more accurately with fewer retransmissions, which reduces overall energy consumption and improves energy efficiency.

Fig. 7 Energy efficiency versus transmit power

Table 3 is derived from Fig. 5. Table 3 shows the overall uplink channel capacity for different transmit power levels. The transmit power levels are listed in the first row, ranging from 0.4 to 1 W. For ease of explanation, we have taken transmit power at 0.4 W. At 0.4 W, the algorithms APERAA, AVO, CPSO, and HPSCAV-MLSTM achieved channel capacity of 8.8 bps, 9.3 bps, 9.5 bps, and 10.63 bps, respectively. Table 3 shows that channel capacity is improved in the HPSCAV-MLSTM model when compared with the prevailing methods.

Table 3 Overall uplink channel capacity versus transmit power

From Table 4, it is inferred that at 0.4 W, the algorithms AVO, CPSO, and HPSCAV-MLSTM achieved 5.68%, 7.95%, and 20.8% improvement in channel capacity, respectively, when compared to the existing model APERAA, and at 0.7 W, AVO, CPSO, and HPSCAV-MLSTM achieved 13.22%, 12.5%, and 31.53% improvement in channel capacity, respectively. Similarly, at 1 W, the algorithms AVO, CPSO, and HPSCAV-MLSTM achieved 14.35%, 16.67%, and 44.7% improvement, respectively. Thus, users of the proposed model achieve better channel capacity even at lower transmit powers because of the better performance of the optimized model.
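The percentage improvements follow from the channel capacities quoted above for 0.4 W. The short sketch below recomputes them relative to APERAA; the helper name is hypothetical, and the only data used are the four capacity values stated in the text.

```python
def improvement_percent(baseline, value):
    """Relative gain of value over baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Channel capacities (bps) at 0.4 W, as quoted from Table 3.
capacity_at_0_4_w = {"APERAA": 8.8, "AVO": 9.3, "CPSO": 9.5, "HPSCAV-MLSTM": 10.63}

baseline = capacity_at_0_4_w["APERAA"]
gains = {
    name: round(improvement_percent(baseline, value), 2)
    for name, value in capacity_at_0_4_w.items()
    if name != "APERAA"
}
```

The computed gains are about 5.68%, 7.95%, and 20.8% for AVO, CPSO, and HPSCAV-MLSTM, matching the 0.4 W figures reported from Table 4.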

Table 4 % improvement in channel capacity when compared with the existing model

Table 5 is derived from Fig. 6 and compares spectral efficiency at different transmit powers. From Table 5, it is inferred that at 0.4 W, the algorithms APERAA, AVO, CPSO, and HPSCAV-MLSTM achieved 1.76, 1.92, 1.85, and 2.05 b/s/Hz, respectively. At 0.7 W, they achieved 1.77, 2.36, 2.15, and 2.77 b/s/Hz, respectively. Similarly, at 1 W, they achieved 1.92, 2.6, 2.79, and 3.4 b/s/Hz, respectively. The enhanced system capacity has a positive impact on the spectral efficiency.

Table 5 Comparison of spectral efficiency of the proposed model with the existing one

From Table 6, it is inferred that at 0.4 W, the algorithms AVO, CPSO, and HPSCAV-MLSTM achieved 9.09%, 10.08%, and 16.48% improvement in SE; at 0.7 W, AVO, CPSO, and HPSCAV-MLSTM achieved 33.33%, 21.47%, and 46.89%, respectively, when compared to the existing model APERAA. Similarly, at 1 W, AVO, CPSO, and HPSCAV-MLSTM achieved 35.42%, 45.31%, and 77.08% improvement, respectively, when compared to the existing model APERAA. Even at lower transmit power, users achieve better SE because of the better performance of the optimized model.

Table 6 % improvement in spectral efficiency when compared with the existing model

Table 7 is derived from Fig. 7 and shows the EE, expressed in b/J, of different algorithms at various transmit power levels. At the transmit power level of 0.4 W, APERAA, AVO, CPSO, and HPSCAV-MLSTM achieved EE of 1.76, 2.33, 1.93, and 2.64, respectively. At 0.7 W, they achieved 1.77, 3.14, 2.92, and 3.59, respectively; similarly, at 1 W, they achieved 1.9, 2.67, 3.25, and 3.69, respectively.

Table 7 Energy efficiency of different algorithms at various transmit powers

From Table 8, it is inferred that at 0.4 W, AVO, CPSO and HPSCAV-MLSTM achieved an improvement of 32.39%, 9.66%, and 50%, respectively; at 0.7 W, AVO, CPSO, and HPSCAV-MLSTM achieved an improvement of 57.39%, 64.97%, and 95%, respectively; similarly at 1 W, AVO, CPSO, and HPSCAV-MLSTM achieved an improvement of 39.06%, 69.27%, and 92%, respectively, in EE when compared with the existing APERAA model. The system’s energy efficiency has improved as the constraint condition is not violated, and DUE nodes with the best-optimized value are taken.

Table 8 % improvement in energy efficiency when compared with the existing model

In Fig. 8, the comparison of SINR with varying distances between D2D pairs is presented. The proposed model outperforms the existing model in terms of SINR, and the SINR increases as the distance between D2D pairs decreases. This is because, as D2D devices get closer together, the signal power of the DUE node increases, so the interference power is minimized, which enhances the SINR.

Fig. 8 SINR versus number of D2D pairs with varying distances between D2D pairs

Figure 9 illustrates the comparison of SINR with varying distances between eNodeB and D2D pairs. It can be noticed that as D2D devices move away from the eNodeB, the received signal strength from the eNodeB decreases, which can result in a higher SINR.

Fig. 9 SINR versus number of D2D pairs with varying distances between D2D pairs and eNodeB

Table 9 is derived from Fig. 8 and lists the SINR values for different models at different distances between D2D pairs. Here, absolute SINR values are taken. Table 9 compares the proposed model, HPSCAV-MLSTM, with the APERAA model. The number of D2D pairs varies from 1 to 8, and the SINR values are shown for each method at different Ynn values. For instance, at Ynn = 20 and with 1 D2D pair, the proposed model achieved an SINR value of 837, while the APERAA method achieved an SINR value of 614. Similarly, at Ynn = 30 and with 1 D2D pair, the proposed model achieved an SINR value of 163, while the APERAA method achieved an SINR value of 114. In the same way, at Ynn = 40 and with 1 D2D pair, the proposed method achieved an SINR of 63, while the APERAA achieved an SINR value of 55. This shows that the proposed model achieved better SINR.

Table 9 SINR versus D2D pairs with different distance values between the D2D pair in meters (m)

Table 10 is derived from Fig. 9 and shows the SINR values for different models and numbers of D2D pairs with varying distance between the D2D pair and eNodeB. The HPSCAV-MLSTM and APERAA models are compared in this table. The results are presented for three different values of Ymn (the distance between the D2D pairs and eNodeB). For instance, when the distance between D2D pairs and eNodeB is 100 m (i.e., Ymn = 100), the proposed method achieves SINR values ranging from 36,478 for a single D2D pair to 4267 for 7 D2D pairs. On the other hand, APERAA achieves SINR values ranging from 10,250 for a single D2D pair to 1250 for 7 D2D pairs. Similarly, the table presents SINR values for the proposed and APERAA methods for Ymn values of 200 and 300 m.

Table 10 SINR versus D2D pairs with Ymn (distance between D2D pair and eNodeB in meters (m))

Figure 10 compares system capacity and transmit power at varying distances between D2D pairs and eNodeB. It has been observed that when the transmit power of a D2D user increases, the D2D system capacity also increases because as the D2D transmission power increases, the D2D receiver has enough signal strength to resist the noise and interference, so the system capacity is improved.

Fig. 10 Transmit power versus system capacity for varying distances between D2D pairs and eNodeB

Figure 11 compares system capacity and transmit power for various D2D pair distances. For a fixed transmission power, the system capacity increases with the D2D pair distance because the interference between adjacent nodes decreases.

Fig. 11 Transmit power versus system capacity for varying distances between D2D pairs

Table 11 is derived from Fig. 10. From Table 11, it is inferred that at 0.5 W, HPSCAV-MLSTM achieved a system capacity of 70 bps at Ymn = 100 m, 92 bps at Ymn = 200 m, and 144 bps at Ymn = 300 m, an improvement over the existing model APERAA in each case. A similar improvement in the system capacity of HPSCAV-MLSTM over the existing model is seen at other transmission power levels. From this, it is clear that as the distance between the D2D pairs and eNodeB increases, the interference effect on the D2D pairs decreases. Thus, there is an enhancement in the system capacity.

Table 11 System capacity versus transmit power (W) with Ymn (distance between D2D pairs and eNodeB in meters (m))

Table 12 is derived from Fig. 11 and shows the system capacity for different transmit powers with varying distances between D2D users. At Ynn = 10 m, the system capacity of the APERAA model increases from 66 bps at 0.5 W to 70 bps at 0.9 W transmit power, and that of HPSCAV-MLSTM increases from 72 bps at 0.5 W to 83 bps at 0.9 W transmit power. Similarly, at Ynn = 20 m and Ynn = 30 m, HPSCAV-MLSTM achieved better system capacity than the existing APERAA model. As the distance between the D2D pairs increases, the interference caused by the nearby users decreases, which leads to enhancement in the system capacity.

Table 12 System capacity versus transmit power (W) with different Ynn (distance between D2D pairs in meters (m))

The accuracy factors of the system capacity per transmitter in the network are based on channel conditions, interference levels, and system requirements. In the MLSTM model, constraint (2) minimizes the DUE interference and optimizes the power required to maintain the desired QoS, so the overall system capacity per transmission power is improved.

5.1 Accuracy of the results

Table 13 shows the accuracy percentages for four different models (APERAA, AVO, CPSO, and HPSCAV-MLSTM) after 50 and 100 epochs of training. The HPSCAV-MLSTM model achieved the highest accuracy of 95.45% after 100 epochs, and it also had the largest improvement in accuracy (15.43%) from 50 to 100 epochs. The accuracy of the proposed model increases because the optimized node obtained from HPSCAV is fed to the MLSTM.

Table 13 Accuracy comparison of the different models with epochs 50 and 100

6 Conclusion and future scope

Efficient resource allocation is a challenging task in next-generation networks. In this research work, an innovative resource allocation in the D2D communication model was developed. Initially, we performed optimization using the HPSCAV algorithm. The HPSCAV algorithm can strike a balance between exploitation and exploration, reducing the possibility of becoming trapped in local optima and acting as a viable global optimizer. In the next stage, we have combined HPSCAV with MLSTM, a deep learning model. The combined algorithm, i.e., HPSCAV with MLSTM, maximized the sum rate of uplink users while minimizing interference from CUEs, ensuring each DUE’s minimum rate. In this approach, constraints related to power, interference, and data rates are considered, and optimized nodes in terms of power, interference, and data rate are fed to the input of the MLSTM model. The nodes that satisfy the optimization criteria are considered for communication in the network, so the time taken to adjust the weights and biases is minimized which not only reduced the computational complexity, but also increased the accuracy of the proposed model. Results validate that the proposed model achieved better performance regarding channel capacity, SINR, SE, and EE than the prevailing algorithms. Thus, the D2D model demonstrated efficient resource allocation and optimal power allocation. Further, in the future, we can include joint optimization and an energy harvesting scenario for energy-efficient resource allocation.

Availability of data and materials

The datasets used during the current study are available from the corresponding author on reasonable request.



1G: First generation
5G: Fifth generation
Admission control
APERAA: Autonomous power efficient resource allocation algorithm
AVO: African vulture optimization
B5G: Beyond 5G
CPSO: Cauchy-based PSO
CRN: Cognitive radio network
CSI: Channel state information
CUE: Cellular user equipment
DDPG: Deep deterministic policy gradient
D2D resource allocation and power control
DUE: D2D user equipment
EE: Energy efficiency
eNodeB: Evolved Node B
FFR: Fraction frequency reuse
HPSCAV: Hybrid particle swarm Cauchy approach to African vulture
Lagrangian decomposition based
Leaky ReLU: Leaky rectified linear unit
LSTM: Long short-term memory
Multi-agent deep reinforcement learning
massive MIMO: Massive multiple-input–multiple-output
MEC: Mobile edge computing
MLSTM: Modified long short-term memory
mm Wave: Millimeter wave
Non cooperative game theory
NOMA: Non orthogonal multiple access
PSO: Particle swarm optimization
QoS: Quality of service
RNN: Recurrent neural network
Radio resource allocation
Sequential best throughput seek algorithm
SE: Spectral efficiency
Stackelberg game
SINR: Signal-to-interference-plus-noise ratio
UAV: Unmanned aerial vehicle
Unmanned aerial vehicle trajectory optimization


  1. O. Hayat, R. Ngah, S.Z. Mohd Hashim, M.H. Dahri, R. Firsandaya Malik, Y. Rahayu, Device discovery in D2D communication: a survey. IEEE Access 7, 131114–131134 (2019).

  2. U.N. Kar, D.K. Sanyal, An overview of device-to-device communication in cellular networks. ICT Express 4(4), 203–208 (2018). (ISSN 2405-9595)

  3. J. Lee, J.H. Lee, Performance analysis and resource allocation for cooperative D2D communication in cellular networks with multiple D2D pairs. IEEE Commun. Lett. 23(5), 909–912 (2019).

  4. W. Lee, K. Lee, Resource allocation scheme for guarantee of QoS in D2D communications using deep neural network. IEEE Commun. Lett. 25(3), 887–891 (2021).

  5. P. Wang, K. Yang, H. Mei, Joint resource allocation algorithm for energy harvest-based D2D communication underlying cellular networks considering fairness. IEEE Commun. Lett. 27(4), 1200–1204 (2023).

  6. P. Mach, Z. Becvar, M. Najla, Resource allocation for D2D communication with multiple D2D pairs reusing multiple channels. IEEE Wirel. Commun. Lett. 8(4), 1008–1011 (2019).

  7. C. Kai, Y. Wu, M. Peng, W. Huang, Joint uplink and downlink resource allocation for NOMA-enabled D2D communications. IEEE Wirel. Commun. Lett. 10(6), 1247–1251 (2021).

  8. S. Liu, Y. Wu, L. Li, X. Liu, W. Xu, A two-stage energy-efficient approach for joint power control and channel allocation in D2D communication. IEEE Access 7, 16940–16951 (2019).

  9. X. Wang, Y. Han, H. Shi, Z. Qian, JOAGT: latency-oriented joint optimization of computation offloading and resource allocation in D2D-assisted MEC system. IEEE Wirel. Commun. Lett. 11(9), 1780–1784 (2022).

  10. Z. Zhang, Y. Wu, X. Chu, J. Zhang, Resource allocation and power control to maximize the overall system survival time for mobile cells with a D2D underlay. IEEE Commun. Lett. 23(5), 880–883 (2019).

  11. S. Ullah, K. Kim, A. Manzoor, L.U. Khan, S.M.A. Kazmi, C.S. Hong, Quality adaptation and resource allocation for scalable video in D2D communication networks. IEEE Access 8, 48060–48073 (2020).

  12. H. Gao, S. Zhang, Y. Su, M. Diao, Joint resource allocation and power control algorithm for cooperative D2D heterogeneous networks. IEEE Access 7, 20632–20643 (2019).

  13. C. He, Q. Chen, C. Pan, X. Li, F.-C. Zheng, Resource allocation schemes based on coalition games for vehicular communications. IEEE Commun. Lett. 23(12), 2340–2343 (2019).

  14. Y. Li, Y. Liang, Q. Liu, H. Wang, Resources allocation in multicell D2D communications for internet of things. IEEE Internet Things J. 5(5), 4100–4108 (2018).

  15. M. Liu, L. Zhang, Resource allocation for D2D underlay communications with proportional fairness using iterative-based approach. IEEE Access 8, 143787–143801 (2020).

  16. S. Dominic, L. Jacob, Distributed resource allocation for D2D communications underlaying cellular networks in time-varying environment. IEEE Commun. Lett. 22(2), 388–391 (2018).

  17. M. Elnourani, S. Deshmukh, B. Beferull-Lozano, Resource allocation for underlay interfering D2D networks with multiantenna and imperfect CSI. IEEE Trans. Commun. 70(9), 6066–6082 (2022).

  18. X. Song, X. Han, Y. Ni, L. Dong, L. Qin, Joint uplink and downlink resource allocation for D2D communications system. Future Internet 11, 12 (2019).

  19. S. Cicalò, V. Tralli, QoS-aware admission control and resource allocation for D2D communications underlaying cellular networks. IEEE Trans. Wirel. Commun. 17(8), 5256–5269 (2018).

  20. M. Le, Q.-V. Pham, H.-C. Kim, W.-J. Hwang, Enhanced resource allocation in D2D communications with NOMA and unlicensed spectrum. IEEE Syst. J. 16(2), 2856–2866 (2022).

  21. N. Nouri, J. Abouei, M. Jaseemuddin, A. Anpalagan, Joint access and resource allocation in ultradense mmWave NOMA networks with mobile edge computing. IEEE Internet Things J. 7(2), 1531–1547 (2020).

  22. L. Eslami, G. Mirjalily, T.N. Davidson, Spectrum-efficient QoS-aware resource assignment for FFR-based D2D-enabled heterogeneous networks. IEEE Access 8, 218186–218198 (2020).

  23. S. Guo, X. Zhou, S. Xiao, M. Sun, Fairness-aware energy-efficient resource allocation in D2D communication networks. IEEE Syst. J. 13(2), 1273–1284 (2019).

  24. Y. Hao, Q. Ni, H. Li, S. Hou, G. Min, Interference-aware resource optimization for device-to-device communications in 5G networks. IEEE Access 6, 78437–78452 (2018).

  25. B. Ma, H. Shah-Mansouri, V.W.S. Wong, Full-duplex relaying for D2D communication in Millimeter wave-based 5G networks. IEEE Trans. Wirel. Commun. 17(7), 4417–4431 (2018).

  26. I.O. Sanusi, K.M. Nasr, K. Moessner, Radio resource management approaches for reliable device-to-device (D2D) communication in wireless industrial applications. IEEE Trans. Cogn. Commun. Netw. 7(3), 905–916 (2021).

  27. V.M. Noor Mohammed, P.M. Sreenivasan, T. Ravishankar, S. Hariharan, M. Lakshmanan, Energy-efficient resource allocation for device-to-device communication through noncooperative game theory. Int. J. Commun. Syst. 33, e4279 (2020).

  28. G. Hou, L. Chen, D2D communication mode selection and resource allocation in 5G wireless networks. Comput. Commun. 155, 244–251 (2020). (ISSN 0140-3664)

  29. N.M. Vali Mohamad, P. Ambastha, S. Gautam et al., Dynamic sectorization and parallel processing for device-to-device (D2D) resource allocation in 5G and B5G cellular networks. Peer-to-Peer Netw. Appl. 14, 296–304 (2021).

  30. W.-K. Lai, Y.-C. Wang, H.-C. Lin, J.-W. Li, Efficient resource allocation and power control for LTE-A D2D communication with pure D2D model. IEEE Trans. Veh. Technol. 69(3), 3202–3216 (2020).

  31. T. Zhang, K. Zhu, J. Wang, Energy-efficient mode selection and resource allocation for D2D-enabled heterogeneous networks: a deep reinforcement learning approach. IEEE Trans. Wirel. Commun. 20(2), 1175–1187 (2021).

  32. D. Shi, L. Li, T. Ohtsuki, M. Pan, Z. Han, H.V. Poor, Make smart decisions faster: deciding D2D resource allocation via Stackelberg game guided multi-agent deep reinforcement learning. IEEE Trans. Mob. Comput. 21(12), 4426–4438 (2022).

  33. M. Hamdi, A. Ben Hamed, D. Yuan, M. Zaied, Energy-efficient joint task assignment and power control in energy-harvesting D2D offloading communications. IEEE Internet Things J. 9(8), 6018–6031 (2022).

  34. S.M.M. Abohashish, R.Y. Rizk, E.I. Elsedimy, Trajectory optimization for UAV-assisted relay over 5G networks based on reinforcement learning framework. J. Wirel. Commun. Netw. 2023, 55 (2023).

  35. R. Nagarajan, N.M. Vali Mohamad, Energy-optimized resource and power allocation in an uplink-based underlay device-to-device communication for 5G network. Int. J. Commun. Syst. (2022).

  36. A. Mohajer, M. Sam Daliri, A. Mirzaei, A. Ziaeddini, M. Nabipour, M. Bavaghar, Heterogeneous computational resource allocation for NOMA: toward green mobile edge-computing systems. IEEE Trans. Serv. Comput. 16(2), 1225–1238 (2023).

  37. S. Dong, J. Zhan, W. Hu, A. Mohajer, M. Bavaghar, A. Mirzaei, Energy-efficient hierarchical resource allocation in uplink–downlink decoupled NOMA HetNets. IEEE Trans. Netw. Serv. Manag. 20(3), 3380–3395 (2023).

  38. A. Mohajer, F. Sorouri, A. Mirzaei, A. Ziaeddini, K.J. Rad, M. Bavaghar, Energy-aware hierarchical resource management and backhaul traffic optimization in heterogeneous cellular networks. IEEE Syst. J. 16(4), 5188–5199 (2022).

  39. B. Abdollahzadeh, F.S. Gharehchopogh, S. Mirjalili, African vultures optimization algorithm: a new nature-inspired metaheuristic algorithm for global optimization problems. Comput. Ind. Eng. 158, 107408 (2021).

  40. H. Wang, Z. Wu, S. Rahnamayan, Y. Liu, M. Ventresca, Enhancing particle swarm optimization using generalized opposition-based learning. Inf. Sci. 181(20), 4699–4714 (2011).


Author information

All authors contributed equally to this work and approved the final manuscript.

Corresponding author

Correspondence to Noor Mohammed Vali Mohamad.

Ethics declarations

Competing interests

The authors have no competing interests to declare.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Pasha, S.A., Mohamad, N.M.V. A modified LSTM with QoS aware hybrid AVO algorithm to enhance resource allocation in D2D communication. J Wireless Com Network 2024, 12 (2024).
