Latency optimization for D2D-enabled parallel mobile edge computing in cellular networks

Edge offloading, including offloading to edge base stations (BSs) via cellular links and to idle mobile users (MUs) via device-to-device (D2D) links, plays a vital role in achieving the ultra-low latency targeted by 5G wireless networks. This paper studies an offloading method with parallel communication and computation to minimize the delay in multi-user systems. Three scenarios are explored: full offloading, partial offloading, and D2D-enabled partial offloading. In the full offloading scenario, we find a serving order for the MUs. Then, we jointly optimize the serving order and the task segmentation in the partial offloading scenario. For the D2D-enabled partial offloading scenario, we decompose the problem into two subproblems and find a sub-optimal solution based on their results. Finally, simulation results demonstrate that the offloading method with parallel communication and computation can significantly reduce the system delay, and that D2D-enabled partial offloading reduces the latency further.

(1) We consider MEC offloading scenarios in which communication and computation are performed in parallel. Offloading includes offloading to the edge server via cellular communication and offloading to neighboring idle MUs via D2D communication. Three scenarios are considered: full offloading, partial offloading, and D2D-enabled partial offloading. (2) For the full offloading scenario, we give a heuristic solution for finding the serving order of the MUs. Then, we jointly optimize the serving order and the task segmentation in the partial offloading scenario. For the D2D-enabled partial offloading scenario, we decompose the problem into two subproblems and find a sub-optimal solution based on their results. (3) The simulation results show that all three proposed scenarios achieve better delay performance when the computing resources of the edge server are insufficient, when the local computing resources are insufficient, or when the wireless channel capacity is small. In particular, D2D-enabled partial offloading can greatly reduce the system latency.
The remainder of this paper is organized as follows. The system model and three different execution models are introduced in Sect. 2. Sections 3, 4 and 5 investigate the full offloading, the partial offloading, and D2D-enabled partial offloading scenarios, respectively. Simulation results are presented in Sect. 6, and the paper is concluded in Sect. 7.

System model
As illustrated in Fig. 1, we consider a D2D-enabled MEC system which consists of one BS equipped with an edge server and K mobile users (MUs) denoted by U = {U_1, U_2, ..., U_K}. MU U_k is characterized by a tuple {L_k, V^{loc}_k}, where L_k (in bits) represents the size of its task and V^{loc}_k (in bits/s) is its local computing capacity. Due to its limited local computing resources, each MU can offload its task to the edge server for processing through a cellular link. Besides, there are some idle MUs which are willing to process tasks offloaded to them; we refer to them as helpers and denote the set of idle MUs as H = {H_1, H_2, ..., H_N}. It is assumed that each MU can establish a D2D link with at most one helper. When N ≥ K, each MU can be assigned a helper, and we denote the set of K helpers as H = {H_1, H_2, ..., H_K}. When N < K, for convenience of analysis, we still assume there are K helpers; the last K − N helpers, i.e., {H_{N+1}, H_{N+2}, ..., H_K}, are virtual MUs with zero-capacity D2D links and zero computing power. As shown in Fig. 2, the task L_k can be divided into three parts for local computing, edge server processing, and helper processing, respectively. It is assumed that the delay of task segmentation can be ignored. The computing capacities of H_n (1 ≤ n ≤ K) and the edge server are denoted as V^{help}_n and V^{edge}, respectively.

Communication models
There are two kinds of wireless communication: cellular communication and D2D communication. We assume that D2D communication and cellular communication use different frequency bands and the time division multiple access scheme is adopted in D2D communication. Thus, there is no inter-user interference in the considered system.

D2D communication
We assume that U_k and H_n establish a D2D link. p^{d2d}_{k,n} denotes the transmit power of U_k and h^{d2d}_{k,n} denotes the Rayleigh channel gain. If H_n does not actually exist, the transmit power and channel gain are both zero, i.e., p^{d2d}_{k,n} = 0 and h^{d2d}_{k,n} = 0. According to Shannon's formula, the achievable rate of the D2D link between U_k and H_n is given as

r^{d2d}_{k,n} = B_0 \log_2\left(1 + \frac{p^{d2d}_{k,n} h^{d2d}_{k,n}}{N_0}\right),

where B_0 denotes the bandwidth for D2D communication and N_0 represents the background noise power.

Cellular communication
It is assumed that all MUs in the system can establish cellular links with the edge cloud. MUs offload their computation tasks in a certain order; in other words, only one MU communicates with the edge cloud at a time. Let B_1 denote the bandwidth for cellular communication, p^{edge}_k the transmit power of U_k, and h^{edge}_k the Rayleigh channel gain between U_k and the BS. According to Shannon's formula, the achievable rate of the cellular link of U_k is given as

r^{edge}_k = B_1 \log_2\left(1 + \frac{p^{edge}_k h^{edge}_k}{N_1}\right),

where N_1 denotes the background noise power for cellular communication.
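The two Shannon-rate expressions above can be evaluated numerically as follows. This is only a sketch; the parameter values are hypothetical placeholders, not the paper's simulation settings.

```python
import math

def achievable_rate(bandwidth, tx_power, channel_gain, noise_power):
    """Shannon rate of a point-to-point link: B * log2(1 + p*h / N)."""
    return bandwidth * math.log2(1.0 + tx_power * channel_gain / noise_power)

# Hypothetical link parameters (illustrative only, not the paper's settings):
r_d2d = achievable_rate(1e6, 0.1, 1e-6, 1e-9)   # D2D link on bandwidth B_0
r_edge = achievable_rate(5e6, 0.2, 1e-7, 1e-9)  # cellular link on bandwidth B_1
```

The same function serves both link types, since only the bandwidth, power, gain, and noise parameters differ between the D2D and cellular models.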

Executing models
The task of each MU can be processed locally or offloaded to the helper and the edge server. Thus, there are two kinds of offloading: D2D offloading and MEC offloading.

Local computation
When U_k processes its task locally, the related computation delay is given as

D^{loc}_k = \frac{L_k}{V^{loc}_k}.

We define the system local delay as the maximum of the local computation delays of the MUs, i.e.,

D^{loc} = \max_{1 \le k \le K} D^{loc}_k.

D2D offloading
The time division multiple access scheme is adopted in D2D communication. A time frame is divided into K time slots, corresponding to the K MUs, and U_k can only communicate with H_n in its corresponding time slot. For convenience, we normalize the slot length as t_k (t_k ∈ [0, 1]). Therefore, the average transmission rate between U_k and H_n in a time frame can be expressed as R_{k,n} = t_k r^{d2d}_{k,n}. The transmission delay is denoted as

D^{d2d,t}_{k,n} = \frac{L_k}{R_{k,n}} = \frac{L_k}{t_k r^{d2d}_{k,n}},

and the computing delay of H_n is given as D^{d2d,c}_{k,n} = L_k / V^{help}_n. We assume each helper starts processing data only after the whole data are received; thus the D2D offloading delay for U_k can be expressed as

D^{d2d}_{k,n} = D^{d2d,t}_{k,n} + D^{d2d,c}_{k,n}.

Similar to Sect. 2.2.1, we define the system D2D offloading delay as the maximum of the D2D offloading delays of the MUs, i.e., D^{d2d} = \max_{1 \le k \le K} D^{d2d}_{k,n}.

MEC offloading
We assume that transmission and computation are carried out as two parallel pipelines, as shown in Fig. 3. Specifically, an MU cannot transmit its task until the previous MU's task has been transmitted completely, and likewise for computation. Besides, each MU's task can be computed only after its data reception is finished. For example, U_1 first offloads its task to the edge server, and the edge server starts computing immediately after the transmission completes; at the same time, U_2 starts to offload its task. The subsequent MUs transmit and are computed in turn. If the task offloaded to the edge cloud has been computed and the next MU's task has not yet been fully received, the server is in a waiting state. The total latency consumed to execute the first k MUs' tasks on the edge server is computed as

D^{edge}_k = D^{tran}_1 + \sum_{i=1}^{k} D^{comp}_i + \sum_{i=1}^{k-1} D^{wait}_i, (7)

where D^{comp}_k = L_k / V^{edge} is the computing delay and D^{wait}_k represents the waiting time of the edge server before computing the data of U_{k+1}, given as

D^{wait}_k = \max\left\{0, \sum_{i=1}^{k+1} D^{tran}_i - D^{edge}_k\right\}, (8)

where D^{tran}_k = L_k / r^{edge}_k is the transmission delay. Finally, based on (7) and (8), the total delay of the MEC offloading can be expressed as

D^{edge} = D^{edge}_K. (9)

In the following, depending on where the tasks are executed, we will consider three different scenarios: full offloading, partial offloading, and D2D-enabled partial offloading.
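The pipeline described above can be sketched with a short recursion: transmissions are strictly sequential, and each task's computation starts only after both its own transmission and the previous task's computation finish. This is an illustrative restatement of (7)-(9), not code from the paper; `d_tran` and `d_comp` hold the per-MU transmission and computing delays in serving order.

```python
def mec_offloading_delay(d_tran, d_comp):
    """Delay of the transmit/compute pipeline: task k's computation starts
    only after both its own transmission and task k-1's computation finish."""
    tx_done = 0.0    # time at which the current transmission completes
    comp_done = 0.0  # time at which the current computation completes
    for t, c in zip(d_tran, d_comp):
        tx_done += t
        comp_done = max(tx_done, comp_done) + c  # max(...) absorbs server waits
    return comp_done

# Two MUs, short transmissions and long computations: after the first task,
# the server never waits, so the delay is D_tran_1 + the sum of computing delays.
delay = mec_offloading_delay([1.0, 1.0], [2.0, 2.0])  # 1 + 2 + 2 = 5
```

When computing dominates, the total delay approaches the first transmission plus the sum of computing delays, which is exactly the "ideal" no-waiting situation discussed in Sect. 3.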

Sub-optimal solution to the full offloading scenario
In this section, we analyze the latency-minimization problem for the edge cloud computing model and formulate the corresponding optimization problem. Then, we propose a heuristic algorithm to solve it.

Problem formulation
In the full offloading scenario, all MUs' tasks are offloaded to the edge server for processing. According to Sect. 2.2.3, the delay of MEC offloading includes three parts: the computation delay of the edge server, the waiting delay of the edge server, and the transmission delay of the first MU. The computation resource of the edge server is fixed, so the computation delay is determined once the MUs' tasks are generated. However, the waiting delay of the edge server and the transmission delay of the first MU depend on the task offloading order of the MUs. We introduce a vector ϕ = [ϕ_1, ϕ_2, ..., ϕ_K]^T ∈ R^{K×1} which determines the transmission (computation) order of the MUs, i.e., ϕ_k means that the task of the ϕ_k-th MU is offloaded at the k-th position. The total delay of this scenario can be obtained from (7) and (9) and is expressed as D^{edge}(ϕ). Our goal is to minimize the total system latency for all MUs, i.e.,

P1: \min_{ϕ} D^{edge}(ϕ) (11a)
s.t. ϕ is a permutation of {1, 2, ..., K}, (11b)

where (11b) denotes the value range of the index vector.

Fig. 3 MEC offloading

Sub-optimal solution
There are K! possible offloading sequences for K MUs. If we used exhaustive search to find the optimal solution of P1, the time complexity would be O(K!), which is prohibitively high. Instead, we propose a heuristic algorithm for P1, as shown in Algorithm 1. In Algorithm 1, the MU with the shortest transmission delay is chosen as the first one to offload. Then, all MUs are divided into two parts such that the waiting delay of the edge server is zero while the first part of the MUs transmit (compute) their tasks.

Lemma 1

For the first part of the MUs {U_{ϕ_1}, . . ., U_{ϕ_{k_1}}}, the server has no idle period when computing their tasks.

Proof
When processing the tasks of the first k_1 MUs, some tasks will be cached on the edge server. Therefore, when processing the subsequent MUs, even if the condition on D^{tran} is not met, the server may not be in a waiting state but may continue to compute the previously cached tasks.
The result of Algorithm 1 is not the optimal solution of P1 but a sub-optimal one. The optimal and sub-optimal solutions are denoted as D^{edge*} and D^{edge**}, respectively. Consider an ideal situation in which the edge server is never in a waiting state during the entire process of computing all MUs' tasks. The system delay in this case is expressed as

D^{edge}_{ideal} = D^{tran}_{ϕ_1} + \sum_{k=1}^{K} D^{comp}_{ϕ_k},

which serves as a lower bound, i.e., D^{edge}_{ideal} ≤ D^{edge*} ≤ D^{edge**}.
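Algorithm 1 itself is not reproduced in the text above, so the ordering rule below (shortest transmission delay first) is only an illustrative assumption, not the paper's exact heuristic. The exhaustive O(K!) baseline shows what the heuristic avoids; for very small K the two can be compared directly.

```python
from itertools import permutations

def pipeline_delay(order, d_tran, d_comp):
    """Total transmit/compute pipeline delay under a given serving order."""
    tx_done = comp_done = 0.0
    for i in order:
        tx_done += d_tran[i]
        comp_done = max(tx_done, comp_done) + d_comp[i]
    return comp_done

def heuristic_order(d_tran):
    """Illustrative O(K log K) ordering rule (an assumption, not the paper's
    exact Algorithm 1): shortest transmission delay first, so the server
    starts computing as early as possible."""
    return sorted(range(len(d_tran)), key=lambda i: d_tran[i])

def best_order(d_tran, d_comp):
    """Exhaustive O(K!) baseline, feasible only for very small K."""
    return min(permutations(range(len(d_tran))),
               key=lambda o: pipeline_delay(o, d_tran, d_comp))
```

For the two-MU instance d_tran = [1, 4], d_comp = [4, 1], sending the quick-to-transmit MU first achieves the exhaustive optimum, matching the intuition that the first transmission delay appears directly in the total delay.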

Sub-optimal solution to the partial offloading scenario
In the above sections, we have analyzed the local computing model and the edge cloud computing model, and we have discussed the performance of the full offloading scenario. In this section, we study the partial offloading scenario. By utilizing the computing resources of both the MUs and the edge cloud, the system delay can be made shorter than that of the full offloading scenario. In the following, we first formulate the latency-minimization problem and then present the solution.

Problem formulation
In the partial offloading scenario, each MU's task L_k is partitioned into two parts by a split ratio λ_k ∈ [0, 1]: one part with λ_k L_k bits remains for local computing, and the other with (1 − λ_k)L_k bits is offloaded to the edge server for processing. According to Sects. 2.2.1, 2.2.3, and 3.1, the local delay and the offloading delay in the partial offloading scenario are given as

D^{loc} = \max_{1 \le k \le K} \frac{λ_k L_k}{V^{loc}_k} (12)

and D^{edge}(ϕ, λ), obtained by replacing L_k with (1 − λ_k)L_k in (7)-(9), (13) respectively. Then the system delay minimization problem is formulated as

P2: \min_{ϕ, λ} \max\{D^{loc}, D^{edge}\}.

Sub-optimal solution
Before presenting the proposed algorithm, we first introduce some lemmas.

Lemma 2
Given the computation and communication resources and the amount of data, the total system delay cannot be minimal as long as some computing or communication resources are idle.

Proof
See "Appendix B".
It can be inferred from Lemma 2 that all MUs should finish their local computations at the same time. Thus, the splitting ratio of each MU's task is proportional to its local computing resource. We introduce a variable h and represent the division ratio λ_k as

λ_k = \frac{h V^{loc}_k}{L_k}. (14)

Then, according to (12), the local computation delay can be denoted as D^{loc} = h.

Lemma 3
If and only if D loc = D edge , the sub-optimal solution of P2 is obtained.
According to Lemma 3, we need to find an optimal h * to make D loc = D edge . Thus, Algorithm 2 is proposed.
With the increase of h, the local computation delay increases linearly, while the edge offloading delay decreases linearly. Therefore, Algorithm 2 must converge. In addition, the complexity of Algorithm 2 mainly depends on the number of iterations, which is determined by the iteration step s. In each iteration, D^{edge} is calculated by Algorithm 1, so the time complexity of Algorithm 2 is O(K^2/s). Since Algorithm 1 only obtains a sub-optimal solution of P1, the result obtained by Algorithm 2 is a sub-optimal solution of P2.
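The balancing search can be sketched as follows. This is only an illustration of the idea behind Algorithm 2 (the paper's actual iteration uses a step size s and obtains the serving order from Algorithm 1); here the serving order is fixed to index order and bisection replaces the fixed-step iteration, both of which are assumptions.

```python
def partial_offload_delay(h, L, V_loc, r_edge, V_edge):
    """System delay when MU k keeps h * V_loc[k] bits locally (so all MUs
    finish local computing at time h) and offloads the rest. Serving order
    is fixed to index order here for simplicity (an assumption)."""
    tx_done = comp_done = 0.0
    for Lk, vk, rk in zip(L, V_loc, r_edge):
        off = max(Lk - h * vk, 0.0)              # bits sent to the edge server
        tx_done += off / rk
        comp_done = max(tx_done, comp_done) + off / V_edge
    return max(h, comp_done)

def find_h_star(L, V_loc, r_edge, V_edge, iters=60):
    """Bisection on h: the local delay h rises while the edge delay falls,
    so their maximum is minimized where the two curves cross."""
    lo, hi = 0.0, min(Lk / vk for Lk, vk in zip(L, V_loc))
    for _ in range(iters):
        mid = (lo + hi) / 2
        if partial_offload_delay(mid, L, V_loc, r_edge, V_edge) > mid:
            lo = mid  # edge delay still dominates: keep more bits locally
        else:
            hi = mid
    return (lo + hi) / 2
```

With a single MU, negligible transmission delay, and equal local and edge computing rates, the crossing point splits the task in half, as expected from D^{loc} = D^{edge}.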

Sub-optimal solution to the D2D-enabled partial offloading scenario
In the previous section, we studied the performance of the partial offloading scenario. On this basis, we now study the latency-optimization problem when D2D communication is introduced into the MEC system. Each MU's task is divided into three parts: one part remains local, and the other two parts are offloaded to its corresponding helper and to the edge server, respectively. The system delay is further reduced by utilizing the computation resources of all MUs, helpers, and the edge cloud. In the following, we first formulate the latency-minimization problem, which is non-convex and difficult to solve directly. We then decompose it into two subproblems, solve them separately, and obtain the solution of the original problem.

Problem formulation
We first introduce two parameters α_k and β_k for U_k. The amount of data offloaded to the edge server is denoted as α_k L_k; the amount of data processed locally is (1 − α_k)β_k L_k, and the amount of data offloaded to the corresponding helper is (1 − α_k)(1 − β_k)L_k. Then, according to Sects. 2.2.1, 2.2.2, and 2.2.3, the delays of the three parts in this scenario can be expressed as

D^{loc} = \max_{1 \le k \le K} \frac{(1 − α_k)β_k L_k}{V^{loc}_k}, (17)

D^{d2d,t}_k = \frac{(1 − α_k)(1 − β_k)L_k}{t_k r^{d2d}_{k,n}}, (18)

D^{d2d} = \max_{1 \le k \le K} \left[ D^{d2d,t}_k + \frac{(1 − α_k)(1 − β_k)L_k}{V^{help}_n} \right], (19)

and D^{edge}, obtained by replacing L_k with α_k L_k in (7)-(9), respectively. Finally, the delay minimization problem under D2D-enabled partial offloading is formulated as

P3: \min_{α, β, t, ϕ} \max\{D^{loc}, D^{d2d}, D^{edge}\} (22a)
s.t. \sum_{k=1}^{K} t_k ≤ 1, t_k ≥ 0, α_k, β_k ∈ [0, 1], ∀k. (22b)

It is not easy to find even a sub-optimal solution to P3 due to its non-convexity.

Problem decomposition
In order to solve P3, we first introduce a lemma based on Lemma 3.

Lemma 4
If and only if D loc = D d2d = D edge , the sub-optimal solution to P3 is obtained.

Proof
See "Appendix D".
Although D^{loc} and D^{d2d} depend on α, according to (17) and (19), both are linearly and inversely related to α. We therefore start by optimizing over β and t for fixed α and ϕ; the magnitude relation between the resulting D^{loc*} and D^{d2d*} does not change when α changes. Note that the optimal β and t do not depend on α and ϕ; in other words, the two variable sets {α, ϕ} and {β, t} are independent of each other. Hence, the proposed splitting into subproblems P4 and P5 suffers no optimality loss compared to P3. The two subproblems are

P4: \min_{β, t} \max\{D^{loc}, D^{d2d}\} s.t. (22b),

and

P5: \min_{α, ϕ} \max\{D^{l,d*}, D^{edge}\},

where D^{l,d*} represents the optimal value of P4.
By analyzing the intrinsic relationship between the two subproblems, we further have the following lemma.

Lemma 5
The sub-optimal solution obtained by solving P4 and P5 is equivalent to the sub-optimal solution of P3.

Proof
See "Appendix E".

Sub-optimal solution
According to the result in Sect. 4.2, the local computing delay of the system is obtained by making all MUs finish their local computation at the same time, i.e.,

D^{loc} = \frac{(1 − α_k)β_k L_k}{V^{loc}_k} for every k. (24)

The D2D offloading delay consists of two parts: the D2D-link transmission delay and the helper processing delay. The time division multiple access scheme is adopted in D2D communication, and t_k represents the normalized length of each time slot. It is assumed that the D2D communication time of the MUs can be allocated arbitrarily. In order to enable all helpers to complete their computing tasks at the same time, we adopt an on-demand distribution strategy: if a helper has abundant computing resources, we allocate more communication resource t_k to its D2D link; otherwise, we allocate less. Specifically, we make the ratio of the communication rate of each D2D link to the computing resource of its helper equal to a common value, i.e.,

\frac{t_k r^{d2d}_{k,n}}{V^{help}_n} = c for all k, (25)

which, together with \sum_{k=1}^{K} t_k = 1, leads to

t_k = \frac{V^{help}_n / r^{d2d}_{k,n}}{\sum_{j=1}^{K} V^{help}_m / r^{d2d}_{j,m}}, (26)

where H_n and H_m denote the helpers assigned to U_k and U_j, respectively. If the D2D communication resource t is not allocated according to (26), the computing resources of some helpers are bound to be idle; according to Lemma 2, the system delay is then definitely not minimal. Therefore, the on-demand distribution strategy (25) results in no optimality loss compared to the optimization problem P3. From (18) and (25), the D2D offloading delay of U_k can be expressed as (1 − α_k)(1 − β_k)L_k / R_k, where

R_k = \frac{t_k r^{d2d}_{k,n} V^{help}_n}{t_k r^{d2d}_{k,n} + V^{help}_n}

is the combined rate of D2D transmission and helper processing.
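The on-demand slot allocation (25)-(26) and the combined rate R_k can be sketched numerically; this is only an illustration with hypothetical rate and capacity values.

```python
def slot_allocation(r_d2d, V_help):
    """On-demand TDMA slots per (25)-(26): choose t_k so that the ratio
    t_k * r_k / V_k is the same for every MU-helper pair."""
    w = [v / r for r, v in zip(r_d2d, V_help)]
    total = sum(w)
    return [wi / total for wi in w]  # slots sum to one frame

def combined_rate(t_k, r_k, v_k):
    """Serial transmit-then-compute: L/(t*r) + L/v = L/R with
    R = t*r*v / (t*r + v)."""
    return t_k * r_k * v_k / (t_k * r_k + v_k)
```

The combined rate is the harmonic-style composition of the effective link rate t_k r_k and the helper capacity V_k: whichever resource is scarcer dominates the D2D offloading delay.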
Similar to (24), the D2D offloading delay of this model is obtained by requiring all MU-helper pairs to finish at the same time. Then, P4 can be rewritten as a problem P6 over β alone, whose sub-optimal solution is given by the following lemma.

Lemma 6
The sub-optimal solution of P6 is given by

β^*_k = \frac{V^{loc}_k}{V^{loc}_k + R_k},

which balances the local and D2D delays of each MU. Based on Lemma 6, we can derive the closed-form solutions to P4 and P6: with β^*_k, the local and D2D parts of U_k finish simultaneously, so U_k is effectively served at the combined rate r^*_k = V^{loc}_k + R_k. After obtaining D^{l,d*}, P5 is equivalent to P2. According to Algorithm 2, the sub-optimal solution D^{sys*} of P5 can be obtained, and it then follows from Lemma 5 that D^{sys*} is also the sub-optimal solution of P3.
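The closed-form split β*_k can be checked against the balance condition D^{loc}_k = D^{d2d}_k of Lemma 6. The resource values below are hypothetical, chosen only to illustrate the check.

```python
def beta_star(v_loc, R):
    """Closed-form split from Lemma 6: beta* = V_loc / (V_loc + R),
    which equalizes the per-MU local and D2D delays."""
    return v_loc / (v_loc + R)

# Hypothetical per-MU resources (illustrative values, not from the paper):
v, R, L, alpha = 3.0, 1.5, 10.0, 0.4
b = beta_star(v, R)
d_loc = (1 - alpha) * b * L / v          # local computing delay
d_d2d = (1 - alpha) * (1 - b) * L / R    # D2D transmit-plus-compute delay
```

Because both delays scale with the common factor (1 − α_k), the balance holds for any α_k, which is why the optimal β does not depend on α, as used in the problem decomposition.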

Results and discussion
In this section, we provide numerical results to verify the performance of the proposed algorithms. In the simulation, we introduce the resource-allocation MEC model of [13] for comparison. In this model, the cellular communication resource and the computing resource of the edge server are allocated among the MUs; after all MUs have finished transmitting their tasks, the data are processed on the edge server simultaneously. Two baseline scenarios are derived from it.
(1) Existing full offloading in [13]: all MUs' tasks are offloaded to the edge server for execution. (2) Existing partial offloading in [13]: each MU's task is partitioned into two parts: one part remains for local computing, and the other is offloaded to the edge server for processing.

Figure 5 depicts how the minimum delay in the five scenarios varies with the number of MUs in the system. Comparing the curves of the existing and proposed full offloading, we observe that the proposed full offloading performs better, and that the performance gap widens as the number of MUs increases. In the existing full offloading model, the MUs first transfer their computing tasks to the edge cloud, and only then does the edge cloud server start computing; the system delay is therefore the sum of the cellular transmission delay and the edge server computation delay. In the proposed full offloading model, by contrast, the MUs' transmissions and the edge cloud's computations proceed in parallel; since the edge computing delay exceeds the transmission delay, the system delay is almost equal to the computing delay. The same conclusion holds when comparing the existing and proposed partial offloading. Furthermore, D2D-enabled partial offloading performs best among the five scenarios: intuitively, exploiting the computing resources of idle helpers must decrease the system delay. The more MUs in the system, the more obvious the performance advantage of D2D-enabled partial offloading, which indicates that this model is well suited to user-dense scenarios.

In Fig. 6, the ordinate is the minimum system delay and the abscissa is the average computing resource of the MUs.
In this simulation, there are 20 MUs in the system, whose average local computing resources vary between 0.75 and 2.5 Mbps. The figure shows that the advantage of the proposed partial offloading over the existing partial offloading is more prominent when local computing resources are scarce. The reason is as follows: with fewer local computing resources, more data are offloaded to the edge server; the existing partial offloading is then limited by the communication resources, resulting in a large delay, whereas the proposed partial offloading is much less affected by the transmission delay. Our proposed partial offloading model is therefore suitable for sensor networks with weak computing power. Over the whole range of 0.75 to 2.5 Mbps, the D2D-enabled partial offloading model outperforms the other four models.

Figure 7 illustrates how the minimum system delay varies with the computing resource of the edge server. The system delays of all five scenarios decrease as the edge server's computing resource increases. Moreover, the performance gap between the existing and proposed full offloading becomes more evident as this resource grows: when the edge server's computing resources are abundant, the computing delay is small, so the delay of the proposed full offloading is also small, whereas the delay of the existing full offloading decreases slowly because it still contains the unchanged transmission delay. A similar observation holds for the existing and proposed partial offloading. Since the helpers' computing resources are exploited in D2D-enabled partial offloading, the latency of that system stays small even when the edge server's computing resources are scarce.
Figure 8 shows the minimum system delay versus the radius of the BS. The BS radius ranges from 100 to 450 m, and there are 20 MUs in the system. The system delays of the existing full offloading and the existing partial offloading are positively correlated with the BS radius, whereas the system delays of the three parallel communication-and-computation scenarios are approximately invariant. The reason is that an increasing BS radius degrades the transmission capacity of the cellular link and thereby increases the delay of transmitting data over it; the three proposed scenarios thus remain applicable when the wireless channel is poor.

Figure 9 shows the impact of the average size of the MUs' data on the minimum system delay in the five scenarios. The average data size ranges from 55 to 90 Mbits, with 20 MUs in the system. As more computing tasks are generated, the MUs offload more of them to the edge cloud server, and the advantages of the proposed strategy of parallel transmission and computation become more prominent. Compared with the existing full and partial offloading models, the corresponding proposed models have a clear performance advantage, and D2D-enabled partial offloading further shortens the latency. The three proposed scenarios therefore also perform well in networks with high-density computing tasks.

Fig. 7 The system delay versus the computation capacity of the edge server (K = 20)
Fig. 8 The system delay versus the radius of the BS

Conclusion
This paper proposed a strategy of parallel communication and computation for MEC offloading to improve the quality of service. Three scenarios, namely full offloading, partial offloading, and D2D-enabled partial offloading, were studied and compared. First, we proposed a heuristic algorithm to solve the optimization problem of the proposed full offloading; although this algorithm only obtains a sub-optimal solution, its complexity is greatly reduced compared to that of finding the optimal solution. Second, on the basis of Algorithm 1, Algorithm 2 was proposed to solve the partial offloading scenario. Third, the D2D-enabled partial offloading scenario was complicated, so we decomposed it into two subproblems to solve separately; it was then proved that the solutions of the two subproblems form a satisfying solution to the original problem. Finally, the numerical results validated the performance of the proposed algorithms.
In this paper, we utilized the overlay mode of D2D communication, which requires a wider spectrum. Considering the limited spectrum resources, we will introduce the underlay mode of D2D communication into the parallel mobile edge computing system in future work. Another significant direction for future work is to optimize the energy efficiency of this multi-user system.

A. Proof of Lemma 1
1. When k = 1, the server computes the first task immediately after its transmission completes, so by (8) the edge server has no idle period. 2. Suppose that the lemma holds when k = k_0, i.e., the server has no idle period through the first k_0 tasks. By the construction of the first part of the MUs in Algorithm 1, (8) gives D^{wait}_{k_0} = 0; therefore, the server has no idle period when k = k_0 + 1. Combining 1 and 2, Lemma 1 is proved.

B. Proof of Lemma 2
The total data in the system are represented as L^{sys} = \sum_{k=1}^{K} L_k. The local computing resource of all the MUs is represented as V^{loc} = \sum_{k=1}^{K} V^{loc}_k, and the total rate of the helpers processing data is R^{help} = \sum_{k=1}^{K} R^{help}_k, where R^{help}_k is the rate at which D2D offloading processes U_k's data. The system delay can then be lower-bounded as

D^{sys} \ge \frac{L^{sys}}{V^{loc} + R^{help} + V^{edge}}.

While the tasks are being computed, if some resources are idle, the resources actually participating in the computation are smaller than the resources existing in the system, i.e., the effective aggregate rate is strictly less than V^{loc} + R^{help} + V^{edge}. The resulting delay is then strictly larger than the lower bound, so the total system delay cannot be minimal. Thus, Lemma 2 is proved.

C. Proof of Lemma 3
Sufficiency: proof by contradiction. Assume D^{loc} ≠ D^{edge}; two cases arise.
1) When D^{loc} > D^{edge}, we have D^{loc} − D^{edge} > 0. In this case, the total system delay is determined by D^{loc} (i.e., D^{sys} = D^{loc}), and the edge cloud server is idle for the duration D^{loc} − D^{edge}. According to Lemma 2, this situation is not a sub-optimal solution.
2) When D^{loc} < D^{edge}, we have D^{edge} − D^{loc} > 0. In this case, the total system delay is determined by D^{edge} (i.e., D^{sys} = D^{edge}), and the local computing resources of the MUs are idle for the duration D^{edge} − D^{loc}. According to Lemma 2, this situation is not a sub-optimal solution either. Therefore, the sub-optimal solution of P2 is obtained only when D^{loc} = D^{edge}.
Necessity: by the contrapositive of Lemma 2, if P2 attains its sub-optimal solution, no resource in the system is idle while the data are being computed. In other words, the local devices and the edge cloud server complete their computing tasks simultaneously, i.e., D^{loc} = D^{edge}. In summary, Lemma 3 is proved.

D. Proof of Lemma 4
Sufficiency: proof by contradiction. Assume that D^{loc}, D^{d2d}, and D^{edge} are not all equal; the following cases arise.
1) When D^{loc} > D^{d2d} > D^{edge}, we have D^{loc} − D^{d2d} > 0 and D^{loc} − D^{edge} > 0. In this case, the total system delay is determined by D^{loc} (i.e., D^{sys} = D^{loc}), and the helpers and the edge cloud are idle for the durations D^{loc} − D^{d2d} and D^{loc} − D^{edge}, respectively. According to Lemma 2, this situation is not a sub-optimal solution.
2)-6) By the same argument as 1), the orderings D^{loc} > D^{edge} > D^{d2d}, D^{edge} > D^{loc} > D^{d2d}, D^{d2d} > D^{loc} > D^{edge}, D^{edge} > D^{d2d} > D^{loc}, and D^{d2d} > D^{edge} > D^{loc} are not sub-optimal solutions either.
Therefore, the sub-optimal solution of P3 is obtained only when D^{loc} = D^{d2d} = D^{edge}.
Necessity: by the contrapositive of Lemma 2, if P3 attains its sub-optimal solution, no resource in the system is idle while the data are being computed; that is, the local devices, the helpers, and the edge server complete their computing tasks simultaneously, i.e., D^{loc} = D^{d2d} = D^{edge}.
In summary, Lemma 4 is proved.

E. Proof of Lemma 5
According to Lemma 4, the sub-optimal solution to problem P3 is obtained if and only if D^{loc} = D^{d2d} = D^{edge}. This condition is equivalent to first enforcing D^{loc} = D^{d2d} and then D^{l,d} = D^{edge}. In other words, we can first make D^{loc} = D^{d2d} under constraint (22b), which is subproblem P4, and then balance the result against D^{edge}, which is subproblem P5.
Setting the per-MU local delay equal to the per-MU D2D delay, β^*_k can be obtained by dividing the two delay expressions, i.e.,

β^*_k = \frac{V^{loc}_k}{V^{loc}_k + R_k}.

It can be seen from Lemma 2 that the sub-optimal solution of P6 is certainly not obtained when β_k = 1 or β_k = 0, since one of the two resources would then be idle; so the expression above gives the sub-optimal solution D^{l,d*}. Since β_k can take any value when α_k = 1, β^*_k can be chosen arbitrarily in that degenerate case.