Deep learning-based computation offloading with energy and performance optimization

By partially or entirely offloading computations to a nearby server, mobile edge computing gives user equipment (UE) the capability to run computationally intensive applications. However, a critical challenge emerges: how to select the optimal set of components to offload while considering both UE performance and battery usage constraints. In this paper, we propose a novel energy- and performance-efficient deep-learning-based offloading algorithm. The proposed algorithm determines the optimal offloading scheme of components based on the UE's remaining energy and performance. All of these considerations are modeled as a cost function; a deep learning network is then trained to compute the solution from which the optimal offloading scheme is determined. Experimental results show that the proposed method outperforms existing methods in terms of energy and performance constraints.


Introduction
With the development and popularity of smart terminals, referred to as user equipment (UE), various network services and applications continue to emerge. Although the computational power of UEs has increased tremendously over the years, they still cannot process computation-intensive tasks and large volumes of data in a short time [1][2][3], a problem for which cloud computing used to be the solution. However, the delay caused by communication between the UE and the cloud server poses a severe challenge to the feasibility of this typical solution [4]. The European Telecommunications Standards Institute proposed placing small edge servers near end users to reduce network latency, an approach studied as mobile edge computing (MEC) [5][6][7].
MEC refers to deploying computing and storage resources at the edge of mobile networks to provide IT service environments and cloud computing capabilities for mobile networks, thereby offering users ultra-low-latency, high-bandwidth network services. As one of the key technologies of MEC, computation offloading [8,9] refers to the technology by which UEs hand over part or all of their computing tasks to the cloud computing environment to overcome the limitations of mobile devices in storage, computation performance, and energy efficiency.
*Correspondence: caosuzhi@csu.ac.cn
1 Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing 100094, China
2 Key Laboratory of Space Utilization, Chinese Academy of Sciences, Beijing 100094, China
Full list of author information is available at the end of the article
Influenced by the cloud computing paradigm, existing offloading solutions generally have the following problems: (1) they assume that the server has unlimited computing power, (2) they assume that users have constant uplink and downlink network conditions, and (3) they ignore the different user priorities caused by different energy and network conditions [10][11][12].
In this paper, we propose a novel energy- and performance-efficient deep-learning-based offloading algorithm (EPED), which partially offloads computations from the UE to the MEC server under a joint optimization of the UE's energy consumption and performance. Building on the concept in [13], we design a cost function for each deployment method that jointly considers performance and energy consumption. The overall cost of computation offloading is then measured over all components, and the best offloading scheme, i.e., the one with the smallest overall cost, is determined by a deep learning method. The proposed method can adaptively select the optimal combination of application components to offload, minimizing the cost of execution time and energy consumption. The following summarizes our contributions:
The rest of this paper is organized as follows. Section 2 presents our energy- and performance-efficient deep-learning-based offloading algorithm. Section 3 describes our experiments, the comparison with other methods, and how the training dataset is prepared. Section 4 discusses related work, and Section 5 concludes the paper.

Proposed EPED
The execution process of an application can be divided into several steps. Each of these steps, together with its related data, is called a component of the application execution. A component can be deployed either on the local side (UE) or on the mobile edge server side (MES). An efficient offloading approach should select an optimal subset of components, rather than all of them, to offload to the MES. To this end, EPED consists of the following five key steps:
(1) determine the cost of deploying a component on the local side and on the MES side, respectively;
(2) design the cost function of an offloading scheme, in which the cost is the dependent variable of the offloading decisions;
(3) search for the best offloading schemes for some specific component states with an exhaustive method;
(4) collect the best offloading schemes together with their component states, and use them as the outputs and inputs of the training dataset, respectively;
(5) finally, train a deep neural network on this dataset so that it can predict the best offloading scheme for any component states.

Local side execution cost
The local side execution cost consists of energy consumption and execution time. Orsini et al. [14] proposed evaluating the execution time by the amount of input data a component needs. However, this method ignores that the amount of processed data is generally not equal to the amount of input data. If we assume that the output of a component is the input of the next component, we can use $d_{n-1,n}$ to denote both the input data amount of component $c_n$ and the output data amount of component $c_{n-1}$. The workload $W_c$ of component $c$ is then given by (1), where $W_c$ is measured in CPU clock cycles and $V$ denotes the number of clock cycles a processor performs per byte, measured in cycles per byte; Yang et al. [15] studied this value. $O_n$ is the computational complexity of $c_n$ and represents its data amplification factor. The input data and the processed data generally differ in amount because a component may process its input data several times; this is why we introduce $O_n$.
If component $c$ is deployed and run on the local UE side, its execution time equals the time needed to complete the workload $W_c$, which is given by (2), where $f_l$ is the CPU rate of the UE, measured in million instructions per second (MIPS). Let $E_c$ denote the energy consumed by this workload, where $U$ is the unit power consumption of the UE, measured in mAh per CPU cycle. If the total energy of the UE is $E_t$, the remaining energy for the next component $c + 1$ follows from $E_t$ and the energy already consumed. After the execution time and energy consumption are determined by formulas (2) and (4), respectively, the local side execution cost of component $c$ is evaluated by (5), where $\gamma_1$ and $\gamma_2$ are weighting coefficients that balance the contributions of time delay and energy consumption in the local cost function.
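The local cost model can be sketched in Python as follows. Because equations (1) to (5) are missing from the extracted text, the formulas used here are assumptions read off the surrounding definitions: $W_c = V \cdot O_n \cdot d_{n-1,n}$, $T^l_c = W_c / f_l$, $E_c = U \cdot W_c$, remaining energy equal to $E_t$ minus the energy consumed so far, and a local cost $\gamma_1 T^l_c + \gamma_2 E_c$. The function `local_cost`, the `LocalCost` record, and the `E_used` parameter are hypothetical, and $f_l$ is treated as cycles per second so that the time comes out in seconds.

```python
from dataclasses import dataclass


@dataclass
class LocalCost:
    workload: float          # W_c in CPU cycles
    time: float              # local execution time in seconds
    energy: float            # E_c, in the units of U times cycles
    remaining_energy: float  # energy left for component c + 1
    cost: float              # weighted local cost


def local_cost(d_in, O, V, f_l, U, E_t, E_used, gamma1, gamma2):
    """Hypothetical sketch of the local-side cost of one component.

    d_in   : input data amount d_{n-1,n} in bytes
    O      : data amplification factor O_n
    V      : CPU cycles per byte
    f_l    : UE CPU rate, treated here as cycles per second (assumption)
    U      : energy per CPU cycle
    E_t    : total UE energy
    E_used : energy consumed by earlier components (assumption)
    """
    W = V * O * d_in                     # assumed workload formula
    T_l = W / f_l                        # time to complete W on the UE
    E_c = U * W                          # energy drawn by this workload
    remaining = E_t - E_used - E_c       # energy left for the next component
    cost = gamma1 * T_l + gamma2 * E_c   # assumed linear weighting
    return LocalCost(W, T_l, E_c, remaining, cost)


if __name__ == "__main__":
    # Illustrative numbers only.
    print(local_cost(d_in=1000, O=2.0, V=10, f_l=1e6, U=1e-6,
                     E_t=100.0, E_used=0.0, gamma1=0.5, gamma2=0.5))
```

A linear weighting keeps $\gamma_1$ and $\gamma_2$ easy to interpret: raising $\gamma_2$ makes the cost favor energy savings over latency.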

MES side execution cost
Besides local execution, the UE can also offload a component to the remote side, i.e., the MES, for execution. As on the local side, the execution cost on the MES side includes the execution time, although this time is much shorter than on the local side. This time is computed as in (2), with $f_l$ replaced by $f_r$, the CPU rate of the MES. The time spent transferring data from the UE to the MES must also be considered. This time depends on the UE's mobile network environment; this paper considers only the most commonly used 4G environment. 4G communication is implemented with orthogonal frequency division multiple access (OFDMA) technology, with which the upload and download speeds depend on the bandwidth $B$ and the number of transmission subcarriers $N$.
Assuming the same additive white Gaussian noise (AWGN) channel for uplink and downlink transmission, the maximum achievable uplink and downlink data rates, (7) and (8), can be derived as in [16], where $B$ is the bandwidth, $\beta$ is the path loss exponent, $d$ is the distance between the UE and the MES, $n$ is the number of subcarriers allocated for transmission from the UE to the MES, $N_0$ is the noise power, $p_u$ and $p_s$ are the transmit powers of the UE and the MES, respectively, $h_{ul}$ and $h_{dl}$ are the channel fading coefficients for the uplink and downlink, respectively, and $g_{ul}$ and $g_{dl}$ are the required bit error rates for the uplink and downlink, respectively. $\Gamma(g_{ul}) = -\frac{2\log(5 g_{ul})}{3}$ is the SNR margin needed to satisfy the required bit error rate with a quadrature amplitude modulation (QAM) constellation.
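Since (7) and (8) themselves are missing from the extracted text, the sketch below assumes the Shannon-type form commonly used with this SNR margin, $r = nB\log_2\left(1 + \frac{p\,d^{-\beta}|h|^2}{\Gamma(g)N_0}\right)$, with $(p_u, h_{ul}, g_{ul})$ for the uplink and $(p_s, h_{dl}, g_{dl})$ for the downlink. Treating $B$ as the bandwidth of one subcarrier and the logarithm in $\Gamma$ as natural are both assumptions, as are the function names.

```python
import math


def snr_gap(ber):
    """SNR margin Gamma(g) = -2 ln(5 g) / 3 for QAM at target bit error rate g.

    The natural logarithm is assumed; the margin is positive only for g < 0.2.
    """
    if not 0.0 < ber < 0.2:
        raise ValueError("target bit error rate must lie in (0, 0.2)")
    return -2.0 * math.log(5.0 * ber) / 3.0


def achievable_rate(n, B, p_tx, d, beta, h, ber, N0):
    """Assumed rate over n subcarriers of bandwidth B each, in bit/s:

    r = n * B * log2(1 + p_tx * d**(-beta) * |h|**2 / (Gamma(ber) * N0))
    """
    received_snr = p_tx * d ** (-beta) * abs(h) ** 2 / (snr_gap(ber) * N0)
    return n * B * math.log2(1.0 + received_snr)


if __name__ == "__main__":
    # Illustrative parameters only; the uplink uses the UE power, the downlink the MES power.
    r_ul = achievable_rate(n=4, B=180e3, p_tx=0.2, d=100.0, beta=3.0, h=1.0, ber=1e-3, N0=1e-13)
    r_dl = achievable_rate(n=4, B=180e3, p_tx=1.0, d=100.0, beta=3.0, h=1.0, ber=1e-3, N0=1e-13)
    print(f"uplink {r_ul / 1e6:.2f} Mbit/s, downlink {r_dl / 1e6:.2f} Mbit/s")
```

The rate grows linearly in the number of subcarriers $n$ but only logarithmically in transmit power, which is why subcarrier allocation matters more than power for upload time.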
Using (7) and (8), the time the UE spends sending the input data of component $c$ to the MES can then be derived, where $p_{c-1}$ is the offloading decision of the previous component. Finally, the MES side execution cost is given by (11), where $\gamma_3$, $\gamma_4$, and $\gamma_5$ are weighting coefficients that balance the contributions of these three types of time, respectively.
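A minimal sketch of the MES-side cost follows. Only two things come from the text: the upload time depends on the previous decision $p_{c-1}$ (no upload is needed if the previous component already ran on the MES), and three time terms are weighted by $\gamma_3$, $\gamma_4$, and $\gamma_5$. The passage defining the third term is missing, so modeling it as a download that happens only when the next component runs locally (the hypothetical `p_next`) is an assumption, as are the function name and the byte-per-second rate units.

```python
def mes_cost(W, d_in, d_out, p_prev, p_next, f_r, r_ul, r_dl, gamma3, gamma4, gamma5):
    """Hypothetical MES-side cost of one component.

    W          : workload in CPU cycles (same as on the local side)
    d_in       : bytes the component needs as input
    d_out      : bytes the component produces
    p_prev     : decision of the previous component (1 = ran on the MES)
    p_next     : decision of the next component (assumption)
    f_r        : MES CPU rate in cycles per second
    r_ul, r_dl : uplink / downlink rates in bytes per second
    """
    t_exec = W / f_r                       # remote execution time
    t_up = (1 - p_prev) * d_in / r_ul      # upload only if the input is still on the UE
    t_down = (1 - p_next) * d_out / r_dl   # download only if the next step runs locally
    return gamma3 * t_exec + gamma4 * t_up + gamma5 * t_down


if __name__ == "__main__":
    # Illustrative numbers only.
    print(mes_cost(W=2e9, d_in=1e6, d_out=5e5, p_prev=0, p_next=0,
                   f_r=1e10, r_ul=1e6, r_dl=2e6, gamma3=1.0, gamma4=1.0, gamma5=1.0))
```

Conditioning on neighboring decisions is what makes the per-component costs interact: keeping consecutive components on the same side avoids transfer time.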

Cost function
We have shown that a component can be executed either locally or remotely, with costs given by (5) and (11), respectively. To derive the cost function of an offloading scheme conveniently, we represent the cost of a single component $c$ as (12). The cost function of an offloading scheme is the sum over all components and hence can be represented as (13). Let the offloading decisions of all components form the decision space $P = \{p_1, p_2, \dots, p_M\}$, where $M$ is the number of components. The goal of EPED is then to find the particular decision space $P^*$ that minimizes (13), as expressed in (14).
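The sketch below evaluates (13) for a scheme and finds $P^*$ by exhaustive search, as in step (3) of EPED. Since (12) is missing from the extracted text, it assumes the per-component cost $(1 - p_c)\,C^l_c + p_c\,C^r_c$ with $p_c \in \{0, 1\}$, and that the application starts on the UE; the names `scheme_cost`, `best_scheme`, and the `remote_cost(c, prev)` callback are hypothetical. The search visits all $2^M$ schemes, so it is only practical for small $M$, which is exactly why EPED trains a DNN to replace it.

```python
from itertools import product


def scheme_cost(P, local_costs, remote_cost):
    """Total cost of offloading scheme P (a tuple of 0/1 decisions).

    local_costs[c]       : local cost C^l_c of component c
    remote_cost(c, prev) : MES cost C^r_c given the previous decision prev
    """
    total = 0.0
    prev = 0  # assumption: the application starts on the UE
    for c, p in enumerate(P):
        # Assumed per-component cost: (1 - p) * C^l_c + p * C^r_c
        total += (1 - p) * local_costs[c] + p * remote_cost(c, prev)
        prev = p
    return total


def best_scheme(local_costs, remote_cost):
    """Exhaustive search over all 2**M schemes; only practical for small M."""
    M = len(local_costs)
    return min(product((0, 1), repeat=M),
               key=lambda P: scheme_cost(P, local_costs, remote_cost))


if __name__ == "__main__":
    local = [5.0, 2.0, 5.0]
    base = [1.0, 3.0, 1.0]
    # Remote cost plus an upload penalty when the previous component ran locally.
    remote = lambda c, prev: base[c] + (2.0 if prev == 0 else 0.0)
    print(best_scheme(local, remote))  # (1, 1, 1)
```

In this toy case the middle component is cheaper locally in isolation, yet the best scheme keeps it on the MES because moving it back would add upload cost for the third component.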

Algorithm implementation
To determine the optimal offloading scheme in (14), this paper employs a DNN. The most important task in training this DNN is preparing the training dataset. Our training dataset is obtained through the following steps. In each resulting sample $\{I_i, P^*_i\}$, $I_i$ is the state of all M components; therefore, $I_i$ consists of 4M data items, since each component state has 4 items. The input layer accordingly has 4M neurons to accept $I_i$. $P^*_i$ is the desired optimal offloading scheme of the M components.
For example, $(c_{i,1}, v_{i,1}, b_{i,1}, d_{i,1})$ is the state of component 1, randomly generated in the $i$th pass in step 5, and $(p_{i,1}, p_{i,2}, \dots, p_{i,M})$ is the offloading scheme corresponding to this pass. The DNN is designed as shown in Fig. 1; it is a fully connected neural network, although the figure does not show the actual connections between layers because there are too many neurons. When a training record is fed into the DNN, the state of each component is accepted by 4 adjacent neurons. Since there are M components, the input layer has 4M nodes, and the output layer has M nodes, each representing the offloading decision of the corresponding component.
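A dependency-free sketch of such a network is shown below: 4M inputs (component $m$'s four state items at positions $4m$ to $4m+3$), one hidden layer, and M outputs. The text does not specify the depth, width, activations, loss, or optimizer, so the single ReLU hidden layer, sigmoid outputs (matching $0 \le p_{i,m} \le 1$), binary cross-entropy, and plain SGD are assumptions, as is the class name `OffloadMLP`. It is written in pure Python for self-containment; a real implementation would use a deep learning framework.

```python
import math
import random


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OffloadMLP:
    """Fully connected net: 4M inputs -> ReLU hidden layer -> M sigmoid outputs."""

    def __init__(self, M, hidden=32, seed=0):
        rng = random.Random(seed)
        self.n_in, self.n_h, self.n_out = 4 * M, hidden, M
        s1, s2 = 1.0 / math.sqrt(self.n_in), 1.0 / math.sqrt(hidden)
        self.W1 = [[rng.uniform(-s1, s1) for _ in range(self.n_in)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.W2 = [[rng.uniform(-s2, s2) for _ in range(hidden)] for _ in range(M)]
        self.b2 = [0.0] * M

    def forward(self, x):
        """Return (hidden activations, output probabilities) for input vector x."""
        if len(x) != self.n_in:
            raise ValueError(f"expected {self.n_in} inputs, got {len(x)}")
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        y = [_sigmoid(sum(w * hj for w, hj in zip(row, h)) + b)
             for row, b in zip(self.W2, self.b2)]
        return h, y

    def train_step(self, x, target, lr=0.05):
        """One SGD step on binary cross-entropy; returns the loss before the update."""
        h, y = self.forward(x)
        g_out = [yk - tk for yk, tk in zip(y, target)]  # dLoss/dlogit for sigmoid + BCE
        # Back-propagate into the hidden layer using the not-yet-updated W2.
        g_h = [sum(g_out[k] * self.W2[k][j] for k in range(self.n_out)) if h[j] > 0 else 0.0
               for j in range(self.n_h)]
        for k in range(self.n_out):
            for j in range(self.n_h):
                self.W2[k][j] -= lr * g_out[k] * h[j]
            self.b2[k] -= lr * g_out[k]
        for j in range(self.n_h):
            for i in range(self.n_in):
                self.W1[j][i] -= lr * g_h[j] * x[i]
            self.b1[j] -= lr * g_h[j]
        eps = 1e-12
        return -sum(t * math.log(yk + eps) + (1 - t) * math.log(1 - yk + eps)
                    for yk, t in zip(y, target))

    def predict(self, x):
        """Threshold the outputs at 0.5 to obtain a 0/1 offloading scheme."""
        return [1 if yk >= 0.5 else 0 for yk in self.forward(x)[1]]


if __name__ == "__main__":
    M = 3
    net = OffloadMLP(M, hidden=8, seed=1)
    x = [0.1 * i for i in range(4 * M)]  # one flattened state vector I_i
    target = [1, 0, 1]                   # its best scheme P*_i
    for _ in range(200):
        loss = net.train_step(x, target)
    print(net.predict(x), round(loss, 4))
```

Treating each output as an independent sigmoid lets the network express one offloading probability per component, and thresholding at 0.5 turns it into a concrete scheme.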
The training dataset is prepared from a limited number of states, but a well-trained DNN can predict the optimal offloading scheme for any combination of states. Although the proposed DNN is designed for a fixed number of components, we can choose a large M to cover different scenarios; the DNN can then handle any scenario with at most M components. For example, if M = 10 and the actual number of components is 8, we can just let the input and output of sample
The EPED algorithm is summarized as follows:

Experiment setup
We ran the experiments on a workstation with a 32-core CPU and 1 TB of RAM. We set M = 100 and randomly generated 10,000 states for each component, obtaining the dataset $\{sample_i = \{I_i, P^*_i\} \mid i = 1, 2, \dots, 10{,}000\}$. We used subsets of this dataset with different sparsity as training data; i.e., 10, 20, 50, 100, 200, 500, 1000, and 2000 samples, respectively, were selected to train the DNN.
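The dataset construction and the sparsity levels can be sketched as follows. The meaning and ranges of the four state items are not given in this text, so uniform values in [0, 1) are an assumption, and `label_fn` is a hypothetical stand-in for the exhaustive search that yields $P^*_i$ (exhaustive search over $2^{100}$ schemes would not be feasible, so the demo uses a toy labeller and M = 10 to stay fast). The function names are also hypothetical.

```python
import random


def make_dataset(num_samples, M, label_fn, seed=0):
    """Return [(I_i, P*_i), ...] built from random four-item component states.

    I_i concatenates the M states (4M numbers); label_fn maps the list of
    states to the best scheme P*_i and is a placeholder here.
    """
    rng = random.Random(seed)
    dataset = []
    for _ in range(num_samples):
        states = [tuple(rng.random() for _ in range(4)) for _ in range(M)]
        I = [x for s in states for x in s]  # component m occupies positions 4m .. 4m+3
        dataset.append((I, tuple(label_fn(states))))
    return dataset


def training_subsets(dataset, sizes, seed=0):
    """Draw one random training subset per size (the sparsity levels)."""
    rng = random.Random(seed)
    return {s: rng.sample(dataset, s) for s in sizes if s <= len(dataset)}


if __name__ == "__main__":
    toy_label = lambda states: [1 if s[0] > 0.5 else 0 for s in states]  # placeholder labeller
    data = make_dataset(num_samples=2000, M=10, label_fn=toy_label)
    subsets = training_subsets(data, [10, 20, 50, 100, 200, 500, 1000, 2000])
    print({size: len(sub) for size, sub in subsets.items()})
```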
We first verified the accuracy of EPED under different sparsity levels and then compared the predictive performance of EPED with two other types of representative methods:
1. Total offloading scheme (TOS) [16]: TOS is a coarse-grained approach. It makes no decision but offloads all components from the UE to the MES. No components are executed on the UE with this method; therefore, TOS appears able to save UE energy. However, this method requires a large amount of data transmission, which also consumes energy. We therefore select this method for comparison.
2. Random offloading scheme (ROS) [16]: ROS uses a simple strategy, randomly selecting some components to offload.
For convenience of comparison, we measure the prediction accuracy with the mean absolute error (MAE) and the root mean square error (RMSE), where $p^*_{i,m}$ is the predicted offloading decision of a component, $p_{i,m}$ is its real best offloading decision, and $N$ is the number of all components. Step 3 of Section 2.4 has shown how to obtain the real best offloading scheme. Figure 2 shows the prediction accuracy of EPED under different sample numbers $S$. Figure 2 indicates that the prediction accuracy of EPED improves as the sample number increases. When the sample number is ≥ 50, both the MAE and the RMSE are below 0.5, which indicates that EPED can make accurate predictions, since $0 \le p_{i,m} \le 1$. However, when the sample number exceeds 1000, the prediction accuracy declines quickly, which means that the performance of EPED does not scale linearly with the sparsity.
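The accuracy formula itself is missing from the extracted text; the sketch below uses the standard definitions of MAE and RMSE averaged over all $N$ components, which are the two metrics the text reports. The helper names are hypothetical. Predictions may be raw DNN outputs in [0, 1] or thresholded 0/1 decisions.

```python
import math


def _pairs(pred, true):
    """Flatten per-sample decision lists into (predicted, best) pairs over all N components."""
    if len(pred) != len(true):
        raise ValueError("prediction and ground truth need the same number of samples")
    pairs = []
    for ps, ts in zip(pred, true):
        if len(ps) != len(ts):
            raise ValueError("each sample needs one prediction per component")
        pairs.extend(zip(ps, ts))
    if not pairs:
        raise ValueError("no components to evaluate")
    return pairs


def mae(pred, true):
    """Mean absolute error between predicted and best decisions."""
    pairs = _pairs(pred, true)
    return sum(abs(p - t) for p, t in pairs) / len(pairs)


def rmse(pred, true):
    """Root mean square error between predicted and best decisions."""
    pairs = _pairs(pred, true)
    return math.sqrt(sum((p - t) ** 2 for p, t in pairs) / len(pairs))


if __name__ == "__main__":
    best = [[1, 1], [0, 1]]
    predicted = [[1, 0], [0, 1]]
    print(mae(predicted, best), rmse(predicted, best))  # 0.25 0.5
```

For 0/1 decisions the MAE equals the fraction of wrong decisions, while the RMSE penalizes large deviations more heavily when raw probabilities are used.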

Experiment implementation
We compared EPED with TOS and ROS in terms of accuracy rate and cost. The accuracy rate is defined as the proportion of correctly offloaded components, where $N_p$ is the number of accurately offloaded components. Figure 3 compares the accuracy rates of EPED, TOS, and ROS, obtained from training datasets with 100, 500, and 1000 samples, respectively. We have verified that the prediction accuracy of EPED improves as the sample number increases; therefore, the DNN can produce more accurate offloading schemes with a larger sample number. Figure 3 also indicates that EPED performs well in most scenarios, whereas the other two methods are hard to improve. This is expected: because the other two methods offload either all components or a random subset, the more samples there are, the more errors they make. Figure 4 compares the cost of EPED, TOS, and ROS under different sample numbers. Compared with the other methods, EPED runs the application with the smallest cost. In addition, EPED's curve has the smallest slope, which indicates that as the prediction scenario becomes more complex, the offloading performance of EPED degrades relatively little.
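The accuracy rate and the two baselines can be sketched as follows, assuming the rate is $N_p / N$ for a single scheme and that ROS offloads each component independently with probability 1/2 (the text says only "randomly"). The function names and the vectors in the demo are illustrative, not the paper's data.

```python
import random


def accuracy_rate(pred, best):
    """N_p / N: share of components whose decision matches the best scheme."""
    if len(pred) != len(best) or not best:
        raise ValueError("need one prediction per component")
    n_p = sum(1 for p, b in zip(pred, best) if p == b)
    return n_p / len(best)


def tos(M):
    """Total offloading scheme: every component goes to the MES."""
    return [1] * M


def ros(M, rng):
    """Random offloading scheme: each component offloaded with probability 1/2 (assumption)."""
    return [rng.randint(0, 1) for _ in range(M)]


if __name__ == "__main__":
    rng = random.Random(0)
    best = [1, 0, 1, 1, 0, 0, 1, 0]  # illustrative best scheme
    eped = [1, 0, 1, 1, 0, 1, 1, 0]  # illustrative thresholded DNN output
    for name, pred in [("EPED", eped), ("TOS", tos(8)), ("ROS", ros(8, rng))]:
        print(name, accuracy_rate(pred, best))
```

Because TOS always outputs all ones, its accuracy rate is simply the fraction of components whose best decision is to offload, so it cannot improve with more training samples.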

Related work
Researchers have made many attempts to improve the performance of mobile device offloading in order to exploit the benefits of the cloud. Some studies have focused on how to offload tasks to the MES effectively with minimal energy and time cost. This section introduces several representative works.
To maximize energy savings, MAUI [17] minimizes the burden on programmers by combining programming reflection, code portability, network costs, serialization, and type safety. In MAUI, applications are prebuilt and executed on HTC smartphones, and the MAUI server is a dual-core desktop running Windows 7 with .NET Framework v3.5. With MAUI, the components of mobile games can be offloaded to a remote cloud server, saving energy for two types of games: 27% of the energy consumption is saved when running computer games and 45% when running chess games. CloneCloud [18] offloads the computation of the resource-intensive components of a mobile application to a more powerful clone created in the cloud. It synchronizes the tasks of mobile devices periodically or on demand to adjust the current offloading scheme. An advantage of CloneCloud is that an offloaded task can even be partitioned into pieces, some of which run locally while the rest run on the server side. A related weakness is that CloneCloud cannot virtualize native resources that are not virtualized or are inaccessible to the clone. It can work together with thread-granularity migration to improve performance. The difference between MAUI and CloneCloud is that the former can only reduce the energy consumption of mobile applications through automatic offloading, whereas CloneCloud can minimize either the execution time or the energy consumption of applications by adaptively determining computation-intensive components. In mobile networks, offloading requests often encounter inaccessible cloud computing resources, which may cause the offloading scheme to fail. COSMOS [19] aims to solve this problem with risk-based offloading: it makes risk-based offloading decisions to reduce the uncertainties caused by variable network environments.
COSMOS offloads tasks from the local device to the server while attempting to keep both energy consumption and the rental fees of cloud resources low. A major problem of offloading in mobile network environments is the high WAN latency caused by an unstable network or by a long distance between the user equipment and the servers. Cloudlets [20] put forward a clever idea: finding cloud resources close to the users, only one router hop away from the user equipment. The cloudlet infrastructure is decentralized, widely dispersed, self-managing, and low in energy consumption. In short, a cloudlet is a predefined cloud consisting of several static stations and is generally established in public domains. However, cloudlets cannot guarantee availability for a nearby … Femtocloud [21] considers how to organize a group of colocated devices to provide a cloud service as edge servers. The proposed femtocloud architecture provides a dynamic, self-configuring mobile cloud that can serve a cluster of mobile devices. A femtocloud system mainly consists of two parts: a highly stable and well-configured controller, and many mobile and unpredictable devices that form a compute cluster performing the computation. Spontaneous proximity cloud (SPC) [22] lets a set of neighboring mobile devices work collaboratively and reduces the high latency between the user equipment and the cloud server through data sharing between different users.

Conclusion
This paper proposed a new method, EPED, that can adaptively select the optimal combination of components to offload, minimizing the cost of execution time and energy consumption. EPED splits an application into multiple components and builds a mathematical model of the cost of each component, considering energy consumption, execution time, and server-side resource consumption. Based on the per-component cost functions, we proposed a mathematical model of the cost function of the final offloading scheme, which evaluates the overall cost of all components. This cost function is not a simple linear sum over all components; it also accounts for the interactions and connections between adjacent components, so that the cost of an offloading scheme can be evaluated accurately. Based on this cost function, we formulated the parameter constraint that yields the smallest cost. We proposed a supervised deep neural network (DNN) to calculate the parameters of the cost function of the final offloading scheme, and we also proposed a method for generating a training dataset.