
# Proactive edge computing in fog networks with latency and reliability guarantees

Mohammed S. Elbamby^{1}, Mehdi Bennis^{1, 3}, Walid Saad^{2}, Matti Latva-aho^{1} and Choong Seon Hong^{3}

*EURASIP Journal on Wireless Communications and Networking* **2018**:209

https://doi.org/10.1186/s13638-018-1218-y

© The Author(s) 2018

**Received:** 26 March 2018 | **Accepted:** 30 July 2018 | **Published:** 20 August 2018

## Abstract

This paper studies the problem of task distribution and proactive edge caching in fog networks with latency and reliability constraints. In the proposed approach, user nodes (UNs) offload their computing tasks to edge computing servers (cloudlets). Cloudlets leverage their computing and storage capabilities to proactively compute and store *cacheable* computing results. To this end, a task popularity estimation scheme and a caching policy are proposed. Furthermore, the problem of distributing UNs’ tasks to cloudlets is modeled as a one-to-one matching game. In this game, UNs whose requests exceed a delay threshold use the notion of hedged requests to enqueue their requests in another cloudlet and offload the task data to whichever cloudlet is available first. A matching algorithm based on deferred-acceptance matching is used to solve this game. Simulation results show that the proposed approach guarantees reliable service with minimal latency, reaching up to 50% and 65% reductions in the average delay and the 99th percentile delay, respectively, as compared to reactive baseline schemes.

## Keywords

- 5G
- Caching
- Fog networks
- IoT
- Hedged requests
- Matching theory
- Offloading
- Resource allocation

## 1 Introduction

Evolving wireless networks to their next generation is no longer limited to boosting network capacity in the context of enhanced mobile broadband (eMBB). Instead, 5G networks aim to provide low-latency and high-reliability guarantees in addition to higher spectral efficiency. This essentially includes the support for massive machine-type communication and ultra-reliable and low-latency communication (URLLC).

The emergence of Internet of things (IoT), which requires massive connectivity of resource-limited devices, poses unprecedented challenges in terms of required amount of data communication and computation [1, 2]. These stringent requirements in end-to-end latency, service reliability, and availability mandate a system design which incorporates latency and reliability aspects. Due to the limited computation and energy capabilities of IoT devices, it is hard to rely on local computing resources in satisfying stringent computing and processing latency requirements. Therefore, mobile cloud computing (MCC) solutions have been considered to offer computing services to IoT network nodes. In this regard, user nodes (UNs) offload their computing tasks to remote cloud centers, the cloud servers execute the task and send back the results. While this solution offers high resource capacity, it falls short of handling latency-critical computing services, owing to the high propagation delays between the UNs and the cloud data center. For this reason, the concept of fog computing has been introduced [3] to bring computing resources to the network edge. With short propagation delays, fog networks are able to strike a balance between high computation capabilities and short propagation distances, offering efficient computing services to latency-intolerant applications. However, smarter resource utilization schemes that efficiently distribute the communication and computing resources are needed to reap the benefits of fog networks [4, 5].

Several recent works have studied the problem of joint computing and communication resource optimization. A centralized joint communication and computation resource allocation scheme is proposed in [6]. The work in [7] studies the resource allocation in mobile edge computing (MEC) networks for energy efficiency. The power-delay tradeoff in centralized MEC systems is discussed in [8] using tools from stochastic optimization. In [9], the problem of latency minimization under a resource utilization constraint is studied, where an algorithm is proposed to balance between minimizing computing latency and energy consumption. However, these works rely on centralized solutions in which the MEC network has information about all users requests and channel-state information (CSI). A semi-distributed solution for the joint computing and communication resource optimization is proposed in [10] under asymmetric user preferences and capabilities. A distributed framework to realize the power-delay tradeoff in latency constrained fog networks is proposed in [11]. Game-theoretic solutions are discussed in [12] to design decentralized computing offloading schemes for edge cloud computing. Game theory is also used in [13] to model the competition of IoT nodes over the fog network resources with the aim of minimizing both energy and delay. The fog network formation under the uncertainty of arriving and departing nodes is investigated in [5, 14, 15] with the aim of minimizing the maximum computing delay.

The vast majority of the literature in fog networking considers the reactive computing paradigm, in which task computing starts only after the task data is offloaded to the fog node [4]. Moreover, most of the prior art has not explicitly accounted for stringent latency and reliability constraints in fog networks. Due to the distributed nature of these networks, having computing resources closer to the network edge allows for providing personalized computing services to end users [4]. This allows for harnessing the correlation between end-user requests through *proactive computing* to minimize computing latency [16]. The idea of *prefetching* was recently introduced as a first step towards proactive computing [17, 18], in which part of the upcoming task data is predicted and prefetched during the computing of the current one such that the fetching time is minimized. Using proactive computing, the fog network can keep track of the popularity patterns of UNs’ tasks and cache their computing results *in advance*. This eliminates the need to request the task data multiple times, thus reducing the burden on the task offloading transmissions [4, 19]. Caching computing results is applicable to several applications. For example, in an augmented reality (AR) service provided in a museum, edge servers can store popular AR requests of visitors to serve them with minimal latency [4]. Similarly, in the context of MEC for vehicle navigation applications [20], correlation between computing tasks exists, as multiple vehicles may request to detect the same object on a road; proactively caching such results allows these requests to be served with minimal latency.

The idea of proactive networks has been extensively studied in the context of wireless content caching [21, 22]. Proactive caching of computing tasks differs from content caching in several aspects. First, while content caching aims to alleviate the burden on the backhaul links by prefetching popular contents from the core network during off-peak times, computing caching decreases the load on the access link by providing computing results to end users without the need to prefetch their task data beforehand. Second, unlike content, computing tasks are of diverse types depending on the computing environment. Some computing tasks, such as object detection results, are *cacheable*, since they can be reused by other devices or in other time instants. On the contrary, computing data that are personalized or time dependent are not cacheable. Instead, they have to be computed in real time. Moreover, future computing networks will have dense deployments of servers and low density of UNs per server [23], which makes it impractical to follow the popularity patterns locally at each server. Alternatively, relying on central computing, either fully or partially, to build the popularity distributions over larger sets of servers is essential in providing a broader view on the popularity patterns of computing tasks.

The main contribution of this paper is to investigate the problem of URLLC in fog networks with proactive edge caching of computing results. Reliability is ensured by allowing UNs to offload their computing tasks to multiple edge computing nodes (cloudlets) based on the notion of *hedged requests* [24]. Furthermore, both computing and storage resources are exploited to minimize the computing latency via joint task offloading and proactive caching of *popular* and *cacheable* computing tasks. In the proposed framework, UNs and their serving cloudlets are grouped into well-chosen clusters based on spatial proximity and mutual interests in popular tasks. This allows cloudlets to proactively cache computing results of popular tasks in their clusters to ensure minimal computing latency. Moreover, the problem of task distribution to cloudlets is modeled as a matching game between cloudlets and UNs. To this end, an efficient distributed matching algorithm is proposed to distribute UN requests to cloudlets, reaching a stable matching in which a reliable service, in terms of the maximum latency violation rate, is guaranteed. Simulation results show that the proposed framework can guarantee reliable computations with bounded latency under different network density and storage capability conditions. Proactiveness is shown to reduce the average delay and the 99th percentile delay by up to 50% and 65%, respectively. Moreover, hedged requests are shown to significantly improve the system reliability even without proactive capabilities.

The rest of this paper is organized as follows. Section 2 describes the system model and problem formulation. The clustering scheme as well as the joint caching and task distribution schemes are studied in Section 3. The performance of the proposed scheme is analyzed in Section 4. Finally, Section 5 concludes the paper.

## 2 System model

We consider a fog network consisting of a set \(\mathcal {E}\) of *E* cloudlets and a set \(\boldsymbol{\mathcal {U}}\) of *U* UNs, which are uniformly distributed over the network area. Each cloudlet has a CPU computing capability of *c*_{e} cycles/s and a storage of size *s*_{e}. Cloudlets share the same frequency band with bandwidth *B* and schedule UNs’ transmissions in orthogonal time instants (slots). Here, our optimization framework focuses only on the uplink transmissions [4, 17]. UNs have computing tasks that arrive following a Poisson process with mean *λ*_{u}. UNs are interested in a set \(\mathcal {A}\) of *A* tasks. Each task has a data size *L*_{a} that is drawn from an exponential distribution of mean \(\bar {L}_{a}\) and a processing density of *κ* cycles per bit of task data. A UN requesting a computing task is matched to one or more cloudlets within its coverage, where coverage is decided based on a threshold path loss value. Path loss is used as a coverage metric so that a UN’s cloudlet list does not change frequently due to wireless channel dynamics. The task data is then offloaded to the cloudlet that schedules the UN request first. Task computing is performed after the task data is offloaded. Subsequently, the computed data, which is typically of small size, is transmitted back to the UN. To minimize the task data offloading delay, cloudlets proactively cache the pre-computed results of the most popular cacheable computing tasks. A subset \(\mathcal {A}_{c}\subset \mathcal {A}\) of the tasks is assumed to be cacheable and another subset \(\mathcal {A}_{nc}\subset \mathcal {A}\) is non-cacheable, such that \(\mathcal {A}_{c}\cup \mathcal {A}_{nc}=\mathcal {A}\). The notations and abbreviations used throughout this paper are summarized in Table 1. An illustration of the studied fog network model is shown in Fig. 1.

**Table 1** List of abbreviations and symbols

| Notation | Description |
|---|---|
| CCDF | Complementary cumulative distribution function |
| CSI | Channel state information |
| DA | Deferred acceptance |
| MCC | Mobile cloud computing |
| MEC | Mobile edge computing |
| QoS | Quality of service |
| TDD | Time division duplex |
| UN | User node |
| URLLC | Ultra-reliable and low latency communication |
| \(\mathcal {A}\) | Set of tasks |
| \(\mathcal {C}_{i}\) | Set of UNs in cluster \(i\) |
| \(\mathcal {E}\) | Set of edge cloudlets |
| \(\mathcal {U}\) | Set of UNs |
| \(B\) | System bandwidth |
| \(c_{e}\) | Computing power of cloudlet \(e\) |
| \(D_{a}(t)\) | Total delay of task \(a\) |
| \(D_{ea}^{\text {comp}}\) | Computing delay of task \(a\) at cloudlet \(e\) |
| \(D_{ea}^{\text {comm}}\) | Transmission delay of task \(a\) to cloudlet \(e\) |
| \(D_{\text {th}}\) | Delay threshold |
| \(L_{a}\) | Size of task \(a\) |
|  | Task occurrence vector of UN |
| \(Q_{e}\) | Task queue of cloudlet \(e\) |
| \(r_{ue}(t)\) | Data rate of UN \(u\) to cloudlet \(e\) |
| \(\bar {r}_{ue}\) | Estimated data rate of UN \(u\) to cloudlet \(e\) |
| \(S_{d}\) | Distance Gaussian similarity matrix |
| \(S_{p}\) | Task popularity similarity matrix |
| \(s_{e}\) | Storage size of cloudlet \(e\) |
| \(\mathbf {v}_{i}\) | Geographical coordinates vector of UN \(i\) |
| \(W_{ea}(t)\) | Waiting time of task \(a\) at cloudlet \(e\) |
| \(x_{ea}\) | Association indicator |
| \(X_{\max }\) | Maximum number of associated cloudlets |
| \(y_{ea}\) | Caching indicator |
| \(z\) | Zipf distribution parameter |
| \(\epsilon \) | Target maximum delay violation |
| \(\sigma _{d}\) | Similarity parameter |
| \(\kappa \) | Task processing density |
| \(\lambda _{u}\) | Mean UN task arrival rate |
| \(\theta \) | Clustering parameter |
| \(\tau _{\text {EP}}\) | Edge processing delay |
| \(\nu (t)\) | Learning rate parameter |
| \(\boldsymbol {\xi }_{i}\) | Popularity task vector of cluster \(i\) |

### 2.1 Computing model

A task *a* of UN *u* offloaded to cloudlet *e* experiences a transmission delay \(D_{ea}^{\text {comm}}\) and a computing delay \(D_{ea}^{\text {comp}}\). The transmission delay at time instant *t* is expressed as

$$ D_{ea}^{\text{comm}}(t)=\frac{L_{a}}{r_{ue}(t)}, \qquad (1) $$

where *r*_{ue}(*t*) is the uplink data rate per time slot, calculated using Shannon’s formula as follows:

$$ r_{ue}(t)=B\log_{2}\left(1+\frac{p_{u}\left|h_{ue}\right|^{2}}{\sigma^{2}}\right), \qquad (2) $$

with *p*_{u} being the UN transmit power, |*h*_{ue}| being the channel gain between UN *u* and cloudlet *e*, and *σ*^{2} being the noise variance, where the channel *h*_{ue} includes path loss and block fading.

The computing delay is proportional to the task size and its processing density, i.e.,

$$ D_{ea}^{\text{comp}}(t)=\frac{\kappa L_{a}}{c_{e}}. \qquad (3) $$

Accordingly, the total delay of task *a* served by cloudlet *e* is

$$ D_{ea}(t)=x_{ea}\left[\left(1-y_{ea}(t)\right)\left(D_{ea}^{\text{comm}}(t)+W_{ea}(t)+D_{ea}^{\text{comp}}(t)\right)+\tau_{\text{EP}}\right], \qquad (4) $$

where *x*_{ea} is a binary variable that equals 1 if task *a* is distributed to cloudlet *e*, *W*_{ea}(*t*) is the waiting time of task *a* due to the previous computing tasks in the queue *Q*_{e} of cloudlet *e*, *y*_{ea}(*t*) is a binary variable that equals 1 when the computation result of task *a* is cached in cloudlet *e*, and *τ*_{EP} is the cloudlet latency, which accounts for the downlink transmission of computed data and the cloudlet processing latency. In other words, only the cloudlet latency is experienced if the requested task result is cached in the cloudlet. Similar to other works [4, 17], we assume that the latency due to downlink transmission of computed data is negligible compared to the uplink task data offloading time and computing time; hence, it is taken into account only as part of the cloudlet delay.

A UN issuing a hedged request enqueues its request at more than one cloudlet, offloads its task data to the cloudlet with the smallest waiting time *W*_{ea}, and discards the request to the other one. Consequently, the delay experienced to compute the task from a hedged request, denoted *D*_{a}(*t*), is the delay value from the cloudlet with the smallest waiting time, which is expressed as

$$ D_{a}(t)=\min_{e:\,x_{ea}=1} D_{ea}(t). \qquad (5) $$

Our objective is to minimize the average task computing delay by optimizing the task distribution and caching matrices X=[*x*_{ea}] and Y=[*y*_{ea}], respectively. Reliability is modeled as a probabilistic constraint on the maximum offloaded computing delay. This optimization problem is

$$ \begin{array}{lll} \underset{\mathbf{X},\mathbf{Y}}{\text{minimize}} & \mathbb{E}\left\{D_{a}(t)\right\} & \text{(6a)}\\[4pt] \text{subject to} & \Pr\left(D_{a}(t)\geq D_{\text{th}}\right)\leq\epsilon, & \text{(6b)}\\[2pt] & \sum_{e\in\mathcal{E}} x_{ea}\leq X_{\max}, & \text{(6c)}\\[2pt] & \sum_{a\in\mathcal{A}_{c}} y_{ea}\leq s_{e}, & \text{(6d)} \end{array} $$

where constraint (6b) ensures that the task delay does not exceed a threshold *D*_{th} with a probability 1−*ε*. Constraint (6c) indicates that a request is offloaded to a maximum of *X*_{max} cloudlets, and (6d) limits the number of cached tasks to a maximum of *s*_{e}. The above problem is a combinatorial problem with a non-convex cost function and probabilistic constraints, for which finding an optimal solution is computationally complex [25]. The non-convexity arises from the service rate expression in Eqs. (1)–(2), which is a function of the offloading decisions of other UNs, and from the non-linear probabilistic constraint. Therefore, we replace the nonlinear constraint in (6b) by a linear one. Markov’s inequality [25], for the non-negative random variable *D*_{a} and *D*_{th}>0, states that \(\Pr \left (D_{a}(t)\geq D_{\text {th}}\right)\leq \mathbb {E}\{D_{a}(t)\}/D_{\text {th}}\), where \(\mathbb {E}\{.\}\) denotes the expectation over time. Hence, constraint (6b) holds whenever \(\mathbb {E}\{D_{a}(t)\}\leq D_{\text {th}}\,\epsilon \). Since the delay of computing a cached task is very small, we are interested in keeping the delay of non-cached tasks below the pre-defined threshold. The constraint can be expressed for a single cloudlet as

$$ \mathbb{E}\left\{D_{ea}^{\text{comm}}(t)+W_{ea}(t)+D_{ea}^{\text{comp}}(t)\right\}\leq D_{\text{th}}\,\epsilon, \qquad (7) $$

where the waiting time

$$ W_{ea}(t)=\sum_{a_{i}\in Q_{e}(t)}\frac{\kappa L_{a_{i}}}{c_{e}} \qquad (8) $$

accounts for the previously admitted tasks *a*_{i} in the queue *Q*_{e} of cloudlet *e* at time instant *t*. Finally, the URLLC constraint can be met by ensuring that the above inequality is satisfied at each time instant *t*, i.e.,

$$ \frac{L_{a}}{\bar{r}_{ue}(t)}+\sum_{a_{i}\in Q_{e}(t)}\frac{\kappa L_{a_{i}}}{c_{e}}+\frac{\kappa L_{a}}{c_{e}}\leq D_{\text{th}}\,\epsilon \qquad (9) $$

for the estimated delay of task *a* of UN *u*, where \(\bar {r}_{ue}(t)\) is the average service rate. Hence, if the constraint in (9) is not met for one cloudlet, the UN can be matched to an additional cloudlet such that the reliability constraint is satisfied. The average service rate is estimated at each cloudlet *e* for each UN *u* within its coverage using a time-average rate estimation method, as follows:

$$ \bar{r}_{ue}(t)=\left(1-\nu(t)\right)\bar{r}_{ue}(t-1)+\nu(t)\,r_{ue}(t), \qquad (10) $$

where *ν*(*t*) is a learning parameter.

###
**Remark 1**

A learning parameter *ν*(*t*) with *ν*(*t*)∈(0,1] for any *t*, such that \(\lim \limits _{t\rightarrow \infty }\nu (t)=0\), ensures that \(\bar {r}_{ue}(t)\) yields the time-average rate of UN *u* as *t*→*∞*.
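As a minimal sketch (not the authors' implementation), the time-average rate estimation with learning parameter *ν*(*t*) can be written recursively as \(\bar{r}(t)=(1-\nu(t))\,\bar{r}(t-1)+\nu(t)\,r(t)\); choosing the diminishing rate *ν*(*t*)=1/*t* (an illustrative choice consistent with Remark 1) makes the estimate exactly the running mean of the observed rates:

```python
# Sketch of the time-average rate estimator with nu(t) = 1/t.
# With this choice, nu(t) is in (0, 1] and vanishes as t grows, so the
# estimate equals the arithmetic mean of the rates observed so far.

def estimate_rates(rates):
    """Return the sequence of estimates r_bar(t) for observed rates r(1..T)."""
    r_bar = 0.0
    estimates = []
    for t, r in enumerate(rates, start=1):
        nu = 1.0 / t                       # learning parameter nu(t)
        r_bar = (1 - nu) * r_bar + nu * r  # recursive time-average update
        estimates.append(r_bar)
    return estimates

rates = [10.0, 14.0, 6.0, 10.0]
print(estimate_rates(rates)[-1])  # 10.0, the mean of the observed rates
```

Other diminishing choices of *ν*(*t*) weight recent observations differently but converge to the same time average.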

In the following section, we propose a joint matching [26] and caching scheme to solve the optimization problem in (6).

## 3 The proposed method

In this section, we discuss our proposed approach to solve the optimization problem in (6). To reduce the computational complexity of the problem, we decouple it into two separate subproblems: task distribution to cloudlets and caching of popular cacheable task results. Since the IoT network size is typically large, adopting centralized optimization schemes over the entire cloudlet set is not practical. On the other hand, local schemes do not have enough information about the popularity of computing tasks due to the low number of UNs per cloudlet. Therefore, our approach resorts to clustering tools to split the network into mutually coupled groups of UNs and their serving cloudlets. This clustering is carried out in a training period that can be repeated on a slow time scale, allowing for a broader view of the popularity patterns. In this regard, the proposed clustering scheme first groups UNs into disjoint sets based on both spatial proximity and mutual interest in popular tasks. Subsequently, a task *popularity matrix* is obtained. Finally, a joint task distribution and caching scheme is proposed. The UN clustering and the popularity matrix calculation are assumed to be carried out in the network training period, during which information about UNs’ requests and their serving cloudlets is reported to a higher-level controller, e.g., a cloud data center. During the training period, UNs are assigned to the cloudlets with the highest channel strength within their coverage area. While a central controller is involved in the training period calculations, we emphasize that this process does not need to be updated as frequently as the task distribution and caching processes, since a given UN’s interests are likely to remain unchanged for a number of time instants *N*_{t} (≫1).

### 3.1 Network clustering and task popularity matrix

The training phase starts by grouping UNs into *k* disjoint clusters \(\mathcal {C}_{1},\ldots,\mathcal {C}_{k}\) based on their mutual-coupling in distance and task popularity. The objective of clustering is to obtain a per-cluster task popularity matrix, defined as Ξ = [ξ_{1},…,ξ_{k}], where ξ_{i} is a vector of the popularity order of tasks in cluster \(\mathcal {\mathcal {C}}_{i}\). Essentially, identifying the similarities between neighboring UNs and their mutual interests is the first step in bringing computing resources closer to them. To that end, we harness the similarity between different UNs in terms of task popularity patterns to allow cloudlets in their proximity to store the computing results of their highly requested tasks. Next, we discuss how the distance similarity and task popularity similarity are calculated.

#### 3.1.1 Distance-based Gaussian similarity

The spatial proximity of UNs is captured by a distance-based Gaussian similarity matrix S_{d} = [*d*_{ij}], with *d*_{ij} being

$$ d_{ij}=\exp\left(-\frac{\left\|\mathbf{v}_{i}-\mathbf{v}_{j}\right\|^{2}}{2\sigma_{d}^{2}}\right), $$

where v_{i} is a vector of the geographical coordinates of UN *i*, and *σ*_{d} is a similarity parameter to control the neighborhood size.

#### 3.1.2 Task popularity-based similarity

The mutual interest of UNs in computing tasks is captured by a task popularity similarity matrix S_{p} = [*p*_{ij}], where *p*_{ij} measures the similarity between the task occurrence vectors of UNs *i* and *j* observed during the training period.

#### 3.1.3 UN clustering and popularity matrix calculation

The overall similarity matrix, *S*, is calculated as

$$ \mathbf{S}=\theta\,\mathbf{S}_{d}+(1-\theta)\,\mathbf{S}_{p}, $$

where *θ* is a parameter that prioritizes the impact of distance and task popularity. Subsequently, spectral clustering [27] is used to group UNs into *k* disjoint clusters, denoted \(\mathcal {C}_{1},\ldots,\mathcal {C}_{k}\).
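To make this step concrete, the following hypothetical sketch builds a Gaussian distance similarity, a popularity similarity (here a cosine similarity between per-UN task request histograms, an assumption since the exact form of *p*_{ij} is not reproduced above), combines them with the clustering parameter *θ* as a convex combination (also an assumed combining rule), and splits the UNs into two groups with a minimal spectral step, the sign of the Fiedler vector of the normalized graph Laplacian, standing in for the full spectral clustering of [27]:

```python
import numpy as np

def gaussian_similarity(coords, sigma_d):
    # S_d[i, j] = exp(-||v_i - v_j||^2 / (2 sigma_d^2))
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma_d ** 2))

def popularity_similarity(requests):
    # cosine similarity between per-UN task request histograms (assumed p_ij)
    norms = np.linalg.norm(requests, axis=1, keepdims=True)
    g = requests / np.maximum(norms, 1e-12)
    return g @ g.T

def spectral_bipartition(S):
    # normalized Laplacian L = I - D^{-1/2} S D^{-1/2}; split by the sign of
    # the eigenvector of the second smallest eigenvalue (Fiedler vector)
    deg = S.sum(1)
    L = np.eye(len(S)) - S / np.sqrt(np.outer(deg, deg))
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)  # cluster labels in {0, 1}

# two spatially separated pairs of UNs with matching task interests
coords = np.array([[0.0, 0.0], [10.0, 5.0], [480.0, 490.0], [495.0, 470.0]])
requests = np.array([[9, 1, 0], [8, 2, 0], [0, 1, 9], [1, 0, 9]], dtype=float)
theta = 0.5
S = theta * gaussian_similarity(coords, sigma_d=100.0) \
    + (1 - theta) * popularity_similarity(requests)
labels = spectral_bipartition(S)
print(labels)  # UNs 0 and 1 land in one cluster, UNs 2 and 3 in the other
```

In the paper's pipeline, the resulting labels would define the clusters \(\mathcal {C}_{1},\ldots,\mathcal {C}_{k}\) over which the per-cluster popularity vectors ξ_{i} are then computed.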

###
**Remark 2**

Spectral clustering of *U* points requires setting a minimum and a maximum number of clusters, *k*_{min} and *k*_{max}, where *k*_{max}≤*U* [27]. Here, we select *k*_{min}=2 in order to have at least two groups of UNs, and *k*_{max}=*U*/2, since a higher number of clusters would imply multiple clusters with a single member, which contradicts the idea of grouping.

To bring the popular tasks closer to the network edge, the task popularity matrix of UN clusters is reported to cloudlets so that they cache the computing result of the most popular tasks. Accordingly, the most preferred cluster by a cloudlet is obtained by calculating how frequently the members of each cluster were assigned to this specific cloudlet during the training period. The vector ξ_{i} of tasks that are most popular for a cluster *i* is reported to the cloudlets that have cluster \(\mathcal {C}_{i}\) as their most preferred cluster. The proposed UN clustering and task popularity matrix calculation is described in Algorithm 1.

### 3.2 Computing caching scheme

Following the training phase, cloudlets seek to minimize the service delay of their UNs’ requests during network operation by proactively caching the computing results of the popular tasks they receive. The caching process is performed in an online manner, on a faster time scale than the clustering and popularity calculation phase. Each cloudlet aims to optimize its caching policy to minimize the total computing latency of its served UNs. The cache storage of each cloudlet is assumed to be empty at the beginning of the network operation. As UNs start to offload their computing tasks, cloudlets will cache as many computing results as their storage capacity allows. Once a cloudlet’s storage is full, a newly arriving request that is more popular than the least popular task currently in the cache will replace it. This is equivalent to a least frequently used (LFU) replacement strategy in content caching. The per-cloudlet implementation is described in Algorithm 2.
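A minimal sketch of such an LFU-style replacement policy (illustrative, not the authors' Algorithm 2; class and function names are hypothetical) is:

```python
# Sketch of a per-cloudlet computing cache: count task requests as a
# popularity estimate, cache results while storage lasts, and evict the
# least frequently used entry only when a more popular task arrives.

from collections import Counter

class ComputingCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = Counter()   # request counts (popularity estimate)
        self.cache = {}           # task id -> cached computing result

    def request(self, task, compute):
        """Serve a task; returns (result, hit)."""
        self.counts[task] += 1
        if task in self.cache:            # cache hit: only cloudlet delay
            return self.cache[task], True
        result = compute(task)            # cache miss: offload and compute
        if len(self.cache) < self.capacity:
            self.cache[task] = result
        else:
            lfu = min(self.cache, key=lambda a: self.counts[a])
            if self.counts[task] > self.counts[lfu]:
                del self.cache[lfu]       # replace least popular entry
                self.cache[task] = result
        return result, False

cache = ComputingCache(capacity=1)
for task in ["a", "a", "b", "a"]:
    result, hit = cache.request(task, compute=lambda t: t.upper())
print(hit)  # True: the final request for "a" is served from the cache
```

The single unpopular request for "b" does not displace the popular "a", which is the intended behavior of the popularity-driven replacement rule.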

Next, if the cloudlet receives a computation request of a task that is cached in its storage, there is no need to offload the task data or recompute the task. Hence, the computing delay of this task consists only of the cloudlet delay, which can be obtained from Eq. (4) by setting (1−*y*_{ea}(*t*))=0 for task *a* cached in cloudlet *e*.

### 3.3 UN task distribution

Our next step is to propose a task distribution scheme that solves the constrained minimization problem in (6). The task distribution problem is formulated as a matching game between UNs and cloudlets where, at each time instant, UNs requesting new tasks are matched to one or more serving cloudlets, depending on their latency requirements, such that their service delay is minimized. Matching theory [26] is a framework for solving combinatorial problems in which members of two sets of players are interested in forming *matching pairs* with players from the opposite set. The preferences of the cloudlets and UNs are denoted as ≻_{e} and ≻_{u}, and they represent how each player on one side ranks the players of the opposite side. From the optimization problem in (6), a UN seeks to be matched to more than one cloudlet if the first one does not satisfy its reliability metric, given the estimated rate \(\bar {r}_{ue}(t)\). Therefore, the proposed matching algorithm first finds a one-to-one matching of UN requests to a single cloudlet each. The estimated delay of each request is then calculated. If a subset of the matched UNs’ requests is not guaranteed to satisfy the reliability constraint, the matching algorithm is run again to match those UNs to additional cloudlets, subject to the availability of vacant cloudlets.

###
**Definition 1**

Given the two disjoint sets of cloudlets and UNs \((\mathcal {E},\mathcal {U}\)), a *matching* is defined as a *one-to-one* mapping *Υ* from the set \(\mathcal {E}\cup \mathcal {U}\) into the set of all subsets of \(\mathcal {E}\cup \mathcal {U}\), such that for each \(e\in \mathcal {E}\) and \(u\in \mathcal {U}\):

- 1.
For each \(u\in \mathcal {U},\Upsilon (u)\in \mathcal {E}\).

- 2.
For each \(e\in \mathcal {E}, \Upsilon (e)\in \mathcal {U}.\)

- 3.
∣*Υ*(*u*)∣=1 and ∣*Υ*(*e*)∣=1.

- 4.
*Υ*(*u*)=*e*⇔*Υ*(*e*)=*u*.

UNs rank cloudlets according to the computing latency they offer. This preference allows UNs to improve their own QoS by selecting the cloudlets that provide minimal computing latency. Note that, due to the impracticality of having knowledge of the queue state of each cloudlet at each UN, UNs consider the transmission and computing delay of their own task data in calculating their preference profiles.

On the other side, each cloudlet *e* calculates its utility of serving a request of UN *u* assigned to cloudlet *e* as

$$ \Phi_{eu}(t)=D_{\text{th}}\,\epsilon-\hat{D}_{ea}(t), $$

where \(\hat {D}_{ea}(t)\) is the estimated delay on the left-hand side of the inequality in (9). In other words, the utility of each cloudlet is to seek a matching that maximizes the difference between the right-hand side and the left-hand side of the inequality in (9), such that the constraint is met as a stable matching is reached. Note from (9) that, to satisfy the reliability constraint, *Φ*_{eu}(*t*) has to be greater than or equal to zero. Therefore, matched UNs with *Φ*_{eu}(*t*)<0 will seek to be matched to additional cloudlets to satisfy their reliability requirements.

Next, we define matching stability and provide an efficient algorithm based on the deferred acceptance (DA) [26] to solve this game.

###
**Definition 2**

Given a matching *Υ*, and a pair \((u^{\prime },e^{\prime })\) with \(\Upsilon (e^{\prime })\neq u^{\prime }\) and \(\Upsilon (u^{\prime })\neq e^{\prime }\), \((u^{\prime },e^{\prime })\) is said to be blocking the matching *Υ* and form a blocking pair if: 1) \(u^{\prime }\succ _{e^{\prime }}\Upsilon (e^{\prime })\), 2) \(e^{\prime }\succ _{u^{\prime }}\Upsilon (u^{\prime })\). A matching *Υ*^{∗} is stable if there is no blocking pair.

###
**Remark 3**

The DA algorithm, described in Algorithm 3, converges to a two-sided stable matching of UNs to cloudlets [26].
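For illustration, a generic one-to-one deferred-acceptance (Gale–Shapley) routine is sketched below; the preference lists are arbitrary placeholders rather than the delay-based utilities of Section 3.3:

```python
# Generic deferred acceptance: UNs propose down their preference lists;
# each cloudlet tentatively holds its best proposer and rejects the rest.

def deferred_acceptance(un_prefs, cloudlet_prefs):
    """un_prefs: dict UN -> ranked list of cloudlets (best first).
       cloudlet_prefs: dict cloudlet -> ranked list of UNs (best first).
       Returns a stable one-to-one matching as a dict cloudlet -> UN."""
    rank = {e: {u: i for i, u in enumerate(p)} for e, p in cloudlet_prefs.items()}
    next_prop = {u: 0 for u in un_prefs}  # next cloudlet each UN proposes to
    match = {}                            # cloudlet -> tentatively held UN
    free = list(un_prefs)
    while free:
        u = free.pop()
        if next_prop[u] >= len(un_prefs[u]):
            continue                      # u has exhausted its list
        e = un_prefs[u][next_prop[u]]
        next_prop[u] += 1
        if e not in match:
            match[e] = u                  # e was vacant: hold u
        elif rank[e][u] < rank[e][match[e]]:
            free.append(match[e])         # e prefers u: previous UN rejected
            match[e] = u
        else:
            free.append(u)                # u rejected, will propose again
    return match

match = deferred_acceptance(
    {"u1": ["e1", "e2"], "u2": ["e1", "e2"]},
    {"e1": ["u2", "u1"], "e2": ["u1", "u2"]},
)
print(match)  # {'e1': 'u2', 'e2': 'u1'}
```

In the resulting matching, no UN and cloudlet prefer each other over their assigned partners, i.e., there is no blocking pair in the sense of Definition 2.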

## 4 Simulation results

The proposed scheme is evaluated against the following baseline schemes:

- 1.
*Baseline 1 (BL1)*, which is similar to the proposed scheme, in which hedged requests are enabled to satisfy UNs’ reliability. However, caching is performed by estimating the popularity distribution locally at each cloudlet and considering a least recently used (LRU) caching replacement strategy.

- 2.
*Baseline 2 (BL2)*, which is a reactive version of the proposed scheme, in which hedged requests are enabled to satisfy UNs’ reliability, but with no caching or proactive capabilities in the cloudlets.

- 3.
*Baseline 3 (BL3)*, in which neither proactiveness nor hedged requests are considered. UNs and cloudlets rank each other based on the wireless access link quality, and a UN can only be matched to a single cloudlet.

We use the set of default parameters listed in Table 2 unless stated otherwise. Popularity is assumed to vary among tasks following a Zipf-like popularity model with parameter *z* [21]. Accordingly, the rate of requesting the *i*th most popular task is proportional to 1/*i*^{z}. Three different sets of task popularity distributions are assigned randomly to UNs. Furthermore, one third of the tasks, uniformly selected, is assumed to be cacheable, whereas the remaining tasks have to be computed in real time^{1}.
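As a sketch of this request model (illustrative, not the simulator itself), task requests can be sampled with probabilities proportional to 1/*i*^{z}:

```python
# Sample task requests from a Zipf-like popularity model: the i-th most
# popular task (1-indexed) is requested with probability proportional
# to 1 / i**z.

import random

def zipf_probs(n_tasks, z):
    weights = [1.0 / (i ** z) for i in range(1, n_tasks + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def sample_task(probs, rng):
    # returns the 0-based index of the requested task
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

probs = zipf_probs(n_tasks=100, z=0.6)
rng = random.Random(0)
requests = [sample_task(probs, rng) for _ in range(10000)]
# the most popular task (index 0) is requested far more often than the
# least popular one (index 99)
```

Larger *z* concentrates the requests on fewer tasks, which is exactly the effect studied in Section 4.4.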

### 4.1 Delay tail performance

### 4.2 Impact of network density

Comparing the BL2 and BL3 schemes, we can clearly see the trade-off between ensuring a reliable service and maintaining a low delay. Under low network density conditions, there are not enough available cloudlets for each UN. Accordingly, boosting reliability through hedged requests costs an increase in the system load, which in turn increases the average latency. However, as the number of cloudlets increases, hedged requests are shown to minimize both the average delay and the 99th percentile delay. This highlights the advantage of hedged requests in computing networks, especially in the absence of proactive capabilities. Moreover, the gains brought by proactiveness reach up to 50% and 65% in the average and the 99th percentile delay, respectively.

### 4.3 Impact of caching capability

### 4.4 Impact of popularity distribution

Figure 7 shows the impact of the task popularity distribution through the Zipf parameter *z*. The Zipf parameter reflects the discrepancy level of the task popularity distribution. When the discrepancy level increases (higher *z*), the popularity gap between the most and least popular tasks increases. Hence, at high values of *z*, the likelihood of serving a request from the cache increases as compared to lower values, for the same storage size. Accordingly, we can see from Fig. 7 that as *z* increases, the cache hit rate (defined as the ratio of requests served from the cache) increases. As a result, the 99th percentile delay decreases.

## 5 Conclusions

In this paper, a proactive computing and task distribution scheme for ultra-reliable and low-latency fog computing networks has been introduced. In the proposed approach, clusters of cloudlets and edge user nodes are formed based on spatial proximity and similar interests in computing results. Each cluster proactively caches computing results in the storage of its cloudlets to minimize the computing latency. Moreover, a matching algorithm to distribute the computing tasks to cloudlets is presented. The matching algorithm allows latency-critical requests to be matched to multiple cloudlets such that their latency constraints are met. Simulation results have shown that the proposed approach strikes a balance between minimizing the service latency and maximizing the service reliability under different network conditions.

**Table 2** Simulation parameters

| Parameter | Value |
|---|---|
| System bandwidth *B* | 10 MHz |
| Number of cloudlets | 30 |
| UN density | 3 |
| Maximum associated cloudlets *X*_{max} | 2 |
| Mean UN task arrival rate *λ*_{u} | 10 task/s |
| Number of tasks \(|\mathcal {A}|\) | 100 |
| Cloudlet storage size *s*_{e} | 10 tasks |
| Mean task size \(\bar {L}_{a}\) | 200 Kbit |
| Processing density *κ* | 162.5 cycle/bit [28] |
| Cloudlet computing power *c*_{e} | 10 |
| Zipf parameter *z* | 0.6 |
| Delay threshold *D*_{th} | 1 s |
| Target delay violation *ε* | 0.01 |
| Clustering parameter *θ* | 0.5 |
| Processing delay *τ*_{EP} | \(\text {Unif}\left (\frac {1}{8},\frac {1}{4}\right)\) ms |
| Neighborhood size parameter \(\sigma _{d}^{2}\) | 500 m |
| Scheduling slot | 1 ms |
| Network size | 500×500 |
| UN transmit power | 10 dBm |
| Learning parameter *ν*(*t*) | 1/*t* |

## Declarations

### Funding

This research was supported by the Academy of Finland (CARMA) project, NOKIA donation on fog (FOGGY project), and by the US Office of Naval Research (ONR) under Grant N00014-15-1-2709.

### Availability of data and materials

The paper is self-contained. The simulation setup and parameters are described in detail in Section 4.

### Authors’ contributions

All authors have contributed to this manuscript and approved the submitted manuscript.

### Competing interests

The authors declare that they have no competing interests.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


## References

1. Z Dawy, W Saad, A Ghosh, JG Andrews, E Yaacoub, Toward massive machine type cellular communications. IEEE Wirel. Commun. **24**(1), 120–128 (2017). https://doi.org/10.1109/MWC.2016.1500284WC
2. M Mozaffari, W Saad, M Bennis, M Debbah, Unmanned aerial vehicle with underlaid device-to-device communications: performance and tradeoffs. IEEE Trans. Wirel. Commun. **15**(6), 3949–3963 (2016). https://doi.org/10.1109/TWC.2016.2531652
3. F Bonomi, R Milito, J Zhu, S Addepalli, in *Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing*. Fog computing and its role in the internet of things (ACM, New York, 2012), pp. 13–16. https://doi.org/10.1145/2342509.2342513
4. Y Mao, C You, J Zhang, K Huang, KB Letaief, A survey on mobile edge computing: the communication perspective. IEEE Commun. Surv. Tutorials **PP**(99), 1–1 (2017). https://doi.org/10.1109/COMST.2017.2745201
5. G Lee, W Saad, M Bennis, An online optimization framework for distributed fog network formation with minimal latency. ArXiv e-prints (2017). http://arxiv.org/abs/1710.05239
6. S Barbarossa, S Sardellitti, P Di Lorenzo, in *2013 IEEE 14th Workshop on Signal Processing Advances in Wireless Communications (SPAWC)*. Joint allocation of computation and communication resources in multiuser mobile cloud computing (2013), pp. 26–30. https://doi.org/10.1109/SPAWC.2013.6612005
7. C You, K Huang, H Chae, BH Kim, Energy-efficient resource allocation for mobile-edge computation offloading. IEEE Trans. Wirel. Commun. **16**(3), 1397–1411 (2017). https://doi.org/10.1109/TWC.2016.2633522
8. Y Mao, J Zhang, SH Song, KB Letaief, in *Proc. IEEE Global Commun. Conf. (GLOBECOM)*. Power-delay tradeoff in multi-user mobile-edge computing systems (2016), pp. 1–6. https://doi.org/10.1109/GLOCOM.2016.7842160
9. YH Kao, B Krishnamachari, MR Ra, F Bai, Hermes: latency optimal task assignment for resource-constrained mobile computing. IEEE Trans. Mob. Comput. **16**(11), 3056–3069 (2017). https://doi.org/10.1109/TMC.2017.2679712
10. X Lyu, H Tian, C Sengul, P Zhang, Multiuser joint task offloading and resource optimization in proximate clouds. IEEE Trans. Veh. Technol. **66**(4), 3435–3447 (2017). https://doi.org/10.1109/TVT.2016.2593486
11. C Liu, M Bennis, HV Poor, in *2017 IEEE Globecom Workshops (GC Wkshps)*. Latency and reliability-aware task offloading and resource allocation for mobile edge computing (2017), pp. 1–7. https://doi.org/10.1109/GLOCOMW.2017.8269175
12. X Chen, L Jiao, W Li, X Fu, Efficient multi-user computation offloading for mobile-edge cloud computing. IEEE/ACM Trans. Networking **24**(5), 2795–2808 (2016). https://doi.org/10.1109/TNET.2015.2487344
13. H Shah-Mansouri, VWS Wong, Hierarchical fog-cloud computing for IoT systems: a computation offloading game. IEEE Internet of Things Journal, 1–1 (2018). https://doi.org/10.1109/JIOT.2018.2838022
14. G Lee, W Saad, M Bennis, in *2017 IEEE International Conference on Communications (ICC)*. An online secretary framework for fog network formation with minimal latency (2017), pp. 1–6. https://doi.org/10.1109/ICC.2017.7996574
15. G Lee, W Saad, M Bennis, in *2017 IEEE Fog World Congress (FWC)*. Online optimization for low-latency computational caching in fog networks (2017), pp. 1–6. https://doi.org/10.1109/FWC.2017.8368529
16. MS Elbamby, C Perfecto, M Bennis, K Doppler, Toward low-latency and ultra-reliable virtual reality. IEEE Netw. **32**(2), 78–84 (2018). https://doi.org/10.1109/MNET.2018.1700268
17. SW Ko, K Huang, SL Kim, H Chae, Live prefetching for mobile computation offloading. IEEE Trans. Wirel. Commun. **PP**(99), 1–1 (2017). https://doi.org/10.1109/TWC.2017.2674665
18. SW Ko, K Huang, SL Kim, H Chae, in *2017 IEEE International Conference on Communications (ICC)*. Energy efficient mobile computation offloading via online prefetching (2017), pp. 1–6. https://doi.org/10.1109/ICC.2017.7997341
19. MS Elbamby, M Bennis, W Saad, in *2017 European Conference on Networks and Communications (EuCNC)*. Proactive edge computing in latency-constrained fog networks (2017), pp. 1–6. https://doi.org/10.1109/EuCNC.2017.7980678
20. Q Hu, C Wu, X Zhao, X Chen, Y Ji, T Yoshinaga, Vehicular multi-access edge computing with licensed sub-6 GHz, IEEE 802.11p and mmWave. IEEE Access **6**, 1995–2004 (2018). https://doi.org/10.1109/ACCESS.2017.2781263
21. E Baştuğ, M Bennis, M Debbah, Living on the edge: the role of proactive caching in 5G wireless networks. IEEE Commun. Mag. **52**(8), 82–89 (2014). https://doi.org/10.1109/MCOM.2014.6871674
22. MS Elbamby, M Bennis, W Saad, M Latva-aho, in *Proc. 11th Intl. Symp. on Wireless Communications Systems (ISWCS)*. Content-aware user clustering and caching in wireless small cell networks (2014), pp. 945–949. https://doi.org/10.1109/ISWCS.2014.6933489
23. A Anpalagan, M Bennis, R Vannithamby, *Design and Deployment of Small Cell Networks* (Cambridge University Press, UK, 2015)
24. J Dean, LA Barroso, The tail at scale. Commun. ACM **56**(2), 74–80 (2013). https://doi.org/10.1145/2408776.2408794
25. A Mukherjee, in *IEEE Globecom Workshops (GC Wkshps)*. Queue-aware dynamic on/off switching of small cells in dense heterogeneous networks (2013), pp. 182–187. https://doi.org/10.1109/GLOCOMW.2013.6824983
26. Y Gu, W Saad, M Bennis, M Debbah, Z Han, Matching theory for future wireless networks: fundamentals and applications. IEEE Commun. Mag. **53**(5), 52–59 (2015). https://doi.org/10.1109/MCOM.2015.7105641
27. J Cranshaw, R Schwartz, JI Hong, N Sadeh, in *Proc. International AAAI Conference on Weblogs and Social Media*. The livehoods project: utilizing social media to understand the dynamics of a city (2012), p. 58
28. AP Miettinen, JK Nurminen, in *Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing (HotCloud'10)*. Energy efficiency of mobile clients in cloud computing (USENIX Association, Berkeley, 2010), pp. 4–4. http://dl.acm.org/citation.cfm?id=1863103.1863107
29. X Ma, C Lin, X Xiang, C Chen, in *Proceedings of the 18th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems (MSWiM '15)*. Game-theoretic analysis of computation offloading for cloudlet-based mobile cloud computing (ACM, New York, 2015), pp. 271–278. https://doi.org/10.1145/2811587.2811598