An IoT-based task scheduling optimization scheme considering the deadline and cost-aware scientific workflow for cloud computing

Large-scale applications of the Internet of things (IoT), which require considerable computing and storage resources, are increasingly deployed in cloud environments. Compared with traditional computing models, cloud characteristics such as pay-as-you-go pricing, unlimited expansion, and dynamic acquisition offer considerable convenience to applications built on the IoT architecture. One of the major challenges is to satisfy quality-of-service requirements while assigning resources to tasks. In this paper, we propose a deadline and cost-aware scheduling algorithm that minimizes the execution cost of a workflow under deadline constraints in the infrastructure as a service (IaaS) model. Considering virtual machine (VM) performance variation and acquisition delay, we first divide tasks into different levels according to the topological structure so that no dependency exists between tasks at the same level. Three strings are used to encode the genes in the proposed algorithm to better reflect the heterogeneous and elastic characteristics of cloud environments. Then, the heterogeneous earliest finish time (HEFT) heuristic is used to generate individuals with the minimum completion time and cost. Novel schemes are developed for crossover and mutation to increase the diversity of the solutions. Based on this process, a task scheduling method that considers cost and deadlines is proposed. Experiments on workflows that simulate the structured tasks of the IoT demonstrate that our algorithm achieves a high success rate and performs well compared with state-of-the-art algorithms.


Introduction
With the rapid development of science and technology, scientific applications and everyday services have become inseparable from large-scale computing. Due to the geometric growth of information and the complexity of data processing, researchers in most disciplines face more challenges and opportunities than ever [1]. Many scientific applications, such as the Internet of things (IoT), gene sequencing, and earthquake prediction, are becoming increasingly dependent on high-performance computing and distributed storage. In particular, the IoT, which depends on a reliable computing environment, has been widely utilized in various fields [2]. As an innovative application mode, the IoT uses sensing, computing, and communication technologies to connect massive numbers of devices and achieve interaction with the real world. In practical applications, the IoT obtains various information through perception and realizes intelligent management of different objects by sharing and exchanging data in real time. However, due to device-level limitations in CPU, memory, and battery capacity, IoT applications are constrained when completing tasks that require substantial computing resources and energy at the device end [3]. At the same time, analysis and decision-making based on big data at the infrastructure level of the IoT are difficult.
Meanwhile, cloud computing, a new computing architecture, has been widely applied in recent years. Cloud computing is a model that provides a flexible, heterogeneous resource pool through the network, from which users can rent different resources on demand. These configurable computing resources are maintained by cloud providers and can be rapidly provisioned and released [4]. Users procure and release computing resources, usually virtual machines (VMs) with different specifications, according to their own needs within a certain time. As two Internet-based technologies, the IoT and cloud computing are closely related in terms of their roles: the IoT intelligently manages equipment and digitalizes various information, while cloud computing serves as a carrier for high-speed data processing, storage, and utilization. Cloud computing provides the speed, convenience, and security that the IoT lacks, making real-time dynamic management and intelligent analysis in the IoT more reliable.
Based on the infrastructure as a service (IaaS) model of cloud computing, this paper studies the task scheduling of IoT applications in the cloud environment. The data and processing requirements of the IoT are expressed as applications that are usually represented as workflows, which consist of computing tasks and data edges generally organized as directed acyclic graphs (DAGs) [5]. Although implementing IoT applications in cloud computing has various advantages, some issues remain to be addressed. First, workflow scheduling, which refers to the assignment of tasks to virtual resources according to their execution sequence [7], is regarded as an NP-hard problem [6]. Second, when a computing resource (VM) is leased or released, a proper handoff takes time. Since cost is an important indicator when deploying workflows on cloud platforms [8], the lease time of a VM must be considered because it affects the cost and deadline of the workflow. Third, the performance of a VM can vary because of virtualization, hardware instability, the multiuser environment, and other factors. The overall CPU performance variation of a VM can be up to 24% in the Amazon public cloud, and the performance variation of a cloud can reach 30% in terms of makespan [9,10]. The performance variation of VMs increases the execution makespan and cost of the workflow, leading to violations of quality of service (QoS) in the cloud.
Based on the advantages and issues discussed above, we propose a deadline and cost-aware genetic algorithm (DCGA) for workflow scheduling in IoT applications. The presented solution improves the meta-heuristic genetic algorithm (GA) [11] by using new heuristic methods to schedule each task to execute on an appropriate VM. First, the workflow scheduling model in a cloud environment is constructed. Then, the optimization goal and the major conceptual definitions are formalized. Next, the proposed algorithm is described in detail stage by stage, and the key steps, such as encoding, crossover, and mutation, are illustrated with examples. Finally, according to the characteristics of VMs in Amazon cloud services, a simulation platform is built to evaluate the performance of the proposed algorithm.
The main contributions of this paper are as follows:
(1) Most characteristics of the cloud are considered, including on-demand acquisition, dynamic extension, heterogeneity, acquisition delay, performance variation of VMs, and pay-as-you-go pricing.
(2) Novel schemes are designed for population initialization, selection, crossover, and mutation in the genetic algorithm, which improve the search performance in the solution space.
(3) State-of-the-art algorithms and the proposed algorithm are compared on the simulation platform using different workflow specifications.
The remainder of this paper is organized as follows. Section 2 surveys and analyses related work. Section 3 presents the structure of the platform and the problem of scheduling. Then, the DCGA scheduling algorithm is proposed in Section 4, and empirical studies are presented in Section 5. Finally, Section 6 concludes the paper and discusses future work.

Related work
The workflow in IoT applications is usually represented as a DAG, in which the vertices of the graph represent tasks and the edges represent the dependencies and data transmission between tasks. Moreover, workflow scheduling can be separated into two stages: task selection and resource allocation. First, the task selection phase determines the sequence of tasks according to the constraints in the workflow [12]. Tasks can be grouped in different ways as long as the precedence dependencies are not violated; therefore, the task execution order can be arranged in many ways. Then, the resource allocation phase chooses the number and type of VMs to execute the tasks [13]. In this phase, the number of VMs in the cloud is regarded as unlimited: a task can run on only one VM, and a VM can run only one task at a time.
Workflow scheduling in distributed systems has been widely studied. Yu et al. [14] propose a cost-based workflow scheduling algorithm for utility grids. The algorithm minimizes the cost of execution while meeting a deadline. They use a Markov decision process approach to schedule tasks and reschedule unexecuted tasks to adapt to delays in service execution that may violate the deadline constraints. In another work, Yu et al. [15] use multi-objective evolutionary algorithms to obtain a set of feasible solutions after the user defines the deadline and budget constraints. The main objective is to let the user choose the best solution from multiple possible solutions. In [16], Sakellariou et al. implement an algorithm to schedule DAGs on heterogeneous machines under budget constraints. The algorithm computes the weight value of each task and machine via two approaches and then uses the weights to assign tasks to machines in consideration of the cost and budget. Using the concepts of game theory and the sequential cooperative game, Duan et al. provide two algorithms (game-quick and game-cost) to optimize performance and cost in [17]. In addition, they design and implement a novel system model with better controllability and predictability for multi-workflow optimization problems in a grid environment. Chen et al. propose an ant colony optimization algorithm to schedule large-scale workflows with three QoS parameters in a grid computing scenario in [18]. The algorithm enables users to specify two of the constraints and finds an optimized solution for the third constraint while meeting the other two. These various algorithms are designed for users in a distributed environment, and the cost is usually based on all services used.
With the rise and development of cloud computing, research has increasingly turned to this field. Unlike in grid and cluster computing, task scheduling in cloud computing must not only consider the makespan but also reduce the cost, which is based on lease intervals. Hoffa et al. [19] compare the performance of running the Montage workflow on various types of resources, including both physical and virtual environments. They note that virtual environments can provide the necessary scalability, while local environments are sufficient but not scalable. Byun et al. [20] propose partitioned balanced time scheduling for executing tasks on the minimum resources under a given deadline. The algorithm associates the workflow management system with the resource provisioning environment to minimize the execution cost of a workflow. Their approach takes advantage of the elasticity, but ignores the heterogeneity, of cloud resources. Balanced time scheduling (BTS) is proposed in [21] for estimating the minimum number of resources needed to execute tasks under a deadline. The BTS algorithm uses list scheduling to delay a task until its latest start time. According to the locally optimal time of each task, tasks are allocated to the same host as much as possible to minimize the resources used in workflow scheduling. However, the researchers consider only VMs of a single type in the cloud environment, and the network contention that influences the start time of a task is ignored. In [22], Malawski et al. address the scheduling problem and resource provisioning for workflows on IaaS. Their algorithms aim to maximize the number of workflows executed under QoS constraints, such as deadline and budget. However, the execution time of a task on a VM is regarded as a constant. Abrishami et al. [23] propose a static algorithm named IaaS cloud partial critical path (IC-PCP) to schedule tasks on IaaS. First, the algorithm calculates the latest finish time of each task and then assigns the tasks on a partial critical path to the least expensive VM instance that can complete them before their deadlines. If the available VMs cannot satisfy the deadline constraints, the algorithm creates a new VM instance that meets the conditions to execute all the tasks. This process is repeated until all the tasks are scheduled. However, the algorithm uses task-level optimization instead of global optimization and may fail to generate a better solution because it does not consider the entire workflow structure and characteristics.
Due to the uncertainty in the number and scale of tasks and resources in cloud computing, metaheuristic algorithms such as ant colony optimization (ACO), particle swarm optimization (PSO), and the GA have been applied to scheduling problems. These methods obtain near-optimal solutions via iteration. Although the complexity of these algorithms is greater than that of heuristic and traditional algorithms, they have attracted substantial attention due to their good performance. Pandey et al. [24] try to achieve load balancing between VMs using a PSO-based algorithm that minimizes the execution cost of tasks. The researchers obtain the total cost of a single workflow by varying the communication and execution costs in the resource pool. However, the makespan of the workflow, which is ignored in the scheduling, may violate the deadline constraint when the cost-minimization policy is used. Yao et al. [25] design an improved multi-objective PSO algorithm called endocrine-based coevolutionary multi-swarm for multi-objective optimization (ECMSMOO) to schedule workflows in the cloud. An evolutionary strategy inspired by the endocrine regulation mechanism is designed for every particle to optimize different objectives, such as cost, makespan, and energy. The algorithm then uses competition and cooperation among swarms to avoid falling into a local optimum. However, the experiments are based on a limited set of VMs, which fails to take advantage of the infinite extensibility of the cloud. In [26], a PSO-based algorithm is proposed for cloud computing. Based on the theory of multi-core processors, the method assigns tasks to processors via certain rules in a dynamic scheduling strategy. The algorithm focuses on the utilization and bandwidth of resources in the cloud environment without considering the cost and deadline, which are more important factors for users. Wu et al. [27] propose two algorithms named ProLis and L-ACO to minimize the execution cost under the deadline constraint of workflow scheduling in the cloud. The ProLis algorithm distributes the deadline over the tasks, ranks the tasks, and sequentially allocates each task to satisfy the QoS. Based on this, L-ACO employs ACO to construct different task order lists and solutions that minimize the cost under the deadline. However, the start-up time and performance variation of VMs are not considered.
In recent years, increasing research has focussed on the characteristics of the cloud environment, such as the indefinite quantity, heterogeneity, performance variation, and acquisition delay of VMs, yet few of these characteristics have been fully considered in previous studies. Mao et al. [28] present a new auto-scaling mechanism named scaling-consolidation-scheduling (SCS) to allocate all workflow tasks to the most cost-efficient VMs. The researchers consider VM characteristics such as performance variation and acquisition delay but ignore the data transfer time between tasks, which affects the completion time and total execution cost of the workflow. Rodriguez et al. [29] propose a static cost-minimization and deadline-constrained algorithm for workflow scheduling in a cloud environment. Based on the main characteristics of IaaS, the researchers model resource provisioning and scheduling together as an optimization problem and use PSO to generate a solution that minimizes the lease cost of VMs before the deadline. However, the resource index used to encode particles carries little information about VM types, so the algorithm cannot easily produce the global optimal solution when particles move toward the individual best solution. Poola et al. [30] propose robustness-cost-time (RCT), robustness-time-cost (RTC), and weighted algorithms to schedule workflow tasks on heterogeneous resources in the cloud. These algorithms, based on partial critical paths (PCPs), provide a robust and fault-tolerant schedule while minimizing the total makespan and cost. Moreover, the three policies target different objectives, robustness, time, and cost, and each objective has a different priority in each algorithm. However, these algorithms schedule all the tasks of a PCP together on VMs, which may affect the makespan of the workflow because the performance variation of a VM delays the completion time of all the tasks assigned to it, thereby affecting the start time of the tasks on other PCPs. Sahni et al. [31] propose a dynamic cost-effective deadline-constrained heuristic algorithm called just in time (JIT-C) for the workflow scheduling problem. The algorithm aims to determine a feasible solution in which resources are provisioned immediately before tasks are ready to execute. Compared with the similar methods in [23,30], this algorithm consumes more time in the pre-processing and monitoring control loop. To address the characteristics of task allocation in the cloud, Arabnejad et al. [32] introduce the budget deadline aware scheduling (BDAS) algorithm for workflow scheduling in IaaS clouds. A tunable cost-time tradeoff is proposed to satisfy both budget and deadline constraints. The researchers use the deadline bottom level (DBL) and deadline top level (DTL) to allocate tasks to different levels, and the budget and deadline are proportionally distributed to each level. The algorithm then assigns each task to an instance in consideration of a tradeoff between makespan and cost. In the experiments, the authors adopt a 97-s boot time for resource allocation but ignore the performance variation of VMs, which is one of the most important factors affecting the deadline and cost.
According to these studies, the characteristics of task allocation in the cloud have become an indispensable concern in scheduling problems. Meanwhile, the meta-heuristic GA is often applied to the problem because of its time efficiency [33], and different improvements of the GA to solve the workflow scheduling problem have been studied in [34][35][36][37]. On the basis of these studies, this paper proposes a DCGA for workflow scheduling in the cloud while considering the characteristics of the cloud, such as on-demand acquisition, dynamic extension, heterogeneity, acquisition delay, and performance variation of VMs. The DCGA uses a novel method for population initialization, crossover, and mutation to minimize the total execution cost under a deadline constraint.

Application model
The workflow of an IoT application can be represented as a DAG G = (T, E), where T = {t_1, t_2, t_3, ..., t_n} is a set of vertices representing the tasks and E = {e_ij | t_i, t_j ∈ T} is a set of edges denoting the control dependencies between tasks. Each task t_i represents an indivisible transaction with a certain computation workload (w_i). A dependency e_ij with transfer data (data_ij) is a precedence constraint between task t_i and task t_j. This constraint implies that the child task t_j can start only after the parent task t_i is finished and all associated data are transferred to t_j. Each task may have one or more parents (predecessors) or children (successors), and a child task cannot be executed unless all its parent tasks have been completed and all the transfer data have been received.
The workflow starts with the entry tasks and concludes with the exit tasks. Thus, we add two dummy tasks with zero execution and communication time to the DAG when there is more than one entry or exit task. The two tasks are denoted t_entry and t_exit, respectively. In addition, we define D as the deadline constraint for completing the execution of the workflow, and an elastic factor, introduced in the experimental section, is used to adjust the limit time bounded by D. More details are provided in Section 4, and a sample workflow is illustrated in Fig. 1.
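As a minimal illustration of the dummy-task construction described above, the following sketch (not the paper's implementation; the dictionary layout and example names are our own) adds a single t_entry and t_exit with zero workload and zero transfer data:

```python
# Sketch: normalise a DAG so it has exactly one entry and one exit task.
# tasks: {name: workload}; edges: {(parent, child): data_size}.

def add_dummy_tasks(tasks, edges):
    has_parent = {c for (_, c) in edges}   # tasks that appear as a child
    has_child = {p for (p, _) in edges}    # tasks that appear as a parent
    entries = [t for t in tasks if t not in has_parent]
    exits = [t for t in tasks if t not in has_child]
    tasks = dict(tasks)                    # copy: leave the caller's DAG intact
    edges = dict(edges)
    tasks["t_entry"] = tasks["t_exit"] = 0           # zero workload
    for t in entries:
        edges[("t_entry", t)] = 0                    # zero transfer data
    for t in exits:
        edges[(t, "t_exit")] = 0
    return tasks, edges
```

After this step, every schedule can treat t_entry as the unique start and FT(t_exit) as the workflow makespan.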
For the workflow shown in Fig. 1, different task allocation schemes lead to different execution times and costs. For example, as shown in Fig. 2, this sample workflow is scheduled on three different VMs. In schedule plan 1, the data transfer time delays the start times of t_6 and t_7, which causes more idle time on V_2 and V_3. Thus, the total execution time and cost of the workflow are both increased. In contrast, in schedule plan 2, the strategy of assigning tasks with large data transmission to the same VM effectively reduces the waiting time of tasks. Thus, by improving the utilization of the VMs, the total execution time and cost of the workflow are greatly reduced.
The execution time and cost of a workflow can thus differ greatly across scheduling schemes. This effect becomes more pronounced and complex as the number of tasks increases in a heterogeneous cloud environment. Therefore, a good workflow scheduling solution should not only reduce the time and cost of the workflow but also improve resource utilization in the cloud environment.

System model
We adopt the IaaS cloud as the system model to support workflow scheduling in this article. The IaaS service model provides various computing resources in the form of a set of VMs, VM = {vm_1^2, vm_2^3, vm_3^1, ..., vm_m^k}, used to execute the tasks of the workflow. Each vm_m in the VM set belongs to a VM type (vm^k) provided by the cloud provider. These VM types have different CPU types (processor^k) with different costs (cost_vm^k) per time interval (time_interval). Therefore, VM types with faster computing performance usually have higher costs. The pricing model follows a pay-as-you-go billing scheme, and users are charged for the number of time intervals within a VM lease period. Partial utilization is rounded up to a full time unit; that is, when 1 h is the minimum time interval, the user must pay for the entire hour even if the VM is used for only a few minutes. Moreover, data transfer is free in most cloud environments, so the corresponding cost is not considered in the model. Additionally, a workflow is assumed to be able to lease an unlimited number of VMs. The initial boot time (init_time) is the VM acquisition delay between when a VM is leased and when it becomes available to execute a task. The performance variation (per_vary) of VMs, which is caused by the virtualization of resources, the shared nature of the infrastructure, and other factors, must also be considered. The shutdown time when a VM is released is ignored because it has a negligible impact on the workflow scheduling process. The average bandwidth between VMs in a cloud is assumed to be a fixed value. Furthermore, the data transfer time between two tasks is calculated based on the quantity of data transferred, and when two tasks are assigned to the same VM, this time is considered to be zero.
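The interval-based billing described above can be sketched as follows; the rate and the one-hour interval in the example are illustrative assumptions, not any provider's actual prices:

```python
import math

# Sketch of the pay-as-you-go billing model: every started time interval
# between the lease start (VLST) and lease finish (VLFT) is billed in full.

def lease_cost(vlst, vlft, cost_per_interval, time_interval=3600):
    intervals = math.ceil((vlft - vlst) / time_interval)  # round partial use up
    return intervals * cost_per_interval
```

For example, a VM leased for 61 min at an assumed $0.10 per hour is billed for two full intervals, i.e., $0.20.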

Problem formulation
Based on the models discussed above, we define some notation in Table 1. The detailed descriptions are presented as follows.
1. Predecessors pred(t_i) and successors succ(t_i): the immediate predecessors and successors of task t_i are defined as
   pred(t_i) = {t_p | e_pi ∈ E}    (1)
   succ(t_i) = {t_c | e_ic ∈ E}    (2)
2. Estimated execution time ET(t_i, vm^k) and data transfer time TT(e_ij): when task t_i is scheduled on a VM of type vm^k, the estimated execution time is defined as
   ET(t_i, vm^k) = w_i / processor^k    (3)
   Before a task is executed, all the transfer data from its predecessors must be received. The data transfer time between task t_i and its parent task t_p is defined as
   TT(e_pi) = data_pi / bandwidth    (4)
3. Latest finish time LFT(t_i) and latest start time LST(t_i): the latest finish time of task t_i is the time by which t_i can complete its execution on the fastest VM type so that the makespan of the workflow is no more than the deadline D. The definition of the latest start time is similar. LFT(t_i) and LST(t_i) are obtained as
   LFT(t_i) = D if t_i = t_exit; otherwise LFT(t_i) = min over t_c ∈ succ(t_i) of (LFT(t_c) − ET(t_c, vm^fastest) − TT(e_ic))    (5)
   LST(t_i) = LFT(t_i) − ET(t_i, vm^fastest)    (6)
4. Start time ST(t_i) and finish time FT(t_i): the start time of task t_i on vm_m^k depends on the finish times of its predecessors, the transfer time between t_i and all its predecessors, and the available time of vm_m^k. The finish time is influenced by the percentage of performance variation per_vary of vm_m^k. The recurrence relations are
   ST(t_i) = max{Avail(vm_m^k), max over t_p ∈ pred(t_i) of (FT(t_p) + TT(e_pi))}    (7)
   FT(t_i) = ST(t_i) + ET(t_i, vm^k) × (1 + per_vary)    (8)
5. Available time Avail(vm_m^k): the available time of VM vm_m^k is the time at which it is ready to execute a new task. Assume that task t_i is the last task scheduled on vm_m^k; then Avail(vm_m^k) is updated to FT(t_i), as in Eq. (8). If the VM is new and has no task assigned to it, its available time is equal to the initial boot time (init_time).
6. VM lease start time VLST(vm_m^k) and VM lease finish time VLFT(vm_m^k): the lease start time of VM vm_m^k is the time when vm_m^k is ready to execute tasks, which is equal to ST(t_i) when task t_i is the first task executed on vm_m^k. The VM lease finish time is the time at which vm_m^k is reclaimed by the cloud, which is equal to FT(t_j) when task t_j is the last task executed on vm_m^k.
7. Total execution time TET and total execution cost TEC: the total execution time of the workflow is equal to the finish time of task t_exit, that is, TET = FT(t_exit), and the total execution cost is the total cost of all the leased VMs, that is, the sum over all leased VMs vm_m^k of cost_vm^k × ⌈(VLFT(vm_m^k) − VLST(vm_m^k)) / time_interval⌉.
Finally, the problem of workflow scheduling can be defined as finding a feasible solution for the given workflow such that the TEC is minimized and the TET is no greater than the deadline D:
   minimize TEC, subject to TET ≤ D    (9)
When deploying IoT applications in cloud computing, in addition to the cost of the leased VMs, the need to obtain results within a specified time must also be considered. Therefore, under the deadline restriction D in formula (9), the workflow is guaranteed to meet the completion-time requirement while the execution cost is reduced.
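The start- and finish-time recurrences of Eqs. (7) and (8) can be sketched as below. This is a simplified sketch: the 10% performance variation is an assumed value, and the rule that co-located tasks incur zero transfer time is omitted for brevity:

```python
# Sketch of Eqs. (7)-(8): ST = max(Avail(vm), max over parents of FT + TT);
# FT = ST + ET * (1 + per_vary). Dictionaries stand in for the schedule state.

def schedule_task(task, vm, finish, pred, tt, et, avail, per_vary=0.10):
    ready = max((finish[p] + tt[(p, task)] for p in pred.get(task, [])),
                default=0.0)                      # all parent data received
    st = max(avail.get(vm, 0.0), ready)           # Eq. (7)
    ft = st + et[(task, vm)] * (1.0 + per_vary)   # Eq. (8)
    avail[vm] = ft                                # VM becomes free again at FT
    finish[task] = ft
    return st, ft
```

With an assumed 2.0-time-unit boot delay as the VM's initial available time, a 10-unit task finishes at 2 + 10 × 1.1 = 13.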

The proposed DCGA algorithm
In this section, the proposed deadline and cost-aware optimization approach, which is based on the GA and pursues the goal of Eq. (9), is described. The GA, a metaheuristic algorithm inspired by natural selection and genetic evolution, is frequently applied to optimization problems. The basic idea of the GA is to initialize a population whose individuals, called chromosomes, each represent a solution. The chromosome genes can be encoded, for example, by bits 0 and 1. Individuals with better evaluation values are selected for crossover to form new chromosomes, and mutation is performed with a specified probability to maintain the diversity of the next generation. In this way, after multiple iterations or upon reaching a termination condition, the individual with the maximum/minimum evaluation value is selected as the approximate optimal solution. Based on the problem definition in Section 3, the pseudocode of the DCGA is shown in Algorithm 1, and the relevant operations are presented as follows. Table 2 defines the symbols used in these algorithms. Compared with the traditional genetic algorithm, the proposed algorithm not only improves the coding but also reflects the topological structure between tasks and the heterogeneity and elasticity of the cloud environment through the three strings of a chromosome. In the population initialization phase, the search scope is reduced by adding specific individuals. In addition, the proposed algorithm improves the crossover and mutation operations to avoid the premature convergence problem of the traditional GA.
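The overall evolutionary loop described above can be sketched as follows. This is a generic skeleton, not the paper's Algorithm 1; the operator arguments stand in for the DCGA's specialized initialization, selection, crossover, and mutation procedures (Algorithms 2-4):

```python
import random

# Generic GA skeleton for a minimization objective; operators are injected
# so the same loop works for any chromosome representation.

def dcga(init_population, fitness, select, crossover, mutate,
         generations=100, mutation_rate=0.1, seed=None):
    rng = random.Random(seed)
    population = init_population(rng)
    for _ in range(generations):
        next_gen = []
        while len(next_gen) < len(population):
            p1 = select(population, rng)
            p2 = select(population, rng)
            child = crossover(p1, p2, rng)
            if rng.random() < mutation_rate:   # mutate with fixed probability
                child = mutate(child, rng)
            next_gen.append(child)
        population = next_gen
    return min(population, key=fitness)        # best (cheapest) individual
```

In the DCGA itself, the fitness comparison would be the constrained TET/TEC rule of Section 4.4 rather than a plain scalar.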

Task initialization
Tasks in the DAG have certain topological structures and dependencies. To schedule tasks in a valid order, they are separated into a series of bags of tasks (BoTs) according to their level l. That is, all tasks in the same BoT have the same level, and there is no interdependence between them. The pseudocode is shown in Algorithm 2.
We use depth-first search (DFS) [38] to generate topological sorts of tasks and use the DBL [39] to allocate tasks to different levels. The level of a task represents the maximum number of edges on any path from it to the exit task. The level l of a task can be defined as
   LV(t_i) = 0 if t_i = t_exit; otherwise LV(t_i) = max over t_c ∈ succ(t_i) of LV(t_c) + 1
All tasks are then grouped into a set of BoTs based on their levels, where l is a level integer in [1, LV(t_entry)].
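The level computation and BoT grouping can be sketched as below (an illustrative implementation of the DBL idea, not the paper's Algorithm 2; task names are invented for the example):

```python
# Sketch: a task's level is the maximum number of edges from it to the
# exit task; tasks with the same level form one bag of tasks (BoT).

def group_into_bots(tasks, edges):
    succ = {t: [] for t in tasks}
    for (p, c) in edges:
        succ[p].append(c)

    levels = {}
    def level(t):                       # memoised bottom-up recursion
        if t not in levels:
            levels[t] = 0 if not succ[t] else 1 + max(level(c) for c in succ[t])
        return levels[t]

    bots = {}
    for t in tasks:
        bots.setdefault(level(t), []).append(t)
    return bots
```

Tasks inside one BoT have no path between them, so their scheduling order may be permuted freely.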

Encoding
Workflow scheduling in a heterogeneous cloud environment can usually be divided into three stages: task execution ordering, task assignment to VMs, and VM type matching. Accordingly, a solution to the scheduling problem consists of three strings: the task order, the assigned VM, and the matched type. These strings represent the structure of the chromosomes in the GA. In this paper, we use an encoding method similar to that presented in previous literature [34]. The length of each of the three strings is set to the number of tasks. The first string is named Task-Order, and its values represent the order in which tasks are scheduled according to the topological structure of the DAG. Tasks in the string can switch positions as long as their level numbers l are equal. The next string, called Task-VM, denotes the VM to which the corresponding task is assigned. Similarly, the third string, called Type-VM, represents the type of the corresponding VM.
Omitting the entry and exit tasks, Fig. 3 shows the encoding of a possible schedule of the DAG in Fig. 1. In this case, the task scheduling sequence is {t_2, t_1, t_4, t_3, t_5, t_7, t_6, t_8} according to the Task-Order string. Each task in the sequential list is scheduled on the VM list {vm_1^3, vm_5^4, vm_2^1, vm_3^3, vm_6^2, vm_4^1, vm_1^3, vm_2^1}, in which the type of each VM is given in the Type-VM string. For example, task t_6 is assigned to vm_1 with type vm^3 after task t_5 is assigned to vm_6 with type vm^2. Meanwhile, tasks t_2 and t_6 are assigned to the same VM, vm_1^3. Representing the task order, assigned VM, and VM type by three strings not only reflects the topological structure of the workflow and the heterogeneity and elasticity of the cloud environment but also allows different scheduling schemes to be generated by adjusting each string. This coding method helps improve genetic operations such as crossover and mutation, increasing the efficiency of finding optimal solutions.
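Under this reading of the Fig. 3 example, the three strings might be represented as follows (a sketch; the dictionary layout and helper function are our own illustration, with tasks written as plain integers):

```python
# Sketch of the three-string chromosome encoding from Fig. 3.
chromosome = {
    "task_order": [2, 1, 4, 3, 5, 7, 6, 8],  # scheduling order of tasks
    "task_vm":    [1, 5, 2, 3, 6, 4, 1, 2],  # VM index for the task at each slot
    "type_vm":    [3, 4, 1, 3, 2, 1, 3, 1],  # type of the VM at each slot
}

def vm_of(chromosome, task):
    """Return the (VM index, VM type) pair assigned to a task."""
    i = chromosome["task_order"].index(task)
    return chromosome["task_vm"][i], chromosome["type_vm"][i]
```

Note the consistency constraint: wherever the same VM index appears, its type must agree (here vm 1 has type 3 at both of its occurrences, matching the assignment of t_2 and t_6 to the same VM).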

Population initialization
The scale of the workflow directly affects the solution space of the scheduling problem. To reduce the search scope and accelerate the convergence of the solution, we propose a novel method of population initialization that produces task sequences and VM allocations differently from those in [34][35][36]. In the proposed DCGA, the following steps are used to generate an effective initial population and complete the chromosome coding. The pseudocode is shown in Algorithm 3.
Consider the workflow shown in Fig. 1. Each task has a level value l and belongs to the set BoT(l). The set of VM types is {v_0, v_1, v_2, v_3, v_4, v_5}, in which v_0 is the slowest type and v_5 is the fastest. The following process is repeated to generate individuals.
First, tasks are randomly selected from each BoT to form a task execution list and Task-Order strings, such as 12543768 (lines 3-8).
Second, HEFT is used to generate individuals with the minimum completion time and cost. For the individual with the minimum completion time, we assign tasks to the VMs with the fastest speed: the Task-VM string is set to distinct integers, such as 01234567, while the Type-VM string is set to the fastest VM type, such as 55555555 (lines 9-11). Notably, although this approach does not take into account the data transmission time between different VMs, it can still be considered an approximate alternative. Similarly, for the individual with the minimum cost, all tasks are assigned to the same VM with the slowest speed, and Task-VM and Type-VM are set to 0 (lines 12-14). These two individuals account for only one-third of the total population in order to achieve better optimization performance. Finally, appropriate VMs are allocated to each task according to the deadline and cost constraints. The VMs are selected according to the following steps: (1) select the VM with the lowest cost among the used VMs while meeting the subdeadline LFT(t_i) of the scheduling task (lines 16-18), (2) create a new VM that has the lowest cost while meeting LFT(t_i) (lines 20-22), and (3) compare the data transfer times TT(e_pi) between the scheduling task and all its parent tasks and then select the VM on which the parent task with the maximum transfer time was executed (lines 24-26).
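The first step, drawing a valid Task-Order level by level, can be sketched as follows (an illustrative reading of lines 3-8 of Algorithm 3, not the paper's code):

```python
import random

# Sketch: build a Task-Order string by shuffling within each BoT; since
# tasks in one BoT are independent, any within-level order is valid.

def random_task_order(bots, rng=None):
    """bots: {level: [tasks]}; higher level = closer to the entry task."""
    rng = rng or random.Random()
    order = []
    for level in sorted(bots, reverse=True):   # entry side scheduled first
        group = list(bots[level])
        rng.shuffle(group)
        order.extend(group)
    return order
```

Because tasks are emitted level by level, every predecessor appears before its successors, so the dependency constraints of the DAG are preserved by construction.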

Fitness evaluation and selection
The fitness of each chromosome is determined by the total execution time TET and total execution cost TEC. First, tasks are selected and assigned to VMs according to the three strings of a chromosome. The start time ST and finish time FT of tasks are calculated by Eqs. (7) and (8). Then, the lease start time LST and lease finish time LFT of the VMs are updated. Finally, the TET and TEC of the workflow can be calculated according to Eq. (9) after all of the tasks are executed.
Based on the tournament selection mode [11] and the constraint-handling strategy [40], the selection operation between two chromosomes is implemented in the following order: (1) select the individual with the smallest TEC when the TETs of both individuals are less than the deadline D, (2) select the individual whose TET meets D when the TET of the other exceeds D, and (3) select the individual with the smallest TEC when the TETs of both individuals exceed D.
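The three selection rules above can be written compactly; this sketch represents each individual simply as a (TET, TEC) pair:

```python
def better(ind_a, ind_b, deadline):
    """Pairwise tournament with constraint handling.
    Each individual is a (TET, TEC) tuple of makespan and cost."""
    tet_a, tec_a = ind_a
    tet_b, tec_b = ind_b
    feas_a, feas_b = tet_a <= deadline, tet_b <= deadline
    if feas_a and feas_b:
        # Rule 1: both feasible -> the cheaper one wins.
        return ind_a if tec_a <= tec_b else ind_b
    if feas_a != feas_b:
        # Rule 2: exactly one feasible -> the feasible one wins.
        return ind_a if feas_a else ind_b
    # Rule 3: both infeasible -> the cheaper one wins.
    return ind_a if tec_a <= tec_b else ind_b
```

Note that rule 3 still compares by cost, as stated above, rather than by makespan.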

Crossover
The GA generates offspring through the crossover operation, and the new individuals cannot violate the dependencies between tasks in a workflow. The definitions of l and BoT are used to implement crossover between the selected parents. The pseudocode is shown in Algorithm 4.
First, a BoT is randomly selected between 1 and the number of BoTs. Two crossover values, cross_point and cross_length, are also randomly produced, where cross_point plus cross_length is less than the number of tasks in the selected BoT (lines 2-5). One parent is divided into Task_Seg1 and Remain_P1 by the crossover values, and the other parent is divided into Task_Seg2 and Remain_P2 by the tasks of Task_Seg1 (lines 6-9). Subsequently, two candidate individuals are formed from Task_Seg2 and Remain_P1. Conflicts in which the same VM is assigned different types are resolved by updating the VM type according to the segments and the remaining parts: the conflicting type in the VM_Type of one candidate individual is updated to the type in Task_Seg2 (lines 10-15), while the VM_Type of the other candidate individual is modified according to Remain_P1 (lines 16-21). Finally, using the selection method in Section 4.4, the candidate individual with the better fitness is retained and added to the new population (line 22). The other offspring is generated in a similar manner, and the process is omitted in Algorithm 4. Figure 4 is an example of a crossover operation that randomly selects 3 as the level of the BoT, 1 as cross_point, and 2 as cross_length. After the Task_Segs are generated from the parents (grey part), there are type conflicts on v_1 and v_2 between Task_Seg2 and the rest of P1. Thus, two candidate individuals, Candchild1 and Candchild2, are produced. In the VM_Type of Candchild1, the number 3 at index 1 is updated to 4, and the number 1 at index 8 is updated to 3. Similar changes occur in Candchild2, and all of the updated values are presented in italics. Finally, Candchild1 is selected as offspring Child1 due to its better fitness. The other offspring, Child2, is produced in a similar manner.
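The core of the BoT-constrained crossover, restricted to the Task-Order string, can be sketched as below. This is an illustrative simplification: the VM-type conflict repair of Algorithm 4 is omitted, and the function names are hypothetical.

```python
import random

def bot_crossover(p1_order, p2_order, bag):
    """Impose a random segment of parent 1's ordering of one bag (BoT)
    onto parent 2. Because the segment tasks all share the same level,
    reordering them cannot violate any dependency."""
    # Positions of the chosen bag's tasks in parent 1.
    idx = [i for i, t in enumerate(p1_order) if t in bag]
    start = random.randrange(len(idx))
    length = random.randrange(1, len(idx) - start + 1)
    seg = [p1_order[i] for i in idx[start:start + length]]
    # In parent 2, rewrite the segment tasks' positions using
    # parent 1's relative order.
    child = p2_order[:]
    slots = [i for i, t in enumerate(child) if t in seg]
    for pos, task in zip(slots, seg):
        child[pos] = task
    return child
```

Tasks outside the segment keep their positions, so the child remains a valid topological order whenever both parents are.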

Mutation
As in the crossover operation, the constraints between tasks cannot be violated by the mutation operation. Two mutation operations are used to increase the diversity of the solutions. The pseudocode is shown in Algorithm 5.
The first mutation operation is the exchange of the task order: two tasks in the same BoT are randomly selected, and their positions in Task_Order are swapped while the values of Task_VM and VM_Type remain unchanged (lines 4-5). The second mutation operation is the merging of VMs: the least used VM is randomly changed to another used VM. That is, the least frequent number in Task_VM is updated to another existing number, and the corresponding number in VM_Type is changed accordingly (lines 7-10). Figure 5 shows an example of the two mutation operations: the exchange of tasks t_4 and t_5 and the update of v_4^5 to v_1^2. In the VM merge operation, because several VMs, such as v_4^5, v_3^3, v_2^6, and v_1^4, are used only once, v_4^5 is updated to v_1^2 in a random manner.
To avoid the deterioration of individual fitness caused by mutation, the mutated individual and the original individual are compared and the better individual is selected according to the method in Section 4.4.
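The two mutation operations above can be sketched as follows; the chromosome layout and helper names are illustrative assumptions, not the paper's Algorithm 5.

```python
import random
from collections import Counter

def mutate(task_order, task_vm, vm_type, bots):
    """Sketch of the two DCGA mutations: swap two tasks inside one bag,
    then merge the least-used VM into another used VM."""
    order = task_order[:]
    tv = task_vm[:]
    # Mutation 1: swap the order of two tasks from the same BoT.
    bag = random.choice([b for b in bots if len(b) >= 2])
    i, j = random.sample([k for k, t in enumerate(order) if t in bag], 2)
    order[i], order[j] = order[j], order[i]
    # Mutation 2: merge the least frequently used VM into another used VM.
    usage = Counter(tv)
    if len(usage) >= 2:
        least = min(usage, key=usage.get)
        target = random.choice([v for v in usage if v != least])
        tv = [target if v == least else v for v in tv]
    return order, tv, vm_type[:]
```

Swapping within one bag preserves the topological order, and merging VMs can only reduce the number of leased instances, which is what makes this mutation cost-oriented.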

Complexity analysis
The overall computational complexity of the proposed algorithm can be analysed in terms of the initialization, selection, crossover, mutation, and fitness evaluation operations. Consider a cloud in which a workflow has n tasks, the number of individuals in the population is p, and the maximum number of iterations is g. Task initialization is performed by a traversal through each edge and a nested loop over tasks, so its time complexity is O(n^2). Then, for each individual in the initial population, the task selection step has a complexity of O(n). The procedure of mapping each task to an appropriate resource has a maximum complexity of O(n^2) when satisfying the deadline and cost constraints. Therefore, the total time complexity of population initialization is O((n + n^2)p). The time complexity of the selection operation is O(p), and the time complexities of the crossover and mutation operations are O(n^2) and O(n), respectively. The fitness evaluation for each individual has a complexity of O(n^2). Finally, the total time complexity of selection, crossover, mutation, and fitness evaluation with p individuals and g iterations is O(pgn^2). Thus, the overall time complexity of DCGA is O(n^2) + O((n + n^2)p) + O(pgn^2), which can be expressed as O(pgn^2).

Experiments
This section presents the experiments implemented to evaluate the performance of the proposed DCGA.

Workflows
Workflows can achieve service objectives via various construction structures and have the characteristics of low coupling and flexible operation [41]. To better simulate IoT applications, the proposed algorithm was evaluated on three real workflows: LIGO, Montage, and Cybershake. These workflows, which have different structures and characteristics, can be used to simulate various applications in the IoT environment. The multiple tasks of LIGO require processing large amounts of data; they can therefore be regarded as data aggregation tasks, which require high computation power. Montage does not require a large amount of CPU resources and focuses on intensive I/O operations. Cybershake demands high-performance computing and is also I/O intensive. A simplified structure of each workflow is shown in Fig. 6, and a detailed description of these workflows is presented in [42].
Synthetic workflows with similar structures are created by a workflow generator. The generated workflows are represented in a DAG in XML (DAX) format, which contains the tasks, dependency edges, transfer data, and other information. The DAGs of the experimental workflows can thus be constructed from these XML files.

Baseline algorithms
In our experiments, IC-PCP [23], PSO [29], and a normal GA are used as baseline algorithms to compare with the proposed algorithm.
The IC-PCP algorithm is based on the concept of a partial critical path (PCP), which is defined by unassigned parent tasks and the latest transfer data time. The PCPs related to the exit task are determined recursively, and the tasks on each PCP are assigned to the least expensive available VM or a new VM that can meet the latest start time of each task. Then, the start time and schedule identifier of the tasks are updated. This process is repeated until all tasks in the workflow have been scheduled on the appropriate VMs. Therefore, the cost and makespan of the schedule are calculated based on the lease time of the VMs. However, this algorithm does not consider the characteristics of the VMs, such as acquisition delay and performance variation.
The PSO algorithm is a heuristic optimization method focused on the deadline constraint for resource allocation. Particles represent the tasks in the workflow, and their positions represent the allocated resources. However, the particle encoding does not sufficiently capture the VM types. The normal GA algorithm (GA_N) is used to analyse the basic characteristics of GAs. Its initial population is randomly generated, and roulette wheel selection is used to choose parents. In the crossover phase, tasks with the same level number l are switched between parents. Random mutation is used to create new individuals with different VM types. The other parameters are similar to those of the proposed algorithm.
These algorithms were applied in the same simulated cloud environment, together with the DCGA proposed in this paper. The parameters of the PSO algorithm were set according to the experiments in [29].

Experimental setup
The experiments simulated a cloud environment with a single datacenter and eight different VM types. The characteristics of the VM types shown in Table 3 are based on the Amazon cloud services. The average bandwidth between VMs is fixed at 20 MBps, and the transmission cost is assumed to be zero. The processing capacity of each VM type is measured in millions of floating point operations per second (MFLOPS). In a real cloud environment, various factors, such as the hardware type, data transmission, and resource allocation, can cause acquisition delays and performance variation of VMs. Therefore, our experiments adopt an acquisition delay of 97 s, as presented in [43], and diminish VM performance by at most 24% following a normal distribution with a mean of 12% and a standard deviation of 10%, as presented in [10].
To investigate the performance of these algorithms under different deadlines, the method mentioned in Section 4 is used to obtain the benchmark deadline D_B. D_B can be approximately calculated as the finish time of task t_exit when all tasks are allocated to different VMs of the fastest type. The following formula is used to obtain the different deadlines:

D = α × D_B,

where α is the deadline factor ranging from 1 to 4 with a step size of 0.2. Three levels of deadlines can be distinguished: (1) strict deadlines (1 ≤ α < 2), (2) moderate deadlines (2 ≤ α < 3), and (3) relaxed deadlines (3 ≤ α ≤ 4). The parameters of the proposed DCGA are as follows: the population size is 500, the maximum number of generations is 100, the crossover rate is 80%, and the mutation rate is 20%. Initially, 500 solutions are created by the methods presented in Sections 4.1 and 4.3. Then, these initial chromosomes are evaluated, and the next population is generated by selection, crossover, mutation, and fitness evaluation. Finally, the feasible solution with the minimum TEC under the deadline constraint D in every generation is found.
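The deadline sweep described above can be generated directly from D = α × D_B; a minimal sketch:

```python
def deadlines(d_b):
    """Compute D = alpha * d_b for alpha in 1.0..4.0 (step 0.2),
    grouped into the strict / moderate / relaxed levels."""
    levels = {"strict": [], "moderate": [], "relaxed": []}
    for step in range(16):            # alpha = 1.0, 1.2, ..., 4.0
        alpha = 1.0 + 0.2 * step
        d = alpha * d_b
        if alpha < 2:
            levels["strict"].append(d)
        elif alpha < 3:
            levels["moderate"].append(d)
        else:
            levels["relaxed"].append(d)
    return levels
```

With the half-open boundaries above, the sweep yields 5 strict, 5 moderate, and 6 relaxed deadlines per workflow.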

Results and analysis
Under the different deadline constraints for the three workflows, each experiment is executed ten times, and the mean of the results is used for analysis. According to the deadline and cost constraints, the experimental analyses are carried out with respect to three indicators: success rate, execution makespan, and execution cost. The definitions of these indicators are as follows:
1. Success rate: the ratio between the number of simulation runs that satisfy the constraints and the total number of simulation runs. The success rate represents the effectiveness of the algorithm.
2. Execution time: the finish time of the last task t_exit in a schedule, which can be calculated by Eq. (8). A shorter execution time usually indicates better performance of the algorithm.
3. Execution cost: the lease cost of all VMs used in a schedule, which can be calculated by Eq. (9). The execution cost can only be meaningfully compared under the deadline constraint.

The success rates of these algorithms are greatly improved when the deadline is moderate or relaxed. Under a moderate deadline, the success rate of GA_N is 22%, while IC-PCP fails to obtain a feasible solution on LIGO. The average success rate of PSO increases to 99.3%, and that of DCGA reaches 100%. Under the relaxed deadline constraint on LIGO, DCGA outperforms GA_N by 20% and IC-PCP by 55%. All four algorithms achieve their highest success rates on Montage and Cybershake.
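The success rate indicator can be computed directly from the per-run makespans; a minimal sketch:

```python
def success_rate(makespans, deadline):
    """Fraction of simulation runs whose makespan meets the deadline."""
    ok = sum(1 for m in makespans if m <= deadline)
    return ok / len(makespans)
```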

Success rate
The results in Table 4 show that IC-PCP gives the poorest performance, while DCGA gives the best performance under each deadline constraint. Because IC-PCP does not consider the performance variation and acquisition delay of VMs, task finish times are easily prolonged, which causes the makespan of the workflow to exceed the deadline. Because of the instability and randomness of the original algorithm, GA_N also performs poorly compared to PSO and DCGA. PSO omits VM information in its particle encoding and fails to exploit the effect of the VM type on the task execution cost, so the cost of obtaining a higher success rate increases. The DCGA proposed in this paper takes into account the main characteristics of the cloud environment and searches for the best solution through encoding, selection, crossover, and mutation; therefore, its performance is superior to that of the other three algorithms.

Makespan and cost evaluation
To obtain more details about the performance of all the algorithms, the average execution time and average execution cost are compared under strict, moderate, and relaxed deadline constraints. Figures 7, 8 and 9 show the average makespan and average cost under these deadline factors for different workflows.
1. Figure 7 shows the average makespan and average cost results achieved on LIGO. With a long makespan in most cases, GA_N and IC-PCP fail to generate a feasible solution under strict and moderate deadline constraints. GA_N has the worst performance with the highest cost, while the performance of IC-PCP is slightly better. Although PSO is the least expensive, it does not obtain valid solutions without violating the deadline constraints. The proposed DCGA has a low execution cost under different deadline constraints: its average cost is 89.83% lower than that of GA_N, 38.8% higher than that of IC-PCP, and 60.99% lower than that of PSO. Thus, DCGA produces effective feasible solutions and achieves the best performance with a smaller cost than the other algorithms on the LIGO workflow.
2. The simulation results for the Montage workflow are shown in Fig. 8. The average execution time of the schedules generated by GA_N and IC-PCP is less than that of PSO and DCGA. The average cost of GA_N is the highest, while DCGA has the lowest cost. The performance of PSO is slightly improved compared to that on LIGO. However, PSO fails to obtain a feasible solution when the deadline factor is 1, so its cost is not shown in the figure. DCGA achieves a better average execution cost than IC-PCP but has a larger makespan. For the Montage workflow under moderate deadline constraints, DCGA obtains an average cost that is 98.45% lower than that of GA_N, 52.07% lower than that of IC-PCP, and 92.16% lower than that of PSO. The average execution cost of the schedules generated by the proposed algorithm is minimized by using a large execution time that still satisfies the deadline constraint.
3. The average makespan and average cost results obtained by the tested algorithms on Cybershake are shown in Fig. 9. IC-PCP fails to obtain valid solutions meeting the strict deadline constraint, while GA_N slightly improves its performance.
Meanwhile, IC-PCP cannot obtain a feasible solution when the deadline factor is less than 1.6, so the cost is not shown in the figure. The same situation occurs for PSO when the deadline factor is 1. The cost of the proposed DCGA is 98.02% lower than that of GA_N, 72.96% lower than that of IC-PCP, and 89.87% lower than that of PSO. Among the compared algorithms, DCGA always achieves the minimum execution cost while meeting the deadline constraints.

Summary of scheduling algorithms performance
In general, the DCGA proposed in this paper can obtain feasible solutions under different deadline constraints.
The proposed algorithm not only has a high success rate but also achieves the minimum execution cost under deadline constraints. In particular, Figs. 7, 8, and 9 show that DCGA performs better than the baseline algorithms in terms of makespan and cost regardless of whether the workflow is I/O intensive or computationally complex. This good performance is due mainly to the following reasons:
1. GA_N shows only the characteristics of a GA without optimizing the population initialization, selection, crossover, and other operations. Because its initial population is randomly generated, the search space is too large to converge to the optimal solution. Although the algorithm can sometimes obtain solutions that meet certain conditions, its average execution cost is always much higher than that of the other algorithms because of its randomness and instability.
2. IC-PCP fails to take into account the actual characteristics of the cloud environment. When a task is assigned to a VM, the performance variation and acquisition delay of VMs affect not only that task and the other tasks on its critical path but also the tasks on the other critical paths of the workflow. Thus, a missed subdeadline for a single task extends the makespan of the entire workflow, so the algorithm fails to produce feasible solutions under strict deadlines.
3. PSO does not provide sufficient VM information through the encoding of particles. When the velocity and position of particles are updated, the insufficient consideration of VM types causes the particles to tend toward a local optimum rather than the global optimum. When the deadline is tight, feasible solutions are difficult to obtain, which negatively affects the success rate of the algorithm. Under moderate and relaxed deadline constraints, PSO gives less consideration to the cost of different VM types, which affects its actual performance.
4. The proposed DCGA focuses on the main features of the cloud environment.
The matching of tasks and VMs can be reflected by the appropriate coding of chromosomes, which has good adaptability. Due to the effective use of coding features, individuals with better makespan and cost can be inherited to generate the final optimized solution.
Therefore, DCGA has the best performance in terms of producing cost-effective and feasible schedules under deadline constraints.

Conclusions and further work
In this paper, the scheduling problem for the workflows of IoT applications is studied, and a deadline and cost-aware genetic optimization algorithm is proposed. With the purpose of minimizing the cost under a deadline constraint, the proposed algorithm focuses on important features of the cloud, such as on-demand acquisition, heterogeneous dynamics, and the acquisition delay and performance variation of VMs. Furthermore, the proposed algorithm uses heuristic operations to assign tasks to appropriate VMs. The experimental results demonstrate that, compared with state-of-the-art algorithms, our algorithm not only exhibits the highest success rate but also achieves low execution costs under various deadline constraints.
In the future, we would like to consider other issues, such as task reassignment and VM failure, because these factors will affect the stability and reliability of the proposed algorithm. The cost of communication among VMs in a real cloud environment must also be considered. Furthermore, we intend to update the design of the population initialization, encoding, and crossover of the proposed algorithm to improve the performance and efficiency.