Scheduling multitask jobs with extra utility in data centers
Xiaolin Fang^{1},
 Junzhou Luo^{1},
 Hong Gao^{2},
 Weiwei Wu^{1} and
 Yingshu Li^{3}
https://doi.org/10.1186/s13638-017-0986-0
© The Author(s) 2017
Received: 10 August 2017
Accepted: 13 November 2017
Published: 25 November 2017
Abstract
This paper investigates the problem of maximizing utility in job scheduling where each job consists of multiple tasks, each task has its own utility, and each job earns an extra utility if all of its tasks are completed. We provide a 2-approximation algorithm for the single-machine case and a 2-approximation algorithm for the multi-machine problem. Both algorithms consist of two steps. The first step employs the Earliest Deadline First method to solve the subproblem without extra job utility, and it is proved to obtain the optimal result for this subproblem. The second step employs a Dynamic Programming method to solve the subproblem with only extra job utility, and it also derives the optimal result. An approximation result is then obtained by combining the results of the two steps.
1 Introduction
Job scheduling is a widely studied topic in computer science. Many systems, such as parallel and distributed computing, cloud computing, workforce management, energy management, and network communications, require scheduling of jobs [1–5]. Many studies have designed efficient approaches to the job scheduling problem that improve the resulting performance subject to resource constraints [6–9].
Many applications prefer to divide large jobs into multiple small tasks to better utilize the limited resources and provide better service quality. As stated in [10], most interactive services such as web search, social networks, online gaming, and financial services now depend heavily on computations at data centers because their demands for computing resources are both huge and dynamic. Interactive services are time-sensitive, as users expect to receive a complete or possibly partial response within a short period of time. Thus, a job should be preemptive and divisible into many small tasks (we call this the multi-task problem) in order to provide interactive services and improve the utilization of the computing resources.
We study the multi-task job scheduling problem in this paper. Usually, the aim of multi-task job scheduling is to maximize the profit or minimize the cost subject to resource and deadline constraints. This paper also studies the profit maximization problem for multi-task job scheduling where each job has a starting time and an ending time. The profit is called utility in this paper. The utility of a task or job can be obtained only if the task or job is completed. Most state-of-the-art works study the problem considering either the utility of the tasks or the utility of the jobs; few consider both. In this paper, we study the problem of multi-task job scheduling at a data center with the goal of maximizing the total utility of all the jobs, where each job is decomposed into multiple tasks, and both a job and a task have their own utility. That is, each task has its own utility, and each job also has an extra utility which can only be obtained when all its tasks are completed.
The problem investigated in this paper is particularly challenging because it is quite difficult to decide whether it is better to schedule a job as a whole or to schedule the tasks of the job separately. Furthermore, it is difficult to make correct decisions for current jobs because the requirements of the incoming jobs are unknown.
We first study the single-machine problem, where only one machine can be used. The single-machine problem asks for a method to schedule the jobs on one machine while satisfying the resource and deadline constraints. We then study the multi-machine problem, where multiple machines can be employed. Because of the NP-completeness of the problem, we present two corresponding 2-approximation algorithms for the two problems.

The main contributions of this paper are summarized as follows.

- To the best of our knowledge, this is the first work to study the problem considering both task utility and job utility.

- A 2-approximation algorithm is provided for the single-machine problem. This algorithm combines an Earliest Deadline First (EDF) scheduling step and a Dynamic Programming (DP) step.

- Another 2-approximation algorithm is provided for the multi-machine problem. Similar to the algorithm for the single-machine problem, it also employs an EDF step and a DP step.
The rest of the paper is organized as follows. Section 2 introduces the related works. Section 3 presents the problem formulation. Section 4 studies the single-machine problem, and the multi-machine problem is studied in Section 5. Section 6 presents simulation results, and Section 7 concludes the paper.
2 Related works
The job scheduling problem can be classified into multiple classes, such as single or multiple tasks, single or multiple machines, and identical or unrelated machines. Usually, the input of the problem involves n jobs and k machines. Each job is associated with a release time, a deadline, a weight, and a processing time on each machine. The goal is to find a non-preemptive schedule that maximizes the weight of the jobs subject to their respective deadlines. Garey and Johnson [11, 12] show that the simplest instance of the decision problem corresponding to this problem is NP-complete.
Bar-Noy et al. [13, 14] study the scheduling problem where each job includes a single task. The authors present a 3-approximation algorithm using the local ratio technique. For arbitrary job weights and a single machine, an LP formulation achieves a 2-approximation for polynomially bounded integral input and a 3-approximation for arbitrary input. For unrelated machines, the factors are 3 and 4, respectively. Because of the high time complexity of the LP-based method, Bar-Noy et al. [13] also provide a combinatorial approximation algorithm whose approximation factor is \(3+2\sqrt {2}\). Independently, Calinescu et al. [15] designed a 3-approximation algorithm via rounding linear programming solutions.
The preemptive version of the single-task problem for a single machine was studied by Lawler [16]. For identical job weights, Lawler showed how to apply dynamic programming techniques to solve the problem in polynomial time. The same techniques are employed to obtain a pseudo-polynomial algorithm for the NP-hard variant in which the weights are arbitrary. Lawler [17] also designed polynomial time algorithms that solve the problem in two special cases: (i) the time windows in which jobs can be scheduled are nested, and (ii) the weights and processing times are in opposite order. Kise et al. [18] showed how to solve the special case where the release times and deadlines are similarly ordered.
Some works [19–22] study the problem where each job has multiple tasks, which is called the Split-Job problem. In the Split-Job problem, a task does not have a time window within which it may be placed; each task is simply accepted or rejected. The unit height case of the basic Split-Job problem has been addressed by finding maximum weight independent sets in interval graphs [19, 20]. Bar-Yehuda et al. [21] present a (2r)-approximation algorithm, where r is the number of tasks in a job. They also proved a hardness result indicating it is NP-hard to approximate the problem within a factor of O(r/log r), so their approximation ratio is near-optimal. Bar-Yehuda and Rawitz [22] studied the uniform case of the basic Split-Job problem and derived a (6r)-approximation algorithm by utilizing the fractional local ratio technique.
Venkatesan et al. [23] study the problem of maximizing the throughput of jobs where each job consists of multiple tasks. Different from the Split-Job problem, each task has a window and can be scheduled at any time within that window subject to the processing length. The algorithm presented in [23] is an LP-based algorithm which achieves an 8r-approximation.
All the above works either consider the utility of tasks or the utility of jobs. In this paper, we study the problem where each job consists of multiple tasks, each task has utility, and each job has extra utility if all its tasks are completed.
A closely related problem is considered by Zheng et al. [10], who study the problem of scheduling interactive jobs at a data center with the goal of maximizing the total utility of all the jobs. In their problem, the utility of a job is a function of the completed workload of that job; that is, the utility of a job varies as the completed workload of that job increases. The function in their work is nonlinear and concave. If the scheduling may be preemptive, the authors provide an optimal solution to the problem.
3 System model and problem formulation
3.1 System model
Assume there are m physical machines {M _{1},M _{2},…,M _{ m }} and n jobs {J _{1},J _{2},…,J _{ n }} in the data center. Each job J _{ i } has a starting time s _{ i } and an ending time e _{ i }, i.e., J _{ i }=[s _{ i },e _{ i }], which is called its processing interval. Each job must be completed within its own processing interval. Each job J _{ i } consists of multiple tasks \(\{T_{i1},T_{i2},\dots,T_{in_{i}}\}\), where n _{ i } is the number of tasks in job J _{ i }. Each task T _{ ij } has a processing time of length 1 and a utility u _{ ij }; that is, a machine takes 1 unit of time to complete task T _{ ij }, and if the task is completed, the utility u _{ ij } is gained. Under this assumption, each task T _{ ij } must be completed within the processing interval [s _{ i },e _{ i }]; otherwise, the task is dropped. We define the assignment of a task T _{ ij } as either ∅ or a subinterval I _{ ij } of unit length within the processing interval [s _{ i },e _{ i }], i.e., I _{ ij }=∅, or I _{ ij }⊆[s _{ i },e _{ i }] and |I _{ ij }|=1 on some machine M _{ k }. Let a(T _{ ij })=k indicate that task T _{ ij } is assigned to machine M _{ k }. The empty assignment I _{ ij }=∅ indicates that the task is dropped. If a task is completed, then it has a non-empty assignment on some machine such that the assigned subinterval does not overlap (or conflict) with any other assignment on the same machine. We assume s _{ i } and e _{ i } are integers. One unit of time is called a slot in this paper; that is, a task takes one slot on one machine to be completed. We only consider the problem where each task T _{ ij } has a processing time of length 1. Tasks with arbitrary processing times will be studied in future work.
We consider a situation where the jobs belong to different users, and the users are willing to reward the data center for completing all their tasks. Therefore, each job J _{ i } has an extra utility σ _{ i } in this paper. If all the tasks of job J _{ i } are completed, then the utility gain of this job is the sum of the utilities of the tasks in J _{ i } plus the extra utility σ _{ i }, i.e., \(u(i)={\sum \nolimits }_{j=1}^{n_{i}}u_{ij}+\sigma _{i}\). Otherwise, if even one task of the job is not completed, the utility gain is only the sum of the utilities of the completed tasks, without the extra job utility, i.e., \(u(i)={\sum \nolimits }_{j=1,I_{ij}\neq \emptyset }^{n_{i}}u_{ij}\).
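As a minimal sketch of this utility rule (function and argument names are illustrative, not from the paper):

```python
def job_utility(task_utilities, completed, extra):
    """Utility gained from one job under the model above.

    task_utilities: list of u_ij for the job's tasks
    completed: parallel list of booleans, True if task j was scheduled
    extra: the job's extra utility sigma_i
    """
    gained = sum(u for u, done in zip(task_utilities, completed) if done)
    # The extra utility is earned only when every task is completed.
    if all(completed):
        gained += extra
    return gained
```

For example, a job with task utilities [3, 2] and extra utility 5 yields 10 when both tasks run, but only 3 when the second task is dropped.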
3.2 Problem statement
It is easy to see that this problem is NP-complete. Consider a simple instance with only one machine, where the processing interval of every job is [0,T], each job J _{ i } consists of n _{ i } one-unit tasks and has extra utility σ _{ i }, and all tasks have zero utility, i.e., u _{ ij }=0. The problem is then to select a set of jobs whose total number of tasks fits within the T slots of the processing interval while maximizing the total extra utility, which is equivalent to the well-known NP-complete Knapsack problem. For simplicity, the single-machine problem, where only one machine can be used, is studied first, followed by the general case with multiple machines.
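The reduction above can be made concrete: selecting jobs is exactly 0/1 knapsack with item weight n _{ i }, value σ _{ i }, and capacity T. A standard pseudo-polynomial DP sketch of that special case (names are illustrative):

```python
def max_extra_utility(jobs, T):
    """0/1 knapsack over jobs.

    jobs: list of (n_i, sigma_i) pairs, where n_i is the number of
          unit tasks and sigma_i the extra utility of job i
    T:    number of slots in the shared interval [0, T]
    dp[c] = best total extra utility using at most c slots.
    """
    dp = [0] * (T + 1)
    for n_i, sigma_i in jobs:
        # Iterate capacities downward so each job is used at most once.
        for c in range(T, n_i - 1, -1):
            dp[c] = max(dp[c], dp[c - n_i] + sigma_i)
    return dp[T]
```

With jobs (n, σ) = (2, 3), (3, 4), (2, 5) and T = 4 slots, the best choice is the first and third job, for total extra utility 8.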
4 Algorithm design for single-machine problem
We first consider a simpler instance of this problem where there is only one machine. As stated in the previous section, even the single-machine problem is NP-complete. Therefore, we present a 2-approximation algorithm in this section. The main idea is to solve the problem in two steps. The first step solves the single-machine problem without considering extra job utility. The second step solves the single-machine problem considering only the extra job utility. The final result for the single-machine problem is then obtained by combining the results of the two steps.
4.1 Problem without extra job utility
In this step, we do not consider the extra job utility. Thus, given n jobs, each job J _{ i } has a processing interval [s _{ i },e _{ i }] and consists of n _{ i } one-unit-length tasks, and each task T _{ ij } has utility u _{ ij }. The goal of this step is to find an assignment with the maximum utility gain on a single machine.
We first consider a special case where the utility of each task is 1. Then, the problem is to schedule as many tasks as possible. We introduce the earliest ending time first algorithm which is also called Earliest Deadline First (EDF) in other works and show that the EDF algorithm schedules the maximum number of tasks.
The EDF method always schedules the job with the earliest ending time first. Let J _{ i } be the job with the earliest ending time. EDF scans the processing interval of J _{ i } from s _{ i } to e _{ i } and schedules the tasks of J _{ i } one by one to the unused slots. A slot is unused if no tasks are scheduled to this slot. If all the slots in [s _{ i },e _{ i }] are scanned, or all the tasks of J _{ i } are scheduled, EDF begins to schedule the next job J _{ i+1}.
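A minimal sketch of this EDF scan for unit-length tasks (data layout and names are illustrative, not from the paper):

```python
def edf_schedule(jobs):
    """EDF sketch: jobs is a list of (s_i, e_i, n_i) with integer slot
    boundaries; returns {job index: [slots assigned to its tasks]}."""
    used = set()        # slots already taken on the single machine
    assignment = {}
    # Process jobs in order of earliest ending time.
    for idx, (s, e, n) in sorted(enumerate(jobs), key=lambda x: x[1][1]):
        slots = []
        for t in range(s, e):          # scan the interval left to right
            if len(slots) == n:        # all tasks of this job placed
                break
            if t not in used:          # an unused slot takes one task
                used.add(t)
                slots.append(t)
        assignment[idx] = slots
    return assignment
```

For jobs (0, 4, 2) and (0, 2, 2), the tighter job gets slots 0 and 1 and the looser job gets slots 2 and 3, so all four tasks complete.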
Theorem 1
The EDF algorithm schedules the maximum number of tasks.
Proof
We prove the theorem by induction. For n=1, i.e., when there is only one job, it is easy to see that the EDF algorithm schedules the maximum number of tasks.
Assume the EDF algorithm schedules the maximum number of tasks when n=k.
Now, we prove that the theorem holds for n=k+1. If all the tasks of J _{ k+1} can be scheduled in its processing interval, then the theorem holds. Otherwise, not all the tasks of J _{ k+1} can be scheduled, and there are two cases as follows.
1) If all the scheduled tasks of J _{1} to J _{ k } lie within the processing interval [s _{ k+1},e _{ k+1}], i.e., s _{ k+1}≤min_{1≤i≤k }{s _{ i }} and max_{1≤i≤k }{e _{ i }}≤e _{ k+1}, then all the slots within [s _{ k+1},e _{ k+1}] are used, so no schedule can complete more tasks.
2) Otherwise, some tasks of J _{1} to J _{ k } are scheduled before s _{ k+1}, and some are scheduled within [s _{ k+1},e _{ k+1}]. No tasks are scheduled after e _{ k+1} because max_{1≤i≤k }{e _{ i }}≤e _{ k+1}. We only need to consider whether the tasks of J _{1} to J _{ k } scheduled within [s _{ k+1},e _{ k+1}] can be moved before s _{ k+1}; if they could, more tasks of J _{ k+1} could be scheduled. However, the EDF algorithm always schedules tasks as early as possible; therefore, none of these scheduled tasks can be moved earlier.
This completes the proof. □
For simplicity of illustration, we explain the notions of link and reach, which will be used frequently later. Both are defined over the tasks/jobs that are not dropped (the scheduled tasks/jobs). In this paper, links are directed.
1) The scheduled tasks of the same job link to each other. We call this the task link.
The final schedule is shown in Fig. 5. The algorithm schedules the maximum number of tasks while achieving the maximum total utility. In Fig. 5, the gray slots have been scheduled with tasks, and the numbers on the gray slots represent task utilities.
Theorem 2
The EDFbased algorithm is optimal.
Proof
Theorem 1 proves that the EDF algorithm schedules the maximum number of tasks. We now only need to prove that the EDFbased algorithm maximizes the total utility of the scheduled tasks. We also prove it by induction.
For n=1, that is, when there is only one job, it is easy to see that the EDF-based algorithm maximizes the total utility of the scheduled tasks because it always schedules the tasks with the largest utility first.
Assume the EDFbased algorithm maximizes the total utility of the scheduled tasks when there are k jobs, i.e., n=k.
Next, we prove that the theorem holds for n=k+1. That is, we need to prove that the EDF-based algorithm obtains the optimal result after scheduling J _{ k+1}.
If all the tasks of J _{ k+1} can be scheduled in its processing interval, then the theorem is correct.
Otherwise, if not all the tasks of J _{ k+1} can be scheduled, the EDF-based algorithm first schedules as many of the largest-utility tasks as possible to the unused slots. As stated in Theorem 1, the EDF algorithm cannot schedule more tasks than this. Therefore, it must decide whether to schedule each remaining task of J _{ k+1}. For the remaining tasks, the EDF-based algorithm always uses them to replace scheduled tasks with smaller utility. Without loss of generality, let T ^{′} be the task with the largest utility among the remaining tasks and let its utility be u ^{′}. Let T be the scheduled task with the least utility, and let its utility be u. If u<u ^{′}, then the largest utility gain is achieved when T is replaced by T ^{′}. The utility gain is u ^{′}−u _{1}+u _{1}−u _{2}+u _{2}−…−u _{ k }+u _{ k }−u=u ^{′}−u, where {u _{ k },u _{ k−1},…,u _{1}} are the utilities of the relay tasks in the link sequence from T to T ^{′}. Because u is the minimum, u ^{′}−u is the maximum possible gain. The same argument applies to the remaining tasks in turn. This completes the proof. □
The time complexity of the EDF-based algorithm is O(n ^{2} r ^{2}), where n is the number of jobs and r is the maximum number of tasks in a job. In the EDF-based algorithm, the jobs are scheduled one by one according to their ending times. For the tasks of each job, the algorithm needs to find a scheduled task with the smallest utility. It takes O(n r) time to search for this task in the directed graph constructed from the link relations (tasks to tasks, tasks to jobs, and jobs to jobs). It also takes O(n) time to update the graph each time it changes, i.e., whenever a replacement is performed.
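For a compact reference implementation of this subproblem (maximum total utility of unit tasks on one machine), the same optimum can also be reached by a textbook transversal-matroid greedy with augmenting paths. This is an alternative sketch, not the paper's link-based replacement method, and all names are illustrative:

```python
def max_utility_unit_tasks(jobs):
    """Greedy-by-utility with bipartite-matching feasibility check.

    jobs: list of (s_i, e_i, [u_ij]) with integer slot boundaries;
    every task of job i may occupy any free slot in [s_i, e_i).
    """
    tasks = []  # (utility, candidate slots) per unit task
    for s, e, utils in jobs:
        for u in utils:
            tasks.append((u, list(range(s, e))))
    tasks.sort(key=lambda t: -t[0])        # try high-utility tasks first
    slots_of = [slots for _, slots in tasks]

    slot_owner = {}                        # slot -> index of task placed there

    def place(i, visited):
        # Standard augmenting-path step of bipartite matching: try to
        # give task i a slot, possibly relocating earlier tasks.
        for t in slots_of[i]:
            if t in visited:
                continue
            visited.add(t)
            if t not in slot_owner or place(slot_owner[t], visited):
                slot_owner[t] = i
                return True
        return False

    total = 0
    for i, (u, _) in enumerate(tasks):
        if place(i, set()):                # accept iff a matching still exists
            total += u
    return total
```

The matching step matters: a task whose slots are all taken may still be accepted if an earlier task can shift to another free slot, which a plain "latest free slot" rule would miss.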
4.2 Problem with only extra job utility
Now, we address the problem where each job has only extra utility and tasks have no utility. If all the tasks of a job are scheduled, the job gains its extra utility; if even one task of the job is not scheduled, the job loses the extra utility. This problem is similar to the one studied in [16], where, given n jobs with arbitrary processing times, release dates, and due dates, and with preemptive scheduling allowed, the objective is to minimize the total weight of the late jobs. The scheduling in our problem is not preemptive, but each task takes one unit of time, so preemptive scheduling and unit-length tasks lead to equivalent schedules. Therefore, minimizing the total weight of the late jobs is the same as maximizing the total utility of the scheduled jobs in our problem. The authors of [16] give a pseudo-polynomial-time Dynamic Programming (DP) algorithm, which we adopt to solve the problem. The DP formulations are represented by Eqs. (6) and (7).
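For intuition, a small brute-force reference sketch of this subproblem: enumerate job subsets, keep those whose tasks all fit under EDF, and take the best total extra utility. It is exponential in the number of jobs, so it is only a check for tiny instances, not the paper's DP; names are illustrative:

```python
from itertools import combinations

def edf_feasible(jobs):
    """Can every unit task of every job be placed?  Place tasks in
    earliest-deadline order, each in its earliest free slot."""
    tasks = sorted((e, s) for s, e, n in jobs for _ in range(n))
    used = set()
    for e, s in tasks:
        for t in range(s, e):
            if t not in used:
                used.add(t)
                break
        else:
            return False               # no free slot in the task's window
    return True

def best_extra_utility(jobs):
    """jobs: list of (s_i, e_i, n_i, sigma_i); maximize total extra
    utility over subsets whose jobs can all be fully scheduled."""
    best = 0
    idx = range(len(jobs))
    for r in range(1, len(jobs) + 1):
        for subset in combinations(idx, r):
            chosen = [jobs[i][:3] for i in subset]
            if edf_feasible(chosen):
                best = max(best, sum(jobs[i][3] for i in subset))
    return best
```

With two jobs sharing window [0,2) and needing 2 and 1 tasks, both together need 3 slots of 2, so only the higher-valued job is taken; widening the window to [0,3) admits both.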
Given a job set J, let \(s(J)=\min _{J_{i}\in J}\{s_{i}\}\) be the minimum starting time of J, \(p(J)={\sum \nolimits }_{J_{i}\in J}n_{i}\) be the total processing time of J, \(\sigma (J)=\sum _{J_{i}\in J}\sigma _{i}\) be the total extra utility of J, and c(J) be the time the last job in J is completed in an EDF schedule.
One can refer to [16] for the detailed algorithm. As in that work, assume the jobs are ordered by ending time in non-decreasing order. Let s be a starting time, and let σ be an integer representing utility. C _{ i }(s,σ) is defined as the minimum value of c(J) over feasible sets J⊆{J _{1},J _{2},…,J _{ i }} with s(J)≥s and σ(J)≥σ. If there is no such feasible set J, then C _{ i }(s,σ)=+∞. Accordingly, the final result, the maximum utility of a feasible set, is given by the largest value of σ such that C _{ n }(s _{min},σ) is finite, where s _{min}=min_{1≤i≤n }{s _{ i }}.
If job J _{ i } cannot be contained in a feasible set J, i.e., s _{ i }<s, then C _{ i }(s,σ)=C _{ i−1}(s,σ).
Otherwise, if job J _{ i } can be contained in the feasible set J, there are two cases.
In the first case, job J _{ i } starts after c(J−{J _{ i }}). Either c(J−{J _{ i }})≤s _{ i }, in which case C _{ i }(s,σ)=s _{ i }+n _{ i }; or c(J−{J _{ i }})>s _{ i } and the tasks of J−{J _{ i }} scheduled in the interval [s _{ i },c(J−{J _{ i }})] are continuous, in which case C _{ i }(s,σ)=C _{ i−1}(s,σ−σ _{ i })+n _{ i }. Thus, C _{ i }(s,σ)= max{s _{ i },C _{ i−1}(s,σ−σ _{ i })}+n _{ i }.
In the second case, job J _{ i } starts before c(J−{J _{ i }}), which indicates there is an idle time between s _{ i } and c(J−{J _{ i }}). Let J ^{′} be the last set of jobs scheduled continuously before c(J−{J _{ i }}) for J−{J _{ i }}. Then, c(J ^{′})=C _{ i−1}(s(J ^{′}),σ(J ^{′})). Let it be c(J ^{′})=C _{ i−1}(s ^{′},σ ^{′}) for simplicity.
Let P _{ i−1}(s,s ^{′},σ ^{″}) be the minimum number of tasks scheduled between s _{ i } and s ^{′}, over feasible sets J ^{″}⊆{J _{1},J _{2},…,J _{ i−1}} with s(J ^{″})≥s, c(J ^{″})≤s ^{′}, and σ(J ^{″})≥σ ^{″}. Note that it is the minimum number of tasks scheduled in the interval [s _{ i },s ^{′}], rather than [s,s ^{′}]. Then, the number of slots available for job J _{ i } between s _{ i } and s ^{′} is s ^{′}−s _{ i }−P _{ i−1}(s,s ^{′},σ−σ _{ i }−σ ^{′}). Thus, the completion time C _{ i }(s,σ)=C _{ i−1}(s ^{′},σ ^{′})+ max{0,n _{ i }−s ^{′}+s _{ i }+P _{ i−1}(s,s ^{′},σ−σ _{ i }−σ ^{′})}.
Enumerating every s ^{′} and σ ^{′}, we get C _{ i }(s,σ)= min_{s ^{′}>s,σ ^{′}<σ }{C _{ i−1}(s ^{′},σ ^{′})+ max{0,n _{ i }−s ^{′}+s _{ i }+P _{ i−1}(s,s ^{′},σ−σ _{ i }−σ ^{′})}}. The enumeration of s is over the starting times of the jobs, rather than over all possible times, which drastically reduces the computational complexity.
The computation of P _{ i−1}(s,s ^{′},σ ^{″}) is as follows. Let J ^{″}⊆{J _{1},J _{2},…,J _{ i−1}} be the set of jobs that realizes P _{ i−1}(s,s ^{′},σ ^{″}). There are two cases.
If s(J ^{″})>s, then P _{ i−1}(s,s ^{′},σ ^{″})=P _{ i−1}(s(J ^{″}),s ^{′},σ ^{″}). Enumerating every s ^{+}>s and taking the minimum, \(P_{i-1}(s,s',\sigma '')=\min _{s^{+}>s}\{P_{i-1}(s^{+},s',\sigma '')\}\).
Otherwise, if s(J ^{″})=s and the scheduling of J ^{″} is not continuous, let J ^{′} be the first set of jobs that runs continuously with s(J ^{′})=s; then the total number of tasks scheduled within [s _{ i },C _{ i−1}(s,σ(J ^{′}))] is max{0,C _{ i−1}(s,σ(J ^{′}))−s _{ i }}. We then need to compute the number of tasks that can be scheduled within [C _{ i−1}(s,σ(J ^{′})),s ^{′}]. It is easy to see that P _{ i−1}(s,s ^{′},σ ^{″}) can be represented as max{0,C _{ i−1}(s,σ(J ^{′}))−s _{ i }}+P _{ i−1}(s ^{″},s ^{′},σ ^{″}−σ(J ^{′})), where s ^{″} is the minimum starting time greater than or equal to C _{ i−1}(s,σ(J ^{′})). For simplicity, let σ ^{′}=σ(J ^{′}). Enumerating every σ ^{′}, we have P _{ i−1}(s,s ^{′},σ ^{″})= min_{0<σ ^{′}≤σ ^{″}}{max{0,C _{ i−1}(s,σ ^{′})−s _{ i }}+P _{ i−1}(s ^{″},s ^{′},σ ^{″}−σ ^{′})}.
The time and space complexities of this DP algorithm are O(n ^{3} σ ^{2}) and O(n ^{2} σ), respectively, where n is the number of jobs and σ is the sum of the utilities of the jobs. The time complexity is pseudo-polynomial because the DP formula includes an integer input σ, which can be extremely large in real systems. Therefore, we provide a theoretical approximation solution for this problem.
4.3 Approximation algorithm for single-machine problem
The APPX1 algorithm runs the EDF-based algorithm of Section 4.1 and the DP algorithm of Section 4.2 on the input and outputs the schedule with the larger total utility.
Theorem 3
The APPX1 algorithm is a 2approximation algorithm.
Proof
Let OPT be the utility obtained by an optimal solution and ALG be the utility obtained by the APPX1 algorithm. OPT can be written as OPT=u+σ, where u is the total utility of the scheduled tasks and σ is the total extra utility of the fully scheduled jobs in an optimal solution. Let u ^{′} be the total utility obtained by the EDF-based algorithm and σ ^{′} the total utility obtained by the DP algorithm. From the earlier analysis, both the EDF-based algorithm for the problem without extra utility and the DP algorithm for the problem with only extra utility are optimal. Thus, u≤u ^{′} and σ≤σ ^{′}, so OPT≤u ^{′}+σ ^{′}. Since APPX1 outputs the better of the two schedules, ALG≥ max{u ^{′},σ ^{′}}. Therefore, OPT≤u ^{′}+σ ^{′}≤2 max{u ^{′},σ ^{′}}≤2ALG. This completes the proof. □
4.4 An improvement for the DP algorithm
Recall the definition of C _{ i }(s,σ): it is the minimum value of c(J) over feasible sets J⊆{J _{1},J _{2},…,J _{ i }} with s(J)≥s and σ(J)≥σ. In the DP recursion C _{ i }(s,σ), the parameter σ is built only on the extra utility of the scheduled jobs and does not account for the utility of their tasks. This can be improved by computing C _{ i }(s,u), where the parameter is built on the total utility of the scheduled jobs. Let \(u_{i}={\sum \nolimits }_{j=1}^{n_{i}}u_{ij}+\sigma _{i}\). Using u _{ i } in place of σ _{ i } in the DP formula, C _{ i }(s,u) represents the minimum value of c(J) over feasible sets J⊆{J _{1},J _{2},…,J _{ i }} with s(J)≥s and u(J)≥u, where u(J) includes the utilities of all the tasks of the jobs in J and the extra utilities of the jobs in J. This modification considers both the extra utility of the scheduled jobs and the utility of their tasks. It can improve the result of the DP algorithm when task utility takes a large proportion compared with the extra utility. However, when the extra utility dominates (for example, in the worst case where all tasks have no utility and jobs have only extra utility), the improvement is small. Using this modified DP in algorithm APPX1 cannot improve the approximation ratio, but it may improve the results in many scenarios.
5 Algorithm design for multi-machine problem
The solution for the multi-machine problem is similar to that for the single-machine problem. It also consists of two steps. The first step schedules the tasks without considering the extra utility of the jobs. The second step schedules the jobs considering only the extra utility. Finally, the better schedule of the two steps is selected.
5.1 Problem without extra utility
5.2 Problem with only extra utility
We design an algorithm for the multimachine problem with only extra utility by adopting the idea of the DP algorithm for the singlemachine problem. Given job set J, let \(s(J)=\min _{J_{i}\in J}\{s_{i}\}\) be the minimum starting time of J, \(p(J)={\sum \nolimits }_{J_{i}\in J}n_{i}\) be the total processing time of J, and \(\sigma (J)={\sum \nolimits }_{J_{i}\in J}\sigma _{i}\) be the total extra utility of J.
Define c(J)=〈t,j〉 as a 2-tuple, where t is the time at which the last job in J is completed in an EDF schedule and j is the number of machines used at time t.
Define \(\langle t, j\rangle + p = \left \langle t+\left \lfloor \frac {j+p}{m} \right \rfloor, (j+p)\bmod m\right \rangle \), which represents scheduling p tasks continuously from time t, starting after machine M _{ j }. Because there are m machines, m tasks can be scheduled in each slot. We focus on how many machines are used in slot t rather than which machines are used; without loss of generality, 〈t,j〉 represents that machines M _{1} to M _{ j } are used in slot t. Therefore, the ending position after scheduling p tasks continuously from time t and after machine M _{ j } is \(\left\langle t+\left \lfloor \frac {j+p}{m} \right \rfloor, (j+p)\bmod m\right\rangle\). In this paper, 〈t+1,0〉 is identified with 〈t,m〉, both representing that all m machines in slot t are used.
Define 〈t,j〉<〈t ^{′},j ^{′}〉 if t<t ^{′} or t=t ^{′} and j<j ^{′}. It represents that 〈t,j〉 is earlier than 〈t ^{′},j ^{′}〉. 〈t,j〉=〈t ^{′},j ^{′}〉 only if t=t ^{′} and j=j ^{′}.
We can regard the scheduling process as filling tasks into a 2-dimensional array from top to bottom and from left to right. Given a starting position 〈s,j〉, tasks are placed at positions 〈s,j+1〉 to 〈s,m〉, then 〈s+1,1〉 to 〈s+1,m〉, then 〈s+2,1〉 to 〈s+2,m〉, and so on. Therefore, 〈t ^{′},j ^{′}〉−〈t,j〉 can be regarded as the number of tasks that can be placed from position 〈t,j〉+1 through 〈t ^{′},j ^{′}〉.
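A small sketch of this position arithmetic, assuming the floor term reads ⌊(j+p)/m⌋ so that it agrees with the convention 〈t+1,0〉=〈t,m〉 (function names are illustrative):

```python
def tuple_add(pos, p, m):
    """Advance position <t, j> by p unit tasks on m machines.

    pos = (t, j): machines 1..j of slot t are already used;
    tasks fill the rest of slot t, then slot t+1, and so on.
    """
    t, j = pos
    return (t + (j + p) // m, (j + p) % m)

def tuple_sub(hi, lo, m):
    """<t', j'> - <t, j>: number of unit tasks that fit strictly
    between the two positions when slots fill machine by machine."""
    (t2, j2), (t1, j1) = hi, lo
    return (t2 - t1) * m + (j2 - j1)
```

For m = 3: advancing 〈3,1〉 by 2 tasks fills slot 3 completely, giving 〈4,0〉, while advancing 〈3,0〉 by 2 gives 〈3,2〉; and 〈4,1〉−〈3,2〉 = 2 tasks.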
The dynamic programming recursion formula is shown in Eqs. (12) and (13). We now introduce the recursion in detail. C _{ i }(〈s,0〉,σ), whose value is a 2-tuple, is the smallest ending position 〈t,x〉 at which a feasible set J⊆{J _{1},J _{2},…,J _{ i }} can complete, subject to σ(J)≥σ and s(J)≥s.
If J _{ i }∉J, i.e., J _{ i } cannot be contained in a feasible set J satisfying the constraint, C _{ i }(〈s,0〉,σ)=C _{ i−1}(〈s,0〉,σ).
Let us consider the situation J _{ i }∈J, where J _{ i } is contained in a feasible set J satisfying the constraint. There are two cases as follows.
Case 1: Job J _{ i } clearly starts no earlier than 〈s _{ i },0〉. If c(J−{J _{ i }})≤〈s _{ i },0〉, or if c(J−{J _{ i }})>〈s _{ i },0〉 and J−{J _{ i }} is scheduled continuously from 〈s _{ i },0〉 to c(J−{J _{ i }}), then C _{ i }(〈s,0〉,σ)= max{〈s _{ i },0〉,C _{ i−1}(〈s,0〉,σ−σ _{ i })}+n _{ i }.
Case 2: Again, job J _{ i } starts no earlier than 〈s _{ i },0〉, but the tasks scheduled between 〈s _{ i },0〉 and c(J−{J _{ i }}) are not continuous. That is, some tasks of job J _{ i } can be scattered between 〈s _{ i },0〉 and c(J−{J _{ i }}) rather than placed after c(J−{J _{ i }}). As stated in [16], the EDF method schedules tasks as periods of continuous processing; a period of continuous processing is called a block. Consider the scheduling of J−{J _{ i }} in the DP algorithm. Let the starting time of the last block in J−{J _{ i }} be 〈s ^{′},0〉 and its utility be σ ^{′}; then the ending position of the last block is C _{ i−1}(〈s ^{′},0〉,σ ^{′}). Let P _{ i−1}(〈s,0〉,〈s ^{′},0〉,σ ^{″}) be the minimum amount of processing done between 〈s _{ i },0〉 and 〈s ^{′},0〉, over feasible sets J ^{″}⊆{J _{1},J _{2},…,J _{ i−1}} with s(J ^{″})≥s, c(J ^{″})≤s ^{′}, and σ(J ^{″})≥σ ^{″}; then the number of slots available for job J _{ i } between 〈s _{ i },0〉 and 〈s ^{′},0〉 is
〈s ^{′},0〉−〈s _{ i },0〉−P _{ i−1}(〈s,0〉,〈s ^{′},0〉,σ−σ _{ i }−σ ^{′}).
Then, the completion time C _{ i }(〈s,0〉,σ) can be represented as
C _{ i−1}(〈s ^{′},0〉,σ ^{′})+ max{0,n _{ i }−〈s ^{′},0〉+〈s _{ i },0〉+P _{ i−1}(〈s,0〉,〈s ^{′},0〉,σ−σ _{ i }−σ ^{′})}.
Enumerating every s ^{′} and σ ^{′}, we can get
C _{ i }(〈s,0〉,σ)= min_{〈s ^{′},0〉>〈s,0〉,σ ^{′}<σ }{C _{ i−1}(〈s ^{′},0〉,σ ^{′})+ max{0,n _{ i }−〈s ^{′},0〉+〈s _{ i },0〉+P _{ i−1}(〈s,0〉,〈s ^{′},0〉,σ−σ _{ i }−σ ^{′})}}.
We now introduce how to realize the computation of P _{ i−1}(〈s,0〉,〈s ^{′},0〉,σ ^{″}). Recall its definition: it is the minimum number of tasks scheduled between 〈s _{ i },0〉 and 〈s ^{′},0〉 satisfying the utility constraint. Assume P _{ i−1}(〈s,0〉,〈s ^{′},0〉,σ ^{″}) is achieved by a non-empty set J ^{″}⊆{J _{1},J _{2},…,J _{ i−1}}.
If 〈s(J ^{″}),0〉>〈s,0〉, then P _{ i−1}(〈s,0〉,〈s ^{′},0〉,σ ^{″})=P _{ i−1}(〈s(J ^{″}),0〉,〈s ^{′},0〉,σ ^{″}). Enumerating every s ^{+}>s and taking the minimum, \(P_{i-1}(\langle s,0\rangle,\langle s', 0\rangle,\sigma '')=\min _{s^{+}>s}\{P_{i-1}(\langle s^{+},0\rangle,\langle s', 0\rangle,\sigma '')\}\).
Otherwise, 〈s(J ^{″}),0〉=〈s,0〉. Let the first block in the solution be J ^{′} and the total extra utility of the first block be σ ^{′}; then the ending position of the first block is C _{ i−1}(〈s,0〉,σ ^{′}). Therefore, P _{ i−1}(〈s,0〉,〈s ^{′},0〉,σ ^{″}) can be represented as max{0,C _{ i−1}(〈s,0〉,σ ^{′})−〈s _{ i },0〉}+P _{ i−1}(〈s ^{″},0〉,〈s ^{′},0〉,σ ^{″}−σ ^{′}), where s ^{″} is the smallest starting time with 〈s ^{″},0〉≥C _{ i−1}(〈s,0〉,σ ^{′}). Enumerating every σ ^{′}, we obtain P _{ i−1}(〈s,0〉,〈s ^{′},0〉,σ ^{″})= min_{0<σ ^{′}≤σ ^{″}}{max{0,C _{ i−1}(〈s,0〉,σ ^{′})−〈s _{ i },0〉}+P _{ i−1}(〈s ^{″},0〉,〈s ^{′},0〉,σ ^{″}−σ ^{′})}.
An improvement of the DP formula for the single-machine problem can also be applied to the multi-machine problem. It cannot improve the worst-case bound, which is attained when the tasks have no utility, but it improves the result for many inputs.
5.3 Approximation algorithm for the multi-machine problem
Theorem 4
The APPX-m algorithm is a 2-approximation algorithm.
Proof
Let OPT be the utility obtained by an optimal solution and ALG be the utility obtained by the APPX-m algorithm. OPT can be represented as OPT=u+σ, where u is the total utility of the scheduled tasks and σ is the total extra utility of the entirely scheduled jobs in an optimal solution. Let u ^{′} be the total task utility derived by the EDF-multi algorithm and σ ^{′} be the total extra utility obtained by the dynamic programming algorithm. From the earlier analysis, both the EDF-multi algorithm for the problem without extra utility and the dynamic programming algorithm for the problem with only extra utility are optimal. Thus, u≤u ^{′} and σ≤σ ^{′}, and hence OPT≤u ^{′}+σ ^{′}. Since APPX-m returns the better of the two solutions, ALG≥ max{u ^{′},σ ^{′}}≥(u ^{′}+σ ^{′})/2. Therefore, OPT≤2ALG, which completes the proof. □
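The combining step behind the proof can be illustrated with a toy sketch. The function name and the numbers below are our own illustrative assumptions; the two inputs stand for the outputs of the two optimal subroutines described above.

```python
def combine(u_prime, sigma_prime):
    """Return the utility of the better of the two partial solutions.

    Since OPT = u + sigma <= u' + sigma' <= 2 * max(u', sigma'),
    keeping the better solution yields a 2-approximation.
    """
    return max(u_prime, sigma_prime)

# Toy numbers: u' = 7 (task utility only), sigma' = 5 (extra utility only).
alg = combine(7, 5)
opt_upper = 7 + 5            # OPT can never exceed u' + sigma'
assert opt_upper <= 2 * alg  # the 2-approximation guarantee holds
```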
6 Simulation results
This section presents the simulation results. Because the computational complexity of our algorithms, especially the DP algorithm, is high, the simulation inputs are kept small. The number of machines is at most 5, the number of applications is at most 100, and the number of tasks per application is at most 5. The utility of each task is a random value, and the starting time, ending time, and extra utility of each application are also randomly generated.
Because the optimal result is hard to compute, we use an upper bound in its place. The upper bound is computed by dividing the extra utility of each application among its tasks in proportion to their utilities; that is, a task with high utility is assigned a high share of the extra utility.
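The proportional redistribution used for the upper bound can be written as a short sketch (the function name and data layout are illustrative assumptions). A fully scheduled job contributes exactly its task utility plus its extra utility under this inflation, while a partially scheduled job contributes at least as much as it would in reality, so solving the task-only problem on the inflated utilities bounds the optimum from above.

```python
def inflate_tasks(task_utils, extra):
    """Split a job's extra utility across its tasks in proportion to
    each task's own utility, as used for the upper-bound computation."""
    total = sum(task_utils)
    return [u + extra * u / total for u in task_utils]

# A job with task utilities [1, 3] and extra utility 8:
# the tasks receive shares 2 and 6, becoming 3.0 and 9.0.
print(inflate_tasks([1, 3], 8))  # → [3.0, 9.0]
```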
7 Conclusions
This paper proposes a class of algorithms to solve the problem of maximizing utility for job scheduling where each job consists of multiple tasks. Different from existing works, which consider either job utility or task utility individually, this paper considers both simultaneously by introducing extra utility for every job. We analyze the complexity of the problem and discuss two subproblems: scheduling jobs on a single machine and scheduling jobs on multiple machines. We design a 2-approximation algorithm for each subproblem and present the corresponding approximation proofs. Although the time complexity is pseudo-polynomial, we provide theoretical insight into this problem.
Declarations
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under grant nos. 61502099, 61632008, 61320106007, 61502100, 61532013, 61602084, and 61672154, Jiangsu Provincial Natural Science Foundation of China under grant no. BK20150637, Jiangsu Provincial Key Laboratory of Network and Information Security under grant no. BM2003201, Key Laboratory of Computer Network and Information Integration of Ministry of Education of China under grant no. 93K9, and Collaborative Innovation Center of Novel Software Technology and Industrialization.
Authors’ contributions
XF and WW conceived and designed the study. XF performed the experiments. XF and YL wrote the paper. JL, HG, and YL reviewed and edited the manuscript. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
 SH Bokhari, Assignment Problems in Parallel and Distributed Computing, vol. 32 (Springer US, New York, 2012).
 M Armbrust, A Fox, R Griffith, AD Joseph, R Katz, A Konwinski, et al., A view of cloud computing. Commun. ACM. 53(4), 50–58 (2010). Available from: http://doi.acm.org/10.1145/1721654.1721672.
 Q Zhang, L Cheng, R Boutaba, Cloud computing: state-of-the-art and research challenges. J. Int. Serv. Appl. 1(1), 7–18 (2010). Available from: http://dx.doi.org/10.1007/s13174-010-0007-6.
 V Sharma, U Mukherji, V Joseph, S Gupta, Optimal energy management policies for energy harvesting sensor nodes. IEEE Trans. Wirel. Commun. 9(4), 1326–1336 (2010).
 C Lefurgy, K Rajamani, F Rawson, W Felter, M Kistler, TW Keller, Energy management for commercial servers. Computer. 36(12), 39–48 (2003).
 RL Graham, EL Lawler, JK Lenstra, AHGR Kan, Optimization and approximation in deterministic sequencing and scheduling: a survey, in Discrete Optimization II, vol. 5 of Annals of Discrete Mathematics, ed. by PL Hammer, EL Johnson, BH Korte (Elsevier, 1979), pp. 287–326. Available from: http://www.sciencedirect.com/science/article/pii/S016750600870356X. Accessed 29 Apr 2008.
 D Applegate, W Cook, A computational study of the job-shop scheduling problem. ORSA J. Comput. 3(2), 149–156 (1991).
 EL Lawler, JK Lenstra, AHGR Kan, DB Shmoys, Sequencing and scheduling: algorithms and complexity, in Logistics of Production and Inventory, vol. 4 of Handbooks in Operations Research and Management Science, chapter 9 (Elsevier, 1993), pp. 445–522. Available from: http://www.sciencedirect.com/science/article/pii/S0927050705801896.
 J Blazewicz, JK Lenstra, AHGR Kan, Scheduling subject to resource constraints: classification and complexity. Discret. Appl. Math. 5(1), 11–24 (1983). Available from: http://www.sciencedirect.com/science/article/pii/0166218X83900124. Accessed 9 Sept 2002.
 Y Zheng, B Ji, N Shroff, P Sinha, Forget the deadline: scheduling interactive applications in data centers, in 2015 IEEE 8th International Conference on Cloud Computing (IEEE, New York, 2015), pp. 293–300.
 MR Garey, DS Johnson, Two-processor scheduling with start-times and deadlines. SIAM J. Comput. 6(3), 416–426 (1977).
 MR Garey, DS Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness (W.H. Freeman & Co, New York, 1979).
 A Bar-Noy, S Guha, JS Naor, B Schieber, Approximating the throughput of multiple machines under real-time scheduling, in Proceedings of the Thirty-first Annual ACM Symposium on Theory of Computing, STOC ’99 (ACM, New York, 1999), pp. 622–631. Available from: http://doi.acm.org/10.1145/301250.301420.
 A Bar-Noy, R Bar-Yehuda, A Freund, J (Seffi) Naor, B Schieber, A unified approach to approximating resource allocation and scheduling. J. ACM. 48(5), 1069–1090 (2001). Available from: http://doi.acm.org/10.1145/502102.502107.
 G Calinescu, A Chakrabarti, H Karloff, Y Rabani, An improved approximation algorithm for resource allocation. ACM Trans. Algorithms. 7(4), 48:1–48:7 (2011). Available from: http://doi.acm.org/10.1145/2000807.2000816.
 EL Lawler, A dynamic programming algorithm for preemptive scheduling of a single machine to minimize the number of late jobs. Ann. Oper. Res. 26(1–4), 125–133 (1991). Available from: http://dx.doi.org/10.1007/BF02248588.
 G Steiner, Models and algorithms for planning and scheduling problems minimizing the number of tardy jobs with precedence constraints and agreeable due dates. Discret. Appl. Math. 72(1), 167–177 (1997). Available from: http://www.sciencedirect.com/science/article/pii/S0166218X96000431.
 H Kise, T Ibaraki, H Mine, A solvable case of the one-machine scheduling problem with ready and due times. Oper. Res. 26(1), 121–126 (1978). Available from: http://dx.doi.org/10.1287/opre.26.1.121.
 V Bafna, B Narayanan, R Ravi, Non-overlapping Local Alignments, Weighted Independent Sets of Axis-Parallel Rectangles (Center for Discrete Mathematics & Theoretical Computer Science, Princeton, 1995).
 P Berman, B DasGupta, S Muthukrishnan, Simple approximation algorithm for nonoverlapping local alignments, in Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’02 (Society for Industrial and Applied Mathematics, Philadelphia, 2002), pp. 677–678. Available from: http://dl.acm.org/citation.cfm?id=545381.545471.
 R Bar-Yehuda, MM Halldórsson, JS Naor, H Shachnai, I Shapira, Scheduling split intervals, in Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’02 (Society for Industrial and Applied Mathematics, Philadelphia, 2002), pp. 732–741. Available from: http://dl.acm.org/citation.cfm?id=545381.545479.
 R Bar-Yehuda, D Rawitz, Using fractional primal-dual to schedule split intervals with demands. Discret. Optim. 3(4), 275–287 (2006). Available from: http://dx.doi.org/10.1016/j.disopt.2006.05.010.
 VT Chakaravarthy, A Roy Choudhury, S Roy, Y Sabharwal, Scheduling jobs with multiple non-uniform tasks, in Proceedings of the 19th International Conference on Parallel Processing, Euro-Par’13 (Springer-Verlag, Berlin, Heidelberg, 2013), pp. 90–101. Available from: http://dx.doi.org/10.1007/978-3-642-40047-6_12.