Non-clairvoyant scheduling for weighted flow time and energy on speed bounded processors

We consider the online scheduling problem of minimizing total weighted flow time plus energy on a processor that can scale its speed dynamically between 0 and some maximum speed T. In the past few years this problem has been studied extensively under the clairvoyant setting, which requires the size of a job to be known when it is released [1, 4, 5, 8, 12, 15, 16, 17]. For the non-clairvoyant setting, despite its practical importance, progress has been relatively limited. Only recently has an online algorithm, LAPS, been shown to be O(1)-competitive for minimizing (unweighted) flow time plus energy in the infinite speed model (i.e., T = ∞) [10, 11]. This paper makes two contributions to non-clairvoyant scheduling. First, we resolve the open problem of whether the unweighted result of LAPS can be extended to the more realistic model with bounded maximum speed. Second, we show that another non-clairvoyant algorithm, WRR, is O(1)-competitive when weighted flow time is concerned.


Introduction
Energy consumption has become an important concern for the design of modern microprocessors. Manufacturers like Intel and IBM are now producing processors that can support dynamic speed scaling, which allows operating systems to manage power by scaling the processor speed dynamically. Running jobs slower saves energy, yet takes longer. Taking speed scaling and energy usage into consideration makes job scheduling more complicated than before. The challenge arises from the conflicting objectives of optimizing some quality of service (QoS) of the schedule and minimizing the energy usage.
The theoretical study of speed scaling was initiated by Yao, Demers and Shenker [24]. They considered a model where a processor can vary its speed s between 0 and infinity dynamically, and it consumes energy at the rate $s^\alpha$, where α is a constant (commonly believed to be 2 or 3 [9,22] for CMOS-based processors). Under this infinite speed model, Yao et al. studied deadline scheduling and gave an online algorithm that is O(1)-competitive for minimizing the energy for completing all jobs. Algorithms with better ratios were later obtained by Bansal, Kimbrel and Pruhs [7] and Bansal, Chan, Pruhs and Katz [6]. The best ratio now is $4^\alpha/(2\sqrt{e\alpha})$ for general α and is about 6.7 when α = 3. The infinite speed model is a convenient model to work with. Among others, it allows an online algorithm to catch up arbitrarily fast and recover from any over-conservative decision on speed. However, this is not a practical model. Recently, Chan et al. [10] and Bansal et al. [4] have obtained several interesting results on speed scaling for a speed bounded processor, where the maximum speed T is a fixed constant.
Flow and energy. The study of speed scaling and energy-efficient scheduling goes beyond deadline scheduling. When scheduling jobs without deadlines, a commonly used QoS measure is the total flow time of jobs. The flow time (or simply the flow) of a job is the time elapsed since the job is released until it is completed. Note that job preemption is allowed, and a preempted job can be resumed later at the point of preemption. Assuming jobs are equally important, it is natural to find a schedule that minimizes the total flow time (which is also referred to as minimizing the total/average response time in the literature). When jobs have varying importance or weights, it is more meaningful to minimize the total weighted flow time.
Minimizing flow time and minimizing energy usage are orthogonal objectives. To understand their tradeoff, Albers and Fujiwara [1] initiated the study of minimizing a linear combination of flow and energy. The intuition is that, from an economic viewpoint, users are willing to pay a certain number (say, ρ) of units of energy to reduce one unit of flow time. By changing the units of time and energy, one can further assume that ρ = 1 and thus would like to minimize flow time plus energy, or in general, weighted flow time plus energy.
Clairvoyant scheduling for flow plus energy. The problem of minimizing flow plus energy has attracted a lot of attention [1,4,5,8,12,18,19,20]. These works mainly focus on the clairvoyant setting, which assumes that the size of a job is known when the job is released. The work of Albers and Fujiwara [1] focused on jobs of unit size. Bansal, Pruhs and Stein [8] were the first to consider jobs of arbitrary sizes. In the infinite speed model, they gave an algorithm that is $O((\alpha/\ln\alpha)^2)$-competitive for minimizing weighted flow plus energy. Bansal, Chan, Lam and Lee [4] later adapted the BPS algorithm to the bounded speed model. The competitive ratio remains $O((\alpha/\ln\alpha)^2)$ when the algorithm uses a processor with maximum speed $(1 + \frac{\ln\alpha - \ln\ln\alpha}{\alpha})T$. Very recently, Bansal, Chan and Pruhs [5] improved the analysis of BPS; their work implies that BPS is indeed $O(\alpha/\ln\alpha)$-competitive, when the maximum speed is relaxed as before. It is worth mentioning that the recent lower bound result on weighted flow [3] implies that without relaxing the maximum speed, no online algorithm can be constant competitive (in terms of α) for weighted flow plus energy. However, the extra speed requirement does not apply to unweighted flow.
A drawback of the BPS algorithm is that it scales the speed according to the fraction of unfinished work and thus keeps changing the speed continuously over time. It is practically more desirable to have an algorithm that changes the speed at discrete times (say, at job arrival or completion). Recently, focusing on (unweighted) flow plus energy, Lam et al. [18] studied another speed scaling algorithm AJC (Active Job Count), which scales the speed as a function of the number of active jobs (i.e., unfinished jobs). In other words, AJC changes the speed only when a job arrives or finishes. AJC, when coupled with the job selection algorithm SRPT (shortest remaining processing time), is indeed $O(\alpha/\log\alpha)$-competitive for (unweighted) flow plus energy. This result holds in both the infinite and bounded speed models. For the latter, unlike BPS, AJC does not demand relaxation of maximum speed. Recently, Bansal, Chan and Pruhs [5] adapted AJC and gave a tighter analysis; they showed that the competitive ratio is 3 for minimizing (unweighted) flow plus energy (even when α is as small as 3, this result is still better than the $O(\alpha/\log\alpha)$ bound in [18], which is equal to 3.25). Again, no extra maximum speed is needed. More recently, the analysis was further tightened by [2] and the competitive ratio was reduced to 2.
For weighted flow plus energy, it has remained an open problem whether AJC (or any speed scaling algorithm that changes the speed at discrete times) can lead to a competitive guarantee.
Non-clairvoyant scheduling for flow plus energy. All of the above results assume clairvoyance. In the non-clairvoyant setting, the size of a job is not known when the job is released; it is only known when the job is completed. This is a natural assumption from the viewpoint of operating systems. Non-clairvoyant flow time scheduling (on a fixed-speed processor) has been an interesting problem in itself (e.g., [17,21]). Chan et al. [12] initiated the study of non-clairvoyant speed scaling. Under the infinite speed model, they considered an algorithm LAPS (Latest Arrival Processor Sharing), which scales the speed as AJC does and selects some of the most recently released jobs to share the processor. LAPS is shown to be $4\alpha^3(1 + (1 + 3/\alpha)^\alpha) = O(\alpha^3)$-competitive for (unweighted) flow plus energy. Furthermore, they showed that no algorithm can be $O(\alpha^{1/3-\epsilon})$-competitive for any ε > 0, illustrating that the non-clairvoyant setting is more difficult than the clairvoyant setting. Recently, Chan et al. [11] improved the analysis of LAPS and reduced the competitive ratio to $O(\alpha^2/\log\alpha)$. Yet, not much is known for the bounded speed model, let alone weighted flow plus energy.
Our contributions. This paper considers non-clairvoyant scheduling on a processor whose maximum speed T is a fixed constant. In the first part, we adapt the algorithm LAPS to run on such a processor and show that it is $8\alpha^2 = O(\alpha^2)$-competitive for (unweighted) flow plus energy when the maximum speed is relaxed to $(1 + \frac{1}{\alpha-1})T$. Note that unlike the clairvoyant setting, even for unweighted jobs, extra maximum speed is necessary to achieve constant competitiveness. This follows from the lower bound result on non-clairvoyant (fixed-speed) scheduling by Motwani, Phillips and Torng [21].
In the clairvoyant setting, existing results on the bounded speed model take advantage of a local property that the online algorithm concerned accumulates at most $O(T^\alpha)$ jobs more than the optimal offline algorithm [4,5,18]. With this local property, it is relatively easy to adapt the analysis in the infinite speed model. In the non-clairvoyant setting, such a local property is no longer valid for algorithms like LAPS. To analyze these algorithms in the bounded speed model, we exploit a more "global" accounting of the rate of change of flow plus energy. Instead of using the above property, we integrate the maximum speed constraint into the potential analysis.
The second result of this paper concerns the more difficult general case where jobs have arbitrary weights. Under the bounded speed model, we give the first competitive algorithm, called WRR (Weighted Round Robin), for weighted flow plus energy; the competitive ratio is $O(3^\alpha/\epsilon)$ when using a processor with maximum speed (3 + ε)T, where 0 < ε ≤ 3/α. Motivated by AJC, WRR uses a generalized AJC for speed scaling; i.e., the speed is a function of the total weight (instead of the number) of active jobs. Recall that all existing clairvoyant results [4,8] on weighted flow plus energy are based on the BPS algorithm, which scales the speed continuously. Our result on WRR gives, as a by-product, the first competitive clairvoyant algorithm for weighted flow plus energy that changes the speed discretely, but the competitive ratio is considerably worse than that of BPS.

Definitions and Notations
We study job scheduling on a single processor. Jobs arrive over time in an online fashion; we have no information about a job before it arrives. For any job j, we use r(j) and p(j) to denote its release time and work requirement (or size). In some cases, each job j may have a weight w(j). We consider the non-clairvoyant setting, in which when a job j arrives, we only know its weight w(j) (if any) but not its size p(j). The size p(j) becomes known only when j is completed. At any time, the processor can scale its speed between 0 and a maximum speed T. When running at speed s, the processor processes s units of work per unit time and consumes energy at the rate $s^\alpha$, where α > 1 is a fixed constant. Preemption is allowed; a job can be preempted and later resumed at the point of preemption without any penalty.
Flow and energy. Consider any job set I and some schedule S of I. At any time t, for any job j, we let q(j,t) be the remaining work of j at t. A job j is an active job if it has been released but not yet completed, i.e., r(j) ≤ t and q(j,t) > 0. The flow F(j) of a job j is the time elapsed from the arrival of j until its completion. The total flow F is equal to $\sum_{j\in I} F(j)$, which is equivalent to $\int_0^\infty n(t)\,dt$, where n(t) is the total number of active jobs at time t. The energy usage E is $\int_0^\infty (s(t))^\alpha\,dt$, where s(t) is the processor speed at time t. The objective is to minimize the sum of total flow and energy usage, denoted by G = F + E.
In general, when jobs have different weights, we generalize the notion of total flow as follows. The total weighted flow F is equal to $\sum_{j\in I} w(j)F(j)$, or equivalently, $\int_0^\infty w(t)\,dt$, where w(t) is the total weight of active jobs at time t. The objective becomes minimizing total weighted flow plus energy, denoted by G = F + E.
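As a concrete reading of these definitions, the following minimal Python sketch evaluates F, E and G for a schedule that is piecewise constant in time. The segment representation and the name `flow_plus_energy` are assumptions made for illustration and do not come from the paper; with the active weight replaced by the number of active jobs, the same code gives the unweighted objective.

```python
from typing import List, Tuple

# Hypothetical representation: (duration, speed, total weight of active jobs),
# all three assumed constant within the segment.
Segment = Tuple[float, float, float]


def flow_plus_energy(segments: List[Segment], alpha: float) -> Tuple[float, float, float]:
    """Return (F, E, G) with F = integral of w(t) dt and E = integral of s(t)^alpha dt."""
    # Weighted flow accumulates at rate w(t), the total weight of active jobs.
    flow = sum(duration * weight for duration, _speed, weight in segments)
    # Energy accumulates at rate s(t)^alpha.
    energy = sum(duration * speed ** alpha for duration, speed, _weight in segments)
    return flow, energy, flow + energy


if __name__ == "__main__":
    # Two segments: 2 time units at speed 1 with weight 1, then 1 time unit at speed 2 with weight 3.
    print(flow_plus_energy([(2.0, 1.0, 1.0), (1.0, 2.0, 3.0)], alpha=3.0))
```

The sketch only evaluates a given schedule; it does not decide speeds or job order.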

Minimizing unweighted flow plus energy with bounded maximum speed
In this section, we consider jobs without weights and aim at minimizing total flow plus energy in the bounded speed model. As mentioned in the introduction, no online algorithm can achieve constant competitiveness when its maximum speed is the same as that of the optimal offline algorithm OPT (which is denoted T below); thus, we consider allowing the online algorithm to use slightly higher maximum speed. We adapt the non-clairvoyant algorithm LAPS (Latest Arrival Processor Sharing), which was first given in [12] for the infinite speed model. When using a processor with maximum speed (1 + δ)T, where $\delta = \frac{1}{\alpha-1}$, LAPS is $8\alpha^2$-competitive for flow plus energy. Below is the definition of LAPS, which assumes using a processor with maximum speed (1 + δ)T for some δ > 0.
Algorithm LAPS. Let 0 < β ≤ 1 be any real. Consider any time t. The processor speed is set to $s_a(t) = \min((n_a(t))^{1/\alpha}, (1+\delta)T)$, where $n_a(t)$ is the total number of active jobs at t. The processor processes the $\lceil\beta n_a(t)\rceil$ active jobs with the latest release times (ties are broken by job ids) by splitting the processor speed equally among these jobs.
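The decision LAPS makes at a single time instant can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `laps_step` and the `(release_time, job_id)` representation are hypothetical, the number of selected jobs is taken as β·n_a(t) rounded up to an integer, and ties in release time are broken in favour of the larger job id (the text does not fix the direction).

```python
import math
from typing import List, Tuple


def laps_step(active: List[Tuple[float, int]], alpha: float, beta: float,
              delta: float, T: float) -> Tuple[float, List[int], float]:
    """One LAPS decision for the current set of active jobs.

    active: list of (release_time, job_id) pairs for the active jobs.
    Returns (processor speed, ids of the jobs sharing the processor, speed per job).
    """
    n = len(active)
    if n == 0:
        return 0.0, [], 0.0
    # Speed rule: n^(1/alpha), capped at the relaxed maximum speed (1 + delta) * T.
    speed = min(n ** (1.0 / alpha), (1.0 + delta) * T)
    # Assumption: beta * n is rounded up so that at least one job is always served.
    k = math.ceil(beta * n)
    # Latest release times first; assumed tie-break: larger job id counts as later.
    latest = sorted(active, reverse=True)[:k]
    return speed, [job_id for _release, job_id in latest], speed / k


if __name__ == "__main__":
    jobs = [(0.0, 1), (1.0, 2), (1.0, 3), (2.0, 4), (3.0, 5)]
    print(laps_step(jobs, alpha=2.0, beta=0.5, delta=1.0, T=10.0))
```

Because the algorithm is non-clairvoyant, the sketch never looks at job sizes; only the count and release order of active jobs are used.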
We compare LAPS with an optimal offline algorithm OPT using a processor with maximum speed T. Our main result is the following theorem.
To prove Theorem 3.1, our analysis exploits amortization and potential functions (e.g., [8,10]). Let $G_a(t)$ and $G_o(t)$ denote the flow plus energy incurred up to time t by LAPS and OPT, respectively. We drop the parameter t when it is clear that t is the current time. To show that LAPS is c-competitive for some constant c ≥ 1, it suffices to define a potential function Φ(t) such that the following conditions hold: (i) Φ = 0 before any job is released and after all jobs are completed; (ii) Φ is a continuous function except at some discrete times (e.g., when a job arrives, or when a job is completed by LAPS or OPT), and Φ does not increase at such times; (iii) at any other time, $\frac{dG_a(t)}{dt} + \gamma\frac{d\Phi(t)}{dt} \le c\cdot\frac{dG_o(t)}{dt}$, where γ is a positive constant (to be set to 4α). Condition (iii) is also known as the running condition. The sufficiency of these conditions for proving c-competitiveness follows from integrating them over time.
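The integration step can be spelled out as follows. This is a sketch of the standard amortization argument, taking the running condition in the form $\frac{dG_a}{dt} + \gamma\frac{d\Phi}{dt} \le c\cdot\frac{dG_o}{dt}$ (the form used when the running condition is proved later in this section). The symbol $\Delta_k$, denoting the jump of Φ at the k-th discrete time, is introduced here for illustration only; $G_a$ and $G_o$ on the left and right denote the final totals.

```latex
\begin{align*}
\int_0^{\infty}\Big(\frac{dG_a}{dt} + \gamma\,\frac{d\Phi}{dt}\Big)\,dt
  &\le c\int_0^{\infty}\frac{dG_o}{dt}\,dt
  && \text{(integrate (iii) between consecutive discrete times)}\\
G_a + \gamma\Big(\Phi(\infty) - \Phi(0) - \sum_{k}\Delta_k\Big)
  &\le c\,G_o
  && \text{($\Delta_k$: jump of $\Phi$ at the $k$-th discrete time)}\\
G_a &\le c\,G_o
  && \text{(by (i), $\Phi(0)=\Phi(\infty)=0$; by (ii), $\Delta_k\le 0$; $\gamma>0$)}
\end{align*}
```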
Potential function Φ(t). Consider any time t. Let $n_a(t)$ and $n_o(t)$ be the number of active jobs in LAPS and OPT, respectively. Let $j_1, j_2, \ldots, j_{n_a(t)}$ be all the active jobs in LAPS, which are ordered by release times such that $r(j_1) \le r(j_2) \le \cdots \le r(j_{n_a(t)})$ (ties are broken by job ids). For any job j, let $q_a(j,t)$ and $q_o(j,t)$ be the remaining work of job j in LAPS and OPT, respectively. For each $j_i$, let $x_i = \max\{q_a(j_i,t) - q_o(j_i,t), 0\}$, which is the amount of work of $j_i$ in LAPS that is lagging behind OPT. We call a job $j_i$ lagging if $x_i > 0$. The potential function Φ(t) is defined as follows.
We call $c_i$ the coefficient of $j_i$. Note that $c_i$ is monotonically increasing with i.
We first check Conditions (i) and (ii). Condition (i) holds since Φ = 0 before any job is released and after all jobs are completed. Now we check Condition (ii). When a job j arrives, j must be non-lagging and the coefficients of all existing jobs of LAPS remain the same, so Φ does not change. When OPT completes a job, Φ does not change. When LAPS completes a job, the coefficient of any other job either stays the same or decreases, so Φ does not increase.
It remains to check the running condition (Condition (iii)). Consider any time t when Φ does not have discrete change. Let $s_a$ and $s_o$ be the current speeds of LAPS and OPT, respectively. Then $\frac{dG_a}{dt} = n_a + s_a^\alpha$ and $\frac{dG_o}{dt} = n_o + s_o^\alpha$. Let ℓ be the number of lagging jobs that LAPS is processing. Note that $\ell \le \lceil\beta n_a\rceil$. For convenience, we further define another real number φ ≤ β such that $(\beta - \varphi)n_a$ is an integer equal to $\lceil\beta n_a\rceil - \ell$. Note that φ can be less than zero if ℓ = 0.
To bound the rate of change of Φ, we consider how Φ changes in an infinitesimal amount of time (from t to t + dt), first due to LAPS only (Lemma 3.2), and then due to OPT (Lemma 3.3). We denote the rate of change of Φ due to LAPS and OPT by $\frac{d\Phi_a}{dt}$ and $\frac{d\Phi_o}{dt}$, respectively. Note that $\frac{d\Phi}{dt} = \frac{d\Phi_a}{dt} + \frac{d\Phi_o}{dt}$.
Proof. LAPS is processing $\lceil\beta n_a\rceil$ jobs and ℓ of them are lagging jobs. For each of these lagging jobs $j_i$, its lagging size $x_i$ is changing at the rate of $-s_a/\lceil\beta n_a\rceil$ (we only consider the change due to LAPS). For a non-lagging job $j_i$, $x_i$ does not change. First, consider the case that ℓ ≥ 1. To upper bound $\frac{d\Phi_a}{dt}$, the worst case is that the lagging jobs have the smallest coefficients among the $\lceil\beta n_a\rceil$ latest released jobs, i.e., the jobs are $\{j_{n_a-\lceil\beta n_a\rceil+1}, \ldots, j_{n_a-\lceil\beta n_a\rceil+\ell}\}$. On the other hand, as $c_i$ is monotonically increasing, for any integers a < b, we have […] Recall that $s_a = \min(n_a^{1/\alpha}, (1+\delta)T)$. By the definition of $c_i$, we have $c_i s_a \ge i$ for any 1 […] It remains to consider the case that ℓ = 0. In this case, $\frac{d\Phi_a}{dt} = 0$. Recall that 0 < β ≤ 1. Note also that […]
Proof. To upper bound $\frac{d\Phi_o}{dt}$, the worst case is that OPT is processing the job $j_{n_a}$ with the largest coefficient […] We apply Young's Inequality [23], which is stated in Lemma 3.5 below, by setting $p = 1/(1 - \frac{1}{\alpha})$, $q = \alpha$, $x = n_a^{1-1/\alpha}$ and $y = s_o$. Then $\frac{d\Phi_o}{dt} \le n$ […] The lemma thus follows.
We are now ready to prove the running condition, which together with Conditions (i) and (ii), implies Theorem 3.1.
Proof. We will show an equivalent version of the inequality, $\frac{dG_a}{dt} + \gamma\frac{d\Phi}{dt} - 8\alpha^2\cdot\frac{dG_o}{dt} \le 0$. LAPS is processing $\lceil\beta n_a\rceil - \ell$ non-lagging jobs, which are also active jobs in OPT. Thus, $n_o \ge \lceil\beta n_a\rceil - \ell = (\beta - \varphi)n_a$. Note that $\frac{dG_a}{dt} = n_a + s_a^\alpha \le 2n_a$, and […] By Lemmas 3.2 and 3.3, […] We set $\beta = \frac{1}{2\alpha}$ and γ = 4α.
Below is the formal statement of Young's Inequality, which is used in the proof of Lemma 3.3.
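For reference, the standard textbook form of Young's Inequality is reproduced below, together with the instance obtained from the parameters chosen in the proof of Lemma 3.3. The wording of the general statement is an assumption (the paper's own statement of Lemma 3.5 may be phrased differently); the instance follows by direct substitution, since $x^p = n_a$ and $1/p = 1 - 1/\alpha$.

```latex
% Young's Inequality (standard form, assumed to match the paper's lemma):
% for reals p, q > 1 with 1/p + 1/q = 1 and any x, y >= 0,
\[
  xy \;\le\; \frac{x^{p}}{p} + \frac{y^{q}}{q}.
\]
% Instance used for Lemma 3.3: p = 1/(1 - 1/\alpha), q = \alpha,
% x = n_a^{1-1/\alpha}, y = s_o, so that x^p = n_a and
\[
  n_a^{1-1/\alpha}\, s_o \;\le\; \Big(1-\frac{1}{\alpha}\Big)\, n_a + \frac{1}{\alpha}\, s_o^{\alpha}.
\]
```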

Minimizing weighted flow plus energy with bounded maximum speed
In this section, we consider jobs with arbitrary weights and give a non-clairvoyant algorithm WRR (Weighted Round Robin) that is $O(\alpha 3^\alpha)$-competitive for weighted flow plus energy, when using a processor with maximum speed (3 + ε)T, where ε = 3/α. The algorithm WRR scales its speed based on the total weight of active jobs and shares the processor among the active jobs according to the ratio of their weights. Below is the definition of WRR, which assumes using a processor with maximum speed (3 + ε)T for any ε > 0.
Algorithm WRR. Consider any time t. The processor speed is set to $s_a(t) = (3+\epsilon)\cdot\min((w_a(t))^{1/\alpha}, T)$, where $w_a(t)$ is the total weight of active jobs at t. The processor processes all active jobs such that each active job j receives processor speed equal to $s_a(t)\cdot(w(j)/w_a(t))$.
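The WRR rule at a single time instant can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `wrr_step` and the dictionary from job id to weight are hypothetical, and weights are assumed to be positive.

```python
from typing import Dict, Tuple


def wrr_step(weights: Dict[int, float], alpha: float, eps: float,
             T: float) -> Tuple[float, Dict[int, float]]:
    """One WRR decision for the current set of active jobs.

    weights: maps each active job id to its (positive) weight w(j).
    Returns (processor speed, speed given to each active job).
    """
    total = sum(weights.values())  # w_a(t), total weight of active jobs
    if total == 0:
        return 0.0, {}
    # Speed rule: (3 + eps) * min(w_a^(1/alpha), T), so the speed never exceeds (3 + eps) * T.
    speed = (3.0 + eps) * min(total ** (1.0 / alpha), T)
    # Each active job receives a share of the speed proportional to its weight.
    shares = {job: speed * w / total for job, w in weights.items()}
    return speed, shares


if __name__ == "__main__":
    print(wrr_step({1: 1.0, 2: 3.0}, alpha=2.0, eps=1.0, T=10.0))
```

As with LAPS, the speed changes only when the set of active jobs changes, which is what makes WRR a discrete speed scaling rule.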
We compare WRR against an optimal offline algorithm OPT that uses a processor with maximum speed T. Our main result is the following theorem.
[…] α, the competitive ratio in the above theorem becomes (6α + 4)(1 […]). The rest of this section is devoted to proving Theorem 4.1. Let $G_a(t)$ and $G_o(t)$ be the weighted flow plus energy incurred up to time t by WRR and OPT, respectively. We drop the parameter t when it is clear that t is the current time. Similar to Section 3, to prove that WRR is c-competitive, we derive a potential function Φ(t) that satisfies the following conditions: (i) Φ = 0 before any job is released and after all jobs are completed; (ii) Φ is a continuous function except at some discrete times where Φ does not increase; (iii) Running condition: at any other time, $\frac{dG_a(t)}{dt} + \gamma\frac{d\Phi(t)}{dt} \le c\cdot\frac{dG_o(t)}{dt}$, where γ is a positive constant (to be set to (2 […]
Potential function Φ(t). Consider any time t. For any job j, let $q_a(j,t)$ and $q_o(j,t)$ be the remaining work of j at t in WRR and OPT, respectively. An active job j in WRR is lagging if WRR has processed less on j than OPT at time t. Let $L = \{j_1, j_2, \ldots, j_\ell\}$ be the set of lagging jobs in WRR, ordered in ascending order of the latest time when the job becomes lagging. For each $j_i \in L$, let $x_i = q_a(j_i,t) - q_o(j_i,t)$; note that $x_i > 0$. We define the potential function Φ(t) as follows.
We call $c_i$ the coefficient of $j_i$. Note that $c_i$ is monotonically increasing with i. We first check Conditions (i) and (ii). Condition (i) holds since Φ = 0 before any job is released and after all jobs are completed. Now we show that Condition (ii) holds. When a job $j_i$ joins L, $x_i$ tends to zero and $j_i$ must be at the end of L, so the coefficients of all other jobs do not change and Φ does not change. When a job $j_i$ leaves L (e.g., WRR completes $j_i$), $x_i$ must be zero and the coefficient of any other lagging job either stays the same or decreases, so Φ does not increase.
It remains to check the running condition (Condition (iii)). Consider any time t when Φ does not have discrete change. Let $w_a$ and $w_o$ be the total weight of active jobs at t in WRR and OPT, respectively. Let $w_\ell = \sum_{i=1}^{\ell} w(j_i)$ be the total weight of jobs in L. Note that $w_\ell \le w_a$. Furthermore, let $s_a$ and $s_o$ be the current speeds of WRR and OPT, respectively. As stated in Section 2, $\frac{dG_a}{dt} = w_a + s_a^\alpha$ and $\frac{dG_o}{dt} = w_o + s_o^\alpha$. We will divide the analysis into cases depending on whether $w_\ell$ is small or big.
To bound the rate of change of Φ, we consider how Φ changes first due to OPT only (Lemma 4.2) and then due to WRR (Lemma 4.3). We denote the rate of change of Φ due to OPT and WRR by $\frac{d\Phi_o}{dt}$ and $\frac{d\Phi_a}{dt}$, respectively. […]
To upper bound $\frac{d\Phi_o}{dt}$, observe that the worst case is when OPT is processing the job $j_\ell$ with the largest coefficient $c_\ell$. Then $x_\ell$ is increasing at the rate of $s_o$ (we only consider the change due to OPT) and hence $\frac{d\Phi_o}{dt} \le c_\ell s_o$. When $w_\ell \le T^\alpha$, $c_\ell = w_\ell^{1-1/\alpha}$ and thus $\frac{d\Phi_o}{dt} \le w_\ell^{1-1/\alpha} s_o$. We apply Young's Inequality (Lemma 3.5 in Section 3) with p = α, q = α/(α − 1), $x = s_o$ and $y = w_\ell^{1-1/\alpha}$. Then we have […]
Proof. To upper bound $\frac{d\Phi_a}{dt}$, note that each job $j_i \in L$ is being processed at the rate of $s_a\cdot\frac{w(j_i)}{w_a}$ (we only consider the change due to WRR), and thus $x_i$ is changing at the rate of $-s_a\cdot\frac{w(j_i)}{w_a}$. To ease discussion, let $y_i = \sum_{k=1}^{i} w(j_k)$. Note that $y_0 = 0$, $y_\ell = w_\ell$, and for any 1 ≤ i ≤ ℓ, $y_i - y_{i-1} = w(j_i)$. First, consider $w_\ell \le T^\alpha$. In this case, for each job $j_i \in L$, $c_i$ = y […] Next, consider $w_\ell > T^\alpha$. In this case, $w_a \ge w_\ell > T^\alpha$ and hence $s_a = (3+\epsilon)\cdot\min(w_a^{1/\alpha}, T) = (3+\epsilon)T$. Note that $y_\ell = w_\ell > T^\alpha$. We let g < ℓ be the largest integer such that $y_g \le T^\alpha$. Then […]
We are ready to show the following running condition, which together with Conditions (i) and (ii), implies Theorem 4.1.
Proof. The analysis is divided into three cases depending on whether $w_a > T^\alpha$ and whether $w_\ell > T^\alpha$. In each case, we further divide the analysis depending on whether $w_\ell > (1 - \beta)w_a$, where β = ε/(6 + 2ε).
It is useful to note that […] By the definition of β and the fact that 0 < β < 1, we have […] Therefore, […] In conclusion, the running condition is satisfied in all the three cases.

Conclusion
In this paper we have given two non-clairvoyant scheduling algorithms for minimizing flow plus energy. The first algorithm (LAPS) is $8\alpha^2$-competitive for (unweighted) flow plus energy, when using a processor with maximum speed $\frac{\alpha}{\alpha-1}T$. The second algorithm (WRR) is $O(3^\alpha/\epsilon)$-competitive for weighted flow plus energy, when using a processor with maximum speed (3 + ε)T, where 0 < ε ≤ 3/α. We believe that LAPS can be generalized to minimize weighted flow plus energy, and the competitive ratio would remain $O(\alpha^2)$.