Problems with TFRC packet scheduling
====================================

We are having implementation difficulties in making the packet scheduling
in CCID 3 both work smoothly _and_ stay up to spec with regard to
[RFC 3448]. Three problems have been identified and are described in
detail below. They can be summarised as two conceptual problems:

 (a) Packet scheduling can only be controlled up to a maximum rate which
     depends on the scheduling granularity of the OS; beyond this rate,
     packets are sent uncontrolled (at top speed);

 (b) The system as described in [RFC 3448, 4.6] works like a Token Bucket
     Filter with an undefined bucket size; the absence of a defined bucket
     size leads to unbounded `credits' which entail arbitrarily large
     bursts.


PROBLEM I:
	When basing rate regulation on OS packet scheduling, the
	granularity t_gran of the OS scheduler imposes a maximum
	controllable speed X_crit. Beyond this X_crit, the sending rate
	can no longer be controlled and depends solely on the
	capabilities of the hardware.

Cause for Problem I:
--------------------
Assume that no loss occurs. Then

	X = max(min(2*X, 2*X_recv), s/R)

and so

	t_ipi = s/X <= R

Let t_gran be the OS scheduling granularity (in Linux, t_gran = 1/HZ).
When using the mechanism described in [RFC 3448, 4.6], rate control is
based on delaying packet sending by putting the application to sleep for
up to t_ipi time units. When t_ipi < t_gran, there are two possibilities:

 (i)  not let the application sleep         ->  send immediately
 (ii) conservatively assume t_ipi = t_gran  ->  always delay

Both choices meet at the same point, characterised by t_ipi = t_gran.
From t_ipi = s/X_crit = t_gran we obtain a `critical speed'

	X_crit = s/t_gran	(in Linux: X_crit = s * HZ)

When going faster than X_crit, t_ipi < t_gran; when going slower than
X_crit, t_ipi > t_gran, which is not critical.
Consequences:
 (i)  using case (i) above means that there is no control beyond X_crit;
 (ii) using case (ii) above means that X is always bounded by X_crit.

To illustrate with a numerical example: when t_gran = 1ms (which is
`good'), a packet size s of 32 bytes means an X_crit of 256 kilobits/sec,
and a packet size of 1460 bytes means an X_crit of 11.68 megabits/sec.

The following two problems are related and were identified by Ian
McDonald, who posted them a while ago; they illustrate a second problem
of OS-based packet scheduling. They are important and clearly need
further clarification.

PROBLEM II:
	The application sends a packet at time t_0 and goes idle for a
	duration t_idle >= 2 * t_ipi. When resuming activity, the
	application is then permitted to send a burst of
	beta = floor(t_idle / t_ipi) - 1 packets immediately. The
	problem is that the burst size beta is unbounded.

Cause for Problem II:
---------------------
Packet sending times, as described in RFC 3448, are calculated from the
previous sending time and the value of t_ipi. When idling for too long,
this accrues a `negative credit', which can be calculated as follows.

Assume that the last packet before the idle phase was sent at time t_0.
Then, according to [RFC 3448, 4.6], the next sending time t_1 has the
nominal value

	t_1 = t_0 + t_ipi

Let t_idle = n * t_ipi + t_rem be the time during which no packets are
emitted, with t_rem < t_ipi. When the application resumes sending
activity at time t_now = t_0 + t_idle, it can send a packet immediately,
since

	t_now - delta = t_1 + (n-1) * t_ipi + t_rem - delta > t_1

Assuming for the moment that t_ipi stays constant, it takes n-1
iterations to clear this negative credit before normal (i.e. delayed)
packet scheduling resumes.
PROBLEM III:
	The application emits packets at a rate alpha which is less than
	the packet rate X/s currently permitted by TFRC: alpha < X/s.

Consequences of Problem III:
----------------------------
In this case, since 1/alpha > s/X = t_ipi, a `credit' similar to the one
in Problem II accrues, accumulating in increments of 1/alpha - t_ipi each
time a packet is sent out. The problematic case arises when the
application increases its rate alpha by some drastic amount: then a burst
of length proportional to the accrued credit can be sent without any
delay.

Combined Analysis for Problems II and III
-----------------------------------------
Problems II/III can be described in terms of a Token Bucket Filter with
bucket size beta equal to the accrued credit of packet backlog. As in the
standard Token Bucket Filter, tokens are placed in the bucket at a rate
of one per t_ipi, i.e. rho = 1/t_ipi. Under `normal' conditions,
0 <= beta <= 1, i.e. at most one packet is sent at each scheduled time.
When however beta > 1, as is the case in Problems II/III, beta packets
can be emitted immediately.

The problem is that there is no defined upper bound for beta: it can grow
arbitrarily large. This covers both mere TCP-unfriendliness and possible
network overload due to congestion.

The challenge now is to find an upper bound for beta. When the sender is
in slow-start, X >= s/R is not bounded and cannot be used as an upper
limit on the burst length. Another idea would be to set beta = MSS/s.