Adding a 1 millisecond delay remedied the problem:

tc qdisc add dev lo root netem delay 1ms

The CCID-3 measurements over loopback resulted in the following: as expected there was no congestion and hence no loss (p = 0). The bitrate X_recv measured at the receiver was a very stable 224.5 bytes per second (min. 224, max. 225), matching the application's constant sending rate.

The interesting deviations from the expectations follow below.

The average RTT was 8.3 microseconds (min. 3, max. 63, stddev. 12.5). We will consider the minimum of 3 microseconds as the "true" link RTT.

There is a notable peak starting with
the first packet; afterwards the RTT converged to the actual link RTT.
The reason for the peak is that the first RTT measurement is taken from
the Request/Response exchange. Since packet exchanges for connection
setup take slightly longer than those in steady state, a higher value
results. The slow decline is due to the fact that an exponential
weighted moving average is used: it takes longer to converge to the
actual link RTT of 3 microseconds.
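The slow EWMA convergence can be sketched numerically (an illustrative sketch: q = 0.9 is the filter weight recommended in RFC 3448, the initial 63 usec sample is the handshake RTT measured above, and the number of steady-state samples is arbitrary):

```python
# EWMA RTT convergence sketch: the first sample is the (high) RTT from the
# Request/Response exchange, subsequent samples are the 3 us link RTT.
q = 0.9                          # filter weight recommended in RFC 3448
rtt = 63.0                       # us, initial sample from connection setup
for _ in range(50):              # feed in steady-state samples
    rtt = q * rtt + (1 - q) * 3.0
print(round(rtt, 2))             # slowly approaches the 3 us link RTT
```

The geometric decay (the 63 usec start value contributes 0.9^n after n samples) is what produces the slow decline visible at the start of the RTT plot.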

The RTT changes coincide with the changes of X:

The loopback interface has an MTU of
16436 bytes. With regard to the
RFC3390 initial sending rate this means for the packet size s=22:

W_init/RTT = min(4*22, max(2*22, 4380)) bytes / 63usec = 88 bytes / 63usec = 11.17 Mbps

which is the value of the first point plotted above.
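This can be verified numerically (a sketch using the values from the text: s = 22 bytes, initial RTT = 63 usec):

```python
# RFC 3390 initial window for packet size s = 22 bytes, converted to a
# rate using the initial RTT of 63 us.
s = 22                                   # bytes
w_init = min(4 * s, max(2 * s, 4380))    # RFC 3390 window -> 88 bytes
rate_mbps = w_init * 8 / 63e-6 / 1e6     # W_init/RTT in Mbps
print(w_init, round(rate_mbps, 2))       # 88, 11.17
```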

CCID-3 then performs a kind of slow-start until about 2 seconds. This is not a crystal-clear case of slow-start, so a more detailed analysis is necessary.

The Linux implementation uses the revision 00 of draft rfc3448bis, where an idle sender is defined as one which hasn't sent anything for more than 2 RTTs. Since the RTT is low in this case, in each computation of X the sender is classified as "idle" and gets the benefit of being constrained by the RFC 3390 initial rate rather than 2 * X_recv.

min_rate = max(W_init/RTT, 2 * X_recv) = max(4*22*8/RTT, 2*X_recv)

X = max(min(2*X, min_rate), s/RTT) = max(min(2*X, min_rate), 22*8/RTT)

Since the sender is application-limited, we have a near-constant X_recv equalling the application sending rate of 1800 bits per second, which allows the following simplification:

X = max(min(2*X, max(W_init/RTT, 2*X_recv)), s/RTT)

= max(min(2*X, max(4*22*8/RTT, 1800)), 22*8/RTT)

The inner `max' simplifies to W_init/RTT, since this is greater than 1800 for all RTTs less than 391 msec. Hence

X = max(min(2*X, 4*s/RTT), s/RTT)

= max(min(2*X, 4*22*8/RTT), 22*8/RTT)

This means that a "classic" slow-start (where X' = 2*X) is performed only as long as

s/RTT < 2*X < 4*s/RTT = W_init/RTT

which, with the initial RTT of 63 usec, gives the interval 1.4 Mbps < X < 5.6 Mbps. This is only a narrow stretch in the above diagram (in the dccp_probe log, this doubling of the sending rate happened between 0.826 seconds and 1.022 seconds). As long as X <= s/(2*RTT), the s/RTT term dominates, so that X = s/RTT (in the log this happened from the beginning up to 0.826 seconds). Lastly, consider the stage where 2*X >= W_init/RTT: since W_init/RTT is always greater than s/RTT, at this stage the value of X depends solely on W_init/RTT (in the log this happens from 1.022 seconds onwards).
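Iterating the simplified update rule at a fixed RTT of 63 usec reproduces the three stages (a sketch; the start value is the application rate, and in the real trace the RTT was itself still decaying, which stretches the stages out in time):

```python
# Iterate X = max(min(2*X, 4*s/RTT), s/RTT) at the initial RTT of 63 us.
s_bits = 22 * 8                     # packet size in bits
rtt = 63e-6                         # s
floor_rate = s_bits / rtt           # s/RTT      ~ 2.79 Mbps
cap_rate = 4 * s_bits / rtt         # W_init/RTT ~ 11.17 Mbps

x = 1800.0                          # start at the application rate (bps)
rates = []
for _ in range(6):
    x = max(min(2 * x, cap_rate), floor_rate)
    rates.append(round(x / 1e6, 2))
print(rates)                        # [2.79, 5.59, 11.17, 11.17, 11.17, 11.17]
```

The first value is the s/RTT floor, the second is the single "classic" doubling step inside the narrow interval, and from the third step on X sticks at the W_init/RTT cap.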

The end of this phase was reached when the RTT converged towards the actual link RTT, peaking at about 2 seconds. This peak value is 234.7 Mbps (reached at 2.391 seconds) and it is simply 4*s/RTT with the link RTT of 3 microseconds, i.e. 4*22*8/3 Mbps.

The t_ipi plot is inverse to the X plot:

It is useful to also plot the inverse of t_ipi, the allowed number of packets per second:

This again shows the problem: after
reaching the peak at about 2 seconds, the allowed number of packets per
second gradually declines; in the reported case it eventually dropped
below the minimum required number of one packet per 64 seconds (which corresponds to 0.016 packets
per second). This "dead" minimum is not reached in the above
plot, but is an asymptotic value.

### 3. What is the explanation?

#### 3.1 Reason for decline of X

The reason for the decline of X
(and hence the increase of t_ipi)
is the nofeedback timer:
according to RFC 3448, 4.3, each time a feedback packet is received,
the nofeedback timer is set to expire after

max(4*RTT, 2*s/X) = max(12, 2*22*8/X)

microseconds. When it expires, X
is halved by modifying the cached value of X_recv. The revision 00 of
rfc3448bis had the additional rule
whereby X is halved directly if either (i) no feedback has been
received yet or (ii) p == 0, according to

X = max(X/2, s/t_mbi)

where s/t_mbi is the asymptotic value
of 1 packet in 64 seconds. Since in the given scenario p == 0, this is what happens (and this rule is still in place in
revision
05 of rfc3448bis).

The nofeedback timer will expire every 12 microseconds as long as s/X < 6 microseconds, i.e. as long as X > 29.34 Mbps. Once X has been reduced below 29.34 Mbps, s/X > 2*RTT and thus the next timer expiry happens after 2*s/X. In the X diagram above, the first time that X reaches a value of 29.34 Mbps is between 1.02 and 1.11 seconds. But this is still during the semi slow-start phase, where X is increasing.

The second time that X passes 29.34 Mbps is at about 2.68 seconds into the trace, where it has already been halved three times (relative to the peak value of 234.7 Mbps). At this time X was 29.3 Mbps, so the next expiry occurs about 12 microseconds later. After halving, X has a value of 14.7 Mbps; the next expiry of the nofeedback timer is thus after 2*22*8/14.7 = 24 microseconds, leading to further expiry intervals of 48, 96, 192, ... microseconds.
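This halving cascade can be sketched as follows (pure max(4*RTT, 2*s/X) expiries, i.e. before any implementation clamp is taken into account; the start value is the 29.3 Mbps from the trace):

```python
# Nofeedback expiries without a lower clamp: interval = max(4*RTT, 2*s/X),
# and X is halved on each expiry (p == 0, no feedback arriving).
s_bits = 22 * 8
rtt = 3e-6                          # steady-state link RTT
x = 29.3e6                          # bps, just below the 29.34 Mbps threshold
intervals_us = []
for _ in range(5):
    interval = max(4 * rtt, 2 * s_bits / x)
    intervals_us.append(round(interval * 1e6))
    x /= 2                          # halving on expiry
print(intervals_us)                 # [12, 24, 48, 96, 192]
```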

An astute observer will notice that such a rapid decline of X does not happen in the diagram. The reason is that the Linux implementation clamps the nofeedback expiry interval to a user-configurable minimum, which defaults to 100 milliseconds:

actual_expiry_interval = max(user_config_value, max(4*RTT, 2*s/X))

This default value is incidentally near the inter-packet gap (98
milliseconds) of the sending application. Hence the sending rate is
halved approximately once per inter-packet interval.

A slow-down will now occur when t_ipi = s/X > 50 msec. This happens when X is less than 3.52 kbps and can be observed in all the above diagrams as a change of gradient after about 4 seconds into the trace. Consider the X plot above: sampling points are taken each time a new datagram is sent (i.e. every 98 milliseconds); after 4 seconds, the gradient changes and one can often see two consecutive samples with the same X rate, indicating that X now decreases at a slower rate.
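The 50 msec / 3.52 kbps threshold follows directly from the clamped expiry interval (a sketch; the helper name is hypothetical):

```python
# Clamped nofeedback expiry: max(user minimum, max(4*RTT, 2*s/X)).
def expiry_interval(x_bps, rtt=3e-6, s_bits=22 * 8, clamp=0.100):
    # Returns the expiry interval in seconds for sending rate x_bps.
    return max(clamp, max(4 * rtt, 2 * s_bits / x_bps))

print(expiry_interval(10_000))   # 0.1  -> still pegged at the clamp
print(expiry_interval(3_520))    # 0.1  -> boundary: 2*s/X == 100 ms
print(expiry_interval(1_760))    # 0.2  -> now slower than the clamp
```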

#### 3.2 Why is there no longer any feedback?

X will be halved as long as there is no feedback. And indeed, in the
trace the receiver stopped sending feedback approximately at the time
when the EWMA of the RTT had converged to the actual link RTT of 3
microseconds.

The reason for this is in the window-counter algorithm of RFC 4342: the receiver only sends feedback when the CCVal changes, and the algorithm uses the following expression:

quarter_RTTs = floor((current_time - last_WC_time) / (RTT/4))

Since the implementation uses microsecond resolution, a divide-by-zero would occur for any RTT smaller than 4 microseconds. Hence there is no feedback, and therefore the nofeedback timer continues to halve the sending rate X until reaching the asymptotic value of s/t_mbi.

### 4. Counter-measure

Since the window-counter algorithm does not work for RTTs smaller than 4 microseconds, the obvious counter-measure is to clamp all RTT samples to a value greater than 4 microseconds.

This is shown below to work, tested by artificially inflating the RTT. Subsequently an equilibrium condition is computed.
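A minimal sketch (with hypothetical variable names) of why the quarter-RTT computation fails at microsecond resolution, and how the clamp avoids it:

```python
# Window-counter update at microsecond (integer) resolution.
def quarter_rtts(now_us, last_wc_us, rtt_us):
    # Number of quarter-RTT periods since the last window-counter update;
    # rtt_us // 4 truncates to 0 for any RTT below 4 us.
    return (now_us - last_wc_us) // (rtt_us // 4)

try:
    quarter_rtts(100, 0, 3)           # 3 us RTT: RTT/4 truncates to 0
except ZeroDivisionError:
    print("no CCVal change, no feedback")

CLAMP_US = 4                          # counter-measure: clamp RTT samples
print(quarter_rtts(100, 0, max(3, CLAMP_US)))   # 100
```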

#### 4.1 Reaching stability by artificially inflating the RTT

Experiments were performed where the RTT was artificially inflated by adding 0.5..1msec link delay as described in section 1.

The steps can be traced in the 1ms log and 2ms log. The following graphs are a consequence of this stabilisation (as they derive from X).

#### 4.2 Computing an equilibrium condition

The particulars of the given situation are:

- the application sends at a constant bitrate of 1.8 kbps, using 22-byte packets, resulting in a constant inter-packet gap of 98 milliseconds;
- the steady-state RTT is 3 microseconds;
- there is no loss, i.e. p = 0.

- The inter-packet gap IPG of the sending application is such that IPG > 2*RTT. The sender thus qualifies as "idle between send()s" according to draft rfc3448bis-00, so that the new value X' is computed as

  X' = max(min(2*X, max(W_init/RTT, 2*X_recv)), s/RTT)

- The sending rate is application-limited, i.e. X_send < s/RTT. The receiver reports this as X_recv = X_send < s/RTT.
- Since X_recv < s/RTT, we can simplify

  X' = max(min(2*X, W_init/RTT), s/RTT)

Summarising the above conditions:

- When the start value X >= W_init/(2*RTT) then the series converges towards W_init/RTT.
- When the start value X < s/(2*RTT) then the series converges towards s/RTT.
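The first convergence case can be checked numerically (a sketch of the simplified update rule with the steady-state values RTT = 3 usec and s = 22 bytes; note that W_init/RTT here is the 234.7 Mbps peak value from above):

```python
# X' = max(min(2*X, W_init/RTT), s/RTT) with RTT = 3 us, s = 22 bytes.
s_bits = 22 * 8
rtt = 3e-6
w_init_rate = 4 * s_bits / rtt          # W_init/RTT ~ 234.7 Mbps

def step(x):
    return max(min(2 * x, w_init_rate), s_bits / rtt)

x = w_init_rate / 2                      # a start value X >= W_init/(2*RTT)
x = step(x)
print(x == w_init_rate)                  # True: W_init/RTT reached in one step
print(step(x) == w_init_rate)            # True: ... and it is a fixed point
```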