This is the main document summarizing the inner workings of the CCID3
packet receive path. This consists of:
- RX history handling
- loss detection
- RTT sampling
- transfer of loss information from RX history to Loss Intervals database
The following UML diagram shows the packages which are updated by this
patch set.
Generally, when a new packet comes in, the CCID 3 packet reception
algorithm must do the following:
- check if a loss has occurred;
- perform RTT sampling on data packets as long as no loss has occurred
  (the RTT is needed to compute the first loss interval);
- send feedback periodically, once per RTT;
- send feedback when the value of `p' has changed;
- update s, p, and the number of bytes received since sending the last
  feedback.
Combining these requirements is not trivial; the following sections
document each functional aspect of the algorithm.
1. Overview
The following shows the main loop of CCID3 packet reception.
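In C-like form, the loop can be approximated by this sketch, pieced
together from the requirements listed in the introduction; all types,
fields and function names here are illustrative stand-ins rather than the
actual kernel identifiers:

  #include <stdbool.h>
  #include <stddef.h>

  /* Illustrative stand-ins; the real work is the subject of sections 2-8. */
  struct pkt  { size_t payload_len; bool is_data; };
  struct rxst { bool loss_pending; bool p_changed; bool rtt_elapsed; };

  /* Stubs standing in for the functionality of the following sections. */
  static void update_rx_history(struct rxst *rx, const struct pkt *p)  { (void)rx; (void)p; }
  static bool detect_loss(struct rxst *rx, const struct pkt *p)        { (void)p; return rx->loss_pending; }
  static void sample_rtt(struct rxst *rx, const struct pkt *p)         { (void)rx; (void)p; }
  static void update_s_and_bytes(struct rxst *rx, const struct pkt *p) { (void)rx; (void)p; }
  static void send_feedback(struct rxst *rx)                           { (void)rx; }

  static void rx_packet(struct rxst *rx, const struct pkt *pkt)
  {
          update_rx_history(rx, pkt);      /* RX history handling (sec. 2)          */

          if (detect_loss(rx, pkt))        /* check whether a loss has occurred     */
                  rx->p_changed = true;    /* a new loss changes p (sec. 6)         */
          else if (pkt->is_data)
                  sample_rtt(rx, pkt);     /* RTT sampling while loss-free (sec. 5) */

          update_s_and_bytes(rx, pkt);     /* update s and bytes_rcvd (sec. 4)      */

          if (rx->p_changed || rx->rtt_elapsed)
                  send_feedback(rx);       /* p changed, or about one RTT elapsed   */
  }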
2. General Principles
If no packet loss has been detected so far, the algorithm merely updates
the entry with the highest received sequence number so far (called
`nonloss.S' in the flowchart). It keeps a counter which counts the number
of packets received since a packet was last detected to be missing.
Detecting a missing packet works by comparing the current sequence number
against the highest received sequence number so far. We use a function
dist(A, B) which returns the sequence number distance between sequence
numbers A and B as a signed number. This makes it possible to detect
re-ordering and duplicates: if dist(A, B) <= 0, then the packet with
sequence number B is either re-ordered (`before' A) or B == A.
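DCCP sequence numbers are 48 bits wide (RFC 4340), so dist() has to
operate modulo 2^48. A minimal user-space sketch of such a function (not
the actual kernel implementation, and relying on arithmetic right shift of
signed values) is:

  #include <stdint.h>
  #include <stdio.h>

  /* Signed mod-2^48 distance between two 48-bit sequence numbers A and B. */
  static int64_t dist(uint64_t a, uint64_t b)
  {
          /* shift into the top 48 bits so the subtraction wraps correctly,
           * then shift back with sign extension */
          return ((int64_t)((b << 16) - (a << 16))) >> 16;
  }

  int main(void)
  {
          printf("%lld\n", (long long)dist(100, 103));  /* 3: B is ahead of A  */
          printf("%lld\n", (long long)dist(103, 100));  /* -3: B is re-ordered */
          printf("%lld\n", (long long)dist((1ULL << 48) - 1, 2));  /* 3: wrap  */
          return 0;
  }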
Now, a loss indication is given when D = dist(nonloss.S, S) > 1 and the
NDP count N of the current packet is such that N < D. Missing non-data
packets are not considered; this is in accordance with RFC 4342. The
effect of a loss indication is that the highest received sequence number
obtained up till then is stored. Flagging a new loss is accomplished using
index manipulation:
- loss_count = 1
- loss_prev  = hist[0]
- nonloss.S  = hist[loss_count]
This ensures that the highest sequence number received before the loss is
stored in hist[0], while hist[1] holds the overall highest received
sequence number so far (which is always updated). The loss_count is
incremented each time a packet is received whose sequence number S' is
such that dist(loss_prev.S, S') > 0. When loss_count = NDUPACK = 3, the
pending loss is promoted to a loss event.
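A much-simplified sketch of this scheme is shown below; the layout and the
names (hist[], loss_count, NDUPACK) follow the description above rather
than the actual kernel sources:

  #include <stdbool.h>
  #include <stdint.h>

  #define NDUPACK 3   /* packets needed to promote a pending loss */

  struct rx_entry { uint64_t seqno; };

  struct rx_state {
          struct rx_entry hist[NDUPACK + 1];
          int             loss_count;        /* 0 = no loss pending */
  };

  static int64_t dist(uint64_t a, uint64_t b)
  {
          return ((int64_t)((b << 16) - (a << 16))) >> 16;  /* as sketched above */
  }

  /*
   * Process one packet: hist[0] holds the highest sequence number received
   * before a pending hole, hist[1] the overall highest sequence number, and
   * loss_count the number of packets seen past the hole.  Returns true when
   * the pending loss is promoted to a loss event.
   */
  static bool rx_update_loss_state(struct rx_state *s, uint64_t seqno, uint8_t ndp)
  {
          int64_t d = dist(s->hist[0].seqno, seqno);

          if (s->loss_count == 0) {
                  if (d > 1 && ndp < d) {
                          s->loss_count    = 1;       /* flag a new (pending) loss   */
                          s->hist[1].seqno = seqno;   /* overall highest goes to [1] */
                  } else if (d > 0) {
                          s->hist[0].seqno = seqno;   /* no loss: just track highest */
                  }
                  return false;
          }

          if (dist(s->hist[1].seqno, seqno) > 0)
                  s->hist[1].seqno = seqno;           /* always update overall highest */

          /* count packets past the hole; NDUPACK of them promote the loss */
          return d > 0 && ++s->loss_count == NDUPACK;
  }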
3. Loss Detection
Loss detection needs to
- be robust against packet reordering;
- detect when holes are filled due to late arrival;
- handle bursts of losses;
- determine whether two losses belong to the same loss interval.
Comprehensive details are documented separately.
4. Handling Data Packets and Sending Feedback
Initial feedback is sent when the first data packet arrives. The payload
length of data packets is always tracked in the CCID 3 parameter `s';
also the number of bytes received in between feedbacks (variable
bytes_rcvd) is kept up to date. When a loss is pending
(1 <= loss_count < 3), the receiver does not recompute p or sample RTT
values. This is so for two reasons:
- the missing packet(s) may come in late, due to delay or re-ordering;
- when packets are missing, comparing window counter fields (CCVal) is no
  longer reliable - the counter, which has only 16 values, may have
  wrapped around in the meantime.
Computing the value of X_recv contained in feedback packets is described
later, in section 8.
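The per-packet bookkeeping described in this section can be sketched as
follows; the names and the structure are illustrative, and the RTT, p and
feedback steps are only hinted at:

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  struct ccid3_rx_sketch {
          uint32_t s;              /* packet size parameter `s'           */
          uint32_t bytes_rcvd;     /* bytes received since last feedback  */
          int      loss_count;     /* 0 = none, 1..2 = loss pending       */
          bool     feedback_sent;  /* has initial feedback been sent yet? */
  };

  static void rx_data_packet(struct ccid3_rx_sketch *rx, size_t payload_len)
  {
          if (!rx->feedback_sent) {
                  rx->feedback_sent = true;   /* first data packet: send initial feedback */
                  /* ... send feedback here ... */
          }

          rx->s = (uint32_t)payload_len;      /* track payload length in `s'     */
          rx->bytes_rcvd += payload_len;      /* accumulated for X_recv (sec. 8) */

          if (rx->loss_count > 0)
                  return;  /* loss pending: neither recompute p nor sample the RTT */

          /* ... otherwise: RTT sampling (sec. 5) and p recomputation (sec. 6) ... */
  }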
5. Sampling RTT Values
As long as the loss interval history is empty, p is 0 and hence need not
be recomputed. On the other hand, an RTT value is needed to compute the
length of the first loss interval (RFC 3448, 6.3.1). Since a CCID 3
receiver only has the CCVal field as RTT information, it must try to
obtain an RTT estimate before the first loss occurs, cf. RFC 4342, 8.1.
The finer details are documented separately.
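A rough user-space sketch of such CCVal-based sampling is given below. It
relies on the fact that the sender increments the window counter every
quarter RTT (RFC 4342, 8.1); the guard values are illustrative and not
taken from the kernel sources:

  #include <stdint.h>

  /*
   * Estimate the RTT from two data packets received at t_prev/t_now whose
   * window counters are ccval_prev/ccval_now: `delta' counter steps
   * correspond to roughly delta/4 round-trip times.
   */
  static uint32_t sample_rtt_us(uint8_t ccval_prev, uint32_t t_prev_us,
                                uint8_t ccval_now,  uint32_t t_now_us)
  {
          uint8_t delta = (ccval_now - ccval_prev) & 0xF;  /* mod-16 distance */

          /*
           * Only sample when the counter has advanced by at least a full RTT
           * (4 steps) but not so far that a 16-step wrap-around could have
           * gone unnoticed.
           */
          if (delta < 4 || delta > 8)
                  return 0;   /* no usable sample from this packet pair */

          return 4 * (t_now_us - t_prev_us) / delta;
  }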
6. Maintaining the Loss Intervals Database
The TFRC specification suggests an implementation with a fixed number of
n=8 entries, where the oldest entry is overwritten when a new entry is
added at the front. This calls for a ring-buffer implementation which is
maintained in a FIFO manner. It is again realised on top of a fixed
array, using index arithmetic. The basic principle is simple; details of
a user-space implementation are documented separately. The actual
in-kernel implementation is a bit more sophisticated: it uses on-demand
allocation so that applications running on loss-free links require less
memory.
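A minimal sketch of such a fixed-size FIFO ring buffer, built on index
arithmetic over a plain array (entry layout and names are illustrative):

  #include <stdint.h>

  #define NINTERVAL 8   /* fixed number of loss interval entries (n = 8) */

  struct loss_interval {
          uint64_t start_seqno;   /* first sequence number of the interval */
          uint32_t length;        /* sequence length of the interval       */
  };

  struct li_history {
          struct loss_interval ring[NINTERVAL];
          unsigned int         counter;   /* total intervals added so far */
  };

  /* The i-th most recent entry (i = 0 is the newest); assumes i < counter. */
  static struct loss_interval *li_entry(struct li_history *h, unsigned int i)
  {
          return &h->ring[(h->counter - i - 1) % NINTERVAL];
  }

  /* Adding a new entry at the front silently overwrites the oldest slot. */
  static void li_push(struct li_history *h, const struct loss_interval *li)
  {
          h->ring[h->counter % NINTERVAL] = *li;
          h->counter++;
  }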
7. Distinguishing Data and Non-Data Packets
RFC 4342 introduces the concept of data length of a loss interval, which
is the sequence length of a loss interval "minus the number of non-data
packets the sender transmitted during the loss interval" (sec. 6.1.1).
For the receiver, computing the data length of a loss interval constitutes
a challenge in that the data length can only be reliably computed if
certain conditions hold; in most cases the receiver is at best able to
compute a lower bound as an approximation of the data length, but will
lack the information to compute it exactly.
The RFC suggests that "the receiver can account for received non-data
packets by not including them in the data length, and for packets that
were not received, it may be able to discriminate between lost data
packets and lost non-data packets using DCCP's NDP Count option" (ibid.).
Hence the following necessary condition must be satisfied for a receiver
to compute the loss interval data length exactly:
- For each `hole' in the loss interval's lossy part, defined by the
  highest received sequence number S_i before the loss and the first
  packet, with sequence number S_j and NDP count N_j, received after the
  loss: N_j must account exactly for all non-data packets sent between
  S_i and S_j.
If this is not the case, then N_j is at best a lower bound for the number
of non-data packets between S_i and S_j. This is in particular the case
when the distance from S_i to S_j is large (more than one packet loss),
e.g. due to a link failure or a prolonged burst of losses. The receiver
cannot distinguish between single and multiple losses (RFC 4342, p. 25);
the only information it has available is the NDP count N_j of the first
packet received after the loss.
This uncertainty increases proportionally with the number of losses
occurring in one loss interval, since the total number of non-data packets
in that interval is the sum of all such NDP counts N_j.
The design decision has therefore been to make the conservative choice of
assuming that the NDP count in each loss interval is always zero. This
policy is in agreement with RFC 4342: "In the absence of better
information, a receiver MUST conservatively assume that every lost packet
was a data packet and thus must occur in some lossy part" (p. 12).
In addition, there are two more reasons supporting this design choice.
Firstly, in all situations where the connection from sender to receiver
consists mainly of streaming data packets, the NDP count will de facto be
low or even zero.
Secondly, in order to approximate a lower bound on the NDP count in a loss
interval, the receiver needs to track NDP and packet type information
throughout each of the three subsystems: (1) CCID3 packet reception,
(2) receiver history, and (3) loss interval history. The added
implementation complexity that this requires does not seem to be
adequately matched by the gains of obtaining an estimated lower bound of
the NDP count.
However, it may make sense to upgrade the implementation at a later stage,
if experience suggests that connections involving a large fraction of
non-data packets are not an exception. As a greatly exaggerated worst-case
analysis, suppose that in each loss interval the NDP count amounts to 50%
of the interval length. Then an I_mean based on a zero NDP count will be
twice as large as if it were based on the actual interval data length,
resulting in half the value of the estimated loss event rate
p = 1/I_mean. The upshot would be that the sender reacts more slowly to
congestion.
On the other hand, an NDP count of 50% in each loss interval is a
practically pathological case. Therefore, we stick for the moment with the
conservative assumption from above, leaving other choices as future
implementation options.
8. Computing X_recv
This section describes the computation of X_recv with respect to
[RFC 4342, 10.3] and [RFC 3448, 6], using the following terminology from
[RFC 3448, 6.2]:
- S_m - the sequence number of the last data packet that has been
  received;
- R_m - the last RTT estimate, taken at the time packet S_m was received.
We further refer to the window counter variable last_counter (a CCVal
value) defined in [RFC 4342, 10.3]. CCID3 differs from TFRC in that
(i) the sender does not provide explicit RTT estimates to the receiver
and therefore (ii) the receiver has no feedback timer. On the other hand,
CCID3 conforms to TFRC, and so the corresponding relationship needs to be
established; this is done in the following sequence:
(1) Periodic feedback is sent whenever the mod-16 distance from
    last_counter to the current packet's CCVal window counter value
    exceeds 3.
(2) In the absence of other/better information, we have to assume that
    the time interval between two packets which satisfy (1) is a (close)
    approximation of one RTT; hence we equate this time interval with
    R_m. This means that X_recv will be calculated as the number of bytes
    received since last_counter was last set, divided by R_m, the RTT
    estimate given by the arrival of a data packet satisfying (1). To
    keep track of the number of bytes received since the last feedback
    was sent, a variable bytes_rcvd is used.
(3) However, when sending feedback due to a change in parameters (a new
    loss occurs or p > p_prev), we do not have dependable information
    about the length of an RTT: in particular, the loss could have
    occurred immediately after sending the last feedback packet. Hence we
    cannot compute X_recv reliably, and therefore the best solution is to
    reuse the previous value of X_recv, as computed the last time
    periodic feedback was sent.
(4) Finally, the receiver-cached value of X_recv in (3) may be zero -
    e.g. due to a loss occurring early in a connection, after sending
    initial feedback as per RFC 3448, 6.3. There are two ways of solving
    this problem (of which the second one is used):
    - send feedback with p > 0 and X_recv = 0 (the previous value). At
      the sender, this would immediately bring down the sending rate to
      s/t_mbi (cf. RFC 3448, 4.3);
    - in the absence of better information, consider the time interval
      since last_counter was set (i.e. since the last feedback packet was
      sent) as the best possible approximation of R_m and compute X_recv
      over this time interval. This is still crude, but seems safe to do
      since the sending rate is limited from above by X_calc, no matter
      how high the estimate of X_recv is.
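As an illustration of condition (1), the mod-16 window counter distance
can be computed with simple unsigned arithmetic; the following is a
sketch, not the kernel code:

  #include <stdbool.h>
  #include <stdint.h>

  /* Periodic feedback is due once the current packet's CCVal is more than
   * 3 window counter steps (mod 16) away from last_counter. */
  static bool feedback_due(uint8_t last_counter, uint8_t ccval)
  {
          uint8_t dist = (ccval - last_counter) & 0xF;   /* mod-16 distance */

          return dist > 3;   /* i.e. at least one full RTT of counter steps */
  }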
To summarise: the byte counter (bytes_rcvd) is always reset whenever
feedback is sent (periodic or otherwise), and at the same time
last_counter is updated. Hence, when a packet satisfying (1) arrives, we
have counted the number of bytes over the period since last_counter was
last set; this period serves as an approximation of R_m, and in this
manner the computation of X_recv conforms to step (2) of [RFC 3448, 6.2].
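Put together, the summary amounts to roughly the following; names are
illustrative and microsecond timestamps are assumed for simplicity:

  #include <stdint.h>

  struct feedback_state {
          uint32_t bytes_rcvd;     /* bytes received since the last feedback  */
          uint32_t t_last_fb_us;   /* time at which last_counter was last set */
          uint8_t  last_counter;   /* CCVal when the last feedback was sent   */
          uint64_t x_recv;         /* cached X_recv, in bytes per second      */
  };

  /*
   * Periodic feedback, cases (1)/(2): the time since last_counter was set
   * approximates R_m, and X_recv = bytes_rcvd / R_m.
   */
  static void send_periodic_feedback(struct feedback_state *fb,
                                     uint8_t ccval, uint32_t now_us)
  {
          uint32_t r_m_us = now_us - fb->t_last_fb_us;   /* approximates R_m */

          if (r_m_us > 0)
                  fb->x_recv = (uint64_t)fb->bytes_rcvd * 1000000 / r_m_us;

          /* ... build and transmit the feedback packet here ... */

          /* bytes_rcvd is reset and last_counter updated on every feedback */
          fb->bytes_rcvd   = 0;
          fb->last_counter = ccval;
          fb->t_last_fb_us = now_us;
  }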