The case for tail_ackno
This text fixes two problems in
implementing the circular buffer of RFC 4340, Appendix A.
1. Background
This section shortly summarises the main
mechanism, using the terminology and
variable names from RFC 4340.
The circular buffer `buf' keeps
track of the highest-received sequence number in buf_ackno, and the HC-receiver
records information about previously-sent Ack Vectors in terms of
- ack_ptr
(the position of buf_head
at the time of sending the Ack
Vector);
- ack_ackno
(the value of buf_ackno
at the time of sending the Ack
Vector);
- ack_runlen
(the value of buf[ack_ptr].runlen
at the time of sending the Ack
Vector).
Assuming that
- buf_tail
always indicates the first free
position immediately after
the live portion of the
buffer, and that
- buf_head
moves from right to left (from
higher to lower index numbers),
the following happens when
the
HC-sender acknowledges the receipt of an Ack Vector by sending a packet
carrying the acknowledgment number A:
- The receiver locates the Ack
Vector record R
which satisfies R.ack_seqno == A
(as per A.3).
- The following modified pseudo-code
is executed (following the
intent of A.3):
if (buf[R.ack_ptr].runlen > R.ack_runlen) {
/*
* This cell is still `live' and must not be freed: move the tail pointer
* to the next-oldest buffer cell (towards the next-higher circular index)
*/
buf[R.ack_ptr].runlen -= R.ack_runlen + 1
buf_tail = (R.ack_ptr + 1) % ARRAY_SIZE
} else if (buf[R.ack_ptr].runlen < R.ack_runlen) {
/* This case must not occur and would constitute an implementation bug */
} else {
/* buf[R.ack_ptr].runlen == R.ack_runlen: free this cell */
buf_tail = R.ack_ptr
}
The above differs slightly from the presentation in A.3, but is
logically consistent when considering that buf_tail points to the first
free cell after the live
portion of the buffer.
2. Limitations of the naive algorithm
The above algorithm will not perform correctly: it is not able to deal
with the case of overlap between Ack
Vector records. As an
example, consider a first Ack Vector which reported a run length of x, and a subsequent one with the same ack_ptr value but a run length
of, e.g., x+1.
In the following section it is shown that such overlap leads the above
algorithm into confusion and a messed-up buffer state. The illustration
is on a real-life example, using the above algorithm; the corresponding
packet traces can be downloaded here.
3. Demonstration of failure
The following comments on a selected section of the packet capture
where the failure of the above described algorithm was observed.
Sequence and acknowledgment numbers are truncated, a buffer with
an ARRAY_SIZE=2*253
was used.
- The receiver has received packets 785 and 786 from the sender.
Its buffer state of buf_ackno=786,
buf[buf_head].runlen=1
is indicated by the shorthand
"786/1". Due to the given Ack Ratio of
2,
the receiver now sends an Ack
with
- sequence number
404,
- acknowledgment
number 786,
- featuring an Ack Vector
of state 0 and run length 1:
All this is expressed via the second
shorthand "#404: 786/1".
- Two more packets arrive, resulting in a buffer state of 788/3 and subsequent Ack #405: 788/3.
- Same as in (2), arrival of two more packets changes buffer state
to 790/5, and trigger Ack #406: 790/5.
- Same as in (3), leading to a buffer state of 792/7 and Ack #407: 792/7.
- The next packet from the sender carries sequence number
793 and ack number 404, thus relating back to (1):
- The receiver finds the record for ack_seqno=404: it has
- ack_ackno=786
and
- ack_runlen=1.
- Using the above algorithm, the circular buffer state is now
modified as follows:
- the run length of
previously 7 changes to 5 (=
7 - (1 + 1)),
- buf_ackno
remains at 792.
- The receipt of packet 793 is now registered in the buffer;
resulting in a state of 793/6.
- Packet number 794 arrives from the sender, changing the receiver
buffer state to 794/7;
time for Ack #408: 794/7.
- Packet number 795 arrives, changing the buffer state to 795/8. Everything fine up till here.
- Packet number 796 arrives with ack number 405, thus
freeing state for the Ack Vector sent in (2):
- The receiver finds the Ack Vector record with ack_seqno=405, ack_ackno=788, ack_runlen=3.
- Due to (7) it finds that the
run length of
buf[R.ack_ptr] is now 8.
- As per the above algorithm, it subtracts 3+1, leading to a
buffer state of 795/4.
- It registers receipt of packet 796 in the updated buffer,
resulting in a state of 796/5.
- The Ack Ratio
dictates that now the next acknowledgment is due, giving Ack #409: 796/5.
Note that the buffer is now in an inconsistent
state:
- before the arrival of packet 796,
the buffer state covered packets 786
... 795
- now it covers
packets 791 ... 796
- two packets, 789 and
790 (which were not
covered by Ack #405), have
"vanished"
from the buffer state.
The reason for the disappearance of packets 789/790 is a miscalculation:
- due to step 7, the buffer extended from 787 ... 795
- Ack #405 covered
packets 785 ... 788
The run length of 3+1 should not have
been subtracted in step 8(c) above. Rather, the intersection of buffer state and the packets covered
by Ack #405
should have been used. This would have lead to the correct buffer state
of 796/7.
Unfortunately, such a
computation is non-trivial
with the given data structures and the above algorithm: only buf_ackno is encoded
explicitly, every other sequence number is encoded differentially
relative to buf_ackno (so that the entire chain would
need to be traversed). This deficit
motivates the use of a `tail_ackno'
field which is proposed in the next section.
Concluding the
discussion of the packet trace, the following steps show how the buffer
ends up in utter confusion:
- Packets 797 and 798 arrive, triggering Ack #410: 798/7 (packets 789 and 790 are no longer
accounted for).
- Packet number 799 with acknowledgment number 408 arrives:
- The receiver retrieves the record with ack_seqno=408, ack_ackno=794, ack_runlen=7.
- Due to step (9), the receiver finds that buf[R.ack_ptr].runlen = 7 ==
R.ack_runlen.
- Following the above, naive algorithm, the receiver now believes
that it can free this cell.
- The receipt of packet 799 is then recorded, finally leading to
the state of 799/0.
The fatal
problem here is that the receiver behaves as if the sender had
acknowledged all packets up to and including 410, while in reality only
the packets up to and including 408 have been acknowledged.
4. Modified algorithm
The proposed `fix' is to add another field, tail_seqno, which tracks the
highest buffer sequence number that
has so far been acknowledged by the
HC-sender.
4.1 Initial setting
Initially,
when the first insertion with regard to sequence number `seqno' is made into the empty
buffer, the HC-receiver sets
buf_ackno = tail_ackno = seqno
During subsequent insertions,
tail_ackno remains
unaffected.
4.2 Steady state
When acknowledgments
from the HC-sender arrive, the following algorithm is used
instead of the above one:
/*
* Look up record R with R.ack_seqno corresponding to the given ackno
*/
/* See section (5) below: we only free records here but do not change buffer state */
if (R.ack_ackno is `before' tail_ackno)
goto free_records;
max_runlen = (R.ack_ackno + 2^48 - tail_ackno) % 2^48
effective_runlen = min(R.ack_runlen, max_runlen)
if (buf[R.ack_ptr].runlen > effective_runlen) {
buf[R.ack_ptr].runlen -= effective_runlen + 1
buf_tail = (buf_tail + 1) % ARRAY_SIZE
} else if (buf[R.ack_ptr].runlen == effective_runlen) {
buf_tail = R.ack_ptr
}
/* tail_ackno always marks the earliest packet of group (2) in 11.4.2 */
tail_ackno = (R.ack_ackno + 1) % 2^48
Verifying this algorithm on the described packet trace is left as
exercise to the reader (it works).
5. How to deal with valid records that are too old for the buffer
state
A related problem arises when e.g. the HC-sender is using Ack Vectors
to acknowledge the Acks sent by the HC-receiver.
The basic problem is
that the HC-sender is piggybacking Ack Vectors onto its data traffic at
a high rate, accumulating
a large number of Ack Vector records in the list. In
the extreme case this means one Ack Vector record stored per DataAck
packet; hence turning off sending Ack Vectors at the
sender is a good idea (but
not always possible).
The receiver, on the
other hand, is sending Acks at a slower
rate (depending on Ack
Ratio:
if it is e.g. 2, the Ack sending rate is approximately half the data
sending rate). A consequence is
that the
HC-sender will send multiple DataAcks
which acknowledge the same (earlier)
Ack sent by the HC-receiver. That is, there will be many records
featuring a different ack_seqno,
but with the same ack_ackno.
Under this constellation, it is possible that the list of the HC-sender
may contain
Ack Vector records which have not yet been acknowledged by the
HC-receiver, but which are too old
for the buffer state.
This problem
is addressed in the first part of the above algorithm; and it was
observed before the statement
if (R.ack_ackno is `before' tail_ackno)
goto free_records;
had been added. Here is a packet trace
which illustrates the problem and which was captured with the above
statement disabled (to work
out what is going on, it is useful to sketch the state of the Ack
Vector record list and the value of tail_ackno on paper).
The
HC-receiver will occasionally free state relating to DataAcks it has
received so far. But there will still be records in the HC-sender's
list which refer to the same, outdated ack_ackno. When the HC-receiver
eventually sends an acknowledgment for such records,
allowing access to the buffer variables means attempting to modify
state which has long been acknowledged. Hence
the above test: we must not allow earlier, outdated records to
interfere with
the buffer state.
It is however a good idea to free the outdated Ack Vector records at
this stage; in the above this is indicated by the jump label `free_records'.