The case for tail_ackno

This text fixes two problems in implementing the circular buffer of RFC 4340, Appendix A.

1. Background

This section shortly summarises the main mechanism, using the terminology and variable names from RFC 4340.

The circular buffer `buf' keeps track of the highest-received sequence number in buf_ackno, and the HC-receiver records information about previously-sent Ack Vectors in terms of
Assuming that
the following happens when the HC-sender acknowledges the receipt of an Ack Vector by sending a packet carrying the acknowledgment number A:
  1. The receiver locates the Ack Vector record R which satisfies R.ack_seqno == A (as per A.3).
  2. The following modified pseudo-code is executed (following the intent of A.3):
	if (buf[R.ack_ptr].runlen > R.ack_runlen) {
/*
* This cell is still `live' and must not be freed: move the tail pointer
* to the next-oldest buffer cell (towards the next-higher circular index)
*/
buf[R.ack_ptr].runlen -= R.ack_runlen + 1
buf_tail = (R.ack_ptr + 1) % ARRAY_SIZE

} else if (buf[R.ack_ptr].runlen < R.ack_runlen) {
/* This case must not occur and would constitute an implementation bug */

} else {
/* buf[R.ack_ptr].runlen == R.ack_runlen: free this cell */
buf_tail = R.ack_ptr
}

The above differs slightly from the presentation in A.3, but is logically consistent when considering that buf_tail points to the first free cell after the live portion of the buffer.

2. Limitations of the naive algorithm

The above algorithm will not perform correctly: it is not able to deal with the case of overlap between Ack Vector records. As an example, consider a first Ack Vector which reported a run length of x, and a subsequent one with the same ack_ptr value but a run length of, e.g., x+1.

In the following section it is shown that such overlap leads the above algorithm into confusion and a messed-up buffer state. The illustration is on a real-life example, using the above algorithm; the corresponding packet traces can be downloaded here.

3. Demonstration of failure

The following comments on a selected section of the packet capture where the failure of the above described algorithm was observed. Sequence and  acknowledgment numbers are truncated, a buffer with an ARRAY_SIZE=2*253 was used.
  1. The receiver has received packets 785 and 786 from the sender. Its buffer state of buf_ackno=786, buf[buf_head].runlen=1 is indicated by the shorthand "786/1". Due to the given Ack Ratio of 2, the receiver now sends an Ack with
    All this is expressed via the second shorthand "#404: 786/1".

  2. Two more packets arrive, resulting in a buffer state of 788/3 and subsequent Ack #405: 788/3.
  3. Same as in (2), arrival of two more packets changes buffer state to 790/5, and trigger Ack #406: 790/5.
  4. Same as in (3), leading to a buffer state of 792/7 and Ack #407: 792/7.

  5. The next packet from the sender carries sequence number 793 and ack number 404, thus relating back to (1): 
    1. The receiver finds the record for ack_seqno=404: it has 
    2. Using the above algorithm, the circular buffer state is now modified as follows:
    3. The receipt of packet 793 is now registered in the buffer; resulting in a state of 793/6.
  1. Packet number 794 arrives from the sender, changing the receiver buffer state to 794/7; time for Ack #408: 794/7.
  2. Packet number 795 arrives, changing the buffer state to 795/8. Everything fine up till here.

  3. Packet number 796 arrives with ack number 405, thus freeing state for the Ack Vector sent in (2):
    1. The receiver finds the Ack Vector record with ack_seqno=405, ack_ackno=788, ack_runlen=3.
    2. Due to (7) it finds that the run length of buf[R.ack_ptr] is now 8.
    3. As per the above algorithm, it subtracts 3+1, leading to a buffer state of 795/4.
    4. It registers receipt of packet 796 in the updated buffer, resulting in a state of 796/5.
    5. The Ack Ratio dictates that now the next acknowledgment is due, giving Ack #409: 796/5.
Note that the buffer is now in an inconsistent state:
The reason for the disappearance of packets 789/790 is a miscalculation:
The run length of 3+1 should not have been subtracted in step 8(c) above. Rather, the intersection of buffer state and the packets covered by Ack #405 should have been used. This would have lead to the correct buffer state of 796/7.
Unfortunately, such a computation is non-trivial with the given data structures and the above algorithm: only buf_ackno is encoded explicitly, every other sequence number is encoded differentially relative to buf_ackno (so that the entire chain would need to be traversed). This deficit motivates the use of a `tail_ackno' field which is proposed in the next section.

Concluding the discussion of the packet trace, the following steps show how the buffer ends up in utter confusion:
  1. Packets 797 and 798 arrive, triggering Ack #410: 798/7 (packets 789 and 790 are no longer accounted for).
  2. Packet number 799 with acknowledgment number 408 arrives:
    1. The receiver retrieves the record with ack_seqno=408, ack_ackno=794, ack_runlen=7.
    2. Due to step (9), the receiver finds that buf[R.ack_ptr].runlen = 7 == R.ack_runlen.
    3. Following the above, naive algorithm, the receiver now believes that it can free this cell.
    4. The receipt of packet 799 is then recorded, finally leading to the state of 799/0.
The fatal problem here is that the receiver behaves as if the sender had acknowledged all packets up to and including 410, while in reality only the packets up to and including 408 have been acknowledged.

4. Modified algorithm

The proposed `fix' is to add another field, tail_seqno, which tracks the highest buffer sequence number that has so far been acknowledged by the HC-sender

4.1 Initial setting

Initially, when the first insertion with regard to sequence number `seqno' is made into the empty buffer, the HC-receiver sets
	buf_ackno = tail_ackno = seqno
During subsequent insertions, tail_ackno remains unaffected.

4.2 Steady state

When acknowledgments from the HC-sender arrive, the following algorithm is used instead of the above one:
	/* 
* Look up record R with R.ack_seqno corresponding to the given ackno
*/

/* See section (5) below: we only free records here but do not change buffer state */
if (R.ack_ackno is `before' tail_ackno)
goto free_records;

max_runlen = (R.ack_ackno + 2^48 - tail_ackno) % 2^48
effective_runlen = min(R.ack_runlen, max_runlen)

if (buf[R.ack_ptr].runlen > effective_runlen) {
buf[R.ack_ptr].runlen -= effective_runlen + 1
buf_tail = (buf_tail + 1) % ARRAY_SIZE

} else if (buf[R.ack_ptr].runlen == effective_runlen) {
buf_tail = R.ack_ptr
 }

/* tail_ackno always marks the earliest packet of group (2) in 11.4.2 */
tail_ackno = (R.ack_ackno + 1) % 2^48
Verifying this algorithm on the described packet trace is left as exercise to the reader (it works).

5. How to deal with valid records that are too old for the buffer state

A related problem arises when e.g. the HC-sender is using Ack Vectors to acknowledge the Acks sent by the HC-receiver.

The basic problem is that the HC-sender is piggybacking Ack Vectors onto its data traffic at a high rate, accumulating a large number of Ack Vector records in the list. In the extreme case this means one Ack Vector record stored per DataAck packet; hence turning off sending Ack Vectors at the sender is a good idea (but not always possible).

The receiver, on the other hand, is sending Acks at a slower rate (depending on Ack Ratio: if it is e.g. 2, the Ack sending rate is approximately half the data sending rate). A consequence is that the HC-sender will send multiple DataAcks which acknowledge the same (earlier) Ack sent by the HC-receiver. That is, there will be many records featuring a different ack_seqno, but with the same ack_ackno.

Under this constellation, it is possible that the list of the HC-sender may contain Ack Vector records which have not yet been acknowledged by the HC-receiver, but which are too old for the buffer state.

This problem is addressed in the first part of the above algorithm; and it was observed before the statement
			if (R.ack_ackno is `before' tail_ackno)
goto free_records;
had been added. Here is a packet trace which illustrates the problem and which was captured with the above statement disabled (to work out what is going on, it is useful to sketch the state of the Ack Vector record list and the value of tail_ackno on paper).

The HC-receiver will occasionally free state relating to DataAcks it has received so far. But there will still be records in the HC-sender's list which refer to the same, outdated ack_ackno. When the HC-receiver eventually sends an acknowledgment for such records, allowing access to the buffer variables means attempting to modify state which has long been acknowledged. Hence the above test: we must not allow earlier, outdated records to interfere with the buffer state.

It is however a good idea to free the outdated Ack Vector records at this stage; in the above this is indicated by the jump label `free_records'.