Estimating sufficient option headroom with regard to DCCP's MPS
===============================================================

There is a long-outstanding FIXME in dccp_sync_mss() which involves finding a
conservative estimate of how much space will be needed for the options, so as
to be able to pass on an MPS value (RFC 4340, 14) to the API. This text
summarises relevant information and suggests a strategy for estimating the MPS.


1. General
==========
The Maximum Packet Size (MPS) is of interest for applications which want to
transfer data, so it is only relevant to the data transfer phase of a
connection (unless one wants to send data on the DCCP-Request, but that is not
considered here).

The objective is to ensure that there is sufficient headroom for common packet
options (RFC 4340, 5.8), while leaving as much packet space for application
data as possible. Since the option headroom is independent of the application
data size, it determines a lower bound of the packet size (when all considered
options are inserted into a packet with no application data).


2. The use of CCMPS
===================
An upper bound on the MPS is given by the CCID-specific Maximum Packet Size
(CCMPS, defined in RFC 4340, sec. 14), which has the effect of

    effective_MPS = min(CCMPS, MPS).

Currently:
 * CCID2 and CCID3 pose no limits on the MPS and thus implicitly set it to
   `infinity';
 * CCID4 ( http://tools.ietf.org/html/draft-floyd-dccp-ccid4-01 ), limits via
   RFC 4828 the packet size to 1500 bytes (including network and transport
   headers).

The question now is how to derive the single CCMPS value from the RX and TX
CCIDs, since both may define a value. We have the following facts:
 * for the HC-sender to HC-receiver half-connection the MPS determines the
   maximum payload size;
 * for the HC-receiver to HC-sender half-connection there is only feedback
   traffic (Acks).

Since Acks are expected to be small in comparison with (bulk) data traffic, a
pragmatic solution (chosen here) is to only consider the MPS value
corresponding to the HC-sender to HC-receiver half-connection.


3. Strategy of guessing the overall option-space requirements
=============================================================
The following shows three possible extreme cases to be avoided. The present
issues are comparable to the TCP MSS considerations which appeared in RFC 879
and RFC 1122, section 4.2.2.6.

 * overcautious: using 4*255 (the maximum value of Data Offset, sec. 5) as
   upper bound is safe, but results in a poor maximum application payload size
   of 1500-20-1020 = 460 bytes for IPv4 DCCP packets, which is too inefficient.
 * everything-in-advance: some options (Change L/R, Init Cookie) only appear
   once or twice during a connection; these should not be considered for the
   average MPS. CCID2 uses Ack Vector, but reserving DCCP_MAX_ACKVEC_LEN = 506
   bytes is also too much for the average case where e.g. only a few packets
   are ACKed.
 * too careless: allowing the application to fill the packet up to the maximum
   and then lacking the space for, for example, a Slow Receiver option would
   be bad.

The host requirements RFC 1122 states that TCP must take into consideration
both IP and TCP option sizes (Eff.snd.MSS). For DCCP we have as a further
simplifying factor the distinction into
 * options which may be carried on data packets and
 * options which must not appear on data packets,
as surveyed in table 3 of section 5.8 in RFC 4340.

Since the interest of using the MPS is in the data transfer phase, the strategy
is to leave room only for such options that may appear on data packets. This
excludes in particular variable-length options such as Ack Vector: as detailed
in section 11.2 of RFC 4340, such options can use a separate Ack if the current
DataAck packet does not have sufficient room left. Due to the similarities in
format, the same consideration also applies to Data Dropped options (11.7).
The restriction further excludes options which appear only or mainly during
connection setup, such as Init Cookie (8.1.4) and feature-negotiation options
(Change L/R, Confirm L/R, Mandatory).

The following section shows the available option sizes, from which the
subsequent section derives the actual computation rule for allocating option
headroom.


4. Survey of currently defined options
======================================
The table below surveys all fixed-length options from RFC 4340. Some
fixed-length options can have different lengths (e.g. Timestamp Echo option).
Since we are interested in a worst-case estimate, the table only lists the
maximum-possible option lengths.

+----------------+-----+---------+------------------------------------------------------------+
| Name           | Len | Section | Remarks                                                    |
+----------------+-----+---------+------------------------------------------------------------+
| Padding        |   1 | 5.8.1   | currently only used at end, to align 32bit-wise (5.8)      |
| Mandatory      |   1 | 5.8.2   | currently only used for feature negotiation (6.6.9)        |
| Slow Receiver  |   1 | 11.6    | not implemented; would be set via API (boolean?)           |
+----------------+-----+---------+------------------------------------------------------------+
| Timestamp      |   6 | 13.1    | Used on initial handshake for RTT, CCID-specific otherwise |
| Timestamp Echo |  10 | 13.3    | should be added, since depends on incoming timestamp       |
| Elapsed Time   |   6 | 13.2    | CCID-specific                                              |
+----------------+-----+---------+------------------------------------------------------------+
| Data Checksum  |   6 | 9.3     | not implemented; depends on Check Data Checksum Feature    |
| NDP Count      |   8 | 7.7     | depends on Send NDP Count Feature (7.7.2)                  |
+----------------+-----+---------+------------------------------------------------------------+

For completeness we also list the variable-length options in the following
table.

+----------------------+--------+---------+------------------------------------------------------------+
| Name                 | Length | Section | Remarks                                                    |
+----------------------+--------+---------+------------------------------------------------------------+
| Change L/R           | 4..255 | 6.1     | Not used during normal data transfer, so can be excluded   |
| Confirm L/R          | 3..255 | 6.2     | from MPS considerations for normal data flow.              |
+----------------------+--------+---------+------------------------------------------------------------+
| Init Cookie          | 3..255 | 8.1.4   | not implemented; initial handshake only, so not relevant   |
+----------------------+--------+---------+------------------------------------------------------------+
| Ack Vector Nonce 0/1 | 3..255 | 11.4    | should be considered separately, see notes above           |
| Data Dropped         | 3..255 | 11.7    | not implemented; same considerations as Ack Vector         |
+----------------------+--------+---------+------------------------------------------------------------+

Lastly we consider CCID-specific options: CCID2 does not define any of these,
so the table below lists the options defined by CCID3. All of these are
inserted on feedback packets from the HC-receiver to the sender (cf. RFC 4340,
10.3), and none of these may appear on data packets (table 1 in section 8 of
RFC 4342). Hence this table is also for completeness only, but not relevant for
deriving the option headroom.

+-----------------+----------+-----+---------------------------------------------------------+
| Name            | Len      | Sec | Remarks                                                 |
+-----------------+----------+-----+---------------------------------------------------------+
| Receiver Rate   | 6        | 8.3 | must be present in all acknowledgments, should be added |
| Loss Event Rate | 6        | 8.5 | depends on Send Loss Event Rate Feature (8.4)           |
| Loss Intervals  | 84..255  | 8.6 | not implemented                                         |
+-----------------+----------+-----+---------------------------------------------------------+


5. Computation of option headroom for application-data packets
==============================================================
The above considerations make it necessary to leave room for the following
options on application-data packets:

 * 1 byte for Slow Receiver (11.6)
 * 6 bytes for Timestamp (13.1)
 * 10 bytes for Timestamp Echo (13.3)
 * 8 bytes for NDP Count (7.7)
 * 6 bytes for Data Checksum (9.3), once it is supported by Linux

In total this amounts to 25 bytes (31 bytes with Data Checksum), or 28 (32)
bytes once the options are padded to a 32-bit boundary (5.8).


6. Further considerations on piggybacking
=========================================
It is still possible to use DataAck packets to carry even variable-length
options, provided that enough space is left. However, a fallback solution is
needed when space which is originally intended for application data cannot be
used for piggybacking additional options.

RFC 4340, sec. 14 recommends that, when running out of space for options, "the
implementation should [...] send the options on a separate packet (such as a
DCCP-Ack)". In this case it would be good to have for instance an `overflow
queue' which would store all non-data packet options (table 3 in 5.8) that
could not be piggybacked. When running out of piggybacking space on the current
DataAck, options would be transferred to that queue; a pure Ack could then be
scheduled to release the options waiting in the overflow queue.
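
As an illustration of the headroom rule from section 5 and the CCMPS bound from
section 2, the following user-space C sketch adds up the listed option lengths,
pads the sum to a 32-bit boundary and subtracts it from the path MTU. It is not
the kernel's dccp_sync_mss(): the names round_up_4(), option_headroom() and
estimate_mps(), their parameters, and the convention that a ccmps value of 0
stands for "no limit" are assumptions made for this example only. The header
lengths are supplied by the caller rather than derived here.

```c
#include <assert.h>

/* Maximum option lengths in bytes, as listed in section 5 of this text. */
enum {
    OPT_LEN_SLOW_RECEIVER  = 1,   /* RFC 4340, 11.6 */
    OPT_LEN_TIMESTAMP      = 6,   /* RFC 4340, 13.1 */
    OPT_LEN_TIMESTAMP_ECHO = 10,  /* RFC 4340, 13.3 */
    OPT_LEN_NDP_COUNT      = 8,   /* RFC 4340, 7.7  */
    OPT_LEN_DATA_CHECKSUM  = 6    /* RFC 4340, 9.3  */
};

/* Options are padded so that they end on a 32-bit boundary (RFC 4340, 5.8). */
static unsigned int round_up_4(unsigned int len)
{
    return (len + 3u) & ~3u;
}

/*
 * Hypothetical helper: headroom reserved for options on data packets.
 * with_data_checksum is non-zero if Data Checksum should be accounted for.
 */
static unsigned int option_headroom(int with_data_checksum)
{
    unsigned int len = OPT_LEN_SLOW_RECEIVER + OPT_LEN_TIMESTAMP +
                       OPT_LEN_TIMESTAMP_ECHO + OPT_LEN_NDP_COUNT;

    if (with_data_checksum)
        len += OPT_LEN_DATA_CHECKSUM;
    return round_up_4(len);
}

/*
 * Hypothetical helper: payload size left after the network header, the DCCP
 * header and the option headroom, capped by ccmps. Assumption of this sketch:
 * ccmps == 0 means that the CCID imposes no limit.
 */
static unsigned int estimate_mps(unsigned int pmtu, unsigned int net_hdr_len,
                                 unsigned int dccp_hdr_len, unsigned int ccmps,
                                 int with_data_checksum)
{
    unsigned int overhead, mps;

    /* Data Offset counts 32-bit words, so the header length is a multiple of 4. */
    assert(dccp_hdr_len % 4 == 0);

    overhead = net_hdr_len + dccp_hdr_len + option_headroom(with_data_checksum);
    if (pmtu <= overhead)
        return 0;                 /* no room left for application data */

    mps = pmtu - overhead;
    if (ccmps != 0 && ccmps < mps)
        mps = ccmps;              /* effective_MPS = min(CCMPS, MPS) */
    return mps;
}
```

A caller would pass the current path MTU together with the header lengths that
apply to the connection. Keeping the rounding in a separate helper makes it
easy to see how much of the headroom is padding rather than option data.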