Implementing shutdown(2)
This memo summarises notes on the implementation of shutdown() for DCCP. The shutdown() system call
originated from TCP applications. Since DCCP connections can, like TCP,
have data flowing in either direction, it makes sense to reconsider the
use of this function for the benefit of DCCP.
1. Background and motivation
Multimedia streaming can in principle be considered a one-directional
stream of data packets sent by the streaming source to the
consumer of the media packets. DCCP is a very generic protocol and
allows data traffic to flow in both directions of the same connection.
For many applications this is unnecessary
overhead: by shutting down one direction of the full-duplex
connection, better performance can be achieved, since the processing
costs per packet are reduced. This implies better responsiveness, scalability and
use of computing resources.
1.1 DCCP half-connections
A DCCP connection splits into two separate half-connections, each possibly
having a different congestion control ID (CCID). This is illustrated in
the following schema.
The situation is more complicated than with TCP, which has only read()/write() for each end. The
functions used for the active end of
a half-connection are:
- tx_send_packet: decides when, according to the CCID rules, an enqueued packet is fit for sending
- tx_packet_sent: is called directly after tx_send_packet and does CCID accounting for packets `in flight'
- tx_packet_recv: is the input for the feedback control loop of the TX CCID
On the receiving end of the
half-connection we have an active and a passive function:
- rx_packet_recv: is the sink for the data packets sent by the other end
- rx_send_feedback: sends receiver feedback according to the input received via rx_packet_recv
The point of using shutdown() in this context is to reduce
processing complexity. Without shutdown(), each received
packet is processed twice - both by the TX CCID via tx_packet_recv and by the RX
CCID via rx_packet_recv.
1.2 The shutdown() function
This function was originally developed for TCP's full-duplex service.
It allows data transfer to be shut down in one or both directions.
- SHUT_RD means that the read end is closed, i.e. the local application is done with its reading
- SHUT_WR means that the write end is closed, i.e. no more data packets will be sent
- SHUT_RDWR means shutting both ends at the same time - neither read nor write may follow after this
There is one subtlety which
distinguishes TCP from DCCP - the latter has no half-closed state [RFC 4340, 4.6].
Hence the semantics of shutdown are not exactly identical, but quite
similar. Making this similarity precise is the purpose of the present
page.
A classical example of the
shutdown function can be found in Stevens' volume 1, section 18.5:
- the server listens for
incoming connections, pipes its input into sort(1), and writes the output
back to the client,
- the client connects to
the server, writing all input from stdin to the socket descriptor,
- the sort program
can only start when the input has reached EOF,
- hence the client sends
a half-close to the server which the server translates as EOF,
- the sort program
processes the completed input, writes its output to the half-open
connection until in turn it reaches the end of the output, after which
the connection terminates.
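The half-close step of this example can be tried out locally. What follows is only a minimal sketch, not code from Stevens: it assumes a connected AF_UNIX stream socket pair standing in for the TCP connection, and the helper name half_close_demo() is made up for illustration. One end plays the client, the other the server.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 0 if the half-close behaved as described
 * in the list above, -1 otherwise. sv[0] is the client, sv[1] the server.
 */
int half_close_demo(void)
{
	int sv[2];
	char buf[16];
	int ok = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	/* Client: send all input, then signal "no more data from sender". */
	if (write(sv[0], "b\na\n", 4) != 4 || shutdown(sv[0], SHUT_WR) < 0)
		goto out;

	/* Server: read the input; the next read() returns 0, i.e. EOF. */
	if (read(sv[1], buf, sizeof(buf)) != 4)
		goto out;
	if (read(sv[1], buf, sizeof(buf)) != 0)
		goto out;

	/* Server: the reverse direction is still open for the reply. */
	if (write(sv[1], "a\nb\n", 4) != 4)
		goto out;

	/* Client: reading still works after its own SHUT_WR. */
	if (read(sv[0], buf, sizeof(buf)) == 4 && memcmp(buf, "a\nb\n", 4) == 0)
		ok = 0;
out:
	close(sv[0]);
	close(sv[1]);
	return ok;
}
```

The second read() on the server side is the point of interest: it is the EOF that lets sort(1) start working in the original example.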
2. Using shutdown() for DCCP
In DCCP the same example as above is not possible since the signalling
(sending FIN) is missing.
The original meaning of FIN
in RFC 793 was "No more data from
sender". But the lack of such signalling to the peer is no
disadvantage, since shutdown
can still be used locally to reduce processing costs.
Furthermore, as described in section 11.7 of RFC 4340, Data Dropped
options can be used to signal that data packets have not reached the
application. This option is sent on packets which carry an ACK number
(hence it cannot be used on Request or pure Data packets). Within the
option, a packet drop is indicated by a set high-order first bit, and the
reason for dropping the packet is contained in the subsequent 3-bit drop
code. The relevant one in this context is Drop Code 1, "Application
not listening"; it is described in section 11.7.2 of RFC 4340.
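To make the bit layout concrete, here is a small sketch of how a single drop byte could be assembled and taken apart. It is not kernel code and the function names are hypothetical. It also assumes, going by a reading of RFC 4340 section 11.7 rather than by anything stated in this memo, that the remaining four low-order bits of the byte hold a run length.

```c
#include <stdint.h>

/* Drop Code 1, "Application not listening" */
enum { DROP_CODE_APP_NOT_LISTENING = 1 };

/*
 * Hypothetical helper: build one drop byte.
 * Layout assumed: 1 (high-order bit) | 3-bit drop code | 4-bit run length.
 */
uint8_t data_dropped_drop_block(unsigned int drop_code, unsigned int run_length)
{
	return (uint8_t)(0x80 | ((drop_code & 0x7) << 4) | (run_length & 0xF));
}

/* True if the high-order bit marks this byte as reporting a drop. */
int is_drop_block(uint8_t block)
{
	return (block & 0x80) != 0;
}

/* Extract the 3-bit drop code that follows the high-order bit. */
unsigned int drop_code_of(uint8_t block)
{
	return (block >> 4) & 0x7;
}
```

With these assumptions, data_dropped_drop_block(DROP_CODE_APP_NOT_LISTENING, 0) yields the byte 0x90.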
2.1 Basic concept
The basic use of shutdown()
which suggests itself in this context is:
- use SHUT_WR to
`shut down' the active side of the half-connection, i.e. tell the CCID
infrastructure that no further packets will be enqueued;
- use SHUT_RD to
locally declare end-of-reading input;
- use Data Dropped, Drop Code 1, to notify the active side should it continue to send after SHUT_RD has been set locally.
Lastly, shutdown(SHUT_RDWR)
exists but is a bit pointless. To maintain compatibility with TCP (see
e.g. tcp_poll() in net/ipv4/tcp.c), it should
nevertheless be supported.
2.2 Subtleties
The reading side of the RX
CCID is straightforward: if SHUT_RD
is set, then no further input will be accepted, hence no packets
are delivered to the RX CCID. Should data packets still arrive after the
local end of the half-connection has issued shutdown(SHUT_RD) on its side,
an Ack with a Data Dropped option, Drop Code 1, "Application not listening", is sent.
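The receive-side rule can be written down as a small decision function. This is a toy model under stated assumptions, not the Linux DCCP input path: struct toy_sock, enum rx_action and rx_dispatch() are hypothetical names, and a single boolean stands in for the socket's SHUT_RD state.

```c
#include <stdbool.h>

/* What the receive path should do with an incoming packet (toy model). */
enum rx_action {
	RX_DELIVER_TO_CCID,		/* hand the packet to rx_packet_recv */
	RX_ACK_APP_NOT_LISTENING,	/* Ack + Data Dropped, Drop Code 1 */
	RX_SKIP_CCID			/* no data, and RX CCID is bypassed */
};

/* Hypothetical stand-in for the socket state relevant here. */
struct toy_sock {
	bool shut_rd;			/* shutdown(SHUT_RD) has been called */
};

enum rx_action rx_dispatch(const struct toy_sock *sk, bool carries_data)
{
	if (!sk->shut_rd)
		return RX_DELIVER_TO_CCID;

	/* SHUT_RD is set: nothing reaches the RX CCID any more. */
	if (carries_data)
		return RX_ACK_APP_NOT_LISTENING;

	return RX_SKIP_CCID;
}
```

Keeping the decision in one place like this makes it easy to mark the Data Dropped branch until the option is actually supported.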
The writing side (TX CCID)
needs a bit more sophistication. The situation is clear when the TX
queue is empty at the time shutdown(SHUT_WR)
is called. In this case, all further attempts to write to the socket
will be caught by the socket API and lead to a write error, as
intended. Furthermore, we can close the end leading to tx_packet_recv:
if no more packets are going to be sent, congestion control is no longer needed either.
The situation is different if the TX ringbuffer still contains
packets at the time shutdown is called. In this case, we must keep the
input for tx_packet_recv open, since closing it would cut off the
control traffic needed to regulate congestion control for the still
pending packets. The proposed solution here is to issue the SHUT_WR on
the socket so that no more packets will enter the TX queue, but to defer
closing the feedback input (tx_packet_recv) until the last packet has
left the output TX queue.
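The deferred close can be illustrated with a small state model. Again this is only a sketch: struct toy_tx and the toy_tx_*() functions are hypothetical, a counter stands in for the TX queue, and a flag stands in for "the feedback input to tx_packet_recv is still open".

```c
#include <stdbool.h>

/* Toy model of the sending side of a half-connection. */
struct toy_tx {
	unsigned int queued;	/* packets still waiting in the TX queue */
	bool shut_wr;		/* shutdown(SHUT_WR) has been called */
	bool feedback_open;	/* input for tx_packet_recv still accepted */
};

void toy_tx_init(struct toy_tx *tx)
{
	tx->queued = 0;
	tx->shut_wr = false;
	tx->feedback_open = true;
}

/* Returns 0 on success, -1 for the write error after SHUT_WR. */
int toy_tx_enqueue(struct toy_tx *tx)
{
	if (tx->shut_wr)
		return -1;
	tx->queued++;
	return 0;
}

void toy_tx_shutdown_wr(struct toy_tx *tx)
{
	tx->shut_wr = true;
	if (tx->queued == 0)
		tx->feedback_open = false;	/* nothing pending: close now */
}

/* Called when one queued packet has left the TX queue. */
void toy_tx_packet_left_queue(struct toy_tx *tx)
{
	if (tx->queued > 0)
		tx->queued--;
	if (tx->shut_wr && tx->queued == 0)
		tx->feedback_open = false;	/* deferred close happens here */
}
```

In this model the empty-queue case and the pending-packets case differ only in when feedback_open is cleared; new writes are refused in both from the moment SHUT_WR is set.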
3. Further work
The use of Data Dropped options is not currently supported in Linux
DCCP, but work is under way to support it. Until then, the signalling
proposed above should be marked with a FIXME.