Problem of short-lived connections in DCCP
==========================================

In the current implementation of DCCP there is a problem with short-lived
connections. The problem is that connection state may be torn down before
userspace gets a chance to read the data, even though a completely normal
transaction (handshake, data transfer, termination) can be seen on the wire.

The problem occurs both
 * when a server sends a CloseReq to the connecting client;
 * when the server decides to hold TIMEWAIT state and sends a Close to the
   client.

The buggy behaviour can be observed using the following setting:
 * short test program which has the server write a string that the client
   reads;
 * can best be observed using the loopback setting;
 * or when trying iteratively - usually one out of x * 10 connection attempts
   fails.


I. Behaviour when the server sends a CloseReq
---------------------------------------------
 1. client receives the CloseReq in OPEN state;
 2. client immediately replies with Close (RFC 4340, 8.3);
 3. within one RTT, the server replies with the terminating Reset;
 4. dccp_rcv_reset() enqueues Reset via dccp_fin() and calls dccp_time_wait();
 5. dccp_time_wait() calls dccp_done();
 6. dccp_done() changes the socket state from CLOSING to CLOSED.

Now, the general situation is that
 * client has issued the connect(2) system call;
 * sys_connect() calls inet_stream_connect();
 * inet_stream_connect() initiates handshake via dccp_v{4,6}_connect();
 * while the state is REQUESTING (SYN_SENT) or RESPOND (SYN_RECV, not
   applicable here), inet_wait_for_connect() is called;
 * inet_wait_for_connect() changes task state and might go to sleep.

==> When inet_wait_for_connect() returns, inet_stream_connect() finds that the
    socket is closed; it then calls dccp_disconnect(), and connect(2) returns
    ECONNABORTED.


II. Behaviour when the server sends a Close
-------------------------------------------
The server can decide to hold TIMEWAIT state (8.3), in which case it sends a
Close instead of a CloseReq.
The following sequence of events was observed to happen in this case.
 1. client receives Close while in OPEN state;
 2. dccp_rcv_close() sets CLOSED state immediately after enqueueing the
    Close (dccp_fin).

The latter leads to the same behaviour as in (I): inet_stream_connect(), after
returning from inet_wait_for_connect(), finds that the socket is in CLOSED
state; calls dccp_disconnect(); and returns ECONNABORTED.


III. Why the problem is significant
-----------------------------------
The problem described above causes unexpected behaviour when connections are
short-lived. While connections are typically longer than a single transaction,
relying on that assumption is restrictive.

Library routines based on getaddrinfo(), for example, typically make several
attempts to look for the "right" address. When connect(2) fails due to the
observed behaviour, it can happen that the wrong address, or none at all (total
failure), is selected.


IV. Countermeasure
------------------
The situation described above needs to be avoided (i.e. inet_stream_connect()
must not find the socket in state CLOSED), while the API should continue to
work as expected in all other regards.

The solution needs to be independent of what happens in userspace (the
application may have a bug, it may not read from the queue at all, it may be
suspended etc.), yet it needs to allow the application to read the data until
close(2) is called (directly or via exit).

The cleanest solution seems to be to add intermediate states; in this way state
is made explicit.
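
The effect of an intermediate state can be illustrated with a small userspace
model. This is only a sketch and not kernel code: every name in it
(model_sock, MODEL_PASSIVE_CLOSE, model_rcv_close() and so on) is hypothetical
and chosen for illustration, and the model covers only case (II), where the
peer's Close arrives before the connecting task has run again. It also assumes
that the failing path discards the queued data.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model only: these are NOT the kernel's state definitions. */
enum model_state {
	MODEL_REQUESTING,
	MODEL_OPEN,
	MODEL_PASSIVE_CLOSE,	/* hypothetical intermediate state: peer sent Close */
	MODEL_CLOSED
};

struct model_sock {
	enum model_state state;
	int queued_bytes;	/* data enqueued but not yet read by userspace */
};

/*
 * The peer's Close arrives while the client is OPEN.
 * use_intermediate == false mirrors the behaviour described in (II):
 * the socket goes straight to CLOSED.
 */
static void model_rcv_close(struct model_sock *sk, bool use_intermediate)
{
	assert(sk->state == MODEL_OPEN);
	sk->state = use_intermediate ? MODEL_PASSIVE_CLOSE : MODEL_CLOSED;
}

/* Result seen by connect(2) once the sleeping task wakes up again. */
static int model_connect_wakeup(const struct model_sock *sk)
{
	return sk->state == MODEL_CLOSED ? -ECONNABORTED : 0;
}

/* Queued data stays readable as long as the socket is not CLOSED. */
static int model_read(struct model_sock *sk)
{
	int n;

	if (sk->state == MODEL_CLOSED)
		return 0;
	n = sk->queued_bytes;
	sk->queued_bytes = 0;
	return n;
}

/* Only the application's close() moves the socket to CLOSED. */
static void model_close(struct model_sock *sk)
{
	sk->state = MODEL_CLOSED;
	sk->queued_bytes = 0;
}

/*
 * One short-lived transaction: handshake, data and Close all complete
 * before the connecting task gets to run. Returns the connect result and
 * stores the number of bytes the application managed to read.
 */
static int model_short_transaction(bool use_intermediate, int payload,
				   int *bytes_read)
{
	struct model_sock sk = { MODEL_REQUESTING, 0 };
	int err;

	sk.state = MODEL_OPEN;		/* handshake done */
	sk.queued_bytes = payload;	/* server data enqueued */
	model_rcv_close(&sk, use_intermediate);

	err = model_connect_wakeup(&sk);
	*bytes_read = err ? 0 : model_read(&sk);
	model_close(&sk);
	return err;
}
```

Calling model_short_transaction(false, 5, &n) reproduces the failure: connect
reports ECONNABORTED and nothing is read. With use_intermediate set to true
the same sequence of events lets connect succeed and the application read all
5 bytes, since only its own close moves the model socket to CLOSED.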