DCCP-IPv4/6 Socket Application Library

The DCCP-IPv4/6 library provides abstractions that simplify building DCCP applications.

DCCP/TCP/UDP/UDP-Lite IPv4 and IPv6 sockets can all be created using a single socket call.

1. Socket Creation

All socket creation routines use getaddrinfo() internally. Used on its own, this glibc function raises the following issues:

DCCP and UDP-Lite are not natively supported (the library works around this limitation);
some socket options (such as service code or partial checksums) need to be set on the socket before it is bound or connected;
the local side of the socket (for bind()) and/or the remote side of the socket (for connect()) may have to be specified.

The library resolves these issues. There is a single generic function, makesock2(), which caters for all purposes; a simpler, less generic variant called makesock(); and two simple wrappers for active (connecting) and passive (listening) sockets. The routines are presented in increasing order of complexity.
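One common way to work around the missing DCCP/UDP-Lite support in getaddrinfo() is to resolve with hints for a protocol it does know and then overwrite the socket type and protocol in the results. The following is a minimal sketch of that idea only: `resolve_l4()' is a hypothetical helper, not a library function, and the library's actual workaround may differ. The fallback constants are the Linux values.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* getaddrinfo() under strict -std=c99/c11 */
#endif
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#ifndef SOCK_DCCP
#define SOCK_DCCP 6               /* Linux value */
#endif
#ifndef IPPROTO_DCCP
#define IPPROTO_DCCP 33
#endif
#ifndef IPPROTO_UDPLITE
#define IPPROTO_UDPLITE 136
#endif

/*
 * Hypothetical helper: resolve host/service for `proto'.
 * Returns a list to be released with freeaddrinfo(), or NULL on error.
 */
struct addrinfo *resolve_l4(int af, int proto, int passive,
                            const char *host, const char *service)
{
    struct addrinfo hints, *res, *ai;
    int real_type;

    assert(host != NULL || service != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = af;
    hints.ai_flags  = passive ? AI_PASSIVE : 0;

    if (proto == IPPROTO_TCP) {
        hints.ai_socktype = real_type = SOCK_STREAM;
        hints.ai_protocol = IPPROTO_TCP;
    } else {
        /* Resolve as UDP, which getaddrinfo() understands ... */
        hints.ai_socktype = SOCK_DGRAM;
        hints.ai_protocol = IPPROTO_UDP;
        real_type = (proto == IPPROTO_DCCP) ? SOCK_DCCP : SOCK_DGRAM;
    }

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return NULL;

    /* ... then patch in the socket type and protocol actually wanted. */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        ai->ai_socktype = real_type;
        ai->ai_protocol = proto;
    }
    return res;
}
```

Each returned entry can then be passed to socket(ai_family, ai_socktype, ai_protocol) as usual; any pre-connection options would be set between socket() and bind()/connect().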

1.1 The simplest variants

For DCCP applications which have a one-to-one mapping between port numbers and service codes (such as the provided applications), you can use:

	int dccp_listen_simple(char *service_spec);			/* listen on the port associated with service code */
	int dccp_connect_simple(char *peer, char *service_spec);	/* connect to peer using service code string */

This works for both IPv4 and IPv6 sockets and supports all of the `SC:', `SC=', and `SC=x' conventions for specifying service codes from RFC 4340, section 8.1.2. The library functions take care of parsing the service code and resolving the associated port number and address.

1.2 More complex variants

Many port numbers have no associated service code. For these, the following slightly more generic variants are used:

	int  listen_simple(l4proto transport, const uint16_t port, struct flowopts *fo);	/* listen on given port */
	int connect_simple(l4proto transport, char *peer, uint16_t port, struct flowopts *fo);	/* connect peer on given port */

These work in the same way for IPv4 and IPv6 hosts. The `l4proto' type is an enum: you are free to use either the IPPROTO_XXX constants or the provided shortcuts (`udp', `tcp', `udplite', `dccp') as specifier.

The `struct flowopts' is new and is described in the next section. Basically, it is a FIFO list of everything that needs to be specified on a socket prior to making the connection. If you don't need any of these, simply use the defaults by passing a NULL argument, as in the following excerpt from the hello-world example:

	listenfd = listen_simple(dccp, HELLO_PORT, NULL);		/* server */
	sockfd = connect_simple(dccp, hostname, HELLO_PORT, NULL);	/* client */

When no service code is specified, the API uses a service code of zero. This works without problems, especially on internal LANs. If you are crossing NATs or firewalls, opinions differ on whether omitting the service code is acceptable.

1.3 The generic makesock()/makesock2() routines

The final step covers the two most generic routines, which are intended mainly for advanced socket programming.

The makesock() routine will do for most cases:

	int makesock(const af_type af, const l4proto transport, int passive,
		     const char *host, const char *service, struct flowopts *fo);

The af_type is one of AF_INET, AF_INET6, or AF_UNSPEC; `transport' is as above; and the `passive' flag has the following meaning:

if set to 1 then `host' is either an interface address or NULL, and `service' is the port number or service name to listen on;
if set to 0 then `host' is the address/name of a peer, and `service' is its port number / service name to connect to.

Finally, the following routine is the most generic one and allows both ends of the socket to be specified.

	int makesock2(const af_type	af,		/* network layer type */
		      const l4proto	transport,	/* transport layer type	*/
		      const char	*local_name,	/* address to bind() */
		      const char	*local_serv,	/* port or service name	*/
		      const char	*remote_name,	/* peer to connect() to */
		      const char	*remote_serv,	/* port or service name	*/
		      struct flowopts *fo);	        /* pre-connection options */

This is useful, for example, to restrict incoming connections or to force a local interface address when connecting.

2. The flowopts abstraction

The flowopts structure serves as a container for everything that needs to be specified before making an active or passive connection. This includes

DCCP service codes (there are specific wrappers provided)
partial checksums (DCCP, UDP-Lite)
most SOL_SOCKET options (see socket(7))
transport-protocol specific options (e.g. TCP_NODELAY, SO_DEBUG, ...)
...

The user does not have to bother with the internal structure of the flowopts list: it is dynamically created, then populated using either functions or macro-shortcuts, and it is cleaned up after use within the socket-creation routines described above.

To make use of flowopts:

allocate a pointer to a new instance using flowopt_new();
add options using the flowopt_add() and flowopt_add_bool() routines (or the OPT_ADD/OPT_ENABLE/OPT_DISABLE shortcuts);
that's all - deallocation and memory handling is done internally.

The argument list of the flowopt_add_xxx() routines deliberately mirrors that of setsockopt(2), with the flowopts pointer taking the place of the socket descriptor:

	void flowopt_add(struct flowopts *fo, 	/* is filled in by this function */
			 int level, 		/* socket level, same as second arg of setsockopt() */
			 int opt, 		/* option type, same as third arg of setsockopt()    */
			 char *name, 		/* for debugging, should be stringified name of `opt' */
			 const void *val, 	/* option value, same as fourth arg of setsockopt()  */
			 int len);		/* value-length, same as fifth arg of setsockopt()   */

	/*
	 *  Wrapper for Boolean socket options (those which are either on=1 or off=0)
	 */
	void flowopt_add_bool(struct flowopts *fo, int lev, int opt, char *optname, bool on_or_off);

The point of the string argument (`name') is error reporting: if setting the socket option fails, the error message uses the value of `name' to indicate which socket option could not be set (there will likely be more than one option in the list). Since it is tedious to always fill in the `name' argument, the following shortcuts are provided, which do this automatically:

	/* shortcuts for the above */
	#define OPT_ADD(fo, lev, opt, val, len) flowopt_add(fo, lev, opt, #opt, val, len)
	#define OPT_ENABLE(fo, lev, opt)        flowopt_add_bool(fo, lev, opt, #opt, 1)
	#define OPT_DISABLE(fo, lev, opt)       flowopt_add_bool(fo, lev, opt, #opt, 0)

To illustrate, here is an example from the sock program internals:

	struct flowopts   *fo = flowopt_new();

	if (broadcast)
		OPT_ENABLE(fo, SOL_SOCKET, SO_BROADCAST);

	if (reuseaddr)
		OPT_ENABLE(fo, SOL_SOCKET, SO_REUSEADDR);

	if (rcvbuflen)
		OPT_ADD(fo, SOL_SOCKET, SO_RCVBUF, &rcvbuflen, sizeof(int));

	if (cscov >= 0)		/* partial checksum coverage length */
		flowopt_set_cscov(fo, l4type, client, &cscov);

The last line also shows how partial checksums are set on the socket - the same function is used for both DCCP and UDP-Lite (see below for details).
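To make the mechanism more concrete, here is a minimal sketch of a FIFO of deferred socket options with stringified names. All names in it (`pending_opts', `pending_add()', `PENDING_ADD', `pending_apply()') are hypothetical and it handles integer-valued options only; the library's internal flowopts layout is not shown in this document and may well differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/socket.h>

/* Hypothetical types: they mirror the idea, not the library's layout. */
struct pending_opt {
    int level, opt, val;          /* sketch: integer-valued options only */
    const char *name;             /* stringified option, for error messages */
    struct pending_opt *next;
};

struct pending_opts {
    struct pending_opt *head, *tail;   /* FIFO: applied in insertion order */
};

/* Queue one option; returns 0 on success, -1 if out of memory. */
int pending_add(struct pending_opts *po, int level, int opt,
                const char *name, int val)
{
    struct pending_opt *n;

    assert(po != NULL);
    n = malloc(sizeof(*n));
    if (n == NULL)
        return -1;
    n->level = level;
    n->opt   = opt;
    n->val   = val;
    n->name  = name;
    n->next  = NULL;
    if (po->tail != NULL)
        po->tail->next = n;
    else
        po->head = n;
    po->tail = n;
    return 0;
}

/* `#opt' turns the option identifier into its name without expanding it. */
#define PENDING_ADD(po, lev, opt, val) pending_add(po, lev, opt, #opt, val)

/*
 * Apply all queued options to `fd' and free the queue.
 * Returns the name of the first option that failed, or NULL if all succeeded.
 */
const char *pending_apply(struct pending_opts *po, int fd)
{
    const char *failed = NULL;
    struct pending_opt *n, *next;

    for (n = po->head; n != NULL; n = next) {
        next = n->next;
        if (failed == NULL &&
            setsockopt(fd, n->level, n->opt, &n->val, sizeof(n->val)) < 0)
            failed = n->name;
        free(n);
    }
    po->head = po->tail = NULL;
    return failed;
}
```

Because the queue is consumed by pending_apply(), the caller never frees it explicitly, which is the same usage pattern as described for flowopts above.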

Lastly, only one function is needed to set service codes on the socket:

	void 	flowopt_set_service(struct flowopts *fo, char *service);

Unlike the routines above, `service' here not only supports the `SC' conventions from RFC 4340, 8.1.2, but can optionally be a comma-separated list of such service code strings. These are internally parsed, converted to service codes, and assigned on the socket descriptor. More information is in the next section.

3. DCCP Service codes

3.1 General use within the library

DCCP service codes are part of the pre-connection flowopts structure described above. This structure is filled in using accessor methods; it is not visible from the outside.

For reference, here is the internal structure contained within the flowopt structure.

	struct dccp_services	{
		uint32_t	sc_vec;		/* array starts here */
		uint8_t		sc_len,		/* total length of allocated places */
				sc_idx;		/* stack pointer */
	};

For active sockets, sc_vec has exactly one element, and sc_idx is never greater than 1. For passive sockets (which may be offering different services on the same socket), the maximum number is limited by the DCCP_SERVICE_LIST_MAX_LEN constant in linux/dccp.h (currently 32).

It is important to note that service codes are always filled in in network-byte order. The provided library routines automatically take care of this. There are two constructor routines:

    struct dccp_services   *dccp_service_allocate(uint8_t num_entries);
    struct dccp_services   *dccp_services_new(const uint32_t *sc_arr, const uint8_t len);

The first allocates space for num_entries entries; the second is a copy constructor which copies `len' entries from the sc_arr array. To populate the allocated structure with service codes, use:

    void dccp_service_add(struct dccp_services *ds, uint32_t service_code);

The code will not allow pushing more than the pre-allocated num_entries (it bails out with an error message). Note that in the above, service_code is an actual number; the routines for parsing service code strings are described below.
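The bounded-stack behaviour can be sketched as follows. This is an illustration under assumptions: `struct sc_stack' with a C99 flexible array member and the `sc_stack_*' functions are hypothetical names, and the sketch returns -1 on overflow instead of bailing out as the library is described to do.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout; the library's struct dccp_services may differ. */
struct sc_stack {
    uint8_t  cap;      /* number of allocated slots */
    uint8_t  idx;      /* next free slot (stack pointer) */
    uint32_t vec[];    /* service codes, stored in network byte order */
};

/* Allocate an empty stack with room for `cap' service codes. */
struct sc_stack *sc_stack_alloc(uint8_t cap)
{
    struct sc_stack *s = malloc(sizeof(*s) + cap * sizeof(s->vec[0]));

    if (s != NULL) {
        s->cap = cap;
        s->idx = 0;
    }
    return s;
}

/* Push one code given in host byte order; returns -1 when the stack is full. */
int sc_stack_push(struct sc_stack *s, uint32_t code)
{
    assert(s != NULL);
    if (s->idx >= s->cap)
        return -1;
    s->vec[s->idx++] = htonl(code);   /* convert once, on the way in */
    return 0;
}
```

Converting to network byte order inside the push routine keeps callers working with plain host-order numbers, which matches the division of labour described above.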

3.2 Assigning service codes on a socket

The following assigns a service code struct on a socket. Note again that if you use the flowopts option, this low-level aspect is not necessary - it is already taken care of.

    int    dccp_services_assign(int sockfd, struct dccp_services const *ds);

To clean up after usage, the following destructor is used:

    void   dccp_services_cleanup(struct dccp_services *ds);

NB: For normal use, these low-level routines are not necessary. Use the above flowopts wrappers, which are high-level abstractions and already combine these low-level operations.

3.3 Parsing service code strings

To convert an ASCII service code string into a 32-bit number (in host byte order), the following routine is provided:

     uint32_t  parse_service_code(const char *service);

This function supports all the SC... variants defined in section 8.1.2 of RFC 4340.
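For illustration, the three textual forms from RFC 4340, 8.1.2 can be parsed roughly as follows. This is a sketch and not the library's parse_service_code(): the name `sc_parse()' and its error convention (0 on success, -1 on malformed input) are assumptions, and the library may be more lenient about what it accepts.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser for "SC:xxxx", "SC=<decimal>" and "SC=x<8 hex digits>".
 * On success stores the code in host byte order in *sc and returns 0.
 */
int sc_parse(const char *s, uint32_t *sc)
{
    unsigned long long v;
    uint32_t acc = 0;
    size_t i, len;

    assert(sc != NULL);
    if (s == NULL || strncmp(s, "SC", 2) != 0)
        return -1;

    if (s[2] == ':') {                       /* ASCII form, 1..4 characters */
        s += 3;
        len = strlen(s);
        if (len < 1 || len > 4)
            return -1;
        for (i = 0; i < 4; i++) {
            /* short strings are padded on the right with spaces */
            unsigned char c = (i < len) ? (unsigned char)s[i] : ' ';

            if (i < len && !isalnum(c) && strchr("-_+.*/?@", c) == NULL)
                return -1;
            acc = (acc << 8) | c;            /* big-endian interpretation */
        }
        *sc = acc;
        return 0;
    }

    if (s[2] != '=')
        return -1;
    s += 3;
    errno = 0;
    if (*s == 'x' || *s == 'X') {            /* exactly eight hex digits */
        s++;
        if (strlen(s) != 8 || strspn(s, "0123456789abcdefABCDEF") != 8)
            return -1;
        v = strtoull(s, NULL, 16);
    } else {                                 /* decimal number */
        if (*s == '\0' || strspn(s, "0123456789") != strlen(s))
            return -1;
        v = strtoull(s, NULL, 10);
    }
    if (errno != 0 || v > UINT32_MAX)
        return -1;
    *sc = (uint32_t)v;
    return 0;
}
```

With this sketch, the strings "SC:fdpz", "SC=1717858426" and "SC=x6664707A" all yield the same number, which is the example used in the RFC.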

3.4 Printing service codes

There are three pretty-printing routines:

    char   *assigned_services(int sockfd);

    char   *service_code(uint32_t sc);
    char   *service_codes(uint32_t sc_array[], uint8_t arraylen);

The first of these returns a comma-separated list of all service codes associated with the given socket descriptor.

The other two are lower-level and turn a 32-bit service code (in host byte order) into a presentable string, using the ASCII representation SC:XXXX if possible. The service_codes() function is simply the array variant of service_code().
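The "ASCII if possible" rule can be sketched as below. The function `sc_format()' is hypothetical, it writes into a caller-supplied buffer rather than returning an allocated string, and the decimal `SC=' fallback is an assumption: the document does not say which form the library prints when the ASCII form is not possible.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Characters permitted in the "SC:" form by RFC 4340, 8.1.2. */
int sc_char_ok(unsigned char c)
{
    return c != '\0' && (isalnum(c) || strchr("-_+.*/?@", c) != NULL);
}

/* Hypothetical formatter: writes a printable form of `sc' into buf. */
char *sc_format(uint32_t sc, char *buf, size_t buflen)
{
    unsigned char txt[4];
    int i, len = 4, ascii;

    assert(buf != NULL && buflen > 0);

    for (i = 0; i < 4; i++)                  /* big-endian byte order */
        txt[i] = (unsigned char)(sc >> (24 - 8 * i));

    while (len > 0 && txt[len - 1] == ' ')   /* strip right-hand padding */
        len--;

    ascii = (len > 0);
    for (i = 0; i < len; i++)
        if (!sc_char_ok(txt[i]))
            ascii = 0;

    if (ascii)
        snprintf(buf, buflen, "SC:%.*s", len, (const char *)txt);
    else
        snprintf(buf, buflen, "SC=%lu", (unsigned long)sc);   /* assumed fallback */
    return buf;
}
```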

3.5 Service-code to port mapping

Using the algorithm from RFC 5595, 2.7, port numbers in the dynamic range (RFC 4340, 19.9) can be derived automatically from the service code using the following function.

    uint16_t  dccp_port_from_service_code(uint32_t sc);

4.1 Query the current maximum packet size

DCCP is datagram-oriented and fragmentation is discouraged, so applications need to be aware of the current maximum packet size (MPS) supported by the path. The MPS is defined in section 14 of RFC 4340, and the current value can be queried using

    int  dccp_get_cur_mps(int sockfd);

Note that this requires a connected socket, i.e. after connect() or after accept(). This feature requires kernel support (available in the test tree).
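On Linux, the MPS is exposed through the DCCP_SOCKOPT_GET_CUR_MPS socket option. The sketch below shows what such a query might look like; `query_cur_mps()' and `clamp_to_mps()' are hypothetical helpers, not the library's dccp_get_cur_mps(), and the fallback constants are the values from the Linux headers.

```c
#include <stddef.h>
#include <sys/socket.h>

#ifndef SOL_DCCP
#define SOL_DCCP 269                    /* Linux value */
#endif
#ifndef DCCP_SOCKOPT_GET_CUR_MPS
#define DCCP_SOCKOPT_GET_CUR_MPS 5      /* from linux/dccp.h */
#endif

/* Returns the current MPS of a connected DCCP socket, or -1 on error. */
int query_cur_mps(int sockfd)
{
    int mps = 0;
    socklen_t len = sizeof(mps);

    if (getsockopt(sockfd, SOL_DCCP, DCCP_SOCKOPT_GET_CUR_MPS, &mps, &len) < 0)
        return -1;
    return mps;
}

/* Limit a payload size to the MPS; an unknown MPS (<= 0) leaves it unchanged. */
size_t clamp_to_mps(size_t want, int mps)
{
    if (mps > 0 && want > (size_t)mps)
        return (size_t)mps;
    return want;
}
```

A sender would typically query the MPS after connect() or accept() and use the result to size each datagram, so that no packet needs to be fragmented.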

4.2 CCID-specific routines

CCID3 has two getsockopt()-only socket options which return a struct tfrc_{r,t}x_info (defined in linux/tfrc.h). To perform the corresponding getsockopt calls, these two wrappers are provided:

    void   tfrc_get_tx_info(int sockfd, struct tfrc_tx_info *tfrc);
    void   tfrc_get_rx_info(int sockfd, struct tfrc_rx_info *tfrc);

Note that on CCID2 these wrappers return no data; calling them only prints a warning.

5. Partial checksums

Two transport protocols of this library support the use of partial checksums - UDP-Lite (RFC 3828) and DCCP (RFC 4340, sec. 9).
To test this dynamically (e.g. in a program supporting multiple layer-4 protocols), use

	int  supports_partial_csums(const l4proto transport);

which returns 1 if supported. The use of the flowopts structure is encouraged, as the same wrapper function serves both for UDP-Lite and DCCP partial checksum coverage:

	void 	flowopt_set_cscov(struct flowopts *fo, 	/* is filled in by this function */
				  l4proto transport, 	/* IPPROTO_DCCP/dccp or IPPROTO_UDPLITE/udplite */
				  int sender, 		/* set to 1 if you want sender coverage, 0 for receiver coverage */
				  int const *cscov);	/* semantics of this value depend on the transport protocol */

Common to both UDP-Lite and DCCP is the distinction between sender and receiver coverage:

the sender sets the actual checksum coverage used in the packets;
the receiver specifies a minimum coverage and discards packets whose coverage is less than this.

There are two routines to read and pretty-print the checksum coverage currently assigned on a socket:

    int  get_cscov(int sockfd, l4proto transport, int sender);	   /* just return the number (or error) */
    char *cscov_to_str(int sockfd, l4proto transport, int sender); /* pretty-print cscov semantics      */

Both are transport-layer independent; the second generates verbose information about the current coverage.

6. Protocol-type convenience functions

6.1 Feature-test functions

Since the number of transport protocols keeps growing, there is room for feature-test functions:

	int	is_connection_oriented(const l4proto transport);
	int	is_connectionless(const l4proto transport);
	int	is_datagram_based(const l4proto transport);

TCP is connection-oriented and not datagram-based; UDP is connectionless and datagram-based (so is UDP-Lite), while DCCP is connection-oriented and datagram-based.
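The classification above amounts to a few comparisons on the protocol number. The sketch below uses hypothetical `proto_*' names and plain int arguments rather than the library's l4proto type; the fallback constants are the Linux values.

```c
#include <netinet/in.h>

#ifndef IPPROTO_DCCP
#define IPPROTO_DCCP 33
#endif
#ifndef IPPROTO_UDPLITE
#define IPPROTO_UDPLITE 136
#endif

/* TCP and DCCP set up a connection before data is exchanged. */
int proto_is_connection_oriented(int proto)
{
    return proto == IPPROTO_TCP || proto == IPPROTO_DCCP;
}

/* UDP and UDP-Lite send without any connection setup. */
int proto_is_connectionless(int proto)
{
    return proto == IPPROTO_UDP || proto == IPPROTO_UDPLITE;
}

/* Everything except TCP preserves message boundaries. */
int proto_is_datagram_based(int proto)
{
    return proto == IPPROTO_UDP || proto == IPPROTO_UDPLITE ||
           proto == IPPROTO_DCCP;
}
```

A program supporting several transports can use such predicates to decide, for example, whether to call listen()/accept() or to go straight to recvfrom().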

The next pair of functions determine the address family of the local (bind) and remote (connect) end of a socket:

	int 	local_AF(int sockfd);	/* returns AF_INET or AF_INET6 */
	int	remote_AF(int sockfd);	/* same but for the remote end */

Lastly, other parameters often used in library calls can be derived from the transport protocol:

	int     sock_type(const l4proto transport);		/* SOCK_DGRAM | SOCK_STREAM | SOCK_DCCP */
	int    sock_level(const l4proto transport);		/* SOL_DCCP | SOL_UDPLITE | ... */

6.2 Printing information about a socket

Both the local and remote side of a socket have a `name', which can be queried using getsockname(2) and getpeername(2), respectively. Since addresses are no longer IPv4-only, this requires allocating a buffer and a call to getnameinfo(3), all of which is combined into the following functions:

	char	*local_name(int sockfd);	/* local end of sock */
	char	*remote_name(int sockfd);	/* remote end of sock */

	char	*host_and_port(struct sockaddr *sa, socklen_t len);

Each of these uses the `host#port' format. The last one pretty-prints v4/v6 socket structures.
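A family-independent `host#port' string can be produced with getnameinfo(3) in numeric mode. The following sketch is an assumption about how this could be done, not the library's host_and_port(): `fmt_host_port()' is a hypothetical name and it writes into a caller-supplied buffer.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* getnameinfo() under strict -std=c99/c11 */
#endif
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: format `sa' as "host#port" using numeric notation.
 * Works for both sockaddr_in and sockaddr_in6. Returns buf, or NULL on error.
 */
char *fmt_host_port(const struct sockaddr *sa, socklen_t salen,
                    char *buf, size_t buflen)
{
    char host[128], port[8];

    assert(sa != NULL && buf != NULL);

    /* Numeric flags avoid DNS and service-name lookups. */
    if (getnameinfo(sa, salen, host, sizeof(host), port, sizeof(port),
                    NI_NUMERICHOST | NI_NUMERICSERV) != 0)
        return NULL;

    snprintf(buf, buflen, "%s#%s", host, port);
    return buf;
}
```

Using `#' as the separator avoids the ambiguity that `:' would cause with IPv6 addresses.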

Finally, one can print the layer-3 and layer-4 protocols currently in use:

	const char	*ADDRESS_FAMILY(const int af);		/* prints AF_XXX */
	const char	*layer3_name(const af_type af);		/* v4-only | v6-only | both v4/v6 */
	const char	*layer4_name(const l4proto transport);	/* vernacular name of transport protocol */

7. Error handling

The following routines are well-known from the books by Stevens et al and have been adapted from the UNP library:

	void err_ret(const char *fmt, ...);	/* non-fatal syscall error, continue	*/
	void err_sys(const char *fmt, ...);	/* fatal syscall error, abort execution	*/
	void err_dump(const char *fmt, ...);	/* fatal syscall error, dump core	*/

	void err_msg(const char *fmt, ...);	/* same as fprintf(stderr, ...)		*/
	void err_quit(const char *fmt, ...);	/* same as err_msg(...); exit(1);	*/
	
	#define die(fmt, args...)		err_sys("%s: " fmt, __FUNCTION__, ##args)
	#define warn(args...)			err_ret("WARNING: " args)
	#define warnx(args...)			err_msg("WARNING: " args)

The main difference to the UNP routines is the use of linux_strerror(), which is very useful for debugging: instead of just interpreting errno in perror(3) style, it additionally prints the symbolic name of the errno value, e.g. "EPERM: Operation not permitted".
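The idea of prefixing the message with the symbolic name can be sketched with a small lookup table. This is not the library's linux_strerror(): `errno_name()' and `describe_errno()' are hypothetical, and the table covers only a handful of values for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, deliberately incomplete table of symbolic errno names. */
const char *errno_name(int err)
{
    switch (err) {
    case EPERM:        return "EPERM";
    case ENOENT:       return "ENOENT";
    case EBADF:        return "EBADF";
    case EINVAL:       return "EINVAL";
    case ENOPROTOOPT:  return "ENOPROTOOPT";
    case EADDRINUSE:   return "EADDRINUSE";
    case ECONNREFUSED: return "ECONNREFUSED";
    default:           return "E???";
    }
}

/* Format "NAME: message" for `err' into buf and return buf. */
char *describe_errno(int err, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    snprintf(buf, buflen, "%s: %s", errno_name(err), strerror(err));
    return buf;
}
```

The symbolic name is often what one searches for in man pages or kernel sources, which is why having it next to the human-readable message helps when debugging.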

Finally, since it is tiresome to always test for NULL after calling malloc(3), here is a wrapper

	void 	*do_malloc(size_t size);

which does this automatically and complains if there is not enough memory available.