Overview

The Sockets Direct Protocol (SDP) is an InfiniBand™-specific protocol defined by the Software Working Group (SWG) of the InfiniBand(sm) Trade Association (IBTA). It defines a standard wire protocol over the IBA fabric to support stream socket (SOCK_STREAM) networking over IBA. SDP uses various InfiniBand features (such as remote DMA (RDMA), memory windows, and solicited events) for high-performance zero-copy data transfers. SDP is purely a wire-protocol specification; it does not specify socket APIs or implementation details.

This project aims to develop a generic socket offload framework for the Linux kernel and to provide a reference implementation of the Sockets Direct Protocol over the InfiniBand fabric. SDP deals only with stream sockets; when installed, it allows all stream connections between endpoints on the InfiniBand fabric to bypass the OS-resident TCP stack. All other socket types (such as datagram, raw, and packet sockets) are handled by the Linux IP stack and run over the IPoIB InfiniBand link drivers. The IPoIB driver has no dependency on the SDP stack; however, the SDP stack depends on an IPoIB link driver for local IP address assignment and for IP address resolution.

While IPoIB specifies a mapping of IP (both v4 and v6) over the IBA fabric and treats the fabric simply as a link layer, SDP maps stream connections directly onto InfiniBand reliable connections (or virtual circuits). Conceptually, the IPoIB driver in Linux looks like a network driver and plugs in underneath the IP stack like any standard Linux network device. The IPoIB driver exposes one network device per IBA port (and partition) on the host, and these devices are used to assign IP addresses, either statically or dynamically using protocols such as DHCP. The SDP stack simply uses these IP assignments to identify endpoints.

The high-level architecture of the Linux sockets offload framework and the SDP stack is briefly illustrated below.

High Level Architecture

Copyright © 2002 Intel Corp.

Project Definition

The SDP project aims to provide a generic socket offload framework for Linux and uses this framework to add Sockets Direct Protocol support to Linux. The major components of this project are:

To aid the portability of this SDP implementation across a wide set of InfiniBand Host Channel Adapters (HCAs), the SDP InfiniBand transport module uses the services of the InfiniBand Access Layer infrastructure drivers project. This implementation also does not currently foresee any modifications to the Linux TCP/IP stack. A brief architectural overview of this implementation is provided here.

Goals

This section lists an initial set of requirements and goals for accelerated sockets support in Linux. The list is by no means complete and needs further refinement.

  1. All offload protocols/transports need a standard Linux network driver. This allows network administrators to use standard tools (such as ifconfig) to configure and manage the network interfaces and to assign IP addresses using static or dynamic methods.

  2. The offload sockets framework should work both with and without kernel patches. To this end, the offload protocols and transports will reside under a module implementing a new offload address family (AF_INET_OFFLOAD). Applications will be able to create sockets in this new address family directly. For complete application transparency, however, an optional minimal patch to the Linux kernel (socket.c) can be applied to redirect AF_INET sockets to the AF_INET_OFFLOAD address family. The AF_INET_OFFLOAD module will act as a protocol switch and interact with the AF_INET address family. The patch also defines a new address family, AF_INET_DIRECT, for applications that want to use only the OS network stack. Whether the kernel patch is applied can be decided by distributor and/or customer requirements.

  3. All standard socket APIs and file I/O APIs that are supported over the OS-resident network stack should be supported over offload sockets.

  4. Support for the Asynchronous I/O (AIO) interfaces being added to Linux. AIO support is being worked on in the Linux community. The offload framework should use it to support newer protocols and transports that are natively asynchronous (for example, the SDP stack could use AIO support to implement SDP's PIPELINED mode).

  5. The architecture should support a layered design so that multiple offload technologies, not just SDP, can easily be supported. This makes the added offload sockets framework useful across offload technologies.

  6. The proposed architecture should support implementations optimized for zero-copy data transfer between the application buffers at either end of the connection. High performance can be achieved by avoiding data copies and using the RDMA support in SANs to perform zero-copy transfers. This mode is typically useful for large data transfers, where the overhead of setting up RDMA is negligible compared to the cost of copying buffers.

  7. The proposed architecture should support implementations optimized for low-latency, small data transfers. Send/receive operations incur lower latency than RDMA operations, which need explicit setup.

  8. Behavior with signals should be exactly the same as with existing standard sockets.

  9. listen() on an AF_INET socket bound to all local interfaces (with INADDR_ANY) should listen for connections on all available IP network interfaces in the system (both offloaded and non-offloaded). This requires an application's listen() call with INADDR_ANY to be replicated across all protocol providers, including the in-kernel TCP stack.

  10. select() should work across AF_INET socket file descriptors (fds) served by different protocol/transport providers, including the in-kernel IP stack. This guarantees complete transparency at the socket layer, regardless of which protocol/transport provider is bound to a socket.

  11. Operations on socket connections bound to the in-kernel protocol (TCP/IP) stack should be directed to the kernel TCP/IP stack with minimal overhead. Applications bound to the kernel network stack should see negligible performance impact from offload sockets support.

  12. Ability to fall back dynamically to the kernel TCP/IP stack if an operation or connection fails while directly mapping stream connections to offloaded protocols/transports. Connection requests for AF_INET sockets that fail over the offload stack are automatically retried with the kernel TCP/IP stack. Once a directly mapped connection is established, it cannot fail back to the TCP stack, and any subsequent failures are reported to the application as ordinary socket operation failures.

  13. The offload sockets framework enables stream sockets (SOCK_STREAM) only. Other socket types use only the OS network stack.

  14. The offload sockets framework will support offloading stream sessions both within the local subnet and to destinations outside it that require routing. Offload protocols/transports will be able to specify whether they do their own routing or need routing assistance. The ability to offload stream sessions to remote subnets will be useful for TOE vendors in general, and for IBA edge router vendors who map SDP sessions on the IBA fabric to TCP sessions outside the fabric. For protocols/transports that do their own routing, the offload sockets framework simply forwards the requests. For protocols/transports that need routing support (such as SDP), the framework consults the OS route tables and applies its configurable policies before forwarding requests to the offload transports.

  15. Since the socket extensions defined by the Interconnect Software Consortium (ICSC) in The Open Group are a work in progress at this time, the offload sockets framework will not attempt to address them in this phase. They may be addressed in a later phase.

  16. The offload sockets framework should not affect any existing Linux application designs that use standard OS abstractions and features (such as fork(), exec(), dup(), clone(), etc.). Transparency to applications should be maintained.

  17. The offload sockets framework should support both user-mode and kernel-mode socket clients, maintaining the existing socket semantics for current user-mode and kernel-mode clients.

  18. The offload sockets framework currently deals only with the IPv4 address family. Although the same offload concepts apply equally to IPv6, IPv6 support is deferred to later stages of the project.

Licensing Details

This software is being made available under a choice of one of two licenses. You may choose to be licensed under either the GNU General Public License (GPL) Version 2, June 1991, available at http://www.fsf.org/copyleft/gpl.html, or the Intel BSD + Patent License, further described here.

Deliverables

Timeline

Actual sub-project timelines are not yet determined and will be formulated by the participants of the sub-project.

How to Get Involved

The best way to contribute to this effort is to get in touch with the contact for the area you are interested in. New participants are welcome and encouraged to join this effort to bring InfiniBand support to the Linux operating system.


Project News

The project news is included on the SourceForge site.  Click the "News" link at the left under "SourceForge Services".


Last Updated: 05/31/2002 05:05 PM