Overview
The Sockets Direct Protocol (SDP) is an InfiniBand™-specific protocol defined by the Software Working Group (SWG) of the InfiniBand(sm) Trade Association (IBTA). It defines a standard wire protocol over the IBA fabric to support stream sockets (SOCK_STREAM) networking over IBA. SDP utilizes various InfiniBand features (such as remote DMA (RDMA), memory windows, and solicited events) for high-performance zero-copy data transfers. SDP is a pure wire-protocol-level specification and does not specify socket APIs or implementation details.
This project attempts to develop a generic socket offload framework for the Linux kernel and to provide a reference implementation of the Sockets Direct Protocol over the InfiniBand fabric. SDP deals only with stream sockets and, if installed in a system, allows all stream connections between endpoints on the InfiniBand fabric to bypass the OS-resident TCP stack. All other socket types (such as datagram, raw, and packet) are supported by the Linux IP stack and operate over the IPoIB InfiniBand link drivers. The IPoIB driver has no dependency on the SDP stack; however, the SDP stack depends on an IPoIB link driver for local IP assignments and for IP address resolution.
While IPoIB specifies a mapping of the IP protocols (both v4 and v6) over the IBA fabric and treats the IBA fabric simply as the link layer, SDP facilitates direct mapping of stream connections to InfiniBand reliable connections (or virtual circuits). Conceptually, the IPoIB driver in Linux looks like a network driver and plugs in underneath the IP stack like any standard Linux network device. The IPoIB driver exposes a network device per IBA port (and partition) on the host system, and these devices are used to assign IP addresses (statically, or dynamically using protocols such as DHCP). The SDP stack simply makes use of these IP assignments for endpoint identification.
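As a rough illustration of how IP assignments on IPoIB devices could serve as endpoint identifiers, the sketch below keeps a lookup table from local IP addresses to the IBA port and partition behind each device. The record type, the table, the addresses, and the function name are all hypothetical and are not taken from the SDP implementation.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Optional


@dataclass(frozen=True)
class IbEndpoint:
    """Hypothetical record of the IBA port and partition behind one IPoIB device."""
    device: str    # IPoIB network device name (assumed naming)
    hca_port: int  # port number on the host channel adapter
    pkey: int      # partition key


# Hypothetical table, as it might be built from the IP addresses
# assigned to the IPoIB network devices.
LOCAL_ASSIGNMENTS = {
    IPv4Address("192.168.10.1"): IbEndpoint("ib0", 1, 0xFFFF),
    IPv4Address("192.168.20.1"): IbEndpoint("ib1", 2, 0xFFFF),
}


def endpoint_for(ip: str) -> Optional[IbEndpoint]:
    """Return the IBA endpoint identified by a local IP address, or None."""
    return LOCAL_ASSIGNMENTS.get(IPv4Address(ip))


local = endpoint_for("192.168.10.1")
unknown = endpoint_for("10.0.0.1")
```

An address that is not assigned to any IPoIB device yields None, which a caller could treat as "not reachable over the fabric".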
The high level architecture of the Linux Sockets offload framework and the SDP stack is briefly illustrated below.
High Level Architecture
Copyright © 2002 Intel Corp.
Project Definition
The SDP project attempts to provide a generic socket offload framework for Linux and uses this framework to add Sockets Direct Protocol support to Linux. The major components of this project are:
AF_INET_OFFLOAD address family module (also referred to as the Offload Protocol Switch (OPS))
Sockets Direct Protocol (SDP) module (also referred to as the Offload Protocol)
InfiniBand transport module (also referred to as the Offload Transport)
An optional kernel patch for the Linux socket driver (socket.c)
To aid in the portability of this SDP implementation to a wide set of InfiniBand Host Channel Adapters (HCAs), the SDP InfiniBand transport module utilizes the services of the InfiniBand Access Layer infrastructure drivers project. Also, this implementation does not currently foresee any modifications to the Linux TCP/IP stack. A brief architectural overview of this implementation is provided here.
Goals
This section lists an initial set of requirements and goals for accelerated sockets support in Linux. The items listed in this section are by no means complete and need to be further refined.
All offload protocols/transports need to have a standard Linux network driver. This allows network administrators to use standard tools (like ifconfig) to configure and manage the network interfaces and assign IP addresses using static or dynamic methods.
The offload sockets framework should work both with and without kernel patches. To this end, the offload protocols and transports will reside under a new offload address family (AF_INET_OFFLOAD) module. Applications will be able to create socket instances over this new address family directly. However, for complete application transparency, an optional minimal patch to the Linux kernel (socket.c) can be applied to allow redirection of AF_INET sockets to the new AF_INET_OFFLOAD address family. The AF_INET_OFFLOAD module will work as a protocol switch and interact with the AF_INET address family. The patch also defines a new address family called AF_INET_DIRECT for applications that want to use strictly the OS network stack. Whether this kernel patch is applied can depend on distributor and/or customer requirements.
All standard socket APIs and File I/O APIs that are supported over the OS resident network stack should be supported over offload sockets.
Support for the Asynchronous I/O (AIO) being added to Linux. AIO support is being worked on in the Linux community. The offload framework should utilize it to support newer protocols and transports that are natively asynchronous. (For example, the SDP stack could utilize AIO support to implement the PIPELINED mode in SDP.)
The architecture should follow a layered design so that it can easily support multiple offload technologies, not just SDP. This ensures that the added offload sockets framework is useful beyond a single offload technology.
The proposed architecture should support implementations optimized for zero-copy data transfer modes between application buffers across the connection. High performance can be achieved by avoiding the data copies and using RDMA support in SANs to do zero copy transfers. This mode is typically useful for large data transfers where the overhead of setting up RDMA is negligible compared to the buffer copying costs.
The proposed architecture should support implementations optimized for low-latency small data transfer operations. Send/receive operations incur lower latency than RDMA operations, which need explicit setup.
Behavior with signals should be exactly the same as with existing standard sockets.
listen() on an AF_INET socket bound to multiple local interfaces (with INADDR_ANY) should listen for connections on all available IP network interfaces in the system (both offloaded and non-offloaded). This requires the listen() call from the application with INADDR_ANY to be replicated across all protocol providers, including the in-kernel TCP stack.
select() should work across AF_INET socket file descriptors (fds) supported by different protocol/transport providers, including the in-kernel IP stack. This guarantees complete transparency at the socket layer irrespective of which protocol/transport provider is bound to a socket.
Operations over socket connections bound to the in-kernel protocol (TCP/IP) stack should be directed to the kernel TCP/IP stack with minimum overhead. Applications bound to the kernel network stack should see negligible performance impact from offload sockets support.
Ability to fall back to the kernel TCP/IP stack dynamically in case of an operation/connection failure in the direct mapping of stream connections to offloaded protocols/transports. Connection requests for AF_INET sockets that fail over the offload stack are automatically retried with the kernel TCP/IP stack. Once a direct-mapped connection is established, it cannot be failed back to the TCP stack, and any subsequent failures are reported to the application as typical socket operation failures.
The offload sockets framework enables offload for stream sockets (SOCK_STREAM) only. Other socket types will use only the OS network stack.
The offload sockets framework will support offloading of stream sessions both within the local subnet and outside the local subnet, where routing is needed. Offload protocols/transports will be able to specify whether they do self-routing or need routing assistance. The ability to offload stream sessions to a remote subnet will be useful for TOE vendors in general and for IBA edge router vendors who map SDP sessions on the IBA fabric to TCP sessions outside the fabric. For protocols/transports that do self-routing, the offload sockets framework simply forwards the requests. For protocols/transports that need routing support (such as SDP), the framework utilizes the OS route tables and applies its configurable policies before forwarding requests to offload transports.
Since the socket extensions defined by the Interconnect Software Consortium (ICSC) in The Open Group are a work in progress at this time, the offload sockets framework will not attempt to address them in this phase. This could be attempted in a later phase.
The offload sockets framework should not affect any existing Linux application designs that use standard OS abstractions and features (such as fork(), exec(), dup(), clone(), etc.). Transparency to applications should be maintained.
The offload sockets framework should support both user-mode and kernel-mode socket clients, and maintain the existing socket semantics for both.
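The dynamic fallback goal in the list above can be sketched as follows. This is a minimal model under stated assumptions, not the project's implementation: the exception type, the function names, and the stub providers are all hypothetical. It only shows the intended ordering, in which the offload stack is tried first and a failed connection request is retried over the kernel TCP/IP stack.

```python
class OffloadConnectError(Exception):
    """Hypothetical error raised when an offload provider cannot connect."""


def connect_with_fallback(address, offload_connect, tcp_connect):
    """Try the offload stack first; on failure, retry over the kernel TCP/IP stack.

    Returns a (provider_name, connection) pair. Fallback applies only to
    connection establishment; failures after a direct-mapped connection exists
    are not retried here and would surface as ordinary socket errors.
    """
    try:
        return ("offload", offload_connect(address))
    except OffloadConnectError:
        # No direct-mapped connection was established, so retrying is allowed.
        return ("tcp", tcp_connect(address))


# Hypothetical stub providers, for demonstration only.
def failing_offload(address):
    raise OffloadConnectError("no offload path to peer")


def working_offload(address):
    return f"offload-connection-to-{address[0]}:{address[1]}"


def stub_tcp(address):
    return f"tcp-connection-to-{address[0]}:{address[1]}"


peer = ("192.168.10.2", 5001)
fallback_provider, fallback_conn = connect_with_fallback(peer, failing_offload, stub_tcp)
direct_provider, direct_conn = connect_with_fallback(peer, working_offload, stub_tcp)
```

In this sketch the caller never sees the offload failure, which mirrors the transparency the goal asks for.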
The offload sockets framework currently deals only with the IPv4 address family. Even though the same offload concepts apply equally to the IPv6 family, IPv6 offload is deferred to later stages of the project.
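The wildcard listen goal listed earlier in this section (a listen() on the wildcard address, INADDR_ANY in the standard sockets API, replicated across all protocol providers) can be modeled roughly as below. The Provider class, its fields, and the addresses are hypothetical. The handling of a specific, non-wildcard address is an assumption made for the sketch; the goals describe only the wildcard case.

```python
INADDR_ANY = "0.0.0.0"  # IPv4 wildcard address in dotted-quad form


class Provider:
    """Hypothetical protocol provider: the in-kernel TCP stack or an offload protocol."""

    def __init__(self, name, local_addrs):
        self.name = name
        self.local_addrs = set(local_addrs)  # local IP addresses this provider serves
        self.listening = []                  # (addr, port) pairs being listened on

    def listen(self, addr, port):
        self.listening.append((addr, port))


def listen_all(providers, addr, port):
    """Replicate a wildcard listen across every provider.

    For a specific address, only providers serving that address are used
    (an assumption of this sketch). Returns the names of the providers used.
    """
    if addr == INADDR_ANY:
        targets = list(providers)
    else:
        targets = [p for p in providers if addr in p.local_addrs]
    for provider in targets:
        provider.listen(addr, port)
    return [p.name for p in targets]


tcp = Provider("tcp", ["10.0.0.1", "192.168.10.1"])
sdp = Provider("sdp", ["192.168.10.1"])

wildcard = listen_all([tcp, sdp], INADDR_ANY, 8080)
specific = listen_all([tcp, sdp], "10.0.0.1", 8080)
```

Here the wildcard listen reaches both the TCP stack and the offload provider, while the listen on 10.0.0.1 reaches only the TCP stack, because the modeled offload provider does not serve that address.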
Licensing Details
This software is being made available under a choice of one of two licenses. You may choose to be licensed under either the GNU General Public License (GPL) Version 2, June 1991, available at http://www.fsf.org/copyleft/gpl.html, or the Intel BSD + Patent License, further described here.
Deliverables
Offload Protocol Switch Module - The offload protocol switch (OPS) module exposes a new address family (AF_INET_OFFLOAD). At the top, it preserves the protocol operation semantics with the kernel socket driver; underneath, it provides a new offload sock structure, interface, and binding to offload protocol modules capable of offloading all reliable transport features.
SDP Offload Protocol Module - The SDP offload protocol module implements the SDP wire protocol and state machine to support stream socket connections across the InfiniBand fabric. The SDP module provides standard socket semantics via protocol operations exported up to the offload protocol switch. At its bottom edge, the SDP module interfaces with the offload transport interface to function over the InfiniBand transport drivers.
InfiniBand Offload Transport Module - The InfiniBand offload transport driver maps the InfiniBand Access Layer interfaces to standard Offload Transport Interface operations for socket-based offload protocol modules. Transport services exported by this module include dynamic transport registration, IP to IBA name services, IP and device plug-and-play, IBA connection services, memory mapping, and data transfer operations. The offload transport module allows offload protocol drivers such as SDP to treat underlying transport interfaces as IP/network endpoints, and hides IBA fabric specifics such as HCAs, ports, and partitions.
Optional Linux Socket Driver Patch - The offload protocol modules are by default exposed under a new address family called AF_INET_OFFLOAD. However, for applications that need transparent offload capabilities for sockets created under the AF_INET address family, a minimal kernel patch to the Linux socket driver (socket.c) is required. This patch allows the socket driver to route AF_INET stream socket creation calls from applications to the AF_INET_OFFLOAD address family if the offload protocol switch module is loaded. The patch also exposes a new address family called AF_INET_DIRECT, which applications can use to restrict specific socket creation requests to the OS network stack.
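The redirection performed by the optional socket driver patch can be summarized as a small decision function. The sketch below is a user-space model only; the numeric values for AF_INET_OFFLOAD and AF_INET_DIRECT, the function name, and its parameters are hypothetical, since the text names these address families without giving values or an interface.

```python
import socket

# Hypothetical numeric values; only the family names come from the text above.
AF_INET_OFFLOAD = 1001
AF_INET_DIRECT = 1002


def route_socket_create(family, sock_type, ops_loaded, patched=True):
    """Return the address family that would service a socket creation call."""
    if family == AF_INET_DIRECT:
        # Application explicitly asked for the OS network stack.
        return socket.AF_INET
    if family == AF_INET_OFFLOAD:
        # Explicit offload request; the case where the module is
        # not loaded is not modeled in this sketch.
        return AF_INET_OFFLOAD
    if (patched and ops_loaded
            and family == socket.AF_INET
            and sock_type == socket.SOCK_STREAM):
        # Transparent redirection applies to AF_INET stream sockets only.
        return AF_INET_OFFLOAD
    # Everything else stays with the family the application requested.
    return family


stream_family = route_socket_create(socket.AF_INET, socket.SOCK_STREAM, ops_loaded=True)
dgram_family = route_socket_create(socket.AF_INET, socket.SOCK_DGRAM, ops_loaded=True)
direct_family = route_socket_create(AF_INET_DIRECT, socket.SOCK_STREAM, ops_loaded=True)
unloaded_family = route_socket_create(socket.AF_INET, socket.SOCK_STREAM, ops_loaded=False)
```

Without the patch (patched=False), AF_INET sockets are never redirected in this model, and an application would have to request AF_INET_OFFLOAD directly.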
Timeline
Actual sub-project timelines are not yet determined and will be formulated by the participants of the sub-project.
How to Get Involved
The best way to contribute to this effort is to get in touch with the contact for the area you are interested in. New participants are welcomed and encouraged to join in this effort to bring InfiniBand support to the Linux operating system.
Project News
The project news is included on the SourceForge site. Click the "News" link at the left under "SourceForge Services".
Last Updated: 05/31/2002 05:05 PM