While TRILL and SPB offer one way to handle the scaling problems of spanning trees, Software-Defined Networking, or SDN, offers another, much more general, approach. The core idea of SDN is to place the forwarding mechanism of each participating switch under the aegis of a controller, a user-programmable device that is capable of giving each switch instructions on how to forward packets. Like TRILL and SPB, this approach also allows forwarding and redundant links to coexist. The controller can be a single node on the network, or can be a distributed set of nodes. The controller manages the forwarding tables of each of the switches.
To handle legitimate broadcast traffic, the controller can, at startup, probe the switches to determine their layout, and, from this, construct a suitable spanning tree. The switches can then be instructed to flood broadcast traffic only along the links of this spanning tree. Unlike the case with conventional switches using the spanning-tree algorithm, however, links that are not part of the spanning tree can still be used for forwarding to known destinations.
Typically, if a switch sees a packet addressed to an unknown destination, it reports it to the controller, which then must figure out what to do next. One option is to have traffic to unknown destinations flooded along the same spanning tree used for broadcast traffic. This allows fallback-to-flooding to coexist safely with the full use of loop topologies.
Switches are often configured to report new source addresses to the controller, so that the controller can tell all the other switches the best route to that new source.
SDN controllers can be configured as simple firewalls, disallowing forwarding between selected pairs of nodes for security reasons. For example, if a datacenter has customers A and B, each with multiple nodes, then it is possible to configure the network so that no node belonging to customer A can send packets to a node belonging to customer B. See also the following section.
At many sites, the SDN implementation is based on standardized modules. However, controller software can also be developed locally, allowing very precise control of network functionality. This control, rather than the ability to combine loop topologies with Ethernet, is arguably SDN’s most important feature. See [FRZ13].
2.8.1 OpenFlow Switches
At the heart of SDN is the ability of controllers to tell switches how to forward packets. We next look at the packet-forwarding architecture for OpenFlow switches; OpenFlow is a specific SDN standard created by the Open Networking Foundation. See [MABPPRST08] and the OpenFlow switch specification (2015 version).
OpenFlow forwarding is built around one or more flow tables. The primary components of a flow-table entry are a set of match fields and a set of packet-response instructions, or actions, if the match succeeds. Some common actions include
dropping the packet
forwarding the packet out a specified single interface
flooding the packet out a set of interfaces
forwarding the packet to the controller
modifying some field of the packet
processing the packet at another (higher-numbered) flow table
The match fields can, of course, be a single entry for the destination Ethernet address. But they can also include any other packet bit-field, and can include the ingress interface number. For example, the forwarding can be done entirely (or partially) on IP addresses rather than Ethernet addresses, thus allowing the OpenFlow switch to act as a so-called Layer 3 switch (7.6.3 Subnets versus Switching), that is, resembling an IP router. Matching can be by the destination IP address and the destination TCP port, allowing separate forwarding for different TCP-based applications. In 9.6 Routing on Other Attributes we define policy-based routing; arbitrary such routing decisions can be implemented using OpenFlow switches. In SDN settings the policy-based-routing abilities are sometimes used to segregate real-time traffic and large-volume “elephant” flows. In the l2_pairs.py example of the following section, matching is done on both Ethernet source and destination addresses.
Flow tables bear a rough similarity to forwarding tables, with the match fields corresponding to destinations and the actions corresponding to the next_hop. In simple cases, the match field contains a single destination address and the action is to forward out the corresponding switch port.
Normally, OpenFlow switches handle broadcast packets by flooding them; that is, by forwarding them out all interfaces other than the arrival interface. It is possible, however, to set the NO_FLOOD attribute on specific interfaces, which means that packets designated for flooding (broadcast or otherwise) will not be sent out on those interfaces. This is typically how spanning trees for broadcast traffic are implemented (see 18.9.6 l2_multi.py for a Mininet example). An interface marked NO_FLOOD, however, may still be used for unicast traffic. Generally, broadcast flooding does not require a flow-table entry.
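The effect of NO_FLOOD on flooding can be sketched in a few lines of Python. This is a toy model only: the function name `flood_ports` and the port numbers are hypothetical, and no real switch or controller API is involved.

```python
def flood_ports(all_ports, in_port, no_flood):
    """Ports a flooded packet leaves by: every port except the arrival
    port and any port marked NO_FLOOD."""
    return sorted(p for p in all_ports if p != in_port and p not in no_flood)

ports = {1, 2, 3, 4}    # hypothetical switch with four ports
no_flood = {3}          # port 3's link is assumed not to be on the spanning tree

print(flood_ports(ports, 1, no_flood))   # flooding a packet that arrived on port 1
```

Port 3 is skipped when flooding, but nothing here prevents a unicast flow-table entry from forwarding out port 3.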
Flow-table entries are also assigned a priority value. In the event that a packet matches two or more flow-table entries, the entry with the highest priority wins. The table-miss entry is the entry with no match fields (thereby matching every packet) and with priority 0. Often the table-miss entry’s action is to forward the packet to the controller; if no table-miss entry has been installed, a packet that matches no entry is simply dropped.
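As a rough illustration of priority-based matching, here is a minimal Python sketch. It is not the OpenFlow wire format or any controller library's API: the `FlowEntry` class, the dictionary-based packets, the field names and the action strings are all hypothetical, chosen only to show the lookup rule.

```python
from dataclasses import dataclass

@dataclass
class FlowEntry:
    match: dict      # field name -> required value; {} matches every packet
    priority: int
    actions: list    # illustrative action strings, not real OpenFlow actions

    def matches(self, packet: dict) -> bool:
        return all(packet.get(f) == v for f, v in self.match.items())

def lookup(table, packet):
    """Return the actions of the highest-priority matching entry.

    If nothing matches (no table-miss entry installed), the packet is dropped.
    """
    candidates = [e for e in table if e.matches(packet)]
    if not candidates:
        return ["drop"]
    return max(candidates, key=lambda e: e.priority).actions

table = [
    FlowEntry({"eth_dst": "B"}, priority=10, actions=["output:2"]),
    FlowEntry({"eth_dst": "B", "tcp_dst": 80}, priority=20, actions=["output:3"]),
    FlowEntry({}, priority=0, actions=["controller"]),   # table-miss entry
]

print(lookup(table, {"eth_dst": "B", "tcp_dst": 80}))
```

A packet to B on TCP port 80 matches all three entries, and the priority-20 entry wins. A packet to an unknown address matches only the table-miss entry and so goes to the controller.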
Flow-table instructions can also involve modifying (“mangling”) packets. One Ethernet-layer application might be VLAN coloring (2.6 Virtual LAN (VLAN)); at the IPv4 layer, this could be used to decrement the TTL and update the checksum (7.1 The IPv4 Header).
In addition to match fields and instructions, flow-table entries also include counters, flags, and a last_used time. The latter allows flows to be removed if no matching packets have been seen for a while. The counters allow the OpenFlow switch to implement Quality-of-Service constraints – eg bandwidth limiting – on the traffic.
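Removal of idle flows based on the last_used time might look like the following sketch. The dictionary layout, the `expire_idle` function and the timeout value are assumptions made for illustration; a real switch does this internally.

```python
def expire_idle(entries, now, idle_timeout):
    """Keep only entries that matched a packet within the last idle_timeout seconds."""
    return [e for e in entries if now - e["last_used"] <= idle_timeout]

entries = [
    {"match": "dst=A", "last_used": 100.0},   # matched 10 seconds ago
    {"match": "dst=B", "last_used": 40.0},    # idle for 70 seconds
]
remaining = expire_idle(entries, now=110.0, idle_timeout=30.0)
print([e["match"] for e in remaining])
```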
2.8.2 Learning Switches in OpenFlow
Suppose we want to implement a standard Ethernet learning switch (2.4.1 Ethernet Learning Algorithm). The obvious approach is to use flows matching only on the destination address. But we encounter a problem because, by default, packets are reported to the controller only when there is no flow-entry match. Suppose switch S sees a packet from host B to host A and reports it to the controller, which installs a flow entry in S matching destination B (much as a real learning switch would do). If a packet now arrives at S from a third host C to B, it would simply be forwarded, as it would match the B flow entry, and therefore would not be reported to the controller. This means the controller would never learn about address C, and would never install a flow entry for C.
One straightforward alternative approach that avoids this problem is to match on Ethernet ⟨destaddr,srcaddr⟩ pairs. If a packet from A to B arrives at switch S and does not match any existing flow entry at S, it is reported to the controller, which now learns that A is reached via the port by which the packet arrived at S.
In the network above, suppose A sends a packet to B (or broadcasts a packet meant for B), and the flow table of S is empty. S will report the packet to the controller (not shown in the diagram), which will send it back to S to be flooded. However, the controller will also record that A can be reached from S via port 1.
Next, suppose B responds. When this packet from B arrives at S, there are still no flow-table entries, and so S again reports the packet to the controller. But this time the controller knows, because it learned from the first packet, that S can reach A via port 1. The controller also now knows, from the just-arrived packet, that B can be reached via port 2. Knowing both of these forwarding rules, the controller now installs two flow-table entries in S:
dst=B,src=A: forward out port 2
dst=A,src=B: forward out port 1
If a packet from a third host C now arrives at S, addressed to B, it will not be forwarded, even though its destination address matches the first rule above, as its source address does not match A. It will instead be sent to the controller (and ultimately be flooded). When B responds to C, the controller will install rules for dst=C,src=B and dst=B,src=C. If the packet from C were not reported to the controller – perhaps because S had a flow rule for dst=B only – then the controller would never learn about C, and would never be in a position to install a flow rule for reaching C.
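The pairs logic can be simulated with a short Python sketch. This is not the l2_pairs.py code and uses no Pox calls: the `PairsController` class and the `switch_receive` helper are hypothetical names, and the switch's flow table is modeled as a plain dictionary keyed by (dst, src). Host C is assumed to be on port 3.

```python
class PairsController:
    """Toy controller for one switch, matching on (dst, src) pairs."""

    def __init__(self):
        self.port_of = {}      # address -> port, learned from reported packets
        self.flows = {}        # (dst, src) -> output port installed in the switch

    def packet_in(self, src, dst, in_port):
        """Handle a packet the switch reported because no flow entry matched."""
        self.port_of[src] = in_port
        if dst in self.port_of:
            # Both endpoints known: install the rule pair for this conversation.
            self.flows[(dst, src)] = self.port_of[dst]
            self.flows[(src, dst)] = in_port
            return ("forward", self.port_of[dst])
        return ("flood", None)

def switch_receive(ctrl, src, dst, in_port):
    """Switch side: forward on a flow match, otherwise report to the controller."""
    if (dst, src) in ctrl.flows:
        return ("forward", ctrl.flows[(dst, src)])
    return ctrl.packet_in(src, dst, in_port)

ctrl = PairsController()
first = switch_receive(ctrl, "A", "B", 1)    # A -> B: unknown destination, flooded
second = switch_receive(ctrl, "B", "A", 2)   # B -> A: both rules installed
third = switch_receive(ctrl, "C", "B", 3)    # C -> B: no (B, C) rule, so reported
print(first, second, third)
```

One difference from the description above: in this sketch the controller already knows B's port when C's packet arrives, so it installs the C/B rule pair at once rather than flooding and waiting for B's reply. The essential point is unchanged: the packet from C is reported, so the controller learns C's port.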
The pairs approach to OpenFlow learning is pretty much optimal, if a single flow-entry table is available. The problem with this approach is that it does not scale well; with 10,000 addresses on the network, we will need 100,000,000 flowtable-entry pairs to describe all the possible forwarding. This is prohibitive.
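The arithmetic behind this estimate is simple enough to check directly. The sketch below counts every ordered pair of addresses, as the estimate above does, and compares it with the two entries per address used by the two-table approach described next; the function names are illustrative only.

```python
def pair_entries(n):
    """Worst-case entries when matching on (dst, src) pairs: every ordered pair."""
    return n * n

def two_table_entries(n):
    """Entries when one table matches dst and another matches src."""
    return 2 * n

n = 10_000
print(pair_entries(n), two_table_entries(n))
```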
We examine a real implementation (in Python) of the pairs approach in 18.9.2 l2_pairs.py, using the Mininet network emulator and the Pox controller (18 Mininet).
A more compact approach is to use multiple flow tables: one for matching the destination, and one for matching the source. In this version, the controller never has to remember partial forwarding information, as the controller in the version above had to do after receiving the first packet from A. When a packet arrives, it is matched against the first table, and any actions specified by the match are carried out. One of the actions may be a request to repeat the match against the second table, which may lead to a second set of actions. We repeat the A→B, B→A example above, letting T0 denote the first table and T1 denote the second.
Initially, before any packets are seen, the controller installs the following low-priority match rules in S:
T0: match nothing: flood, send to T1
T1: match nothing: send to controller
These are in effect default rules: because there are no packet fields to match, they match all packets. The low priority ensures that better-matching rules are always used when available.
When the packet from A to B arrives, the T0 rule above means the packet is flooded to B, while the T1 rule means the packet is sent to the controller. The controller then installs the following rules in S:
T0: match dst=A: forward via port 1, send to T1
T1: match src=A: do nothing
Now B sends its reply to A. The first rule above matches, so the packet is forwarded by S to A, and is resubmitted to T1. The T1 rule immediately above, however, does not match. The only match is to the original default rule, and the packet is sent to the controller. The controller then installs another two rules in S:
T0: match dst=B: forward via port 2, send to T1
T1: match src=B: do nothing
At this point, as A and B continue to communicate, the T0 rules ensure proper forwarding, while the T1 rules ensure that no more packets from this flow are sent to the controller.
Note that the controller always installs the same address in the T0 table and the T1 table, so the list of addresses present in these two tables will always be identical. The T0 table always matches destinations, though, while the T1 table matches source addresses.
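The two-table behavior can be traced with a small Python simulation. This is a toy model, not the l2_nx.py code: the `TwoTableSwitch` class is hypothetical, T0 is modeled as a dictionary and T1 as a set, and the default rules are expressed as the fall-through cases.

```python
class TwoTableSwitch:
    """Toy switch with T0 matching on destination and T1 matching on source."""

    def __init__(self):
        self.t0 = {}        # dst address -> output port
        self.t1 = set()     # source addresses the controller already knows

    def receive(self, src, dst, in_port):
        """Return (forwarding decision, reported_to_controller)."""
        # T0: forward if the destination is known, else the default rule floods.
        decision = ("forward", self.t0[dst]) if dst in self.t0 else ("flood", None)
        # T1: a known source matches "do nothing"; otherwise the default rule
        # sends the packet to the controller.
        reported = src not in self.t1
        if reported:
            self.controller_packet_in(src, in_port)
        return decision, reported

    def controller_packet_in(self, src, in_port):
        """Controller side: install the same address in both tables."""
        self.t0[src] = in_port
        self.t1.add(src)

s = TwoTableSwitch()
r1 = s.receive("A", "B", 1)   # flooded and reported; rules for A installed
r2 = s.receive("B", "A", 2)   # forwarded via port 1 and reported; rules for B installed
r3 = s.receive("A", "B", 1)   # forwarded via port 2, not reported
print(r1, r2, r3)
```

Note that the controller here keeps no state of its own; everything it has learned is in the two tables, which always hold the same set of addresses.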
The Mininet/Pox version of this appears in 18.9.3 l2_nx.py.
Another application for multiple flow tables involves switches that make quality-of-service prioritization decisions. A packet’s destination would be found using the first flow table, while its priority would be found by matching to the second table. The packet would then be forwarded out the port determined by the first table, using the priority determined by the second table. Like building a learning switch, this can be done with a single table by listing all combinations of ⟨destaddr,priority⟩, but sometimes that’s too many entries.
We mentioned above that SDN controllers can be used as firewalls. At the Ethernet-address level this is tedious to configure, but by taking advantage of OpenFlow’s understanding of IP addresses, it is straightforward, for example, to block traffic between different IP subnets, much like a router might do. OpenFlow also allows blocking all such traffic except that between specific pairs of hosts using specific protocols. For example, we might want customer A’s web servers to be able to communicate with A’s database servers using the standard TCP port, while still blocking all other web-to-database traffic.
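A firewall policy of this kind might be sketched as follows, using Python's standard ipaddress module. The subnets, the host addresses and the database port are all made-up values, and the `allowed` function is a stand-in for the flow rules a controller would install. Only the web-to-database direction is checked here; a real policy would presumably need a mirror rule for replies.

```python
import ipaddress

WEB = ipaddress.ip_network("10.0.1.0/24")   # hypothetical web-server subnet
DB = ipaddress.ip_network("10.0.2.0/24")    # hypothetical database subnet
DB_PORT = 5432                              # assumed database TCP port

def allowed(src_ip, dst_ip, proto, dst_port, permitted_pairs):
    """Block all web-to-database traffic except permitted host pairs
    using TCP to the database port. Other traffic is allowed."""
    src, dst = ipaddress.ip_address(src_ip), ipaddress.ip_address(dst_ip)
    if src in WEB and dst in DB:
        return ((src_ip, dst_ip) in permitted_pairs
                and proto == "tcp" and dst_port == DB_PORT)
    return True

# Customer A's web server and database server (illustrative addresses).
permitted = {("10.0.1.10", "10.0.2.20")}
print(allowed("10.0.1.10", "10.0.2.20", "tcp", 5432, permitted))
```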
2.8.3 Other OpenFlow examples
After emulating a learning switch, perhaps the next most straightforward OpenFlow application, conceptually, is the support of Ethernet topologies that contain loops. This can be done quite generically; the controller does not need any special awareness of the network topology.
On startup, switches are instructed by the controller to report their neighboring switches. With this information the controller is then able to form a complete map of the switch topology. (One way to implement this is for the controller to order each switch to send a special marked packet out each of its ports, and then for the receiving switches to report these packets back to the controller.) Once the controller knows the switch topology, it can calculate a spanning tree, and then instruct each switch that flooded packets should be sent out only via ports that correspond to links on the spanning tree.
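Once the links have been reported, computing a spanning tree is a standard graph traversal. The sketch below uses breadth-first search, which is one reasonable choice rather than necessarily what any particular controller does; the four-switch topology and the function name are hypothetical.

```python
from collections import deque

def spanning_tree(links, root):
    """Breadth-first spanning tree over undirected switch-to-switch links.

    links: iterable of (switch, switch) pairs reported during discovery.
    Returns the set of links (as frozensets) on which flooding is allowed.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    tree, seen, queue = set(), {root}, deque([root])
    while queue:
        u = queue.popleft()
        for v in sorted(neighbors[u]):
            if v not in seen:
                seen.add(v)
                tree.add(frozenset((u, v)))
                queue.append(v)
    return tree

# Hypothetical four-switch ring with one diagonal link.
links = [("S1", "S2"), ("S2", "S3"), ("S3", "S4"), ("S4", "S1"), ("S1", "S3")]
tree = spanning_tree(links, "S1")
no_flood = {frozenset(l) for l in links} - tree   # links to exclude from flooding
print(sorted(sorted(l) for l in no_flood))
```

The links left in `no_flood` are the ones whose ports the controller would exclude from flooding; they remain available for unicast forwarding.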
Once the location of a destination host has been learned by the controller (that is, the controller learns which switch the host is directly connected to), the controller calculates the shortest (or lowest-cost, if the application supports differential link costs) path from each switch to that host. The controller then instructs each switch how to forward to that host. Forwarding will likely use links that are not part of the spanning tree, unlike traditional Ethernet switches.
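The per-switch forwarding decision can be sketched as a breadth-first search outward from the switch to which the destination host is attached. This assumes all links have equal cost (a lowest-cost version would use Dijkstra's algorithm instead); the topology and the name `next_hops` are illustrative.

```python
from collections import deque

def next_hops(neighbors, dest_switch):
    """For every switch, the neighbor to forward to on a fewest-hops path
    to dest_switch (breadth-first search outward from the destination)."""
    hop = {dest_switch: None}     # None: the host is attached to this switch
    queue = deque([dest_switch])
    while queue:
        u = queue.popleft()
        for v in sorted(neighbors[u]):
            if v not in hop:
                hop[v] = u        # v reaches the destination by forwarding to u
                queue.append(v)
    return hop

# Hypothetical topology: a four-switch ring plus an S1-S3 diagonal.
neighbors = {
    "S1": {"S2", "S3", "S4"},
    "S2": {"S1", "S3"},
    "S3": {"S1", "S2", "S4"},
    "S4": {"S1", "S3"},
}
hops = next_hops(neighbors, "S3")
print(hops)
```

Every switch here forwards directly to S3 over its own link, whether or not that link belongs to the spanning tree used for flooding.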
We outline an implementation of this strategy in 18.9.6 l2_multi.py.
2.8.3.1 Interconnection Fabric
The previous Ethernet-loop example is quite general; it works for any switch topology. Many other OpenFlow applications require that the controller have some prior knowledge of the switch topology. As an example of this, we present what we will refer to as an interconnection fabric. This is the S1-S5 network illustrated below, in which every upper (S1-S2) switch connects directly to every lower (S3-S5) switch. The bottom row in the diagram represents server racks, as interconnection fabrics are very common in datacenters. (For a real-world datacenter example, see here, although real-world interconnection fabrics are often joined using routing rather than switching.) The red and blue numbers identify the switch ports.
The first two rows here contain many loops, eg S1–S3–S2–S4–S1 (omitting the S3-S5 row, and having S1 and S2 connect directly to the server racks, does not eliminate loops). In the previous example we described how we could handle loops in a switched network by computing a spanning tree and then allowing packet flooding only along this spanning tree. This is certainly possible here, but if we allow the spanning-tree algorithm to prune the loops, we will lose most of the parallelism between the S1-S2 and S3-S5 layers; see exercise 8.5. This generic spanning-tree approach is not what we want.
If we use IP routing at S1 through S5, as in 9 Routing-Update Algorithms, we then need the three clusters of server racks below S3, S4 and S5 to be on three separate IP subnets (7.6 IPv4 Subnets). While this is always technically possible, it can be awkward, if the separate subnets function for most other purposes as a single unit.
One OpenFlow approach is to assume that the three clusters of server racks below S3-S4-S5 form a single IP subnet. We can then configure S1-S5 with OpenFlow so that traffic from the subnet is always forwarded upwards while traffic to the subnet is always forwarded downwards.
But an even simpler solution – one not requiring any knowledge of the server subnet – is to use OpenFlow to configure the switches S1-S5 so that unknown-destination traffic entering on a red (upper) port is flooded out only on the blue (lower) ports, and vice-versa. This eliminates loops by ensuring that all traffic goes through the interconnection fabric either upwards-only or downwards-only. After the destination server below S3-S5 has replied, of course, S1 or S2 will learn to which of S3-S5 it should forward future packets to that server.
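The red/blue flooding rule reduces to a one-line filter per switch. In the sketch below the port numbers are invented (the actual numbering in the diagram is not assumed), and `fabric_flood_ports` is a hypothetical helper standing in for the flow rules the controller would install.

```python
def fabric_flood_ports(port_color, in_port):
    """Flood an unknown-destination packet only out ports of the other color."""
    out_color = "blue" if port_color[in_port] == "red" else "red"
    return sorted(p for p, c in port_color.items() if c == out_color)

# Hypothetical port numbering for an upper switch: one red (upper) port
# and three blue (lower) ports, one per lower switch.
s1_ports = {1: "red", 2: "blue", 3: "blue", 4: "blue"}

print(fabric_flood_ports(s1_ports, 1))   # arriving from above: flood downwards
print(fabric_flood_ports(s1_ports, 3))   # arriving from below: flood upwards only
```

Because a packet arriving on a blue port is never flooded out another blue port, it cannot bounce between the two layers of switches.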
This example works the way it does because the topology has a particular property: once we eliminate paths that both enter and leave S1 or S2 via blue ports, or that enter and leave S3, S4 and S5 via red ports, there is a unique path between any input port (red upper port) and any output port (towards the server racks). From there, it is easy to avoid loops. Given a more general topology, on the other hand, in which unique paths cannot be guaranteed by such a rule, the OpenFlow controller has to choose a path. This in turn generally entails path discovery, shortest-path selection and loop avoidance, as in the previous example.
2.8.3.2 Load Balancer
The previous example was quite local, in that all the OpenFlow actions are contained within the interconnection fabric. As a larger-scale (and possibly more typical) special-purpose OpenFlow example, we next describe how to achieve server load-balancing via SDN; that is, users are connected transparently to one of several identical servers. Each server handles only its assigned fraction of the total load. For this example, the controller must not only have knowledge of the topology, but also of the implementation goal.
To build the load-balancer, imagine that the SDN controller is in charge of, in the diagram of the previous section, all switches in the interconnection fabric and also all switches among the server racks below. At this point, we configure all the frontline servers within the server racks identically, including giving them all identical IPv4 addresses. When an incoming TCP connection request arrives, the controller picks a server (perhaps using round robin, perhaps selecting the server with the lowest load) and sets up OpenFlow forwarding rules so all traffic on that TCP connection arriving from the outside world is sent to the designated server, and vice-versa. Different servers with the same IPv4 address are not allowed to talk directly with one another at all, thereby averting chaos. The lifetime of the OpenFlow forwarding rule can be adjusted as desired, eg to match the typical lifetime of a user session.
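The controller's server-selection step might be sketched as follows, using round robin. This is a toy model and not the loadbalance31.py code: the `LoadBalancerController` class, the server port numbers and the client addresses are hypothetical, connections are identified only by client address and port, and rule lifetimes are omitted.

```python
import itertools

class LoadBalancerController:
    """Toy controller assigning each new TCP connection to a server round-robin."""

    def __init__(self, server_ports):
        self._next = itertools.cycle(server_ports)   # switch ports leading to servers
        self.rules = {}    # (client_ip, client_port) -> server port

    def packet_in(self, client_ip, client_port):
        """Called for the first packet of a connection; later packets match the rule."""
        key = (client_ip, client_port)
        if key not in self.rules:
            self.rules[key] = next(self._next)
        return self.rules[key]

lb = LoadBalancerController(server_ports=[5, 6, 7])
a = lb.packet_in("203.0.113.9", 40001)         # first connection
b = lb.packet_in("198.51.100.4", 51515)        # second connection, next server
a_again = lb.packet_in("203.0.113.9", 40001)   # same connection, same server
print(a, b, a_again)
```

Selecting the least-loaded server instead would change only how the next server is chosen; the table of connection-to-server rules would be the same.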
When the first TCP packet of a connection arrives at the selected server, the server may or may not need to use ARP to figure out the appropriate internal LAN address to which to send its reply. Sometimes ARP needs to be massaged a bit to work acceptably in an environment in which some hosts have the same IPv4 address.
At no point is the fact that multiple servers have been assigned the same IPv4 address directly exposed: not to other servers, not to internal routers, and not to end users. (Servers never initiate outbound connections to users, so there is no risk of two servers contacting the same user.)
For an example of this sort of load balancing implemented in Mininet and Pox, see 18.9.5 loadbalance31.py.
The identical frontline servers might need to access a common internal database cluster. This can be implemented by assigning each server a second IPv4 address for this purpose, not shared with other servers, or by using the common public-facing IPv4 address and a little more OpenFlow cleverness in setting up appropriate forwarding rules. If the latter approach is taken, it is now in principle possible that two servers would connect to the database using the same TCP port, by coincidence. This would expose the identical IPv4 addresses, and the SDN controllers would have to take care to ensure that this did not happen. One approach, if supported, would be to have the OpenFlow switches “mangle” the server IPv4 addresses or ports, as is done with NAT (7.7 Network Address Translation).
There are also several “traditional” strategies for implementing load balancing. For example, one can give each server its own IPv4 address but then use round-robin DNS (7.8 DNS) to assign different users to different servers. Alternatively, one can place a device called a load balancer at the front of the network that assigns incoming connection requests to an internal server and then takes care of setting up the appropriate forwarding. Forwarding can be at the IP layer (that is, via routing), or at the TCP layer, or at the application layer. The load balancer can be thought of as NAT-like (7.7 Network Address Translation) in that it maintains a table of associations between external-user connections and internal servers; once a user connects, the association with the chosen server remains in place for a period of time. One advantage of the SDN approach described here is that the individual front-line servers need no special configuration; all of the load-sharing awareness is contained within the SDN network. Furthermore, the SDN switches do virtually no additional work beyond ordinary forwarding; they need only involve the controller when the first new TCP packet of each connection arrives.