• 21 Network Management and SNMP

    Network management, broadly construed, consists of all the administrative actions taken to keep a network running efficiently. This may include a number of non-technical considerations, eg staffing the help desk and negotiating contracts with vendors, but we will restrict attention exclusively to the technical aspects of network management.

    The ISO and the International Telecommunications Union have defined a formal model for telecommunications and network management. The original model defined five areas of concern, and was sometimes known as FCAPS after the first letter of each area:

    • fault

    • configuration

    • accounting

    • performance

    • security

    Most non-ISP organizations have little interest in network accounting (the A in FCAPS is often replaced with “administration” for that reason, but that is a rather vague category). Network security is arguably its own subject entirely. As for the others, we can identify some important subcategories:

    fault:

    • device management: monitoring of all switches, routers, servers and other network hardware to make sure they are running properly.

    • server management: monitoring of the network’s application layer, that is, all network-based software services; these include login authentication, email, web servers, business applications and file servers.

    • link management: monitoring of long-haul links to ensure they are working.

    configuration:

    • network architecture: the overall design, including topology, switching vs routing and subnet layout.

    • configuration management: arranging for the consistent configuration of large numbers of network devices.

    • change management: how does a site roll out new changes, from new IP addresses to software updates and patches?

    performance:

    • traffic management: using the techniques of 19 Queuing and Scheduling to allocate bandwidth shares (and perhaps bucket sizes) among varying internal or external clients or customers.

    • service-level management: making sure that agreed-upon service targets – eg bandwidth – are met (depending on the focus, this could also be placed in the fault category).

    While all these aspects are important, technical network management narrowly construed often devolves to an emphasis on fault management and its companion, reliability management: making sure nothing goes wrong, and, when it does, fixing it promptly. It is through fault management that some network providers achieve the elusive availability goal of 99.999% uptime.

    By far the most common device-monitoring protocol, and the primary focus for this chapter, is the Simple Network Management Protocol or SNMP (21.2 SNMP Basics). This protocol allows a device to report information about its current operational state; for example, a switch or router may report the configuration of each interface and the total numbers of bytes and packets sent via each interface.

    Implicit in any device-monitoring strategy is initial device discovery: the process by which the monitor learns of new devices. The ping protocol (7.11 Internet Control Message Protocol) is common here, though there are other options as well; for example, it is possible to probe the UDP port on a node used for SNMP – usually 161. As was the case with router configuration (9 Routing-Update Algorithms), manual entry is simply not a realistic alternative.

    It is a practical necessity, for networks of even modest size, to automate the job of checking whether everything is working properly. Waiting for complaints is not an option. Such a monitoring system is known as a network management system or NMS; there are a wide range of both proprietary and open-source NMS’s available. At its most basic, an NMS consists of a library of scripts to discover new network devices and then to poll each device (possibly but not necessarily using SNMP) at regular intervals. Generally the data received is recorded and analyzed, and alarms are sounded if a failure is detected.

    When SNMP was first established, there was a common belief that it would soon be replaced by the OSI’s Common Management Interface Protocol. CMIP is defined in the International Telecommunication Union’s X.711 protocol and companion protocols. CMIP uses the same ASN.1 syntax as SNMP, but has a richer operations set. It remains the network management protocol of choice for OSI networks, and was once upon a time believed to be destined for adoption in the TCP/IP world as well.

    But it was not to be. TCP/IP-based network-equipment vendors never supported CMIP widely, and so any network manager had to support SNMP as well. With SNMP support essentially universal, there was never a need for a site to bother with CMIP.

    CMIP, unlike SNMP, is supported over TCP, using the CMIP Over Tcp, or CMOT, protocol. One advantage of using TCP is the elimination of uncertainty as to whether a request or a reply was received.

  • 21.1 Network Architecture

    Before turning to SNMP in depth, we offer a few references to other parts of this book relating to network architecture. At the LAN and Internetwork layers local to a site, perhaps the main issues are redundancy, bandwidth and cost. Cabling between buildings, in particular, needs to provide redundancy. See 2.5 Spanning Tree Algorithm and Redundancy and 2.6 Virtual LAN (VLAN) for some considerations at the Ethernet level, and 7.6.3 Subnets versus Switching.

    Next, a site must determine what sort of connection to the Internet it will have. ISP contracts vary greatly in terms of bandwidth, burst bandwidth, and agreed-upon responses in the event of an outage. Some aspects of service-level specification appear in 19.9 Token Bucket Filters and 20.7.2 Assured Forwarding.

    Organizations with geographically dispersed internal networks – ISPs and larger corporations – must decide how their internal sites should be connected. Should they communicate over the public Internet? Should a VPN be used (3.1 Virtual Private Networks)? Or should private lines (such as SONET, 4.2.2 SONET, or carrier Ethernet, 3.2 Carrier Ethernet) be leased between sites? If private lines are used, link monitoring becomes essential.

    One increasingly important architectural decision at the application layer is the extent to which network software services are outsourced to the cloud, and run on remote servers managed by third parties.