7.9: DNS

Last updated
Save as PDF

Page ID: 11147

$ \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } $ $ \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} $$\newcommand{\id}{\mathrm{id}}$ $ \newcommand{\Span}{\mathrm{span}}$ $ \newcommand{\kernel}{\mathrm{null}\,}$ $ \newcommand{\range}{\mathrm{range}\,}$ $ \newcommand{\RealPart}{\mathrm{Re}}$ $ \newcommand{\ImaginaryPart}{\mathrm{Im}}$ $ \newcommand{\Argument}{\mathrm{Arg}}$ $ \newcommand{\norm}[1]{\| #1 \|}$ $ \newcommand{\inner}[2]{\langle #1, #2 \rangle}$ $ \newcommand{\Span}{\mathrm{span}}$ $\newcommand{\id}{\mathrm{id}}$ $ \newcommand{\Span}{\mathrm{span}}$ $ \newcommand{\kernel}{\mathrm{null}\,}$ $ \newcommand{\range}{\mathrm{range}\,}$ $ \newcommand{\RealPart}{\mathrm{Re}}$ $ \newcommand{\ImaginaryPart}{\mathrm{Im}}$ $ \newcommand{\Argument}{\mathrm{Arg}}$ $ \newcommand{\norm}[1]{\| #1 \|}$ $ \newcommand{\inner}[2]{\langle #1, #2 \rangle}$ $ \newcommand{\Span}{\mathrm{span}}$$\newcommand{\AA}{\unicode[.8,0]{x212B}}$

The Domain Name System, DNS, is an essential companion protocol to IPv4 (and IPv6); an overview can be found in RFC 1034 [https://tools.ietf.org/html/rfc1034.html]. It is DNS that permits users the luxury of not needing to remember numeric IP addresses. Instead of 162.216.18.28, a user can simply enter intronetworks.cs.luc.edu [http://intronetworks.cs.luc.edu], and DNS will take care of looking up the name and retrieving the corresponding address. DNS also makes it easy to move services from one server to another with a different IP address; as users will locate the service by DNS name and not by IP address, they do not need to be notified.

While DNS supports a wide variety of queries, for the moment we will focus on queries for IPv4 addresses, or so-called A records. The AAAA record type is used for IPv6 addresses, and, internally, the NS record type is used to identify the “name servers” that answer DNS queries.

While a workstation can use TCP/IP without DNS, users would have an almost impossible time finding anything, and so the core startup configuration of an Internet-connected workstation almost always includes the IP address of its DNS server (see 7.10 Dynamic Host Configuration Protocol (DHCP) below for how startup configurations are often assigned).

Most DNS traffic today is over UDP, although a TCP option exists. Due to the much larger response sizes, TCP is often necessary for DNSSEC (22.12 DNSSEC).

DNS is distributed, meaning that each domain is responsible for maintaining its own DNS servers to translate names to addresses. (DNS, in fact, is a classic example of a highly distributed database where each node maintains a relatively small amount of data.) It is hierarchical as well; for the DNS name intronetworks.cs.luc.edu the levels of the hierarchy are

edu: the top-level domain (TLD) for educational institutions in the US
luc: Loyola University Chicago
cs: The Loyola Computer Science Department
intronetworks: a hostname associated to a specific IP address

The hierarchy of DNS names (that is, the set of all names and suffixes of names) forms a tree, but it is not only leaf nodes that represent individual hosts. In the example above, domain names luc.edu [http://luc.edu] and cs.luc.edu [http://cs.luc.edu] happen to be valid hostnames as well.

The DNS hierarchy is in a great many cases not very deep, particularly for DNS names assigned to commercial websites. Such domain names are often simply the company name (or a variant of it) followed by the top-level domain (often .com). Still, internally most organizations have many individually named behind-the-scenes servers with three-level (or more) domain names; sometimes some of these can be identified by viewing the source of the web page and searching it for domain names.

Top-level domains are assigned by ICANN [www.icann.org/]. The original top-level domains were seven three-letter domains – .com, .net, .org, .int, .edu, .mil and .gov – and the two-letter country-code domains (eg .us, .ca, .mx). Now there are hundreds of non-country top-level domains, such as .aero, .biz, .info, and, apparently, .wtf [en.Wikipedia.org/wiki/.wtf]. Domain names (and subdomain names) can also contain unicode characters, so as to support national alphabets. Some top-level domains are generic, meaning anyone can apply for a subdomain although there may be qualifying criteria. Other top-level domains are sponsored, meaning the sponsoring organization determines who can be assigned a subdomain, and so the qualifying criteria can be a little more arbitrary.

ICANN still must approve all new top-level domains. Applications are accepted only during specific intervals; the application fee for the 2012 interval was US$185,000. The actual leasing of domain names to companies and individuals is done by organizations known as domain registrars who work under contract with ICANN.

The full tree of all DNS names and prefixes is divided administratively into zones: a zone is an independently managed subtree, minus any sub-subtrees that have been placed – by delegation – into their own zone. Each zone has its own root DNS name that is a suffix of every DNS name in the zone. For example, the luc.edu zone contains most of Loyola’s DNS names, but cs.luc.edu has been spun off into its own zone. A zone cannot be the disjoint union of two subtrees; that is, cs.luc.edu and math.luc.edu must be two distinct zones, unless both remain part of their parent zone.

A zone can define DNS names more than one level deep. For example, the luc.edu zone can define records for the luc.edu name itself, for names with one additional level such as www.luc.edu, and for names with two additional levels such as www.cs.luc.edu. That said, it is common for each zone to handle only one additional level, and to create subzones for deeper levels.

Each zone has its own authoritative nameservers for the zone, which are charged with maintaining the records – known as resource records, or RRs – for that zone. Each zone must have at least two nameservers, for redundancy. IPv4 addresses are stored as so-called A records, for Address. Information about how to find sub-zones is stored as NS records, for Name Server. Additional resource-record types are discussed at 7.8.2 Other DNS Records. An authoritative nameserver need not be part of the organization that manages the zone, and a single server can be the authoritative nameserver for multiple unrelated zones. For example, many domain registrars maintain single nameservers that handle DNS queries for all their domain customers who do not wish to set up their own nameservers.

The root nameservers handle the zone that is the root of the DNS tree; that is, that is represented by the DNS name that is the empty string. As of 2019, there are thirteen of them. The root nameservers contain only NS records, identifying the nameservers for all the immediate subzones. Each top-level domain is its own such subzone. The IP addresses of the root nameservers are widely distributed. Their DNS names (which are only of use if some DNS lookup mechanism is already present) are a.root.servers.net through m.root-servers.net. These names today correspond not to individual machines but to clusters of up to hundreds of servers.

We can now put together a first draft of a DNS lookup algorithm. To find the IP address of intronetworks.cs.luc.edu [http://intronetworks.cs.luc.edu], a host first contacts a root nameserver (at a known address) to find the nameserver for the edu zone; this involves the retrieval of an NS record. The edu nameserver is then queried to find the nameserver for the luc.edu zone, which in turn supplies the NS record giving the address of the cs.luc.edu zone. This last has an A record for the actual host. (This example is carried out in detail below.)

This strategy has a defect in that it would send much too much traffic to the root nameservers. Instead, there exists a great number of local and semi-local “DNS servers” that we will call resolvers (though, confusingly, these are sometimes also known as “nameservers” or, more precisely, non-authoritative nameservers). A resolver is a host charged with looking up DNS names on behalf of a user or set of users, and returning corresponding addresses; for this reason they are sometimes called recursive nameservers (we return to recursive DNS lookups below).

Most ISPs and companies provide a resolver to handle the DNS needs of their customers and employees; we will refer to these as site resolvers. The IP addresses of these site resolvers is generally supplied via DHCP options (7.10 Dynamic Host Configuration Protocol (DHCP)); such resolvers are thus the default choice for DNS services.

Sometimes, however, users elect to use a DNS resolver not provided by their ISP or company; there are a number of public DNS servers (that is, resolvers) available. Such resolvers generally serve much larger areas. Common choices include OpenDNS [en.Wikipedia.org/wiki/OpenDNS], Google DNS [en.Wikipedia.org/wiki/Google_Public_DNS] (at 8.8.8.8), Cloudflare [blog.cloudflare.com/dns-resolver-1-1-1-1/] (at 1.1.1.1) and the Gnu Name System mentioned in the sidebar above, though there are many others. Searching for “public DNS server” turns up lists of them.

One advantage of using a public DNS server is that your local ISP can no longer track your DNS queries. On the other hand, the public server now can, so this becomes a matter of which you trust more (or less).

Some public DNS servers provide additional services, such as automatically filtering out domain names associated with security risks, or content inappropriate for young users. Sometimes there is a fee for this service.

A resolver uses the level-by-level algorithm above as a fallback, but also keeps a large cache of all the domain names (and other record types) that have been requested. A lifetime for each cache entry is provided by that entry’s authoritative nameserver; these lifetimes are typically on the order of several days. Every DNS record has a TTL (time-to-live) value representing its maximum cache lifetime.

If I send a query to Loyola’s site resolver for google.com, it is almost certainly in the cache. If I send a query for the misspelling googel.com [http://googel.com], this may not be in the cache, but the .com top-level nameserver almost certainly is in the cache. From that nameserver my local resolver finds the nameserver for the googel.com zone, and from that finds the IP address of the googel.com host.

Applications almost always invoke DNS through library calls, such as Java’s InetAddress.getByName(). The library forwards the query to the system-designated resolver (though browsers sometimes offer other DNS options; see 22.12.4 DNS over HTTPS). We will return to DNS library calls in 11.1.3.3 The Client and 12.6.1 The TCP Client.

On unix-based systems, traditionally the IPv4 addresses of the local DNS resolvers were kept in a file /etc/resolv.conf. Typically this file was updated with the addresses of the current resolvers by DHCP (7.10 Dynamic Host Configuration Protocol (DHCP)), at the time the system received its IPv4 address. It is possible, though not common, to create special entries in /etc/resolv.conf so that queries about different domains are sent to different resolvers, or so that single-level hostnames have a domain name appended to them before lookup. On Windows, similar functionality can be achieved through settings on the DNS tab within the Network Connections applet.

Recent systems often run a small “stub” resolver locally (eg Linux’s dnsmasq [en.Wikipedia.org/wiki/Dnsmasq]); such resolvers are sometimes also called DNS forwarders. The entry in /etc/resolv.conf is then an IPv4 address of localhost (sometimes 127.0.1.1 rather than 127.0.0.1). Such a stub resolver would, of course, still need access to the addresses of site or public resolvers; sometimes these addresses are provided by static configuration and sometimes by DHCP (7.10 Dynamic Host Configuration Protocol (DHCP)).

If a system running a stub resolver then runs internal virtual machines, it is usually possible to configure everything so that the virtual machines can be given an IP address of the host system as their DNS resolver. For example, often virtual machines are assigned IPv4 addresses on a private subnet and connect to the outside world using NAT (7.7 Network Address Translation). In such a setting, the virtual machines are given the IPv4 address of the host system interface that connects to the private subnet. It is then necessary to ensure that, on the host system, the local resolver accepts queries sent not only to the designated loopback address but also to the host system’s private-subnet address. (Generally, local resolvers do not accept requests arriving from externally visible addresses.)

When someone submits a query for a nonexistent DNS name, the resolver is supposed to return an error message, technically known as NXDOMAIN (Non eXistent Domain). Some resolvers, however, have been configured to return the IP address of a designated web server; this is particularly common for ISP-provided site resolvers. Sometimes the associated web page is meant to be helpful, and sometimes it presents an offer to buy the domain name from a registrar. Either way, additional advertising may be displayed. Of course, this is completely useless to users who are trying to contact the domain name in question via a protocol (ssh, smtp) other than http.

At the DNS protocol layer, a DNS lookup query can be either recursive or non-recursive. If A sends to B a recursive query to resolve a given DNS name, then B takes over the job until it is finally able to return an answer to A. If The query is non-recursive, on the other hand, then if B is not an authoritative nameserver for the DNS name in question it returns either a failure notice or an NS record for the sub-zone that is the next step on the path. Almost all DNS requests from hosts to their site or public resolvers are recursive.

A basic DNS response consists of an Answer section, an AUTHORITY section and, optionally, an ADDITIONAL section. Generally a response to a lookup of a hostname contains an Answer section consisting of a single A record, representing a single IPv4 address. If a site has multiple servers that are entirely equivalent, however, it is possible to give them all the same hostname by configuring the authoritative nameserver to return, for the hostname in question, multiple A records listing, in turn, each of the server IPv4 addresses. This is sometimes known as round-robin DNS. It is a simple form of load balancing; see also 18.9.5 loadbalance31.py. Consecutive queries to the nameserver should return the list of A records in different orders; ideally the same should also happen with consecutive queries to a local resolver that has the hostname in its cache. It is also common for a single server, with a single IPv4 address, to be identified by multiple DNS names; see the next section.

The response AUTHORITY section contains the DNS names of the authoritative nameservers responsible for the original DNS name in question. The ADDITIONAL section contains information the sender thinks is related; for example, this section often contains A records for the authoritative nameservers.

The Tor Project [en.Wikipedia.org/wiki/Tor_(a...mity_network)] uses DNS-like names that end in “.onion”. While these are not true DNS names in that they are not managed by the DNS hierarchy, they do work as such for Tor users; see RFC 7686 [https://tools.ietf.org/html/rfc7686.html]. These names follow an unusual pattern: the next level of name is an 80-bit hash of the site’s RSA public key (22.9.1 RSA), converted to sixteen ASCII bytes. For example, 3g2upl4pq6kufc4m.onion is apparently the Tor address for the search engine duckduckgo.com [https://duckduckgo.com/]. Unlike DuckDuckGo, many sites try different RSA keys until they find one where at least some initial prefix of the hash looks more or less meaningful; for example, nytimes2tsqtnxek.onion. Facebook got very lucky [lists.torproject.org/piperma...r/035413.html] in finding an RSA key whose corresponding Tor address is facebookcorewwwi.onion (though it is sometimes said that fortune is infatuated with the wealthy). This naming strategy is a form of cryptographically generated addresses; for another example see 8.6.4 Security and Neighbor Discovery. The advantage of this naming strategy is that you don’t need a certificate authority (22.10.2.1 Certificate Authorities) to verify a site’s RSA key; the site name does it for you.

7.8.1 nslookup (and dig)

Let us trace a non-recursive lookup of intronetworks.cs.luc.edu, using the nslookup [en.Wikipedia.org/wiki/Nslookup] tool. The nslookup tool is time-honored, but also not completely up-to-date, so we also include examples using the dig [en.Wikipedia.org/wiki/Dig_(command)] utility (supposedly an acronym for “domain Internet groper”). Lines we type in nslookup’s interactive mode begin below with the prompt “>”; the shell prompt is “#”. All dig commands are typed directly at the shell prompt.

The first step is to look up the IP address of the root nameserver a.name-servers.net. We can do this with a regular call to nslookup or dig, we can look this up in our nameserver’s configuration files, or we can search for it on the Internet. The address is 198.41.0.4.

We now send our nonrecursive query to this address. The presence of the single hyphen in the nslookup command line below means that we want to use 198.41.0.4 as the nameserver rather than as the thing to be looked up; dig has places on the command line for both the nameserver (following the @) and the DNS name. For both commands, we use the norecurse option to send a nonrecursive query.

# nslookup -norecurse - 198.41.0.4
> intronetworks.cs.luc.edu
*** Can't find intronetworks.cs.luc.edu: No answer

# dig @198.41.0.4 intronetworks.cs.luc.edu +norecurse

These fail because by default nslookup and dig ask for an A record. What we want is an NS record: the name of the next zone down to ask. (We can tell the dig query failed to find an A record because there are zero records in the Answer section)

> set query=ns
> intronetworks.cs.luc.edu
edu   nameserver = a.edu-servers.net
...
a.edu-servers.net      internet address = 192.5.6.30

# dig @198.41.0.4 intronetworks.cs.luc.edu NS +norecurse
;; AUTHORITY SECTION:
edu.                   172800  IN      NS      b.edu-servers.net.
;; ADDITIONAL SECTION:
b.edu-servers.net.     172800  IN      A       192.33.14.30

The full responses in each case are a list of all nameservers for the .edu zone; we list only the first response above. Note that the full DNS name intronetworks.cs.luc.edu in the query here is not an exact match for the DNS name .edu in the resource record returned; the latter is a suffix of the former. Some newer resolvers send just the .edu part, to limit the user’s privacy exposure.

We send the next NS query to a.edu-servers.net (which does appear in the full dig answer)

# nslookup -query=ns -norecurse - 192.5.6.30
> intronetworks.cs.luc.edu
...
Authoritative answers can be found from:
luc.edu nameserver = bcdnswt1.it.luc.edu.
bcdnswt1.it.luc.edu    internet address = 147.126.64.64

# dig @192.5.6.30 intronetworks.cs.luc.edu NS +norecurse
;; AUTHORITY SECTION:
luc.edu.               172800  IN      NS      bcdnswt1.it.luc.edu.
;; ADDITIONAL SECTION:
bcdnswt1.it.luc.edu.   172800  IN      A       147.126.64.64

(Again, we show only one of several luc.edu nameservers returned). We continue.

# nslookup -query=ns - -norecurse 147.126.64.64
> intronetworks.cs.luc.edu
...
Authoritative answers can be found from:
cs.luc.edu     nameserver = dns1.cs.luc.edu.
ns1.cs.luc.edu internet address = 147.126.2.44

# dig @147.126.64.64 intronetworks.cs.luc.edu NS +norecurse
;; AUTHORITY SECTION:
cs.luc.edu.            86400   IN      NS      ns1.cs.luc.edu.
;; ADDITIONAL SECTION:
ns1.cs.luc.edu.                86400   IN      A       147.126.2.44

We now ask this last nameserver, for the cs.luc.edu zone, for the A record:

# nslookup -query=A -norecurse - 147.126.2.44
> intronetworks.cs.luc.edu
...
intronetworks.cs.luc.edu       canonical name = linode1.cs.luc.edu.
Name:  linode1.cs.luc.edu
Address: 162.216.18.28

# dig @147.126.2.44 intronetworks.cs.luc.edu A +norecurse
;; Answer SECTION:
intronetworks.cs.luc.edu. 300  IN      A       162.216.18.28

This is the first time we get an Answer section (versus the AUTHORITY section)

Here we get a canonical name, or CNAME, record. The server that hosts intronetworks.cs.luc.edu also hosts several other websites, with different names; for example, introcs.cs.luc.edu [introcs.cs.luc.edu/] (at least as of 2015). This is known as virtual hosting. Rather than provide separate A records for each website name, DNS was set up to provide a CNAME record for each website name pointing to a single physical server name linode1.cs.luc.edu. Only one A record is then needed, for this server.

The nslookup request for an A record returned instead the CNAME record, together with the A record for that CNAME (this is the 162.216.18.28 above). This is done for convenience.

Note that the IPv4 address here, 162.216.18.28, is unrelated to Loyola’s own IPv4 address block 147.126.0.0/16. The server linode1.cs.luc.edu is managed by an external provider; there is no connection between the DNS name hierarchy and the IP address hierarchy.

Finally, if we look up both www.cs.luc.edu and cs.luc.edu, we see they resolve to the same address. The use of www as a hostname for a domain’s webserver is often considered unnecessary and old-fashioned; many users prefer the shorter, “naked” domain name, eg cs.luc.edu.

It might be tempting to create a CNAME record for the naked domain, cs.luc.edu, pointing to the full hostname www.cs.luc.edu. However, RFC 1034 [https://tools.ietf.org/html/rfc1034.html] does not allow this:

If a CNAME RR is present at a node, no other data should be present; this ensures that the data for a canonical name and its aliases cannot be different.

There are, however, several other DNS data records for cs.luc.edu: an NS record (above), a SOA, or Start of Authority, record containing various administrative data such as the expiration time, and an MX record, discussed in the following section. All this makes www.cs.luc.edu and cs.luc.edu ineluctably quite different. RFC 1034 [https://tools.ietf.org/html/rfc1034.html] adds, “this rule also insures that a cached CNAME can be used without checking with an authoritative server for other RR types.”

A better way to create a naked-domain record, at least from the perspective of DNS, is to give it its own A record. This does mean that, if the webserver address changes, there are now two DNS records that need to be updated, but this is manageable.

Recently ANAME records have been proposed to handle this issue; an ANAME is like a limited CNAME not subject to the RFC 1034 [https://tools.ietf.org/html/rfc1034.html] restriction above. An ANAME record for a naked domain, pointing to another hostname rather than to address, is legal. See the Internet draft draft-hunt-dnsop-aname [https://tools.ietf.org/html/draft-hunt-dnsop-aname]. Some large CDNs (1.12.2 Content-Distribution Networks) already implement similar DNS tweaks internally. This does not require end-user awareness; the user requests an A record and the ANAME is resolved at the CDN side.

Finally, there is also an argument, at least when HTTP (web) traffic is involved, that the www not be deprecated, and that the naked domain should instead be redirected, at the HTTP layer, to the full hostname. This simplifies some issues; for example, you now have only one website, rather than two. You no longer have to be concerned with the fact that HTTP cookies with and without the “www” are different. And some CDNs may not be able to handle website failover to another server if the naked domain is reached via an A record. But none of these are DNS issues.

7.8.2 Other DNS Records

Besides address lookups, DNS also supports a few other kinds of searches. The best known is probably reverse DNS, which takes an IP address and returns a name. This is slightly complicated by the fact that one IP address may be associated with multiple DNS names. What DNS does in this case is to return the canonical name, or CNAME; a given address can have only one CNAME.

Given an IPv4 address, say 147.126.1.230, the idea is to reverse it and append to it the suffix in-addr.arpa.

230.1.126.147.in-addr.arpa

There is a DNS name hierarchy for names of this form, with zones and authoritative servers. If all this has been configured – which it often is not, especially for user workstations – a request for the PTR record corresponding to the above should return a DNS hostname. In the case above, the name luc.edu [http://luc.edu] is returned (at least as of 2018).

PTR records are the only DNS records to have an entirely separate hierarchy; other DNS types fit into the “standard” hierarchy. For example, DNS also supports MX, or Mail eXchange, records, meant to map a domain name (which might not correspond to any hostname, and, if it does, is more likely to correspond to the name of a web server) to the hostname of a server that accepts email on behalf of the domain. In effect this allows an organization’s domain name, eg luc.edu, to represent both a web server and, at a different IP address, an email server. MX records can even represent a set of IP addresses that accept email.

DNS has from the beginning supported TXT records, for arbitrary text strings. The email Sender Policy Framework (RFC 7208 [https://tools.ietf.org/html/rfc7208.html]) was developed to make it harder for email senders to pretend to be a domain they are not; this involves inserting so-called SPF records as DNS TXT records (or as substrings of TXT records, if TXT is also being used for something else.

For example, a DNS query for TXT records of google.com (not gmail.com!) might yield (2018)

google.com     text = "docusign=05958488-4752-4ef2-95eb-aa7ba8a3bd0e"
google.com     text = "v=spf1 include:_spf.google.com ~all"

The SPF system is interested in only the second record; the “v=spf1” specifies the SPF version. This second record tells us to look up _spf.google.com. That lookup returns

text = "v=spf1 include:_netblocks.google.com include:_netblocks2.google.com include:_netblocks3.google.com ~all"

Lookup of _netblocks.google.com then returns

text = "v=spf1 ip4:64.233.160.0/19 ip4:66.102.0.0/20 ip4:66.249.80.0/20 ip4:72.14.192.0/18 ip4:74.125.0.0/16 ip4:108.177.8.0/21 ip4:173.194.0.0/16 ip4:209.85.128.0/17 ip4:216.58.192.0/19 ip4:216.239.32.0/19 ~all"

If a host connects to an email server, and declares that it is delivering mail from someone at google.com, then the host’s email list should occur in the list above, or in one of the other included lists. If it does not, there is a good chance the email represents spam.

Each DNS record (or “resource record”) has a name (eg cs.luc.edu) and a type (eg A or AAAA or NS or MX). Given a name and type, the set of matching resource records is known as the RRset for that name and type (technically there is also a “class”, but the class of all the DNS records we are interested in is IN, for Internet). When a nameserver responds to a DNS query, what is returned (in the Answer section) is always an entire RRset: the RRset of all resource records matching the name and type contained in the original query.

In many cases, RRsets have a single member, because many hosts have a single IPv4 address. However, this is not universal. We saw above the example of a single DNS name having multiple A records when round-robin DNS is used. A single DNS name might also have separate A records for the host’s public and private IPv4 addresses. And perhaps most MX-record (Mail eXchange) RRsets have multiple entries, as organizations often prefer, for redundancy, to have more than one server that can receive email.

7.8.3 DNS Cache Poisoning

The classic DNS security failure, known as cache poisoning, occurs when an attacker has been able to convince a DNS resolver that the address of, say, www.example.com is something other than what it really is. A successful attack means the attacker can direct traffic meant for www.example.com to the attacker’s own, malicious site.

The most basic cache-poisoning strategy is to send a stream of DNS reply packets to the resolver which declare that the IP address of www.example.com is the attacker’s chosen IP address. The source IP address of these packets should be spoofed to be that of the example.com authoritative nameserver; such spoofing is relatively easy using UDP. Most of these reply packets will be ignored, but the hope is that one will arrive shortly after the resolver has sent a DNS request to the example.com authoritative nameserver, and interprets the spoofed reply packet as a legitimate reply.

To prevent this, DNS requests contain a 16-bit ID field; the DNS response must echo this back. The response must also come from the correct port. This leaves the attacker to guess 32 bits in all, but often the ID field (and even more often the port) can be guessed based on past history.

Another approach requires the attacker to wait for the target resolver to issue a legitimate request to the attacker’s site, attacker.com. The attacker then piggybacks in the ADDITIONAL section of the reply message an A record for example.com pointing to the attacker’s chosen bad IP address for this site. The hope is that the receiving resolver will place these A records from the ADDITIONAL section into its cache without verifying them further and without noticing they are completely unrelated. Once upon a time, such DNS resolver behavior was common.

Most newer DNS resolvers carefully validate the replies: the ID field must match, the source port must match, and any received DNS records in the ADDITIONAL section must match, at a minimum, the DNS zone of the request. Additionally, the request ID field and source port should be chosen pseudorandomly in a secure fashion. For additional vulnerabilities, see RFC 3833 [https://tools.ietf.org/html/rfc3833.html].

The central risk in cache poisoning is that a resolver can be tricked into supplying users with invalid DNS records. A closely related risk is that an attacker can achieve the same result by spoofing an authoritative nameserver. Both of these risks can be mitigated through the use of the DNS security extensions, known as DNSSEC. Because DNSSEC makes use of public-key signatures, we defer coverage to 22.12 DNSSEC.

7.8.4 DNS and CDNs

DNS is often pressed into service by CDNs (1.12.2 Content-Distribution Networks) to identify their closest “edge” server to a given user. Typically this involves the use of geoDNS, a slightly nonstandard variation of DNS. When a DNS query comes in to one of the CDN’s authoritative nameservers, that server

looks up the approximate location of the client (10.4.4 IP Geolocation)
determines the closest edge server to that location
replies with the IP address of that closest edge server

This works reasonably well most of the time. However, the requesting client is essentially never the end user; rather, it is the DNS resolver being used by the user. Typically such resolvers are the site resolvers provided by the user’s ISP or organization, and are physically quite close to the user; in this case, the edge server identified above will be close to the user as well. However, when a user has chosen a (likely remote) public DNS resolver, as above, the IP address returned for the CDN edge server will be close to the DNS resolver but likely far from optimal for the end user.

One solution to this last problem is addressed by RFC 7871 [https://tools.ietf.org/html/rfc7871.html], which allows DNS resolvers to include the IP address of the client in the request sent to the authoritative nameserver. For privacy reasons, usually only a prefix of the user’s IP address is included, perhaps /24. Even so, user’s privacy is at least partly compromised. For this reason, RFC 7871 [https://tools.ietf.org/html/rfc7871.html] recommends that the feature be disabled by default, and only enabled after careful analysis of the tradeoffs.

A user who is concerned about the privacy issue can – in theory – configure their own DNS software to include this RFC 7871 [https://tools.ietf.org/html/rfc7871.html] option with a zero-length prefix of the user’s IP address, which conveys no address information. The user’s resolver will then not change this to a longer prefix.

Use of this option also means that the DNS resolver receiving a user query about a given hostname can no longer simply return a cached answer from a previous lookup of the hostname. Instead, the resolver needs to cache separately each ⟨hostname,prefix⟩ pair it handles, where the prefix is the prefix of the user’s IP address forwarded to the authoritative nameserver. This has the potential to increase the cache size by several orders of magnitude, which may thereby enable some cache-overflow attacks.