DNS – The Internet's Directory Service

We human beings can be identified in many ways. For example, we can be identified by the names that appear on our birth certificates. We can be identified by our social security numbers. We can be identified by our driver's license numbers. Although each of these identifiers can be used to identify people, within a given context, one identifier may be more appropriate than another. For example, the computers at the IRS (the infamous tax collecting agency in the US) prefer to use fixed-length social security numbers rather than birth-certificate names. On the other hand, ordinary people prefer the more mnemonic birth-certificate names rather than social security numbers. (Indeed, can you imagine saying, "Hi. My name is 132-67-9875. Please meet my husband, 178-87-1146."?)

Just as humans can be identified in many ways, so too can Internet hosts. One identifier for a host is its hostname. Hostnames – such as cnn.com, www.yahoo.com, gaia.cs.umass.edu and surf.eurecom.fr – are mnemonic and are therefore appreciated by humans. However, hostnames provide little, if any, information about the location within the Internet of the host. (A hostname such as surf.eurecom.fr, which ends with the country code .fr, tells us that the host is in France, but doesn't say much more.) Furthermore, because hostnames can consist of variable-length alphanumeric strings, they would be difficult for routers to process. For these reasons, hosts are also identified by so-called IP addresses. We will discuss IP addresses in some detail in Chapter 4, but it is useful to say a few brief words about them now. An IP address consists of four bytes and has a rigid hierarchical structure. An IP address looks like 121.7.106.83, where each period separates one of the bytes, expressed in decimal notation from 0 to 255. An IP address is hierarchical because as we scan the address from left to right, we obtain more and more specific information about where (i.e., within which network, in the network of networks) the host is located in the Internet. (Just as when we scan a postal address from bottom to top we obtain more and more specific information about where the residence is located.) An IP address is included in the header of each IP datagram, and Internet routers use this IP address to route a datagram towards its destination.
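To make the four-byte structure concrete, here is a minimal Python sketch that splits a dotted-decimal address into its byte values and packs them into a single 32-bit number. The function names parse_ipv4 and to_int are our own, chosen for illustration only.

```python
def parse_ipv4(dotted):
    """Split a dotted-decimal IPv4 address into its four byte values."""
    parts = dotted.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated fields")
    octets = [int(p) for p in parts]
    if any(o < 0 or o > 255 for o in octets):
        raise ValueError("each field must be in the range 0..255")
    return octets


def to_int(octets):
    """Pack the four bytes, leftmost (most significant) first, into one integer."""
    value = 0
    for o in octets:
        value = (value << 8) | o
    return value


octets = parse_ipv4("121.7.106.83")
print(octets)               # [121, 7, 106, 83]
print(hex(to_int(octets)))  # 0x79076a53
```

Because the leftmost byte ends up in the most significant position, addresses that share a leading prefix are numerically close, which is one way to picture the left-to-right hierarchy described above.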

Services Provided by DNS

We have just seen that there are two ways to identify a host – a hostname and an IP address. People prefer the more mnemonic hostname identifier, while routers prefer fixed-length, hierarchically-structured IP addresses. In order to reconcile these different preferences, we need a directory service that translates hostnames to IP addresses. This is the main task of the Internet's Domain Name System (DNS). The DNS is (i) a distributed database implemented in a hierarchy of name servers and (ii) an application-layer protocol that allows hosts and name servers to communicate in order to provide the translation service. Name servers are usually Unix machines running the Berkeley Internet Name Domain (BIND) software. The DNS protocol runs over UDP and uses port 53. Following this chapter we provide interactive links to DNS programs that allow you to translate arbitrary hostnames, among other things.

DNS is commonly employed by other application-layer protocols – including HTTP, SMTP and FTP – to translate user-supplied hostnames to IP addresses. As an example, consider what happens when a browser (i.e., an HTTP client), running on some user's machine, requests the URL www.someschool.edu/index.html. In order for the user's machine to be able to send an HTTP request message to the Web server www.someschool.edu, the user's machine must obtain the IP address of www.someschool.edu. This is done as follows. The same user machine runs the client-side of the DNS application. The browser extracts the hostname, www.someschool.edu, from the URL and passes the hostname to the client-side of the DNS application. As part of a DNS query message, the DNS client sends a query containing the hostname to a DNS server. The DNS client eventually receives a reply, which includes the IP address for the hostname. The browser then opens a TCP connection to the HTTP server process located at that IP address. All IP datagrams sent from the client to the server as part of this connection will include this IP address in the destination address field of the datagrams. In particular, the IP datagram(s) that encapsulate the HTTP request message use this IP address. We see from this example that DNS adds an additional delay – sometimes substantial – to the Internet applications that use DNS. Fortunately, as we shall discuss below, the desired IP address is often cached in a "nearby" DNS name server, which helps to reduce the DNS network traffic as well as the average DNS delay.

Like HTTP, FTP, and SMTP, the DNS protocol is an application-layer protocol since (i) it runs between communicating end systems (again using the client-server paradigm), and (ii) it relies on an underlying end-to-end transport protocol (i.e., UDP) to transfer DNS messages between communicating end systems. In another sense, however, the role of the DNS is quite different from Web, file transfer, and email applications. Unlike these applications, the DNS is not an application with which a user directly interacts. Instead, the DNS provides a core Internet function – namely, translating hostnames to their underlying IP addresses – for user applications and other software in the Internet. We noted earlier in Section 1.2 that much of the "complexity" in the Internet architecture is located at the "edges" of the network. The DNS, which implements the critical name-to-address translation process using clients and servers located at the edge of the network, is yet another example of that design philosophy.

DNS provides a few other important services in addition to translating hostnames to IP addresses:

The DNS is specified in [RFC 1034] and [RFC 1035], and updated in several additional RFCs. It is a complex system, and we only touch upon key aspects of its operation here. The interested reader is referred to these RFCs and the book [Abitz 1993].

Overview of How DNS Works

We now present a high-level overview of how DNS works. Our discussion shall focus on the hostname to IP address translation service. From the client's perspective, the DNS is a black box. The client sends a DNS query message into the black box, specifying the hostname that needs to be translated to an IP address. On many Unix-based machines, gethostbyname() is the library routine that an application calls in order to issue the query message. In Section 2.7, we shall present a Java program that begins by issuing a DNS query. After a delay, ranging from milliseconds to tens of seconds, the client receives a DNS reply message that provides the desired mapping. Thus, from the client's perspective, DNS is a simple, straightforward translation service. But in fact, the black box that implements the service is complex, consisting of a large number of name servers distributed around the globe, as well as an application-layer protocol that specifies how the name servers and querying hosts communicate.
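The black-box view can be seen directly from application code. As a minimal sketch, Python's standard socket module offers a function that is also called gethostbyname(); the wrapper name resolve below is our own. Whether a given hostname resolves, and to which address, depends on the machine and network where the code runs, so no particular result is shown for real hostnames.

```python
import socket


def resolve(hostname):
    """Return an IPv4 address for hostname as a dotted-decimal string, or None on failure."""
    try:
        # The caller sees only a name going in and an address coming out;
        # the name servers consulted along the way stay hidden.
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None


# A string that is already a dotted-decimal address is returned unchanged,
# without any lookup being performed.
print(resolve("127.0.0.1"))

# On a machine with Internet access, a real lookup would look like this:
# print(resolve("gaia.cs.umass.edu"))
```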

A simple design for DNS would have one Internet name server that contains all the mappings. In this centralized design, clients simply direct all queries to the single name server, and the name server responds directly to the querying clients. Although the simplicity of this design is attractive, it is completely inappropriate for today's Internet, with its vast (and growing) number of hosts. The problems with a centralized design include:

In summary, a centralized database in a single name server simply doesn't scale. Consequently, the DNS is distributed by design. In fact, the DNS is a wonderful example of how a distributed database can be implemented in the Internet.

In order to deal with the issue of scale, the DNS uses a large number of name servers, organized in a hierarchical fashion and distributed around the world. No one name server has all of the mappings for all of the hosts in the Internet. Instead, the mappings are distributed across the name servers. To a first approximation, there are three types of name servers: local name servers, root name servers, and authoritative name servers. These name servers, again to a first approximation, interact with each other and with the querying host as follows:


Figure 2.5-1: A February 1998 map of the DNS root servers. Obtained from the WIA alliance Web site (http://www.wia.org/).

Let's take a look at a simple example. Suppose the host surf.eurecom.fr desires the IP address of gaia.cs.umass.edu. Also suppose that Eurecom's local name server is called dns.eurecom.fr and that an authoritative name server for gaia.cs.umass.edu is called dns.umass.edu. As shown in Figure 2.5-2, the host surf.eurecom.fr first sends a DNS query message to its local name server, dns.eurecom.fr. The query message contains the hostname to be translated, namely, gaia.cs.umass.edu. The local name server forwards the query message to a root name server. The root name server forwards the query message to the name server that is authoritative for all the hosts in the domain umass.edu, namely, to dns.umass.edu. The authoritative name server then sends the desired mapping to the querying host, via the root name server and the local name server. Note that in this example, in order to obtain the mapping for one hostname, six DNS messages were sent: three query messages and three reply messages.


Figure 2.5-2: Recursive queries to obtain the mapping for gaia.cs.umass.edu.

Our discussion up to this point has assumed that the root name server knows the IP address of an authoritative name server for every hostname. This assumption may be incorrect. For a given hostname, the root name server may only know the IP address of an intermediate name server that in turn knows the IP address of an authoritative name server for the hostname. To illustrate this, consider once again the above example with the host surf.eurecom.fr querying for the IP address of gaia.cs.umass.edu. Suppose now that the University of Massachusetts has a name server for the university, called dns.umass.edu. Also suppose that each of the departments at the University of Massachusetts has its own name server, and that each departmental name server is authoritative for all the hosts in the department. As shown in Figure 2.5-3, when the root name server receives a query for a hostname ending with umass.edu, it forwards the query to the name server dns.umass.edu. This name server forwards all queries with hostnames ending with .cs.umass.edu to the name server dns.cs.umass.edu, which is authoritative for all hostnames ending with .cs.umass.edu. The authoritative name server sends the desired mapping to the intermediate name server, dns.umass.edu, which forwards the mapping to the root name server, which forwards the mapping to the local name server, dns.eurecom.fr, which forwards the mapping to the requesting host! In this example, eight DNS messages are sent. Actually, even more DNS messages can be sent in order to translate a single hostname – there can be two or more intermediate name servers in the chain between the root name server and the authoritative name server!


Figure 2.5-3: Recursive queries with an intermediate name server between the root and authoritative name servers.
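The message counts in the two recursive examples (six and eight) can be checked with a small simulation. The following is only a toy model under our own assumptions: name servers are dictionary entries, a "referral" is a hostname suffix paired with the next server to ask, and the address 192.0.2.10 is a made-up placeholder, not the real address of gaia.cs.umass.edu.

```python
# Toy model of the recursive query chains in Figures 2.5-2 and 2.5-3.
# Server names come from the text; the address 192.0.2.10 is a made-up placeholder.

# Figure 2.5-2: dns.umass.edu is authoritative for gaia.cs.umass.edu.
SIMPLE_CHAIN = {
    "dns.eurecom.fr": {"referrals": [("", "root")]},  # local server sends everything to a root
    "root": {"referrals": [("umass.edu", "dns.umass.edu")]},
    "dns.umass.edu": {"records": {"gaia.cs.umass.edu": "192.0.2.10"}},
}

# Figure 2.5-3: dns.umass.edu is only an intermediate server.
INTERMEDIATE_CHAIN = {
    "dns.eurecom.fr": {"referrals": [("", "root")]},
    "root": {"referrals": [("umass.edu", "dns.umass.edu")]},
    "dns.umass.edu": {"referrals": [(".cs.umass.edu", "dns.cs.umass.edu")]},
    "dns.cs.umass.edu": {"records": {"gaia.cs.umass.edu": "192.0.2.10"}},
}


def recursive_query(servers, server, hostname, stats):
    """Ask `server` for hostname; it obtains the mapping on our behalf and returns it."""
    stats["messages"] += 1                      # the query sent to `server`
    node = servers[server]
    address = node.get("records", {}).get(hostname)
    if address is None:
        for suffix, next_server in node.get("referrals", []):
            if hostname.endswith(suffix):
                address = recursive_query(servers, next_server, hostname, stats)
                break
    stats["messages"] += 1                      # the reply sent back by `server`
    return address


def resolve(servers, hostname):
    """Resolve hostname starting at the local name server; return (address, message count)."""
    stats = {"messages": 0}
    address = recursive_query(servers, "dns.eurecom.fr", hostname, stats)
    return address, stats["messages"]


print(resolve(SIMPLE_CHAIN, "gaia.cs.umass.edu"))        # ('192.0.2.10', 6)
print(resolve(INTERMEDIATE_CHAIN, "gaia.cs.umass.edu"))  # ('192.0.2.10', 8)
```

Each additional server in the chain adds exactly one query and one reply, which is why every extra intermediate name server raises the count by two.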

The examples up to this point assumed that all queries are recursive queries. When a host or name server A makes a recursive query to a name server B, then name server B obtains the requested mapping on behalf of A and then forwards the mapping to A. The DNS protocol also allows for iterative queries at any step in the chain between requesting host and authoritative name server. When a name server A makes an iterative query to name server B, if name server B does not have the requested mapping, it immediately sends a DNS reply to A that contains the IP address of the next name server in the chain, say, name server C. Name server A then sends a query directly to name server C.

In the sequence of queries that are required to translate a hostname, some of the queries can be iterative and others recursive. Such a combination of recursive and iterative queries is illustrated in Figure 2.5-4. Typically, all queries in the query chain are recursive except for the query from the local name server to the root name server, which is iterative. (Because root servers handle huge volumes of queries, it is preferable to use the less burdensome iterative queries for root servers.)


Figure 2.5-4: A query chain with recursive and iterative queries.
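For contrast with the recursive case, the sketch below models a querying name server that resolves a hostname purely iteratively, following each referral itself. It is a toy model under our own assumptions: a referral here carries the next server's name rather than its IP address, every query in the chain is iterative (unlike the mixed chain of Figure 2.5-4), and 192.0.2.10 is a made-up placeholder address.

```python
# Toy model of iterative resolution, as seen from the querying (local) name server.
# Server names come from the text; the address 192.0.2.10 is a made-up placeholder.
SERVERS = {
    "root": {"referrals": [("umass.edu", "dns.umass.edu")]},
    "dns.umass.edu": {"referrals": [(".cs.umass.edu", "dns.cs.umass.edu")]},
    "dns.cs.umass.edu": {"records": {"gaia.cs.umass.edu": "192.0.2.10"}},
}


def iterative_resolve(servers, first_server, hostname):
    """Follow referrals one server at a time; return (address, messages sent)."""
    messages = 0
    current = first_server
    while current is not None:
        messages += 2                       # one query to `current`, one reply from it
        node = servers[current]
        address = node.get("records", {}).get(hostname)
        if address is not None:
            return address, messages        # the reply carried the mapping
        # Otherwise the reply carried a referral: the next server to ask.
        current = next(
            (nxt for suffix, nxt in node.get("referrals", []) if hostname.endswith(suffix)),
            None,
        )
    return None, messages


print(iterative_resolve(SERVERS, "root", "gaia.cs.umass.edu"))  # ('192.0.2.10', 6)
```

In this model the root answers a single query with a referral and is then finished; the work of chasing the rest of the chain falls on the querying server, which is the sense in which iterative queries are less burdensome for root servers.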

Our discussion thus far has not touched on one important feature of the DNS: DNS caching. In reality, DNS extensively exploits caching in order to improve the delay performance and to reduce the number of DNS messages in the network. The idea is very simple. When a name server receives a DNS mapping for some hostname, it caches the mapping in local memory (disk or RAM) while passing the message along the name server chain. Given a cached hostname/IP address translation pair, if another query arrives at the name server for the same hostname, the name server can provide the desired IP address, even if it is not authoritative for the hostname. In order to deal with ephemeral hosts, a cached record is discarded after a period of time (often set to two days). As an example, suppose that surf.eurecom.fr queries the DNS for the IP address for the hostname cnn.com. Furthermore, suppose that a few hours later, another Eurecom host, say baie.eurecom.fr, also queries DNS with the same hostname. Because of caching, the local name server at Eurecom will be able to immediately return the IP address to the requesting host without having to query name servers on another continent. Any name server may cache DNS mappings.
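The caching rule just described (keep a mapping, answer from it, discard it after a fixed period) can be sketched in a few lines of Python. The class name DnsCache, its methods, and the placeholder address 192.0.2.20 are our own illustrative choices; the two-day default mirrors the figure mentioned above. The clock is passed in as a parameter so that expiry can be demonstrated without waiting.

```python
import time


class DnsCache:
    """Hostname-to-address cache whose entries expire after a fixed lifetime."""

    def __init__(self, ttl_seconds=2 * 24 * 3600, clock=time.monotonic):
        self.ttl = ttl_seconds      # default: two days, as in the text
        self.clock = clock          # function returning the current time in seconds
        self.entries = {}           # hostname -> (address, expiry time)

    def put(self, hostname, address):
        self.entries[hostname] = (address, self.clock() + self.ttl)

    def get(self, hostname):
        entry = self.entries.get(hostname)
        if entry is None:
            return None             # never cached: a real query is needed
        address, expires_at = entry
        if self.clock() >= expires_at:
            del self.entries[hostname]
            return None             # cached record is too old: discard it
        return address


# Demonstration with a fake clock that we advance by hand.
now = [0.0]
cache = DnsCache(clock=lambda: now[0])
cache.put("cnn.com", "192.0.2.20")   # placeholder address, not cnn.com's real one
now[0] = 3 * 3600                    # a few hours later
print(cache.get("cnn.com"))          # 192.0.2.20
now[0] = 2 * 24 * 3600               # two days later
print(cache.get("cnn.com"))          # None
```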

DNS Records

The name servers that together implement the DNS distributed database store Resource Records (RR) for the hostname to IP address mappings. Each DNS reply message carries one or more resource records. In this and the following subsection, we provide a brief overview of DNS resource records and messages; more details can be found in [Abitz] or in the DNS RFCs [RFC 1034], [RFC 1035].

A resource record is a four-tuple that contains the following fields:

(Name, Value, Type, TTL)

TTL is the time to live of the resource record; it determines the time at which a resource should be removed from a cache. In the example records given below, we will ignore the TTL field. The meaning of Name and Value depends on Type:

If a name server is authoritative for a particular hostname, then the name server will contain a Type A record for the hostname. (Even if the name server is not authoritative, it may contain a Type A record in its cache.) If a server is not authoritative for a hostname, then the server will contain a Type NS record for the domain that includes the hostname; it will also contain a Type A record that provides the IP address of the name server in the Value field of the NS record. As an example, suppose a root server is not authoritative for the host gaia.cs.umass.edu. Then the root server will contain a record for a domain that includes the host gaia.cs.umass.edu, e.g.,

(umass.edu, dns.umass.edu, NS).

The root server would also contain a type A record which maps the name server dns.umass.edu to an IP address, e.g.,

(dns.umass.edu, 128.119.40.111, A).
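The way these two records work together can be sketched as follows. The two records are the ones from the example above; the ResourceRecord type, the find_referral function, and the TTL value of 172800 seconds (two days) are our own illustrative assumptions.

```python
from collections import namedtuple

# A resource record as the four-tuple (Name, Value, Type, TTL).
ResourceRecord = namedtuple("ResourceRecord", ["name", "value", "type", "ttl"])

# The two example records held by the root server; the TTL is an assumed value.
ROOT_RECORDS = [
    ResourceRecord("umass.edu", "dns.umass.edu", "NS", 172800),
    ResourceRecord("dns.umass.edu", "128.119.40.111", "A", 172800),
]


def find_referral(records, hostname):
    """Return (name server, its IP address) for the domain covering hostname, or None."""
    for ns in records:
        covers = hostname == ns.name or hostname.endswith("." + ns.name)
        if ns.type == "NS" and covers:
            # The NS record names the server; a Type A record supplies its address.
            for a in records:
                if a.type == "A" and a.name == ns.value:
                    return ns.value, a.value
    return None


print(find_referral(ROOT_RECORDS, "gaia.cs.umass.edu"))
# ('dns.umass.edu', '128.119.40.111')
```

The NS record alone would not be enough: without the accompanying Type A record, the querying server would know the name of the next name server but not where to send the query.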

DNS Messages

Earlier in this section we alluded to DNS query and reply messages. These are the only two kinds of DNS messages. Furthermore, both query and reply messages have the same format, as shown in Figure 2.5-5.


Figure 2.5-5: DNS message format
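As a concrete illustration of the message format, the sketch below builds the bytes of a query for a Type A record. The layout follows [RFC 1035]: a 12-byte header (a 16-bit identifier, a 16-bit flags field, and four 16-bit counts for the question, answer, authority, and additional sections) followed by the question. The function name build_query and the identifier value 0x1234 are our own arbitrary choices.

```python
import struct


def build_query(hostname, query_id=0x1234, recursion_desired=True):
    """Build the bytes of a DNS query asking for the Type A record of hostname."""
    flags = 0x0100 if recursion_desired else 0x0000   # only the RD bit may be set
    # Header: ID, flags, then one question and no answer/authority/additional records.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels, ended by a zero-length label.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # Question type 1 means Type A; question class 1 means the Internet class.
    question = qname + struct.pack("!HH", 1, 1)
    return header + question


message = build_query("gaia.cs.umass.edu")
print(len(message))        # 35
print(message[:12].hex())  # 123401000001000000000000
```

A message built this way could be handed to a UDP socket and sent to port 53 of a name server; the reply would come back in the same format, with the answer section filled in.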

The semantics of the various fields in a DNS message are as follows:

The discussion above has focused on how data is retrieved from the DNS database. You might be wondering how data gets into the database in the first place. Until recently, the contents of each DNS server were configured statically, e.g., from a configuration file created by a system manager. More recently, an UPDATE option has been added to the DNS protocol to allow data to be dynamically added to or deleted from the database via DNS messages. [RFC 2136] specifies DNS dynamic updates.

DNSNet provides a nice collection of documents pertaining to DNS [DNSNet]. The Internet Software Consortium provides many resources for BIND, a popular public-domain name server for Unix machines [BIND].