dnstracer doesn't diagnose what one wants because it uses the wrong query resolution algorithm.

Introduction

dnstracer is a tool that is purported to

[determine] where a given Domain Name Server (DNS) gets its information from, and [follow] the chain of DNS servers back to the servers which know the data.

Of course, this description (taken from its manual) is not what it actually does. There is no "chain of DNS servers" to be followed, for one thing. DNS simply doesn't work that way, and a design based upon such a misconception of how DNS operates would be fundamentally incorrect. There's no analogue of tools such as traceroute in DNS, because DNS queries and responses aren't relayed along a chain of content DNS servers. Content DNS servers don't talk to other content DNS servers.

One might hesitate to state that dnstracer actually is designed on the basis of such a misunderstanding of how DNS service operates, although it would certainly explain the otherwise rather bizarre note in the bugs section of its manual that states that dnstracer is "useless" for pointing at dnscache, a proxy DNS server. One doesn't start a "trace" of DNS service at a proxy DNS server. One starts a "trace" of DNS service at a content DNS server, usually, but not necessarily, at one of one's chosen set of "." content DNS servers. One traces the path that a resolving proxy DNS server follows, as it is performing the task of query resolution, querying the content DNS servers that it queries and following the chains of delegations (not of servers) that it follows.

Indeed, this is what dnstracer in fact does do in practice, despite what its manual says. Of course, this is precisely what one wants to do if one is investigating delegations. And if one wants to determine "the servers which know the data", in order either to check that they are the intended ones or to discover who is publishing a particular part of the DNS database, investigating delegations is what one will be doing.

Or, rather, what dnstracer does would be precisely what one wants to do - if dnstracer did it correctly.

But it doesn't. Unfortunately, dnstracer (up to and including version 1.5, at least) does not use the query resolution algorithm that a secure resolving proxy DNS server actually uses. Therefore what dnstracer does, and what it reports, will not correspond to what a secure resolving proxy DNS server does and what results it will obtain in practice.

This makes dnstracer a flawed tool for tracing the operation of query resolution, in order to diagnose problems or to locate content DNS servers. Don't rely upon it.

There are, of course, better tools for diagnosing delegation problems. (dnstracer has been compared to one particular better tool ever since it first appeared, for example.) They are slower to run to completion, but that is because they do the right thing and doing the right thing takes more time than using the algorithm that dnstracer uses. dnstracer provides incorrect results quickly.

dnstracer's deviations

dnstracer deviates from the actual query resolution algorithm in several important ways.

dnstracer does not discard out-of-bailiwick glue.

dnstracer uses the "glue" resource record sets in the "additional" section of a referral response returned from a content DNS server, irrespective of whether those records are in fact outside of that server's bailiwick.

This is wrong. A secure resolving proxy DNS server discards "additional section glue" that is outside of the content DNS server's bailiwick, in order to prevent poisoning, and performs separate lookups in order to securely obtain the data.

The results of this error are twofold.

dnstracer won't detect various kinds of gluelessness. It won't detect a domain whose content DNS servers publish glue, but that is effectively glueless because that glue cannot be trusted and cannot be obtained from a trustworthy source. Instead it will blithely accept and use the glue, and report no problems. Secure resolving proxy DNS servers will be unable to look up resource record sets for the domain, meanwhile, and administrators are left scratching their heads when "the trace clearly shows a delegation pointing from Y to Z, so that cannot be the problem".

dnstracer may well (in certain situations) query a set of content DNS servers that is quite different to the set that a secure resolving proxy DNS server queries. If, for example, a superdomain content DNS server publishes a referral whose out-of-bailiwick glue gives IP addresses different to those published by the relevant trustworthy sources, dnstracer will blithely follow the delegation to the wrong set of content DNS servers. Secure resolving proxy DNS servers will follow the correct glue to a different set of servers, meanwhile, and administrators will be completely misled by the trace output as to who "the servers which know the data" really are.
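The bailiwick test that dnstracer omits can be sketched in a few lines. This is a minimal illustration, not the code of any actual proxy DNS server. It assumes that domain names are plain strings, that a server's bailiwick is the domain for which it was asked, and that glue arrives as (owner name, address) pairs. The function names, the "example" domain names, and the addresses (taken from the documentation ranges) are all hypothetical.

```python
def labels(name):
    """Split a domain name into lowercase labels, root-most label first."""
    stripped = name.rstrip(".").lower()
    return list(reversed(stripped.split("."))) if stripped else []


def in_bailiwick(name, bailiwick):
    """True if name is equal to, or a subdomain of, bailiwick."""
    zone = labels(bailiwick)
    return labels(name)[:len(zone)] == zone


def split_glue(bailiwick, glue):
    """Partition (owner_name, address) glue pairs into usable and discarded."""
    usable = [(n, a) for n, a in glue if in_bailiwick(n, bailiwick)]
    discarded = [(n, a) for n, a in glue if not in_bailiwick(n, bailiwick)]
    return usable, discarded


# A referral from a hypothetical "org." content DNS server.
usable, discarded = split_glue("org.", [
    ("ns1.example.org.", "192.0.2.1"),     # within "org.": may be used
    ("ns2.example.net.", "198.51.100.1"),  # outside "org.": must be looked up separately
])
print(usable)
print(discarded)
```

A tracing tool that applied such a test would have to perform a separate lookup for every discarded name, which is one reason that doing the right thing takes more time.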

dnstracer erroneously follows out-of-bailiwick delegations.

When a content DNS server returns a partial response that is a referral back to itself for that server's bailiwick, dnstracer correctly reports the server to be "lame". However, dnstracer does not report servers issuing referrals to superdomains of their bailiwicks as "lame". Instead, it follows the referrals.

This is wrong. Out-of-bailiwick referrals are "lame" too. A secure resolving proxy DNS server does not follow them.

The result of this is that dnstracer will follow paths in search of answers that a secure resolving proxy DNS server will not follow. dnstracer can falsely report the possibility of success in situations where a secure resolving proxy DNS server would actually fail. dnstracer can also query content DNS servers that a secure resolving proxy DNS server will never query.

Here is a real world example of where dnstracer erroneously follows an out-of-bailiwick delegation and ends up tracing the delegations down from a completely new set of roots (which a secure resolving proxy DNS server will not do).

When dnstracer is invoked with the command

dnstracer -s a.root-servers.orsc. -q a www.openwatcom.org.

to trace the delegations from the ORSC "." content DNS servers, at the time of writing it receives at one point (because of either a database replication error or a lame delegation) a response from one of the "openwatcom.org." content DNS servers that delegates to the ICANN "." content DNS servers. dnstracer proceeds to follow the delegation and to trace the entire path down from the ICANN "." content DNS servers as well.

In fact, a secure resolving proxy DNS server will ignore the upwards delegation because information about "." is outside of the bailiwick of the "openwatcom.org." content DNS servers. It will log the server (or the delegation to it) as being "lame", and not traverse the graph further along from that point. A secure resolving proxy DNS server will never end up querying the ICANN "." content DNS servers when resolving such a query starting from the ORSC root. But dnstracer does.
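The rule at work here can be stated as a one-line test: a referral is worth following only if it points strictly downwards from the bailiwick of the server that issued it. The following is a sketch under the assumption that domain names are plain strings; the function names are hypothetical and are not taken from dnstracer or from any proxy DNS server.

```python
def labels(name):
    """Split a domain name into lowercase labels, root-most label first."""
    stripped = name.rstrip(".").lower()
    return list(reversed(stripped.split("."))) if stripped else []


def is_proper_subdomain(name, ancestor):
    """True if name lies strictly below ancestor in the namespace tree."""
    child, parent = labels(name), labels(ancestor)
    return len(child) > len(parent) and child[:len(parent)] == parent


def referral_is_lame(bailiwick, referral_zone):
    """A referral to the bailiwick itself, upwards, or sideways is lame."""
    return not is_proper_subdomain(referral_zone, bailiwick)


print(referral_is_lame("org.", "openwatcom.org."))             # downwards: follow it
print(referral_is_lame("openwatcom.org.", "openwatcom.org."))  # back to itself: lame
print(referral_is_lame("openwatcom.org.", "."))                # upwards to ".": lame
```

The second case is the one that dnstracer already reports. The third is the one from the example above, which it follows instead.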

dnstracer does not obey RFC 2308 and RFC 1034.

dnstracer determines whether a response contains a complete answer or a partial answer (i.e. one that ends in a referral or an alias) by looking solely at the AA bit in the datagram header. A datagram with the AA bit set to 0 is presumed to be a partial answer.

This is wrong. The AA bit has nothing to do with it. According to the rules given in RFC 2308, whether a response contains a complete answer or a partial answer is determined from the resource record sets present in the response, not from the header flags. According to the query resolution algorithm in RFC 1034, resolving proxy DNS servers don't stop the process of query resolution just because the AA bit is set to 1. They stop when they finally receive a complete answer.

The result of this is that dnstracer will continue or stop the process of query resolution at points that are different to where a secure resolving proxy DNS server will actually continue or stop.
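To make the distinction concrete, here is a rough sketch of classifying a response from its contents, along the lines that RFC 2308 describes: a "no error" response with no relevant answer is a referral if its authority section holds NS records and no SOA record, and is "no data" otherwise. The Response type and the classify function are hypothetical simplifications (records are reduced to owner name and type, and alias chains are not followed). Note that the AA flag is carried but never consulted.

```python
from dataclasses import dataclass, field

NOERROR, NXDOMAIN = 0, 3  # RCODE values from the DNS message header


@dataclass
class Response:
    rcode: int = NOERROR
    aa: bool = False                               # present, but deliberately unused
    answer: list = field(default_factory=list)     # (owner_name, type) pairs
    authority: list = field(default_factory=list)  # (owner_name, type) pairs


def classify(response, qname, qtype):
    """Classify a response by the record sets in it, not by header flags."""
    if response.rcode == NXDOMAIN:
        return "no such name"
    if response.rcode != NOERROR:
        return "error"
    owner = qname.lower()
    types = {t for n, t in response.answer if n.lower() == owner}
    if qtype in types:
        return "answer"
    if "CNAME" in types:
        return "alias"
    authority_types = {t for _, t in response.authority}
    if "NS" in authority_types and "SOA" not in authority_types:
        return "referral"
    return "no data"


# The same answer section is classified identically whatever the AA flag says.
for aa in (False, True):
    r = Response(aa=aa, answer=[("www.example.org.", "A")])
    print(aa, classify(r, "www.example.org.", "A"))
```

Of the kinds returned, "answer", "no data", and "no such name" are complete; "referral" and "alias" are partial and mean that query resolution continues.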

Here is a real world example of where dnstracer fails to actually report "the servers which know the data" because of this.

When dnstracer is invoked with the command

dnstracer -s h.root-servers.net. -q a h.root-servers.net.

it traces no levels of delegation because the first content DNS server queried returns a response with the AA bit set to 1, which causes dnstracer to stop right there.

In fact, there are three sets of "servers which know the data" here, two of which dnstracer utterly fails to even query, let alone report on, because it uses the wrong criterion for deciding where to stop. A caching resolving proxy DNS server will ask the "." content DNS servers first of all. But if it has answered such a question before, it might well have the "net." content DNS server information cached, and so could ask the "net." content DNS servers directly. Indeed, it might well have the "root-servers.net." content DNS server information cached, and so could ask the "root-servers.net." content DNS servers directly. dnstracer does not report any of these possibilities.
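The three possible starting points are simply the domains that enclose the name being looked up. As a small illustration, the hypothetical helper below lists them for a name given as a plain string. It only enumerates candidates; whether each one is actually a delegation point, and which content DNS servers serve it, can only be found out by following the delegations.

```python
def enclosing_domains(name):
    """List the domains that enclose name, from "." downwards, excluding name itself."""
    stripped = name.rstrip(".")
    parts = stripped.split(".") if stripped else []
    domains = ["."]
    for i in range(len(parts) - 1, 0, -1):
        domains.append(".".join(parts[i:]) + ".")
    return domains


print(enclosing_domains("h.root-servers.net."))
# ['.', 'net.', 'root-servers.net.']
```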

dnstracer does not cope with empty resource record sets and "no such name" errors.

dnstracer does not treat responses containing empty resource record sets ("no data") or indicating "no such name" errors as complete answers.

This is wrong. Resolving proxy DNS servers treat empty resource record sets and "no such name" errors as complete answers.

The result of this is that dnstracer is not helpful when one wants to diagnose delegation problems for empty resource record sets or for non-existent domain names (e.g. when one wants to check that delegation is correct for negative answers as well as for positive ones). It won't report any servers as being "the servers which know the data" with a "Got answer" indication, because it doesn't recognise negative or empty responses as being complete answers. According to its output, there are no "servers which know the data" in such cases.

Here is a real world example of where dnstracer reports no "servers which know the data", even though several actually do.

When dnstracer is invoked with the command

dnstracer -s b.root-servers.net. -q a q.root-servers.net.

it doesn't report receiving an answer from anywhere, even though it actually receives an answer from each of the "root-servers.net." content DNS servers. (In fact, because of this, it queries the same servers over and over again, because it doesn't cache the fact of receiving an answer from them the first time.)
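What one would want instead is for negative and empty responses to count as complete answers, and for each server's result to be remembered so that it is asked only once. The following is a minimal sketch of that behaviour and not a description of how any real tool is written. The ask parameter is a hypothetical callable standing in for "send the query to this server and classify the response", and the server names in the usage lines are placeholders.

```python
# Kinds of response that end query resolution. Negative answers are included.
COMPLETE = {"answer", "no data", "no such name"}


def servers_knowing_the_data(servers, ask):
    """Return the servers that gave a complete answer, asking each only once."""
    results = {}  # server -> kind of response, remembered after the first query
    for server in servers:
        if server not in results:
            results[server] = ask(server)
    return [server for server, kind in results.items() if kind in COMPLETE]


# Illustrative stub: every server says that the name does not exist.
queried = []


def stub_ask(server):
    queried.append(server)
    return "no such name"


found = servers_knowing_the_data(
    ["ns-a.example.", "ns-b.example.", "ns-a.example."], stub_ask)
print(found)    # both servers are reported as knowing the data
print(queried)  # each server was asked exactly once
```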
© Copyright 2002 Jonathan de Boyne Pollard. "Moral" rights asserted.
Permission is hereby granted to copy and to distribute this web page in its original, unmodified form as long as its last modification datestamp is preserved.