
Java’s DNS resolution is so 90s!

Published by Benjamin Marwell on

Have you ever wondered how your program will connect to other hosts?

Well, I can tell you the answer for Java (any version) and compare it to Python. Java’s implementation dates from the 1990s, and there is not even a good way to fix it.

How DNS Resolving works

As you probably already know, DNS IP resolution works by querying a DNS server with a host name. As a response, you will get a list of IP addresses (in the answer section, if the host name is known). It works like this:

dig +norrcomments +nostats  TXT google.com

; <<>> DiG 9.16.1-Ubuntu <<>> +norrcomments +nostats TXT google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33087
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;google.com.                    IN      TXT

;; ANSWER SECTION:
google.com.             3521    IN      TXT     "facebook-domain-verification=22rm551cu4k0ab0bxsw536tlds4h95"
google.com.             3521    IN      TXT     "v=spf1 include:_spf.google.com ~all"
google.com.             221     IN      TXT     "docusign=1b0a6754-49b1-4db5-8540-d2c12664b289"
google.com.             3521    IN      TXT     "globalsign-smime-dv=CDYX+XFHUw2wml6/Gb8+59BsH31KzUr6c1l2BPvqKX8="
google.com.             221     IN      TXT     "docusign=05958488-4752-4ef2-95eb-aa7ba8a3bd0e"

As we can see here, when querying for a TXT record, multiple records are provided in the answer section. The same may be true when querying for an IP address:

dig +nostats google.de

; <<>> DiG 9.16.1-Ubuntu <<>> +nostats google.de
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14434
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;google.de.                     IN      A

;; ANSWER SECTION:
google.de.              54      IN      A       142.250.74.195

Although this example returns just one IP address, a valid answer can also contain multiple IPs. RFC 1034, section 5.2.1 (“Typical functions” of the client-resolver interface) says:

Since the DNS does not preserve the order of RRs, this function may choose to sort the returned addresses or select the "best" address if the service returns only one choice to the client. Note that a multiple address return is recommended […]

https://tools.ietf.org/html/rfc1034#section-5.2.1
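To see the full address list from Java, a small sketch like the following may help. It only uses InetAddress.getAllByName from the standard library; the class name ResolveAllExample and the helper resolveAll are made up for this illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper class, not part of the JDK.
final class ResolveAllExample {

    /** Returns every address the platform resolver reports for the host. */
    static InetAddress[] resolveAll(String host) throws UnknownHostException {
        // getAllByName returns the whole list; a literal IP is parsed without a lookup.
        return InetAddress.getAllByName(host);
    }

    public static void main(String[] args) throws UnknownHostException {
        String host = args.length > 0 ? args[0] : "localhost";
        for (InetAddress address : resolveAll(host)) {
            System.out.println(host + " -> " + address.getHostAddress());
        }
    }
}
```

Running it against a host with several A or AAAA records should print one line per address, in whatever order the resolver returned them.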

How the client should react to this

Sadly, there is no way to tell how the client selects the IP it connects to. RFC 1123 says this about multihomed hosts:

Application protocol implementations SHOULD be prepared to try multiple addresses from the list until success is obtained.

https://tools.ietf.org/html/rfc1123#section-2
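In Java terms, the RFC’s advice could look roughly like the sketch below. It is a minimal illustration under my own assumptions, not code from the JDK: the class MultiAddressConnect and its methods connectToAny and connect are hypothetical names, and the error handling is deliberately simple.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the JDK.
final class MultiAddressConnect {

    /** Tries each candidate in order and returns the first socket that connects. */
    static Socket connectToAny(List<InetSocketAddress> candidates, int timeoutMillis)
            throws IOException {
        IOException lastFailure = null;
        for (InetSocketAddress candidate : candidates) {
            Socket socket = new Socket();
            try {
                socket.connect(candidate, timeoutMillis);
                return socket; // first successful connection wins
            } catch (IOException e) {
                lastFailure = e; // remember the failure and move on
                try {
                    socket.close();
                } catch (IOException ignored) {
                    // nothing useful to do if closing a failed socket fails
                }
            }
        }
        if (lastFailure == null) {
            throw new IOException("no candidate addresses given");
        }
        throw lastFailure;
    }

    /** Resolves the host and pairs every returned address with the port. */
    static Socket connect(String host, int port, int timeoutMillis) throws IOException {
        List<InetSocketAddress> candidates = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName(host)) {
            candidates.add(new InetSocketAddress(address, port));
        }
        return connectToAny(candidates, timeoutMillis);
    }
}
```

A caller would use something like MultiAddressConnect.connect("example.com", 443, 1000) and close the returned socket when done. The sketch tries the addresses strictly one after another, so the worst case takes the timeout multiplied by the number of addresses.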

How Java handles name resolving

Java’s name resolving algorithm in a nutshell:

InetAddress.java

// source: https://github.com/openjdk/jdk/blob/270674ce1b1b8d44bbe92949c3f7db7b7c767cac/src/java.base/share/classes/java/net/InetAddress.java#L1236-L1239
public class InetAddress {
    public static InetAddress getByName(String host)
        throws UnknownHostException {
        return InetAddress.getAllByName(host)[0];
    }
}

Wow, that is remarkably simple! How do we know that this IP will be reachable? Well, we do not! If there are more IPs in the answer section, they are simply ignored.

Where is this code being called? Read on, I will explain this in the Python part!

Python needs a destination port and protocol first

Let us compare this to Python’s implementation, taken from the CPython source code:

# source: https://github.com/python/cpython/blob/e42b705188271da108de42b55d9344642170aa2b/Lib/socket.py#L707-L719

def create_connection(address, timeout=_GLOBAL_DEFAULT_TIMEOUT,
                      source_address=None):
    host, port = address
    err = None
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
        af, socktype, proto, canonname, sa = res
        sock = None
        try:
            sock = socket(af, socktype, proto)
            if timeout is not _GLOBAL_DEFAULT_TIMEOUT:
                sock.settimeout(timeout)
            if source_address:
                sock.bind(source_address)
            sock.connect(sa)
            # Break explicitly a reference cycle
            err = None
            return sock

        except error as _:
            err = _
            if sock is not None:
                sock.close()

    if err is not None:
        try:
            raise err
        finally:
            # Break explicitly a reference cycle
            err = None
    else:
        raise error("getaddrinfo returns an empty list")

That is a much more sophisticated approach! As you can see, the for loop iterates over all addresses returned by getaddrinfo(). Inside the loop, a try/except block attempts the connection. A timeout is only applied if the caller passed one; the default, _GLOBAL_DEFAULT_TIMEOUT, is a sentinel object meaning “no timeout given”. The actual connection attempt is sock.connect(sa). If it runs into a timeout (or another problem, like having no route), the socket is closed and the next IP is tried until all available IP addresses are exhausted.

So, where does it actually get the IPs from? From that same for loop, where getaddrinfo() is called with both the host and the port.

Python: getaddrinfo() to obtain IP addresses for connection checks.
Python will have the destination port at hand when resolving for IPs

Back to Java: Socket already lost all IPs

Java does not do this. Actually, in Socket.java you can find this code:

// source: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/net/Socket.java#L634-L637

public
class Socket implements java.io.Closeable {
    // […]
    public void connect(SocketAddress endpoint, int timeout) throws IOException {
        // […]
        InetSocketAddress epoint = (InetSocketAddress) endpoint;
        InetAddress addr = epoint.getAddress();
        int port = epoint.getPort();
        checkAddress(addr, "connect");
        // […]
    }

    // […]
}

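As you can see, connect() calls InetSocketAddress.getAddress(), which returns a single, already resolved InetAddress. The port plays no role in choosing it, and any other IPs from the DNS answer are already gone at this point.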
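The following sketch illustrates this from the caller’s side. It relies only on public InetSocketAddress methods (the constructor, createUnresolved, isUnresolved, getAddress, getHostString); the class SocketAddressExample and its describe method are hypothetical names chosen for this example.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Hypothetical demo class, not part of the JDK.
final class SocketAddressExample {

    /** Describes what a socket would see when it asks this endpoint for its address. */
    static String describe(InetSocketAddress endpoint) {
        if (endpoint.isUnresolved()) {
            // createUnresolved() skips the lookup, so getAddress() would return null
            return endpoint.getHostString() + " is unresolved";
        }
        InetAddress single = endpoint.getAddress(); // exactly one address, never a list
        return endpoint.getHostString() + " -> " + single.getHostAddress();
    }

    public static void main(String[] args) {
        // The constructor resolves the host name immediately and keeps one address.
        System.out.println(describe(new InetSocketAddress("localhost", 80)));
        // No lookup happens here; Socket.connect would reject this endpoint.
        System.out.println(describe(InetSocketAddress.createUnresolved("example.com", 80)));
    }
}
```

Whichever way the InetSocketAddress is built, the socket never gets a list of candidates to fall back on.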

You can even dig down further:

// AbstractPlainSocketImpl.java

    /**
     * Creates a socket and connects it to the specified port on
     * the specified host.
     * @param host the specified host
     * @param port the specified port
     */
    protected void connect(String host, int port)
        throws UnknownHostException, IOException
    {
        boolean connected = false;
        try {
            InetAddress address = InetAddress.getByName(host);
            this.port = port;
            this.address = address;

            connectToAddress(address, port, timeout);
            connected = true;
        } finally {
            if (!connected) {
                try {
                    close();
                } catch (IOException ioe) {
                    /* Do nothing. If connect threw an exception then
                       it will be passed up the call stack */
                }
            }
        }
    }

Even the newer implementation will do the same:

// AbstractPlainSocketImpl.java

    /**
     * Creates a socket and connects it to the specified address on
     * the specified port.
     * @param address the address
     * @param timeout the timeout value in milliseconds, or zero for no timeout.
     * @throws IOException if connection fails
     * @throws  IllegalArgumentException if address is null or is a
     *          SocketAddress subclass not supported by this socket
     * @since 1.4
     */
    protected void connect(SocketAddress address, int timeout)
            throws IOException {
        boolean connected = false;
        try {
            if (address == null || !(address instanceof InetSocketAddress))
                throw new IllegalArgumentException("unsupported address type");
            InetSocketAddress addr = (InetSocketAddress) address;
            if (addr.isUnresolved())
                throw new UnknownHostException(addr.getHostName());
            this.port = addr.getPort();
            this.address = addr.getAddress();

            connectToAddress(this.address, port, timeout);
            connected = true;
        } finally {
            if (!connected) {
                try {
                    close();
                } catch (IOException ioe) {
                    /* Do nothing. If connect threw an exception then
                       it will be passed up the call stack */
                }
            }
        }
    }

And just in case you wonder: the latest NioSocketImpl.java from Java 15 is not much different. Any Socket method will eventually call InetAddress.getByName(hostname), and that method drops every IP except the first early on.

Where to go from here, Java?

So, what can we do to enhance Java’s resolving strategy? It turns out there is not much we can actually do! All the relevant interfaces in Java’s InetAddress class are non-public.

Well, I hacked together a small javaagent. A javaagent is a simple sidecar code container, which lives in the Java process. My agent will modify some internal classes and make them resolve the IP address similar to Python’s approach.

You can find the bmhm/nameserviceagent on GitHub. Once installed, it will try every resolved IP three times for 100 ms before throwing an exception. It is tested up to Java 15 and requires at least Java 8 due to lambda usage.
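To give a feel for the “three attempts per IP” idea, here is a rough sketch. It is not the agent’s actual code: the class RetryingAddressPicker, the Probe interface and the firstAnswering method are hypothetical, and using InetAddress.isReachable as the probe is merely one possible choice.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.util.Optional;

// Hypothetical sketch, not taken from bmhm/nameserviceagent.
final class RetryingAddressPicker {

    /** Hypothetical probe abstraction; returns true if the address answered in time. */
    interface Probe {
        boolean answers(InetAddress address, int timeoutMillis) throws IOException;
    }

    /** Returns the first candidate that answers within the given number of attempts. */
    static Optional<InetAddress> firstAnswering(
            InetAddress[] candidates, int attempts, int timeoutMillis, Probe probe) {
        for (InetAddress candidate : candidates) {
            for (int attempt = 0; attempt < attempts; attempt++) {
                try {
                    if (probe.answers(candidate, timeoutMillis)) {
                        return Optional.of(candidate);
                    }
                } catch (IOException e) {
                    // treat an I/O error like a failed attempt and retry
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "localhost";
        InetAddress[] candidates = InetAddress.getAllByName(host);
        // InetAddress::isReachable is one possible probe (ICMP echo or TCP port 7).
        Optional<InetAddress> picked =
                firstAnswering(candidates, 3, 100, InetAddress::isReachable);
        System.out.println(picked.map(InetAddress::getHostAddress).orElse("no address answered"));
    }
}
```

Keeping the probe pluggable makes the retry logic easy to exercise without touching the network; a real implementation would more likely probe the actual destination port.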

Further reading

Many attempts have been made to have Oracle implement a better algorithm or an interface, which allows applications to modify DNS name resolution behaviour:

Article history

2020-10-08: In an earlier version I omitted the Java socket factory from the Java section. I added a paragraph that points to the relevant part of the Python section.

2020-09-24: I added a clarification of why I think the Python code corresponds to the Java code. Python will not resolve and return any IPs until it has the destination port, while Java will just call getByName(hostname) and then getAddress() at some point in Socket.java.

Thanks to @nottycode for pointing this out.