Have you ever wondered how your program connects to other hosts?
Well, I can tell you the answer for Java (any version) and compare it to Python. Java's approach is a 90s implementation, and there is not even a good way to fix it.
How DNS resolving works
As you probably already know, DNS IP resolution works by querying a DNS server with a host name. As a response, you will get a list of IP addresses (in the answer section, if the host name is known). It works like this:
dig +norrcomments +nostats TXT google.com
; <<>> DiG 9.16.1-Ubuntu <<>> +norrcomments +nostats TXT google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 33087
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;google.com. IN TXT
;; ANSWER SECTION:
google.com. 3521 IN TXT "facebook-domain-verification=22rm551cu4k0ab0bxsw536tlds4h95"
google.com. 3521 IN TXT "v=spf1 include:_spf.google.com ~all"
google.com. 221 IN TXT "docusign=1b0a6754-49b1-4db5-8540-d2c12664b289"
google.com. 3521 IN TXT "globalsign-smime-dv=CDYX+XFHUw2wml6/Gb8+59BsH31KzUr6c1l2BPvqKX8="
google.com. 221 IN TXT "docusign=05958488-4752-4ef2-95eb-aa7ba8a3bd0e"
As we can see here, when querying for a TXT record, multiple records are provided in the answer section. The same can happen when querying for an IP address (an A record), like so.
dig +nostats google.de
; <<>> DiG 9.16.1-Ubuntu <<>> +nostats google.de
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14434
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;google.de. IN A
;; ANSWER SECTION:
google.de. 54 IN A 142.250.74.195
Although this example returns just one IP address, a valid answer can also contain multiple IPs. RFC 1034, section 5.2.1 (Client-resolver interface: typical functions), says:
Since the DNS does not preserve the order of RRs, this function may choose to sort the returned addresses or select the "best" address if the service returns only one choice to the client. Note that a multiple address return is recommended […]
https://tools.ietf.org/html/rfc1034#section-5.2.1
How the client should react to this
Sadly, there is no way to tell which of these IPs a client will connect to. RFC 1123 has this to say about multihomed hosts:
Application protocol implementations SHOULD be prepared to try multiple addresses from the list until success is obtained.
https://tools.ietf.org/html/rfc1123#section-2
How Java handles name resolving
Java’s name resolving algorithm in a nutshell:
InetAddress.java
// source: https://github.com/openjdk/jdk/blob/270674ce1b1b8d44bbe92949c3f7db7b7c767cac/src/java.base/share/classes/java/net/InetAddress.java#L1236-L1239
public class InetAddress {
    public static InetAddress getByName(String host)
        throws UnknownHostException {
        return InetAddress.getAllByName(host)[0];
    }
}
Wow, that is remarkably simple! How do we know that this IP will be reachable? Well, we do not! If there are more IPs in the answer section, they are simply ignored.
Where is this code being called? Read on, I will explain this in the Python part!
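To make the difference visible, here is a minimal sketch that lists what getByName() leaves behind. The class name FirstVersusAll and the method ignoredAddresses are my own, purely for illustration, and "localhost" is only a placeholder host that resolves without network access.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the JDK.
final class FirstVersusAll {

    /** Returns every resolved address except the first one, i.e. what getByName() drops. */
    static List<InetAddress> ignoredAddresses(String host) throws UnknownHostException {
        // getAllByName returns at least one element or throws UnknownHostException.
        InetAddress[] all = InetAddress.getAllByName(host);
        return Arrays.asList(all).subList(1, all.length);
    }

    public static void main(String[] args) throws UnknownHostException {
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println("getByName keeps:   " + InetAddress.getByName(host));
        System.out.println("getByName ignores: " + ignoredAddresses(host));
    }
}
```

Run it against a host name with several A or AAAA records and the second line shows the addresses Java never tries.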
Python needs a destination port and protocol first
Let us compare this to Python’s implementation, taken from the CPython source code:
# source: https://github.com/python/cpython/blob/e42b705188271da108de42b55d9344642170aa2b/Lib/socket.py#L707-L719
def create_connection(address, timeout=_GLOBAL_DEFAULT_TIMEOUT,
                      source_address=None):

    host, port = address
    err = None
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
        af, socktype, proto, canonname, sa = res
        sock = None
        try:
            sock = socket(af, socktype, proto)
            if timeout is not _GLOBAL_DEFAULT_TIMEOUT:
                sock.settimeout(timeout)
            if source_address:
                sock.bind(source_address)
            sock.connect(sa)
            # Break explicitly a reference cycle
            err = None
            return sock

        except error as _:
            err = _
            if sock is not None:
                sock.close()

    if err is not None:
        try:
            raise err
        finally:
            # Break explicitly a reference cycle
            err = None
    else:
        raise error("getaddrinfo returns an empty list")
That is a much more sophisticated approach! The for loop iterates over every address that getaddrinfo() returns. Inside the loop, a try/except block attempts the connection. A timeout is only set on the socket if the caller passed one: the default, _GLOBAL_DEFAULT_TIMEOUT, is a sentinel object meaning "not specified". The actual connection attempt is the sock.connect(sa) call. If it runs into a timeout (or another problem, like having no route), the next IP is tried until all available IP addresses are exhausted.
So, where does it actually get the IPs from? From that same for statement, where getaddrinfo() is called with both the host and the port.
Back to Java: Socket already lost all IPs
Java does not do this. In fact, in Socket.java you can find this code:
// source: https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/net/Socket.java#L634-L637
public
class Socket implements java.io.Closeable {
    // […]
    public void connect(SocketAddress endpoint, int timeout) throws IOException {
        // […]
        InetSocketAddress epoint = (InetSocketAddress) endpoint;
        InetAddress addr = epoint.getAddress();
        int port = epoint.getPort();
        checkAddress(addr, "connect");
        // […]
    }
    // […]
}
As you can see, connect() calls InetSocketAddress.getAddress() without any additional port information, and that call returns exactly one address.
You can even dig down further:
// AbstractPlainSocketImpl.java
/**
 * Creates a socket and connects it to the specified port on
 * the specified host.
 * @param host the specified host
 * @param port the specified port
 */
protected void connect(String host, int port)
    throws UnknownHostException, IOException
{
    boolean connected = false;
    try {
        InetAddress address = InetAddress.getByName(host);
        this.port = port;
        this.address = address;
        connectToAddress(address, port, timeout);
        connected = true;
    } finally {
        if (!connected) {
            try {
                close();
            } catch (IOException ioe) {
                /* Do nothing. If connect threw an exception then
                   it will be passed up the call stack */
            }
        }
    }
}
Even the newer implementation will do the same:
// AbstractPlainSocketImpl.java
/**
 * Creates a socket and connects it to the specified address on
 * the specified port.
 * @param address the address
 * @param timeout the timeout value in milliseconds, or zero for no timeout.
 * @throws IOException if connection fails
 * @throws IllegalArgumentException if address is null or is a
 * SocketAddress subclass not supported by this socket
 * @since 1.4
 */
protected void connect(SocketAddress address, int timeout)
    throws IOException {
    boolean connected = false;
    try {
        if (address == null || !(address instanceof InetSocketAddress))
            throw new IllegalArgumentException("unsupported address type");
        InetSocketAddress addr = (InetSocketAddress) address;
        if (addr.isUnresolved())
            throw new UnknownHostException(addr.getHostName());
        this.port = addr.getPort();
        this.address = addr.getAddress();
        connectToAddress(this.address, port, timeout);
        connected = true;
    } finally {
        if (!connected) {
            try {
                close();
            } catch (IOException ioe) {
                /* Do nothing. If connect threw an exception then
                   it will be passed up the call stack */
            }
        }
    }
}
And just in case you wonder: the latest NioSocketImpl.java from Java 15 is not much different. Any Socket method that takes a host name will eventually call InetAddress.getByName(hostname). By that point, every IP except the first has already been filtered out.
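The same loss is visible one level up, in the address object you hand to connect(). The following is a minimal sketch, with a class and method name (SingleAddressEndpoint, describe) that I made up for illustration; "localhost" and port 443 are placeholders.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Hypothetical helper, not part of the JDK.
final class SingleAddressEndpoint {

    /** Describes what a socket address carries: one resolved address, or none. */
    static String describe(InetSocketAddress endpoint) {
        if (endpoint.isUnresolved()) {
            // No lookup happened (or it failed), so getAddress() would be null.
            return endpoint.getHostString() + ": unresolved";
        }
        // getAddress() hands back a single InetAddress, never a list.
        InetAddress only = endpoint.getAddress();
        return endpoint.getHostString() + " -> " + only.getHostAddress();
    }

    public static void main(String[] args) {
        // The (String, int) constructor resolves the host name right away.
        System.out.println(describe(new InetSocketAddress("localhost", 443)));
        // createUnresolved skips the lookup; connect() rejects such an address.
        System.out.println(describe(InetSocketAddress.createUnresolved("localhost", 443)));
    }
}
```

Either way, by the time a Socket sees the endpoint there is at most one IP left to try.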
Where to go from here, Java?
So, what can we do to enhance Java’s resolving strategy? It turns out, there is not much we can actually do! All the relevant interfaces in Java’s InetAddress.java class are non-public.
Well, I hacked together a small javaagent. A javaagent is a small piece of sidecar code that lives inside the Java process. My agent modifies some internal classes and makes them resolve the IP address similar to Python’s approach.
You can find the bmhm/nameserviceagent on GitHub. Once installed, it will try every resolved IP three times for 100 ms before throwing an exception. It is tested up to Java 15 and requires at least Java 8 due to lambda usage.
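For code you control yourself, the same idea can be approximated without an agent. The sketch below is not the agent's implementation; it is my rough Java counterpart to Python's create_connection, with hypothetical names (MultiAddressConnect, connectAny) and the retry count and timeout passed in as plain parameters. It does nothing for third-party libraries that open their own sockets.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of the JDK or of nameserviceagent.
final class MultiAddressConnect {

    /**
     * Tries every address the host name resolves to, in resolver order,
     * and returns the first socket that connects.
     */
    static Socket connectAny(String host, int port, int timeoutMillis, int attempts)
            throws IOException {
        IOException lastError = null;
        // getAllByName keeps every resolved address instead of only the first.
        for (InetAddress address : InetAddress.getAllByName(host)) {
            for (int attempt = 0; attempt < attempts; attempt++) {
                Socket socket = new Socket();
                try {
                    socket.connect(new InetSocketAddress(address, port), timeoutMillis);
                    return socket; // first successful connection wins
                } catch (IOException e) {
                    lastError = e; // remember the failure and keep trying
                    socket.close();
                }
            }
        }
        if (lastError != null) {
            throw lastError;
        }
        throw new IOException("no connection attempt was made for " + host);
    }
}
```

A call such as MultiAddressConnect.connectAny("example.com", 443, 100, 3) would mirror the "three times for 100 ms" setting; the caller is responsible for closing the returned socket.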
Further reading
Many attempts have been made to have Oracle implement a better algorithm or an interface, which allows applications to modify DNS name resolution behaviour:
- JDK-8134577 : Eliminate or standardize a replacement for sun.net.spi.nameservice.NameServiceDescriptor.
- JDK-8201428 : Provide a standard API for name resolution.
- JDK-8192780 : Consider restoring DNS SPI.
Article history
2020-10-08: In an earlier version I omitted the Java socket factory from the Java section. I added a paragraph which points to the section inside the Python section.
2020-09-24: I added a clarification of why I think the Python code corresponds to the Java code. Python will not resolve and return any IPs until it has the destination port, while Java will just call getByName(hostname) and then getAddress() at some point in Socket.java.
Thanks to @nottycode for pointing this out.