On Ephemerality

Short-lived Adventures in the Linux Kernel

The other day I was playing around with the Node.js net module, as one does, when I noticed that calls to server.listen() don’t actually require a port number:

If port is omitted or is 0, the operating system will assign an arbitrary unused port, which can be retrieved by using server.address().port after the ‘listening’ event has been emitted.

const net = require("net");
const server = net.createServer();
server.on("listening", () => {
  console.log(server.address().port); // not 0!
});
server.listen(0);

The OS is choosing the ports here. Trott wrote a script to demonstrate that they’re chosen sequentially, which appeared to be the case on macOS:

$ uname
Darwin
$ node trott.js
create server with port 0
server was assigned port 52111
create server with port 52112
server was assigned port 52112
create server with port 0
server was assigned port 52113

But not on Linux:

$ uname
Linux
$ node trott.js
create server with port 0
server was assigned port 34229
create server with port 34230
server was assigned port 34230
create server with port 0
server was assigned port 34479

So what’s going on with Linux?

Some Yahoo! searches revealed that it’s choosing a port from what is known as the “ephemeral port range”:

$ sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 32768  60999

Sure enough, messing with that range causes things to blow up:

$ cat two-servers.js
const net = require("net");

function createServer(id) {
  const server = net.createServer();
  server.on("error", err => console.error(id, err));
  server.on("listening", () => console.log(id, server.address().port));
  server.listen(0);
}

createServer(1);
createServer(2);
$ node two-servers.js
1 37049
2 35057
^C
$ node two-servers.js
1 35553
2 39803
^C
$ sudo sysctl --write net.ipv4.ip_local_port_range="32768 32768"
net.ipv4.ip_local_port_range = 32768 32768
$ node two-servers.js
1 32768
2 Error: listen EADDRINUSE: address already in use ::
  code: 'EADDRINUSE',
  errno: -98,
  syscall: 'listen',
  address: '::'
}

But WHERE are those port numbers coming from?

Node is calling into bind(2) from uv__tcp_bind. The interesting work is happening in the kernel function inet_csk_find_open_port.

Here's the call stack between the server.listen() call and that function, innermost frame first:
  1. inet_csk_find_open_port
  2. inet_csk_get_port (via tcpv6_prot)
  3. __inet6_bind
  4. __sys_bind
  5. uv__tcp_bind (calls into the kernel)
  6. uv_tcp_bind
  7. TCPWrap::Bind
  8. createServerHandle (calls into native Node.js C++)
  9. setupListenHandle
  10. listenInCluster
  11. lookupAndListen
  12. Server.listen

(I used https://code.woboq.org and this blog post to trace through most of the kernel code here. Thank you both for your help.)

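inet_csk_find_open_port does a randomized search of the range: it starts at a random offset, walks upward from there (wrapping around at the top), and returns the first port that isn’t already bound to a socket.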

It starts by only trying odd numbers, so as not to conflict with a similar search that happens in connect(2) (which instead starts with even numbers). It also starts with only the lower half of the range, although I couldn’t figure out why. But that explains why all of the ports Linux picked from the default range above are odd and sit in the lower half of it (every one is below 40000, and the midpoint is roughly 46880)!
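To make the search order concrete, here is a simplified user-space model of the behaviour described above. It is not a translation of the kernel function: `findOpenPort`, `bound`, and `rand` are made-up names, and details such as exactly where the range is split and how tiny ranges are handled are assumptions.

```javascript
// Simplified model of the search: lower half first, odd offsets first.
// `bound` is a Set of ports already in use; `rand` returns an integer >= 0.
function findOpenPort(low, high, bound,
                      rand = () => Math.floor(Math.random() * 2 ** 31)) {
  const end = high + 1; // work with the half-open range [low, end)

  // Assumption: tiny ranges are searched whole; otherwise lower half first.
  let halves = [[low, end]];
  if (end - low >= 4) {
    const mid = low + (((end - low) >> 2) << 1); // even-sized lower half
    halves = [[low, mid], [mid, end]];
  }

  for (const [lo, hi] of halves) {
    let span = hi - lo;
    if (span > 1) span &= ~1; // round down to an even number of ports

    // Random starting offset, forced odd: with an even `lo` (such as
    // 32768) the first pass therefore only visits odd port numbers.
    let offset = (rand() % span) | 1;

    // Pass 0 walks one parity, pass 1 (offset - 1) walks the other.
    for (let pass = 0; pass < 2; pass++, offset--) {
      let port = lo + offset;
      for (let i = 0; i < span; i += 2, port += 2) {
        if (port >= hi) port -= span; // wrap around within this half
        if (!bound.has(port)) return port;
      }
    }
  }
  return null; // nothing free was found
}

console.log(findOpenPort(32768, 60999, new Set()));        // odd, lower half
console.log(findOpenPort(32768, 32768, new Set([32768]))); // null
```

In this model, the default range with nothing bound only ever yields odd ports from the lower half, and a range collapsed to a single port yields that port once and then nothing, which lines up with the EADDRINUSE error seen earlier.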

And then it happened

A typo!

other_half_scan:
  inet_get_local_port_range(net, &low, &high);
  high++; /* [32768, 60999] -> [32768, 61000[ */
                                            ^ bad

Before I knew it I had 8 “how to send patches to Linus” tabs open. Apparently this is considered a trivial bug, so I just needed to cc trivial@kernel.org for review.

But then I looked a little closer…

The 61000[ just means that 61000 is excluded from the range. No typo after all. Not sure why they didn’t just use regular interval notation like the rest of us mathematicians. At least I didn’t have to set up Mutt.