logoalt Hacker News

tomxortoday at 2:02 AM1 replyview on HN

If hung SSH connections are common it's likely due to CGNAT which use aggressively low TCP timeouts. e.g. I've found all UK mobile carriers set their TCP timeout as low as 5 minutes. The "default" is supposed to be 2 hours, you could literally sleep your computer, zero packets, and an SSH connection would continue to work an hour later, and generally speaking this is still true unless CGNAT is in the way.

If you are interested there are a few ways you can fix this:

Easiest is to use a VPN, because the VPN's exit node becomes the effective NAT they usually have normal TCP timeouts due to being less resource constrained. Another nice benefit of this method is you can move between physical networks and your connection doesn't die... If you use Tailscale then you already have this in a more direct way.

Another is to tune the tcp_keepalive kernel parameters. Lowering the keepalive timeout to be less than the CGNAT timeout will cause keepalive probes to prevent CGNAT from dropping the connection even while your SSH connection is technically idle. For Linux I pop these into /etc/sysctl.d/z.conf, I have no idea for Windows or Mac:

  # Keepalive frequently to survive CGNAT
  net.ipv4.tcp_keepalive_time   = 240 
  net.ipv4.tcp_keepalive_intvl  = 60
  net.ipv4.tcp_keepalive_probes = 120
This is really a misuse of these settings, they are supposed to be for checking TCP connections are still alive and clearing them up from the local routing table. Instead the idea is to exploit the probes by sending them more frequently to force idle connections to stay alive in a CGNAT environment (dont worry the probes are tiny and still very infrequent).

_time=240 will send a probe after 4 mins of idle connection instead of the default 2 hours, undercutting the CGNAT timeout. _intvl=60 and _probes=120 mean it will send 120 probes 60 seconds apart (2 hours worth) before considering the connection dead. This will keep it alive for at least 2 hours, but also allows us to have the best of both worlds so that under a nice NAT it keeps the old behaviour, e.g if I temporarily lose my network the SSH connection is still valid after 2 hours, but under CGNAT it will at least not drop the connection after 5 mins so long as I keep my computer on and don't lose the network.

There are also some SSH client keepalive settings but I'm less familiar with them.


Replies

snvzztoday at 5:00 AM

Note this is only an issue if not using IPv6.

CGNAT is for access to legacy IPv4 only.

show 2 replies