One cool trick you could try (although you are probably doing it already) is to include all inputs for some long period (like 1-2 seconds!) in every input packet the client sends to the server.
This way if one input packet gets lost, the very next one getting through will have all the inputs for the last 1-2 seconds, and this greatly improves how well your game will play under packet loss.
When you do this, you can even encode the inputs in order from left to right and delta encode each one against the previous input in the packet! Inputs don't change that much from one tick to the next, so you can get smart with the encoding and optimize the redundant copies down to basically nothing.
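To make the delta idea concrete, here is a minimal sketch in Python. It assumes each input is a one-byte button bitmask and that a packet carries at most 255 consecutive ticks. The function names, the 7-byte header layout, and the (index, XOR delta) change list are all made up for illustration, not taken from any particular engine.

```python
import struct

# Hypothetical header: first tick (u32), input count (u8),
# first input bitmask (u8), number of changes (u8). Network byte order.
HEADER = "!IBBB"
HEADER_SIZE = struct.calcsize(HEADER)  # 7 bytes


def encode_inputs(first_tick: int, inputs: list[int]) -> bytes:
    """Pack consecutive per-tick inputs, storing only the ticks where the input changed."""
    if not 1 <= len(inputs) <= 255:
        raise ValueError("expected between 1 and 255 inputs per packet")
    changes = []
    for i in range(1, len(inputs)):
        delta = inputs[i] ^ inputs[i - 1]  # zero when the input is unchanged
        if delta:
            changes.append((i, delta))
    packet = bytearray(
        struct.pack(HEADER, first_tick, len(inputs), inputs[0], len(changes))
    )
    for index, delta in changes:
        packet += struct.pack("!BB", index, delta)
    return bytes(packet)


def decode_inputs(packet: bytes) -> tuple[int, list[int]]:
    """Rebuild the full per-tick input list from an encoded packet."""
    first_tick, count, current, n_changes = struct.unpack_from(HEADER, packet, 0)
    changes = {}
    offset = HEADER_SIZE
    for _ in range(n_changes):
        index, delta = struct.unpack_from("!BB", packet, offset)
        changes[index] = delta
        offset += 2
    inputs = [current]
    for i in range(1, count):
        current ^= changes.get(i, 0)  # apply the delta only where one was stored
        inputs.append(current)
    return first_tick, inputs
```

With this layout, 120 ticks of input (2 seconds at an assumed 60 Hz tick rate) containing two button changes fit in 7 + 2 * 2 = 11 bytes, compared with 120 bytes for one raw byte per tick. A real implementation would probably pack bits more tightly, but the shape is the same.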
Ah yes, I have heard of this method but the idea of sorting/delta encoding is new to me and now I’m reconsidering!
Maybe some people might find it interesting - I’m relaying packets from peer-to-peer using Cloudflare Realtime, which is like a one-to-many broadcast system for WebRTC. Each peer sends their input packet to Cloudflare, then Cloudflare forwards that on to the 10 other players (for example). It’s cool because (a) it takes 10x less upload bandwidth from the peer, (b) people’s IP addresses are not revealed to their peers, and (c) Cloudflare is in 400 datacenters around the world, so it adds minimal latency. Cloudflare Realtime is a really cool system that maybe more web game developers should look into!
Unfortunately, I’m paying for all the bandwidth that goes through Cloudflare Realtime, so I have perhaps over-optimised for minimal bandwidth by sending only one input per packet. The other part of my equation is that my server broadcasts authoritative batches of inputs every 100ms or so via TCP, so if a packet gets lost, every peer will eventually receive the input. It might be a bit slow, though, and it will cause a big rollback that might be noticeable.
Reading your comment makes me think it might not be as expensive as I thought, and maybe I can play around with how long of an input period I resend for. Perhaps there is a better balance to strike between cost and reliability. So thanks for bringing this up!
No need to resend inputs with seq IDs <= the last one acknowledged by the server, right? 1-2 seconds sounds like overkill. Unless the server updates themselves arrive less often than 0.5-1 Hz, but what kind of game is that? A very sparse world where chances of misprediction are very low?
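A rough sketch of that ack-based trimming, again in Python. It assumes the receiver reports the highest sequence ID it has seen so far. The `InputSender` class, its method names, and the 120-input cap are hypothetical, chosen only to illustrate the idea.

```python
from collections import deque


class InputSender:
    """Keeps only the inputs the other side has not acknowledged yet."""

    def __init__(self, max_pending: int = 120):
        # The cap is an assumed safety limit (about 2 seconds at 60 Hz):
        # if acks stop arriving, the oldest inputs fall off the front.
        self.pending: deque[tuple[int, int]] = deque(maxlen=max_pending)
        self.next_seq = 0

    def add_input(self, value: int) -> None:
        """Record the input for the next tick under a fresh sequence ID."""
        self.pending.append((self.next_seq, value))
        self.next_seq += 1

    def on_ack(self, acked_seq: int) -> None:
        """Drop every input with seq <= acked_seq; stale acks are harmless."""
        while self.pending and self.pending[0][0] <= acked_seq:
            self.pending.popleft()

    def build_packet(self) -> list[tuple[int, int]]:
        """Return the (seq, input) pairs to include in the next packet."""
        return list(self.pending)
```

Under steady acks, each packet carries only about a round trip's worth of inputs, and the cap only matters once acks stop arriving. In a relayed peer-to-peer setup the ack would have to come from whichever side is treated as authoritative, which is an assumption this sketch leaves open.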