The main scenario that caused me a lot of grief is temporary RAM usage spikes: a single process run during a build that uses ~8 GB of RAM or more for a mere few seconds and then exits. In some cases the OOM killer was reaping the wrong process, or the build just failed cryptically, and if I looked at something like top I wouldn't see any issue, plenty of free RAM. The tooling for examining that kind of historical memory usage is pretty bad; my only option was to look at the OOM killer logs and hope the culprit would eventually show up.
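One crude way around this is to sample memory usage yourself at a high frequency, so a process that only lives for a few seconds still leaves a trace. A rough Python sketch of the idea, polling /proc and keeping the peak RSS seen per command name (the 100 ms interval and keying on the command name are arbitrary choices, not anything standard):

    import time
    from pathlib import Path

    INTERVAL = 0.1   # sample every 100 ms; short enough to catch a spike lasting a few seconds
    peaks = {}       # command name -> highest RSS (kB) observed so far

    def sample():
        for status in Path("/proc").glob("[0-9]*/status"):
            try:
                fields = dict(line.split(":\t", 1)
                              for line in status.read_text().splitlines()
                              if ":\t" in line)
                name = fields["Name"]
                rss_kb = int(fields["VmRSS"].split()[0])   # value looks like "123456 kB"
            except (OSError, KeyError, ValueError):
                continue   # process exited mid-read, or a kernel thread with no VmRSS
            if rss_kb > peaks.get(name, 0):
                peaks[name] = rss_kb

    try:
        while True:
            sample()
            time.sleep(INTERVAL)
    except KeyboardInterrupt:
        for name, rss_kb in sorted(peaks.items(), key=lambda kv: -kv[1])[:10]:
            print(f"{rss_kb / 2**20:8.2f} GiB  {name}")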
Thanks for the tip about vm.overcommit_ratio, though; I think it's set to the default.
You can get statistics off the cgroups to get an idea of what it was (assuming it's a service and not something a user ran), but that requires probing them often enough.
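With cgroup v2 that probing can be as simple as re-reading the unit's memory.current in a tight loop and remembering the maximum. A minimal sketch, assuming a systemd service on the unified hierarchy (myservice.service is just a placeholder):

    import time
    from pathlib import Path

    # cgroup v2 path for a systemd service; adjust the unit name to your setup
    CGROUP = Path("/sys/fs/cgroup/system.slice/myservice.service")
    INTERVAL = 0.1   # seconds; has to be well under the lifetime of the spike

    peak = 0
    try:
        while True:
            peak = max(peak, int((CGROUP / "memory.current").read_text()))
            time.sleep(INTERVAL)
    except KeyboardInterrupt:
        print(f"highest observed usage: {peak / 2**30:.2f} GiB")

(Newer kernels also expose memory.peak as a high-water mark, which would avoid the polling entirely, though I'm not sure how widely available that is yet.)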