The detail that -wavonly (falling back to the older WinMM API instead of DirectSound) actually gave the highest frame rate is a perfect example of a lesson that keeps reappearing in systems programming: "more direct" doesn't always mean faster when you're CPU-bound. DirectSound's lower latency came at the cost of more CPU cycles that could otherwise go to rendering.