IMO the takeaway from LMAX is not ring buffers - it's knowing how much useful work a single CPU core can do. It's a story of playing to the hardware's strengths instead of wrapping yourself up in bullshit excuses. They realized their problem was fundamentally not parallelizable, so they wrote it to run serially as fast as possible, and the resulting performance was much faster than anyone would have guessed had they not tried it.