Hacker News

mouse_ last Thursday at 3:32 PM

> Uses ubershaders to guarantee no stutters due to pipeline compilation.

I may sound way out of the loop here, but... How come this was never a problem for older dx9/dx11/GL games and emulators?


Replies

tgtweak last Thursday at 3:41 PM

In those GL/DX games (built for non-specific hardware), all the textures and shaders are compiled either during the game's build or before you get into the scene. Many console systems, particularly Nintendo's, do that precompilation specifically for the GPU inside the console. That hardware is not known to the emulator in advance (unless someone publishes a shader cache alongside the rom...), so when a shader is referenced in a scene, it needs to be compiled at runtime to work on the emulated graphics system (translated from the Nintendo hardware's shader code to DirectX, Vulkan, or OpenGL, and then further into the vendor-specific shader).

Most modern emulators implement a shader cache which stores those shaders as they are encountered, so that this "compilation stutter" only happens once per shader - but modern titles can have hundreds or thousands of shaders, which means you encounter it consistently throughout a playthrough. Breath of the Wild stands out as a game you basically had to run with a precompiled shader cache, as it was borderline unplayable without one.
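The caching strategy described above amounts to a compile-on-first-miss lookup. A minimal sketch (class and function names hypothetical, not any emulator's actual code):

```python
class ShaderCache:
    """Compile each emulated shader configuration once, then reuse it."""

    def __init__(self, compile_fn):
        self.compile_fn = compile_fn  # expensive host-side compilation
        self.cache = {}               # emulated config -> compiled shader
        self.compile_count = 0        # each increment is one "stutter"

    def get(self, config):
        # First encounter: pay the compilation cost (the stutter) once.
        if config not in self.cache:
            self.cache[config] = self.compile_fn(config)
            self.compile_count += 1
        return self.cache[config]
```

Persisting `self.cache` to disk between runs is what a "precompiled shader cache" for a game like Breath of the Wild amounts to: the first playthrough's compilation cost, paid by someone else.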

Ubershaders act as fallback shaders - using an off-the-shelf precompiled "particle" shader instead of the actual one while the actual one is compiled for next time. This prevents the stutter at a cost in visual fidelity: if you see an explosion in a game, it will use a generic explosion shader rather than the game's actual one, until that shader is available in the shader cache.
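That fallback scheme can be sketched as a dispatcher that kicks off compilation in the background and serves the generic shader until the real one is ready (a simplified model with hypothetical names, not any emulator's actual code):

```python
from concurrent.futures import ThreadPoolExecutor

class FallbackDispatch:
    """Serve a generic fallback shader while the real one compiles."""

    def __init__(self, compile_fn, fallback):
        self.pool = ThreadPoolExecutor(max_workers=1)
        self.compile_fn = compile_fn
        self.fallback = fallback
        self.pending = {}   # config -> in-flight compilation Future
        self.ready = {}     # config -> finished compiled shader

    def get(self, config):
        if config in self.ready:
            return self.ready[config]          # real shader, full fidelity
        fut = self.pending.setdefault(
            config, self.pool.submit(self.compile_fn, config))
        if fut.done():
            self.ready[config] = fut.result()  # promote to the real shader
            return self.ready[config]
        return self.fallback                   # no stall: generic fallback
```

The key property is that `get()` never blocks on compilation, so the frame is never held up; fidelity is degraded for a frame or two instead.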

dcrazy last Thursday at 4:05 PM

It was, in fact, a problem. DX11 and earlier tried to solve it with DXBC, an intermediate bytecode format that all drivers could consume. The driver would only need to lower the bytecode to the GPU’s ISA, which is much faster than a full compilation from HLSL. Prior to the emergence of Vulkan, OpenGL didn’t ever try to solve this; GLSL was always the interface to the driver. (Nowadays SPIR-V support is available to OpenGL apps if your driver implements the GL_ARB_gl_spirv extension, and of course DXIL has replaced DXBC.)

Compilation stutter was perhaps less noticeable in the DX9/OpenGL 3 era because shaders were less capable, and games relied more on fixed functionality, which was implemented directly in the driver. Nowadays, a lot of the legacy API surface is actually implemented by dynamically generated and compiled shaders, so you can get shader compilation hitches even when you aren't using shaders at all.

In the N64 era of consoles, games would write ISA (“microcode”) directly into the GPU’s shared memory, usually via a library. In Nintendo’s case, SGI provided two families of libraries called “Fast3D” and “Turbo3D”. You’d call functions to build a “display list”, which was just a buffer full of instructions that did the math you wanted the GPU to do.

thereddaikon last Thursday at 3:51 PM

Older games used precompiled shaders. These were inaccessible to game devs and usually handled by the hardware makers - the platform OEM for consoles and the video card OEM on PCs. Game devs begged for the ability to write their own shaders for years and finally got it with DX11 and Vulkan. And that's when things went to hell. Instead of the shaders being written and compiled for the specific hardware, they now have to be compiled for your GPU at runtime. It's a messy and imperfect process. EA, Ubisoft, or anyone else is never going to have the same level of understanding of a GPU that Nvidia or AMD has. Often the stuttering is due to the shaders having to be recompiled in-game, something that never happened before.

garaetjjte last Thursday at 6:11 PM

It was, but it was made worse by DX12/Vulkan. Previously, shader stages were separate and fixed-pipeline settings were a loose bag of knobs. If that corresponded to hardware stages, great, everybody was happy! However, if there was a mismatch between the API and the hardware (for example, some fixed pipeline state wasn't really hardware state but was lowered to shader code), then the driver needed to hack around it. If that could be done by patching the shader, it was ugly, but it worked fine. But if it required recompilation, then stutters occurred, and the application couldn't prevent them because it was all hidden from it.

The proposed solution was to encapsulate everything in pipeline state objects. If you could do that, excellent: no stutters, no matter what hardware it runs on. It has issues, however: in reality, applications (especially emulators) don't always know beforehand which states they will need; the previous ugly driver tricks that could sometimes avoid recompilation by patching shaders are no longer possible; and even if the hardware does have easily switchable state, the API doesn't expose that, leading to a combinatorial explosion of state objects that was previously unnecessary. Some of this was rolled back by introducing API extensions that allow more granular switching of partial state.

edflsafoiewq last Thursday at 6:18 PM

The oldest N64 emulators predate programmable shaders. They mapped possible configurations of the N64 GPU onto configurations of the host GPU. But this is really hard and some configurations are just impossible. It's basically a huge list of tricks and special cases. For reference, I think it was like 16K lines for the color combiner, which basically does (A-B)*C+D.
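The combiner equation mentioned above can be written out concretely. This is a simplified per-channel sketch; the real RDP color combiner additionally selects each of A, B, C, and D from a large menu of possible inputs, which is where the thousands of special cases come from:

```python
def color_combine(a, b, c, d):
    """N64-style color combiner: per-channel (A - B) * C + D.

    Each argument is an (r, g, b) tuple of floats in [0, 1];
    the result is clamped back into that range.
    """
    return tuple(
        min(1.0, max(0.0, (av - bv) * cv + dv))
        for av, bv, cv, dv in zip(a, b, c, d)
    )
```

For example, linear interpolation between two colors falls out of this form by setting A and B to the two colors, C to the blend factor, and D equal to B.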

CrossVR last Thursday at 5:21 PM

To answer that question in the context of Vulkan, I highly recommend reading the proposal document for VK_EXT_shader_object, whose problem statement contains a great explanation of how modern graphics APIs ended up in this situation:

https://github.com/KhronosGroup/Vulkan-Docs/blob/main/propos...

The gist of it is that graphics APIs like DX11 were designed around pipelines being compiled in pieces, each piece representing a different stage of the pipeline. These pieces are then linked together at runtime just before the draw call. However, the pieces are rarely a perfect fit, requiring the driver to patch them or do further compilation, which can introduce stuttering.

In an attempt to further reduce stuttering and to reduce complexity for the driver, Vulkan did away with these piecemeal pipelines and opted for monolithic pipeline objects. This allowed the application to pre-compile the full pipeline ahead of time, relieving the driver of having to piece the pipeline together at the last moment.

If implemented correctly you can make a game with virtually no stuttering. DOOM (2016) is a good example where the number of pipeline variants was kept low so it could all be pre-compiled and its gameplay greatly benefits from the stutter-free experience.

This works great for a highly specialized engine with a manageable number of pipeline variants, but for more versatile game engines and for most emulators, pre-compiling all pipelines is untenable: the number of permutations between the different variations of each pipeline stage is simply too great. For these applications there was no option but to compile the full pipeline on demand and cache the result, making the stutter worse than before, since there is no ability to do piecemeal compilation of the pipeline ahead of time.
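The arithmetic behind that permutation explosion is easy to illustrate with made-up per-stage variant counts: piecemeal compilation pays a cost proportional to the *sum* of the variants, while monolithic pipelines pay for every *combination*, i.e. the product:

```python
from math import prod

# Hypothetical per-stage variant counts for a versatile engine.
stage_variants = {
    "vertex_shader": 12,
    "fragment_shader": 300,
    "blend_mode": 8,
    "depth_test": 4,
    "cull_mode": 3,
}

# Piecemeal APIs compile each piece once: sum of variants.
pieces = sum(stage_variants.values())

# Monolithic pipelines need one object per combination: product.
pipelines = prod(stage_variants.values())

print(pieces, pipelines)  # 327 vs 345600
```

Even these modest, invented numbers turn a few hundred compilations into a few hundred thousand, which is why pre-compiling everything stops being an option.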

This gets even worse for emulators that attempt to emulate systems where the pipeline is implemented in fixed-function hardware rather than programmable shaders. On those systems, games don't compile any piece of the pipeline; the game simply writes to a few registers to set the pipeline state right before the draw call. Even piecemeal compilation won't help much here, so ubershaders were used instead to emulate a great number of hardware states in a single pipeline.

Jasper_ last Thursday at 5:59 PM

It happened all the time. I've played plenty of D3D11 games where clicking to shoot my gun would stutter the first time it happened as it was compiling shaders for the bullet fire particle effects.

Driver caches mean that after everything gets "prewarmed", it won't happen again.

an-unknown yesterday at 3:31 AM

I think there is some confusion about "ubershaders" in the context of emulators in particular. Old Nintendo consoles like the N64 or the GameCube/Wii didn't have programmable shaders. Instead, they had a mostly fixed-function pipeline, but you could configure some stages of it to fake "programmable" shaders to some degree. Now the problem is, you have no idea what any particular game is going to do until the moment it writes a specific value into a specific GPU register, which instantly configures the GPU to do whatever the game wants from that moment onwards. There literally is no "shader" stored in the ROM; it's just code configuring (parts of) the GPU directly.

That's not how any modern GPU works, though. Instead, you have to emulate this semi-fixed-function pipeline with shaders. Emulators try to generate shader code for the current GPU configuration and compile it, but that takes time and can only be done after the configuration has been observed for the first time. This is where "ubershaders" enter the scene: a single huge shader which implements the complete configurable semi-fixed-function pipeline, so you pass the configuration registers to the shader and it acts accordingly. Unfortunately, such shaders are huge and slow, so you don't want to use them unless necessary. The idea is to keep ubershaders prepared as a fallback, use them whenever you see a new configuration, compile the real shader and cache it in the background, and switch to the compiled shader once it's available, to recover performance. A few years ago, the developers of the Dolphin emulator (GameCube/Wii) wrote an extensive blog post about how this works: https://de.dolphin-emu.org/blog/2017/07/30/ubershaders/
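As a rough analogy in ordinary code (register names and logic entirely hypothetical), an ubershader is like an interpreter that re-reads the configuration registers on every invocation, while a compiled shader is a specialization with those branches resolved once, ahead of time:

```python
def ubershader(regs, texel, shade):
    """One branchy 'shader' that consults the emulated config registers
    on every invocation - slow but correct for any configuration."""
    color = texel if regs["use_texture"] else shade
    if regs["blend"]:
        # Hypothetical 50/50 blend with the shade color.
        color = tuple(0.5 * (c + s) for c, s in zip(color, shade))
    return color

def specialize(regs):
    """Generate a dedicated 'shader' for one observed configuration:
    the branches are resolved once, at 'compile' time."""
    if regs["use_texture"] and not regs["blend"]:
        return lambda texel, shade: texel       # branch-free fast path
    # Fall back to interpreting for configurations we didn't specialize.
    return lambda texel, shade: ubershader(regs, texel, shade)
```

Both paths produce the same colors; the specialized version simply no longer pays for the branching, which is the performance gap the ubershader-plus-background-compilation scheme is designed to close.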

Only starting with the 3DS/Wii U did Nintendo consoles finally get "real" programmable shaders, in which case you "just" have to translate them to whatever you need on your host system. You still won't know which shaders you'll see until you observe the transfer of the compiled shader code to the emulated GPU. After all, the shader code is compiled to GPU instructions ahead of time, usually during the build process of the game itself; at least for Nintendo consoles, there are SDK tools to do this. This, of course, means there is no compilation happening on the console itself, so there is no stutter from shader compilation either - unlike in an emulation of such a console, which has to translate and recompile those shaders on the fly.

> How come this was never a problem for older [...] emulators?

Older emulators had highly inaccurate and/or slow GPU emulation, so this was not really a problem for a long time. Only once GPU emulation became accurate enough, with dynamically generated shaders for high performance, did shader compilation stutters become a real problem.

babypuncher last Thursday at 6:20 PM

Epic published a very well-written article explaining what the problem is, what makes it so much worse in modern games, and why it's a uniquely difficult problem to solve.

https://www.unrealengine.com/en-US/tech-blog/game-engines-an...