Hacker News

tgtweak, last Thursday at 3:41 PM

In those GL/DX games (built for non-specific hardware), all the textures and shaders are compiled either during the game's build or before you get into the scene. Many console systems, particularly Nintendo's, do that precompilation specifically for the GPU inside the console. The emulator does not know those shaders in advance (unless someone publishes a shader compilation alongside the ROM...), so when a shader is referenced in the scene, it has to be compiled at runtime to work on the emulated graphics system: translated from Nintendo-hardware shader code to DirectX, Vulkan, or OpenGL, then further into the vendor-specific shader.

Most modern emulators implement a shader cache, which stores those shaders as they are encountered so that this "compilation stutter" only happens once per shader. But modern titles can have hundreds or thousands of shaders, which means that on a playthrough you run into it pretty much constantly. Breath of the Wild stands out as a game you basically had to run with a precompiled shader cache, as it was borderline unplayable without one.
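
A minimal sketch of the "compile once per shader" idea, assuming the cache is keyed by a hash of the console shader binary. `ShaderCache`, `fake_compile`, and `compile_count` are hypothetical names for illustration, not taken from any real emulator:

```python
import hashlib
from typing import Callable, Dict


class ShaderCache:
    """Maps a hash of the guest (console) shader binary to a compiled host shader."""

    def __init__(self, compile_fn: Callable[[bytes], bytes]) -> None:
        self._compile_fn = compile_fn        # hypothetical guest-to-host compiler
        self._entries: Dict[str, bytes] = {}
        self.compile_count = 0               # how many times the slow path ran

    def get(self, guest_shader: bytes) -> bytes:
        key = hashlib.sha256(guest_shader).hexdigest()
        host_shader = self._entries.get(key)
        if host_shader is None:
            # Cache miss: this is the point where the stutter would occur.
            host_shader = self._compile_fn(guest_shader)
            self._entries[key] = host_shader
            self.compile_count += 1
        return host_shader


def fake_compile(guest_shader: bytes) -> bytes:
    # Stand-in for the real translation and driver compile step.
    return b"host:" + guest_shader[::-1]


cache = ShaderCache(fake_compile)
cache.get(b"abc")   # slow path, compiles
cache.get(b"abc")   # served from the cache, no compile
```

A real cache would also persist `_entries` to disk, which is what makes a shared precompiled cache possible in the first place.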

Ubershaders act like fallback shaders: an off-the-shelf precompiled "particle" shader is used in place of the actual one while the actual one is compiled for use next time. This prevents the stutter at a cost of visual fidelity. If you see an explosion in a game, it will use a generic explosion shader rather than the actual one from the game until that one is available in the shader cache.


Replies

zeta0134, last Thursday at 4:30 PM

That's not quite how ubershaders work. They're a "fallback" shader in the sense that they rather inefficiently implement the entire pipeline, but they do implement the entire pipeline. The shader being compiled in another thread will be more efficient, as it uses only the logic needed for whatever configuration the game is calling up. But the visual result is identical in the ubershader case; that's the whole point. If you want, and your host system is powerful enough, you can leave ubershaders on all the time and disable the compilation thread and its associated cache entirely.
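
Roughly, the selection logic described above could look like the sketch below. It assumes a single background compile thread and uses plain strings as stand-ins for shader objects. `PipelineSelector`, `shader_for`, and `UBERSHADER` are made-up names for illustration, not Dolphin's actual code:

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Hashable

UBERSHADER = "ubershader"  # placeholder for the one general-purpose shader


class PipelineSelector:
    """Uses the ubershader until the specialized shader has finished compiling."""

    def __init__(self, compile_fn: Callable[[Hashable], str]) -> None:
        self._compile_fn = compile_fn                  # hypothetical specialized compiler
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._jobs: Dict[Hashable, Future] = {}

    def shader_for(self, config: Hashable) -> str:
        job = self._jobs.get(config)
        if job is None:
            # First sighting of this configuration: compile off the render thread
            # and draw with the ubershader in the meantime.
            self._jobs[config] = self._pool.submit(self._compile_fn, config)
            return UBERSHADER
        if job.done():
            return job.result()   # specialized shader is ready, switch over
        return UBERSHADER         # still compiling, keep using the fallback

    def shutdown(self) -> None:
        # Blocks until all queued compiles have finished.
        self._pool.shutdown(wait=True)
```

The render thread never blocks on a compile here, which is the reason the stutter goes away. Because both paths are meant to produce the same image, the switch from fallback to specialized shader should not be visible.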

I believe the term was coined by the Dolphin team, who did a pretty good high-level writeup of the feature here:

https://dolphin-emu.org/blog/2017/07/30/ubershaders/

derefr, last Thursday at 8:10 PM

So how about:

1. A global, networked shader cache: when any instance of the emulator encounters a new shader, it compiles it, then pushes the KV pair (ROM hash, target platform, console shader object-code hash) -> (target-platform shader object code) to a KV server somewhere. An async process comes along periodically and packs all so-far-submitted KV entries with a given (ROM hash, target platform) prefix into shader-cache packfiles. On first load of a game, the emulator fetches the packfile if it exists and loads the KV pairs from it into the emulator's local KV cache. (In theory, the emulator could also offer the option to fetch global-shader-cache "WAL segment" files, i.e. chunks of arbitrary global-shader-cache KV writes, as they're published every 15 minutes. Or KV entries for a given (ROM hash, target) prefix could be put into message-queue topics named after those prefixes, to which running instances of the emulator could subscribe. These optimizations might help when, e.g., many people are playing a just-released ROM hack that no single person has yet played all the way through to populate the cache. Mind you, the ROM hack's shaders could already be in the global store before release if the ROM hacker used the emulator during development, or if they knew about this and were considerate enough to use some tool from the emulator dev to explicitly compile and submit their raw shader project files to the global KV store.)

2. Have the emulator (or some separate tool) "mine out" all the [statically-specified] shaders embedded in the ROM, as a one-time process. (Probably not just a binwalk, because of arbitrary compression. Instead, think: a concolic execution of the ROM that looks for any call to the "load main-memory region into VRAM as shader" GPU instruction, using a symbolically-emulated memory whose regions hold either concrete or abstract values. If the RAM region referenced by this "load as shader" instruction is statically determinable, and the memory in that region has a statically determinable value on a given code path, then capture that RAM region.) Precompile all shaders discovered this way to create a "perfect" KV cachefile for the game. Publish this into a DHT (or just a central database) under the ROM's hash. (Think: OpenSubtitles.org)
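
To make the key layout from idea #1 concrete, here is a rough sketch of the KV key and the periodic "pack by prefix" step. It assumes SHA-256 for both hashes and an in-memory dict standing in for the packfile format. `ShaderKey`, `make_key`, and `pack_by_prefix` are hypothetical names:

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class ShaderKey(NamedTuple):
    rom_hash: str            # identifies the game image
    target_platform: str     # e.g. host API + GPU vendor/driver, format is an assumption
    guest_shader_hash: str   # hash of the console shader object code


def make_key(rom: bytes, target_platform: str, guest_shader: bytes) -> ShaderKey:
    return ShaderKey(
        hashlib.sha256(rom).hexdigest(),
        target_platform,
        hashlib.sha256(guest_shader).hexdigest(),
    )


def pack_by_prefix(
    entries: Iterable[Tuple[ShaderKey, bytes]],
) -> Dict[Tuple[str, str], Dict[str, bytes]]:
    """Group submitted entries into one 'packfile' per (ROM hash, target platform)."""
    packs: Dict[Tuple[str, str], Dict[str, bytes]] = defaultdict(dict)
    for key, host_shader in entries:
        prefix = (key.rom_hash, key.target_platform)
        packs[prefix][key.guest_shader_hash] = host_shader
    return dict(packs)
```

On first load, an emulator would look up the pack for its own (ROM hash, target platform) prefix and merge it into its local cache. The target platform has to be part of the key because compiled host shaders are generally not portable across GPUs and drivers.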

Mind you, I think the best strategy would actually combine the two approaches. Solution #2 can virtually eliminate stutter with a single pre-processing step, but it doesn't allow for caching of dynamically or procedurally generated shaders. Solution #1 still has stutter for at least one player, one time, for each encountered shader, but it handles the case of dynamic shaders.
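
The combined lookup order might be sketched like this, assuming the pre-mined static shaders have already been published to the same shared store. `resolve_shader`, `fetch_remote`, and `publish` are placeholders for whatever transport the store would use:

```python
from typing import Callable, Dict, Optional


def resolve_shader(
    key: str,
    guest_shader: bytes,
    local: Dict[str, bytes],
    fetch_remote: Callable[[str], Optional[bytes]],
    publish: Callable[[str, bytes], None],
    compile_fn: Callable[[bytes], bytes],
) -> bytes:
    """Local cache first, then the shared store, then compile and publish."""
    if key in local:
        return local[key]
    host_shader = fetch_remote(key)   # hits for pre-mined or previously seen shaders
    if host_shader is None:
        # Dynamic shader nobody has seen yet: one player pays the compile cost once.
        host_shader = compile_fn(guest_shader)
        publish(key, host_shader)
    local[key] = host_shader
    return host_shader
```

In practice the remote fetch would need to be asynchronous or prefetched, since a network round trip on the render path would stutter just as badly as a compile.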
