Author here, thanks for your perspective. Here some thoughts:
> approach of separating the simulation and presentation layers isn't all that uncommon
I agree that some level of separation is is not that uncommon, but games usually depend on things from their respective engine, especially on things like datatypes (e.g. Vector3) or math libraries. The reason I mention that our game is unique in this way is that its non-rendering code does not depend on any Unity types or DLLs. And I think that is quite uncommon, especially for a game made in Unity.
> Most games don't ship on the mono backend, but instead on il2cpp
I think this really depends. If we take absolute numbers, roughly 20% of Unity games on Steam use IL2CPP [1]. Of course many simple games won't be using it so the sample is skewed is we want to measure "how many players play games with IL2CPP tech". But there are still many and higher perf of managed code would certainly have an impact.
We don't use IL2CPP because we use many features that are not compatible with it. For example DLC and mods loading at runtime via DLLs, reflection for custom serialization, things like [FieldOffset] for efficient struct packing and for GPU communication, etc.
Also, having managed code makes the game "hackabe". Some modders use IL injection to be able to hook to places where our APIs don't allow. This is good and bad, but so far this allowed modders to progress faster than we expected so it's a net positive.
> In modern Unity, if you want to achieve performance, you'd be better off taking the approach of utilizing the burst compiler and HPC#
Yeah, and I really wish we would not need to do that. Burst and HPC# are messy and add a lot of unnecessary complexity and artificial limitations.
The thing is, if Mono and .NET were both equally "slow", then sure, let's do some HPC# tricks to get high performance, but it is not! Modern .NET is fast, but Unity devs cannot take advantage of it, which is frustrating.
By the way, the final trace with parallel workers was just C#'s workers threads and thread pool.
> Profiling the editor is always a fools errand
Maybe, but we (devs) spend 99% of our time in the editor. And perf gains from editor usually translate to the Release build with very similar percentage gains (I know this is generally not true, but in my experience it is). We have done many significant optimizations before and measurements from the editor were always useful indicator.
What is not very useful is Unity's profiler, especially with "deep profile" enabled. It adds constant cost per method, highly exaggerating cost of small methods. So we have our own tracing system that does not do this.
> I've seen a lot of mention around GC through this comment section, and professional Unity projects tend to go out of their way to minimize these at runtime
Yes, minimizing allocations is key, but there are many cases where they are hard to avoid. Things like strings processing for UI generates a lot of garbage every frame. And there are APIs that simply don't have an allocation-free options. CoreCLR would allow to further cut down on allocations and have better APIs available.
Just the fact that the current GC is non-moving means that the memory consumption goes up over time due to fragmentation. We have had numerous reports of "memory" leaks where players report that after periodic load/quit-to-menu loops, memory consumption goes up over time.
Even if we got fast CoreCLR C# code execution, these issues would prevail, so improved CG would be the next on the list.
[1] https://steamdb.info/stats/releases/?tech=SDK.UnityIL2CPP
Hey there, always appreciate a dialog
Per the separation, I think this was far more common both in older unity games, and also professional settings.
For games shipping on mono on steam, that statistic isn't surprising to me given the amount of indie games on there and Unity's prevalence in that environment. My post in general can be read in a professional setting (ie, career game devs). The IL injection is a totally reasonable consideration, but does (currently) lock you out of platforms where AoT is a requirement. You can also support mods/DLC via addressables, and there has been improvement of modding tools for il2cpp, however you're correct it's not nearly as easy.
Going to completely disagree that Burst and HPC# are unnecessary and messy. This is for a few reasons. The restrictions that HPC# enforce essentially are the same you already have if you want to write performant C# code as you just simply use Unity's allocators for your memory up front and then operate on those. Depending on how you do this, you either can eliminate your per frame allocations, or likely eliminate some of the fragmentation you were referring to. Modern .Net is fast, of course, but it's not burst compiled HPC# fast. There are so many things that the compiler and LLVM can do based on those assumptions. Agreed C# strings are always a pain if you actually need to interpolate things at runtime. We always try to avoid these as much as we can, and intern common ones.
The fragmentation you mention on after large operations is (in my experience) indicative of save/load systems, or possibly level init code that do tons of allocations causing that to froth up. That or tons of reflection stuff, which is also usually nono for runtime perf code. The memory profiler used to have a helpful fragmentation view for that, but Unity removed it unfortunately.
>We don't use IL2CPP because we use many features that are not compatible with it. For example DLC and mods loading at runtime via DLLs, reflection for custom serialization, things like [FieldOffset] for efficient struct packing and for GPU communication, etc.
FieldOffset is supported by IL2CPP at compile time [0]. You can also install new DLLs and force the player to restart if you want downloadable mod support.
It's true that you can't do reflection for serialization, but there are better, more performant alternatives for that use case, in my experience.
[0] https://docs.unity3d.com/Manual/scripting-restrictions.html