This is... not an example of good optimization.
Focusing on micro-"optimizations" like this one does absolutely nothing for performance (how many times are you actually calling Instance() per frame?), and it skips the absolutely mandatory PROFILE BEFORE YOU OPTIMIZE rule.
If a coworker asked me to review this CL, my comment would be "Why are you wasting both my time and yours?"
Meyers' implementation (with a block-scope static variable) is elegant and thread-safe, and it retains lazy initialization, which can be important for initialization order. https://www.modernescpp.com/index.php/thread-safe-initializa...
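For anyone who hasn't seen it, here is a minimal sketch of the Meyers pattern, assuming C++11 or later. `Config` and the `constructions` counter are hypothetical, added only to make the run-once behavior observable; this is not code from the article.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical class; the counter exists only to show that the
// constructor runs exactly once.
class Config {
public:
    static Config& Instance() {
        // Block-scope static: constructed the first time control reaches
        // this line. Since C++11, concurrent first callers wait until that
        // single initialization finishes. GCC/Clang implement this with a
        // guard check backed by __cxa_guard_acquire/release.
        static Config instance;
        return instance;
    }

    Config(const Config&) = delete;
    Config& operator=(const Config&) = delete;

    static std::atomic<int> constructions;

private:
    Config() { ++constructions; }
};

std::atomic<int> Config::constructions{0};

// Usage: repeated calls return the same object.
inline void ConfigUsage() {
    Config& a = Config::Instance();
    Config& b = Config::Instance();
    assert(&a == &b);
}
```

If nothing calls `Instance()` during startup, `constructions` stays at 0 until the first call and is 1 from then on, which is the lazy-init property the comment refers to.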
I haven't written C++ in a long time, but isn't the issue here that the initialization order of globals in different translation units is unspecified? Lazy initialization avoids that problem at very modest cost.
I liked using singletons back in the day, but now I simply make a struct with static members which serves the same purpose with less verbose code. Initialization order doesn't matter if you add one explicit (and also static) init function, or a lazy initialization check.
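Roughly what that looks like, as a sketch under my own assumptions: `AppState`, its members, and the `Init()` signature are made up, and `Init()` is assumed to be called from `main()`, so it runs after all static initialization has finished. The members are plain `bool`/`const char*`/`int` with literal initializers, so they are constant-initialized and have no ordering problem of their own.

```cpp
#include <cassert>

// Hypothetical names. Everything is a static member, so there is no
// instance to manage.
struct AppState {
    static bool initialized;
    static const char* logPath;
    static int verbosity;

    // Explicit init, called once from main() after static initialization
    // is complete, so cross-TU order never comes into play.
    static void Init(const char* path, int level) {
        logPath = path;
        verbosity = level;
        initialized = true;
    }

    // Accessors check the flag. A lazy variant would call Init() with
    // defaults here instead of asserting.
    static int Verbosity() {
        assert(initialized && "AppState::Init() must run first");
        return verbosity;
    }
};

// Constant-initialized, so these values are in place before any dynamic
// initialization runs in any translation unit.
bool AppState::initialized = false;
const char* AppState::logPath = nullptr;
int AppState::verbosity = 0;
```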
Honestly the guard overhead is a non-issue in practice — it's one atomic check after first init. The real problem with the static data member approach is initialization order across translation units. If singleton A touches singleton B during startup you get fun segfaults that only show up in release builds with a different link order.
I ended up using std::call_once for those cases. More boilerplate but at least you're not debugging init order at 2am.
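A minimal sketch of the `std::call_once` version, assuming C++11 and a hypothetical `Registry` class. Both `std::once_flag` and a default-constructed `std::unique_ptr` have constexpr constructors, so these static members are constant-initialized and don't themselves depend on cross-TU order.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical singleton built on std::call_once instead of a block-scope
// static. More boilerplate, but construction is explicit and still lazy.
class Registry {
public:
    static Registry& Instance() {
        // Exactly one caller runs the lambda; the others wait for it.
        // Plain `new` because std::make_unique can't reach the private
        // constructor.
        std::call_once(flag_, [] { instance_.reset(new Registry()); });
        return *instance_;
    }

    int lookups = 0;

private:
    Registry() = default;
    static std::once_flag flag_;
    static std::unique_ptr<Registry> instance_;
};

std::once_flag Registry::flag_;
std::unique_ptr<Registry> Registry::instance_;

// Usage: every caller sees the same instance.
inline void RegistryUsage() {
    Registry::Instance().lookups += 1;
    assert(&Registry::Instance() == &Registry::Instance());
}
```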
It is strange to use lightdm and gdm as examples, which are both written in C (if nothing has changed recently).
I am not sure why this entire article is warranted :o) Just use `std::call_once` and you are all set.
Nice breakdown. I’m curious how often the guard check for a function-local static actually shows up in real profiles. In most codebases Instance() isn’t called in tight loops, so the safety of lazy initialization might matter more than a few extra instructions. Has anyone run into this being a real bottleneck in practice?
The performance observation is real but the two approaches are not equivalent, and the article doesn't mention what you're actually trading away, which is the part that matters.
The C++11 thread-safety guarantee on static initialization is explicitly scoped to block-scope statics. That's not an implementation detail; that's the guarantee.
The __cxa_guard_acquire/release machinery in the assembly is the standard fulfilling that contract. Move to a private static data member and you're outside that guarantee entirely. You've quietly handed that responsibility back to yourself.
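For contrast, a sketch of the static-data-member variant being discussed, using a hypothetical `Metrics` class. `Instance()` just returns an address with no guard, but the object is constructed at startup as part of dynamic initialization, whether or not anyone ever calls `Instance()`. (The standard lets an implementation defer that initialization; mainstream toolchains run it before `main()`, which the usage check below assumes.)

```cpp
#include <cassert>

// Hypothetical class: the static-data-member singleton.
class Metrics {
public:
    // No guard variable: the object already exists (or, if another TU
    // calls this during its own static initialization, may not yet exist).
    static Metrics& Instance() { return instance_; }

    static int constructions;

private:
    Metrics() { ++constructions; }
    static Metrics instance_;
};

int Metrics::constructions = 0;  // constant-initialized, set before any dynamic init
Metrics Metrics::instance_;      // dynamic init at startup, unconditionally

// Usage: on mainstream toolchains the constructor has already run by the
// time main() gets here, even though nothing has called Instance() yet.
inline void MetricsUsage() {
    assert(Metrics::constructions == 1);
    assert(&Metrics::Instance() == &Metrics::Instance());
}
```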
Then there's the static initialization order fiasco, which is the whole reason the Meyers singleton with a local static became canonical. A block-local static initializes on first use: lazily, deterministically, and thread-safely. A static data member initializes at startup, in an order that is unspecified across translation units. If code in another TU calls Instance() during its own static initialization, you're in UB territory. The article doesn't mention this.
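A single-file sketch of why construct-on-first-use sidesteps the fiasco. `PluginRegistry` and the registration variables are hypothetical. In a real program the `registeredA`/`registeredB` initializers would live in other translation units, which one snippet can't show, so the comments carry the point rather than the file layout.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical registry that other TUs populate during static init.
class PluginRegistry {
public:
    static PluginRegistry& Instance() {
        static PluginRegistry instance;  // built on first use, even mid static init
        return instance;
    }
    bool Register(const std::string& name) {
        names_.push_back(name);
        return true;
    }
    std::size_t Count() const { return names_.size(); }

private:
    PluginRegistry() = default;
    std::vector<std::string> names_;
};

// Namespace-scope initializers that touch the singleton during static
// initialization. With a block-scope static this is safe in any TU order.
// If Instance() returned a static data member defined in another TU, these
// could run before that member's constructor: the fiasco, and UB.
static bool registeredA = PluginRegistry::Instance().Register("a");
static bool registeredB = PluginRegistry::Instance().Register("b");

// Usage: both registrations happened before main() started.
inline void PluginUsage() {
    assert(registeredA && registeredB);
    assert(PluginRegistry::Instance().Count() == 2);
}
```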
Real-world singleton designs also need deferred/configuration-driven initialization, optional instantiation, state recycling, and controlled teardown. A block-local static keeps those doors open. A static data member initializes unconditionally at startup: you've lost lazy init, you've lost the option not to initialize it, and configuration-based instantiation becomes awkward by design.
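One way to get those properties while still leaning on a block-local static, as a sketch rather than a recommendation: the static is only an empty `std::unique_ptr` slot, and the object is created by an explicit `Init()`. `Database`, its connection-string parameter, and the assumption that `Init()`/`Shutdown()` run single-threaded during startup and shutdown are all mine, not from the article.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical service. The block-scope static is just a slot, so it is
// created lazily and thread-safely and is immune to cross-TU ordering; the
// object itself exists only between Init() and Shutdown().
// Assumes Init()/Shutdown() are not called concurrently with each other or
// with Instance(); add a mutex if that does not hold.
class Database {
public:
    static void Init(std::string connection) {  // deferred, config-driven
        assert(!Slot() && "Database::Init() called twice");
        Slot().reset(new Database(std::move(connection)));
    }
    static bool IsInitialized() { return Slot() != nullptr; }  // optional instantiation
    static Database& Instance() {
        assert(Slot() && "Database::Init() was never called");
        return *Slot();
    }
    // Controlled teardown; Init() may run again afterwards (recycling).
    // References previously returned by Instance() dangle after this.
    static void Shutdown() { Slot().reset(); }

    const std::string& Connection() const { return connection_; }

private:
    explicit Database(std::string connection) : connection_(std::move(connection)) {}

    static std::unique_ptr<Database>& Slot() {
        static std::unique_ptr<Database> slot;  // block-scope static, starts empty
        return slot;
    }

    std::string connection_;
};
```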
Honestly, if you're bottlenecking on singleton access, that's a design smell worth addressing, not the guard variable.
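In the rare case where it does show up in a profile, the usual cheap fix is to resolve the singleton once and pass it down rather than calling `Instance()` inside the loop. A sketch with a hypothetical `Scale` singleton and made-up function names:

```cpp
#include <cassert>
#include <vector>

// Hypothetical singleton standing in for whatever the hot path depends on.
class Scale {
public:
    static Scale& Instance() {
        static Scale instance;
        return instance;
    }
    double factor = 2.0;

private:
    Scale() = default;
};

// The hot loop takes the dependency explicitly, so it never touches
// Instance() or its guard check.
void ApplyScale(std::vector<double>& values, const Scale& scale) {
    for (double& v : values) v *= scale.factor;
}

// Call site: resolve the singleton once, outside the loop.
void ScaleAll(std::vector<double>& values) {
    ApplyScale(values, Scale::Instance());
}

// Usage: 1.0 and 2.5 are scaled by the default factor of 2.0.
inline void ScaleUsage() {
    std::vector<double> values{1.0, 2.5};
    ScaleAll(values);
    assert(values[0] == 2.0 && values[1] == 5.0);
}
```

Passing the reference also makes the dependency visible in the signature, which tends to be the real design improvement.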