This seems like very low-hanging fruit. How is the core loop not already hyper-optimized?
I'd have expected it to be hand-rolled assembly for the major ISAs, with a C fallback for less common ones.
How much energy has been wasted worldwide because of a relatively unoptimized interpreter?
Python’s goal has never really been to be fast. If that were its goal, it would’ve had a JIT long ago instead of toying with optimizing the interpreter. Guido prioritized code simplicity over speed. A lot of the speed improvements, including the JIT (PEP 744 – JIT Compilation), came about after he stepped down.
This is (a) wildly beyond expectations for open source, (b) a massive pain to maintain, and (c) not even the biggest time-waster in Python, which is the packaging "system".
Probably because anyone concerned with performance wasn’t running workloads on Windows to begin with.
Software has gotten so slow we've forgotten how fast computers are
If you want fast, just use PyPy and forget about CPython.
Quite the contrary: I'd say this update is evidence of the inner loop being hyper-optimized!
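For context, on compilers that support it the portable-C core loop already uses computed-goto dispatch, which is about as tuned as portable C gets. A minimal sketch of that pattern (toy bytecode for illustration, not CPython's actual code):

    /* Minimal computed-goto dispatch sketch (GCC/Clang labels-as-values
       extension). Toy opcodes for illustration only -- not CPython's. */
    #include <stdint.h>
    #include <stdio.h>

    enum { OP_INC, OP_ADD_IMM, OP_HALT };

    static int64_t run(const uint8_t *ip) {
        /* One label per opcode; the table maps opcode -> handler address. */
        static const void *dispatch[] = { &&op_inc, &&op_add_imm, &&op_halt };
        int64_t acc = 0;

    /* Each handler ends with its own indirect jump to the next handler,
       which predicts better than one shared switch at the top of a loop. */
    #define NEXT() goto *dispatch[*ip]
        NEXT();
    op_inc:
        acc += 1; ip += 1; NEXT();
    op_add_imm:
        acc += ip[1]; ip += 2; NEXT();
    op_halt:
        return acc;
    #undef NEXT
    }

    int main(void) {
        const uint8_t code[] = { OP_INC, OP_ADD_IMM, 40, OP_HALT };
        printf("%lld\n", (long long)run(code));  /* prints 41 */
        return 0;
    }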
MSVC's support for musttail is hot off the press:
> The [[msvc::musttail]] attribute, introduced in MSVC Build Tools version 14.50, is an experimental x64-only Microsoft-specific attribute that enforces tail-call optimization. [1]
MSVC Build Tools version 14.50 was released last month, and it only took a few weeks for the CPython crew to turn that around into a performance improvement.
[1] https://learn.microsoft.com/en-us/cpp/cpp/attributes?view=ms...
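To make that concrete, here's a rough sketch of dispatch in the musttail style: the same toy bytecode as above, but every handler is its own function and ends by tail-calling the next one, and the attribute forces the compiler to emit a jump rather than a call. The MUSTTAIL macro and opcode names are my own illustration, not CPython's actual interpreter code.

    /* Sketch of tail-call (musttail) dispatch. Illustrative only. */
    #include <stdint.h>
    #include <stdio.h>

    #if defined(_MSC_VER)
    #  define MUSTTAIL [[msvc::musttail]]       /* MSVC 14.50+, x64 only */
    #elif defined(__clang__)
    #  define MUSTTAIL __attribute__((musttail))
    #else
    #  define MUSTTAIL /* plain tail call; optimization not guaranteed */
    #endif

    typedef struct VM VM;
    typedef void (*Handler)(VM *vm, const uint8_t *ip);

    struct VM {
        int64_t acc;           /* toy accumulator */
        const Handler *table;  /* dispatch table indexed by opcode */
    };

    /* Every handler ends with a guaranteed tail call to the next handler,
       so the "loop" never grows the stack and each opcode keeps its own
       indirect-branch site. */
    #define DISPATCH(vm, ip) MUSTTAIL return (vm)->table[*(ip)]((vm), (ip))

    enum { OP_INC, OP_ADD_IMM, OP_HALT };

    static void op_inc(VM *vm, const uint8_t *ip) {
        vm->acc += 1;
        ip += 1;
        DISPATCH(vm, ip);
    }

    static void op_add_imm(VM *vm, const uint8_t *ip) {
        vm->acc += ip[1];
        ip += 2;
        DISPATCH(vm, ip);
    }

    static void op_halt(VM *vm, const uint8_t *ip) {
        (void)ip;
        printf("acc = %lld\n", (long long)vm->acc);
    }

    int main(void) {
        static const Handler table[] = { op_inc, op_add_imm, op_halt };
        const uint8_t code[] = { OP_INC, OP_ADD_IMM, 40, OP_HALT };
        VM vm = { 0, table };
        vm.table[code[0]](&vm, code);  /* prints acc = 41 */
        return 0;
    }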