
cloudbonsai, today at 9:56 AM

> There is no caching of a "utf-8 representation".

No, there certainly is. This is stated in the official C API documentation:

    UTF-8 representation is created on demand and cached in the Unicode object.

    https://docs.python.org/3/c-api/unicode.html#unicode-objects

In particular, Python's Unicode object (PyUnicodeObject) contains a field named utf8. This field is populated when PyUnicode_AsUTF8AndSize() is first called and reused thereafter. You can check the exact code I'm talking about here:

https://github.com/python/cpython/blob/main/Objects/unicodeo...
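
If you would rather observe this than read the C source, here is a rough sketch that calls PyUnicode_AsUTF8AndSize() twice on the same string through ctypes and compares the returned addresses. It assumes CPython (ctypes.pythonapi is not available elsewhere), and the helper name utf8_pointer and the sample string are mine, purely for illustration. A matching address on the second call is what you would expect from a cached buffer; it is an observation, not a proof.

```python
import ctypes

# Bind the C API function through ctypes. This only works on CPython.
_as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8AndSize
_as_utf8.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
# Ask for the raw address instead of letting ctypes copy it into bytes.
_as_utf8.restype = ctypes.c_void_p


def utf8_pointer(s):
    """Hypothetical helper: return (address, length) of s's UTF-8 buffer."""
    size = ctypes.c_ssize_t()
    addr = _as_utf8(s, ctypes.byref(size))
    return addr, size.value


# A non-ASCII string, so the UTF-8 form differs from the internal storage.
text = "caf\u00e9 \u76c6\u683d"

first_addr, first_size = utf8_pointer(text)    # buffer is created here
second_addr, second_size = utf8_pointer(text)  # expected to be reused here

# Read the bytes back from the returned address to confirm the contents.
encoded = ctypes.string_at(first_addr, first_size)

print(first_addr == second_addr)
print(encoded == text.encode("utf-8"))
```

I picked a non-ASCII string on purpose: for pure ASCII strings the UTF-8 bytes are identical to the internal storage, so there is no separate buffer to create.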

Is it clear enough?