Yeah, this tracks! The steps are basically 1) determine user facing elements 2) determine strings 3) map user facing elements to the string. (We use the ast and no llms for this)
The upside of this approach is that we get a lot of context for accurate translation. The other upside is that down the line we can pull off fully automatic translation, but as others have pointed out, this is more of a gimmick. We think it's cool but it's more like the cherry on top
Also, yeah, that pattern would make life infinitely easier. Most develors really should think like this already, and not mix user facing strings with strings for other logic. But from what ive seen, pre i18n, devs dont think like this. Someday...