AFAIK this is a non-issue with modern text rendering engines. Modern font files include rulesets that determine the glyph forms (in OpenType, the GSUB and GPOS tables), and shaping engines apply these rules to arrive at the desired "shape", i.e. which glyphs to render, in what order, and at what positions. For example, if you use HarfBuzz, it should be able to calculate the glyphs and offsets you need for a properly set script.
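To make "glyphs and offsets" a bit more concrete, here is a rough sketch of what you do with a shaper's output once you have it. The field names mirror what HarfBuzz reports per glyph (glyph ID, cluster, advances, offsets), but the `ShapedGlyph` class, the `layout` function and all the numbers are made up for illustration. No real font or HarfBuzz call is involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShapedGlyph:
    """One record of (hypothetical) shaper output, in font units."""
    glyph_id: int   # index into the font's glyph table, not a Unicode code point
    cluster: int    # index of the source character this glyph came from
    x_advance: int  # how far the pen moves horizontally after this glyph
    y_advance: int  # how far the pen moves vertically after this glyph
    x_offset: int   # horizontal shift applied to this glyph only
    y_offset: int   # vertical shift applied to this glyph only


def layout(glyphs, origin=(0, 0)):
    """Turn shaper output into (glyph_id, x, y) draw positions."""
    pen_x, pen_y = origin
    placed = []
    for g in glyphs:
        # The offset moves the glyph relative to the pen but does not move the pen.
        placed.append((g.glyph_id, pen_x + g.x_offset, pen_y + g.y_offset))
        # The advance moves the pen to where the next glyph starts.
        pen_x += g.x_advance
        pen_y += g.y_advance
    return placed


# Assumed example data: a base glyph, a combining mark attached to it, then another base glyph.
shaped = [
    ShapedGlyph(glyph_id=68, cluster=0, x_advance=600, y_advance=0, x_offset=0, y_offset=0),
    ShapedGlyph(glyph_id=301, cluster=0, x_advance=0, y_advance=0, x_offset=-350, y_offset=120),
    ShapedGlyph(glyph_id=69, cluster=1, x_advance=580, y_advance=0, x_offset=0, y_offset=0),
]

for glyph_id, x, y in layout(shaped):
    print(glyph_id, x, y)
```

The point is that the renderer never has to know anything about the script: it just walks the list, draws each glyph at pen plus offset, and moves the pen by the advance. The mark has a zero advance and a negative offset, so it lands on top of the base glyph without pushing the following glyph along.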
I personally spent way too much time trying to understand it, but at least according to this video (https://www.youtube.com/watch?v=VaA0v0V4RsU), it really is not that difficult if you leave out all the font-selection and emoji shenanigans.
I think FreeType (glyph rendering) and HarfBuzz (text shaping), at least, make it look needlessly complex through their documentation. The docs are extensive in describing what each part does, but the only way to figure out which parts you actually need is by fiddling around. As soon as you want to do anything more complex, you're on your own. Figuring out which parts you don't need is especially annoying.