I am currently experimenting with Linux-based GUIs. They always felt clunky to me, but with more insight I can see they are clunky for a reason. If you need more than a framebuffer, rendering something sophisticated to the screen is insanely complex. Somehow it's easy to expect that rendering text on a screen should be easy, but when you go down the layers you find yourself with a club and a flint stone, trying to build a castle.
Wayland is another product of these hardships. Going Wayland-native seems feasible only when all the stars align around it, and then you are stuck in that place.
That being said, and without deeper knowledge of SwiftUI, I find it a bit odd to expect so much from a novel concept. Native desktop dev is already kind of niche, considering the dominance of web dev. Chrome (and its artifacts) is probably the best-funded software in the world, and Google has every incentive to keep improving it. It's not a miracle that it just works. It's effort and tons of cash.
> Somehow it's easy to expect that rendering text on a screen should be easy
This is a common misconception among programmers, and it is actually the opposite of the truth. Drawing arbitrary geometric shapes is easy; rendering text correctly is insanely difficult because ... humans.
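To make the "because ... humans" part concrete, here is a small sketch using only Python's standard `unicodedata` module. It shows just one slice of the problem: the same visible "é" can be stored in two different ways, and Latin and Hebrew letters carry different text directions. The helper `count_base_characters` is a hypothetical name of mine and only a rough proxy; real text stacks use full grapheme segmentation, shaping, and bidi algorithms, none of which is shown here.

```python
import unicodedata

composed = "\u00e9"     # "é" stored as a single code point
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

def count_base_characters(text: str) -> int:
    """Count code points that are not combining marks.

    Hypothetical helper: a rough stand-in for "how many characters a
    human sees", not a real grapheme-cluster segmenter.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)

# Both strings look identical on screen, yet differ in length and equality.
print(len(composed), len(decomposed))
print(composed == decomposed)

# Normalizing to NFC makes the two forms compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)

# The combining accent does not occupy its own slot.
print(count_base_characters(decomposed))

# Latin "a" is left-to-right ("L"); Hebrew alef is right-to-left ("R").
print(unicodedata.bidirectional("a"), unicodedata.bidirectional("\u05d0"))
```

A rectangle has none of these problems: its corners mean the same thing in every language.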