LLMs read and write human code because humans have been reading and writing human code. The corpus of assembly problems is, in my estimate, too small for LLMs to learn to read and write assembly efficiently for common use cases.
I liken it to the problem of applying machine learning to hard video games (e.g. StarCraft). A model trained to mimic human strategies can be extremely effective, but machine learning on its own will not discover broadly effective strategies on a reasonable timescale.
Substitute "human theory, programming languages, and design patterns" for "human strategies" and the point should be clear.
But: could the ouroboric cycle of LLM use degrade the common strategies and design patterns we rely on into inexplicable blobs of assembly? Can LLMs improve at programming if humans do not advance the theory or invent new languages, patterns, and so on?
But StarCraft training did not stop at mimicking human strategies: AlphaStar was bootstrapped by imitating human replays, then trained with large-scale RL against a reward shaped around winning, which let non-human and eventually superhuman strategies emerge (such as worker oversaturation).
The current training loop for coding is RL as well, so a departure from human coding patterns is not unexpected (even if a departure from human coding structure would be, since that would require developing a new programming language).
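To make "reward shaped around winning" concrete for code, here is a minimal REINFORCE-style sketch in which the only reward signal is whether generated code passes its tests. The toy policy (a softmax over three hand-written candidate snippets) and the `run_tests` harness are illustrative assumptions, not how any production system is built; the shape of the loop (sample, execute, reward on outcome, update) is the point.

```python
# Toy outcome-reward RL for code generation (REINFORCE on a softmax policy).
# `candidates` and `run_tests` are hypothetical stand-ins: a real system would
# sample programs from a language model and run a large test harness.
import math
import random

candidates = [
    "def add(a, b): return a - b",   # buggy
    "def add(a, b): return a + b",   # correct
    "def add(a, b): return a * b",   # buggy
]
logits = [0.0] * len(candidates)
lr = 0.5

def run_tests(src: str) -> float:
    """Reward shaped purely around 'winning': all tests pass -> 1.0, else 0.0."""
    ns: dict = {}
    exec(src, ns)
    try:
        assert ns["add"](2, 3) == 5
        assert ns["add"](-1, 1) == 0
        return 1.0
    except AssertionError:
        return 0.0

def sample() -> int:
    """Draw a candidate index from the softmax over logits."""
    weights = [math.exp(l) for l in logits]
    r, acc = random.random() * sum(weights), 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1

baseline = 0.0
for step in range(200):
    i = sample()
    reward = run_tests(candidates[i])
    advantage = reward - baseline          # variance reduction
    baseline += 0.1 * (reward - baseline)  # running mean of reward
    # REINFORCE update: d log softmax(i) / d logit_j = 1[j == i] - p_j
    probs = [math.exp(l) for l in logits]
    total = sum(probs)
    for j in range(len(logits)):
        grad = (1.0 if j == i else 0.0) - probs[j] / total
        logits[j] += lr * advantage * grad

print("learned preference:", max(range(len(candidates)), key=lambda j: logits[j]))
```

Nothing in this reward references human style; the gradient only cares about passing. That is why drift away from human coding patterns is the expected outcome, not a surprise.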