The surfaces are all of different size, so the code would have to be more complex, resizing some underlying buffer on demand. You'd have to split up the text rendering into an API to measure the text and an API to render the text, so that you could resize the buffer. So you'd introduce quite a lot of extra complexity.
And what would be the benefit? You save up to one malloc and free per string you want to render, but text rendering is so demanding it completely drowns out the cost of one allocation.
Why does the buffer need to be resized? Your malloc version allocates a fixed amount of memory on each iteration. You can allocate the same amount of memory ahead of time.
If you were dynamically changing the malloc allocation size on each iteration then you have a case for a growable buffer to do the same, but in that case you would already have all the complexity of which you speak as required to support a dynamically-sized malloc.