That's not the best solution for image or video (or audio, or 3D) any more than it is for LLMs (which it also supports).
OTOH, it's the most flexible and likely to have some support for what you are doing for a lot of those, especially if you are combining several of them in the same process.
Yes, "best" is subjective and that’s why I put it in quotes. But in the community it’s definitely seen as something users should and do "upgrade" to from less intimidating but less flexible tools if they want the most power, and most importantly, support for bleeding-edge models. I rarely use Comfy myself, FWIW.