Yes, one could potentially increase accuracy greatly. One big problem would be occlusion.
There is already a solution to this that would be very hard to beat (and one can choose whether or not to use an LLM to assist): prepare the food yourself and use the information provided by the manufacturer.
If you consider time at all, what you suggest is hardly a solution. It is the most accurate, but even 50% accuracy that is orders of magnitude faster to calculate would be more useful for the main use case, which is losing weight.
However, for diabetes, accuracy is likely preferred, and I’m not sure any computer vision approach would be palatable.