With mass information you could infer much more from pictures. With some sort of standard reference cube in the frame, and a picture taken at an angle that shows all three dimensions, you could also better estimate the relative volume.
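A minimal sketch of the reference-cube idea, under loud assumptions: the food is treated as a simple box, the cube sits at the same distance from the camera as the food, and perspective distortion is ignored. The function name and all pixel measurements are hypothetical.

```python
def estimate_volume_cm3(cube_edge_cm, cube_edge_px, width_px, depth_px, height_px):
    """Rough box-shaped volume estimate using a reference cube for scale.

    Assumption: one scale factor (cm per pixel) applies to the whole frame,
    which only holds if the cube and the food are at a similar distance.
    """
    cm_per_px = cube_edge_cm / cube_edge_px
    width_cm = width_px * cm_per_px
    depth_cm = depth_px * cm_per_px
    height_cm = height_px * cm_per_px
    return width_cm * depth_cm * height_cm


# Hypothetical measurements: a 2 cm cube spans 40 px in the image.
volume = estimate_volume_cm3(2.0, 40.0, 200.0, 120.0, 60.0)
print(f"{volume:.0f} cm^3")
```

Volume alone still is not mass: you would need a density for each food, which is why a scale reading helps so much.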
It’s tractable I think, but not from a pic alone.
Yes, one could potentially increase accuracy greatly. One big problem would be occlusion: whatever is hidden under or behind other food never shows up in the picture.
There is already a solution to this that would be very hard to beat (with or without an LLM to assist): prepare the food yourself and use the nutrition information provided by the manufacturer.
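As a rough sketch of that approach, assuming you weigh each ingredient and the label gives energy per 100 g (the function name and the numbers below are made up for illustration):

```python
def total_kcal(ingredients):
    """Sum energy for a list of (grams, kcal_per_100g) pairs.

    Assumption: kcal_per_100g is read off the manufacturer's label and
    grams comes from a kitchen scale.
    """
    return sum(grams * kcal_per_100g / 100.0 for grams, kcal_per_100g in ingredients)


# Hypothetical meal: 150 g of one ingredient at 130 kcal/100 g,
# 100 g of another at 165 kcal/100 g.
meal = [(150.0, 130.0), (100.0, 165.0)]
print(f"{total_kcal(meal):.0f} kcal")
```

No image analysis is involved, so occlusion and volume guessing drop out entirely; the remaining error is whatever the scale and the label carry.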