The title seems to be clickbait (the 13 foods in the paper didn't even have ranges wide enough for such a title to be possible), but the results and the paper itself are much more on point.
It'd be really interesting if they had evaluated humans on the exact same image sets. The correct answer is just to feed in more data, such as the exact food itself, but the post makes it sound like using a model is the only risk in this approach to counting carbs.