Hacker News

harperlee today at 1:01 PM

There is a lot of hate in the comments but there is some merit to the post existing:

  1. Even if the task is unreasonable, it is good to showcase that the LLM performs poorly, as a warning that it should not be used for diabetes.

  2. As it is a probabilistic model, the approach was to execute it multiple times and look at the distribution. They also tried to minimize variance; the post mentions: "All at the lowest randomness setting these models offer." Yet the variance of the responses is surprising.

  3. A multimodal LLM should in general be able to discriminate between crema catalana and a cheese sandwich, and provide a textual, uncalculated range of how many calories the item has (the internet is full of calorie-counting tables and pages such as this one: https://fitia.app/calories-nutritional-information/cheese-sandwich-1205647).

  4. It is not clear whether the surprised/outraged "exposé" style is just a communication vehicle or whether the author really thought that, e.g., LLMs could hypothetically provide confidence estimates.
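
To make point 2 concrete, here is a minimal sketch of the "run it many times and look at the distribution" approach. It is not the post's actual code: `summarize_runs` and `fake_calorie_estimate` are hypothetical names, and the Gaussian stand-in (mean 350, standard deviation 60) is invented purely for illustration, not taken from any model or from the post.

```python
import random
import statistics

def summarize_runs(estimate, n_runs=20):
    """Call `estimate` n_runs times and summarize the spread of its outputs."""
    samples = [estimate() for _ in range(n_runs)]
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),  # sample standard deviation
    }

def fake_calorie_estimate(rng):
    # Hypothetical stand-in for one model call on the same photo.
    # The numbers here are made up; a real test would call the model instead.
    return rng.gauss(350, 60)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
summary = summarize_runs(lambda: fake_calorie_estimate(rng))
print(summary)
```

A wide gap between `min` and `max` for an identical input is the kind of spread the post reports; a fully deterministic estimator would give a `stdev` of 0.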

Replies

bcjdjsndon today at 2:19 PM

Re: 2... I think it's interesting that they add arbitrary randomness to the algorithm. Without it, the problem of wildly varying outputs for the same input wouldn't exist in the first place.
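
A toy sketch of where that randomness usually enters, assuming standard temperature sampling over token scores: `sample_token` and the three logits are made up for illustration. It says nothing about how any particular vendor implements its lowest randomness setting.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick an index from `logits`; temperature 0 means plain argmax (no randomness)."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]  # unnormalized softmax
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [2.0, 1.5, 0.3]  # made-up scores for three candidate tokens
rng = random.Random(0)

# Distinct indices chosen over 100 draws at each setting.
greedy = {sample_token(logits, 0, rng) for _ in range(100)}
sampled = {sample_token(logits, 1.0, rng) for _ in range(100)}
print(greedy, sampled)
```

In this toy model the greedy setting always returns the same index, while sampling at temperature 1.0 spreads across several.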
