Someone • last Thursday at 10:17 PM

FTA: In our "Mobile Actions" evaluation, fine-tuning transformed the model’s reliability, boosting accuracy from a 58% baseline to 85%. This confirms that for edge agents, a dedicated, trained specialist is an efficient path to production-grade performance.

I would be wary of having an LLM with 85% accuracy call tools on my system. Isn’t that fairly far away from production-grade performance?

I also don’t see how boosting accuracy from 58% to 85% is any indication that it can be boosted further.

Replies

all2 • last Thursday at 11:50 PM

There are ways around this. You can push the success rate close to 100% by combining chain of thought with quorum selection: sample the model several times and act only on the answer most of the runs agree on. It isn't great, and it slows response times, but if 85% isn't good enough, you just need to flip the coin about 5 times to get nearly(!) guaranteed results.
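For a rough sense of the numbers, here is a minimal sketch of the quorum idea above. It assumes each run is independently correct with probability 0.85, which is optimistic because real model errors tend to be correlated. The names `majority_success_probability` and `quorum_vote` are made up for illustration and don't come from the article.

```python
from collections import Counter
from math import comb


def majority_success_probability(p: float, n: int) -> float:
    """Probability that a strict majority of n independent runs is correct.

    Assumes every run is correct with the same probability p and that
    errors are independent (a strong assumption for a single model).
    """
    need = n // 2 + 1  # smallest number of correct runs that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


def quorum_vote(answers, quorum):
    """Return the most common answer if at least `quorum` runs agree, else None."""
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count >= quorum else None


if __name__ == "__main__":
    for n in (1, 3, 5, 7):
        print(n, round(majority_success_probability(0.85, n), 4))
    # Hypothetical tool-call strings standing in for five sampled model outputs.
    runs = ["open_wifi", "open_wifi", "open_bt", "open_wifi", "open_wifi"]
    print(quorum_vote(runs, quorum=3))
```

Under those assumptions, 5 runs land at roughly 97% rather than 100%, which is where the "nearly(!)" comes from. Returning `None` when no quorum is reached lets the caller refuse to call the tool instead of guessing.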
