Hacker News

rtgfhyuj, last Thursday at 9:29 PM

Why would it stop early? Any examples?



mickeyp, last Friday at 6:28 AM

Models just naturally arrive at the conclusion that they are done. TODO hints can help, but they are not infallible: Claude will stop and happily report that there's more work to be done and "you just say the word Mister and I'll continue". This is an RL problem where you have to balance the chance of an infinite loop (the model keeps thinking there's a little bit more to do when there is not) against the opposite failure, where it stops short of actual completion.

embedding-shape, last Thursday at 9:35 PM

Not all models are trained to follow long one-shot tasks on their own; many of them seem to prefer closer interaction with the user. You could always add another layer or abstraction above or below the model to work around it.
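For what it's worth, one way such an extra layer might look is an outer loop that re-prompts whenever the model stops with work still open, plus a hard cap on rounds so it cannot spin forever. This is only a rough sketch: `run_until_done`, `step` (standing in for one model call), and `remaining_todos` (a check done outside the model) are hypothetical names, not any particular tool's API.

```python
from typing import Callable, List


def run_until_done(
    step: Callable[[str], str],
    remaining_todos: Callable[[], List[str]],
    task: str,
    max_rounds: int = 5,
) -> bool:
    """Re-prompt until the TODO list is empty or the round budget runs out.

    `step` is a hypothetical stand-in for one model call (prompt in, reply out).
    `remaining_todos` is a hypothetical check performed outside the model.
    Returns True if the work finished within the budget, False otherwise.
    """
    prompt = task
    for _ in range(max_rounds):  # hard cap guards against an infinite loop
        step(prompt)
        todos = remaining_todos()
        if not todos:  # trust the external check, not the model's own "done"
            return True
        # The model stopped early: nudge it with what is still open.
        prompt = "Continue. Remaining items: " + "; ".join(todos)
    return False
```

The two failure modes map onto the two exits: the empty-TODO check catches stopping short, and `max_rounds` bounds the "always a little more to do" case. How to pick the cap, and how reliable the TODO check is, depends entirely on the setup.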
