Do you see these trajectories being used to fine-tune a model automatically in some way, rather than just being replayed, so that similar workflows might be improved too?
I believe explicit trajectories for learned behavior are significantly easier for humans to grok and debug than reinforcement learning methods like deep Q-learning, so avoiding models where possible is ideal, though I imagine they'll have their place.
For what that may look like, I'll reuse a brainstorm on this topic that a friend gave me recently:
"Instead of relying on an LLM to understand where to click, the click area itself is the token. And the click is a token and the objective is a token and the output is whatever. Such that, click paths aren't "stored", they're embedded within the training of the LAM/LLM"
Whatever it ends up looking like, as long as it gets the job done and remains debuggable and extensible enough to not immediately eject a user once they hit any level of complexity, I'd be happy for it to be a part of Muscle Mem.