Do you see these trajectories being used to fine-tune a model automatically in some way, rather than just being replayed, so that similar workflows might be improved too?
I believe explicit trajectories for learned behavior are significantly easier for humans to grok and debug than reinforcement learning methods like deep Q-learning, so avoiding models where possible is ideal, though I imagine they'll have their place.
For what that may look like, I'll reuse a brainstorm on this topic that a friend gave me recently:
"Instead of relying on an LLM to understand where to click, the click area itself is the token. And the click is a token and the objective is a token and the output is whatever. Such that, click paths aren't "stored", they're embedded within the training of the LAM/LLM"
Whatever it ends up looking like, as long as it gets the job done and remains debuggable and extensible enough to not immediately eject a user once they hit any level of complexity, I'd be happy for it to be a part of Muscle Mem.