I'm the founder and CEO of General Agents. Happy to answer questions!
Hi! Looks pretty interesting - few questions/thoughts:
1. Could you talk a bit more about your behavioral training? If ace-control is trained on behavioral recordings, would it choose the most efficient path for the agent to take to complete a task? I'm guessing humans naturally take less-optimal steps.
2. What causes the huge speed increase? I'm guessing there were a lot of optimizations made, especially since this behavioral training seems very different from vision models. I'm guessing the model is smaller, so it's interesting that accuracy is highest. I'd be interested to see a comparison vs. 4o-mini.
3. Would be neat for it to handle instructions offline/locally - like "connect me to wifi" ;)
4. Would be cool if the agent could work in the background so I can do something else in the meantime. ;)
How does it perform on e.g. WebVoyager, WebArena, or OSWorld? These seem to be the oft-cited benchmarks when comparing computer-use agents.
First, I am extremely impressed by the demo. It looks truly groundbreaking.
Could you elaborate on the types of tasks and data sources used to train Ace, and how these contribute to its performance on desktop automation?
Ace is said to outperform other models on your suite of computer use tasks. Can you provide more details on these benchmarks and how Ace compares to existing automation tools?
Amazing performance! Do you anticipate making the model available for commercial use or are you primarily focused on releasing agents built upon it?
From your site, "Ace works like we do—performing mouse clicks and keystrokes based on the screen and prompt—trained with <3 by our team of software specialists and domain experts on over a million tasks."
Is there a way to train or augment training on applications you've never seen before? We have a bunch of custom Java applications that we use in finance, curious about some additional automation.