logoalt Hacker News

mlsutoday at 6:59 PM1 replyview on HN

This piece is pretty ineffective. Not that I like the world of "AI", I probably share the author's opinion that its just another evolution in the bullshittification of the human experience.

But, the point of the article is not that you would implement an agent based vending machine business. Humans restock the machine because its a red-team exercise. As a red-team exercise it looks very effective.

> Why do you ever want to add a chatbot to a snack vending machine? The video states it clearly: the vending machine must be stocked by humans. Customers must order and take their snack by themselves. The AI has no value at all.

Like this is like watching the simpsons and being like "why are the people in the simpsons yellow? people in real life aren't yellow!!"

The point isn't to run a profitable vending machine, or even validate that an AI business agent could become profitable. The point is to conduct an experiment and gather useful information about how people can pwn LLMs.

At some level the red team guy at Anthropic understands that it is impossible by definition for models to be secure, so long as they accept inputs from the real world. Putting instructions into an LLM to tell it what to do is the equivalent of exposing an `eval()` to a web form: even if you have heuristics to check for bad input, you will eventually be pwned. I think this is actually totally intractable without putting constraints on the model from outside. You'll always need a human in the loop to pull the plug on the vending machine when it starts ordering playstations. The question is how do you improve that capability, and that is the anthropic red-team guy's job.


Replies

layer8today at 7:06 PM

> The point isn't to run a profitable vending machine, or even validate that an AI business agent could become profitable.

Having an AI run an organization autonomously is exactly the point of Andon Labs [0], who provided the system that WSJ tested.

[0] https://andonlabs.com/