
recursivecaveat today at 6:22 AM

These kinds of agents really do see the world through a straw. If you hand one a document, it doesn't have any context clues or external means of determining its veracity. Unless a board-meeting transcript is so self-evidently ridiculous that it can't be true, how is it supposed to know it's not real?


Replies

jstummbillig today at 10:36 AM

I don't think it's that different from what I observe in the humans I work with. Things that happen regularly (and that I have no reason to believe will change in the future):

1) Making the same bad decisions multiple times, having no recollection of it happening (or at least pretending to have none), and making no attempt to implement measures to prevent it from happening in the future

2) Trying to please people (I read it as: trying to avoid immediate conflict) over doing what's right

3) Shifting blame onto a party that, realistically, in the context of the work, bears no blame and whose handling should be considered part of the job (e.g. a patient being scared and acting irrationally)

Workaccount2 today at 4:03 PM

I think all the models are squeezed to hell and back in training to be servants of users. This, of course, is very favorable for using the models as a tool to help you get stuff done.

However, I have a deep uneasy feeling that the models will really start to shine in agentic tasks when we start giving them more agency. I'm worried that we will learn that the only way to get a superhuman vending-machine virtuoso is to make a model that can and will tell you to fuck off when you cross a boundary the model itself has created. You can extrapolate the potential implications of moving this beyond just a vending demo.

bobbylarrybobby today at 1:18 PM

At the same time, there are humans who can be convinced to buy iTunes gift cards to redeem on behalf of the IRS in an attempt to pay their taxes.