logoalt Hacker News

bfleschlast Thursday at 4:52 PM2 repliesview on HN

When ChatGPT cites web sources in it's output to the user, it will call `backend-api/attributions` with the URL and the API will return what the website is about.

Basically it does HTTP request to fetch HTML `<title/>` tag.

They don't check length of supplied `urls[]` array and also don't check if it contains the same URL over and over again (with minor variations).

It's just bad engineering all around.


Replies

bentcornerlast Thursday at 8:25 PM

Slightly weird that this even exists - shouldn't the backend generating the chat output know what attribution it needs, and just ask the attributions api itself? Why even expose this to users?

show 1 reply
JohnMakinlast Thursday at 7:08 PM

Even if you were unwilling to change this behavior on the application layer or server side, you could add a directive in the proxy to prevent such large payloads from being accepted as an immediate mitigation step, unless they seriously need that parameter to have unlimited number of urls in it (guessing they have it set to some default like 2mb and it will break at some limit, but I am afraid to play with this too much). Somehow I doubt they need that? I don't know though.

show 1 reply