Hacker News

ninjalanternshk today at 3:27 AM

Yeah, this is about the worst way you could imagine to evaluate an AI model.

If you’d given it a real task you’d have been impressed.

I was floored by the day I spent with Fable. Got weeks of work done.


Replies

valleyer today at 6:27 AM

Oh, I was also quite happy with Fable. I was just answering the question asked.

cevn today at 3:57 AM

Same. It was one-shotting unbelievably well compared to 4.8.