Hacker News

syndacks, yesterday at 11:22 PM

How do people evaluate creative writing and emotional intelligence in LLMs? Most benchmarks seem to focus on reasoning or correctness, which feels orthogonal. I’ve been playing with Kimi K2.5 and it feels much stronger on voice and emotional grounding, but I don’t know how to measure that beyond human judgment.


Replies

mohsen1, today at 2:46 AM

I am trying! https://mafia-arena.com

I just don't have enough funding to do a ton of tests.