
jamesblonde · yesterday at 8:56 PM

I say this quite a lot to data scientists who are now building agents:

1. think of the context data as training data for your requests: the LLM performs in-context learning on the context you provide

2. think of evals as test data for measuring your agents' performance. Collect examples from agent traces and label them manually. If you want an LLM to act as a judge that labels traces for you, you will again need lots of good-quality examples (training data), since LLM-as-a-Judge relies on in-context learning as well.
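A minimal sketch of point 2, treating evals as a labelled test set. All names here are illustrative (not from the book or any particular framework), and `judge_label` is a trivial stand-in for a real LLM-as-a-Judge call, which in practice would be prompted with the same kind of labelled examples:

```python
from dataclasses import dataclass

@dataclass
class Trace:
    prompt: str
    response: str
    human_label: str  # ground-truth label from manual review: "pass" or "fail"

def judge_label(trace: Trace) -> str:
    """Placeholder for an LLM judge; here just a toy heuristic."""
    return "pass" if trace.response.strip() else "fail"

def judge_agreement(traces: list[Trace]) -> float:
    """Fraction of traces where the judge agrees with the human label.

    This is exactly "test accuracy" for the judge: the manually
    labelled traces are its held-out test data."""
    if not traces:
        return 0.0
    hits = sum(judge_label(t) == t.human_label for t in traces)
    return hits / len(traces)

# Hypothetical labelled traces collected from agent runs.
traces = [
    Trace("What is 2+2?", "4", "pass"),
    Trace("Summarise the doc", "", "fail"),
]
print(judge_agreement(traces))  # 1.0 for this toy judge and data
```

Once agreement with human labels is high enough, the judge can label new traces automatically; until then, the manual labels are the ground truth.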

From my book - https://www.amazon.com/Building-Machine-Learning-Systems-Fea...


Replies

pbronez · yesterday at 9:27 PM

Yup, agree. “Evaluations” = Tests

Gets pretty meta when you’re evaluating a model which needs to evaluate the output of another agent… gotta pin things down to ground truth somewhere.