Hacker News

pizza · yesterday at 11:44 PM · 0 replies

The more general question of how to evaluate the quality of a given skill file is quite interesting to me. A skill may prime a model's responses in a way that a prompt alone may not. But models also aren't good at judging what they are or are not capable of.

Just asking a model "how good is this skill?" may or may not work. Possibly the next-laziest thing you could do that's still cheap: ask the model to write a quiz for itself, have it take the quiz with and without access to the skill, and see how much the skill improved its score. That approach still has plenty of problems. But would it work well enough, often enough, to heuristically estimate the quality of a skill?
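The self-quiz heuristic could be sketched roughly like this. Note the model call here is a deterministic stub, not a real LLM API; in practice `call_model` would wrap whatever provider you use, and the grading step (here a trivial string check) is its own hard problem:

```python
# Sketch of the "self-quiz" skill-evaluation heuristic described above.
# call_model is a hypothetical stand-in for any LLM API; it is stubbed
# here so the control flow can run end to end without a real model.

def call_model(prompt, skill=None):
    # Stub behavior: with the skill in context, answers come out
    # "correct"; without it, "wrong". A real model would be noisier.
    if prompt.startswith("QUIZ:"):
        # Ask the model to generate quiz questions about the skill's topic.
        return ["q1", "q2", "q3"]
    if prompt.startswith("ANSWER:"):
        return "correct" if skill else "wrong"
    raise ValueError("unknown prompt")

def skill_uplift(skill_text, grader=lambda ans: ans == "correct"):
    """Heuristic score: the model writes its own quiz, then takes it
    with and without the skill; the score delta estimates the skill's
    value. Returns a fraction in [-1.0, 1.0]."""
    quiz = call_model("QUIZ:" + skill_text)
    with_skill = sum(grader(call_model("ANSWER:" + q, skill=skill_text))
                     for q in quiz)
    without = sum(grader(call_model("ANSWER:" + q)) for q in quiz)
    return (with_skill - without) / len(quiz)

print(skill_uplift("how to use git rebase"))
```

With this toy stub the uplift is always 1.0; the interesting (and unsolved) parts are making the quiz representative and grading the answers reliably, since a model grading its own quiz inherits the same blind spots the comment points out.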