A related trick - if you want to teach your agent a specific kind of behavior, and want that behavior to be calibrated and safe, you can:
1. enumerate the actions (policies) your agent takes, collected from prior runs
3. infer the states that correspond to each of these policies, and build a state atlas (similar to the zodiac here)
4. infer the maximally discriminative features that can identify the state from the current context
5. label a few examples and train a small policy model that predicts your action from those state features (see the sketch below)
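Here is a minimal sketch of step 4, assuming the LLM has already extracted the step-3 features into a flat dict per labeled example. The feature names, the actions, and the choice of a shallow decision tree are illustrative assumptions, not part of the recipe itself:

```python
# Minimal sketch of step 4: a small, inspectable policy model trained on
# LLM-extracted state features. Feature names and actions are illustrative.
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier, export_text

# Labeled examples: state features (step 3) -> action taken (step 1).
examples = [
    ({"user_is_blocked": True,  "task_type": "refund",  "prior_attempts": 0}, "escalate"),
    ({"user_is_blocked": False, "task_type": "refund",  "prior_attempts": 2}, "retry"),
    ({"user_is_blocked": False, "task_type": "billing", "prior_attempts": 0}, "answer"),
    # ... a few dozen labeled rows is usually enough for a model this small
]

X = [features for features, _ in examples]
y = [action for _, action in examples]

# DictVectorizer one-hot encodes the categorical features; a shallow tree
# keeps the learned policy readable.
policy_model = make_pipeline(DictVectorizer(sparse=False),
                             DecisionTreeClassifier(max_depth=3, random_state=0))
policy_model.fit(X, y)

# The learned logic is auditable, unlike an opaque end-to-end LLM judgment.
vec = policy_model.named_steps["dictvectorizer"]
tree = policy_model.named_steps["decisiontreeclassifier"]
print(export_text(tree, feature_names=vec.get_feature_names_out().tolist()))
```

A model this small stays auditable: you can print the learned rules and check them against the policy you actually intended.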
I think LLMs should be used more often like this - as feature extractors for toy models that can then be used as tools. This way you can encode arbitrary logic in a small tool model that does not depend on the biases of the base model. For example, this setup could power a "skill" that reliably implements your policy.
The trick is to carefully identify states that reliably predict the policy, and features that distinguish between states, instead of relying on embeddings or pure LLM reasoning. This decouples the decision logic from the feature extraction and lets you calibrate it to your goals.
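To make the decoupling concrete, here is a hedged sketch of what the resulting "skill" might look like at inference time. `call_llm` is a hypothetical placeholder for whatever client you use, and the feature schema is the same made-up one from the training sketch above:

```python
# Sketch of the "skill" at inference time: the LLM only extracts features,
# the small model makes the decision.
import json

FEATURE_PROMPT = """Read the conversation below and return ONLY a JSON object
with these fields: user_is_blocked (bool), task_type ("refund" | "billing" |
"other"), prior_attempts (int).

Conversation:
{context}"""

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in; plug in your actual LLM client here.
    raise NotImplementedError

def decide(context: str) -> str:
    # Feature extraction is delegated to the LLM; the decision comes from
    # the calibrated policy_model trained above.
    features = json.loads(call_llm(FEATURE_PROMPT.format(context=context)))
    return policy_model.predict([features])[0]
```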
All 4 steps can be done by a coding agent under your supervision, with zero coding on your part. It's an LLM as a generic feature extractor, with small models sitting on top.