This is a fantastic guide! I did a lot of work on structured generation for my PhD. Here are a few other pointers for people who might be interested:
Some libraries:
- Outlines, a nice library for structured generation - https://github.com/dottxt-ai/outlines
- Guidance (already covered by FlyingLawnmower in this thread), another nice library - https://github.com/guidance-ai/guidance
- XGrammar, a less-featureful but really well optimized constrained generation library - https://github.com/mlc-ai/xgrammar
- This one has a lot of cool technical aspects that make it an interesting project
Some papers:
- Efficient Guided Generation for Large Language Models
- By the Outlines authors, probably the first real LLM constrained generation paper
- https://arxiv.org/abs/2307.09702
- Automata-based constraints for language model decoding - A much more technical paper about constrained generation and implementation
- https://arxiv.org/abs/2407.08103
- Pitfalls, Subtleties, and Techniques in Automata-Based Subword-Level Constrained Generation - A bit of self-promotion. We show where constrained generation can go wrong and discuss some techniques for the practitioner
- https://openreview.net/pdf?id=DFybOGeGDS
Some blog posts:
- Fast, High-Fidelity LLM Decoding with Regex Constraints
- Discusses adhering to the canonical tokenization (i.e., generating not just any token sequence that satisfies the constraint, but the one the tokenizer itself would produce)
- https://vivien000.github.io/blog/journal/llm-decoding-with-regex-constraints.html
- Coalescence: making LLM inference 5x faster - Also from the Outlines team
- This is about skipping inference during constrained generation when you know there is only one valid next token (common in the canonical tokenization setting)
- https://blog.dottxt.ai/coalescence.html
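To make the canonical-tokenization point concrete, here's a toy sketch. The vocabulary and greedy longest-match tokenizer below are hypothetical stand-ins, not any real tokenizer, but they show why two token sequences can decode to the same constraint-satisfying string while only one is what the tokenizer would actually produce:

```python
# Toy illustration of the canonical-tokenization issue in constrained
# decoding. VOCAB and the greedy tokenizer are hypothetical, not taken
# from any real model.

VOCAB = ["a", "b", "ab"]

def tokenize(text):
    """Greedy longest-match tokenizer: returns the canonical tokenization."""
    tokens = []
    i = 0
    while i < len(text):
        for tok in sorted(VOCAB, key=len, reverse=True):
            if text.startswith(tok, i):
                tokens.append(tok)
                i += len(tok)
                break
    return tokens

# Both token sequences decode to "ab", so both satisfy a constraint
# like the regex `ab`...
assert "".join(["ab"]) == "".join(["a", "b"]) == "ab"

# ...but only one is what the tokenizer itself would emit:
assert tokenize("ab") == ["ab"]
```

A constrainer that steers the model onto the non-canonical `["a", "b"]` path is forcing token sequences the model rarely saw during training, which is exactly the fidelity problem the blog post discusses.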
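And a minimal sketch of the coalescence idea: if the constraint permits exactly one next token, append it without calling the model at all. Everything here (`allowed_tokens`, `model_sample`, the single-character "tokens") is a hypothetical stand-in, not the Outlines API:

```python
# Sketch of "coalescence": skip inference whenever the constraint
# allows exactly one next token. All names here are hypothetical.

def generate(model_sample, allowed_tokens, prompt, max_steps=16):
    tokens = list(prompt)
    calls = 0  # count actual model invocations
    for _ in range(max_steps):
        allowed = allowed_tokens(tokens)  # tokens the constraint permits here
        if not allowed:
            break                         # constraint fully satisfied
        if len(allowed) == 1:
            tokens.append(next(iter(allowed)))  # forced: no inference needed
        else:
            tokens.append(model_sample(tokens, allowed))
            calls += 1
    return tokens, calls

# Toy constraint: output must be exactly '{"name": "A"}' or '{"name": "B"}'.
# Every position except the letter is forced, so only one model call happens.
TARGETS = ['{"name": "A"}', '{"name": "B"}']

def allowed_tokens(tokens):
    prefix = "".join(tokens)
    return {t[len(prefix)] for t in TARGETS
            if t.startswith(prefix) and len(t) > len(prefix)}

def pick_first(tokens, allowed):
    return sorted(allowed)[0]  # stand-in for sampling from masked logits

out, calls = generate(pick_first, allowed_tokens, [])
assert "".join(out) == '{"name": "A"}'
assert calls == 1  # 13 characters emitted, but only one real model call
```

In real systems the per-character alphabet is a token vocabulary and the constraint is compiled to an automaton, but the payoff is the same: long forced stretches of structured output (JSON punctuation, fixed keys) cost zero forward passes.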
What a gold mine!
Automata-based constraints are fun.