Neat.
FYI, I have a whole bunch of Rust tools[0] for loading models and other LLM tasks: for example, auto-selecting the largest quant based on available memory, extracting a tokenizer from a GGUF, prompting, etc. You could use these to remove some of the Python dependencies you have.
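The quant auto-selection boils down to something like this (a rough sketch with hypothetical names, not the actual API from my tools): given the candidate quant files and their sizes, pick the largest one that fits in the memory you have free.

```rust
/// Pick the largest quant whose file size fits within the available memory.
/// Each candidate is a (quant name, approximate size in bytes) pair.
/// Hypothetical helper for illustration, not a real API.
fn select_largest_quant(
    candidates: &[(&'static str, u64)],
    available_bytes: u64,
) -> Option<&'static str> {
    candidates
        .iter()
        .filter(|(_, size)| *size <= available_bytes)
        .max_by_key(|(_, size)| *size)
        .map(|(name, _)| *name)
}

fn main() {
    // Made-up sizes for a typical 7B-class model's quants.
    let quants = [
        ("Q8_0", 8_500_000_000u64),
        ("Q6_K", 6_600_000_000),
        ("Q4_K_M", 4_900_000_000),
    ];
    // With ~7 GB free, Q6_K is the largest quant that fits.
    let pick = select_largest_quant(&quants, 7_000_000_000);
    println!("{:?}", pick); // prints Some("Q6_K")
}
```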
Currently they target llama.cpp, but this is pretty neat too. Any plans to support grammars?