Seeing half of an AR LLM's output tokens go to reproducing a predefined JSON schema bothers me so much. I would love to have the option of using diffusion for infilling.
One trick I learned for this was to use CSV for LLM I/O and translate JSON <-> CSV at the boundary layer.
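For anyone who wants to try it, here is a minimal sketch of that boundary layer in Python, using only the standard library. The `SCHEMA` fields and the function names are made up for illustration, and it assumes the schema is flat (one object per row, no nesting).

```python
import csv
import io
import json

# Hypothetical flat schema: column order plus a caster per field.
SCHEMA = {"name": str, "qty": int, "price": float}


def json_to_csv(json_text, schema=SCHEMA):
    """Render a JSON array of flat objects as CSV for the prompt."""
    records = json.loads(json_text)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(schema), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def csv_to_json(csv_text, schema=SCHEMA):
    """Parse the model's CSV reply back into a JSON array of typed objects."""
    reader = csv.DictReader(io.StringIO(csv_text.strip()))
    records = [
        # CSV carries only strings, so restore the types from the schema.
        {field: cast(row[field]) for field, cast in schema.items()}
        for row in reader
    ]
    return json.dumps(records)
```

The key names show up once in the header row instead of once per record, which is where the savings come from. Nested schemas would need a flattening step first, and model output that drops a column would raise a `KeyError` here, so real use probably wants some validation on top.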