the tool use examples are nice, but i'm curious about the structured output reliability. we've had other API models completely fall apart on complex, nested JSON schemas under concurrent load.