I don't see it at all. > Typed I/O for every LLM call. Use Pydantic. Define what goes...

deaux • today at 3:39 PM • 5 replies • view on HN

I don't see it at all.

> Typed I/O for every LLM call. Use Pydantic. Define what goes in and out.

Sure, not related to DSPy though, and completely tablestakes. Also not sure why the whole article assumes the only language in the world is Python.

> Separate prompts from code. Forces you to think about prompts as distinct things.

There's really no reason prompts must live in a file with a .md or .json or .txt extension rather than .py/.ts/.go/.., except if you indeed work at a company that decided it's a good idea to let random people change prod runtime behavior. If someone can think of a scenario where this is actually a good idea, feel free to elighten me. I don't see how it's any more advisable than editing code in prod while it's running.

> Composable units. Every LLM call should be testable, mockable, chainable.

> Abstract model calls. Make swapping GPT-4 for Claude a one-line change.

And LiteLLM or `ai` (Vercel), the actually most used packages, aren't? You're comparing downloads with Langchain, probably the worst package to gain popularity of the last decade. It was just first to market, then after a short while most realized it's horrifically architected, and now it's just coasting on former name recognition while everyone who needs to get shit done uses something lighter like the above two.

> Eval infrastructure early. Day one. How will you know if a change helped?

Sure, to an extent. Outside of programming, most things where LLMs deliver actual value are very nondeterministic with no right answer. That's exactly what they offer. Plenty of which an LLM can't judge the quality of. Having basic evals is useful, but you can quickly run into their development taking more time than it's worth.

But above all.. the comments on this post immediately make clear that the biggest differentiator of DSPy is the prompt optimization. Yet this article doesn't mention that at all? Weird.

Replies

alexjplant • today at 5:15 PM

> Sure, not related to DSPy though, and completely tablestakes.

I agree but you'd be surprised at how many people will argue against static typing with a straight face. It's happened to me on at least three occasions that I can count and each time the usual suspects were trotted out: "it's quicker", "you should have tests to validate anyhow", "YOLO polymorphism is amazing", "Google writes Python so it's OK", etc.

It must be cultural as it always seems to be a specific subset of Python and ECMAScript devs making these arguments. I'm glad that type hints and Typescript are gaining traction as I fall firmly on the other side of this debate. The proliferation of LLM coding workflows has likely accelerated adoption since types provide such valuable local context to the models.

callbacked • today at 6:22 PM

> not sure why the whole article assumes the only language in the world is Python

https://github.com/ax-llm/ax (if you're in the typescript world)

andyg_blog • today at 3:42 PM

>the whole article assumes the only language in the world is Python.

This was my take as well.

My company recently started using Dspy, but you know what? We had to stand up an entire new repo in Python for it, because the vast majority of our code is not Python.

➕ show 2 replies

sbpayne • today at 3:44 PM

I think all of these things are table-stakes; yet I see that they are implemented/supported poorly across many companies. All I'm saying is there are some patterns here that are important, and it makes sense to enter into building AI systems understanding them (whether or not you use Dspy) :)

➕ show 2 replies

hedgehog • today at 4:17 PM

In my experience the behavior variation between models and providers is different enough that the "one-line swap" idea is only true for the simplest cases. I agree the prompt lifecycle is the same as code though. The compromise I'm at currently is to use text templates checked in with the rest of the code (Handlebars but it doesn't really matter) and enforce some structure with a wrapper that takes as inputs the template name + context data + output schema + target model, and internally papers over the behavioral differences I'm ok with ignoring.

I'm curious what other practitioners are doing.

➕ show 1 reply

alt Hacker News

Replies