At least for Louie.ai (basically genAI-native computational notebooks, where operational analysts ask for intensive analytics tasks like pulling Splunk/Databricks/Neo4j data, wrangling it in some runtime, clustering/graphing it, and generating interactive viz), Python has ups and downs:
On the plus side, it means our backend handles small/mid datasets well. Apache Arrow adoption in analytics packages is strong, so zero-copy and columnar flows over many rows are normal. Pushing that to the GPU or another process is also great.
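For a sense of what that looks like in practice, here's a minimal sketch (my own toy example, not Louie.ai code) of the standard pyarrow IPC + memory-map pattern that lets another process read the same columnar data without copying:

    import pyarrow as pa
    import pyarrow.ipc

    # Write a table once in Arrow IPC format (columnar on disk).
    table = pa.table({"user": ["a", "b", "c"], "events": [10, 20, 30]})
    with pa.OSFile("batch.arrow", "wb") as sink:
        with pa.ipc.new_file(sink, table.schema) as writer:
            writer.write_table(table)

    # Another process can memory-map the same file and get the
    # columns as views over the mapped buffers, not heap copies.
    with pa.memory_map("batch.arrow", "r") as source:
        shared = pa.ipc.open_file(source).read_all()
        print(shared.num_rows)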
OTOH, one of our biggest issues is the GIL. Yes, it shows up a bit in single-user code (and isn't discussed in the post), especially in divide-and-conquer flows for a user. The bigger issue, however, is packing many concurrent users onto the same box to avoid blowing your budget. We'd like the memory-sharing benefits of threads, but because of the GIL we end up needing the isolation benefits of multiprocess. A bit same-but-different: we stream results to the browser as agents progress through your investigation, and that has not been as smooth as it was for us in other languages.
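For anyone who hasn't hit it directly, the GIL part is easy to reproduce (toy CPU-bound workload, nothing Louie.ai-specific):

    import time
    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

    def cpu_work(n: int) -> int:
        # Pure-Python CPU-bound loop; holds the GIL the whole time.
        return sum(i * i for i in range(n))

    def timed(pool_cls, jobs=4, n=3_000_000):
        start = time.perf_counter()
        with pool_cls(max_workers=jobs) as pool:
            list(pool.map(cpu_work, [n] * jobs))
        return time.perf_counter() - start

    if __name__ == "__main__":
        # Threads serialize on the GIL; processes run in parallel,
        # but each process pays its own memory footprint.
        print("threads:  ", timed(ThreadPoolExecutor))
        print("processes:", timed(ProcessPoolExecutor))

That memory-footprint line is exactly the multi-tenant squeeze: threads share memory but not the CPU; processes get the CPU but not the memory.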
And moving to multiprocess is no panacea. E.g., a local embedding engine is expensive to run in-process per worker because modern models have high RAM needs. So that biases you toward a local inference server for what was meant to be an otherwise local call, which is doable, but representative of the extra work needed for production-grade software.
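The shape of that workaround, roughly: load the model once behind a local endpoint and have every worker call it over HTTP (the URL and request format below are hypothetical stand-ins for whatever inference server you run):

    import requests

    # Hypothetical local inference server that loaded the model once.
    EMBED_URL = "http://127.0.0.1:8080/embed"

    def embed(texts: list[str]) -> list[list[float]]:
        # Each worker process makes a cheap HTTP call instead of
        # holding its own multi-GB copy of the model in RAM.
        resp = requests.post(EMBED_URL, json={"inputs": texts}, timeout=30)
        resp.raise_for_status()
        return resp.json()

    vectors = embed(["what changed in the last deploy?"])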
Interesting times!
Am I the only one who thinks a Swift IDE project should be called Taylor?
Langchain and other frameworks are too bloated. They're good for demos, but I highly recommend building your own pipeline in production: it's not really that complicated, and you get much better control over the implementation. Plus you don't need the 99% of packages that come with Langchain, which reduces security vulnerabilities.
I've written a series of notebooks on how to implement RAG in Python directly, with minimal packages. I know it's not in Rust or C++, but it can give you some ideas on how to do things directly.
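To give a flavor of the direct approach, here's a minimal retrieval + prompt sketch (mine, not from the notebooks; the hashed bag-of-words embed() is a toy stand-in for a real embedding model):

    import numpy as np

    def embed(texts: list[str]) -> np.ndarray:
        # Toy stand-in: hashed bag-of-words, L2-normalized. In practice
        # you'd call an embedding model here; only the interface matters.
        out = np.zeros((len(texts), 256))
        for i, text in enumerate(texts):
            for tok in text.lower().split():
                out[i, hash(tok) % 256] += 1.0
        norms = np.linalg.norm(out, axis=1, keepdims=True)
        return out / np.maximum(norms, 1e-9)

    docs = ["Arrow enables zero-copy columnar data.",
            "The GIL serializes CPU-bound Python threads."]
    doc_vecs = embed(docs)

    def retrieve(query: str, k: int = 1) -> list[str]:
        # Cosine similarity is a dot product on normalized vectors.
        scores = doc_vecs @ embed([query])[0]
        return [docs[i] for i in np.argsort(scores)[::-1][:k]]

    def build_prompt(query: str) -> str:
        context = "\n".join(retrieve(query))
        return f"Answer using this context:\n{context}\n\nQuestion: {query}"

    # build_prompt("what does the GIL do?") -> send to any LLM client

That's essentially the core of it; most of what a framework adds on top of this loop is integrations.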
It would be helpful to move to a compiled language with a decent toolchain. Rust and Go are good candidates.
I was asking the same question; it turns out mistral.rs [0] has pretty good abstractions, so you don't have to depend on and package llama.cpp for every platform.
This is a comparison of apples to oranges. Langchain has an order of magnitude more examples, integrations, and features, and it also rewrote its whole architecture to try to make the chaining more understandable. I don't see enough documentation in this pipeline to understand how to migrate my app to it. I also realize it would take me at least a week to even migrate my own app to Langchain's rewrite.
Langchain is used because it was a first mover, and that's also its Achilles' heel; it's not about speed at all.
this is very cool!
We built something for our internal consumption (and it's now used in quite a few places in India).
Edgechains is declarative (jsonnet-based), so chains + prompts are declarative. And we built a wasm compiler (in Rust, based on WasmEdge).
https://github.com/arakoodev/EdgeChains/actions/runs/1039197...
I've covered this before in articles such as this: https://neuml.hashnode.dev/building-an-efficient-sparse-keyw...
You can make anything performant if you know the right buttons to push. While Rust makes it easy in some ways, Rust is also a difficult language for many developers to work with. There is a tradeoff.
I'd also say LangChain's primary goal isn't performance; it's convenience and functionality coverage.
Why not use C++?
For the most part, these aren't security-critical components.
You already have a massive amount of code you can use, like, say, llama.cpp.
You get the same performance that you do with Rust.
Compared to Python, in addition to performance, you also get a much easier deployment story.
DSPy is in Python, so it must be Python. Sorry bro :P
I mean, LLM-based or not has nothing to do with it. This is the standard optimization story: scripting language vs. systems language.
I'm surprised they don't talk about the business side of this - did they have users complaining about the speed? At the end of the day, they only increased performance by 50%.
This kind of optimization seems awesome once you have a somewhat mature product, but you really have to wonder if it's the best use of a startup's very limited bandwidth.
Most Python libraries are bindings to native libraries anyway.
Any other ecosystem can plug into the same underlying native libraries, or even call them directly when it's the same language.
In a way, it's interesting to see the performance pressure on the Python world; otherwise the CPython folks would never have reconsidered their stance on performance.