It's orders of magnitude cheaper to serve requests with conventional methods than directly with an LLM. My back-of-envelope calculation says that, optimistically, it takes more than 100 GFLOPs to generate just 10 tokens with a 7-billion-parameter LLM. There are better ways to use electricity.
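For reference, here's a minimal sketch of that back-of-envelope number, assuming the common rule of thumb of roughly 2 FLOPs per parameter per generated token for a dense decoder-only model (the function name and constants below are purely illustrative):

    # Back-of-envelope decode cost for a dense decoder-only LLM.
    # Assumes ~2 FLOPs per parameter per generated token (matmul
    # multiply-adds only; ignores attention-over-context overhead).
    def decode_flops(n_params: float, n_tokens: int, flops_per_param: float = 2.0) -> float:
        """Rough total floating-point operations to generate n_tokens tokens."""
        return flops_per_param * n_params * n_tokens

    total = decode_flops(n_params=7e9, n_tokens=10)
    print(f"~{total / 1e9:.0f} GFLOPs")  # ~140 GFLOPs, i.e. "more than 100"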
Try convincing the investors. The way the industry is headed is not necessarily aligned with what is optimal. That might be the future whether we like it or not. Losing billions seems to be the trend.
Sure, but for certain problem domains we can start with an LLM to build V1 (or at least a demo) faster, then apply traditional coding techniques as an efficiency optimization once product-market fit is established.
I work in enterprise IT and sometimes wonder if we should factor in the equivalent energy cost of the human effort - both productive and unproductive - that underlies these "output/cost" comparisons.
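To sketch what that might look like (purely illustrative: the ~100 W human metabolic figure and the 8-hour workday below are my own assumptions, not numbers from any actual comparison):

    # Hypothetical conversion of human effort into energy terms.
    # ~100 W is a rough metabolic power for light desk work; 8 h is an
    # assumed workday. Both are illustrative assumptions.
    HUMAN_POWER_W = 100.0
    WORKDAY_HOURS = 8.0

    def human_effort_joules(person_hours: float) -> float:
        """Approximate energy (joules) dissipated over person_hours of desk work."""
        return HUMAN_POWER_W * person_hours * 3600.0

    per_day = human_effort_joules(WORKDAY_HOURS)
    print(f"One workday ~ {per_day / 1e6:.1f} MJ ~ {per_day / 3.6e6:.1f} kWh")
    # -> One workday ~ 2.9 MJ ~ 0.8 kWh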
I realize it sounds inhuman, but so is working in enterprise IT! :)