> The same way you test any system - you find a sampling of test subjects, have them interact with the system and then evaluate those interactions.
That’s not strictly how I test my systems. I can release with confidence because of a litany of SWE best practices learned and borrowed from decades of my own and other people’s experiences.
> No system is guaranteed to never fail, it's all about degree of effectiveness and resilience.
It seems like the product space for services built on generative AI is diminishing by the day with respect to “effectiveness and resilience”. I was just laughing with some friends about how terrible most of the results are when using Apple’s new Genmoji feature. Apple, the company with one of the largest market caps in the world.
I can definitely use LLMs and other generative AI directly, and understand the caveats, and even get great results from them. But so far every service I’ve interacted with that was a “white label” repackaging of generative AI has been absolute dogwater.