Hacker News

not_kurt_godel, last Wednesday at 4:36 PM

Agreed: AI that could handle this sort of cross-system complexity and automation reliably would actually be useful. Unfortunately, I've yet to use an AI that can reliably handle even moderately complex text parsing in a single file more easily than if I'd just done it myself from the start.


Replies

mnky9800n, last Wednesday at 10:06 PM

Yes. It’s very frustrating. There’s a real need for a kind of data pipeline test suite where a single person can iterate through lots of different options and play around with different data manipulations, because it isn’t worth properly building a pipeline if the idea doesn’t work in the first place. There needs to be something like Astronomer, Dagster, Apache Airflow, or Azure ML that is quick and dirty enough for trying things out. Maybe I’m just naive and such tools already exist, and I’ve simply had my nose in Jupyter notebooks. But these days I really feel hindered in my ability to prototype complex data pipelines myself while also attending to all the other parts of the science.
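The try-lots-of-options loop described above can be approximated without any orchestration framework. What follows is a minimal, hypothetical sketch in plain Python: the step functions (`passthrough`, `drop_negatives`, `scale_by_max`) and the `run_pipeline` and `sweep` helpers are made up for illustration and are not part of Astronomer, Dagster, Airflow, Azure ML, or any other tool. Each stage offers a few interchangeable functions, and every combination is run so the outputs can be compared side by side.

```python
from itertools import product
from typing import Callable

# A pipeline step is any function from a list of numbers to a list of numbers.
Step = Callable[[list[float]], list[float]]

# Hypothetical candidate steps, for illustration only.
def passthrough(xs: list[float]) -> list[float]:
    return list(xs)

def drop_negatives(xs: list[float]) -> list[float]:
    return [x for x in xs if x >= 0]

def scale_by_max(xs: list[float]) -> list[float]:
    # Divide by the largest value; leave the data alone if that is zero or the list is empty.
    m = max(xs, default=0.0)
    return [x / m for x in xs] if m else list(xs)

def run_pipeline(data: list[float], steps: tuple[Step, ...]) -> list[float]:
    # Apply the steps in order, feeding each output into the next step.
    for step in steps:
        data = step(data)
    return data

def sweep(data: list[float], stages: list[list[Step]]) -> dict[tuple[str, ...], list[float]]:
    """Run every combination of one option per stage; key results by step names."""
    results = {}
    for combo in product(*stages):
        results[tuple(step.__name__ for step in combo)] = run_pipeline(data, combo)
    return results

data = [3.0, -1.0, 6.0]
stages = [[passthrough, drop_negatives], [passthrough, scale_by_max]]
results = sweep(data, stages)
for variant, output in results.items():
    print(variant, output)
```

Because each step is just a function, trying a new manipulation means appending it to a stage's list. The trade-off is that the number of runs grows multiplicatively with the options per stage, which is exactly the point where a real orchestrator starts to earn its keep.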

knowaveragejoe, last Thursday at 12:31 AM

This reminds me of a paper: "The ALCHEmist: Automated Labeling 500x CHEaper Than LLM Data Annotators"

https://arxiv.org/abs/2407.11004

In essence, LLMs are quite good at writing the code that properly parses large amounts of unstructured text. That's a better use of them than what a lot of people seem to be doing, which is just shoveling the data itself into an LLM's API and asking for transformations back.
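A rough sketch of the code-generation approach, assuming the labeling programs were already written by an LLM in a one-off prompt: the three functions below (`positive_keywords`, `negative_keywords`, `star_rating`) are hypothetical stand-ins for such generated code and are hard-coded here, so no API is called. The paper combines several generated programs; this sketch combines them with a simple majority vote, which is a swapped-in simplification and may differ from the paper's actual aggregation method.

```python
import re
from collections import Counter
from typing import Callable, Optional

Label = Optional[str]  # "pos", "neg", or None (abstain)

# Hypothetical labeling programs standing in for code an LLM wrote once.
# They run locally; nothing here calls an LLM API.
def positive_keywords(text: str) -> Label:
    return "pos" if re.search(r"\b(great|love|excellent)\b", text.lower()) else None

def negative_keywords(text: str) -> Label:
    return "neg" if re.search(r"\b(terrible|hate|awful)\b", text.lower()) else None

def star_rating(text: str) -> Label:
    # Look for ratings written like "4/5"; abstain if none is found.
    m = re.search(r"\b([1-5])\s*/\s*5\b", text)
    if m is None:
        return None
    stars = int(m.group(1))
    if stars >= 4:
        return "pos"
    if stars <= 2:
        return "neg"
    return None  # a 3/5 is treated as ambiguous

def majority_vote(votes: list[Label]) -> Label:
    """Combine program outputs; abstain when nobody votes or the top labels tie."""
    cast = [v for v in votes if v is not None]
    if not cast:
        return None
    ranked = Counter(cast).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

def label_corpus(texts: list[str], programs: list[Callable[[str], Label]]) -> list[Label]:
    # Every program runs on every text locally: no API request per record.
    return [majority_vote([p(t) for p in programs]) for t in texts]

programs = [positive_keywords, negative_keywords, star_rating]
reviews = [
    "Great battery, 5/5",
    "Awful screen. 1/5",
    "I love the design but the hinge is terrible",
    "Arrived on Tuesday",
]
labels = label_corpus(reviews, programs)
print(labels)  # ['pos', 'neg', None, None]
```

Once the programs exist, labeling more records costs only local compute, whereas sending every record through an API costs one request per record. Records the programs abstain on (the last two here) are the natural candidates to route to a human or an LLM.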