
So use better, more standardized prompts. Specify the exact fields you demand of your unstructured data, and refine your pipeline; a rough sketch of what that could look like is below.
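For illustration only, here is a minimal sketch of "specify the exact fields" for an extraction step. The field names and the call_llm stub are hypothetical placeholders, not any particular API; the point is that the prompt names every required field and the response is validated against that same list before it enters the pipeline.

    import json

    # Hypothetical set of fields we demand from each unstructured record.
    REQUIRED_FIELDS = {"customer_id": str, "incident_date": str, "severity": str}

    def build_extraction_prompt(record: str) -> str:
        """Build a standardized prompt that names every required field explicitly."""
        field_spec = ", ".join(f'"{name}" ({t.__name__})' for name, t in REQUIRED_FIELDS.items())
        return (
            "Extract the following fields from the text and answer with JSON only.\n"
            f"Fields: {field_spec}. Use null for anything not present.\n\n"
            f"Text:\n{record}"
        )

    def validate_response(raw: str) -> dict:
        """Reject any model answer that is malformed or missing a required field."""
        parsed = json.loads(raw)  # raises ValueError on malformed JSON
        missing = [f for f in REQUIRED_FIELDS if f not in parsed]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        return parsed

    # Stubbed model call for the example; a real pipeline would call whatever
    # LLM backend is in use and feed every response through validate_response.
    def call_llm(prompt: str) -> str:
        return '{"customer_id": "C-1042", "incident_date": "2023-11-02", "severity": "low"}'

    if __name__ == "__main__":
        prompt = build_extraction_prompt("Customer C-1042 reported a minor outage on Nov 2, 2023.")
        print(validate_response(call_llm(prompt)))

Validating against the same field list that the prompt declares is what keeps the pipeline from silently drifting when the model's output format changes.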



Just last week I went over 8k lines of data, doing a first applicability analysis, i.e. deciding which lines should be considered for further analysis. The information I needed to do so was hidden in manually created comments, because of course it was; I have never ever seen predefined classifications used consistently by people. And those predefined classes never cover whatever need one has years later anyway.

Thing is, when I started I didn't even know what to look for. I only knew once I was done, so it would have been almost impossible to explain that to an LLM beforehand. Added benefit: I found a lot of other stuff in the dataset that will be very useful in the future. Had I used an LLM for that, I wouldn't know half of what I now know about that data.

That's the risk I see with LLMs. Already my pet peeve is data scientists with no domain knowledge or understanding of the data they analyze, but at least they know the maths. If part of that is outsourced to a black-box AI that hallucinates half the time, I am afraid most of those analyses will be utterly useless, or worse, misleading in a very confident way...

TLDR: In my opinion LLMs take away the curious discovery when we go over data or text or whatever. Which is lazy and prevents us from casually learning new things. And we cannot even be sure we can trust the results. Oh, and we end up thinking more about the tool, LLMs and prompts, than about doing the job. Again, lazy and superficial, and a dead sure way to get mediocre results at best.




