
Unreasonable pessimism considering the velocity of the state of the art. Determinism, while a challenge to solve for, is not impossible. There is value even in using an LLM as a code assist for integration builds (using OpenAPI specs) or for handling breakage when an API changes suddenly.

https://news.ycombinator.com/item?id=35631555

(contributed for several years at a low-code/no-code product org, familiar with integration maintenance and what integrating with APIs at scale looks like)




I don't think the parent is being unreasonable, if my perception of Gorilla is correct here, but

a lot of mission-critical apps cannot risk even a 0.01% chance of hallucination or some black-box process.

I don't think a direct LLM-to-API layer is feasible even for enterprise clients unless the LLM generates that LLM-to-API code layer as something that can be audited, but that immediately negates the need for Gorilla.

Seems like "just trust this blackbox because everybody is writing wrappers around it" is similar to "throwing redux because facebook said so" 7 years ago. Definitely seeing some parallels with technological pollyannaism with blockchain.


I agree you cannot trust unsupervised LLM output for mission-critical M2M use cases, but (imho) it will help you move faster at creating and maintaining integrations with human supervision (code -> SVN -> test harness as part of CI/CD -> human review [1] and editing, plus proposed fixes for failures detected via Sentry [2] or similar). My apologies, I did not make that clearer further up the subthread.

Determining whether this actually creates positive value requires some implementation and ongoing caretaking of the code-generation pipeline; a minimal sketch of one such gating check is below.
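
To make the "test harness before human review" part concrete, here is a purely illustrative Python sketch of a CI step that checks whether the endpoints an LLM-generated client touches actually exist in the OpenAPI spec. The file names, the manifest convention, and the repo layout are all my own assumptions for the example, not anything from Gorilla or the linked articles:

    # ci_check_endpoints.py -- illustrative only; the manifest convention
    # and file names below are assumptions, not part of any real pipeline.
    import json
    import sys

    def load_spec_paths(spec_file):
        """Collect (method, path) pairs declared in an OpenAPI 3.x JSON spec."""
        with open(spec_file) as f:
            spec = json.load(f)
        declared = set()
        for path, ops in spec.get("paths", {}).items():
            for method in ops:
                if method.lower() in {"get", "post", "put", "patch", "delete"}:
                    declared.add((method.upper(), path))
        return declared

    def load_generated_endpoints(manifest_file):
        """Assumes the generated client emits a manifest of the
        (method, path) pairs it calls, e.g. [["GET", "/v1/orders"], ...]."""
        with open(manifest_file) as f:
            return {(m.upper(), p) for m, p in json.load(f)}

    if __name__ == "__main__":
        spec_paths = load_spec_paths("openapi.json")
        used = load_generated_endpoints("generated_client_manifest.json")
        unknown = used - spec_paths
        if unknown:
            print("Generated client calls endpoints not in the spec:")
            for method, path in sorted(unknown):
                print(f"  {method} {path}")
            sys.exit(1)  # fail the CI job before human review even starts
        print(f"OK: all {len(used)} endpoints match the spec")

Human review and the Sentry-style failure triage still sit behind that; the point is just that the unsupervised parts are cheap to gate mechanically.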

[1] https://medium.com/@rgranadosd/using-ai-to-build-our-apis-wi...

[2] https://docs.sentry.io/product/issues/issue-details/ai-sugge...


I'd call it realistic. It's one thing to have an LLM that is developer-facing, suggesting improvements, etc., maybe a "something went wrong? Try auto-fix" button.

But an LLM in a hotspot is asking for trouble with the present LLM state of the art.


Doing what I do with latent.space and ai.engineer, it was nice for someone to call me pessimistic about AI for once, haha. I enjoy that on HN, usernames are so de-emphasized.



