
Another fun example from yesterday: pasted a blog post in Markdown into an HTML comment. Selected it and told Sonnet to convert it to HTML using another blog post as a style reference.

Done in 5 seconds.




And how do you trust that it didn't just alter or omit some sentences from your blog post?

I just use Pandoc for that purpose and it takes 30 seconds, including the time to install pandoc. For code generation where you'll review everything, AI makes sense; but for such conversion tasks, it doesn't because you won't review the generated HTML.
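For reference, the whole conversion is a one-liner (a minimal sketch; the file names are made up, and `-s` tells pandoc to emit a standalone HTML document):

    pandoc post.md -f markdown -t html -s -o post.html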


> it takes 30 seconds, including the time to install pandoc

On some speedrunning competition, maybe? Just tested on my work machine: `sudo apt-get install pandoc` took 11 seconds to complete, and it was only that fast because I already had all the dependencies installed.

Also I don't think you'll be able to fulfill the "using another blog post as a style reference" part of GP's requirements - unless, again, you're some grand-master Pandoc speedrunner.

Sure, AI will make mistakes with such conversion tasks. It's not worth it if you're going to review everything carefully anyway. In code, fortunately, you don't have to - the compiler does 90% of the grunt work for you. In writing, it depends on context. Some text you can eyeball quickly. Sometimes you can get help from your tool.

Literally yesterday I back-ported a CV from English to Polish via Word's Translation feature. I could've done it by hand, but Word did 90% of it correctly, and fixing the remaining issues was a breeze.

Ultimately, what makes an LLM a good tool for random conversions like these is that it's just one tool. Sure, Pandoc can handle GP's case better (if the inputs are well-defined), but it can't do any of the 10 other ad-hoc conversions they may have needed that day.


Installing pandoc is basically a one-time cost that is amortized over its uses, so... why worry about it?

Relying on the compiler to catch every mistake is a pretty limited strategy.


> Installing pandoc is basically a one-time cost that is amortized over its uses, so... why worry about it?

Because the space of problems that today's LLMs solve well with trivial prompts is vast - far greater than any single classical tool covers. If you're comparing solutions to 100 random problems, you have to count those one-time costs, because you'll need some 50-100 different tools to get through them all.

> Relying on the compiler to catch every mistake is a pretty limited strategy.

No, you're relying on the compiler to catch every mistake that can be caught mechanically - exactly the kind of thing humans suck at. That's the entire point of compiler errors and warnings, or of static typing for that matter.


No, if you are having an LLM generate code that you are not reviewing, you are relying on the compiler 100%. (Or the runtime, if it isn't a compiled language.)


Who said I'm not reviewing? Who isn't reviewing LLM code?


Re: trust. It just works using Sonnet 3.5, and it's gained my trust. I do read the output afterwards (again, I'm more in a code-reviewer role). People make mistakes too, and I think its error rate for repetitive tasks is below most people's. I've also learned how to prompt it: I tell it to just add formatting without changing content in the first pass, then in a separate pass ask it to fix spelling/grammar issues. The diffs are easy to read.

Re: Pandoc. Sure, if that were the only task I used it for. But I use it for 10 different ones per day (write a JSON schema for this JSON file, write a Pydantic validator that does X, write a GitHub workflow doing Y, add syntax highlighting to this JSON, etc.). Re: this specific case - I prefer real HTML using my preferred tools (DaisyUI + Tailwind) so I can edit it afterwards. I find myself using far fewer boilerplate-saving libraries and knowing a few tools more deeply.


Why are you comparing its error rate for repetitive tasks with most people's? For such mechanical tasks we already have fully deterministic algorithms, and their error rate is zero. You wouldn't usually ask a junior assistant to do such a conversion by hand, so it doesn't make sense to compare the LLM's error rate with a human's.

Normalizing this kind of computer error where there should be none makes the world a worse place, bit by bit. The kind of productivity increase you get here does not seem worth it.


The OP said they had it use another HTML page as a style reference. Pandoc couldn't do that. Just like millions of other specific tasks.


That's just a matter of copying over some CSS. It takes the same effort as copying the AI's output, so it doesn't even cost extra time.
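E.g., point pandoc at whatever stylesheet you pulled from the reference post (a sketch; style.css is a placeholder name, and `-c` links the stylesheet into the standalone output):

    pandoc post.md -s -c style.css -o post.html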


"Apply the style of B to A" is not deterministic, nor were there prior tools that could do it.


You also didn't factor in the time to learn Pandoc (and to relearn it if you haven't used it lately). And this is just one of many daily use cases for these tools. The time it takes to learn how to use a dozen tools like this adds up, when an LLM can just do them all.


This is actually how I would use AI: if I forgot how to do a conversion task, I would ask the AI to tell me the command, so I can run it without having to re-jog my memory first. The pandoc command is literally one line with a few flags; it's easily reviewable. Then I run pandoc myself. Same thing with the multitude of other rarely used but extremely useful tools, such as jq.

In other words, I want AI to help me with invoking other tools to do a job rather than doing the job itself. This nicely sidesteps all the trust issues I have.


I do that constantly. jq's syntax is especially opaque to me. "I've got some JSON formatted like <this>. Give me a jq command that does <that>."
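And it hands back a one-liner you can review before running (a made-up example; the file and field names are hypothetical):

    jq '.items[] | select(.price > 10) | .name' data.json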

Google, but better.


This


> And how do you trust that it didn't just alter or omit some sentences from your blog post?

How do you trust a human in the same situation? You don't, you verify.


What? Is this a joke? Have you actually worked with human office assistants? The whole point of human assistants is that you don't need to verify their work. You hire them at a good wage and trust that they're working in good faith.

It's disorienting for me to hear that some people are so blinded by AI assistants that they no longer know how human assistants behave.


It appears OP has had a different experience. Every human assistant is different.



