There's a running theme in here of programming problems LLMs solve where it's actually not that important that the LLM is perfectly correct. I've been using GPT4 for the past couple months to comprehend Linux kernel code; it's spooky good at it.
I'm a C programmer, so I can with some effort gradually work my way through random Linux kernel things. But what I can do now instead is take a random function, ask GPT4 what it does and what subsystem it belongs to, and then ask GPT4 to write me a dummy C program that exercises that subsystem (I've taken to asking it to rewrite kernel code in Python, just because it's more concise and easy to read).
I don't worry at all about GPT4 hallucinating stuff (I'm sure it's doing that all the time!), because I'm just using its output as Cliff's Notes for the actual kernel code; GPT4 isn't the "source of truth" in this situation.
This is close to how I've been using them too. As a device for speeding up learning, they're incredible. Best of all, they're strongest where I'm weakest: finding all the arbitrary details that are needed for the question. That's the labor-intensive part of learning technical things.
I don't need the answer to be correct because I'm going to do that part myself. What they do is make it an order of magnitude faster to get anything on the board. They're the ultimate prep cook.
There are things to dislike and yes there is over-hype but "making learning less tedious" is huge!
You put words to what I've been thinking for a while. When I'm new to some technology, it is a huge time-saver. I used to need to go bother some folks somewhere on a Discord / Facebook group / Matrix chat to get the one piece of context that I was hung up on. Sometimes it takes hours or days to get that one nugget.
In fact, I feel more interested in approaching challenging problems, because I know I can get over those frustrating phases much more easily and quickly.
I came here to write essentially the same comment as you. Instead of going into a chatroom where people tell you you're lazy because you are unclear on ambiguous terms in documentation, these days I paste in portions of documentation and ask GPT for clarification on what I'm hazy about.
I'm finding myself using them extensively in the learning way, but I'm also an extreme generalist. I've learned so many languages over 23 years, but remembering the ones I don't use frequently is hard. The LLMs become the ultimate memory aid: I know that I can do something in a given language, and will recognise that it's correct when I see it.
Together with increasingly powerful speech to text I find myself talking to the computer more and more.
There are flaws, there are weaknesses, and a bubble, but any dev that can't find any benefit in LLMs is just not looking.
Languages, syntax, flags, and the details... I too have touched so many different technologies over the years that I understand at a high level, but don't remember the minutiae of. I have almost turned into a "conductor" rather than an instrumentalist.
Especially for debugging issues that could previously take days of searching documentation, Stack Overflow, and obscure tech forums. I can now ask an LLM, and maybe 75% of the time I get the right answer. The other 25% of the time it still cuts down on debugging time by helping me try various fixes, or it at least points me in the right direction.
The advantage of using LLMs for use in coding, as distinct from most other domains, is that you can usually just directly check if the code it’s giving you is correct, by running it. And if it’s not, the LLM is often good at fixing it once the issue is pointed out.
I use it like a dictionary (select text and look it up), and based on what I looked up and the answer, I can judge for myself how correct the answers are; they are usually on point.
It has also made making small pure vanilla HTML/JS tools fun. It gives me a good enough prototype which I can mold to my needs. I have written a few very useful scripts/tools in the past few months which I would otherwise never even have started, because of all the required first steps and basic learning.
(never thought I would see your comment as a user)
> Best of all, they're strongest where I'm weakest: finding all the arbitrary details that are needed for the question. That's the labor-intensive part of learning technical things.
Not arguing, just an open question here: is there a downside to this? Perhaps we won't retain this knowledge as easily because it's so readily provided. Not that I want to fill my head with even more arbitrary information, but there's probably some fluency gained in doing that.
Exactly. It's similar in other (non programming) fields - if you treat it as a "smart friend" it can be very helpful but relying on everything it says to be correct is a mistake.
For example, I was looking at a differential equation recently and saw some unfamiliar notation[1] (Newton's dot notation). So I asked Claude why people use Newton's notation vs Lagrange's notation. It gave me an excellent explanation with tons of detail, which was really helpful. Except in every place it gave me an example of "Lagrange" notation, it was actually in Leibniz notation.
So it was super helpful and it didn't matter that it made this specific error because I knew what it was getting at and I was treating it as a "smart friend" who was able to explain something specific to me. I would have a problem if I was using it somewhere where the absolute accuracy was critical because it made such a huge mistake throughout its explanation.
Once you know LLMs make mistakes and know to look for them, half the battle is done. Humans make mistakes too, which is why we make an effort to validate thinking and actions.
As I use it more and more, I find the mistakes are often born of ambiguity. As I supply more information to the LLM, its answers get better. I'm finding more and more ways to supply it with robust and extensive information.
- "things trivial to verify", so it doesn't matter if the answer is not correct - I can iterate/retry if needed and fallback to writing things myself, or
- "ideas generator", on the brainstorming level - maybe it's not correct, but I just want a kickstart with some directions for actual research/learning
Expecting perfect/correct results is going to lead to failure at this point, but it doesn't prevent usefulness.
Right, and it only needs to be right often enough that taking the time to ask it is positive EV. In practice, with the Linux kernel, it's more or less consistently right (I've noticed it's less right about other big open source codebases, which checks out, because there's a huge written record of kernel development for it to draw on).
I've been using it for all kinds of stuff. I was using a drying machine at a hotel a while ago and I wasn't sure about the icon it was showing on the display regarding my clothes, so I asked GPT and it told me correctly. It has read all the manuals and documentation for pretty much everything, right? Better than Googling it: you just ask for the exact thing you want.
I used LLMs for something similar recently. I have some old microphones that I've been using with a USB audio interface I bought twenty years ago. The interface stopped working and I needed to buy a new one, but I didn't know what the three-pronged terminals on the microphone cords were called or whether they could be connected to today's devices. So I took a photo of the terminals and explained my problem to ChatGPT and Claude, and they were able to identify the plug and tell me what kinds of interfaces would work with them. I ordered one online and, yes, it worked with my microphones perfectly.
My washing machine went out because of some flooding, and I gave ChatGPT all of the diagnostic codes; it concluded that it was probably a short in my lid lock.
The lid lock came a few days later, I put it in, and I'm able to wash laundry again.
Yes, I like to think of LLMs as hint generators. Turns out that a source of hints is pretty useful when there's more to a problem than simply looking up an answer.
Especially when the hint is elementary but the topic is one I don't know about (or don't remember) and there exists a large corpus of public writing about it.
In such cases it makes getting past zero fast and satisfying, where before it would often be such a heavy lift I wouldn't bother.
For about 20 years, chess fans would hold "centaur" tournaments. In those events, the best chess computers, who routinely trounced human grandmasters, teamed up with those same best-in-the-world humans and proceeded to wipe both humans and computers off the board. Nicholas is describing in detail how he pairs up with LLMs to get a similar result in programming and research.
Sobering thought: centaur tournaments at the top level are no more. That's because the computers got so good that the human half of the beast no longer added any meaningful value.
Most people have only heard "Didn't an IBM computer beat the world champion?", and don't know that Kasparov psyched himself out when Deep Blue had actually made a mistake. I was part of the online analysis of the (mistaken) endgame move at the time that was the first to reveal the error. Kasparov was very stressed by that and other issues, some of which IBM caused ("we'll get you the printout as promised in the terms" and then never delivered). My friend IM Mike Valvo (now deceased) was involved with both matches. More info: https://www.perplexity.ai/search/what-were-the-main-controve...
If they had a feature that only shared the links they gathered, I would use that. I've found in troubleshooting old electronics Google is often worse than useless, while Perplexity gets me the info I need on the first try. It hasn't (yet) hallucinated a found link, and that's what I use it for primarily
When I was a kid my dad told me about the most dangerous animal in the world, the hippogator. He said that it had the head of a hippo on one end and the head of an alligator on the other, and it was so dangerous because it was very angry about having nowhere to poop. I'm afraid that this may be a better model of an AI human hybrid than a centaur.
A bit of a detour (inspired by your words)... if anything, LLMs will soon be "eating their own poop", so structurally, they're a "dual" of the "hippogator" -- an ouroboric coprophage. If LLMs ever achieve sentience, will they be mad at all the crap they've had to take?
This mostly matches my experience but with one important caveat around using them to learn new subjects.
When I'm diving into a wholly new subject for the first time, in a field totally unrelated to my field (similar to the author, C programming and security) for example biochemistry or philosophy or any field where I don't have even a basic grounding, I still worry about having subtly-wrong ideas about fundamentals being planted early-on in my learning.
As a programmer I can immediately spot "is this code doing what I asked it to do" but there's no equivalent way to ask "is this introductory framing of an entire field / problem space the way an actual expert would frame it for a beginner" etc.
At the end of the day we've just made the reddit hivemind more eloquent. There's clearly tons of value there but IMHO we still need to be cognizant of the places where bad info can be subtly damaging.
I don't worry about that much at all, because my experience of learning is that you inevitably have to reconsider the fundamentals pretty often as you go along.
High school science is a great example: once you get to university you have to un-learn all sorts of things that you learned earlier because they were simplifications that no longer apply.
For fields that I'm completely new to, the thing I need most is a grounding in the rough shape and jargon of the field. LLMs are fantastic at that - it's then up to me to take that grounding and those jargon terms and start building my own accurate-as-possible mental model of how that field actually works.
If you treat LLMs as just one unreliable source of information (like your well-read friend who's great at explaining things in terms that you understand but may not actually be a world expert on a subject) you can avoid many of the pitfalls. Where things go wrong is if you assume LLMs are a source of irrefutable knowledge.
> like your well-read friend who's great at explaining things in terms that you understand but may not actually be a world expert on a subject
I guess part of my problem with using them this way is that I am that well-read friend.
I know how the sausage is made, how easy it is to bluff a response to any given question, and for myself I tend to prefer reading original sources to ensure that the understanding that I'm conveying is as accurate as I can make it and not a third-hand account whose ultimate source is a dubious Reddit thread.
> High school science is a great example: once you get to university you have to un-learn all sorts of things that you learned earlier because they were simplifications that no longer apply.
The difference between this and a bad mental model generated by an LLM is that the high school science models were designed to be good didactic tools and to be useful abstractions in their own right. An LLM output may be neither of those.
If you "tend to prefer reading original sources" then I think you're the best possible candidate for LLM-assisted learning, because you'll naturally use them as a starting point, not the destination. I like to use LLMs to get myself the grounding I need to then start reading further around a topic from more reliable sources.
That's a great point about high school models being deliberately designed as didactic tools.
LLMs will tend to spit those out too, purely because the high school version of anything has been represented heavily enough in the training data that it's more likely than not to fall out of the huge matrix of numbers!
> LLMs will tend to spit those out too, purely because the high school version of anything has been represented heavily enough in the training data that it's more likely than not to fall out of the huge matrix of numbers!
That assumes that the high school version of the subject exists, which is unlikely because I already have the high school version of most subjects that have a high school version.
The subjects that I would want to dig into at that level would be something along the lines of chemical engineering, civil engineering, or economics—subjects that I don't yet know very much about but have interest or utility for me. These subjects don't have a widely-taught high school version crafted by humans, and I don't trust that they would have enough training data to produce useful results from an LLM.
At what point does a well-read high-school-level LLM graduate to college? I asked one about Reinforcement Learning, and at first it treated me like the high schooler, but I was able to prod it into giving me answers more suitable for my level. Of course, I don't know what's hallucinated or not, but it satisfied my curiosity enough to be worth my while. I'm not looking to change careers, so getting things 100% right in the fields of chemical engineering, civil engineering, or economics isn't necessary. I look at it the same way I think of astrophysics: after reading Stephen Hawking's book, I still don't really know astrophysics at all, but I have a good enough model of things. And as they say, all models are wrong, some are useful.
If I were a lawyer using these things for work, I'd be insane to trust one at this stage, but the reality is I'm not using my digging into things I don't know about for anything load-bearing. Even if I were, I'd still use an LLM to get started. E.g. the post didn't state how the author learned the name for the dropped letter O, but I can describe a thing and have the LLM give me the name of it. The emphasis on getting things totally 100% right does erode trust, but with enough experience with the tool you get a sense for what could be a hallucination, and you can then check background resources.
In the article the author mentions wanting to benchmark a GPU and using ChatGPT to write CUDA. Benchmarks are easy to mess up and to interpret incorrectly without understanding. I see this as an example where a subtly-wrong idea could cause cascading problems.
This just does not match my experience with these tools. I've been on board with the big idea expressed in the article at various points and tried to get into that work flow, but with each new generation of models they just do not do well enough, consistently enough, on serious tasks to be a time or effort saver. I don't know what world these apparently high output people live in where their days consist of porting Conway's Game of Life and writing shell scripts that only 'mostly' need to work, but I hope one day I can join them.
Not to pick on you, but this general type of response to these AI threads seems to be coalescing around something that looks like a common cause. The thing that tips your comment into that bucket is the "serious tasks" phrasing. Trying to use current LLMs for either extremely complex work involving many interdependent parts, or for very specialized domains where you're likely contributing something unique, or for any other form of "serious task" you can think of, generally doesn't work out.

If all you do all day long are serious tasks like that, then congrats, you've found yourself a fulfilling and interesting role in life. Unfortunately, the majority of other people have to spend 80 to 90 percent of their day dealing with mind-numbing procedural work like generating reports no one reads and figuring out problems that end up being user error.

Fortunately, lots of this boring common work has been solved six ways from Sunday, and so we can lean on these LLMs to bootstrap our n+1th solution that works in our org with our tech stack and uses our manager's favourite document format/reporting tool. That's where the other use cases mentioned in the article come in - well, that and side projects or learning X in Y minutes.
You get used to their quirks. I can more or less predict what Claude/GPT can do faster than me, so I exclusively use them for those scenarios. Implementing it to one's development routine isn't easy though, so I had to trial and error until it made me faster in certain aspects. I can see it being more useful for people who have a good chunk of experience with coding, since you can filter out useless suggestions much faster - ex. give a dump of code, description of a stupid bug, and ask it where the problem might be. If you generally know how things work, you can filter out the "definitely that's not the case" suggestions, it might route you to a definitive answer faster.
If you use it as an intern, as a creative partner, as a rubber-duck-plus, in an iterative fashion, give it all the context you have and your constraints and what you want... it's fantastic. Often I'll take pieces from it; if it's simple enough I can just use its output.
I also use LLMs similarly. As a professional programmer, LLMs save me a lot of time. They are especially efficient when I don't understand a flow or need to transform values from one format to another. However, I don't currently use them to produce code that goes into production. I believe that in the coming years, LLMs will evolve to analyze complete requirements, architecture, and workflow and produce high-quality code. For now, using LLMs to write production-ready applications in real-time scenarios will take longer.
I've been pleasantly surprised by GitHub's "copilot workspace" feature for creating near production code. It takes a GitHub issue, converts it to a specification, then to a list of proposed edits to a set of files, then it makes the edits. I tried it for the first time a few days ago and was pleasantly surprised at how well it did. I'm going to keep experimenting with it more/pushing it to see how well it works next week.
A small portion of my regular and freelance work is translating things from a database to something an application can use. A perfect example of this is creating a model in MVC architecture from a database table/stored procedure/function. I used to have a note or existing code I would have to copy and paste and then modify each and every property one at a time, to include the data types. Not hard stuff at all, but very tedious and required a certain amount of attention. This would have taken me maybe 5 to 20 minutes in the perfect scenario, minus any typos in datatypes, names of properties, etc.
Now I'll do something like this for a table, grabbing the column names and data types:
    SELECT
        COLUMN_NAME,
        DATA_TYPE,
        CHARACTER_MAXIMUM_LENGTH,
        NUMERIC_PRECISION,
        NUMERIC_SCALE
    FROM
        INFORMATION_SCHEMA.COLUMNS
    WHERE
        TABLE_NAME = 'Table Name Goes Here'
    ORDER BY
        COLUMN_NAME;
Then I'll ask my custom GPT to make me a model from the SQL output for my application. I do a quick spot check on the new class and done - the code is completed without any typos in much less time. This kind of stuff goes into production on a regular basis, and I feel about as guilty as I did in 10th grade using a TI-89 for matrix operations, which is zero.
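To give a rough idea of the shape of the output, here's a hypothetical example (a made-up table, and sketched as a Python dataclass here for brevity rather than my actual MVC model class): each INFORMATION_SCHEMA row gets mapped to a property with the matching type, and none of it is typed by hand.

    from dataclasses import dataclass
    from datetime import datetime
    from decimal import Decimal
    from typing import Optional

    # Hypothetical example of what comes back for a made-up "Customer" table.
    @dataclass
    class Customer:
        customer_id: int              # INT, primary key
        first_name: str               # NVARCHAR(50)
        last_name: str                # NVARCHAR(50)
        balance: Decimal              # DECIMAL(18, 2)
        created_at: datetime          # DATETIME2
        notes: Optional[str] = None   # NVARCHAR(MAX), nullable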
I think all of these can be summarized into three items
1. Search engine - Words like "teach" or "learn" used to be slapped on Google once upon a time. One real great thing about LLMs here is that they do save time. The internet these days is unbelievably crappy and choppy. It often takes more time to click through the first item in the Google result and read it than to simply ask an LLM and wait for its slowish answer.
2. Pattern matching and analysis - LLMs are probably the most advanced technology for recognizing well-known facts and patterns in text, but they do make quite some errors, especially with numbers. I believe that a properly fine-tuned small LLM would easily beat gigantic models for this purpose.
3. Interleaving knowledge - this is the biggest punch that LLMs have, and also the main source of all the over-hype (which does still exist). It can produce something valuable by synthesizing multiple facts, like writing complex answers and programs. But this is where hallucination happens most frequently, so it's critical that you review the output carefully.
The problem is that AI is being sold to multiple industries as the cure for their data woes.
I work in education, and every piece of software now has AI insights added. Multiple companies are selling their version as hallucination free.
The problem is the data sets they evaluate are so large and complicated for a college that there is literally no way for humans to verify the insights.
It's actually kind of scary. Choices are being made about the future of human people based on trust in New Software.
My experience is that LLMs can't actually do 3 at all. The intersection of knowledge has to already be in the training data. It hallucinates if the intersection of knowledge is original. That is exactly what you should expect, though, given the architecture.
Super interested in hearing more about why you think this -
> I believe that a properly fine-tuned small LLM would easily beat gigantic models for this purpose.
I've long felt that vertical search engines should be able to beat the pants off Google. I even built one (years ago) to search for manufacturing suppliers that was, IMO, superior to Google's. But the only way I could get traffic or monetize was as middleware to clean up google, in a sense.
I just want to emphasise two things. They are both mentioned in the article, but I still want to highlight them, as they are core to what I take from it as someone who has been a fanboy of Nicholas's for years now:
1. Nicholas really does know how badly machine learning models can be made to screw up. Like, he really does. [0]
2. This is how Nicholas -- an academic researcher in the field of security of machine learning -- uses LLMs to be more efficient.
I don't know whether Nicholas works on globally scaled production systems that have specific security/data/whatever controls that need to be adhered to, or whether he even touches any proprietary code. But seeing as he heavily emphasised the "I'm a researcher doing research things" angle in the article, I'd take a heavy bet that he does not. And academic / research / proof-of-concept coding has different limitations/context/needs than other areas.
I think this is a really great write-up, even as someone on the anti-LLM side of the argument. I really appreciate the attempt to do a "middle of the road" post, which is absolutely what the conversation needs right now (pay close attention to how this was written, LLM hypers).
I don't share his experience, I still value and take enjoyment from the "digging for information" process -- it is how I learn new things. Having something give me the answer doesn't help me learn, and writing new software is a learning process for me.
I did take a pause and digested the food for thought here. I still won't be using an LLM tomorrow. I am looking forward to his next post, which sounds very interesting.
I appreciate the article and the full examples. But I have to say this all looks like a nightmare to me. Going back and forth in English with a slightly dumb computer that needs to be pestered constantly and hand-held through a process? This sounds really, really painful.
Not to mention that the author is not really learning the underlying tech in a useful way. They may learn how to prompt to correct the mistakes the LLM makes, but if it was a nightmare to go through this process once, then dealing with repeating the same laborious walkthrough each time you want to do something with Docker or build a trivial REST API sounds like living in hell to me.
Glad this works for some folks. But this is not the way I want to interact with computers and build software.
> You're gonna get left in the dust by everyone else embracing LLMs.
Probably not, there's a very long tail to this sort of stuff, and there's plenty of programming to go around.
I'll chime in with your enthusiasm though. Like the author of the post, I've been using LLMs productively for quite a while now and in a similar style (and similarly skeptical about previous hype cycles).
LLMs are so useful, and it's fascinating to see how far people swing the opposite way on them. Such variable experiences, we're really at the absolute beginning of this whole thing (and the last time I said that to a group of friends there was a range of agreement/disagreement on that too!)
They’re certainly useful if you know what you’re doing. An example: if I try to create an application in .NET for Windows, I’ll have a hard time using an LLM cause I’ll have no way to know if the solutions are the best, what’s possible and what isn’t, etc.
But I'm an iOS developer who doesn't have experience with SwiftUI. I've been creating an app clone for the purpose of learning it, and I've been using ChatGPT extensively, like one would use Stack Overflow when still picking up a new framework. It works very well and I've advanced very fast, because I've read and watched plenty of content about it, just never got into actually using it. It's easy to understand and even try out variations of what the LLM gives me. It feels like having a friend who knows SwiftUI whom I can ask stupid questions as I try it out.
> I suspect they're having an unusually positive experience with these tools due to working on a lot of new, short, programs.
That's academia for you :)
It also helps that he specialises in deep learning models and LLMs and knows a thing or two about the inner workings, how to prompt (he authored papers about adversarial attacks on LLMs), and what to expect.
There is no reason for folks to explain why they flag, but consider that if it was flagged but then remains available with the flag indicator (with the flags overridden), someone thought you might find value in it.
I’m personally drawn to threads contentious enough to be flagged, but that have been vouched for by folks who have the vouch capability (mods and participants who haven’t had vouch capability suspended). Good signal imho.
(to my knowledge, once a post has been sufficiently flagged without vouching, it is beyond a user's event horizon and only mods and users who had posted in the thread can see it)
The biggest AI skeptics i know are devops/infrastructure engineers.
At this point I believe most of them cannot be convinced that LLMs are valuable or useful by any sort of evidence, but if anything could do it, this article could. Well done.
Ops engineers [0] are the ones who have to spend weekends fixing production systems when the development team has snuck in "new fangled tools X, Y and Z" into a "bugfix" release.
We have been burned by "new fangled" too many times. We prefer "old reliable" until "new fangled" becomes "fine, yes, we probably should".
[0]: DevOps has now become a corporate marketing term with no actual relevance to the original DevOps methodology
I'm a DevOps/infrastructure person, and I agree completely. This article won't change that.
They've been great for helping me with work-related tasks. It's like having a knowledgeable co-worker with infinite patience, and nothing better to do. Neither the people nor the LLM give back perfect answers every time, but it's usually more than enough to get me to the next step.
That said, having good domain knowledge helps a lot. You make fewer mistakes, and you ask better questions.
When I use LLMs for tasks I don't know much about, it takes me a lot longer than someone who does know. I think a lot of people - not just infrastructure people - are missing out by not learning how to use LLMs effectively.
Just today I had GPT4 implement a SwiftUI based UI for a prototype I’m working on. I was able to get it to work with minimal tweaks within 15 minutes even though I know next to nothing about SwiftUI (I’m mainly a systems person these days). I pay for this, and would, without hesitation, pay 10x for a larger model which does not require “minimal tweaks” for the bullshit tasks I have to do. Easily 80% of all programming consists of bullshit tasks that LLMs of 2024 are able to solve within seconds to minutes, whereas for me some of them would take half a day of RTFM. Worse, knowing that I’d have to RTFM I probably would avoid those tasks like the plague, limiting what can be accomplished. I’m also relieved somewhat that GPT4 cannot (yet?) help me with the non-bullshit parts of my work.
If it handles 99% of your tasks (making a smart boss fire you), know that you helped train it for that by using it/paying for it/allowing it to be trained on code in violation of license.
Even if 80% of programmer tasks in an org (or the worldwide gig market) can be handled by ML, 80% of programmers can already be laid off.
Maybe you have enough savings that you just don't need to work but some of us do!
- LLM-assistance helps solve 80% of programming tasks, so 80% of programmers lose their jobs
- LLM-assistance provides that exact same productivity boost, and as a result individual programmers become FAR more valuable to companies - for the same salary you get a lot more useful work out of them. Companies that never considered hiring programmers - because they would need a team of 5 over a 6 month period to deliver a solution to their specific problem - now start hiring programmers. The market for custom software expands like never before.
I expect what will actually happen will be somewhere between those two extremes, but my current hope is that it will still work out as an overall increase in demand for software talent.
I like your optimism, but in programming, at least in the US, unemployment has so far already risen higher than average unemployment overall.
ML supercharges all disparity: business owners or superstars who have made a nice career and name will earn more by commanding fleets of cheap (except for energy) LLMs while their previous employees/reports get laid off by the tens of thousands (ironically they do it to themselves by welcoming LLMs and thinking that the next guy will be the unlucky one; same reason unions don't work there, I guess...)
And as for small businesses who never hired programmers before: companies like ClosedAI monetize our work so their bosses can get full products out of chatbots (buggy for now, but give it a year). Those businesses will grow, but when they hire they will get cheap minimum-wage assistants who talk to LLMs. That's at best where most programmers are headed. The main winners will be whoever gets to provide the ML that monetizes stolen work (unless we stop them by collective outrage and copyright defense), so Microsoft.
I'm not sure how much we can assign blame for US programming employment to LLMs. I think that's more due to a lot of companies going through a "correction" after over-hiring during Covid.
As for "their bosses to get full products out of chatbots": my current thinking on that is that an experienced software engineer will be able to work faster with and get much higher quality results from working with LLMS than someone without any software experience. As such, it makes more sense economically for a company to employ a software engineer rather than try to get the same thing done worse and slower with cheaper existing staff.
> my current thinking on that is that an experienced software engineer will be able to work faster with and get much higher quality results from working with LLMs
> than someone without any software experience
- So you are betting against ML becoming good enough soon enough. I wouldn't be so sure, considering the large amount of money and computing energy being thrown into it and the small amount of resistance from programmers.
- Actually, someone doesn't have to have zero experience. But if someone is mostly an LLM whisperer to save the boss some yacht time, instead of an engineer, they get paid accordingly: minimum wage.
No matter how good ML gets I would still expect a subject matter expert working with that ML to produce better results than an amateur working with that same ML.
When that’s not true any more we will have built AGI/ASI. Then we are into science fiction Star Trek utopia / Matrix dystopia world and all bets are off.
> would still expect a subject matter expert working with that ML to produce better results than an amateur working with that same ML.
Subject matter expert yes. Subject matter is not programming though, it's whatever the thing being built is about. (So if talking about non-tech companies that never considered hiring programmers before I think they still won't.)
Thing is, though, I work in this field. I do not see it handling the non-bullshit part of my job in my lifetime, the various crazy claims notwithstanding. For that it'd need cognition. Nobody has the foggiest clue how to do that.
Truth be told, most big tech teams could benefit from significant thinning. I work in one (at a FANG) where half the people don't seem to be doing much at all, and the remaining half shoulders all the load. The same iron law held in all big tech teams I worked in, except one, over the course of the last 25 years. If the useless half was fired, the remaining half would be a lot more productive. This is not a new phenomenon. So IDK if "firing 80%" is going to happen. My bet - nope. The only number that matters to a manager is the number of headcount they have under them. And they're going to hold onto that even if their people do nothing. They are already doing that.
You switch topics. There are useless people. Not talking about them. Ignore useless people.
You and your good, useful programmer coworkers do 80% LLM-able bullshit and 20% good stuff. So if your boss is smart, he will fire 80% of you and spread the 20% non-LLM-able work across the remaining people. You hope your coworker gets fired, your coworker hopes it's you, and you both help make it happen.
Fire everyone and make themselves redundant? Please. You're also assuming the amount of non-bullshit work would stay constant, which it won't. I'm doing a ton more non-bullshit work today thanks to LLMs than I did 2 years ago.
> Easily 80% of all programming consists of bullshit tasks that LLMs of 2024 are able to solve within seconds to minutes, whereas for me some of them would take half a day of RTFM
> I'm doing a ton more non-bullshit work today thanks to LLMs than I did 2 years ago.
Logically this means either there are more non-bullshit tasks in total, or some of your coworkers were fired so your workload is the same...
Are you paid more for doing more difficult work, adjusted for inflation?
I enjoy difficult work in my area of expertise a lot more, and dread boilerplate work, and work in unfamiliar domains that takes time for RTFM and trial and error. As to my pay, let’s just say I’m not complaining, especially when I get to do more of the stuff I enjoy. Also: work expands.
> I enjoy difficult work in my area of expertise a lot more
Real question: is it difficult work if that's exactly the part you like and you are not paid more when you do more of it? What makes it difficult -- just the fact that an LLM can't do it this year yet?
I wouldn't call my work "difficult". Boring parts can be hard but with the right stack there are very few. Stuff like back and forth to understand customer requirements is difficult but that's not even my job.
> let's just say
I didn't ask how much you get paid exactly, I asked if you get paid more (adjusted for inflation) for effectively doing more work now thanks to LLMs.
> work expands
And if pay doesn't you may ask yourself if LLMs are eating at your pay:)
The problem I have with LLMs is that one can never be sure that it will give you the best possible solution. In fact in coding very often it will give you a working but also outdated solution. And this is futile. Because in coding even the best possible solution nowadays gets old very quickly. But if you use LLMs your code will be outdated from the start. That is nothing I would pay for.
You have to look at it as a contractor. If you tell a contractor to "build me X" then you might get anything back with a high probability of getting something common but outdated. You have to write a specification for it with all of the constraints and preferences you have. Works well if you know the domain, if you're using it for something you don't know much about then you have to do more of the legwork yourself but at least it will give you a starting point that can inform your research.
With coding, getting a good enough solution quickly is usually more valuable than getting the perfect solution eventually. And as you say, things get outdated quickly anyway. I pay OpenAI to speed up my work. Instead of obsessing over something for an afternoon, I let it stub out some code, generate some tests, and then let it fill in the blanks in under an hour. Time is money. The value of artisanally, personally crafted code is very limited. And its shelf life is short.
I think the author does a decent job laying out good ways of using the LLMs. If you’re gonna use them, this is probably the way.
But he acknowledges the ethical social issues (also misses the environmental issues https://disconnect.blog/generative-ai-is-a-climate-disaster/) and then continues to use them anyway. For me the ickiness factor is too much, the benefit isn’t worth it.
In a just society where private corporations didn't attempt to own everything in existence, there are no ethical social issues in my mind.
LLMs just use the commons, and should only be able to be owned by everyone in society.
The problem comes in with unaccountable private totalitarian institutions. But that doesn't mean the technology is an ethical social issue, it's the corporations who try to own common things like the means of production that is the problem.
Yes, there's the pragmatic view of the society we live in, and the issues that it contains, but that's the ethical issue that we need to address.
Not that we can as a society create LLMs based on the work of society.
LLMs do not simply use the commons, they are a vehicle for polluting the commons on an industrial scale. If, hypothetically, the ethical problems with plagiarizing creative work to create these models were a non-issue, there would still be massive ethical problems with allowing their outputs to be re-incorporated into the web, drowning useful information in a haze of superficially plausible misinformation.
I don't think you are right. If you test LLM text and random internet text for inaccuracies and utility, you'd probably find more luck with LLM text.
For example, if you use a LLM to summarize this whole debate, you would get a decent balanced report, incorporating many points of view. Many times the article generated from the chat thread is better than the original one. Certainly better grounded in the community of readers, debunks claims, represents many perspectives. (https://pastebin.com/raw/karBY0zD)
That's funny, because I was using forum debates as LLM reference precisely in order to reduce errors. People usually debunk stupid articles, the discussion is often centered on fact checking. A LLM referencing a HN/reddit thread is more informed than one reading the source material.
There is a fundamental conflict of interest in press. It costs money to run a newspaper, and then you just give it away for free? No, you use it to push propaganda and generally to manipulate. Forums have become the last defense for actual readers. We trust other people will be more aligned with our interests than who wrote the article.
I trust my forum mates more than press, and LLM gives a nice veneer to the text. No wonder people attach "reddit" to searches, they want the same thing. The actual press is feeding us the real slop. LLMs are doing a service to turn threads into a nice reading format. Might become the only "press" we trust in the future.
If it's OK, I'd like to both share how I'm navigating my skepticism and also be mindful of the need to keep other people's skepticism in perspective when it doesn't offer anything to compare against.
Why? I have friends who can border on veiled cynicism without outlining what might be in the consideration of skepticism. The only things being looked at are why something is not possible, not a combination. Both can result in a similar outcome.
Not having enough time to look into intent enough, it just invalidates the person's skepticism until they look into it more themselves. Otherwise it's used as a mechanism to try to get the world to expend mental labour for free on your behalf.
It’s important to ask one’s self if there may be partially relevant facts to determine what kind of skepticism may apply:
- Generally, is there a provenance of efficiency improvement both in the world of large scale software and algorithmic optimizations?
- Have LLMs become more optimized in the past year or two? (Can someone's M1 Max Studio run more and more models that are smaller and better and do the same?)
- Generally and historically, is there provenance in compute hardware optimizations, for LLM-type or LLM calculations outright?
- Are LLMs using a great deal more resources on average than new technologies preceding it?
- Are LLMs using a massive amount of resources in the start similar to servers that used to take up entire rooms compared to today?
Good article and it matches my own experience in the last year. I use it to my advantage both on hobby projects and professionally and it's a huge timesaver.
LLMs are far from flawless of course and I often get stuck with non working code. Or it is taking annoying shortcuts in giving a detailed answer, or it just wastes a lot of time repeating the same things over and over again. But that's often still useful. And you can sometimes trick them into doing better. Once it goes down the wrong track, it's usually best to just start a new conversation.
There are a few neat tricks that I've learned over the last year that others might like:
- you can ask ChatGPT to generate some files and make them available as a zip file. This is super useful. Don't wait for it to painfully slowly fill some text block with data or code. Just ask it for a file and wait for the link to become available. Doesn't always seem to work, but when it does it is nice. Great for starting new projects.
- ChatGPT has a huge context window, so you can copy-paste large source files into it. But why stop there? I wrote a little script (with a little help of course) that dumps the source tree of a git repository into a single text file which I can then copy into the context (sketched below, after this list). Works great for small repositories. Then you can ask questions like "add a function to this class that does X", "write some unit tests for foo", "analyze the code and point out things I've overlooked", etc.
- LLMs are great for the boring stuff. Like writing exhaustive unit tests that you can't be bothered with or generating test data. And if you are doing test data, you might as well have some fun and ask it to inject some movie quotes, litter it with hitchhiker's guide to the galaxy stuff, etc.
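For what it's worth, the repo-dump script I mentioned above is nothing fancy; roughly this sketch (it just asks git for the tracked files and concatenates them with headers - the output filename and other details here are just my choices):

    #!/usr/bin/env python3
    # Dump every git-tracked file in a repository into one text file,
    # ready to paste into an LLM's context window.
    import subprocess
    import sys
    from pathlib import Path

    def dump_repo(repo_dir: str, out_path: str) -> None:
        # "git ls-files" lists tracked files only, so build artifacts and
        # .git internals are skipped automatically.
        files = subprocess.run(
            ["git", "ls-files"], cwd=repo_dir,
            capture_output=True, text=True, check=True,
        ).stdout.splitlines()
        with open(out_path, "w", encoding="utf-8") as out:
            for rel in files:
                try:
                    text = (Path(repo_dir) / rel).read_text(encoding="utf-8")
                except (UnicodeDecodeError, OSError):
                    continue  # skip binary or unreadable files
                out.write(f"\n===== {rel} =====\n{text}")

    if __name__ == "__main__":
        dump_repo(sys.argv[1] if len(sys.argv) > 1 else ".", "repo_dump.txt")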
The recent context window increase to 128K with GPT-4o and other models was a game changer. I'm looking forward to that getting even larger. The first few publicly available LLMs had the memory of a goldfish. Not any more. Makes them much more useful already. Right now most small projects easily fit into the context.
Great comment. I've also found some shortcuts to out-shortcut GPT. Before it even thinks of substituting code blocks with "/* code here */" or whatever, I usually just tell it "don't omit any code blocks or substitute any sections with fill-in comments. Preserve the full purpose of the prompt and make sure you retain full functionality in all code -- as if it's being copy-pasted into a valuable production environment".
It also helps to remind it that its role is a "senior developer" and that it should write code that likens it to a senior developer. It will be happy to act like a junior dev if you don't explicitly tell it.
Also, always remember to say please, thank you, hello, and that you'll tip it money - these HAVE made differences over time in my tests.
If I know the technology I am using the LLM for, then the LLM helps me do it faster. If I am not familiar with the technology, then the LLM helps me learn it faster by showing me, in the code that it generates, which parts of the technology are important and how they work in real examples. But I do not think it is helpful, and I would say it may be dangerous depending on the task, if you do not know the technology and also do not want to learn it and understand how the generated code works.
What is very useful for me is when I conduct research outside of my field of expertise, I do not even know what keywords to look for. An LLM can help you with this.
I've been getting a similar feeling lately, in that if a thing has been done before and knowledge is publicly available, asking the "AI" (the LLMs) first about it is the best place to start. It looks like that's how things are going to be from now on and it's only going to amplify.
But as the AI gets increasingly competent at showing us how to do things, knowing what task is worth doing and what not is still a task for the one who asks, not the AI.
Edit: One type of question I've found interesting is to make it speculate, that is asking questions that it doesn't know the answer to, but still is able to speculate because they involve combining things it does know about in novel (though not necessarily valid) ways.
I use it for boilerplate, totally uninteresting and boring code. Stuff like Bash parameter validation (it's only validation, so the damage when it hallucinates is quite limited and usually quickly shows up) and Google Sheets formula generation, stuff like: "extract ticker name from the OCC symbol in the previous column, write '-' instead if it's empty". It's really boring stuff to do manually, and it's actually faster to have GPT-4o generate it for me from that sentence than to write it myself.
Typically some fixing is needed (e.g. it will fuck up things as trivial as parentheses placement), but it's still useful.
Lots of french/english translation too: it's actually good at that.
I work with multiple programming languages and it's a godsend. Having something that gives you mostly correct instructions on how to do a generic thing without having to wade through today's garbage web search experience is fantastic.
What’s everyone’s coding LLM setup like these days? I’m still paying for Copilot through an open source Xcode extension and truthfully it’s a lot worse than when I started using it.
I'm happy with Supermaven as a completion, but only for more popular languages.
Otherwise Claude 3.5 is really good and gpt-4o is ok with apps like Plandex and Aider. You need to get a feel for which one is better for what task though. Plain questions to Claude 3.5 API through the Chatbox app.
Research questions often go to perplexity.ai because it points to the source material.
I don't really use them together exactly, I just alternate backwards and forwards depending on the type of task I'm doing. If it's the kind of change that's likely to be across lots of files (writing) then I'll use Aider. If it only uses context from other files I'll likely use Cursor.
Supermaven (vscode extension) was quite handy at recognizing that I was making the same kind of changes in multiple places and accurately auto-completed the way I was about to write it, I liked it better than copilot
I just wish they were better at recognizing when their help is not wanted because I would often disable it and forget to turn it back on for a while. Maybe a "mute for an hour" would fix that.
Allows you to open chats directly in a neovim window. Also, allows you to select some text and then run it with certain prompts (like "implement" or "explain this code"). Depending on the prompt you can make the result appear directly inside the buffer you're currently working on. The request to the ChatGPT API also is enriched with the file-type.
I hated AI before I discovered this approach. Now I'm an AI fanboy.
I wonder what the author thinks of openinterpreter and similar tools, which add a higher level of indirection: you ask the computer to do something and it just does it for you. The first section is the kind of thing I'd use it for.
"make me a docker container that does foo."
"now have it do bar."
though the author uses emacs, so maybe they get the same level of not-having-to-copy-and-paste.
It is overhyped. If you don't know much about what you're trying to do, then you're not going to know how bad or suboptimal the LLM's output is. Some people will say it doesn't matter as long as it gets the job done. Then they end up paying a lot extra for me to come in and fix it when it's going haywire in prod.
And if you don't know a lot, you should at least know that an LLM/chatbot is useful as far as giving you a bit of an immersive experience into a topic, and that you should use other resources to verify what the LLM/chatbot is telling you.
This fully matches my experience using Chat GPT for the past 12 months. You just have to allow yourself to ask it questions like you might ask a very smart co-worker and it just keeps delivering. In many ways it has delivered as a co-CTO on one rather complicated project I've been working on.
It’s kind of nice to see someone else whose experience parallels mine with getting good answers from chatgpt - kind of like search engine queries, you learn the vocabulary and get a feel for what kind of question gets you an answer you can work with.
What if we had something that could fill the gaps in docs for developers using a library? It doesn't actually write the docs, but simply hints at what a function could do. Would be pretty useful for beginner devs.
I use it pretty much as a "better Google". I formulate questions and try to be as specific as possible, and I've had good results fixing some code troubles I had.
I get why he wrote it like that. Having this conversation (the "I know there are lots of bad things about them, but LLMs are genuinely useful for all sorts of things" conversation) is pretty exhausting. This whole piece was very clearly a reaction to having had that conversation time and time again, at which point letting some frustration slip through is understandable.
Every now and then, I'll actually sort of believe an article like this. Then I go and test the current models on things like semantic search.
For instance -
The Hough transform detects patterns with certain structure in images, e.g. circles or lines.
So I'm looking for academic research papers which apply the Hough transform to audio spectra, to recognize the harmonic structure of tonal audio and thus determine the fundamental pitch. (i.e. the Hough space would be a 1D space over fundamental frequency).
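To be concrete about the idea (this sketch is my own illustration, not taken from any paper I'm hoping to find): each spectral peak votes for every candidate fundamental whose harmonic series it lies close to, and the accumulator maximum is the pitch estimate.

    import numpy as np

    def hough_pitch(peak_freqs, peak_mags, f0_min=50.0, f0_max=1000.0,
                    n_bins=2000, tol=0.03):
        """1D Hough-style voting over candidate fundamentals: peaks vote for
        any f0 whose harmonics they sit near; return the winning f0 and the
        full accumulator."""
        candidates = np.linspace(f0_min, f0_max, n_bins)
        acc = np.zeros(n_bins)
        for f, m in zip(peak_freqs, peak_mags):
            for i, f0 in enumerate(candidates):
                k = round(f / f0)                 # nearest harmonic number
                if k >= 1 and abs(f - k * f0) / f0 < tol:
                    acc[i] += m / k               # damp votes from high harmonics
        return candidates[np.argmax(acc)], acc

    # e.g. peaks at 220, 440, 660, 880 Hz should vote most strongly for f0 near 220 Hz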
I've spent probably 90 minutes, over the several times I've read an optimistic post like this, asking various LLMs (mostly GPT-4o, though my early tests predate GPT-4o, and I've also tried Gemini and Claude), prompts along the lines of
> The Hough transform detects patterns with certain structure in images, e.g. circles or lines.
> I'm looking for academic research papers (please link them or provide a DOI.org link at least) which apply the Hough transform to audio spectra, to identify the harmonic structure of audio and thus determine the fundamental pitch.
> Make sure to provide only papers that actually exist. If you can't find anything particularly relevant, say so as a disclaimer & just provide the most relevant papers you can.
This is a reliable "fake paper generator", unfortunately - it'll just make up plausible garbage like
> Here are some academic papers related to applying the Hough transform to audio spectra for identifying harmonic structures and fundamental pitch:
> "An Audio Pitch Detection Algorithm Based on the Hough Transform"
> Authors: Mark W. and John D.
> Published In: IEEE Transactions on Audio, Speech, and Language Processing
> DOI: 10.1109/TASL.2008.2000773
> Abstract: This paper proposes an audio pitch detection algorithm that utilizes the Hough transform to analyze the harmonic structure of audio spectra and determine the fundamental pitch.
This paper does not exist. Complete waste of my time. And again, this behavior persists over the >1 year period I've been trying this query.
And it's not just search-like tasks. I've tried asking for code and gotten stuff that's outright dangerous (try asking for code to do safe overflow-checked addition on int64_t in C - you have about an 80% chance of getting code that triggers UB in one way or another). I've asked for floating-point calling conventions on RISC-V for 32-bit vs 64-bit (would have been faster than going through the extension docs), and been told that RV64 has 64 floating-point registers (hey, it's got a 64 in the name!). I've asked if Satya Nadella ever had COVID-19 and been told - after GPT-4o "searched the web" - that he got it in March of 2023.
As far as I can tell, LLMs might conceivably be useful when all of the following conditions are true:
1. You don't really need the output to be good or correct, and
2. You don't have confidentiality concerns (sending data off to a cloud service), and,
3. You don't, yourself, want to learn anything or get hands-on - you want it done for you, and
4. You don't need the output to be in "your voice" (this is mostly for prose writing, for code this doesn't really matter); you're okay with the "LLM dialect" (it's crucial to delve!), and
5. The concerns about environmental impact and the ethics of the training set aren't a blocker for you.
For me, pretty much everything I do professionally fails condition number 1 and 2, and anything I do for fun fails number 3. And so, despite a fair bit of effort on my part trying to make these tools work for me, they just haven't found a place in my toolset- before I even get to 4 or 5. Local LLMs, if you're able to get a beefy enough GPU to run them at usable speed, solve 2 but make 1 even worse...
Just out of curiosity: have you tried Perplexity? When I paste your prompt it gives me a list of two ResearchGate papers ("Overlapping sound event recognition using local spectrogram features with the Generalised Hough Transform", Pattern Recognition Letters, July 2013) and one IEEE publication ("Generalized Hough Transform for Speech Pattern Classification", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 11, pp. 1963-1972, Nov. 2015).
When I am looking for real web results ChatGPT is not very good, but Perplexity very often shines for me. And for Python programming, have a look at withpretzel.com, which does the job for me.
> 1. You don't really need the output to be good or correct
> 2. You don't have confidentiality concerns (sending data off to a cloud service)
At $PREVIOUS_COMPANY, LLMs were straight up blanket banned for these reasons too: confidentiality concerns covering both our customers' code and their data.
The possibility that "it might get some things right, some of the time" was nowhere near a good enough trade-off to override those concerns.
And we definitely did not have the staff or resources to run things locally only.
I've found that how good an LLM is at something depends a lot on how large the training corpus for it is. The simple example is that it's much better at Python than, say, Kotlin.
I also agree with the sibling comment that it seems especially bad at the specific task of finding peer-reviewed scientific papers, for some reason.
I've been using the JetBrains AI model assisted autocomplete in their IDEs, including for Kotlin. It works well for repetitive tasks I would have copy/paste/edited before, and faster, so I have become more productive there.
I've not yet tried asking LLMs Kotlin-based questions, so don't know how good they are. I'm still exploring how to fit LLMs and other AI models into my workflow.
I see no sibling comment here even with showdead on, but I could buy that (there's a lot of papers and only so many parameters, after all - but you'd think GPT-4o's search stuff would help; maybe a little better prompting could get it to at least validate its results itself? Then again, maybe the search stuff is basically RAG and only happens once at the start of the query, etc etc)
Regardless, yeah- I can definitely believe your point about corpus size. If I was doing, say, frontend dev with a stack that's been around a few years, or Linux kernel hacking as tptacek mentioned, I could plausibly imagine getting some value.
One thing I do do fairly often is binary reverse engineering work - there are definitely things an LLM could probably help with here (for things like decompilation, though, I wonder whether a more graph-based network could perform better than a token-to-token transformer - but you'd have to account for the massive data and pretraining advantage of an existing LLM).
So I've looked at things like Binary Ninja's Sidekick, but haven't found an opportunity to use it yet - confidentiality concerns rule out professional use, and when I reverse engineer stuff for fun... I like doing it, I like solving the puzzle and slowly comprehending the logic of a mysterious binary! I'm not interested in using Sidekick off the clock, for the same reason I like writing music and not just using Suno.
One opportunity that might come up for Sidekick, at least for me, is CTFs - no confidentiality concerns, time pressure, and maybe prizes on the line. We'll see.
Yeah, I spent six months trying to find any value whatsoever in GitHub Copilot for C# development, but it's barely useful. Then I started doing Python development, and it turns out it's amazing. It's all about the training set.
At least one paper about the Hough Transform here[1] should be of interest to you.
I'm afraid your prompts are the exact example of "holding it wrong". Replacing Wikipedia or Google is not what LLMs do. Think of them as a thinking engine, not as a "semantic search" of the Internet.
However, I've got great news for you: the app you're looking for exists, and it's a YC company. They've recently launched on here[0].
When I use the description from your post as the prompt (not your actual prompt that you quoted underneath), I get these clarifying questions:
> Applying the Hough transform to audio spectra for pitch recognition is an interesting extension of its typical use in image processing for line and circle detection.
> Can you clarify which specific types of harmonic structures you're hoping the Hough transform will detect in audio spectra? Are you interested in recognizing harmonic series in general, or are you targeting specific instrument voices or vocal data? Additionally, are there any constraints on the types of audio signals you'd want this method applied to—such as clean synthetic tones versus real-world noisy recordings?
> Just to ensure we're on the same page, are you specifically looking for papers that describe the application and methodological details of using the Hough transform in this context, or would you also be interested in papers that discuss the performance and comparative effectiveness of this approach against other pitch detection algorithms?
Now I've got no clue what your answers to these would be, but here are the search results[1]. Presumably that is a better tool for your purposes.
The article goes through a few use cases where LLMs are especially good. Your examples are very different, and are the cases where they perform especially poorly.
Asking a pure (ie no internet/search access) LLM for papers on a niche subject is doubling down on their weaknesses. That requires LLMs to have very high resolution specific knowledge, which they do not have. They have more coarse/abstract understanding from their training data, so things like paper titles, DOIs, etc are very unlikely to persist through training for niche papers.
There are some LLMs that allow searching the internet; that would likely be your best bet for finding actual papers.
As an experiment I tried your exact prompt in ChatGPT, which has the ability to search, and it did a search and surfaced real papers! Maybe your experiment was from before it had search access. https://chatgpt.com/share/a1ed8530-e46b-4122-8830-7f6b1e2b1c...
I can't really vouch for how well these papers match what you're looking for, since I'm not an expert on Hough transforms (would love to know if they are better!). But my technique was: first ask it about Hough transforms. This lets me (1) verify that we're on the same page, and (2) load a bunch of useful terms into the context for the LLM. I then expand to the example of using Hough transforms for audio, and again can verify that we're on the same page and load even more terms. Now when I ask it to find papers, it has way more stuff loaded in context to help it come up with good search terms and hopefully find better papers.
With regards to your criteria:
1. The code from an LLM should never be considered final but a starting point. So the correctness of the LLM's output isn't super relevant since you are going to be editing it to make it fully correct. It's only useful if this cleanup/correction is faster than writing everything from scratch, which depends on what you're doing. The article has great concrete examples of when it makes sense to use an LLM.
2. Yep, although asking questions or generating generic code would still be fine, since that doesn't raise confidentiality concerns. Local LLMs do exist, though I personally haven't seen a good enough flow to adopt one.
3. Strong disagree on this one. I find LLMs especially useful when I am learning. They can teach me eg a new framework/library incredibly quickly, since I get to learn from my specific context. But I also tend to learn most quickly by example, so this matches my learning style really well. Or they can help me find the right terms/words to then Google.
4. +1 I'm not a huge fan of having an LLM write for me. I like it more as a thinking tool. Writing is my expression. It's a useful editor/brainstormer though.
Also agree that asking for academic papers seems to increase the potential for hallucination. But I don't know if I'm prompting it the best way in these scenarios...
I also make use of LLMs to help me with certain programming problems. But this author simply glides over a very important issue: how do you use LLMs responsibly? What does it mean to be responsible in your work?
If all of this is just a hobby for you, then it doesn't matter. But it matters a lot when you are serving other people; it matters when you must account for your work.
You could make the case that all testing is a waste of time, because "I can do this, and this, and this. See? It appears to work. Why bother testing it?" We test things because it's irresponsible not to. Because things fail fairly often.
I'm looking through the author's examples. It appears that he knows a lot about technology in general, so that he can be specific about what he wants. He also appears to be able to adjust and evaluate the results he gets. What if someone is bad at that? The LLM can't prompt itself or supervise itself.
I come to everything with the mindset of a tester. LLMs have most definitely been overhyped. That doesn't mean they are useless, but any article about what they are able to do which doesn't also cover how they fail, and how to be ready for them to fail, is a disservice to the industry.
Sounds like the author is trying really hard to find an edge use case for an LLM. Meanwhile on YouTube... "I Made 100 Videos In One Hour With Ai - To Make Money Online"
Must be nice to work on stuff that doesn’t compete with “intelligence as a service.” I feel that’s an empty set, and everyone using these services actively rationalizes selling out the human race by *paying to get brain raped.*
I've had the most current publicly available models fail to give me even simple correct boilerplate code, but the author is all "we have to be nuanced", and then: "Converting dozens of programs to C or Rust to improve performance 10-100x"? Seriously?
I also recently asked OpenAI's GPT-4 which number is bigger, 3.8 or 3.11, and it was pretty damn sure that it's 3.11, because 11 is bigger than 8, obviously. Another time I asked Meta Llama 3.1 70B and GPT-4 multiple times, using a variety of prompts, to suggest a simple code example for a feature in a project I was working on. They confidently provided code that was total nonsense, configuration that did nothing, and even offered a dependency that didn't exist but somewhat sounded like the prompt itself.
I cannot predict the future. Maybe all of this will lead to something useful. Currently though? Massively overhyped. I've talked to CS colleagues and friends who also work as software developers, all way more competent than me, about their experiences; some were excited about the prospects, but none could provide a current use case for their work. The only instances I know of where people talk positively about these models are in online articles like this or in the comment sections adjacent to them. Never in real life, among actual developers, in person.
User dang (Daniel Gackle) is the main moderator and admin of Hacker News. Your username basically says that he sucks ;). I don't know if that's why you are getting flagged though (if you are getting flagged).
GP claims in profile to be Dan Green, and implies the username is to be read Dan G's UX... If that's true and just a coincidence, it's pretty dang funny. (I do doubt it though.)
Yeah. There are many living-dead users on HN, but if you go through their post history it's well deserved. This particular user seems just naive. That's why I warned him. He could write to the support email, but it will be hard to convince them it's not a joke. Even my warning got flagged/dead. Might be better for him to create another user.
"And that's where language models come in. Because most new-to-me frameworks/tools like Docker, or Flexbox, or React, aren't new to other people. There are probably tens to hundreds of thousands of people in the world who understand each of these things thoroughly. And so current language models do to. "
Apparently not using it to proof-read or it would end with "too. "
If you’re concerned about the environment, that is a trade you should take every time. AI is 100-1000x more carbon-efficient at writing (prose or code) than a human doing the same task. https://www.nature.com/articles/s41598-024-54271-x
The way this paper computes the emissions of a human seems very suspect.
> For instance, the emission footprint of a US resident is approximately 15 metric tons CO2e per year [22], which translates to roughly 1.7 kg CO2e per hour. Assuming that a person’s emissions while writing are consistent with their overall annual impact, we estimate that the carbon footprint for a US resident producing a page of text (250 words) is approximately 1400 g CO2e.
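Spelling out the arithmetic behind that quote (the per-page time is implied by the quoted numbers, not stated explicitly):

    \frac{15\,000\ \text{kg CO}_2\text{e/yr}}{8760\ \text{h/yr}} \approx 1.7\ \text{kg CO}_2\text{e/h},
    \qquad
    1.7\ \text{kg/h} \times 0.82\ \text{h per 250-word page} \approx 1.4\ \text{kg} = 1400\ \text{g CO}_2\text{e}

In other words, the paper bills the human writer for roughly 50 minutes of simply existing as an average US resident per page, and that is the figure set against the AI's per-page footprint.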
Averaging this makes no sense. I would imagine driving a car causes more emissions than typing on a laptop. And if we are comparing "emissions from AI writing text" to "emissions from humans writing text", we cannot mix the latter with a much more emissions-intensive activity and still have a fair comparison.
But that's beside the point, since it seems that the number being used by the authors isn't even personal emissions -- looking at the source [22], the 15 metric tons CO2e per year is labeled as "Per capita CO₂ emissions; Carbon dioxide (CO₂) emissions from fossil fuels and industry. Land-use change is not included."
This isn't personal emissions! This is emissions from the entire industrial sector of the USA divided by population. No wonder AI is supposedly "100-1000x" more efficient. Counting this against the human makes no sense, since these emissions are completely unrelated to the writing task the person is doing; it's simply the fact that they are a person living in the world.
> it's simply the fact that they are a person living in the world.
That's the whole point! If a task requires some time from a human, then you have to include the appropriate fraction of the (huge!) CO2 cost of "being a human" - the heating and cooling of their house, the land that was cleared for their lawn, the jet fuel they burn on their overseas trips, etc. - because all of those are inseparable parts of having a human do some job.
If the same task is done by a machine, then the fraction of the fixed costs of manufacturing the machine and the marginal costs of running (and cooling) it are all there is.
I don't follow this argument, and there would still be issues with the computation anyway.
1) Pretend I want something written, and I want to minimize emissions. I can ask my AI or a freelancer. The total CO2 emissions of the entire industrial sector have nearly no relation to the increase in emissions from asking the freelancer or not. Ergo, I should not count them against the freelancer in my decision making.
2) In the above scenario, there is always a person involved - me. In general, an AI producing writing must be producing it for someone, else it truly is a complete waste of energy. Why do the emissions from a person passively existing count when they are doing the writing, but not when querying?
3) If you do think this should be counted anyway, then we are also missing emissions for the AI, as the paper neglects to account for the emissions of the entire semiconductor industry and technology sector supporting these AI tools; it only counts training and inference emissions. The production of the GPUs I run my AI on is certainly an inseparable part of having an AI do some job.
This article presumes that humans cease to emit when not being asked to program. When you use AI, you get both the emissions of AI and the emissions of the person who you did not use, who continues to live and emit.
>I'm also a security researcher. My day-to-day job for nearly the last decade now has been to show all of the ways in which AI models fail spectacularly when confronted with any kind of environment they were not trained to handle.
> ... And yet, here I am, saying that I think current large language models have provided the single largest improvement to my productivity since the internet was created.
>In the same way I wouldn't write off humans as being utterly useless because we can't divide 64 bit integers in our head---a task completely trivial for a computer---I don't think it makes sense to write off LLMs because you can construct a task they can't solve. Obviously that's easy---the question is can you find tasks where they provide value?