"Participants responded to
a total of 18 tasks (or as many as they could within the given time frame). These
tasks spanned various domains. Specifically, they can be categorized into four types:
creativity (e.g., “Propose at least 10 ideas for a new shoe targeting an underserved
market or sport.”), analytical thinking (e.g., “Segment the footwear industry market
based on users.”), writing proficiency (e.g., “Draft a press release marketing copy for
your product.”), and persuasiveness (e.g., “Pen an inspirational memo to employees
detailing why your product would outshine competitors.”)."
Ha it's fun to dunk on management consultants but I think the magic is they are like pop music producers.
Somehow they're able to make the C-suite hoover up LLM shovelware the same way top producers can take super obvious music and sell I–V–vi–IV. But when we try the same chords, it's uninspired and no one wants to listen.
It’s insane that people think producing pop music is easy. Competing in the pop music market is a cutthroat business with a lot of competition, and the good ones are worth their weight in gold.
When the Scorpions wanted to be resurrected, they hired Desmond Child to produce them and he absolutely crushed it. These people are very good at what they do, and there are very few of them.
The perception is that success in the field is driven largely by factors other than the quality of the music. It'd be extremely interesting to see a Richard Bachman / Stephen King [1] type experiment with a Desmond Child, Max Martin, or whoever else.
Keep their existence completely out of the picture, and have them scout and produce a talented no-name, but require the no-name to use only the sort of avenues that would be openly available to anybody: YouTube, TuneCore, social media, etc. Would the new party now be meaningfully more likely to have a real breakthrough?
A lot of the public believes that what you're talking about already happens, for what it's worth. "Industry plants."
Something can be extremely catchy yet widely panned as low quality in music, so even within "just the music" there are several dimensions at play regardless of marketing, etc. Such as whether it's timed right - are there enough people ready for that song at that time?
The idea that "most people will just listen and be fans of whatever the big media companies put out there" doesn't stand up much examination or conversation with "most people."
People do often make breakthroughs on SoundCloud, TikTok, whatever - do you think having the invisible support of a Max Martin would lower their chances? You'd need to run your experiment a hundred or a thousand times before you could really compare the success rate of your plants to the rest of the crowd, but it's hard for me to believe that they wouldn't have an advantage. The music industry isn't known for its charity; if labels could get away with not paying those people without another label beating them in the market, why would they?
But the qualities of the song are not what make it popular. With "pop" music there are far more important forces at play, namely the quantity and quality of the song's publicity. One of the big things you get with a big time producer is big time connections and a lot of "airplay" in mixes, commercials, TV shows, etc. You also open the door for more collaborations with other popular artists.
Occasionally you'll have a song that breaks through due to sheer catchy-ness, but this is the exception rather than the rule.
In practice you need both. Max Martin himself has produced and written songs for plenty of no-names, and even for an artist with the requisite marketing support, bad or uncatchy pop songs can absolutely sink someone who would otherwise make it big.
This is admittedly very niche but I see your point.
Also: I remember a time in the early 00s when almost every song on MTV began with Rodney Jerkins whispering “DARKCHILD” over the music.
Ultimately isn’t it just branding though? Would you buy Coca-Cola if it had some other label on the bottle? Or watch Mission Impossible 14 starring Some Dude? I’m not sure there’s a lot of fields where things are really competing on their own merits rather than the accumulation of their past successes.
Worth here is the key word. I did an MBA which was specifically designed to get people into consulting; we did many consulting projects for real multinationals, and the key lesson was: "What does the hidden client want?"
i.e., someone has been tasked with getting some consultants to come up with a suggestion, but the key question is what THEIR boss wants to hear. If you can work that out and give it to them, then you've earned your 'worth'.
Well, there's also the case where internal engineering and product management think they have the right answer, but it may be a literal bet-the-company sort of thing. (Especially outside the software realm, where once the bus has left the station it's not turning around.)
Now I know there are people here whose reaction is that executive management should just shut up and listen to what the worker bees say. But it actually doesn't seem unreasonable to me (and I've been on the product management end of things in a case like this) to have some outside perspective from some people who are mostly pretty smart to nod their heads and say this seems sensible.
As a bonus they create big spreadsheets that make the business planning types happy and keep them out from underfoot.
I hope someone builds a management consultant gpt to find out.
Until then, our leaders, experts and institutions were few of those things during the pandemic.
How large businesses are built and run has changed faster and more in the past few years than principled predictions of a business's future trajectory based on lagging indicators can keep up with.
It also depends on how cookie-cutter the management consultant frameworks and “toolkits” are.
It’s no coincidence that it’s mostly juniors doing so much of the work and billing. New or average talent is more profitable per hour to bill than experience.
Financially, if $7-8 of every $10 spent on improvement went to the management consulting engagement, the $2-3 left over is all that remains for actually doing the work, without anyone even knowing. That would be the real coup, if it were true.
The fun part to watch is whether tech people will be able to learn business more easily than business people will be able to learn and apply tech, when the latter can't understand its capabilities or possibilities beyond talking points.
The technical analyst will M&A the business analyst. Maybe they learn to extract Management Consultant type value too.
Check out the movie "The Wrecking Crew". It's about studio musicians fixing, rewriting, playing, and singing the music created by the "bands" you've all heard of, so the albums were good.
Then, the bands belatedly had to learn how to sing and play in order to go on tour.
I think about The Wrecking Crew whenever I hear the sob stories about bands being underpaid and the producers reaping the lion's share of the profits.
There's the story of David Cassidy. He got cast in the Partridge Family (the rock band family) largely because of his looks and his mother (Shirley Jones). His voice was set to be dubbed for the songs, but it turned out he had a golden throat. (The rest of the cast, besides Jones, could neither play nor sing.)
The producers hired top shelf songwriters to write the songs, and several hit albums were produced. (It really is good music, despite being bubblegum.)
Cassidy, however, decided that he had songwriting talent and chafed. He eventually left the show, and with the megabucks he earned on the show, produced albums. They're terrible.
So much of it is about information awareness. Like it or not, these consultants and analysts talk to hundreds of C-levels all the time. They become excellent information sources about what is working, what is not working, and about business risks that a particular executive may not be aware of. Yes, there is the potential for group-think, and the bad ones shill for a particular technology or process without any basis in success. But the good ones provide guidance to the executives that might be working in information-free areas, making them aware of concepts, technologies, and processes that either present risks to their businesses or represent good practices they really should adopt. It's easy to be cynical about this, but there are many good business leaders who are not analytical, and are in need of this kind of guidance.
The comment might be spot-on for some companies and industries, but really... not all business culture is the same. By painting "all management" in this light you are showing the same one-dimensional thinking that is being criticized here.
“I paid a world-class consulting company top dollar to vet this idea and they produced a ton of documents about how great it was. And yet it failed. But I’m not at fault here. What more would you have had me do?”
There was an article on HN years ago about top grads from Harvard-like schools being sucked into consulting companies and discovering their job was to be paid tons of money to write reports supporting whatever the exec of the moment wanted to hear.
Yeah, I think that's the majority of the grunt work done by junior consultants.
I don't doubt that if the high level decision agreed upon is "more layoffs because AI" and they were asked for a 60 page report to justify it that ChatGPT would help inordinately in fleshing it out with something that sounds fairly plausible.
There's a lot of boilerplate that actually takes quite a while to write from scratch. If the people involved have a pretty good idea in their heads of what fundamentals are fairly sensible and which are probably sort of irrelevant or even wrong, something like ChatGPT is actually pretty good at churning out at least a decent pre-draft that can save quite a bit of time. I've used it a fair bit for introductory background that I can certainly clean up faster than I could put together from scratch.
Or, you have a break out hit like the Spice Girls.
If it was so bad, then why do people listen?
There is still a market.
Does the market suck? Full of idiots?
Your argument ends up being that successful things are bad, because humans are just idiots and thus if something is successful it is because it is just liked by idiots.
As much as I might agree generally, it doesn't get you far.
Honey, everything is an "intentional, manufactured factory". Or do you just wake up every morning and say to yourself "let me type some random code and see what kind of software comes out"??
> Your argument ends up being that successful things are bad, because humans are just idiots and thus if something is successful it is because it is just liked by idiots.
I mean I'm not sure that this is that far from the truth in some domains, though it depends on how you define idiocy. There is, for example, a market for demolition derbies. Of course all of us are idiots in some ways so we should be careful about whom we disparage.
Sounds like you are tunnel visioning in your analysis of the music? There are so many more things to it than the chord progression.
On the songwriting side, there's the lyrics which have both a phonetic and a semantic component. There's also the fact that many people will mishear the lyrics and their evaluation of them will be based on the mishearing. There's the melody. Does it work together with the chords to highlight the key parts of the lyrics?
Then there's the performance where there are a million ways to stand out or flop. Loudness, timbre, timing and even detuning can all be used for expression.
> A confident GPT hallucination is almost indistinguishable from typical management consulting material...
If you're measuring based on output, sure, but... the value of any knowledge worker is primarily driven by the input, that is, a client doesn't want "10 ideas" they want "10 [valuable] ideas [informed by the understanding of the business and the market they're operating in]". If a management consultant said "boat shoes" in response to this question they would not have a client much longer.
You could apply this same nonsense task to software engineering, i.e: ask ChatGPT to "write 10 lines of code" and it'll be indistinguishable from the code we churn out day after day.
Even if you throw all of them away, the point is the human is simply more effective in their role. Just breaking the ice, the writer's block; anything to get out of the creative rut that we all fall in, is worth its weight in GPUs.
Thank you, you express it much better than I can. "Diabetic-Friendly Shoes" from the above list immediately had me thinking in half a dozen different directions at once.
I’m just imagining stepping in dog poop and my only way to help myself is also covered in dog poop.
The shoes seem to give you two options for cleaning up your dog's poop: (1) Bend down twice, once for the bag, once for the poop. (2) Bend down once, nice and close to the poop, and get a bigger whiff than otherwise.
I’m just not looking for ways to interact manually with my shoes more than I have to…
> A confident GPT hallucination is almost indistinguishable from typical management consulting material...
Sounds perfect for both Harvard and those linked to the institution.
My employer has hired McKinsey a few times, known to recruit from HYP, and their output has been subpar to say the least. My entire experience with these institutions has been fairly uniform in that regard.
I know it’s anecdotal. But it feels like there’s a lot of confirmation bias with these sorts of studies.
Well, they buried the lede with this one. Using LLMs was better for some tasks and actually made performance worse for others.
The first task was a generalist task ("inside the frontier", as they refer to it), which I'm not surprised improved performance, as it was purposely made to fall into an LLM's areas of strength: research into well-defined areas where you might not have strong domain knowledge. This is also the mainstay of early consultants' work, in which they are generalists early in their careers – usually as business analysts or similar – until they become more valuable and specialise later on.
LLMs are strong in this area of general research because they have generalised a lot of information. But this generalisation is also their weakness. A good way to think about it: an LLM is like a journalist of research. If you've ever read a newspaper, you often think you're getting a lot of insight. However, as soon as you read an article in an area you specialise in, you realise the analysis is full of flaws; they don't understand your subject anywhere near the level you do.
The second task (outside the frontier) required analysis of a spreadsheet, interviews and a more deeply analytical take with evidence to back it up. These are all tasks that LLMs aren't strong at currently. Unsurprisingly, the non-LLM group scored 84.5%, and between 60% and 70.6% for LLM users.
The takeaway should be that LLMs are great for generalised research but less good for specialist analytical tasks.
I was thinking about this last night. It’s a new version of Gell-Mann amnesia. I call it LLM-Mann amnesia.
When I ask a programming question, ChatGPT hallucinates something about 20% of the time, and I can only tell because I’m skilled enough to see it. For all the other domains I ask it about, I should assume at least as much hallucination and incorrect information.
I see this as: for drill-down thinking from a broad to a specific concept, AI seems to be helpful when supplementing specialist work. However, as you both mentioned, when more focused and integrated answers are needed, AI tends to hinder performance.
However, as the paper noted, when working within AI's areas of strength it improved not only efficiency but the quality of the work as well (accounting for the hallucinations). As you mentioned:
> When I ask a programming question, chat GPT hallucinates something about 20% of the time and I can only tell because I’m skilled enough to see it
This matches their centaur approach: delineating between AI and one’s own skills for a task, which – for generalised work – seems to fare better than not using AI at all.
This is hilarious. As impressive as GPT-3/4 has been at writing, what's more shocking is just how bullshit-y human writing is. And a "business consultant" is the epitome of a role requiring bullshit writing. ChatGPT could certainly out-business-consultant the very best business consultants.
Sometimes, to be taken seriously at work, you need to take some concise idea or data and fluff it up into multiple pages or a slide deck JUST so that others can immediately see how much work you put in.
The ideal role for ChatGPT at this moment is probably to take concise writing and expand it into something way larger and full of filler. On the receiving end, people will endure your long-winded document or slide deck, recognize you "put in the work", and then feed it back into ChatGPT to get the original key points summarized.
> As impressive as GPT-3/4 has been at writing, what's more shocking is just how bullshit-y human writing is.
Yeah. Most people have focused on what LLMs can do, but I think it’s equally if not more interesting what can they not do, and why?
When we say LLMs can generate text we’re painting brush strokes as broad as a 10-lane highway. Apparently we have quite limited vocabulary about what writing actually is, and specifically what categories and levels exist.
For instance, it’s fun (and in my view completely expected) to see that courteous emails, LinkedIn inspirational spam, corp-speech etc, GPT outperforms humans with flying colors, on the first attempt too! Whereas if you’re asking for the next book of Game of Thrones or any well-written literature it falls flat – incredibly boring, generic, full of platitudes and empty arcs and characters.
We have to start mapping the field of writing to a better conceptual space. Currently it seems like we can’t even differentiate between the equivalent of arithmetic and abstract algebra.
To me it looks very analogous to AI-generated "art", it's very easy to generate some generally esthetically pleasing visuals, but the depth of the art stays in proportion with the input effort... Which is often not much.
All of this shouldn't be very surprising really, and there's still a lot of usefulness to it, if only for depreciating the low-quality copy-paste productions and making the really unique and novel ones even more valuable.
Yeah, I couldn't say, it's just "vibes" I guess, just like the filler text produced by business consultants it's just not something that I feel would be missed if lost. It all looks the same at some level even when it's superficially different. Midjourney especially is very uniform in this regard, everything looks great, but it's kind of flat at the same time.
LLMs are stunningly good at language tasks: almost all of what us old-timers called NLP is just crushed these days. Summarization, Q&A, sentiment, the list goes on and on. Truly remarkable stuff.
And where there isn’t a bright line around “fact”, and where it doesn’t need to come together like a Pynchon novel, the generative stuff is smoking hot: short-form fiction, opinion pieces, product copy? Massive productivity booster, you can prototype 20 ideas in one minute.
But that’s about where we are: lift natural language into a latent space with some clear notion of separability, do some affine (ish) transformations, lower back down.
Fucking impressive for a computer. But if it can really carry water for an expensive Penn grad?
You’re paying for something other than blindingly insightful product strategy.
I wonder how long it takes AI to get good at law. Right now the verbal tasks it excels at are similar to the artistic ones: namely, solving problems with enormous solution spaces that are robust to small perturbations. That is, change a good picture of an angry tree man slightly and it's still probably a good picture of an angry tree man.
I've tried using it a lot for writing motions. It can actually do a pretty decent job of writing motions, and it can come up with arguments that you might not have thought of. You just have to ignore all its citations and look everything up yourself, otherwise this:
Isn't ChatGPT getting progressively better scores on medical and law exams? It will probably pass the USMLE and the bar one day. If it doesn't already.
Yes, but we should expect that, the answers are in its training data.
The problem is that passing tests is an okay proxy for competence in humans, but if you think of LLMs as a giant library search engine, the thing they are competent at is identifying and regurgitating compiled phrases from their records.
Yes and that's amazing -- but law exams resemble programming exams. In the wild, both labors require you to keep a mountain of project-specific context in your head, something that tests like the LSAT cannot evaluate.
I don't buy it. LLMs cannot do anything reliably, no matter how constrained the domain. Their outputs are of acceptable quality only when handed back to a person who will use their human brain to paper over the cracks. People can recognize when the output is garbage, figure out minor ambiguities, and subconsciously correct minor factual or logical errors. But I would never feed LLM results directly into another computer program. This rules out most traditional NLP tasks.
I’m sympathetic to the instinct to push back on the absurd boosterism (these things are an existential threat to humanity this year), it’s fucking annoying.
But they can do plenty of useful stuff reliably. It’s not “be generally intelligent” – they’re nothing even remotely close to that – but we know you don’t dig the LLM hype from that comment. Yeah, they get that every time.
I have tried to use LLMs, namely GPT4 and Llama-2, for sentiment analysis. They did quite poorly. I asked them to identify which sentiments from a list are found in a given text, with output formatted as a comma-separated list. In response I usually got just that. But sometimes I got a prose explanation of the sentiment, a list containing different keywords than requested, a list formatted differently than I wanted, or nothing useful at all.
Sentiment analysis is easy. It wasn't the end goal, just a "hello world" example to verify my tools were set up correctly. I ran into unsolvable problems in the tutorial.
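One workaround for the format drift (though not for the deeper reliability problem) is to treat the model's reply as untrusted free text and salvage only labels from a known set, rather than trusting the requested comma-separated format. A minimal sketch in Python – the label set and example reply are made up:

```python
import re

# Hypothetical closed set of sentiment labels we asked the model for.
ALLOWED = {"joy", "anger", "sadness", "fear", "surprise"}

def parse_sentiments(raw: str) -> list[str]:
    """Salvage a label list from freeform model output.

    The model sometimes returns prose, differently formatted lists,
    or extra keywords, so instead of parsing the expected format we
    scan for known labels and keep their order of first appearance.
    """
    found = []
    for token in re.findall(r"[A-Za-z]+", raw.lower()):
        if token in ALLOWED and token not in found:
            found.append(token)
    return found

# Works on a well-formed reply and on a chatty prose reply alike.
print(parse_sentiments("anger, sadness"))
# → ['anger', 'sadness']
print(parse_sentiments("Sure! The text expresses Anger, and also some sadness."))
# → ['anger', 'sadness']
```

This doesn't stop the model from emitting nothing useful at all, but it makes the three "wrong format" failure modes harmless.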
I have no use for tools which do amazing things sometimes but which cannot be reasoned about and cannot be prevented from producing garbage. Maybe other people will find uses for them, though. I'll keep an open mind and check back in five years.
I too thought flagging the comment was a bit too harsh so I tried vouching for it, but it didn't get resurrected.
I note that the comment is [dead], not [flagged] [dead], so maybe its state has to do with something other than the content of the comment? Just [dead] is, I think, a shadowban.
I checked the poster's comments, but since it's a new account there are very few of them, and I can't determine the reason for the [dead] from them.
> almost all of what us old-timers called NLP is just crushed these days
For this to be true for most production service use cases, LLMs would need to be at least ~10X faster. I generally agree they can be quite good at these tasks, but the performance is not there to do them on large datasets.
I’m starting to think there’s an LLM equivalent to the old saying about how everything the media writes is accurate except on the topics you’re an expert in. All LLM output looks to be good quality except when it’s output you’re an expert in.
People who have no background in writing or editing think LLMs will revolutionize those fields. Actual writers and editors take one look at LLM output and can see it’s basically valueless because the time taken to fix it would be equivalent to the time taken to write it in the first place.
Similarly people who are poor programmers or have only a surface level understanding of a topic (especially management types who are trying to appear technical) look at LLM output and think it’s ready to ship but good programmers recognize that the output is broken in so many ways large and small that it’s not worth the time it would take to fix compared to just writing from scratch.
LLMs are not worthless for programming. You just cannot expect one to ship a full program for you, but for generating functions with limited scope I've found them very useful – how to make use of a new but common library, for example. But of course you have to check and test.
And for text, I know people who use it successfully (professionally) to generate summaries from some data. They still have to proofread, but it saves them time, so it is valuable.
I've been using it for code review. I just paste some of my code in and ask the AI to critique it, suggest ideas and improvements. Makes for a less lonely coding experience. Wish I could point it to my git repositories and have it review the entire projects.
I've had mixed experiences with getting it to generate new code. It produced good node.js command-line application code. It didn't do so well at writing a program that creates a 16-bit PCM audio file. I asked it to explain the WAV file format, and things like the lengths of structures got so confusing I had to research the stuff myself to figure out the truth.
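For what it's worth, the RIFF header bookkeeping that tends to trip things up (chunk lengths, byte offsets) can be sidestepped entirely in Python, where the standard-library `wave` module writes the header for you. A minimal sketch that writes one second of a 440 Hz tone as 16-bit PCM (the filename and tone parameters are arbitrary):

```python
import math
import struct
import wave

SAMPLE_RATE = 44100
FREQ_HZ = 440.0

# One second of a sine wave, scaled to the signed 16-bit range.
samples = [
    int(32767 * math.sin(2 * math.pi * FREQ_HZ * n / SAMPLE_RATE))
    for n in range(SAMPLE_RATE)
]

with wave.open("tone.wav", "wb") as wf:
    wf.setnchannels(1)          # mono
    wf.setsampwidth(2)          # 16-bit PCM = 2 bytes per sample
    wf.setframerate(SAMPLE_RATE)
    # Pack as little-endian signed shorts; wave fills in the
    # RIFF/fmt/data chunk sizes in the header automatically.
    wf.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

If you do need the raw structure lengths (e.g. in C), the safest reference is still the spec itself rather than a model's recollection of it.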
This mirrors my experience. Very helpful writing node.js application code, but struggles to walk through simple operations in assembly. My hunch is that the tokenization process really hurts keeping the 1s and 0s straight.
It's been hit or miss with rust. It's super helpful in decrypting compilation errors, decent with "core rust" and less helpful with 3rd party libraries like the cursive TUI crate
Which comes as no surprise, really, as there's certainly less training data on the cursive crate than, say, expressjs
Also FWIW I have actually pointed it at entire git repos with the WebPilot plugin within ChatGPT and it could explain what the repo did, but getting it to actually incorporate the source files as it wrote new code didn't work quite so well (I pointed it to https://github.com/kean/Get and it would frequently fall back to writing native Swift code for HTTP requests instead of using the library)
They can be worse than worthless. They can sabotage your work if you let them, making you spend even more time fixing it afterwards.
For example, I've used GPT-4 as a sort of Google on steroids, with prompts like "do subnets in gcloud span AZs" and "in gcloud secret manager can you access secrets across regions". I very quickly learned to ask "is that true?" after every answer, and to never rely on a given answer too much (verify it quickly; don't let misinformation get you too far down the wrong route). So is it useful? Yes, but can it lead you down the wrong path? It very well can. The less experience you have in the field, the more easily it will happen.
>You just cannot expect it to ship a full programm for you, but for generating functions with limited scope, I found it very useful
Entire functions? Wow. I found it useful for generating skeletons I then have to fill by hand or tweak. I don't think I ever got anything out of Gpt4 that is useful as is (maybe except short snippets 3 lines long).
However, I found it extremely useful in parsing emails received from people or writing nice sounding replies. For that it is really good (in English).
Nobody ever made a code editor plugin that reads random SO answers and automatically pastes them over your code.
The amount of fighting I've needed to do against MS development tools mangling my code recently is absurd. (Also, who the fuck decided that autocomplete on space and enter was a reasonable thing? Was that person high?)
>"I found it useful for generating skeletons I then have to fill by hand or tweak".
Even this can be a big time saver, that increases productivity.
Just like others have said, it isn't going to write a Pynchon novel, but it does do a great job at the other 99% of general writing that is done.
Same for computers, the average programmer isn't creating some new Dijkstra Algorithm every day, they are really just cranking out connecting things together and doing the equivalent of 'generic boiler plate'.
I have met my share of folks with decades of experience that wasn't quality experience. The most hilarious range from those who open tar.gz files in Notepad wondering where the code is, to those who work on the web but don't know what XSRF is. Experience, however long, doesn't count if it's the not-so-great type. Not saying this is the case here.
LLMs do produce impressive code. Even if they were indeed just procedural generators it would still be impressive. The code has structure and appears useful.
But the issue is that you can tell it makes no sense, there is no thought process behind it. It fits in no greater picture.
Even if you add more context it still has no purpose.
People who find this useful are the same type that copy Stack Overflow code they don't understand. It kinda works when it does, but again, it doesn't fit into the bigger picture.
Code isn't about spelling out instructions – an AI can do that – code is about what goes where, in a way where the what changes as often as the where. It's the bigger picture. So yes, it can help and replace those who spell out instructions, but it will be hard for it to replace those who are required to deliver more.
While it is impressive that an AI can generate all this, the code is anything but significant. Using triggers for history is one sure way to bring a scalable system down fast, and one of the first lessons a junior will learn.
I honestly don’t understand how people can say LLMs are useless for coding. Have you tried ChatGPT 4, or are you basing this take on the obsolete 3.5? I’m a professional programmer and I think LLMs are extremely useful.
I’ve used GPT 4. It’s not helpful in any domain in which I’m already proficient. If I’m having to use a new language or platform for whatever reason it’s mildly quicker than alt-tabbing to stack overflow, but probably not worth the subscription.
For graphics tasks GenAI is absurdly helpful for me. I can code but I can’t draw. Getting icons and logos without having to pay a designer is great.
Yep. ChatGPT is like having a junior engineer confidently asking to merge broken garbage into your codebase all the time. Adds negative value for anyone that knows what they’re doing.
But with one crucial difference: it's a junior programmer that can make changes based on your feedback in a few seconds, not a few hours. And it never gets tired or frustrated.
hahahah. A friend of mine has a problem with a contractor at his workplace that tries to PR in shell scripts written with Copilot. My friend spends an hour to explain why a script generated in 5 minutes is horrifically awful and will likely take down the company. He's legitimately angry about it.
It seems like the only ways to delegate programming tasks are to write tests for your subordinate's code, to review it tediously yourself, or to just trust the hell out of them.
> I’m starting to think there’s an LLM equivalent to the old saying about how everything the media writes is accurate except on the topics you’re an expert in.
This is true for media articles but for LLMs I feel like it's the opposite. Like people who aren't specialists don't fully appreciate how great it is at those tasks.
Most management consultants are useless. But there are some realities you must accept.
Number 1: in a team of 20-30 engineers there is only one extremely good "why is he with us" engineer who is great at technical stuff and is also a people person. However, no matter how good his approach to his job, it is a job, and he will only drop hints about how the management should be done. He doesn't care about where the company is headed, because he plays video games, has a family, and has a literal life. He doesn't care about management or taking on undue responsibilities. Moreover, the people up top have labeled him an "engineer" and do not see him as a "manager".
For the rest of the engineers and managers, have also adopted the approach of "not my problem", you see a bizarre communication gap. Engineers working closesly with the product don't want to talk to their managers, becase the conversation goes like "if you know this so much, why don't you.... <a description of something results in more work that goes outside their JD>" and managers don't want to talk with engineers because "if you are you so interested, why don't you.... <a description of something results in more work that goes outside their JD>"
From this progressive distance between managers and engineers comes the "manaegment consultant". Management consultant have the upper management given flexibility of going back and forth between engineers and managers. They can have conversations with full flexibility but they are not bound to "why don't you...." phrases. They can talk with anyone and submit a report and take home 1 years worth of salary of managers/engineers in 1 month.
The conversation gap between product and business where management consultants come in. And the funny thing is that, management consultants target those "I don't want to but I should" work things and report to the upper management. They can do this so well, because they are not burdened with the "work" part.
Seriously, if you do some introspection, you will see there is plenty of things you know your company should do, but you don't want to voice them because it results in more work and in fact more risk. There comes a "good" management consultant who will discover those things and report to upper management who will create the system to get those jobs done.
That is my pitch if anyone wants a management consultant hire me. I am going to tell them why their company sucks in 20 different ways with 18 of those points being generated by ChatGPT.
Your horse tranquilizer-addled coworker seems to be expressing a few points about the workplace dynamics between engineers and managers. First, he believes that while there may be exceptional engineers who are also good with people, these engineers are generally not interested in managerial responsibilities. Second, he observes a communication gap between engineers and managers, where both parties avoid taking on additional tasks outside their job descriptions. Lastly, he argues that management consultants bridge this gap by identifying issues neither party wants to handle but should. He concludes by saying that he'd make a good management consultant because he can spot numerous ways a company could improve.
If all else fails, the LLM revolution will at least allow us to make sense of ketamine-induced rants on management.
Maybe it was done on Ketamine, but the points are valid. Have seen it, consultants don't really bring 'new' or 'creative' solutions, they just help move the ideas around the calcified layers in the organization.
My theory is that honest takes should be written on first take without revisions and without edits. The moment I massage a statement to be more coherent I am compromising on my honesty.
The new tasks people get from talking to each other are usually well within their job description. They are just new tasks, and neither developers nor middle managers are allowed to drop useless tasks just because something more valuable has appeared.
Either way, in my experience management consultants just add new useless tasks to everyone's plate. I have never seen them actually decrease the number of tasks.
If your docs and PR descriptions can be generated off file diffs everyone's time could be better spent scanning the diff to come to the same conclusions.
Consider using your PRs and docs to capture the answers to the usual "why" questions, which an LLM won't be able to generate from the diff.
I've seen code bases survive three different ticket management systems. Meanwhile, the tickets never made it between the different systems, so if the 'why' isn't in the commit message, then it got lost to time.
I will admit that a lot of the really old decisions don't have much relevance to the current business, but the historical insight is sometimes nice.
Agreed: the study only shows that BCG consultants' work is 40% noise without real added value... I guess that customers should now ask for a 40% rebate!!! ;-)
Says more about how people will parrot the same phrase over and over for anything at all. It's just funny how you can predict a comment like this in every thread regardless of what it does.
"It says more about [insert]" anytime GPT does something just makes the phrase lose all meaning. Surely you have something meaningful to say?
Often effortposts aren’t worth it because someone will come along and Gish Gallop the post with opaquely nonsensical bad-faith counterarguments that are a lot of work to refute.
I agree with you in an ideal world, but sadly this isn’t one.
As I understand it, they have a very specific purpose. The customer needs someone to blame in making difficult decisions. The difficult decision process itself is secondary.
Perfect tool for a consultancy: take a fresh graduate, pair them with an LLM tool, and charge big bucks. Not much different from the current model, but the client will get a much more confident consultant and will happily fork over more money.
Not surprised. It's frighteningly good, and a perfect match for programming.
I often ask GPT-4 to write code for something and test whether it works, but I seldom copy and paste the code it writes; I rewrite it myself to fit the context of the codebase. But it saves me a lot of time when I'm unsure how to do something.
Other times I don't like the suggestion at all, but that's useful as well, as it often clarifies the problem space in my head.
I used ChatGPT yesterday for code for the first time.
I gave it a nontrivial task I couldn’t google a solution for, and wasn’t sure it was even possible:
Given a python object, give me a list of functions that received this object as an argument. I cannot modify the existing code, only how the object is structured.
It gave me a few ideas that didn’t quite work (e.g modifying the functions or wrapping them in decorators, looking at the current stack trace to find such functions) and after some back and forth it came up with hijacking the python tracer to achieve this. And it actually worked.
The crazy thing is that I don’t believe it encountered anything like this in its training set, it was able to put pieces together which is near human level. When asked, it easily explained the shortcomings of this solution (e.g interfering with the debugger).
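The trick the commenter describes can be sketched with `sys.settrace`. This is an illustrative reconstruction, not their actual code: `find_receivers` is a made-up name, and the sketch ignores `*args`/`**kwargs` for simplicity.

```python
import sys

def find_receivers(target, entry_point):
    """Run entry_point() under a trace function and record the name of
    every function that received `target` as a positional or named
    argument. A sketch only: *args/**kwargs are not inspected."""
    hits = []

    def tracer(frame, event, arg):
        if event == "call":
            code = frame.f_code
            # Named parameters of the function just entered
            argnames = code.co_varnames[:code.co_argcount]
            # Identity check avoids triggering custom __eq__ methods
            if any(frame.f_locals.get(name) is target for name in argnames):
                hits.append(code.co_name)
        return None  # no per-line tracing needed; 'call' events still fire

    sys.settrace(tracer)
    try:
        entry_point()
    finally:
        sys.settrace(None)
    return hits

obj = object()

def takes(x):
    return x

def ignores(y):
    return y

print(find_receivers(obj, lambda: (takes(obj), ignores(42))))  # ['takes']
```

As the comment notes, this interferes with debuggers (and coverage tools), since they rely on the same trace hook.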
> The crazy thing is that I don’t believe it encountered anything like this in its training set, it was able to put pieces together which is near human level. When asked, it easily explained the shortcomings of this solution (e.g interfering with the debugger).
I have seen similar things. So no, it's not regurgitating from its training dataset. The NN has some capacity for reasoning. That capacity is necessarily limited, given that it's feed-forward only and compute is still expensive. But it doesn't take much imagination to see where things are going.
I'm an atheist, but I have this feeling we will need to start believing in "And [man] shall rule over the fish of the sea and over the fowl of the heaven and over the animals and over all the earth and over all the creeping things that creep upon the earth"[1] more than we believe in merit as the measuring stick of social justice, if we were to apply that stick to non-human things.
The published article is not at all about programming tasks but about generating text for "strategy consultants".
Some examples, found on page 10 of the original article:
- Propose at least 10 ideas for a new shoe targeting an underserved market or sport.
- Segment the footwear industry market based on users.
- Draft a press release marketing copy for your product.
- Pen an inspirational memo to employees detailing why your product would outshine competitors.
Without the right target market, business model, and effective methods to reach customers, the most brilliant pair of shoes or piece of code can be useless (unless someone works to repurpose them as art or a teaching tool).
Each question is too generic, and there was apparently no specific input data to act upon. How can a valid business model be expected under those conditions?
I’ve also found the act of describing my problem to GPT-4 is sometimes just as helpful as the answer itself. It’s almost like enhanced rubber duck debugging.
So true. I've written entire prompts with several lines worth of explanation, only to realize what my issue was and never hit the "send" button. Guess I should do that more often in life, in general
Absolutely bonkers algorithms that no one can make sense of unless they dedicate time to study and debug it
Also, it will have to be scrapped when anyone wants to tweak it a little. Sure monkeys randomly typing on a typewriter will eventually write the greatest novel in existence... but most of it will be shit
May $entity have mercy on your soul if the business starts bleeding tons of money due to an issue with the code, because the codebase won't
Perhaps the difference is that that attitude is fine when building hobby or SaaS apps adding no real value to the world. It's not the behavior we'd expect from engineers responsible for critical systems dealing with finance, health, etc.
Beware of that practice. If for some reason you get too used to it, one day you may not have it, and you won't know where to start writing a function yourself.
It's similar to what happens to people who know a (natural) language, stop using it or fall back on a translator, and then find themselves unable to use it when they need to.
Having been a consultant, what strikes me about this is the next, to me seemingly obvious question: What if you just removed the consultants entirely and just had GPT-4 do the work directly for the client?
If you’re a client and need a consultant to do something, you have to explain the requirement to them, review the work, give feedback, and so forth. There will likely be a few meetings in there.
But if GPT-4 can make consultants so much better, I imagine it can also do their work for them. And if you combine this with the reduction in communications overhead that comes from not working with an outside group, why wouldn’t clients just accrue all the benefits to themselves, plus the benefit of not paying outside consultants or dealing with the overhead of managing them?
This is especially the case when the client is already a domain expert but just needs some additional horsepower. For example, marketing brand managers may work with marketing consultants even though they know their products and marketing very well. They just need more resources, which can come in the form of consultants for reasons such as internal head-count restrictions.
Anyway, I just wonder if BCG thought through the implications of participating in this study. To me it feels like a very short step from “helps consultants help their clients” to “helps clients directly and shows consultants aren’t really necessary.”
Especially so if the client just hires an intern and gives them GPT-4.
Companies like BCG and McKinsey are mostly about liability. As a CEO you call them, pay them the big bucks, and have them make up plans and strategies. If it works out you get the credit; if it doesn't, well, "we worked closely with experts from McKinsey, etc., so the blame isn't on me."
The frustrating one is when you've been telling management something for months (if not years), and the consultant comes in, and their report says what you've been saying, and only then does the company finally do what you've been saying all along! Coulda saved the company 5 figures just by listening to me. sigh politics.
People have ideas all the time internally. I'm going to assume the idea you had was one of many.
The issue is getting the real decision makers to buy into it. They aren't going to take the word of someone who works in some division. They want some rigor to it.
Bringing in someone who isn't tainted by the groupthink of the company and can actually take a sober view of the situation puts some weight behind the recommendation.
Yeah, but I wonder if it’s even more powerful to say, “we asked the world’s most powerful AI and it recommended that we lay off 20% of our staff, while ensuring we treated them all fairly.”
That's probably what they say to each other... Just looking at the garbage produced by movie studios recently you can't not ask yourself if the scripts are AI generated... And that's just the scripts let alone those crazy budgets that still produce movies that look like N64 games.
HN is so bad at predictions. Just a few months ago HN was awash with comments that confidently claimed LLMs were no more than stochastic parrots and unlikely to amount to anything.
> I can't help but think the next AI winter is around the corner. [0]
>If we're looking for a cost-effective way to replace content marketing spam... great! We've succeeded!
And if you read the article that’s almost exactly the level of output that we’re talking about.
- Propose at least 10 ideas for a new shoe targeting an underserved market or sport.
- Segment the footwear industry market based on users.
- Draft a press release marketing copy for your product.
- Pen an inspirational memo to employees detailing why your product would outshine competitors.
Also, for the second task the non-LLM group performed significantly better.
We’re going to have legit AGI that can outperform humans in every way and HN will still find something to complain about. I love the tech news on here, but the constant cynicism on everything is exhausting.
There is a lot of office work that will, over time, be optimized using GPT-like services. I was tech-savvy enough to know that a lot of the office work I do is repeatable and can be done with scripts, but not good enough to write those scripts myself. Using ChatGPT allowed me to write them; it took me maybe 15-20 hours to get the scripts working perfectly. I knew just a little Python and nothing about pandas or XlsxWriter, but I was able to create something that saves me an estimated 20-25 hours a week.
In my opinion a lot of people here on Hacker News, being good at programming themselves, underestimate how services like ChatGPT can open up a new world to non-programmers. They also probably make the non-inquisitive learn less. Previously, to learn how to stop multiple snapd services with a script, I would have googled and cobbled something together; today I just ask ChatGPT and get a working script in less than a minute.
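For concreteness, here's a minimal sketch of that kind of script in pandas. The column names and numbers are made up; writing the result to a spreadsheet would be one more line with `summary.to_excel("out.xlsx", index=False, engine="xlsxwriter")`.

```python
import pandas as pd

# Hypothetical raw export: collapse it into a per-region summary,
# the kind of repeatable office task described above.
raw = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "amount": [120, 80, 30, 70],
})

# One group row per region, summed amounts
summary = raw.groupby("region", as_index=False)["amount"].sum()
print(summary)
```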
Couldn't agree more. I've gone multiple times now from "I wonder if X is possible/how would you do X" to hacking out a crude proof of concept to a problem that I wouldn't even know how to google.
Two things mentioned in the abstract that are worth pointing out.
> For each one of a set of 18 realistic consulting tasks within the frontier of AI capabilities
They specifically picked tasks that GPT-4 was capable of doing. GPT-4 could not do many tasks, so when we say that performance was significantly increased this is only for tasks GPT-4 is well suited to. There is still value here but let's put these results into context.
> Consultants across the skills distribution benefited significantly from having AI augmentation, with those below the average performance threshold increasing by 43% and those above increasing by 17% compared to their own scores
Even when cherry-picking tasks that GPT-4 is particularly suited for, above average performers only increased performance by 17%. This increase is still impressive, were it to be seen across the board. But I do think that 17% is a lot less than some people are trying to sell.
Hmmm. Perhaps below-average performers are more likely to take GPT output at face-value, being less competent to review and edit it. And above-average performers are more likely to hack the GPT output around, because they're confident in their own abilities.
Therefore below-average types will produce finished output more quickly; and this was a time-constrained test, so velocity matters.
ChatGPT is very good at waffling, and marketing-speak and inspirational messages are essentially waffle. IOW, the tasks were tailor-made for unaided ChatGPT, so high-performers were penalized.
You're underestimating it, because it compounds. Small gains in efficiency lead to huge advantages in long-term growth. 17% would be an absolutely monumental improvement.
Pipe /dev/random, transform to decimal, and you just got an amazing increase in performance for calculating decimals of Pi. Nobody said precision was important anyway.
Honestly if you don't care about precision, /dev/zero is going to give you more throughput. Plus, I personally guarantee it's correct to within an error margin of 4.0. You can't offer the same with /dev/random!
I always wanted the minor number of the /dev/zero device to select the byte you get, so that "mknod /dev/seven c 1 7" would give you an infinite source of beeps!
We're not trying to hit a comet with a rocket here. 1 significant figure is more than sufficient for an initial consultation. Any additional accuracy required would be billable follow-on work.
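For what it's worth, the joke pipeline from a few comments up actually runs (using /dev/urandom here, since /dev/random may block; the byte counts are arbitrary):

```shell
# "Compute" 20 decimal digits of pi by keeping only the digit bytes
# from a random stream. Great throughput; accuracy not guaranteed.
head -c 10000 /dev/urandom | tr -dc '0-9' | head -c 20; echo
```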
Yes, GPT-4 is great for doing “boring work” and allows me to focus on the “fun work”. You still need to know what you’re doing though, you can’t blindly copy and paste.
And for the second one, although I am paying for it too, this idea is more or less flawed nowadays. Utilization is a very hand-wavy thing when it comes to this stuff. Like a purse: millions would pay money for one, some even pay thousands. But I have no use for it and wouldn't even pay $1 for one.
> You still need to know what you’re doing though, you can’t blindly copy and paste.
Agreed.
> Like a purse, millions would pay money for it, some even pay thousands.
Expensive purses have intangible value for some. They are often bought to signal social status.
I'm pretty sure a significant portion of ChatGPT Plus subscribers are paying because it can help them with information or cognitive work that some people value.
Consumer behavior around monthly subscription services that can be cancelled at any time looks very different from behavior around one-time luxury purchases.
I have free access to copilot because I do some open source work. I haven't been impressed by what it can do and I wouldn't pay even $3/month to use it.
The second question doesn't make sense to me. There are tons of things I think are useless (or worse) that people pay for anyway. Meal kit boxes come to mind, and at least you can eat those at the end of the day.
I've spent some time using it and experimenting with different prompts, but to be honest it's hard to be motivated to spend more time on it given the disappointing results so far.
I need to see a glimmer of it being useful before I decide the investment is worth it, I guess.
1: I’ve been scripting for 5 years using Python. I purchased a subscription to use GPT4 to see if it could assist me.
In the end it took me more time to fix its mistakes than to just apply my knowledge of knowing what to Google and reading docs.
Additionally, the largest hurdle I encountered was when it hallucinated a package that didn’t exist and I spent time trying to find it.
2: I don’t know about most people but I’m terrible at cancelling services that are “cheap”. I used ChatGPT for a few hours that first month and didn’t cancel it for another 5 months.
My prediction? In about 6 months, every test, task, or use of a LLM for anything that requires a modicum of creativity is going to find that it only has a fixed set of "ideas" before it starts regurgitating them. [0] I can easily imagine this in their hypothetical shoe pitch question, and many models going for more factual answers have been rapidly showing this bias by design.
I'm very unimpressed by that study. Look at how they generated the jokes - they fed it a prompt that was a slight variation on "please tell me a joke" and then wrote about how the jokes weren't varied enough.
Can confirm. I popped the 20 bucks for GPT4, and have been using it more and more, every day for 3 weeks. Not sure how I can get by without it now. It's just so easy to have normal conversation and get answers. Like having an expert friend across the hall you can just shoutout questions, and ask for simple reminders, recommendations.
Who cares if it gets things wrong sometimes, you would double check your co-workers answers also. And there are times when I insist I am correct, and GPT will argue back and eventually I find I was wrong.
That's what you would like to think, isn't it? I'm afraid this would be just as much true with any other kind of subjects, and as far as I know, there's no evidence either way so this is just a cheap stab you're having at them.
Meh... I mean, a lot of consulting is tasks like writing or idea generation. Using something like ChatGPT to do those tasks faster or better doesn't negate the value in what consultants do, since they are hired to do those tasks, and those tasks are required for the broader work.
I bet early search engines had similar or even better figures under similar conditions.
I suppose this because I recall how much search improved my productivity over flipping through books and I know how for certain tasks ChatGPT is a better source of knowledge on how to do it than search. While often the GPT output isn’t entirely correct, more often than not it suffices to make the correct solution obvious thus saving a lot of time.
Guilty! GPT is the best colleague I ever had, but boy does it speak. You can't just copy paste, but if you consider its responses as input I find myself less dependent on other senior consultants sharing their insights. It also makes me more confident in my assessments and deliveries.
Purpose of technology is to enhance our performance, GPT is very much doing so - but with great powers comes great responsibility.
No, it increases the load one can successfully manage in a day. There isn't this tiny discrete amount of work that people need to handle. We gave that up when we left the campfires. We're trying to grow.
BCG : We know layoffs are in fashion and we'd just like you to know that if you need industrial grade ass covering excuses from a legitimate-ish sounding authority to justify what you were planning to do anyway, our 23 year old consultants and their PowerPoint presentations have got you covered.
If this is how so called consultants use AI… they should be very concerned.
A moderately skilled intern with GPT Enterprise connected to data will quickly make them obsolete. Maybe they have some potential building their own fine-tuned model, but surely they will screw that up.
I haven’t had many interactions with BCG so far, but in both that we had, I was surprised at how much money they get for reshuffling information that is all common knowledge and available on the net. I can see that this is something LLMs can do really well. It’s exactly the kind of “creativity” LLMs excel at: “apply concept X to market/niche Y and give ideas on monetizing”.
I don’t blame BCG for doing this; they are giving an outside and politically uninfluenced view (except for the influence of the party that pays the tab).
The output of many professions is bag-of-words emotional persuasion, e.g. politicians, consultants, sociologists, psychologists, writers, economists, TV talking heads, media in general.
A characteristic of these professions is that there is no accountability for the output they produce. It is not like a profession that builds an engine for a car. They can bullshit with confidence and get away with it.
ChatGPT will replace all of them, as ChatGPT itself can bullshit with the best of them.
I was sort of wondering this with the latest (I think now resolved) writer's strike. The union wanted reassurance that they wouldn't be replaced by AI; however, if I was the studios, I would have said `sounds good` - knowing full well that the union members will likely be turning to it. Unless the union polices its members, the appeal to use it is just too high.
Where ChatGPT could excel is early education, where the ideas are simple, universally agreed upon, and well documented online. At higher levels the chance of hallucination increases, and you could be taught the wrong thing without knowing the risks.
Funnily enough, as a business consultant I use GPT to create executive summaries and sell people on the idea that my reports are as short as they possibly can be without information loss.
"Participants responded to a total of 18 tasks (or as many as they could within the given time frame). These tasks spanned various domains. Specifically, they can be categorized into four types: creativity (e.g., “Propose at least 10 ideas for a new shoe targeting an underserved market or sport.”), analytical thinking (e.g., “Segment the footwear industry market based on users.”), writing proficiency (e.g., “Draft a press release marketing copy for your product.”), and persuasiveness (e.g., “Pen an inspirational memo to employees detailing why your product would outshine competitors.”)."
Here is the GPT response to the first task: https://chat.openai.com/share/db7556f7-6036-4b3d-a61a-9cd253...
A confident GPT hallucination is almost indistinguishable from typical management consulting material...