I'm unimpressed. I gave it specifications for a recommender system that I am building and asked for recommendations, and it just smooshed together some stuff, but didn't really think about it or try to create a reasonable solution. I had claude.ai review it against the conversation we had. I think the review is accurate.
----
This feels like it was generated by looking at common recommendation system papers/blogs and synthesizing their language, rather than thinking through the actual problems and solutions like we did.
suno is uncomfortably good. I run a group for helping founders and sometimes I make little suno songs to accompany the classes for fun, always impressed by what it spits out. (prompt: song for founder who have happy ears bringing them tears > 30 seconds gen >) https://s.h4x.club/p9u4ezl2 / https://s.h4x.club/mXuND7Eb / https://s.h4x.club/L1u2DYzW
Suno songs always have way too much treble or reverb, or something I can't quite put my finger on. They're very bright sounding.
I don't think it's a fatal flaw, but I hope future versions improve on this, or Suno starts doing some more post-processing to address it. I know there's a new "remaster" feature, but I'm not sure if it does anything there either.
Yeah, they're way too wide and not muddy enough. If you're gonna be as wide as they often are, you need to fill it, else they always just sound over-produced and weirdly bright. I was thinking earlier about why they sound really good but not actually good, and then I realized that's how I feel about most modern pop music anyway. I think the main thing you're hearing, or at least the thing I find annoying, is in how the AI does harmony: it seems to almost clone the original vocal line and pitch it up and out so it feels slightly offset, giving the appearance of a second vocalist. It's a trick I do in Ableton to see what things might sound like built differently with vocals, and it feels very much like the Suno fake backing singers. I do think they're like 6 months or so away from nailing a lot of this given how quickly they've been moving; I follow them closely and it's been impressive. (You can also pull meatier stuff out of it if you work it a bit: https://s.h4x.club/04uz6klg - don't think this sounds particularly "AI" at all - edit: turns out if you play up chamber orchestra and choir in the prompting you can get some much better stuff out of it: https://s.h4x.club/eDubr9xJ)
I mean yes and no. If you just let it generate lazily, then yes. However, if you work on the lyrics and generate a bunch of samples, no: it can be very powerful and artistic.
I think it's the thing nobody has answered yet: virtualized training/testing. I watched a presentation by their research team. This is a HUGE force multiplier. Don't underestimate how much this changes robotics foundation model training.
I kind of feel like it is the inverse in many ways.
As an experienced engineer, I know how to describe what I want, which is 90% of getting the right implementation.
Secondly, because I know what I want and how it should work, I tend to know it when I see it. Often it only takes a nudge to get to a solution similar to what I already would have done. Usually it is just a quick comment like: "Do it in a functional style." or "This needs to have double-checked locking around {something}."
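For concreteness, by "double-checked locking" I mean the usual lazy-initialization pattern; a minimal Python sketch of the kind of nudge I'm describing (the class and names are made up, just to illustrate):

```python
import threading

class ExpensiveClient:
    """Illustrative lazily-initialized shared resource (hypothetical name)."""
    _instance = None
    _lock = threading.Lock()

    @classmethod
    def get(cls):
        # Fast path: no lock taken once the instance already exists.
        if cls._instance is None:
            with cls._lock:
                # Check again under the lock, in case another thread
                # created the instance while we were waiting.
                if cls._instance is None:
                    cls._instance = cls()
        return cls._instance
```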
When I am working at the edge of my knowledge I can also lean on the model, but I know when I need to validate approaches that I am not sure satisfy my constraints.
A junior engineer doesn't know what they need most of the time and they usually don't understand which are the important constraints to communicate to the model.
I use an LLM to generate probably 50-60% of my code? Certainly it isn't ALWAYS strictly faster, but sometimes it is way, way faster. One of the other advantages is that it requires less detailed thinking at the inception phase, which lets me fire off something to build a class or make a change when I'm in a context where I can't devote 100% of my attention to it, and then review all the code later, still saving a bunch of time.
Worse/less experienced developers see a much greater increase in output, and better and more experienced developers see much less improvement. AIs are great at generating junior-level work en masse, but their output generally is not up to quality and functionality standards at a more senior level. This is both what I've personally observed and what my peers have said as well.
Interesting paper and lots of really well done bits. As a senior dev that uses LLMs extensively: this paper was mostly using Copilot in 2023. I used it and ChatGPT in that timeframe, and took ChatGPT's output 90% of the time; Copilot was rarely good beyond very basic boilerplate for me in that period. Which might explain why it helped junior devs so much in the study.
Somewhat related, I have a good idea of what I can and cannot ask ChatGPT for, i.e. when it will and won't help. That is partially usage-related and partially dev-experience-related. I usually ask it not to generate full examples, only minimal snippets, which helps quite a bit.
Another factor not brought into consideration here may be that there are two uses of "senior dev" in this conversation so far: one of them refers to a person who has been asked to work on something they're very familiar with (the same tech stack, a similar problem they've encountered, etc.), whereas the other refers to one who has been asked to work on something unfamiliar.
For the second use case, I can easily see how effectively prompting a model can boost productivity. A few months ago, I had to work on implementing a Docker registry client and I had no idea where to begin, but prompting a model and then reviewing its code, and asking for corrections (such as missing pagination or parameters) allowed me to get said task done in an hour.
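The "missing pagination" was exactly the kind of detail I had to ask for explicitly. As a rough, hypothetical sketch of what paging the Docker Registry v2 tags endpoint looks like (the registry URL and repo name below are placeholders, and a real registry usually also wants an auth token):

```python
import requests

REGISTRY = "https://registry.example.com"  # placeholder registry URL
REPO = "myteam/myimage"                    # placeholder repository name

def list_tags(registry: str, repo: str, page_size: int = 100) -> list[str]:
    """List all tags for a repo, paging through the v2 tags endpoint."""
    tags: list[str] = []
    last = None
    while True:
        params = {"n": page_size}
        if last is not None:
            params["last"] = last  # resume after the last tag we saw
        resp = requests.get(f"{registry}/v2/{repo}/tags/list", params=params)
        resp.raise_for_status()
        page = resp.json().get("tags") or []
        tags.extend(page)
        if len(page) < page_size:
            break  # a short page means we've reached the end
        last = page[-1]
    return tags

print(list_tags(REGISTRY, REPO))
```

A more by-the-book client would follow the Link header the registry returns for the next page; the short-page check here is just to keep the sketch small.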
So I often use GitHub Copilot at work, usually with o1-preview as the LLM. This often isn't "autocomplete", which generally uses a lower-end model; I almost exclusively use the inline chat. That being said, I do also use the auto-complete a lot when editing. I might write a comment describing what I want to do and have it auto-complete, which is usually pretty accurate and also works well for me since I like the Code Complete comment-then-implement approach.
For example, I needed to create a starting point for 4 LangChain tools that would use different prompts. They are initially similar, but I'll be diverging them. I would do something like copy the file of one, select all, then use the inline chat to ask o1 to rename the file, rip out some stuff, and make sure the naming was internally consistent. Then I might attach an additional output schema file, and maybe something else I want it to integrate with, and tell it to go to town. About 90% of the work is done right; then I just have to touch up. (This specific use case is not typical, but it is an example where it saved me time: I had them scaffolded out and functional while listening to a keynote and in between meetings, then later validated them. There were a handful of misses that I needed to clean up.)
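To give a rough idea of the shape of that scaffold, here's a made-up stand-in using LangChain's @tool decorator (the tool name, prompt, and behavior are invented for illustration, not the actual tools):

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool

# Each of the four tools starts from a template like this, with its own prompt.
SUMMARIZE_PROMPT = ChatPromptTemplate.from_messages([
    ("system", "You summarize support tickets into a short status update."),
    ("human", "{ticket_text}"),
])

@tool
def summarize_ticket(ticket_text: str) -> str:
    """Summarize a support ticket into a short status update."""
    # Placeholder body; the real tool would invoke an LLM chain built from
    # SUMMARIZE_PROMPT, e.g. (SUMMARIZE_PROMPT | llm).invoke({"ticket_text": ...}).
    return f"summary of: {ticket_text[:80]}"
```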
As someone learning programming with an LLM, it's 50-50 as to whether it saves or costs me extra time.
This is mostly because if I don't know that I'm asking for the wrong thing, the LLM won't correct me; it will provide code that answers the wrong question, making things up to do so if needed.
Sure, I learn by debugging the LLM's nonsensical code too, and it solves my "don't want to watch a 2-hour tutorial, because if I just watch the 10 minutes that explain what I want to learn, I don't understand any of the context" problem. But it's not much faster with the LLM, since I need to Google things anyway to check if it is gaslighting me.
It does help with understanding errors I'm unfamiliar with, and the most value I've found is pasting in my own code and asking it to explain what the code should do, so I find errors in my logic when it compiles but doesn't have the desired effect. And it will mention concepts I'm lacking so I can look them up (it won't explain them clearly, but at least it's flagging them to me) in a way YouTubers rarely do.
Still haven't made up my mind if it's a net positive, as it often gets on my nerves to wait 10 minutes for a fluffy intro before it gets to the answer. Better than a 20-minute fluff video intro on YouTube, maybe?
Well, LinkedIn does a lot of stuff around making sure the accounts are for real people, which kind of helps with many of the issues people are complaining about. I mean, they could improve it, but they do put in some level of effort.
In large companies I have seen a related pattern: usually a mid-level engineer whom the managers love because they "get stuff done", who is meanwhile a bulldozer in the code, usually with some "ship-it" buddy green-lighting the work.
The reason they can "move fast" is because everyone else is trying to limit complexity, etc. and they are punching holes through the abstractions.
Then they turn into your "Pete" when they get promoted...
> The reason they can "move fast" is because everyone else is trying to limit complexity, etc. and they are punching holes through the abstractions.
That's perfectly fine. Your salary is paid by paying customers, who are attracted and retained by improving their user experience. You will never get a new paying customer by advertising that you prevented your abstractions from being soiled.
Bingo! This is the right answer. It always comes down to how long the code will exist and whether you need to be able to sustain high velocity over a long period of time.
If you don't need to keep it very long, then hack the hell out of it. If you are in a startup, hack it; you don't even know if you have product-market fit. If you are in an enterprise and your team is responsible for some aspect of the company, keep it clean and move fast. As soon as you start snowballing hidden complexity via hacks, it becomes a tar pit.