Hacker News new | past | comments | ask | show | jobs | submit | thorax's comments login

Replaced my garage door openers in the spring, and I 100% wouldn't have chosen them if there hadn't been HA MyQ integration. Such a silly move.

I used a local Meross install on my old garage doors; time to break those out again, but ugh...


Don't think so, but there were some guesses about 3.5-turbo-- i.e. training a much smaller model on quality questions/answers from GPT-4. The same tactic has worked again and again for other LLMs.

I'm definitely curious about the context window increase-- I'm having a hard time telling if it's 'real' vs. a fast, specially trained summarization prework step. That being said, it's been doing a rather solid job of not losing info in that context window in my minor anecdotal use cases.
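To make the "summarization prework" guess concrete, here's a minimal sketch of what such a step could look like. This is pure speculation about the mechanism, not OpenAI's actual approach; the 4-chars-per-token estimate and the truncating `summarize` stub are my own placeholder assumptions.

```python
# Hypothetical "summarization prework": if the prompt looks too big for
# the model's real window, compress older chunks before sending.
# The 4-chars-per-token ratio is a crude heuristic, not OpenAI's method.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough: ~4 characters per token

def summarize(chunk: str) -> str:
    # Placeholder for an LLM summarization call; truncation stands in
    # for real compression here.
    return chunk[:200] + "..." if len(chunk) > 200 else chunk

def fit_context(chunks: list[str], max_tokens: int = 8000) -> list[str]:
    """Keep the most recent chunks verbatim; summarize older ones until it fits."""
    result = list(chunks)
    i = 0
    while sum(estimate_tokens(c) for c in result) > max_tokens and i < len(result) - 1:
        result[i] = summarize(result[i])  # compress oldest first
        i += 1
    return result
```

If something like this is happening behind the scenes, it would explain both the apparent window size and the occasional loss of mid-conversation detail.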


Do you have a link or gist of an example run you tried? I'd be curious to try something similar.


I've had to move to using the Azure OpenAI service during business hours for the API-- much more stable, unless a prompt strays into something a little odd and their API censorship blocks the call.
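For anyone switching between the two: the request body is the same chat `messages` JSON either way, but routing and auth differ. Here's a stdlib-only sketch of the two request shapes; the resource/deployment names are placeholders, and the `api-version` value is one that was current around this time and may need updating.

```python
# Sketch of the two backends' request shapes (names are placeholders).
# Azure routes by deployment name and authenticates with an "api-key"
# header; OpenAI uses a single endpoint and a Bearer token.

def chat_request(backend: str, key: str, model_or_deployment: str):
    if backend == "azure":
        url = (
            "https://my-resource.openai.azure.com/openai/deployments/"
            f"{model_or_deployment}/chat/completions?api-version=2023-05-15"
        )
        headers = {"api-key": key}  # Azure-style auth header
    else:
        url = "https://api.openai.com/v1/chat/completions"
        headers = {"Authorization": f"Bearer {key}"}  # OpenAI-style auth
    return url, headers
```

The deployment-name indirection is also why Azure calls keep working when you swap the underlying model out server-side.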


I’ve been working directly with OpenAI’s API; are there any other advantages to doing this through Azure?


You can opt out of the safety filtering, btw.


Ah, really curious if it'll balk at the age-old problem or just answer based on the code provided, since it does inspect the nearby code to understand context for the generated code.

I'll have to try that when I get back to my desk!


> It returns factually incorrect data, and it returns code with subtle but important errors if you ask it anything that's not regurgitated a thousand times in the training dataset.

To be fair, that's what pretty much every person does. The bar does seem pretty high if we need more than that (especially if not specifically trained on a topic). It's not a universally perfect expert servant, but I've been exploring the code generation of GPT-4 in detail (i.e. via the 'cataclysm' module I just posted about). In one minute it can write functions as well as the average developer intern most of the time.

We're keeping score in a weird way if our quick response is that it needs to "code without subtle but important errors"-- because that describes the majority of human developers, too. I've been writing code for 30 years, and even with a gun to my head, my first draft of any complex code would still have subtle but important flaws.

I'm not saying you're bashing it, by the way-- I get your point-- but I do worry a bit when the first response is that SOTA models get things wrong in zero-shot situations without full context. That describes all of us.


I always say, if some nontrivial code compiles and runs on the first attempt, then you just haven't found the bugs yet.

GPT-4 is a fantastic collaboration tool for senior developers, who know what they want in detail and can review, verify and apply the output it generates.

Just yesterday I needed to write some detailed bash scripts. I'm no Linux guru but I know what I want and that was enough - in minutes I had a solid script that did everything I needed and wanted, something that would have taken much longer to hunt down through Google. And then I asked it about SQL, C#, AWS, Terraform, Rust and on and on and everything was high quality.

The only way I could have gained similar results without ChatGPT would have been to post all my questions to the dev slack channel and engage in hours long discussions with my colleagues.


The standard should be higher for machines. Historically, they're inorganic tools.

If a hammer was only as dense as bone, would it be a good hammer?


> would it be a good hammer?

Yes, it is possible that such a hammer would still be very useful, if it were the only available option.

Imperfect hammers can still be very, very useful, compared to having no hammers.


> that's what pretty much every person does

But the people relevant to the matter work to push beyond that minimum.

> and if you put a gun to my head

Like we do with bullshitters? ;)


Now that it's released, it's time to experiment with what it can do.

   from cataclysm import doom

   # App gets the img file from the command line and saves it as a new file at half size with _half appended to the name
   doom.resize_app()
Turned out to be all that's needed for a command-line file resize app (with PIL installed).
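For anyone curious how a module can respond to a function that was never written, here's a toy sketch of the dynamic-dispatch trick I believe is at work: unknown attribute access triggers code generation. The real cataclysm asks GPT-4 for the function body; the `_generate` stub below fabricates a trivial function so the shape is visible without an API call.

```python
# Toy sketch (not cataclysm's actual internals): __getattr__ fires on
# any attribute that doesn't exist yet, so we can synthesize a function
# on first use and cache it for later calls.

class Doom:
    def __init__(self):
        self._cache = {}

    def _generate(self, name: str):
        # Real version: send `name` plus surrounding source context to an
        # LLM and exec the returned code. Stubbed here for illustration.
        def fn(*args, **kwargs):
            return f"generated {name} called with {args}"
        return fn

    def __getattr__(self, name):
        # Only reached for attributes not found the normal way.
        if name not in self._cache:
            self._cache[name] = self._generate(name)
        return self._cache[name]

doom = Doom()
```

Calling `doom.resize_app("img.png")` then hits `__getattr__`, generates a function named `resize_app`, caches it, and invokes it.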


> I don't think GPT-4 will be a big deal in a month

Why do you think that? Competition? Can you elaborate?


Oh, a lot of reasons. For one, I'm a data scientist and I am intimately familiar with the machinery under the hood. The hype is pushing expectations far beyond the capabilities of the machinery/algorithms at work, and OpenAI is heavily incentivized to pump up this hype cycle after the last one flopped, when Bing/Sydney started confidently providing worthless information (i.e. "hallucinating"), returning hostile or manipulative responses, and doing that weird stuff Kevin Roose observed. As a data scientist, I have developed a very keen detector for unsubstantiated hype over the past decade.

I've tried to find examples of ChatGPT doing impressive things that I could use in my own workflows, but everything I've found seems like it would cut an hour of googling down to 15 minutes of prompt generation and 40 minutes of validation.

And my biggest concern is copyright and license related. If I use code that comes out of AI-assistants, am I going to have to rip up codebases because we discover that GPT-4 or other LLMs are spitting out implementations from codebases with incompatible licenses? How will this shake out when a case inevitably gets to the Supreme Court?


Yeah, I wrote my own plunkylib (which I don't have great docs for yet); it's more about keeping the LLM settings and prompts in (nestable) yaml/txt files rather than hardcoding them in source the way so many people do. I do like some of the features in langchain, but it doesn't really fit my coding style.

Pretty sure there will be a thousand great libraries for this soon.


You're not wrong, but surely OpenAI can't do everything, and maybe they can stay ahead on features long enough to continue to be higher value?

