Hacker News | FuckButtons's comments

I suspect this is GPT-5. It's the biggest model they made, and they got very little ROI out of it, hence the re-branding.

Mostly because of computational efficiency, iirc; the non-linearity doesn't seem to have much impact, so picking one that's fast is a more efficient use of limited computational resources.

There used to be competing centers of power, but then they stacked the judiciary and used manipulative propaganda to turn the House and Senate into a rubber stamp. The only real check on power was those institutions having interests that weren't aligned with each other, and the ability to exercise their power independently.

I think the breathless hype train of twitter is probably the worst place to get an actually grounded take on what the real-world implications of the technology are.

Seeing the 100th example of an llm generating some toy code for which there are a vast number of examples of approximately similar things in the training corpus doesn’t give you a clearer view of what is or isn’t possible.


I think that most of the developers who advocate for AI coding have never worked all by themselves on projects with more than 500-1000 files. Because if they had, they would not advocate for AI coding.


I posted this earlier, but I wanted a java port of sed for ... Reasons, and despite the existence of man pages and source code, it couldn't handle anything but the most basic flags.

Imo this should be low-hanging fruit. Porting a non-trivial but small codebase of 3-4 core files, already debugged and with a specified interface, should be exactly what an LLM excels at.
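For a sense of scale, "the most basic flags" here amounts to roughly sed's s/// substitute command and not much else. A toy sketch of that core, in Python rather than Java purely for illustration (this is my own minimal take, not the port I was asking for):

```python
import re

def sed_substitute(script, line):
    # Supports only the classic "s/pattern/replacement/flags" form,
    # with "g" (replace all matches) and "i" (ignore case) flags.
    m = re.fullmatch(r"s/((?:[^/\\]|\\.)*)/((?:[^/\\]|\\.)*)/([gi]*)", script)
    if m is None:
        raise ValueError("unsupported sed script: " + script)
    pattern, repl, flags = m.groups()
    count = 0 if "g" in flags else 1       # re.sub: count=0 means "all"
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.sub(pattern, repl, line, count=count, flags=re_flags)

print(sed_substitute("s/cat/dog/g", "cat sat on a cat"))  # dog sat on a dog
```

Real sed adds addresses, hold space, branching and a dozen other commands on top of this, which is exactly the part the model never got to.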


I tried this with Microsoft's Copilot + the Think Deeper button, which allegedly uses the new o1 model. It went into a lot of fancy talk about... pretty much what you said older models did. Then it said "here's some other stuff you could extend this with!" followed by a list of all the important sed functionality.

It's possible it could do it if prompted to finish the code with those things, but I don't know the secret number of fancy o1 uses I get and I don't want to burn them on something that's not for me.

You should be able to access it here if you have a Microsoft account and want to try the button: https://copilot.microsoft.com/


My mum gave me HSV; she currently has Alzheimer's and has forgotten who everyone, including me, is. She is 64.


Sorry to hear that, my mom is in similar shape. As is my wife's. Fucking awful.


Sure, let’s check in on this in 4 years time and see if they’ve made any significant progress on that. Many, if not most of our problems have obvious solutions, it’s actually executing on them that’s the problem.


Just played with the qwen32b:Q8 distillation. I gave it a fairly simple python function to write (albeit my line of work is fairly niche) and it failed spectacularly: not only giving an invalid answer to the problem statement (which I tried very hard not to make ambiguous), but also totally changing what the function was supposed to do. I suspect it ran out of useful context at some point and that's when it started to derail, as it was clearly considering the problem constraints correctly at first.

It seemed like it couldn’t synthesize the problem quickly enough to keep the required details with enough attention on them.

My prior has been that test-time compute is a band-aid that can't really get significant gains over and above doing a really good job writing a prompt yourself, and this (totally not at all rigorous, but I'm busy) doesn't persuade me to update that prior significantly.

Incidentally, does anyone know if this is a valid observation: the more context there is, the more diffuse the attention mechanism seems to become. That seems to be true for this model, and for Claude and llama70b too, so even if something fits in the supposed context window, the larger the amount of context, the less effective it becomes.

I’m not sure if that’s how it works, but it seems like it.
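There is at least a plausible mechanism: softmax attention spreads a probability distribution over every token in context, so with more tokens the weight available for any single one tends to shrink and the distribution's entropy rises. A toy numpy sketch, with random scores standing in for real query-key similarities (so this is only suggestive, not a claim about any particular model):

```python
import numpy as np

def attention_weights(n_keys, seed=0):
    # Softmax over random similarity scores for one query against n_keys tokens.
    rng = np.random.default_rng(seed)
    scores = rng.normal(size=n_keys)
    w = np.exp(scores - scores.max())
    return w / w.sum()

def entropy(w):
    # Shannon entropy of the attention distribution, in nats.
    return float(-(w * np.log(w)).sum())

# As the "context" grows, the peak weight shrinks and entropy climbs,
# i.e. attention gets more diffuse even though nothing else changed.
for n in (16, 256, 4096):
    w = attention_weights(n)
    print(n, w.max(), entropy(w))
```

Real models fight this with learned score scaling, sparse or windowed attention, and so on, so the effect is softer in practice, but the basic pressure is there.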


When I asked the 32b r1 distilled model its context window, it said it was 4k... I don't know if that's true or not, as it might not know its own architecture, but if it is, 4k doesn't leave much, especially for its <thinking> tokens.

I've also seen some negative feedback on the model. It could be that the benchmarks are false and the model has simply been trained on them, or maybe, because the model is so new, the hyperparameters haven't been set up properly. We will see in the next few days, I guess. From my testing there are hints of something interesting in there, but I also don't like its extremely censored nature. And I don't mean the CCP stuff; I mean the sanitized corpo safety nonsense it was most likely trained on...


Yeah, this simply wouldn't work. Models don't have any concept of "themselves". These are just large matrices of floating-point numbers that we multiply together to predict a new token.

The context size would have to be in the training data, which would not make sense to do.
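Right; the trained context length lives in the model's configuration, not anywhere the weights could "introspect". For Hugging Face-style checkpoints it's usually recorded in config.json, most often under max_position_embeddings (the key name varies by architecture; the fallbacks below are common alternates, not guaranteed for any given model):

```python
import json

def context_window(config_path):
    # Read the trained context length from a HF-style config.json.
    # "max_position_embeddings" is the most common key; a couple of
    # known alternates are tried as fallbacks.
    with open(config_path) as f:
        cfg = json.load(f)
    for key in ("max_position_embeddings", "n_positions", "seq_length"):
        if key in cfg:
            return cfg[key]
    return None
```

Asking the model itself just samples whatever numbers happened to sit near the phrase "context window" in its training data.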


Try the llama one instead. It seemed better than qwen for some reason.


I tried llama70b with the same task too. The reasoning seemed more coherent, but it still wound up drawing very invalid conclusions from that reasoning, and the output was even further from correct than qwen's.


Sure, what's your training corpus for that then?

I find that fairly often if I'm really wrestling with a novel or difficult problem, I will work and work at it, and then one day I will wake up with the answer fully formed with no clear understanding of any of the thought processes that got me to arrive at the solution.

Are you going to record people's subconscious as they sleep? How do you train on something that is so poorly understood in the first place? It's nonsense.


I'm sure if you take an hour to recall you'll be able to come up with a process. Or ask a philosophy professor who specializes in reason.

But the easiest way I can think of ATM is to go through all the questions that AI currently fails on, have a human work through them and show the chain of thought a human would follow, including the false starts, and describe the strategy pivots. Then generate your corpus based on that. However, that burns the problem set, so you'll have to constantly come up with new problems.


It's quite easy to separate the CCP from the Chinese people, even if the former would rather you didn't.

China's people have done many praiseworthy things throughout history. The CCP doesn't deserve any reflected glory from that.

No one should be so naive as to think that a party so fearful of free thought that it would rather massacre its next generation of leaders and hose their remains into the gutter would not stoop to manipulating people's thoughts with a new generation of technology.


This "CCP vs people" model almost always lead to very poor result, to the point that there's no people part anymore: some would just exaggerate and consider CCP has complete control over everything China, so every researcher in China is controlled by CCP and their action may be propaganda, and even researchers in the States are controlled by CCP because they may still have grandpa in China (seriously, WTF?).

I fully agree with this "CCP is CCP, Chinese are Chinese" view. Which means Alibaba is run by Chinese people, not the CCP. Same for BYD, DJI and other private entities in China. Yes, private entities face a lot of challenges in China (from the CCP), but they DO EXIST.

Yet random guys on the orange site consistently say that "everything is state-owned and controlled by the CCP", and by that definition, there are no Chinese people at all.


It's probably much more true for strategically important companies than for your average Chinese person that they are in some way controlled by the Party. There was recently an article about the "China 2025" initiative on this here orange website. One of its focus areas is AI.


Isn’t every government putting out a policy paper making AI a focus area? Why is it suddenly nefarious when China does it?


Which is why we started to see weird national-lab-like organizations in China releasing models, for example InternLM [0] and BAAI [1]. The CCP won't outsource its focus areas to the private sector. Are they competent? I don't know; certainly less so than Qwen and DeepSeek for now.

[0] https://huggingface.co/internlm

[1] https://huggingface.co/BAAI


Private entities face challenges from the CCP? I don't think this is true as a blanket statement. For example, Evergrande did not receive bailouts for its failed investments, which checks out with your statement. But at the same time, the US and EU have been complaining about state subsidies to Chinese electric-car makers giving them an unfair advantage. I guess the CCP helps the sectors it sees as strategically important.



Any real political problem is multifaceted, deeply interconnected with the way the country works and its place in the world. But people's experiences of those problems are not: inflation manifests as being able to afford rent one year and not the next.

A good politician can speak to the experience while fixing the problem. A good salesman can sell you a solution even if it doesn't fix the problem. And the Democratic Party seems mostly interested in talking about the problem and ignoring the experience.

