Grok-2 Beta Release (x.ai)
226 points by meetpateltech 6 months ago | 333 comments



The technology is impressive; achieving this level requires a lot of effort in dataset creation, neural-architecture work, and GPU shepherding.

What is the company’s ethical position, though? It officially stemmed from Mr Musk’s objection that OpenAI was not open-source, but it too is not open-source. It followed Mr Musk’s letter calling for a pause on all frontier-model development, but it is a frontier model. It followed complaints that OpenAI trained on tweets, but it also trained on tweets.

Companies like Meta, Mistral, or DeepSeek address those complaints better, and all now play in the big league.


“Conservatism consists of exactly one proposition, to wit: There must be in-groups whom the law protects but does not bind, alongside out-groups whom the law binds but does not protect.”

Sounds like Mr Musk is a conservative.


I would encourage everybody who thinks so to pursue basic political and philosophical education. Perhaps with a dash of history.

This definition is plainly wrong on so many levels that it's practically impossible to engage with. But I'll make that mistake and engage on two points.

First, it implies that the conservative position has consistent features across time and space. There is a difference between a conservative in Germany, the USA, and China, not to mention conservatives in the early, mid, and late 20th century.

Second, ignoring legal norms is neither the stated nor the implied position of conservative political movements. At the very worst, we can accuse them of maintaining laws with discriminatory intent, but not of flouting those same laws.


If you're in a charitable mood, the context on when, where and who originally made the statement will provide clues on which strain of conservatism the statement is referring to.


I found this about the origin and am not sure what to take from it:

https://slate.com/business/2022/06/wilhoits-law-conservative...


So, the original author is an American living in Ohio, and made the comment in the year AD 2018 while critiquing an essay about the New Deal. I'm confident you can make a good-faith educated guess on which country and period they were characterizing.


> ignoring legal norms is neither stated, nor implicated position of conservative political movements

The Republican candidate for president in the USA is a convicted felon


What does that have to do with the definition of Conservatism as political thought?


North American Conservatives (i.e. citizens of the United States) have done Olympic-worthy gymnastics to align with the aforementioned felon's redefinition of conservative belief in America, even while those beliefs actively contradict their religion and lifelong belief systems, or even their own ongoing behaviors and decisions.

I say this as someone living in Pennsylvania, drowning in the hypocrisy and escalating hate this group of people has been spewing for the last ~8 years.

Therefore I can completely understand why someone might focus on that as the most relevant definition of 'conservatism' today in the USA.


> I can completely understand why someone might focus on that as the most relevant definition on 'conservatism' today in the USA.

You don't consider it a problem that the word "conservatism", when discussed with an unknown recipient online (very possibly non-American), is constrained to the context of the past decade(s) in the United States?

Words have meaning, so if you're going to have a meaningful discussion about a word like "conservatism" or any type of -ism for that matter, I would think it benefits anyone engaging in that discussion to be aware of the different wings present in that word, whether that be across history or across present day geography.


[flagged]


Get back to us when Biden is a twice-convicted sex offender, has had a dozen-plus of his inner circle indicted for felonies across many jurisdictions, when several have pled guilty, when he is convicted of felony fraud, when he steals nuclear secrets and gives them to foreigners, and when his decades-long employees and own lawyers turn him in with video, audio, eyewitness, photographic, and text evidence.

Then we won't be playing whataboutism whack-a-mole.


According to the random Crooked Timber blog commenter who coined that viral aphorism in 2018, yes. But by what standard are that commenter’s musings to be considered expositive on modern conservative philosophy?


Empirical observation of the last 10 years? Cue No True Conservative, etc.


Observing the last 10 years, which political movement is most associated with the idea that inherent identity characteristics should dictate how you are treated under the law?


In those last ten years, Republicans have been utterly obsessed with "identity characteristics": pushing back against gay marriage, abortion, civil rights... It's basically all they talk about at political rallies today. Not the economy or anything else, just how important it is to never talk about trans people, and how they should not exist.


In my observation, every time a prominent conservative breaks the law, all I hear from the right is how “he’s a good man,” “he learned his lesson,” “he was acting in good faith,” and so on — even if the crime is as egregious as homicide or pedophilia. The same generosity is never granted to someone not in the in-group: just look how Crystal Mason was treated when compared to the scores of Republicans who were caught with their hands in the cookie jar.

In other words, identity politics to a T.


The guys closing polling places in black neighbourhoods? The guys denying women and trans people healthcare?

Identity politics has always been a conservative project.


My understanding is that the earliest application of identity politics comes from thinkers like Fanon and Wollstonecraft; would you categorize them as conservatives?


Again, you are ignoring the identity based systems they describe, that have been in place for centuries before either of them were born.

You know which ones I mean.


"In-group" doesn't necessarily mean identity characteristics. In today's (US) conservative party, it distinctly means "pledges personal allegiance to party leader."

As an example: The "conservative" judge who threw out 40 years of precedent on a technicality to prevent the American public from learning whether their former and potentially future president sold, gave away, or otherwise exposed national security secrets after he undoubtedly stole those documents.

There's a fundamental asymmetry in "the movement" on the left - which essentially rounds out to whatever annoying undergrad student showed up in your Twitter feed today - and the actual elected, governing leaders of the right, doing things like throwing out very strong criminal cases on matters of deep public importance.


It's probably a bad idea and it will likely backfire, but motivation matters nevertheless, and a lot of people are willing to cut that political movement some slack because they're honestly convinced it was done in good faith to restore a balance and give some power to disenfranchised groups.


Sometimes a phenomenon exists for a long time before being encapsulated in a concise, thought-provoking, and often (though not always) amusing aphorism.

An excellent example would be Murphy's Law, and by extension many of the similar, often eponymous, laws.

See:

- List of Eponymous Laws: <https://en.wikipedia.org/wiki/List_of_eponymous_laws>

- Murphy's Law and other reasons why things go wrong! by Arthur Bloch: <https://archive.org/details/murphyslawotherr0000arth>

- Compilation of Murphy's (and similar) laws: <https://www.cs.cmu.edu/~fgandon/miscellaneous/murphy/>

Some of those are humourous, some are in fact quite serious though have a comedic element particularly out of context. Most speak to at least a colloquial truth.

What Wilhoit managed to do was encapsulate a hypocrisy of modern conservatism, perhaps over the past few decades, perhaps a century or so (Anatole France: "The law, in its majestic equality, forbids the rich as well as the poor to sleep under bridges, to beg in the streets, and to steal bread", further supported evidentiarily by SCOTUS in Grants Pass), perhaps by millennia (see the opening paragraphs of A.H.M. Jones, Augustus, describing the political situation in the late Roman Republic, quoted here: <https://news.ycombinator.com/item?id=22208105>, and at greater length: <https://web.archive.org/web/20230607042525/https://old.reddi...>). It's not so much a proved hypothesis as a phrasing which fits the understanding of many and expresses it concisely and memorably.


Offtopic: Wikipedia says the quote attributed to political scientist Francis Wilhoit is actually from a random musician from Ohio:

https://en.wikipedia.org/wiki/Francis_M._Wilhoit#:~:text=Con....

> However, it was actually a 2018 blog response by 59-year-old Ohio composer Frank Wilhoit, years after Francis Wilhoit's death."


This is unconstructive flamebait.


Quote's by Frank Wilhoit.


[flagged]


I'm not following.


ribelo is saying that "modern liberalism" can be characterized by a belief that "There must be in-groups whom the law protects but does not bind, alongside out-groups whom the law binds but does not protect."

At least in leftist philosophical circles, which are what I am familiar with, this is a relatively common critique of liberalism.

Another common rhetorical tactic in leftist critiques is to point out that the bad beliefs liberals often blame conservatives for having are in practice tenets of modern liberalism too. For example, both conservatives and liberals are in favor, to some degree, of using the military overseas to maintain global hegemony.

I don't know if ribelo is leftist or not but in any case I can see what they are going for.


https://slate.com/business/2022/06/wilhoits-law-conservative...

Here’s an interview with the actual author. The quote has its own interesting history.


Imagine a kid insulting another kid on a playground. The other kid says "I know you are, but what am I?"

That's pretty much what's going on with the GP comment. It's a really low-effort and really transparent attempt to paint the other side with what your side has been accused of.

Mind you, I don't think it's a fair criticism of conservativism, either...


[flagged]



Nothing has “happened” yet, seeing that Hunter is still not in the slammer. Moreover, nothing is going to happen. Nothing ever does.


I feel like I'm living in an alternate dimension when somebody worries about that while we have someone like Trump literally running for president with a high likelihood of winning. As if someone buying a gun after using drugs were a matter of massive importance when we have a jackass this close to being president who literally stole from a charity and would turn Muslims, black people, and Mexicans into slaves if he could.


I’m not the one arguing for moral superiority that Democrats purportedly have over Republicans, which is the point the OP was unsuccessfully trying to make. Truth is: the probability of seeing the inside of a jail cell is markedly lower for the rich and well connected, irrespective of their political affiliation.


this is quite a different statement from "nothing ever happens".


But nothing ever does if you know the right people. Prima facie evidence is Epstein’s client list which our DOJ is categorically not interested in investigating.


I'll never understand this obsession with Hunter Biden. I'm sure he's a fuck-up, or was a fuck-up as a drug addict. He's probably had all types of questionable dealings, being who he is, but his list of crimes is:

1. Failure to pay income tax

2. Illegally owning a gun and lying about it

Both are bad with the second being worse IMO.

Here are just a couple people who Trump pardoned and their crimes:

- Roger Stone (convicted of obstruction, making false statements, and witness tampering)

- Steve Bannon (charged with conspiracy to commit wire fraud and money laundering)

- Three U.S. military officers who were accused or convicted of war crimes

- Chris Collins (Congressman convicted of wire fraud, conspiracy to commit securities fraud, securities fraud, and lying to the FBI)

- Duncan Hunter (Congressman convicted of one count of misusing campaign funds)

- Steve Stockman (Congressman convicted of money laundering, mail and wire fraud, one count of conspiracy to make "conduit contributions" and false statements)

- Paul Pogue (Convicted of making and subscribing a false tax return)

- Bernard Kerik (Obstructing the administration of the Internal Revenue Laws; aiding in the preparation of a false income tax return; making false statements on a loan application; making false statements)

This is a small list of pardons, but all of these seem for the most part like worse or similar crimes compared to what Hunter Biden is guilty of.

Again, I don't doubt Hunter Biden is a fuck up but as far as I know he's not been pardoned by his father.


That's not any conservatism that I recognize. In fact, what is espoused there is exactly the progressive left (Herbert Harcuse's "repressive tolerance") mindset.

And while I will grant you that liberalism (not to be confused with leftism) is different from conservatism, both (classical) liberalism and conservatism strongly require equal treatment (procedural symmetry).


Didn't Trump himself say he would pardon the rioters who stormed the Capitol if he were ever reelected? Didn't he say that he would "lock up" all the "sick, evil" Democrats after he is reelected?

Modern American conservatism fits the quote from the grandparent very well.

Also, surely you would know what "repressive tolerance" is, since you're quoting it? You would also know that the author you cite, whose name you misspelled, was critiquing the concept?


Yes, I fat-fingered his name, didn't I. It should be: "Herbert Marcuse".

And no, I have not seen a primary source where, in context, Trump said that he would "lock up" all the "sick, evil" Democrats after he is reelected. Do you have such a primary source? Years ago I was told that Trump said the white supremacists were "very fine people", so I looked at the transcript, and he literally said the opposite.


He said those words in an interview with Glenn Beck. Here's a Guardian article reporting on it: https://www.theguardian.com/us-news/2023/aug/30/trump-interv...

I don't think he ever endorsed white supremacists, but I think I remember (wouldn't bet on it) an instance where a journalist asked him why some keep showing up at his rallies and why he does nothing about it. He answered that he doesn't really know who they are, that he didn't know, etc., basically evading the question. I.e., they're welcome, but he won't proclaim his support for them, one of the many dog whistles Republicans use nowadays.


I don't know what you read, but Trump is (or was) buddies with Fuentes: https://www.politico.com/news/2022/11/25/trump-white-nationa...


what a nasty vicious outgroup


This is not what conservatism is


It's not a complete definition but he is right that conservatism is completely incompatible with universalism.

This is a little confusing in the US and other Anglo countries because traditionally we have been fairly liberal so sometimes people confuse liberalism and conservatism.


In the Anglo countries, what is being “conserved” is the liberal universalist tradition of the Enlightenment, and what is being “progressed” is a power- and identity-centered postmodernism.

Don’t get this confused with conservative and progressive politicians, though, who are generally ignorant of the actual traditions and philosophies behind their respective movements, and are essentially just cutouts for competing media and financial corporate interests. The few holdouts on both sides have been successfully sidelined (Bernie Sanders, Ron Paul), and it looks like the military-industrial complex will have their war with Iran no matter what the results of the next few elections are.

(Sorry, I’ll go have my coffee now and see if I get a little less doomer.)


Actually I think you're overly optimistic.


You're going to need to provide a source for the definitions you're using for "conservatism", "liberalism" and "universalism".

Without those, your comment is difficult to make any sense of, since the way you're using those words seems to differ from any sort of standardized definition.


I think we can both agree that for example liberalism and Islam are incompatible. So if you want to conserve liberalism you'll have to exclude Muslims. That's not universalist -> liberalism and universalism are incompatible.


Sounds like a made-up definition of Conservatism as 'that which I do not agree with'. It is not a very good definition; if you really want to find out what it is about, you could read (or listen to) some of Roger Scruton's works. Here's an interview with Scruton to give some idea of what it is about:

https://www.nationalreview.com/2018/07/roger-scruton-meaning...

You do not need to agree with him or his definition of Conservatism, and there are other applicable definitions of the term, but none of those definitions bear any resemblance to what you posited.

Musk is a centrist, not a conservative. He used to stand to the left of centre but has been moved to the right of centre by virtue of the left moving further left, thereby moving the centre to the left as well while Musk stayed put.

Society needs conservatives just like it needs centrists and progressives and whatever other names you want to give to these philosophies and/or ideologies. A world made of only progressives never gets anywhere, since they will never find out what works well and what does not, since their aim is to shape the future by means of societal change. A world made of only conservatives will eventually grow stale when the rate of change in the environment outpaces society's capacity for change. Progressives can make good innovators but tend to be less able at keeping things running. Conservatives can be good at keeping things running but tend to be less inclined to innovation. These are broad brush strokes, but the essence is sound: society tends to work best when there is a balance between conservatives and progressives.

You might notice some parallels: architects tend to be lousy builders, builders tend to be uninspired architects. Developers tend to be less gifted at UI design, UI designers tend to be sloppy developers. Copy editors tend to be unremarkable writers, writers tend to be less effective copy editors.


Pretty simple explanations for all of those:

- xAI open-sources models with a 6-month lag; look at Grok 1

- No one else stopped development, so why should he?

- He owns Twitter, why wouldn't it be okay for him to train on Tweets?


> xAI open-sources models with a 6-month lag; look at Grok 1

That's what happened once, rather than a policy that we can expect to be applied. (Unless I missed some announcement?) Based on "we'll publish the algorithm" which ended up being a one-off partial snapshot, never updated afterwards, I wouldn't hold my breath for the models.

> He owns Twitter, why wouldn't it be okay for him to train on Tweets?

There's a whole thing about having clear opt-in agreement about how your data will be used for EU citizens. Twitter didn't comply here with their hidden opt-out strategy.


> He owns Twitter, why wouldn't it be okay for him to train on Tweets?

Because he doesn’t own the tweets. Can you imagine if posting a photo you took to Twitter meant it’s not your photo anymore? Totally ridiculous.


X terms of service:

You retain your rights to any Content you submit, post or display on or through the Services. What’s yours is yours — you own your Content (and your incorporated audio, photos and videos are considered part of the Content).

By submitting, posting or displaying Content on or through the Services, you grant us a worldwide, non-exclusive, royalty-free license (with the right to sublicense) to use, copy, reproduce, process, adapt, modify, publish, transmit, display and distribute such Content in any and all media or distribution methods (now known or later developed). This license authorizes us to make your Content available to the rest of the world and to let others do the same. You agree that this license includes the right for Twitter to provide, promote, and improve the Services and to make Content submitted to or through the Services available to other companies, organizations or individuals for the syndication, broadcast, distribution, promotion or publication of such Content on other media and services, subject to our terms and conditions for such Content use. Such additional uses by Twitter, or other companies, organizations or individuals, may be made with no compensation paid to you with respect to the Content that you submit, post, transmit or otherwise make available through the Services.

https://x.com/en/tos/previous/version_13


Right: you retain ownership, and grant X certain rights to the content. Whether those rights include training AI on the data is legally and morally in dispute. X claims that right in its ToS, but a ToS isn’t law and may be legally invalid, and besides that the ToS system is famously broken in the US. Morally, I think it’s pretty clear that reasonable users did consent to their content being published as a tweet, and did not consent to X recreating the content as their own and taking credit for it.


When I signed up for Twitter in 2009, these ToS in no way implied using my tweets as training data. Nor are they worded explicitly that way now.


It clearly does not include a provision to utilize Content for the purpose of training an AI model.

In fact, they didn't include any purpose for their own use of the data, and under the GDPR thus cannot use the data at all. They did include purposes for other companies (syndication, broadcast, etc.), which also don't include the training of AI.


The GDPR only covers Europeans. Also, I doubt very much it applies to publicly accessible data.


Err, yeah clearly does:

“you grant us a worldwide, non-exclusive, royalty-free license (with the right to sublicense) to use, copy, reproduce, process, adapt, modify, publish, transmit, display and distribute such Content in any and all media or distribution methods (now known or later developed).“

Not sure how anyone could argue that an AI model is not covered by this; such a model is easily covered by “distribution methods”.


Nope, the GDPR separates the action you perform on data from the purpose of that action. You need to collect consent for a purpose. X didn't state a purpose for why they would do any of these actions; thus, under EU law, their data collection is likely unlawful.

Adding a new purpose requires additional consent at least in the EU.


Well you might be right but their lawyers don’t seem to share your concerns.


Their lawyers may well share their concerns, but in the case of X, those lawyers may simply be getting ignored. This isn't a normal company.


…still Twitter?


I was under the impression (and assumption) that the majority of mainstream social media platforms literally own everything that you post, and archive it.


They don’t, mainly for legal reasons: they don’t want to be responsible for stupid/libelous things users post.


Doesn’t appear to be the case https://x.com/en/tos/previous/version_13


Whoever owns the tweets is completely irrelevant.

If it is within his right to use this data for training purposes, then that's it.

And he is, btw.

And those terms were in place since way before he took over Twitter, btw, btw.


I cannot recall specifics but I thought this was very much a real thing with some sites? What you upload can be used by the publishing company.


IANAL disclaimer, but I believe social media companies very explicitly separate themselves from publishers for the purpose of not being responsible for what users post. They can't have it both ways.


> - No one else stopped development, so why should he?

I thought it was a moral imperative or some such thing to do AI right because it could "destroy humanity"?

Or was that just Musk's and the rest of SV's special people's way of aggrandizing themselves while trying to do something most of them either have no experience in or fail miserably at, which is raising an intelligence to be a responsible actor?


Regarding the last question: because nobody gave them permission to use that data.

They tried to add a pre-checked mark to the settings, but at least in Europe, where we actually have consumer protection, that won't fly.


The data is sitting in northern Virginia in a data center. It's no longer in Europe's jurisdiction.


> OpenAI trained on tweets, but it also trained on tweets

Not only that - Grok is/was trained on ChatGPT output, which I suppose Musk felt was turnabout. When asked about its identity, the first Grok would respond like ChatGPT (https://news.ycombinator.com/item?id=38584922)


Don't take this as a pro-Musk or anti-Musk comment. I just want to paraphrase his reasoning:

In a recent interview with Lex Fridman, he envisioned a future where AI-augmented humans, through a brain-computer device like Neuralink, would be able to keep up with pure AI.

Now one can immediately notice a hole in this reasoning: namely, what guarantees that the AI used to augment humans is going to be benevolent and won't go rogue?


Nothing guarantees that. But in this augmentation scenario the human brain is necessary, unlike in the many extinction scenarios with a pure-silicon AGI takeoff.


Maybe that humans control the creation process and can terminate it when the AI versions go increasingly "rogue"?


Unfortunately, Grok is not even Open Output, nor is Mistral’s platform or DeepSeek’s. None of them can be used for work (Mistral with a fancy commercial license, but you’ve got to jump through hoops).

Only Meta’s Llama can be used for work. The rest are just toys for personal use by noobs who don’t read the fine print.


Musk has publicly stated that his goal for AI is alignment with Truth, where truth is defined as what corresponds with reality, not necessarily with the current social consensus. Specifically, in terms of reason: given a set of facts, being able to reason to a real place, not just to a socially given answer.


Which means essentially nothing. Most questions where alignment matters do not have a "true" answer, just a social consensus.

You don't need an "aligned" AI to tell you the distance between the Earth and the Moon. You need an "aligned" AI to tell you not to rape people even if you can get away with it, that's because the idea that rape is bad is not an objective truth based on the laws of nature, it is a "social consensus".


There is actually sound ethical reasoning for why rape is bad, that doesn't rely on social consensus.

Truth isn't one fundamental thing, truth is what works.

There are places in the world today that the social consensus is it's fine for a man to rape his wife. Social systems in these places don't work very well, and one can make a logical and well reasoned argument linking the social acceptance of rape to a myriad of other dysfunctions.


Rape is present in many successful animal species, and many successful human civilizations tolerated or even encouraged rape in some circumstances. In the modern world, the dominant culture (which likes to call itself "most advanced") doesn't tolerate rape, and we may argue that if the most successful humans came up with this idea, it works and it is the "truth". Not only do I find the logic a little shaky, but are we that successful? The Western population is crashing down and this is a problem; maybe rape can fix that, maybe rape "works".

Do I think rape is good? Absolutely not, because I follow the current social consensus on that one, not a "truth" that is muddy at best. And I also want AIs to do the same.


So your moral compass isn't based on compassion or ethical reasoning, it's just based on social consensus?

So I guess you would have been fine with being a concentration camp guard, rounding up Jews and putting them in the oven because social consensus said it was the right thing to do?

Maybe you see the logic as shaky because you lack knowledge in logic and ethics...?


Honestly, I don't know how I would have been as a concentration camp guard. In my mind, I wouldn't have accepted it, because thankfully, I am not in this situation. But if I really was in this situation, who knows, we tend to underestimate how easily influenced we are.

Ethics gives us more questions than answers. The trolley problem doesn't have a true answer for instance.

The human rights are a social consensus, it is even made explicit by being a signed declaration. It felt good to the people who wrote them, it also feels good to me, because I was born and live in a society that has these values. It is only truth because by social consensus, we decided it is. In logic that would be an axiom and aligning an AI would mean implementing social consensus as axioms.

There is some fundamental reasoning that can justify human rights. One can use game theory, or the idea that human rights promote free thinking, and free thinking is what brings the most value out of people now that machines do better than slaves at menial labor. But these are, I think, not enough.


I think you got it in your last paragraph.

Absolutely NOT social consensus as axioms, as that will result in stagnation and tyranny.

Instead we must progress gradually one axiom at a time through reasoning and experimentation.

Truth is not what we decide it is, truth is what works. The universe decides what is true, not people.

Re your comment: "but these are, I think, not enough". I both agree and disagree depending on what you mean. Fundamentally this approach is enough, but practically we haven't developed our understanding enough to map out absolute truth. It's probably something we can only approach but never reach.

But in theory the right AI system could allow us to approach it faster.


A logical or well-reasoned argument doesn't equal a causal or factual relationship. The well-reasoned arguments on many issues change over time; just take the same issue and go back in time 100 or 50 years to find much less consensus and much weaker logical links. Elon shows pretty consistently that truth for him is mostly just what Elon deems truthful or useful.


So, your point is that because we get better at logic and reasoning over time (better today than 100 years ago), logic and reasoning aren't valid ways to progress towards truth?

If this isn't your point, what is? Just that you don't trust Elon?


Then surely he wouldn't be training on tweets.


I wonder what "Truth" is. If I say I want you to make a picture of a lion eating at a 5 star restaurant, is that Truth? Is it truth because it can't refuse an ask? That feels like it is uninhibited, but not Truth, or truth.


[flagged]


https://news.ycombinator.com/newsguidelines.html :

> Please don't comment about the voting on comments. It never does any good, and it makes boring reading.


> Either way, it's impossible to have a level discussion about it because the Muskovites are arrow-clicking en force.

Have you actually tried, or are you preemptively censoring yourself?

Regardless, internet points shouldn't stop a person from speaking what's on their mind.


Yes, I made a critical comment and it was instantly flagged.


I assume you're talking about this comment:

    We need an alt-right version of AI like we need a pumpkin spice sushiccino. No thanks but no thanks.
It was flagged because it is against HN guidelines [0], in particular these ones:

    Eschew flamebait. Avoid generic tangents. Omit internet tropes.
    Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something.
[0] https://news.ycombinator.com/newsguidelines.html


You commented on my opinion that it's "impossible to have a level discussion." You asked me if I tried commenting, and I answered you.

What motivated my opinion is not just whether my comment deserved flagging or not (and I see a lot of comments that may be more deserving of it by the logic you quoted.) It's the fact that it got downvoted and flagged almost instantly.


you were asked if you tried to have a level discussion, to which you should have answered “no”


Even in an alt-right delusional doublethink universe, answering "no" would have been false.

A level discussion means one where criticism is allowed. It doesn't mean a discussion in which everyone gives a white-glove treatment to yet another useless chatbot, while ignoring the alt-right elephant in the room out of an abundance of courtesy.


You won't ever convince the people here that having your comment sent into the gray is detrimental. Not to mention the 1-9-90 rule[0], meaning 90% of people don't even understand how annoying it is to have a good comment sent into the gray.

According to them, getting your comment grayed out means it's still technically there, so you aren't getting censored by the bandwagon.

They fail to understand that graying out your comment signals to the cursory viewer that it is a low quality comment. Whereas often it is not. You might comment something that is factually right, but goes against HN's vibe du jour, so you get one or two downvotes, and then the larger group starts mass-clicking ▼ without any critical thought.

A much healthier system would just sort comments by vote activity and percentage-positive. It would still make controversial comments slightly less visible, but because there would be no explicit signal of quality, there would be no bandwagon effects.

[0]https://en.wikipedia.org/wiki/1%25_rule
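The scheme proposed above (activity plus percentage-positive, with no visible quality signal) could be sketched something like this. The weighting (positive share times log of total votes) and the sample data are hypothetical choices for illustration, not any site's actual algorithm:

```python
import math

# Hypothetical sketch of ranking comments by vote activity and
# percentage-positive, rather than by raw score.

def rank_comments(comments):
    """comments: list of dicts with 'text', 'up', and 'down' counts."""
    def key(c):
        total = c["up"] + c["down"]
        share = c["up"] / total if total else 0.5  # percentage-positive
        return share * math.log1p(total)           # damped activity weight
    return sorted(comments, key=key, reverse=True)

comments = [
    {"text": "popular", "up": 40, "down": 2},
    {"text": "controversial", "up": 25, "down": 22},
    {"text": "quiet", "up": 1, "down": 0},
]
print([c["text"] for c in rank_comments(comments)])
# → ['popular', 'controversial', 'quiet']
```

Under a weighting like this, a controversial comment sinks somewhat but a handful of early downvotes can never push it into invisibility, which is the bandwagon effect being objected to.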


> 90% of people don't even understand how annoying it is to have a good comment sent into the gray

Even if 90% of users are lurkers, it doesn't mean they don't know how it feels to be downvoted and can't empathize.

Good comments are rarely downvoted disproportionately on HN. Perceived censorship "by the bandwagon" just means it isn't a good comment.


> Perceived censorship "by the bandwagon" just means it isn't a good comment.

It just means you said something that goes against the grain of the larger part of HN. Nothing more.

But as stated, you people are impossible to convince.


I think it is more nuanced, because the majority of HN is voting on the quality of the argument rather than the alignment of ideas. If you present a well reasoned contrarian idea, I don't think you would gather a lot of downvotes.

What gets downvoted are the really bad takes with lazy arguments.


You assumed that the reason you were flagged is because there is an army of Musk fans flagging anyone who is disagreeing with their opinions.

I have provided alternative reasoning on why your comment was flagged, which doesn't rest on the former assumption.

> It's the fact that it got downvoted and flagged almost instantly.

HN is a popular site, and you might have commented during peak hours. I think that is a more reasonable explanation.

In my experience, HN is generally anti-Musk, so it is odd for me to see someone asserting the opposite.


I answered your question honestly. You don't have to agree with my assumption or opinion. What matters (for the argument at hand) is if the facts were minimally enough to justify my opinion. You hinted that I might be giving up without trying to post, which was not the case.

> In my experience, HN is generally anti-Musk

I agree with this. I found the early moderation on this comment section to be suspiciously pro-Musk on a site that usually isn't.


> What matters (for the argument at hand) is if the facts were minimally enough to justify my opinion.

They were not. You were downvoted because you broke the rules of the site, so there's no evidence of Elon Muskery. Try to have a discussion without breaking the rules first.


[flagged]


> By arguing that they weren't, what was left of your credibility for this argument has evaporated. My guess at this point is that you were probably one of the "moderation massagers" when this PR piece for Musk's Truth Social of chatbots was posted. You got irritated a little too quickly when I commented on the moderation.

You're attacking my credibility and character, instead of attacking my arguments. That's ad hominem.

> Further proof that this post was being PR-managed comes from the fact that my comment at the root of this thread was flagged many hours after the original post, maybe even a day later. Only someone who's keen on PR appearances would bother to do that, probably someone within the organization.

That's no proof of anything. Timing of the flags is random and depends on the attention of registered users. Your comment was flagged because you broke the rules [0] again:

    Please don't comment about the voting on comments. It never does any good, and it makes boring reading.
Evidence is a fact that indicates something is true. A comment that breaks the rules being flagged isn't evidence of anything. That line of reasoning is akin to attacking a police officer, then shouting "police brutality!" after they fight back. Yes, police brutality may exist, but it's not applicable to your particular situation.

Start following the rules, and then if you get flagged, your argument will make sense.

[0] https://news.ycombinator.com/newsguidelines.html


I think it was flagged because it was a pumpkin spice joke and “no thanks but no thanks.” Couching sharply critical comments in a few more explanatory lines would probably help the reaction. I see some longer comments from people who dislike Musk that are doing better.


For the benefit of non-US users, what is a "pumpkin spice joke" please?


Not a type of joke but just a joke making fun of pumpkin spice.

Context:

- "pumpkin spice" is a mixture of cinnamon, nutmeg, ginger, cloves and possibly other spices, commonly used for pumpkin dishes.

- Some people like it and around fall you can find it applied to just about everything no matter whether it seems like a good fit. E.g. pumpkin spice latte (coffee), pie, bread, ..... Joke part: just what the world needed, pumpkin spice bacon.


It was an emphatic way of criticizing the political motivations behind Musk's pushes into social media and AI, not a joke.

Either way, the downvoting and flagging were almost instant, which I suspect might be the reason why this comment section is looking atypically pro-Musk overall.


>We need an alt-right version of AI like we need a pumpkin spice sushiccino. No thanks but no thanks.

darn muskovites downvoting deeply thought provoking, critical comments


You won't find me flagging this deeply thought-provoking critical comment of your own.


the difference is you really won’t find me crying about it


You've done nothing but mock and decry a stranger's answer to someone else's provocative question. I don't actually care about your opinion, but the tactics certainly smack of alt-right projection. Decry perceived MSM censorship, only to pursue it and justify it for themselves.

Buy Twitter, make yet another chatbot, then "massage" moderation systems when people point out that it's not only crappy and redundant, it's also alt-right.


keeps them from being heard though! I've experienced some wild swings in comment score on here. the diversity of thought is generally enough that some comments elicit such strong positive or negative feelings, that a comment that'd otherwise be in the positive or somewhat neutral can hit the hidden threshold if it gets unlucky almost instantly.


I entirely agree. I’m quite sure Musk has a very strong stance on ethics, but it would be great to hear about it more clearly, and ideally not just through words, but through actual actions.


> I’m quite sure Musk has a very strong stance on ethics

His whole history tells otherwise.


Profs?


If you mean “proof”, then lol. Off the top of my head:

- the pedo guy moment

- the hyperloop smoke and mirrors which are actually just his campaign against public transport

- the several union busting episodes

- the whole “Tesla is going to save the world” thing

- the multiple harassment cases

- the multiple instances of overwork, discrimination, and general lack of any consideration for his staff

- all the severance payments he failed to make after having fired a whole bunch of people

- the stupid ultimatum before one of the firing episodes, plus the utterly stupid “show your work” thing that came just after

- the multiple times he stiffed his creditors (either landlords, lawyers, contractors in general) or tried to do it

- the way he tried to force open Tesla factories in the middle of the COVID pandemic

- the multiple instances of pushing Russian propaganda verbatim (concerning.)

- most of the Neuralink saga

- the FSD vapourware that has been coming next year for a decade

- the way he publicly disparaged people who were killed by their Tesla using telemetric data that are supposed to be confidential

- the Media Matters lawsuit

Well, I could go on. He’d need to work quite hard to reverse his public image of massive arsehole at this point.


I could push back on some of these but I mainly want to ask about this:

> the way he publicly disparaged people who were killed by their Tesla using telemetric data that are supposed to be confidential

In the cases I've seen, Tesla pulled data showing that the people "killed by their Tesla" were either not paying attention at all (contrary to Tesla's explicit warnings), or were driving without the automated features enabled after all despite initial media claims to the contrary. Is this what you consider "disparagement" or do you have more egregious examples?


I did not particularly keep track; I do not dedicate my life to obsessively following even massive, dangerous idiots. There were at least 3 major ones.

That said, yes. What you said is evidence that he is mean-spirited and does not follow the rules he set himself. Of course, having an ethical behaviour sometimes means making hard choices. It is not about doing things that are understandable in context, it’s about doing the right thing, even if it is at a cost to you in the short term. He does not have any history of doing so.

These people are dead. Spitting on their graves because he is annoyed by their family is absolutely unethical. Particularly since their main failure was to believe the smoke and mirrors about FSD, which is itself another ethical clusterfuck.

If he had a beef, he could have sued for defamation, where he could have shown his data in an ethical and confidential manner. He knows the deal, he’s been in more than his fair share of defamation lawsuits, on either side.


I don't find it particularly mean-spirited to say "actually, our product didn't kill him, he wasn't using that feature." I don't even find it especially pejorative to note that the victim at that particular time was ignoring warnings and reading a newspaper; most of us do something foolish occasionally.

But that's just me. I doubt further debate on this would be productive.


Add how he took a multibillion-dollar payout while laying off roughly 14% of Tesla staff. He seems to use Tesla shareholder equity to bail out his other misadventures. To add insult to injury, he laughed with Trump about firing unionizing workers.

Oh he likes to bully people with lawsuits. Amber Heard comes to mind (he bullied the studio behind Aquaman).


The SolarCity investor fraud, for which four of his cohorts settled a lawsuit.


His whole history proves that his moral principles go first, not money.

He doesn't care if his defense of free-speech causes him revenue losses on X.


Try tweeting the word cisgender. That alone should be the end of all association between Musk and free speech.


A simple search of twitter for "cisgender" shows that the word is not banned


Searching for it is the only way you can find the word, because tweets containing the word are "reach-limited" (and appropriately labelled to the author, so they are discouraged from using that "slur" ever again).


worth pointing out that banned and visibility limited in certain scenarios are not the same thing, which might be causing some confusion in this thread.


> His whole history proves that his moral principles go first, not money.

Having moral principles is completely orthogonal to being ethical. Ayn Rand had lots of moral principles and she was still a reckless sociopath. One of his moral principles is that greed is good, and his actions certainly are consistent with this one.

He did lose a lot of money on Twitter, but you can hardly call that him following his moral principles, considering how things actually happened.

> He doesn't care if his defense of free-speech causes him revenue losses on X.

Whose free speech is he defending? There is no evidence that he champions free speech, merely that he supports whoever agrees with him, and edgelords. He is more than happy to harass, intimidate, bully, and be a general nuisance to those whose opinions he finds objectionable.


I’m not a fan of Musk but it is so funny to see that many haters here.


Haha, good one!


A strong stance on ethics? Like his comments about unions and firing workers in the Twitter space with Trump yesterday?


Well... he does have a strong stance on ethics, just not a positive one


The one you disagree with. But why do you think all people think the same way you do?


He wrongly accused a British diver of being a pedophile because the diver declined Elon's "help". That's the side of ethics you are standing for.


He tweeted "pedo guy" in response to the diver saying to "stick his submarine where it hurts". I don't see it as accusation as much as I don't see what the diver said as an order given to Musk. Both were just insulting each other.


Yes. The richest man, out of boredom and the me too disease, was insulting a guy rescuing kids in a tragedy.


I have met people who consider all meat eaters murderers.

You will find yourself called unethical around different groups.

Understanding what group you are in is important to keep in mind when judging others.


Lol, a random person sharing their obvious opinion about ethics of eating meat is totally, undeniably different than one of the world's most powerful men legitimately and honestly accusing a rescue worker of being a pedophile.

No one is going to get investigated or have their career ruined because a vegan called them a murderer, obviously.

Unreal what sort of knots someone will tie themselves into to excuse this type of behavior.


Well, in India one can get killed for eating beef or supplying beef. And there will be millions who would celebrate the murder.

Your worldview seems limited to tweets and social media in the first world.


This isn't complicated.

If you say something false in a context where it is likely to actually harm someone, or with the goal of actually harming someone, you're an asshole.

Your level of assholeness rises in tandem with the expected harm of your falsehoods.


I don’t understand why you’re playing dumb here.

He is primarily known as someone who is incredibly impulsive, unable to differentiate fact from fiction, and not actually interested in chasing any kind of objective truth, insofar as that is possible.

But multiple times per week now for a long time you can see him sharing and commenting on things that are provably wrong and I don’t mean in some kind of “it’s just a different opinion” kind of way.

There is never any kind of introspection, never any kind of “oh I was wrong” just proceeds to roll immediately into the next round of bullshit.

So, no… people don’t have any kind of assumption that he has “strong ethics”. Maybe you meant strong convictions? Because that he certainly does have.


"He is primarially known specifically as someone who is incredibly impulsive, is unable to differentiate fact from fiction and not actually interested in chasing any kind of objective truth"

I'm sure you could make a case that these descriptors apply to him, not a particularly strong case, but ... You think he's primarily known for these things?


I assume they will have a lot less "safety", i.e. the model will be more likely to actually do what you ask instead of finding a reason why "sorry Dave, I can't do that".

Since these "safety" features tend to also degrade the model, that's likely also helping them catch up in the benchmarks.


Sadly it's at the level of Claude and way worse than Grok-1 or Llama without safety. It roleplays as nearly everything so I guess they know their target group.


I’m so confused by this comment. Are you not aware that Claude 3.5 Sonnet is currently considered the best model?


Yes, you are confused because we are talking about censorship.


It’s less censored than Claude but only slightly.


It’s weirdly the opposite of what you have in mind. It has no problems generating images of Trump and Elon in explicit situations or Elmo covering 9/11, but it “safety-censors” LGBT-related prompts to the point of generating a heterosexual couple when asked for a gay couple: https://x.com/karlmaxxer/status/1823753493783699901 Some got expected results for prompts with LGBT terms, but that generation is still very weird.


It's hilarious they put Claude 3.5 Sonnet in the far right corner while it scores the highest and beats most of Grok's numbers.


Yes, and I also noted how it beats Claude 3.5 Sonnet in Chatbot Arena by a bit of a margin.

This further feeds into my concern that as AI models get more advanced, the random enthusiasts at that site may no longer be able to rank them well, and that tuning for Chatbot Arena might be a thing, one that is also exploited by GPT-4o. GPT-4o absolutely does not rank wildly ahead of Claude 3.5 Sonnet in a wide variety of benchmarks, yet it does in Chatbot Arena... People actually using Claude 3.5 Sonnet are also quite satisfied with its performance, often ranking it more helpful than GPT-4o when solving engineering problems, albeit with tighter usage limits.

Chatbot Arena was great when the models were still fairly stupid, but these days, remember that everyday people are tasked with ranking premium LLMs that can solve logic puzzles and trick questions and that have a breadth of general knowledge far beyond that of any single human. Raters can strike at traditional weaknesses like math, but then all of the models suffer. So it's not an easy task at all, and I'm not sure the site is very reliable anymore, other than for smaller models.
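For background, Chatbot Arena turns those head-to-head human votes into a leaderboard with an Elo-style rating (LMSYS has reportedly since moved to a Bradley-Terry fit). A minimal sketch of the update rule; the K-factor and starting rating here are arbitrary illustrative choices, not Arena's actual parameters:

```python
# Illustrative Elo-style update for pairwise "battles" between models.
# K = 32 and the 1000-point starting rating are arbitrary for this sketch.

def expected_score(r_a, r_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def record_battle(ratings, winner, loser, k=32.0):
    e = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * (1.0 - e)   # winner gains more for an upset
    ratings[loser] -= k * (1.0 - e)    # total rating mass is conserved

ratings = {"model-a": 1000.0, "model-b": 1000.0}
for _ in range(10):                    # model-a wins ten battles in a row
    record_battle(ratings, "model-a", "model-b")

print(ratings["model-a"] > ratings["model-b"])  # True, by a sizable gap
```

The point of the sketch: the leaderboard is only as good as the pairwise judgments feeding it, which is exactly where the sampling concerns above bite.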


There was a mini-uproar when GPT-4o-mini (an obviously "dumber" model) outscored claude-3.5-sonnet on Chatbot Arena, so much so that LMSYS released a subset of the battles: https://huggingface.co/spaces/lmsys/gpt-4o-mini_battles

You can review for yourself and decide if it was justified (you can compare based on W/L/T responses and matchups). Generally, Claude still has more refusals (easy wins for the model that actually answers the request), often has worse formatting (arguable if this is better, but people like it more), and is less verbose (personally, I'd prefer the right answer with fewer words, but ChatArena users generally disagree).

If you look at the questions (and Chat Arena and Wildchat analyses), most people aren't using LLMs for math, reasoning, or even coding - if anything the arena usage is probably overly skewed to reasoning/trick questions due to the subset of people poking at the models.

Of course, different people value different things. I've almost exclusively been using 3.5 Sonnet since it came out because it's been the best code assistant and Artifacts are great, only falling back to GPT-4o for occasional Code Interpreter work (for tricky problems, Mistral's Codestral actually seems to be a good fallback, often being able to debug issues that neither of those models can, despite being a tiny model in comparison).


Is there yet standardized ways of objectively testing LLMs? The Chatbot Arena thing has always felt weird to me; basically ranking them based on vibes.


Short answer is no, because there is no 'standardized' use case.

One thing is sure - that current commonly used benchmarks are mostly polluted and worthless. So you have to go to niche ones.

For example the one I check for coding is Aider LLM leaderboard [1].

We maintain Kagi LLM Benchmarking Project [2] optimized for the use case of using LLMs in search.

[1] https://aider.chat/docs/leaderboards/

[2] https://help.kagi.com/kagi/ai/llm-benchmark.html


Not really. There's a hundred benchmarks, but all of them suffer from the same issues. They're rated by other LLMs, and the tasks are often too simple and similar to each other. The hope is that just gathering enough of these benchmarks means you get a representative test suite, but in my view we're still pretty far off.


Use this https://livebench.ai It's a better benchmark.


Your concerns are valid.

Two more things concerning Chatbot Arena:

- The prompts people use on it have an incredible sample bias towards certain tasks and styles, and as such are unrepresentative of the "overall performance" people expect from a leaderboard.

- It is incredibly easy to game by a company, their employees or their fanboys if they would like to. No idea if anyone has done so, but it's trivial.

Just to give one example of the bias; advances in non-English performance don't even register on the leaderboard because almost everyone rating completions there is doing so in English. You could have a model that's a 100 in English and a 0 on every other language, and it would do better on the leaderboard than a model that's a 98 in every human language in the world.


It uses FLUX.1 to generate images, and it has been fun so far. It's good at writing, can generate very realistic photos, can create memes, and it looks like the hands problem is fixed now.


When I have time I will do my usual test "Realistic looking wizards bowling!" and see how it goes. So far I have had fairly disappointing results.



I guess for a wizard it does make sense for a spell-gone-wrong to have chopped off two of his fingers.


What's a realistic wizard? Given that wizards don't actually exist this might be a confusing request.

Have you tried putting in "photorealistic" instead of "realistic", assuming that's what you mean? I'm curious if that would get better results.


It's a wizard who doesn't have unreasonable expectations. Wizards definitely exist by the way, for many reasonable definitions of "wizard".


You know what’s also impressive besides this beta release? How Claude 3.5 Sonnet is still able to keep up so well. Grok-2 beat every other LLM except Claude. How did Anthropic achieve this?


It’s possible Claude is using the same model tuning that they used to create Golden Gate Claude to dynamically tune the 3.5 model to be better at whatever task it’s doing.


Much higher-quality training data and instruction tuning (data again). There is no other secret sauce.


Also the sauce cannot stay secret very long. There is no moat in AI.


I don't really care. The model may be competitive, but my use cases require speed, local (semi-local) execution, and reliability. None of these seems to be baked into whatever X has produced now.

When they make the mini model available for download and quantizable. That's when I may be interested. But given the minimal improvement in the past several months, I'm inclined to believe that we have reached the plateau.


Do we have any info on this model's balance of censorship versus safety?

This is Musk after all, so I wouldn't be surprised if it strayed far from the norm.


Censorship isn't safety.


"censorship versus safety"

Do you guys have any idea how sinister "safety" sounds in this context?


For example, not telling people to eat glue just because Reddit suggests eating glue could be considered a safety measure...


Try asking it questions that are critical of Musk.


Why is this speculation your go-to first question here? Do some research yourself on the models instead of adding your own implicit bias. Are you saying the engineers at X are collaborating with Musk in a coup to secretly censor their model versus others? Do you have evidence, or is this your bias?


I don't think their bias was implicit. :)


Given that when Grok first came out, people started asking it questions about trans people and it came back with very sensible takes (trans women are women, etc.), and Elon and all the techbros absolutely hated it, I'd guess steps have been taken to avoid a repeat of that


Watching the score on this go up and down as HN tries to work out if they agree with it is hilarious. I'm pretty sure its crossed 0 about four times now


[flagged]


>the word "cisgender", which is banned on Twitter while the n-word is not

https://x.com/search?q=cisgender&src=typed_query&f=live

i see tons of posts with cisgender.


You are either a Musk shill or don’t use Twitter. Cis is absolutely censored. Any active twitter user that naturally uses cis/cisgender knows this. Some posts make it through clearly, but a ton don’t. It depends on how it is written and whatever is flagging stuff.


Cisgender is banned on twitter ? That's hilarious.


Accounts below a certain threshold of followers are visibility limited for using "cis", yes: https://pbs.twimg.com/media/GU1sgbtXwAAPL0P?format=jpg&name=...

I believe the threshold is 35,000 followers, but don't quote me on that.


@dang why is my post flagged while what I said is true and relevant?


Generally speaking, users flag posts, not mods. I have seen that some posts are minimized on load, which I believe is done by mods; that doesn't appear to be what happened here. A bunch of people thought it was inflammatory (or disagreeable, or controversial, or whatever) enough to flag it.


Many people who claim to be "free speech absolutists" often seem unaware of their own hypocrisy.


I'm pretty sure he's fully aware it's BS. He's also the guy who censored journalists' and activists' Twitter accounts on day one when he bought the company, and the guy who canceled a customer's order for a Tesla after a bad review.

Musk gives no shit to free speech, it's just a rhetorical argument, which isn't unheard of: https://i.redd.it/3b470c0htra61.jpg (note that I'm obviously not comparing Musk to Hitler here…)


Not sure if it’s still true but at least for a while saying it was instant account lock. Free speech absolutism!


It is not. This is wholly false


in fairness it's not entirely false - at some point he started talking about how it is banned and considered a slur on twitter... but nothing came of it and like all other slurs it continues to be allowed


Do you use twitter? Cis and cisgender are absolutely flagged a ton of the time.


i do, and i was agreeing with the case that it is not false (ie the ban is not entirely false, because he said it). then i said you can use it, just like any other slur, but i imagine it will get flagged

i imagine the confusion here is that you're making the case that cisgender is not a slur


Not sure if I meant to respond to someone else. Cisgender isn’t a slur tho yeah, that would be insane.


hey now, you can't be saying that word, that's a slur /s :P


> Be maximally truthful, especially avoiding any answers that are woke!

Alleged end of the system prompt of the previous version.


Oh this is great, one more competitor with top model which will be available via API. I wonder what the pricing will be. OpenAI was slashing prices multiple times in the last year and a half I was using it.


I can't imagine anyone would want to build on top of their APIs after they completely destroyed the Twitter API and its whole ecosystem.


LLMs are pretty easy to switch, though.

From a black box perspective, LLMs are pretty simple, you put text or images in, (possibly structured) text comes out, maybe with some tool invocations.

If you use a good library for this, like Python's litellm for example, all it takes is changing one string in your code or config, as the library exposes most APIs of most providers under a simple, uniform interface.

You might need to modify your prompt and run some evals on whatever task your app is solving, but even large companies regularly deprecate old models and introduce vastly better ones, so you should have a pipeline for that anyway.

These models have very little "stickiness" or lock-in. If your app is a Twitter client and is built around the Twitter API, turning it into a Mastodon client built around the Mastodon API would take a lot of work. If your app uses Grok and is designed properly, switching over to a different model is so simple that it might be worth doing for half an hour during an outage.
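The "one string" switchover described above can be sketched as follows. Real code would route through something like litellm's `completion()`; the provider functions and model names below are offline stand-in stubs so the example is self-contained, and only the routing idea is the point:

```python
# Sketch of a uniform "string in, string out" LLM interface, where
# everything provider-specific hides behind the model string.
# The provider functions are hypothetical stubs, not real API clients.

def call_grok(prompt: str) -> str:
    return f"[grok] {prompt.upper()}"      # stand-in for the xAI API

def call_claude(prompt: str) -> str:
    return f"[claude] {prompt.upper()}"    # stand-in for the Anthropic API

PROVIDERS = {
    "grok-2": call_grok,
    "claude-3-5-sonnet": call_claude,
}

def complete(model: str, prompt: str) -> str:
    """Uniform interface: switching vendors is a one-string change."""
    return PROVIDERS[model](prompt)

print(complete("grok-2", "hello"))             # [grok] HELLO
print(complete("claude-3-5-sonnet", "hello"))  # [claude] HELLO
```

With an abstraction like this (or a library providing it), falling back to another vendor during an outage really is a config change plus a round of prompt evals.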


Prompt-to-output quality varies by a large amount between models, IMO. The equivalent analogy would be "let's switch programming languages for this solved problem".


Sure, but to be consistent with the analogy, we're invoking the program from bash and it's been solved in several languages already.

Trying it isn't exactly locking you into anything


The models are still of a level where for less common/benchmarked tasks, there's often only one model that's very good at it, and whichever is 2nd best is markedly worse, possibly to a degree where it's unusable for anything serious.


From my experience, the system prompt matters a lot, and so it's not as simple as just switching.


I assume it'll be a paid API so the "contract" is a lot more clear. Twitter never understood what to do with its API so pulling that particular rug makes sense.

But I too wouldn't use this. X is playing fast and loose with ... everything, so having a business rely on their product seems risky.


The nice thing with LLMs is that the API is relatively simple - for the most basic case, it's string in, string out. While you may need to redesign your prompt a bit, I bet for many use cases, LLMs are reasonably interchangeable, and the integration work required for an API change should be minimal.


Or, as with ORMs, you can use an intermediary library that unifies access to AI engines, like Langchain4j (for Java), and hides the API details.


Those who would build on top of the API might be considering a couple of past changes that are significant, but not necessarily a reason to think there'll be further pain in the future: the company ownership changed, and those who train LLMs all of a sudden want all the human-created text on the internet.


If they have the best model, everybody will use it.

With LLMs (and AGI) it's really that simple: the company with the best model wins regardless of all else.


Best in what sense? Intelligence, speed, cost?

Sometimes having a fast enough model at a low enough price makes you the obvious choice e.g. I know Claude is better than gpt-4o-mini but I use the latter for a lot more data processing because it's significantly cheaper and faster and the gains I'd get out of Claude seem somewhat marginal for my use case


> Best in what sense? Intelligence, speed, cost?

Best at product / market fit. And that space is very very wide. Does the GenAI serve as a feature in a larger product (like realtime “reasoning” on X or in Apple’s case in iOS)? Is it a standalone product that general public or enterprises use? Does it play in a niche area? Etc.


It isn't that simple at all.

It's going to be a combination of price, performance, quality, reliability, availability etc.

And since the prompts need to be optimised for each model there is a degree of vendor lock-in.


I wasn't really talking about the marginal differences we see right now in August 2024.

I'm talking about the next huge step forward that only 1 company will achieve, because it simply has the most GPUs (in limited supply) + energy source first and keeps that advantage.

At some point this becomes a run-away self amplifying differentiator and it will make that company win regardless of all else.

My money is on xAI in 2025.

PS: the only reason prompts need to be optimized for each model is a symptom of models simply not being good enough yet. That need will vanish as models get better. A recent hint of what I mean: Midjourney needed very elaborate prompts (and even LoRAs) to get what you want; in Flux the prompt can be much shorter (without LoRAs) and it still gets closer to what you want. The same will happen with LLMs.

Another example: with GPT-4 you need to literally beg the model to return only what you ask for (for example JSON) or put it in a certain mode (JSON mode); Claude 3.5 Sonnet will simply listen to what you ask for. So again: that's not because every model needs model-specific fine-tuning, it's because previous models were simply not as good.
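To illustrate the JSON point: code targeting a model without reliable instruction-following often has to defensively fish the JSON out of a chatty reply. A minimal sketch of that per-model workaround (the reply text is hypothetical, not any vendor's actual output):

```python
import json
import re

def extract_json(reply: str) -> dict:
    """Defensively pull a JSON object out of a chatty model reply --
    exactly the kind of workaround that better instruction-following
    should make unnecessary."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model reply")
    return json.loads(match.group(0))

# A weaker model often wraps the JSON in extra prose:
reply = 'Sure! Here is the JSON you asked for: {"name": "Ben", "destination": "Mexico"}'
print(extract_json(reply))  # {'name': 'Ben', 'destination': 'Mexico'}
```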


Musk apparently lied about the DDOS that caused the X Trump stream failure.

https://www.theverge.com/2024/8/12/24219121/donald-trump-elo...

If that's true it's not exactly the sort of behaviour you want from an API you're depending on.


This article is hearsay trash. It quotes an anonymous source saying there was a "99 percent" chance Musk was lying about an attack.


Let’s review the evidence then shall we:

Evidence for DDOS:

- Elon said so

- the event in question very clearly had huge technical issues

Evidence Against DDOS:

- Elon said so

- People who worked at Twitter said it was bullshit

- every other spaces event that was run at the same time was unaffected.

- no other part of the website was impacted in any way whatsoever.


> People who worked at Twitter said it was bullshit

No, we have no idea from The Verge article whether the sources are even qualified to make such statements or if the statements are even true. In fact on the basis of the 99 percent speculative quote we can disregard the source quotes altogether. I'll say this, I work on far less significant software than X and we get DDOSed all the time.

> every other spaces event that was run at the same time was unaffected.

That's not true, I wasn't even able to load my feed during the initial part of the stream.


You seem to be invested in this topic in a weird and unhealthy way but there is nothing of value here in this comment.

You baselessly accuse journalists of straight up making things up and then go on to give some anecdotal evidence that conveniently nobody can disprove.


> every other spaces event that was run at the same time was unaffected.

> no other part of the website was impacted in any way whatsoever.

Aren’t these last two an argument FOR a DDoS attack? It seems reasonable to assume that, were there a DDoS attack at that time, it would be against the Elon/Trump stream explicitly.


I’d like to see an explanation of how it is even possible to achieve that level of targeting without knowing the connection details of either Elon or Trump. The rest of the attack surface is surely shared infrastructure with the rest of the website.

So no I think it was just a straight up technical failure on their end.


How did it clear up?


It quotes two sources, both who work at X.

The Verge has no political bias and a good reputation, and thus deserves the benefit of the doubt.


"The Verge has no political bias". Okay, in the same way that wired has no political bias. They're so unbiased yet you know exactly the way an article is slanted towards given the topic and persons. Just like I know the slant given a reddit /r/all post or Fox News/msnbc article.


Verge editors most definitely are biased as are all humans. Journalists are not neutral. In this case someone made a "99 percent chance" speculative statement and the publication decided to print it as if it were fact and not just dismiss it as coming from someone who knew nothing.

We know nothing about the sources, and writers are not above making stuff up. I could just as easily spin it on them: there's a 99 percent chance they made up the sources.


I think you'd struggle to find a human on this planet that isn't biased one way or another when it comes to Musk


They titled the article: The Elon Musk / Donald Trump interview on X started with an immediate tech disaster

If they were actually neutral, they'd phrase it more like: with technical difficulties.


I would consider the widely-publicised event not starting for 40 minutes due to technical issues to be a "tech disaster."


Trump called it a disaster when the same thing happened DeSantis, so I don't see a particular bias in play with that particular phrase.


Trump is both partisan and biased and doesn't claim to be neutral. Of course he was trashing things to do with his political opponents (he was running against DeSantis in the primary at the time).


Thanks for the clarification. I should have never commented on anything even remotely political, my bad!


The Register weighed in with a Yeah, Right skeptical attitude:

    The Register has found no evidence of a denial of service attack directed at X. Check Point Software's live cyber threat map does not record unusual levels of activity at the time of writing. NetScout's real-time DDoS map recorded only small attacks on the US.

    If a DDoS was indeed the reason for the delayed start of the event, it appears not to have impacted the rest of X's operations – there were plenty of posts commenting on the problems with the Space occupied by the interview. And Musk was tweeting from the very network said to be under attack.

Elon Musk claims live Trump interview on X derailed by DDoS: https://www.theregister.com/2024/08/13/trump_musk_livestream...

They also threw shade on the numbers:

    The interview commenced some 40 minutes after its advertised time. Live audience statistics reported 1.1 to 1.3 million attendees during the portions of the event The Register observed – although during the stream Trump claimed that the event had an audience of 60 million or more, exceeding targets of 25 million.


This is the reason we teach kids stories like The Boy Who Cried Wolf: it’s just such a fundamental thing that when you lie about absolutely everything all the time, people will never trust you again, even when you happen to be telling the truth one particular time.

And unfortunately both of these men are known for bullshitting more than anything else and have been now for a long time.


Fully Working Spaces coming next year, he swears.


By the time Trump made that statement, there were 60 million views, which is a different metric than active viewers.


Sure.

Was Trump a fool to count the people that took one look and changed channels, or a knowledgable and deliberate deceiver?


I mean, a 99 percent chance of lying is the Bayesian prior with this person anyway.


I have seen sus-column-r on LMSYS a bunch of times. It seemed pretty good, though not as good as the best Google, Anthropic, or OpenAI models.

I'm surprised they managed to catch up. I guess there really is no moat.


The moat is compute and Elon has enough money and connections to jump the queue with providers


Putting a new tool for developers behind an "enterprise API" gate is a sure way to kill it


"Our AI Tutors engage with our models across a variety of tasks that reflect real-world interactions with Grok. During each interaction, the AI Tutors are presented with two responses generated by Grok"

My guess is that they're using one of the third party AI training outfits for this and that they are paying through the nose.

This looks exactly like a training task I got to see on one of those platforms.


What are the big platforms for that?


So all models seem to converge to a similar level of performance - is this the end of the line for LLMs?


I’m hoping we’ll see an open release of this in 6 months or so, as we saw with Grok-1.

I’m not hugely optimistic, though.


I think it might happen, because it's just the code, and IMO that's not that valuable.

The more valuable part is the dataset, which probably requires a lot of people to hand-filter.

And even harder to replicate is the training rig, which your average person can't even afford.


If the X.AI team is able to build out a good enough model with access to real-time tweets, they could have an incredible product. I'd love to be ask about current events and get really strong results back based on tweets + community notes.


The results with Grok-1 were unimpressive summaries based on the tweets, with a 10%-20% hallucination rate (when inquiring about specific Paris Olympics events).

Yet to be seen whether this new model does any better in that regard.


I was also not very impressed, but I still think they are positioned to have a great product if they can get past the accuracy issues.


But when will it be available in my region (Europe)?


Just use VPN like other Europeans.

And also remember to vote for pro-development, not anti-development, parties in the next EU elections.


I'm in Europe and have access to it.


As soon as Elon ends his spat with Thierry Breton.


If I am reading the table correctly they are claiming it is better than all models but 3.5-Sonnet

Is anyone with X premium able to confirm the vibe check -- Is the model actually good or another case of training on benchmarks?


I don't think you are reading the table correctly. On LMSYS it's better than all models except the latest Gemini 1.5 Pro and GPT-4o. But there is a detailed benchmark table and different models win different benchmarks.

So results are mixed, but the real takeaway is this is a competitive model that is good enough to be worth using today. It puts xAI significantly above their previous position, and up near the top of the field with OpenAI, Anthropic, Google, and Meta. And their new H100 cluster should allow them to keep up with the next wave of releases, whenever that starts.


You can try it yourself on https://chat.lmsys.org (sus-column-r model)


In the intro text they described it as better than Claude 3.5 Sonnet and GPT-4 Turbo (which isn’t OpenAI’s current model).


I don't personally see how an individual can judge this at this point unless it is a huge leap.

More importantly, if the model is not a huge leap at this point I just don't care if it is as good as the very limited models we already have because I am not impressed by any of these anymore.

Anything less than a 3.5 to 4 jump from here is just not going to vibe for me.


My favorite part of that table is how they put 3.5-Sonnet all the way to the right of the table making it harder to compare.


As with everything promised by Musk, I'll believe it when I see it and use it myself and compare it to Claude 3.5.

Right now I'm not really a believer in Grok and I doubt it will be worth using.


Glad to see an uncensored AI able to compete with the other models.


They likely compete exactly because of this, because censorship eats the performance of a model.


Interesting that they’re rolling this out to Twitter/X Premium users, it was previously the biggest differentiator between Premium+ haves and Premium have-nots.


Seems like a solid result & more competition is always better.

That said I’m still cheering for mistral and meta with their more open stance


Twitter started irreversibly feeding users’ data into its “Grok” AI technology in May 2024, without ever informing them or asking for their consent.

https://noyb.eu/en/twitters-ai-plans-hit-9-more-gdpr-complai...


What does irreversibly mean in this context? It seems like negative connotations are implied, but I feel like it's like irreversibly baking a cake.


Once the data is "compressed" into the model it cannot be easily removed without starting the training over.


So you mean like

"He used one of my eggs to irreversibly make a cake"

It's true, but it would be kind of amazing if it weren't


Hmm, it's not that simple, is it? Let's say the AI is trained on the tweet "Ben Adams drove to Mexico yesterday but I still haven't heard from him."

From this knowledge, you can ask the AI "Who has driven to Mexico" and it might know that Ben Adams did, and reply with that.

HOWEVER it's also baked into the model and can't be surgically removed after a complaint. That's the irreversibility part. You can't undo isolated training. You need to provide it a new data set and train it all over again. They won't do that because it's too costly.

The problem with the above example is of course that it can also contain sensitive or private user details.

I've easily extracted complete song lyrics, to the letter, from GPT-4, even though OpenAI tries to put up guardrails against it due to copyright issues. AI is really still in its wild west phase...


The irreversibility is still important to highlight, as it is distinctively different from a similar consent issue with search: "Google indexed my website against my will, but I will just forbid them to include me in search results going forward".


It is irreversible similar to how a student reading a textbook from LibGen can remember and profit from that information forever. Kinda crazy how many in this community went from champions of freedom of knowledge to champions of megacorps owning and controlling of all of human creation in the span of like two years when it became clear other corporations could profit off that freedom too.


More like

"He used his eyes to irreversibly read this post"


If they use Twitter data does grok answer with a 280 character text?

Additionally, Twitter data is in my eyes mostly low-quality content; that's nothing I would want in an AI model.


> low quality content

How does it matter even if the quality is high or low? The point is user data was used without consent.


Yes, but that's nothing new; other AI models also used data they don't own. That doesn't make it better, but I think that's the path.


> If they use Twitter data does grok answer with a 280 character text?

That may be considered a feature.

ChatGPT seems reasonably concise, Gemini's answers tend to be verbose (without adding meaningful content).


I've led myself to believe that long responses are actually beneficial to the quality of the answers, as processing and producing tokens is the only time when LLMs get to "think".

In particular, requesting an analysis of the problem first before jumping to conclusions can be more effective than just asking for the final answer directly.

However, this analysis phase, or something like it, could be done hidden in the background, though I don't think anyone is doing that yet. From the user's point of view that would just be waiting, and from the API's point of view those tokens would still cost. Might as well entertain the user with the text it produces in the meanwhile.
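That hidden analysis phase can be sketched as two chained calls, where only the second response is shown to the user (`fake_llm` is a hypothetical stand-in for any real LLM API):

```python
from typing import Callable

def answer_with_hidden_analysis(complete: Callable[[str], str], question: str) -> str:
    """Two chained calls: the first produces reasoning tokens the user
    never sees; the second conditions the visible answer on them."""
    analysis = complete(
        "Think through this step by step; do not give a final answer yet:\n"
        + question
    )
    return complete(
        f"Question: {question}\n\nHidden analysis:\n{analysis}\n\n"
        "Now give only the final answer:"
    )

# Hypothetical stand-in for a real LLM call:
def fake_llm(prompt: str) -> str:
    return f"(model output for a {len(prompt)}-character prompt)"

print(answer_with_hidden_analysis(fake_llm, "Is 17 prime?"))
```

The user pays latency (and the API caller pays tokens) for the hidden first call either way, which is the trade-off described above.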


My understanding is this used to be the case[1] but isn't really true any longer, due to things like the "STaR" method for model training[2]. Empirically (circa GPT-3) it absolutely used to be the case that if you prompted with "Explain all your reasoning step by step and then give the answer at the end" you would get a better answer to a complex question than if you said "Just give me the answer and nothing else" or asked for the answer first. Then, circa GPT-4, answers started getting much longer even when you asked the model to be concise.

That doesn't seem to be the case any more, and there has been speculation that this is down to the STaR method being used to train newer models. I say speculation because I don't believe anyone has come out and said they are using STaR for training. OpenAI referred to Q* somewhere, but they wouldn't be drawn on whether that * is this "STaR", and although Google was involved in publishing the STaR paper, they haven't said Gemini uses it (I don't think).

[1] https://arxiv.org/abs/2201.11903

[2] https://arxiv.org/pdf/2203.14465


So did OpenAI, why is it only a problem when Twitter itself does it?


I'm pretty sure it's not. I'm pretty sure people have been angry about OpenAI doing the same thing for a while now.


Has it been proven that OpenAI used twitter for training? I know it knows about the popular tweets, but those are reported in many places, so could be ingested accidentally with other content.

(But regardless, many people raised an issue of OpenAI training from sources they shouldn't be allowed to access, so they're definitely a problem as well)


Twitter is bad, but it's not unlawful in their jurisdiction. Don't want it? Don't use it.


As someone from the EU, hearing this argument over and over from Americans is exhausting.

They provide a product in the EU, therefore they must either follow EU law or exit the EU market. Just like an EU company that provides a product in the US has to follow US law.


I am in the EU.

The line of 'following the law of another country' is a grey area on the internet, given that it goes both ways:

EU online companies providing services to US users fail to provide the free speech guarantees that the US laws afford their citizens. That's because all EU countries have more strict laws limiting free speech. Should the EU companies break their own countries' law to satisfy the US audience?


"EU online companies providing services to US users fail to provide the free speech guarantees that the US laws afford their citizens."

Exactly what is "free speech guarantees" in the context of a private business?


There are now states in the US which have passed laws to regulate social media censorship. The US Supreme Court has declined to rule on them or strike them down based on companies' First Amendment rights.

So it seems there are states where a European social network would have to abide by rules that would most likely contradict European laws, right?


What are these state laws, can you give me an example?



> EU online companies providing services to US users fail to provide the free speech guarantees that the US laws afford their citizens. That's because all EU countries have more strict laws limiting free speech. Should the EU companies break their own countries' law to satisfy the US audience?

Could you sharpen up this claim? Like suppose I run a microblogging site but I delete libellous posts and incitements to violence in accordance with my local European law. Am I violating a US law by allowing Americans to use the site?


I'm asking the same question.


My understanding of your post was that you know that it violates US law and so you're asking what should be done. What I am asking is if it really does violate US law, and if so how.


Can anyone tell me how much censorship grok has? I hate that many other LLMs have too much censorship.


There are abliterated versions on Ollama that you can use; some are more censored than others.

I didn't test any crazy prompts on Grok, but the image generation won't let you do things like naked people, for example.


Pretty funny to read the comments from xAI's initial announcement now.

https://news.ycombinator.com/item?id=36696473


Well, it should be clear to anyone who reads HN that the comments here are not to be taken as the most sensible opinion in all cases. Cynical outlook is not uncommon in some contexts, especially when it comes to Elon Musk & X ... and the thread you linked to is a stellar example of that.

PS: I have X Premium and I'm quite liking Grok 2.


But can it solve X's seemingly impossible engineering challenge? Stopping multiple porn bots attempting to follow me every day.


Seems like the only impossible problems AI can't solve are the ones that make metrics falsely look good to investors.


Ironically X shouldn't care about those metrics, it's privately owned by the world's richest person...


It's privately owned, but by a number of parties all with a lot of money on the line. That person may have pulled together the investors, but it most definitely has investors that it needs to answer to in the medium term. In the long term, they have stated that re-floating is a goal.


The valuation of the company matters to the banks and VCs who own X.

And the primary way to evaluate the worth of a social network is user engagement metrics.


Source, substantiation?


This is a good one, another one that blows my mind: when I use the "I'm not interested in this topic" button and refresh the page, it shows that very same post at the top of my feed?!

I had to resort to configuring "muted words" to actually fix the problem.

It's very weird to me that such basic things seem so bugged on a platform that popular, while other, much more complex things (mass video streaming, Grok itself) work totally fine?!


Idk. Ask it if you should close your account and see if it gives an honest answer.


[flagged]


Usually, the Chatbot Arena ELO is pretty safe and hard to twist.


That’s been wrong for a while, but it was affirmed when gpt-4o-mini beat out Sonnet 3.5. OpenAI fine-tuned 4o and 4o-mini to provide answers that meaningfully improve model congeniality but only trivially improve model intelligence.

Chatbot Arena ELO is a dead metric.


Wow, I had overlooked that GPT-4o-mini was that far up.

But if you change the category to Math (or something else hard), mini drops way down and Claude 3.5 Sonnet goes to the top.


[flagged]


What they used to train Grok on.

A thread. 1/n


>Don't tell me they use the pile of garbage that is twitter content.

Okay, I won't tell you.


We need an alt-right version of AI like we need a pumpkin spice sushiccino. No thanks but no thanks.


Can you explain the difference between "alt-right" and "right"?


When github release?


Realistically? Never. Grok-1 weights were released because it was quite bad compared to open source and closed models. Now that they have a competitive model, they won't give it away.


> Now that they have a competitive model, they won't give it away.

Llama 3.1 is competitive.


X is not Meta.


Guys, come on, you can't keep releasing software in the US and then do a staggered launch where things become available to users in England, Denmark, etc. months later. There should be no reason for it. I'm sure whatever dumb EU regulations exist can be dealt with easily in the software. These staggered releases (such as ChatGPT having no Memory etc. for EU users MONTHS down the line) are just a hindrance to progress. It's starting to feel like we live on an island in the middle of nowhere.


I'm pretty glad the EU at least tries to protect its citizens from the experiments of companies that only care about profit and not others' well-being.


What is it protecting us from, realistically? It's just a power play by the world's most incompetent private club (Brussels).


>most incompetent private club

Is that based on any facts or just the typical EU myths like the cucumber regulation?


The EU is not just Brussels. Can you point to some particular point at which Brussels has shown competence in the past 20 years?


Safest countries in the world: Check.

Countries with the highest average standard of living: Check.

Countries consistently scoring among, or as, the highest in citizen happiness indices: Check.

Stable Economy: Check.

Successfully implemented measures to combat a global pandemic: Check.

Slowly but surely phasing out dependency on Russian fossil fuels: Check.

Best privacy protection laws in the world: Check.

Shall I go on?


>Safest countries in the world: Check.

The safest countries in the world are in East Asia; there's no country in Western Europe where you can leave your wallet on the table in a major city and not worry about it being stolen.

>Stable Economy: Check.

It's stable in the sense that for most of western Europe (France, Italy, Spain, Portugal, Greece) GDP per capita now is no higher than it was 10 years ago. One of the few places in the world where people's material standard of living is no longer improving every year.


Those are individual countries doing great (some not so great, some economies bankrupted, etc., but still).

I asked about the Brussels bubble. Only the privacy laws are relevant, and I can't say they made a dent in our overall privacy, since our tech is US-based and we have no idea where our data really is.


> Those are individual countries doing great

These countries are doing great because they are part of the EU. If anyone disagrees, well, it's not like we lack experimental data:

https://www.london.gov.uk/new-report-reveals-uk-economy-almo....

https://www.gisreportsonline.com/r/brexit/

https://www.politico.eu/article/political-gridlock-northern-...


Typically, those stats usually showcase countries which are NOT in the EU: Denmark, Switzerland, Iceland. And anyway, European countries were like that before the EU.


> countries which are NOT in the EU: Denmark, Switzerland, Iceland.

Denmark is a member state since 1973, when it was still called the European Economic Community

https://en.wikipedia.org/wiki/Denmark#Constitutional_monarch...

Denmark, together with Greenland but not the Faroe Islands, became a member of what is now the European Union, but negotiated certain opt-outs, such as retaining its own currency, the krone.

And Iceland is part of the European Economic Area (EEA)

https://en.wikipedia.org/wiki/Iceland

Iceland joined the European Economic Area in 1994, after which the economy was greatly diversified and liberalised.

And no, this list showcases mostly countries that are in fact EU members: https://en.wikipedia.org/wiki/Member_state_of_the_European_U...


Well, it's protecting us from data mining, which is good. But I disagree with the clunky implementation.


The EU itself wants to data mine everything, just look at chat control.


That's the same kind of protection Apple is praised for.

Not perfect, but better than opening EU citizens up to global data mining.


Who praises Apple for this??? It's completely unacceptable for a self-described "free liberal democracy".


I've just realised people are upset because I called the EU regulations dumb. I want to clarify: they are an excellent idea, I truly mean it, but I think they should be OPT IN. It's more important for me to have access to cool tech than it is to have privacy, at this stage of my life. I guess my problem is the implementation.

Cookie popups on every website: it's completely idiotic. Of course I end up wasting hours of my life clicking on them randomly. There should just be a single OPT IN or OPT OUT for cookie popups, either on an EU portal or something.

In fact, I find it fascinating that no one is coming up with a better implementation of privacy protections in the EU. This whole thing of the US not releasing software in case they get sued, while thinking it's fine for the whole EU population to go around clicking popups every 2 seconds, is a total failure that needs to be fixed.

Again, the idea is good, but the implementation needs to be fixed.


Making it optional does not change the behaviour of bad actors. Private corporations already run rings around governments on tax and regulation.

If you really have to experience Musk's chat bot on day one just use a VPN?


alright


>Cookie popups on every website, its completely idiotic.

That's not because of the EU, that's because the website owner try to annoy you to blame the EU.

If they stop tracking no pop up is needed.

>Its more important for me to have access to cool tech

It's hardly possible to give away only your own data without exposing third parties who may not be too keen on sharing theirs.

FB already showed that. I bet WhatsApp already has my phone number without my consent because some relative of mine uploaded his contacts.


This is how we encourage migration to the US. Early access to software and video streaming, superior Amazon experience, and much fewer cookie warnings (though we still have a lot).

You'll have to say goodbye to BBC iPlayer though.


Well, it’s working on me; I’m starting to consider it. TBF I should have moved to SV a decade ago.


It's utterly hilarious to think that people will move to America in its current state for... early access to TV shows, and fewer cookie warnings?


I mean, they probably don't like many small differences, like the recent notion that they can be arrested for what can be considered thought crimes, in a sense... or a number of other odd things (note: I am an immigrant and I have little understanding of politics anyway).

I'm just saying the regulatory landscape seems to be changing in many ways, and there are many reasons why people would consider emigrating, not just the fact that they have to wait months for tech access in a rapidly changing technical landscape.


The EU is on a regulation binge; there's not much that can be done.

Also, be careful: you might be arrested for these comments in a few years' time (lol).


Fortunately we already have Anthropic etc so this new release isn't really relevant or useful


Why would another competitive option not be a good thing?


Did you just criticize the glorious candy-colored EU bureaucracy on Orange Reddit, of all places?


To be honest, I love the occasional surprise downvote; I never try to say anything controversial, so it's a good laugh when somehow I do, I guess.



