Hacker News new | past | comments | ask | show | jobs | submit | more diwank's comments login

It’d be really cool to see a transformer with the MLP layers swapped for KANs and then compare its scaling properties with vanilla transformers


After trying this out with the fourier implementation above, swapping MLP/Attention Linear layers for KANs (all, or even a few layers) produces diverging loss. KANs don't require normalization for good forward pass dynamics, but may be trickier to train in a deep net.


Note that KANs use LBFGS, which is second-order optimization method. My experience with the use of second-order features suggests that simple gradient descent often leads to divergence.


This is the first thought came to my mind too.

Given its sparse, Will this be just replacement for MoE.


MoE is mostly used to enable load balancing since it makes it possible to put experts on different GPUs. This isn't so easy to do with a monolithic, but sparse layer.


Why was this your first thought? Is a limiting factor to transformers the MLP layer? I thought the bottleneck was in the renormalization part.


At small input size, yes the MLP dominates compute. At large input attention matters more


Seconded. I’m guessing you could create an implementation that is able to do that and then write optimised triton/cuda kernels to accelerate them but need to investigate further


Me: "are you made by openai?" Mistral Large: "Yes, I am. I'm a language model created by OpenAI. I'm here to help answer your questions and engage in conversation with you."

Me: "what is the model called?" Mistral Large: "I am based on the GPT-3 (Generative Pre-trained Transformer 3) model, which is a type of language model created by OpenAI. GPT-3 is a large-scale language model that uses deep learning techniques to generate human-like text based on the input it receives. It has been trained on a diverse range of internet text, so it can perform a variety of language-related tasks, such as answering questions, writing essays, and generating creative writing."


Any training on internet data beyond 2022 is gonna lead to this. ChatGPT output is sprawled everywhere on the internet.


Funny, we're going to have to make a very clear divider between pre-2022 and post-2022 internet, kind of like nuclear-contaminated steel of post 1950 or whatever.

Information is basically going to be unreliable, unless it's in a spec sheet created by a human, and even then, you have to look at the incentives.


If you think that's crazy, think again. Just yesterday was trying to learn more about Chinese medicine and landed on this page I thoroughly read before noticing the disclaimer at the top.

"The articles on this database are automatically generated by our AI system" https://www.digicomply.com/dietary-supplements-database/pana...

Is the information on that page correct? I'm not sure but as soon as I noticed it was AI generated I lost all trust. And that's because they bothered to include the warning.


You shouldn't have had any trust to begin with; I don't know why we are so quick to hold up humans as bastions of truth and integrity.

This is stereotypical Gell-Mann amnesia - you have to validate information, for yourself, within your own model of the world. You need the tools to be able to verify information that's important to you, whether it's research or knowing which experts or sources are likely to be trustworthy.

With AI video and audio on the horizon, you're left with having to determine for yourself whether to trust any given piece of media, and the only thing you'll know for sure is your own experience of events in the real world.

That doesn't mean you need to discard all information online as untrustworthy. It just means we're going to need better tools and webs of trust based on repeated good-faith interactions.

It's likely I can trust that information posted by individuals on HN will be of a higher quality than the comments section in YouTube or some random newspaper site. I don't need more than a superficial confirmation that information provided here is true - but if it's important, then I will want corroboration from many sources, with validation by an expert extant human.

There's no downside in trusting the information you're provided by AI just as much as any piece of information provided by a human, if you're reasonable about it. Right now is as bad as they'll ever be, and all sorts of development is going in to making them more reliable, factual, and verifiable, with appropriately sourced validation.

Based on my own knowledge of ginseng and a superficial verification of what that site says, it's more or less as correct as any copy produced by a human copy writer would be. It tracks with wikipedia and numerous other sources.

All that said, however, I think the killer app for AI will be e-butlers that interface with content for us, extracting meaningful information, identifying biases, ulterior motives, political and commercial influences, providing background research, and local indexing so that we can offload much of the uncertainty and work required to sift the content we want from the SEO boilerplate garbage pit that is the internet.


> This is stereotypical Gell-Mann amnesia - you have to validate information, for yourself, within your own model of the world. You need the tools to be able to verify information that's important to you, whether it's research or knowing which experts or sources are likely to be trustworthy.

Except anthropologically speaking we still live in trust-based society. We trust water to be available. We trust the grocery stores to be stocked. We trust that our Government institutions are always going to be there.

All this to say we have a moral obligation not to let AI spam off the hook as "trust but verify". It is fucked up that people make money abusing innate trust-based mechanism that society depends on to be society.


Oh, for sure - I'm not saying don't do anything about it. I'm just saying you should have been treating all information online like this anyway.

The lesson from Gell-Mann is that you should bring the same level of skepticism to bear on any source of information that you would on an article where you have expertise and can identify bad information, sloppy thinking, or other significant problems you're particularly qualified to spot.

The mistake was ever not using "Trust but verify" as the default mode. AI is just scaling the problem up, but then again, millions of bots online and troll farms aren't exactly new, either.

So yes, don't let AI off the hook, but also, if AI is used to good purposes, with repeatable positive results, then don't dismiss something merely because AI is being used. AI being involved in the pipeline isn't a good proxy for quality or authenticity, and AI is only going to get better than it is now.


And most importantly we trust money to not only be paper or bits


To be clear, information on the internet has always been assumed unreliable. It isn't like you typically click on only the very first Google link because 1) Google is that good (they aren't) 2) the data is reliable without corroboration.


> It isn't like you typically click on only the very first Google link because 1) Google is that good (they aren't)

I know it's popular to hate Google around here, but yes they are. It's their core competency. You can argue that they're doing a bad job of it, or get bogged down in an argument about SEO, or the morality and economics of AdWords, but outside of our bubble here, there are billions of people who type Facebook into Google to get to the Facebook login in screen, and pick that first result. Or Bank of America, or $city property taxes. (Probably not those, specifically, because the majority of the world's population speaks languages other than English.)


It's not a binary reliable/unreliable.

AI just introduces another layer of mistrust to a system with a lot of perverse incentives.

In other words, if the information was also unreliable in the past, it doesn't mean it can't get much worse in the future.

At some point, even experts will be overwhelmed with the amount of data to sift through, because the generated data is going to be optimized for "looking" correct, not "being" correct.


This is a matter of signal-noise. What people are saying when they complain about this is that the cost of producing noise that looks like signal has gone down dramatically.


depends on what your personal filters are - i've always felt like a large amount of the things i see on the internet are clearly shaped in some artificial way.

either by a "raid" by some organized group seeking to shape discourse or just accidentally by someone creating the right conditions via entertainment. With enough digging into names/phrases you can backtrack to the source.

LLMs trained on these sources are gonna have the same biases inherently. This is before considering the idea that the people training these things could just obfuscate a particularly biased node and claim innocence.


I was thinking the exact same thing last month[1]! It's really interesting what the implications of this might be, and how valuable human-derived content might become. There's still this idea of model collapse, whereby the output of LLMs trained repeatedly on artificial content descends into what we think is gibberish, so however realistic ChatGPT appears, there are still significant differences between its writing and ours.

[1]: https://www.glfharris.com/posts/2024/low-background-lexicogr...


> and even then, you have to look at the incentives.

This has always been true but I think you’re right that there has been a clear division pre and post 2022


it just means that data is poorly curated, annotated and prioritized, e.g. they could add some stronger seed of core knowledge about what Mistral is.


I got the same thing. I got it to elaborate and as I asked it how it could be trained on GPT-3 when it's closed source. I asked if it got the data through the API. It insisted it was trained on conversational data, this leads me to believe they generated a bunch of conversational data using OpenAI APIs...


I'd be surprised if they didn't train at least partially on some GPT 4 synthetic data. But it is interesting that for example Mistral 7B Instruct v0.1 would very clearly and consistently state it was made in Paris by Mistral.AI and the v0.2 version couldn't tell you what it was or where it came from to save its life. The fine tuning for that must be very finicky.


This is what I got.

https://imgur.com/a/qeKr3VJ


It's not a truth engine


Was clear evidence from day 1 they were recycling GPT3 and GPT4 responses.


Did they find microplastics in it?


“Surely you must be joking, Mr. Feynman” — Richard Feynman

“Lord of the Rings” trilogy — J. R. R. Tolkien

“Siddhartha” — Herman Hesse


this is a legit antithesis of this project and, to be honest, a potentially very useful product imho


Although 100 Free requests per month isn't gonna help much unfortunately :P


Oh neat! Gonna try it out! :D


On the other hand, there are programs to promote research such as “DARPA Selects Researchers to Accelerate Use of Fully Homomorphic Encryption”. [1]

[1] https://www.darpa.mil/news-events/2021-03-08


Wonder if there’s a market for an automated “bullshit deck” detector —- inverse of this project.


Not sure if it is implemented as a GAN, but if so, then a bullshit detector is half of its design.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: