Falcon LLM – A 40B Model (tii.ae)
178 points by Risyandi94 on June 18, 2023 | 87 comments



> to remove machine generated text and adult content

Why are tech companies so puritanical? Adult content is not immoral.


It's important for general-purpose use: with generative models there's always a chance of hallucinations. For all uses except specifically adult-flavoured ones, you don't want the response to contain vulgarities. No one wants their company's chatbot to start narrating furry erotica. If the model were trained on adult content, you would need a burdensome moderation layer downstream of the LLM.

When you do want the more niche, adult-themed LLMs, there are fine-tuning datasets available. Fine-tuning a vanilla open-source LLM for these uses works great. There are active communities built around adult-roleplay LLMs on imageboards.


The government of Abu Dhabi footed the bill on this one, and they have a, uhm, unique take on:

>Adult content is not immoral.


I was gonna say the same thing… except, we’re also doing it in Mountain View, SF, etc.

People are scared it’s going to show their niece lewd imagery, or say the n-word. And, these things have happened.

Enterprises see it as essential to protect the technology from being shunned by society. We shall see, sooner rather than later, how this works out!


Companies in Silicon Valley banned such content before LLMs were even a thing.

Facebook being a good example, regularly treating content considered to be timeless art (such as http://www.musee-orsay.fr/en/artworks/lorigine-du-monde-6933...) as if it were porn, and banning it, along with the users responsible for publishing it.

And YouTube demonetising videos containing swear words, or merely colloquial speech.

It’s nothing new, really. Just a very American thing (as Western countries go).


I take issue with Silicon Valley deciding global moral guidelines based on their arguably narrow view, and I came here to defend you, but after seeing that artwork I gotta say: isn’t that porn? I mean, of course it’s up for debate, but just because it’s old and hung in a museum does not make it non-porn.

Maybe the problem is more facebook banning all porn including the culturally relevant one.


That would be hard to say. I recall a US Supreme Court Justice basically defining "hard-core pornography" as "I know it when I see it".

Some would say that pornography implies sex.

Others that it is defined by the content’s purpose.

Some would contend that considering the mere exposition of the female body outside of any sexual interaction as pornographic is, in itself, objectification and sexualisation.

I don’t really have an opinion.

And, in the case of this specific painting, to me it is merely an amusing, well-thought-out, intriguing, multi-layered play on both words and symbols, one that isn’t really any more graphic than the anatomy book for children I had at 6 [1].

But that’s just me. Some could argue they aren’t even remotely the same.

If you’d prefer a less divisive example, Instagram also banned Almodovar’s nipple film poster [2].

> Maybe the problem is more facebook banning all porn including the culturally relevant one.

Or Facebook being left to decide what is and what isn’t porn, and applying their one and only rule to the whole world as if one size could fit all.

[1]: https://www.amazon.fr/Limagerie-corps-humain-Emilie-Beaumont...

[2]: https://www.wmagazine.com/culture/almodovar-nipple-poster-in...


"Some would say that pornography implies sex. Others that it is defined by the content’s purpose. Some would contend that considering the mere exposition of the female body outside of any sexual interaction as pornographic is, in itself, objectification and sexualisation."

It definitely varies from culture to culture. In the US a bared nipple on national television was a sensation, in the UK it's par for the course.

In some cultures women regularly go bare chested, in the US being topless on a beach could land you in jail.

In some cultures men wear nothing but a penis-sheath in public, while that would be considered outrageous in many other cultures.

In the US it used to be considered indecent if a woman showed an ankle, and uncivilized not to wear a hat.

In India hugging or kissing in public might get you assaulted[1], while it's no big deal in many other countries.

There are taboos in every culture, but what is taboo varies from one culture to another.

[1] - https://www.bbc.co.uk/news/world-asia-india-65593675


Yes, I totally agree with you, and the Almodovar poster is another good example. The earlier artwork just brought to my mind that maybe porn is an age-old thing, and it’s interesting that if it survives for long enough it gets this aura of acceptability.

Either way I think social media platforms should tune their filters to each society they operate in.


Oh, porn definitely is an extremely old thing.

Pompeii is remarkable in that respect. Not only for the porn [1] preserved by the eruption, but also for the (some of it explicit) graffiti [2] [3] [4] [5]. It seems their perception and use have evolved over time, too. And that "x was here" truly is timeless.

[1]: https://blainebonham.com/the-brothels-of-pompeii-may-i-see-a...

[2]: https://www.theatlantic.com/technology/archive/2016/03/adrie...

[3]: https://kashgar.com.au/blogs/history/the-bawdy-graffiti-of-p...

[4]: https://www.wondriumdaily.com/writing-on-the-wall-decoding-t...

[5]: https://en.m.wikipedia.org/wiki/Roman_graffiti


"Why are tech companies so puritanical? Adult content is not immoral."

It's all about "optics" and PR. These companies don't want their brands associated with porn. That's why YouTube doesn't allow porn on their site, even though it would be enormously profitable.


Oh mate, there is porn on YT.


Unless you have visibility into a niche corner of youtube that nobody else does...

Is there content on YT that could possibly be called porn? Sure. Is there actual porn the way most people understand it? Not more than a vanishingly small amount that would be next to impossible to find without getting directly linked to it.

YT doesn't even allow gratuitous posing, especially lingering rear shots, in try-on hauls. Sex talk (no explicit activity) is allowed, but put behind their "inappropriate" warning. I haven't seen anything that isn't at least dual-use educational or ASMR, and nothing particularly vulgar, let alone descending into problematic fetishes. Plenty of YT channels have been banned for suggestive sexual ASMR, even without nudity or explicit roleplay. They don't have as much of a problem with scantily clad, suggestive dancing in mainstream-approved media content, though, but that's not porn either.

The most nudity I've seen allowed on YT was in fringe dance performances and very occasionally in mainstream music videos.


Does that include LGBTQ content? If yes, then this probably falls under the definition of intolerance, and in California there are protections. YouTube should not be allowed to filter content like that; it must be readily available for all audiences.


It has nothing to do with sexual orientation or claimed gender, and everything to do with what body parts are shown and what explicit words (or other sounds) are used.


There is a huge piracy community on YT that occasionally includes porn.

Finding it is a bit hard as they move around a lot and Google is getting better at removing them, but they are definitely there.


Whether or not it’s immoral is opinion-based. It’s probably typically in order to not alienate those who think it is. Effectively a business decision even in the context of an open model.


If it’s truly a matter of opinion, why not give people the option of turning it off or on?


Because puritans, extremists, and generally all who believe themselves morally superior to others believe they have the moral right, if not the imperative, to impose their views onto others. And to get said others off their wayward paths.

Anything less would mean letting black sheep harm and corrupt society as a whole, durably. How could you let that happen?

Their pure and morally superior ends thus justify the means, coercion being the least intrusive and oppressive of those, and paling in comparison to other acceptable methods, such as public shaming, ostracising, and even violence.

Basically, to self-righteous zealots, freedom and individuality are secondary to what they see as morally, and universally, right.

And how could it be otherwise? You can’t possibly be convinced you know the one and only acceptable way for all and accept people should be free to do as they want, can you?

That, and some have always liked to weaponise these sentiments for influence, political power, and monetary gain.


I have been thinking that most cultures before roughly the 18th century were not puritanical, at least not in the shape or form we see today. For instance, I have been reading Muslim poets who wrote fairly erotic poems; same for India. Now, America is an outlier here: it’s literally the place where the Puritans went.

It was only in the age of full-on colonization that you see puritan groups forming (Salafism, for instance, rose in the 19th century), and a similar mechanism was at work in India.

Not sure of the background on that; I would love more insight into how the history of morals developed. Maybe at some point it became a tool to keep people in check?


Perhaps. The factors and morals seem to vary significantly, depending on both historical periods and cultural contexts.

Take, for example, Japan during the Meiji restoration [1]. In their quest to dispel behaviours deemed immoral or indecent by Western societies, they largely eliminated mixed onsen ("konyokuburo"), in an attempt to transform into a "modern", "civilised" nation, worthy of international respect rather than colonisation.

Medieval Europe is also interesting. Despite the church's portrayal of sexuality as sinful and degrading, many nobles maintained mistresses, and even certain popes (Rodrigo Borgia, Pope Alexander VI, comes to mind) fathered children out of wedlock.

Finally, both ancient Romans [2] and Greeks [3] held perspectives we would find surprising today, if not ambivalent.

I still remember my surprise at noticing a dozen boxes of very explicitly shaped cakes in a remote Japanese mountain-trail souvenir shop. I guess we could call that culture shock.

[1]: https://www.bathclin.co.jp/en/happybath/did-you-know/a-brief...

[2]: https://en.wikipedia.org/wiki/Sexuality_in_ancient_Rome

[3]: https://www.washingtonpost.com/news/volokh-conspiracy/wp/201...


It's funny to read this comment in the context of users who are angry they cannot impose their pro-pornography views on the creators of the model. How dare they obey different morals! How dare they have different views! My views are more important and thus they should do what I want!

>Their pure and morally superior ends thus justify the means, coercion being the least intrusive and oppressive of those, and paling in comparison to other acceptable methods, such as public shaming, ostracising, and even violence.

You yourself hit about 80% of your own "ends justify the means" list in your shameless attacks on folks who don't want pornography coming out of the LLM they're using at work. The irony is unreal.


Oh, people who want everybody to always be able to publish anything, anywhere (anyone else’s preferences, sensitivities, and beliefs be damned) definitely are just as zealous as the ones they are hell-bent on fighting.


LLMs don’t support this sort of functionality, and thus the trainers eliminate adult content from the training so as not to lose market share to other censored models. As stated in other comments: users can fine tune the model to incorporate adult content if it is necessary for their use case.

You should take a look at the HN guidelines. Your comment is a strongly worded take on politics, religion, and otherwise significantly controversial topics. Ideological battle is discouraged.


> LLMs don’t support this sort of functionality, and thus the trainers eliminate adult content from the training so as not to lose market share to other censored models.

And yet Stability.ai had managed to do just that with Stable Diffusion, even if it wasn’t in the model per se, and was quickly worked around anyway.

> You should take a look at the HN guidelines. Your comment is a strongly worded take on politics, religion, and otherwise significantly controversial topics. Ideological battle is discouraged.

Tomayto, tomahto.

One’s observations and commentary on the methods of those inclined to wage ideological battles, including censorship, is another’s strongly worded ideological crusade.


It's an open model, so for a fraction of the training price you can fine-tune it with adult fan fiction to generate even more adult fan fiction, if you really want to.

An open source model allows for that. Compare this to ChatGPT/GPT-4 which are closed and filtered at the API level.
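
For context, parameter-efficient fine-tuning like this is cheap to set up. Here's a minimal sketch using the Hugging Face peft library; the base model choice and hyperparameters are illustrative placeholders, not a tested recipe:

    # Minimal LoRA fine-tuning sketch with transformers + peft.
    # Hyperparameters and dataset are illustrative, not a recipe.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "tiiuae/falcon-7b"  # start small; same idea applies to 40B
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(
        base, torch_dtype=torch.bfloat16, trust_remote_code=True
    )

    # LoRA: freeze the base weights, train small low-rank adapters.
    config = LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["query_key_value"],  # Falcon's fused attention projection
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # typically well under 1% of total params
    # ...then train with transformers.Trainer on your fine-tuning dataset.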


Turning it off or on would effectively mean maintaining two different LLMs which is costly (dataset collection, maintenance, training, MLOps).


Can you just turn it off in an LLM? Or would that basically require retraining the entire model?


because their opinion is that it's immoral


It was made in the UAE, I'm surprised it doesn't censor far more.


Although not evil, adult content should be opt-in, and it should be possible to opt out of it at the platform level... hence the need for censored models. Imagine a restaurant-booking AI app, built on GPT, that accidentally doubled as a bomb-making tutor or an adult content generator. It's a lawsuit waiting to happen, if nothing else, and it's worth making these use cases harder (if not impossible) to implement in mainstream, commercially available products. Note that for many of these products, age and consent for adult material have not been established.

So far, the open source ecosystem seems to be doing a good job of providing both censored and uncensored LLMs - and it seems there are valid use cases for both.

Think of this as similar to Falcon LLM being launched in both 40B and smaller 7B variants: the LLM will often need to match the use case, and the 7B model is a good example of making the model smaller (and worse) on purpose in order to reach certain trade-offs.


And you know this because you’re the expert on what morality is? Decades of people telling you it is immoral, yet somehow you come along to say the opposite and hope people believe it? The only justification you can give is an ad hominem attack, accusing the other person of not knowing what they are talking about.

Try a little bit of academic knowledge here: https://m.youtube.com/watch?v=wSF82AwSDiU


Of course it is not immoral: https://m.youtube.com/watch?v=gRJ_QfP2mhU


Have you looked at the video description? It says explicitly that the views expressed in the video are not supported by mainstream research.


I did not see that bit. I came upon this many years ago. TED has changed its thought process and stances on multiple subjects since then. Adding such disclaimers is a new turn of editorialism.

If you want to go down the rabbit hole of research that both recognizes and refutes the assertions, you will find more opinions expressed than facts. But neither the facts nor opinions are interesting. It is the narratives and the lessons derived that hold more value. And that video expresses some of them.

There is a longer documentary style video on the same subject by the same speaker.

And my search turned up this bit: https://pornstudycritiques.com/gary-wilson-wins-second-legal...

There is a movement out there trying to crush this video.


That whole website is "all of science disagrees with us, but we know we are right, and there is a cabal of scientists trying to silence us". This is the least scientific website and you should not use them as a source.


Fine. Won't use them for science. The significant question remains: why is someone trying so hard to crush that video?


TII is based in the United Arab Emirates.


Haven't we seen that too much reinforcement toward censorship worsens the model, just as excluding some data makes it worse in all other areas?

Even though this is quite bizarre on its surface: that excluding, for example, works of fiction makes the model worse at programming.

In that case, a simple middleman agent that is inaccessible to the user would provide better quality while maintaining censorship that can be dynamically and quickly redefined or extended.
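
That middleman layer is simple in outline. A toy sketch, where the blocklist and the generate() callback are purely illustrative stand-ins (a real deployment would use a trained classifier or moderation API):

    # Toy sketch of the "middleman" idea: an unfiltered model behind a
    # moderation layer the end user never talks to directly.
    BLOCKED_TERMS = {"example_banned_term"}  # dynamically updatable policy

    def allowed(text: str) -> bool:
        # Placeholder policy check; in practice, a classifier model.
        return not any(term in text.lower() for term in BLOCKED_TERMS)

    def guarded_generate(prompt: str, generate) -> str:
        if not allowed(prompt):
            return "Sorry, I can't help with that."
        reply = generate(prompt)  # the uncensored base model
        return reply if allowed(reply) else "Sorry, I can't help with that."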


Falcon is from the UAE.


Is there a guide out there for dummies on how to try a ChatGPT-like instance of this on a VM cheaply? E.g., pay $1 or $2 an hour for a point-and-click experience with the instruct version of this. A Docker image, perhaps.

Reading posts on r/LocalLLaMA, it's all people's trial-and-error experiences; quite random.


For Falcon specifically, this is easy, it's embedded here: https://huggingface.co/blog/falcon#demo or you can access the demo here: https://huggingface.co/spaces/HuggingFaceH4/falcon-chat

I just tested both and it's pretty zippy (faster than AMD's recent live MI300 demo).

For LLaMA-based models, recently I've been using https://github.com/turboderp/exllama a lot. It has a Dockerfile/docker-compose.yml, so it should be pretty easy to get going. llama.cpp is the other easy one; the most recent updates put its CUDA support only about 25% slower, and it's generally a simple `make` with a flag depending on which GPU you want to support, with basically no dependencies.

Also, here's a Colab notebook that shows you how to run up to 13B quantized models (12G RAM, 80G disk, Tesla T4 16G) for free: https://colab.research.google.com/drive/1QzFsWru1YLnTVK77itW... (for Falcon, replace with koboldcpp or ctransformers)
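
If you'd rather run it from Python yourself, the setup from the HF blog post linked above boils down to roughly this sketch (model choice and generation settings are illustrative):

    # Minimal Falcon inference via Hugging Face transformers,
    # following the blog post; generation settings are illustrative.
    import torch
    from transformers import AutoTokenizer, pipeline

    model_id = "tiiuae/falcon-7b-instruct"  # or 40b-instruct, given the VRAM
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    pipe = pipeline(
        "text-generation",
        model=model_id,
        tokenizer=tokenizer,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,  # Falcon ships custom modelling code
        device_map="auto",
    )
    out = pipe(
        "Explain LoRA in one paragraph.",
        max_new_tokens=120,
        do_sample=True,
        top_k=10,
    )
    print(out[0]["generated_text"])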


Take a look at youtube vids for this. Mainly because you're going to see people show all the steps when presenting instead of skipping them when talking about what they did. E.g. https://www.youtube.com/watch?v=KenORQDCXV0


I have a dockerfile here https://github.com/purton-tech/rust-llm-guide/tree/main/llm-... for running mpt-7b

docker run -it --rm ghcr.io/purton-tech/mpt-7b-chat

It's a big download due to the model size, i.e. 5GB. The model is quantized and runs via the ggml tensor library: https://ggml.ai/.
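
If you'd rather skip Docker, here's a rough sketch of loading a ggml-quantized model from Python via the ctransformers bindings; the checkpoint name is just an example of a ggml conversion on the Hub:

    # Sketch: CPU inference on a ggml-quantized model via ctransformers.
    # Repo name is an example; any compatible ggml checkpoint works.
    from ctransformers import AutoModelForCausalLM

    llm = AutoModelForCausalLM.from_pretrained(
        "TheBloke/MPT-7B-Chat-GGML",  # example ggml checkpoint
        model_type="mpt",
    )
    print(llm("What is quantization?", max_new_tokens=64))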


A small cheap VPS won’t have the compute or RAM to run these. The best way (and the intent) is to run it locally. A fast box with at least 32GiB of RAM (or VRAM for a GPU) can run many of the models that work with llama.cpp. For this 40B model you will need more like 48GiB of RAM.

Apple Silicon is pretty good for local models due to the unified CPU/GPU memory but a gaming PC is probably the most cost effective option.

If you want to just play around and don’t have a box big enough then temporarily renting one at Hetzner or OVH is pretty cost effective.
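
For a rough sense of where those numbers come from, the weight memory alone works out as follows (a back-of-the-envelope sketch that ignores activations and the KV cache):

    # Back-of-the-envelope weight memory for a 40B-parameter model
    # (weights only; activations and KV cache add more on top).
    params = 40e9
    for fmt, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
        print(f"{fmt}: {params * bytes_per_param / 2**30:.0f} GiB")
    # fp16: ~75 GiB, int8: ~37 GiB, int4: ~19 GiB; hence ~48 GiB being
    # a comfortable target for a quantized 40B model with headroom.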


Falcon doesn't work in llama.cpp yet: https://github.com/ggerganov/llama.cpp/issues/1602


They said 1 or 2 bucks an hour. You can get an A100 for that.


Try $100/hour for big LLMs... And you're probably going to need a fleet of 16 machines, unless you want to quantize it and do inference only.


Where can you even find a machine for $100/hour? The most expensive one on this list is just over $5 and is definitely overkill for running a 40B model. https://www.paperspace.com/gpu-cloud-comparison


The prices there aren't realistic. The 8x A100 is listed at $5, whereas Amazon's price calculator lists it at $32 per hour...


It might have been a spot price. Spot prices now are higher than $5 but still under $10.


Doesn't do so great on the leaderboards: https://tatsu-lab.github.io/alpaca_eval/


Depends on the benchmark. Does well on other metrics when compared to open models.

https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...



Actually it's HF's leaderboard that's bugged. Falcon is only on top because the MMLU scores are bugged across all LLaMA models: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...


Given that HellaSwag performance seems to correlate with reasoning ability more than other benchmarks, Falcon certainly looks promising! Hopefully this is a clean result and not the product of dataset contamination.


I've given it a try: for having a chat it's good; for following LangChain prompts it's not.

I guess it depends on the type of work you want to extract from it.


I really like the fact that the leaderboards are almost identical when using Claude or GPT-4 as evaluators.

If a less powerful model can be a good decider of the better answer between two more powerful models, it opens up a lot of research opportunities into perhaps using these evaluations as part of an automated reinforcement learning process.
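
For the curious, the core of such a pairwise evaluation loop is simple. A minimal sketch, where judge() is a hypothetical stand-in for a real call to the judge model (GPT-4, Claude, etc.); swapping the answer order between trials cancels out the judge's position bias:

    # Sketch of pairwise LLM-as-judge evaluation; judge() is a stub.
    import random

    def judge(question: str, answer_a: str, answer_b: str) -> str:
        # Replace this placeholder with a real API call returning "A" or "B".
        return random.choice(["A", "B"])

    def win_rate(questions, answers_a, answers_b) -> float:
        # Fraction of trials model A wins, order-swapped to debias.
        wins, total = 0, 0
        for q, a, b in zip(questions, answers_a, answers_b):
            for swapped in (False, True):
                verdict = judge(q, b, a) if swapped else judge(q, a, b)
                wins += (verdict == "B") if swapped else (verdict == "A")
                total += 1
        return wins / total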


It's a pretty terrible model; I wouldn't use this for anything at all. Vicuna 1.1 outperforms it in all my tests.


What tests are you running? Anything outside of the alpaca eval test suite?


That leaderboard does not line up with my personal experience with those models at all...


Worth noting that according to the initial press release, they're also working on Falcon 180B, which would be the largest (and likely most effective) open source model by far.


No, don't tell me that, I'm going to need more graphics cards now.


Not by far; there are BLOOM and OPT, if you count them.


Those weren't open source though.

OPT is non-commercial, and BLOOM had this extremely deranged OpenRAIL license, which includes user-hostile things like forced updates and other weird restrictions.


Falcon was initially released under a weird modified license. It looks like they changed it to Apache 2.0 on May 31st: https://www.tii.ae/news/uaes-falcon-40b-now-royalty-free


Bloom and OPT are weak, predictably so. Nobody uses them (except for research sometimes, and even then rarely). It doesn't make sense to look for hardware to run a 176B model when a 13B outperforms it across the board.


And also by far the most infeasible to run.



Not everyone always catches up with all the news; it's important enough for a report if you ask me.


Falcon LLM is a foundational large language model (LLM) with 40 billion parameters trained on one trillion tokens. TII has now released Falcon LLM – a 40B model.


Has anybody gotten this running on consumer hardware à la LLaMA, or is that not in the cards?


Experimental Falcon inference via ggml (so on CPU): https://github.com/cmp-nct/ggllm.cpp

It has problems, but it does work.


I've only seen people mention that it runs really slow, even on like A100s.


There are no big architectural differences compared to other LLMs. The largest differences compared to NeoX are: no biases in the linear layers, and shared heads for the key and value representations, but not query (multi-query attention).

Of course, it has 40B parameters, but there is also a 7B-parameter version. The primary issue is that the current upstream version (on Huggingface) hasn't implemented key-value caching correctly. KV caching is needed to bring generation complexity down from O(n^3) to O(n^2). The issues are: (1) their implementation uses Torch's scaled dot-product attention, which uses incorrect causal masks when the query/key lengths differ (which is the case when generating with a cache); (2) they don't index the rotary embeddings correctly when using the key-value cache, so the rotary embedding of the first token is used for all generated tokens. Together, this causes the model to output garbage; it only works without KV caching, which makes it very slow.

However, this is not a property of the model itself, and they will probably fix it soon. E.g., the curated-transformers library that we are currently developing supports Falcon with key-value caching, and the speed is on par with other models of the same size:

https://github.com/explosion/curated-transformers/blob/main/...

(This is a correct implementation of the decoder layer.)
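
To make the rotary-embedding issue concrete, here is a minimal sketch of the rotate-half RoPE variant with explicit absolute positions (shapes and the helper are illustrative, not Falcon's actual code); the fix is simply to offset the positions by the cache length when decoding:

    # Minimal rotate-half RoPE sketch with explicit absolute positions.
    # The upstream bug described above amounts to always passing positions
    # starting at 0, even when a KV cache is in use.
    import torch

    def rope(x: torch.Tensor, positions: torch.Tensor, base: float = 10000.0):
        # x: (batch, heads, seq_len, head_dim); positions: (seq_len,)
        half = x.shape[-1] // 2
        inv_freq = 1.0 / (base ** (torch.arange(half, dtype=torch.float32) / half))
        angles = positions[:, None].float() * inv_freq[None, :]  # (seq_len, half)
        cos, sin = angles.cos(), angles.sin()
        x1, x2 = x[..., :half], x[..., half:]
        # Rotate-half: dimension i is paired with dimension i + half.
        return torch.cat([x1 * cos - x2 * sin, x2 * cos + x1 * sin], dim=-1)

    # Prompt pass: positions = torch.arange(prompt_len)
    # Decode step with n_past cached tokens (the part that was broken,
    # effectively reusing position 0 for every generated token):
    # positions = torch.arange(n_past, n_past + 1)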


Super helpful!

Do you have some more info on these issues and where this is discussed? Besides following y'all at Explosion, any tips on whom to follow so I don't get blindsided?


I found them out myself when making our own implementation of the model. We test our outputs against upstream models. In decoding without history, our tests passed, but in decoding with history there was a mismatch between our implementation and the upstream implementation. Naturally, I assumed that our implementation was wrong (being the newer implementation, not sharing code with theirs), but while debugging this I found that our implementation is actually correct.

Then I was planning to report these issues. Someone else found the causal mask issue a week earlier, so there was no need to report it:

https://github.com/pytorch/pytorch/issues/103082

I reported the issue with rotary embeddings in a discussion of problems that people were running into trying to use KV caching:

https://huggingface.co/tiiuae/falcon-40b/discussions/48#648c...

More in general, I am not sure what the best place is to track these issues. Maybe a model's discussion forums?


I tried it using oobabooga's webui side by side with Alpaca 65B loaded in 4 bit on the same AWS instance with 64GB of VRAM.

While Alpaca produced 3 tokens/sec, Falcon produced 0.17 tokens/sec.

So it is very slow with the current tooling still.


How can you deploy oobabooga to AWS/Huggingface/etc?

Any tips?

Cheers!


llama.cpp just got Falcon support (not yet merged), so you could run it on just RAM. Not too fast though.


In case anyone wants to follow along: https://github.com/ggerganov/llama.cpp/issues/1602


Anyone tried doing LoRA-style training to focus this on better code generation?


It's WizardLM, not WizardCoder, but I'd bet it improves things to some degree.

https://huggingface.co/ehartford/WizardLM-Uncensored-Falcon-...


When it comes to writing stories, this model is way behind ChatGPT 3.5.


I don't have first hand experience, but I've heard that it performs really well at story writing with some fine tuning.


> You will need at least 85-100GB of memory

RAM or VRAM?


You need 94GB; it doesn't matter which.



