Hi everyone! Creator of Transformers.js here :) ...
Thanks so much to everyone for sharing! It's awesome to see the positive feedback from the community. As you'll see from the demo, everything runs inside the browser!
As of 2023/03/16, the library supports BERT, ALBERT, DistilBERT, T5, T5v1.1, FLAN-T5, GPT2, BART, CodeGen, Whisper, CLIP, Vision Transformer, and VisionEncoderDecoder models, for a variety of tasks including: masked language modelling, text classification, text-to-text generation, translation, summarization, question answering, text generation, automatic speech recognition, image classification, zero-shot image classification, and image-to-text. Of course, we plan to add many more models and tasks in the near future!
Try out some of the other models/tasks from the "Task" dropdown (like the code-completion or speech-to-text demos).
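If you'd rather poke at it from code than from the demo UI, usage is meant to mirror the Python pipelines. Here is a minimal sketch (the package name, default model, and exact options here are illustrative, so check the README for the current API):

    // Minimal sketch of basic usage; see the README for the exact, current API.
    import { pipeline } from '@xenova/transformers';

    // Build a sentiment-analysis (text classification) pipeline. With no model
    // specified, a small default model is downloaded and cached.
    const classifier = await pipeline('sentiment-analysis');

    // Run it on some text; the result is a list of { label, score } objects.
    const result = await classifier('Transformers.js runs entirely in the browser!');
    console.log(result); // e.g. [{ label: 'POSITIVE', score: 0.99 }]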
---
To respond to some comments about poor translation/generation quality: many of the models are actually quite old (e.g., T5 is from 2019)... and if you run the same prompt through the PyTorch version of the model, you will get similar outputs. The purpose of the library/project is to bring these models to the browser; we didn't train the models, so poor quality can (mostly) be blamed on the original model.
Also, be sure to play around with the generation parameters... as with many LLMs, generation parameters matter a lot.
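To make that concrete, here is roughly what tweaking those parameters looks like in code. The option names below follow the usual Hugging Face generation config (do_sample, top_k, temperature), which may be labelled slightly differently in the demo UI, so read this as a sketch:

    // Sketch: the same prompt under greedy decoding vs. sampling.
    import { pipeline } from '@xenova/transformers';

    // Text generation pipeline; the default is assumed to be a GPT-2-class model.
    const generator = await pipeline('text-generation');

    // Greedy decoding: deterministic, but tends to loop and repeat itself.
    const greedy = await generator('I enjoy walking my cute dog', {
      max_new_tokens: 50,
      do_sample: false,
    });

    // Top-k sampling: more varied and usually more natural for open-ended text.
    const sampled = await generator('I enjoy walking my cute dog', {
      max_new_tokens: 50,
      do_sample: true,
      top_k: 20,
      temperature: 0.8,
    });

    console.log(greedy, sampled);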
Yes, there are some workarounds you can do to get it working in non-browser environments. I do aim to get a permanent solution, which will ideally work out-of-the-box for both browser and node/deno environments.
Some other users also reported the issue (which stems from a bug in onnxruntime-web), and we were able to get it working in those cases.
I really liked the suggestion that, if this takes off, the web should consider exposing something like the OpenXLA intermediate representation, which powers the new PyTorch 2.0, TensorFlow, JAX, and a bunch of other top-tier ML frameworks.
It is already very well optimized for a ton of hardware (CPUs, GPUs, ML accelerators). The intermediate representation might already be close to web-safe, effectively self-sandboxing, which could make it safe to expose.
Making each webapp target & optimize ML for every possible device target sounds terrible.
The point of MLIR is that most of the optimization can happen at lower levels. Instead of everyone figuring out on their own how best to target & optimize for JS, wasm, WebGL, and/or WebGPU, you just use the industry-standard intermediate representation & let the browser figure out the tradeoffs. If there is onboard hardware, such as neural cores, it might just work!
Good to see WebML has OpenXLA on their radar... but I'm also a bit afraid, expecting some half-assed excuses for why of course we're going to make some brand-new other thing instead. The web & almost everyone else has such a bad NIH problem. WASI & the web file APIs being totally different is one example, where there's just no common cause, even though it'd make all the difference. And with ML, the cost of having your own tech versus being able to re-use the work everyone else puts in feels like a near-suicidal decision: an API that will never be good, never perform anywhere near where it could.
> Making each webapp target & optimize ML for every possible device target sounds terrible.
Yes it does.
Did something I said imply that?
OpenXLA is an intermediate layer that frameworks like PyTorch or JAX can use. It has pluggable backends, so if there were a web-compatible backend (WebGL or WASM), everyone could use it, and all models built with anything that uses OpenXLA[1] would be compatible.
[1] Not 100% sure how low-level the OpenXLA intermediate representation is. I know it's not uncommon when porting a brand-new primitive (e.g., a special kind of transformer) to a new architecture (e.g., CUDA -> Apple M1) that some operations aren't yet supported, so this might be similar.
I support having web targets. It'd be a good offering.
But it feels upside down to me from what we really all should want, which is a safe way to let the web target any backend you have. WebGPU or WebGL or wasm are going to be OK targets, but with limited hardware support & tons of constraints that mean they won't perform as well as openxla.
Also how will these targets get profiled? Do we ship the same WebGL to a 600w monster as a rpi?
Also, how will these targets get profiled? Do we ship the same WebGL to a 600W monster as to a Raspberry Pi?
> WebGPU or WebGL or wasm are going to be OK targets, but with limited hardware support & tons of constraints that mean they won't perform as well as wasm.
I don't understand. "WebGPU or WebGL or wasm".. "won't perform as well as wasm".
I don't think a high level representation is necessary for relatively straightforward FMA extensions (either outer products in the case of Apple AMX or matrix products in the case of CUDA/Intel AMX). WebGPU + tensor core support and WASM + AMX support would be simpler to implement, likely more future proof and wouldn't require maintaining a massive layer of abstraction.
The issue is, much of the performance of PyTorch, JAX, et al. comes from running a JIT that is tuned to the underlying hardware, together with support for high-level intrinsic operations that were either hand-tuned or have extra hardware support, especially ops dealing with parallelizing computation across multiple cores.
You'd probably end up representing these as external library function calls in WASM, but then the WASM JIT would have to be taught that these are magic functions to be treated specially. At that point you're just embedding HLO ops as library functions and then embedding an HLO translator into the WASM runtime; I'm not sure that's any better.
By analogy, would it be better to eliminate fragment and vertex shaders and just use WASM for sending shaders to the GPU, or are the domain-specific language and its constraints beneficial to the GPU drivers?
Do we leave it to every web app to figure out how best to serve everyone, and have them bundle their own tuning optimizers into each app? Or do we bake in a higher level abstraction that works for everyone that the browser itself will be able to help optimize?
There's some risk, & the browser APIs likely won't come with all the escape hatches the full tools have for manually fiddling with optimizations, but the idea of getting everyone to DIY seems like a recipe for a misfit: way too much code when you don't need it, not nearly enough tuning when you do. And there are other risks; the assurance that "oh, we just need one or maybe two ops on the web & then everything will be fine forever" doesn't wash with me. If we add new ops, old code won't use them.
And what about hardware that doesn't have any presence on the web? Lots of cheap embedded cores have a couple of TFLOPS of neural coprocessing, but neither wasm nor WebGPU can target that at the moment; it's much too simple a core for that kind of dynamic execution. Targeting that sea of weird, expansive hardware (and targeting it very well indeed) is OpenXLA's chief capability, and I can't imagine forgoing a middleman abstraction like it.
Check out https://mlc.ai/web-stable-diffusion, which builds on top of Apache TVM and brings models from PyTorch 2.0, ONNX, and other sources into the ML compilation flow.
Hah, ChatGPT has successfully poisoned the well. Well done sama.
This lib is great work: a JS interface for running HF models. The comments about how "bad" the outputs are are as surprising to me as they are alarming.
OAI has now set the zero-effort bar so high that even HNers (who click on .js headlines) fall into the gap they've left. That sucking sound you hear is market share being hoovered up.
No they're not, mate; it's just you. I've read the guidelines (thanks for helpfully linking them). I see this on HN: people infer offense and cite the book rather than engage.
By not highlighting what you found "snarky", your response is a definitional "shallow dismissal". I see you just "picked the most provocative thing to complain about". Not a lot of being "kind", either.
So you know what would also be great? If you held yourself to the standards you're keen to police around here.
This is the third time that a candidate has been elected. In this article I will use the names of the candidates and the candidates.
In 2016 the following is a list of the current and former U.S. presidential candidates:
Former Bush
Former Bush
Former Bush
Former Bush
Former Bush
Former Bush
Former Bush
Former Bush
Former Bush
Former Bush
Former Bush
Former Bush/Bush/Bush/Bush (with Republican presidential candidates)
Former Bush
Former Bush
Former Bush
Former Bush
Former Bush
Former Bush
Former Bush
Former Bush/Bush/Bush/Bush/Bush
Former Bush
Former Bush
Former Bush
Former Bush
Former Bush
Former Bush
Former Bush
Former Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/Bush/B-1919191929
That's pretty neat. I'm personally wondering to what extent ML compute will be done on consumer devices rather than on servers. We're currently seeing a lot of models that are so large that it doesn't seem feasible to run them locally. But I think there is reason to believe that these models carry a lot of redundancy, redundancy that could lead to orders of magnitude less memory/compute being needed.
The trick here will be using large models as data generators to distill some sub-task into a web-computable model. (I've done it a few times for vision rather than text, and it's amazing how potent it is.)
Right! In a lot of cases, just having the synthetic responses plus human filtering for your sub-task is enough, at least for less essential tasks. I'm thinking of "procedural" content useful for less sensitive things like games.
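For anyone who hasn't tried it, the data-generation side of that workflow is pretty mundane in code. A rough sketch, where labelWithLargeModel is a purely hypothetical stand-in for whatever large hosted "teacher" model you call (it is not part of any particular library):

    // Sketch of distillation-by-synthetic-data: use a large "teacher" model to
    // label examples for a narrow sub-task, filter the results, and dump them as
    // training data for a small, web-deployable "student" model.
    import { writeFileSync } from 'node:fs';

    // Hypothetical stand-in for a call to a large hosted model; in practice this
    // would hit whatever API you use. Returning null drops a bad generation.
    async function labelWithLargeModel(text) {
      return text.length > 0 ? 'PLACEHOLDER_LABEL' : null;
    }

    async function buildDistillationSet(prompts) {
      const dataset = [];
      for (const prompt of prompts) {
        const label = await labelWithLargeModel(prompt); // expensive teacher call
        if (label !== null) {                            // cheap filtering step
          dataset.push({ input: prompt, target: label });
        }
      }
      return dataset;
    }

    const prompts = ['example input one', 'example input two'];
    const dataset = await buildDistillationSet(prompts);

    // JSONL that a fine-tuning script for the small student model can consume.
    writeFileSync('distillation.jsonl', dataset.map((d) => JSON.stringify(d)).join('\n'));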
It's possible to run a full GPT-3 style language model on any device with 4GB of RAM now, so running models on consumer devices is getting more and more feasible by the day. https://simonwillison.net/2023/Mar/11/llama/
I'm mostly a layman with ML stuff, so I might be doing something wrong, but I've not been impressed with Llama even at higher levels. I've run the 35B model in my home lab and it gave some pretty nonsensical responses. The 13B did better though, so could very well be user error.
There's this gold rush going on; you're right, any B without RLHF is meh.
The things getting published as "on-device LLM" focus on bitcrushing the lowest-B model with minimal RLHF and then pronouncing that we have on-device LLMs. We'll definitely get there, but right now noise >>> signal.
First person to admit this and write their blog post with A / B tests vs. a Markov chain deserves the gold.
> I'm personally wondering in how far ML compute will be done on consumer devices, rather than on servers.
Running ML on the device has been one of Apple's value propositions for a long time. They are currently silent on everything that's unfolding, but I expect them to at least mention something at WWDC (and to try to run that something on the device).
If I understand correctly, there was an annual all-company AI day (by invitation) which was silent on recent developments.
But then ~two weeks later there was what seemed like an on-background / press leak about the XDG group that specifically mentioned AI as a current discipline. (Gurman / Bloomberg)
It seems to me that the release of Core ML Stable Diffusion (mentioned in this thread) is something of a comment in and of itself, at least in Apple's read-between-the-lines / hiding-in-plain-sight style.
The company is unveiling a new, and presumably the next, major computing platform at a quality level only they could possibly deliver.
So the relative quiet / lack of comment may be in deference to the gravity of that work.
That said, these changes are too big to ignore; at WWDC we should at least hear language that acknowledges the major developments in AI of late, and some idea of how Apple is thinking about them.
I feel like that's been the pretty consistent lesson in computing over the past decades. New technologies start out as expensive, exotic, and specialized and become cheap and commonplace over time. The more business value the technology provides, the faster it will happen as well I think.
The models will certainly get better (faster to train, less data needed, smaller param counts, etc.) too, though, just like compilers and software have evolved hugely alongside hardware.
They'll meet in the middle. That's what's already happening, and there will probably be co-processors added to consumer devices that excel specifically at the kind of processing these models need.
Hi! Creator of the library here. If you change the generation parameters to be greedy (i.e., sample=no and top_k=0), you will get "Bonjour, comment êtes-vous?"
The top_k and sample generation parameters are just there to show that they are supported :), and they are sometimes useful for the other tasks (like text generation with GPT-2, to get more variety).
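In code, that corresponds to roughly the following. The task and option names mirror the Hugging Face conventions and are meant as a sketch, not the demo's exact internals:

    // Sketch: greedy decoding for the English-to-French example above.
    import { pipeline } from '@xenova/transformers';

    // With a T5-style model, the source/target languages are selected with a
    // task prefix on the input text.
    const translator = await pipeline('text2text-generation');

    const output = await translator('translate English to French: Hello, how are you?', {
      do_sample: false, // "sample = no" in the demo UI
      top_k: 0,         // disable top-k filtering
    });
    console.log(output); // expected output: Bonjour, comment êtes-vous?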
I understand there's reasons the translation is incorrect, but if the very first example you're showing on the page is wrong, most people (who are fluent enough) will just roll their eyes and leave it at that. Maybe showcase an example that works?
I uploaded the Windows XP desktop wallpaper into the image classifier. Just the raw image file. It gave me the labels "monitor", "computer screen", "desktop". "Field", "sky", "grass", that kind of thing were nowhere to be found.
I know this is more of a comment on the state of AI models than Transformers.js. It's probably not even representative of state-of-the-art image classifier models. Just a fun example of how these things learn.
Haha, very interesting! I assume it's because that type of image is only found on computer screens, so the model thinks the grass "contributes to its idea of what a computer screen is".
... and of course, the library only ports those models to the browser; if you train a better model, you can always convert it to the ONNX format, then use it with the library.
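For what it's worth, reproducing that kind of experiment outside the demo UI is just another pipeline call. A rough sketch (the default model and the accepted input types may differ between versions, and the image URL is only a placeholder):

    // Sketch: image classification on an arbitrary image.
    import { pipeline } from '@xenova/transformers';

    // Build an image-classification pipeline with the default (ViT-style) model.
    const classifier = await pipeline('image-classification');

    // The image can be passed by URL (or, in a browser, whatever the demo
    // produces from a file upload). This URL is just a placeholder.
    const predictions = await classifier('https://example.com/bliss-wallpaper.jpg');
    console.log(predictions); // array of { label, score } objects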
Even the default example of "Hello, how are you?" from English to French yields an awfully wrong result ("Hello, what is your experience?")...
I wouldn't trust them for anything else.
The other models are not better, here's the text generation output from "I enjoy walking my cute dog":
> I enjoy walking with my cute dog, I have been going to the park, and I just happened to like walking with my cute dog. I like to play with the dog.
My dog (Hannah) has been on my way home since December and when she came home she told me to go out and stay back. I told her that she had been too busy. I had to start working and had to go outside and go see myself again.
If it were just an algorithm that generates random sentences, it wouldn't make any less sense.
I think it's worth pointing out that the library just gets the models working in the browser. The correctness of the translation is dependent on the model itself.
If you run the model using Hugging Face's Python library, you will get the same results (I've tested it, since I wasn't too happy with those default translations and generations).
With regards to the text generation output, this is also similar to what you will get from the PyTorch model. Check out this blog post from HuggingFace themselves which discusses this: https://huggingface.co/blog/how-to-generate.
> Even the default example of "Hello, how are you?" from English to French yields an awfully wrong result ("Hello, what is your experience?")...
Really? For me that gives "Bonjour, comment êtes-vous?" with the default settings.
> text generation output
Yeah, text generation is really something that requires a big model. The LLaMA 7B model is about 13 GB in fp16 (roughly 4 GB quantized to 4-bit), and that is the smallest model I'd actually attempt to use for unconstrained text generation.
« Bonjour, comment êtes-vous? » barely translates to « Hi, how are you feeling today? » or, depending on the context, to something like « Hi, please describe yourself » to a native French speaker.
This is great. Awesome work. I selected the model for sentiment analysis and changed the prompt. It took a while to download the roughly 170 MB model file, but I understand that's just a one-time thing. And it did the work without crashing. I can imagine this being used in many devices with an embedded browser.
Curious whether this library can be integrated with WebGPU - a recent post (https://news.ycombinator.com/item?id=35191687) announced that WebGPU can now be used for large models.
As I mentioned in another comment, the library just allows the models to be run in the browser. The models generally give the same outputs as if they were run with their PyTorch equivalents, so, the quality can (for the most part) be blamed on the original model.
Also, remember to play around with generation parameters. Some tasks like code completion and speech-to-text work best with greedy sampling (sample=false, top_k=0), while others like text generation work best with random sampling (sample=true, top_k>0)
Looking at the code, it seems like it's only running with WASM SIMD so far. I think the creator said something about the WebGL backend being inaccurate when quantized, or something.
I'd like to use this kind of transformer model in Rust (because it's on the backend, because I can do the data munging there and it will be faster, and for other reasons). rust-bert looks like a good library! But it doesn't compile on Apple Silicon due to weird linking issues that aren't apparent - https://github.com/guillaume-be/rust-bert/issues/338. I've spent a large part of today and yesterday trying to find out why. The only other library I've found for doing this kind of thing programmatically (particularly sentiment analysis) is spark-nlp (https://github.com/JohnSnowLabs/spark-nlp). Some of its models look a little older, which is OK, but it does mean I'd have to do this in another language.
Does anyone know of any sentiment analysis software that can be tuned (other than VADER - I'm looking for something more along the lines of a transformer model, like BERT), that is pretrained, and that can be used from Rust or Python? Otherwise I'll probably end up using spark-nlp and spinning up another process.
---
If you want to keep up-to-date with the development, check us out on twitter: https://twitter.com/xenovacom :)