Even the full 7B model's results are relatively low-res (384x384), so it's hard for me to imagine the generative aspect of the 1B model would be usable.
I'm not sure the results are that comparable, to be honest. DALL-E, for example, expands the prompt by default to be much more descriptive. We would need to point out somewhere that it's close to impossible to produce the same results as DALL-E.
I bet there has been a lot of testing of what looks most attractive "by default" to the general public. It's also a selling point when low effort produces something visually amazing.
I actually had some pretty impressive results (and a few duds). I think we've lost sight of how amazing something like this actually is. I can run this on my low-end GPU in a web browser and it doesn't even tax it, yet it's creating incredible images out of thin air based on a text description I wrote.
Just three years ago this would have been world-changing.
If I remember correctly, SD had less than 1B parameters at launch (~2 years ago?), and you could generate pretty impressive images with the right settings and prompts.
Janus Pro 1B is a multimodal LLM, not a diffusion model, so it has a bit more to pack into its parameters. And 1B is a very low parameter count in an LLM context.
Hi HN! We’re excited to launch JanusPro-AI, an open-source multimodal model from DeepSeek that unifies text-to-image generation, image understanding, and cross-modal reasoning in a single architecture. Unlike proprietary models, JanusPro is MIT-licensed and optimized for cost-efficiency—our 7B-parameter variant was trained for ~$120k, outperforming DALL-E 3 and Stable Diffusion XL in benchmarks like GenEval (0.80 vs. 0.67).
Why JanusPro?
Decoupled Visual Encoding: Separates image generation/understanding pathways, eliminating role conflicts in visual processing while maintaining a unified backbone (see the rough sketch after this list).
Hardware Agnostic: Runs efficiently on consumer GPUs (even AMD cards), with users reporting 30% faster inference vs. NVIDIA equivalents.
Ethical Safeguards: Open-source license restricts military/illegal use, aligning with responsible AI development.
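To make the "decoupled visual encoding" point concrete, here is a rough conceptual sketch in PyTorch. This is not DeepSeek's actual code and every dimension is made up; it only illustrates the idea of separate visual pathways for understanding and generation sharing one autoregressive backbone:

    # Conceptual sketch only -- not the real Janus Pro implementation.
    import torch
    import torch.nn as nn

    class DecoupledVisualLM(nn.Module):
        def __init__(self, d_model=1024, text_vocab=32000, image_codebook=16384):
            super().__init__()
            self.text_embed = nn.Embedding(text_vocab, d_model)
            # Understanding pathway (not used in this snippet): continuous
            # vision features projected into the LM space.
            self.understand_proj = nn.Linear(768, d_model)
            # Generation pathway: discrete image-token embeddings plus a head
            # that predicts the next image token.
            self.gen_embed = nn.Embedding(image_codebook, d_model)
            self.gen_head = nn.Linear(d_model, image_codebook)
            # One unified backbone shared by both tasks.
            layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            self.backbone = nn.TransformerEncoder(layer, num_layers=2)

        def next_image_token_logits(self, text_ids, image_token_ids):
            # Autoregressive generation: text tokens plus already-generated
            # image tokens run through the shared backbone; the head predicts
            # the next image token.
            seq = torch.cat([self.text_embed(text_ids),
                             self.gen_embed(image_token_ids)], dim=1)
            return self.gen_head(self.backbone(seq)[:, -1])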
It's not too bad given that it runs in your browser. I took your prompt and asked GPT-4o mini to elaborate on it and got this https://imgur.com/a/qmQ7ZHl
I like that this runs locally, and learning about how it works.
Q: These models running in WebGPU all seem to need Node.js installed. Is that just for the local "server side"? Can you not just use a Python HTTP server or Tomcat for this and wget the files?
Had a peek at the repo and it looks to be a React frontend, so a JavaScript runtime is needed to "bundle" the application into something browsers can consume. Once you have the dist folder, I imagine you can use whatever web server you want to serve the static files.
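For example, assuming the build step produces a dist/ folder (the exact folder name depends on the project's build config), Python's standard library is enough to serve it; a minimal sketch:

    # Minimal static file server for an already-built frontend.
    # Assumes the bundled output is in ./dist (adjust the path if not).
    from functools import partial
    from http.server import HTTPServer, SimpleHTTPRequestHandler

    handler = partial(SimpleHTTPRequestHandler, directory="dist")
    HTTPServer(("localhost", 8000), handler).serve_forever()

Serving on localhost should also keep WebGPU happy, since the API requires a secure context and localhost counts as one.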
Well, it was a long shot anyway, but it doesn't seem to work on mobile (tried in iOS Safari on an iPhone 11 Pro).
A 1B model should be able to fit in the RAM constraints of a phone(?). If this gets supported soon it would actually be wild: local LLMs in the palm of your hand.
I don't know about this model, but people have been running local models on Android phones for years now. You just need a large amount of RAM (8-12 GB), ggml and Termux. I tried it once with a tiny model and it worked really well.
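For anyone curious, here is an illustrative sketch of what that looks like from Python (e.g. inside Termux) using the llama-cpp-python bindings for ggml/GGUF models; the model filename is made up, any small quantized model will do:

    # Illustrative sketch: run a small quantized ggml/GGUF model locally.
    # The model path is hypothetical; substitute whatever model you downloaded.
    from llama_cpp import Llama

    llm = Llama(model_path="tiny-model.Q4_K_M.gguf", n_ctx=2048)
    out = llm("Describe a sunset over the ocean.", max_tokens=64)
    print(out["choices"][0]["text"])

On phone-class hardware, the smaller and more aggressively quantized the model, the better it fits in 8-12 GB of RAM.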