Janus Pro 1B running 100% locally in-browser on WebGPU (reddit.com)
212 points by fofoz 1 day ago | 22 comments





The image generation results are extremely poor, but it's exciting that it does _anything_ in the browser.

Even the full 7B model's results are relatively low-res (384x384), so it's hard for me to imagine the generative aspect of the 1B model would be usable.

Comparisons with other SoTA models (Flux, Imagen, etc.):

https://imgur.com/a/janus-flux-imagen3-dall-e-3-comparisons-...


I am not sure the results are that comparable, to be honest. For example, DALL-E expands the prompt by default to be much more descriptive. We would need to somehow point out that it is close to impossible to produce the same results as DALL-E.

I bet there has been a lot of testing of what looks more attractive "by default" to the general public. It is also a selling point when low effort produces something visually amazing.


It's still very impressive that it gets the cube order right!

Also, it looks like octopuses are suffering from “six finger hand” syndrome with their arms across all models.


I actually had some pretty impressive results (and a few duds). I think we've lost sight of how amazing something like this actually is. I can run this on my low-end GPU in a web browser and it doesn't even tax it, yet it's creating incredible images out of thin air based on a text description I wrote.

Just three years ago this would have been world-changing.


I don't know a lot about image generation models, but 1B sounds super low for this kind of model, so I'm pretty impressed, personally.

If I remember correctly, SD had less than 1B parameters at launch (~2 years ago?), and you could generate pretty impressive images with the right settings and prompts.

Yep! Less than 1B in total [0]:

> 860M UNet and 123M text encoder

[0] https://github.com/CompVis/stable-diffusion/blob/main/README...


Janus Pro 1B is a multimodal LLM, not a diffusion model, so it has a few more things to pack into its parameters. That is a super low parameter count in an LLM context.

Oh wow okay thank you for the context

The reason why this doesn't work on Firefox:

https://news.ycombinator.com/item?id=41157383


Hi HN! We’re excited to launch JanusPro-AI, an open-source multimodal model from DeepSeek that unifies text-to-image generation, image understanding, and cross-modal reasoning in a single architecture. Unlike proprietary models, JanusPro is MIT-licensed and optimized for cost-efficiency: our 7B-parameter variant was trained for ~$120k, outperforming DALL-E 3 and Stable Diffusion XL in benchmarks like GenEval (0.80 vs. 0.67).

Why JanusPro?

Decoupled Visual Encoding: Separates the image-generation and image-understanding pathways, eliminating role conflicts in visual processing while maintaining a unified backbone.

Hardware Agnostic: Runs efficiently on consumer GPUs (even AMD cards), with users reporting 30% faster inference vs. NVIDIA equivalents.

Ethical Safeguards: The open-source license restricts military/illegal use, aligning with responsible AI development.

Please check out the website: https://januspro-ai.com/


Happy to have these models running locally in a browser. However, the results are still quite poor for me. For example: https://imgur.com/a/Dn3lxsU

It's not too bad given that it runs in your browser. I took your prompt and asked GPT-4o mini to elaborate on it and got this https://imgur.com/a/qmQ7ZHl

The burger looks good.


https://www.janusproai.net/ This is a Janus Pro website that can be tried online.

I like that this runs locally, and I like learning about how it works.

Q: These models running on WebGPU all seem to need Node.js installed. Is that just for the local 'server side'? Can you not just use a Python HTTP server or Tomcat for this and wget the files?


Had a peek at the repo and it looks to be a React frontend, so a JavaScript runtime is needed to "bundle" the application in a way browsers can consume. If you had the dist folder, then I imagine you could use whatever web server you want to serve the static files.
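For example, a minimal sketch of that in Python, assuming the build output lands in a ./dist folder (the exact path depends on the repo's build config):

  # serve_dist.py: serve a pre-built frontend bundle as static files.
  # Assumes the bundle is in ./dist; adjust to match the repo's build output.
  from functools import partial
  from http.server import ThreadingHTTPServer, SimpleHTTPRequestHandler

  handler = partial(SimpleHTTPRequestHandler, directory="dist")
  ThreadingHTTPServer(("localhost", 8000), handler).serve_forever()

Then open http://localhost:8000/ in a WebGPU-capable browser; Node.js is only needed for the initial build, not at serve time.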

Well, it was a long shot anyway, but it doesn't seem to work on mobile (tried in iOS Safari on an iPhone 11 Pro).

A 1B model should be able to fit within the RAM constraints of a phone(?). If this is supported soon, it would actually be wild. Local LLMs in the palm of your hands.
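Back-of-envelope for the weights alone, assuming 4-bit quantization (a common setup for on-device inference):

  # Rough weight-memory estimate for a 1B-parameter model at 4-bit precision.
  params = 1_000_000_000
  bytes_per_param = 0.5                                   # 4 bits per weight
  print(f"{params * bytes_per_param / 2**30:.2f} GiB")    # ~0.47 GiB

So the weights should fit comfortably in a modern phone's RAM; the runtime, activations, and any KV cache add overhead on top, but not an order of magnitude more.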


I don't know about this model, but people have been running local models on Android phones for years now. You just need a large amount of RAM (8-12 GB), ggml, and Termux. I tried it once with a tiny model and it worked really well.

This is from Reddit, what were you expecting?

This needed a 4 GB renderer process and about that much additional memory use in the GPU process for me, in Chrome.

  Local LLMs in the palm of your hands
https://apps.apple.com/us/app/mlc-chat/id6448482937


