Even the full 7B model's results are relatively low-res (384x384), so it's hard for me to imagine the generative aspect of the 1B model would be usable.
I'm not sure the results are that comparable, to be honest. DALL-E, for example, expands the prompt by default to be much more descriptive. We would need to point out somewhere that it's close to impossible to produce the same results as DALL-E.
I bet there has been a lot of testing of what looks most attractive "by default" to the general public. It's also a selling point when low effort produces something visually amazing.
I actually had some pretty impressive results (and a few duds). I think we've lost sight of how amazing something like this actually is. I can run this on my low-end GPU in a web browser and it doesn't even tax it, yet it's creating incredible images out of thin air based on a text description I wrote.
Just three years ago this would have been world-changing.
If I remember correctly, SD had less than 1B parameters at launch (~2 years ago?), and you could generate pretty impressive images with the right settings and prompts.
Janus Pro 1B is a multimodal LLM, not a diffusion model, so it has a bit more to pack into its parameters. And 1B is a very low parameter count in an LLM context.
Hi HN! We’re excited to launch JanusPro-AI, an open-source multimodal model from DeepSeek that unifies text-to-image generation, image understanding, and cross-modal reasoning in a single architecture. Unlike proprietary models, JanusPro is MIT-licensed and optimized for cost-efficiency—our 7B-parameter variant was trained for ~$120k, outperforming DALL-E 3 and Stable Diffusion XL in benchmarks like GenEval (0.80 vs. 0.67).
Why JanusPro?
Decoupled Visual Encoding: Separates image generation/understanding pathways, eliminating role conflicts in visual processing while maintaining a unified backbone (see the rough sketch after this list).
Hardware Agnostic: Runs efficiently on consumer GPUs (even AMD cards), with users reporting 30% faster inference vs. NVIDIA equivalents.
Ethical Safeguards: Open-source license restricts military/illegal use, aligning with responsible AI development.
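To make the "decoupled visual encoding" point concrete, here is a rough conceptual sketch in PyTorch. This is not DeepSeek's actual code and every dimension is made up; it only illustrates the idea of separate visual pathways for understanding and generation sharing one autoregressive backbone:

    # Conceptual sketch only -- not the real Janus Pro implementation.
    import torch
    import torch.nn as nn

    class DecoupledVisualLM(nn.Module):
        def __init__(self, d_model=1024, text_vocab=32000, image_codebook=16384):
            super().__init__()
            self.text_embed = nn.Embedding(text_vocab, d_model)
            # Understanding pathway (not used in this snippet): continuous
            # vision features projected into the LM space.
            self.understand_proj = nn.Linear(768, d_model)
            # Generation pathway: discrete image-token embeddings plus a head
            # that predicts the next image token.
            self.gen_embed = nn.Embedding(image_codebook, d_model)
            self.gen_head = nn.Linear(d_model, image_codebook)
            # One unified backbone shared by both tasks.
            layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            self.backbone = nn.TransformerEncoder(layer, num_layers=2)

        def next_image_token_logits(self, text_ids, image_token_ids):
            # Autoregressive generation: text tokens plus already-generated
            # image tokens run through the shared backbone; the head predicts
            # the next image token.
            seq = torch.cat([self.text_embed(text_ids),
                             self.gen_embed(image_token_ids)], dim=1)
            return self.gen_head(self.backbone(seq)[:, -1])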
It's not too bad given that it runs in your browser. I took your prompt and asked GPT-4o mini to elaborate on it and got this https://imgur.com/a/qmQ7ZHl
I like that this runs locally, and learning about how it works.
Q: These models running in WebGPU all seem to need Node.js installed. Is that just for the local "server side"? Can you not just use a Python HTTP server or Tomcat for this and wget the files?
Had a peek at the repo and it looks to be a React frontend, so a JavaScript runtime is needed to "bundle" the application into something browsers can consume. Once you have the dist folder, I imagine you can use whatever web server you want to serve the static files.
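For example, assuming the build step produces a dist/ folder (the exact folder name depends on the project's build config), Python's standard library is enough to serve it; a minimal sketch:

    # Minimal static file server for an already-built frontend.
    # Assumes the bundled output is in ./dist (adjust the path if not).
    from functools import partial
    from http.server import HTTPServer, SimpleHTTPRequestHandler

    handler = partial(SimpleHTTPRequestHandler, directory="dist")
    HTTPServer(("localhost", 8000), handler).serve_forever()

Serving on localhost should also keep WebGPU happy, since the API requires a secure context and localhost counts as one.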
Well, it was a long shot anyway, but it doesn't seem to work on mobile (tried in iOS Safari on an iPhone 11 Pro).
A 1B model should be able to fit in the RAM constraints of a phone(?). If this gets supported soon it would actually be wild: local LLMs in the palm of your hand.
I don't know about this model, but people have been running local models on Android phones for years now. You just need a large amount of RAM (8-12 GB), ggml and Termux. I tried it once with a tiny model and it worked really well.
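For anyone curious, here is an illustrative sketch of what that looks like from Python (e.g. inside Termux) using the llama-cpp-python bindings for ggml/GGUF models; the model filename is made up, any small quantized model will do:

    # Illustrative sketch: run a small quantized ggml/GGUF model locally.
    # The model path is hypothetical; substitute whatever model you downloaded.
    from llama_cpp import Llama

    llm = Llama(model_path="tiny-model.Q4_K_M.gguf", n_ctx=2048)
    out = llm("Describe a sunset over the ocean.", max_tokens=64)
    print(out["choices"][0]["text"])

On phone-class hardware, the smaller and more aggressively quantized the model, the better it fits in 8-12 GB of RAM.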