Not really understanding the benefit of running this at the edge to be honest? The additional latency for the request is absolutely negligible compared to the latency of LLMs/SD.
> Not really understanding the benefit of running this at the edge to be honest?
It's primarily a benefit for Cloudflare. Instead of the huge and expensive mega-datacenters that AWS/GCP/Azure operate, they can rent cheaper colo space and distribute workloads more evenly. The latter is, I think, the key. AWS basically incentivises you to stay in a single region as long as possible (mostly because the UX of both the web UI and the CLI just sucks when dealing with multiple regions), so a lot of users stick to the AWS region closest to them, and their services aren't interconnected between regions because that's a headache to set up. Cloudflare, by contrast, has run "at the edge" from the beginning, so people don't even think about introducing silent dependencies on any specific region. And if a Cloudflare DC/region has a massive outage, chances are high no one will notice, because the workloads will just silently shift elsewhere.
It's a bit of a "if all you have is a hammer, everything looks like a nail" situation. It's not about the latency from you to the edge node, it's about already being in the Cloudflare worker ecosystem as a developer.
For voice recognition, latency absolutely matters.
Getting started with Workers AI + SDXL (via API) couldn’t be easier. Check out the example below:
curl -X POST \
"https://api.cloudflare.com/client/v4/accounts/{account-id}/ai/run/@cf/stabilityai/stable-diffusion-xl-base-1.0" \
-H "Authorization: Bearer {api-token}" \
-H "Content-Type:application/json" \
-d '{ "prompt": "A happy llama running through an orange cloud" }'
-o 'happy-llama.png'
First of all, there is a \ missing before the last line.
Second, what is my "{account-id}"? I can't find it anywhere in the Cloudflare dashboard.
I have the feeling it might be my email?
But when I use that, I get this error:
{"result":null,"success":false,"errors":[{"code":7003,"message":"Could not route to /client/v4/accounts/<my_email>/ai/run/@cf/stabilityai/stable-diffusion-xl-base-1.0, perhaps your object identifier is invalid?"}],"messages":[]}
No, you don't have to register a domain with them.
Sign up (or log in), and you'll be taken to your Dashboard.
On the left is a bunch of options. Click on the one labeled "Workers", and on that page in the top right you'll see "Account ID". That value should be the one you want.
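Putting that together, here's the example from the post with the missing continuation backslash added (and a space after Content-Type), with {account-id} being the Account ID from the Workers page and {api-token} your API token:

curl -X POST \
"https://api.cloudflare.com/client/v4/accounts/{account-id}/ai/run/@cf/stabilityai/stable-diffusion-xl-base-1.0" \
-H "Authorization: Bearer {api-token}" \
-H "Content-Type: application/json" \
-d '{ "prompt": "A happy llama running through an orange cloud" }' \
-o 'happy-llama.png'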
An order of magnitude cheaper: OpenAI's DALL·E 3, I think, is currently priced around $0.04 per image generation. But it's obviously not nearly as good out of the box.
Oddly, I don't see anything about pricing for Workers AI on the Workers pricing page[0] but their Workers AI blog post from Sept 2023[1] says the pricing is per 1k "neurons":
> Users will be able to choose from two ways to run Workers AI:
> Regular Twitch Neurons (RTN) - running wherever there's capacity at $0.01 / 1k neurons
> Fast Twitch Neurons (FTN) - running at nearest user location at $0.125 / 1k neurons
> Neurons are a way to measure AI output that always scales down to zero (if you get no usage, you will be charged for 0 neurons).
Here's the key detail:
> To give you a sense of what you can accomplish with a thousand neurons, you can: generate 130 LLM responses, 830 image classifications, or 1,250 embeddings.
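Back-of-the-envelope, that works out to the following per-item costs at the two quoted rates (my arithmetic, not official pricing):

# Cost per item at the quoted rates (RTN = $0.01, FTN = $0.125 per 1k neurons)
echo "scale=8; 0.01 / 130" | bc    # RTN: ~$0.00008 per LLM response
echo "scale=8; 0.125 / 130" | bc   # FTN: ~$0.00096 per LLM response
echo "scale=8; 0.01 / 830" | bc    # RTN: ~$0.000012 per image classification
echo "scale=8; 0.01 / 1250" | bc   # RTN: $0.000008 per embedding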
Productionizing AI models is a pain; this makes it easy. Say you were building a D&D app and wanted to generate character art: this would make it very easy to get started. AWS has similar offerings (e.g. SageMaker), but it's not on the edge.
Trying to understand why a developer would want to call an API to generate code rather than use a coding AI assistant within their editor? Genuinely curious.
Why? Well, I'm considering using a LLM API to generate per-user custom code at runtime -- like a query builder that accepts plain English. The application involves filtering a data stream by the user's custom criteria.
I'm not yet committed to this because I know that many (most?) people cannot express their intentions in plain English concisely and precisely enough to be implemented as an algorithm. As my first formal instructor of programming taught me, a lot of programming is just that: thinking through what one wants, with sufficient rigor. Supporting such a feature could be a nightmare, making it more trouble than it's worth. However, I may offer it as an experiment. It might work well enough to, say, draft Google Sheets formulas that power users could tweak.
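A minimal sketch of what I mean, reusing the Workers AI endpoint shape from the example above (the model name and the prompt here are my assumptions; check the current model catalog):

# Hypothetical: ask a chat model to turn plain English into a structured filter
curl -X POST \
"https://api.cloudflare.com/client/v4/accounts/{account-id}/ai/run/@cf/meta/llama-2-7b-chat-int8" \
-H "Authorization: Bearer {api-token}" \
-H "Content-Type: application/json" \
-d '{ "prompt": "Convert this request into a JSON filter with field, operator, and value keys: show me orders over $100 shipped to Europe" }'

You'd still want to validate whatever comes back against a whitelist of fields and operators before running it against the stream, for exactly the support reasons above.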
I get the feeling Cloudflare doesn't know either. But the model is freely available via Hugging Face, so why not support it as one of the models? Just because you or I can't think of something doesn't mean that someone else won't. Maybe someone will come up with a genius idea of what to do with it. The other models seem more useful, but adding models is likely not that much overhead.
I gather it's $0.01 / 1k neurons, which apparently is "130 LLM responses, 830 image classifications, or 1,250 embeddings."
What's that in sane measurements like dollars per 1k tokens?
As much as I enjoy measuring my car's speed in beard-seconds... could we fkin not?
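For what it's worth, you can get a rough conversion if you assume a response length (the ~500 tokens per response below is purely my assumption, as is using the $0.01 RTN rate):

# 130 responses per 1k neurons * ~500 tokens each = ~65,000 tokens per 1k neurons
# => dollars per 1k tokens = 0.01 / 65
echo "scale=8; 0.01 / 65" | bc   # ~$0.00015 per 1k tokens, under these assumptions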