Workers AI Update: Stable Diffusion, Code Llama and Workers AI in 100 Cities (cloudflare.com)
77 points by todsacerdoti on Nov 23, 2023 | 32 comments

Anybody know how this works out pricing-wise?

I gather it's $0.01 / 1k neurons, which apparently is "130 LLM responses, 830 image classifications, or 1,250 embeddings."

What's that in sane measurements like dollars per 1k tokens?

As much as I enjoy measuring my car's speed in beard seconds... could we fkin not?

Couple notes after testing:

- relatively slow at around 15 seconds per generated image, although still faster than a cold start from Replicate.

- No ability to specify resolution/aspect ratios

- no way to specify the sampler

- limited to 20 steps maximum

Dang, 15 seconds for an image? That's very bad considering the limitation of 20 steps. Even entry-level GPUs take ~5 seconds for a 20-step image.

Thank you. Wish we didn't have to wait for third-party comments to find out the downsides and limitations.
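To put the two figures quoted above side by side, here is a rough per-step comparison. It is only a sketch: it assumes the full 15 seconds is sampling time, whereas the measured number probably also includes network and queueing overhead, so the real per-step gap may be smaller.

```python
# Per-step time implied by the figures quoted in this thread.
# Assumption: all of the wall-clock time is spent on sampling steps.
def seconds_per_step(total_seconds: float, steps: int) -> float:
    return total_seconds / steps


workers_ai = seconds_per_step(15, 20)  # ~15 s per image reported for Workers AI
entry_gpu = seconds_per_step(5, 20)    # the "~5 seconds" entry-level GPU figure
slowdown = workers_ai / entry_gpu

print(f"Workers AI:      {workers_ai:.2f} s/step")
print(f"Entry-level GPU: {entry_gpu:.2f} s/step")
print(f"Ratio:           {slowdown:.1f}x")
```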

15s for 20 steps is pretty bad.

Not really understanding the benefit of running this at the edge to be honest? The additional latency for the request is absolutely negligible compared to the latency of LLMs/SD.

> Not really understanding the benefit of running this at the edge to be honest?

It's primarily a benefit for Cloudflare. Instead of the huge, expensive mega-datacenters that AWS/GCP/Azure operate, they can rent cheaper colo space and distribute workloads better. The latter is, I think, the key. AWS basically incentivises you to stay in a single region as long as possible (mostly because the UX of both the web UI and the CLI just sucks when dealing with multiple regions), which means a lot of users stick to the AWS region closest to them, and services aren't really interconnected between regions because that's a headache to set up. Cloudflare, by contrast, has run "at the edge" from the beginning, so people don't even think about introducing silent dependencies on any specific region. And if a Cloudflare DC/region has a massive outage, chances are high no one will notice, because the workloads will just silently shift somewhere else.

It's a bit of a "if all you have is a hammer, everything looks like a nail" situation. It's not about the latency from you to the edge node, it's about already being in the Cloudflare worker ecosystem as a developer.

For voice recognition, latency absolutely matters.

ah very fair point on voice - didn't think about that.

It's for companies like us that already run almost everything directly on Cloudflare Workers

We integrate with Replicate for SDXL, but if this were production-ready we would likely have gone with it instead.

Maybe now, but in the future?

    Getting started with Workers AI + SDXL (via API) couldn’t
    be easier. Check out the example below:

    curl -X POST \
    "https://api.cloudflare.com/client/v4/accounts/{account-id}/ai/run/@cf/stabilityai/stable-diffusion-xl-base-1.0" \
    -H "Authorization: Bearer {api-token}" \
    -H "Content-Type:application/json" \
    -d '{ "prompt": "A happy llama running through an orange cloud" }'
    -o 'happy-llama.png'

First of all, there is a \ missing before the last line.

Second, what is my "{account-id}"? I can't find it anywhere in the Cloudflare dashboard.

I have the feeling it might be my email?

But when I use that, I get this error:

    {"result":null,"success":false,"errors":[{"code":7003,"message":"Could not route to /client/v4/accounts/<my_email>/ai/run/@cf/stabilityai/stable-diffusion-xl-base-1.0, perhaps your object identifier is invalid?"}],"messages":[]}

On any URL from Cloudflare regarding your account, it's the big ID in the URL. When you're logged in and navigate to https://dash.cloudflare.com/, you'll be redirected to https://dash.cloudflare.com/{account-id}
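As a small illustration of the point above, the ID can be read straight out of the dashboard URL. This is a sketch, not an official method: the helper name is made up, and it assumes account IDs are 32 lowercase hex characters, which also explains why an email address is rejected.

```python
import re
from urllib.parse import urlparse

# Assumption: a Cloudflare account ID is 32 lowercase hexadecimal characters.
ACCOUNT_ID_RE = re.compile(r"[0-9a-f]{32}")


def account_id_from_dashboard_url(url: str) -> str:
    """Return the first path segment of a dashboard URL if it looks like an account ID."""
    parsed = urlparse(url)
    if parsed.hostname != "dash.cloudflare.com":
        raise ValueError(f"not a dashboard URL: {url!r}")
    segments = [s for s in parsed.path.split("/") if s]
    if not segments or not ACCOUNT_ID_RE.fullmatch(segments[0]):
        raise ValueError("no account ID found in the URL path")
    return segments[0]


# Placeholder ID for illustration only; substitute the URL from your own browser.
example_url = "https://dash.cloudflare.com/0123456789abcdef0123456789abcdef/workers"
print(account_id_from_dashboard_url(example_url))
```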

From that link:

    Log in to the Cloudflare dashboard
    and select your account and domain.

What does that mean? Do I have to register a domain with Cloudflare first?

No, you don't have to register a domain with them.

Sign up (or log in), and you'll be taken to your Dashboard.

On the left is a bunch of options. Click on the one labeled "Workers", and on that page in the top right you'll see "Account ID". That value should be the one you want.
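With the account ID in hand, the curl call quoted earlier in the thread translates to Python roughly as follows. The endpoint, headers, and JSON body are taken from that quoted command; the function names are illustrative, and the sketch assumes a successful response body is the raw PNG, as the `-o 'happy-llama.png'` flag implies.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"
MODEL = "@cf/stabilityai/stable-diffusion-xl-base-1.0"


def build_sdxl_request(account_id: str, api_token: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request the quoted curl command makes."""
    url = f"{API_BASE}/accounts/{account_id}/ai/run/{MODEL}"
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


def generate_image(account_id: str, api_token: str, prompt: str, out_path: str) -> None:
    """Send the request and write the response bytes to out_path.

    Assumption: on success the body is the image itself. A failure such as the
    7003 routing error shown above surfaces as urllib.error.HTTPError.
    """
    request = build_sdxl_request(account_id, api_token, prompt)
    with urllib.request.urlopen(request) as response:
        with open(out_path, "wb") as image_file:
            image_file.write(response.read())


# Usage (needs real credentials, so it is left commented out):
# generate_image("<account-id>", "<api-token>",
#                "A happy llama running through an orange cloud", "happy-llama.png")
```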

Is this cheaper or more expensive than using OpenAI?

An order of magnitude cheaper. OpenAI's DALL-E 3 is, I think, currently priced around $0.04 per image generation. But this is obviously not nearly as good out of the box.

having a hard time calculating what the pricing is for this

Oddly, I don't see anything about pricing for Workers AI on the Workers pricing page[0] but their Workers AI blog post from Sept 2023[1] says the pricing is per 1k "neurons":

> Users will be able to choose from two ways to run Workers AI:

> Regular Twitch Neurons (RTN) - running wherever there's capacity at $0.01 / 1k neurons

> Fast Twitch Neurons (FTN) - running at nearest user location at $0.125 / 1k neurons

> Neurons are a way to measure AI output that always scales down to zero (if you get no usage, you will be charged for 0 neurons).

Here's the key detail:

> To give you a sense of what you can accomplish with a thousand neurons, you can: generate 130 LLM responses, 830 image classifications, or 1,250 embeddings.

[0] - https://developers.cloudflare.com/workers/platform/pricing

[1] - https://blog.cloudflare.com/workers-ai/
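Turning the quoted figures into per-unit dollar costs is simple division. The sketch below uses only the numbers from the blog post quote. The conversion to dollars per 1k tokens additionally needs an average response length, which the post does not give, so `tokens_per_response` is a purely hypothetical input.

```python
# Prices and throughput figures as quoted from the Sept 2023 blog post.
RTN_PRICE_PER_1K_NEURONS = 0.01    # Regular Twitch Neurons
FTN_PRICE_PER_1K_NEURONS = 0.125   # Fast Twitch Neurons
UNITS_PER_1K_NEURONS = {
    "LLM response": 130,
    "image classification": 830,
    "embedding": 1250,
}


def cost_per_unit(price_per_1k_neurons: float, units_per_1k_neurons: int) -> float:
    """Dollar cost of one unit of work (one response, one classification, ...)."""
    return price_per_1k_neurons / units_per_1k_neurons


def cost_per_1k_tokens(price_per_1k_neurons: float, tokens_per_response: float) -> float:
    """Dollars per 1k tokens, ASSUMING every LLM response has tokens_per_response tokens.

    The blog post does not say how long a "response" is, so this is a guess knob,
    not a published figure.
    """
    per_response = cost_per_unit(price_per_1k_neurons, UNITS_PER_1K_NEURONS["LLM response"])
    return per_response / tokens_per_response * 1000


for name, units in UNITS_PER_1K_NEURONS.items():
    rtn = cost_per_unit(RTN_PRICE_PER_1K_NEURONS, units)
    ftn = cost_per_unit(FTN_PRICE_PER_1K_NEURONS, units)
    print(f"{name}: ${rtn:.6f} (RTN), ${ftn:.6f} (FTN)")
```

For example, `cost_per_1k_tokens(RTN_PRICE_PER_1K_NEURONS, 100)` gives the RTN price per 1k tokens if a response were 100 tokens long; change the second argument to test other guesses.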

How many dollar bills does it take to make a pile worth sleeping in?

Like with most serverless functions

having a hard time calculating why anybody needs this/wants this

Productionizing AI models is a pain; this makes it easy. Say you were building a D&D app and wanted to generate character art: this would make it very easy to get started. AWS has similar offerings (e.g. SageMaker), but they're not on the edge.

It seems cheaper than the OpenAI API and is very easy to use from a worker.

I see it more as a convenience feature for people already using CF Workers

Trying to understand why a developer would like to call an API to generate code rather than use a coding AI assistant within their editor? Genuinely curious.

Why? Well, I'm considering using an LLM API to generate per-user custom code at runtime -- like a query builder that accepts plain English. The application involves filtering a data stream by the user's custom criteria.

I'm not yet committed to this because I know that many (most?) people cannot express their intentions in plain English concisely and precisely enough to be implemented as an algorithm. As my first formal instructor of programming taught me, a lot of programming is just that: thinking through what one wants, with sufficient rigor. Support for such a feature could be a nightmare, making it more trouble than it is worth. However, I may offer it as an experiment. It might work well enough to, say, draft Google Sheets formulas that power users could tweak.

How could you possibly make such a thing safe from code injection?
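One common mitigation, which nobody in the thread proposes and which is offered here only as a sketch, is to never execute model output at all. Instead, the LLM is asked to emit a small JSON filter description, and the application checks it against an allowlist before building a predicate. The field names and spec format below are hypothetical.

```python
import json
import operator

# Hypothetical stream fields; anything not listed here is rejected.
ALLOWED_FIELDS = {"price", "symbol", "volume"}
ALLOWED_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def compile_filter(spec_json: str):
    """Turn an untrusted JSON list of {"field", "op", "value"} conditions into a predicate.

    Raises ValueError on anything outside the allowlist. No eval/exec is involved.
    """
    spec = json.loads(spec_json)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(spec, list):
        raise ValueError("filter spec must be a list of conditions")
    conditions = []
    for cond in spec:
        if not isinstance(cond, dict) or set(cond) != {"field", "op", "value"}:
            raise ValueError("each condition needs exactly: field, op, value")
        field, op, value = cond["field"], cond["op"], cond["value"]
        if not isinstance(field, str) or field not in ALLOWED_FIELDS:
            raise ValueError(f"field not allowed: {field!r}")
        if not isinstance(op, str) or op not in ALLOWED_OPS:
            raise ValueError(f"operator not allowed: {op!r}")
        if isinstance(value, bool) or not isinstance(value, (int, float, str)):
            raise ValueError("value must be a number or a string")
        conditions.append((field, ALLOWED_OPS[op], value))

    def predicate(record: dict) -> bool:
        try:
            # All conditions must hold (implicit AND); missing fields never match.
            return all(f in record and fn(record[f], v) for f, fn, v in conditions)
        except TypeError:
            # e.g. comparing a string to a number with "<"
            return False

    return predicate


# Pretend this string came back from the LLM.
keep = compile_filter('[{"field": "price", "op": ">", "value": 100}]')
stream = [{"symbol": "A", "price": 150}, {"symbol": "B", "price": 50}]
print([r["symbol"] for r in stream if keep(r)])
```

This narrows what users can express to whatever the spec format supports, which is the trade-off for not running generated code.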

I get the feeling Cloudflare doesn't know either. But the model is freely available via Hugging Face, so why not support it as one of the models? Just because you or I can't think of something doesn't mean that someone else won't. Maybe someone will come up with a genius idea of what to do with it. The other models * seem more useful, but adding models is likely not that much overhead.

* https://developers.cloudflare.com/workers-ai/models/

I will give this a shot but does anybody know what inference times are like?

16-17 seconds for generating one image.

