Yes! That's where our heads are at as well. The reality with a lot of multimodal / image-processing code is that it's never truly serverless - image manipulation in Node.js is tragically bad, so you always end up needing Python endpoints to do it.
Re: versioning / client languages etc - right now we don't have block versioning, but it's definitely going to be required. As of now the blocks are each their own endpoint, by design. We're thinking about allowing people to share their own blocks, and perhaps even outsource compute to endpoint providers, while we focus on the orchestration layers.
Better observability and monitoring are definitely on the docket as well. Especially because some of these tasks take a really long time - sometimes even going past the request timeout of the REST API. We'll be switching over to queued jobs and webhooks.
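Rough sketch of the queued-jobs + webhook flow, with an in-memory queue standing in for a real broker and a list standing in for outgoing HTTP POSTs (all names here are hypothetical, not our actual API):

```python
import queue
import uuid

jobs = queue.Queue()
delivered = []  # stand-in for outgoing webhook HTTP POSTs

def submit(task, webhook_url):
    """Enqueue a long-running job and return an id immediately,
    instead of holding the HTTP request open until it finishes."""
    job_id = str(uuid.uuid4())
    jobs.put((job_id, task, webhook_url))
    return job_id

def worker():
    """Drain the queue; deliver each result to its webhook URL."""
    while not jobs.empty():
        job_id, task, webhook_url = jobs.get()
        result = task()  # the slow image-processing work happens here
        delivered.append((webhook_url, {"job_id": job_id, "result": result}))

job_id = submit(lambda: "processed.png", "https://example.com/hook")
worker()
print(delivered[0][1]["result"])  # processed.png
```

The point is just that the caller never waits past the enqueue: completion comes back out-of-band via the webhook.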
It's a combination of things. The idea is that you can build workflows that chain functionality from AI models as well as lower-level image-processing tasks. For the lower-level tasks we use the usual suspects - PIL, ImageMagick, OpenCV, etc.
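To make the chaining idea concrete, here's a toy sketch: a "block" is just a function from image to image, and a workflow is an ordered list of blocks. The dict "image" is a stand-in so the example is self-contained; in practice each block would wrap PIL / OpenCV / a model endpoint:

```python
# Each "block" is a function image -> image; a workflow runs them in order.
# The dict is a hypothetical stand-in for a real image object.

def resize(w, h):
    def block(img):
        return {**img, "width": w, "height": h}
    return block

def grayscale(img):
    return {**img, "mode": "L"}

def run_workflow(img, blocks):
    for block in blocks:
        img = block(img)
    return img

img = {"width": 640, "height": 480, "mode": "RGB"}
out = run_workflow(img, [resize(128, 96), grayscale])
print(out)  # {'width': 128, 'height': 96, 'mode': 'L'}
```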
That's true. We started off with a base set of blocks, but I think the real utility will come from the easy orchestration and API endpoint building. We're pushing in the direction of APIs and shareable workflows, so hopefully some of these comparisons get clarified soon.
We started this to solve bulk-processing issues we had when building a previous eCommerce tool, so I 100% know what you mean. We're adding API support soon, and we'll add some examples of how to connect this to Shopify, or to something like Airtable / Strapi / Retool, for workflow automations.
That being said, you're not wrong. It's definitely inspired by ComfyUI, but with much simpler abstractions, much broader utility, and extensions (like building a user-facing front end) coming up shortly.
Oh yeah, I know what you mean. There are several parallels with shader nodes for sure. We've been thinking about a Voyager/agent-style approach where an agent can start to learn "skills", where each skill is an individual block. Each skill represents a certain function applied to an image, and given a specific instruction set we should be able to craft a sequence of actions that will lead to that result.
One way to leverage that is building the graphs from a prompt, but another way might be to not think of the workflow as a pre-constructed graph at all. Instead, perhaps we build dynamic graphs whenever you ask for a certain action - like a conversational image-editing interface.
So you say something like "make the woman's hair purple". We apply segmentation to the hair, and then add a purple color overlay to exactly that area.
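At pixel level that last step is just masked alpha blending. A toy sketch with the segmentation stubbed out (a real pipeline would get the mask from a segmentation model; the 2x2 "photo" and alpha value here are made up for illustration):

```python
PURPLE = (128, 0, 128)

def apply_overlay(pixels, mask, color, alpha=0.6):
    """Blend `color` over `pixels` only where `mask` is set."""
    out = []
    for row_px, row_m in zip(pixels, mask):
        out_row = []
        for px, m in zip(row_px, row_m):
            if m:  # inside the segmented region (e.g. the hair)
                out_row.append(tuple(round((1 - alpha) * c + alpha * o)
                                     for c, o in zip(px, color)))
            else:  # untouched elsewhere
                out_row.append(px)
        out.append(out_row)
    return out

pixels = [[(200, 150, 100)] * 2] * 2   # tiny stand-in "photo"
mask   = [[1, 0], [0, 1]]              # pretend hair mask from the model
result = apply_overlay(pixels, mask, PURPLE)
print(result[0])  # [(157, 60, 117), (200, 150, 100)]
```

Only the masked pixels shift toward purple; everything else passes through unchanged.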