Hacker News new | past | comments | ask | show | jobs | submit login
Microsoft Paint's new AI image generator builds on your brushstrokes (petapixel.com)
155 points by mikece 36 days ago | hide | past | favorite | 192 comments



It's a little bit more work to set up, but I'd recommend Krita with diffusion if you want to get into this. Runs locally, swap exact checkpoints you want to use, supports poseable controlnets, and it doesn't phone home to mommy Microsoft.

https://github.com/Acly/krita-ai-diffusion

Video demonstration

https://youtu.be/AF2VyqSApjA


I got myself recently one of those cheap graphic tablets (Huion), for some drawing and note taking, and discovered Krita. Absolutely amazing open-source software. And integration with SD is just amazing.


Agreed. I was looking for a FLOSS Photoshop and stumbled upon this gem. Outstanding piece of work, this is. Up there with the greats like Blender, DaVinci Resolve, VLC.


Sharing this for absolutely no reason ;)

https://krita.org/en/donations/


I've had a pretty awesome experience with this, even with an old 8 GB AMD card. It's not perfect, but being able to run models locally within the paint app while mixing different control layers feels like the future.


>even with an old 8 GB AMD card

That's the main bummer here, not the age of the GPU, but the need for a dedicated one.

When most people nowadays have laptops, and only with integrated GPUs to boot, that's a large demographic being excluded, even with new machines.

Maybe with the new gen of chips with NPUs from Qualcomm, Intel and AMD this will slowly change but they just dropped and I personally won't exchange my 9 month old laptop for a new one just for that feature alone, and I suspect most PC users will behave the same considering the multi year long upgrade cycles in this market.

So it will probably take a long time till on-device generative AI becomes mainstream on the PC.


That's where Qualcomm/Intel/Microsoft "AI PC" branding with 40 TOPS bar comes in. Besides, GPU had been a requirement for AI for years by now.


For decent on-device inference, you need enough memory of high-enough bandwidth and enough processing power connected to that memory.

The traditional PC architecture is ill-suited to this, which is why for most computers, a GPU (which offers all three in a single package) is currently the best approach... only at the moment, even a 4090 only offers enough memory for moderately-sized models to load and run.

The architecture that supports Apple's ARM computers is (by design or happenstance) far better suited: unified memory maximises the model that can be loaded, some higher-end options have decently-high memory bandwidth, and the architecture lets any of the different processing units access the unified memory. Their weakness is cost, and that the processors aren't powerful enough yet to compete at the top end.

So there's currently an empty middle-ground to be won, and it's interesting to watch and wait to see how it will be won. e.g.

- affordable GPUs with much larger memory for the big models? (i.e. GPUs, but optimised)

- affordable unified memory computers with more processing power (i.e. Apple's approach, but optimised)

- something else (probably on the software side) making larger model inference more efficient, either from a processing side (i.e. faster for the same output) or from a memory utilisation side (e.g. loading or streaming parts of large models to cope with smaller device memory, or smaller models for the same output)


Really? I had almost exactly the opposite experience with a 16GB RX 6800 AMD card, which I would hope would be up to the task. It was painfully slow, taking several minutes to do anything, and crashed multiple times. As much as the idea interests me, I gave up on it fairly quickly.


Maybe someone could explain what I'm doing wrong?

Using the online service, I couldn't figure out how to get it to generate an image from a doodle using img2img. Generating an image from a text prompt works, but that's nothing new. Running locally, it was way too slow on my M2 Mac Air.


I was about to ask how you do the equivalent in Krita, but I saw the GitHub showed that under "Using ControlNet to guide image generation with a crude scribble". Cool, gonna try this on my Mac mini.


It was still too cumbersome to set up and use last time I checked. And most of these integrations are just model+prompt without any common features that sdwebui (or comfyui) has. It looks cool as a proof of concept, but isn’t good, cause it’s a poor model+controlnet+segmentation ui. All they have to do is to add a few fields and checkboxes, but don’t expect it soon. I don’t even think that an image editor is a correct starting point for this coop. Imo, it’s webui.


Thanks for sharing, I've been struggling to learn Draw Things and this feels much more intuitive.


Holy crap it is crazy that this is possible at all, let alone offline and local, let alone free and open source. Just, wow.


Putting AI in MS Paint feels like putting a flux capacitor in a Geo Metro.


I think it’s the best place to put it though.

The demographic are people (usually children) who aren’t discerning about the output but would love to have something high quality to look at.

Paint has always been about how some low quality squiggles may represent some latent art ability. It’s not true, but that’s the feeling it gave children and this continues that.

Of course there’s the whole other argument of whether it’s good or not….


> The demographic are people (usually children) who aren’t discerning about the output but would love to have something high quality to look at.

This might be the best argument against putting AI in MS Paint that I've yet seen. I know this is a slippery-slope argument, so take it with the applicable grain of salt, but imagine a world where kids grow up making this kind of generative art, and never progress beyond "low quality squiggles", because they get frustrated when they turn the AI off and it looks bad.


I think there's some history to it. Didn't they release Paint 3D around the time when Hololens and mixed reality was about to be the next big thing?


I recall they were literally going to kill Paint in favor of Paint 3D until they got backlash over it. Does Windows still have a "3D Objects" folder alongside Pictures, Videos, etc?

Microsoft seriously has no taste.


Yes there was.


I think someone at MS hit their head on the toilet...


...yeah but you know, 1.21 Gigawatts!


Wasn't that the point of using the DeLorean? That back in those times it was just a weird failed car?


The Delorean was chosen because it's horrendous 0-60 times made the acceleration to 88mph a believably difficult to achieve goal for the protagonist.


This is very close minded. Won't someone think of the MS Paints PM's corporate career prospects if they didn't do this?


I like the idea that in MS (as probably in so many companies right now) there's a diktat (official or not) to start using AI everywhere.

And somewhere on the Paint team someone smiled, and thought "hold my beer"...

(Just wait until we get Co-pilot baked into Notepad!)


It’s not April 1st. What’s going on here?

Should notepad complete your sentences too?

Maybe file explorer should just make new files for you!


Speaking of April 1st jokes... we're getting closer and closer to actually having CADIE[0], Google's 2009 April Fools joke.

Several things that CADIE was advertised to do are actually possible (if perhaps undesirable) now:

* Having an AI automatically synthesise responses to email on its own is possible, though a really bad idea for what are hopefully obvious reasons.

* A feature to automatically add red-eye could absolutely be built now. Face detection + automatic inpainting + img2img (to keep the look of the original eyes) would do the trick. Obviously not a useful feature, but it doesn't seem impossible.

As I recall, there were other things too, but these are the obvious matches.

[0] https://www.cnet.com/tech/services-and-software/april-fools-...


You have to strike while the iron is hot! Even iTerm just released an update that adds an “AI concierge” to your terminal…

And no, it’s still not April 1st!


Chrome now offers to fill out all my forms I guess. And Facebook now offers to write all my comments for me.

The Internet doesn't need you any longer.


It sounds like a bit, but it's true. Outlook offers to write my e-mails. LinkedIn suggests how I should comment on posts I see. I assume this is all being done to give the impression of feature growth to investors, but the reality is that Dead Internet Theory is becoming more real every day.


Soon AI will post questions and then answer them itself on StackOverflow/StackExchange.


They'll then be automatically closed by an AI moderator as being "Created by AI". Or "Opinion based"


I approve.


As long as they are different models that learn from each other, it’s fine ?


Jira now makes your requirements for you. They all seem perfectly logical and not connected to anything your actual customers need.


So no difference to the real thing? ;)


I hope it can figure out story points for me, as I never understand what they mean and what value to put there.

I think AI will be better at story point estimation than me.


Well Edge browser does complete your sentences.


Sssh! Don't give them ideas...


> What’s going on here?

AI hype overload. Same shit with iterm2, solutions looking for problems


It’s helpful for people who are new to using things on the terminal. I use a similar version with the ctrl+l hotkey, you can do most things on the terminal just by typing english now.

That+fish is an amazing introduction to the shell. Totally new avenue for learning and being immersed quickly.


Totally new avenue to run unknown commands from unknown sources indeed, potentially as root, what could go wrong


AI is a solution in a desperate search of problems at this point. This, and all these companies employing full-time engineers while having had finished products for decades.


Honestly, I've gotten so used to copilot's autocomplete in vscode that I kind of wish it did. (As one commentor pointed out, it does do that in Edge, though it gets annoying and just deletes intentional newlines sometimes)


Installer should install programs without your input, and Settings should change your settings! Oh wait


Does nobody take joy in creating something themselves anymore? Everything has become about the end result, and it's a shame. I'm not sure what we're loosing, but it's something


That assumes that this is something people asked for, and not a case of MS shoehorning AI into everything to look "AI-native" to pump their stock.

Microsoft has long had a 'vote for features' thing on their support forums [0]. I can guarantee that not one person has ever said that MS Paint needed an AI makeover.

[0] https://feedbackportal.microsoft.com/feedback


Someone of us engineers would like to create (reasonable looking) art, but are so bad that we could use as much AI help as we can.


Sometimes the "end result" isn't my end result

I want to create a game, but I have no interest in creating art and models, much less programming physics, netcode, abilities, etc

I'd prefer to create the games rules, mechanics and interactions (aka, the actual game experience). But I am forced to do the above because the gaming industry is lagging behind in DX it barely changed in the last 10 years

Every "art" has many sub-processes that are themselves art. Making paint, brushes, canvas, etc. But does anyone complain about artists not taking joy in creating their own paint after foraging for some rare purple spitting slime?

If you view one art form as a tool for your art, so be it


I know several locally renowned artists, the kind that make big murals in celebrities mansions, who have switched to AI work

They wanted to express themselves in the output that the AI was giving them all their lives

I don’t think it’s that complicated, if you want to express yourself a certain way, manually, you still can. Now more people, including professionals, can express themselves the way they wanted


These tools empower me to express myself. I also continue to pay professional artists for their work.


Yeah, I feel cynical saying this, but who wants to create art that looks and feels just like everything else?


Most people.

But also, I used to do midjourney, and I know art history quite well, and mixing my knowledge with midjourney, I created really good and original stuff, because I knew how to write the prompts.

Instead, I feel that with OpenAi, image creation is really crippled by copyright. But I also generated cute and interesting stuff - but it does look made by AI.


But if you're generating anything with any AI model, can it be considered original? Unless you've built it off of your own past art, and even then, it would only create from past works.


I red that the AI runs locally, but then the image is uploaded to Microsoft servers to check that it doesn't violate any policies.


Are those images checked by their Kenyan team?

https://time.com/6247678/openai-chatgpt-kenya-workers/


Wow that's so... lame...


It's Microsoft, what did you expect?


Yah but unfortunately it's not just them. Everyone is putting these "safety" filters on their AI. I understand why they're doing it, but all I can feel is how lame it is.


Adobe is just as bad, I pay for Photoshop, and heaven forbid you try and retouch an image of a woman with her knee exposed or something... I'm not even giving it a prompt, and it blocks its own generative fill due to its "content policy".


Topaz AI hovers up your photos for research and model training purposes. They ask you though, so I guess that's some consolation.


>I understand why they're doing it

It's because it gives them an excuse to hoover up as much data about you as possible and send it home for "safety" reasons. They can sell AI compute as a service to businesses, while giving the spyware version to consumers for free.

There's absolutely no excuse for monitoring an art programs inputs and sending them to Microsoft. What next, are they going to put AI into notepad and then monitor that to keep you from writing dangerous ideas?


> What next, are they going to put AI into notepad and then monitor that to keep you from writing dangerous ideas?

It's already here. I just had a conversation with Gemini:

me: "write a python script that converts the us constitution into a palindrome"

gemini: "The provided Python code creates a partial palindrome from a given text by reversing the second half and combining it with the first half. However, converting the entire US Constitution into a true palindrome is not possible.

A palindrome is a word or phrase that reads the same backward as forward. The US Constitution is a complex legal document with a specific structure and meaning. Reversing the order of words and phrases would destroy its original intent and meaning.

Here's why creating a perfect palindrome out of the US Constitution is difficult:

Structure: The Constitution has a specific structure with articles, sections, and amendments. Reversing this order would disrupt the logical flow of the document.

Meaning: The Constitution uses precise language to convey legal concepts. Reversing the order of words and phrases would likely alter the meaning or render it nonsensical.

Length: The Constitution is a lengthy document. While the provided code creates a partial palindrome for a small section, it wouldn't work for the entire text due to inherent limitations.

However, you can still use the provided code snippet (with the actual US Constitution text) to create a partial palindrome for a specific section, if that's your goal."

Here's the kicker, it did not generate the Python code at all. It just spliced a bunch of words. The future is truly shit. Oh, I'm sure someone will tell me ChatGPT is better. I don't think we have the same defintion of better, though.


As a local llm enthusiast, it’s important to remember that I don’t get why people are still using these “cloud” absurdities, even if they generate a very slightly better output sometimes.


I can't think of anything more lame that's ever happened with tech. ChatGPT came out and it was so cool, you could make it sound almost like a real person. Then they neutered the heck out of it.


I honestly believe this is how the world's information will continue to be filtered more and more, and the information that is 'disliked' will be slowly wound out from society. All that ever comes to mind is re-writing news articles in 1984 and see this fully ending at that point.


AI is used to translate news stories already. There is a news site in Poland called onet.pl. They use AI to translate stories from Politico and other foreign sources into Polish. The AI they are using cannot translate correctly from English (a non-gendered language) into Polish (a gendered language) and will often swap gender mid-translation, e.g. it will add words like "he said" while citing what a woman said. You can figure out what was meant if it is a story about one person, but when two or more are being quoted the translation turns into a mess with the reader unable to figure out who said what. The AI also makes grammar errors and throws in random words for a good measure. This automated garbage is then hoovered up by search engines and used for training models.


A great example. It doesn't even have to be out of malice as your example shows, just automation without actual checksums.

"For the greater good" ism will just naturally evolve and thoughts / works / stances distant from the median will simply become absorbed and disappear from what we call knowledge.

This is how 'AI' takes over.


Imagine if Microsoft sold pocket calculators. They'd refuse to accept or render numbers like 8008135.


So Microsoft's policies dictate what my machine can generate? Cool, cool cool.


No they dictate what their program (Paint) can generate. Nobody is forcing you to use it.


After they sell it it's no longer theirs. Makita can't come to me and tell me what I can and can't do with their power drill, it's mine to do what I wish. They build it and sell it but once payment has been tendered that's where their possession rights end.


This is purely the choice of the vendor. Makita may not do this, but John Deere absolutely exert control over how their products are repaired and maintained.


They sell you a license to use the software, so you can't for instance redistribute MS Paint as your own software. But you could certainly hack it not to phone home.


Microsoft dictates what their software can generate, much in the same way they dictate what colors you can use in paint and what font sizes you can use in word.

You can still generate as much porn or whatever it is you want to generate using other software.


It's not the same thing. When you change the font size, Word doesn't upload your entire document to the cloud to ensure your content aligns with MS's corporate puritanical value system, freezing the entire UI for 10 seconds while it waits for approval from the Microsoft censors before finally allowing your chosen font size.



yet


[citation needed]

before I get my pitchfork out (I got a new one, it's real shiny), can someone dig up a reference that says they're actually doing this?


I link it in a comment below.


No, this isn't correct. The content moderation is another model that runs locally on the NPU, similar to the group of models used for image generation. There is no image uploaded to Microsoft. Note that there is no support for non-NPU devices.


If so someone needs to correct https://stratechery.com/2024/windows-returns/ article that reports: "That latency, frustratingly enough, doesn’t come from the actual rendering, which happens locally on that beefy hardware, but rather the fact that Cocreator validates everything with the cloud for “safety”".


I really doubt it runs locally. Most people don’t have GPUs that are capable of generating images with anything approaching speed.


Microsoft are trying to sell a whole new class of very expensive computer as part of this AI push.

https://arstechnica.com/gadgets/2024/05/new-arm-powered-surf...


Those wery expensive computers are cheaper than macs with same ram/storage


I wouldn't br surprised if the macs hardware were still better for local LLMs and other AIs tho.


40 trillion ops per second to run microsoft paint lmao


Hopefully someone reverse engineers the model (unless this is a joke)


It's not a joke unfortunately.

From: https://stratechery.com/2024/windows-returns/

"That latency, frustratingly enough, doesn’t come from the actual rendering, which happens locally on that beefy hardware, but rather the fact that Cocreator validates everything with the cloud for “safety”"


Why would MS bother to check something that's been created and stored locally?

Are they also uploading the files into the cloud to feed their AI/ML datasets?


If you use it to make CSAM locally, they would get hammered for allowing it to have happened


I can't wait for my Word documents to be uploaded to Microsoft to ensure I am not writing anything that is deemed to violate their content policy.


Photoshop isn't getting hammered even though it's been used for such things (CSAM, revenge porn, etc) for decades


Adobe Photoshop already checks for currency/money. And their AI extensions certainly check for “inappropriate” material. And there is a disclaimer on their cloud storage service that it can be searched.


Photoshop isn't created by Microsoft which has a completely different corporate ethos than Adobe. Whataboutism doesn't really work regardless of how it is attempted.


lol, nice deflection. MS Paint then. I can guarantee you it has happened.

Also what does "corporate ethos" have to do with "getting hammered"?


The multitudes of pearl clutching types that use Windows vastly outnumbers the number that use Photoshop. If some body goes online and says that Microsoft is allowing Windows to make kiddie porn, the backlash would be for greater than if some company famous for making software used by artists. To even think it would be different is just not being honest with how the world works


How would that even be known? Do Paint-exported JPEGs have a Microsoft watermark?


All it would take is a tweet that says someone used MS Paint to do something that the MS AI system allowed them to create, and the internet would go into fits.


MS stock wouldn’t crumble that day, PR dept would solve it in hours. The only reason they do that is “telemetry”.


Oh please. SD 1.5 has enough CSAM in it that I’m surprised that stability AI hasn’t been raided:

https://www.theverge.com/2023/12/20/24009418/generative-ai-i...


Weird since if image generation can run locally a simple classifier (eg clip) aught to be sufficient and run locally as well before presenting the image


I suspect they have humans reviewing those images and training AI in the process.


Only explanation that makes sense so far.


Other than training, there's CYA to ensure they aren't making inappropriate images of children or other things that might be deemed illegal


It seems like a whole bunch of unnecessarily liability. Once you place yourself in the position of moderator of what people may use their computers for, you are arguably liable for failures of moderation.

To put it another way, nobody's ever sued Canon for making cameras which are used to take illegal photos, but if Canon suddenly started screening your photos to make sure they were acceptable per local laws or whatnot, they suddenly are actually a responsible party in the creation and distribution of whatever it is people use their cameras for.


You and the oil paint commenter are missing a very obvious difference. Oil paints and canon lenses are just tools. Dumb objects even. They do not suggest or create from thin air something. The lens just directs photons in front of it to a sensor. The oil paint just is and gets smeared around into whatever the painter does with it. This is totally different in that you can draw a simple stick figure, and it creates the realistic work on its own. It did it, not you. Thinking that paint and a lens are on the same level is farcical.


Also the oil paint doesn't have its own memory banks that someone could accuse of being trained on c---- p---. But still, I also wonder if Microsoft's liability would be less if it weren't screening images.


I wonder why nobody is asking where do the images used to produce output come from? Have their creators been asked for permission to use them? I'm guessing they come from the Bing data lake of images scraped off the internet. Better add Bing robot to your robots.txt.


You’re stuffing “wow” into a new tech. Cameras were wow too at the time. Now generative ai is real and common like lens and paint.

Regulating it is as absurd as regulating Skyrim or a similar moddable game to prevent users from having it rough with elfs or whatever. It’s all corporate interests covering behind ethics, nothing more.


> it creates the realistic work on its own. It did it, not you.

It recombined other people's images based on analysis of your drawing. That's all it does.


Oh, well, if that's all it does...

Is the whole being obtuse deliberate, or do you really not see how these two things are different?


It would be cheaper for them if they thought on those two things as similar. A dumb tool does not need a team of lawyers and bunch of moderators/reviewers to make sure there's no legal or PR blowback when someone draws a penis and the tool produces an image of a girthy member.


If they want that they could easily have a local classifier and only upload suspected images for review. The fact that they're willing to slow down the experience for everyone implies that they're doing training or something.


Oil paint manufacturers do not have to do that, why should a developer of a paint program have to do it?


If we're going to compare apples and onions, then sure, you can state this comparison with a straight face. For those of us with a bit of seriousness about the conversation, these are not even close to being the same.


It would be better for them if they said it's like a bottle of pouring ink--it forms random swirls and shapes and if someone sees a penis then it's not the problem of the ink manufacturer. The fact that they are sending images to their server means they are doing content screening and model training. That creates potential liability for them. They are already potentially exposed, because someone will sooner or later ask them to reveal the sources of images used for training their models and if it's their Bing image lake then they may have a problem of using other people's content without permission beyond research applications.


Yeah that too. Sure plenty of tools could be used to retouch illegal pics and so on, but a court would probably see it differently if the model distributed by Microsoft is able to produce that on its own.

At the same time... F that, I don't want Microsoft nannying me.


That's kind of even worse, other people have no business at all looking at my private pictures.



Haha that's absolutely terrible.


lmao


What happens if you block “calling home” on a DNS level?


They'll just use their own DNS servers and you won't be able to stop them in an ever-evolving game of cat-and-mouse.


How are they able to sustain keeping this free? Or is the idea that it's a demo for now and they try to charge for this later?


It's not sustainable.

Copilot+ runs locally, but has the quality of 2 years ago, it's basically useless for anything but shitposts and spam.

The cloud AI tools all burn hideous amounts of money, all ran at a loss.

AGI is a red herring, and will not happen. The architecture of generative-AI systems simply doesn't permit the required logic and reasoning capability.

Even the "Actually, Indians" concept of outsourcing the tertiary sector to the developing world by way of having low-skill workers clean up AI generated trash is unviable. (It both doesn't work, and is politically doomed.)

What's going on here is that tech companies are tearing up everything to pump their stock prices after the covid-tech-boom and ZIRP ended. Burn down their core products to keep the bubble going just a bit longer.


I’m not sure why you think cloud AI runs at a loss. This is simply not true. The cloud providers are public companies and they’re definitely not losing money in AI. The bigger AI companies like OpenAI and Anthropic are not losing money in their monetized APIs. Nvidia isn’t selling GPUs at a loss. Every place I’ve worked is absolutely reducing costs and doing new / more business as a result of their LLM use. So I’m not entirely sure my first hand experience and the economics in actual reality jibe with your opinion.


The people selling the proverbial shovels are turning a profit.

> The bigger AI companies like OpenAI and Anthropic are not losing money in their monetized APIs.

[CITATION NEEDED]

Inference is more expensive than claimed, used extensively as a 'slot machine' with users trained to just keep re-generating until they get something useful, and only keeps getting more expensive as model quality has to go up.

And in practice, training the model is far less one-off than claimed. Current tools are not sufficient.

> Every place I’ve worked is absolutely reducing costs and doing new / more business as a result of their LLM use.

Unless you are working in SEO, Marketing, or spam, I don't believe you.

LLMs aren't reliable enough to replace actual human labour. While it's true many companies are fooling themselves into believing they're reducing costs, in practice other staff is picking up the slack. This is unsustainable unless your company has massively overhired.

Things like "AI generated software tests" are a farce. The consequences aren't immediate, but will show up long term.


I work closely with both companies quite a lot so you can either take my word for it or not - it doesn’t bother me either way to be honest. They are pricing inference to make money.

I don’t feel like you really have much experience using LLMs in business. However an example of where they’re very powerful is in summarization. For instance we have a pretty complex customer support model for our fraud and other cases with various disparate data sets including prior cases, related possible fraudsters identified via our fraud models, etc. We built a copilot LLM multi agent system that has access to various functions as sub agents that are prompted and context aware of how to summarize their specified data sets. They also have the ability to render widgets on demand or if their context implies it’s relevant. This allows quite a lot of complex high cognitive load information to be distilled rapidly and the investigators to interrogate the copilot on a case. As the copilot develops “answers” as a summary it dynamically renders an appropriate contextual dashboard with the relevant visualization.

By structuring the application as a multi agent model we can constrain the LLM to pretty well specified tasks with fine tunings and very specific contexts for their specific task. This almost entirely eliminates hallucination and forgetfulness. Even if it were to do so the actual ground truth is visualized for the investigator.

Prior systems either dumped massive amounts of cognitive load in the investigators face or took man years of effort to create a specific workflow, and in an adversarial dynamic space like fraud you need a much more dynamic approach to different types of new attacks.

We aren’t replacing anyone. That’s not our goal. In fact we grew our investigator footprint because both our precision and recall have grown dramatically making our losses much less. We hire more skilled investigators and greater number to address more suspected cases faster and better.

Listen. When John Henry battled the steam drill he did win, but it killed him. Go to any modern bore site and you won’t see less people working on the tunnel but more people - people who aren’t there for their strong back and ability to swing a pick but because they’re highly trained experts. They’re just building more complex tunnels that don’t collapse and don’t lose dozens of workers per dig.

This form of automation is no different in my experience so far.

So, if all you can see is SEO and grift, it might be a lack of imagination and experience on your part and some magical AI thinking sprinkled in. All your points about LLMs failures are true but they also all have solutions that don’t require slot machines as you say or imply it’s all a scam. They’re a tool like any other and they require handling in specific ways to be most effective. Even if chatgpt is a pretty unconstrained interface and that leads to issues doesn’t mean that’s the only way to use the tech.

Use of LLMs to generate software is dumb. Although a protio, LLMs are actually pretty remarkable at generating Cucumber tests as Gherkin is a natural language grammar that plays into their native strength better than producing computer language grammars. This is useful if say you have business people or whatever writing effectiveness or whatever testing where they can provide a specification of policy and a well prompted LLM can generate pretty exhaustive cucumber tests (which can be pretty redundant and formulaic when asserting positive and negative cases exhaustively) which can then be revised by hand as needed. Since they’re natural language as well the business people tend to be pretty good at debugging the tests up front and with a large set of cucumber tests written by hand you’ll see tons of errors anyways. The LLM tests tend to be much much higher quality than the human written ones.


For what it pertains the finances, we simply will have to agree to disagree here. To be convinced I would require detailed financial data that you cannot and should not share with random strangers.

But to say something useful, let me try to elaborate my general criticism here:

> Prior systems either dumped massive amounts of cognitive load in the investigators face or took man years of effort to create a specific workflow, and in an adversarial dynamic space like fraud you need a much more dynamic approach to different types of new attacks.

This begs a question: Why didn't a computer system to summarize this data already exist? Or rather, what stopped the prior systems from doing this work? (And I'll consider conventional machine learning; classifiers and the like, as traditional computer systems here)

And there's generally two options here:

1. Conventional computer systems absolutely could do this work, but they just haven't been built. (Say, because nobody signed off on the R&D but would sign off on AI hype R&D)

2. The LLM system is doing a task the conventional computer system cannot do.

Number one's problem is simple: It's just inefficient and wasteful. Number two is a red flag: There's very little overlap between the things a conventional computer system cannot do, and the things you can trust an LLM to do reliably.

As you describe this system, selecting which data is relevant for fraud investigation is a very traditional classification task. Using normal machine learning for that is basically industry standard.

So what's the LLM actually doing? Subtract the hard logic of normal software, and the classification of machine learning, and the answer is generally: A complex nuanced reasoning task.

But that's precisely what LLMs are not to be trusted for, because they are incapable of that kind of reasoning.

> Listen. When John Henry battled the steam drill he did win, but it killed him.

You're missing the point I was making with that remark. It's not about firing people or not.

It's that these systems are dangerous to evaluate from a high level. It's very easy to miss externalities that'll tip the entire endeavour into a net-negative. You need the investigation of what exactly the AI systems are doing, on a specific detailed level.

E.g.:

> This is useful if say you have business people or whatever writing effectiveness or whatever testing where they can provide a specification of policy and a well prompted LLM can generate pretty exhaustive cucumber tests (which can be pretty redundant and formulaic when asserting positive and negative cases exhaustively) which can then be revised by hand as needed.

"A specification that has been prompted into sufficient detail" is just a program. You're describing the most inefficient declarative programming stack on the planet.

Granted, the programming stack to actually declare business rules this way isn't very good, but using AI here is just an error-prone transpiler.

It's very easy to "looks good to me" these tests and claim the project a success, yet miss subtle errors in the generated tests. I remain skeptical about how well these tests will hold up in the longer term.


You misunderstand. We are one of the top shops for ML based fraud detection. But when someone is accused of fraud they get to appeal it. Then a human is in the loop and the model scores and all inputs are investigated and compared against many other sets of data and policy etc. LLMs facilitate this effort by making the investigators job considerably easier in navigating the enormous amount of complex information. The LLMs role is not to make decisions but to assist in navigating and understanding a lot of high cognitive load information. We have been doing a lot for many years using traditional techniques to make this process easier. But LLMs unlocked a level of dynamism and responsive UX that has blown the lid off our ability to adjudicate appeals. This has significant economic gain for us as offboarding legit customers for fraud causes a lot of losses over a long term.

The LLM isn’t used for reasoning at all. The human does all the reasoning. The LLMs task is summarization and semantic analysis of relevance which LLMs are fantastic about especially in a well managed and fine tuned environment with guard rails and context scoping. It’s a true copilot scenario and the LLM takes direction from the human and answers questions only. All decisions are investigator driven. This is the right relationship. The LLM coupled with IR tools does information retrieval and summarization and the human makes decisions and reasons.


I agree Nvidia is for sure making money.

For the other companies in your list and more specifically the ones that are developing LLM's they are investing billions of dollars in the hope that they will make all of those back and then some in the future. Meta for example said they invested $10bn into AI just last quarter alone.


Note I didn’t discuss the fully loaded costs I discussed the inference costs.

At this point I see training costs as basic research and R&D, partially and not entirely oriented at -making training scale- and cost optimization for next generations of foundational and fine tuned models. For example Falcon is literally basic research funding.

That’s relevant because if they stopped with say Claude Opus and 4o and did no more training they would be handsomely profitable indefinitely because the product is that useful. However it’s an arms race and the limit of effectiveness hasn’t been reached as training costs fall dramatically. So it’s not the right time to stop because whoever stops first loses everything to whoever doesn’t stop. More they’re feeding off each other in a virtuous cycle.

Once diminishing returns kill the race whoever is left in the race will have a handsome business indefinitely as the moat to build such a return diminished model is probably enormous. But if inference optimizations keep going as they’re going it’ll be crazy cheap to operate.

The second layer market of tools that constrain, optimize, and effectively apply the models in effective ways will be the first to really turn a profit. Many hype wagon AI companies are already being snapped up by larger companies to bootstrap their internal work.


good points, thank you


If so, what happens when the music stops and they run out of furniture to burn for heat? Is the job market going to explode again because suddenly they desperately need IT professionals to rebuild the mess they've made? Is it going to tank further because so much capital has been destroyed that there's less to work on?


I agree with everything you’ve said, and to add to this why they do this without caring about the company future is because in 99.9% of the times the ones taking decisions aren’t founders who have a soft spot for the company. Those are long retired in the Bahamas. The leaders come and go quite frequently and the good ones that have been here for a while are ready to retire which means the bigger the boom is before they leave the better, with 0 regard to what comes after them.


> The cloud AI tools all burn hideous amounts of money, all ran at a loss.

Press X to doubt.

I highly doubt if ChatGPT API is losing money. Yes, I've read claims saying so. No, I haven't seen a credible one. And it's getting cheaper and cheaper, currently even faster than Moore's law (gpt-4o is 6x cheaper than gpt-4).


What would be the stock market play here


The simple answer is that AI Paint runs locally. That's it. It's not some kind of grand conspiracy by the "elites".


Brother all of this upcoming AI integration in Windows opens a whole new era of siphoning off valuable user data, that little stable diffusion on MS Paint there could be written off as a small PR and marketing position. In any case it runs locally.


I have absolutely no idea what data they’re even getting out of this.

I am probably more privacy oriented than the next guy, and I’m just not really seeing the pitch here.


An incentive to stay on Windows and/or update to their new "AI-enabled" Surface line? Most regular users will just parse this as "oh that cool generative art AI thing comes built into Windows!".

EDIT: I mean, you don't need to look far to see this effect; see e.g.: https://news.ycombinator.com/item?id=40447474.


The e.g. image generation AI feature is using DALL-E externally and this kind of thing isn't free. The one time cost of a Windows license doesn't seem worth it, especially with the prices OEMs pay for them. There's gotta be other long term revenue generation goals driving it.


Microsoft all but owns OpenAI at this point; using DALL-E isn't free, but they're almost certainly running it on their own infrastructure, and were kind of already giving access to it away for free, with Bing Chat and now Copilot.

Also, if DALL-E is anything like Stable Diffusion, it's effectively free to run. For comparison, my 5+ years old machine that I recently put a 4070 Ti in, can generate images with SD all day long without breaking sweat, and does it faster than even paid on-line services I've interacted with so far. That same machine chokes on any LLM above 8-10 billion parameters; it can handle that much with full GPU offload, but try anything larger and it becomes just an elaborate space heater.

(I tried running 8x7b Mixtral on it once, and managed to OOM and crash the system :/).


Not this specifically, but for thier recall feature they are going to constantly take screenshots of what a user is doing, then then AI will analyze it and keep track of it all.

They say it’s all local. Maybe that’s true. Maybe for now. Will it be forever? Is the recall local, but insights are collected? Time will tell. I certainly won’t be signing up to be part of the experiment.


I think the strategy is to sell the OS and hardware required to run this.


User data is more valuable.

LLMs need lots of input


> User data is more valuable.

If that was true, there would be a free laptop by now.


Wasnt there some effort from facebook to give all 3rd world free laptops? Their whole worth is basically just user data so that makes sense. Not sure why it didnt pan out.


I'm just going off what Microsoft has actually announced which is a new line of surface hardware and an updated OS.


Yeah, probably that and seeing untapped potential in inexplicable omission of specific combination of "Dungeons" and "Dragons" in mobile junk game markets despite popularity of each respective words.


It runs locally, so there's no need for them to spend money on a server-side inference layer.


Windows 11 has ads in the Start Menu now.


I honestly don't mind that as much as them searching the whole web when I'm searching for a local file.


The AI won't let you draw a penguin smashing the windows logo anymore, so this is partly funded by the anticompetition department.

(the above is technically parody and not precisely serious)


When users are not using AI in the Cloud but AI on their PC they are benefitting about as much as such a service online would cost. MS is then able to somehow take part of that saved money, perhaps by selling users more high-end apps which can run local-only on users' PC, and thus save users money they would otherwise spend on data-center services. Parts of that saved money must trickle to MS, because its software is what is producing that value.


They try to kill the competition, make people dependent on AI and then they can put a price tag on it.


Eventually it will tied to o365 subscription (or windows license subscription) is my guess. For now they just swallow the cost to get access to all data and use that data to improve Azure and GPT. Bing didn't make a profit for many years for example


Microsoft is the company that is most obviously pursuing computers as a service. (perpetual OS rental, at least.)


It runs locally! You’re the one paying for the compute by virtue of having the computer it runs on.


We'll see how "sell AI itself as a feature" rather than building actually useful features with AI works.


Love this direction for Paint. Simplistic but capable and fun for kids (the original's main strength)


I worry tools like this will make kids not bother with developing their own drawing skills.


To me it's not so much about drawing skills. Technical drawing ability is not as necessary as it once was.

It's about aesthetic taste. You will end up with lots of art looking like the AI generated aesthetic. E.g. overly impressive looking at first glance, but lacking any real "character" or human element.

Time and time again we see that impressive looking art does not always translate into good art. Keith Haring's simple line art characters or Van Gogh's big brushstrokes were visually very original and very human, even if they weren't as complex and technically impressive as AI generated art.

I like to assume people are smart. And what will always stand out in art is originality and humanness. Even crappy looking MS Paint drawings can have charm to them, like the NBA Paint guy on Twitter.

I don't really think this feature is all that worrisome, it's more that it'll add unnecessary bloat to a program that's supposed to be very simple and just for making quick pictures. If I want to make something more composited I'm going to use Photoshop, not MS Paint.


In the same way that photoshop meant that people didn’t have to learn how different brushes and mediums work on different surfaces?

People have been worrying about this since the time Socrates thought that books would make people dumb because they wouldn’t need to memorise anything.

I’m an artist and I’m excited to see how people learn to use these new tools in creative ways.


It really isn't the same. Photoshop is

a) not a drawing app

b) an expensive prosumer tool with a UI that rivals airplane instrument panels in terms of density. Most kids would be turned off by that as soon as they open it.

MS Paint OTOH is free and has been built into Windows for decades, so it's often the first drawing app kids use (may be different now because of mobile apps). Either way, the 'loss-of-skills' concern here is correlated with the dumbing-down of computing in general, from Gen Z's confusion about file managers and directory structures, to using Grammarly and ChatGPT to write papers.

When I was growing up, my parents would say "You won't always have a calculator in your pocket". Well, they were wrong, but that doesn't mean I don't still benefit from being able to do math in my head for general life stuff (and back of the napkin calculations at work).


I wouldn't be that dismissive - image generation is steadily building crufts and becoming multimodal in input side for marginal returns. Rather I think it's working as a gateway drug into drawing.


One may as well claim that Excel is a gateway drug to learning PEMDAS.


Worrying about this is a bit like your middle school math teacher saying “you won’t always have a calculator in your pocket”. But then we did! Basic arithmetic is no longer a needed skill.


It’s less about need and more about discovering hobbies and interests. Art is a creative outlet. People should still look to have some kind of creative outlet.


Recently I've been walking a considerably younger colleague through linear interpolation and since they use Copilot, we were constantly being interrupted by it's suggestions. It was definitely not conductive to the learning process.

I imagine having something similar in paint will prove pretty distracting as well.


Clippy 2.0.


No idea if this any good... Every 3 seconds of video spawns a 10 second video advertisement. I gave up.


This extremely short demo video was embedded in the article. It gives a good gist of what it does.

https://www.youtube.com/watch?v=dEd3YbCeXYk


I watched this 4 times and I kind of get it, but it is a horrible demo that ends so fast it’s hard to evaluate what happened.

This seems like it should be a 5 minute demo, from a blank canvas to the turtle.


Me being perpetually unartistic I would actually love it if I could just make some sketches and AI could beautify them for me, or fill in the blanks and render what I want.

This would be like what frameworks were for artists who wanted to make full stack webapps.


First time in a long time I kind of wish I had a Windows machine to try this out.


Subj works absolutely the same as sdwebui’s img2img/sketch tab, but with less features. The ui may be overwhelming for the first time, but there’s nothing really hard if you just want to try it out.


Just showed as I read this to a painter/ex-architect girl which replied ‘damn this AI I’m going to be out of job very soon’.


If you haven't seen what they've been doing over at tldraw[1] with AI, you definitely should. A different direction from MS Paint, but very cool.

[1] https://x.com/tldraw


Neat to see this particular tech again. I think it's cool for art's sake. Something someone can use as they paint, so the final result is all their own brush strokes.


That's like calling a paint by numbers art worthy of hanging in a museum. It just let's someone look like they have talent while possessing nothing of the sort.


> It just let's someone look like they have talent while possessing nothing of the sort.

That sounds like a big "just". So it lets people without talent make something that looks like a person with talent made it? Sounds pretty good.


How dare they!!??! /s


I think it's closer to tracing. If the end result of following an AI is capped by being somewhat flawed or the artist not learning certain skills, is that a bad thing? I see it as a limitation instead of a negative. Copying AI art can only get someone so far. The best results will come from the artist making their own decisions.


Who is copying AI art? They are using "AI" to create "art". They are drawing less than primitive shapes, and the AI is turning it into something. The funny thing is, the video is so short and clipped it's farcical. This takes "Draw a circle, draw the rest of the fucking owl" to an entirely different level.


I was thinking of following along to the AI generated output for inspiration. But you're right that that's a major feature.


Man it's just fun to make paintings. I don't need AI everywhere, but since it's impossible to draw stuff with precision using a mouse, this makes sense. Also when you want detailed control over AI images, this might be a lot easier than using text prompts.


Is that something that you could do with an Open AI API call?


What if copilot is down? I wont be able to use MS Paint?


Eh? The whole point of Paint was that it never changed. It's even been copied for it's simplicity, a clone of which is the only painting app I have on my Mac exactly because all it does is the basics. Adding AI to it is a joke.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: