Photoshop for Text (stephanango.com)
183 points by kepano on Oct 18, 2022 | 116 comments



To me, this article misses the point, but only slightly. If we compare the proposed Photoshop for text and the actual Photoshop for images, those are very different tools in how they work with their respective mediums.

Photoshop doesn’t use AI models to generate new things for the artist, photographer, whoever (it does have generation of course, but not in the same way). A tool like Photoshop for text, in the way described, seems more like a writing aid than a writing tool; something I can give vague commands like “make this paragraph more academic” (whatever that means), and it’ll spit out something approximating the academic style it analyzed. Whereas “auto balance the levels of this image” is much more concrete in what it means; there’s no approximation of “style.”

I feel like Photoshop for text should be an editor that cares more about the structure of stories and who’s in them, with ways to organize big chunks in easy ways, rather than something that generates content for you.


> Photoshop doesn’t use AI models to generate new things for the artist, photographer, whoever

It certainly does have this, such as tools to remove objects and paint over them as if they were not there, automatic sky replacement (replace a cloudy sky with a sunset), super resolution (AI upscaling) and a range of what they call 'Neural Filters'.

You want your low quality, boring midday image of some famous bridge to be high resolution, taken during sunrise, without that person riding a bicycle? Photoshop will do it with very little user skill or input.

Some beta features include style transfer, makeup transfer and automatic smile enhancement.


Yeah, but nobody is buying Photoshop for the inpainting features. Photoshop's bread-and-butter is raster graphics tools, and almost all of them function deterministically.


Content-aware fill has been in Photoshop since the CS5 days, so at least back to 2010.


Yes, and it sucked for anything more complex than filling sky with more clouds or foliage with more leaves.


Not sure what you mean by "deterministically", but most models are deterministic during inference: given the same input or prompt, they'll generate the same output.


Are they? I thought there was always a kind of “random parameter” or random step midway?

Hope this question is not too dumb!


Usually not, perhaps noise filters, but mostly not. There was an amusing case of reversing a twirl filter a while ago: https://boingboing.net/2007/10/08/untwirling-photo-of.html


Surely that twirl effect isn't made using AI but just a geometric function, right?


Thanks a lot. I was mistaken.


It depends on the algorithm, but many models will regenerate the same output if you start with the same seed.


Is it common and/or expected to have control over the seed though?


Yes, if you are fine tuning some other parameters you need to keep the seed the same to be able to have any idea what you're doing.
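
For example, a minimal sketch of seed-pinned generation with the Hugging Face diffusers library (model id and prompt are only examples; assumes a CUDA GPU):

  # Same prompt + same seed -> same image, so tweaks to other
  # parameters can be compared apples to apples.
  import torch
  from diffusers import StableDiffusionPipeline

  pipe = StableDiffusionPipeline.from_pretrained(
      "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
  ).to("cuda")

  g = torch.Generator("cuda").manual_seed(42)
  image = pipe("a famous bridge at sunrise", generator=g).images[0]
  image.save("bridge_seed42.png")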


Thank you! I was mistaken then.


Tense and plurals are things that seem like they would be helpful for a computer to keep consistent while you edit, so when you change the subject from plural to singular, it tacks the "s" onto the verb for you. If you had

"Families enjoy this restaurant"

And started typing and changed "Families" to "our family" this "photoshop for text" would change "enjoy" to "enjoys" for you.

Enough little facets like that, and I could see it being useful.
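
A minimal sketch of that one facet, assuming spaCy and its small English model are installed (the parse, and therefore the fix, is model-dependent):

  # pip install spacy && python -m spacy download en_core_web_sm
  import spacy

  nlp = spacy.load("en_core_web_sm")

  def fix_agreement(text):
      doc = nlp(text)
      out = [t.text_with_ws for t in doc]
      for tok in doc:
          # singular noun subject (NN) governed by a plural-form verb (VBP)
          if tok.dep_ == "nsubj" and tok.tag_ == "NN" and tok.head.tag_ == "VBP":
              verb = tok.head
              out[verb.i] = verb.text + "s" + verb.whitespace_  # naive "add s" rule
      return "".join(out)

  print(fix_agreement("Our family enjoy this restaurant"))
  # -> "Our family enjoys this restaurant"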


It's like content-aware text refactoring


Clippy on steroids.


The article is more about user experience than implementation. When you apply an effect in Photoshop you can change all relevant pixels at once. Whereas we don’t have good tools to change all words in a document at once (while retaining the intended meaning)


I get your point, but your comment made me search for a Photoshop Stable Diffusion plug-in. I had no idea.

I’m decent with Photoshop but have not had a need for it lately. This makes me want to fire it back up.

Each video is 1-2 minutes long. Mind blown again.

https://youtu.be/XdctF1uTLBo

https://youtu.be/mn1PV6HqXGU

https://youtu.be/Wo6ZDYFCWTY


Oh that would be cool. Write a paragraph about a red ball, then halfway through, decide it's a blue ball. The editor then highlights all occurrences of the variable and rewrites them for you. Like the "Refactor local variable name" bit in VSCode.
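
A toy sketch of that prose "rename" in Python, assuming the entity appears as a literal phrase (real prose would need coreference resolution to catch pronouns like "it"):

  import re

  def rename_entity(text, old, new):
      # \b keeps "red ball" from matching inside e.g. "red ballast"
      return re.sub(rf"\b{re.escape(old)}\b", new, text)

  story = "A red ball rolled by. The red ball bounced twice."
  print(rename_entity(story, "red ball", "blue ball"))
  # -> "A blue ball rolled by. The blue ball bounced twice."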


One big difference between image editing and text editing:

I can see the effect of a change to an image at a glance. Judging whether it is a much worse outcome usually takes seconds.

With wholesale changes to text, seeing if it inadvertently makes something worse takes minutes. It's orders of magnitude slower. That makes the editing process much slower, whilst still fraught with risk.

I am not talking about not trusting AI to get it right, but about a general change that happens not to work. Maybe your academic writing was better off with one narrative paragraph. Maybe you want to change tense but forgot to mark a quote, and you are putting false words in someone's mouth. We proofread changes by human editors. AI editors will require the same.


This got me thinking.

An image is 2D in space, 0D in time. Because we perceive 3D/1D, i.e. one dimension above in each aspect, we can see a whole image at once.

We can't see an entire video (2D/1D) at once - we have to move in time. We can't see an entire 3D object at once either - we have to move in space.

Technically, text is just as visual as an image. However, meaning is conveyed by the sequence in which we interpret the symbols, and that sequence is 0D in space, 1D in time.

Maybe beings in a 2D time dimension would be able to parse text instantly?


You can parse text instantly too; there's just a limit of a few words that you can perceive "at once", as you say. Editing three whole pages at once is like saying that in Photoshop you can edit a whole magazine's worth of images at once; you also can't.

As with anything, the shorter the feedback loop the better. The difference is that an image has a much higher information density than text ("an image is worth 1000 words"), and you can make sense of it more easily.

If you want to edit lots of text and quickly see if it flows well, one approach would be to map the textual representation to something we can perceive faster. In the same way, some systems use different sound tones for different actions, making a skilled operator able to detect mistakes by "listening" to the UI. If you can represent textual changes in an aggregated visual form, you might get what you describe.

For example if you could make nice grammar produce a nice musical sound when "read" by a computer program, you could potentially assess for correct grammar quicker by listening to the whole thing in much less time than you'd take reading it.


That brings to mind the quadrivium --- the second part of the seven liberal arts:

Arithmetic: quantity.

Geometry: quantity in space.

Music: quantity in time.

Astronomy: quantity in space and time.

Back in my uni days, the student centre featured a painting which was a representation of a musical piece --- something classical, possibly Beethoven's Fifth. It displayed in space (or more accurately, in a plane), what was usually performed over time. The painting was more conceptually interesting than visually appealing, though I like the idea.

Whilst I think you have a point re: images, there are often those which reward a more measured appreciation. A "where's Waldo" type visual puzzle might be a more trivial example but there are images which reveal themselves over time.

There's also what works leave us with. Text ultimately conveys, well, textual or verbal information. Speech is similar. Both can deliver a mood, though that's not essential. Music on the other hand seems to me to be far more emotional. Images in their simplest form are literally iconographic, more representing something than portraying it, though with detail they tend toward the latter. (Plastic arts such as sculpture seem similarly iconographic, and of course, the original icons were often such portrayals.) Video and drama seem to me more related to music than texts, working on moods and emotions. That though is occurring to me as I write this, as well as the realisation that both often rely heavily on a soundtrack. In the case of opera or musicals, to a dominating extent, less so in a straight stage play. And of course, on television, there's the infamous laugh track to guide us in our emotional response to a sitcom, standing in for the immersive live-audience experience.

Back to the suggestion: the idea of being able to autotune a text, so to speak, seems ... possible, now or soon, with advances in AI, GPT3, and the like. But as rocqua said, far harder to take in at a glance as compared with visual or video arts.


Watch the feature film 'Arrival' (2016) for an implementation.


Only if the image is small (within the view span). If the image is huge, the brain needs to hold context while you finish seeing it, similar to how the brain needs to hold context while you finish reading a passage to derive meaning. (This is the sphota theory of grammar, as given by the ancient Sanskrit scholar Bhartrihari.) Reading is zoomed-out seeing. (Edit: Abhinavagupta further showed that sound naturally works on this principle, where the meaning of a word is concluded as the sounds finish hitting the eardrums.)


Maybe we can develop a number of text classifiers that would give you hints on the effect of the changes without you having to read the whole text. Or maybe even use AI proofreaders that would give you an opinion about the new text. It is an interesting field for sure.


It could highlight differences for you. Current models already seem competent enough to take into account a lot of context and not to alter the meaning.
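
Even without a model, a plain word-level diff from the standard library gets most of the way to that highlighting:

  import difflib

  before = "Families enjoy this restaurant."
  after = "Our family enjoys this restaurant."

  # Diff at word granularity so a rewrite shows exactly which words changed;
  # lines prefixed with -/+ are the words removed/added.
  for line in difflib.unified_diff(before.split(), after.split(), lineterm=""):
      print(line)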


I cannot stand the idea of this. I know it's coming, I know I can't stop it, but it will be a catastrophic loss.

Most people are bad at writing. This will make them "better" at writing, but only in a certain way. I love reading things people write, because it gives me a window into their mind, how they think, who they are at a deep level. That is the joy of reading, and the structure of writing is a big part of that.

Now, a lot of people's writing has become homogenized by the computerization of our world already, but only at a low level: spelling, basic grammar, and so on.

When people have the ability to inpaint whole paragraphs, dreamed from the blob of internet text (which is mostly corporatized, computerized, email-ized, sterile in the way described above) we will lose something essential.

And another problem arises. I send an email asking something, to a coworker or to a friend. They inpainted their response. Did they really understand what I was asking? Did we really communicate at all?

Images, I guess, suffer from the same problem. But images are less interpersonal. They are communication, but not communication like writing is communication. In nearly all circumstances, images are less subtle than words are (the subtlety of visual communication happens irl, where such machines have yet to insert themselves).


A century or so ago as business correspondence became standardised, what emerged were form letters and templates for various correspondence.

I'm not sure when this practice started, and it may have gone by several different terms --- "sample business letters", "model business letters", and "handbook of business letters" are terms I'm coming up with.

I'd consider 1960 to present to be fairly recent but two books from that period are the McGraw-Hill Handbook of Business Letters (<https://www.worldcat.org/title/28181038>) and The Complete Book of Business Letters (<https://www.worldcat.org/title/1975991>).

I'm finding a similarly titled work or collection from 1885, though I'm not sure it's the same concept: Sample business correspondence, 1885:

<https://www.worldcat.org/title/831831066>

What all of these address, however, is the fact that most people aren't good at composing correspondence, and/or that businesses benefit by standardised forms of communications.


> "When people have the ability to inpaint whole paragraphs"

Wow. Using the term inpaint (from AI art apps) illustrates your point with great clarity, at least for me.


"You have been a great employee... <x> .. had to let you go".

Textshop, in-paint the rest with something nice, will ya.. oh and add a friendly sign off.


Holy cow, that's a multi-billion-dollar industry of corporate doublespeak right there.


Read it rote out of a notebook for extra sincerity.


YES! Hugely important point.

Worse yet, not only will the 'in-painting' obscure the writer's meaning, or even whether they had meaning at all, it will ALSO render that meaning more generic, and eliminate the most information-dense, surprising bits.

All of these tools are merely synthesizing new text/images/code/etc. from millions of existing examples. When there is any doubt about the intent or the output, it fills in the MOST EXPECTED output. It does NOT fill in the possibly intended but unique image, phrase, code that would be the output of a unique insight. Real brilliance will be simply lost in the sauce.

Ugh. I won't be using these, and I'll shun those who do.


Right - in terms of information theory, automatically generated infilled data communicates zero bytes of additional information. It could be compressed away entirely, losslessly, and I could instead generate it myself. There is no need to waste bandwidth sharing it.

Communication only has value if it is surprising. That much has been known since Shannon.


Language models don't pick the top choice all the time, and even picking the top choice carries some amount of information, more precisely -p_token*log(p_token). They return a distribution we sample from. Sampling adds surprise and detail that was not originally there. Even more, if you have some sort of verification going - a second model to rate the outputs of the first, a math verifier or code executor when available - then language models can surprise us with interesting solutions to our tasks.
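
A toy sketch of both points in plain Python: temperature rescales the distribution before sampling, and the surprisal of the chosen token is -log2(p) bits (the probabilities below are made up):

  import math, random

  def sample(probs, temperature=1.0):
      # Rescale log-probabilities by temperature, renormalize, then sample.
      logits = [math.log(p) / temperature for p in probs]
      z = sum(math.exp(l) for l in logits)
      scaled = [math.exp(l) / z for l in logits]
      r, acc = random.random(), 0.0
      for i, p in enumerate(scaled):
          acc += p
          if r <= acc:
              return i, -math.log2(probs[i])  # token index, surprisal in bits
      return len(probs) - 1, -math.log2(probs[-1])

  # Even the most likely token carries ~0.51 bits; rarer picks carry more.
  idx, bits = sample([0.7, 0.2, 0.1], temperature=0.8)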


That is great when we are looking for interesting solutions. For example, it looks like GPT-3-type systems might be very good search engines for obscure topics, getting them to finish paragraphs from their enormous training base. I've found a few tantalizing bits that I couldn't find on Google or DDG searches.

But, when they are trying to express what WE are saying, it looks like a very lossy solution at best.

I've had a deposition taken with an "AI" stenographer, and it was horrific, frequently reversing the meaning of sentences I said, or replacing an uncommon name with a common name (e.g., "John Kemeny" replaced with "Jack Kennedy"). Of course the transcript LOOKS great; it doesn't have any of the "(unintelligible)" notations of a human transcript. It also does not go back at break points and ask for proper spellings of names, addresses, etc. like a human transcriber.

This is in the context of a legal trial with consequences, and I'm horrified to see this kind of crap passing for usable products, and here we are looking to foist it off on the general public as writing tools. We're forking doomed by smart idiots looking to make a quick buck with novel "tools".


I don't know if this is quite correct. The inpainted text has zero bits of information only if you already have on your end the exact, very large model they would have used. If you don't, and they only sent their prompt rather than the full text, you're in for a bit of a download.

Put another way, their prompt is arguably a pointer to data buried in that model. You're raising the question of whether they should only send the pointer, or just cache the query result, in a sense, and save you the trouble of looking it up.


EXACTLY what I was pointing to, more clearly expounded - thx!


It's interesting to imagine what the interaction model for this will be -- is there a sense in which this auto-generation will be like collaborating, and so it just changes what we think of to constitute the process of creation?

And maybe that's not as bad as it seems right this moment. Once upon a time great painters actually made their own paints. These days we wouldn't think about that skill as in the necessary catalogue of an artist, and in a bunch of ways -- most ways -- we're the better for it. Perhaps something similar will unfold here.


You know, something interesting is happening to me here.

I've never been upset about Stable Diffusion, because as you say, once upon a time, you had to physically paint an image to be visually creative. I never posted a comment about SD saying "I cannot stand the idea of this. I know it's coming, I know I can't stop it, but it will be a catastrophic loss." Now, at the suggestion that it might happen for writing, all of a sudden it seems wrong! Someone else noticed this in the other thread about Copilot: HN is not upset about Stable Diffusion, but seems to frequently be upset about Copilot! I think you may be right actually -- calm down, it's just a tool.

I guess I want to revise my prior comment. I don't just want to talk to a machine. Just like I don't just want to see Stable Diffused images. The human part is essential, it's the heart of the thing. My intuition is that you're supposed to augment the human part, not replace it.


I totally get your (original) point -- I'm an (amateur) fiction writer trying to make myself feel better, a bit. But everything evolves. Think of how Magnus Carlsen trains for and plays chess, vs how it was historically done. No point trying to resist the forces loose in the world; be one of the first to embrace them. I'm curious what this means right this moment for an artist upon the advent of SD and the like.


The difference is that images are expected to support the message. They're art, figures, illustrations. If I send someone a generated image, I still picked the image. It's still my message. If I send someone generated words, I did not pick the words. It's no longer my message. I'm tasking a machine with choosing my thoughts. It's no longer augmentation.


Or maybe HN is right about the tool where we have the most knowledge, and artists are right about the tool that does things they understand.


I was thinking HN is afraid of tools that cheapen its own work, whilst artists are afraid of tools that cheapen their work.


Yes, that's the obvious cynical interpretation. But our beliefs aligning with our self interest doesn't automatically make us wrong, especially in areas where we have expert knowledge. It could just be that our interest aligns with the general interest.


It will add a new level of suspense. Did they use the passive voice because they are trying to distance themselves from the decision, or did the algorithm think that was the best choice here?

I agree with you overall. It will likely have a similar impact to Grammarly and similar services.


Images have much less of this problem. They are not the raw result of cognition, so changing images doesn't change someone's voice. Instead, images are always interpretive. The camera doesn't capture what the eye sees. Often it takes a lot more interpretation than what a camera does to make a JPEG to end up with something like what the eye saw. In that process you can also change an image into something your eye would enjoy seeing.

Alternatively you can lie and say 'this is what I saw' about something that is not even close. Once images are used to promote falsehoods, your worry becomes true. But much editing, either for accuracy or for beauty, has no such harmful effects.


> Most people are bad at writing.

True, but most people don’t write. ~Everybody reads, and people are, on the whole, supremely atrocious at it.

The problem of writing getting “easier” is trivial compared to the problem of bad reading comprehension.


People go jogging and ride bicycles, even though cars exist, because they enjoy it and to tone their bodies. It's more effort, but it does things for them the car does not do, including provide thrills or promote well-being.

That is to say, these tools will exist and they will be used. But people will still write and make art - often perhaps using these tools in some way - because they'll have the time and find it rewarding to do so. And others will also assign some value to this. We might briefly get addicted, and then have a society-wide discourse on what healthy use is, similar to social media.

Isn't it compelling to wonder what humanity will decide to do with technology if technology were limitless? As in, what essentially human choices will we make in what, when and how to use technology? (Singularity-themed sci-fi has tried to provide some answers since the 90s.)


Yes, and I mean, while you could argue that jogging or riding bicycles is a form of self expression, I don't consider them so the same way I consider writing to be so. Cars help you get from point A to point B in a way that is qualitatively different than walking; but both are ontologically different from writing, because writing is not a thing that a single person does -- it's an act of communication. Writing and reading, and the dynamic of both playing out among people, is related to the formation of the self for both parties in the deepest and most crucial way.*

Easier put: the joy of writing and the joy of reading are strongly linked. I don't write things with the expectation that nobody will read them, and I don't read things with the expectation that nobody wrote them. Or, in this case, a machine.

And I think a "writing photoshop" will be much harder to detect than an image Photoshop. Did I picture the author as a white collar Yale graduate because they are one, or because the machine told them that was best?

* For example: I would sooner give up my ability to walk than give up my ability to communicate myself to others; that's the difference I'm talking about.


What do you think about text transformations like the ones described in the essay? Not inpainting but tools that could help “good” writers, like changing narration in a chapter from first person to third person in one step?


Changing perspective is not as easy as changing personal pronouns. It affects the knowledge of a character as well. First person perspective has insight into the main character's inner thoughts and feelings, since they are the narrator; third person does not. Then there are third person limited and third person omniscient, which require totally different approaches and need different twists and turns to be believable. Changing the POV basically requires a total rewrite of the story.


I am worried about something else

The authors of most shared articles and most comments are not even passing a “turing test”. In the vast majority of cases the readers just consume the data.

With GPT-3 we can already make “helpful and constructive”-seeming comments that 9 times out of 10 may even be correct and normal, but 1 time out of 10 are kind of crappy. Any organization with an agenda can start spinning up bots for Twitter channels, Telegram channels, HN usernames and so on, and amass karma, followers, members. In short, we are already past this point: https://xkcd.com/810/

And the scary thing is that, after they have amassed all this social capital, they can start moving the conversation in whatever directions the shadowy organization wants. The bots will be implacable and unconvinced by any arguments to the contrary… instead they can methodically gang up on their opponents and pit them against each other, or get them deplatformed or marginalized, and through repetition these botnet swarms can get “exceedingly good at it”. Literally all human discussion (political, religious, philosophical, etc.) could be subverted in this way. Just with bots trained on a corpus of existing text on the web.

In fact, the amount of content on the Internet written by humans could become vanishingly small by 2030, and the social capital — and soon, financial capital — of bots (and bot-owning organizations) will dwarf all the social capital and financial capital of humans. Services will no longer be able to tell the difference between the two, and even close-knit online societies like this one may start to prefer bots to humans, because they are impeccably well-behaved etc.

I am not saying we have to invent AGI or sexbots to do this. Nefarious organizations can already create sleeper bot accounts in all services, using GPT-4.

Imagine being systematically downvoted every time you post something against the bot swarm’s agenda. The bots can recognize if what you wrote is undermining their agenda, even if they do have a few false positives. They can also easily figure out your friends using network analysis and can gradually infiltrate your group and get you ostracized or get the group to disband. Because online, when no one knows if you’re a bot… the botswarms will be able to “beat everyone in the game” of conversation.

https://en.m.wikipedia.org/wiki/On_the_Internet,_nobody_know...


Another reason the future of social media is invite-only. People obviously aren't invisible BS detectors, but if things really get that bad they're not just going to take it indefinitely. They'll notice and adjust.


By the way, if you are worried, it isn't true that this is "coming." A form of it will come but like self-driving cars, it will be more hype than substance.


Inpainting entire paragraphs will also remove the heuristic of “too stupid to merit a reply” from their disorganized sentences.


The piece struck me as a dystopian fantasy. And like some such fantasies written in the past, I agree: it’s coming.


This is not comparable.

Photoshop lets a conscious hand change the medium. Now we have tools that generate the message out of thin air. We're about to outsource human expression to machines.

And why? To automate things we don't want to write about, and automate reading them. I picture a future of machines talking to each other in increasingly clever human language while humans on each end just get the straight talk: "Did you put the cover on the TPS reports?" "Yes, I did. I got the memo."

This also ushers in a terrifying era of communication. "I'm so sorry for your loss", says a machine on behalf of someone, using all the right words but not feeling any of them. "Thank you" replies another, on behalf of someone else.


Your last paragraph is already close to how birthday greetings work on Facebook.


I’m confused. What’s the difference between “I’m sorry for your loss” written by a spell checker versus a human?

The effect is the same on the reader. This looks hand-wavy to me.


I expanded my thoughts into a short post, if you care to read it: https://nicolasbouliane.com/blog/machine-talk

Spelling and grammar checkers are amazing. It becomes a problem when we outsource expressing ourselves to a machine. If kind, thoughtful words come from an artificial brain, expressing sympathies just becomes a box-ticking exercise.

We mocked "press F to pay respects", but we've already built it in real life. "Congratulate John about his new job" (LinkedIn), "Wish Jane a happy birthday" (Facebook). Having the machine write the message for us is the next step, and I find it terrifying. What value is there to a gesture if it asks nothing of us? No thoughtfulness, hell, no thought.

At this point, you might as well go all-in. Attach a pen to a CNC, connect AI to it, and offer heartfelt handwritten notes as a service. Offer to connect it to social media and auto-detect heartfelt handwritten letter opportunities. Have that letter in the mail while the corpse is still warm.


I'll add one to your facebook/linkedin example: Hallmark cards.


Pretty much, but without the effort of going out, then choosing, buying and signing one.


If you'd like a sample of what the author of the post is referring to, go check out the Hemingway text editor [1]. It's very simple, and as far as I know makes no use of the advanced language models that have been in vogue for the past few years. What's really opening the door is thinking of these language models as hyper-efficient compression. With Stable Diffusion, we can pack enormous amounts of artistic tedium into simple text prompts, with the weights getting small enough to fit on a flash drive [3].

Grammarly [2] also has a desktop app that appears to offer editing advice, although generally I think they're focused on grammatical correctness over anything else.

The question I'm left asking myself at the end of the article is: to what end do we need to edit text like Photoshop? Part of me sees this "Photoshop for Text" as something akin to "No Code" tech stacks. Good No-Code/Low-Code solutions usually allow you to build specific classes of products (websites, 3D assets) in ways that are faster than the status quo. But anyone who spends enough time in a No Code stack eventually hits the wall where the people who designed the tool had to sacrifice the flexibility of text for the convenience of a GUI.

I yearn for the day that we can set a language model loose on something like the NCBI database or arXiv and have it point out open problems in the field to new PhD students. Or have it figure out whether my ablation studies make sense. Or an AI that can generate math proofs for me. A lot of this is linked to model interpretability and understanding, but I think the work that DeepMind is doing shows that there might be a way to use this stuff in expert domains sooner than we think.

1: https://hemingwayapp.com/

2: https://www.grammarly.com/

3: https://andys.page/posts/how-to-draw/#


Autocomplete in Gmail has been getting more and more robust. At first it only suggested grammatical corrections to words. Later it started giving advice on better sentence structure, and then it just started suggesting whole sentences. I loved each of those steps. More often than not it just says what I wanted to say, with fewer button pushes, and without all of the mistakes that I make as a non-native English speaker. I sound smarter in Gmail and I like it.

If I didn't have this experience, then giving the machine any input on what I write would seem crazy to me. I would think that language is too personal, too contextual, that I need control over every word and every letter.

But now I love writing with the help of the machine. It still feels like me speaking, the machine doesn't add any extra context that I don't approve of. It really feels like the messages are still mine, and the autocomplete just helps me extract my thoughts from my head in a better and more effective way.


Imagine how your brain is atrophying.

  1. (of body tissue or an organ) waste away, especially as a result of the degeneration of cells, or become vestigial during evolution.

  "without exercise, the muscles will atrophy"


I don't think it's atrophying. We said the same thing about spell checkers, and my spelling seemingly hasn't suffered.

I'm not a native speaker. It's nice to have training wheels sometimes, even for a language I'm familiar with.


These tools will only become more pervasive, so why does it matter if that part of the brain "atrophies"? I'm sure people had the same worries about mental math during the rise of the calculator.


Just imagine how flaccid the math part of most of our brains must be. Quick! What's 67 * 42?


There's an area where we've been way, way ahead of the curve for 70 years now.

It's sound.

Long before GPT, image synthesis, video deep-fakes and these imagined "Photoshop for words", we had sound synthesis.

That's a very useful marker. Because we can read the things people were saying about the future of sound, their hopes, fears and predictions as far back as the 1960s when Robert Moog and Wendy Carlos were patching modular synths.

Most of the fears and predictions turned out to be rubbish. Musicians, orchestras and live events didn't get replaced. Instead we invented synth-pop bands.

And many of the things technologists imagined people would want to do, turned out to be way off the mark. To my knowledge Isao Tomita was the only talented artist to "replace an orchestra" with synthesisers. Most people who used the tools "as intended" were artless, and forgettable. Everyone else ran riot in the parameter space - messing and subverting the technology to get the weirdest punk-ass squelches and wobbles possible.

So I always have to look on these "How synthetic X is going to make the real X obsolete" with a pinch of salt.


I think there's a difference between synthesizers, which I would compare to something like different paints and brushes and papers, and then Photoshop and Corel Draw and whatnot; and where image generators are heading, which is more analogous to automatic music generation.

I will also put it forward that, for reasons I'm ignorant of, the eye seems to be more readily fooled than the ear. 20 years ago, with crappy tools, all I had to do was smudge and clone out a hydrant in a photo and it would effectively be gone for 99% of observers. But similarly primitive ways of trying to change or alter a sound file were immediately noticed by all listeners.


> So I always have to look on these "How synthetic X is going to make the real X obsolete" with a pinch of salt.

From your comment, one might think the linked article is far more fear-mongering than it actually is; I gathered a mostly optimistic tone from it.

The final paragraph -

> While some of these capabilities sound a bit scary at first, they will eventually become as mundane as “desaturate”, “Gaussian blur” or any regular image filter, and unlock new creative potential.


The analogy to sound synthesis is strange to me, as both a musician whose primary instruments are acoustic guitars and computer software, and as a writer of poetry, prose and essays. I can’t speak to fears of synthesis obsoleting other music performance; mostly what I’ve heard expressed is snobbish purity, that it’s not “real”—either real talent, real music, or real performance. But at the end of the day, synthetic instruments are still instruments. Fundamentally a primitive like a string or a reed attached to something which lets you make sound from it. A “simple machine” in the mechanics sense. They’re not composition synthesis, which is definitely an area of exploration now, and which is more akin to image and video synthesis.

Photoshop conceptually sits somewhere between these two extremes of electronically-aided creation, but much closer to sound synthesis, than what the author is hypothesizing. I can’t even think of a text analogy for sound synthesis as you’ve described. The least nonsensical imagined example I can think of is “this word does not exist” (as in word synthesis), which would be more valuable as a game or a gag than as a tool.


While I don't necessarily disagree with the sentiment, I think the antithesis to your argument is that where synthesizers "failed", modern sampling technology has crept in, especially in scoring. It's less common in the feature film industry, but a significant portion of TV and game scores are produced entirely with virtual instruments, or at least have a very heavy sampling component (e.g. recorded solo instrument against a synthetic backing track). Most people can't tell, or don't care - as will most likely be the case with AI-generated or AI-assisted content as it moves forward into various domains.


Some of the stuff he describes, like "summarizing text" and such, is not like core Photoshop (or all of Photoshop as it was until very recently, like 5-6 years ago), but more like AI stuff - see #.

For core text processing that would be similar to how a bitmap editor processes text (filtering, replacement, conversion, etc.), there are some tools, like the aptly named:

https://textsoap.com/mac/

# (Photoshop has since added some AI stuff, like object removal and such, but its main functionality wasn't and still isn't about that.)


This idea is really interesting. I’ve been using Copilot since it became publicly available for personal side projects. Recently, I ran out of Google Drive space and had to use something other than Google Docs to write my essay for school. I booted up a text document and started writing, and then Copilot started writing with me. I ran it through a plagiarism checker, and it turned out to be clean, which surprised me.

Can’t say that what it generated was particularly insightful, however it was helpful to reach an obituary word count.


> it was helpful to reach an obituary word count.

I'm pretty sure you meant "obligatory" word count. But "obituary" is an awesome accident. Like Copilot is waiting, happy to sum up your life in a tidy paragraph, when the time comes.


Hahaha, you got me there! I’ll be caught one day and pay for the hours I’ve saved. At the end of the day, if I can’t read what I wrote, that’s on me.


I wish my text editor had a field where I could enter a snippet of Python code which would be fed the opened/selected text to process and replace with its output. I often do a lot of logically and sequentially complex search-and-replace, which becomes a lot of manual labor. As a simple example: replace all the keyboard quotes with “open-close quote pairs” while making sure the final dot is placed outside rather than “inside the quote.”


In Vim you can already push marked text through any shell command, including python scripts that read from stdin. You could probably make this work with a few lines and a keybinding.

I rarely need more than the builtin macro system though; since macros can do basically anything, including regex-based search, they can handle any formatting change as long as the steps stay the same each iteration.
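
For the quote example upthread, a minimal sketch of such a stdin filter (the file name and exact quote rules are assumptions):

  # fix_quotes.py - read the selection on stdin, curl straight double
  # quotes into “...” pairs, and move a final dot outside the close quote.
  import re
  import sys

  text = sys.stdin.read()
  text = re.sub(r'"([^"]*)"', r'“\1”', text)
  text = text.replace('.”', '”.')
  sys.stdout.write(text)

In Vim you would select the text visually and run :'<,'>!python3 fix_quotes.py to replace it in place.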


Vim is not my type yet (perhaps later, as I grow into it); I'd prefer something like Sublime, Kate or Gedit.


Would shell commands not be a good way to do that?


I do it this way now - pushing a file through a script on the command line. But I would be happier with a handy GUI mouse-click copy-paste solution. Ideally I'd like a panel in something Sublime/Gedit-like where I would just paste a Python function, click run, and have it applied without even saving the file to the hard drive.


Having a live preview of the diff before any changes were applied (like vscode and most editors do when using find/replace) would make this a lot more useful.


The essay focuses on images, and I can’t help making the parallel with digital sound editing tools, which are [in my very uneducated opinion] more numerous and diverse.


Some people are better at writing than editing, some people are better at editing than writing.

I find it necessary to heavily edit when writing - up to, and including, this comment. I don’t mind it, and I don’t mind doing it to other people’s writing either, so this new way of doing things appeals to me.

I’d be interested to hear what anyone who’s able to one-shot their writing thinks of this. I feel like that type of person may have less of a desire for this kind of stuff?


> I’d be interested to hear what anyone who’s able to one-shot their writing thinks of this. I feel like that type of person may have less of a desire for this kind of stuff?

Yep. I don't use spell-checking and I don't use auto-complete bars on phone keyboards, largely because I feel it keeps my skills sharp. I would use tools like the article describes when I deem them to have become necessary to stay competitive at what I do, but at the moment I don't feel it's clear-cut whether their use would promote or harm my faculties to write.


That makes sense. What the long-term effects of such 'performance enhancing' tools might be for how their users think, or externalise those thoughts, is hard to call.

I also wonder what it means if everyone is farming out parts of their intellect to similar models, which might be limited pathologically, by training data or enforcement.


Text style transfer or paraphrasing/rewriting without spending big bucks on fine-tuning a language model would be nice.


I often call vim the "photoshop of text". Command mode has a rigidity like an image, but you can cut, copy and paste parts of it.

AI text generation will be more like games than like Photoshop. Photoshop is two-step, do and undo, but editing text is more like a continuous stream of refinements.


Yes, I think macros and regex-based Vim features are more fitting for the comparison to Photoshop than AI-powered features, which are a new addition to image/video editing as well. Piping a selection through a shell command or a regex-based replace is like applying a filter to a selected set of pixels.


Do people really want to read machine generated fluff?


I fed your question to GPT-3 with the instruction to "answer this question in flowery, Shakespearean prose" simply because I assumed it would fail. The result?

If by "machine generated fluff" thou dost refer to the banal, trite and insipid content that oft pollutes the Interwebs, then nay, I believe not that people desire to read such.
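
For reference, a minimal sketch of that call against the 2022-era OpenAI completions API (model and parameters are assumptions):

  import openai  # pip install openai; reads OPENAI_API_KEY from the environment

  resp = openai.Completion.create(
      model="text-davinci-002",
      prompt=("Answer this question in flowery, Shakespearean prose:\n"
              "Do people really want to read machine generated fluff?\n"),
      max_tokens=100,
      temperature=0.7,
  )
  print(resp["choices"][0]["text"].strip())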


Looks like a lot of Hacker News comments.


Ha! I disagree. I find it both kinder and more clear in its reasoning.


I have read stuff generated by LaMDA and it is really good at generating a synthesis of any topic; I like what it generates more than what I read on Wikipedia.


Flawed premise IMHO. Images have inherently more entropy than text (all else being equal). More dimensions to "play around with" is what made Photoshop useful and popular.

The "thought-processing" angle is cool though.


The paper I wrote at COLING 2022, titled "Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio", included a GUI constrained text generation studio that I market as being "Like Photoshop but for text":

https://github.com/Hellisotherpeople/Constrained-Text-Genera...


The difference is our brains can immediately judge the result of an image transform, but it takes a re-read to see whether the text transform has done something bad, like losing the meaning of the text.

A blurry image is less likely to get you into trouble than saying the wrong words, so it is critical to validate manually what it says.

That said, such tools could be useful for recommendations on how to rephrase things.


Well, discrete images are an approximation/sampling of some kind of continuous process in nature. We therefore have a whole arsenal of tools to process this signal, i.e. Photoshop.

Text? I'm not sure what kind of model we have here. Only now that we have word embeddings and other NLP models can we hope to do with text the same kind of thing we do with images.


There are tons of filters people need today. There are no longer secretaries proofreading or lawyers reviewing, given the get-it-done-in-an-hour demands put on people. I would pay a ton for something that automatically flags what Twitter is angry about today so I don’t need to keep up on trends, makes sure language is gender neutral and unlikely to offend special interests, warns of internationally sensitive phrasing, rewords anything that could be hate speech, and warns when you are likely to tick off the SEC. People are far too emotionally invested in what others say, and it would be great if you could just send public text through a few filters in the name of not publicly embarrassing yourself. I imagine a popup saying “that is dumb, illogical, and going to violate a TOS unless you change it like this” will shortly be common, instead of flags for review.

The surprise is not that it is coming, but that it hasn’t been a thing for years given how flammable the internet is.


I hope we get some updates in Adobe that address the issue of data merging and downsizing text based on user input to fit a field box. It is incredible that simple things like that are not easy to do natively in the app.


How interesting. The more things change, the more they stay the same.

"Photoshop for Text" is, arguably, called "Typesetting". Aldus Pagemaker and Quark XPress were both quite popular at this task.


>The filters will be … as good as if you wrote the text yourself.

Always the promise


This is not directly on topic, but it reminds me that for most photo editing, Photoshop is overkill. Photoshop is more than adjusting channels and blurring/sharpening a few areas.


Photoshop for text has existed for decades. It's called a desktop publishing app. Those were all the rage in the 80s, from PageMaker to QuarkXPress and even Microsoft Publisher.


It's already there, in the form of Google Docs, wikis and other collaborative editing tools. This is what makes text great: feedback from others and maybe even their input.


Collaborative editing mostly works in very specific ways. I do not work with editors in a real-time manner for the most part.


Isn't that Canva?


I feel as if the person who wrote this has never worked in publishing... As an ex-editor now working in IT, I find this post a bit... ludicrous? Does anyone else feel that, or am I missing something here?


> Up until now, text editors have been focused on input. The next evolution of text editors will make it easy to alter

Vim can do this, not in the sense you're putting into the article, but at least it does it without requiring anything else except three rows of keyboard.

> Text filters will allow you to paraphrase text, so that you can switch easily between styles of prose: literary, technical, journalistic, legal, and more.

Pfff. Styles of prose are: trolling, documenting, cat-talking, legal, and just making a list of something. Trolling cannot be augmented; documenting begs for some connection to reality; proper cat-talking requires throwing out a lot of synonyms really fast; legal is kind of like Java programming, where you type one line and get 40; and augmenting lists might be done either md-style, but without having to draw every symbol of those ASCII tables, or Excel-style, but without the GUI.


> Vim can do this, not in the sence you putting into the article

So in what sense? Some other irrelevant sense?


What do you mean by “cat-talking” here? Bing doesn’t yield much.


You don't want Bing. You want Bengal



