To me, this article misses the point, but only slightly. If we compare the proposed Photoshop for text and the actual Photoshop for images, those are very different tools in how they work with their respective mediums.
Photoshop doesn’t use AI models to generate new things for the artist, photographer, whoever (it does have generation of course, but not in the same way). A tool like Photoshop for text, in the way described, seems more like a writing aid than a writing tool; something I can give vague commands like “make this paragraph more academic” (whatever that means), and it’ll spit out something approximating the academic style it analyzed. Whereas “auto balance the levels of this image” is much more concrete in what it means; there’s no approximation of “style.”
I feel like Photoshop for text should be an editor that cares more about the structure of stories and who’s in them, with ways to organize big chunks in easy ways, rather than something that generates content for you.
> Photoshop doesn’t use AI models to generate new things for the artist, photographer, whoever
It certainly does have this, such as tools to remove objects and paint over them as if they were not there, automatic sky replacement (replace a cloudy sky with a sunset), super resolution (AI upscaling) and a range of what they call 'Neural Filters'.
You want your low quality boring midday image of some famous bridge to be a high resolution, taken during sunrise without that person riding a bicycle? Photoshop will do it with very little user skill or input.
Some beta features include style transfer, makeup transfer and automatic smile enhancement.
Yeah, but nobody is buying Photoshop for the inpainting features. Photoshop's bread-and-butter is raster graphics tools, and almost all of them function deterministically.
Not sure what you mean by "deterministically", but most models are deterministic during inference: given the same input or prompt, they'll generate the same output.
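A toy illustration of that point (the function names and the next-token distribution here are made up for the sketch): greedy decoding always returns the same token for the same distribution, and even sampling becomes reproducible once the random seed is fixed.

```python
import random

def greedy_pick(dist):
    """Greedy decoding: always take the most probable token.
    Deterministic by construction."""
    return max(dist, key=dist.get)

def sample_pick(dist, seed):
    """Sample a token from the distribution; reproducible when seeded."""
    rng = random.Random(seed)
    tokens, weights = zip(*dist.items())
    return rng.choices(tokens, weights=weights, k=1)[0]
```

So "nondeterministic" output from a language model is usually a choice made at the sampling stage, not a property of the model itself.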
Tense and number agreement is one thing it seems like it would be helpful for a computer to keep consistent while you edit, so that when you change the subject from plural to singular, it tacks the "s" onto the verb for you. If you had
"Families enjoy this restaurant"
And started typing and changed "Families" to "our family" this "photoshop for text" would change "enjoy" to "enjoys" for you.
Enough little facets like that, I could see being useful.
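A toy sketch of that agreement-preserving edit (the lookup table and function name are invented for illustration; a real tool would use a proper morphological analyser rather than a three-verb dictionary):

```python
# Hypothetical stand-in for a morphological analyser: maps a few
# base-form verbs to their third-person-singular forms.
VERB_FORMS = {"enjoy": "enjoys", "like": "likes", "love": "loves"}

def fix_agreement(tokens, singular_subject):
    """Re-inflect any known base-form verb when the (edited) subject
    is singular; leave everything else untouched."""
    if not singular_subject:
        return list(tokens)
    return [VERB_FORMS.get(t, t) for t in tokens]
```

After the edit from "Families" to "our family", the tool would run this pass and fix "enjoy" to "enjoys" without the writer touching the verb.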
The article is more about user experience than implementation. When you apply an effect in Photoshop you can change all relevant pixels at once. Whereas we don’t have good tools to change all words in a document at once (while retaining the intended meaning)
Oh that would be cool. Write a paragraph about a red ball, then halfway through, decide it's a blue ball. The editor then highlights all occurrences of the variable and rewrites them for you. Like the "Refactor local variable name" bit in VSCode.
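The literal-string half of that "rename refactor" is easy to sketch (function name invented for illustration; the hard part, rewriting sentences whose meaning depends on the old entity, is where a language model would come in):

```python
import re

def rename_entity(text, old, new):
    """Rename every whole-word occurrence of an entity, preserving
    sentence-initial capitalisation: a crude text analogue of
    'rename symbol' in a code editor."""
    def repl(m):
        return new.capitalize() if m.group(0)[0].isupper() else new
    return re.sub(rf"\b{re.escape(old)}\b", repl, text, flags=re.IGNORECASE)
```

Like the editor's rename feature, it would ideally highlight each occurrence for review before committing the change.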
One big difference between image editing and text editing:
I can see the effect of a change to an image at a glance. Judging whether it is a much worse outcome takes seconds usually.
With wholesale changes to text, seeing if it inadvertently makes something worse takes minutes. It's orders of magnitude slower. That makes the editing process much slower, whilst still fraught with risk.
I am not talking about not trusting AI to get it right, but about a general change that happens not to work. Maybe your academic writing was better off with one narrative paragraph. Maybe you want to change tense but forgot to mark a quote and you are putting false words in someone's mouth.
We proofread changes by human editors. AI editors will require the same.
An image is 2D in space, 0D in time. Because we perceive 3D/1D, i.e. one dimension above in each aspect, we can see a whole image at once.
We can't see an entire video (2D/1D) at once - we have to move in time. We can't see an entire 3D object at once either - we have to move in space.
Technically, text is just as visual as an image. However, meaning is conveyed by the sequence in which we interpret the symbols, and that sequence is 0D in space, 1D in time.
Maybe beings with a 2D time dimension would be able to parse text instantly?
You can parse text instantly too; it just has a limit of a few words that you can perceive "at once," as you say. Editing a whole 3 pages at once is like saying in Photoshop you can edit a whole magazine worth of images at once; you also can't.
As with anything, the shorter the feedback loop, the better. The difference being that an image has an information density much higher than text ("an image is worth 1000 words"), and you can make sense of it more easily.
If you want to edit lots of text and see if it flows well quickly, some other ways would involve mapping the textual representation to something we can perceive quicker. The same way sometimes systems will have different sound tones for different actions, making a skilled operator able to detect mistakes by "listening" to the UI. If you can represent textual changes in an aggregated visual form, you might get what you describe.
For example, if you could make correct grammar produce a pleasant musical sound when "read" by a computer program, you could potentially assess grammar more quickly by listening to the whole thing in much less time than you'd take reading it.
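A minimal sketch of that sonification idea (the checker below is a crude stand-in for a real grammar model, and the pitches are arbitrary): map each sentence to a high tone when it passes a check and a low tone when it fails, so a listener scans for low notes instead of rereading.

```python
def toy_checker(sentence):
    """Stand-in for a real grammar checker: accepts sentences that
    start with a capital letter and end with terminal punctuation."""
    return sentence[:1].isupper() and sentence.rstrip()[-1:] in ".!?"

def sonify(sentences, checker):
    """Map each sentence to a pitch in Hz: high = passes, low = fails."""
    return [880.0 if checker(s) else 220.0 for s in sentences]
```

Feeding the resulting pitch list to any tone generator would turn a proofreading pass into a few seconds of listening.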
That brings to mind the quadrivium --- the second part of the seven liberal arts:
Arithmetic: quantity.
Geometry: quantity in space.
Music: quantity in time.
Astronomy: quantity in space and time.
Back in my uni days, the student centre featured a painting which was a representation of a musical piece --- something classical, possibly Beethoven's Fifth. It displayed in space (or more accurately, in a plane), what was usually performed over time. The painting was more conceptually interesting than visually appealing, though I like the idea.
Whilst I think you have a point re: images, there are often those which reward a more measured appreciation. A "where's Waldo" type visual puzzle might be a more trivial example but there are images which reveal themselves over time.
There's also what works leave us with. Text ultimately conveys, well, textual or verbal information. Speech is similar. Both can deliver a mood, though that's not essential. Music on the other hand seems to me to be far more emotional. Images in their simplest form are literally iconographic, more representing something than portraying it, though with detail they tend toward the latter. (Plastic arts such as sculpture seem similarly iconographic, and of course, the original icons were often such portrayals.) Video and drama seem to me more related to music than texts, working on moods and emotions. That though is occurring to me as I write this, as well as the realisation that both often rely heavily on a soundtrack. In the case of opera or musicals, to a dominating extent, less so in a straight stage play. And of course, on television, there's the infamous laugh track to guide us in our emotional response to a sitcom, standing in for the immersive live-audience experience.
Back to the suggestion: the idea of being able to autotune a text, so to speak, seems ... possible, now or soon, with advances in AI, GPT3, and the like. But as rocqua said, far harder to take in at a glance as compared with visual or video arts.
Only if the image is small (within the viewspan). If the image is huge, the brain needs to hold context while you finish seeing it, similar to how the brain needs to hold context while you finish reading a passage to derive meaning. (This is called the sphota theory of grammar, as given by the ancient Sanskrit scholar Bhartrihari.) Reading is zoomed-out seeing. (Edit: Abhinavagupta further showed that sound naturally works on this principle, where the meaning of a word is concluded as the sounds finish hitting the eardrums.)
Maybe we can develop a number of text classifiers that would give you hints on the effect of the changes without you having to read it. Or maybe even use AI proofreaders that would give you an opinion about the new text. It is an interesting field for sure.
It could highlight differences for you. Current models already seem competent enough to take into account a lot of context and not to alter the meaning.
I cannot stand the idea of this. I know it's coming, I know I can't stop it, but it will be a catastrophic loss.
Most people are bad at writing. This will make them "better" at writing, but only in a certain way. I love reading things people write, because it gives me a window into their mind, how they think, who they are at a deep level. That is the joy of reading, and the structure of writing is a big part of that.
Now, a lot of people's writing has become homogenized by the computerization of our world already, but only at a low level: spelling, basic grammar, and so on.
When people have the ability to inpaint whole paragraphs, dreamed from the blob of internet text (which is mostly corporatized, computerized, email-ized, sterile in the way described above) we will lose something essential.
And another problem arises. I send an email asking something, to a coworker or to a friend. They inpainted their response. Did they really understand what I was asking? Did we really communicate at all?
Images, I guess, suffer from the same problem. But images are less interpersonal. They are communication, but not communication like writing is communication. In nearly all circumstances, images are less subtle than words are (the subtlety of visual communication happens irl, where such machines have yet to insert themselves).
A century or so ago as business correspondence became standardised, what emerged were form letters and templates for various correspondence.
I'm not sure when this practice started, and it may have gone by several different terms --- "sample business letters", "model business letters", and "handbook of business letters" are terms I'm coming up with.
What all of these address, however, is the fact that most people aren't good at composing correspondence, and/or that businesses benefit by standardised forms of communications.
Worse yet, not only will the 'in-painting' obscure the writer's meaning, or even whether they even had meaning, it will ALSO render that meaning more generic, and eliminate the most information-dense surprising bits.
All of these tools are merely synthesizing new text/images/code/etc. from millions of existing examples. When there is any doubt about the intent or the output, it fills in the MOST EXPECTED output. It does NOT fill in the possibly intended but unique image, phrase, code that would be the output of a unique insight. Real brilliance will be simply lost in the sauce.
Ugh. I won't be using these, and I'll shun those who do.
Right - in terms of information theory, automatically-generated infilled data communicates zero bytes of additional information. It could be compressed away entirely losslessly and I could instead generate it myself. There is no need to waste bandwidth sharing it.
Communication only has value if it is surprising. That much has been known since Shannon.
Language models don't pick the top choice all the time, and even picking the top choice carries some amount of information, more precisely -log(p_token). They return a distribution we sample from. Sampling adds surprise and detail that was not originally there. Even more, if you have some sort of verification going - a second model to rate the outputs of the first, a math verifier or code executor when available - then language models can surprise us with interesting solutions to our tasks.
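In information-theoretic terms (a sketch, not tied to any particular model): the surprisal of a sampled token is -log2(p) bits, so even a high-probability token carries a little information, and a low-probability one carries a lot; the entropy of the distribution is the expected surprisal.

```python
import math

def surprisal_bits(p):
    """Self-information of an outcome with probability p, in bits."""
    return -math.log2(p)

def entropy_bits(dist):
    """Expected surprisal of a next-token distribution, in bits."""
    return sum(p * surprisal_bits(p) for p in dist.values() if p > 0)
```

A token with probability 0.5 contributes exactly one bit; one with probability 0.25 contributes two, which is why sampled text is never literally "zero information."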
That is great when we are looking for interesting solutions. For example, it looks like GPT3-type systems might be very good search engines for obscure topics, getting them to finish paragraphs with their enormous search base. I've found a few tantalizing bits that I couldn't find on Google or DDG searches.
But, when they are trying to express what WE are saying, it looks like a very lossy solution at best.
I've had a deposition taken with an "AI" stenographer, and it was horrific, frequently reversing the meaning of sentences I said, or replacing an uncommon name with a common name (e.g., "John Kemeny" replaced with "Jack Kennedy"). Of course the transcript LOOKS great, it doesn't have any of the "(unintelligible)" notations of a human transcript. It also does not go back at break points and ask for proper spellings of names, addresses, etc. like a human transcriber.
This is in the context of a legal trial with consequences, and I'm horrified to see this kind of crap passing for usable products, and here we are looking to foist it off on the general public as writing tools. We're forking doomed by smart idiots looking to make a quick buck with novel "tools".
I don't know if this is quite correct. The inpainted text has zero bits of information only if you already have on your end the exact, very large model they would have used. If you don't, and they only sent their prompt rather than the full text, you're in for a bit of a download.
Put another way, their prompt is arguably a pointer to data buried in that model. You're raising the question of whether they should only send the pointer, or just cache the query result, in a sense, and save you the trouble of looking it up.
It's interesting to imagine what the interaction model for this will be -- is there a sense in which this auto-generation will be like collaborating, and so it just changes what we think of to constitute the process of creation?
And maybe that's not as bad as it seems right this moment. Once upon a time great painters actually made their own paints. These days we wouldn't think about that skill as in the necessary catalogue of an artist, and in a bunch of ways -- most ways -- we're the better for it. Perhaps something similar will unfold here.
You know, something interesting is happening to me here.
I've never been upset about Stable Diffusion, because as you say, once upon a time, you had to physically paint an image to be visually creative. Never posted a comment about SD with "I cannot stand the idea of this. I know it's coming, I know I can't stop it, but it will be a catastrophic loss." Now, at the suggestion that it might happen for writing, all of a sudden it seems wrong! Someone else noticed this in the other thread about Copilot: HN is not upset about Stable Diffusion, but seems to frequently be upset about Copilot! I think you may be right actually -- calm down, it's just a tool.
I guess I want to revise my prior comment. I don't just want to talk to a machine. Just like I don't just want to see Stable Diffused images. The human part is essential, it's the heart of the thing. My intuition is that you're supposed to augment the human part, not replace it.
I totally get your (original) point -- I'm an (amateur) fiction writer trying to make myself feel better, a bit. But everything evolves. Think of how Magnus Carlsen trains for and plays chess, vs how it was historically done. No point trying to resist the forces loose in the world, be one of the first to embrace them. I'm curious what this means right this moment for an artist upon the advent of SD and the like.
The difference is that images are expected to support the message. They're art, figures, illustrations. If I send someone a generated image, I still picked the image. It's still my message. If I send someone generated words, I did not pick the words. It's no longer my message. I'm tasking a machine with choosing my thoughts. It's no longer augmentation.
Yes, that's the obvious cynical interpretation. But our beliefs aligning with our self interest doesn't automatically make us wrong, especially in areas where we have expert knowledge. It could just be that our interest aligns with the general interest.
It will add a new level of suspense. Did they use passive voice because they are trying to distance themselves from the decision, or did the algorithm think that was the best choice here?
I agree with you overall. It will likely have a similar impact to Grammarly and similar services.
Images have much less of this problem. They are not the raw result of cognition. So changing images doesn't change someone's voice.
Instead images are always interpretive. The camera doesn't capture what the eye sees. Often it takes a lot more interpretation than what a camera does to make a JPEG to end up with something like what the eye saw. In that process you can also change an image into something your eye would enjoy seeing.
Alternatively you can lie and say 'this is what I saw' about something that is not even close. Once images are used to promote falsehoods, your worry becomes true. But much editing, either for accuracy or for beauty, has no such harmful effects.
People go jogging and ride bicycles, even though cars exist, because they enjoy it and to tone their bodies. It's more effort, but it does things for them the car does not do, including provide thrills or promote well-being.
That is to say, these tools will exist and they will be used. But people will still write and make art - often perhaps using these tools in some way - because they'll have the time and find it rewarding to do so. And others will also assign some value to this. We might briefly get addicted, and then have a society-wide discourse on what healthy use is, similar to social media.
Isn't it compelling to wonder what humanity would decide to do with technology if technology were limitless? As in, what essentially human choices will we make in what, when and how to use technology? (Singularity-themed scifi has tried to provide some answers since the 90s.)
Yes, and I mean, while you could argue that jogging or riding bicycles is a form of self expression, I don't consider them so the same way I consider writing to be so. Cars help you get from point A to point B in a way that is qualitatively different than walking; but both are ontologically different from writing, because writing is not a thing that a single person does -- it's an act of communication. Writing and reading, and the dynamic of both playing out among people, is related to the formation of the self for both parties in the deepest and most crucial way.*
Easier put: the joy of writing and the joy of reading are strongly linked. I don't write things with the expectation that nobody will read them, and I don't read things with the expectation that nobody wrote them. Or, in this case, a machine.
And I think a "writing photoshop" will be much harder to detect than an image Photoshop. Did I picture the author as a white collar Yale graduate because they are one, or because the machine told them that was best?
* For example: I would sooner give up my ability to walk than give up my ability to communicate myself to others; that's the difference I'm talking about.
What do you think about text transformations like the ones described in the essay? Not inpainting but tools that could help “good” writers, like changing narration in a chapter from first person to third person in one step?
Changing perspective is not as easy as changing personal pronouns. It also affects the knowledge of a character. First person perspective has insight into the main character's inner thoughts and feelings, since they are the narrator; third person does not. Then there is third person limited and third person omniscient, which require totally different approaches and need different twists and turns to be believable. Changing the POV basically requires a total rewrite of the story.
The authors of most shared articles and most comments are not even passing a “Turing test”. In the vast majority of cases the readers just consume the data.
With GPT-3 we can already make “helpful and constructive” seeming comments that 9 out of 10 times may even be correct and normal, but 1 out of 10 times are kind of crappy. Any organization with an agenda can start spinning up bots for Twitter channels, Telegram channels, HN usernames and so on, and amass karma, followers, members. In short, we are already past this point: https://xkcd.com/810/
And the scary thing is that, after they have amassed all this social capital, they can start moving the conversation in whatever directions the shadowy organization wants. The bots will be implacable and unconvinced by any arguments to the contrary… instead they can methodically gang up on their opponents and pit them against each other or get them deplatformed or marginalized, and through repetition these botnet swarms can get “exceedingly good at it”. Literally all human discussion — political, religious, philosophical etc. - could be subverted in this way. Just with bots trained on a corpus of existing text on the web.
In fact, the amount of content on the Internet written by humans could become vanishingly small by 2030, and the social capital — and soon, financial capital — of bots (and bot-owning organizations) will dwarf all the social capital and financial capital of humans. Services will no longer be able to tell the difference between the two, and even close-knit online societies like this one may start to prefer bots to humans, because they are impeccably well-behaved etc.
I am not saying we have to invent AGI or sexbots to do this. Nefarious organizations can already create sleeper bot accounts in all services, using GPT-4.
Imagine being systematically downvoted every time you post something against the bot swarm’s agenda. The bots can recognize if what you wrote is undermining their agenda, even if they do have a few false positives. They can also easily figure out your friends using network analysis and can gradually infiltrate your group and get you ostracized or get the group to disband. Because online, when no one knows if you’re a bot… the botswarms will be able to “beat everyone in the game” of conversation.
Another reason the future of social media is invite-only. People obviously aren't invisible BS detectors, but if things really get that bad they're not just going to take it indefinitely. They'll notice and adjust.
By the way, if you are worried, it isn't true that this is "coming." A form of it will come but like self-driving cars, it will be more hype than substance.
Photoshop lets a conscious hand change the medium. Now we have tools that generate the message out of thin air. We're about to outsource human expression to machines.
And why? To automate things we don't want to write about, and automate reading them. I picture a future of machines talking to each other in increasingly clever human language while humans on each end just get the straight talk: "Did you put the cover on the TPS reports?" "Yes, I did. I got the memo."
This also ushers in a terrifying era of communication. "I'm so sorry for your loss", says a machine on behalf of someone, using all the right words but not feeling any of them. "Thank you" replies another, on behalf of someone else.
Spelling and grammar checkers are amazing. It becomes a problem when we outsource expressing ourselves to a machine. If kind, thoughtful words come from an artificial brain, expressing sympathies just becomes a box-ticking exercise.
We mocked "press F to pay respects", but we've already built it in real life. "Congratulate John about his new job" (LinkedIn), "Wish Jane a happy birthday" (Facebook). Having the machine write the message for us is the next step, and I find it terrifying. What value is there to a gesture if it asks nothing of us? No thoughtfulness, hell, no thought.
At this point, you might as well go all-in. Attach a pen to a CNC, connect AI to it, and offer heartfelt handwritten notes as a service. Offer to connect it to social media and auto-detect heartfelt handwritten letter opportunities. Have that letter in the mail while the corpse is still warm.
I think if you'd like a sample of what the author of the post is referring to, go check out the Hemingway text editor[1]. It's very simple, and as far as I know makes no use of the advanced language models that have been in vogue for the past few years. What's really opening the door is thinking of these language models as hyper-efficient compression. With Stable Diffusion, we can pack enormous amounts of artistic tedium into simple text prompts, with the weights getting small enough to fit on a flash drive[3].
Grammarly[2] also has a desktop app that appears to offer editing advice, although generally I think they're focused on grammatical correctness over anything else.
The question I'm left asking myself at the end of the article is, to what end do we need to edit text like Photoshop? Part of me sees this "Photoshop for Text" as something that would be akin to "No Code" tech stacks. Good No-Code/Low-Code solutions usually allow you to build specific classes of products (websites, 3D assets) in ways that are faster than the status quo. But anyone who spends enough time in a No Code stack eventually hits the wall where the people who designed the tool had to sacrifice the flexibility of text for the convenience of a GUI.
I yearn for the day that we can set a language model loose on something like the NCBI database or arXiv and have it point out open problems in the field to new PhD students. Or have it figure out whether my ablation studies make sense. Or an AI that can generate math proofs for me. A lot of this is linked to model interpretability and understanding, but I think the work that DeepMind is doing is showing that there might be a way to utilize this stuff in expert domains sooner than we think.
Autocomplete in Gmail has been getting more and more robust. At first it only suggested grammatical corrections in words. Later it started giving advice on better sentence structure, and then it just started suggesting whole sentences. Each of those steps I loved. More often than not it just says what I wanted to say, with fewer button pushes, and without all of the mistakes that I make as a non-native English speaker. I sound smarter in Gmail and I like it.
If I didn't have this experience, then giving the machine any input on what I write would seem crazy to me. I would think that language is too personal, too contextual, that I need control over every word and every letter.
But now I love writing with the help of the machine. It still feels like me speaking, the machine doesn't add any extra context that I don't approve of. It really feels like the messages are still mine, and the autocomplete just helps me extract my thoughts from my head in a better and more effective way.
1. (of body tissue or an organ) waste away, especially as a result of the degeneration of cells, or become vestigial during evolution.
"without exercise, the muscles will atrophy"
These tools will only become more pervasive, so why does it matter if that part of the brain "atrophies"? I'm sure people had the same worries about mental math during the rise of the calculator.
There's an area where we've been way, way ahead of the curve for 70 years now.

It's sound.

Long before GPT, image synthesis, video deep-fakes and these imagined "Photoshop for words", we had sound synthesis.

That's a very useful marker. Because we can read the things people were saying about the future of sound, their hopes, fears and predictions as far back as the 1960s when Robert Moog and Wendy Carlos were patching modular synths.

Most of the fears and predictions turned out to be rubbish. Musicians, orchestras and live events didn't get replaced. Instead we invented synth-pop bands.

And many of the things technologists imagined people would want to do turned out to be way off the mark. To my knowledge Isao Tomita was the only talented artist to "replace an orchestra" with synthesisers. Most people who used the tools "as intended" were artless and forgettable. Everyone else ran riot in the parameter space - messing with and subverting the technology to get the weirdest punk-ass squelches and wobbles possible.

So I always have to look on these "How synthetic X is going to make the real X obsolete" with a pinch of salt.
I think there's a difference between synthesizers, which I would compare to something like different paints, brushes and papers, and then Photoshop and Corel Draw and whatnot; and where image generators are heading, which is more analogous to automatic music generation.
I will also put it forward that, for reasons I'm ignorant of, the eye seems to be more readily fooled than the ear. 20 years ago, with crappy tools, all I had to do was smudge and clone a hydrant in a photo and it would effectively be gone for 99% of observers. But similarly primitive ways of trying to change or alter a sound file were immediately noticed by all listeners.
> So I always have to look on these "How synthetic X is going to make the real X obsolete" with a pinch of salt.
From your comment, it seems that the linked article is far more fear-mongering than it is; I gathered a mostly optimistic tone from it.
The final paragraph -
> While some of these capabilities sound a bit scary at first, they will eventually become as mundane as “desaturate”, “Gaussian blur” or any regular image filter, and unlock new creative potential.
The analogy to sound synthesis is strange to me, as both a musician whose primary instruments are acoustic guitars and computer software, and as a writer of poetry, prose and essays. I can’t speak to fears of synthesis obsoleting other music performance; mostly what I’ve heard expressed is snobbish purity that it’s not “real”—either real talent, real music, or real performance. But at the end of the day, synthetic instruments are still instruments. Fundamentally a primitive like a string or a reed attached to something which lets you make sound from it. A “simple machine” in the mechanics sense. They’re not composition synthesis, which is definitely an area of exploration now, and which is more akin to image and video synthesis.
Photoshop conceptually sits somewhere between these two extremes of electronically-aided creation, but much closer to sound synthesis, than what the author is hypothesizing. I can’t even think of a text analogy for sound synthesis as you’ve described. The least nonsensical imagined example I can think of is “this word does not exist” (as in word synthesis), which would be more valuable as a game or a gag than as a tool.
While I don't necessarily disagree with the sentiment, I think the antithesis to your argument is that where synthesizers "failed", modern sampling technology has crept in, especially in scoring. It's less common in the feature film industry, but a significant portion of TV and game scores are produced entirely with virtual instruments, or at least have a very heavy sampling component (e.g. recorded solo instrument against a synthetic backing track). Most people can't tell, or don't care - as will most likely be the case with AI-generated or AI-assisted content as it moves forward into various domains.
Some of the stuff he describes like "summarizing text" and such is not like the core Photoshop (or all of Photoshop as it was until very recently, like 5-6 years ago), but more like AI stuff - see: #.
For core text processing that would be similar to how a bitmap editor processes text (filtering, replacement, conversion, etc.), there are some tools, like the aptly named:
This idea is really interesting. I’ve been using Copilot since it became publicly available for personal side projects. Recently, I ran out of Google Drive space and had to use something other than Google Docs to write my essay for school. I booted up a text document and started writing, and then Copilot started writing with me. I ran it through a plagiarism checker, and it turned out to be clean, which surprised me.
Can’t say that what it generated was particularly insightful; however, it was helpful to reach an obituary word count.
I'm pretty sure you meant "obligatory" word count. But "obituary" is an awesome accident. Like Copilot is waiting, happy to sum up your life in a tidy paragraph, when the time comes.
I wish my text editor had a field where I could enter a snippet of python code which would be fed the opened/selected text to process it and replace it with its output. I often do a lot of logically and sequentially complex search-and-replace which becomes a lot of manual labor. As a simple example: replace all the keyboard quotes with “open-close quote pairs” while making sure the final dot is placed outside rather than “inside the quote.”
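A minimal sketch of that example transform (the function name is my own, and it assumes straight double quotes alternate open/close in reading order):

```python
import re

def smart_quotes(text: str) -> str:
    """Replace straight double quotes with curly open/close pairs,
    then move a period that sits just inside a closing quote outside it."""
    out, is_open = [], True
    for ch in text:
        if ch == '"':
            # Odd occurrences open, even occurrences close.
            out.append('\u201c' if is_open else '\u201d')
            is_open = not is_open
        else:
            out.append(ch)
    # Relocate a trailing period from inside the closing quote to outside.
    return re.sub('\\.\u201d', '\u201d.', ''.join(out))

print(smart_quotes('He said "hello." Then "goodbye".'))
# → He said “hello”. Then “goodbye”.
```

A real version would also need to handle apostrophes, nested quotes, and inch marks, which is exactly the kind of fiddly sequential logic that makes a scriptable editor field appealing.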
In Vim you can already push marked text through any shell command, including python scripts that read from stdin. You could probably make this work with a few lines and a keybinding.
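For instance, a stdin-to-stdout script like the following can serve as the filter (the whitespace-collapsing transform is just a placeholder for whatever processing you need):

```python
# filter.py -- pipe a Vim visual selection through this script with
#   :'<,'>!python3 filter.py
# Vim replaces the selected lines with whatever the script prints.
import sys

def transform(text: str) -> str:
    # Placeholder transform: collapse runs of whitespace into single spaces.
    return ' '.join(text.split())

if __name__ == '__main__':
    sys.stdout.write(transform(sys.stdin.read()))
```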
I rarely need more than the built-in macro system, though; since macros can do basically anything, including regex-based search, they can handle any formatting change as long as the steps stay the same each iteration.
I do it this way now - push a file through a script on the command line. But I would be happier with a handy gui-mouse-click-copy-paste solution. Ideally I'd like a panel in something Sublime/Gedit-like where I would just paste a python function, click run, and have it applied without even saving the file to the hard drive.
Having a live preview of the diff before any changes were applied (like vscode and most editors do when using find/replace) would make this a lot more useful.
The essay focuses on images, and I can’t help making the parallel with digital sound editing tools, which are [in my very uneducated opinion] more numerous and diverse.
Some people are better at writing than editing, some people are better at editing than writing.
I find it necessary to heavily edit when writing - up to, and including, this comment. I don’t mind it, and I don’t mind doing it to other people’s writing either, so this new way of doing things appeals to me.
I’d be interested to hear what anyone who’s able to one-shot their writing thinks of this. I feel like that type of person may have less of a desire for this kind of stuff?
> I’d be interested to hear what anyone who’s able to one-shot their writing thinks of this. I feel like that type of person may have less of a desire for this kind of stuff?
Yep. I don't use spell-checking and I don't use auto-complete bars on phone keyboards, largely because I feel it keeps my skills sharp. I would use tools like the article describes when I deem them to have become necessary to stay competitive at what I do, but at the moment I don't feel it's clear-cut whether their use would promote or harm my faculties to write.
That makes sense. What the long-term effects of such 'performance enhancing' tools might mean for how its users think, or externalise those thoughts, is hard to call.
I also wonder what it means if everyone is farming out parts of their intellect to similar models, which might be limited pathologically, by training data or enforcement.
I often call vim the "photoshop of text". Command mode has a rigidity like an image, but you can cut, copy, and paste parts of it.
AI text generation will be more like games than like Photoshop. Photoshop is two steps, do and undo, but editing text is more like a continuous stream of refinements.
Yes, I think macros and regex-based Vim features are a more fitting comparison to Photoshop than AI-powered features, which are a new addition to image/video editing as well. Piping a selection through a shell command or a regex-based replace is like applying a filter to a selected set of pixels.
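That analogy can be made literal: restrict a regex substitution to a selected span and it behaves like a filter applied to a pixel selection, leaving everything outside untouched (the function name and span boundaries below are illustrative):

```python
import re

def filter_selection(text: str, start: int, end: int,
                     pattern: str, repl: str) -> str:
    """Apply a regex 'filter' only to the selection text[start:end),
    like running an image filter over a selected region of pixels."""
    return text[:start] + re.sub(pattern, repl, text[start:end]) + text[end:]

# Only the first occurrence falls inside the selection [0:7), so only it changes:
print(filter_selection("foo bar foo", 0, 7, "foo", "baz"))  # → baz bar foo
```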
I fed your question to GPT-3 with the instruction to "answer this question in flowery, Shakespearean prose" simply because I assumed it would fail. The result?
If by "machine generated fluff" thou dost refer to the banal, trite and insipid content that oft pollutes the Interwebs, then nay, I believe not that people desire to read such.
I have read stuff generated by LaMDA and it is really good at generating a synthesis of any topic; I like what it generates more than what I read on Wikipedia.
Flawed premise IMHO. Images have inherently more entropy than text (all else being equal). More dimensions to "play around with" is what made Photoshop useful and popular.
My paper at COLING 2022, "Most Language Models can be Poets too: An AI Writing Assistant and Constrained Text Generation Studio", included a GUI constrained text generation studio that I market as being "Like Photoshop but for text".
The difference is that our brains can immediately judge the result of an image transform, but it takes a re-read to see whether a text transform has done something bad, like losing the meaning of the text.
A blurry image is less likely to get you into trouble than saying the wrong words, so it is critical to validate manually what it says.
That said, such tools could be useful for recommendations on how to rephrase things.
Well, discrete images are an approximation/sampling of some kind of continuous process in nature. We therefore have a whole arsenal of tools to process this signal, i.e. Photoshop.
Text? I'm not sure what kind of model we have here. Only now that we have word embeddings and other NLP models can we hope to do the same kinds of things with text that we do with images.
There are tons of filters people need today. There are no longer secretaries proofreading or lawyers reviewing, given the get-it-done-in-an-hour demands put on people. I would pay a ton for something that automatically knows what Twitter is angry about today so I don’t need to keep up on trends, makes sure language is gender neutral and unlikely to offend special interests, warns of internationally sensitive phrasing, rewords anything that could be read as hate speech, and warns when you are likely to tick off the SEC. People are far too emotionally invested in what others say, and it would be great if you could just send public text through a few filters in the name of not publicly embarrassing yourself. I imagine a popup saying “that is dumb, illogical, and going to violate a TOS unless you change it like this” will shortly be common, instead of flags for review.
The surprise is not that it is coming, but that it hasn’t been a thing for years given how flammable the internet is.
I hope we get some updates in Adobe that address the issue of data merging and downsizing text based on user input to fit a field box. It is incredible that simple things like that are not easy to do natively in the app.
This is not directly on topic, but it reminds me that for most photo editing, Photoshop is overkill. Photoshop is more than adjusting channels and blurring/sharpening a few areas.
Photoshop for text has existed for decades. It's called a desktop publishing app. Those were all the rage in the 80s, from PageMaker to QuarkXPress and even Microsoft Publisher.
It's already there, as Google Docs, wikis, and other forms of collaborative editing. This is what makes text great: feedback from others and maybe even their input.
I feel as if the person who wrote this never worked in publishing... As an ex-editor now working in IT I find this post a bit... ludicrous? Does anyone else feel that, or am I missing something here?
> Up until now, text editors have been focused on input. The next evolution of text editors will make it easy to alter
Vim can do this. Not in the sense you're putting into the article, but at least it does it without requiring anything else except three rows of the keyboard.
> Text filters will allow you to paraphrase text, so that you can switch easily between styles of prose: literary, technical, journalistic, legal, and more.
Pfff. The styles of prose are: trolling, documenting, cat-talking, legal, and just making a list of something. Trolling cannot be augmented, documenting begs for some connection to reality, proper cat-talking requires throwing out a lot of synonyms really fast, legal is a kind of Java programming where you type one line and get 40, and augmenting lists might be done either md-style (but without requiring you to draw every symbol of those ascii tables) or excel-style (but without the GUI).