Hacker News new | past | comments | ask | show | jobs | submit login
A functioning Turing Machine using Notepad++ and its find/replace regex engine (github.com/0xdanelia)
428 points by MarcellusDrum on Feb 22, 2022 | hide | past | favorite | 87 comments



Cool demonstration of regular expression power! Note that as implemented, this is a space bounded Turing Machine with a hardcoded tape length limit:

> With the current implementation, the read/write head looks at the 22nd element of the tape whenever we want to read the current position. With complicated machines that utilize long lengths of the tape, you would need to increase this number so that you never delete a necessary tape element as you scroll along it. This can be done by increasing the {20} that appears at the beginning of the expression.

Embedding the tape head pointer ^ within the tape, just to the left of the scanned bit, should remove this restriction.


Yep! Embedding the tape head and the current state on the tape would make this more than just PSPACE-complete. I believe that in this case, regular expressions would not even be necessary: the Turing machine could be simulated with JUST find and replace.


   The instruction sets are the remaining lines, formatted like >C.I:WMN
   C current instruction name.
   I the input from the current tape position. Either 0 or 1. Each instruction has 
   execution parameters for both inputs.
   W the output to be written to the tape at the current position. Either 0 or 1.
   M the movement of the read/write head. A 0 moves the head one position to the 
   left. A 1 moves it to the right.
   N the name of the next instruction to be executed at the new tape position.
Would it have been clearer to use "<" or ">" to specify whether to move left/right from the current position?


As it's structured, that wouldn't work. To move, it replaces the leading zero of the tape with:

  * If the movement is "0": "00"
  * If the movement is "1": ""
The choice of 1 to move right is arbitrary (but makes sense to match the 0), it could be any other symbol, but the 0 is needed to because that's what's prepended to the tape.

Someone could probably find a way around it, but it would likely make it harder to understand.


If you like this, you'll love Save Endo, one of the top two most fun ICFP Programming Contests ever. (The other is Cult of the Bound Variable)

https://save-endo.cs.uu.nl/


If you like this, you might also like Retina[0]!

[0]: https://github.com/m-ender/retina/tree/master/Examples


I was under the impression that RegEx was not turing complete. Is there something special about N++’s regex engine that allows this?


Regular Expressions as defined mathematically is strictly less powerful than turing machines. (by several levels, they are strictly less powerful than general context-free grammars, which are strictly less powerful than general context-sensitive grammars, which are strictly less powerful than arbitary grammars)

The 'regexes' in modern languages and tools, however, are not actually regular in the original mathematical sense, they have been augmented by several constructs that are not regular. Perl is the leader in this field, its latest addition is regexes which can reference itself recursively, making (presumably, never seen a proof) it at least context-free. Here, the non-regular constructs used is capture groups and arbitary forward lookahead.

In addition to that, the author is doing something sneaky by making the user press a button continuesly to advance the state of the turing machine, so it's not actually search+replace that is turing complete, it's search+replace+"repeatedly pressing replace all". Unlike what some other people say in this thread, repeatedly pressing a button to simulate the machine doesn't count as 'turing complete' and is not comparable to plugging the machine in power.


Seems like you could use automated hardware to repeatedly press the button to facilitate the computation without any additional software or human interaction required (I'm reminded of the episode of the Simpsons where Homer tries to automate his job with a nodding bird!)


No, nothing special about N++ (well, lookahead, but many regex engines have that). Repeated Search+Replace is the key, and not part of regex.


And if you allow multiple pattern-replacement pairs, you don't even need regex (the pattern can be a single fixed string) as it is enough to simulate semi-Thue systems [1].

[1] https://en.wikipedia.org/wiki/Semi-Thue_system


So, Notepad++ basic interface is Turing complete because it's a place you can write a Turing machine and execute it.


I find myself wondering if anyone has done a proper analysis to prove that human activity is Turing complete!


We can manually execute the steps of a Turing machine. That's all that's needed.


It is so, trivially; humans wrote the find/replace box in Notepad++.


I mean, I can write a FSM that outputs the code for find/replace in Notepad++.


Yeah, but you can't write an FSM that can execute itself repeatedly over its own output indefinitely!


Turing's original idea of a "computer" was a human manipulating symbols on paper, so I would say trivially yes


We are guaranteed to halt, e.g lifespan. After death cells enter apoptosis stage where cells do final stages upon shutting down. I’d say it appears we are Turing complete, or at least it appears so from this angle


Given that Turing machines cannot be guaranteed to halt (and it's trivial to write an infinitely-running Turing machine), therefore humans are not, technically, Turing-complete.


Turing completeness is a theoretical construct that ignores that errors in execution inevitably occur during a sufficiently long program, due to the second law of thermodynamics (entropy must increase in a closed system). Any realization of a Turing complete engine eventually fails. That's different than halting, though, because it doesn't answer the question as to whether the program of the machine eventually reaches a halt instruction, when properly executed.


Well, I would say that any physical implementation of a Turing machine will eventually halt because you cannot guarantee an infinite energy supply. So if we're willing to ignore physical limitations then humans can be Turing complete - if not, then no machine be either.


How about a human population?


Oh, that too though we’re looking at it from a different angle. Most species so far have gone extinct though we’re a special kind, we may as well cause that without any external factors.


Aaah of course, thanks


It is proven that Turing machines can recognize a wider class of languages than Chomsky regular expressions. However, the article uses something more powerful than Chomsky regular expressions, because they contain backreferences (as \2 and \4), and also there is a repeated search-and-replace involved, which also adds to their power.


Do you mean Chomsky hierarchy by Chomsky regex?


Regular languages are an element included in Chomsky's hierarchy: https://en.wikipedia.org/wiki/Chomsky_hierarchy https://en.wikipedia.org/wiki/Formal_language


It seems Notepad++'s "search + replace all" feature replaces text in place inside a search context rather than buffering the changes and applying them outside a search context after the search is complete. This makes search recursive.

When combined with enhanced regexes (backreferences and lookahead), you have the ingredients for Turing completeness.


I made a harder-to-understand investigation using Perl (non-regular, of course) regexps a while back.

https://perlmonks.org/?node_id=809842


I did essentially this in vim once – although I made it a little more "user friendly" with vim macros' ability to both call and modify themselves, so it would stop itself if the Turing Machine halted.


So it can play Towers Of Hanoi?


Certainly; that's a space bounded computation, only needing O(n) tape for n disks.


But will it run Doom?


Nowadays we ask "will it run LibreOffice?"


Nice one to add to the list of accidentally Turing complete systems. What's your favourite :)?


Still one of my favorites: On The Turing Completeness of PowerPoint, https://www.youtube.com/watch?v=uNjxe8ShM-8


A bit off topic but in the french internet there is an add, these days, asking people to try the Turing test to find a (good) job. I find it very funny. Imagine a job interview where you have to be turing-complient to get the job : you play the tape and the recruiter act like a scanning device...


I wouldn't say that it's on accident. The theoretical definition of Regular Expressions is specifically not Turing complete. But Regex as a tool just isn't as useful in that form, so it was deliberately extended to be Turing complete with new operators.


Great, now someone should write a Javascript interpreter for it!


Not a Turing machine if the user has to press a button for each step.


That's not relevant as to whether it's a turing machine or not, tho...

The possible calculations (and genericity) is what matters. The "button for each step" could be analogous to powering the turing machine, or turning some crank for Babbage's machine, or whatever..


This is absolutely relevant, turing's original definition described a completely autonomous and automatic machine, that's why every turing machine has a halting state that locks it in a loop when computation is finished.

>could be analogous to powering the turing machine, or turning some crank for Babbage's

Those things are done once for those machines, you press power-on or turn a crank for just one time and the machine starts, this is not the case here, here the human is acting as the control logic for the machine, repeatedly pulsing to drive the computation.

Your computer is not a computer without a hardware clock, the repeated pressing of a button is acting as a hardware clock here.


>This is absolutely relevant, turing's original definition described a completely autonomous and automatic machine, that's why every turing machine has a halting state that locks it in a loop when computation is finished.

That's just an accidental part, and is orthogonal to the manual "click next".

The halting state absolutely will be there, locked in a loop and everything, you just manually click to step over each iteration of that loop.

As said, the "click next" is no different that a clock signal in a CPU, or manually cranking the turing machine to play.

It's not at all relevant to the abstraction.

>Your computer is not a computer without a hardware clock, the repeated pressing of a button is acting as a hardware clock here.

Which is exactly why it's an irrelevant detail to the turing machine computation. It's just the clock, not the digital logic.

Whether the hardware clock is internal, or external, or I have a mule rotate around a millstone to drive it, or a big chunk of quartz, is irrelevant, as long the machine is receiving it. This includes me clicking a button to send a pulse.

Let's put it another way, what you're saying is isomorphic to:

"This setup is not a turing machine but this exact setup plus a while loop to repeatedly call xsendkey for Enter is".


>This setup is not a turing machine but this exact setup plus a while loop to repeatedly call xsendkey for Enter is

Yes, that's exactly what I'm saying. And that's why the title is misleading, it's not really Notepad++'s search and replace feature that is turing complete on it's own, because turing complete means "can *Simulate* an arbitary turing machine", the 'can' here is not meant to imply a "if you sat next to it and repatedly pressed enter" kind of remark, it just means you can vary the machine being simulated by varying the input string, but once you enter an input string into the turing-complete system, it should be able to simulate the machine on it's own while you step back and watch. That's what the original machine would have done anyway, so how can you 'simulate' a turing machine if you need something it doesn't ?.

Leave Notepad++'s search and replace feature on a text file containing a transition table for 10^9 years, would it simulate anything on it's own ? On the other hand, leave a JVM running for 10^9 years... you get what I mean.

This is mostly an informal philosophical disagreement on the actual vs. the potential, real academic proofs of turing equivalence side-step the matter of simulation entirely by describing the systems involved in static terms. It's implicitly assumed there is a background animator stepping every system according to its rules.

But I think it's relevant to our intuitive definition of what a computer is, I don't think anyone ships their programs in a form where the user needs to invoke a debugger and repeatedly press 'step over' to get to the next state of the computation. Those kinds of "X is turing complete" headlines circulate on social media and leads people to think 'X' can actually act as an automatic computer, but little do they know the author uses their hand as a hardware clock without really calling attention to it.


No, Babbage's earlier design requires the crank to be turned all the time. Basically human-powered instead of fossil-fuel powered. As long as the human only needs to turn the crank without thinking, the human could always be replaced by a steam engine.

The purpose of Turing completeness is to describe whether a system is capable of performing arbitrary computation. That once a machine is Turing-complete, Church-Turing thesis stipulates it can do any computation that another Turing-complete machine can do, subject to resource and time constraints.

In the early days of computing, where designing and building a general-purpose computer was still a major practical challenge, one could imagine whether the crank is turned by a human wasn't important, because any idea that involves a physical crank can be supplemented by a steam engine, a solved problem. It's very easy to swap out the human for a steam engine - the actual novel problem was to verify that the internal logic of the machine is general-purpose enough.

In that context, one can see what the true spirit of the Church-Turing thesis is. It abstracts away the things that don't need to be part of the picture, so that people can focus on understanding the mathematical notion of computation - what is computable and what isn't, and what kind of designs are capable of computing everything that's computable.

To illustrate, you'd say it's imprecise to say a language is Turing complete - you technically still need a CPU and RAM. But those are just assumed to be available when the true focus is to design a language. Similarly, when designing a mechanical computation machine, whether we have built in the monotonic power source / hardware clock is just not a very important distinction given the context of the design. Everything that holds for Turing-complete machine would still hold, just except the machine needs a power source.

It's true that requiring constant clicks makes the result less interesting (and maybe very much so), but for the spirit of Turing completeness the power source is just a trivial matter.


It's pretty relevant. A simple arithmetic equation becomes Turing complete if a helper system is attached to execute it over and over on the previous output.

The class of things that are Turing complete with one button push is much smaller than the class of things that can run a single step of a Turing machine per button push.

I'm not here to argue about which one is the "real" Turing complete, but they're very distinct groups, and I'd say that getting into the former is significantly more impressive and interesting than the latter.


…or clicking the On-Click Transitions in Microsoft® PowerPoint™…


Not a Turing machine if it's powered by human created electricity.


Terrible and fallacious analogy, the repeated press of a button doesn't power the search+replace process, it still needs electricity to run, the repeated pressing is more like a hardware clock that repeatedly pulses to advance the computation, which does indeed makes the search+replace process alone not turing complete, every turing machine has control logic to automate the execution loop.


If you had to press different buttons you would have a point.


Ah yes, regular expressions, my favorite Turing-complete language.


So what you're saying is I could use this to parse HTML?


Parsing HTML can be done with a state machine. That’s how HTML5 defines how to parse it: https://html.spec.whatwg.org/multipage/syntax.html#syntax


Who said that regexes could not be used to properly parse HTML? ;-)


"Yeah, but your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should" - Dr. Ian Malcolm.


jokes, this is neat.


Neat, Though it does make me wonder if a python parser wont soon become standard in editors. Its just too damned convenient for many things, sure you might be able to come up with a regexp search and replace that does the same thing, but odds are it will take longer than coding a few loops. BASH sucked far too much, and C++ and most languages lacked the convenient filesystem libs required, but python just works and is easy to read and modify. The downsides of python, like the atrocious speed and low maintainability dont matter for scripts you will use just once.


I've needed such functionality often enough that i wrote a small utility[0] that uses my LIL scripting language[1] (similar to Tcl) to process some text and have its output. A couple of examples are in the shots [2][3] (note that as this is written in Lazarus it also works under Linux and Mac too, though the site only has a Windows executable).

Before that i used to write scripts or even full programs (in Free Pascal which has some simple string handling) to do similar processing and i did find it much more cumbersome to go through that route.

Of course this only works for editing/generated/transforming text pieces (and i pretty much always use it via clipboard), for processing files i still end up writing full scripts or programs (depending on the case, if i need to preprocess stuff i use another LIL-based tool, lip[4]).

[0] http://runtimeterror.com/tools/liteproc/

[1] http://runtimeterror.com/tech/lil/

[2] http://runtimeterror.com/tools/liteproc/shot.png

[3] http://runtimeterror.com/tools/liteproc/shot2.png

[4] http://runtimeterror.com/tools/lip/


I used to have a Perl script, bound to a key chord, that would pop a dialog box prompting for a s/// regexp-replace expression and apply it to the clipboard contents.

Never got around to making it properly interactive, because I stopped needing it when I switched to Emacs and had both its native capabilities and C-u M-| available.


Python just works... Every now and then I'm still bitten by the 2->3 transition. And I'd very much not like it to be a required dependency in my systems either.

And no, regexps get a bad rep but they are for the easy 99% and insanely quick to come up with. Learn basic syntax and you'll be thankful for decades to come.


Learn basic syntax and you'll spend the next few decades wondering each time _which_ basic syntax is expected because none of them ever say.


I know the basic syntax and I never have that problem. Perhaps you mean the advanced syntax? However I don't see the problem, there either. I usually don't need it, and to be honest, when I do, I find it more maintainable to use multiple simpler expressions combined with some programming.


Some by "regex" mean "globs" and you can just use "*" and "?" for stand-ins for some number of characters.

Some allow "|", some allow backrefs, some allow "()", or require them escaped with \, or allow them but not with * after. Some are case insensitive, some not. Some allow "{0-5}", some allow "[0-9]", some have handy things like "\w".

It's just the guessing game of exactly what they want. It should be required that each regex box has an example next to it using as many allowed features as possible.


> Some by "regex" mean "globs" and you can just use "*" and "?" for stand-ins for some number of characters.

But that's _not_ regexp, it's glob; a totally different pattern matching system. I mean, you can call a duck a horse, but that doesn't mean you're right... or that there's anything confusing about horses.


> Some by "regex" mean "globs" and you can just use "*" and "?" for stand-ins for some number of characters.

What kind of scary tool is that?


Glob is used in most shells, plus in several languages (Tcl has both glob and regexp matching).


Yeah, but they don't call it "regex" which would be the scary part.


Ah, I interpreted the original statement as

> Some [people] by "regex" mean "globs"


Ah, that makes far more sense. Though it makes OPs argument even more nonsensical ;)


Look for 'pcre' on the label. Accept no substitutes and you'll be fine.


*almost always: https://blog.cloudflare.com/details-of-the-cloudflare-outage...

>The Lua WAF uses PCRE internally and it uses backtracking for matching and has no mechanism to protect against a runaway expression.

https://blog.cloudflare.com/making-the-waf-40-faster/

>Back in July 2019, the WAF transitioned from using a regular expression engine based on PCRE to one inspired by RE2, which is based around using a deterministic finite automaton (DFA) instead of backtracking algorithms. This change came as a result of an outage where an update added a regular expression which backtracked enormously on certain HTTP requests, resulting in exponential execution time.

>After the migration was finished, we saw no measurable difference in CPU consumption at the edge, but noticed execution time outliers in the 95th and 99th percentiles decreased, something we expected given RE2's guarantees of a linear time execution with the size of the input.


You must not have suffered enough, I mean used regexs widely enough.

He meant “syntax” in the sense that different regex engines have different syntax and capabilities - can I do a negative look ahead assertion in engines Z, how do I do a zero width lookaround in pcre, gnu, python, posix, etc.

Depending how far down the rabbit hole you want to go, start here:

https://swtch.com/~rsc/regexp/regexp1.html


Oh and dont forget that just because the documentation says it uses a specific regex syntax, that does not mean its correctly implemented. It gets worse when an application examples specify a specific syntax but what happens in the background is that it will use the default reg exp engine, which can be system or enviroment specified.


I just don't count those cases as 'basic' syntax.


Basic - naming a capture group. Awk / posix - \1 Perl - $1. Python - (?P<name>).

I’d find it hard to ignore the specifics. I mean if you only ever use one tool and were never exposed to other regex engines I guess.


Quick: which programs use ‘|’ and which use ‘\|’?


I wouldn't be happy with Python, but I do agree with the general premise that extensibility or automation ought to be expected of editors.

My own editor is written in Ruby, and so all extension is done by loading Ruby code into the running process, and I can drop into the Pry debugger with a keypress, or another keypress gives me a prompt to enter a single-line expression instead. The latter is literally a one-line method. Adding a binding to eval() a whole buffer would be equally trivial... Being able to extend everything trivially in a language I'm comfortable with (so not Emacs lisp) makes such a difference to usability.

Incidentally, the ability to interact with the open buffers using a script also from outside the editor is another thing I love as an extension mechanism for editors - an idea I first saw in FrexxEd (co-written by the founder of Curl) for the Amiga, which exposed the open buffers in the filesystem (think the Amiga equivalent of a FUSE filesystem), which would have the added benefit of not being language specific. It doesn't need to involve any FUSE-like stuff either - just a command line utility to "cat" an open buffer and to replace the open buffer from stdin would be sufficient.


Out of curiosity, what do you actually use the Ruby liveloading feature for? I use Neovim, which has a similar (but less powerful) feature that allows one to live-execute Lua code (or Vimscript, I suppose). But I almost never use it outside of testing code for my configuration, because the rest of the editor's feature set suffices fairly well for text editing.


I don't very often load code "live" other than on starting a client in the form of "macros" which are pretty much stuff I don't want to add into the editor core because it e.g. might be short lived or is just too esoteric. Even though my editor is mostly for myself, imagine anything you might put in your .vimrc, except it's only self-discipline that separates this from core editor code.

E.g. I have functions in there to insert headers in my journal for example.

In terms of executing code "actually" live, it's ~80% for debugging when I work on the editor itself. Pry provides all of the plumbing so all the code needed to add that is just to suspend the editors own input handling and call pry, coupled with an exception handler that calls pry as well, so there was no reason not to, basically.

But once I'd added it, the 20% left felt worthwhile as a means of e.g. do complex searches, or generate tables or otherwise do search and replace of content that requires more (e.g. parsing timestamps and replacing them with another format....) and any number of things I don't do very often but that feels very comfortable when you do need it. To be clear it's not like it does much you can't do easily without it. E.g. after all I could just dump the buffer to a file, load it in my repl of choice, manipulate it and write it out again and reload it in my editor. It's one of those small things that feels unimportant when you don't have it there, that doesn't save you a huge amount of time, but that just makes things feel nicer when you get used to them.


Vim can run Ruby scripts, so it also provides this feature.


Yeah it does not need to be python, I dropped emacs lisp due to (pain), never liked bash, or perl. Java like in matlab isnt nice either, too damned verbose. Ruby might work, most webstuff is great at showing stuff, but I havent tried it.

The generic variant would be to make it modular and replaceable, but that would be a mistake, I would much rather have a tool good at a specific subset of jobs than a config heavy tool which is medicore at all. Perhaps advanced search and replace fits that, but I think its a fundamental aspect of editing.

Thats a good idea!


If anyone whose life line says ‘makes an editor’ is reading the above: I beg you to use Lua instead of Python. It's a lot faster, and that matters in productivity apps. Speed can be the difference between ‘I wrote a script’ and ‘I could write a script but didn't’.


You can record and edit beanshell macros in JEdit. It does exactly what you described but you also have access to all the Java Standard Library (including regex).


That sounds like a felony abuse of Python. Despite, wasn't perl originally a tragic attempt at such language? :)


How about elisp?


How do you sandbox python?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: