Forth implemented in Rust trait system (github.com/ashymad)
157 points by Ashymad on June 12, 2020 | 84 comments



I don't see why people won't just take the step D and Lisp do-- allowing full use of the programming language at compile time.

You can execute ordinary functions at compile time to read a DSL from a string, or read attributes (reflective metaprogramming) on your program's classes. Take the string it outputs, use mixin(), and you have code. For example:

    // Sort a constant declaration at Compile-Time
    enum a = [ 3, 1, 2, 4, 0 ];
    static immutable b = sort(a);
"a" only appears in the compiler's memory. "sort" is a normal function that runs at compile-time. "allowing accessto the full language at compile-time" is similar to what dynamic languages such as Python and JavaScript give you, except D is a static language with GCC and LLVM backends.


Rust is getting that, gradually, with the work of replacing the ad-hoc "const evaluator" with miri (an interpreter for a Rust intermediate representation). Right now you have procedural macros, which are Rust code that operates on the syntax tree, including custom attributes.

Proper reflective metaprogramming would be a fairly big step though - right now, the macro systems happen well before the type system even gets a chance to look at the code, so the data to play with types in an interesting way isn't there at the right step.


Rust is getting that, gradually.

Uh oh. I'd hoped the language would settle down.

The Go crowd knows when to stop. Go is mediocre, but stable.


You say that like mediocre is a compliment. But I guess "mediocre" has been the best descriptor of Go from its inception.


Kind of, yes. Go has a specific goal - replace C++ for server-side web related stuff. It's supposed to be dull, boring, reliable, and suitable for use by large numbers of programmers doing that kind of work. It's a success at that.

Go got some things right, like "goroutines". Other languages are trying to add threads to a language with "async" type callbacks (Javascript), or "async" type callbacks to a language with threads (Python). Those concepts do not play well together, especially when retrofitted. Go does have a one-size-fits-most solution which handles the main use case - a server process handling a very large number of intermittently sending clients.

The same can be said of "functional" add-ons. Full-on functional languages can be OK, but functional features in an imperative language tend to be on the painful side syntactically. Retrofitting "functional" is even worse.


It's possible to stop too soon though.


[flagged]


It would be great if you could avoid aggressive attacks on other people's views.


One issue not yet mentioned with a Turing-complete language at compile time is that it makes tooling and IDE integration much more difficult.

When you need to run an unbounded program each time you want to provide real-time feedback, like type inference or, in Rust's case, lifetime inference, you make the language tooling much less simple and accessible.


> One issue not yet mentioned with a Turing-complete language at compile time is that it makes tooling and IDE integration much more difficult

How so?

> When you need to run an unbounded program

How is a program that provably terminates but takes 2 years to finish any better for compile-time computation? You want timeouts in either case.


>How is a program that provably terminates but takes 2 years to finish any better for compile-time computation? You want timeouts in either case.

I'm not sure how realistic that actually is. Besides certain party tricks like encoding the Ackermann function using primitive recursion, I haven't actually seen anything like that. From my experience the vast majority of programs that don't finish in a reasonable amount of time are those that have some logic bug.

A timeout seems pretty off-putting. The idea that compilation could fail on a weaker machine just because it isn't fast enough just doesn't sit right with me.


> A timeout seems pretty off-putting. The idea that compilation could fail on a weaker machine just because it isn't fast enough just doesn't sit right with me.

The timeout should be set based on what the user of the machine considers acceptable. Why should I have to tolerate a 2 hour wait time for a compile-time computation to finish in order for the auto-complete menu to appear on emacs just because I am using a laptop from 2010?

> Besides certain party tricks like encoding the Ackermann function using primitive recursion, I haven't actually seen anything like that

Try Prolog then (or something similar) and write something of the form sha512(X) = 0; this will initiate a brute-force search that will last essentially forever.


>Why should I have to tolerate a 2 hour wait time for a compile-time computation to finish in order for the auto-complete menu to appear on emacs just because I am using a laptop from 2010?

I don't understand. If you aren't willing to wait that amount of time, you're simply not getting working auto-complete at all. Hell, if you can't even compile the project, I don't see how you can meaningfully work on it at all.

>Try Prolog then (or something similar) and write something of the form sha512(X) = 0; this will initiate a brute-force search that will last essentially forever.

This is literally just another nonsensical party trick. I'm talking about something that someone might actually want to use, not contrived examples that really don't matter at all. I don't care to consider imaginary people that intentionally brick their programs.


I just realized that you said "are those that have some logic bug" rather than "are those that, due to a logic bug, loop forever", and also mentioned that this is true for the "vast majority of programs"... in which case, sure? If anything this is an argument that supports timeouts, and it can happen in both Turing-complete and non-Turing-complete languages. Plus, I do not understand how popularity has anything to do with it.

> If you aren't willing to wait that amount of time, you're simply not getting working auto-complete at all

Or I am getting working auto-complete for code that does not make the completion system get killed by the timeout.

> Hell, if you can't even compile the project

I never mentioned not being able to compile the project.


A user configurable parameter for timeout would be a terrible language design mistake.

You would want as much as possible to make the timeout not based on runtime -- but on an abstract model of the computation which doesn't actually depend on any machine particulars. "Hey your program didn't compile on my machine -- try a faster cpu" would be a nightmarish interaction.

You want constraints about how long these programs are allowed to execute to be tight enough for developers to learn a sense for what is reasonable to do with these features -- as opposed to waiting for runtime.


So, the user should not have the option to say "hey, I want this done within 10 minutes because I have to leave soon" or "I want my auto-complete to be done within 1 second", got it. Instead we should care about the feelings of arm and atom cpus.


The tractable replacement for a timeout is some kind of step count. I think I've seen lots of long-running static-analysis tools use this for reproducible builds and results.


The analog of a timeout in a compiler is recursion depth limits.


Or just instruction count. Although in either case, while we are now machine-agnostic (with some reasonable assumptions about the design), changing the optimization flags that a build is running under might make it fail.


That alone definitely won't suffice. A naïve Fibonacci implementation will give you a recursion depth of n, but even fib(64) takes ages.


I don't think he's saying that a limit on recursion depth solves this problem. Rather, it's just a similar scenario where a real-world constraint (memory, time, whatever) comes to bear and may make a compilation fail on machine X where the identical code and environment would otherwise succeed on machine Y.


Alright, but that's not a feature, right? Stack limits are just a natural constraint. Timeouts are something that you'd intentionally introduce, when you don't necessarily need them. And if possible, I'd like to minimize dependency on the specifics of the machine as much as possible.


I dunno, I think they're pretty similar— both are an arbitrary cap that protects against exhausting a real world resource (time, memory).

In any case, the compile time evaluation would obviously still have a bunch of interesting constraints— like, probably reasonable to read in the contents of a file, but is it reasonable to make a network call? What about reading a file that's on a network share? Can you even tell the difference? That operation could also hang for reasons entirely outside of algorithmic complexity.

And of course, like a stack overflow, that's also true today; you can hang your compile by including a header from a network share that hangs.

So would it really be that bad to have the compilation hang, and just show a stack trace for the invoked-at-compile-time code when interrupted?

I don't think the halting problem is a deal breaker here.


It's not a limitation of the machine, it's a setting in the compiler.

Try doing anything pretty fancy at compile time with templates in C++ or macros in Rust and you'll quickly need to set the recursion depth limit higher.
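
For instance, a rough sketch of the Rust-side knob (purely illustrative; a token-munching macro recurses once per token, so a long enough invocation trips the default limit of 128 unless the crate raises it):

  #![recursion_limit = "256"] // crate-level attribute; raise it when expansion gets deep

  // Each expansion peels off one token and recurses, so the expansion depth
  // grows linearly with the number of tokens passed in.
  macro_rules! count {
      () => { 0usize };
      ($head:tt $($tail:tt)*) => { 1usize + count!($($tail)*) };
  }

  fn main() {
      // 5 tokens here; with a few hundred, the default limit would be exceeded.
      println!("{}", count!(a b c d e));
  }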


A language that is Turing complete at compile time means that an IDE cannot reliably tell whether code has a syntax error without running a program that can take arbitrarily long.

All other tooling faces the same issue. What variables have a given type? If determining the type takes arbitrarily long, how are you supposed to discover them? Now consider the plight of an automated refactoring tool.

The IDEs and tools can still be written, of course. But they take more work and can't be as reliable.


> A language that is Turing complete at compile time means that an IDE cannot reliably tell whether code has a syntax error without running a program that can take arbitrarily long.

A language that is not Turing complete at compile time does not necessarily mean that an IDE will be able to reliably tell whether a certain piece of code has a syntax error without running a program that can take an arbitrarily long time to finish.


It is possible to construct counterexamples, but in practice it tends to work out much, much better.

The various tools built around Java stand as an example.


No, that's not true actually. It's possible that the language is defined so that the syntax checker (lexing, parsing, name resolution) and even the type checker run first, with compile-time execution after that. The only thing the undecidability then affects is whether we get the constant value or whether the program runs forever. But in the latter case, we have still successfully performed syntax checking.


I suspect a big reason is that you need an interpreter in addition to the compiler. From what I understand, the D gods spent quite some time and effort developing theirs.

For interpreted languages, there are no excuses; hooking into the interpreter at compile time is trivial. Full macros [0] are nice, but depend a lot on the syntax. A way to evaluate expressions at compile time [1] would go a long way.

[0] https://github.com/codr7/gfoo#macros

[1] https://github.com/codr7/gfoo#bindings


If you do not want to implement a separate interpreter you can do multiple compilation passes instead.


True, but in the cross-compilation case, you then need to do fun stuff like compile anything invoked at compile time for the host arch as well as the target.



Another issue not mentioned is cross-compiling.

If the result of my `sort` function is dependent on `sizeof(void*)` being 8, when I compile from my x86 hardware for a 32-bit-only architecture, assumptions go awry.

Note that this isn't likely with something like sort, but definitely *is* likely with precomputing values/structs etc.
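
The same hazard, sketched in Rust terms (rustc's const evaluation actually uses the target's data layout, so the constant below comes out right per target; a compile-time facility that simply ran host code would bake in the host's answer instead):

  // 8 on x86_64, 4 when cross-compiling to a 32-bit target.
  const PTR_BYTES: usize = core::mem::size_of::<*const u8>();

  // Hypothetical derived value: wrong for the target if PTR_BYTES had been
  // computed with the host's pointer width.
  const PTRS_PER_CACHE_LINE: usize = 64 / PTR_BYTES;

  fn main() {
      println!("{} pointers per 64-byte cache line", PTRS_PER_CACHE_LINE);
  }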


Zig takes this approach. I haven't done more than kick the tires curiously, but it seems very cool.


They do. You can.

You have procedural macros, which are literally "Rust code is the input, which you write rust code to manipulate, and output more rust code".

You have const fns, which are interpreted at compile time, and are closer to what you're referring to.
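
Roughly, as an illustrative sketch (not the whole story):

  // A const fn is evaluated by the compiler, so its result can be used
  // anywhere a constant is required, e.g. as an array length.
  const fn squared(n: usize) -> usize {
      n * n
  }

  const N: usize = squared(4);

  fn main() {
      let buf = [0u8; N]; // a [u8; 16] whose size was fixed at compile time
      println!("{}", buf.len());
  }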


Terra[1] does this using Lua as its compile-time language.

[1]: http://terralang.org/


The most useful values at compile time are types, so straight-up interpreting the core language at compile time is not enough. You need meta-typed named values to transport types in, and meta-functions to pass them to.

Fortraith cleverly re-purposes existing features, doing some violence to language usage conventions to achieve it.


> so straight-up interpreting the core language at compile time is not enough. You need meta-typed named values to transport types in, and meta-functions to pass them to.

Can you please explain what you mean by this?


In Rust (for example), Types are not Values. You can't store them in variables, pass them to functions, etc.

However, at macro-expansion time, often what you want to do is examine a Type and expand the macro differently based on some information about the Type.

So you would like your Macro language to have some notion of manipulating Types that is absent from the language itself.
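
A rough sketch of what such a "meta-function" looks like with today's traits (names made up for illustration):

  use core::marker::PhantomData;

  // A trait with an associated type stands in for a function from types to
  // types, since types can't be passed around as ordinary values.
  trait TypeFn {
      type Output;
  }

  // "WrapInVec" maps any T to Vec<T>.
  struct WrapInVec<T>(PhantomData<T>);

  impl<T> TypeFn for WrapInVec<T> {
      type Output = Vec<T>;
  }

  // "Calling" the meta-function happens in type position, not value position.
  type Wrapped = <WrapInVec<u8> as TypeFn>::Output; // = Vec<u8>

  fn main() {
      let v: Wrapped = vec![1u8, 2, 3];
      println!("{:?}", v);
  }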


It should be noted that gcc and clang both execute certain functions at compile-time at their higher optimization levels.

In addition C++ has constexpr which marks an expression to be evaluated at compile-time.


The recent signs are that in near-future Rust releases you are able to have rudimentary logic in constant expressions: if, match, loop etc. (Around Rust 1.47 ish. That's three months away.)

When the capabilities of the constant evaluation system get improved and stabilized, we are going to see the exact feature you are describing. (With the caveat that non-deterministic functions are not allowed to ensure deterministic builds.)
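
For a flavor of what that unlocks, a sketch (illustrative only, assuming loops and branches in const fn are available):

  // An ordinary-looking function, fully evaluated by the compiler.
  const fn factorial(n: u64) -> u64 {
      let mut acc = 1;
      let mut i = 2;
      while i <= n {
          acc *= i;
          i += 1;
      }
      acc
  }

  const FACT_10: u64 = factorial(10);

  fn main() {
      println!("{}", FACT_10); // 3628800
  }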


> I don't see why people won't just take the step D and Lisp do-- allowing full use of the programming language at compile time.

Because "full use of the programming language" implies Turing completeness, which means compilation may require unbounded time and compute resources. You can allow use of a non-Turing complete subset, and this is something that dependently-typed languages can do quite elegantly.


Many type systems are already Turing complete, like C++'s and indeed Rust's, as evidenced by the project linked here.

It’s a valid point. If compilation already is Turing complete, why not just drop the pretense and allow arbitrary compile-time expressions?


Even where nothing in compilation is Turing complete, many pieces may be unbounded. And where there are artificial bounds, they could as well be applied to something otherwise Turing complete.


A program that requires 2^256 time units and 2^256 memory units is not really any better compared to unbounded computations.


Lisp failed as an industrial language precisely because it has powerful meta programming capabilities. In the real world, nobody wants to have to solve a puzzle every time they visit a new area of the codebase.


This is just nerd flexing. Rust has a slightly more flexible compile time code generation story than D. There's even a package that connects to your SQL DB and type-checks your queries.


I hope not.

I really don't want Rust to be crippled with a thousand DSLs just because devs could write them easily.


Doesn’t the compiler already do optimizations like that?


Because that's fundamentally incompatible with a competently-designed compiled language. Consider:

  fn immutable foo() long:
    asm(long x:"rax") "mov rax 7"
    return x
Now try cross-compiling that from a 32-bit ARM machine.

Aside: D is kind of weird in this regard because most of it is designed to work as an interpreted language as well as a compiled one. To the extent that D is good, it's not compiled[0]; to the extent that it's compiled[0], it's not good.

0: in the language design sense, not the language implementation sense.


Pretty much all languages with turing-complete compile-time expressions only expose a subset. This comfortably lands in the "not in that subset" category.


> [...] and Lisp do-- allowing full use of the programming language at compile time.

If you only expose a subset, you very specifically don't have the full use of the programming language.


Now you're just being pedantic. Also, leaving out the "D" just before your quote is plain disingenuous. That contextualizes the discussion and clearly establishes we don't mean being able to directly run ASM operations and syscalls willy-nilly at compile time.


Definitely interesting, but I don't see any mention of being able to use this interactively, or of incremental compilation, which is what makes Forth, Forth. Forth isn't just syntax. It's an operating system and a programmable programming language.


The term "concatenative language" fits better for adaptations of Forth that also discard major semantics of Forth. There is a holistic quality to Forth that results in a lot of idioms that are not obvious, and concatenative systems that deviate from the whole tend to fall into an unexplored territory.

For example, in CASE ... OF ... ENDOF ... ENDCASE, the input value is consumed when OF is entered. But this means that the default result, positioned before ENDCASE, has to move the input value to the top of stack so that it can be consumed before the result. The examples in the ANS standard prefer using >R ... R> as a scratchpad for this purpose, so that any number of results may be pushed onto the data stack, while the input is moved onto the return stack and then pushed back to the data stack. But the whole existence of the return stack and words that use it is a curious detail that doesn't come up if you start from an RPN calculator instead of a complete Forth system. Someone writing an RPN system in an applicative language, upon seeing this semantic, might be tempted to beef up the syntax rather than add this idiom. And in Forth this idea was likely only arrived at through the numerous iterations Moore made to achieve better expression in fewer words, since the return stack is useful everywhere.


You got me. I was too lazy to implement the return stack. But it should be fully possible. The hardest part was not implementing things in the trait system, but parsing any non-trivial syntax using the clunky macro_rules!. For this reason if/else/then are not a special syntax but words that block and unblock subsequent word execution.


The trait system and macros run at compile time and sadly there is no way to interact with the Rust compiler. However, the macro is fully incremental (creating new words too), which means that in theory, if it were possible to ask for input and pass it to the macro, it would behave more like a VM:

  // create a word
  forth!(: inc 1 + ;);
  // ask user for $input
  type Stack = forth!($input inc return);
  // if you keep the stack around it can be used again
  forth!({ Stack } inc .);
  // should print $input + 2


This is a pretty superficial treatment of Forth. It's more of a simple RPN calculator than a Forth.


This is brilliant! Equivalent, in its way, to C++ template metaprogramming. It was cheeky to do a Forth instead of an ML. The traditional way to demonstrate TC is by sieving primes at compile time.

C++ is actively replacing its compile-time ML with core language features, but isn't there yet. Still, it has been many years since I needed to code any of my own TMP.


Sort of; arithmetic is implemented (through the trait-eval crate) from the Peano axioms. It's probably not very fast, even by C++ standards.


Indeed, it could possibly be sped up by using typenum[1] instead of trait-eval, as it is not based on the Peano axioms.

[1] https://docs.rs/typenum/
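
A rough sketch, assuming typenum as a dependency (the addition happens entirely inside the trait system):

  use typenum::{Sum, Unsigned, U3, U4};

  // Type-level integers here use a binary encoding, so arithmetic doesn't
  // blow up the way a unary/Peano encoding does.
  type Seven = Sum<U3, U4>;

  fn main() {
      assert_eq!(Seven::to_u32(), 7);
  }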


> Rust's trait system is Turing complete

has anyone tried implementing rust in rust's trait system


How about making an LLVM backend which compiles to rust traits?


You're thinking too small. I want to see it run Doom ;)


Doom might actually be easier! Implementing borrowck is hard.


Turing complete ≠ uses the syntax of your favorite programming language.


Any Turing-complete system can do anything a general computer can do, which includes compiling your favorite programming language.


No, that's not what Turing complete means. Turing complete means it can do an arbitrary computation, but "implementing Rust" usually means it needs to be able to take in a string of code and produce a binary, which means your program needs to have some way of actually doing that. Sure, you can encode the compiler into the Turing machine, but an arbitrary Turing-complete tarpit may not actually have the syntax to know what a string is. Usually the best you can do is encode the programming language into some form the machine can understand. (For example, with Fractran you'd encode your input as some sort of Gödel numbering before giving it to the program.)


Compiling is an arbitrary computation.

Encoding the inputs/outputs for a given TC system may be a pain, but that is irrelevant to its expressive power.


Right, I'm not saying you can't put the compiler for the encoded input/outputs into the machine or that it's not expressive enough. I'm just saying that you'll have to encode stuff, it's not like traits somehow can magically give you a "compileToRust()" function that you pass the unmodified code into.


Yeah, but when someone mentions some system is TC they are not claiming input/output is easy to encode.

In fact, in most cases, when you hear the TC claim it is about an exotic system :)


Fractran accepts any positive integer as an argument, and it is trivial to convert any, say, UTF8 encoded string to a positive integer.


Incorrect.

Turing completeness means using the syntax of your favorite programming language is "merely" a matter of writing the appropriate program.

What GP is proposing is completely impractical, of course. But not unreasonable, and certainly not impossible.


No, because the program may not be able to actually understand the syntax of your favorite programming language. As I mentioned in another comment, FRACTRAN is Turing-complete and you cannot just shove a string of Rust code at it because it has no idea what a string is; the best you can do is implement the "compiler" as working on some numerically-encoded version of Rust and producing some encoded version of a native binary on the other side. So you've lost the syntax because you've had to do that additional encoding.


I understand the distinction between a string and a numerically-encoded version of it, but my computer doesn't.


But your computer does understand the distinction between a string numerically encoded as consecutive bytes in memory, versus a string encoded as an arbitrary-precision integer equal to ( 2^(first ASCII value) * 3^(second ASCII value) * 5^(third '') * 7^(fourth '') * ... ).
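
A quick sketch of that second encoding (assuming the num-bigint crate; purely illustrative):

  use num_bigint::BigUint;

  // Gödel-number a short string as 2^b0 * 3^b1 * 5^b2 * ...: the kind of
  // encoding you'd need before a FRACTRAN-style machine could "see" source text.
  fn godel_encode(s: &str) -> BigUint {
      let primes: [u32; 8] = [2, 3, 5, 7, 11, 13, 17, 19]; // enough for 8 bytes
      let mut n = BigUint::from(1u32);
      for (&p, b) in primes.iter().zip(s.bytes()) {
          // multiply in p^b via repeated multiplication to keep the API surface tiny
          for _ in 0..b {
              n *= BigUint::from(p);
          }
      }
      n
  }

  fn main() {
      println!("{}", godel_encode("fn main"));
  }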


You are misunderstanding the difference between “can in theory” and “presently able to do.”


I don't think so; would you mind explaining why you think that?


A Turing-complete language can, in theory and assuming no real-world limitations like RAM, do any computation. You can use Rust traits to compile Rust, or run a JVM instance, or whatever. Input and output are limited to what the compiler has access to, but you could perhaps have input as source files and output as text strings in a Rust binary.

That doesn't mean that the OP's hack can be used today, or even tomorrow for compiling-Rust-in-Rust. But you could, in theory, do so.


My point is that it's not actually going to be "you can type Rust code here and the trait system will magically compile it"; at best it is going to be "you make some trait abomination that somehow maps to the Rust program you wanted to compile, and the machine will give you back some other abomination from which you can, through some decoding process, get some sort of useful result".


Wow that's what I call an emergent property!


Sick! :-)


Finally, Rust is production-ready.

It is neat to see something more practical than a Brainf* interpreter.


A list of even more practical pieces of production code in Rust: https://wiki.mozilla.org/Oxidation#Shipped


I suspect you missed the /s



