I only looked for a few minutes, so first impressions:
1. Try the declaration syntax `x Foo;` instead of `Foo x;`. I tried it before; you might like it.
2. I think the way you're defining the AST types is a crapload of work. You'd be better off with a bunch of dumb structs, all in one file.
Then you can see everything at once, and you aren't mixing AST representation with codegen logic. Sometimes that's a better way to do algebraic types in C or C++ (roughly like the sketch at the end of this comment).
3. I don't know what `type Foo struct {...}` does, but you'll save a lot of work if the type system only has names as types (nominative typing), without losing usability.
4. Personally I'd parse straight to the AST type you define and not use the mpc lib with its own AST implementation. I don't believe in parser combinator libraries, especially not in C. It's better to copy/paste those loops. Better than using a parser generator too. But since you have a parser already... not right now.
Edit: 5. Avoid looking at Zig, Myrddin, etcetera, if you can. There are obviously paths that any C-like language tends to go down in the 21st century, and the world would probably be better if you rethought the problems from a blanket slate.
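For point 2, here's roughly what I mean by dumb structs in one file. The node kinds are made up, not whack's actual AST; the point is just a plain tagged union with no codegen logic attached:

    /* ast.h -- every node shape in one place, no codegen logic mixed in.
       Node kinds here are illustrative, not whack's real AST. */
    typedef enum { EXPR_INT, EXPR_VAR, EXPR_BINOP } ExprKind;

    typedef struct Expr Expr;

    struct Expr {
      ExprKind kind;
      union {
        long        int_value;                       /* EXPR_INT   */
        const char *var_name;                        /* EXPR_VAR   */
        struct { char op; Expr *lhs, *rhs; } binop;  /* EXPR_BINOP */
      } as;
    };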
> Edit: 5. Avoid looking at Zig, Myrddin, etcetera, if you can. There are obviously paths that any C-like language tends to go down in the 21st century, and the world would probably be better if you rethought the problems from a blanket slate.
I read the points above this one and can't really comment on or disagree with them. However, I am very curious about (5).
- What would you classify as "etcetera" here? (i.e., what other languages would you list?)
- What are the paths in question?
- With said blanket slate, what other mindset may be useful to keep in mind?
Blanket slate, damn iPhone. More of a blank slate.
I don't know, it's a general problem of balancing the originality you might get from not looking at other people's work, with what you miss out from not looking at it.
"Etcetera" includes other low-level languages people've made. Like, I guess Go might even count. Honestly 5 is kind of stupid. No great reason not to ignore it. It's the sort of thing to do for 1 month, but not forever.
By "paths" I mean different features that different languages have and how they do them. You could just copy how these languages try to improve the ergonomics around error handling, for example, or you could decide how you'd like to do it. Thinking from first principles it's likely you'll end up walking into exactly the same decision other languages make, only with a different choice of operator. But it's possible you'd improve matters.
Other paths are questions like implicit conversions, or how explicit conversions happen. And what do you name the bitwise negation operator? Can you do pointer arithmetic? How do you handle pointers to array elements? Do you have a one-to-one mapping from identifiers to identificands?
With creative works you are necessarily adapting and building on previous work. What first principles as a goal does is allow you the possibility of making new mistakes...and while that's good in a research sense, for most practical design you want to restrict the deep experiments to a certain focus area and loosely copy the rest so that you have a stable framework to test those ideas with. Shooting off in an all-original direction basically ensures you will make mistakes.
> I don't know, it's a general problem of balancing the originality you might get from not looking at other people's work, with what you miss out from not looking at it.
Yeah, that's a fun one. The impression I get is that the only way you can reverse that knowledge bias is to have sufficient knowledge of and experience with a given field that you can look at all possible approaches objectively. But that creates a paradox, since you can only gain said amount of knowledge by studying others' work... :/
So, you're saying I should avoid looking at other languages for a month? Sorry, not clear on this bit.
As for error handling, that's a tough one. Try/catch is great for worlds without recursion (as inherent in OOP, or elsewhere), if you ask me. Go's fancy "returning multiple values is a first-class idea, I am so awesome" and the resulting `val, err` is... I guess you're forced to type that out every single time and thus forced to think about it, which is good, but it still feels really inelegant. Erlang's atom-based {ok, Value} / {error, Reason} return-value approach seems interesting/nice/cute - but, admittedly, only because I haven't tried to actually use it (yet) ;)
How would I handle errors myself, pretending I hadn't written the above? Hmm. (Now all I can do _is_ think of the above. :D) Well, having something like Error/Fail be a first-class type next to True/False/NULL could be interesting; then I could do `if (something()) { ... otherthing() ... }` style constructs but with added enlightenment about failure states, so I could write `if (something()) OK { ... otherthing() ... } else Fail { ... cleanup() ... }` or similar. (In this case the success/failure state would be propagated within the scope of the if block, with appropriate scope analysis to look for ambiguity.) This is basically just renamed try/catch though, and all I've done is concretely demonstrate that language design requires investments of more than 15 minutes, heheh :)
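As an aside, the plain-C ancestor of the Go pattern above is a status return plus an out-parameter; a minimal sketch of the classic idiom, nothing whack-specific:

    #include <stdio.h>
    #include <stdlib.h>

    /* Return 0 on success and write the result through an out-parameter;
       nonzero means failure. Go's `val, err` pattern descends from this. */
    static int parse_positive(const char *s, long *out) {
      char *end;
      long v = strtol(s, &end, 10);
      if (end == s || *end != '\0' || v <= 0) return -1;
      *out = v;
      return 0;
    }

    int main(void) {
      long n;
      if (parse_positive("42", &n) == 0)
        printf("ok: %ld\n", n);
      else
        printf("parse failed\n");
      return 0;
    }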
That being said, on bitwise negation... my first instinct is to make that a function. Then I started thinking about in-source dynamic DSL lexing like Perl 6 has, "so the user can set up their own operators", and then I suddenly realized I was reinventing Forth. Raincheck #2.
Pointer arithmetic... depends on the language in question, and whether it's so low-level you want unfettered access to memory. I consider this from the perspective of something like PHP, which offers enough low-level access to be useful in a lot of situations, but still leaves me high and dry when I least want it to. The problem of course is whether I want a language that does its own memory management or not, and that's a question I'm really scratching my head over, actually. (I now realize/remember PHP gets away with its relative simplicity because it's an interpreter, and that this comparison is a bit wonky. Raincheck... #3?)
Pointers to array elements are a C-ism. I'm 100% sure this can be cleaned up to be a bit more elegant, even in the context of a low-level language that allows for memory twiddling.
--
When I initially read your comment, and before I typed out all the above, for a bit I really started wondering about the balance problem you opened with. The fun paradox (if my theorization is even half correct) I mentioned is one way to look at it, but it _is_ really hard, and I didn't have any good ideas about a solution.
One idea presented itself as I finished reading, in the form of the question "...what on earth are identificands?!"
I had no idea what that meant. And this gave me a thought.
I wonder if it would be possible to publish a language-design tutorial in the form of a gigantic pile of unanswered questions: ones that explain enough to build an understanding, but don't suggest or hint at any one particular solution to a given problem?
Obviously such a work would involve significant reinvention of a lot of wheels and a lot of duplication of work. But I wonder if it wouldn't result in a deeper understanding of the problem domain, and maybe even some newly sparked ideas.
I made up the word identificand. Like, integrand, subtrahend, identificand.
I mean, don’t take me too literally about the month thing. Obviously your mind is already poisoned by other languages. But I’d say, try to avoid just doing what other languages do, and inject some novelty. If only like a chess player going off-book.
I think you'll get more comments if you include specific code samples.
How would you implement the Fibonacci sequence in whack? How might one organize a simple game of hangman?
If it's suitable for use as a TCP server/client, how about an "echo" client?
Things like that can really show off the stdlib and the syntax choices.
Relatively few people will read through the code, and even those that do will likely understand the code better if they start from "this is the syntax or idea that needs to be implemented" as expressed in example code.
Documentation is also, obviously, more coherent to read than implementation code, and you don't seem to have any documentation explaining what features whack has (other than a note that it doesn't have a comprehensive type system).
But that function is defined as taking a Bool. What is this call supposed to do?
I’m also curious how you plan to implement match type. How would this work if you give it, e.g., a char? Will the compiler know what the type is and pick the right clause? I don’t really see a way to do it by inspecting the data at runtime, so maybe the pointer would have to have runtime type information attached to it, but then you would need to transform that info when dereferencing the pointer, as this info can’t live in memory next to the pointed-at objects if you want C compatibility.
I also can’t really tell what match type is for from your example. Are you intending to have inheritance and then using match type as a kind of ad-hoc polymorphism (e.g. is my Animal a Dog or a Cat?), or some sort of weird template-like thing, or something else entirely?
If you allow subtyping then does “func(Dog->Int)” successfully match something of type “func(Animal->Int)”?
Will commence working on a doc to explain some design choices, and what some non-obvious code fragments do; will update here when I commit.
There's no subtyping being done currently.
The 'match type' construct matches the type of an expression at compile time.
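For comparison, C11's `_Generic` does something in the same spirit: it selects a branch from the static type of an expression, entirely at compile time. A rough analogy (this is plain C11, not whack syntax):

    #include <stdio.h>

    /* _Generic picks an association based on the controlling expression's
       type at compile time -- no runtime type information needed. */
    #define type_name(x) _Generic((x), \
        char:    "char",               \
        int:     "int",                \
        double:  "double",             \
        default: "something else")

    int main(void) {
      char c = 'a';
      printf("%s\n", type_name(c));    /* char   */
      printf("%s\n", type_name(42));   /* int    */
      printf("%s\n", type_name(3.0));  /* double */
      return 0;
    }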
I should also add that some design choices may be reviewed before the first release.
Interesting project! I've written compiler frontends for both GCC and LLVM, and surprisingly found it easier to write one for GCC, despite LLVM's reputation for modularity. I'd love to hear why you chose LLVM for code generation over something else!
In my experience, LLVM has too much churn for understaffed projects to seriously consider. When I’m developing small languages my goal is to reduce overall work or, barring that, keep the work constant and get some other benefit. While I always reach for C++ first, I’m under no delusion that it’s a fit language for describing an easily portable, easily consumable, and stable API/ABI. For that work, C is the undisputed grand champion. As such, I generally just translate to C. With the vector intrinsics provided by Clang (or GCC), I can still target all the features I need.
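The vector intrinsics I mean are the GCC/Clang vector extensions; a small sketch that compiles with either:

    #include <stdio.h>

    /* A 16-byte vector of four floats; arithmetic on it is element-wise
       and gets lowered to the target's SIMD instructions. */
    typedef float float4 __attribute__((vector_size(16)));

    int main(void) {
      float4 a = {1.0f, 2.0f, 3.0f, 4.0f};
      float4 b = {10.0f, 20.0f, 30.0f, 40.0f};
      float4 c = a + b;
      for (int i = 0; i < 4; i++)
        printf("%g\n", c[i]);   /* GCC and Clang allow subscripting vectors */
      return 0;
    }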
The halcyon days of high level languages like C are far behind me. These days I target custom ISAs using lovingly hand-crafted machine code. My goal is to write assemblers =)
I'd recommend writing a grammar, having that generate your AST, then doing some transformations on the AST to generate code. You will save a lot of time.
I recently did that for a language that I made, via instaparse. The flexibility and speed I gained were significant. My language isn't Turing complete, but it has functions, lookup tables, and some pattern matching.
> I'd recommend writing a grammar, having that generate your AST, then doing some transformations on the AST to generate code. You will save a lot of time.
I had this thought, so I used a parser combinator (mpc) to generate the AST from the grammar and source file, then extract useful elements from the AST for codegen.
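For the curious, the basic mpc flow looks roughly like this; the toy grammar below is illustrative, not whack's actual grammar:

    #include "mpc.h"   /* https://github.com/orangeduck/mpc */

    int main(void) {
      mpc_parser_t *Number = mpc_new("number");
      mpc_parser_t *Expr   = mpc_new("expr");
      mpc_parser_t *Prog   = mpc_new("prog");

      /* Define the grammar; mpc builds the combinators for you. */
      mpca_lang(MPCA_LANG_DEFAULT,
        " number : /-?[0-9]+/ ;               "
        " expr   : <number> ('+' <number>)* ; "
        " prog   : /^/ <expr> /$/ ;           ",
        Number, Expr, Prog);

      mpc_result_t r;
      if (mpc_parse("<test>", "1 + 2", Prog, &r)) {
        mpc_ast_print(r.output);   /* mpc's own generic AST (mpc_ast_t) */
        mpc_ast_delete(r.output);  /* walk this to extract what codegen needs */
      } else {
        mpc_err_print(r.error);
        mpc_err_delete(r.error);
      }

      mpc_cleanup(3, Number, Expr, Prog);
      return 0;
    }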
With mpc you could support macros, adding proper macro definitions at compile time, not just primitive cpp-style replacements. That would definitely be a game changer.
I wanna see a language that is both dynamic and can be compiled. Runs on a VM and on bare metal. Something like C++, Java, and Python combined. It can definitely be done and would be an interesting exercise.
JavaScript is dynamic and compiled (as are many other dynamically-typed JIT languages). Did you mean dynamically typed and optionally statically typed?
> Runs on a VM and on bare metal
JavaScript also runs on a VM and on bare metal (e.g. V8's Ignition interpreter will interpret JavaScript, but TurboFan will compile it to machine code).
This is super cool, thanks for sharing!
Are there any books or other resources that you found helpful in learning and implementing Whack? I’m dabbling a little bit with PLs and would love to hear your opinion.
You might find Types and Programming Languages by Benjamin C. Pierce to be particularly interesting. There's a GitHub repo with a list of materials; I can't seem to find it right now. When I do, I'll remember to share!
Thank you! Looks very interesting. As for working with LLVM codegen libraries, would you recommend anything beyond the official docs? I found that most books for this sort of thing are based on older APIs from a few versions ago.
I haven't come across good material on the matter. If you do use C++, I recommend you employ the RAII idiom in code generation to help with lexical scoping for your language.
I know that mentioning other languages in a thread about a new language is contentious, but if you haven't played with Nim, I seriously recommend you check it out.