Shecc is a self-compiling compiler for 32-bit Arm and RISC-V architectures, focusing on a subset of the C language and incorporating basic optimization techniques. It generates executable Linux ELF binaries for ARMv7-A and RV32IM and includes a minimal C standard library for GNU/Linux I/O. Written in ANSI C, it is broadly compatible and features an integrated machine code generator, removing the need for external tools. The compiler uses a two-pass process for syntax checking and translating operations into machine code, and has a register allocation system for RISC-style architectures. Additionally, it includes an architecture-independent, SSA-based middle-end for enhanced optimizations.
I keep vacillating on whether to use Go or Rust for my next CLI or desktop tool (e.g., egui in Rust for desktop). However, every time I see C or start to write it, I just find it too much fun. In Go I see boredom, and in Rust, frustration.
I'm not writing production software for a large audience, at best tools for myself which a few others might use.
Yet it also seems a waste to be using what many now regard as outdated tech.
It would be a great thing if a C-2030 could be arranged with a new standard library and a move-semantics compiler mode (say).
Rust is frustrating if you try to write code in it like you would in languages you already know, and assume the borrow checker is just an annoyance you’re supposed to `clone()` your way out of.
If you instead try to learn what the ownership rules in the compiler are trying to teach you about software design (admittedly less comprehensibly at times than you might prefer), it is an incredibly fun and rewarding language that will shape the way you write software in every other language.
Then developers would adhere to those APIs, so long as one's own array-struct decays to the above, so that standard-library functions could be written against it.
I've been working on a C compiler with the goal of ANSI C compliance, and most projects described as a "C compiler" seem to support only an arbitrary subset of C. This project supports more than most educational/toy C compilers, but I still believe there's value in implementing the full standard.
I am a faculty member at a university, and the main reason I initiated this project was to create a practical learning environment for my students. I have noticed that many tools labeled as 'C compilers' only partially implement the C language, which has been a source of frustration for me. My goal is to demonstrate how to build a basic C compiler and enhance it to include some key features of the C99 standard, as well as optimization strategies commonly found in contemporary optimizing compilers. Despite its modest size, this project is robust and capable of self-hosting, meaning that students have the opportunity to develop an optimizing compiler that can compile its own code, progressively refining it for improved instruction per cycle (IPC) and better code density. Unable to find an existing one that met these expectations, I wrote a new one with my students.
To me, a useful cutoff would be to define "C compiler" as anything that can compile TCC (the Tiny C Compiler, which can itself more or less compile the old C versions of GCC).
Maybe it's legitimate to say "TCC must be a single-file amalgam first and you have to use an external preprocessor".
What're you thinking of doing with the preprocessor? Accept the complexity and build that too, run a pre-existing one, implement a subset of it, other...
CPP needs to run after lexing, and integer constant expressions need to be parsed and interpreted for #if.
So I'm trying to implement my own since I'm already doing lexing/parsing/interpreting.
Implementing everything end-to-end also seems like the only way to output decent error messages.
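To make the `#if` requirement concrete, here is a toy sketch of the kind of constant-expression evaluator a preprocessor needs. A real one must cover the full C operator set, `defined`, character constants, and `intmax_t` arithmetic; this illustration handles only decimal literals, `+ - * /`, and parentheses:

```c
#include <ctype.h>

/* Toy recursive-descent evaluator for `#if`-style integer expressions.
 * Grammar: expr := term (('+'|'-') term)*
 *          term := primary (('*'|'/') primary)*
 *          primary := number | '(' expr ')'                              */
static const char *p;   /* cursor into the expression text */

static long expr(void);

static long primary(void)
{
    while (*p == ' ') p++;
    if (*p == '(') {
        p++;                       /* consume '(' */
        long v = expr();
        while (*p == ' ') p++;
        p++;                       /* consume ')' */
        return v;
    }
    long v = 0;
    while (isdigit((unsigned char)*p))
        v = v * 10 + (*p++ - '0');
    return v;
}

static long term(void)
{
    long v = primary();
    for (;;) {
        while (*p == ' ') p++;
        if (*p == '*')      { p++; v *= primary(); }
        else if (*p == '/') { p++; v /= primary(); }
        else return v;
    }
}

static long expr(void)
{
    long v = term();
    for (;;) {
        while (*p == ' ') p++;
        if (*p == '+')      { p++; v += term(); }
        else if (*p == '-') { p++; v -= term(); }
        else return v;
    }
}

static long eval(const char *s) { p = s; return expr(); }
```

Since the host compiler already has a parser and interpreter for exactly this expression language, sharing that machinery with the preprocessor is a natural economy.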
The C preprocessor is hilariously underspecified in the standard, so implementing the standard doesn't guarantee that you'll be able to handle real-world C programs (even ones that don't use GNU or clang extensions).
The K&R preprocessor was indeed underspecified and allowed lots of variation (many of those issues can be seen in the GCC manual [1]), but the current ISO C is much better at that job AFAIK. I think `## __VA_ARGS__` is the only popular preprocessor extension [2] at this moment, as the standard replacement (`__VA_OPT__`) is still very new.
Yes, consider the case of shecc. It requires just a handful of lines of C code to interpret C preprocessor directives. Rather than relying on existing tools like cpp, as, or ld, shecc stands alone as a minimalist cross-compiler. This design could be particularly beneficial for students delving into the study of compiler construction. See https://github.com/sysprog21/shecc/blob/master/src/lexer.c#L...
I largely meant a standard-compliant implementation though, which shecc doesn't claim to be. ;-) In comparison I can easily see that this lexer is not suitable for a preprocessor, because C requires a superset of numeral tokens [1] during the preprocessing phase.
When building shecc, it automatically emits a control-flow graph and dominator tree, which can be visualized with Graphviz. To view them, run "make check-snapshots" and then inspect the generated CFG.dot and DOM.dot files.