I've been working on a C compiler with the goal of ANSI C compliance, since most projects described as a "C compiler" seem to support only an arbitrary subset of C. This project seems to support more than most educational/toy C compilers, but I still believe there's value in implementing full standard C.
I am a faculty member at a university, and the main reason I started this project was to create a practical learning environment for my students. Many tools labeled as "C compilers" implement only part of the C language, which I have found frustrating. My goal is to demonstrate how to build a basic C compiler and then extend it with some key features of the C99 standard, as well as optimization strategies commonly found in contemporary optimizing compilers. Despite its modest size, the project is robust and self-hosting, so students can develop an optimizing compiler that compiles its own code and progressively refine it for better instructions per cycle (IPC) and code density. Unable to find an existing compiler that met these expectations, I wrote a new one with my students.
To me, a useful cutoff would be to define "C compiler" as anything that can compile TCC (the Tiny C Compiler, which can more or less compile the old C-language versions of GCC).
Maybe it's legitimate to say "TCC must be a single-file amalgam first and you have to use an external preprocessor".
What're you thinking of doing with the preprocessor? Accept the complexity and build that too, run a pre-existing one, implement a subset of it, other...
The preprocessor (CPP) operates on tokens, so it needs to run after lexing, and `#if` requires integer constant expressions to be parsed and evaluated.
So I'm trying to implement my own since I'm already doing lexing/parsing/interpreting.
Implementing everything end-to-end also seems like the only way to output decent error messages.
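To make the `#if` requirement concrete, here is a minimal sketch of a precedence-climbing evaluator for integer constant expressions. All names (`IfExpr`, `eval_if_condition`, and so on) are hypothetical, and it works on a plain string rather than a real token stream. It assumes macro expansion has already happened, treats any leftover identifier as 0, and leaves out `defined`, `?:`, character constants, unsigned arithmetic, and short-circuit evaluation.

```c
#include <ctype.h>
#include <inttypes.h>
#include <string.h>

/* Hypothetical cursor over the text of a #if line (after macro expansion). */
typedef struct { const char *p; int err; } IfExpr;

/* Binary operators and precedences; two-character operators come first
   so that the longest match wins. */
static const struct { const char *op; int prec; } OPS[] = {
    {"||", 1}, {"&&", 2}, {"==", 6}, {"!=", 6}, {"<=", 7}, {">=", 7},
    {"<<", 8}, {">>", 8}, {"|", 3},  {"^", 4},  {"&", 5},  {"<", 7},
    {">", 7},  {"+", 9},  {"-", 9},  {"*", 10}, {"/", 10}, {"%", 10},
};

static void skip_ws(IfExpr *e) { while (isspace((unsigned char)*e->p)) e->p++; }

static intmax_t parse_binary(IfExpr *e, int min_prec);

static intmax_t parse_unary(IfExpr *e) {
    skip_ws(e);
    unsigned char c = (unsigned char)*e->p;
    if (c == '!') { e->p++; return !parse_unary(e); }
    if (c == '-') { e->p++; return -parse_unary(e); }
    if (c == '+') { e->p++; return parse_unary(e); }
    if (c == '~') { e->p++; return ~parse_unary(e); }
    if (c == '(') {
        e->p++;
        intmax_t v = parse_binary(e, 1);
        skip_ws(e);
        if (*e->p == ')') e->p++; else e->err = 1;
        return v;
    }
    if (isdigit(c)) {                       /* decimal, octal, or hex literal */
        char *end;
        intmax_t v = strtoimax(e->p, &end, 0);
        while (*end && strchr("uUlL", *end)) end++;  /* suffixes are skipped, not honored */
        e->p = end;
        return v;
    }
    if (isalpha(c) || c == '_') {           /* leftover identifier evaluates to 0 */
        while (isalnum((unsigned char)*e->p) || *e->p == '_') e->p++;
        return 0;
    }
    e->err = 1;                             /* unexpected character or end of input */
    return 0;
}

static intmax_t apply(const char *op, intmax_t a, intmax_t b, IfExpr *e) {
    switch (op[0]) {
    case '|': return op[1] ? (a || b) : (a | b);
    case '&': return op[1] ? (a && b) : (a & b);
    case '^': return a ^ b;
    case '=': return a == b;
    case '!': return a != b;
    case '<': return op[1] == '=' ? a <= b : op[1] == '<' ? a << b : a < b;
    case '>': return op[1] == '=' ? a >= b : op[1] == '>' ? a >> b : a > b;
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    default:                                /* '/' or '%' */
        if (b == 0) { e->err = 1; return 0; }
        return op[0] == '/' ? a / b : a % b;
    }
}

static intmax_t parse_binary(IfExpr *e, int min_prec) {
    intmax_t lhs = parse_unary(e);
    for (;;) {
        skip_ws(e);
        size_t i, n = sizeof OPS / sizeof OPS[0];
        for (i = 0; i < n; i++)
            if (strncmp(e->p, OPS[i].op, strlen(OPS[i].op)) == 0) break;
        if (i == n || OPS[i].prec < min_prec) return lhs;
        e->p += strlen(OPS[i].op);
        /* prec + 1 makes every operator left-associative */
        intmax_t rhs = parse_binary(e, OPS[i].prec + 1);
        lhs = apply(OPS[i].op, lhs, rhs, e);
    }
}

/* Returns 1 if the condition is true, 0 if false, -1 on a syntax error. */
int eval_if_condition(const char *text) {
    IfExpr e = { text, 0 };
    intmax_t v = parse_binary(&e, 1);
    skip_ws(&e);
    if (e.err || *e.p != '\0') return -1;
    return v != 0;
}
```

For example, `eval_if_condition("0x10 >> 4 == 1")` yields 1 and `eval_if_condition("1 +")` yields -1. A real implementation would reuse the compiler's own lexer and expression parser, which is the appeal of doing everything end-to-end.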
The C preprocessor is hilariously underspecified in the standard, so implementing the standard doesn't guarantee that you'll be able to handle real-world C programs (even ones that don't use GNU or clang extensions).
The K&R preprocessor was indeed underspecified and allowed lots of variations (many of those issues can be seen in the GCC manual [1]), but the current ISO C is much better at that job AFAIK. I think `## __VA_ARGS__` is the only popular preprocessor extension [2] at this moment, as the standard replacement (`__VA_OPT__`) is still very new.
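For anyone who hasn't run into it, the two spellings look like this. This is a small sketch with made-up macro names (`FMT_GNU`, `FMT_STD`), and it assumes a compiler that accepts both forms, which recent GCC and Clang releases should; older or stricter compilers may reject one of them.

```c
#include <stdio.h>
#include <string.h>

/* GNU extension: "##" before __VA_ARGS__ deletes the preceding comma
   when the macro is called with no variadic arguments. */
#define FMT_GNU(buf, fmt, ...) \
    snprintf((buf), sizeof(buf), fmt, ## __VA_ARGS__)

/* Standard replacement (C23, C++20): __VA_OPT__(,) expands to a comma
   only when variadic arguments are present. */
#define FMT_STD(buf, fmt, ...) \
    snprintf((buf), sizeof(buf), fmt __VA_OPT__(,) __VA_ARGS__)
```

Both forms exist to make a call such as `FMT_GNU(buf, "hi")` expand without a dangling comma. `buf` has to be a real array here, because the macros use `sizeof(buf)`.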
Yes, consider the case of shecc. It needs only a handful of lines of C to interpret C preprocessor directives. Rather than relying on external tools such as cpp, as, or ld, shecc stands alone as a minimalist cross-compiler. This design could be particularly beneficial for students studying compiler construction. See https://github.com/sysprog21/shecc/blob/master/src/lexer.c#L...
I largely meant a standard-compliant implementation though, which shecc doesn't claim to be. ;-) In comparison, I can easily see that this lexer is not suitable for a preprocessor, because C requires a superset of numeral tokens [1] during the preprocessing phase.
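To illustrate the "superset of numeral tokens" point, here is a rough sketch of a scanner for preprocessing numbers (pp-numbers) as I understand the grammar: an optional leading `.`, a digit, then any run of letters, digits, `_`, `.`, or an exponent letter followed by a sign. The function name is hypothetical, and universal character names and C23 digit separators are left out.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns the length of the pp-number starting at s, or 0 if none starts there. */
size_t scan_pp_number(const char *s) {
    const char *p = s;
    if (*p == '.') p++;                       /* optional leading dot */
    if (!isdigit((unsigned char)*p)) return 0;
    p++;
    for (;;) {
        unsigned char c = (unsigned char)*p;
        if ((c == 'e' || c == 'E' || c == 'p' || c == 'P') &&
            (p[1] == '+' || p[1] == '-'))
            p += 2;                           /* exponent letter plus sign */
        else if (isalnum(c) || c == '_' || c == '.')
            p++;                              /* digit, identifier character, or dot */
        else
            break;
    }
    return (size_t)(p - s);
}
```

The classic consequence is that `0x1e+1` is scanned as a single six-character pp-number rather than as `0x1e + 1`, and `1.2.3` is one token as well. Such tokens are only rejected later, when they are converted to real numeric constants.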