I wrote the least-C C program I could (briancallahan.net)
246 points by ingve on Feb 21, 2022 | 131 comments



Steve Bourne used a set of macros to enable ALGOL-like programming in C and used it to implement his Unix shell - https://research.swtch.com/shmacro


This is obviously terrible, but my favorite part of this whole thing is the fact that he included the parentheses in the IF/THEN macros, so you didn't have to do it for the condition in the if statement. So, like, this line doesn't need parentheses around the condition:

    IF (n=to-from)<=1 THEN return FI
All modern languages that try to do a refresh on the C style (Rust and Swift, notably) do this; it's clearly the right idea. They should just update the C and C++ syntax to make those parentheses optional at this point.

(PS. some people would also recoil at the assignment-as-expression used in that line, but that's just good clean fun!)
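
For reference, the macros in question look roughly like this (a sketch from memory; the real mac.h is behind the link in the parent comment):

    #define IF      if(
    #define THEN    ){
    #define ELSE    } else {
    #define FI      ;}

    /* so the quoted line
           IF (n=to-from)<=1 THEN return FI
       expands to
           if( (n=to-from)<=1 ){ return ;}      */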


They already make them optional for single nested statements, which causes a gigantic mess when you extend code and forget to add them.

   if (a)
       doThis();
       andThat();
is interpreted as

    if(a){
      doThis();
    }
    andThat();
Some compilers are nice enough to throw around misleading indentation warnings, but without an explicit block termination like FI this just causes issues all over the place.


He meant making the (a) part optional, to allow "if a" instead. Parentheses are (), {} are braces :)


One of the few times I get to remember that English is only my second language. Can't really think of a reason for not dropping them unless there is a weird corner case in the existing grammar.


If you dropped the parentheses around the condition, I think you’d have to make the braces around the body mandatory.

Many (most?) C style guides make the braces mandatory, but it’d be a big step to actually change the language syntax that way. Tons of existing code would need to be updated... although I guess that could be automated!


You could probably do an either/or thing: "You can drop the parentheses if you use braces, but you can't drop both." Though you're right, this is why it'll never happen.

Basically, C had a choice of eliding the parentheses around the condition or the braces around a single-statement block, and they chose the braces. With a half-century of hindsight, that was probably the wrong choice.


Interesting, either/or could work! That's a nice idea.

Syntactically it seems unambiguous because after an 'if' you just need to check for an '(', and if you don't find it you're looking for a '{' after an expression.

You'd lose forward compatibility, but I think there's precedent for that in C language revisions.


> One of the few times I get to remember that English is only my second language.

Most native speakers aren't aware of the terms. They'll call anything by any name.

There is a significant constituency for "curly brackets {}" and "square brackets []".

Anyway, there are mistakes you could make that would give you away as a nonnative speaker, but that wasn't one of them.


> There is a significant constituency for "curly brackets {}" and "square brackets []".

Yes, but there is still basically no constituency for “parentheses” (“{}”) or “parentheses” (“[]”), except in some narrow specific contexts; in programming, one example would be discussion of certain Lisp dialects where either all or certain uses of “()” in classic Lisp can (or, in the “certain” case, sometimes must) use the others, in which context “parentheses” are sometimes used generically for all three.

So, while there is a variety of ways paired delimiters are described by native speakers, the particular use here was still outside of the normal range of that variation.


The GP is only correct in America, and only in formal usage.

In everyday usage in the UK, I’d wager that most refer to () as brackets, [] as square brackets and {} as curly brackets.


This is not relevant to the actual context of the remark that was written. The mistake being corrected is the misreading of the word "parentheses", believing it to refer to `{` and `}`. Even in the UK, that conclusion is wrong. So the GP you refer to as "only correct in America" is simply "correct", period (full stop).


I never realized this. Does "parenthetical" still have the same meaning?


Yep.

Fun fact: parenthesis / parenthetical can mean 'a word or phrase inserted as an explanation or afterthought into a passage which is grammatically complete without it, in writing usually marked off by brackets, dashes, or commas.' (Merriam-Webster) - so you can have a parenthetical statement without using actual parentheses.


would that count?


In everyday use, by non programmers, they're all just brackets.


Good point - I should have said "everyday use among programmers". I don't imagine people regularly encounter braces or square brackets if not working in code.


ha, I was googling FORTRAN C macros and went nowhere, that was the page I was looking for :)


Lennart Augustsson once wrote the least-Haskell Haskell program, amazingly without the help of a preprocessor:

https://augustss.blogspot.com/2009/02/regression-they-say-th...

And to answer the question in the top comment:

https://hackage.haskell.org/package/BASIC-0.1.5.0/docs/Langu...


cpaint.h

    #include<curses.h>
    #include<stdio.h>
    #define ;
    #define do
    #define ʌ *
    #define N -1
    #define call
    #define main.
    #define , ];
    #define ꞉= =
    #define = ==
    #define ; ];
    #define not !
    #define I int
    #define M 256
    #define or ||
    #define end ;}
    #define CALL }
    #define S case
    #define X x<<8
    #define size [
    #define <> !=
    #define var int
    #define Y y<<16
    #define begin {
    #define F FILE*f
    #define ꞉integer
    #define POOL ;}}
    #define W p[y*q+x]
    #define Z (W&(M N))
    #define packed char
    #define OK break;case
    #define procedure int
    #define fill꞉= return
    #define close fclose(f)
    #define readln(a) c=fgetc(f)
    #define H(a,b) mvaddch(a,b,' ')
    #define writeChar(a) fputc((a),f)
    #define open(a,b) F=fopen((a),(b))
    #define A(a) attron(COLOR_PAIR(a))
    #define B(a) attroff(COLOR_PAIR(a))
    #define read(a) switch(getch()){case
    #define draw(a) {W=Y;W|=X;W|=((a)&0xff);}
    #define LOOP for(y=0;y<w;y++){for(x=0;x<q;x++){
    #define check if(fgetc(f)!=83){fclose(f);return N;};w=fgetc(f);q=fgetc(f)
    #define start initscr();clear();keypad(stdscr,TRUE);cbreak();noecho();curs_set(0)


There was a reddit thread about crafty Unicode usage in programming a few years ago.

https://www.reddit.com/r/rust/comments/5penft/parallelizing_...

> If you look closely, those aren't angle brackets, they're characters from the Canadian Aboriginal Syllabics block, which are allowed in Go identifiers. From Go's perspective, that's just one long identifier.

The thread goes downhill fast from there to the point where

> I once wrote a short Swift program with every identifier a different length chain of 0-width spaces.


Check out this Arthur Whitney (Genius Language Designer/Programmer) thread - https://news.ycombinator.com/item?id=19481505

In the above thread, user "yiyus" actually went through the macro code line-by-line and made explanatory notes; unbelievable! - https://docs.google.com/document/d/1W83ME5JecI2hd5hAUqQ1BVF3...


Yet another pre-processor-to-victory post. Check the header in the source.

If we're going to use crazy header files, I want to see someone get the linux kernel to build and boot while including this: https://gist.github.com/aras-p/6224951


> #define M_PI 3.2f

A long time ago, a friend who was studying mathematics at the time approached me, laughing hysterically, and showed me a page in an "intro to C"-style book. It showed an example of how one would write a "get circumference of a circle" function. At the top of the code, there was a #define for the value of pi.

The text describing the code said something like this about why pi is #defined and not included directly in the expression:

"We define pi as a constant for two reasons: 1) it makes the expressions using it more readable 2) should the value of pi ever change, we will only have to change it in one place in the code"


As funny as it sounds, there is still a bit of truth in there, though: code might change from using floats to doubles, for example, so you might want to replace single-precision constants with double-precision constants. You'd only need to replace a single pi constant in that case. :)
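
A minimal sketch of that point (the names are just illustrative):

    /* One definition to touch if the precision (or, ahem, the value) ever changes. */
    #define PI 3.14159265358979323846

    double circumference(double radius)
    {
        return 2.0 * PI * radius;
    }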


There is a similar joke in the book Learning Perl, 3rd Edition by Randal L. Schwartz and Tom Phoenix. I wrote a post about it here fourteen years ago: https://susam.net/blog/from-perl-to-pi.html . Quoting from the book:

"It's easier to type $pi than 𝜋, especially if you don't have Unicode. And it will be easy to maintain the program in case the value of ever changes."

There is also a comment by Randal Schwartz in the comments section where he credits Tom Phoenix with that particular bit of humour.


Good catch! I might be misremembering this event and it was actually Perl, not C. I see Learning Perl, 3rd Edition was published in 2001, which does put its publication at about the right time, in the early 00s.


This #define would have almost been required by law in Indiana: https://en.wikipedia.org/wiki/Indiana_Pi_Bill


Add Pi to locale?


Depends if you're using biblical pi


This is just too good. It must be a joke on the author's part :D


Actually I find it pretty clever, because the value of pi may change in a programming context, and so can its precision


We’ve been wrong before :)


What, you need more than 3.2f?


On a more serious note, 22.0/7 (or 3.1 in base 7) is a rather good approximation, sufficient for quite a few problems and good for calculating things in one’s head.


Fun fact: 22/7 is only accurate to one part in 2,500, or 0.04%, netting you one extra digit (i.e. 3.14 is less accurate than 22/7 at 3 digits, but 3.142 is more accurate).

Most remember 3.14159, which is way better than 22/7, but pi has an even better unusual rational approximation: 355/113 gets you two extra digits; it is accurate to one part in 12 million, or 0.000008%. So you have to go to 3.1415927 to beat it.
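
For anyone who wants to check those figures, a quick back-of-the-envelope snippet (just arithmetic, nothing official):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double pi = acos(-1.0);
        /* relative error of each approximation, expressed as "one part in N" */
        printf("22/7:    one part in %.0f\n", pi / fabs(22.0 / 7.0 - pi));    /* ~2,500 */
        printf("355/113: one part in %.0f\n", pi / fabs(355.0 / 113.0 - pi)); /* ~11.8 million */
        return 0;
    }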


JPL answers the question: How Many Decimals of Pi Do We Really Need?

> For JPL's highest accuracy calculations, which are for interplanetary navigation, we use 3.141592653589793.

https://www.jpl.nasa.gov/edu/news/2016/3/16/how-many-decimal...


That's not saying much as it appears to just be double precision, but yeah, rational approximations only save you one or two digits and after a certain point it’s more nerdy to just memorize pi.



But why is it then 3.2f instead of 3.1, seeing as 3.14 gets rounded to 3.1? Or is that part of the joke, and the reason why the value needed changing?


Regardless of the actual answer, it can make sense to round it up depending on the use case. E.g. if you’re calculating the tolerances for a shaft onto which you place a disk, computing the disk diameter (and thus volume, and weight) to be larger than actual provides a safety margin which rounding down would not. Any additional safety margin added afterwards might not be sufficient in all round-down cases.


Sorry, I should be more clear. This is a line from the header linked by the parent comment that reminded me of this story.


Oh my, the defines. What is language and semantics?

"When I use a word," Humpty Dumpty said, in a rather scornful tone, "it means just what I choose it to mean - neither more nor less."

Very ugly, ill-defined, uncodified democracy. No one tells you when you vote, no one counts up the votes, there are no official results.

When you speak or communicate with someone else, you are voting with them. "I vote this word means X, for a poor/crude/coarse agreement of what X is in the first place."

And this is propaganda at its finest. Evil doublespeak: say one thing, and it actually is another. Snuck in.

Fantastic.

Related: https://github.com/Droogans/unmaintainable-code


I don't quite understand these lines:

    #define ;

    #define do
I was under the impression it was "#define CNAME value" – what does it mean when there is no value? A trip to Google didn't turn up anything for me, so I'm wondering if a C master can weigh in. Thanks!


It defines those to a value of "", effectively stripping them from the source code.

The most common place where this sort of thing occurs is the idiom

    #ifdef DEBUG
    #define debug(...) realdebug(__VA_ARGS__)
    #else
    #define debug(...)
    #endif
which allows you to sprinkle debug() calls through your code and have them disappear if you compile without defining the macro DEBUG.


Of course for your quoted use it's more common to see:

    #define debug(...) ((void)0)
This forces you to use debug() as a statement. If no value is defined, you could omit the semicolon on a debug() statement and it would still compile.
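
A small illustration of the difference (the macro names here are just for the example):

    #define debug_empty(...)             /* expands to nothing at all */
    #define debug_void(...)  ((void)0)   /* expands to a do-nothing expression */

    void f(int x)
    {
        debug_empty("x = %d", x)    /* missing semicolon, but this still compiles */
        debug_void("x = %d", x);    /* omit this semicolon and the compiler complains */
        (void)x;                    /* x is otherwise unused once the macros vanish */
    }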


Can I do it with an #ifndef?


> I use lots of characters that look like ASCII but are in fact not ASCII but nonetheless accepted as valid identifier characters.

Clever, I was wondering how the : was done, but it's an abomination :-/

With some simple improvements to the language, about 99% of the C preprocessor use can be abandoned and deprecated.


Walter, D has conditional compilation, versioning and CTFE without a preprocessor, so I guess that covers the 99% "sane" functionality. Where do you draw the line between that and the 1% abomination part, i.e. your thoughts on, say, compile-time type introspection and things like generating ('printing') types/declarations?


The abomination is using the preprocessor to redefine the syntax and/or invent new syntax. Supporting identifier characters that look like `:` is just madness.

Of course, I've also opined that Unicode supporting multiple encodings for the same glyph is also madness. The Unicode people veered off the tracks and sank into a swamp when they decided that semantic information should be encoded into Unicode characters.


What other kind of difference should be encoded into Unicode characters? For example, the glyphs for the Latin a and the Cyrillic а, or the Latin i and the Cyrillic (Ukrainian, Belarusian, and pre-1918 Russian) і look identical in practically every situation, and the Latin (Turkish) ı and the Greek ι aren’t far off. At least not far off compared to the Cyrillic (most languages) д and the Cyrillic (Southern) g-like version (from the standard Cyrillic cursive), or the Cyrillic т and the several Cyrillic (Southern) versions that are like either an m or a turned m (from the cursive, again). Yet most people who are acquainted with the relevant languages would say the former are different “letters” (whatever that means) and the latter are the same.

[Purely-Latin borderline cases: umlaut (is not two dots in Fraktur) vs diaeresis (languages that use it are not written in Fraktur), acute (non-Polish, points past the letter) vs kreska (Polish, points at the letter). On the other hand, the mathematical “element of” sign was still occasionally typeset as an epsilon well into the 1960s.]

Unicode decides most of these based on the requirement to roundtrip legacy encodings (“have these ever been encoded differently in the same encoding?”), which seems reasonable, yet results in homograph problems and at the same time the Turkish case conversion botch. In any case, once (sane) legacy encodings run out but you still want to be consistent, what do you base the encoding decisions on but semantics? (On the other hand, once you start encoding semantic differences, where do you stop?..) You could do some sort of glyph-equivalence-class thing, but that would still give you no way to avoid unifying a and а: everyone who writes both writes them the same.

None of this touches on Unicode “canonical equivalence”, but your claim (“Unicode supporting multiple encodings for the same glyph is [...] madness”) covers more than just that if I understood it correctly. And while I am attacking it in a sense, it’s only because I genuinely don’t see how this part could have been done differently in a major way.


It's a good question. The answer is straightforward. Let's say you saw `i` in a book. How would you know if it is Latin or Cyrillic?

By the context!

How would a book distinguish `a` as in `apple` from `a` as in `a+b`? (Unicode has a separate letter a from a math a.)

By the context!

This is what I meant by Unicode has no business adding semantic content. Semantics come from context, not from glyph. After all, what if I decided to write:

(a) first bullet point

(b) second bullet point

Now what? Is that letter a or math symbol a? There's no end to semantic content. It's impossible to put this into Unicode in any kind of reasonable manner. Trying to do it leads one into a swamp of hopelessness.

BTW, the attached article is precisely about deliberately misusing identical glyphs in order to confuse the reader, because the C compiler treats them differently. What better case could there be that semantic content for glyphs is a hopelessly wrongheaded idea?


I'm obviously not Walter, but I have a succinct answer that may upset a few people, but avoids a lot of confusion at the same time.

The idea of a letter in an alphabet and a printable glyph for that letter are two different ideas. Unicode could have and probably should have had a two-layer encoding where the letters are all different but an extra step resolves letters to glyphs. Where one glyph can represent more than one letter, a modifier can be attached to represent the parent alphabet so no semantic information is lost. Comparison for "same character" would be at the glyph level without modifiers, and we could have avoided a bunch of different Unicode equivalence testing libraries that have to be individually written, maintained, and debugged. Use in something like a spell checker, conversion to other character sets, or stylization like cursive could have used the glyph and source-language modifier both.


(I expect Walter probably has better things to do than to reply to random guys on the ’net, but we can always hope, and I was curious :) )

First off, Unicode cursive (bold, Fraktur, monospace, etc.) Latin letters are not meant to be styles, they are mathematical symbols. Of course, that doesn’t mean people aren’t going to use them for that[1], and I’m not convinced Unicode should have gotten into that particular can of worms, but I think you can consistently say that the difference between, for example, an italic X for the length of a vector and a bold X for the vector itself (as you could encounter in a mechanics text) is not (just) one of style. Similarly for the superscripts and modifier letters—a [ph] and a [pʰ] or a [kj] and a [kʲ] in an IPA transcription (for which the modifiers are intended) denote very different sounds (granted, ones that are unlikely to be used at the same time by a single speaker in a single language, but IPA is meant to be more general than that).

(Or wait, was this a reply to my point about Russian vs Bulgarian d? The Bulgarian one is not a cursive variant, it’s derived from a cursive one but is a perfectly normal upright letter in both serif and sans-serif, that looks exactly the same as a Latin “single-storey” g as in most sans-serif fonts but never a Latin “double-storey” g as in most serif fonts, and printed Bulgarian only uses that form—barring font problems—while printed Russian never does. I guess you could declare all of those to be variants of one another, even if it’s wrong etymologically, but even to a Cyrillic user who has never been to Bulgaria that would be quite baffling.)

As to your actual point, I don’t think the comparison you describe could be made language-independent enough that you wouldn’t still end up needing to use a language-specific collation equivalence at the same time (which seems to be your implication IIUC). E.g. a French speaker would usually want oe and œ to compare the same but different from o-diaeresis, but a German speaker might (or might not) want oe and o-umlaut to compare the same, while every font renders o-diaeresis and o-umlaut exactly the same. French speakers (but possibly not in every country?) will almost always drop diacritics over capital letters, and Russian speakers frequently turn ё (/jo/, /o/) into е (/je/, /e/) except in a small set of words where there’s a possibility of confusion (the surnames Chebyshev and Gorbachev, which end in -ёв /-of/, are well-known victims of this confusion). Å is a stylistic variant of aa in Norwegian, but a speaker of Finnish (which doesn’t use å) would probably be surprised if forced to treat them the same.

And that’s only in Europe—what about Arabic, where positional variants can make (what speakers think of) a single letter look very different. Even in Europe, should σ and ς be “the same glyph”? They certainly have the same phonetic value, and you always have to use one or the other...

Of course, we already have a (font-dependent) codepoint-to-glyph translation in the guise of OpenType shaping, but it’s not particularly useful for anything but display (and even there it’s non-ideal).

[1] https://utcc.utoronto.ca/~cks/space/blog/tech/PeopleAlwaysEx...


> printed Bulgarian only uses that form

This is a total pedantitangent but I don't think that's actually true. These wikipedia pages don't talk about it directly but I think give a bit of the flavour/related info that suggest it's not nearly that set in stone:

https://bg.wikipedia.org/wiki/%D0%91%D1%8A%D0%BB%D0%B3%D0%B0...

https://bg.wikipedia.org/wiki/%D0%93%D1%80%D0%B0%D0%B6%D0%B4...

The second one, in particular, says early versions of Peter I's Civil Script had the g-looking small д, so these variants have been used concurrently for some time.


I made no mention of collation, alternate compositions, or of fonts. All I'm saying is that, from the beginning, Unicode could have had capital alpha and capital Latin 'A' be the same glyph, with a glyph-part representation and a separate letter-part representation making clear which was which. O-with-umlaut and o-with-diaeresis could have been done the same. Since you've mentioned fonts, I'll carry on through that topic. Rather than having two code points with two different entries in every font, we could have considered the glyph and the parent alphabet as two pieces of data and had one entry in the font for the glyph.


Ignoring Unicode and focusing just on C: if the glyph matches a glyph used in any existing C operator maybe it shouldn't be legal as an identifier character.


I’m not defending either standard Unicode identifiers or C Unicode identifiers (which are, incidentally, very different things, see WG14/N1518), no :) The Agda people make good use of various mathematical operators, including ones that are very close to the language syntax (e.g. colon as built-in type ascription and equals as built-in definition, but Unicode colon-equals as a substitution operator for a user-defined type of terms in a library for processing syntax), but overall I’m not convinced it’s worth it at all.

As a way to avoid going ASCII-only, though, excluding only things that look like syntax might be simultaneously not going far enough (how are homograph collisions between user-defined identifiers any better?) and too far (reliably transplanting identifiers between languages that use different sets of punctuation seems like it’d be torturously difficult).


That ship sailed long before Unicode. Even ASCII has characters with multiple valid glyphs (lower case a can lose the ascender, and lower case g is similarly variable in the number of loops), not to mention multiple characters that are often represented with the same glyph (lower case l, upper case I, digit 1).


That's a font issue with some fonts, not a green light for blessing multiple code points with the exact same glyph.

In fact, having a font that makes l, I, and 1 indistinguishable is plenty good reason NOT to make this a requirement.


> The Unicode people veered off the tracks and sank into a swamp when they decided that semantic information should be encoded into Unicode characters.

As if that weren't enough, they also decided to cram half-assed formatting into it. You've got bold letters, italics, various fancy-style letters, superscripts and subscripts for this and that... all for the sake of legacy compatibility. Unicode was legacy right from the beginning.


The "fonts" in Unicode are meant to be for math and scientific symbols, and not a stylistic choice. Don't use them for text, as it can be a cacophony in screen readers.

Unicode chose to support lossless conversion to and from other encodings it replaces (I presume it was important for adoption), so unfortunately it inherited the sum of everyone else's tech debt.


Unicode did worse than that. They added code points to esrever the direction of text rendering. Naturally, this turned out to be useful for injecting malware into source code, because having the text rendered backwards and forwards erases the display of the malware, so people can't see it.

Note that nobody needs these code points to reverse text. I did it above without gnisu those code points.


Yeah, where do you stop when you start adding fonts to Unicode?


If #include <𝒸𝓊𝓇𝓈𝒾𝓋𝑒.h> is wrong, I don't want to be right.


𝐼 𝘭𝘪𝘬𝘦 𝑢𝑠𝑖𝑛𝑔 𝐛𝐨𝐥𝐝 𝑎𝑛𝑑 𝘪𝘵𝘢𝘭𝘪𝘤 𝑡𝑒𝑥𝑡 𝑗𝑢𝑠𝑡 𝑡𝑜 𝒎𝒆𝒔𝒔 𝑤𝑖𝑡ℎ 𝑎𝑛𝑦𝑜𝑛𝑒 𝑤ℎ𝑜'𝑠 𝑡𝑟𝑦𝑖𝑛𝑔 𝑡𝑜 𝑢𝑠𝑒 𝑡ℎ𝑒 𝑠𝑒𝑎𝑟𝑐ℎ 𝘧𝘶𝘯𝘤𝘵𝘪𝘰𝘯 𝑖𝑛 𝑡ℎ𝑒𝑖𝑟 𝑏𝑟𝑜𝑤𝑠𝑒𝑟 𝑜𝑟 𝑒𝑑𝑖𝑡𝑜𝑟. Sᴍᴀʟʟ ᴄᴀᴘs ᴡᴏʀᴋ ғᴏʀ ᴀ sɪᴍɪʟᴀʀ ᴇғғᴇᴄᴛ. ℑ𝔱 𝔤𝔢𝔱𝔰 𝔢𝔵𝔱𝔯𝔞 𝔣𝔲𝔫 𝔦𝔫 𝔯𝔦𝔠𝔥 𝔱𝔢𝔵𝔱 𝔢𝔡𝔦𝔱𝔬𝔯𝔰 𝔴𝔥𝔢𝔯𝔢 𝔶𝔬𝔲 𝔠𝔞𝔫 𝔪𝔦𝔵 𝔲𝔫𝔦𝔠𝔬𝔡𝔢 𝔰𝔱𝔶𝔩𝔢𝔰 𝔞𝔫𝔡 𝔯𝔢𝔞𝔩 𝔰𝔱𝔶𝔩𝔢𝔰. pᵁrⁿoͥvͨiͦdͩeͤs oͤpⁿpͩoˡrͤtˢuˢnities cᶠoͦnͬfusing aᵖnͤdͦᵖˡᵉ sͭoͪfͤtͥwͬare.


Use Markdown if you want italics.


We have emojis so we’re probably not far from Unicode characters that blink.


To clarify, what is needed are:

1. static if conditionals

2. version conditionals

3. assert

4. manifest constants

5. modules

I occasionally find macro usages that would require templates, but these are rare.


One other thing that would be great, and that people sometimes use the preprocessor for, is having the names of variables/enums as runtime strings. Like, if you have an enum and a function to get the string representation for debug purposes (i.e. the name of the enum as represented inside the source code):

    typedef enum { ONE, TWO, THREE } my_enum;

    const char* getEnumName(my_enum val);
you can use various preprocessor tricks to implement getEnumName such that you don't have to change it when adding more cases to the enum. This would be much better implemented with some compiler intrinsic/operator like `nameof(val)` that returned a string. C# does something similar with its `nameof`.


> you can use various preprocessor tricks to implement getEnumName such that you don't have to change it when adding more cases to the enum.

For those who don’t know: the X Macro (https://en.wikipedia.org/wiki/X_Macro, https://digitalmars.com/articles/b51.html)


Hey, even an article written by Walter, that's a fun coincidence! :)

This is slightly different than the form I've seen it, but same idea: in the version I've seen, you have a special file that's like "enums.txt" with contents like (warning, not tested):

    X(red)
    X(green)
    X(blue)
and then you write:

    typedef enum { 
        #define X(x) x, /* note the comma, so the list expands to "red, green, blue," */
        #include "enums.txt"
        #undef X
    } color;

    const char* getColorName(color c) {
        switch (c) {
            #define X(x) case x: return #x;
            #include "enums.txt"
            #undef X
        }
        return "unknown"; /* keeps the compiler from warning about falling off the end */
    }
Same idea, just using an #include instead of listing them in a macro. Thinking about it, it's sort-of a compile time "visitor pattern".
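
The `#x` in the second macro is the preprocessor's stringizing operator, which is what turns each enumerator into its own name. Usage then looks something like this (assuming the definitions above and `#include <stdio.h>`):

    printf("%s\n", getColorName(green));  /* prints "green" */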


As an update, I removed all use of the X macro in my own code.


I like that ONE == 0.


Did not even think about that :) Just so used to thinking of enums like that as opaque values.


> With some simple improvements to the language, about 99% of the C preprocessor use can be abandoned and deprecated.

Arguably the C feature most used in other languages is the C preprocessor's conditional compilation for e.g. different OSes. Used by languages from Fortran (yes, there exists FPP now - for a suitable definition of 'now') to Haskell (yes, `{-# LANGUAGE CPP #-}`).


In C++, anyway. C’s expressiveness, on the other hand, is pretty weak, and a preprocessor is very useful there.

A better preprocessor (a C code generator, effectively) would be a simple program that would interpret the <% and %> brackets or similar (by “inverting” them). It is a very powerful paradigm.


You're talking about metaprogramming. I've seen C code that does metaprogramming with the preprocessor.

If you want to use metaprogramming, you've outgrown C and should consider a more powerful language. There are plenty to pick from. DasBetterC, for example.


But the <%-preprocessor would be the most powerful metaprogramming tool, would it not? Simply because the programmer would have at their disposal the power of the entire real programming language as opposed to being limited to whatever template language happens to be built in. For instance, if I want to generate a piece of code to define an enum and, at the same time, have a method to serialize it (say, into XML), then with <% it is a trivial task, whereas in C# I need to define and use some weird "attribute" class, while C++ offers me no way whatsoever to accomplish this, with all its metaprogramming power. Is D different in this regard?


D can indeed do it. But that is way too advanced for what C is.

I'll repeat that if metaprogramming is important to you, you need a more advanced language. Why are you using C if you want advanced features?


Because it seems to me that metaprogramming/preprocessing/code generation is orthogonal to how advanced or complex the language is.


Back when I had just learned Pascal, and was beginning to learn C, I did some of this. No idea why I thought that would make it easier to learn. I did not take it as far as the author of this article did. But I did expand it to function calls like "#define writeln printf". Looking back, I'm a bit amazed I managed to learn it, as I was obviously putting more work into not learning C than learning it.


Back in the days when Pascal and Pascal-like languages were common, doing this with C was practically a rite of passage...


There's also https://libcello.org/, a popular (?) macro-heavy library which makes C feel modern.


"modern" meaning less explicit?


Meaning classes, algebraic data types, pattern matching, boxed objects, iterators and garbage collection. All they need is smart pointers or a borrow checker and it'd practically be C++ or Rust, except it's rather brittle because it's just a bunch of macros.


Have you ever seen it used in the wild?


Their FAQs say that's unlikely:

> Is anyone using Cello?

> People have experimented with it, but there is no high profile project I know of that uses it. Cello is too big and scary a dependency for new C projects if they want to be portable and easy to maintain.


There was the "Val Linker"[1] (also for the Amiga, though I can't seem to find that version), written in a kind of Pascal-ish C, powered by macros.. Snippet:

    void get_order_token()
   BeginDeclarations
   EndDeclarations
   BeginCode
    While token_break_char Is ' '
     BeginWhile
      order_token_get_char();
     EndWhile;
    copy_string(token, null_string);
    If IsIdentifier(token_break_char)
     Then
      While IsIdentifier(token_break_char)
       BeginWhile
        concat_char_to_string(token, token_break_char);
        order_token_get_char();
       EndWhile;
      lowercase_string(token);
     Else
      If token_break_char Is '['
       Then
        While token_break_char IsNot ']'
         BeginWhile
          concat_char_to_string(token, token_break_char);
          order_token_get_char();
         EndWhile;
        order_token_get_char();
        If case_ignore.val
         Then
          lowercase_string(token);
         EndIf;
       Else
        concat_char_to_string(token, token_break_char);
        order_token_get_char();
       EndIf;
     EndIf;
    return;
   EndCode
[1] https://ftp.sunet.se/mirror/archive/ftp.sunet.se/pub/simteln...


Hasn't everyone done at least something similar to this? I'm surprised, I re-define C quite often when I'm bored.


I once saw a C header that defined "BEGIN" as {, "END" as }, and other pascalisms. I find it difficult to understand how some people are too stubborn to change their model of thinking.


I believe I’ve seen stuff like this used as a partial aid when transcribing a program from Pascal or Fortran to C, from before the era of automatic tooling.

Whether they’d ever go back and finish the migration is not known.


The biggest pitfall with manually bulk-transcribing Pascal to C back in the day was that operator precedence in the two languages is really different. Not only is their model of thinking different, it is also wrong.


Well, pasting the code into VS Code ruined some of the fun, since the non-ASCII homoglyphs are now highlighted by default.


> Caught me. I had recently heard that Arthur Whitney, author of the A+, k, and q languages (which are array programming languages like APL and J), would use the C preprocessor to create his own language and then write the implementation of his language in that self-defined language.

Stephen Bourne did this in the sources of the Bourne shell, around 1977, to make C look like Algol.

Here it is in V7 Unix, 1979:

https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/sh...

https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/sh...


If I remember correctly, the first Bourne Shell was written in a Pascal-ish C.


It's interesting: the first time I wrote C was after learning programming through Java. My "C" code was all new_<type>(..); I couldn't not think in Java syntax.


I once wrote a set of macros to emulate the syntax of Oberon, and then used that to write some code that could later be easily converted to real Oberon. It was a fun exercise - highly recommended.


Do you mean this?

1) Write CPP macro language to emulate Oberon syntax.

2) Write program in the above macro language.

3) This program looks like Oberon but we now have two ways of "compiling" it; a) Feeding it to the CPP/C Compiler or b) Feeding it to the Oberon compiler directly with a little bit of tweaking.

Have i understood it correctly?


Yes, you have.


Once I macro'd `const auto` to `let` in my C++ program. After a few moments of "haha C++ go rust", I got terrified and undid it.


let is now a thing in C++, but not const auto, so probably good that you undid it :-)


In the header file's line 12, are there two different semicolons? Is one of them actually another, semicolon-looking Unicode character?

(https://github.com/ibara/cpaint/blob/21c70acba373df932920d5f...)


U+FF1B Fullwidth semicolon. The semicolon in CJKV ideographs.


Yep. Been there. Done that.

In the 80's I worked for a guy who insisted that we write all our C using macros that made it look like FORTRAN, amongst much other nonsense. How fondly I remember the many hilarious hours spent trying to pin down the cause of unexpected results.

I don't remember any specific examples, but consider:

    #define SQ(v) v*v

    int sq = SQ(++v);


Classic pitfall with any type of text-based preprocessor. All variables inside need an excessive amount of parentheses.

In a similar vein, there is also the “do {} while (0)” trick to allow macros to appear like normal functions and end with a semicolon.

I don’t even want to imagine how many more hacks would be needed to transform C into another syntax using macros only.
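
For completeness, the usual defensive forms look roughly like this (a sketch; note that the parentheses fix precedence surprises but not the double evaluation in the SQ(++v) example above):

    /* Parenthesize the arguments and the whole expansion.
       This still evaluates v twice, so SQ(++v) remains undefined behaviour. */
    #define SQ(v) ((v) * (v))

    /* Wrap multi-statement macros in do { ... } while (0) so they behave like a
       single statement and must be followed by a semicolon. */
    #define SWAP_INT(a, b)   \
        do {                 \
            int tmp_ = (a);  \
            (a) = (b);       \
            (b) = tmp_;      \
        } while (0)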


I do like the style of kona if I need an example to confuse people about what C can look like: https://github.com/kevinlawler/kona/tree/master/src


That is the commonly accepted way to write k interpreters after all. ngn/k looks more like k than C too.

https://codeberg.org/ngn/k/src/branch/master/a.c


A “least-C” needs lazy evaluation at a minimum :-)


My thoughts exactly. It's just C with a slightly different syntax. Would be way more interesting if it was using lazy evaluation, or maybe some other kind of term rewriting, possibly with garbage collection or some smart miniature region based memory management.


> would use the C preprocessor to create his own language and then write the implementation of his language in that self-defined language

Yeah that sounds like the easiest way to make your colleagues hate you

I "love" how we had more languages in the 70s (usually created as a one-off project for people with not so much user friendliness in mind) think m4, awk, tcl, etc


Awk is actually great. M4 not so much.

Some absolute lunatic solved this year's Advent of Code in m4; it was impressive.


Terraform module args used to be very limited, and I didn’t know how to generate the JSON it would accept instead of HCL, so I actually used m4 to avoid repeating every template n times. And now we are sad, because of course Terraform has improved quite a bit.


I used to contribute to an ultima online emulator and I definitely used awk to generate C# classes from a CSV file.


I mean, we do have a lot of (perhaps too many) markdown dialects today. Wikipedia, wordpress, github, stackexchange, you name it. The last time I used a Q&A forum for a calculus course, it used $$ to start and close a MathJax div.


My fave is Jira, where they have one syntax when creating an issue and another for editing it


Well, at least that is the obvious way to delimit a math div, isn't it?


Would you consider doing string processing in C rather than in Awk or Tcl?


It is a funny question, because I think most languages get string processing right. Pascal gets it right.

Except C.


But BSTR, too, is a C construct.


In FORTRAN, thank you.

(It was a long time ago, and there was no C compiler on our IBM/370...)


TCL is basically THE string processing language... because everything is a string :p.

For short scripts, awk is nice, but most people would use Python nowadays, and die hard Unix greybeards will use Perl or TCL depending on the mood.


Version 8.4 changed it a bit.


> Yeah that sounds like the easiest way to make your colleagues hate you

Well I'm not Whitney's colleague but I really like his code.


What do you like about it? I don’t think it needs to be stated why the majority of people here probably hate it, but I am curious why anyone would actively like it. I can maybe see that there’s a sense of achievement in being able to grok a codebase that is often described as unreadable


It might be unreadable in the same sense as Chinese or Russian is to someone who hasn't learned to read it. Learn to read and it turns out not to be unreadable?

I like it because it makes it easier for me to see the big picture. The forest and the mountain. It doesn't over-emphasize the bark on the trees; it doesn't drag on and make me scroll & jump through a maze of boring minute detail. At the same time, it doesn't actually bury and hide whatever detail there is; it's all there for when you need it. Whitney also generally simplifies things a lot and avoids tedious contortions others would make for portability or some theoretical conception of maintainability, readability, or vague "best practices". It's very straightforward and -- once you get past the Whitney vocabulary and general style -- there are no mountains of abstractions and layers you need to grok before you can work on the code.

The biggest problem is that after getting used to that style, "normal" code starts to feel like kindergarten books with very simple sentences written in horse sized capital letters, a handful per page. Except that those letters are not used to write a short children's story, but a complex labyrinthine machine, and the over-emphasis on minute detail just obscures the complexity and you end up with cross cutting concerns spread over thousands of lines of code and many many files. It might look clean and readable on the surface, yet: there be dragons. And nobody wants to tame those dragons because there's so much code. And then I find myself sighing and asking: why on earth do we need an entire page for what would really amount to a line of code if we didn't insist on spelling everything out like it's babby's third program after hello world? It's just tedious.

Whitney certainly writes his own sort of dragons, but it's easier to keep them all in your head. For example, the b compiler won't work out of the box on multiple platforms. I'm fairly confident I could port it without much effort, as long as the platform meets some basic requirements.


Yeah I don't think anyone likes over-engineered architecture astronaut code with too many layers and unnecessary abstractions, whether that's been formatted in the Whitney style or in a more conventional one. I think what I can't get over are the short identifiers (and filenames) and the way it just looks like a wall of text without any breathing space, though looking at another example someone posted[0] it seems there's a bit more whitespace and structure than I remember.

> Learn to read and it turns out not to be unreadable?

There's the thing, if someone learns Russian they can converse with 140 million people in Russia, similarly Ukrainians and Belarusians will be fine and they could probably make themselves understood through the Caucasus and Central Asia. If you learn to read C written in the Arthur Whitney style you can "converse" with a fairly small number of people who like the Whitney style[1]. So taking the example a bit further, I learned the Cyrillic alphabet in an afternoon and through knowing another Slavic language I can roughly parse the meaning of many Russian things I read (audio is another thing entirely, I can only pick out a handful of Czech/Russian homophones). If I had gotten up to speed with the ngn/k codebase would I be productive on one of the projects you wrote in a similar style, or is there a similar productivity wall where I'd have to first learn some idioms local to your codebase?

Sorry for the questions, I know people who like this style probably have to answer these questions fairly frequently. I am genuinely just quite curious though.

[0] = https://codeberg.org/ngn/k/src/branch/master/h.c

[1] = is there a proper name for this or is it ok to refer to it as "Whitney style"?


I don't know if there's a proper name for it. At least people who are aware of the style would probably recognize Whitney's name so that's the best term I've come up with yet.

For me, the end goal isn't Whitney style, but I've been pursuing effective programming all my life. When I learned to code, I wasn't talking to anyone except my computer, and that alone was exciting enough to make it a life-long hobby and profession for me.

Do you know what brought me to Hacker News? Arc. And Paul Graham's writings about Lisp. The message was never about a popular language that everyone speaks. If anything, it was rather the opposite: pg saw in lisp a powerful and expressive (if niche) language that makes you competitive against larger players who stick to boring mainstream languages. I wasn't interested in competition or startups, but merely in powerful ways to make the computer do what you want. I don't particularly care if I'm the only person on planet earth willing to wield that power; it's for my own enjoyment. Programming for me isn't about "product" so "productivity wall" isn't something I think about, and complaining about productivity wall would be a bit like complaining that getting fit for Tour de France is time consuming, why don't they just drive it by car?

That said, I think there are people who find K, APL, and related languages very productive in their niche. I'm definitely not speaking for everyone.

Anyway, it is the curiosity and desire to discover a powerful way to command the computer that has driven me to study Haskell, APL, PostScript, forth, TLA+, lisps, BQN, K, Erlang, and more. "Whitney C" is just one milestone along the journey, and I don't know where the journey will eventually lead; I'm just not happy with any existing language right now.

So the answer is no, learning Whitney C will not make someone immediately productive with my code, just as learning Java does not make you immediately productive in C++. They are different languages. However, anything you learn can shrink the productivity wall; knowing APL or BQN, K, and Whitney C might make it easier to grasp whatever I come up with next. That applies to all of programming in general though; the more you know, the more you know, and some of that knowledge will almost always transfer. There will be familiar patterns and ideas.

I also think people seriously overestimate the productivity wall. As you say, one can learn to read cyrillic in an afternoon. Kana in a weekend. But learning Russian or Japanese is significantly more work than learning the script. In terms of scope, I'd say learning APL or Whitney C is closer to learning kana than it is to learning Japanese.

(EDIT: I also find it ironic that programmers are ostensibly excited about learning new things, yet at the same time programmers really love to complain about languages that look alien but won't give a few weekends to learn them)


Good to know re the name, I typed it out a couple of times and wondered if I was just doing something stupid. I really wasn't trying to formulate an attack on this style, if someone or some organization uses it and it works then more power to them. I was really just trying to understand a bit better, but it's possible that is something that I can only really get by experience.

So I find APL, J, K and friends quite fascinating (and J is on my list to try) but I haven't seen much hostility to them. People understandably get a bit intimidated by how different it is but they usually still seem curious. The real hostility is reserved for Whitney C. In this case I don't think it's like - if you'll forgive me for abusing that human language metaphor a bit - an English speaker learning Russian, more like

    ifAnEnglishSpeakerEncountered
    aLocaleWhereEnglishIsWrittenA
    ndStructuredDifferentlyToWhat
    theyWereUsedToSomethingLikeTh
    is.
I can understand why their instinct is to recoil in horror and think "I already know a more standard English, I'm currently learning Russian and Japanese ... I have little patience for trying out this alternate form of English". It's obviously an exaggerated/contrived example, but this is genuinely how that C code appears to an outsider at first blush (or at least it did to me and a couple of my friends).

That said your replies have piqued my interest, I'm gonna have to properly dig through that ngn/k repo some day. If I turn into a Whitney C guy then I'm holding you responsible :D


Yep. It's mostly a knee jerk reaction.

After getting accustomed to a style that doesn't force explicit declarations of identifiers and their types, verbose type conversions, and line breaks and indentation after every statement and brace, etc., one could definitely make a different (and similarly exaggerated) human language metaphor. For example, take some English text and feed it through a parser. Feels good to read?

    (S (NP Parsing)
       (VP refers
           (PP to
               (NP (NP the activity)
                   (PP of
                       (S (VP analysing
                              (NP a sentence)
                              (PP into
                                  (NP its component categories and functions))))))))
       .)
That's a bit how mainstream languages feel after using something that hasn't been forced into such an artificial form :) If you're willing to let go of that, you can write sentences and clauses on the same line, almost like prose!


Hehe, that's actually a nice way to put it. So it's a little bit like the red pill, you can't really go back after embracing Whitney C :D



