Show HN: Sum (algebraic) types for C in one 100 line header (github.com/grego)
92 points by marosgrego 66 days ago | 43 comments



I know it's a minor thing, but why does this link to GitHub? The repository is described only as a mirror of a SourceHut repo, and the user's profile has a banner saying that he's part of the giveupgithub.org movement. Looking at this post's submitter username, I think OP is the one who owns the repository, so why do this?


I am confused. If the author is part of the Give Up GitHub movement, then why still keep 29 repositories on GitHub? Obviously, they didn't give it up.


not op, but github serves two purposes: git mirror, and reputation/network effects.

Relying on GitHub for the latter while redirecting to another mirror for the former lets you use GitHub's reputation system to promote both your project AND the mirroring alternative.


Saw the title and thought "Didn't I just see C sum types in HN recently?"

https://news.ycombinator.com/item?id=40307098

> Inspired by datatype99, but consisting of one small standard-conforming C99 macro-only header that is fast to compile.


Nice system built with only a couple of C99 macros. That small header also achieves matching for specific C types over the "sum types".

In the context of C99, isn't tagged union the more usual name instead of sum type?
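
For anyone who hasn't met the term, here is roughly what a hand-written tagged union looks like in plain C99, without any macro header. This is only a sketch: the `shape` type and every name in it are made up for illustration and are not taken from the linked project.

```c
#include <assert.h>

/* Hypothetical example type: a shape is either a circle or a rectangle. */
enum shape_tag { SHAPE_CIRCLE, SHAPE_RECT };

struct shape {
    enum shape_tag tag;            /* records which union member is live */
    union {
        struct { double radius; } circle;
        struct { double width, height; } rect;
    } as;
};

/* "Matching" is a switch on the tag; only the live member is read. */
static double shape_area(const struct shape *s)
{
    switch (s->tag) {
    case SHAPE_CIRCLE:
        return 3.141592653589793 * s->as.circle.radius * s->as.circle.radius;
    case SHAPE_RECT:
        return s->as.rect.width * s->as.rect.height;
    }
    assert(!"invalid shape tag");  /* only reached if tag was corrupted */
    return 0.0;
}
```

A value can be built with a designated initializer, e.g. `struct shape r = { .tag = SHAPE_RECT, .as.rect = { 2.0, 3.0 } };`. Nothing stops you from reading the wrong member, which is the part that macro headers like this one try to paper over.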


It's interesting but not very practical. It's an "algebra of types" with only constants, no variables, since C does not have type variables. Thus it misses much of the point of algebraic data types in a language like Haskell (deep composability and generics with very little code to write).

Take one of the most basic sum types in Haskell's base library, Maybe:

    data Maybe a = Nothing | Just a

This simple construction gives you the ability to write functions over optional values and avoid the issues with null pointers. This would be an amazing feature to have in C but it can't be achieved due to C's very limited type system.


I used to define `Maybe` in C as a macro over a tagged union. It really unveiled a number of logic errors at compile-time, but the issue of course is to use this kind of metaprogramming sanely. E.g., instantiating `Maybe` to some other macro `T(a)`, where `a` is some other (concrete) type, would already cause a headache.
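
A minimal sketch of the general idea, assuming one struct is stamped out per concrete type. The macro and function names (`MAYBE_DECLARE`, `just_*`, `nothing_*`, `unwrap_*`, `safe_div`) are hypothetical and are not the definitions I actually used back then:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical macro: declares struct maybe_<name> wrapping a value of
 * type T, plus constructors and a checked accessor. `name` has to be a
 * single identifier, which is where nesting macros gets painful. */
#define MAYBE_DECLARE(T, name)                                        \
    struct maybe_##name { bool has_value; T value; };                 \
    static inline struct maybe_##name just_##name(T v)                \
    { struct maybe_##name m = { true, v }; return m; }                \
    static inline struct maybe_##name nothing_##name(void)            \
    { struct maybe_##name m = { false }; return m; }                  \
    static inline T unwrap_##name(struct maybe_##name m)              \
    { assert(m.has_value); return m.value; }

MAYBE_DECLARE(int, int)

/* Division that reports "no result" instead of dividing by zero. */
static struct maybe_int safe_div(int num, int den)
{
    return den == 0 ? nothing_int() : just_int(num / den);
}
```

The caller has to look at `has_value` (or go through `unwrap_int`, which asserts) before touching the value. Because each instantiation needs its own identifier, something like `Maybe(T(a))` cannot be pasted into a name directly, which is the headache mentioned above.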


I've got to say adding type variables to C would be super duper. And to me it actually feels rather C-like as well.


+1. It cannot possibly be worse than the C++ `std::variant` blunder.


Honest question: what's wrong with std::variant? I'm quite fond of using it.

Sure it has trade offs like being empty by exception, but that's necessary if you don't want it to allocate.

Also, pattern matching would be nice instead of std::get with visitor, but that's more of a language issue than an std::variant issue


Every time I have to use std::get<> and std::visit<> I feel like the committee has failed all of us.


> that's necessary if you don't want it to allocate.

Wait, what does one have to do with the other?


I'm not totally 100% on this, but:

valueless_by_exception happens when constructing the new value throws during an assignment that changes the held type (a throwing move should never happen in good C++, but is not disallowed by the language). That means the variant used to hold value X, we were moving value Y into it, but Y's move constructor failed. What is now the state of the variant? It's not X, since the variant has to destroy the old value before it can construct the new one in the same storage. But it's not Y either, because the constructor didn't finish. So what's the value that's being held? The answer is that there is no valid value.

How could this be fixed using allocation? Well, if you assume that std::variant allocated, it could be implemented in such a way that the std::variant just held a pointer to a heap area which stored the actual contents of the value. When you move assign, you construct a new element on the heap and move into that.

If the move construction completes successfully, you swap the old value pointer for the new one and destroy the old value. But if it fails, you just retain the old pointer: nothing has been destroyed, and the variant still holds a valid value. This is very similar to the "strong exception guarantee" for std::vector, which can be lost if the element's move constructor throws.

This is one of many reasons why the move constructor/assignment operator should always be noexcept: there's no reason why it should ever throw, and it violates a bunch of these kinds of guarantees.

I think, anyway. I would be happy to be corrected if I got this wrong.


Yes exactly. Maybe being able to select the implementation strategy would have been nice, but I personally prefer noexcept constructors to an allocating variant


It's a nice idea, but I don't think it adds enough clarity to the code to justify the messy compiler warnings and errors that this kind of preprocessor abuse will eventually cause.


Question: is C gaining in popularity? If so, why exactly? Are there good reasons to be using C in product development rather than more modern languages?


It looks like some cool new kids are rediscovering it, after more than a decade of pythonisation and javascriptisation of programming. You can have things that behave like values and actually reside on the stack, you can ask the operating system for more memory and you have static fixed-size memory which is allocated when the program is run. Good old procedures without the distractions of prototypal inheritance, variable hoisting, dynamic typing, coroutines, list comprehensions and lambda callbacks.

It's probably a more relevant tool for a larger class of problems than many people imagine.


There are a large number of "C but better" options, where the direction of "better" varies. Examples would be Go, Zig, Rust, each a different "direction" of better, but generally better.

Anyone thinking of trying this I would advise against going to C directly. The footguns are dangerous enough as it is and are arguably even more dangerous if you're coming from a more modern language than if you came at C directly. You don't need to go this far back in time to have these things.

But I do strongly agree that anyone who has done nothing but dynamic scripting language development should pick up a modern static language and learn how to use it. I consider having at least one of each a basic tool in the well-rounded developer's toolkit. And if you're still running on 200x-era arguments about the advantages and disadvantages of static versus dynamic languages, as many people are, you're going to be pleasantly surprised by the modern static languages.

But you won't be pleasantly surprised by C. Most of the arguments developed for static vs. dynamic in that era were really about dynamic versus C. They all still apply; C qua C has not moved much since then.

(Maybe you could justify C as "closer to the metal" but in the modern era personally I'd recommend just going straight to assembly for orientation and a tour. You don't have to dive super deep into it to get the sense.)


I agree. There’s not a plausible scenario where I’d start a new project in C today. I’d pick one of the alternatives you mentioned if I needed that level of control and compilation to native code. The risk/reward ratio for C over any of those isn’t worth it in any case where I’d need something like it.


C evolves, even if slowly.

I think some devs are fed up with Complexity and excessive abstractions.

C has very few language features, it's a very simple language -- while all modern languages have new (complex and taxing) features.

It's kinda like a Minimalist movement, can we do the same Modern Mumbo Jumbo but in a simple, minimalistic C way?

GC and RAII? Nah, just use Arenas/Pools. Or don't use heap at all.

C is an unsafe language but modern tools get better and better every year, with GCC doing really cool static analysis and finding buffer overflows at compile time.

Also, C runs anywhere and everywhere.


Anyone who thinks C is simple is just aiming the gun at their foot. If you want simple, you want something like Scheme.


See e.g. the wonderfully named "A Simple, Possibly Correct LR Parser for C11" [0], specifically the opening discussion of the ambiguities in the C grammar. Never mind the semantics, which are quite divorced from the underlying hardware! I mean, the PDP-11 had a carry flag, a double-wide multiplication instruction, and a combined divide-with-remainder instruction (just as x86 does), yet those are unexposed in C.

[0] https://hal.science/hal-01633123/document
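
To make the "unexposed" point concrete: in portable C you have to reconstruct these results by hand and hope the compiler recognises the pattern. A small sketch with hypothetical helper names, assuming only standard `<stdint.h>` types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Carry flag: there is no operator for it, so it is recovered by
 * comparing the wrapped sum against one of the operands. */
static bool add_carry_u32(uint32_t a, uint32_t b, uint32_t *sum)
{
    *sum = a + b;            /* stored result is reduced modulo 2^32 */
    return *sum < a;         /* true exactly when the addition wrapped */
}

/* Double-wide multiply: widen first, then split the 64-bit product. */
static void mul_wide_u32(uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo)
{
    uint64_t p = (uint64_t)a * b;
    *hi = (uint32_t)(p >> 32);
    *lo = (uint32_t)p;
}

/* Divide with remainder: two separate operators in the source, even
 * where the hardware produces both results with one instruction. */
static void divmod_u32(uint32_t n, uint32_t d, uint32_t *q, uint32_t *r)
{
    assert(d != 0);
    *q = n / d;
    *r = n % d;
}
```

Whether any of this turns into a single carry, widening-multiply or divide instruction is up to the optimiser; nothing in the language says so.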


Yes, but maybe I want to shoot myself in the foot or am willing to take the risk, it's my computer and nobody can tell me what to do with it. That's why I like C. I've been writing Python for almost two decades now, all the new languages are trying even more to tell me how to do things or that I'm doing them wrong, screw that, I don't want to be coddled or patronised by the language I'm using, I'm writing C.


I think parent means simple as in the language has very few facilities and the ones it gives you are very barebones (hence simple). C is simple in the sense that a screwdriver is simple compared to a power tool.

It doesn't mean that it is simple to use correctly. An expert might be able to accurately judge the torque achieved by hand, but a beginner can easily under- or over-torque things. Etc...


Frankly this is why I have trouble taking all this C advocacy seriously. Do managers really want their engineers using C? I understand how it can be pleasurable in the way that listening to vinyl is pleasurable. But my understanding is that it takes years and years of experience to not make catastrophic errors in C. And that's a lot of risk.


> Nah, just use Arenas/Pools. Or don't use heap at all.

This will solve more than 90% of problems. I'd also add: replace C pointers with a "Fat Pointer" or Go-like slice struct and avoid the C stdlib. Then you'll have fixed 99% of issues.
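
A rough sketch of what I mean, assuming a bump arena over a caller-supplied buffer and a slice of `int`. All names here (`arena`, `arena_alloc`, `int_slice`, and so on) are hypothetical, and `_Alignof` needs C11:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bump arena over a caller-supplied buffer; no heap involved.
 * Assumes `base` itself is suitably aligned for whatever is allocated. */
struct arena {
    unsigned char *base;
    size_t cap;
    size_t used;
};

/* Returns NULL when the arena is exhausted. align must be a power of two. */
static void *arena_alloc(struct arena *a, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (a->used + (align - 1)) & ~(align - 1);
    if (start > a->cap || size > a->cap - start)
        return NULL;
    a->used = start + size;
    return a->base + start;
}

/* Releases every allocation at once. */
static void arena_reset(struct arena *a) { a->used = 0; }

/* Hypothetical fat pointer: the length travels with the data. */
struct int_slice {
    int *data;
    size_t len;
};

/* Returns an empty slice (len == 0) if the arena cannot satisfy the request. */
static struct int_slice int_slice_make(struct arena *a, size_t len)
{
    struct int_slice s = { NULL, 0 };
    if (len > (size_t)-1 / sizeof(int))
        return s;                       /* len * sizeof(int) would overflow */
    int *p = arena_alloc(a, len * sizeof(int), _Alignof(int));
    if (p) { s.data = p; s.len = len; }
    return s;
}

/* Bounds-checked read: returns false instead of indexing out of range. */
static bool int_slice_get(struct int_slice s, size_t i, int *out)
{
    if (i >= s.len)
        return false;
    *out = s.data[i];
    return true;
}
```

Usage is along the lines of `int backing[4]; struct arena a = { (unsigned char *)backing, sizeof backing, 0 };` followed by `int_slice_make(&a, 4)`. Individual objects are never freed; `arena_reset` drops everything together.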


Custom allocators are pretty useful, yes. And slices, although it would be nice to have an easy syntax for using them.

Still have null pointers though. So an optional type would be useful too. Billion dollar mistake and all. Oh, and you have to check errno, might want a better way to handle errors so that you have to check them, or at least acknowledge that they exist. Probably need a linter to make those stick, you really do want violating a nullability constraint to fail at compile time.

And yeah, you have to avoid libc now, but it has so much useful stuff! So we'd want a library that offers all that useful stuff, but uses our fancy custom allocators, and optionals, and error checks, and slices.

While we're dreaming, wouldn't namespaces be nice? Like you have a function do_bar on a struct Foo type, so there's this foo_do_bar function (almost like a method), it'd be nice to be able to just say foo.do_bar, y'know?

Of course, at that point, you've almost got a whole different language! But it can compile and link with C, so yeah, best of both worlds.

What if I told you...


Err.. Why does one need to even use a different language to implement Arenas and Slices? Yes having syntax level support for them would indeed be nice (something like new languages like Odin and Zig are trying) but you can start using them today and make programming in C an order of magnitude better.


> Question: is C gaining in popularity?

Did C ever go away? The computing industry is built on C. It is the foundation underpinning everything.

> Are there good reasons to be using C in product development rather than more modern languages?

Even if you use a "modern" programming language you'll be building on C at some level. Linux, Postgres, Redis, CPython, etc. are all written in C. Most programming languages have a C FFI for a reason. It is worthwhile to familiarize yourself with C if only to have an understanding of what underpins everything. Besides, you never know, you might need to write a native Python module or Postgres plugin one day.


From what I've seen of people progressing through their careers, being a strong C++ programmer helps a lot with learning Rust, and being a strong C programmer helps a lot with C++.

I'd be curious to see how new developers do with Rust out of the gate moving forward. It basically makes you generate a machine checkable proof that you're using advanced memory management techniques correctly, and those techniques were invented to make zero cost abstractions work in C++.

Anyway, for me, it's Rust for all new projects these days, but I still like the minimalism of C. Big companies say it takes about 6 months for strong engineers unfamiliar with all this stuff to reach productivity with Rust. I'd guess those same engineers could get to the point of writing sort-of-correct C in a week or so.


Perhaps this is obvious, but embedded systems. C/C++ are still really the only languages where you can basically take for granted that you'll have functional tooling and something resembling a standard library regardless of what processor you're targeting. And many embedded projects still reach for C over C++ because a lot of the functionality the C++ adds isn't really ideal for embedded environments anyway, so for team ergonomics it ends up being better to reach for the simpler language that's easier to teach to new devs, etc.


I’ve started using Zig for some stuff recently and it can include C effortlessly. Also, with Rust it is pretty easy to interact with C (well, you need unsafe, and I guess for this particular case bindgen would not be able to provide bindings).

I would not reach for C libraries on many other platforms, but recently I started using more C, just not from C.


C is the lingua franca for embedded and I think embedded is becoming bigger, both in community and in literal size. There are lots of new devices that are bigger than a small Arduino (and even bigger than my PC from a decade ago) and require a lot of software written for them. This requires huge teams working on them, but the only language available is C or in some cases some limited form of C++. Because there are new people coming to these spaces, they are taking ideas from outside of the firmware world and applying them to the pseudo-firmware world because they want to use them.

That's my observation at least. I also think C is just so frictionless that even if it is dangerous, people will use it anyway. Yes, there are a thousand footguns like people have said, but in the end you can get something compiling and running pretty easily, which is enough I guess.


I don’t know anything about trends but I still write some C because it’s widely supported across a variety of platforms.


TFA isn't about product development.

Things that you hope will be used in widely different contexts need to be created using tools that can easily be "glued" into those widely different contexts.

Right now, C remains the best glue language we have. If you want to write things that can be used from dozens of other languages (or even just 2 other languages), you are likely to write it in C.

In 5-10 years time? We'll see (no pun intended)


It’s the FFI thing: it’s next to impossible to bind to C++ or Rust from a preferred language. It’s a great tragedy that so much code is cloistered away. In 10 years Zig, maybe Mojo or Odin or something else will have stolen mind share away.


I think it's less that it's gaining in popularity than that we're seeing some backlash from the people who aren't liking Rust as much as they wanted to. The boring truth is still that if you have a "bottom layer" task to solve that needs access to hardware details (be it instruction formats, SIMD algorithms, MMIO drivers, system-dependent stuff like container management, etc...) C remains the quickest and easiest and frankly the most fun environment to do it in.

I mean, there's a reason why Karpathy is hacking on "llm.c" as part of his "clean up the world" project and not "llm.rs".

Is that a good thing or bad thing? I won't engage. But I think that's where the psychology is. For myself, I genuinely love C in a way that it's clear I never will Rust.


> I mean, there's a reason why Karpathy is hacking on "llm.c" as part of his "clean up the world" project and not "llm.rs".

You post this as if everyone is supposed to know who Karpathy is or what any of that means...


Andrej Karpathy was the head of AI at Tesla and was the primary developer of the AI/Machine Learning curriculum at Stanford. He is an unabashed user of C for the development of ML.


> You post this as if everyone is supposed to know who Karpathy is or what any of that means...

Firstly, this is HN! If every commenter mentioning Karpathy has to prefix their comment with an intro about Karpathy and LLM, then HN will become a boring read.

Secondly, if you don't know who Karpathy is and what any of this means, it is not all that difficult to search for "Karpathy" and "LLM" in your favorite search engine.


Carcinization.


Variant records existed long before that one crab-related language was a thing (Algol 68 had them!).


And sum types from ML circa 1972 (over fifty years ago). Indeed rust is basically an ML with affine types.



