C Template Library

glouwbug · on Dec 30, 2020

I backported a handful of C++ STL containers to ISO C99 to exemplify that a subset of C++, namely C with Templates, is a highly usable type-safe feature within the realm of ISO C99.

Included are a couple examples, like a JSON parser and a simple 6502 compiler. I have also written a small wiki, that showcases how to use the included containers: https://github.com/glouw/ctl/wiki

tom_mellior · on Dec 30, 2020

This looks cool, but I dislike the naming. Calling a queue "queue" would be so much more readable than "que", and so on. I think the everything-is-three-letters gimmick is overall a drawback. Also, would it really hurt that much to write "#define PRIMITIVE" instead of "#define P"?

jiggawatts · on Dec 30, 2020

I've found that in forums like this, people generally react badly to comments about naming, but I've found that it actually matters more than people think it does.

Many things have exactly one correctly spelled full name, but many alternative contractions or cute variations.

Programmers can be expected to know the correct spelling of "vector" or "dequeue". They can't be expected to know your specific contraction or abbreviation of it. They can't guess and they're not mind readers. They have a muscle memory that has them type out the full word automatically, without thinking.

Don't break these expectations.

nerdponx · on Dec 30, 2020

Common Lisp is a truly awful offender in this regard, with "cute" slightly-abbreviated opaque names for subtly-varying concepts.

EQ/EQL/EQUAL/EQUALP, PROCLAIM/DECLAIM/DECLARE, and the atrocious PAIRLIS (really??). And not to mention the fact that PAIRLIS returns an association list, which is entirely different from a "plist".

And then of course CAR, CDR, CADR, CDDR, et alia.

This is all second nature for an experienced Lisp programmer, but it makes for a difficult learning experience.

sgerenser · on Dec 30, 2020

I mostly agree with this comment except “dequeue” is a bad example. C++ has called this data structure “deque” (pronounced deck) since the beginning of the STL. Is there another language/library that spells it dequeue?

mikepurvis · on Dec 30, 2020

For better or worse, Rust calls it a VecDeque, but I think this is meant to clarify that the structure is just Vec storage with some extra methods on top to provide convenient ring buffer usage patterns:

https://doc.rust-lang.org/std/collections/struct.VecDeque.ht...

chapium · on Dec 30, 2020

https://docs.oracle.com/javase/7/docs/api/java/util/Deque.ht...

IncRnd · on Dec 30, 2020

> Is there another language/library that spells it dequeue?

Yes.

  https://www.npmjs.com/package/dequeue
  https://api.jquery.com/dequeue/
  https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.queue-1.dequeue?view=net-5.0

DaiPlusPlus · on Dec 30, 2020

.NET uses Dequeue as a verb, not a noun though.

sgerenser · on Dec 30, 2020

Perhaps that's another reason to call the data structure a "Deque" since "Dequeue" is already used for the verb meaning "to remove an item from a queue."

IncRnd · on Jan 1, 2021

It's also for a completely different purpose, but I was answering about the spelling as a direct answer to a question.

PicassoCTs · on Dec 30, 2020

You are correct. But more important is internal consistency - and adherence to the consistency rules of the eco-system. This allows to deduce naming of API-Elements after a while. So the most important rule is- do not be creative. Think like a bureaucrat, adhere to the standards and the ecosystem around you.

I hate this field so much. That anyone creative chooses willingly to go into software is a miracle. Are there people out their with pre-broken spirits, that want to join this soul-less galley of horrors? Starvation should be preferable to this. But i digress..

Why not make naming something that can be personalized? Make the variablename a sort of rule-based-building-block-lego-system, that can be adapted by the user.

Increment+Variable -> "upIndex" for you Increment+Variable -> "incVar" for me. Everyone is happy, the frameworks and apis only specify the buildingblocks as variableNames.. autotranslated variableNames, customized to everyones flavour.

Hooray, i contributed. Prolonging the nightmare.

ludocode · on Dec 30, 2020

I've had the opposite complaints on my similar but verbose library. Some want short names and some want full explicit names.

I'm not sure which way is more popular. Paul Graham has mentioned in several essays how programmers are lazy and want to type the minimum number of characters possible. On the other hand Elon Musk has loudly complained about acronyms and contractions.

In the end you can't please everyone, so the developer just has to pick a style and go with it. If you dislike the short names, it's easy enough to put a comment above the template instantiation explaining what's going on.

bee_rider · on Dec 30, 2020

Graham seems like he'd be much more authoritative on this topic than Musk.

saurik · on Dec 30, 2020

I mean, one can (and I will argue "should", though I appreciate many people will be angry about this) used macros to change the names to whatever they want... but this makes a lot more sense if libraries stick with full names and the contractions are house style.

joshocar · on Dec 30, 2020

I think a lot of this has to do with what people develop with. If they are a vim person then typing long names is annoying, but if they use any type of IDE, the autocomplete removes that hassle and the benefits of having a clearer naming convention can shine.

spijdar · on Dec 30, 2020

In case anyone reading doesn't realize, Vim can do keyword completions. It isn't automatic, but out of the box, you can press ctrl+n to move through a list of possible completions.

Not as elegant as some IDEs I'm sure, but if typing is stopping you from using longer names, this could help.

ori_b · on Dec 30, 2020

Also, for IDE style completion:

https://github.com/ycm-core/YouCompleteMe

coliveira · on Dec 30, 2020

Vim has autocomplete too. This is not an argument about vim since I have seen vim users employing both conventions.

ori_b · on Dec 30, 2020

People confuse 'verbose' and 'clear'. Long names make my eyes glaze over as a reader.

sgillen · on Dec 30, 2020

I think in this case it’s largely a matter of taste. I actually like the everything is three letters thing. It feels clean, and personally I can still read and understand each abbreviation.

I do actually agree on the primitive vs p thing though...

tom_ · on Dec 30, 2020

Another advantage of #define PRIMITIVE - it doesn't steal a handy 1-char identifier that the consuming program might want to use for its own purposes. And as well as making a longer identifier, I'd also personally suggest also adding a library-specific prefix, to further reduce the risk of conflicts.

(I have worked on code where the debug print macro was called P...)

asimjalis · on Dec 30, 2020

The three-letter names have the advantage that it makes the function names shorter since they are prefixed with the same letters.

nitrogen · on Dec 30, 2020

This is cool! I've seen this C-with-templates used in the past with ".X" headers but haven't used the technique myself. Having something based on STL might help people to center on one library instead of constantly reinventing collections in C (which I've also done for some structures, and open sourced though I don't recommend using it: https://github.com/nitrogenlogic/nlutils).

ummonk · on Dec 30, 2020

I recall a few years back doing something like this implementing a binary heap just as a hobby project (https://github.com/Equationist/ctl/blob/master/pqueue.h). I was curious to see how unoptimized it was vis a vis the STL, and ran compiled equivalent test code for both my priority queue and the STL's C++ implementation, with just integers. I was kind of shocked to find that my implementation was like 30% faster when both were compiled at -O3 levels. Seeing this backport I guess there isn't anything fancy in the STL implementations, so it's less surprising that my naive implementation could be just as fast (or happen to be faster).

glouwbug · on Dec 30, 2020

There's this theory that humans are antennas, and that we pick up "wavelengths from above" to build and create the world we live in. Not only did we decide on the same project name and idea, but we also used the same pre-processor hacks. How surreal.

As for your pqueue implementation, the efficacy of an optimizing compiler lies in the inherent complexity of the language. I can imagine the optimization backend for gcc is simpler (and more effective) than g++, but I am not an expert on compiler internals.

eezurr · on Dec 30, 2020

Its more likely that humans are redundant. That is, it is probable in a field of millions that two people come up with the same idea and implementation. I think the birthday paradox is a similar corollary on a smaller scale.

https://en.wikipedia.org/wiki/Birthday_problem

kzrdude · on Dec 30, 2020

These kinds of macro hacks are pretty widespread, aren't they? To me, klib and its khash hash table is the most well-known implementation, see: https://github.com/attractivechaos/klib

And I've written similar things myself (hash table and vectors).

pwdisswordfish5 · on Dec 30, 2020

gcc and g++ use the same backend.

wwweston · on Dec 30, 2020

Where have you heard that theory? I've wondered about it myself, but other than some songwriter-specific conversation, I'm not sure I'm familiar with other people talking about it, and I'd like to be.

glouwbug · on Dec 30, 2020

I'm not sure where I heard it, but if you are interested in an article: https://www.sciencedaily.com/releases/2014/01/140116085105.h...

joshuaissac · on Dec 30, 2020

It sounds somewhat similar to Sheldrake's morphic resonance ideas.

ncmncm · on Dec 30, 2020

In other words, it's 100% BS; in Sheldrake's case, wholly contradicted by his own examples.

benibela · on Dec 30, 2020

FreePascal has a lot of containers, perhaps more than the STL, but they all suck

I wanted a good hash map, so I spend a lot of time comparing them. There are like 10 separate map implementation in FreePascal, but they are all slower than C++'s std::unordered_map.

Then I compared another dozen of libraries, and found the fastest one that is like 50% faster than std::unordered_map. But it was a rather straight forward implementation of non-linear open addressing.

matvore · on Dec 30, 2020

Did you consider using more traditional macros like VEC_ADD which works for all types rather than generating e.g. vec_<type>_add? The latter approach doesn't work with cscope and ctags, which is a pretty big drawback in my book (I like it when my super trivial IDE configuration works, which is only possible with C).

See the comment and following 4 macros for how this would work with a (add-only, not growable) hashmap:

https://github.com/matvore/nusort/blob/master/src/util.h#L16...

pjmlp · on Dec 30, 2020

Well, that is how BIDS 1.0 was designed, before C++ got templates.

panta · on Dec 30, 2020

This is awesome! Very useful for writing highly portable software in C. I think I’ll use this library also with my students.

woadwarrior01 · on Dec 30, 2020

This along with Cosmopolitan libc which was recently posted here[1], would be awesome for writing tiny cross platform utilities in C.

[1]: https://news.ycombinator.com/item?id=25556286

apankrat · on Dec 30, 2020

Nice, very nice. All of it. Long live C.

Being stuck with C++ I did something in reverse - ported C-style ("intrusive") containers to ++, making them a bit safer to use, but keeping the syntax nearly the same.

https://github.com/apankrat/notes/tree/master/intrusive-cont...

einpoklum · on Dec 30, 2020

The Boost libraries have an intrusive containers library, described here:

https://www.boost.org/doc/libs/1_75_0/doc/html/intrusive.htm...

Have you considered it? Is it deficient for your use case?

ktpsns · on Dec 30, 2020

Here's a stupid question: what's the difference of using an intrusive container against using an STL container of (probably smart) pointers? i.e. std::list<My class*>. I've done that plenty of times in my codes in the past.

humanrebar · on Dec 30, 2020

An intrusive node has a getter for the next intrusive node. And the data structure would use that getter to organize the elements of the data structure.

Among other implications (different memory locality and allocation guarantees), you could have the same intrusive node object used in several container objects or even classes (though not at the same time). That is, pull an element from a doubly linked list and insert it into a binary tree without reallocation.

huhtenberg · on Dec 30, 2020

Adding/removing items to std::list will result in extra allocations/deallocations.

It will also take up more space, to store that very pointer.

All of which starts to matter (a lot) when operating with very large data sets.

zackees · on Dec 30, 2020

With intrusive you can remove from the list in constant time from the object itself.

With std::list you need to find the iterate to the node before you can delete it.

apankrat · on Dec 30, 2020

The post I linked to mentions it explicitly.

gpderetta · on Dec 30, 2020

but it doesn't mention why not to use it.

apankrat · on Dec 30, 2020

It does, in the exact same sentence.

gpderetta · on Dec 30, 2020

You mean this:

"the goal here is to make a better version of C-style containers rather than to implement something C++-style and similar,"?

But a goal is not a reason. I.e. why doing

  CONTAINER_OF( user_data, vip );

instead of

  container_of<&user_data::vip> vip_list;

is preferable? How are they even meaningfully different?

apankrat · on Dec 30, 2020

The reason is that the goal is not achievable using boost contraptions. At the very least it's not possible to use boost version without inheritance.

As to why to do what you wrote I have no idea. These two code snippets are unrelated.

gpderetta · on Dec 30, 2020

of course it is possible to use boost intrusive without inheritance: it supports both hooks as base and hooks as members.

apankrat · on Dec 30, 2020

Oh, well, good for it.

If I were to pick the reason, it'd be simply that I don't like Boost.

To me, Boost is an ultimate embodiment of all that went wrong with C++ when it evolved from being a better version of C into the multi-paradigm monstrosity that it is now. Just look at the man page linked above. How to get a clever little concept of intrusive containers and completely decimate it into a technically correct, but unpalatable formulistic piece of engineering that, above all else, is rid of any shred of elegance that made the original concept so great in the first place.

baybal2 · on Dec 30, 2020

What do you think about Glib?

higherhalf · on Dec 30, 2020

Are they fuzzed? Should be useful as some guarantee against memory-safety bugs. Didn't find this mentioned in the project.

glouwbug · on Dec 30, 2020

I have a series of functional tests in `tests/func` which randomly call container functions and check their output against the C++ STL. These tests run on every push, on all optimization settings, with and without gcc's address sanitizer.

ku-man · on Dec 30, 2020

Great work! And thank you for making it available for everyone.

Did you also run tests against other libraries such as glib?

eqvinox · on Dec 30, 2020

Shameless plug:

http://docs.frrouting.org/projects/dev-guide/en/latest/lists...

https://github.com/FRRouting/frr/blob/master/lib/typesafe.h

Differs in a bunch of design decisions. Memory management for items is strictly out of scope, though some of the structures use malloc/free for their own purposes (e.g. heap). Much more focus on sorted/hashed structures, explicitly differentiating for [not] having duplicate items that compare equal. Also, atomic/lock-free versions. Uses macros instead of #including files multiple times.

But fun to see someone else's go at the same idea :)

ludocode · on Dec 30, 2020

Might as well plug mine too:

https://github.com/ludocode/pottery

I'm not really sure what makes a project take off on HN. I posted mine here a couple weeks ago and got one single upvote :(. I'm glad to see the new style of #include templates getting more attention though. I think in general it's the right way to do templates in C.

eesmith · on Dec 30, 2020

FWIW, for sorting in a template-like C package I use https://github.com/swenson/sort/ .

ludocode · on Dec 30, 2020

That library is included in Pottery's benchmarks! :) It's another of the very few C libraries that use the #include template style.

Pottery's intro_sort has comparable or better performance to swenson's quick_sort but with a lot more features. You can supply a move or swap expression to Pottery to swap non-bitwise-movable types, you can supply array access expressions to sort non-contiguous arrays, and it has the heapsort fallback to guarantee O(nlogn) performance.

I don't have a Timsort yet though. For that, swenson's implementation is surely the best.

eesmith · on Dec 30, 2020

:) I'm using it for Timsort. Though in truth that's because I come from Python where it has a special fame - I haven't actually tested its performance against the other ones on my data set.

optymizer · on Dec 30, 2020

Congrats on having an idea, coding it up and releasing it as a library! You should be proud.

Now I'd like to respectfully offer a bit of constructive criticism. Your idea is fine, but I think the API needs some polishing.

1) The 3 letter names have to go. 'vec' is somewhat is established in graphics libraries, so you might get away with that, but 'lst' - seriously? Just use 'list', 'queue', 'map', etc. Don't invent your own terms for well-known concepts. It's ok if combined data types are a bit longer. 'list_queue' is fine and very readable, 'lst_que' is less so.

2) "#define P" is clunky for several reasons: a) why do we have to care about POD vs non-POD? is it impossible to get rid of that? b) for the same reason that just using "int p" is clunky; and c) imagine lots of developers actually using your library. It would be great if each one of them didn't have to look up what "define P" is (what would google results be?). My suggestion would be to do away with having to define it

3) multiple #define in front of an #include is verbose imo. It works, but it's not perfect. I'd shoot for perfect, but that's just me. I suspect this would require significant API change so I don't have a good suggestion at the moment, I just know that I personally would not prefer multiple #define calls before I include the list. I get that you're using X-includes, but a single #define is my limit before I groan, but other people's tolerance level might be different.

Cheers!

asveikau · on Dec 30, 2020

> 2) "#define P" is clunky for several reasons: a) why do we have to care about POD vs non-POD?

Because the latter needs some equivalent to a copy constructor and destructor.

I had another reaction: there is no equivalent to a move constructor? That becomes painful when it's time to support a vector of vectors. (Don't want to do a deep copy of all elements when you resize the vector.)

CyberDildonics · on Dec 30, 2020

It all seems reasonable to me. Caring about shortened names is putting a lot of emphasis on something trivial. Programming gets a lot harder than that.

optymizer · on Dec 30, 2020

I understand your view and I don't think it's wrong. I have a different set of values for code, and I do like to hang out with people who think like you for periodic reality checks.

leni536 · on Dec 30, 2020

The performance of sort being exactly the same as the C++ implementation looked a bit suspicious to me at first. Notable that in the C++ performance test a function pointer is passed to std::sort in a similar way how the C library is used [1].

This is probably a good way to check that the C library doesn't do worse in this case, but it's notable that C++ has potentially faster ways to do the sorting:

1. Pass a function object (even a lambda that just wraps "compare")

2. Don't pass a comparison at all for the "vector<int>" case, as the default works there.

Both of these methods have better chances for inlining the comparison within the sort implementation. For the function pointer case it would require constant propagation and possibly "cloning" to do the same, which is less likely for a complicated algorithm like sort.

[1] https://github.com/glouw/ctl/blob/09f5e0a89d3fc621aff94290c4...

edit: I wonder how ctl's vector sort compares to qsort, I expect it to be faster.

ludocode · on Dec 30, 2020

I don't believe this is correct. Passing a function name to a C++ template doesn't mean it will be called by an indirection at runtime. The compare function will certainly be inlined. Wrapping it in a lambda is useless and would at best optimize to the identical code.

leni536 · on Dec 30, 2020

If you pass a function name then the corresponding type template parameter deduces to a function pointer type. Then this function pointer is passed all the way down where the comparison happens, and by that point it's unlikely to be inlined.

edit: In this talk this exact scenario is analysed in depth:

https://youtu.be/8nyq8SNUTSc?t=435

ludocode · on Dec 30, 2020

It appears you are technically correct. TIL!

I don't think it matters in this benchmark though. Compilers can certainly deduce that it's a compile-time constant because the template is fully instantiated in the same translation unit and only used once so it's not hard for a compiler to propagate it. At least in my tests GCC and Clang seem to propagate it just fine.

I suppose a case where it might matter would be if you call std::sort several times in the same translation unit (or the same binary under LTO) with the same template types but different comparison functions. In that case calls to std::sort with different comparison functions would share an implementation so it couldn't be inlined.

You're right though, it should be fixed. Rather than making it a lambda, the benchmark probably shouldn't pass a compare expression to std::sort at all so it can make its own decisions on how to compare values. For example I expect C++20 implementations of std::sort may provide alternate implementations of partitioning that use the spaceship operator if it exists (at least my implementation will.) This is one of the true advantages C++ has that can't be replicated in C.

gpderetta · on Dec 30, 2020

These days GCC and clang can inline function pointers in many scenarios even through function calls, but it wasn't the case even a few years ago, and they still fail to inline in many important cases.

Inlining pointers requires doing constant propagation and the compiler can give up if it has to track too much stuff, while in the lambda case, the type of the comparator is always available.

Also lambdas can be removed in the early inliner stage exposing them to more optimization opportunities, while function pointers have to be inlined after constant propagation.

As usual, it is a trade off, so there is no substitute to measuring in your specific scenario.

leni536 · on Dec 30, 2020

> At least in my tests GCC and Clang seem to propagate it just fine.

In my tests gcc clones sort and inlines the comparison only in -O3. clang doesn't do it even with -O3, neither with libstdc++ nor libc++.

https://godbolt.org/z/4feeex

trott · on Dec 30, 2020

EA developers weren't happy with STL, so they created their own version, EASTL. One of their complaints had to do with custom allocators: STL allocators combine allocation and initialization.

Reading their critique of STL design decisions, as well as of their own, might be useful here: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n227...

dralley · on Dec 30, 2020

Ziglang has done a lot of interesting work here as well. They have no default allocator - things in stdlib that require allocation require you to provide one, but you get a decent selection of different allocators in stdlib to use for various use cases.

https://ziglearn.org/chapter-2/

qalmakka · on Dec 30, 2020

Isn't this somewhat mitigated by the introduction of std::optional<> in C++17? By using it optional<T> instead of T, you can allocate storage while deferring initialization to a latter time (using emplace() or operator=, if T is either moveable or copiable).

sgerenser · on Dec 30, 2020

Perhaps somewhat, but it adds an extra layer of (often) unnecessary complexity. This is especially the case which comes up often where you’re interacting with external libraries or even your own internal, older libraries. For example, if you have a vector, and want to pass its contents to a C API that takes a pointer and length. If it’s a vector of raw values/structs, no problem. If it’s a vector of std::optional<Foo> now you’re stuck unless you go through and recopy the whole thing.

That being said, I’ve never found the STLs use of initialized allocation to be a problem, but I don’t work in game design.

gpderetta · on Dec 30, 2020

you still need to initialize the optional itself.

But the criticism is still questionable: an allocator can define ::construct() not to do anything, and leave actual construction to user code.

A few more of the criticisms are addressed in C++11 which fully support stateful allocators; they were not mandated (but still allowed) in C++03 because some implementors felt that the STL was too much work to implement already.

Most of the rest are QoI issues, and it is perfectly reasonable for EA to have its own STL implementation that give the same guarantees across platforms, but it is not a criticism of the standard itself.

qalmakka · on Jan 1, 2021

> you still need to initialize the optional itself.

Isn't initalising an optional basically a trivial operation? The default constructor is fundamentally the equivalent of setting a boolean flag to 0.

gpderetta · on Jan 1, 2021

Be careful that 'trivial' is a term if art in the context of C++ initialization. Having to do work is different than not having to do it. Constructting a vector of N optionals is O(N) as opposed of a vector of a truly trivially constructible type, which is O(1) (although I'm not 100% sure if the standard guarantees it or it is just a QoI).

Alupis · on Dec 30, 2020

I have to admit, the STL - and more generally templates - in C++ are very perplexing to someone that has a strong background in a language with a more complete generics implementation (ie, Java or C#).

The very fact that templates are not classes, specifically not base classes for all derived variants of the template is perplexing and, in my opinion, a major swing and miss for a language that seems to pride itself on having everything including the kitchen sink.

I've read the reasons behind why templates do not share any inheritance with derived classes, and it makes sense within the context of how C++ and more specifically the compiler works... but it limits the usefulness of templates past mere containers.

dragontamer · on Dec 30, 2020

> but it limits the usefulness of templates past mere containers.

Your post (specifically, your conclusion) makes me think that you are fundamentally mistaken about how to use templates.

Template metaprogramming is an old-trick at this point, with a huge variety of applications in C++. But even if you ignore Template metaprogramming (which many argue is too difficult to understand...), bog-standard C++ techniques like CRTP (Curiously Recurring Template Pattern) shows how useful templates can be outside of containers.

https://en.wikipedia.org/wiki/Curiously_recurring_template_p...

---------

C++ templates are extremely different beasts than generics. Templates are closer to a meta-language than a language feature. They hold important roles in optimization... specifically allowing you to convert "runtime code" into "compile time code" in a huge class of problems.

CRTP is a great example of retaining OOP-style polymorphism while compiling down into highly efficient non-virtual calls... with practical applications into GUI frameworks, video games and more. And its only made possible due to templates.

Alupis · on Dec 30, 2020

Interesting, I've never heard of CRTP and will need to read up on it.

I do "get it" that templates are not generics... there's just a lot of gotchas for someone assuming generics, err - templates are similar to other languages. Like the difficulty in nesting templates in templates...

dragontamer · on Dec 30, 2020

Even in the use of "containers", generics are the complete opposite of how to use templates.

Java / C#'s concept of "Everything derives from Object" is outright rejected in C++. Templates are part of this culture: how can we provide extendibility when nothing is an Object?

Well, look at iterators in C++. All C++ iterators extend from all collections WITHOUT the use of OOP-derivations. That's thanks to the magic of Templates.

------

What's going on? C++ templates allow the creation of new iterator classes. It invokes the compile-time language to create a new class at compile time.

In Java / C#: vector<MyClass> is the same old vector as all other vectors.

In C++: vector<MyClass> is a custom-made, completely separate class from all other vectors. vector<MyClass>::iterator is a 2nd, custom-made completely separate class from all other iterators.

And yet, all of these different classes can be used the same, thanks to the use of auto, or other compile-time structures in C++.

Alupis · on Dec 30, 2020

Getting Java's generics out of your head when doing C++ is definitely a challenge. It's just so disappointing you cannot do something like:

    void myFunc(List<? extends MyDataContainer<?>> list);

Instead you end up with a base class MyDataContiner and then manually extending it with something like MyDataContinerInt or MyDataContinerLong etc...

C++ templates may be very powerful, but limitations like this feel out of place in a language as bloated as C++ already is...

Admittedly, I still have much to learn about C++. It's been quite an experience, frustrating at times, learning this language.

dragontamer · on Dec 30, 2020

I'm curious: what's wrong with myFunc(List<BaseClass * >) ??

EDIT: Remember, in C++, "BaseClass" cannot polymorph, only "BaseClass * " can. (BaseClass is where the data is stored precisely. If its 20 bytes of RAM, then DerivedClass might be 24 bytes of RAM, and therefore can't fit. But both BaseClass * and DerivedClass * are compatible with each other, and may convert into each other, as long as you know the pointer goes to the right kind of class)

And if that's not sufficient, then a static_assert over the type information is probably what you need. Ex:

   template<class T>
   void myFunc(List<T> list){
      static_assert(std::is_nothrow_convertible(T, BaseClass)); // I haven't tried, but this probably works
   }

That's what I mean about C++ templates: they're a meta-language. You can add if-statements / else statements over type information at compile-time in C++.

In this case, we've created a static_assert (aka: a compile-time error will pop up) if T does not convert into BaseClass (note: T may have a CopyConstructor(BaseClass), so maybe its not exactly tthe same concept as what you're trying to do... but this gives you an idea of C++isms...)

Alupis · on Dec 30, 2020

> I'm curious: what's wrong with myFunc(List<BaseClass * >) ??

Well, nothing I suppose, except BaseClass cannot then be a template, and you cannot have derived class-specific return types like a template would allow.

So, having BaseClassInt type return int from some function but BaseClassLong return a long gets more challenging than it should be. (I recognize all my examples so far had a void return type, that was just for simplicity).

> But both BaseClass * and DerivedClass * are compatible with each other, and may convert into each other, as long as you know the pointer goes to the right kind of class)

Which loses type safety, unfortunately.

> In this case, we've created a static_assert (aka: a compile-time error will pop up) if T does not convert into BaseClass

I'll look into the static_assert() option, could prove useful in some cases.

At the end of the day, there are two things at play here - my lack of C++ familiarity, and in some cases it's just how C++ is.

It is a strange feeling though, having a language limit your expressiveness for the first time...

dragontamer · on Dec 30, 2020

> Well, nothing I suppose, except BaseClass cannot then be a template

    template <class T>
    void myFunc(List<T*>){
        static_assert(stuff about T that you want to guarantee at compile time); 
    }

See here: https://en.cppreference.com/w/cpp/header/type_traits

For a list of common functions used for compile-time type checking.

Alupis · on Dec 30, 2020

Thank you for this conversation. You've given me some homework to read up on, and I appreciate it a lot!

aw1621107 · on Dec 30, 2020

I think you can get something similar now with C++20 concepts?

    template<typename T, typename U>
    void my_func(std::vector<T>* vec) requires std::derived_from<T, my_data_container<U>>;

(Probably mangled something, but the idea is there)

Without concepts, you're left with SINFAE:

    template<typename T, typename U, typename = std::enable_if_t<std::is_base_of_v<my_data_container<U>, T>>>
    void my_func(std::vector<T>* vec);

or just a regular function and a hope that anything wrong will be caught by the compiler:

    template<typename T, typename U>
    void my_func(std::vector<T>* vec) {
        // use my_data_container<U> somewhere and hope things blow up if something is wrong
        // Not 100% sure this is allowed, but something similar should be at least
    }

_slyo · on Dec 31, 2020

Templates are the killer feature of C++. They add incredible power to the language, but also dramatically increase its complexity. Templates are only superficially similar to generics in C#/Java. They are essentially a type-safe, Turing-complete, syntactically-constrained macro system deeply embedded into the grammar of the language.

Few other programming languages offer a similar feature. C++ pulled it off with an ISO standard and multiple conformant implementations.

Templates are loved by many, but they are a contender (along with UB) for the most hated C++ feature due to complexity.

slavik81 · on Dec 30, 2020

As a C++ developer, I see things entirely the other way. Java templates are much less powerful than C++ templates. In Java, templates were basically just a convenient way to do type erasure. That's better than nothing, but the inability to specialize templates makes them quite limited. In C++, you can define a common base class if you need one, but being able to optimize based on the template type is very useful.

Function templates are great for writing concise, efficient math that handles half-precision, single-precision, double-precision, single-precision complex numbers and double-precision complex numbers (and more) with minimal duplication.

Templates are also what make interfaces like std::format efficient and easy to use.

aw1621107 · on Dec 30, 2020

> The very fact that templates are not classes, specifically not base classes for all derived variants of the template is perplexing

I'm a little confused at what you're trying to get at here. What do you want to do that needs such a thing? What is the equivalent Java/C# code you have in mind?

> but it limits the usefulness of templates past mere containers.

What kinds of uses did you have in mind for a "template base class"?

Alupis · on Dec 30, 2020

Extend a template and specialize it for some type. Extend the template again and specialize it for some other type.

The two resulting derived classes are not the same, and you cannot have a function that accepts the base template with a wild card type and pass both types into it.

In Java, you can do: void myFunc(List<?> list); and all List objects with any type can be accepted by this method, in addition any class that extended List can also be accepted into this method, such as ArrayList<?> or whatever.

As far as I've seen, this is impossible in C++, unfortunately.

aw1621107 · on Dec 30, 2020

Why not

    template<typename T>
    void my_func(std::vector<T>& vec);

?

The reference (or use a pointer, if you want) ensures that derived types can be passed, and the template allows for any parameterization to be used.

teabee89 · on Dec 30, 2020

In the wiki an int compare(int * , int * ) function [1] is used, which is odd if the elements are not pointers (when P is defined). Shouldn't it be changed to int compare(int, int) when P is defined? I wonder if it impacts at all the benchmarks.

[1] https://github.com/glouw/ctl/blob/49b90312f9473057fd2ed77941...

glouwbug · on Dec 30, 2020

The compare nomenclature is a one size fits all, given larger structs require a pointer to prevent large amount of copies. Think of qsorts compare, which uses two void* arguments.

tu7001 · on Dec 30, 2020

Be nice to see a hash map here.

glouwbug · on Dec 30, 2020

Due to popular demand, I will implement unordered_set after the holidays, once the new year is in full swing.

bufo · on Dec 30, 2020

I recommend looking at the khash implementation https://github.com/attractivechaos/klib/blob/master/khash.h It's very competitive to optimized implementations like abseil and much, much better than the C++ STL one.

MrBuddyCasino · on Dec 30, 2020

In the Java stdlib, Set is implemented on top of HashMap. I don't see how the opposite is possible?

masklinn · on Dec 30, 2020

Yes, that part of the readme:

> STL std::map will not be implemented in CTL because maps only provide slight syntactic improvements over sets.

makes no sense to me.

It's the other way around: a set is a map with a irrelevant / ignored value (or singleton / ZST if available), with built-in functions for set-oriented operations (union/difference/subset/superset/disjoint). So if you're only going to provide one of them, you provide a map (that's what Go does).

You don't have to implement sets in terms of maps though, it can be useful to provide different implementations of them as their usage patterns can be different and thus it might make sense to use a different underlying data structure (not sure about trees there but definitely a possible consideration for hashes).

It also makes sense to have a bespoke set implementation if the language doesn't have ZSTs, to avoid wasting space (between 1 byte and 1 word per key depending on the storage details).

Finally it can make sense to have a bespoke set implementation due to language visibility rules, so you can e.g. hit the hashes directly (rather than have to recompute them) when performing operations involving multiple sets.

quietbritishjim · on Dec 30, 2020

It's straightforward to use a set type as a mapping type, so long as it provides three things:

* Comparisons don't have to use all members of the type e.g. if I have two Employee objects then I'm allowed to create a comparison function that only compares their employee IDs and ignores the name fields (perhaps because the name ought to be derived from employee ID). (I know this article is not about C++, but as a reference point this has always been possible in C++'s STL.)

* Comparisons don't need a full-blown instance of the contained type e.g. in our employee example, the set can make use of a comparison function between int and Employee without having to construct an Employee with that ID. (This has been possible in C++ since C++11, and is often called heterogeneous lookup.)

* The last point is a bit fiddly: Either the set type allows mutation of the objects so long as the parts relevant to comparison don't change, or you only wanted to store const values anyway. I mention this because in C++ if you have a std::set<Foo> then you can't mutate the Foo elements at all, even if it has no effect on the comparison of the elements with other elements. (Unless there are const methods that actually mutate it... yuck!)

So long as you have those three things, you can have a set of (key, value) pairs where the comparitor only compares keys, and bingo you have a mapping type.

ludocode · on Dec 30, 2020

This is true, but your second point is difficult to do in C. You would have to have a set template and a separate lookup template to search the set based on a key type. This is actually how heterogeneous lookup is implemented in C++: it's a template within a template. This is not a big deal in C++ because templates are instantiated automatically but we're instantiating templates manually in C. It's also only available as of C++14 and only on ordered containers, not hash tables.

It's more straightforward to combine these templates: make map the fundamental type, but make values contain their keys and make the key type default to the value type if omitted. This way it's both a map and a set. My own C hash table template does it this way [1]: it stores only a value type and not keys, but there is an optional separate key type used for lookups, hashes and comparisons. To make it a map, you provide both a key comparison expression and an expression to convert values to keys.

[1]: https://github.com/ludocode/pottery/tree/master/include/pott...

masklinn · on Dec 30, 2020

There's a 4th requirement: that the set has some way to return colliding entries on insertion (or provides access to the entry itself), which most don't. Otherwise lookup requires iterating the entire set.

> It's straightforward

It's really not. Straightforward is the other way around, this is convoluted and involved. It makes the library much harder to use for rather limited gains.

> Either the set type allows mutation of the objects so long as the parts relevant to comparison don't change

This sounds like a footgun requirement at best, I'm pleasantly surprised (for once) that C++ doesn't just let you mutate the entire thing at will.

MrBuddyCasino · on Dec 30, 2020

To be honest, that doesn't sound so straightforward to me. Going the other way and implementing Set on top of HashMap is comparably trivial, making those kind of API acrobatics unnecessary.

gpderetta · on Dec 30, 2020

a map is a set indexed only on a subset of the fields of your type.

This maps directly to the relational model, where a table (or relation) being a set of tuples is the fundamental concept: for your typical programming language set the uniqueness constraint (and key), would be comprised by all the columns, while for a map it would only be comprised by a subset (just one at the limit), while the remaining columns would be the mapped type.

Set operations (union, intersection, difference) do make sense for maps as well in fact.

Given a powerful enough language, you can generalize your sets and maps with multiple uniqueness constraints and secondary indices. See boost.multindex for example.

edit: and BTW, most STL implementations use the same underlying implementation for std::set and std::map which has a set-like interface with the ability to specify the key mapping function.

glouwbug · on Dec 30, 2020

Sets (and maps) are implemented as red black trees in the STL. The STL unordered_set (and unordered_map) is what I believe Java would call a HashMap, unless I am mistaken (I am not a Java guru).

masklinn · on Dec 30, 2020

I don't know that the STL requires rbtrees for map and set, but you're correct that `std::unordered_map` is the STL's hashmap. The Java version of std::map would be java.util.TreeMap (and std::set => java.util.TreeSet).

tom_mellior · on Dec 30, 2020

The STL's complexity requirements pretty much dictate a balanced binary search tree as the underlying representation for map and set. The choice then is essentially between red-black trees and AVL trees.

wffurr · on Dec 30, 2020

A map is a set of key,value pairs, and the set lookup function only considers the key in the comparison.

einpoklum · on Dec 30, 2020

Remember that C++'s unordered_set is quite slow (e.g. due to the use of lists for hash buckets.)

glouwbug · on Dec 30, 2020

I will keep this in mind when I implement unordered_set. Thank you, and happy holidays

a-dub · on Dec 30, 2020

hooray for hashes!

daitangio · on Dec 30, 2020

Impressive work. I think you can use it twice in a file including it two times with redefine.

Can you provide an example with typedef and two different list types?

glouwbug · on Dec 30, 2020

I imagine something along the lines of:

    typedef char* charp;
    #define P
    #define T charp
    #include <lst.h>

    typedef struct { int x, y; } point;
    #define P
    #define T point
    #include <lst.h>

    int main(void)
    {
        lst_charp a = lst_charp_init();
        lst_point b = lst_point_init();
        lst_charp_push_back(&a, "the");
        lst_charp_push_back(&a, "quick");
        lst_charp_push_back(&a, "brown");
        lst_charp_push_back(&a, "fox");
        lst_point_push_back(&b, (point) { 53, 32 });
        lst_point_push_back(&b, (point) { 11, 22 });
        lst_point_push_back(&b, (point) { 83, 23 });
        lst_charp_free(&a);
        lst_point_free(&b);
    }

eqvinox · on Dec 30, 2020

Wait ... "imagine" - you can't currently do more than one? That's kind of the essential point with templates.

(I scrolled through it and assumed you'd just do "#define lst list_charp"?)

wffurr · on Dec 30, 2020

“Imagine” as in that code is written in an HN comment, not a C source file, and probably hasn’t been checked with a compiler yet. Why don’t you copy and paste it into a test, try it out, and report back?

glouwbug · on Dec 30, 2020

Yes, as many templates as you like. An A* pathfinder, for instance: https://github.com/glouw/ctl/blob/master/examples/astar.c#L8...

Although if you're interested in just a batch rename, it's something like this:

    #define vec_char str
    #define P
    #define T char
    #include <vec.h>

gpderetta · on Dec 30, 2020

it seems that it "instantiates" two list types: lst_charp and lst_point and uses both of them. What's the issue?

rurban · on Dec 31, 2020

I didn't like the names at all, and also it shouldn't be in the main namespace, but under ctl/

https://github.com/rurban/ctl

In work are map, forward_list, unordered_set, and utf-8 support for strings and identifiers, with its stricter rules.

SurfingInVR · on Dec 30, 2020

This is interesting! Why not ANSI C, if not going for the latest C spec?

glouwbug · on Dec 30, 2020

C99 Variadic macros, for one, which allows for a nice to use `foreach` implementation. As for the latest C spec, I use a set of C compilable by C++, since most of the functional tests back-test against the STL, and hence compile with a C++ compiler

smlckz · on Dec 30, 2020

You do not need variadic macros!

        #define foreach(container, variable, iterator) \
        for (JOIN(container, it) iterator = JOIN(JOIN(container, it), each) (variable); \
        !iterator.done; iterator.step(&iterator))

Is this less nice? I personally do not like dangling )'s [I did not have to write DOS batch scripts :D]

glouwbug · on Dec 30, 2020

Amazing, I can drop the variadic macro.

EDIT: Done, and thank you so much - it's so much cleaner this way

smlckz · on Dec 31, 2020

You're welcome. And you might like to see ''sys/queue.h'': https://github.com/openbsd/src/blob/master/sys/sys/queue.h

souprock · on Dec 30, 2020

You're missing out on _Generic, _Static_assert, and restrict.

https://devblogs.microsoft.com/cppblog/c11-and-c17-standard-...

okaleniuk · on Dec 30, 2020

Ok, you can take away my C++ now :-) Seriously though, nicely done!

sherjilozair · on Dec 30, 2020

How would I include the definitions without the implementations?

notorandit · on Dec 30, 2020

Long live C!

ausjke · on Dec 30, 2020

Looks awesome to me. Will use it in future projects. Thanks for the great work.