Hacker News new | past | comments | ask | show | jobs | submit | zenogais's comments login

Like the author said this is completely unoptimized. The natural next step in optimization might be to profile and then SIMD optimize the slow bits in compression and decompression. This would likely produce a significant speedup and may even bridge the gap with lz4.


The algorithm is extremely resistant to SIMD optimizations.

Every pixel uses a different encoding, 95% of the encodings rely on the value of the previous pixel, or the accumulated state of all previously processed pixels. The number of bytes per pixel and pixels per byte swing wildly.

SIMD optimization would basically require redesigning it from scratch.


SIMD only gets you up to the width that your hardware platform supports and every SIMD program has to be rewritten for the new width.

Two other immediate avenues are multithreading, which think could be quite effective for this algorithm or GLSL, of that I have no opinion.


What you're looking for exists today and is called a stable-coin (for example, USDC).


Agree with this and I think it's the crucial point. Where I've had the most success with UML-like tools and diagrams is early on in a project since its much faster to rip up and rebuild a few diagrams. The diagrams also help with getting consensus among stakeholders that this direction makes sense and doesn't have any glaring holes.

Later on you can drop the diagrams entirely and perhaps keep them as some onboarding documentation if they're still useful.


But it links to the Coinbase blog post in the "Inspirations" section on the front page?


Ah yes, I missed that. Thanks.


Cars aren't killing people though, it's the drivers of those cars. Such an odd headline.

Better title: "Why are drivers killing more pedestrians?"


>Better title: "Why are drivers killing more pedestrians?"

I disagree. That title lacks information and might lead to some people to think drivers are beating the pedestrians to death or maybe they're more prone to gun violence against pedestrians.

Maybe, a better title could be: "Why are drivers who are currently driving a car, pedestrians currently walking on the street, by running them over with said car"


Type generic expressions [1] already exist in C11. You can do this without a special compiler front-end if you need it, you would just need to define each variant by hand. These are supported in GCC and Clang with the -std=c11 flag.

[1]: http://www.robertgamble.net/2012/01/c11-generic-selections.h...


I developed these ideas a bit further with the use of XX macros, which allow you to make a set of functions generic across some list of types. There can be real performance wins to this over void*: https://abissell.com/2014/01/16/c11s-_generic-keyword-macro-...


That looks handy. I got bored one day and built a fluent interface: http://flukus.github.io/fluent-interfaces-for-c.html


_Generic has been created for a specific use (tgmath.h) and is not very powerful.


I wanted to like this article because I'm been thinking about this a lot in the context of game development, noticed a few things. One thing I'll say from briefly playing with this - the code leaves lots out a looks ostensibly simpler than it really is. Would very much appreciate tips / pointers on this or a more fleshed out and working implementation of the code.

For the bulk data with holes code:

First, there's an initialization step that has to happen the first time you allocate your bulk_data_t. Namely, you need to iterate through every item in the list and set its next_free item to the item following it, looping the last item back around to zero. You also need to do this for all the items between the new size and old size every time you resize your item list.

Second, safe iteration over all of the bulk data doesn't seem possible without adding some sort of flag to indicate whether or not an item is free.

Am I missing something here?


No need to preinitialize the list. When allocating an element you have two cases: a) the free list is non empty: you pop an element by making the head point to its successor; b) the free list is empty: you increase the size of the vector by one.

To deallocate an element you simply set the elemnt next to the whatever is the current head, then make head point to this element.

You start with an empty vector.

An element is either allocated or in a free list, that's why the next pointer can be kept in an union.


I would say it depends on how one might want to handle it, like when you create an item_t you set next_free = -1 as a flag to indicate that it is not free. And have bulk_data_t's 0th position's next_free be 0.


Update: I was indeed missing something.

I think I've figured out roughly what the author intended, code below [0].

First, it looks like he's relying implicitly on data stored in std::vector. Namely vectors have both a capacity and a size. The capacity is total number of allocated elements. The size is the total number of elements stored actually stored.

Second, vector::resize won't reallocate until it runs out of capacity, but it will give you access to extra elements if you need them. So this is used to lazily re allocate while bumping up the size of the vector.

Both of these effectively make it "do the right thing" by leaning on the vector storing both size and capacity.

If you hand manage those values yourself you can get a pretty compact C implementation without a lot of code.

One last thing: Using a union here for the item_t is pretty much guaranteed to get you a segfault. The whole thing should really be a struct. This also allows for setting next to sentinel value if necessary.

[0]: C code for bulk_data_t example: https://pastebin.com/Tfcdt39h


The author's pseudo-code implies that bd->items is full already, since it is probably initialized to the initial number of items that are added. This is probably for memory optimization for games. Also explains why resize increases the size by 1, just enough memory to add the new item.

This way you don't need to worry about keeping track of size and capacity either.

The reason I suggested -1 is because when we iterate through bd->items, we need a way to know if it's a valid value or just "holed".


> This way you don't need to worry about keeping track of size and capacity either.

The only reason you don't have to worry about this is because std::vector handles it for you, at least in the code examples provided by the author. If you choose to go with a pure C implementation (which is what I'm trying out) then you will have to keep track of these.

> The reason I suggested -1 is because when we iterate through bd->items, we need a way to know if it's a valid value or just "holed".

Yep, I was able to get an example using -1 as a sentinel working and passing fuzz testing.


A union is perfectly fine, the next element is only needed when the element is in the free list in which case item_t won't be accessed.


It should work fine in the examples, but it won't work if you also attempt to use the -1 in the next slot to tell you which items are non-free.


If you want to iterate through your objects without some external mean to track the live objects (like an embedded next pointer in the object itself), then the swap and pop idiom is a better solution (iterating via indexing is going to be significantly faster than following the next pointers).


I think the strong counter-argument to this is in [1]. Quite simply we need to make sure we don't get fooled by randomness and see bias where there is none, thereby making good algorithms worse in an attempt to fix imaginary bias.

[1]: https://jacobitemag.com/2017/08/29/a-i-bias-doesnt-mean-what...


I don't think it's that good counterargument. The explained problem was treated as simple mapping - this value causes that result. It spends a bit talking about what statistics mean, but unfortunately doesn't discuss the idea of confounding variables, which influence both sides of the equation.


Initial thought.

This reads largely like a recapitulation of Boehm's 2003 book "Balance Agility and Discipline" [1]. Have you read this and did this provide any inspiration for the article?

[1]: https://www.amazon.com/Balancing-Agility-Discipline-Guide-Pe...


I haven’t read it but I’ll put it on my reading list, thanks.


^ This is pretty much standard at any startup in my experience.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: