Itsy Bitsy Data Structures – Simplified examples of many common data structures (github.com/thejameskyle)
223 points by thejameskyle on Aug 26, 2016 | 49 comments



Hi, I wrote this thing. I guess I'll comment on this thread since it got some attention.

I think a lot of people in this thread went into this project with completely the wrong state of mind.

I'm not trying to teach people how a hash table or linked list implementation works. I'm not trying to make a library that is useful to anyone.

What I'm trying to do is connect the dots between a bunch of different important concepts in data structures.

People can go read about each of these individual topics at great length, and I encourage them to, but in general there aren't a lot of great computer science resources that just piece related topics together for beginners.

Sure, you can read a 600-page book that covers things end to end, but those are very intimidating when you have no prior knowledge of the topic, and sometimes you don't get as much out of them if you don't know the basics.

If I can provide people with just a quick highlight reel of essential knowledge of data structures, then I've done what I set out to do.

I hope Hacker News can appreciate that. I don't expect it to, but I can hope.


As someone who has implemented all these before, I thought the presentation was great. Yes, there are some holes, but in under 1500 lines, ASCII art and all, a lot was covered. More importantly, the ideas put forward can always be built upon--more data structures, more talk about time/space complexity or even probability (e.g. bloom filters), forays into solving famous problems with those data structures (optimally or not), etc, etc. Whether this is the first and final installment or not, it's a nice contribution.


I didn't read it, partly because I already know the topic, but also because I found it too long for an overview. If the crowd with a reduced attention span is your target group, this doesn't work. This would amount to perhaps 20 pages, and I'd assume that's roughly how much space it would take in the 600-page book, too.

Also, there is no TOC to skim, and the ASCII art, cute as it is, is harder to scan than regular text with embedded code snippets.


It matches up perfectly with a 30-minute talk that I gave yesterday. It's meant to be read sequentially, not used as a reference. It's helped a lot of people already, and this format has worked in the past: The Super Tiny Compiler has been turned into interactive tutorials and even recommended by professors. People love the format, and I think it connects with people better.

But then again, I don't know why you thought your opinion mattered if you didn't even read it. I guess that is the quality feedback I do expect from Hacker News commenters though.


I for one read it and found it useful. Thanks for your time and energy spent making and sharing it.


Most languages have a built-in type for a dynamic array, but many of them lack a built-in queue. I have a favorite trick for a quick-and-dirty queue in those languages. You need an array (initially empty) and an integer (initially zero). To add to the queue, add to the array. To get the front of the queue, just do array[int]. And finally, to remove from the queue, simply increment the integer. The advantages of this approach are simplicity and speed: each operation takes amortized constant time. The obvious disadvantage is that it's potentially a major memory leak. For instance, if you try to do breadth-first search of a tree using this queue, your memory usage will be proportional to the size of the tree. How much worse than optimal is this? It depends on the tree. If it's a complete binary tree, you've doubled your memory usage, which might not be too bad. But if the tree is a path, then you use linear memory, whereas a proper queue would only use O(1) memory.
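A minimal sketch of this in Python (class and method names here are just illustrative):

    class LeakyQueue:
        """FIFO backed by a growing list; dequeued slots are never reclaimed."""
        def __init__(self):
            self.a = []   # the array, initially empty
            self.i = 0    # the integer: index of the current front

        def enqueue(self, x):
            self.a.append(x)          # add to the queue: append to the array

        def front(self):
            return self.a[self.i]     # the front is just array[int]

        def dequeue(self):
            x = self.a[self.i]
            self.i += 1               # "remove" by advancing the front index
            return x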


You can avoid the memory leak without adding too much complexity by just making it a circular buffer:

https://en.wikipedia.org/wiki/Circular_buffer
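For example, a minimal fixed-capacity circular buffer queue in Python (illustrative only; a real implementation would typically grow the buffer instead of raising when full):

    class RingQueue:
        """FIFO where head and tail wrap around a fixed backing array."""
        def __init__(self, capacity):
            self.buf = [None] * capacity
            self.head = 0    # index of the front element
            self.size = 0

        def enqueue(self, x):
            if self.size == len(self.buf):
                raise OverflowError("queue full")
            self.buf[(self.head + self.size) % len(self.buf)] = x
            self.size += 1

        def dequeue(self):
            if self.size == 0:
                raise IndexError("queue empty")
            x = self.buf[self.head]
            self.buf[self.head] = None                   # drop the reference
            self.head = (self.head + 1) % len(self.buf)  # wrap around
            self.size -= 1
            return x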


I wouldn't call this a trick. It's a staple in data structures books when they teach how to implement queues using arrays. The same approach extends to circular and double-ended queues as well.


You could periodically delete the wasted space. I think that if you, say, delete the wasted space whenever it equals the used space, you still have O(1) amortized time.


You could, but then it's not quite so quick-and-dirty anymore :) The dequeue method goes from "self.i += 1" (Python syntax) to

    self.i += 1
    if self.i > len(self.a) // 2:
        self.a[:self.i] = []
        self.i = 0
I didn't test the above code, and I am only 50% sure it is bug-free. I'm 99% sure the leaky version is bug-free. Memory leaks are not a bug if I document them, right? ;)

But yes, periodically deleting the wasted space is still easier than trying to do the usual thing with circular buffers and stuff :)


Easier and a major performance disaster. You shouldn't ever do this ever. If you are a programmer, it's your job to implement this correctly.


Asooka: a lot of your comments are showing up as dead. I have no clue why, they seem fine to me. I can't reply directly to the one you made in this thread: it's dead too.


I definitely see your point! I guess there is a place for algorithms at any point on the easy/good trade-off.


Seems to be a lot of snark in this thread already. If you see a problem, create an issue or submit a pull request. It's great that OP is trying to help spread knowledge.


The hash table implementation seems harmful with its "no collisions" assumption. That's like writing a red-black tree and assuming that insert/delete operations will keep the tree balanced, so we don't need rotations.


How is it harmful? Do you really think that somebody who is going to write a red-black tree implementation (or similar) is going to use this document as a reference? That'd be like complaining about a high school physics textbook because somebody from Boeing might use it to design a plane.


Some simplifications may actually be harmful to understanding if they throw away key points or introduce logical contradictions.

For a hash table, the general "promise" is that you map a very large space of keys (e.g. all strings) to a very small space of indexes using a simple function. Common sense might tell you that this shouldn't be possible ("how can you put something big into something small?") - and of course actual hash tables solve the conundrum by acknowledging collisions and defining ways to deal with them.

But by saying "let's assume no collisions" the reader is asked to just ignore the logical contradiction instead of solving it. That can lead to frustration in the best case or to serious misconceptions in the worst case.

Of course there is nothing wrong with saying "let's ignore those points for now, I'll explain later". Also I guess it depends very much on the learning style of the reader. But bad simplification can be very annoying for some learners.


>For a hash table, the general "promise" is that you map a very large space of keys (e.g. all strings) to a very small space of indexes using a simple function

Well, AFAICT there is no mention of any of this in the document. No "promise" or any such thing has been said or implied. Please stick to what the author has actually said rather than what you imagine people are thinking about when reading the document.

>. Common sense might tell you that this shouldn't be possible ("how can you put something big into something small?")

No, it doesn't tell me that at all, because that point is not brought up in the document. But is that how you think most people learn? Using common sense?

>But by saying "let's assume no collisions" the reader is asked to just ignore the logical contradiction instead of solving it.

Sure, you think those are the logical contradictions in the readers' minds. Can you actually establish that?

> But bad simplification can be very annoying for some learners.

A list of what is annoying for "some" learners would be very, very long. Let's not....

--

Anyway, we can go down this rabbit hole, but it isn't actually productive, and it's quite boring, to be honest. My point is that every time someone tries to simplify something for beginners, people seem to relish the opportunity to show everyone how "smart" they are about trivial data structures. Yes, if you want somebody to write an expository article for beginners with the detail of something like TAOCP, then you're going to be disappointed. But you can and should learn Newtonian physics, even if it's "wrong", before you move on to Einstein.


>Well, AFAICT there is no mention of any of this in the document. No "promise" or any such thing has been said or implied.

I'm not sure you got the parent's comment. What you wrote is true -- and that's exactly what's wrong with the document: it doesn't mention such a promise, nor does it describe an actual hash table implementation (where collisions are inevitable and have to be dealt with).

>Please stick to what the author has actually said rather than what you imagine people are thinking about when reading the document.

Again, the problem is that people sticking with "what the author has actually said" will not learn how hash tables actually work.

>No, it doesn't tell me that at all, because that point is not brought up in the document. But is that how you think most people learn? Using common sense?

Now you're just being contradictory to be contradictory. Or trolling.

Yes, people use common sense to learn -- and to understand what they are learning. How is this not obvious?

>Sure, you think those are the logical contradictions in the readers' minds. Can you actually establish that?

I'm a reader, and I immediately saw the same logical contradiction.

Not even sure what you're trying to defend exactly. Sloppy examples?


>Again, the problem is that people sticking with "what the author has actually said" will not learn how hash tables actually work.

Huh? People are complaining about the implementation not having collision detection. The author clearly lays out what collisions are, why they happen, and that they need to be dealt with. Did you read the article?

>Yes, people use common sense to learn -- and to understand what they are learning. How is this not obvious?

It would help if you didn't change the context and scope of what I said. I am not talking about some random topic that you can derive through first principles, common sense, or other logical means. To me it's obvious you didn't understand what I meant or what I was replying to. It's pointless to argue further and derail the thread.

>Not even sure what you're trying to defend exactly. Sloppy examples?

Not to sound gratuitously rude, but if you don't know what my point is, why not ask me first?


How hard is it to introduce the idea of slots or pigeonholes that the elements are put in, and if there is more than one item in a slot, it is added to a list/collection?
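Not hard at all. A minimal sketch of that chaining idea in Python (names are mine, just to illustrate):

    class ChainedHashTable:
        """Hash table where each slot holds a list of (key, value) pairs."""
        def __init__(self, num_slots=8):
            self.slots = [[] for _ in range(num_slots)]

        def _bucket(self, key):
            return self.slots[hash(key) % len(self.slots)]

        def set(self, key, value):
            bucket = self._bucket(key)
            for i, (k, _) in enumerate(bucket):
                if k == key:                 # key already present: overwrite
                    bucket[i] = (key, value)
                    return
            bucket.append((key, value))      # empty slot or collision: append

        def get(self, key):
            for k, v in self._bucket(key):   # scan the slot's list
                if k == key:
                    return v
            raise KeyError(key)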


I'm not sure I understand. OP just made an analogy to red black trees. He's not suggesting anyone will make bad trees because of this document.

He's just saying that the hash table is pretty useless if two random keys could access the same memory.


>He's just saying that the hash table is pretty useless if two random keys could access the same memory.

That is already mentioned in the document itself. Also mentioned is that a proper implementation would have to deal with collisions.

As to it being useless... the value of the content is in its explanation of simple concepts, not in some imagined production-quality requirement that people want to assume it must have.


>As to it being useless... the value of the content is in its explanation of simple concepts, not in some imagined production-quality requirement that people want to assume it must have.

The value of an explanation of simple concepts lies in it being simple AND adequate.

You can cut the implementation details, but you can't cut essential properties to simplify, because then you're not explaining the original concept at all.


Yea, a hash table without even bucketing(?) is laughably useless.


At least the ASCII art for the hash table section was pretty funny.


One thing's for sure: it caught my attention. Humungo text that made me chuckle. Then I clicked on the text, and it was the example. Pretty clever.


At first glance, this looks like it could be a useful reference. Regardless, it's worth a visit for the ASCII art alone. My favorite: the literal interpretation of "hash tables".


> algorithms are implemented with data structures

Eh, for a certain definition of "algorithm". Plenty of signal processing algorithms, for instance, have nothing to do with data structures.


Yes, I would say the relationship is the opposite. Data structures imply a set of algorithms based on the promises of the data structure implementation. Algorithms themselves are free-floating things that can be reasoned about just as well in a C program as an instruction flow diagram as a Turing machine program. Take for instance an algorithm to divide two 32-bit integers. Or an algorithm that computes the max in a sequence (which is an abstraction) of numbers. There is no data structure, unless the argument is that a memory cell is itself a data structure.
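For instance, here is a Python sketch of that max algorithm written against nothing more than "an iterable of numbers"; it runs unchanged whether the values sit in a list, a linked list, or are produced on the fly:

    def maximum(xs):
        """Return the largest element of a non-empty iterable of numbers."""
        it = iter(xs)
        best = next(it)      # assumes at least one element
        for x in it:
            if x > best:
                best = x
        return best

    maximum([3, 1, 4])                   # a list
    maximum(x * x for x in range(10))    # values produced lazily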


As the title of Wirth's book goes:

Algorithms + Data Structures = Programs


I would consider the signal's value over time to be a sequence.


Algorithms that work over multiple samples are not tied to a data structure, except in an overly broad and useless sense of "data structure". A time or frequency series is not a data structure. It can be stored in any of several data structures, with various pros and cons depending on other circumstances, but the choice of data structure has nothing to do with the algorithm.


> It can be stored in any of several data structures, with various pros and cons depending on other circumstances, but the choice of data structure has nothing to do with the algorithm.

It has nothing to do with the theory of the algorithm. But everything to do with the implementation. So I think the original statement is true.

> algorithms are implemented with data structures


Where are the data structures? [0]

    MIN := X[N]; k := N;
    for j := N-1 step -1 until 1 do
      if X[j] < MIN then
        begin MIN := X[j]; k := j;
        end;
[0] https://assets.cs.ncl.ac.uk/seminars/139.pdf


X


X is data. There is no defined structure to it apart from the fact that there are N elements and you can select them. Is it a linked list, is it a random access array? It is not yet defined to be structured in any way. Furthermore you can analyze (and implement) this algorithm independent of the assumptions you make about how the access operation is performed, and then add those assumptions back in if you want to do another analysis.


You seem to be disputing a bunch of statements nobody made; all that was said was "algorithms are implemented with data structures", and that example fits -- you need a data structure to implement that algorithm. That's all that was claimed.


> X is data. There is no defined structure to it apart from the fact that there are N elements and you can select them. Is it a linked list, is it a random access array? It is not yet defined to be structured in any way.

Sure, but then that's not the entire implementation (error: X is undefined). If you want to actually run that code, it needs to be one of those (or many other) things.


Question: for the hash tables example, it looks like it's setting arbitrary indexes in a JS array instead of sequential ones. IIRC, JavaScript stringifies arrays with gaps like this:

[ 1,2,3,undefined/null,undefined/null,undefined/null,undefined/null,undefined/null,4,5,6,undefined/null,undefined/null,undefined/null]

With the hash table function, if you accidentally generate an index of, say, 100,000, wouldn't you end up with an unwieldy-sized array that would be more difficult to search than one like this:

[{ value: 1, index: 103405}, { value: 2, index: 14550 }]

Or does JS store the arrays as an object internally and just stringify them with all the blanks?


JS engines have both sparse and dense representations for arrays, and array instances switch from one to the other as you insert and delete elements. Details are of course implementation-defined.
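A rough conceptual model of that switch in Python (a toy illustration, not how any particular engine actually does it):

    class Elements:
        """Toy array storage: a packed list while writes stay contiguous,
        a dict of index -> value once a hole would appear."""
        def __init__(self):
            self.dense = []
            self.sparse = None

        def set(self, index, value):
            if self.sparse is None:
                if index < len(self.dense):      # overwrite in place
                    self.dense[index] = value
                    return
                if index == len(self.dense):     # still packed: append
                    self.dense.append(value)
                    return
                # a distant index would leave a hole: go sparse
                self.sparse = dict(enumerate(self.dense))
                self.dense = None
            self.sparse[index] = value

        def get(self, index):
            if self.sparse is None:
                return self.dense[index]
            return self.sparse.get(index)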


This is fucking awesome. Did you draw the ASCII pictures yourself?


This is too dumbed down, imo.

Also, the part where it explains why memory is zero-addressed is so wrong, it rustled my jimmies.


>This is too dumbed down, imo.

That is why the word 'simplified' appears in the title, and the words 'super simplified' appear in the linked document. The purpose is to convey a point and take away other distractions. It's probably not a good idea to introduce calculus to first graders either. If you don't think omitted content is a distraction, then it's probably not meant for you.


I have two issues with this. First, you could say what you wanted to say in a third of the length if you weren't writing in such a conversational manner and interspersing the discourse with irrelevant jokes and stuff like '"OOooooOOOooh soo exciting" right?'. I don't read stuff on data structures to amuse myself. I want it to be as succinct and to the point as possible, not wasting my time kidding around. If I want to kid around I will do something else. But some people prefer this, I suppose.

Second, I don't think the best place to go on lengthy explanations with figures included is in the comments of a source code file. Markdown is your friend.


  I have two issues with this [..] I don't read stuff on data 
  structures to amuse myself. I want it to be as succinct and 
  to the point as possible, not wasting my time kidding 
  around.
Someone should write a serious data structures book for you. Oh wait, they have, there are literally hundreds of them.

If you're attacking a topic with your full attention, then you want maximum information as fast as possible.

If you're tired on a Friday and want to learn something and have fun, then information density doesn't matter anymore.


There is no shortage of Algorithms and Data Structures books. But different people prefer different styles of learning and interaction. This meets the needs for some people. I appreciated it and encourage James Kyle to continue his great effort.


I can't speak for everyone, but I read conversational prose a lot faster than dense technical writing. If I were to read enough of each style to absorb the same amount of information, I don't think it's safe to assume I would spend less time on the dense material just by virtue of its length.


Lighten up :) The giant "Itsy Bitsy Data Structures" heading was itself pretty light-hearted.



