Itsy Bitsy Data Structures – Simplified examples of many common data structures (github.com/thejameskyle)
223 points by thejameskyle on Aug 26, 2016 | 49 comments



Hi, I wrote this thing. I guess I'll comment on this thread since it got some attention.

I think a lot of people in this thread went into this project with completely the wrong state of mind.

I'm not trying to teach people how a hash table or linked list implementation works. I'm not trying to make a library that is useful to anyone.

What I'm trying to do is connect the dots between a bunch of different important concepts in data structures.

People can go read about each of these individual topics at great length, and I encourage them to, but in general there aren't a lot of great computer science resources that just piece related topics together for beginners.

Sure, you can read a 600-page book that covers things end to end, but those are very intimidating when you have no prior knowledge of the topic, and sometimes you don't get as much out of them if you don't know the basics.

If I can provide people with just a quick highlight reel of essential knowledge of data structures, then I've done what I set out to do.

I hope Hacker News can appreciate that. I don't expect it to, but I can hope.


As someone who has implemented all these before, I thought the presentation was great. Yes, there are some holes, but in under 1500 lines, ASCII art and all, a lot was covered. More importantly, the ideas put forward can always be built upon--more data structures, more talk about time/space complexity or even probability (e.g. bloom filters), forays into solving famous problems with those data structures (optimally or not), etc, etc. Whether this is the first and final installment or not, it's a nice contribution.


I didn't read it, partly because I already know the topic, but also because I found it too long for an overview. If the crowd with a reduced attention span is your target group, this doesn't work. This would amount to perhaps 20 pages, and I'd assume that's roughly how much space it would take in the 600-page book, too.

Also, there is no TOC to skim, and the ASCII art, cute as it is, is harder to scan than regular text with embedded code snippets.


It matches up perfectly with a 30-minute talk that I gave yesterday. It's meant to be read sequentially, not used as a reference. It's helped a lot of people already, and this format has worked in the past: The Super Tiny Compiler has been turned into interactive tutorials and even recommended by professors. People love the format, and I think it connects with people better.

But then again, I don't know why you thought your opinion mattered if you didn't even read it. I guess that is the quality feedback I do expect from Hacker News commenters though.


I for one read it and found it useful. Thanks for your time and energy spent making and sharing it.


Most languages have a built-in type for a dynamic array, but many of them lack a built-in queue. I have a favorite trick for a quick-and-dirty queue in those languages. You need an array (initially empty) and an integer (initially zero). To add to the queue, add to the array. To get the front of the queue, just do array[int]. And finally, to remove from the queue, simply increment the integer. The advantages of this approach are simplicity and speed: each operation takes amortized constant time. The obvious disadvantage is that it's potentially a major memory leak. For instance, if you try to do breadth-first search of a tree using this queue, your memory usage will be proportional to the size of the tree. How much worse than optimal is this? It depends on the tree. If it's a complete binary tree, you've doubled your memory usage, which might not be too bad. But if the tree is a path, then you use linear memory, whereas a proper queue would only use O(1) memory.
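A minimal sketch of this in Python (class and method names here are just illustrative):

    class LeakyQueue:
        """FIFO backed by a growing list; dequeued slots are never reclaimed."""
        def __init__(self):
            self.a = []   # the array, initially empty
            self.i = 0    # the integer: index of the current front

        def enqueue(self, x):
            self.a.append(x)          # add to the queue: append to the array

        def front(self):
            return self.a[self.i]     # the front is just array[int]

        def dequeue(self):
            x = self.a[self.i]
            self.i += 1               # "remove" by advancing the front index
            return x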


You can avoid the memory leak without adding too much complexity by just making it a circular buffer:

https://en.wikipedia.org/wiki/Circular_buffer
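For example, a minimal fixed-capacity circular buffer queue in Python (illustrative only; a real implementation would typically grow the buffer instead of raising when full):

    class RingQueue:
        """FIFO where head and tail wrap around a fixed backing array."""
        def __init__(self, capacity):
            self.buf = [None] * capacity
            self.head = 0    # index of the front element
            self.size = 0

        def enqueue(self, x):
            if self.size == len(self.buf):
                raise OverflowError("queue full")
            self.buf[(self.head + self.size) % len(self.buf)] = x
            self.size += 1

        def dequeue(self):
            if self.size == 0:
                raise IndexError("queue empty")
            x = self.buf[self.head]
            self.buf[self.head] = None                   # drop the reference
            self.head = (self.head + 1) % len(self.buf)  # wrap around
            self.size -= 1
            return x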


I wouldn't call this a trick. It's a staple in data structures books when they teach how to implement queues using arrays. The same approach extends to circular and double-ended queues as well.


You could periodically delete the wasted space. I think that if you, say, delete the wasted space whenever it equals the used space, you still have O(1) amortized time.


You could, but then it's not quite so quick-and-dirty anymore :) The dequeue method goes from "self.i += 1" (Python syntax) to

    self.i += 1
    if self.i > len(self.a) // 2:
        self.a[:self.i] = []
        self.i = 0
I didn't test the above code, and I am only 50% sure it is bug-free. I'm 99% sure the leaky version is bug-free. Memory leaks are not a bug if I document them, right? ;)

But yes, periodically deleting the wasted space is still easier than trying to do the usual thing with circular buffers and stuff :)


Easier and a major performance disaster. You shouldn't ever do this ever. If you are a programmer, it's your job to implement this correctly.


Asooka: a lot of your comments are showing up as dead. I have no clue why, they seem fine to me. I can't reply directly to the one you made in this thread: it's dead too.


I definitely see your point! I guess there is a place for algorithms at any point on the easy/good trade-off.


Seems to be a lot of snark in this thread already. If you see a problem, create an issue or submit a pull request. It's great that OP is trying to help spread knowledge.


The hash table implementation seems harmful with its "no collisions" assumption. That's like writing a red-black tree and assuming that insert/delete operations will keep the tree balanced, so we don't need rotations.


How is it harmful? Do you really think that somebody who is going to write a red-black tree implementation (or similar) is going to use this document as a reference? That'd be like complaining about a high school physics textbook because somebody from Boeing might use it to design a plane.


Some simplifications may actually be harmful to understanding if they throw away key points or introduce logical contradictions.

For a hash table, the general "promise" is that you map a very large space of keys (e.g. all strings) to a very small space of indexes using a simple function. Common sense might tell you that this shouldn't be possible ("how can you put something big into something small?") - and of course actual hash tables solve the conundrum by acknowledging collisions and defining ways to deal with them.

But by saying "let's assume no collisions" the reader is asked to just ignore the logical contradiction instead of solving it. That can lead to frustration in the best case or to serious misconceptions in the worst case.

Of course there is nothing wrong with saying "let's ignore those points for now, I'll explain later". Also I guess it depends very much on the learning style of the reader. But bad simplification can be very annoying for some learners.


>For a hash table, the general "promise" is that you map a very large space of keys (e.g. all strings) to a very small space of indexes using a simple function

Well, AFAICT there is no mention of any of this in the document. No "promise" or any such thing has been said or implied. Please stick to what the author has actually said rather than what you imagine people are thinking about when reading the document.

>. Common sense might tell you that this shouldn't be possible ("how can you put something big into something small?")

No, it doesn't tell me that at all, because that point is not brought up in the document. But is that how you think most people learn? Using common sense?

>But by saying "let's assume no collisions" the reader is asked to just ignore the logical contradiction instead of solving it.

Sure, you think those are the logical contradictions in the readers' minds. Can you actually establish that?

> But bad simplification can be very annoying for some learners.

A list of what is annoying for "some" learners would be very, very long. Let's not....

--

Anyway, we can go down this rabbit hole, but it isn't actually productive, and it's quite boring, to be honest. My point is that every time someone tries to simplify something for beginners, people seem to relish the opportunity to show everyone how "smart" they are about trivial data structures. Yes, if you want somebody to write an expository article for beginners with the detail of something like TAOCP, then you're going to be disappointed. But you can and should learn Newtonian physics, even if it's "wrong", before you move on to Einstein.


>Well, AFAICT there is no mention of any of this in the document. No "promise" or any such thing has been said or implied.

I'm not sure you got the parent's comment. What you wrote is true -- and that's exactly what's wrong with the document: it doesn't mention such a promise, nor does it describe an actual hash table implementation (where collisions are inevitable and have to be dealt with).

>Please stick to what the author has actually said rather than what you imagine people are thinking about when reading the document.

Again, the problem is that people sticking with "what the author has actually said" will not learn how hash tables actually work.

>No, it doesn't tell me that at all, because that point is not brought up in the document. But is that how you think most people learn? Using common sense?

Now you're just being contradictory to be contradictory. Or trolling.

Yes, people use common sense to learn -- and to understand what they are learning. How is this not obvious?

>Sure, you think those are the logical contradictions in the readers' minds. Can you actually establish that?

I'm a reader, and I immediately saw the same logical contradiction.

Not even sure what you're trying to defend exactly. Sloppy examples?


>Again, the problem is that people sticking with "what the author has actually said" will not learn how hash tables actually work.

Huh? People are complaining about the implementation not having collision detection. The author clearly lays out what collisions are, why they happen, and that they need to be dealt with. Did you read the article?

>Yes, people use common sense to learn -- and to understand what they are learning. How is this not obvious?

It would help if you didn't change the context and scope of what I said. I am not talking about some random topic that you can derive through first principles, common sense, or other logical means. To me it's obvious you didn't understand what I meant or what I was replying to. It's pointless to argue further and derail the thread.

>Not even sure what you're trying to defend exactly. Sloppy examples?

Not to sound gratuitously rude, but if you don't know what my point is, why not ask me first?


How hard is it to introduce the idea of slots or pigeonholes that the elements are put in, and if there is more than one item in a slot, it is added to a list/collection?
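Not hard at all. A minimal sketch of that chaining idea in Python (names are mine, just to illustrate):

    class ChainedHashTable:
        """Hash table where each slot holds a list of (key, value) pairs."""
        def __init__(self, num_slots=8):
            self.slots = [[] for _ in range(num_slots)]

        def _bucket(self, key):
            return self.slots[hash(key) % len(self.slots)]

        def set(self, key, value):
            bucket = self._bucket(key)
            for i, (k, _) in enumerate(bucket):
                if k == key:                 # key already present: overwrite
                    bucket[i] = (key, value)
                    return
            bucket.append((key, value))      # empty slot or collision: append

        def get(self, key):
            for k, v in self._bucket(key):   # scan the slot's list
                if k == key:
                    return v
            raise KeyError(key)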


I'm not sure I understand. OP just made an analogy to red black trees. He's not suggesting anyone will make bad trees because of this document.

He's just saying that the hash table is pretty useless if two random keys could access the same memory.


>He's just saying that the hash table is pretty useless if two random keys could access the same memory.

That is already mentioned in the document itself. Also mentioned is that a proper implementation would have to deal with collisions.

As to it being useless... the value of the content is in its explanation of simple concepts, not in some imagined production-quality requirement that people want to assume it must have.


>As to it being useless... the value of the content is in its explanation of simple concepts, not in some imagined production-quality requirement that people want to assume it must have.

The value of an explanation of simple concepts lies in it being simple AND adequate.

You can cut the implementation details, but you can't cut essential properties to simplify, because then you're not explaining the original concept at all.


Yea, a hash table without even bucketing(?) is laughably useless.


At least the ASCII art for the hash table section was pretty funny.


One thing's for sure: it caught my attention. Humungo text that made me chuckle. Then I clicked on the text, and it was the example. Pretty clever.


At first glance, this looks like it could be a useful reference. Regardless, it's worth a visit for the ASCII art alone. My favorite: the literal interpretation of "hash tables".


> algorithms are implemented with data structures

Eh, for a certain definition of "algorithm". Plenty of signal processing algorithms, for instance, have nothing to do with data structures.


Yes, I would say the relationship is the opposite. Data structures imply a set of algorithms based on the promises of the data structure implementation. Algorithms themselves are free-floating things that can be reasoned about just as well in a C program as an instruction flow diagram as a Turing machine program. Take for instance an algorithm to divide two 32-bit integers. Or an algorithm that computes the max in a sequence (which is an abstraction) of numbers. There is no data structure, unless the argument is that a memory cell is itself a data structure.
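For instance, here is a Python sketch of that max algorithm written against nothing more than "an iterable of numbers"; it runs unchanged whether the values sit in a list, a linked list, or are produced on the fly:

    def maximum(xs):
        """Return the largest element of a non-empty iterable of numbers."""
        it = iter(xs)
        best = next(it)      # assumes at least one element
        for x in it:
            if x > best:
                best = x
        return best

    maximum([3, 1, 4])                   # a list
    maximum(x * x for x in range(10))    # values produced lazily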


As the title of Wirth's book goes:

Algorithms + Data Structures = Programs


I would consider the signal's value over time to be a sequence.


Algorithms that work over multiple samples are not tied to a data structure, except in an overly broad and useless sense of "data structure". A time or frequency series is not a data structure. It can be stored in any of several data structures, with various pros and cons depending on other circumstances, but the choice of data structure has nothing to do with the algorithm.


> It can be stored in any of several data structures, with various pros and cons depending on other circumstances, but the choice of data structure has nothing to do with the algorithm.

It has nothing to do with the theory of the algorithm. But everything to do with the implementation. So I think the original statement is true.

> algorithms are implemented with data structures


Where are the data structures? [0]

    MIN := X[N]; k := N;
    for j := N-1 step -1 until 1 do
      if X[j] < MIN then
        begin MIN := X[j]; k := j;
        end;
[0] https://assets.cs.ncl.ac.uk/seminars/139.pdf


X


X is data. There is no defined structure to it apart from the fact that there are N elements and you can select them. Is it a linked list, is it a random access array? It is not yet defined to be structured in any way. Furthermore you can analyze (and implement) this algorithm independent of the assumptions you make about how the access operation is performed, and then add those assumptions back in if you want to do another analysis.


You seem to be disputing a bunch of statements nobody made; all that was said was "algorithms are implemented with data structures", and that example fits -- you need a data structure to implement that algorithm. That's all that was claimed.


> X is data. There is no defined structure to it apart from the fact that there are N elements and you can select them. Is it a linked list, is it a random access array? It is not yet defined to be structured in any way.

Sure, but then that's not the entire implementation (error: X is undefined). If you want to actually run that code, it needs to be one of those (or many other) things.


Question: for the hash tables example, it looks like it's setting arbitrary indexes in a JS array instead of sequential ones. IIRC, JavaScript stringifies arrays with gaps like this:

[ 1,2,3,undefined/null,undefined/null,undefined/null,undefined/null,undefined/null,4,5,6,undefined/null,undefined/null,undefined/null]

With the hash table function, if you accidentally generate an index of, say, 100,000, wouldn't you end up with an unwieldy-sized array that would be more difficult to search than one like this:

[{ value: 1, index: 103405}, { value: 2, index: 14550 }]

Or does JS store the arrays as an object internally and just stringify them with all the blanks?


JS engines have both sparse and dense representations for arrays, and array instances switch from one to the other as you insert and delete elements. Details are of course implementation-defined.
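A rough conceptual model of that switch in Python (a toy illustration, not how any particular engine actually does it):

    class Elements:
        """Toy array storage: a packed list while writes stay contiguous,
        a dict of index -> value once a hole would appear."""
        def __init__(self):
            self.dense = []
            self.sparse = None

        def set(self, index, value):
            if self.sparse is None:
                if index < len(self.dense):      # overwrite in place
                    self.dense[index] = value
                    return
                if index == len(self.dense):     # still packed: append
                    self.dense.append(value)
                    return
                # a distant index would leave a hole: go sparse
                self.sparse = dict(enumerate(self.dense))
                self.dense = None
            self.sparse[index] = value

        def get(self, index):
            if self.sparse is None:
                return self.dense[index]
            return self.sparse.get(index)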


This is fucking awesome. Did you draw the ASCII pictures yourself?


This is too dumbed down, imo.

Also, the part where it explains why memory is zero-addressed is so wrong, it rustled my jimmies.


>This is too dumbed down, imo.

That is why the word 'simplified' appears in the title, and the words 'super simplified' appear in the linked document. The purpose is to convey a point and take away other distractions. It's probably not a good idea to introduce calculus to first graders either. If you don't think omitted content is a distraction, then it's probably not meant for you.


I have two issues with this. First, you could say what you wanted to say in a third of the length if you weren't writing in such a conversational manner and interspersing the discourse with irrelevant jokes and stuff like '"OOooooOOOooh soo exciting" right?'. I don't read stuff on data structures to amuse myself. I want it to be as succinct and to the point as possible, not wasting my time kidding around. If I want to kid around I will do something else. But some people prefer this, I suppose.

Second, I don't think the best place to go on lengthy explanations with figures included is in the comments of a source code file. Markdown is your friend.


  I have two issues with this [..] I don't read stuff on data 
  structures to amuse myself. I want it to be as succinct and 
  to the point as possible, not wasting my time kidding 
  around.
Someone should write a serious data structures book for you. Oh wait, they have, there are literally hundreds of them.

If you're attacking a topic with your full attention, then you want maximum information as fast as possible.

If you're tired on a Friday and want to learn something and have fun, then information density doesn't matter anymore.


There is no shortage of Algorithms and Data Structures books. But different people prefer different styles of learning and interaction. This meets the needs for some people. I appreciated it and encourage James Kyle to continue his great effort.


I can't speak for everyone, but I read conversational prose a lot faster than dense technical writing. If I were to read enough of each style to absorb the same amount of information, I don't think it's safe to assume I would spend less time on the dense material just by virtue of its length.


Lighten up :) The giant "Itsy Bitsy Data Structures" heading was itself pretty light-hearted.



