Hacker News new | past | comments | ask | show | jobs | submit login
Sort, sweep, and prune: Collision detection algorithms (2023) (leanrada.com)
340 points by wonger_ 5 months ago | hide | past | favorite | 58 comments



One interesting note on this approach is that the author suggests using a "fast" sorting algorithm like mergesort/quicksort as the sorting algorithm for best performance. But in practice, you may get better performance from a "worse" sorting algorithm: insertion sort.

The reason is that objects in collision detection systems tend to move relatively small steps between frames, meaning you can keep lists from the previous frame that are already mostly sorted. For sorting these mostly sorted lists, insertion sort tends towards O(n) while Quicksort tends towards O(n^2)!


The author covers this pretty much exactly as you describe it. From Part 2 of TFA,

> Let’s look at the sort step, which is the bottleneck of the algorithm according to the analysis.

> You can see that most of the time, the sort does nothing at all! The list is almost always already sorted from the previous frame.

> Even when it becomes unsorted, it usually just takes a couple of swaps to be sorted again.

> Here’s insertion sort in action:


If you use a modern sort, it'll just go "Oh that's sorted, we're done". Pattern Defeating Quicksort will do that and so would Glidesort, Driftsort, IPNsort and so on. Most of these algorithms will also correctly shortcut trivial "almost sorted" cases like 123457689 (only longer, if you really only have nine elements you want a custom sort for that, there will be a correct algorithm and it's probably written down somewhere already)

Rust's old sorts, both of them, get this right. There's a fair chance the current version of your C++ stdlib even gets this right in its unstable sort. In 1965 it's understandable that you reach for an insertion sort for this case, in 2024 it's embarrassing if your language doesn't provide a built-in sort where this just works.


> Rust's old sorts, both of them, get this right.

And the newer sort in the stdlib does too, right?

Link https://www.reddit.com/r/rust/comments/1dl079x/the_rust_stdl...



You don’t just want to sort the data, you actually want to collect a list of overlapping pairs, (or more like you want to collect the list of pairs that have changed since last time).

Using a built in sort will give you the same complexity, but iterating over 10000 elements again to gather this list is an overhead that you can avoid with a DIY approach


For what it's worth, Haskell's built-in sort does this, and I think Python, too. (But I'm not as confident about Python. My memory is a bit hazy.)


There is an alternative to sorting every step : Build your indexing structure a little looser, so that you catch the candidates collisions when object have moved less than epsilon.

For example that can be done by increasing the radius of the spheres by epsilon. As long as the spheres have not moved by epsilon, you don't need to recompute the index.

To avoid latency peaks when you do need to recompute the index, you can start building a lagging index by sorting 10% each frame (aka amortizing the computation cost). After 10 frames you have an index that is valid as long as the position is within epsilon of the position 10 frames ago.


> For sorting these mostly sorted lists, insertion sort tends towards O(n) while Quicksort tends towards O(n^2)!

Only if you pick your pivot really, really badly. You can randomise your pivot to get O(n log n), or in the case of already almost sorted lists, you can pick a pivot by just picking the element in the middle of the list.

But you are right, that even with optimal pivots, QuickSort is still O(n log n) even in the best case. There are simple variants of merge sort that give you O(n log k) behaviour where k is the number of runs (both ascending and descending) in the data. The 'sort' you get by default in eg Haskell's standard library uses such an algorithm. I think Python does, too.


This was really well put together.

What’s funny is that I’ve been doing some form of gamedev since late 90s and most of this is abstracted by engines at this point, but this is essential to understanding how complex system simulations work.

Thanks to the author for making a very accessible article


I've always enjoyed this document regarding continuous collision detection:

https://github.com/bepu/bepuphysics2/blob/master/Documentati...

The library itself is amazing in terms of performance. It is a bit challenging to integrate with though due to the amount of optimization involved.


> This naive algorithm runs in O(n2) time in Big O terms.

Is this true? The naive algorithm's outer loop (i) counts n - 1, while the inner loop (j) begins at i + 1 (so counts progressively less than n - 1).

Not a CS major, is this roughly equivalent (for large values of n) to O(n2) or is it, as it appears, something less?


You're right that it's not exactly n^2. For the i-th element we perform (n - i - 1) comparisons (indexing from 0). This adds up to a total of (n - 1) * n / 2 comparisons. (see https://en.wikipedia.org/wiki/Triangular_number)

In the end it doesn't make a difference for big-O analysis because it's used to describe the behavior when n approaches infinity, where the quadratic factor takes over.


The "optimisation" of starting the inner loop at `j = i + 1` is done to avoid testing every pair of objects twice.

[Ed: I should add that it also prevents testing an object against itself.]

The algorithm is O(n^2) because every pair will be checked once.


Big O is just complexity classes, describing how number of abstract computations scale (that's the key word) with input size (input list length). Generally speaking, if you could analytically express number of computations as a function of input size, Big O takes the largest term and drops all factors.

It does not necessarily describe performance of the algorithm. `20n2^+5n` and `2n^2 + 9001n` are both O(n^2)


> It does not necessarily describe performance of the algorithm.

Not necessarily true. It does indeed describe performance of the algorithm. It just compares scenarios with coarser granularity. You can tell from the very start that a O(1) algorithm is expected to outperform a O(N²) alternative.


My algorithms class taught to think of it not as "describing performance" in an absolute sense, but as "describing how performance changes as the size of the input data increases".

It is not necessarily true that an O(1) algorithm will outperform an O(n^2) alternative on a particular set of data. But it is true that an O(1) algorithm will outperform an O(n^2) alternative as the size of the input data increases.


This sometimes doesn't work out in practice because the scaling involved runs into a limitation your big-O model didn't account for. Typical examples are: The size of the machine registers, physical RAM, addressable storage, or transmission speeds.

If your O(1) algorithm takes an hour for any input, and the competition is O(n) it may seem like there must be cases where you're better, and then you realise n is the size in bytes of some data in RAM, and your competitors can do 4GB per second. You won't be competitive until we're talking about 15TB of data and then you remember you only have 64GB of RAM.

Big-O complexities are not useless, but they're a poor summary alone, about as useful as knowing the mean price of items at supermarkets. I guess this supermarket is cheaper? Or it offers more small items? Or it has no high-end items? Or something?


> (...) but as "describing how performance changes as the size of the input data increases".

Yes, that's the definition of asymptotic computational complexity. That's the whole point of these comparisons. It's pointless to compare algorithms when input size is in the single digit scale.


You could have an O(N^2) algorithm outperform an O(N) on the scale of 10,000 (or whatever scale you want to imagine). The big-O notation compares only asymptotic behaviour, and sometimes the lower power factors are overwhelming.


Counter-examples: https://en.m.wikipedia.org/wiki/Galactic_algorithm

All that different complexity classes show is that there is some n for which one outperforms the other. In practice this n can be extremely large.


No. This is outright WRONG. As others have pointed out, complexity classes describe asymptotic complexity. In other words, Big O describes scaling properties of the algorithm with regards to input size. The value is rough regression shape of the performance function if computed from 0 to infinity.

O(1) is by no means expected to outperform O(N^2) for any practical dataset. One algorithm is constant time (even though the constant can be so large that it will not finish before heat death of the universe), the other gets quadratically slower as the input size increases.

For example if O(n^2) algorithm can be implemented with SIMD, you could easily expect it to heavily outperform O(n) algorithm on many practical datasets when wall-clock performance is measured.

You can't draw immediate conclusions about performance of the algorithm from Big O class alone.


Yup, something that often is omitted about O notation is absolute per-item and setup/activation cost.

SIMD is also not easy to reason about as the formula changes quite a bit once you become bottlenecked by bandwidth - Zen 5's AVX512 implementation lets you blaze through data with 128-256B/cycle (think >128B/0.25ns) which is absolutely bonkers. But that quickly comes to a halt once you exceed data in L1 and L2 - streaming data from/to RAM is just about 32B/c.

So the consideration of memory traffic becomes important, where "better by Big O classification" algorithms with more granular data access might start to outperform.


it's the summation of numbers from 1 to n which is n(n+1)/2. This reduces to quadratic complexity because big O notation ignores all coefficients and terms that scale slower


I don't know if this will make more sense to you but Big O is like limit calculations from calculus.


这今天


I enjoyed the use of illustration. Seems like an appropriate usage.

Sometimes I feel like these articles with interactive illustrations are more like an excuse to put together a bunch of cool demos, like there's a lot of fluff with not much substance (a bit like a TED talk), but this one didn't let the illustrations take over.



A long time ago I did something similar but rather than sorting I just kept index lists for each direction and the objects sorted themselves. Meaning like there are 4 lists `objectIndicesSortedByLeftEdge/RightEdge/TopEdge/BottomEdge` If an object moves horizontally then it updates its own indices in the leftEdge and rightEdge arrays. This is because as it moves it likely only has to swap 1 or 2 indices at most to move itself.


I think your method seems useful for scenes that are mostly static. the "recreate the graph" approach seems better the more dynamic things are.


objects are unlikely to jump from one side the screen to the other so comparing them to stuff on the other side of the screen seems like a waste (which is what a generic sort will do).


I hadn't seen this before, but isn't it similar to using something like a quad-tree to reduce the number of potential colliders?


Yes, although you're more likely to see something like a k-d tree in offline rendering than in real-time rendering.


Got distracted (in a good way) by this website. It's fun and inspiring.


> I won’t cover other approaches, such as space partitioning or spatial tree subdivision

I'm curious about this comment, anyone know if the algorithm in the article is generally faster than space partitioning/spatial tree subdivision?

A long time ago I used a spatial tree type of approach which seemed naively to be a pretty good approach, but I never investigated or compared algorithms other people were using (this was 80's, pre-internet).


The complexity involved in maintaining space partitioning, tree subdivision etc can end up being a big hassle, especially if you have huge numbers of moving objects.

It's much easier to write, debug and optimize something that manages a single list of entities, or a grid of i.e. 256x256 'cells' that each contain a list of entities, than it is to set up a complex partitioning scheme that maintains all your tree invariants every time an object moves.

In the days of i.e. DOOM or Quake the performance of these underlying systems was much more important than it is now, so it (IMO) made more sense for the authors of those engines to cook up really complex partitioning systems. But these days CPUs are really good at blasting through sorted arrays of items, and less good at chasing linked lists/trees (comparatively) than they used to be, due to pipelining. Your CPU time isn't going to be spent on managing those entity lists but will instead be spent on things like AI, rendering, etc.


I was curious if any linear programming methods have been proposed for the problem since you just need to see if a point is inside a polyhedron or if two polyhedra intersect. There are.https://users.encs.concordia.ca/~akgunduz/CollisionDetection...


A tangent, but HNers interested in an article on collision detection might know: are there any similar articles that show how to compute the intersection of two capsules (as in the space that that a sphere moving in a straight line in a time step occupies) in 3D? My own hobby 3D game got stuck on that hurdle and I couldn’t find any examples anywhere :(


Capsule-capsule overlap can be detected by treating the capsules as line segments with radius/thickness. But I think you need something more complicated than capsule-capsule intersection to truly solve the problem (continuous collision detection between dynamic objects).

The Rapier project and its documentation[0] might be of interest to you. Rapier has a more sophisticated CCD implementation than most (popular in gamedev) physics engines.

> Rapier implements nonlinear CCD, meaning that it takes into account both the angular and translational motion of the rigid-body.

[0] https://rapier.rs/docs/user_guides/rust/rigid_bodies#continu...


I haven't worked on collision detection since the 1990s. But convex hull collisions are very fast with GJK. And the space swept by translating a convex hull is itself a convex hull. So continuous collision detection (no fly-through) is possible that way. But that's usually overkill. Objects that move more than their own dimension in one frame time are usually bullets, and those should get special handling.


You make one of them have thickness of both, so that the collision detection becomes line vs capsule collision.

Another trick is to rotate both so that one of the move directions is axis-aligned, and then the problem looks more like AABB.


For fine collision between capsules, you consider the capsules as 3d segments with thickness. And two of those collide if the distance between segments < sum of half-thicknesses.

You can check this site with nice animations which explain how to compute the distance between segments : https://zalo.github.io/blog/closest-point-between-segments/

One more generic way to compute the distance between shapes is to view it as a minimization problem. You take a point A inside object 1 and a point B inside object 2, and you minimize the squared distance between the points : ||A-B||^2 while constraining point A to object 1 and point B to object 2.

In the case of capsules, this is a "convex" optimization problem : So one way of solving for the points, is to take a minimization (newton) step and then project back each point to its respective convex object (projection is well defined and unique because of the convex property) (It usually need less than 10 iterations).

In geometry, we often decompose object into convex unions. One example is sphere-trees, where you cover your object with spheres.

This allow you to use fast coarse collision algorithm for spheres to find collision candidates of your capsules : You cover your capsules with spheres (a little bigger than the thickness) so that the capsule is inside the union of the spheres.


Isn’t it just a matter of checking if two lines’ perigee fall within radius sum, and if so check if the respective segments’ closest points to the perigee are within radius sum?


Excellent article. Sorting the list is a really simple and neat idea, without the need for clustering or a special data structure.


I'm a big fan of spatial grid hashes to simplify situations like this. Nice to see other approaches. Iterative Insertion sorting would be easier to port to a compute shader than a spatial grid hash, so maybe this method is better if we get into the millions of objects range?


Isn't spatial hashing most suited to colliding objects of similar sizes, and with similar dimensions in each axis? (So you can match the grid size accordingly) If you've got a mix of very different sizes and shapes, I think sweep and prune can be a better choice


Yes, in TFA the author uses objects that are the same size. I imagine that is why OP brought up spatial hashing since they specifically mention this.


Who specifically mentions what? Unless I'm missing something, nether TFA or the comment I replied to specifically mentioned the objects being similar size.


Since the balls probably don’t move much per frame, should the list be considered “nearly sorted?”


Yes, the author gets to that in the second part.


Not mentioned is spatial hashing, which works well without complex data structures.

Also sad that you need all this complexity to test 25 balls in Javascript. 25^2 is a small number in C, especially when your balls fit in cache.

Still a good introduction to various algorithms, though.


The title was curious to me because I expected more a post about the `intersects` function, e.g the pip algorithm... turns out it's more about the complexity involved by the number of objects to test, which in the end is also interesting.


我Ronald Hansen


E Don Lancaster


This is awesome, very beautiful and well written!

Are there other handy 2-d algorithms that are this well explained? I know redblobgames has a treasure trove too, but curious about more.


Super interesting, my first thought before I read the article was why not a bloom filter but didn’t expect it to be “physical” collision


The animations don’t show on iOS Safari.


Very Nice animated examples




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: