
One big problem with the conclusion is that intuitions from low-dimensional spaces often don’t carry over to high-dimensional spaces. A common example is how the ratio of the volume of the unit hypersphere intersected with the hypercube, to the volume of the hypercube itself, goes to zero. One funny thing I saw once was something like “the real curse of dimensionality is how different optimization in high dimensional spaces is compared to low dimensional spaces”.



> One big problem with the conclusion is that intuitions from low-dimensional spaces often don’t carry over to high-dimensional spaces. A common example is how the ratio of the volume of the unit hypersphere intersected with the hypercube, to the volume of the hypercube itself, goes to zero.

Is that un-intuitive, or am I looking at it the "wrong" way? I'm imagining the overlap of a 2D square/unit-circle versus the overlap of a 3D cube/unit-sphere, and from those two data points there's already a downward trend.

I mean, the 2D case is really just the single cross-sectional slice from the 3D case through the middle, the one that has the greatest possible overlap. All other possible slices will have less overlap.

Following that logic, a 3D sphere/cube overlap is likewise the "middle cross-section of greatest overlap" for a 4D scenario, etc.
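
As a quick sanity check on that trend, here's a sketch in Python (assuming the usual framing of the example: a ball inscribed in the unit cube, with the ratio taken against the cube's volume):

    # Rough check of the "ratio goes to zero" claim: volume of the ball
    # inscribed in the unit cube (radius 0.5) divided by the cube's volume (1),
    # as the dimension d grows.
    import math

    def inscribed_ball_fraction(d, r=0.5):
        # work in logs so high dimensions don't underflow intermediate terms
        log_v = (d / 2) * math.log(math.pi) + d * math.log(r) - math.lgamma(d / 2 + 1)
        return math.exp(log_v)  # cube volume is 1, so this is already the ratio

    for d in (2, 3, 5, 10, 20, 100):
        print(d, inscribed_ball_fraction(d))
    # 2 -> ~0.785, 3 -> ~0.524, 10 -> ~0.0025, 100 -> ~1.9e-70:
    # monotone decreasing, and heading to zero fast.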


Try this one.

I give you a megabox: a million dimensional hypercube spanning one meter along each axis. I also provide a bunch of megaspheres, million-dimensional hyperspheres scaled to be 0.99 meters in diameter. How many megaspheres can you fit into the megabox?


The first answer in my head is 1, but I'm guessing this is more akin to 'how many circles can you fit in a sphere?'?


The spheres and the cubes are of the same dimension, so it's not like trying to put flat things inside of a thing with volume. (Technically I should have said million dimensional balls instead of spheres. Sorry if that was confusing.)


Then isn’t the answer 1? For 2 dimensions, one circle fits in a square, in three dimensions one sphere fits in a cube. Should be the same for any value of n?


Consider that the distance between opposite corners of a 1 m square is √2 meters, but for a 1 m cube it's √3 meters. In general, for an n-dimensional hypercube with 1 m sides, exactly-opposite corners are √n meters apart.

Meaning, in a million-dimensional hypercube with 1 m sides, exactly-opposite corners will be a kilometer apart. There's a lot more diagonal wiggle room than you think.
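
A tiny sketch if you want to see the numbers, assuming 1 m sides throughout:

    # Corner-to-corner distance of an n-dimensional cube with 1 m sides is sqrt(n).
    import math

    for n in (2, 3, 100, 1_000_000):
        print(n, math.sqrt(n))  # 1.414..., 1.732..., 10.0, 1000.0 metres (a kilometre)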


That blew my mind but it still sounds like it would generalize, wouldn't it?


I'd love to hear the correct answer to this, with explanation.


I can't tell you the exact answer, but it's at least 10^100000.

The coordinates of the centre of each megasphere have to lie in the range [0.49, 0.51]. Let's just think about megaspheres which are pushed as far as they will go into some corner, so their coordinates are each either 0.49 or 0.51.

For each of the one million directions, the distance between the centres of two megaspheres in that dimension will be either 0 or 0.02, depending on whether their coordinates are the same or different. So by Pythagoras' theorem the total distance between their centres will be 0.02·sqrt(n), where n is the number of dimensions in which their coordinates differ. In order for them to fit, the distance must be at least 0.99, so we need 0.02·sqrt(n) >= 0.99, which means we need n >= 2451.

But we have 1000000 coordinates to choose from, so it's not hard at all to make sure that at least 2451 of them differ. We can start with the sphere whose centre has all coordinates equal to 0.49. Then the sphere with coordinates 0.51 in the first 2451 dimensions, and all others 0.49. Then the sphere with coordinate 0.51 in the next 2451 dimensions. And so on.

That squeezes in at least 1000000/2451 = 407 spheres, but even after that there's loads more room. How about the sphere that has coordinates 0.49 and 0.51 alternating? Or changing every 3? In fact if we just assign coordinates at random then each dimension has a 50/50 chance of being different, so we would expect them to be different in 500000 dimensions. The probability of them being different in only 2450 dimensions is given by (1000000 C 2450)/(2^1000000) ≈ 10^-293570. The number of pairs of spheres is quadratic in the number of spheres. So if we just shove the spheres into corners at random we can expect to place about sqrt(1/10^-293570) = 10^146785 spheres before two of them collide.

And that still leaves loads of room away from the corners! The point at (0.5,...,0.5) is still a distance of 0.01·sqrt(1000000) = 10 away from the centre of any sphere we've placed so far, so there's still plenty of space for more spheres.

EDIT: The coordinates should actually be 0.495 and 0.505, but the principle is the same.
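
If it helps, here is a small Python sketch of that arithmetic. It uses the 0.02 per-axis gap as written above, and also the 0.01 gap implied by the EDIT; the threshold changes, but the shape of the argument doesn't:

    # Corner-packing arithmetic: in how many coordinates must two corner centres
    # differ, and how unlikely is a random pair to differ in fewer than that?
    import math

    D = 1_000_000      # dimensions
    need = 0.99        # required centre distance (sum of the two radii)

    for gap in (0.02, 0.01):   # 0.02 as written above, 0.01 after the EDIT
        n_min = math.ceil((need / gap) ** 2)
        print(f"gap {gap}: corners must differ in at least {n_min} coordinates")
        # For two random corners, each coordinate differs with probability 1/2.
        # log10 of the dominant "too close" term, P(differ in exactly n_min - 1):
        k = n_min - 1
        log10_p = (math.lgamma(D + 1) - math.lgamma(k + 1) - math.lgamma(D - k + 1)
                   - D * math.log(2)) / math.log(10)
        print(f"  log10 P ~ {log10_p:.0f}")
    # With gap 0.02 this reproduces the ~10^-293570 figure, and hence the
    # birthday-style estimate of about sqrt(1/10^-293570) = 10^146785 spheres.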


I don't know the exact answer, but the lower bound I computed had over a quarter million digits. See [1]

1: http://algassert.com/puzzle/2014/07/08/Boxing-Megaspheres.ht...


I can improve the lower bound a bit.

Let V be the volume (content) of a hypersphere of radius 0.99 (twice the radius of the spheres in the question). The total volume in which the centres of the spheres can lie is 0.01^1000000. So if the densest packing uses n spheres, then we must have nV >= 0.01^1000000 or else there would be a point at least 0.99 away from the centre of any sphere, and we could add an extra one there.

Using Stirling's approximation to bound V above gives that n is greater than 10^388118.6...
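
For what it's worth, a short sketch of that computation in Python, using math.lgamma for the log-Gamma term instead of Stirling's approximation, so it should land at or slightly above the figure quoted:

    # Covering-argument lower bound: n * V >= 0.01^1000000, where V is the
    # volume of a ball of radius 0.99 (twice the packed radius) in 10^6 dims.
    import math

    D = 1_000_000
    r = 0.99
    log10_V = ((D / 2) * math.log(math.pi) + D * math.log(r)
               - math.lgamma(D / 2 + 1)) / math.log(10)
    log10_box = D * math.log10(0.01)   # log10 of 0.01^1000000, i.e. -2,000,000
    print("n >= 10 **", log10_box - log10_V)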


I don't follow the logic here. How are you getting the equation nV >= 0.01^1000000? Shouldn't there be a packing efficiency constant in there? It almost looks like you're trying to pour the spheres like a liquid.


Imagine just putting the spheres in one at a time until you can't put in any more. Why can't you put in any more? It must be that every point of [0.495,0.505]^1000000 is within 0.99 of the centre of one of the spheres. Otherwise you could put in a new sphere with that point as its centre. This means that if we imagine spheres of radius 0.99 around each of our original spheres (which only have radius 0.495) those big spheres must cover all of [0.495,0.505]^1000000. Hence their combined volume must be as large as the volume of that box. Note that I defined V to be the volume of a sphere with radius 0.99, not diameter 0.99 as in the statement of the problem.


Thanks, that's much clearer and I understand now. That's a good simple lower bound.


Downward trend doesn't imply trending to zero. Would your intuition suggest a trend to zero?

Even if it does, you can build intuitions to explain unintuitive results. Unintuitive is subjective, and a similar word would be "surprising". Just because you can look at the data to explain a surprising result doesn't make the initial result less surprising.


Don't forget the 1D case, which has a 100% overlap.


Doesn't a sphere require all points to be separated from the centre point by the same distance r, where r>0? A point isn't a sphere.


Are you talking about the orange peel, the juicy part, or the whole fruit?

If you consider only the "orange peel", with the center at 0 and r=1, then the 1D version is just the two points {-1} and {1}, not a single point.

If you consider the "whole orange", then the 1D case is the segment [-1, 1], again with the center at 0 and r=1.

In both cases it is equal to the 1D "square" equivalent.


Really depends on who you are asking for a definition. E.g. there is the term "point sphere" for the special case of a sphere that is a point.


Is there any interesting result that comes from casting a point as being any shape (presumably the point-sphere is also a point-dodecahedron, etc.)?


Define a topology on any set you'd like and a sphere is just the set of limit points of any open ball.


The 1D case is a line segment, not a point. A point is the 0D case and indeed a degenerate case.


> One big problem with the conclusion is that intuitions from low-dimensional spaces often don’t carry over to high-dimensional spaces. A common example is how the ratio of the volume of the unit hypersphere intersected with the hypercube, to the volume of the hypercube itself, goes to zero. One funny thing I saw once was something like “the real curse of dimensionality is how different optimization in high dimensional spaces is compared to low dimensional spaces”.

How so? Usually in pure mathematics (analytic areas), everything done in R^1 and R^2 is generalized to R^N, and much of the intuition carries over. So how does it fail here in the context of machine learning?

Note: I know nothing about ML :(


All the mechanics scale like you’d expect. You have bigger matrices containing more observations with more parameters per observation. Keep the dimensions aligned properly and everything works out just like it would with smaller vectors.

The discrepancies appear when you start dealing with probabilistic stuff. Suddenly everything’s conditioned on everything else, and your complexity skyrockets. Example: it’s easy to find people with about average height and weight, but the more physical characteristics you measure (increasing dimensionality), the more likely you are to find something statistically unusual, eg very long fingers. Very interesting Air Force research on this [0].

WRT optimization, higher dimensions can help avoid the local-minima problem. When you have 1000s of possible directions to go, it’s unlikely that none of them will yield a better answer.

[0] https://www.thestar.com/news/insight/2016/01/16/when-us-air-...
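
To make the "everything becomes unusual" point concrete, here's a toy sketch (assumptions: independent, normally distributed traits, and "about average" meaning within one standard deviation on a trait):

    # Probability that someone is "about average" on every one of d traits,
    # under the toy assumptions above (independent standard-normal traits).
    import math

    p_one = math.erf(1 / math.sqrt(2))   # P(|Z| <= 1 sd) ~ 0.683
    for d in (1, 3, 10, 50, 100):
        print(d, p_one ** d)
    # By d = 10 only ~2% of people are "about average" on every trait; by
    # d = 100 essentially nobody is -- nearly everyone is unusual somewhere,
    # which is the spirit of the Air Force study linked above.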




