A Curious Property of 82000 (plus.google.com)
167 points by ColinWright on May 19, 2015 | hide | past | favorite | 44 comments



Tried encoding it as a PBSAT (pseudo-Boolean SAT) problem. (It's actually relatively simple to encode as one. Just fix an upper bound on the number of digits, add a constraint saying the binary value must be >= 2, and add n-1 constraints saying the values in the other bases must equal the binary one. Double the number of digits each time until you get satisfiability, then binary-search from there.)

For instance, this is the {2,3} case, with 2 digits:

    * #variable= 4 #constraint= 2
    1 x1 +2 x2 -1 x3 -3 x4 = 0;
    1 x1 +2 x2 >= 2; 
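As a sketch (not the linked gist), a problem file like the one above could be generated mechanically; this hypothetical `opb_encoding` helper writes coefficients with explicit signs, as OPB allows:

```python
def opb_encoding(bases, d):
    """Sketch of the pseudo-Boolean encoding described above: one 0/1 variable
    per digit per base, an equality constraint tying each base back to the
    first one, and a lower bound ruling out the trivial values 0 and 1."""
    terms = {}
    v = 0
    for b in bases:
        ts = []
        for i in range(d):
            v += 1
            ts.append(f"+{b ** i} x{v}")  # digit i of base b has weight b^i
        terms[b] = ts
    lines = [f"* #variable= {v} #constraint= {len(bases)}"]
    ref = " ".join(terms[bases[0]])
    for b in bases[1:]:
        neg = " ".join(t.replace("+", "-", 1) for t in terms[b])
        lines.append(f"{ref} {neg} = 0;")   # same value in both bases
    lines.append(f"{ref} >= 2;")            # exclude 0 and 1
    return "\n".join(lines)

print(opb_encoding((2, 3), 2))
```

For bases (2, 3) with 2 digits this reproduces the {2,3} example above.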
Unless I've done something wrong (which I probably have), the 2,3,4,5,6 case is unsatisfiable at least up to 8092 binary bits, or roughly 10^2435. Not surprising, but still, interesting. Can't go higher or sat4j runs out of memory on my machine.

And yes, it successfully recovers all of the other cases mentioned.

The wonders of punting everything to external solvers. Even considering that there are (massive) problems with this encoding.

Code: https://gist.github.com/TheLoneWolfling/da51e3da0045f53ecf85


I spent a bit of time optimizing it (mainly writing the problem file incrementally). Currently chugging through d=16371.

Rather amusing in a way - 81855 variables, only 9 constraints. (Actually, only 5 constraints, but sat4j seems to split some of them.)

EDIT: 16371 is unsatisfiable. Continuing...

EDIT: 2.1GB PBS file. 6 lines. I now see why the PBS file specification suggests not reading in files line-by-line.

EDIT: 640.572s to read in the problem, versus 101.481s last time. Now solving... If the ratios remain constant this'll take ~4 hours, 25 minutes.


32754 is also unsatisfiable.

My machine won't handle 65521, unfortunately.

If someone with more RAM could run https://gist.github.com/TheLoneWolfling/da51e3da0045f53ecf85 (changing -Xmx4g to something higher in the process), I'd be happy.

In other words, unless I'm doing something wrong, there are no solutions to the 2,3,4,5,6 case below 2^32755, at least.


This is what makes mathematics so damn fascinating. The Basel problem is another example: for any small number you pick, I can pick a large enough N that sum(1/n^2) from 1 to N will be closer to pi^2/6 than that small number. That boggles the mind, really. I mean, I just use multiplication, inversion and addition a lot on integers in a rather predictable fashion and get to ... pi^2/6?? There's a huge "why" / "how come" here.
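You can watch the convergence numerically (a quick check, not a proof; the tail beyond N is roughly 1/N):

```python
from math import pi

def basel_partial(N):
    """Partial sum of 1/n^2 for n = 1..N, which approaches pi^2/6."""
    return sum(1.0 / n ** 2 for n in range(1, N + 1))

for N in (10, 1000, 100000):
    print(N, pi ** 2 / 6 - basel_partial(N))  # gap shrinks like 1/N
```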

Perhaps one would think that "pi" is this geometry thing ... but then maybe not :) And again, there's \lim_{n\to\infty} 2^n \sqrt{2-\sqrt{2+... \sqrt 2}} = \pi as well which is perhaps even more baffling.
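That limit is easy to check numerically too (a sketch; here `approx_pi` counts the total number of nested square roots, and the factor 2^n comes out right because the whole radical equals 2·sin(pi/2^(n+1))):

```python
import math

def approx_pi(n):
    """2^n * sqrt(2 - sqrt(2 + ... sqrt(2))) with n nested square roots."""
    x = math.sqrt(2)                  # innermost radical
    for _ in range(n - 2):
        x = math.sqrt(2 + x)          # the chain of sqrt(2 + ...) terms
    return 2 ** n * math.sqrt(2 - x)  # outermost sqrt(2 - ...), scaled

for n in (4, 8, 15):
    print(n, approx_pi(n))
```

(Pushing n much past ~20 backfires in floating point: the 2 - x subtraction cancels catastrophically.)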

Rózsa Péter's most excellent book, "Playing with Infinity", has a chapter called "Mathematics is one". There are no such separate parts of maths as geometry and calculus. That book is perhaps the best maths book for people with ... less affinity towards maths, I guess.


Here [1] is a page with lots of neat proofs of the solution of the Basel problem.

[1]: http://math.stackexchange.com/questions/8337/different-metho...


Perhaps totally unrelated, but there is a curious property of 1,000,001 I revisited yesterday that only bit me while learning Erlang.

  1> <<"A">>.
  <<"A">>
  2> <<65>>.
  <<"A">>
  3> <<01000001>>.
  <<"A">>
It would appear this <<..>> syntax parses binary numbers, as 65 is indeed 0b1000001. But the next number in binary is not "B" like I thought it would be.

  1> <<01000001>>.
  <<"A">>
  2> <<01000010>>.
  <<"J">>
As it turns out, the low bits of a million and one correspond exactly to the low bits of 0b1000001 = 65, "A". This seems incredibly unlikely. That's why a million and ten corresponds to "J", and a million and two corresponds to "B".

  * (format t "~2R" 01000001)
  11110100001001000001
  * (format t "~2R" 65)
  1000001
  * (format t "~2R" 01000010)
  11110100001001001010

  3> <<01000002>>.
  <<"B">>


the low bits of a million and one correspond exactly to the low bits of 0b1000001 = 65, "A". This seems incredibly unlikely

Not as unlikely as it seems. Ignore the "and one" part, and the question is simply why 10^6 is 64 modulo 256; but since 10^6 = 5^6 * 2^6 is a multiple of 64, it must be one of {0, 64, 128, 192} modulo 256, which takes "incredibly unlikely" down to "one out of four possibilities".

(Going a bit further, you can rule out 0 and 128 because 5^6 is obviously odd, and you can rule out 192 because 5^6 is an odd square. But the most important part is that X000000 in base 10 is immediately required to be XX000000 in base 2 modulo 256: a multiple of 10^6 is a multiple of 64, so its last eight bits must end in six zeros.)
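The whole argument fits in a few lines of Python:

```python
# 10^6 = 2^6 * 5^6, so modulo 2^8 it equals 64 * (5^6 mod 4); and 5^6 is an
# odd square, hence 1 mod 4, which pins 10^6 mod 256 down to exactly 64.
assert pow(10, 6, 256) == 64
assert pow(5, 6, 4) == 1
assert 64 * pow(5, 6, 4) == pow(10, 6, 256)
print(pow(10, 6, 256), (10 ** 6 + 1) % 256)  # 64, and 65 == ord("A")
```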


Neat! Don't miss the (heuristic) argument in the comment by Tim Gowers about why 82000 was lucky, and that there may not be any other larger number with the same properties.


[deleted]


The argument isn't saying that arithmetic is random. It's saying that a lot of properties of number systems behave as if they are random.

A good example is finding arithmetic sequences of length K in the prime numbers (for example, the sequence 3, 5, 7 is an arithmetic sequence of length 3, as is 17, 23, 29). It can be shown that random sets that have a density similar to the primes (i.e. the chance that N is in the set is proportional to 1 / log(N)) have arbitrarily long arithmetic sequences - as, in fact, do the prime numbers.
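A minimal brute-force sketch of finding such sequences (purely illustrative, nothing like the real techniques used for long progressions):

```python
def primes_below(limit):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * limit
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [i for i, is_p in enumerate(sieve) if is_p]

def prime_arithmetic_sequences(k, limit):
    """All length-k arithmetic sequences of primes below limit."""
    ps = primes_below(limit)
    pset = set(ps)
    found = []
    for p in ps:
        # only even common differences can keep odd primes odd
        for d in range(2, (limit - 1 - p) // (k - 1) + 1, 2):
            if all(p + i * d in pset for i in range(1, k)):
                found.append([p + i * d for i in range(k)])
    return found

print(prime_arithmetic_sequences(3, 30))
```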


"a lot of properties of number systems behave as if they are random"

Choosing a weird base (and yes, base ten is weird) certainly contributes to this effect.


This looks like an interesting perspective. Do you have any pointers to discussions of topics that become more intuitive/clearer when thought about in other bases ?


The whole topic of "why does this random-looking (in base 10) number suddenly behave strangely" doesn't exist in e.g. base 2.


Choose a base (b), and a candidate (n).

Assume (n) is a number whose probability of satisfying the given condition for a given base is indistinguishable from other numbers of the same length in the given base.

The probability that (n) will satisfy the condition for all bases from 2 to (b) is:

product ((2^(log(k,n)-1))/((k-1)(k^(log(k,n)-1)))), k=2 to b

https://www.wolframalpha.com/input/?i=product+%28%282%5E%28l...

Edit: The log operations should be floored, but this still gives one a sense of the rapidly diminishing probability.
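Numerically, that product (with the floors put back in) can be evaluated like this — a sketch of the heuristic, not a rigorous bound:

```python
from math import floor, log

def heuristic_probability(n, b):
    """Heuristic chance that a 'random' number the size of n has only 0/1
    digits in every base from 2 to b. Among d-digit base-k numbers there are
    (k-1)*k^(d-1) in total, of which 2^(d-1) use only the digits 0 and 1."""
    p = 1.0
    for k in range(2, b + 1):
        d = floor(log(n, k)) + 1  # number of base-k digits of n
        p *= 2 ** (d - 1) / ((k - 1) * k ** (d - 1))
    return p

print(heuristic_probability(82000, 5))
```

Note the base-2 factor is always 1, as it should be: every number is all 0s and 1s in binary.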


Your link does not actually lead to a graph. Can you summarize the findings of your post?


The original article, by Thomas Oléron Evans [0], does something similar. I made a graph of it here. It's a heuristic estimate of the expected number of integers that satisfy the property for bases (2..b), as a function of b.

https://i.imgur.com/3h8GSIp.png

https://gist.github.com/anonymous/9f71926313e6ee5c595c

As the author writes, for b <= 4, the expected number diverges to +infinity. For b = 5, the expected number is less than one, and quickly decreases. So, the conjecture is that there is no satisfying solution for b >= 6.

[0] http://www.mathistopheles.co.uk/maths/covering-all-the-bases...


There are an infinite number of solutions for the constraint for a single base. When you consider the probability of a solution occurring amongst the set of numbers of equal length, it becomes apparent that the size of each set grows faster than the number of solutions it contains. Requiring the number to satisfy the constraint for multiple bases causes the probability to decrease rapidly. This naive heuristic analysis suggests that the probability of finding a solution is not only inversely proportional to the chosen base, but also inversely proportional to the value of the chosen candidate.


6 is 1 mod 5. So the sum of all the digits in base 6 of such a number must be 0 or 1 mod 5 (because the last digit in base 5 is the value of the number mod 5, and here it has to be 0 or 1).

If a number only has 0's or 1's in a base, the sum of its digits in that base is just the number of 1's.

Similarly, 5 is 1 mod 4, 4 is 1 mod 3, 3 is 1 mod 2.

Inspect 82000:

10111000 in base 5: 4 1's, is 0 mod 4

110001100 in base 4: 4 1's, is 1 mod 3

11011111001 in base 3: 8 1's, is 0 mod 2

Seems worth mentioning.
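The observation is easy to check (a quick sketch; recall that a number is congruent to its base-b digit sum modulo b-1):

```python
def digits(n, base):
    """Digits of n in the given base, most significant first."""
    ds = []
    while n:
        ds.append(n % base)
        n //= base
    return ds[::-1]

n = 82000
for b in (3, 4, 5):
    ds = digits(n, b)
    ones = ds.count(1)
    assert set(ds) <= {0, 1}                  # only 0s and 1s in this base
    assert sum(ds) % (b - 1) == n % (b - 1)   # digit sum == n (mod b-1)
    print(b, "".join(map(str, ds)), ones, ones % (b - 1))
```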


    In fact, for b = 5, that infinite sum is equal to about 1.89.
    Substituting this into the upper and lower bounds for e(5)
    suggests that the expected number of positive integers (greater than 1)
    expressible with only “1” and “0” in bases 2 through to 5 should
    lie between 0.08 and 0.59. 
Of course I had to go on to b=6, and found the probability to be between 0.005 and 0.112.
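That infinite sum can be reproduced numerically. Heuristically, the chance that n has only 0/1 digits in base k scales like (2/k)^(log_k n) = n^(-log(k/2)/log k), so the expected count for bases 2 through b behaves like the sum over n >= 2 of n^(-s(b)); a sketch (partial sum plus an integral tail, ignoring the floors):

```python
from math import log

def exponent(b):
    """s(b) = sum over k=3..b of log(k/2)/log(k); base 2 contributes nothing."""
    return sum(log(k / 2) / log(k) for k in range(3, b + 1))

def expected_count(b, N=200000):
    """Approximate sum_{n>=2} n^-s(b); only valid when s(b) > 1, i.e. b >= 5."""
    s = exponent(b)
    partial = sum(n ** -s for n in range(2, N))
    tail = N ** (1 - s) / (s - 1)  # integral of x^-s from N to infinity
    return partial + tail

print(expected_count(5))  # close to the 1.89 the article quotes
```

For b <= 4 the exponent s(b) is at most 1 and the sum diverges, matching the article's observation that small bases give infinitely many expected solutions.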


The expectations are not independent though. There can be no more solutions for b=6 than there are for b=5. Given what we already know, we should calculate those sums not from i=2, but from i=10^2184 or so, because there are no solutions below that. The theoretical expected number of solutions for b=6 is therefore vanishingly small.


I wonder how "an exhaustive search has been carried out up to 3125 digits in base 5 and no solution exists in this range" was implemented? Obviously it's not a brute force solution.


The (original) author checked everything up to 10^21 with a simple brute-force Python program.

The more exhaustive search is referenced from The Online Encyclopedia of Integer Sequences:

Checked to 3125 (5^5) base-5 digits in just under 1/2 hour using a minor modification of the PARI program at A230360. Interestingly, with 5 replaced by 9 and the digits 2 and 3 permitted, it appears the complete set is--somewhat coincidental with this--{0, 1, 2, 3, 8281, 8282, 8283}. - James G. Merickel, Dec 01 2013

http://oeis.org/A146025


It's easier to brute force than you might think, as there are only two possible 'valid' answers for each digit. In other words, in base six 0,1,1,1 -> 1,0,0,0, not 0,1,1,2. So even a very dumb brute-force approach grows as 2^k, not 6^k, in the number k of base-6 digits.

Add to that you can swap back and forth between bases.

EX: check 1,0,0,0,0,0 in base six (7,776 b10) = 2,2,2,1,0,1 in base 5. Clearly the next lowest possible number in base five is now 1,0,0,0,0,0,0 (15,625 b10).
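A sketch of that idea in Python (not the commenter's or the OEIS code): walk only the numbers whose base-5 digits are all 0 or 1 — there are just 2^k of them with k digits — and test the remaining bases. Base 2 always passes.

```python
def only_01_digits(n, base):
    """True if n's representation in the given base uses only digits 0 and 1."""
    while n:
        if n % base > 1:
            return False
        n //= base
    return True

def search(max_base5_digits):
    """Numbers > 0 whose base-2..5 representations use only digits 0 and 1."""
    hits = []
    for i in range(1, 2 ** max_base5_digits):
        n = 0
        for bit in bin(i)[2:]:     # reuse i's bits as base-5 digits
            n = 5 * n + int(bit)
        if only_01_digits(n, 3) and only_01_digits(n, 4):
            hits.append(n)
    return hits

print(search(14))  # checks all candidates below 5^14
```

This finds 1 and 82000 almost instantly; per the OEIS entry above, nothing else turns up for a very long way.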


I wonder if a SAT solver would be able to find the next number, if it exists.


Vaguely related: I've always liked that 0x69 = 105 and 0105 = 69.


Why is base 1 not 1? Or 0?

Let's see. Let's build a numeral system where two digits represent all the numbers and call it base 2. Only 0 and 1. But it may equally be A and B. So if you mix them in all combinations they can form all the numbers in that base: A, B, BA, BB, BAA, BAB, BBA, BBB, and so on.

So base 3 is A,B,C

Base 4 is A,B,C,D

Therefore Base 1 is A, whatever symbol you use. And it has only one number: A. Which may well be the interpretation of decimal zero.


there is no exact base 1, as the concept of base depends on at least one value and zero. no point in having bases if you don't have zero.

the only system with base 1 that exists is "counting" (go to wikipedia for the proper mathy name), where all occurrences of the only symbol get summed. want to say 9? draw 9 symbols. 9 knots on a rope or stones on a tablet.

only that then 1 is the more obvious choice if you really want to push for some parity with proper bases.


On the contrary, unary numbers are a well-defined concept (http://en.wikipedia.org/wiki/Unary_numeral_system).

Unary numbers are perfectly commensurate with binary, decimal, octal, etc., numbers. In binary, you write

  B2 B1 B0
to mean the number:

  B2 * 2^2 + B1 * 2^1 + B0 * 2^0.
The same thing applies to unary numbers, just note that 1^n = 1 for every n, so writing:

  B2 B1 B0
means:

  B2 * 1^2 + B1 * 1^1 + B0 * 1^0 = B2 + B1 + B0, 
or, in other words, add the tally marks to get the number.


Unary is a well defined concept, but it's not "base 1".

Or if you don't want to argue about the definition of base, at the very least it doesn't have any of the properties other base systems have.

A number in any base system can be thought of as an infinite string of zeros, followed by the digits. "1" is just a convenience, it could also be written "0000000001", or "0000...0001".

And then you can extend it to decimals, and have an infinite string of zeros after the decimal point. You can't have decimals in unary.

In other words it isn't necessary to store any additional information on how many symbols you need. The symbols are fixed, you just choose what state they are in.

Unary "cheats" by using a meaningless symbol and passing all the information in the number of times the symbol is written.

All the algorithms for adding numbers, dividing, subtraction, etc, can be generalized to work in any base. But not unary. Unary requires entirely different algorithms because it's not a base number system.

The formulas for converting between two bases work for any base system except unary. The formulas for measuring the number of digits you need to represent a number work for any base system except unary.

It's just an entirely different system of representing numbers. It's more similar to Roman numerals.


Unary representation differs from better bases in one very important regard: a base-n representation of a number uses n symbols, and you write the number as the coefficients of a power series in n. The n symbols you use are the natural numbers m such that 0 <= m < n.

But in a unary representation, the only number you could represent that way is 0. Instead, a unary system uses as coefficients the natural numbers m such that 0 < m <= 1. An immediate consequence is that there's no way to represent zero.

(In "unary-style" binary, you'd count like this: 1, 2, 11, 12, 21, 22, 111, 112... . The constraint that none of your coefficients can be zero (because it doesn't exist) makes the representation unique. If you're willing to use infinite representations along with finite ones, you could then represent zero in two's complement as ...111112, that is, one more than -1.)
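That zeroless "unary-style binary" is known as bijective base-2 numeration; a sketch of converting to it (each remainder is forced into {1, 2} instead of {0, 1}):

```python
def bijective_base2(n):
    """Write n > 0 using only the digits 1 and 2 (bijective base-2)."""
    out = []
    while n:
        d = 2 if n % 2 == 0 else 1  # remainder forced into {1, 2}
        out.append(str(d))
        n = (n - d) // 2
    return "".join(reversed(out))

print([bijective_base2(n) for n in range(1, 9)])
```

This reproduces the counting order above: 1, 2, 11, 12, 21, 22, 111, 112, and as claimed the representation is unique.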


Yes, thanks for the qualification regarding zero...the extension to the "base-1 case" is not as simple as I made it sound. Point taken.


you are assuming the lack of a digit is a digit. didn't you just make a way to add zero in base 1?

from the "base" definition on wikipedia: "In mathematical numeral systems, the radix or base is the number of unique digits, including zero."


Hmm, the original comment of yours, to which I replied, was rather different than the one there now.


Also unary encoding is actually used in practice for Golomb coding.

http://en.wikipedia.org/wiki/Golomb_coding

The quotient is sent in unary.

I remember because I implemented that for a lossless image encoding/decoding scheme at some point.



>Blogspam

This is not an accurate characterization and you should be downvoted for it. The article (which links to your source) is great. It's also more approachable than your link, which I wouldn't have read through. (In fact, I did read through it just now and found it a chore.)

your link also shouts out to our (great) link at the bottom, and calls it "far less circumlocutory". why do we have to read a less approachable source, when we can read a great blog post about it? especially when the link is included for anyone who wants to dive deeper.

secondary sources that explain a subject are good.


I'll take blogspam with a Fields Medalist in its comments over non-blogspam with no comments.


In fairness, the original source gives a calculation that's basically a more quantitative version of Gowers's dimension-of-intersection-of-Cantor-sets argument, and yields the same conclusion.


Both posts have their merits; the OP is more succinct.


That source points to this at the bottom.


From the guidelines:

  Please submit the original source. If a post reports on
  something found on another site, submit the latter.
In this case, it's not just a matter of blindly following the guidelines; the original source goes into much more detail behind the find and search for additional terms in the sequence.


Except that they are both original sources. Two separate sites can come up with similar articles, especially if they collaborated with one another on it. The fact that both articles link to one another with one being described as easier to digest and one being more thorough implies that this is what happened here. Therefore, neither would be blogspam, just different approaches to the same topic. They both have their merits, one is easier to digest, one is a lot more detailed. If someone is interested in the other, they can easily get to it through the links in the respective articles.


The submitted (Google Plus) link was published May 17th. The more detailed (mathistopheles) link was published May 5th and was only updated very recently to link to the summarized Google Plus post. It seems clear to me that the Google Plus post summarized the mathistopheles post.

I do think that the word blogspam is too strong in this case, but given that the mathistopheles post is the original one by nearly two weeks, by convention that should have been submitted as the original source.


If it is of any relevance, I am the original poster and I have no objection to Richard Green's article being the one referenced. He has clearly credited me as his source, and very few people would have seen my piece if he hadn't written his. Also his piece is clearly much more digestible in summarising the points of interest, while mine was more of a puzzle (the solution page is the one being referenced) followed by a loose account of my investigation into the problem.

I'm just pleased that it seems to have captured the imagination in the way it has.


It is amazing - I discovered that 82 multiplied by 1000 is equal to 82000, AND, 82002 subtract 2, is also 82000! It is spooky!



