tl;dr: Scientific American published an article about an analog computing technique that the author claims can solve NP-complete problems, but which contains an elementary mistake that invalidates the result. Before SciAm published it, an arXiv preprint circulated and commenters on HN and Reddit pointed out the mistake. The editor either did not read these replies, or did read them but decided to publish anyway.
This is an embarrassing fuckup on their part, but one that is endemic in mass-market scientific publications, which want to publish sensational results but lack the expertise to vet them or to summarize them without major distortions. The problem is so severe that I recommend avoiding these sources entirely, because the frequent major errors will effectively leave you less knowledgeable than you started.
Even highly-respected peer-reviewed scientific journals make mistakes. The beauty of science (and scientific publishing) isn't that it's always right, but that it's willing to admit it was wrong (given sufficient evidence or reasoning).
Scientific American has very high standards. They do extensive fact-checking, even for columns that don't matter as much (such as David Pogue's). Many of the longer articles in Scientific American are written by the scientists who did the work, often stuff that was previously published in peer-reviewed scientific journals. The Scientific American version is written for a generalist audience.
Scientific American, the magazine, rarely breaks major sensational science stories. It's not their mission.
You say you "recommend avoiding these sources [such as Scientific American] entirely." What do you recommend instead? Reading peer-reviewed journals? Who can read those, other than the specialists?
I recommend reading Scientific American, alert to the possibility that something may be wrong. Because there's no such thing as a scientific publication that's always 100% right.
The problem is the standard for reviews. It cannot become mandatory to read Reddit before publishing a paper. Reddit has the same endemic issue of publishing sensational results with major distortions, and it contains frequent major errors that will effectively leave you less knowledgeable than you started. It makes no sense to avoid scientific journals but then trust whatever you read on Reddit.
I don't necessarily disagree, but there's a certain amount of due diligence that editors must do and, as I recall, this idea + rebuttal made a bit of a splash even on non-HN/Reddit media back when the original came out. Even if we ignore that, the claimed result is massively improbable a priori, given the current state-of-the-art knowledge of experts in the field. Claiming to "solve" P=NP is definitely a red flag, and anyone modestly acquainted with the field, as editors probably should be, would know to consult a larger group of people within the field before publishing this. Let's not forget Sagan's advice to require extraordinary evidence for extraordinary claims.
Either way, I think the editors should at the very least publish a (or the) rebuttal at this point.
EDIT: I should say: I know this isn't -- per se -- claiming that P=NP, but it's in the same league in terms of boldness, since they're claiming that there's a physical machine that can solve NP problems in polynomial time, while there is, as yet, no plausible account of how a nondeterministic TM could be realized in physical reality. (Given current QC theory, at least.)
Reading the comments in your [0], I'm confused about the relationship between thermodynamic entropy and information entropy. In thermodynamics, I've always thought of entropy as randomness, with the 2nd law saying that eventually the universe will turn to static. That is, entropy is increasing and information is decreasing.
But in [2] and [3] I read that information entropy is information, and it has the same formula as thermodynamic entropy! Am I missing a negation or reciprocal somewhere? Is entropy information or lack of information?
Reading those Wikipedia pages isn't helping me. Can someone explain why my intuition is wrong here?
No, you're right, it's just a terminological morass. Entropy and information are different names for the same concept, one that in its full generality is simply a measure of the uncertainty about the value of a random variable.
Entropy is a bad name, one that will probably continue to confuse people for generations. Information is a decent name, but it's a bit backwards: information is the resolution of uncertainty. Really, the concepts implied by the names information and uncertainty are opposite sides of the same coin: for each unit of uncertainty, you gain one unit of information when you observe a random variable's outcome.
The problem is that in the conventional definition (with the negative sign in front of the sum), a more positive quantity denotes more uncertainty. Perhaps the conventional quantity should have been called uncertainty and the name information should have been given to the inverse quantity, i.e., to the sum without the negative sign.
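For reference, the conventional definition in question is the standard Shannon entropy of a random variable X with outcome probabilities p_i:

H(X) = -\sum_i p_i \log_2(p_i),

which is maximized by a uniform distribution and is zero when one outcome is certain.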
As it is, people usually use the word information as a synonym for entropy or uncertainty, but when they focus on resolving uncertainty they sometimes use it to mean something like the opposite. In the end, so long as you are confident in your understanding of the concept, it doesn't much matter, because it's easy to figure out what everybody means.
And so entropy and information are opposites: as you gain more information about a system, the uncertainty is reduced. If the entropy of a system increases, the amount of information required to describe it increases. If you have an amount of information equal to the entropy, your uncertainty is zero and the system is completely described. This seems to square with our usage of the terms in the context of thermodynamics and information theory.
Another way to look at this is the process of reading a stream of bytes.
A high-entropy stream means that based on what you have read so far you have a very small chance of predicting the next byte you read.
By the same token (as it were) in those circumstances the next byte you read will contain a great deal of information about its value that you did not previously have.
In a low-entropy stream the bytes coming in might be: 0, 1, 2, 3... and by the time you get to byte N you can be pretty sure of its value, so the next byte contains very little information you don't already have.
Both information and entropy are measures of the novelty of the next byte in the stream, but information is measured from the perspective of what you get when you read it and entropy is measured from the perspective of what you have before you read it.
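To put rough numbers on this, here's a quick sketch that estimates entropy from byte frequencies (a zeroth-order count, nothing fancier):

```python
import math
import os
from collections import Counter

def empirical_entropy(data: bytes) -> float:
    """Zeroth-order estimate: Shannon entropy in bits per byte, from byte frequencies."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())

low_entropy = b"aaaaaaab" * 10_000   # mostly 'a', occasionally 'b': very predictable
high_entropy = os.urandom(80_000)    # every next byte is a surprise

print(empirical_entropy(low_entropy))   # ~0.54 bits per byte
print(empirical_entropy(high_entropy))  # ~8.0 bits per byte
```

Caveat: a plain frequency count only sees how skewed the byte distribution is; a stream like 0, 1, 2, 3... would look maximally "random" to it even though each byte is perfectly predictable from the previous one. Capturing that kind of predictability needs a conditional model or a compressor.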
If you understand quantum mechanics, it's pretty easy to map the process of quantum decoherence to the increase of entropy. It can be more clearly said, then, that quantum coherence is a measure of negentropy: in classical terms, uncertainty, but in quantum terms, the amount of computational power the system has—the amount of certainty it can be used to create.
If you think of our universe as one big quantum computer, it started with some amount of negentropy, and is slowly "burning through" it by resolving quantum predecessor states to their ultimate physical conclusions. When all the negentropy is gone, the system will be in a (quantum-)predictable state.
(It might be a classically predictable state, too, now that I think about it; if the universe contains only atomic hydrogen spread evenly at an exactly uniform temperature, then I think you get "for free" both a reading of the position and momentum of all those atoms, no?)
Anyway, that's all to say: you can measure the power of a quantum-coherent state in "bits of negentropy"; when that state collapses/decoheres, you will know N more bits of information than you knew before it collapsed. The negentropy (potential to answer questions) in the system has been converted into entropy (answers to questions).
And that's the strangest thing about all this—that it falls out that "ability to increase certainty" is the ultimate conserved resource in our universe, the ultimate clamp on thermodynamics. A certain clump of particles, as a closed system, can only be used to increase your certainty by N bits; after that, the system is useless for answering any further questions about anything, ever again.
And, on the other hand, think about things like electron spin: as long as no decoherence is occurring—no certainty is being created—thermodynamics doesn't care if something is acting on something else with a force, or anything like that. There's not really such a thing as "energy" in the universe, no requirement that every ongoing force have a "casting cost" in negentropy. There's only decoherence. Heat-death-universe hydrogen atoms are perpetual motion machines, for free, only and precisely because their perpetual motion will never compute a thing.
In the Bayesian and inductive-logical view of probability theory, you must define a probability distribution with respect to an information set. That set formalizes your observations of (and assumptions about) the state of the universe, and the entropy of the resulting distribution quantifies the quality of your knowledge about the universe.
From that perspective, the second law of thermodynamics simply says that the quality of your knowledge about the universe with respect to an information set assembled at time t degrades as you move from t out into the future. Put like that, it seems rather obvious, I think.
Quantum theory further tells us that it's not possible to assemble an information set that is complete, in either the sense that (a) we can get a perfect picture of the universe as it is right now, or that (b) the quality of our information won't degrade over time. (The two senses are equivalent, and you can probably see why.)
So Laplace's demon can't exist. Which isn't terribly surprising to the modern mind, either, but I can imagine how scientists of a previous era might have taken the news badly.
The way I've squared the similar terminologies in different fields is that as thermodynamic entropy increases, the amount of information required to encode/duplicate the state of the system increases (consider the extremes of maximally compact matter (say pre-big bang) and heat death where all matter is spread out arbitrarily over an unbounded volume). So entropy is equivalent to information as understood as a measure of disorder. This squares with shannon entropy as it can be understood as just a measure of the "disorder" in a random variable.
The difficulty is distinguishing the other interpretation of information as a specific correlation, a quantification of order rather than disorder. When we think of communicating the state of a RV as some number of bits of "information", one way to think of it is that we're reducing the "possible disorder" (i.e. uncertainty) by communicating partial state. And so information here is the change in the possible disorder (uncertainty), the more information you receive the less uncertainty that remains.
The confusion comes from what possesses the information: you or the object.
If a distribution is spread out, I know very little about it. If I learn something about it, and make it less spread out, then I have more information about it, and there is less information left in it.
Learning about something is taking information out of it. So something has a lot of information if you know nothing about it. If you know a lot about it, you've taken all the information out of it, and there's nothing left.
Casual and handwavy, but these two ideas might be worth thinking about.
Maxwell's demon is a good bridge. If you have perfect information you can reverse entropy.
Another way to connect the two is to look at the edge case - when the universe is just a uniform heat - it won't take much information to describe. Yup, same temperature everywhere.
>when the universe is just a uniform heat - it won't take much information to describe.
Actually this is when it takes maximum information to describe, as no particle is correlated with any other (and thus no savings from encoding). You basically have to fully describe each particle individually as a point in space. The alternate extreme is where matter is maximally compact and thus maximally correlated.
That's a great way of putting it. Maximum entropy means minimal intrinsic information (and thus you need a lot of external information) and vice versa.
Well, the entropy in information theory is randomness too, in the sense that it's _unpredictable_, right?
If you've got 1000 numbers, but you can predict them all with `n[i] = i * 2`, then there's not that much information present, in the information-theoretic entropy sense -- all you need is the formula `n[i] = i * 2` to describe all 1000 numbers; actually writing all 1000 (or 100,000) out just wastes space without providing any more information beyond `n[i] = i * 2 for i=1..1000` or whatever.
But if the 1000 numbers are truly random, you need to write out all 1000 numbers to describe the sequence. There is more information present, there is more entropy, precisely because of the randomness or unpredictability.
Or to put it another way, if you can compress something losslessly, then it makes sense to think the uncompressed version doesn't have more "information" in it than the compressed version, even though it's more bytes, right? And what can't be compressed losslessly at all? Something entirely random with no predictability.
So that's my understanding of the relationship -- information-theoretic entropy is, in fact, randomness too. Random here meaning 'unpredictable', not describable by any formula or shortcut or compression -- the normal way we use the term 'random' in day-to-day software engineering, in fact, and the meaning that's crucial for cryptography, for instance. (Then, as now, military budgets drove computer science research -- Shannon was a cryptographer on a DoD contract.) (It doesn't necessarily mean the 'random' data is _senseless_, just that it's entirely irregular and unpredictable.)
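A quick way to see this in practice: lossless compression squeezes a predictable sequence down to almost nothing, but gains nothing on random bytes. A minimal sketch:

```python
import os
import zlib

predictable = bytes((i * 2) % 256 for i in range(100_000))  # n[i] = i * 2, truncated to a byte
random_data = os.urandom(100_000)

print(len(zlib.compress(predictable, 9)))  # a few hundred bytes: the pattern compresses away
print(len(zlib.compress(random_data, 9)))  # roughly 100,000 bytes: no pattern, no savings
```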
So, okay, now I'm grasping and I'm not sure what I'm talking about, but in this sense, as the entropy in the universe increases, the 'information' in the universe increases in the sense that there's no shortcut for describing the universe's state, it's increasingly just randomly distributed stuff?
I don't know if that last paragraph makes sense, or if you can really talk at all about 'the amount of information in the universe increasing' -- I vaguely recall reading stuff on that before, the relationship between physical and information entropy, but I don't remember enough to speak on it really.
But my understanding is that Shannon was inspired by the thermodynamic entropy in inventing information theory, realizing that the amount of randomness present in a sequence of data, the amount of unpredictability in it, was usefully understood as 'how much' information it contained. The "information" in "information theory" is not the plain-use meaning of the word 'information', although it has a relationship to it. But it means, well, precisely this, entropy or unpredictability. And that meaning has proven useful for reasoning about computation. But it's a low-level mathematical/computational sort of 'information', not information like answers to trivia questions.
>> But my understanding is that Shannon was inspired by the thermodynamic entropy in inventing information theory
The rest of this comment is from Wikipedia[0]
During their discussions, regarding what Shannon should call the "measure of uncertainty" or attenuation in phone-line signals with reference to his new information theory, according to one source:
"My greatest concern was what to call it. I thought of calling it 'information', but the word was overly used, so I decided to call it 'uncertainty'. When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, 'You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage.'"
That is really an awesome explanation, thank you! And thanks to the other commenters as well.
I think my interpretation of the 2nd law as "static" (like on a TV) misleads me because static seems so uniform, and really increasing entropy is the opposite of uniform. It is n[i] = rand() rather than n[i] = C. So your explanation helps a lot!
EDIT: So a followup question: Does information theory have anything like the 2nd law of thermodynamics? In the digital world we are used to lossless transfer, but children know that when you play "telephone" you lose information (or "gain" it in the sense the new message can't be predicted). Or maybe the analog of the 2nd law is the pigeonhole principle---I'm not sure. But if there is so much symmetry between the two kinds of entropy, is there a 2nd law for information?
My naive understanding of your follow up question is that energy is a special case of information and the Second Law is a special case of Shannon's Information Theory. The system in the second law is a special case of Shannon's channel. Apparent lossless digital transfer is a red herring though. Noise is inherent to any channel.
I think it is unfair to give Shannon too much credit; Boltzmann and Gibbs came up with the exact definition Shannon gave,
S = -k \sum_i p_i \ln(p_i),
including a suitable information-theoretic interpretation. As physicists they only thought about ignorance of the exact particles' positions and momenta in systems of 10^24 particles, but the essential idea is the same. Moreover, von Neumann had generalized that notion to quantum mechanics,
S = - tr(\rho ln(\rho)),
which gives it a form that is even more abstract than the information theoretic formulation.
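For concreteness, a tiny numerical sketch of the von Neumann formula (assuming numpy):

```python
import numpy as np

def von_neumann_entropy(rho: np.ndarray) -> float:
    """S = -tr(rho ln rho), computed from the eigenvalues of the density matrix."""
    eigvals = np.linalg.eigvalsh(rho)
    eigvals = eigvals[eigvals > 1e-12]   # treat 0 * ln(0) as 0
    return float(-np.sum(eigvals * np.log(eigvals)))

pure = np.array([[1.0, 0.0], [0.0, 0.0]])    # pure state: no uncertainty
mixed = np.array([[0.5, 0.0], [0.0, 0.5]])   # maximally mixed qubit

print(von_neumann_entropy(pure))   # 0.0
print(von_neumann_entropy(mixed))  # ln(2) ~ 0.693
```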
It was simply ignorance on Shannon's part that he was not familiar with the earlier work of physicists and mathematicians on the subject.
Subset sum is actually only weakly NP-complete. What that means: the best known algorithms are exponential in the problem size (the number of bits it takes to encode the numbers) but not in the magnitudes of the numbers themselves. Dynamic programming solves subset sum in time polynomial in the numbers' magnitudes, which is obviously exponential in terms of bits.
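For anyone curious, the standard pseudo-polynomial DP looks something like this (a sketch of the textbook approach for positive integers, not anything from the article itself):

```python
def subset_sum(numbers: list[int], target: int) -> bool:
    """Decide whether some subset of `numbers` (positive ints) sums to `target`.

    Runs in O(len(numbers) * target) time: polynomial in the magnitude of the
    target, but exponential in the number of bits used to write it down.
    """
    reachable = {0}
    for x in numbers:
        reachable |= {s + x for s in reachable if s + x <= target}
    return target in reachable

print(subset_sum([3, 34, 4, 12, 5, 2], 9))   # True  (4 + 5)
print(subset_sum([3, 34, 4, 12, 5, 2], 30))  # False
```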
So there's a danger here associated with demonstrating your superpowers on subset-sum: unless the numbers are huge, you are actually solving a trivial problem.
I didn't dig into the article enough to check whether the authors made this mistake.
Yes, they also make this mistake. The frequencies correspond to the numbers being summed, so if the numbers are huge (>100 bits) the frequencies will be so high that the wavelengths would be smaller than atoms, at which point you can't construct physical circuits. If you scale down the frequencies instead, the time it takes for a complete period goes up exponentially with the number of bits in the numbers.
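Rough back-of-the-envelope numbers, assuming a 100-bit value maps directly to a frequency in Hz (the exact encoding in the paper may differ):

```python
c = 3.0e8              # speed of light, m/s
freq = 2 ** 100        # ~1.3e30 Hz for a 100-bit number encoded as a frequency
wavelength = c / freq
atom_diameter = 1e-10  # roughly one angstrom

print(wavelength)                  # ~2.4e-22 m
print(wavelength < atom_diameter)  # True: many orders of magnitude below atomic scale
```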
To be pedantic with terminology: if you take a sequence and remove some elements, what you're left with is a subsequence. If you take a sequence and select some contiguous elements, what you get is a substring. So "amqz" is a subsequence of the alphabet but not a substring. "lmnop" is both a substring and a subsequence.
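A quick sketch to make the distinction concrete:

```python
def is_subsequence(s: str, t: str) -> bool:
    """True if s can be obtained from t by deleting characters (order preserved)."""
    it = iter(t)
    # `ch in it` advances the iterator until ch is found, so order is enforced
    return all(ch in it for ch in s)

alphabet = "abcdefghijklmnopqrstuvwxyz"
print(is_subsequence("amqz", alphabet))   # True  -- subsequence
print("amqz" in alphabet)                 # False -- not a substring (not contiguous)
print(is_subsequence("lmnop", alphabet))  # True
print("lmnop" in alphabet)                # True  -- contiguous, so both
```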
Sounds like you're not a scientist or a mathematician then. No problem there. But why litter a discussion clearly not meant for you with rude and irrelevant comments? Breaking hashing functions is not related to the article, nor is it equivalent to the problem of P versus NP.
Breaking polynomially computable hashes is in NP, in the sense that its decision version -- is there any word lexicographically smaller than w that hashes to v -- is in the NP class. Thus P=NP would give us a procedure to compute preimages of all known reasonable hash functions, with complexity depending on the size of the shortest solution.
That is one implication. But as I explained in my other comment, the nonexistence of one-way functions does not imply the resolution of P versus NP. So the two problems are not equivalent.
You're missing my point, which is that the existence of cryptographically secure hashing functions is not equivalent to P != NP. So even if someone were to prove that there are no secure hashing functions (which you would be fine with so long as reversal takes a long time in real life), this would not necessarily resolve P versus NP.
I would suggest you read more about one-way functions and how theoretical cryptography relates to complexity theory, but you don't seem to take mathematical inquiry very seriously. Again, it's not a problem, but you should consider it a prerequisite for having serious discussions about P versus NP if you insist on relating it to hashing.
P vs. NP isn't just of consequence to cryptography. It's extremely relevant to scheduling and resource-use optimization. E.g., FedEx wants to deliver parcels as fast as possible, factories should be laid out optimally, chips should have their internal wiring minimized, etc. Those all have NP-hard problems underneath them, and improvements to computing in that category would have huge economic benefits (although it ain't gonna happen).