Krapivin made this breakthrough by being unaware of Yao's conjecture.
The developer of Balatro made an award winning deck builder game by not being aware of existing deck builders.
I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before. This makes me kind of sad, because the current world is so interconnected that we rarely see such novelty; people tend to "fall in the rut of thought" of those that came before. The internet is great, but it also homogenizes the world of thought, and that kind of sucks.
> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before. This makes me kind of sad, because the current world is so interconnected that we rarely see such novelty; people tend to "fall in the rut of thought" of those that came before. The internet is great, but it also homogenizes the world of thought, and that kind of sucks.
I think this is true only if there is a novel solution that is in a drastically different direction than similar efforts that came before. Most of the time when you ignore previous successful efforts, you end up resowing non-fertile ground.
Right, the other side is when you end up with rediscoveries of the same ideas. The example that comes to my mind is when a medical researcher found the trapezoidal rule for integration again[1].
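For anyone who hasn't run into it, the method in question is tiny (a minimal sketch; the sample points below are made up purely for illustration):

```python
# Trapezoidal rule: approximate the area under a sampled curve by summing the
# trapezoids between consecutive points -- the textbook technique the 1994
# paper re-derived for metabolic curves.
def trapezoid_area(t, y):
    return sum((t[i + 1] - t[i]) * (y[i] + y[i + 1]) / 2.0 for i in range(len(t) - 1))

t = [0, 30, 60, 90, 120]        # time points (minutes), illustrative only
y = [5.0, 7.8, 8.4, 6.9, 5.6]   # concentration readings, illustrative only
print(trapezoid_area(t, y))     # approximate area under the curve
```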
It is not a problem if you are a student learning how to solve problems. Solving previously solved problems is often a good way to learn - because there is a solution you know you are not hitting something that cannot be solved, and your teacher can guide you if you get stuck.
For real world everyday problems normally it is an application of already solved theory or it isn't worth working on at all. We still need researchers to look at and expand our theory which in turn allows us to solve more problems in the real world. And there are real world problems that we pour enormous amounts of effort into solving despite lacking theory, but these areas move much slower than the much more common application of already solved theory and so are vastly vastly more expensive. (this is how we get smaller chip architectures, but it is a planet scale problem to solve)
I agree -> even if someone spends their time "rediscovering" an existing solution, I think that the learning experience of coming up with a solution without starting from current best solution is really valuable. Maybe that person doesn't reach the local maximum on that project, but having a really good learning experience maybe enables them to reach a global maximum on one of their next projects.
If I want some novel ideas from a group of people, I'm going to give them the framework of the problem, split them into groups so that they don't bias each other, and say: go figure it out.
It is, because you are wasting time reinventing the wheel. Also if something is already well researched, you might miss intricacies, traps, optimizations etc. previous researchers have stumbled upon.
I find a lot of the time concepts and patterns can be the same in different fields, just hidden by different nomenclature and constructed upon a different pyramid of knowledge.
It's nice we have a common language that is mathematics, the science of patterns, to unify such things but it's still going to be a challenge because not everyone is fluent in all the various branches of mathematics.
It's even mind-blowing how many ways you can approach the same problem with equivalencies between different types of mathematics itself.
I think that shows how great the trapezoidal rule is. I feel like this is brought up so many times that now it is used to make fun of people. It is 18 years old at this point.
I mean, it sort of deserves being made fun of. 18 years ago Google existed, surely you'd search for "area under a curve" before going through all the effort of writing a paper reinventing integrals?
Edit: actually the paper was written in 1994, not sure what the "18 years" was referring to. But still, peer review existed and so did maths books... Even if the author can be excused somewhat (and that's already a stretch), peer reviewers should definitely not let this fly.
Unfortunately quite common to see serious mathematical issues in the medical literature. I guess due to a combination of math being essential to interpreting medical data and trial results, but most practitioners not having much depth of math knowledge.
Just this week I came across the quote "Frequentist 95% CI: we can be 95% confident that the true estimate would lie within the interval." This is an incorrect interpretation of confidence intervals, but the amusing part is that it is from a tutorial paper about them, so the authors should have known better. And cited by 327!
https://pmc.ncbi.nlm.nih.gov/articles/PMC6630113/
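To make the distinction concrete, here is a quick simulation sketch (made-up parameters, z-approximation): the "95%" is the long-run fraction of intervals, each built from a fresh sample, that cover the fixed true mean, not a probability statement about any single computed interval.

```python
import numpy as np

rng = np.random.default_rng(0)
true_mu, sigma, n, trials = 10.0, 2.0, 30, 10_000  # illustrative values

covered = 0
for _ in range(trials):
    sample = rng.normal(true_mu, sigma, n)
    half_width = 1.96 * sample.std(ddof=1) / np.sqrt(n)   # ~95% z-interval
    covered += (sample.mean() - half_width <= true_mu <= sample.mean() + half_width)

print(covered / trials)   # close to 0.95 over many repeated samples
```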
Does it make you less of a peer to others who found it before? At least the author showed the ability to think creatively for himself, not paralyzed by the great stagnation like the rest of us.
And what makes you less of a peer is not knowing the basics. And being so unaware of not knowing the basics, and/or so uninterested, that you don't bother to check something that is highly checkable.
This is why peer review exists. One cannot know everything themselves. It's fairly common for CS paper submissions to reinvent algorithms and then tone down the claims after reviewers suggest that variants already exist.
Calculus textbooks existed in 1994. It just took me 30 seconds to find “area under a curve” in the index of the one I own, and another 30 seconds to find the section on numerical integration, which describes the trapezoidal approximation.
So you already know the particular area of the larger topic of mathematics that you need to look for, you already have a textbook for that particular subject in your possession, meaning you don't need to go to the library and somehow figure out the right book to choose from the thousands in the 510 section; you know what you are looking for exists, and then you aren't surprised you can find it?
I know how to find the area under the curve, but there's so much biology I don't know jack shit about. Back in 1994, it would have been hopeless for me to know the Michaelis-Menten model even existed if it had been relevant to my studies in computer science. That you can right click on those words in my comment in 2025 and get an explanation of what that is and can interrogate chatgpt to get a rigorous understanding of it shouldn't make it seem like finding someone in the math department to help you in 1994 was easier than just thinking logically and reinventing the wheel.
There is a thing called "higher education". Ostensibly one of its chief purposes is to arm you with all the interconnected knowledge and facts that are useful in your chosen field of study. To boot, you get all of that from several different human beings whom you can converse with, to improve the scope and precision of the knowledge you're receiving. You know, "standing on the shoulders of giants" and all of that stuff?
> So you already know the particular area of the larger topic of mathematics that you need to look for
So did the author of the paper. The paper’s title itself mentions the area under a curve. It would not have been difficult to find information about how to calculate an approximation of the area under a curve in the library.
I'd argue this is an argument against relying purely on peer review, as her peers also weren't mathematicians.
Some of us when learning calculus wonder if we'd been alive before it was invented, if we'd be smart enough to invent it. Dr. Tai provably was. (the trapezoid rule, anyway) So I choose to say xkcd 1053 to her,
rather than bullying her for not knowing advanced math.
No, we have no proof of that. We just know that she published a paper explaining the trapezoidal rule.
(A) That approximation for 'nice' curves was known long before calculus. Calculus is about doing this in the limit (or with infinitesimals or whatever) and also wondering about mathematical niceties, and also some things about integration. (B) I'm fairly certain she would have had a bit of calculus at some point in her education, even if she remembered it badly enough to think she found something new.
I mean, it's possible she reinvented the wheel because what she really needed in her life is for the math department to laugh at her, but that seems far fetched to me.
> This is the strongest argument for not shaming reinvention...
Sounds like a pretty weak argument? I'm sure there are some good arguments for re-invention. But this ain't one of them.
Basically, re-invention for fun or to help gain understanding is fine. But when you publish a 'new' method, it helps to do a bit of research about prior work. Especially when the method is something you should have heard about during your studies.
there might be a decent amount of "survivorship bias" too. meaning you only hear of the few events where someone starts from first principles and actually finds, be it luck or smarts, a novel solution which improves on the status quo, but i'd argue there are N other similar situations where you don't end up with a better solution.
That being said, I so disagree with just taking the "state of the art" as written in stone, and "we can't possibly do better than library x" etc.
Plenty of "state of the art", at least a decade ago, that was not very state of anything.
I think bias is inherent in our literature and solutions. But also, I agree that the probability of a better solution degrades over time (assuming that the implementations themselves do not degrade - building a faster hash table does not matter if you have made all operations exponentially more expensive for stupid, non-computational, reasons)
In 1973 Clifford Cocks solved the problem of public-key cryptography for the first time in history, something no one at GCHQ had managed to solve in the previous 3 years. He jotted down the solution in half an hour after hearing about the problem, then wondered why it was such a big thing for everyone else. A fresh view unclouded by prejudice can make all the difference.
Viewed through the lens of personal development I suppose one could make an argument that there wasn't much difference between rediscovering an existing valid or invalid solution. Both lead to internalisation of a domain's constraints.
Outside of math and computational science, nothing is proven to not work, because scientific research doesn't work in proofs. Even in math and computational science, there are fields dedicated to studying ideas known to be proven wrong, because sometimes there are interesting findings, like hypercomputation.
A lot of major scientific discoveries were made while people were trying to turn base metals into gold; also known as alchemy.
Some examples include discovering phosphorus, the identification of arsenic, antimony, and bismuth as elements rather than compounds, and the development of nitric acid, sulfuric acid, and hydrochloric acid. Alchemy ultimately evolved into modern chemistry.
I think the key is that thinking that something is a waste of time is the type of mentality that prevents individuals from pursuing their interests to the point where they actually make important discoveries or make great inventions.
If you put enough time and energy into anything you're bound to learn a lot and gain valuable insights at the very least.
The problem is when the "proof" is wrong. In this case a related conjecture held up for 40 years, which is not a proof per se, but was still ostensibly an extremely reliable indicator of correctness.
Another example is when SpaceX was first experimenting with reusable self landing rockets. They were being actively mocked by Tory Bruno, who was the head of ULA (basically an anti-competitive shell-but-not-really-corp merger between Lockheed and Boeing), claiming essentially stuff along the lines of 'We've of course already thoroughly researched and experimented with these ideas years ago. The economics just don't work at all. Have fun learning we already did!'
Given that ULA made no efforts to compete with what SpaceX was doing it's likely that they did genuinely believe what they were saying. And that's a company with roots going all the way back to the Apollo program, with billions of dollars in revenue, and a massive number of aerospace engineers working for them. And the guy going against them was 'Silicon Valley guy with no aerospace experience who made some money selling a payment processing tool.' Yet somehow he knew better.
All the cases you bring up are not "proofs": a conjecture is very much not one; it's just that nobody bothered to refute this particular one, even though there were results suggesting otherwise (cited in the paper).
Similarly, ULA had no "proof" that this would be economically infeasible: Musk pioneered using agile ship-and-fail-fast for rocket development which mostly contradicted common knowledge that in projects like these your first attempt should be a success. Like with software, this actually sped things up and delivered better, cheaper results.
The Apollo missions, in which Boeing was a key player, were also a 'ship and fail fast' era. It led to some humorous incidents like the strategy for astronaut bathroom breaks to simply be 'hold it' which was later followed up by diapers when we realized on-pad delays happen. Another one was the first capsule/command module being designed without even a viewport. Of course it also led to some not so humorous incidents, but such rapid advances rarely come for free.
In any case Musk definitely didn't pioneer this in space.
> Of course it also led to some not so humorous incidents, but such rapid advances rarely come for free.
Luckily, you can run a lot higher risks (per mission) when going unmanned, and thus this becomes a purely economic decision there, almost devoid of the moral problems of manned spaceflight.
Manned spaceflight has mostly been a waste of money and resources in general.
The first man on Mars will likely discover far more in a week than we have in more than 50 years of probes.
There's a fundamental problem with unmanned stuff - moving parts break. So for instance Curiosity's "drill" broke after 7 activations. It took 2 years of extensive work by a team full of scientists to create a work-around that's partially effective (which really begs a how many ... does it take to screw in a light bulb joke). A guy on the scene with a toolkit could have repaired it to perfection in a matter of minutes. And the reason I put drill in quotes is because it's more like a glorified scraper. It has a max depth of 6cm. We're rather literally not even scratching the surface of what Mars has to offer.
Another example of the same problem is in just getting to places. You can't move too fast for the exact same reasons, so Curiosity tends to move around at about 0.018 mph (0.03 km/h). So it takes it about 2.5 days to travel a mile. But of course that's extremely risky since you really need to make sure you don't bump into a pebble or head into a low value area, meaning you want human feedback with about a 40 minute round trip total latency on a low bandwidth connection - while accounting for normal working hours on Earth. So in practice Curiosity has traveled a total of just a bit more than 1 mile per year. I'm also leaving out the fact that the tires have also, as might be expected, broken. So its contemporary traveling speed is going to be even slower.
Just imagine trying to explore Earth traveling around at 1 mile a year and once every few years (on average) being able to drill hopefully up to 6cm! And all of these things btw are bleeding edge relative to the past. The issue of moving parts breaking is just unsolvable for now and for anytime in the foreseeable future.
----------
Beyond all of this, there are no "moral problems" in manned spaceflight. It's risky and will remain risky. If people want to pursue it, that's their choice. And manned spaceflight is extremely inspiring, and really demonstrates what man is capable of. Putting a man on the Moon inspired an entire generation to science and achievement. The same will be true with the first man on Mars. NASA tried to tap into this with their helicopter drone on Mars but people just don't really care about rovers, drones, and probes.
You get extremely diminishing returns with probes. There's only so much you can do from orbit. Rovers are substantially more useful, but are extremely expensive. Curiosity and Perseverance each cost more than $3 billion. As the technology advances and we get the basic infrastructure setup, humans will rapidly become much cheaper than rovers.
A big cost with rovers is the R&D and one-off manufacturing of the rover itself. With humans you have the added cost of life support, but 0 cost in manufacturing and development. The early human missions will obviously be extremely expensive as we pack in all the supplies to start basic industry (large scale Sabatier Reactions [1] will be crucial), energy, long-term habitation, and so on.
But eventually all you're going to need to be paying for is food/life support/medicine/entertainment/etc, which will be relatively negligible.
Yeah, but then you are going to get a very little return from those 10 probes.
Sending a person there for a one-way mission would probably give us more data than 100 probes. And I have a feeling that there are a lot of people willing to go on such a mission.
What sort of things would you expect on the list? A lot of those are critical prerequisites for humanity's advancement. They also left out some really important stuff like studies on sex in space, exercise in space, effects of radiation in space (as well as hardening electronics), and so on.
A space station on Mars would probably not provide much more than that so should be a low priority, but obviously the discoveries to be made on land trounce those to be made in space.
Eventually you cannot run high risks even unmanned. If a rocket fails getting a satellite to orbit, just build a new one. However missions to the outer planets are often only possible once every several hundred years (when orbits line up), and so if you fail you can't retry. Mars you get a retry every year and a half (though the window is only about a month). If you want to hit 5 planets, that is a several-hundred-year event. And the trip time means that if you fail once you reach the outer planet, all the engineers who knew how the system works have retired, and so you start from scratch on the retry (assuming you even get the orbits to line up).
> However missions to the outer planets are often only possible once every several hundred years (when orbits line up) and so if you fail you can't retry.
Just send ten missions at the same time. No need to wait until you fail.
Sure, it's better to frame it as "reintroduction": for those early attempts to be successful, with the Soviets pushing on the other side as well, it is a strategy that works the fastest.
Thanks for the funny incidents as well, and my empathy for the not so funny ones!
Had they received the same grant money as Boeing ($4.2b vs $2.6b), it wouldn't have been such a close call.
I'd also note that they were also late by 3 years or so: this did not produce miracles, it was just much cheaper and better in the end than what Boeing is still trying to do.
Still, I would be surprised if SpaceX did not greatly benefit from knowledge gained in Falcon 1 development when building their Falcon 9 rocket and then optimizing it for reusability — they started development of Falcon 9 while Falcon 1 was still operating.
This illustrates beautifully how stupid labeling ideas stupid is.
To know that an idea or approach is fundamentally stupid and unsalvageable requires a grasp of the world that humans may simply not have access to. It seems unthinkably rare to me.
On the other hand, I knew from the beginning that the Space Shuttle design was ungainly, looking like a committee designed it, and unfortunately I was right.
(Having a wing, empennage and landing gear greatly increased the weight. The only thing that really needs to be returned from space are the astronauts.)
It was designed to support a specific Air Force requirement: the ability to launch, release or capture a spy satellite, then return to (approximately) the same launch site, all on a single orbit. (I say 'approximately' because a West Coast launch would have been from Vandenberg Air Force Base, returning to Edwards Air Force Base.)
The cargo bay was sized for military spy satellites (imaging intelligence) such as the KH-11 series, which may have influenced the design of the Hubble Space Telescope. Everything else led on from that.
Without those military requirements, Shuttle would probably never have got funded.
I'm listening to "16 Sunsets", a podcast about Shuttle from the team that made the BBC World Service's "13 Minutes To The Moon" series. (At one point this was slated to be Season 3, but the BBC dropped out.) https://shows.acast.com/16-sunsets/episodes/the-dreamers covers some of the military interaction and funding issues.
You're saying the same thing he is, but with more precise examples. There were also plenty of more useless requirements which is what he was getting at with it being 'designed by committee.' It was also intended to be a 'space tug' to drag things to new orbits, especially from Earth to the Moon, and this is also where its reusable-but-not-really design came from.
It's also relevant that the Space Shuttle came as a tiny segment of what was originally envisioned as a far grander scheme (in large part by Wernher von Braun) of complete space expansion and colonization. The Space Shuttle's origins are from the Space Transportation System [1], which was part of a goal to have humans on Mars by no later than 1983. Then Nixon decided to effectively cancel human space projects after we won the Space Race, and so progress in space stagnated for the next half century and we were left with vessels that had design and functionality that no longer had any real purpose.
Let alone on launch. It's amusing that NASA is supposed to be this highly conservative safety-first environment, yet went with a design featuring two enormous solid rocket boosters. We knew better than this even during the Saturn era, which was very much a move fast and break things period of development.
There is nothing wrong with solid rocket boosters for that application. The issue is they failed to figure out the limits and launched when it was too cold. (they also should have investigated when they saw unexpected non-fatal seal issues)
Solid boosters are more complex and so Saturn could not have launched on time if they tried them. So for Saturn with an (arbitrary) deadline not doing them was the right call. Don't confuse right call with best call though: we know in hindsight that Saturn launched on time, nobody knows what would have happened if they had used solid boosters.
I wasn't referencing Challenger in particular. I'm speaking more generally. SRBs are inherently fire and forget. This simply increases the risk factor of rockets substantially, and greatly complicates the risks and dangers in any sort of critical scenario. In modern times when we're approaching the era of rapid complete reuse, they're also just illogical since they're not meaningfully reusable.
Yeah, but that qualifier you put there means I think you need to frame it as "reused." They dragged a couple of giant steel tubes out of the ocean after a salt water bath and then completely refurbished and "reused" them. It's technically reuse, but only just enough to fit the most technical definition of the word, and certainly has no place in the modern goal of launching, landing, inspecting/maintaining (ideally in a time frame of hours at the most), and then relaunching.
The only real benefit of SRBs is cost. They're dirt cheap and provide a huge amount of bang for your buck. But complete reuse largely negates this benefit because reusing an expensive product is cheaper, in the long run, than repeatedly disposing of (or "reusing") a cheap product.
Do we know that the economics work for SpaceX? It's a private company and its financials aren't public knowledge; it could be burning investor money. E.g. Uber was losing around 4B/yr give or take for a very long time.
You can't know anything for certain but most of every analysis corroborates what they themselves say - they're operating at a healthy (though thin) margin on rocket launches and printing money with Starlink.
The context of this of course is that they've brought the cost of rocket launches down from ~$2 billion per launch during the Space Shuttle era to $0.07 billion per launch today. And the goal of Starship is to chop another order of magnitude or two off that price. By contrast SLS (Boeing/NASA's "new" rocket) was estimated to end up costing around $4.1 billion per launch.
To be fair, cost per launch was in that ballpark already ($0.15-0.05 billion) with the Ariane, Atlas and Soyuz non-reusable vehicles. SpaceX maintains the cost just low enough to undercut the competition.
I think they maintain the price there. They'll want to drive the cost as low as possible, because price - cost = profit for them. A penny saved is a penny earned.
They undercut every other launch provider. There's no way they're burning investor money to achieve that at the scale of their operations. This is all due to the cost savings of reusable F9. If they wanted to they could jack up their rates and still retain customers and still be the cheapest. There is no reason to believe they are unprofitable.
Most of their income comes from government subsidies and grants. So, it is rather funny to see the owner of the company running around the government and "cutting" costs.
SpaceX's total funding from government grants and subsidies is effectively $0. They do sell commercial services to the government and bid on competitive commercial contracts, but those are neither grants nor subsidies.
No, they didn't. The government wants to get to the Moon via the Artemis program (which will never go anywhere, but that's a different topic) and so NASA solicited proposals and bids for a 'human landing system' [1] for the Moon. SpaceX, Blue Origin, and Dynetics all submitted bids and proposals. SpaceX won.
Amusingly enough Blue Origin then sued over losing, and also lost that. They were probably hoping for something similar to what happened with Commercial Crew (NASA's soliciting bids from companies to get astronauts to the ISS). There NASA also selected SpaceX, but Boeing whined to Congress and managed to get them to force NASA to not only also pick Boeing, but to pay Boeing's dramatically larger bid price.
SpaceX has since not only sent dozens of astronauts to the ISS without flaw, but is now also being scheduled to go rescue the two guinea pigs sent on Boeing hardware. They ended up stranded on the ISS for months after Boeing's craft was deemed too dangerous for them to return to Earth in.
If they can keep raising money from investors, that seems proof enough to me that the economics must be good enough.
Ie investors would only put up with losing money (and keep putting up money), if they are fairly convinced that the long run looks pretty rosy.
Given that we know that SpaceX can tap enough capital, the uglier the present day cashflow, the rosier the future must look like (so that the investors still like them, which we know they do).
The economics very likely didn’t work. It’d be irresponsible for a launch company to model Starlink without a customer knocking on your door with a trailer full of dollars to sponsor the initial r&d and another bus of lawyers signing long term commitments. Vertical integration makes the business case much more appealing.
Musk owns 42% of SpaceX's total equity and 79% of the voting equity.
The non-Musk shareholders range from low-level SpaceX employees (equity compensation) through to Alphabet/Google, Fidelity, Founders Fund.
There are actually hundreds of investors. If you are ultra-wealthy, it isn't hard to invest in SpaceX. If you are the average person, they don't want to deal with you, the money you can bring to the table isn't worth the hassle–and the regulatory risk you represent is a lot higher
> How much of their balance sheet is debt vs equity?
I believe it is almost all equity, not debt.
There is such a huge demand to invest in them, they are able to attract all the investment they need through equity. Given the choice between them, like most companies, they prefer equity over debt. Plus, they have other mechanisms to avoid excessive dilution of Elon Musk's voting control (non-voting stock, they give him more stock as equity compensation)
> Given the choice between them, like most companies, they prefer equity over debt.
What do you mean by 'most companies'? Many companies use debt on their balance sheet just fine, and even prefer it. Banks, famously, have to be restrained from making their balance sheet almost all debt.
I'm not sure this is true, though I think it looks true.
I think the issue is that when a lot of people have put work into something, you think that your own chances of success are low. This is a pretty reasonable belief too. With the current publish or perish paradigm I think this discourages a lot of people from even attempting. You have evidence that the problem is hard and, even if solvable, probably time-consuming, so why risk your entire career? There are other interesting things that are less risky. In fact, I'd argue that this environment in and of itself results in far less risk being taken. (There are other issues too and I laid out some in another comment) But I think this would look identical to what we're seeing.
Right. FWIW, Feynman predicted that physics would become rather boring in this regard, because physics education had become homogenized. This isn't to propose a relativism, but rather that top-down imposed curricula may do a good deal of damage to the creativity of science.
That being said, what we need is more rigorous thinking and more courage pursuing the truth where it leads. While advisors can be useful guides, and consensus can be a useful data point, there can also be an over-reliance on such opinions to guide and decide where to put one's research efforts, what to reevaluate, what to treat as basically certain knowledge, and so on. Frankly, moral virtue and wisdom are the most important. Otherwise, scientific praxis degenerates into popularity contest, fitting in, grants, and other incentives that vulgarize science.
I think that's why most innovative science today happens at the intersection of two domains: that's where someone from a different field can have unique insights and try something new in an adjacent field. This is often hard to do when you're in the field yourself.
In my experience the best approach is to first try to solve the problem without having read the prior work, then read the prior work, then improve your approach based on the prior work.
If you read the prior work too early, you get locked into existing mindsets. If you never read it, then you miss important things you didn’t think of.
Even if your approach is less good than the prior work (the normal case) you gain important insights into why the state of the art approach is better by comparing it with what you came up with.
A decade ago I read this same advice in "The Curmudgeon's Guide to Practicing Law": spend at least a little time trying to solve the problem before you look at how others have solved it. One benefit is that occasionally you may stumble on a better method. But the more common benefit is that it helps develop your problem-solving skills and it primes you to understand and appreciate existing solutions.
Then you’re very unlikely to come up with a novel approach. It’s very difficult to not let reading “state of the art” research put up big guardrails in your mind about what’s possible.
All of the impressive breakthroughs I saw in academia in the CS side were from people who bothered very little with reading everything related in literature. At most it would be some gut checks of abstracts or a poll of other researchers to make sure an approach wasn’t well explored but that’s about it.
The people who did mostly irrelevant incremental work were the ones who were literature experts in their field. Dedicating all of that time to reading others’ work puts blinders on both your possible approaches as well as how the problems are even defined.
Worst case: you don't have a fresh perspective, but you have learned something and you can try plenty of other problems.
There's also a fair chance of finding possibilities that are "obviously" implicit in the prior work but haven't yet been pursued, or even noticed, by anyone.
> If you read the prior work too early, you get locked into existing mindsets.
I agree, though in some cases coming up with your own ideas first can result in you becoming attached to them, because they are your own. It is unlikely for this to happen if you read the prior work first.
Though I think overall reading the prior work later is probably still a good idea, with the intention not to become too impressed with whatever you came up with before.
> The developer of Balatro made an award winning deck builder game by not being aware of existing deck builders.
He was aware of deck builders and was directly inspired by Luck be a Landlord, but he was not aware of just how massive the genre is.
Direct quote from the developer:
> The one largest influence on Balatro was Luck Be a Landlord. I watched Northernlion play for a few videos and loved the concept of a non-fantasy themed score attack roguelike a ton, so I modified the card game I was working on at the time into a roguelike.
> I cut myself off from the genre at that point intentionally, I wanted to make my own mistakes and explore the design space naively just because that process is so fun. I hear the comparison to Slay the Spire a lot but the truth is that I hadn't played that game or seen footage of it when I designed Balatro, not until much later.
“They’re cheering for you,” she said with a smile.
“But I could never have done it,” [Milo] objected, “without everyone else’s help.”
“That may be true,” said Reason gravely, “but you had the courage to try;
and what you can do is often simply a matter of what you will do.”
“That’s why,” said King Azaz, “there was one very important thing about your quest
that we couldn’t discuss until you returned.”
“I remember,” said Milo eagerly. “Tell me now.”
“It was impossible,” said the king, looking at the Mathemagician.
“Completely impossible,” said the Mathemagician, looking at the king.
“Do you mean … ,” said the bug, who suddenly felt a bit faint.
“Yes, indeed,” they repeated together, “but if we’d told you then, you might not have gone …
and, as you’ve discovered, so many things are possible just as long as you don’t know they’re impossible.”
I think going one layer lower - the fundamental issue is that the internet drives people to unrealistic perceptions of the competence of others. Think about all of the undeniably brilliant people that have been involved in software over the past 40 years, and how many of them used hash tables in performance critical environments. Let alone mathematicians and others using them in applied domains. And you think there's something fundamental that all of these people just somehow missed?
The argument of 'if that's such a good idea, why wouldn't somebody have just done it already?' seems to have grown exponentially with the advent of the internet. And I think it's because the visibility of competence of others became so much more clear. For those who lived through e.g. Carmack's Golden Age you knew you were never going to be half the coder he was, at least based on the image he successfully crafted. That 'slight' at the end is not to say he wasn't a brilliant developer or even perhaps the best in the world at his peak, but rather that brilliance + image crafting creates this Gargantuan beast of infallibility and exceptionalism that just doesn't really exist in reality. I think it's from this exact phenomenon that you also get the practical fetishism of expertise.
A professor I had in college, whose first published result was from a piece of homework he turned in where he incidentally solved an open question about a bound on a problem, had a curious habit.
I ended up failing and taking his course again (because I had A Lot going on in college), and thus, noticed something.
Each semester, on one of the assignments in the latter half of the class, he assigned one problem out of, perhaps, 30 in the problem set, where as written, it was actually an open problem, and then a day or two before they were due, he'd send out an "oops, my bad" revised version.
I suspect that this was not an accident, given that it always happened only once.
> A professor I had in college, whose first published result was from a piece of homework he turned in where he incidentally solved an open question about a bound on a problem, had a curious habit.
A related anecdote is about George Dantzig (perhaps best known for the simplex algorithm):
> During his study in 1939, Dantzig solved two unproven statistical theorems due to a misunderstanding. Near the beginning of a class, Professor Spława-Neyman wrote two problems on the blackboard. Dantzig arrived late and assumed that they were a homework assignment. According to Dantzig, they "seemed to be a little harder than usual", but a few days later he handed in completed solutions for both problems, still believing that they were an assignment that was overdue. Six weeks later, an excited Spława-Neyman eagerly told him that the "homework" problems he had solved were two of the most famous unsolved problems in statistics.
Picking two examples out of all people approaching problems, while ignoring wasted effort and failures to make progress because of not understanding current knowledge, is an absolutely terrible reason to approach problems from ignorance.
The biggest gains in theory and in practice are far more often obtained by masters of craft, giving much more weight to attacking problems from a position of knowledge.
In fact, even in this case, this progress required that the author was aware of very recent results in computer science, was thinking deeply about them, and most likely was scouring the literature for pieces to help. The “Tiny Pointers” paper is mentioned directly.
Well said. I really dislike the narrative here, that ignorance is something that leads to discovery. One, the poster gives two examples, as if there's something we should gain from such a small sample. In addition to that, one of the examples isn't valid. The student's former professor is a co-author of the "Tiny Pointers" [1] paper that he was reading and working through. And it was a conjecture; I don't see why someone should think that it would mean it's impossible.
I would rather, instead of thinking ignorance is a key ingredient to discovery, that instead it's the willingness to try things.
A similar idea came up in Veritasium's latest video today. DeepMind's training of an AI to predict protein folding improved greatly by withholding the most evident information about a protein's primary structure — its linear polypeptide chain — within the Structure Module step. [0]
After asking ChatGPT not to agree with me that your comment and these two different approaches to solving problems are alike, it concluded there still might be similarities between the two.
It’s important to think outside the box, and that’s easier when you’re not aware of the box, but we also stand on the shoulders of giants, and are doomed to repeat history if we don’t learn from it. As usual, things aren’t clear-cut.
Plus, their choice of bringing up Balatro wasn't correct either. The developer DID play other deck builders, just not that many. Particularly, they played Slay the Spire which is the genre's most influential "Giant" and the entire reason for the game's progression structure (Small fights leading to a known big fight with particular anti-strategy gimmicks).
[0] links to an interview where the developer says they didn't play Slay The Spire ("the truth is that I hadn't played that game or seen footage of it when I designed Balatro")
> It’s important to think outside the box, and that’s easier when you’re not aware of the box
I don't think so. If you are not aware of the box, there is a much greater chance for you to be well within the box and not realizing it. By that, I mean that either you rediscovered something, or you are wrong in the same way as many others before you. By chance, you may find something new and unexpected, but that's more of an exception than the rule.
Nothing is more depressing than hunting for a parking spot, down a long one-way strip of parked cars, right behind another car hunting for a parking spot.
I’ve been working on and off for years on a scrabble endgame solver; it uses all these techniques from chess like transposition tables, Negamax with alpha-beta pruning, NegaScout, aspiration search and so on. There’s a French person who built his own endgame solver and this solver is significantly faster than mine, even with all of the optimizations that I’ve put into it. He is kind of secretive about it because it’s closed source and he makes some money on it, but we’ve talked a bit about it, compared some positions and we’ve determined that his move generation algorithm is actually not as optimized as mine. But he can still solve the endgame faster despite seeing fewer positions, which implies to me that he’s doing a significantly better job of pruning the tree.
But when we try to talk details, I asked him for example do you use minimax with alpha-beta pruning and he told me like “I’m not sure if I am using minimax or what that is :(“ .. I ask him to describe what he does, he essentially describes minimax with pruning. I’ve sorta figured out that he must be doing some very intelligent version of an aspiration search. It’s really eye-opening because he doesn’t have any of this training. He’s never seen any related algorithms, he’s just figuring all this out on his own.
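For reference, the textbook core of what he apparently reinvented is roughly this (a sketch only; `Position` and its methods are hypothetical stand-ins for a real endgame representation, and transposition tables, NegaScout and aspiration windows would be layered on top):

```python
def negamax(pos, depth, alpha, beta):
    # Returns a score from the perspective of the side to move.
    if depth == 0 or pos.is_terminal():
        return pos.evaluate()
    best = float("-inf")
    for move in pos.legal_moves():
        # Negate the child's score: what is good for the opponent is bad for us.
        score = -negamax(pos.apply(move), depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:   # cutoff: the opponent already has a better option elsewhere
            break
    return best
```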
I think of Andre Geim as a great example of balancing the two. I couldn't find the exact quote, but he said something to the effect of "when I enter a new field, I make sure I learn the basics so I don't spend all my time making dumb mistakes. But I don't get so into it that I get stuck in the mindshare."
I'll also say I think that diversity in approaches is more important than One Right Way. Some people need to set out on their own, while others spend decades refining one technique. Both have led to extraordinary results!
If we achieved local maximum at something, the only way to progress is to make a big leap that brings you out of it. The trouble is that most of such big leaps are unsuccessful. For every case like you are describing there are probably hundreds or thousands of people who tried to do it and ended up with something worse than the status quo.
Many problems are abstract and so we have to build "cartoon" models of what's going on, trying to distill the essence of the problem down to a simple narrative for what the shape of the problem space is and where the limitations are. That often works but backfires when the cartoon is wrong or some assumptions are violated about when the cartoon description works.
Results like this are pretty rare, nowadays, and I suspect this happened because the problem was niche enough or some new idea has had time to ferment that could be applied to this region. This seems like a pretty foundational result, so maybe I'm wrong about that for this case.
A lot of progress is made when there's deeper knowledge about the problem space along with some maturity for when these cartoon descriptions are invalid.
This reminds me of the Neal Stephenson article "Innovation Starvation" from 2011:
>A number of engineers are sitting together in a room, bouncing ideas off each other. Out of the discussion emerges a new concept that seems promising. Then some laptop-wielding person in the corner, having performed a quick Google search, announces that this “new” idea is, in fact, an old one—or at least vaguely similar—and has already been tried. Either it failed, or it succeeded. If it failed, then no manager who wants to keep his or her job will approve spending money trying to revive it. If it succeeded, then it’s patented and entry to the market is presumed to be unattainable, since the first people who thought of it will have “first-mover advantage” and will have created “barriers to entry.” The number of seemingly promising ideas that have been crushed in this way must number in the millions. What if that person in the corner hadn’t been able to do a Google search?
>In a world where decision-makers are so close to being omniscient, it’s easy to see risk as a quaint artefact of a primitive and dangerous past (…) Today’s belief in ineluctable certainty is the true innovation-killer of our age
I believe Ramanujan did the same with various maths problems. The Cambridge undergrad course sprinkles a few unsolved problems in the practice questions just in case someone does this again.
There's a reason the phrase "beginner's luck" exists. I'm not sure the naïveté and success are causally related so much as they might be coincident.
Could knowing about prior research skew one's perspective and tarnish novel thought? Sure. But we don't know. Maybe we'd have an even better Balatro if the creator knew about some other deck builders. Maybe we wouldn't, we don't know. We cannot prove the counterfactual.
On the opposite extreme, there are examples of thinkers whose success stemmed from knowing much about one domain or much about many domains and integrating (Luhmann, Goethe, Feynman, Von Neumann etc.). In the general case, I think we are probably much better off promoting knowledge and study, and not ignorance and chance.
That said, I do think we should retain our willingness to play and to try things that are "out of bounds" with respect to the existing accumulated knowledge. We should live informed lives, but play and explore like unschooled children.
> the authors have also learned of several other hash tables that make use of the same high-level idea in different settings [7, 9].
At least part of the result was already known, and the fact the authors didn't know about it mostly speaks to the large corpus of knowledge we already possess.
But the core inspiration came from looking at another recent research paper "Tiny Pointers": that is totally against your premise.
If Krapivin was a software engineer looking to implement this solution as optimization for a particular problem, he would have done so without ever thinking of making a research paper to prove it formally, but mostly relied on benchmarking to prove his implementation works better.
Now, it has always been somewhat true that lots of existing knowledge limits our creativity in familiar domains, but you need both to really advance science.
This is confirmation/survivorship bias. You only hear about these positive cases. The vast majority just ends up rediscovering old techniques and their year-long paper/work got rejected.
> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.
This is an "Einstein failed Math" fallacy. It's true that novel and notable work tends strongly not to be bound by existing consensus, which when you say it that way is hardly surprising. So yes, if consensus is wrong in some particular way the people most likely to see that are the ones least invested in the consensus.
But most problems aren't like that! Almost always the "best" way to solve a problem, certainly the best way you're going to solve the problem, is "however someone else already solved it". But sometimes it's not, and that's when interesting stuff happens.
> Krapivin made this breakthrough by being unaware of Yao's conjecture.
I don't think there's any evidence of this. Yao's conjecture is not exactly standard undergraduate material (although it might be—this is a commentary on detail rather than difficulty. But I certainly didn't encounter this conjecture in school). If not knowing this conjecture was the key, millions and millions of students failed to see what Krapivin did. I imagine you'd have to ask him what the key to his insight is.
Hashing is a pretty unintuitive sort of computation. I'm not surprised that there are still surprises.
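For context, the classical setting the conjecture talks about looks something like greedy open addressing with a pseudo-random probe sequence (a generic illustration only, not the construction from the paper):

```python
import random

class ProbingTable:
    # Greedy uniform-probing open addressing: insert into the first empty slot
    # along a key-derived probe order, and report how many slots were inspected.
    # The cost of this greedy strategy at high load is the kind of quantity
    # Yao's conjecture concerned.
    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity

    def _probe_sequence(self, key):
        rng = random.Random(hash(key))      # pseudo-random order derived from the key
        order = list(range(self.capacity))
        rng.shuffle(order)
        return order

    def insert(self, key):
        for probes, idx in enumerate(self._probe_sequence(key), start=1):
            if self.slots[idx] is None:     # greedy: take the first empty slot found
                self.slots[idx] = key
                return probes
        raise RuntimeError("table full")
```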
Great point. Also, Krapivin was working on another paper co-authored by his former professor. He in fact was not working from ignorance. And like you said, most of everyone didn’t know anything about this conjecture, so ignorance certainly wasn’t an ingredient here.
Your exact thoughts have already been put to paper by
L. P. Hammett, godfather of physical organic chemistry (exact description of chemical reactions):
one might “... overlook the great difference between exact theory and approximate theory. Again, let me emphasize my great respect for approximate theory. [...] if one starts looking for an effect predicted by this kind of theory to be impossible, the odds are against a favorable outcome. Fortunately, however, the community of scientists, like that of horseplayers, contains some people who prefer to bet against the odds as well as a great many who always bet on the favorite. In science we should, I think, do all we can to encourage the man who is willing to gamble against the odds of this sort.
This does not mean that we should encourage the fool or the ignoramus who wants to play against suicidal odds, the man who wants to spend his time and usually someone else’s money looking for an effect incompatible with, let us say one of the conclusions reached by Willard Gibbs. Gibbs started from thoroughly proven generalizations, the first and second laws of thermodynamics, and reasoned from them by exact mathematical procedures, and his conclusions are the best example I know of exact theory, theory against which it is futile to struggle.”
There's a problem in all human understanding - knowing when, and knowing when not to apply pre-existing knowledge to a problem.
Have we been grinding away in the right direction and are only moments away from cracking the problem, or should we drop everything and try something completely new because we've obviously not found the solution in the direction we were heading.
To put it into a CS type context - should we be using a DFS or BFS search for the solution? We don't have knowledge of future cost (so UCS/Dijkstra's is out), nor do we know where the solution lies in general (so A* is out, even if you ignore the UCS component).
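In code the contrast in that analogy is tiny (a toy sketch; `graph` is just an adjacency dict): breadth-first and depth-first differ only in whether the frontier behaves as a queue or as a stack.

```python
from collections import deque

def reachable(graph, start, goal, breadth_first=True):
    frontier = deque([start])
    seen = {start}
    while frontier:
        # Queue => explore broadly and shallowly; stack => commit deeply to one branch.
        node = frontier.popleft() if breadth_first else frontier.pop()
        if node == goal:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False
```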
> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.
Both Danny Trejo and Tim Allen spent time in prison before becoming famous. While that's interesting, I'm not sure I'm ready to believe that's the best way to become a professional actor.
Edit to be a little less snarky, apologies:
"Outsiders" are great for approaching problems from fresh angles, but I can almost guarantee that the majority of nuts-and-bolts progress in a field comes from people who "fall in the rut of thought" in the sense that they area aware enough of the field to know which paths might be most fruitful. If I had to place a bet on myself, I wouldn't take a wild uninformed swing: I'd get myself up to speed on things first.
Outsiders sometimes do great work. They also sometimes:
There is a book about this theory written in the 1960's called 'The Structure of Scientific Revolutions' by Kuhn that talks about some sciences which progress one funeral at a time and how progress is not linear. He also remarks how people from outside the standard thoughts and education surrounding the current system are typically the ones to actually progress science.
One example is Geocentrism vs Copernican astronomical models -- Copernican could never have sprung from the status quo because everything revolved around the Earth in Geocentrism instead of around the Sun. You can't square that circle.
About geocentrism vs heliocentrism, 3blue1brown has recently released a video [1] that talks about it. It is about the cosmic distance ladder, but geocentrism is mentioned, and it makes a lot of sense in context.
To summarize: heliocentrism was known to the ancient Greeks, who realized the Sun was much bigger than the Earth, so it would seem logical to have the Earth go around the sun.
But the counterargument was that if the Earth goes around the Sun, the stars should move relative to each other during the year, because of parallax, and they didn't have instruments that were precise enough to see it, so they assumed that they didn't.
Copernicus's major contribution wasn't heliocentrism, but the orbital periods of planets. And the model wasn't complete until Kepler calculated the shapes of the orbits. For details, watch the video, it is really good.
I'm being picky, but I don't think you portray a fair view of Kuhn's epistemology here.
Kuhn does not define a value-scale of both methods; on the contrary, he merely introduces the concept of different kinds of research: one critical (developing new paradigms) and one accumulative (further refining existing paradigms).
He also hints at the almost inevitably organic interactions between the two, such that critical research naturally evolves from a pragmatic need to express things simply from a new paradigm when the old one becomes too clumsy for a use case.
This is what happened in your example as well. Copernicus (and later Galileo) did not invent heliocentrism out of the blue; the theory around it had existed since ancient Greece. It is even arguably the Renaissance, leading metaphysicists to revisit ancient texts, that spurred Copernicus to consider it. But ultimately the new paradigm was pushed by the need to revisit the calendar, which was drifting, and the difficulty of doing that in a geocentric world, where you have to take planetary retrograde motion into account.
Heliocentrism was well known; the issue was that the Copernican model was a bad model for the evidence and knowledge of physics available at the time (it was basically equivalent to a geocentric model but needed more epicycles, not fewer, and also required that the earth rotated and some unusual properties for stars). It took Kepler figuring out ellipses and slowly beating out epicycles as a way to do the math, as well as some other experiments which established the world did indeed rotate (not for lack of trying by heliocentrism advocates, but it's a hard measurement to make), to bring the idea mainstream. (And arguably only Newton's laws of motion actually tied it all together)
Having just finally read (well, listened to) Kuhn's book, I can say:
(a) I wouldn't quite characterize the book as being "about this theory" — it's a bit more nuanced. He definitely says that it's usually younger scientists with less invested in the currently reigning theory that are most likely to push forward a revolution. However, I don't recall any examples in the book of people who were wholly _unaware_ of the previous theory.
(b) You should absolutely, definitely read it. It's a classic for a reason, and the writing style is a delight.
I've always had a mind that worked that way - I can imagine how something works or could work before looking up how it actually does work. But there's no real benefit to thinking that way in my experience. Thinking differently has only been a career impediment or gotten on my teachers' nerves for being "smart".
For example, as a young kid I saw a geometric ball made up of hinges that allow it to expand and contract, and in some stages it looks a little like a gear. So then I started wondering if you could change the gear itself instead of switching gears in a car. Then a decade or so later I started seeing CVT transmissions in cars, which is the same concept where you can change the size/ratio by expanding or contracting the roller instead of switching gears.
I agree in the specific case that the state of the art is in a local maximum, but saying "the best way to approach a problem is by not being aware of or disregarding previous attempts" ignores the much more frequent banal work of iterative improvement. Leaping out of a local maximum is rare and sexy and gets articles written about you and is important, but the work of slowly iterating up to a nearby peak is also important.
I think progress needs both individuals who break out of preconceived notions and the communal work of improving within the notions we currently have.
Last year's Advent of Code had a task that was NP-complete and lacked good, well-known approximation algorithms. I almost gave up on it when I realised that, because it felt impossible.
In practice the data was well behaved enough and small enough that it was very doable.
"fall[ing] in the rut of thought" reminds me of this paragraph from "The Footpath Way":
> So long as man does not bother about what he is or whence he came or whither he is going, the whole thing seems as simple as the verb "to be"; and you may say that the moment he does begin thinking about what he is (which is more than thinking that he is) and whence he came and whither he is going, he gets on to a lot of roads that lead nowhere, and that spread like the fingers of a hand or the sticks of a fan; so that if he pursues two or more of them he soon gets beyond his straddle, and if he pursues only one he gets farther and farther from the rest of all knowledge as he proceeds. You may say that and it will be true. But there is one kind of knowledge a man does get when he thinks about what he is, whence he came and whither he is going, which is this: that it is the only important question he can ask himself. (The Footpath Way, Introduction (1))
Even though the author is talking about a different kind of knowledge, the image of sticks of a fan - where going down one gradually excludes the others - stuck with me.
This is a really tough problem. I don't think ignorance is the answer, but it's also difficult to set aside things that seem legitimate and go down a rabbit hole of reinventing something on a hunch. I guess the saving grace is that it's impossible to know enough about such a wide swathe of fields for this to be a problem very often. With large models that can conceivably encode our collective knowledge, though, we have to be vigilant about creating an orthodoxy that ultimately constrains us.
I watched a casual YouTube video by a philosophy professor making the same point: great scholars are different from great thinkers. Many great thinkers came up with great philosophies because they misread past works.
Eventually I'll get around to building a POC/tech demonstrator that just has fewer modules at perhaps lower current density, to show that even several kV DC can be efficiently converted down to a few kV or sub-kV DC, not just on paper. At enough voltage, grounding is no longer optional anyway, so you might as well do essentially an autotransformer plus extra protection against electric shock (an RCD doesn't work directly, but the functionality can still be offered, it just has to sense quite differently).
Why DC? An overhead line only limited by peak voltage (arc) and thermals can carry twice the power when running DC instead of AC, assuming both measured relative to ground.
Also, you can run your transistors completely steady-state at all frequency components between their own switching fundamental and your load transients.
No more over provisioning just to make up for legacy 50/60 Hz AC.
Also, to a degree, you can just plug raw batteries into such a DC grid, at most with a little bit of DC regulation to force the voltage a bit higher/lower than the batteries.
Think a power supply rated to only a couple percent of the battery's max input/output power: it only needs to move the small extra voltage, though of course at the full current.
Lastly, DC converters are just way smaller and lighter, so you could avoid the heavy bulky transformers in trains and alleviate power limiting from them. Relevant for fast double-decker trains because you'd prefer to have human space where you currently park the transformer.
I have to say though, novel development of technology by pulling in recent innovations from the fundamental/materials-science fields underlying the target is not an easy thing to do.
I would say not letting your thoughts be constrained by the bias of existing approaches.
This isn't easy, at all. It requires training yourself into having an open and flexible mind in general.
Not knowing about something is more like a cheat to get there easier.
But it's super common that innovation involves a lot of well-known foundational work and is only very different in one specific aspect, and it's quite hard to know about the other foundational work but not that specific aspect, especially if you don't even know which aspect can fundamentally be "revolutionized"/"innovated".
But what always helps when you learn about a new topic is to try blindly first yourself and then look at what the existing approaches do. Not just for doing groundbreaking work but even for, e.g., just learning math.
One of the math teachers I had in the school years before university used this approach for teaching math. It yielded way better independent understanding and engagement, and it helped me a lot later on. Sadly I only had that teacher for two years.
> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.
I don't think that's warranted.
You will find that the vast majority of lottery winners have bought lottery tickets. However that doesn't mean that buying lottery tickets is a good idea financially.
You get novel branches of thought, but in the limit case, you're also reinventing the universe to bake an apple pie.
So there's something of a tradeoff between attempting to ensure people can do more than mimic existing doctrine and efficiency of getting up to speed without having to re-prove existing math and science.
The Balatro dev also, for example, has talked about how he was heavily influenced by several specific other games.
Walter Isaacson said something similar about Einstein and Steve Jobs. Sometimes you need to reject commonly held assumptions to make progress. Einstein rejected the idea of ether. According to Isaacson this was probably because Einstein was working outside of university. Inside university, professors would likely have pushed Einstein to stick to the idea of ether.
Consider a simplified example. There is some area of scientific research. Working within the framework gives you a 1 in 4 chance of making some minor improvement. Working outside the framework gives you a 1 in a million chance to create a great leap in knowledge.
For any single individual, the best choice is the former. The latter is a gamble that most people will lose, wasting their lives chasing crazy theories.
For society, you want a split. You need some doing the second option to have the eventual amazing discovery, but you also need to progress the current understanding further.
If we introduce a chance for the minor progress to lead to the same major advancement, it becomes a bit more simple for society to calculate the best allocation of researchers, but for any single person, the best option still remains to dedicate themselves to the small advancement.
It is just easier to think outside the box when you do not have your mind "polluted" with previous ideas, and from time to time someone appears who was thinking in just another way, probably the most obvious way to them, without knowing about the orthodox thinking in the subject.
> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.
Extrapolating a "best way" from a couple of examples of success is bad reasoning. There are definitely ways in which it can be necessary to ignore the standing wisdom to make progress. There are also definitely ways in which being ignorant of the knowledge gained by past attempts can greatly impede progress.
I would point out, that it is also possible to question and challenge the assumptions that prior approaches have made, without being ignorant of what those approaches tried.
Figuring out which is which is indeed hard. Generally, it seems to work well to have a majority of people expanding/refining prior work and a minority of people starting from scratch to figure out which of the current assumptions or beliefs can be productively challenged/dropped. The precise balance point is vague, but it seems pretty clear that going too far in either direction harms the rate of progress.
Ok, but you are disregarding the 1000s of things the undergrad was aware of and the fact that he worked with other researchers who were aware of the existing results enough to understand the significance of the result.
The real trick is simply to try to understand things directly and not rely on proof by authority all the time.
It takes time to read all the prior research. You could grow old by the time you get through it all. Likelihood of contributing to the field declines with age.
You might believe someone's proof of a conjecture and then be discouraged from delving any more into that rabbit hole.
More often than not you will be reinventing something. But that's not necessarily less productive than reading other people's work. In the former case, you're at least making something, even if it isn't new.
So there are some arguments for being fresh in a well-trodden field with an ocean of research that you cannot boil all at once.
On the other hand, there is the publish-or-perish pressure in academia, which requires original research. You could just keep busy and throw enough shit against the wall that enough of it sticks.
Domain knowledge is valuable as you can wield it as opportunities arise to great effect. This lets you leap frog problems by applying known solutions. There's risk of being blind to novel approaches that require innovation.
Being capable of tackling problems from first principles is invaluable because we frequently encounter problems that are novel in some dimension, even if that dimension is the combination of dimensions. This lets you leap frog large problems by decomposition, possibly going against the grain and innovating by, hopefully, simplifying. However there is risk in falling into traps that countless others have already learned the hard way.
This may come as a surprise to some but, believe it or not, you can have both. In fact, you should.
There is certainly a need for ignoring common wisdom if you want to make something new. I don't think being unaware of it is necessary as long as you are willing to go forward while being told that you are on a fool's errand.
> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.
I think sometimes this is true. On the time I've had new starters on my engineering team I've always tried to teach them about the problem before they get exposed to any of our solutions. Sometimes they will have brand new insights that we've been completely blind to. It doesn't always happen, but there is only one opportunity for this, once they've seen the solutions they can't be unseen.
It's less that it's the best way to approach a problem, but that it optimizes for a different goal. Building on existing knowledge is how you find the local maximum for a problem by moving along the slope you're on. Starting from scratch is how you find different slopes, which may lead to higher local maxima.
Of course, if you happen to be on a slope that leads to the global maximum, starting from scratch is far less effective. We don't really know where we are usually, so there's a trade-off.
There was a good article posted to HN years ago that covered this and used rocketry as one of the examples, but I don't recall what it was. The author was well known, IIRC.
In university lectures, we'd be presented with a problem on one slide, and then like ten seconds later the solution on the next. I'd usually cover my ears and look away because I was still busy coming up with my own solution!
Somewhat surprisingly (to me), this is also found for user interfaces [0]. The best initial design for a user interface for a feature phone was done by designers who were not shown previous work by other designers. Iterations based on previous designs were only better if they were shown the "winning" initial design.
> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before...
That depends...
- Krapivin was an undergrad, tinkering with stuff for fun. If he'd put a couple months into this, and just ended up re-inventing a few things? That'd be decently educational.
- Vs. if your team needs to ship product, on something resembling a schedule? Yeah. You definitely stick to tried-and-true algorithms.
This sounds like the approach the DeepSeek CEO used for hiring. He favored young, inexperienced teams so they could bring a fresh perspective and try things in new ways. It paid off nicely.
In terms of practical engineering, this is also why I love to do side projects that reject existing bodies of libraries, and try to work up from first principles, and/or focus on composability rather than integration.
It's a trade-off, at first it takes longer to iterate on features, but sometimes a more minimal and/or composable tool finds its way to production. Real Systems are made of duct tape anyways.
When training physically, you can overtrain one muscle and come to depend on it. By deliberately not using that muscle you can improve your other muscles.
It is well known that limitations improve creativity.
That said I still think the best path is to learn a classical path, if you want you can question some axioms, but it's mostly irrational in that there's almost no reward for you personally, except clout, most of the reward goes to the whole science.
I used to think this a few decades ago. I think it's just as accessible with some mix of anti-authoritarianism and defiant personality.
Essentially you learn a thing, you accept it for now and you think "well but maybe!"
Like I personally think there should be multiple mathematical zeroes but I accept it as wackiness unless I can clearly demonstrate coherency and utility as to why.
> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.
Maybe it's just because there are more people working on these problems who don't know previous approaches than the opposite.
> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.
Survivorship bias, you aren't aware of all the failures where people who were unaware of prior art made all the mistakes predictable to people who were.
> I'm beginning to think that the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before.
Everyone likes to focus on why you cannot do and why trying will be futile.
You don't have to disregard prior efforts. You just have to focus on one simple question:
That's a load of selection bias though. I'm sure there have been many, many more people who don't know anything about deck builder games who tried to make one and didn't succeed.
This is nonsense. You need to double check the answers, spot mistakes, adapt the code to your needs and go to the sources it lists to learn rapidly about that particular thing.
it's fundamentally how these things work; learning token distribution given prior context. The expected output over time is the mean value of that distribution. Regression to the mean is the danger I'm talking about.
I think it's 2 different approaches, some enjoy the one (playing with the thing itself) and some enjoy the other (playing with the various secondhand abstractions that refer to the thing).
They are different tastes. They deliver different results.
Similarly, the Fortran and ALGOL teams implemented a lot of optimization tricks on the first try, things that are now considered advanced, without "knowing it".
This is relevant to HN because I'm probably paraphrasing this incorrectly but pg has said the following about why it's hard to launch a startup: the vast majority of ideas that sound stupid are, in fact, stupid. The ones that sound like great ideas have most likely already been done. Thus, the startups that have huge growth potential tend to be the ones that sound stupid given the conventional wisdom (so unlikely to have been tried yet) but are, contrary to the norm, actually great ideas in disguise. These ideas are basically by definition very rare.
Better yet, let customers decide if it’s reinventing the wheel. Many times, founders prematurely decide it’s duplicative, or delude themselves into thinking it’s not.
We all guess at the value customers receive, but only they can say for sure.
Also there isn’t anything that hasn’t been done before either in Balatro, it’s just a nice combination of deck builder tricks applied to poker. And the presentation is also well done which has nothing to do with the mechanics.
Balatro took the basic game mechanics of a very familiar game and said “what if they were dynamic”. The world’s a big place and I’m willing to believe it’s been done before… but I can’t think of one.
It’s the combination of familiar scoring mechanics with fun meta game modifiers that made Balatro so successful. What happens to poker if two of a kind is suddenly the most important hand? Or if not playing face cards leads to incrementally better scores every hand?
Again, I can’t claim it’s never been done, but saying it’s just another deck builder is missing the point.
What's the proportion relative to breakthroughs that are easier with familiarity? How many accidental discoverers do we need to match the output of a Terence Tao or an Erdős?
That seems like the wrong question to ask. After all, there's no shortage of people who are unfamiliar with Yao's conjecture.
Or alternatively, even the most well-read person is not au fait with the state of the art in almost all subjects, so they have a chance to make an accidental discovery there.
But this kid wasn't an outsider: he was already studying computer science at perhaps the most rigorous and prestigious institution in the world, and it's not a coincidence that he made this discovery rather than an equally talented twenty-year-old who works in a diamond mine in Botswana. There's no risk that we'll reduce the number of accidental discoveries by educating people too much.
> That misses the point that there may be breakthroughs that are much harder or near impossible to make if you're familiar with the state-of-the-art.
If that's the point, you should maybe try and find even a single example that supports it. As the article points out, Krapivin may not have been familiar with Yao's conjecture in particular, but he was familiar with contemporary research in his field and actively following it to develop his own ideas (to say nothing of his collaborators). Balatro's developer may not have been aware of a particular niche genre of indie game[1], but they were clearly familiar with both modern trends/tastes in visual and sound design, and in the cutting edge of how contemporary video games are designed to be extremely addictive and stimulating. To me, these examples both seem more like the fairly typical sorts of blind spots that experts and skilled practitioners tend to have in areas outside of their immediate focus or specialization.
Clearly, both examples rely to some extent on a fresh perspective allowing for a novel approach to the given problem, but such stories are pretty common in the history of both math research and game development, neither (IMO) really warrants a claim as patently ridiculous as "the best way to approach a problem is by either not being aware of or disregarding most of the similar efforts that came before."
[1] And as good of a video game as Balatro is, there are plenty of "roguelite deckbuilder" games with roughly the same mechanical basis; what makes it so compelling is the quality of its presentation.
I get what you are saying, but what if the number of breakthroughs by people who did know about what came before was orders of magnitude higher than this number? Would that change your mind?
You hear about this stuff because it's notable. Almost 100% of the time, if you disregard what other people have done, you are going to waste a lot of time.
For every case like this you have thousands of people who waste a huge amount of time and mental effort recreating something that has already been invented.
Maybe the best way to have the best of both worlds is to ensure well-established areas of research are open to “outsider art” submissions on the topic?
You've remembered two examples of this (arguably) happening, so you attempt to draw a conclusion based on the ease with which you came up with those examples. But in reality, this method of inference is prone to error, as it doesn't consider the denominator, or how many attempts were made to achieve the results you're able to remember.
Sounds like a bit of survivorship bias. Every success from people following well known principles does not translate into a blog post or research paper. You also don’t hear about all of the people who failed because they tried something novel and it didn’t work.
I would suggest positive takeaways is to: trust but verify. If you’ve got a novel solution idea and don’t understand why others aren’t doing it that way, do both and compare. You’re guaranteed to learn something one way or another. Also: if you reinvent the wheel or do something suboptimal then that’s okay too. Sometimes the solutions don’t make sense until you see what doesn’t work. Likewise: be open to learning from others and exploring solutions outside of your predefined notion of how things should work.
A while back I was asked to write some software to print labels small enough to fit through a button hole. I convinced myself it wasn't possible because of the spacing between the cutter and the print head. Then my boss showed me a label that was printed by a competitor's system, and I had it figured out within an hour or so. Although this was a minor thing, it convinced me how powerful knowing something is possible (or not) really is.
To me science is about defining the bounds of the possible. To do that you also need to push the bounds and occasionally update everything you thought you knew. In the case of CS where we find new, lower bounds, I find that especially exciting.
I meant to append this to my original reply but I’ll say it here: I enjoyed Knowledge Based AI from Georgia Tech. The lectures talked about different types of knowledge acquisition as it applies to humans and then how we’ve mapped that to machines. That along with HCI were my favorite OMSCS courses.
In your case, seeing a new, lower bound helped push you to look in different directions and re-examine your assumptions. In KBAI we weren’t allowed to share code (of course) but were given very wide leniency in what we could share. Posting my performance numbers and reading others’ gave me a sense of what’s possible and had a similar effect as your story.
> You also don’t hear about all of the people who failed because they tried something novel and it didn’t work.
Or all the people who, through ignorance or hubris, thought they could do better than the state of the art. Or all the people who independently invent things that already exist. These may be what you're referring to, but thought it's worth re-stating these as they are far more common cases than people inventing truly novel approaches.
I cannot tell you the number of times I thought I invented something new and novel only to later find out it already existed. So, while it is true that you can sometimes find paths untraveled, many things related to first principles seem already heavily explored in CS.
That sounds like it could be interpreted two ways. On one hand you’re not the first to discover something, on the other hand your invention is validated as being worthwhile.
In times like those (depending on how much work I put into it) I might retrace the steps I took when I was searching for a solution before writing my own and if I can find something like a stack overflow post then link to the ultimate solution. Or blog about it using the search terms I originally tried etc.
A core part of science is reproducing others work.
Also from HCI one thing I took away from research on brainstorming: slight variations and deviations can be novel and produce a better outcome. The research here is that people misunderstanding someone else’s idea isn’t a problem, but rather generates a brand new alternative. If you feel you’ve redone some work, look a little closer, perhaps something small about it is novel or new.
I actually have a hot take that is related to this (been showing up in a few of my recent comments). It is about why there's little innovation in academia, but I think it generalizes.
Major breakthroughs are those that make paradigm shifts. So, by definition, that means that something needs to be done that others are not doing. If not, things would have been solved and the status quo method would work.
Most major breakthroughs are not the result of continued progress in one direction, but rather they are made by dark horses. Often by nobodies. You literally have to say "fuck you all, I'm doing this anyways." Really this is not so much different than the founder mentality we encourage vocally yet discourage monetarily[0]. (I'm going to speak from the side of ML, because that's my research domain, but understand that this is not as bad in other fields, though I believe the phenomena still exists, just not to the same degree). Yet, it is really hard to publish anything novel. While reviewers care a lot about novelty, they actually care about something more: metrics. Not metrics in the way that you provided strong evidence for a hypothesis, but metrics in the way that you improved the state of the field.
We have 2 big reasons this environment will slow innovation and make breakthroughs rare.
1. It is very hard to do better than the current contenders on your first go. You're competing against not one player, but the accumulated work of thousands and over years or decades. You can find a flaw in that paradigm, address the specific flaw, but it is a lot of work to follow this through and mature it. Technological advancement is through the sum of s-curves, and the new thing always starts out worse. For example, think of solar panels. PVs were staggeringly expensive in the beginning and for little benefit. But now you can beat the grid pricing. New non-PV based solar is starting to make their way in and started out way worse than PV but addressed PV's theoretical limitations on power efficiency.
2. One needs to publish often. Truly novel work takes a lot of time. There's lots of pitfalls and nuances that need to be addressed. It involves A LOT of failure and from the outside (and even the inside) it is near impossible to quantify progress. It looks no different than wasting time, other than seeing that the person is doing "something." So what do people do? They pursue the things that are very likely to lead to results. By nature, these are low hanging fruit. (Well... there's also fraud... but that's a different discussion) Even if you are highly confident a research direction will be fruitful, it will often take too much time or be too costly to actually pursue (and not innovative/meaningful enough to "prototype"). So we all go in mostly the same direction.
(3. Tie in grants and funding. Your proposals need to be "promising" so you can't suggest something kinda out there. You're competing against a lot of others who are much more likely to make progress, even if the impact would be far lower)
So ironically, our fear of risk taking is making us worse at advancing. We try so hard to pick what are the right directions to go in, yet the truth is that no one has any idea and history backs this up. I'm not saying to just make it all chaotic. I think of it more like this: when exploring, you have a main party that travels in a set direction. Their strength together makes good progress, but the downside is there's less exploration. I am not saying that anyone should be able to command the ship on a whim, but rather that we need to let people be able to leave the ship if they want and to pursue their hunches or ideas. Someone thinks they saw an island off in the distance? Let them go. Even if you disagree, I do not think their efforts are fruitless and even if wrong they help map out the territory faster. But if we put all our eggs in one basket, we'll miss a lot of great opportunities. Right now, we let people off the main ship when there's an island that looks promising, and there are those that steal a lifeboat in the middle of the night. But we're all explorers and it seems like a bad idea to dissuade people who have that drive and passion in them. I know a lot of people in academia (including myself) who feel shackled by the systems, when really all they want to do is research. Not every one of these people are going to change things, in fact, likely most won't. But truth is, that's probably true if they stay on the ship too. Not to mention that it is incredibly common for these people to just leave academia all together anyways.
Research really is just a structured version of "fuck around and find out". So I think we should stop asking "why" we should pursue certain directions. "Because" is just as good of an excuse as any. In my ideal world, we'd publish anything as long as there is technical correctness and no plagiarism, because we usually don't know what is impactful. There are known knowns, known unknowns, and unknown unknowns. We are trying to pretend that the unknown unknowns either don't exist, are not important, or are very small. But we can't know; they're unknown unknowns, so why pretend?
[0] An example might be all the LLM based companies trying to make AGI. You want to compete? You're not going to win by making a new LLM. But one can significantly increase their odds by taking a riskier move, and fund things that are not well established. Other types of architectures. And hey, we know the LLM isn't the only way because we humans aren't LLMs. And we humans also use a lot less energy and require far less data, so even if you are fully convinced that LLMs will get us all the way, we know there are other ways to solve this problem.
No, no, no. That's the wrong thing to take away from it.
Something I got from Richard Feynman's descriptions of his method of study was to, first and foremost, read the problem statements and work diligently at solving the problems by himself for a reasonable amount of time.
Then, and only then, go and read the other solutions. The solutions can be the same, they can be different, and by doing all this preliminary work the researcher can truly understand the nuances of these solutions, something they can't grasp if the solutions were shown just after reading the problem.
So, the best way to approach a problem is:
- Try to solve it by yourself. Several times if necessary, give it an honest effort.
- Then, solved or not, go and read other people's solutions.
Ok, big shout out to monort [0] for the link to the video [1].
This is just a quick overview from a single viewing of the video, but it's called "funnel hashing". The idea is to split into exponentially smaller sub arrays, so the first chunk is n/m, the second is n/(m^2), etc. until you get down to a single element. Call them A0, A1, etc., so |A0| = n/m, |A1| = n/(m^2) etc., k levels in total.
Try inserting into A0 c times. If it fails, try inserting into A1 c times. If it fails, go down the "funnel" until you find a free slot.
Call delta the fraction of slots that are empty (I'm unclear whether this is a parameter set at hash-table creation or one that's dynamically updated). Set c = log(1/delta) and k = log(1/delta) to get worst-case complexity O(log^2(1/delta)).
This circumvents Yao's result by not being greedy. Yao's result holds true for greedy insertion and search policies and the above is non-greedy, as it's cascading down the funnels.
There are probably many little hairy details to work out but that's the idea, as far as I've been able to understand it. People should let me know if I'm way off base.
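For anyone who finds code easier to follow than prose, here is a minimal Python sketch of the funnel idea as I read it from the description above. This is a toy illustration, not the paper's actual construction; the shrink factor m and per-level probe count c are illustrative guesses.

    class FunnelSketch:
        """Toy sketch of the funnel idea described above (not the paper's
        actual algorithm): levels shrink geometrically, and insertion
        cascades down the funnel after a bounded number of probes per level."""

        def __init__(self, n, m=4, c=8):
            self.c = c                      # probes attempted per level
            self.levels = []
            size = n
            while size >= m:
                size //= m                  # |A_i| is roughly n / m^(i+1)
                self.levels.append([None] * size)

        def _slot(self, key, level_index, attempt, size):
            # A different, deterministic slot for every (level, attempt) pair.
            return hash((key, level_index, attempt)) % size

        def insert(self, key, value):
            for li, level in enumerate(self.levels):
                for attempt in range(self.c):
                    s = self._slot(key, li, attempt, len(level))
                    if level[s] is None or level[s][0] == key:
                        level[s] = (key, value)
                        return True
            return False                    # funnel exhausted; caller must resize

        def get(self, key):
            # Lookup replays the insertion probe sequence (assumes no deletions).
            for li, level in enumerate(self.levels):
                for attempt in range(self.c):
                    entry = level[self._slot(key, li, attempt, len(level))]
                    if entry is None:
                        return None         # insertion would have stopped here
                    if entry[0] == key:
                        return entry[1]
            return None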
This very much reminds me of the "Distinct Elements in Streams" idea by Chakraborty, Vinodchandran and Meel[2].
Actually they propose two algorithms: Funnel Hashing and Elastic Hashing. Funnel Hashing is "greedy" and defeats Yao's conjecture, which concerns greedy hash mechanisms. Elastic Hashing is "non-greedy" and provides a better amortized time than greedy algorithms.
That it circumvents Yao’s conjecture by being non-greedy contradicts the article. Is the article wrong or is your understanding of the paper? I don’t know, just want to see if you’re noticing something the article’s authors don’t know.
> Farach-Colton, Krapivin and Kuszmaul wanted to see if that same limit also applied to non-greedy hash tables. They showed that it did not by providing a counterexample, a non-greedy hash table with an average query time that’s much, much better than log x.
One thing I don't understand from watching the video, is what happens in the (very rare) case that you get collisions all the way down the funnel. I assume this is related to the "One special final level to catch a few keys" (around 14:41 in the video), but given that it has to be fixed size, this can also get full. What do you do in that case?
Thanks! So I guess the best recourse then is to resize the table? Seems like it should be part of the analysis, even if it's low probability of it happening. I haven't read the paper, though, so no strong opinion here...
(By the way, the text fragment does work somewhat in Firefox. Not on the first load, but if you load it, then focus the URL field and press enter.)
Yeah, I presume so. At least that's what Swiss Tables do. The paper is focused more on the asymptotics rather than the real-world hardware performance, so I can see why they chose not to handle such edge cases
This bothered me too, reading it and the sample implementations I've found so far just bail out. I thought one of the benefits of hash tables was that they don't have a predefined size?
The hash tables a programmer interacts with generally very much have a fixed size, but resize on demand. The idea of a fixed size is very much a part of the open addressing style hash tables -- how else could they even talk of how full a hash table is?
What the author says there is "What we just showed was that we can achieve a worst-case expected probe complexity of log squared one over delta with a greedy algorithm. And we don't have too much time to go over the non-greedy algorithm but...".
The funnel hashing described in the video is greedy. The video doesn't cover the non-greedy elastic hashing.
"Greedy" means that the search and insertion do the same probe sequence, and insertion just uses the first free slot in that sequence.
This strikes me as something that many people probably figured out a non-rigorous version of and didn't think it was special.
It's kind of one of those resource management hacks you do when you're constrained and screwed by limitations. Splitting things up by priority is a common go-to for resource allocation. This is a spin on that.
I wonder how many other "in the trenches hacks" people have done that overturn widely accepted things the inventors didn't realize were a big deal: "well I usually have a bunch of deliveries to make and I've figured out a clever way to map out the quickest path... "
Don't get me wrong - recognizing it and then formalizing it, doing the work, publishing the paper - that's a lot of effort. I'm not taking that away.
Also relevant: in this particular case the authors themselves note that the results better theoretical behavior in the worst case, but no practical uses yet. So I think any software engineer exploring this direction would have abandoned it pretty quickly, for the same reason that galactic algorithms aren't typically invented by them either (unless they also do compsci as a hobby of course). In fact the Wiki page for galactic algorithm mentions another optimal-but-impractical hash table as one of its examples[0][1].
> I wonder how many other "in the trenches hacks" people have done that overturn widely accepted things the inventors didn't realize were a big deal: "well I usually have a bunch of deliveries to make and I've figured out a clever way to map out the quickest path... "
A lot of them. Having said that: yes, I can imagine that others would have thought up Dijkstra's shortest path algorithm, since he himself said it came to him while shopping, and that it only took him twenty minutes to reason through the original O(n²) algorithm. (edit: oh wait, that's what you're alluding to isn't it? Heh, that went straight over my head).
On the other hand, I don't think the faster versions of Dijkstra's algorithm would have been invented by anyone without at least some understanding of priority queues and big-O behavior. And at that point I hope people realize that they possess some specialized knowledge that might not be entirely common.
In fact, I'd argue that the true strength of Dijkstra's write-up is that it gives us a vocabulary to reason about it and come up with specialized data structures for particular situations.
Anyway, what you're touching on is the difference between engineering and science: engineering works with confidence built from tests, rules of thumb that reflect lessons learned from historical results, and (in modern times) verified predictions from science. Those rules of thumb might be used when lacking a deeper scientific understanding of why it works. The tests might exist to work around the limitations of scientific knowledge (e.g. modelling turbulence). Science creates insights and predictions through modelling of empirical results. At least that's the difference according to Bill Hammack[0].
In an ideal world the two professions work together and build on each other's results to propel each other forward of course.
Eh, the travelling salesman problem is more like the Collatz conjecture: it looks simple, but there's a lot of complexity hiding under the surface, and it requires some expertise to truly understand why it's really hard. So then we're talking about the opposite problem.
Note that your informal description did not match the TSP since there's no reason to disallow backtracking or visiting the same place twice.
Thanks so much for this link. I remain convinced that papers are so much more understandable with an accompanying talk by the creators. I wish papers would just come with a video talk included.
Exactly, the authors get to eschew the formalism required in papers. Often the core ideas of research are simple in themselves and the real complexity lies in formally proving the results.
Also, I'd not be surprised if someone already invented and used this funnel hashing technique in, say, the 80's in some game or whatnot, but just never realized what they had stumbled onto. Not to diminish the research, it's very ingenious.
I think papers make good references. I think of a paper more like the equivalent of a "datasheet" for an electronic part, say. Once you understand the intricacies, it's a valuable reference, but more often than not it's not very good at conveying motivation or intuition.
Thanks for the video, def a lot better than the article.
I do find it a bit weird that this is somehow better than just over-allocating (and thus reducing the chances of key collisions, which also makes worst case 'less worse') given his approach also allocates more memory through the aux arrays.
I don't think anybody is really saying it is. Academics treat big-Oh performance on very very full hash tables like a sport. Real world code on real world CPUs often has a more complex cost function than what the academics considered; cache sizes, fitting in a cacheline, memory bandwidth, icache pressure, ...
He's not allocating through aux arrays, he's splitting the already allocated memory into log(n) layers. You can just track those aux arrays with math in the implementation.
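For illustration, something like the following is what "tracking the aux arrays with math" could look like: each level's offset and length computed from the shrink factor over one flat allocation. The factor of 4 is just an example value, not anything from the paper.

    def level_bounds(n, m=4):
        # (offset, length) of each geometrically shrinking level carved out
        # of a single allocation of n slots; m=4 is an illustrative factor.
        bounds, offset, size = [], 0, n // m
        while size >= 1:
            bounds.append((offset, size))
            offset += size
            size //= m
        return bounds

    # level_bounds(1024) -> [(0, 256), (256, 64), (320, 16), (336, 4), (340, 1)]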
It's probably not better than over-allocating except in memory constrained scenarios. But the overhead of funnel hashing is not high - it requires 0 extra memory
Overallocation has a limit. You only have so much RAM/storage. Beyond that you start swapping.
I could really use a hash table (or similar structure) that degrades less with higher occupancy.
Skimming the paper [1], the key difference they used is that their hash table insertion algorithm will probe further than the first empty slot, instead of greedily filling the first empty slot it finds. They combine this with a clever probing sequence which provably finds empty slots efficiently, even if the table is very full.
This means insertions when the hash table is less full are slower, but you avoid the worst-case scenario where you're probing for the last (few) remaining open slot(s) without any idea as to where they are.
An interesting theoretical result but I would expect the current 'trick' of simply allocating a larger table than necessary to be the superior solution in practice. For example, Rust's hashbrown intentionally leaves 1/8th (12.5%) of the table empty, which does cost a bit more memory but makes insertions/lookups very fast with high probability.
I might be misreading their algorithm, but from my look at the paper the key improvement is a non-uniform strategy where they divide the array into buckets and focus on different buckets as they fill the table. This increases the average number of locations to be probed even when the table is emptier. They still place the item in the first empty slot they see with this strategy.
The "skipping slots" has to do with jumping ahead in the hash sequence.
But you could do some hybrid, where you do greedy fill for a while and then switch to this fancier fill once your table is approaching full (using some heuristic)?
Not really because you have to use the same algorithms during subsequent lookups. Imagine you add a bunch of items with insert algorithm A, then you cross some threshold and you insert some more items with Algorithm B. Then you delete (tombstone) a bunch of items. Now you look up an item, and the slot is filled, and you're stuck. You have to proceed your lookup with both algorithms to find what you're looking for (thereby giving up any performance benefits) because the current occupancy doesn't tell you anything about the occupancy at the time when the item you were looking for was inserted.
This is neat! I always wondered if there would be a way to 'containerize' tables like this. IE a regular table is like a bulk carrier ship, with everything stuffed into it. If you could better organize it like a container ship, you could carry much more stuff more efficiently (and offload it faster too!)
The theoretical properties of hash table always seemed so impressive to me that they bordered on magic (and this just extends them). What seemed crazy was how they could be so much better than trees, which to me were intuitively the most efficient way to store data.
What I realized is that the theory of hash tables involves a fixed-size collection of objects. For this fixed collection, you create a hash function and use that as a vector index, storing the collection in a (pre-allocated) vector. This gives a (fuzzy-lens'd) recipe for O(1)-time insert, deletion and look-up. (The various tree structures, in contrast, don't assume a particular size.)
The two problems are that you have to decide the size beforehand, and that if your vector gets close to full, your insert etc. processes might bog down. So scanning the article, it seems this is a solution to the bogging-down part - it allows quick insertion into a nearly-full table. It seems interesting and clever but actually not a great practical advance. In practice, rather than worrying about a clever way to fill the table, I'd assume you just increase your assumed size.
Edit: I'm posting partly to test my understanding, so feel to correct me if I'm not getting something.
Proofs of constant time operations include time taken to resize the table. This takes much more time (linear in the size of the table), on insertions when the table is resized, but that time is amortized over all the insertions already done. It still works out to constant average time if you grow the table enough each time (once it starts to get too full) so it happens with decreasing frequency.
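A minimal sketch of that amortized-resize behaviour, assuming doubling at a fixed load factor (the 0.75 threshold and linear probing are just illustrative choices, not anything from the paper):

    class ResizingTable:
        """Minimal open-addressing table that doubles once the load factor
        passes a threshold, so the O(n) rehash is amortized over insertions."""

        def __init__(self, capacity=8, max_load=0.75):
            self.slots = [None] * capacity
            self.count = 0
            self.max_load = max_load

        @staticmethod
        def _place(slots, key, value):
            i = hash(key) % len(slots)
            while slots[i] is not None and slots[i][0] != key:
                i = (i + 1) % len(slots)    # linear probing
            is_new = slots[i] is None
            slots[i] = (key, value)
            return is_new

        def insert(self, key, value):
            if (self.count + 1) > self.max_load * len(self.slots):
                bigger = [None] * (2 * len(self.slots))
                for entry in self.slots:    # rare, but O(n) when it happens
                    if entry is not None:
                        self._place(bigger, *entry)
                self.slots = bigger
            if self._place(self.slots, key, value):
                self.count += 1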
It looks to me like the idea is, as you generally describe, that you segment your table into a 2d structure (well conceptually) and proceed to fill one ‘row’ at a time until it’s about 75% full, at which point you move on to the next one.
I don’t have time to fully grok the paper, but they claim this makes insertion consistently fast (I believe this until we’re at 75% of total capacity, but maybe they have some other mode for filling when they’re at 75% in every row?). They also claim retrieval is fast, and I didn’t read enough to understand how even retrieval works, or why it is faster.
I’ll put out that there a lot of times that it would be really nice to have a nearly full hash table still, you know, work. You can’t always change the size of one during execution of a program. And, in some environments memory counts a lot. That said, I would like to see and play with an implementation — I’m not sure this is ‘worth it’ in the general case.
It is also probably cache inefficient, as are most things about hash tables, with the exception of linear probing for reading out of a fairly full one, in which case, you get to just keep pulling stuff directly out of memory to check it. So, it’s not clear to me that this is performance wise worth it. Anyway, I’d like to fully understand it, it seems like an interesting new idea.
It looks like the result only matters in the case where the hash table is close to full. But couldn't one just deal with this case by making the table size 10% bigger? (Or, if it is resizeable, resizing earlier)
In reality 75% is the standard fill factor for linear probing, which also exhibits the best locality (if the table gets too full it just allocates double (or x times) the memory and copies the existing entries).
Most non-linear-probe tables (e.g. cuckoo) suffer due to the fact that RAM is not 'random' at all.
> And for this new hash table, the time required for worst-case queries and insertions is proportional to (log x)2 — far faster than x.
> The team’s results may not lead to any immediate applications
I don't understand why it wouldn't lead to immediate applications. Is this a situation where analysis of real-world use cases allows you to tune your hash implementation better than what a purely mathematical approach would get you?
I haven't read the paper, but sometimes asymptotic improvements do not translate to real world improvements due to a large multiplicative factor in the complexity that gets factored out in the O() analysis. So the dataset required to see speed-up is impractically large.
In this case "x" is 1/d where d is the unused fraction of space.
So if you leave 0.1% of your hashtable unused your x is 1000 - quite problematic. However if you leave 12.5% of your hashtable unused your x is 8 - quite reasonable, and not something logarithmic behavior would necessarily speed up, for reasonable constants.
This is pretty much exactly the case for this algorithm. It is a very slow hash table due to the lack of locality. It seems to only have benefits at very large size and very high load factor.
At that scale, it may be practically better to use a B-tree or a radix tree over a 128-bit hash of the key.
I'm not up to date on the state of the art but I've implemented hash tables a few times and we would expand the hash table whenever it was 75% full, which means x is never greater than 4. Improving the runtime from O(x) to O((log x)^2) doesn't matter when x is so small.
I imagine there are some niche memory-constrained applications where you'd let x get larger but I never ran into them personally.
Robin Hood hash works well with high load factors ~95% because it balances the average lookup with a fair distribution of buckets. That makes it ideal when you don't want to waste memory on bloated tables.
I think the 0.95 figure for Robin Hood hash tables is rather optimistic. Robin Hood helps in a few ways: it allows for early termination of failed lookups, and it reduces probe-length variance (thereby making the performance of any one hash table operation more predictable). However, it does not reduce the average number of probes needed to perform a successful lookup. The hash-table benchmarks I published last year (https://jacksonallan.github.io/c_cpp_hash_tables_benchmark/#...), which test various hash tables in the 0.44-0.875 load-factor range, show that the performance of Robin Hood tables ("ankerl" and "tsl" in the article) deteriorates rapidly as the load factor climbs high, just like conventional linear- and quadratic-probing tables. This is in contrast to the SIMD and hybrid open-addressing/separate-chaining tables, whose performance is much more stable across the whole load-factor range.
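For readers unfamiliar with the displacement rule being discussed, here is a hedged sketch of Robin Hood insertion: on a collision, the entry that is further from its home slot keeps the position ("take from the rich, give to the poor"), which caps probe-length variance without reducing the average successful-lookup cost. This is a generic illustration, not the code of any table in the benchmark.

    def robin_hood_insert(table, key, value):
        # table holds (key, value, probe_distance) tuples or None.
        cap = len(table)
        entry = (key, value, 0)
        i = hash(key) % cap
        for _ in range(cap):
            resident = table[i]
            if resident is None:
                table[i] = entry
                return
            if resident[0] == entry[0]:
                table[i] = (entry[0], entry[1], resident[2])  # update in place
                return
            if resident[2] < entry[2]:
                # The entry further from its home slot stays here; we keep
                # walking with the displaced resident.
                table[i], entry = entry, resident
            entry = (entry[0], entry[1], entry[2] + 1)
            i = (i + 1) % cap
        raise RuntimeError("table full")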
How does this work? Naively I'd have expected that to implement a sparse vector with efficient random access, you'd use a hash table, but I assume this must depend on the sparse vector not being a hash table or otherwise there wouldn't be an improvement. Are there other ways of implementing the efficient random access, or do you sacrifice the performance of random access?
adaptive radix trie can be and is used for sparse arrays, O(log(k)) where k is the key size, const for bounded integers
alternatively, depending on your data and the sparseness, bit vectors can indicate filled and empty slots and popcnt be used to find the actual slot for some index.
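A sketch of that bitmap-plus-popcount idea: the rank (number of set bits below an index) of the occupancy bitmap maps a sparse logical index to a dense storage slot. This is a generic illustration (it uses int.bit_count, Python 3.10+); real implementations store the bitmap and values far more compactly.

    def rank(bitmap_words, index):
        # Number of set bits strictly below `index` (the dense slot for it).
        word, bit = divmod(index, 64)
        count = sum(w.bit_count() for w in bitmap_words[:word])  # popcount
        count += (bitmap_words[word] & ((1 << bit) - 1)).bit_count()
        return count

    class SparseVector:
        """Sparse array sketch: occupancy bitmap plus densely packed values."""

        def __init__(self, logical_size):
            self.bitmap = [0] * ((logical_size + 63) // 64)
            self.values = []

        def get(self, index):
            word, bit = divmod(index, 64)
            if not (self.bitmap[word] >> bit) & 1:
                return None
            return self.values[rank(self.bitmap, index)]

        def set(self, index, value):
            word, bit = divmod(index, 64)
            slot = rank(self.bitmap, index)
            if (self.bitmap[word] >> bit) & 1:
                self.values[slot] = value
            else:
                self.bitmap[word] |= 1 << bit
                self.values.insert(slot, value)   # O(n) shift; fine for a sketch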
For this sort of high-load factor application, its not so much "memory constrained" in the sense of embedded hardware, as is it "memory constrained" in the sense of "my petabyte database table needs 1 TB just to store the index"...
I'm pretty sure nobody uses uniform probing hash tables in the wild. Every time I have wanted to have very high load factors (>90%), cuckoo hashing has been good enough, and below 70-80%, linear probing is blazing fast and absolutely good enough.
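For context, a rough sketch of the cuckoo insertion being referred to: every key has two candidate slots (one per hash function), and inserting kicks residents back and forth between their alternatives until everything settles or a kick limit is hit, at which point you rehash or grow. The two-table layout and kick limit here are illustrative choices, not a recommendation.

    def cuckoo_insert(t1, t2, key, value, max_kicks=32):
        # Sketch of cuckoo insertion with two tables / two hash functions.
        # Returns False if displacement runs too long (caller rehashes or grows).
        entry = (key, value)
        for _ in range(max_kicks):
            i1 = hash(("h1", entry[0])) % len(t1)
            if t1[i1] is None or t1[i1][0] == entry[0]:
                t1[i1] = entry
                return True
            t1[i1], entry = entry, t1[i1]   # evict the resident from table 1
            i2 = hash(("h2", entry[0])) % len(t2)
            if t2[i2] is None or t2[i2][0] == entry[0]:
                t2[i2] = entry
                return True
            t2[i2], entry = entry, t2[i2]   # evict the resident from table 2
        return False

    def cuckoo_lookup(t1, t2, key):
        # A key can only ever be in one of its two candidate slots.
        for table, salt in ((t1, "h1"), (t2, "h2")):
            e = table[hash((salt, key)) % len(table)]
            if e is not None and e[0] == key:
                return e[1]
        return None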
Last time I needed high occupancy it was for a cache. So I stirred my 32-bit keys with phi-mul-rshift and randomly (xorshift) displaced the picked slot's old value by log2(size)/2, with linear probing of up to log2(size).
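In case the "phi-mul-rshift" stir is unfamiliar: that is Fibonacci hashing, multiplying the key by 2^32/phi (the odd constant 2654435769) modulo 2^32 and keeping the top bits as the slot index. A minimal sketch of just that step, with the displacement/probing parts of the comment above left out:

    GOLDEN_32 = 2654435769  # round(2**32 / golden ratio), i.e. 0x9E3779B9

    def fib_hash_32(key, log2_size):
        # Multiply the 32-bit key by 2^32/phi (mod 2^32) and keep the top
        # log2_size bits as the slot index in a power-of-two-sized table.
        return ((key * GOLDEN_32) & 0xFFFFFFFF) >> (32 - log2_size)

    # fib_hash_32(12345, 10) gives a slot in range(1024)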
In practice the worst-case operations are avoided by reserving a little more space for the hash table. And the new results come at the cost of slower “good case” insertions.
Even when something isn't a galactic algorithm, "how well it maps to being done efficiently in real hardware with hardware-sized data" is more and more often the important bit over "slightly better bounds at infinity". The latter is probing the edge of mathematics while the former is probing the edges of practical construction, so they rarely seem to align much anymore at these depths.
> Even when something isn't a galactic algorithm "how well it maps to being done efficiently in real hardware with hardware sized data sizes" is more and more often the important bit over "slightly better bounds at infinity"
You're just repeating the same claim. Why is that more and more often the important bit? Why is it more important now? Why are these factors not captured in complexity analysis?
For instance, some complexity analysis assumes random access memory of arbitrary size, but memory above a certain size is better modelled with logarithmic access time. But this too can be captured in complexity analysis, so it's not really evidence of any divergence.
And then you have cache-oblivious data structures that scale uniformly across all cache sizes, which is a product of complexity analysis.
So I'm asking for what exactly is being meant with a justification of why you think this matters now more than it did before.
Ignoring the galactic algorithm bit, as it's in the quote but I think you get why it would cause a divide here, just because you could try to do something in a field does not mean that's what is actually being done in a field. This is what I mean by probing the edges of mathematics vs practical construction, using the same toolset to probe isn't enough to unite those two approaches alone.
Look at this paper as an example. Does its worst case Big O analysis attempt to model memory hierarchies for constructable systems or does it further diverge from any hardware considerations in favor of asymptotic behaviors towards infinities for a generic construction?
This is a good summary, but getting down to exactly what I meant: there aren't any hash table authors or maintainers in the business who have been stopped from trying something because of Yao's conjecture. So this result is not going to unleash a wave of practical innovation.
>Why you think this matters now more than it did before.
More of the low-hanging fruit has been picked over time. The motivation for most of the original algorithms for a lot of computer science problems was practical. Once all (or most) of the optimal solutions for practical purposes have been found, you are necessarily (or almost necessarily) left with only theoretical solutions.
I think because computers are more complicated than they used to be. They have more features that need to be accounted for than they used to. This in turn happened because CPU frequencies stopped increasing, so we had to find other ways to make things go faster. Like SIMD, threads, speculative execution, caches.
If all of the latencies to the main memory and L1/L2/L3 caches have been only increasing throughout the years, is picking the right data memory layout more or less important than before?
The answer is obvious, but I don't really consider that evidence of systems programming and complexity analysis diverging. When approaching any given problem, you would still start with a basic complexity analysis to get in the right order of magnitude, and then select among cache-friendly approaches that fall in that range. You almost never go the other way around.
>If all of the latencies to the main memory and L1/L2/L3 caches have been only increasing throughout the years
That's actually not the case at all; both L1/L2 (and to a lesser extent L3) have been improved dramatically in the past years. L1/L2 are effectively running at a very similar clock to the CPU, e.g. L1 nowadays could be 2 CPU cycles. As for L3, it does depend on the architecture and where the cache coherency occurs - yet generally the latency has not increased.
The main memory (DRAM) latency has been more or less similar, though.
>is picking the right data memory layout more or less important than before?
It has always been important; it's not about the latencies per se but the locality, which has been the case for more than two decades. The general rules are: 1) if your data set is small, e.g. N is tiny, it's effectively O(1) (as N is a constant); 2) pick the big-O that works best [usually O(1) or O(logN)]; 3) pick the data structure with the best locality.
I don't know where you get the data, but the latencies are surely not decreasing, and 2 cycles for L1 is totally wrong. I should perhaps have been more specific: by latency I actually meant latency in terms of CPU cycles, not the actual number (ns), which obviously depends on the CPU frequency.
Examining the few past generations of both Intel and AMD microarchitectures this is what I concluded. L1 is between 4 and 6 cycles. L2 on Intel is between 12 and 16 cycles whereas AMD is roughly around that but 1-2 cycles faster. L3 on AMD is ~40 cycles where L3 on Intel goes anywhere between 60 and even up to 120 cycles.
It is an engineering feat to increase the capacity of the cache without sacrificing a little bit of latency. It happens that the latency increase is sometimes less visible due to some other effects but generally speaking this is what happens.
> The main memory (DRAM) latency has been more or less similar, though.
~70ns of DRAM latency in a CPU core running @3Ghz is different than the core running at @5Ghz. Former is ~200 cycles while the latter is ~400 cycles. So, the higher the core clock is, higher the latency is in terms of CPU cycles stalled.
> It has always been important
Yes, I am aware of that and I am only trying to make a case for the parent comment since it was asking for a clarification why would it be more important today than it had been (at all) in the past.
> it's not about the latencies per se but the locality
I think it's vice versa because the actual issue we are trying to mitigate is the high DRAM latency. As a consequence it just happens that we use data locality to minimize those effects.
A contributing factor is how complex/smart current hardware is: cache line sizes, different forms of hardware/software prefetching, different ports for different ops, memory latency, SIMD extensions. These leave many opportunities for algorithms to be optimized. There is also the issue of real-life scenarios not matching the asymptotic ones (due to, e.g., the size of the input), which, when coupled with the previous factors, leads to even more potential optimization schemes. Obligatory reference: https://agner.org/optimize/
1) complexity analysis ignores coefficients which can make a huge difference, especially since computers usually have bounds
2) real life may influence the likelihood of best/worst case. I think data tends to be somewhat sorted in practice so algorithms with best case on sorted data perform better
Complexity analysis typically assumes an ideal Von Neumann machine. Systems programming embraces the discontinuities in registers, L1, L2, L3, ILP, branch prediction, and so on. (Maybe there's a good pun to be had that complexity analysis stays as far away from computer complexity as possible.) The simple model is essential, as it's the only way to create a body of proofs that will last. What's increasingly hard, though, is that there are oodles of papers, of the sort demonstrating that one algorithm or data structure is better than another, yada yada, and there's usually some benchmarking in each, and that kind of empirical demonstration is now far less than useful unless the paper's authors are good systems programmers and have given up on improvements.
Exactly, seems like the Von Neumann machine assumption is outdated, especially with modern supercomputers. Memory access itself isn't O(1), it's more like O(N^(1/3)) because space is a serious consideration, and that's your distance from a focal point to all other points within a 3D volume.
Yet memory access can be limited by latency and bandwidth. Predictable access patterns (e.g. a linear scan) tend to enjoy prefetching, which hides much of the latency compared to accessing random parts.
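A rough sketch of the difference (illustrative only; the array size is made up and no timing harness is included). The first loop has a stride the hardware prefetcher can follow; the second chases precomputed random indices and mostly pays full DRAM latency per access:

    #include <stddef.h>
    #include <stdint.h>

    #define N ((size_t)1 << 26)   /* ~64M elements, far bigger than any L3 */

    /* Predictable stride: the prefetcher can stay ahead of the loop. */
    uint64_t sum_sequential(const uint32_t *a) {
        uint64_t s = 0;
        for (size_t i = 0; i < N; i++) s += a[i];
        return s;
    }

    /* Same amount of work, but each access lands on a random index, so most
       iterations miss the caches and stall on DRAM. */
    uint64_t sum_random(const uint32_t *a, const uint32_t *idx) {
        uint64_t s = 0;
        for (size_t i = 0; i < N; i++) s += a[idx[i]];
        return s;
    }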
A real-world hash table can either be resized to avoid worst-case degradation or be provisioned to have a good amount of slack when the worst expected case occurs. (E.g. a hash table used as a cache will apply pressure to evict old entries before it gets too full.)
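A minimal sketch of that "keep slack" idea; struct cache, cache_evict_oldest and table_put are hypothetical names chosen for illustration:

    #include <stddef.h>
    #include <stdint.h>

    struct cache {
        size_t used;        /* occupied slots */
        size_t capacity;    /* total slots */
        /* ... slot storage elided ... */
    };

    /* Hypothetical hooks, named for illustration only. */
    void cache_evict_oldest(struct cache *c);
    void table_put(struct cache *c, uint32_t key, uint32_t value);

    /* Keep slack: never let the table cross ~75% occupancy, so inserts never
       run against a nearly-full table where probe chains blow up. */
    void cache_insert(struct cache *c, uint32_t key, uint32_t value) {
        while (c->used * 4 >= c->capacity * 3)
            cache_evict_oldest(c);
        table_put(c, key, value);
    }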
Bucket hashing obviously has no problem finding a free spot. I didn't say chained on purpose because chains can be resizable vectors, so we load at most one pointer when going from table to bucket.
A bucket can evolve into a red/black tree if there are too many entries, giving O(logK) (K being the collisions) for lookups. Still, bucket-based tables tend to be outclassed (in both latency and memory footprint) by the dumbest linear probe.
Some hash tables promote updated entries to the front of the chain for faster access next time. It occurs to me that a good old splay tree would do that.
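A sketch of that move-to-front trick for a chained bucket (illustrative, not taken from any particular implementation):

    #include <stddef.h>
    #include <stdint.h>

    struct node { uint32_t key; uint32_t value; struct node *next; };

    /* After a hit, unlink the node and splice it in at the head of its chain,
       so frequently-looked-up keys drift toward the cheapest position (the
       same spirit as a splay tree's restructuring-on-access). */
    static struct node *bucket_find_mtf(struct node **head, uint32_t key) {
        for (struct node **pp = head; *pp; pp = &(*pp)->next) {
            struct node *n = *pp;
            if (n->key == key) {
                *pp = n->next;     /* unlink from its current position */
                n->next = *head;   /* ...and push to the front */
                *head = n;
                return n;
            }
        }
        return NULL;
    }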
The thing is, you're not supposed to have buckets so large that you need a fancy data structure. If the table is not resizeable, it might not be avoidable.
it's more like real-world hash tables are tailored to the specific pattern of querying and inserting data.
if you know empirically that 80% of your requests come to a tiny subset of hot keys - you make a specific hashset just for these hot keys with constant access time, while keeping the rest of the table cold.
your L2 cache is an example of such optimization - a high bandwidth and low latency memory on the CPU die, that is orders of magnitude faster than random DRAM access - a tiered memory model
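A sketch of that hot/cold split; hot_lookup and cold_lookup are hypothetical helpers standing in for a tiny cache-resident table and the full table:

    #include <stdint.h>

    /* Hypothetical helpers: hot_lookup hits a small table sized to stay
       resident in L1/L2; cold_lookup hits the full table and will usually
       touch DRAM. */
    int hot_lookup(uint32_t key, uint32_t *value_out);
    int cold_lookup(uint32_t key, uint32_t *value_out);

    /* If ~80% of requests really do hit the hot set, ~80% of lookups never
       leave the cache hierarchy. */
    int lookup_tiered(uint32_t key, uint32_t *value_out) {
        if (hot_lookup(key, value_out))
            return 1;
        return cold_lookup(key, value_out);
    }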
Isn’t the problem that the scaling behavior only dominates with infinite n?
If you have a constant factor, that doesn't go into the scaling rule, so something that scales as (log x)^2 could still be 100 times more expensive than something that scales linearly with x, for all x smaller than 2^100.
There are practically important algorithms like this. There are three linear-time algorithms for constructing a suffix array, but all of them have rather large constant factors. Packrat parsing is linear in the size of the input string, but can easily execute more than 100 instructions per byte. And there are numerous theoretically asymptotically superior algorithms whose constant factors are too high for them to be useful at all, the so-called "galactic algorithms", https://en.m.wikipedia.org/wiki/Galactic_algorithm.
There is nothing in the article suggesting that this is such a case, however.
Perhaps the technique requires a lot of additional metadata, so that you could fit a 50% full "normal" hash table in less memory than it takes to store a 99% full hash table using this new approach. Thus the normal hash table can always outperform the new hash table in practice despite worse big O performance because it doesn't hit the pathological worst case except in situations where the new hash table would have run out of memory.
The intro picture about pointers in a drawer immediately reminded me of a talk I saw at FUN with Algorithms 2018 called Mind the Gap that gave me an aha moment about leaving space in data structures. Cool then to try to locate it, and see that it was by the same professor in the article, Martín Farach-Colton.
Read this within my half-hour break and man, wow, what a story. I'm not a software guy, I'm a sys and net guy. Despite not caring or knowing much about hash tables, that article's a great read! Thanks for sharing!
I would like to see this being applied practically. Is there a video demonstrating this or is it still too soon? Is the algorithm secret sauce or will it be open sourced?
I guess the most we could hope for here is that this leads to some other discovery down the road, either in hashtables or maybe one of the similar structures like bloom filters?
Anyone else think this could be used with distributed hash tables to dramatically speed up searching or building them? Maybe more exotically, with LLMs and lookup tables. A clever algorithm like this should be applicable in a lot of more specialized data structures or applications.
It's likely a DHT would greatly benefit from this sort of algorithmic reduction in time and be less susceptible to constant factor overheads (if there are any).
That is the paper Krapivin read in ~2023 and inspired him. The actual paper with the breakthrough is from January 2025: https://arxiv.org/abs/2501.02305.
i feel this article is missing some detail or incorrect in reporting the actual development here. either that or i am missing something myself...
hash tables are constant time on average for all insertion, lookup and deletion operations, and in some special cases, which i've seen used in practice very, very often, they have very small constant run-time just like a fixed-size array (exactly equivalent in fact).
this came up in an interview question i had in 2009 where i got judged poorly for deriding the structure as "not something i've often needed", and i've seen it in much older code.
i'm guessing maybe there are constraints at play here, like having to support unbounded growth, and some generic use case that i've not encountered in the wild...?
What you are missing is how the hash table behaves when it is almost full. If there is one empty spot left in the whole table, how do you find it when you insert a new entry?
I know that's probably a naive answer, I honestly don't even know how a hash table works. I know how a hash map works, at least some implementations use a linked list as a bucket. So the hash gives you the bucket, then you linear search the bucket for the element. Buckets should be small so the time to search them is negligible, giving O(1) lookup and insert performance.
Obviously this is different from what's being discussed here, this data structure doesn't even really get "full" but it's also the only implementation I know is in practical use. Not sure why one might use a hash table instead
So using buckets with linked lists like that is neither space efficient nor fast. A strategy that is nowadays more common and faster is to store conflicts in the table itself, using some strategy to find a place in the table to put the new entry. The simplest (but not optimal) way to do this is to just take the next slot that isn't used yet.
This means a linear scan that, once the table gets close to full, will approach O(n). To avoid this, better strategies for choosing the next place to look are used, as well as automatic resizing of the hash table at some occupancy percentage to keep the lookup chains short. Other strategies in use will also approach O(n), but will require resizing at different occupancy percentages. What is new in this approach is that they manage to do better than O(n) even at almost full occupancy.
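For anyone who hasn't seen open addressing before, a minimal sketch of linear probing (the capacity, hash function and "key 0 means empty" convention are all just for illustration):

    #include <stdint.h>

    #define CAP   1024u            /* power of two, so we can mask instead of mod */
    #define EMPTY 0u               /* key 0 is reserved to mean "empty" here */

    struct slot { uint32_t key; uint32_t value; };
    static struct slot slots[CAP];  /* zero-initialised, i.e. all EMPTY */

    /* Hypothetical hash function; any reasonable mixing function would do. */
    static uint32_t hash32(uint32_t x) { x *= 2654435761u; return x; }

    /* On a collision, walk forward to the next free (or matching) slot.
       Assumes the table is kept below its fill factor by resizing, so the
       probe always terminates; as occupancy approaches 100% these probe
       sequences are exactly what degrades toward O(n). */
    static void lp_insert(uint32_t key, uint32_t value) {
        uint32_t i = hash32(key) & (CAP - 1);
        while (slots[i].key != EMPTY && slots[i].key != key)
            i = (i + 1) & (CAP - 1);
        slots[i].key = key;
        slots[i].value = value;
    }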
>The simplest (but not optimal) way to do this is to just take the next one that isn't used yet.
The linear probe is by far the most efficient way to build a hashtable on any modern hardware; nothing else comes close. Everything else leads to cache thrashing on misses. As for the nearly full table - that's a mistake - a table should not go above a specific fill factor, e.g. the notorious 75% for large tables.
Yes, of course. In practice it still outperforms pretty much anything else. The lower fill factor is still cheaper (memory footprint) than having buckets and indirection.
If you're trying to find an available cubby hole for your boots, you'll find one very quickly if they're almost all empty. But if they're almost all full, it will take longer for you to find one. This discovery shows that it won't take as long as previously thought.
hash tables are most famous for their O(1) lookup time, but then you have to deal with collisions. There are many famous ways of dealing with collisions and you can convince yourself of their [worst case] lookup time. The article is discussing a new upper bound on that worst case.
This is always a point of frustration for me. Under the normal big-O model, hashtables aren't, and can't be O(1) lookup, even for the mythic perfect hash function! You can only get the O(1) figure by breaking the big-O model and bolting on some in-practice assumptions[1] that are used nowhere else in discussion of asymptotic complexity.
Specifically, as you resize the table, you need a function with a bigger output space to give you more places to store the item. To have a bigger output space, you unavoidably have to do more computations. If you go from a function with n possible outputs to one with 2n possible outputs, as n goes to infinity, you eventually have to use a function with more steps.
Your sense of frustration seems to stem from thinking about something which is similar to big-O, but not actually being big-O.
All of the points you raise are valid if you want to have a good understanding of actual runtime cost of an algorithm.
None of it is relevant if you want to compare algorithms according to big-O, in order to find one which doesn't lead to failure[1] when faced with large amounts of data.
Perhaps you should define your own big-R notation for what you're after.
What are you talking about? By the actual standards of big-O that allow you to derive the constant run time? See the second linked comment, which establishes that it only becomes constant when you add in practical hardware considerations.
No bounded run time function can have an infinite output space (while still halting, anyway).
What you are saying in that post is that a hashmap should be O(log(n)) because if you have a word size w you need k_w = log(n)/w operations to compute the hash of a key that's large enough to not have collisions.
However by that logic the comparison function should also have a similar factor, since by the same argument the key values need to be large enough not to collide. Ie if the key values are just the natural numbers up to n for example, you'll need k_w words to store them and hence comparison takes O(k_w) = O(log(n)).
Thus a binary tree should also have an additional log(n) factor due to this, and hence be O(log(n)^2)).
However the point of big O is to compare, and since both have this log(n) factor you could just cancel that in both, leaving you with the familiar O(1) and O(log(n)) respectively for the hashmap and binary tree.
Is it sloppy to do that prematurely? Kinda yes, but it makes comparisons much easier as you don't have to go cancelling those log(n) factors all the time. And the point of big O is to compare the algorithms, not accurate running time.
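Spelled out with the k_w = log(n)/w factor from above (just restating the cancellation, nothing new):

    \text{hash map lookup} = O(k_w), \qquad
    \text{binary tree lookup} = O(k_w \cdot \log n)
    \;\Longrightarrow\;
    \frac{\text{binary tree}}{\text{hash map}} = O(\log n),

which is the same gap as the familiar O(1) versus O(log n).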
1) You should have led with that to get to the point immediately instead of cryptically alluding to the existence of such a point and adding an irrelevant example.
2) Your point was already made in an initial reply[A]. You didn't need to make your comment at all, and if you were going to lazily allude to the existence of an argument, you could have just linked that one. Or, just upvote it. Comment sections don't need repeats of points already made unless you have something to add.
To address the point you did add:
3) You're still conceding the core point that you have to go outside the usual big-O model to get hashtables being O(1). In no other case does anyone take big-O to mean asymptotic complexity relative to comparable solutions for the same problem -- not unless that's explicitly stated. You can't have a data structure with constant-time lookup, period.
I actually kind of get where you're coming from, I think. I'm trying to understand your argument, I think it goes something like this:
1. "But if the universe grows too big, then the integers get too big!". Fair point.
2. You concede that the Word-RAM model does accurately describe an O(1) hashing operation.
3. We now define a new computation model, where hashing is just O(1).
4. "this is still wrong, because this doesn't model the behavior as keys grow to infinity"
The important thing here is that the hashing oracle isn't defined to even care about integer arithmetic complexity. I think one thing you may be confused about is that it seems like the hashing oracle can only be implemented with integer-sized keys; actually, the thing that people are trying to minimize in hashtable research is calls to that oracle. That oracle can be some genie in a lamp, my deadbeat cousin, or a computer program.
---
I think you even get that though on a logical level, so the crux probably goes something like, "The word-RAM model forces me to be of bounded size, but for higher level intuition, I just forget that? What? And then I stitch them back together? I mean sure, given that it's O(1), I get it, but there's no way this 'stitches together' cleanly, it's not actually O(1)!"
I didn't even know this term, but it seems like people have invented a term for this, the trans-dichotomous model. Basically, word size (integer size) scales with problem definition size. So I imagine that it's just taken for granted that trans-dichotomy sort of just "propagates up" your stack of logic, informally, if you want to understand each and every operation.
I mean, for me, the first paragraph was mostly fine enough for me - analyzing abstract models asymptotically w.r.t. various kinds of cost is already something I'm familiar with, so just saying hashing is an O(1) cost by definition makes sense.
The proofs doesn't have to know about what the hashing oracle actually is - the "integer" in the Word-RAM model is not the same as the "integer" produced by the hashing oracle, even though they both have "O(1)" cost within their respective models. I think viewing it as hooking them up together hurts you.
Word-RAM Integer is O(1) because it's trans-dichotomous, and in this specific hashtable analysis, the hashing oracle is O(1) just because it is by definition, we only care about # of calls to that function.
Even though the intuition is that they're hooking up together - recall that your hashing oracle backend can be your drunk cousin at 3am spouting out a consistent uniform random number mapping in the range [0...n-1], we're not really particularly interested in anything else.
Oh, and also, the oracle must take in two parameters, (key, size of space), and the assumption is that, for each size of space, the key is randomly uniformly consistently mapped. Although in the paper they don't even need to deal with variable hashtable resize schemes so they don't even need to re-define this.
But like again, like, we make these "constants" by saying, "I don't care how the hash function is performed, as long as it satisfies these properties; all I'm gonna try to do is implement an algorithm that does the minimal hashing possible". That's not nefarious, that's standard abstraction practice.
I did try to sit with your doubt for a while though and this is the best explanation I can come up with. If I didn't understand, oh well.
The problem is, big-O doesn't use the word-RAM model mentioned in the comment. That's something you have to bolt on to get the O(1) hashtable result. Which is the very point I was making originally: that you have to go outside standard big-O and incorporate practical considerations, which isn't done anywhere else as a matter of course.
> That's something you have to bolt on to get the O(1) hashtable result.
No, you brought that on by claiming not considering it was an error.
In standard big-O you take as an assumption that memory operations are O(1), comparisons are O(1), and calculating a hash value is of the same order as making a comparison (and vice versa).
Once you do that, then a hash table becomes O(1) on average, as it's just a O(1) calculation followed by a O(1) memory lookup (assuming no collisions on average).
Again, Big-O is only relevant w.r.t a specific cost analysis.
Hashtable analyses don't care how you get the hash function - it can take exactly 72 hours to arrive, but as long as the oracle satisfies some probability-distribution guarantee, it's good. That is an O(1) operation, a single hash.
Like, there's no going "outside" for big O, there's no "standard". It's always backed by some notion of what you are computing. There's pointer machines, turing machines, automata, circuits, for your complexity. Pointer machines can't directly address at offsets, Word-RAM can. Different tradeoffs for different analyses - pointer machines often model GC-collected languages and heavy pointer data structures, while Word-RAM deals with more compact data structures.
Again, as mentioned in the paper, Big O isn't used as a notion for number of total operations, it's only trying to *minimize the number of calls to the hashing oracle*. So your analyses can get even more fine-grained.
You might even be excited to learn about the external memory model. In that model you have two kinds of memory: a local memory of size M in which every single operation is free, and an external memory holding the full problem of size N (with B <= M << N) that you can't compute on directly - the only thing you can do with it is read or write a block of size B into local memory. The question then is how to fetch the minimum number of blocks for your specific algorithm.
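As a toy illustration of that accounting (read_block and the block size are made-up stand-ins): a scan of n elements costs ceil(n/B) block transfers, no matter how much per-element work is done on each fetched block, because that work is "free" in this model:

    #include <stddef.h>
    #include <stdint.h>

    #define B 4096   /* block size, in elements */

    /* Hypothetical stand-in for the one chargeable operation in the model:
       fetching one block from external memory into local memory. */
    void read_block(size_t block_index, uint32_t out[B]);

    uint64_t external_sum(size_t n, size_t *io_count) {
        uint32_t buf[B];
        uint64_t s = 0;
        *io_count = 0;
        for (size_t blk = 0; blk * B < n; blk++) {
            read_block(blk, buf);                       /* the only thing we count */
            ++*io_count;
            size_t limit = (n - blk * B < B) ? (n - blk * B) : B;
            for (size_t i = 0; i < limit; i++)          /* "free" in this model */
                s += buf[i];
        }
        return s;
    }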
I was originally confused by algorithms in this model too - sometimes the "real" number of operations would be an order of magnitude more than the number of block accesses, but we just don't even count them? How does that make sense?
Turns out this actually models real hardware surprisingly well - it's often faster to simply recompute and do extra computation in order to minimize memory accesses. It's used a lot in databases. Although I can't find the exact wording "external memory model", the FlashAttention paper uses the wording "IO-aware" and analyzes block transfers like this on the GPU, so... yeah.
So going 'higher up' in the abstraction stack doesn't necessarily sacrifice practicality either; in fact, it can sometimes help.
Sure, we can view the intuition as chaining together layers of abstraction, I get that view. But each layer functions perfectly logically, independently of the others. It's just like good decoupled software practice.
> you have to go outside standard big-O and incorporate practical considerations, which isn't done anywhere else as a matter of course.
Hopefully I've explained it better. Different models of computation are a fundamental part of theoretical computer science, since of course, as you've pointed out, you have to be very careful about defining operations. At the same time, part of the great part about CS (and math in general) is, that you can specify, "Assuming these conditions, what can I logically derive?" and that's just done everywhere.
To be fair, in tree tables, we also assume that comparison is a constant time operation as well when it also necessarily needs to be O(log(n)) for an arbitrary sized table, making it an O(log(n)^2) data structure by this standard.
Good point. Perhaps we could store the length of the common prefix from the last time we went left as well as from the last time we went right. The minimum of those two should remain a common prefix with the search key for the rest of the lookup and should therefore be safe to skip during subsequent comparisons.
But that would actually help it because you can store how far down the match is by that point in the tree, and know how few digits you have to actually compare.
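A sketch of that idea for a string-keyed BST (illustrative; cmp_from, find and the node layout are made-up names). Whatever still lies on the search path shares at least min(left_pfx, right_pfx) characters with the search key, so later comparisons can start past that point:

    #include <stddef.h>

    struct tnode { const char *key; struct tnode *left, *right; };

    /* Compare `key` with `nkey` starting at byte `skip` (already known to
       match); report the total matched prefix length in *matched. */
    static int cmp_from(const char *key, const char *nkey,
                        size_t skip, size_t *matched) {
        size_t i = skip;
        while (key[i] != '\0' && key[i] == nkey[i]) i++;
        *matched = i;
        return (unsigned char)key[i] - (unsigned char)nkey[i];
    }

    /* Track the match length at the last left turn and the last right turn;
       their minimum is a safe number of characters to skip from here on. */
    static struct tnode *find(struct tnode *root, const char *key) {
        size_t left_pfx = 0, right_pfx = 0;
        for (struct tnode *n = root; n; ) {
            size_t skip = left_pfx < right_pfx ? left_pfx : right_pfx;
            size_t matched;
            int c = cmp_from(key, n->key, skip, &matched);
            if (c == 0) return n;
            if (c < 0) { left_pfx  = matched; n = n->left;  }
            else       { right_pfx = matched; n = n->right; }
        }
        return NULL;
    }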
Yeah, usually when an article starts by explaining what is known or tells you a story, it's to fluff up the text and bait you into reading, because the actual content you want is not there.
It's also possible that they don't actually know the difference, or were even asked to change it by an editor to make it sound more trendy. Skepticism of even good journalism is healthy. Gell-Mann amnesia and all.
It would be better if it came from AI, as it would show that AI will be able to make helpful improvements to technology.
AI has come up with improvements to other algorithms, such as matrix multiplication, so it's not super farfetched for it to come up with something similar to this, especially with all the improvements to AI lately.
I told it it was possible, gave it the worst-case runtime characteristics, and asked it to figure out how you would achieve it. I also gave it some hints from the paper.
[1] - 2nd to the last paragraph: "The fact that you can achieve a constant average query time, regardless of the hash table’s fullness, was wholly unexpected — even to the authors themselves."
Forgive my mathematical ignorance, but why would it be? Isn’t it just asymptotically close but not actually equal? What does the 0’s repeating give you that 1 does not?
All of the answers given at this point seem to rely on intuition and how things "should" work. Intuition like that is great, except when trying to approach something that seems like a paradox when applying just intuition.
An expression like "0.999 repeating" does not just fall from the sky, after which we go ahead and probe it to figure out what it is or means or what it's equal to. A mathematical construction is precisely defined. So one goes to the definition and asks: what does "0.999 repeating" mean? After several rounds of unpacking definitions and verifying certain properties of convergent real sequences, one arrives at the conclusion that "0.999 repeating" is in fact 1. They are one and the same. Not "asymptotically", not "approximately" – they are the same. They are just written out differently. Your and mine handwriting is probably different, but that doesn't mean the number 1 written by one of us is different from the number 1 written by the other.
This is a typical misunderstanding about mathematics. Everything in mathematics is defined by humans, and the definitions can be unpacked layer by layer. The natural world is full of things that just exist, fully formed, without human interaction. The natural sciences give us wonderful tools to probe those things and figure out what they are and how they are. That's excellent, but it's usually not the right tool for answering questions like in this thread.
Asymptotic analysis is only relevant if you have some series (or a function).
The series 0.9, 0.99, 0.999,... is asymptotically close to 1 and also asymptotically close to 0.9999... (with infinite 9s), since for any epsilon, I can find an index N after which all elements of the series are within epsilon of the target.
Since a single series can't have two limits, 1.0 should be equal to 0.999...
Note that real numbers are allowed to have infinite digits after the point, otherwise they wouldn't include things like 1/3.
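For completeness, the unpacking as a limit of partial sums (standard real-analysis bookkeeping, nothing beyond the geometric series):

    0.\overline{9} \;=\; \sum_{k=1}^{\infty} \frac{9}{10^{k}}
                   \;=\; \lim_{n\to\infty}\left(1 - 10^{-n}\right)
                   \;=\; 1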
> “Our brains are just not wired to do probability problems very well, so I’m not surprised there were mistakes,” Stanford stats professor Persi Diaconis told a reporter, years ago. “[But] the strict argument would be that the question cannot be answered without knowing the motivation of the host.”
This is wrong. Let’s label the goats A and B to simplify things (so we do not need to consider the positions of the doors). There are 3 cases:
1. You pick the right door. The other two doors have goats. The host may only choose a goat. Whether it is A or B does not matter.
2. You pick the door with goat A. The host may only choose goat B.
3. You pick the door with goat B. The host may only choose goat A.
The host’s intentions are irrelevant as far as the probability is concerned (unless the host is allowed to tell the contestant which door is correct, but I am not aware of that ever being the case). 2/3 of the time, you pick the wrong door. In each of those cases, the remaining door is correct.
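A quick way to sanity-check the 2/3 is a throwaway simulation of the standard rules (this assumes, as above, that the host always opens an unchosen goat door and always offers the switch; the seed and trial count are arbitrary):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        const int trials = 1000000;
        int switch_wins = 0, stay_wins = 0;
        srand(42);
        for (int t = 0; t < trials; t++) {
            int car = rand() % 3;
            int pick = rand() % 3;
            /* Host opens a door that is neither the pick nor the car. */
            int host;
            do { host = rand() % 3; } while (host == pick || host == car);
            int other = 3 - pick - host;   /* the remaining unopened door */
            if (pick == car) stay_wins++;
            if (other == car) switch_wins++;
        }
        printf("stay:   %.3f\n", (double)stay_wins / trials);    /* ~0.333 */
        printf("switch: %.3f\n", (double)switch_wins / trials);  /* ~0.667 */
        return 0;
    }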
The most strict argument is yet another statistics professor got basic statistics wrong.
> "The problem is not well-formed," Mr. Gardner said, "unless it makes clear that the host must always open an empty door and offer the switch. Otherwise, if the host is malevolent, he may open another door only when it's to his advantage to let the player switch, and the probability of being right by switching could be as low as zero." Mr. Gardner said the ambiguity could be eliminated if the host promised ahead of time to open another door and then offer a switch.
The hosts’s intentions absolutely do matter, because the problem (as originally stated) doesn’t specify that the host always opens a door and offers a switch. Maybe he only offers a trade when you initially picked the good door.
The problem as stated does not give the host such an option. If it did, the host opening a door would imply that the player picked the right answer, and it would only happen 1/3 of the time.
Some of the comments aged fairly well, although not in the way that their authors intended:
> There is enough mathematical illiteracy in this country
> If all those Ph.D.’s were wrong, the country would be in some very serious trouble.
In the 1800s, Carl Friedrich Gauss lamented the decline in mathematical ability in academia. Even though academia has since advanced mathematics further, there is still evidence of decline in mathematical ability. Professors tend to be good at extremely specialized things, yet they get the simple things wrong. I once had a calculus professor who failed to perform basic arithmetic correctly during his calculus class. All of the algebra was right, but his constants were wrong. This happened on multiple occasions.
The problem as stated, in the article you pulled your quote from, puts no limits on how the host decides whether or not to offer a switch:
> Imagine that you’re on a television game show and the host presents you with three closed doors. Behind one of them, sits a sparkling, brand-new Lincoln Continental; behind the other two, are smelly old goats. The host implores you to pick a door, and you select door #1. Then, the host, who is well-aware of what’s going on behind the scenes, opens door #3, revealing one of the goats.
> “Now,” he says, turning toward you, “do you want to keep door #1, or do you want to switch to door #2?”
All you know is that in this particular instance the host has opened a door and offered a switch. You cannot conclude that the host always opens a door and offers a switch.
The problem as stated allows the host to offer switches only when the contestant picked the door with the prize, or only when the moon is gibbous, or only when the tide is going out. Diaconis and Gardner are completely correct to point out that the problem as stated is under specified and that the intent of the host matters.
The problem as stated has the host open an incorrect door and offer the player a chance to change his choice. Inferring that another possible variation might exist does not change the fact that we are discussing the variation that was presented. Both you and Diaconis are wrong.
> The problem as stated has the host open an incorrect door and offer the player a chance to change his choice.
Correct, in this one particular instance. You cannot conclude from this particular instance that the host always opens the door and offers a change.
> Inferring that another possible variation might exist
is totally reasonable, while denying the possibility that the host might be able to choose his actions specifically to benefit or screw you over is an unwarranted leap.
The problem statement does not put constraints on the host. You cannot solve the problem by assuming that those constraints exist and then attack those like Diaconis who point out that those constraints don’t exist and that the thing that is unconstrained matters.
There is a genuine human language problem here, NOT A MATH PROBLEM, which accounts for two differing but self-consistent views. It is a legitimate difference in views because human language is genuinely ambiguous.
"You pick a door, the host opens another door and reveals a goat. Should you switch?"
Does this mean you are in one particular situation where the host opened a 2nd door, with a goat? Or does it mean the host always opens a 2nd door with a goat?
If the host always opens a 2nd door, showing a goat, you should switch to the third unopened door.
If all you know, is this time you picked a door, then the host revealed a goat, you don't know what to do. Maybe this host only opens goat doors after you pick the right door, in order to trick you into switching? In that case switching would be the worst thing you could do.
A host with that strategy is a special case, but special cases where a potential general solution (always switch) doesn't work are all you need to disprove the general solution. It cannot be a general solution if there is even one special case where it doesn't work.
Most people interpret the problem to mean the host always reveals goats.
But if the language isn't clear on that, then you do have a different problem, whose solution is really impossible to optimize for without some more information on general host behavior or strategies. Without that information, all you can do is flip a coin. Or always stay, or always switch. You have no means to improve your odds whatever you do.
The problem statement does put constraints on the host, by specifying that the host opened an unselected door with a goat behind it, only to ask if the player wants to change his choice. The answer to the question of whether the player should change the choice is well defined. Other variations are irrelevant since they are different problems.
Your argument is equivalent to denying that 2 + 2 = 4 is correct because the author had the option to write something other than a 2 as an operand.
Nope. Probability theory doesn't work like that. When you argue that 2 + 2 = 4 you assume the two 2s are known, and here they are not.
A=you picked the car at first
B=the host opened the door
P(A|B) can be anywhere between 0 and 1.
In your calculations you assume that P(A|B)=P(A) which is correct ONLY if A and B are independent. Independence of A and B is not in the problem statement, you invented this clause yourself.
> All you know is that in this particular instance the host has opened a door and offered a switch. You cannot conclude that the host always opens a door and offers a switch.
And in this particular instance, it makes sense to switch.
I'm sorry, but the problem is well-formed and well-specified.
> And in this particular instance, it makes sense to switch.
Are you accepting that the host might be someone who only opens a door with a goat when your first choice was the door with a car behind it, and still arguing that you should switch?
The question asks what the best choice in this situation is, not a different situation. The answer does not depend on whether any other situations exist.
We don't really know the situation if all we were told is that we picked a door, and the host showed us a goat behind another door.
If we know that the host will always show us a goat behind another door, then yes, we should clearly switch.
If the host typically lets us just open the door, but will show us a goat before we open the door if the show is running too fast and they need to kill time, then we should switch if offered.
If the host typically lets us open the door, but will show us a goat if the show is running too fast, or if the prize budget is running low and we picked the car, then we should switch if we think the previous games went quickly, but not if there were some slow games already.
If the host only shows a goat when the contestent picks the car, then we should never switch.
Many problem statements include that the host always shows a goat; and if it doesn't you can kind of assume it, because it's a well-known problem, but if it's a novel problem and unsaid, then how are you supposed to know? I haven't watched enough Let's Make a Deal to know if they always give a second choice. Reading the NYT article linked elsewhere in the thread, I am reminded that Monty Hall could offer cash to not open doors too, so with the problem as stated and Let's Make a Deal being referenced, I have to assume an antagonistic host, unless provided with more information on their behavior.
As stated, assuming unknown behavior of the host, we can't put a number on the probability of switching.
Also, to address another point you made elsewhere in the thread. In addition to specifying the host behavior, it should also be specified in the problem statement that the car and goat positions were determined randomly, or at least the car was random, and the two goats are considered equal and assigned as convenient.
> So let’s look at it again, remembering that the original answer defines certain conditions, the most significant of which is that the host always opens a losing door on purpose. (There’s no way he can always open a losing door by chance!) Anything else is a different question.
That was how everyone interpreted the original column (according to thousands of letters sent to Marilyn vos Savant), and it was made explicit in a clarification made in a follow up to the column. What you are discussing is a different problem.
When you say "it makes sense to switch" you assume that P(A|B)=P(A) which is correct only if A and B are independent. Their independence is not given in the problem statement.
That they are independent is the only reasonable assumption, everything else is going out of your way to complicate the problem.
If they are NOT independent because Monty only shows you a goat behind the door if you've picked the wrong (or right) door, this is giving the game away. You don't need to guess, you always know what to pick with 100% certainty based on Monty's algorithm.
(Also, the game show doesn't work like this. And the text doesn't mention Monty's motivations, which in standard logic puzzle formulation must mean they are irrelevant, just as the phase of the Moon is also irrelevant and you must not take it into consideration)
If Monty picks randomly instead of always a goat, and he shows a car, the game has ended and no probabilities are involved, because you don't get to switch anymore; you've lost.
If Monty opens a door and there's a goat, we're within the parameters of the problem as stated (and you should switch!).
> That they are independent is the only reasonable assumption
No, this is not true. From the mathematical viewpoint Monty can have any strategy as long as it satisfies the problem statement. Which is: he DID open the door, for whatever reason; the rest is uncertain. This is literally what Diaconis means when he says "the strict argument would be that the question cannot be answered without knowing the motivation of the host" -- yes, in the strict sense he is indeed correct. This thread started because ryao stated that Diaconis is wrong [1].
Now even if you try to play the card of "reasonable assumptions" and rule out "boring" strategies because they are "giving the game away" this still won't eliminate all "non-independent" cases. The space of possible probability distributions here is way bigger than your list above. I can come up with an infinite number of "reasonable non-independent" strategies for Monty.
For example:
1) He rolls a dice before the game in his dressing room, secretly from the audience.
2) If he gets 6: he will open a door if you guessed incorrectly. If you guessed correctly he won't open the door.
3) If he gets 1-5: he will open a door if you guessed correctly. If you guessed incorrectly he won't open the door.
The situation is still the same: you've made your guess, then Monty opened the door with a goat and now you need to figure out whether to switch or not. It matches the problem definition stated above: the door was opened but we don't know why.
Let's see your chances if we assume Monty follows the dice approach:
event A: you've guessed correctly from the first try
event B: Monty opened the door
P(A|B): probability that you've guessed correctly given that Monty opened the door -- if it's less than 50% you should switch
P(A) = 1/3
P(B) = (1/6)x(2/3) + (5/6)x(1/3) = 7/18
P(AB) = (5/6)x(1/3) = 5/18
P(A|B) = P(AB)/P(B) = 5/7
So, in this case Monty doesn't "give up the game" -- there's still a significant random aspect to it. However, in this setup it's better for you to stay (5/7 chance of winning) rather than switch (2/7).
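For anyone who'd rather not redo the algebra, a throwaway simulation of the dice strategy above (seed and trial count are arbitrary): count only the games where the host actually opened a door, then check how often the first guess was already correct; it should converge to 5/7 ≈ 0.714.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        const int trials = 1000000;
        int opened = 0, first_guess_right = 0;
        srand(7);
        for (int t = 0; t < trials; t++) {
            int correct = (rand() % 3 == 0);        /* P(first guess right) = 1/3 */
            int die = 1 + rand() % 6;
            /* 6: open only if the guess was wrong; 1-5: open only if it was right. */
            int host_opens = (die == 6) ? !correct : correct;
            if (host_opens) {
                opened++;
                if (correct) first_guess_right++;
            }
        }
        printf("P(first guess right | door opened) = %.3f\n",
               (double)first_guess_right / opened);
        return 0;
    }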
You're right: I was focused on Monty picking a door with a goat depending on whether you had picked the right door. That would certainly give the game away, but indeed is not the only option.
However,
> Now even if you try to play the card of "reasonable assumptions" and rule out "boring" strategies because they are "giving the game away" this still won't eliminate all "non-independent" cases. The space of possible probability distributions here is way bigger than your list above. I can come up with an infinite number of "reasonable non-independent" strategies for Monty.
None of the assumptions you proceed to list are "reasonable". They introduce enough to the puzzle that they ought to be stated as part of the problem. Since they aren't, it's safe to assume none of those are how Monty picks the door.
Your "dice rolling" formulation of the puzzle is nonstandard. If you want to go with it, you must make it clear in the presentation of the puzzle. There are infinite such considerations; maybe Monty observes the phase of the Moon, maybe Monty likes the contestant, and so on... it wouldn't work as a puzzle!
Given no additional information or context, all we're left with is assuming Monty always opens a door with a goat behind it.
If we want to introduce psychology: I bet you almost all of the naysayers to vos Savant's solution to the puzzle are a posteriori rationalizing their disbelief: they initially disbelieve the solution to the standard puzzle, then when shown it actually works, they stubbornly go "oh, but the problem is underspecified"... trying to salvage their initial skepticism. But that wasn't why they reacted so strongly against it -- it was because their intuition failed them! I cannot prove this, but... I'm almost certain of it. Alas! Unlike with probabilities, there can be no formal proofs of psychological phenomena!
> Given no additional information or context, all we're left with is assuming Monty always opens a door with a goat behind it.
If you're playing against an opponent and trying to devise a winning strategy against him, you can't just say "given no additional information or context, all we're left with is assuming his strategy is to always do X" and voilà: present a strategy Y that beats X.
In this case X is "always opens a door with a goat behind it" and Y is "always switch doors". This is fascinating but simply incorrect from the math standpoint.
> Your "dice rolling" formulation of the puzzle is nonstandard. If you want to go with it, you must make it clear in the presentation of the puzzle. There are infinite such considerations; maybe Monty observes the phase of the Moon, maybe Monty likes the contestant, and so on... it wouldn't work as a puzzle!
The "dice rolling" it's not a problem formulation, it's one of the solutions to that problem i.e. specific values of X and Y that satisfy all the requirements. I present it to prove that more than one solution exist and furthermore not all solutions have Y="always switch", so you can't establish Y independent of X.
The key difference here is that I don't consider it as a "puzzle", whatever that means. I consider it to be a math problem. Problems of this kind are often encountered in both Game Theory and Probability Theory. It's perfectly fine to reason about your opponent's strategies and either try to beat them all or find an equilibrium: this is still math and not psychology.
You can argue that it's a puzzle instead and I don't mind. What I do mind however is saying that Diaconis was wrong. He specifically said "the strict argument would be..." meaning that his conclusions hold when you consider it as a math problem, not as a "puzzle". My whole point is to demonstrate that.
> The key difference here is that I don't consider it as a "puzzle", whatever that means. I consider it to be a math problem. Problems of this kind are often encountered in both Game Theory and Probability Theory. It's perfectly fine to reason about your opponent's strategies and either try to beat them all or find an equilibrium: this is still math and not psychology.
This was quite obviously a puzzle, of the "math problem" kind. It admits a pretty straightforward -- but counterintuitive -- solution, which made some admittedly smart people upset.
Everything else is smoke and mirrors.
> this is still math and not psychology.
If you read the responses to vos Savant's column, they are quite emotional. There was quite obviously an emotional response to it, of the "stubborn" and/or "must attack vos Savant's credentials" kind, too.
Indeed. But the host's machinations can be arbitrarily more complex; maybe he offers the switch to contestants he finds attractive only when they've picked a goat, and to contestants he finds unattractive when they've picked the car.
This works as a logic puzzle. Assuming the host offers different doors depending on contestant attractiveness makes absolutely no sense. It's a bizarre assumption.
Maybe the goats can wander from door to door, or maybe there is no car, or maybe behind all of the doors there are tigers. Which would be absurd and unrelated to this puzzle.
The host's strategy is the core of the puzzle. The question is what information he has just conveyed to the player by opening the door, and that depends entirely on his mental state. If he was always going to open a door then the player should switch. If he is opening a door only if the player has picked the car then they should not switch. If he has bizarre ad hoc motivations then the correct decision depends on bizarre ad hoc considerations.
And, as CrazyStat has correctly pointed out, as stated in the linked article the host's strategy is an unknown. It could be bizarre. Although I'd still rather say vos Savant was correct in her reasoning; since the answer is interesting, it seems fairer to blame the person posing the question for getting a detail wrong.
The source material for the article says otherwise:
> So let’s look at it again, remembering that the original answer defines certain conditions, the most significant of which is that the host always opens a losing door on purpose. (There’s no way he can always open a losing door by chance!) Anything else is a different question.
You need to know why the host did that though. The host might have an adversarial strategy where they only open the door when using vos Savant's logic would make the player lose.
Her logic was really interesting, it is easy to see why she gets to write a column. But at the end of the day the problem is technically ill-formed.
> There’s no way he can always open a losing door by chance!
Yes there is, he might have picked a remaining door at random with the plan of saying "You lose!" if he finds the car. An actor with perfect knowledge can still have a probabilistic strategy. That doesn't really change the decision to switch, but it does have a material impact on the analysis logic.
She's correct that it is a necessary assumption to get the most interesting form of the problem. But that isn't what the questioner asked.
I doubt the person writing the question had the more nuanced version in mind. When the column was published, nearly everyone who wrote to Marilyn vos Savant had been in agreement that the host always opened a door with a goat as the problem was specified. This idea that the host did not was made after Marilyn vos Savant was shown to be correct and after she had already confirmed that it was part of the consensus.
Adding ad hoc hypotheses about the host's motivations turns this into a family of related problems, but not The Monty Hall problem.
For any given logic puzzle, you can safely assume anything not specified is outside the problem.
Here, what Monty had for lunch, whether he finds the contestant attractive, or some complex algorithm for his behavior is left unspecified and -- since this is a logic puzzle -- this must mean none of this matters!
Imagine if Monty opened a door with a goat only if he had had goat cheese for breakfast. Sounds ridiculous for the logic puzzle, right?
We can safely assume, like Savant, that Monty always picks a door with a goat, turning this into a logic puzzle about probability.
Anything else is going out of your way to find ambiguity.
Well sure, it doesn't appear that vos Savant was asked the Monty Hall problem. She seems to have been asked an ill formed alternative problem and answered that instead. Then the interpretation of the ill formed question with the most interesting assumptions about the host's behaviour became the Monty Hall Problem.
And the linked article (and by extension Mr. Diaconis & CrazyStat) was talking about the question that vos Savant was asked as opposed to the one where the assumptions to come to an answer are enumerated.
> We can safely assume, like Savant, that Monty always picks a door with a goat, turning this into a logic puzzle about probability.
No we can't. Otherwise we can safely assume any random axiom, like "The answer is always the 3rd door". You have to work with the problem as written.
This is like being asked how to solve a (legal) Rubik cube configuration and then considering how close you can come to a solution if given an illegal configuration. It is not relevant since it is not what was presented. You can always make things more complex by considering variations that are not relevant to the original problem.
This was a real life game show where the host did not always open a door and offer a switch. In fact most of the time he did not offer a switch. See [1] (Ctrl-F cheating for the relevant paragraph).
> [...] the problem (as originally stated) doesn’t specify that the host always opens a door and offers a switch. Maybe he only offers a trade when you initially picked the good door.
That would make no sense. Also, Monty always opens a door with a goat. The problem is well-formed, but most people misrepresent it in order to object to it.
Nope! That’s you adding a constraint that does not exist in the original problem.
> Was Mr. Hall cheating? Not according to the rules of the show, because he did have the option of not offering the switch, and he usually did not offer it.
From [1].
Constraints matter. Don’t play fast and loose with them.
You have removed a constraint from the original problem. What could have happened does not matter since we are being asked about what did happen.
Imagine two people playing a game of chess. Various moves are played. Then you are asked a question about the state of the board. The answer depends on the state of the board. It does not depend on the set of all states the board could have taken had one player made different choices.
Oh yes, the problem text specifies that in this particular instance the host opens a door and offers a switch. It does not specify that the host does this every time, which is the constraint in question.
It's typically considered unnecessary to specify that, because it comes from a game show where he always reveals one wrong door. Monty Hall was the first host of the show.
> because it comes from a game show where he always reveals one wrong door.
Nope!
> Was Mr. Hall cheating? Not according to the rules of the show, because he did have the option of not offering the switch, and he usually did not offer it.
This is about the only proper counter I've read so far. Why did it take so long to post?
This does indeed change the whole problem. I would argue that the problem as stated in vos Savant's column is different (and she says as much later on, "all other variations are different problems"), but I admit this makes me lose the supporting argument I've been using of "...and this is how Monty's game show worked". Point conceded.
I would also argue most people who objected to vos Savant's solution weren't considering Monty's strategy at all. They were objecting to the basic probabilities of the problem as stated by vos Savant, merely because they are counterintuitive (which can be summed up as "if you switch, you're betting you got the first guess of 1/3 wrong"), and everything else is an a posteriori rationalization.
> I posted exactly the same thing in a reply to you way earlier in the thread [1]. It didn't take so long to post, you just weren't paying attention.
I'm not asking why YOU took so long to respond or finding fault in your reasoning abilities, I'm saying there's been a lot of arguing in general in this sub-topic, and few people mentioned this fact -- which is the only relevant fact for challenging vos Savant's formulation of the problem (which matters because it's what sparked all this fuss).
> It didn't take so long to post, you just weren't paying attention.
This is the most dismissive possible thing to say, especially in response to a comment of mine where I'm conceding a point. I missed ONE other particular comment of yours, hence "I wasn't paying attention"? Wow. Sorry for not following your every response to everything.
> [...] the discussion was about Persi Diaconis.
I don't know nor care who Diaconis is, I just care about whether the Monty Hall problem truly was underspecified or not. This is about the Monty Hall problem, not about some person.
Ah, but probability is all about hypothetical instances, and how the host makes his decisions—or if he’s allowed to make a decision at all—is a key consideration in the calculation of the probability. If we don’t know how the host decides whether or not to offer a switch then we can’t calculate a probability and can’t decide which choice is better.
I see your point. You are arguing that the fact that the host did this could convey additional information that would affect the distribution. This criticism still does not seem valid to me because this argument can be used to alter the correct answer to a large number of problems.
Consider the question of whether John Doe did well on his mathematics examination. This would seem like a straightforward thing depending on the questions and his answers. We can assume they are provided as part of the problem statement. We could also assume that a definition for “did well” is included. We could then consider a situation where under chaos theory, his act of taking the examination caused a hurricane that destroyed his answer sheet before it had been graded. This situation was not mentioned as either a possibility or non-possibility. However, we had the insight to consider it. Thus, we can say we don’t know if he did well on his mathematics examination, even though there is a straightforward answer.
Another possibility is that the game show could have rigged things without telling us, with a 90% chance that the prize is behind door #1, a 9% chance that it is behind door #2, and a 1% chance that it is behind door #3. Which door was the initial choice would then decide whether the player should change the choice, rather than anything the host does. However, this was not told to us; but to avoid saying that choosing the other door is always the answer, we decide to question the uniformity of the probability distribution, despite there being no reason to think it is non-uniform. Thus, by assuming that the game show might have altered the probability distribution, we can say not only that the host's intent does not matter, but also that we don't know the answer to the question.
To be clear, my counterpoint is that these considerations produce different problems and thus are not relevant.
You might live in a world where the host doesn't want to give you the car, and only opens a door and offers you the option of switching if your first choice was the door with the car behind it. In that world, you shouldn't switch. I don't think this form of the problem statement gives you any reason to believe that you aren't in that world.
> So let’s look at it again, remembering that the original answer defines certain conditions, the most significant of which is that the host always opens a losing door on purpose. (There’s no way he can always open a losing door by chance!) Anything else is a different question.
Yes, I am discussing a different problem, and I don't think the original problem formulation gives enough information to distinguish between the 2 problems.
The answer can add assumptions, which is fine. I'm not passing judgement on Marilyn vos Savant. I do object to claims that the problem statement is sufficient to have a single answer, and based on that, I'd object to claims that somebody in that situation would be wrong not to switch doors. I would object on exactly the same grounds to anyone who tells you "you're wrong, there's a 50% chance of getting a car" (I might object further, on the grounds that the most obvious interpretation which gives that answer is inconsistent with this form of the problem statement).
If you're discussing a different problem, then it's not the Monty Hall Problem, which we're discussing here.
It's a probabilities logic puzzle, it's not about psychological tricks. Anything of that sort is an extraneous ad hoc hypothesis that you're introducing.
The point is whether, upon the reveal of a goat, you should switch or stick to your original choice. Nothing else matters. What Monty had for breakfast doesn't matter. Whether he likes you or not doesn't matter.
Following your arguments throughout this thread, I think the piece that is confusing you is the framing of the problem as a game-show host, which primes you to think of the host being "fair" by default.
To understand how the framing might change how you interpret the problem, consider the following scenario: You are in a game of poker, and you have a flush with king high. Your opponent reveals all but one card from their hand, which shows they have 4 hearts, and they also reveal that their last card is an ace, but they don't reveal its suit. It's your turn to bet. Do you bet, or do you fold?
Now you could treat this as a simple statistics problem -- there are four possible aces they could have in their hand, and only one is a heart, so there is only a 1/4 chance they will beat you. But is the solution to this problem that there is a 3/4 chance of winning the pot? In the problem text, we haven't specified under what conditions your opponent will reveal which cards in their hand. But somehow, saying it's a game of poker makes you think that they are probably more likely to reveal their hand if they are bluffing, so the true probability is not 3/4.
We are primed by this description of this person as your "opponent" to think about them making the decision adversarially. What if instead we say that that game of poker is part of a game show and your opponent is the host of the game show? Depending on the assumptions you make about your opponent's motivations, you must calculate the odds differently, and simply saying "3/4" is not unambiguously correct.
This was the problem as stated in the Marilyn vos Savant column that started the controversy:
> Suppose you’re on a game show, and you’re given the choice of three doors. Behind one door is a car, behind the others, goats. You pick a door, say #1, and the host, who knows what’s behind the doors, opens another door, say #3, which has a goat. He says to you, “Do you want to pick door #2?” Is it to your advantage to switch your choice of doors?
Diaconis is in fact correct that given just that information the problem cannot be solved. What is missing is a statement that the host will always reveal a goat and always offer you a chance to switch doors.
If the host can choose whether or not to make the offer, then if you happen to receive the offer when you are on the show, you cannot say anything about whether or not switching is to your advantage.
For instance suppose the show has given away a lot of cars earlier in the season and the producers ask the host to try to reduce the number of cars given away during the rest of the season. The host might then only offer switching when he knows the contestant has picked the car door.
He will still open a goat door first because that's more dramatic. He just won't offer to let you switch before going on to open either your door or the remaining door.
I don't think "motivation of the host" is a great way to accurately describe the issue that Diaconis is calling out; it's meant more as an intuitive shorthand.
In a precise way, the reason the question is underspecified is that it doesn't say whether the probability of the host offering you a chance to choose again depends on which choice you make. If the host offers the choice more than twice as often when you pick right as when you pick wrong, then changing your pick is the incorrect choice.
Now, colloquially, it can make sense to assume the host always offers the choice, but practically, if we're looking at how to use statistics in a real-world situation, it isn't safe to always assume that probabilities are independent.
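To put a number on that threshold, here is a rough sketch; the offer probabilities are made-up parameters, not part of the original problem statement:

```python
def switch_win_prob(p_offer_if_right, p_offer_if_wrong):
    """P(switching wins | an offer was made), with a uniform 1/3 prior on the
    first pick being right, and a host who always reveals a goat when he does
    make the offer. The offer probabilities are hypothetical parameters."""
    stick = (1/3) * p_offer_if_right    # picked the car AND got the offer
    switch = (2/3) * p_offer_if_wrong   # picked a goat AND got the offer
    return switch / (stick + switch)

print(switch_win_prob(1.0, 1.0))   # 0.667 -- the textbook assumption: always offer
print(switch_win_prob(1.0, 0.5))   # 0.5   -- break-even: offers twice as often when you're right
print(switch_win_prob(1.0, 0.25))  # 0.333 -- offers 4x as often when you're right: don't switch
```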
The question as stated does not permit such a choice by the host since if it were a choice, it had already been made.
This is like being presented with a nearly completed game of chess, asked if the loser can lose in 1 move, and then arguing that the answer is more nuanced because there might have been other moves that produced different end games rather than the ones that produced this particular end game. We do not care about those other end games, since we are only considering this particular one.
> The question as stated does not permit such a choice by the host since if it were a choice, it had already been made.
Whether the choice was already made by the host makes no difference, what matters is what information about the hidden state can be derived from that choice.
Let's say the rules of the game are modified to say that the host never offers a re-selection when you have already selected a door with a goat. Then if the host has offered you a re-selection you should definitely not take it, because you already have the good prize. You know this because the re-selection offer provides information about what is behind the door you selected.
In fact, any time your choice of door has any statistical effect on whether a re-selection is offered, then a re-selection offer (or the lack of one) provides a small amount of information that modifies the expected value of choosing a new door.
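As a sanity check, a quick Monte Carlo (with an arbitrary, made-up offer rate of 90% when the first pick is the car vs. 100% otherwise) shows even a small bias nudging the posterior away from 1/3:

```python
import random

def trial(p_offer_if_right=0.9, p_offer_if_wrong=1.0):
    """One simulated game. The host always reveals a goat, but the chance he
    offers a switch may depend (via these made-up parameters) on whether the
    contestant's first pick was the car. Returns (offer_made, pick_was_car)."""
    car = random.randrange(3)
    pick = random.randrange(3)
    picked_car = (pick == car)
    p_offer = p_offer_if_right if picked_car else p_offer_if_wrong
    return random.random() < p_offer, picked_car

# Estimate P(first pick was the car | an offer was made)
offers = [picked for offered, picked in (trial() for _ in range(200_000)) if offered]
print(sum(offers) / len(offers))   # ~0.31 here, vs. the unconditional 1/3
```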
> This is like being presented with a nearly completed game of chess, asked if the loser can lose in 1 move and then arguing that the answer is more nuanced because there might have been other moves taken that produced a different end games
It is absolutely nothing like that. That is not a question about statistics or probability.
I think this explanation is just cope. Nothing about the problem should lead you to believe that the host is an evil genie purposefully trying to trick you. Attacking the framing device for the problem is the kind of post-hoc rationalization you make after failing at a probability test.
> Nothing about the problem should lead you to believe that the host is an evil genie purposefully trying to trick you.
Is it really unreasonable to assume that the host would like to keep the car? As I see it, that's the economic intuition behind why most people don't switch.
I think we're disagreeing about how much it's reasonable to assume. I'm happier treating it as a self contained problem (in which case I'd say that the form quoted by CrazyStat is underspecified); but if you're familiar with the TV show it's based on, you can reasonably assume that he always opens a door with a goat and gives you a chance to switch.
My objection is to the claim that "most people get it wrong", if most people are being fed the underspecified problem. I think the gut reaction is not to switch, because in most comparable situations across human experience it would be a mistake (imagine a similar situation at a sketchy-looking carnival game rather than a TV show). They then try to justify that formally and make mistakes in their justification, but the initial reaction not to swap is reasonable unless they've been convinced that Monty Hall always opens a door with a goat and gives a chance to switch.
> My objection is to the claim that "most people get it wrong", if most people are being fed the underspecified problem. I think the gut reaction is not to switch, because in most comparable situations across human experience it would be a mistake (imagine a similar situation at a sketchy-looking carnival game rather than a TV show)
This may have a role to play. However there is a long history of people who aren't "going off their gut", including statisticians, getting this wrong with a very high level of confidence. It seems pretty clear that there is more than just an "underspecificity" problem. If you properly specify the problem, you will get similar error levels.
I agree, but I believe the reason for the errors is because people intuitively have a pretty good grasp of the game theory for the situation where someone is trying not to give you something they promised (and it's the sort of thing where IRL you shouldn't believe somebody trying to convince you to change your mind, so it's a useful bias to ignore parts of the problem even when it is fully specified). I believe that the statisticians then try to justify that, and end up making incorrect arguments.
> I agree, but I believe the reason for the errors is because people intuitively have a pretty good grasp of the game theory for the situation where someone is trying not to give you something they promised
Unfortunately this doesn't match reality. The vast majority of people who got the problem wrong when it was first published were not confused about the rules; they insisted that the chances are even (the same chance of getting the car whether you switch or not). That doesn't match a theory that these people think the host is trying to trick the player in some way.
Additionally, you can reframe the problem and will still see significant error rates.
It contains a clarification that the article omitted from the description:
> So let’s look at it again, remembering that the original answer defines certain conditions, the most significant of which is that the host always opens a losing door on purpose. (There’s no way he can always open a losing door by chance!) Anything else is a different question.
Yes, if the host opens a door, it will always be a losing door. Nobody is disputing that.
The part that is underspecified is: does the host always open a door, and if not, does the player's choice of door impact whether the host opens a door?
I think you should take the time to understand why this explanation matters. It reveals some important things about how people can make mistakes with statistics. Not understanding something doesn't make it "cope".
Did you not read the entire article on how scads of intelligent people got this wrong? And the explanation of why they got it wrong? It’s like following a map that carefully routes you around a sinkhole, and then stepping right into the sinkhole.
I read the entire article. They all had defective reasoning. The player picked an option with a 1/3 chance of being right and a 2/3 chance of being wrong. The host’s action did not change that. However, the host’s action did make the remaining door have a 2/3 chance of being right and a 1/3 chance of being wrong.
This is downvoted, but correct. Which door is right and which door is wrong doesn't get reshuffled when the host removes a wrong door, so even though there's only 2 doors left the chance isn't 50:50, it's still 33:67 - with the player having most likely chosen a wrong door.
It's not correct. P(A|B)=P(A) only if A and B are independent.
Requiring independence in this case literally means "the host opens the door regardless of the player making the right or wrong choice first time". It's a core assumption in your calculations; without it the math is not correct.
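To spell out the conditioning step, here is a minimal sketch under the standard assumption that the host always opens an unpicked goat door, choosing uniformly when he has two options. In that case the posterior for your door does come out to 1/3 -- but you only know that after doing the conditioning:

```python
from fractions import Fraction

half, third = Fraction(1, 2), Fraction(1, 3)

# You pick door 1. Joint probability of (car location, door the host opens),
# under the standard assumption: the host always opens an unpicked goat door,
# choosing uniformly when both unpicked doors hide goats.
joint = {
    (1, 2): third * half,  # car behind your door: host may open 2 or 3
    (1, 3): third * half,
    (2, 3): third,         # car behind 2: host is forced to open 3
    (3, 2): third,         # car behind 3: host is forced to open 2
}

# Condition on the observed event "host opened door 3"
opened_3 = {car: p for (car, opened), p in joint.items() if opened == 3}
total = sum(opened_3.values())
posterior = {car: p / total for car, p in opened_3.items()}
print(posterior)  # {1: Fraction(1, 3), 2: Fraction(2, 3)} -- stick keeps 1/3, switch gets 2/3
```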
I need to write a blog post or something convincing everyone we need to stop talking about the Monty Hall problem and replace it with a new problem with all the ambiguities removed. (Unless ambiguity is the point, then Monty Hall is fine.)
There are no ambiguities in the Monty Hall problem. It's usually people who skim read and make assumptions about the challenge. No new problem is going to stop people from skim reading.
For example, going by that ddg search, one result is making a fuss about not knowing whether Monty opens a door at random and happens to show a goat, or purposefully opens a door with a goat behind it. But we do know: it's always on purpose, Monty never opened a door with a car behind it, thus prematurely ending the bet. So there's no ambiguity.
The problem is cool because the right answer doesn't seem intuitively right, even though it can be formally shown to be right.
> But we do know: it's always on purpose, Monty never opened a door with a car behind it
We only know that if the problem tells us. Sometimes it doesn't.
> There are no ambiguities in the Monty Hall problem
The problem has been written up thousands of times. I'm sure that some writeups are sufficiently unambiguous, but many are not. For example, consider the two "variants" described by this comment https://news.ycombinator.com/item?id=8664550
> The host selects one of the doors with a goat from the remaining two doors, and opens it.
> The host chooses one of the remaining two doors at random and opens it, showing a goat.
This commenter was trying hard for semantic precision, and yet, I think if you encountered the first variant in isolation it would be perfectly reasonable to interpret it as "The host [randomly] selects one of the doors with a goat [although he might have selected the prize]" even though this is clearly not what the commenter was attempting. If you disagree, that only proves my point: this problem is prone to silly and wasteful semantic debate, rather than the interesting probability result it should be focused on.
The original wording was "You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat."
I think there's a good argument that the intended interpretation should be the favored one. If he doesn't use the knowledge to always reveal a goat, why did they bother specifying that he has the knowledge?
But it still doesn't quite explicitly say that he's not picking randomly.
What if he used the knowledge to decide whether or not to open a door and offer you the choice? I think that scenario is why most people instinctively want to not switch (even if they aren't consciously aware of it), so it's a pity to disregard it.
It seems to me you're answering your own question. There's only one reasonable interpretation; you have to go out of your way (unreasonably!) to find any ambiguity.
Nah, there's a lot of space between "this should be the favored interpretation" and "this is strictly implied" - especially in the context of a math puzzle!
But this makes no sense. You're proving the "counterintuitive answer" with probability theory, but when confronted with the fact that your proof is not mathematically correct you say "this is a simple brain teaser". I don't see how that changes anything. If you rely on probability theory then you have to do it correctly; there's no special brain-teaser version of math.
It's like going around saying that "the counterintuitive answer to the equation x/x=1 is 17" and then "proving it" by dividing 17 by 17. When confronted with the fact that in math solving an equation means finding ALL solutions and not just one, you say "it's not at all about rules-lawyering the premise of the puzzle". Well, avoid dealing with the math then, because the math is in fact all about the rules.
> But this makes no sense. You're proving the "counterintuitive answer" with the probability theory but when confronted with the fact that your proof is not mathematically correct you say "this is a simple brain teaser"
The answer IS mathematically correct. It's just counterintuitive and it made some PhDs trip.
Nobody here -- not even you, before -- was arguing it's mathematically incorrect. Some people, when told the right answer, claimed the problem is underspecified and admits more than one context that may change the answer, which is not at all the same as saying the answer is mathematically incorrect! Not even Diaconis, the person some of you are so eager to defend (for some bizarre reason), claims it's "mathematically incorrect"!
I do argue that. In probability theory you can't assume independence of two random variables unless it's a part of the problem statement because this assumption changes everything. Where I got my degree this would be considered an error and an incorrect solution to the problem.
It's no different from how in middle school you might incorrectly assume that "x is positive" when solving x^2=4 in R. The answer "x=2" is mathematically incorrect, because it misses x=-2.
Well, you're wrong. What's worse, the followup by vos Savant clarified this.
> In probability theory you can't assume independence of two random variables unless it's a part of the problem statement
In most logic puzzles you can safely assume an interpretation of the problem that makes sense and which doesn't go into extraneous tangents. Going "well, akshually, if we assume a spherical goat..." is usually a bad sign.
Frankly, all of this still reads as an a posteriori rationalization for finding the solution to the straightforward formulation of the puzzle counterintuitive.
> Frankly, all of this still reads as an a posteriori rationalization for finding the solution to the straightforward formulation of the puzzle counterintuitive
Luckily, when I was introduced to that problem many years ago it was presented correctly, and the answer, albeit counterintuitive, was perfectly clear. Developing an intuition for it was also rather easy (what is the chance I guessed incorrectly on the first try? That's the answer).
What's above is my reaction to the incomplete formulation of the problem and an incorrect answer that follows.
The reason I'm so annoyed by this is because probability theory is very fragile and only works when applied with absolute precision. If you follow your approach with the Two Envelopes Problem and make some "reasonable assumptions", you get a crazy answer (always switch). And people who are in the business of logic puzzles rather than probability theory wouldn't even know the difference.
Therefore I would rather discourage people from working on logic puzzles and suggest doing the actual math instead.
I imagine one reason people have a hard time with the Monty Hall problem is that they have learnt a rule that seems to fit but really doesn't. A person not trained at all in math might do better, as they haven't learnt that rule.
There's probably a name for that cognitive bias, but I don't know it.
I'm still working on it, but I think a key idea is that 1) The "host" tells you he is going to eliminate one of the two losing options, and then allow you to choose from the two remaining options. 2) The host allows you, if you wish, to "protect" one of the options from being eliminated. If you choose to protect, you may still choose either of the remaining options after elimination occurs.
The correct solution is to protect one of the options and then choose the other option.
The real challenge is coming up with appropriate flavor text for this idea, but I think I'll get it eventually.
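A quick sanity check of that reformulation (my own sketch; the "host" eliminates a uniformly random losing option among the unprotected ones) confirms that protect-then-switch wins about 2/3 of the time:

```python
import random

def protect_game(protect_then_switch: bool) -> bool:
    """One round of the 'protect' reformulation: the player protects one of
    three options, the host eliminates a losing option among the unprotected
    ones, and the player then picks from the two survivors."""
    options = [0, 1, 2]
    winner = random.choice(options)
    protected = random.choice(options)
    # Host eliminates a losing, unprotected option (at random if he has two).
    eliminated = random.choice([o for o in options if o != winner and o != protected])
    survivors = [o for o in options if o != eliminated]
    if protect_then_switch:
        pick = next(o for o in survivors if o != protected)  # take the *other* survivor
    else:
        pick = protected                                     # stay with the protected option
    return pick == winner

n = 100_000
for strategy in (True, False):
    wins = sum(protect_game(strategy) for _ in range(n))
    print(f"protect-then-switch={strategy}: win rate ~ {wins / n:.2f}")
# protect-then-switch=True  -> ~0.67
# protect-then-switch=False -> ~0.33
```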
The classiest person I ever knew placed me before him on a paper just to be nice. Not only was he responsible for getting the grants funding our research, he also carried the heft of both the theoretical work and the writing of the paper. I'm no longer in R&D and/or academia, but at that point I decided to do the same for someone if I ever have the opportunity to write a noteworthy paper with someone who is my junior.
This is cool enough. But I find the "celebrification" style of the piece a bit off putting. Did I really need to see multiple posed shots of this young man reposing in various university settings? It's like we need our own version of La La Land to glorify the survivors of computer success to motivate more to participate.
This is a cool and important one. I was happy to see the article celebrating the young scientist. This is well deserved. Congratulations to him. And I hope this inspires other undergraduates as well.
I presume you've never read any quanta magazine pieces before. They like to talk about & emphasize the human side of these discoveries too, instead of merely focusing on the abstract.
"It's like we need our own version of La La Land to glorify the survivors of computer success to motivate more to participate."
It wouldn't be so cringe if they talked about the technical side too. The article says nothing about what "tiny pointers" are. Andrew Krapivin doesn't have a GitHub with any code. His paper is gated so I can't read it. Do they expect me to venerate this kid on faith? We should be celebrating developers whose work is open source, but instead the establishment relegates them to the lowly thankless role of janitors. The only reason a pop science journo would write an article about an open source developer would be not to lionize them but rather put them in their place for a code of conduct violation.
Specifically, they focus on the human-interest part and a pop-science intro to the field, and avoid discussing the actual result, because the closer they get to the actual piece of research, the more mistakes they make in describing it.
That's exactly why I'm here scrolling, finding your comment, yes. TFA is way too 'pop sci', but I know the paper will be too much for me at midnight, if not ever, so where's my 'explain it like I know what a hash table is' story?
I'm suggesting that a logical question readers of Hacker News might want to consider when they get excited about a new result in CS, physics, biology, etc. is: what are the practical circumstances that fostered these results? i.e., how did someone get paid to work on these problems?
In this case the funding source was NSF...
This is particularly pertinent to ongoing attacks on the federal research infrastructure of the United States:
* The current administration has proposed to cut NSF's funding from $9B/year to $3B/year.
* The current administration is trying to impose retroactive changes to already negotiated grant indirects that will result in budget shortfalls of hundreds of millions of dollars. One immediate consequence of this is dramatically fewer graduate students will be admitted to graduate programs across the board; undergraduate summer research programs will be shut down; etc.
Now we have faster data structures we can fill that extra time by writing less efficient code, and loading more pointless libraries. This is the march of computer science.
I read through this and I'm not sure if people have heard of dictionary trees for hash tables. Of course, quantamagazine.org has been known to sensationalize these types of things.