“We’ll add an index when it gets slow” is saying “screw the guy who’s on call the night this system starts timing out”.
Invisible cliffs in your code that it will fall off at some unknown point in the future are the antithesis of engineering.
If you deliberately aren’t implementing indexing, know at what point indexing would begin to matter. Put in place guard rails so the system doesn’t just blow up unexpectedly.
There are 100% cases where you shouldn’t use indexes. That’s not the same thing as being able to get away without an index.
> Invisible cliffs in your code that it will fall off at some unknown point in the future are the antithesis of engineering.
I think this describes the general shape of a problem seen in any engineering domain. Past a certain (unknown) point in time, we can no longer guarantee certain properties of the system. This is why we add all kinds of qualifiers and constraints in our discussions with the customer.
Certainly, you can make some predictions about how tables will grow over time, but there are also potential headaches if you arbitrarily add indexing. "We might need to expire records to manage growth," so you go ahead and index that CreatedUtc column. Turns out, the usage patterns of your app changed such that that table is now experiencing 20x more insert volume than you could have ever anticipated, and your prediction is now a gigantic liability. At 20x insert volume you would have _never_ indexed that column; you would have used some alternative path involving temporary tables, etc. Now you are stuck with essentially the same problem: a judgement call at 2am about whether or not to drop an index while the rest of your team is in bed.
Since I am unable to predict the future, I feel like waiting until the actual problem arises and then dealing with it at that moment is the best option. Strategically, a missing index will likely fail more gracefully than an index that is bottlenecking requests during peak load (i.e. instantaneous vs. cumulative data volume).
The key to sleeping at night is to add metrics and alarms near those cliffs or 'edges of the map' so they aren't invisible anymore. It's hard to anticipate every angle to this but in any case where you're making this kind of assumption or tradeoff it's a good idea.
A simple example is setting an alarm on your max load test traffic. When you get that alarm you know your system is now operating in unknown territory and it's probably time to do another load test or at least take a close look at scaling.
Largely agree with this take; it often becomes "add an index because the query might be slow" without much discussion of the trade-offs around query patterns. There's a lot of "what if" over-engineering where you end up designing for hypothetical scenarios.
Look at your real-world use cases and query patterns, and think hard about it.
> screw the guy who’s on call the night this system starts timing out
This was a very small billing practice, and that person was going to be me. I thought then, and still think now, that I made a reasonable trade-off between the work required at the time and potential urgent work in the future.
Additionally, this wasn't the sort of thing that would fail suddenly when you hit a critical point. Instead, running full text searches (a very small fraction of what the database was handling) would just slow down in a way that would have been very clear to the users.
Stealing time from future you also doesn’t pay off. Future you wants to take vacations and change jobs and have family events and sleep and stuff.
It doesn’t take much work to do the back-of-the-envelope math:
How long should these queries take? Less than two seconds?
How much slower do they get as record counts increase? 100ms every 250,000 records? Okay, so this will become intolerably slow when the record count hits about 5 million. When’s that going to happen? Never? Then we’re done here. Within a few months? Well let’s plan for that now. Within a few years? Maybe not within the lifetime of this system? Judgement call. Let’s put something in place that forces us to at least look back at this before it gets bad. Even if that’s just a google calendar reminder.
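That arithmetic can be written down directly. The 100ms/250,000-record figures are the hypothetical from the comment above, and the growth-rate numbers below are equally made up, just to show the shape of the calculation:

```python
# Back-of-envelope: if each additional 250,000 records adds ~100 ms to
# the query, when does it cross a 2-second tolerance? All numbers are
# hypothetical, taken from or in the spirit of the comment above.

tolerable_ms = 2000
ms_per_batch = 100
records_per_batch = 250_000

records_at_cliff = (tolerable_ms / ms_per_batch) * records_per_batch
print(f"Intolerably slow at ~{records_at_cliff:,.0f} records")  # ~5,000,000

# Compare against a (hypothetical) growth rate to turn that into a date:
records_per_month = 50_000
current_records = 800_000
months_left = (records_at_cliff - current_records) / records_per_month
print(f"Roughly {months_left:.0f} months until the cliff")
```

Five minutes of this tells you whether the answer is "never", "next quarter", or "set a calendar reminder".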
When building systems we are always making trade-offs between the present and the future. It's only "stealing time from future you" when you make a bad trade-off; otherwise it's prioritization.
In this case, leaving out an index for full text search meant I didn't need to maintain the software for that index either, which would have been stand-alone at the time I was building this. Possibly this even was a choice that was, in expectation, time-saving for future me.
Feeling a little bit like you’re applying a motte-and-bailey argument here.
The bold claim in the article was that there are many circumstances where adding an index isn’t necessary. Diverse examples were given. These included a MySQL database, where adding an index is no additional maintenance overhead whatsoever (FULLTEXT idx in the table DDL). The implication was that there are many circumstances that affect many people where even cheap and easy to implement indexes were not needed.
Now, when challenged, you’re retreating to the more defensible position that when you’re querying an ad hoc data structure where you would have to implement your own index, that is often a bad trade-off. Or maybe it was a bad trade-off because this was a small app you were building that you knew you would be the long-term maintainer of.
But that’s not what I’m talking about, is it? And I don't know all the specific circumstances of your single specific project where this was the right choice, so we’re not going to get anywhere.
> These included a MySQL database, where adding an index is no additional maintenance overhead whatsoever (FULLTEXT idx in the table DDL)
MySQL didn't add support for FULLTEXT until v5.6, released 2013-02, a few years after I was working on this. At the time if I had wanted a full text search index it would have needed to be an additional service.
Looking now (no longer on my phone) it's a bit more complex than that: while MySQL has supported FULLTEXT indexes since 3.23.23 (2000-09-01) [1] if you wanted to use InnoDB (and you probably did -- it was much better than MyISAM [2]) you initially couldn't use FULLTEXT. That was added in v5.6 [3], and at the time I was developing this software the standard option was to set up Sphinx.
I've edited the post to add some of this history, so future readers understand this was about whether to add a dependency on an external indexing service.
Most systems get scrapped and never go to production. Early stage is validation and search for a useful concept. It often pays off very well to steal time from the future.
Indices are far more likely to cause a cliff than to remove one, IME. Full table scans are linear and grow linearly. Indices are crazy fast until one day they're suddenly not.
I don't think I've ever seen such a cliff where the database is using the same query plan?
There's sometimes small incremental slow downs as you pass btree depth thresholds and have to add more, but tbh even that's not usually noticeable.
I have seen plenty of instances where the database decides an index is no longer the right access path, but gets that wrong, and the result is terrible. But I see that as a different issue.
> There's sometimes small incremental slow downs as you pass btree depth thresholds and have to add more, but tbh even that's not usually noticeable.
Even this kind of "narrow" slowdown can be substantial. Well, I don't know specifically that it's depth thresholds, but that kind of behaviour - the table passes a certain size and it's suddenly taking 30% longer to insert/read than it did when it was 2% smaller.
> I have seen plenty of instances where the database decides an index is no longer the right access path, but gets that wrong, and the result is terrible. But I see that as a different issue.
IMO it's the same issue, at least from the perspective of the poor chump on call.
> Even this kind of "narrow" slowdown can be substantial. Well, I don't know specifically that it's depth thresholds, but that kind of behaviour - the table passes a certain size and it's suddenly taking 30% longer to insert/read than it did when it was 2% smaller.
This has usually been a memory problem for us - our hot dataset no longer fits in it.
As the number of cliffs increases, so does the likelihood that you will hit a second cliff in the middle of fixing the first one. At some point you hit a Kessler Syndrome situation where people start quitting and the people who should have fixed the problem in the first place aren't around anymore.
Nobody wants to stick around for a lecture they can see coming.
In enterprise software, I've noticed systems tend to work in a WUCT'ed-up manner. Things may slow down over time, but no one complains about it because "the software is just slow", then suddenly one day you hit the timeout of some higher layer and the software is completely and utterly broken.
Well, this isn't a valid reason. It's not the software that is causing the WUCT.
If you let WUCT run free in any complex process, you are just asking for everything to break all the time and for people to do nothing but fire-fighting. So you are much better off dealing with the causes of the WUCT instead of insisting that software, or any other component, scale forever. (Because it never will, and you will just make something else break.)
WUCT is one of the main reasons why we monitor system operations.
Everything has a critical threshold. Even your perfect db schema will still eventually run out of disk. There are no magic solutions that work on all scale levels.
Yup. Quarter end reporting has been going swimmingly for years but this time the orders query is hitting a connection timeout and the numbers are due tomorrow.
That threshold doesn’t gradually creep up on you, it hits you all at once at the worst possible time.