I think this is spot on, and I'd like to add my own thoughts:
1 -
An oft-overlooked area where performance matters is concurrency. It feels like it's taboo to talk about performance and concurrency together, and it can be dangerous to mix the two too much, but they do go hand-in-hand to some degree. Take a web app. If you get 10K requests per second and it takes 500ms to process each one, you'll need 5000 workers (threads, processes, whatever), and some people will have to wait a second to get a response. Drop that processing time to 5ms and you not only need far fewer workers, you also decrease response times across the board.
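That worker arithmetic is just Little's law; here's a back-of-envelope sketch using integer millisecond math and the hypothetical numbers from the example above:

```python
def workers_needed(req_per_sec, service_ms):
    """Little's law: concurrent workers ~= arrival rate x service time."""
    # Integer math in milliseconds keeps the back-of-envelope exact.
    return req_per_sec * service_ms // 1000

# 10K req/s at 500ms each ties up 5000 workers at any instant...
print(workers_needed(10_000, 500))  # 5000
# ...while at 5ms each, 50 workers keep up.
print(workers_needed(10_000, 5))    # 50
```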
2 -
Databases in general, and B-trees specifically, are a great example of generalization. Redis shows us the performance benefit of using more specialized data structures (yes, Redis is in-memory, but the data structures are more important than the fact that it's in memory (though the two are closely related)). At an extreme, if you need to list results in order, nothing beats an array.
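A minimal sketch of the "nothing beats an array" point, assuming the only requirement is listing results in order (a Python stand-in, not how Redis does it):

```python
import bisect

class SortedArray:
    """Keep items sorted on insert so in-order listing is a plain scan."""
    def __init__(self):
        self._items = []

    def add(self, item):
        # O(n) insert into contiguous storage; cheap, cache-friendly reads.
        bisect.insort(self._items, item)

    def in_order(self):
        return list(self._items)  # already ordered, no tree traversal

scores = SortedArray()
for s in [42, 7, 99, 7]:
    scores.add(s)
print(scores.in_order())  # [7, 7, 42, 99]
```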
3 -
The GC doesn't eliminate the need to think about memory. I've seen a number of performance-sensitive applications that used only a fraction of the 32GB or 64GB of memory that was available. A lot of developers just don't think about memory anymore. (I can't find it, but I remember a popular post that talked about this with respect to iOS' better performance than Android (while having fewer resources available).)
> I can't find it, but I remember a popular post that talked about this with respect to iOS' better performance than Android (while having fewer resources available)
This may have been it, and is a great read anyway if it wasn't:
Concurrency is also a problem if you access or lock resources blindly. I once sped up a Java webapp by factor 10 by removing an unnecessary lock in the authentication and authorization code.
Performance problems aren't always tied to CPU usage. Waiting for disk IO is the most common example, but, as you say, coarse locks can also be problematic. Probably the worst I've seen are threaded web frameworks used for making external API calls (to Facebook, say). You can have a machine doing 0 CPU and disk, but completely locking up.
> Probably the worst I've seen are threaded web frameworks used for making external API calls (to Facebook, say). You can have a machine doing 0 CPU and disk, but completely locking up.
Or the intranet equivalent: LDAP group resolution for a user. So slow ... so incredibly slow. I've never seen an application which cannot be (massively) sped up by caching this information after retrieving it the first time.
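A minimal sketch of that cache-it-after-the-first-retrieval idea; `slow_group_lookup` is a hypothetical stand-in for the real LDAP round trip, and the 300-second TTL is an arbitrary choice:

```python
import time

class TTLCache:
    """Cache an expensive lookup (e.g. LDAP group resolution) for ttl seconds."""
    def __init__(self, fetch, ttl=300.0, clock=time.monotonic):
        self._fetch = fetch      # the slow lookup function
        self._ttl = ttl
        self._clock = clock
        self._entries = {}       # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]        # fresh enough: skip the slow call
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value

calls = []
def slow_group_lookup(user):     # hypothetical: pretend this hits LDAP
    calls.append(user)
    return ["staff", "wiki-users"]

groups = TTLCache(slow_group_lookup)
groups.get("alice")
groups.get("alice")              # second call is served from the cache
print(calls)                     # only one slow lookup happened
```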
Another related issue is the common advice to "not reinvent the wheel" and use standard library routines and algorithm implementations, often stating how unlikely it is that you would be able to code something better than that yourself. However, this is not exactly true. While the authors of these standard routines may be experts in implementations of their favorite data structures and algorithms, they are not experts in the exact needs and requirements of your application. The standard libraries are used by everyone, so they have to be balanced to work well in a wide variety of use cases. To support this, standard library authors have to make different tradeoffs than what would be optimal for your specific application. While it is true you may not be able to write as good a generic implementation that works well in all use cases (yet), it is quite doable to create an implementation that is superior to the standard for just your use case.
Of course there are other concerns in doing this, such as testing, maintenance, etc., but from a performance point of view the idea that "you couldn't write a better version than the expert's implementation in the standard library" is not really true.
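A toy example of the tradeoff, assuming your application only ever sorts byte-sized integers: a counting sort can exploit that constraint, which a general-purpose comparison sort can't assume (whether it actually wins on your data is exactly the kind of thing to measure):

```python
def counting_sort_bytes(values):
    """Specialized sort for ints in 0..255: one counting pass, no comparisons."""
    counts = [0] * 256
    for v in values:
        counts[v] += 1
    out = []
    for v, n in enumerate(counts):
        out.extend([v] * n)   # emit each value as many times as it occurred
    return out

data = [200, 3, 3, 17, 255, 0]
print(counting_sort_bytes(data) == sorted(data))  # True
```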
I think the argument for using libraries is almost always one of solving for "adequate" performance, not best-case. If you want a library you can actually reuse, you're designing it so that it will work in many situations, and when it doesn't, the design should also support copy-pasting some of the code and reworking it to fit the application. Playing "guess the optimization" adds configuration bulk which can negate the ability of the library to actually specialize in this way.
There are specialties, of course, where the point of the library is to be a non-trivial, high-performance implementation. Those are designed towards a specific domain, though.
> Of course there are other concerns in doing this, such as testing, maintenance, etc., but from a performance point of view the idea that "you couldn't write a better version than the expert's implementation in the standard library" is not really true.
Perhaps, but the bigger problem is that usually in order to know you've written the better version, you need to have actually used the library in a use case as close to the real one as possible and MEASURED.
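For Python code, one way to do that measuring is `timeit` against inputs shaped like production data; the two candidates below are arbitrary placeholders for "library routine" vs "hand-rolled version":

```python
import timeit

setup = "data = list(range(1000))"
# Generic library routine vs a version specialized to already-sorted input.
generic = timeit.timeit("sorted(data, reverse=True)", setup=setup, number=1000)
special = timeit.timeit("data[::-1]", setup=setup, number=1000)
print(f"sorted(..., reverse=True): {generic:.4f}s  slice reversal: {special:.4f}s")
```

Whichever wins on your workload is the one to keep; intuition is a poor substitute for the numbers.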
An excellent article, though I think it should have separated out technical debt as its own item instead of cramming it under "compound interest" and covering it only briefly.
Most of the time I've seen performance be a huge mountain to climb precisely because there is so much technical debt. The system ends up complexity-locked due to the huge resources necessary to pay down the debt, which ironically usually merely increases the unwillingness to do so.
In order to address performance well you usually need a system that is understandable, straightforward to modify and extend, and possible to have high confidence in its correctness. Unfortunately, those are often unreasonably high goals for many projects, so performance languishes along with all the other desirable aspects of well-engineered systems such as robustness, elegance, security, good UX, etc.
I don't think he meant compound interest as technical debt in the usual sense. He's talking about building slow abstractions on top of slow abstractions and multiplying your slowdown. The code itself might be very clean code that's easy to understand. Usually, performance-optimized code is really nasty looking and hard to understand.
The thesis of the article which I see as "Design for Performance" strikes me as wrong, but I'm interested in hearing why my intuitions are off.
1. Done is better than perfect.
In general I agree with this meme. I have started lots of projects that didn't get completed (zero value to anyone else) and seen lots of features never get off the ground because they were stuck in a design phase (with people arguing over design) rather than just picking anything and getting it done.
So perhaps my gut reaction to this article is that it is very very easy to become obsessed with trying to think through performance design decisions up front, and if you're the kind of person who falls into this trap, you need to ignore the article and focus on "Done" before "Fast."
2. Performance is perception
The performance of an application is highly subjective and discontinuous. If it takes 1-150ms to render a page, for most people that is almost exactly the same level of perceived performance. If I take 300 to 1000 ms to render my page, it is now sluggish. If I take 5s or more to render my page, it is either broken or time to get a cup of coffee and "wait for the durn thing to load."
It's this weird stair-step nature to performance that doesn't map nicely to big O notions of complexity, factors like 100x or any other simple continuous, numerical scale.
That's why phrases like "unnecessary work" strike me as unhelpful. The work simply doesn't matter until it has a noticeable impact. If you've got a simple abstraction that re-renders the page every 60th of a second even if nothing has changed, the abstraction has value and the work is neither beneficial nor detrimental, until you cross a perceptual boundary.
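And if that abstraction ever does cross the perceptual boundary, the standard fix is a dirty flag, so the per-frame tick becomes a no-op when nothing changed; a minimal sketch:

```python
class Renderer:
    """Re-render on a 60Hz tick, but only do real work when state changed."""
    def __init__(self):
        self._state = {}
        self._dirty = True
        self.renders = 0

    def update(self, **changes):
        if changes:
            self._state.update(changes)
            self._dirty = True       # mark that the next frame must redraw

    def frame(self):
        if self._dirty:              # otherwise the tick is a cheap no-op
            self.renders += 1        # stand-in for the actual drawing work
            self._dirty = False

r = Renderer()
for _ in range(60):                  # one second of ticks, nothing changing
    r.frame()
r.update(title="new")
r.frame()
print(r.renders)  # 2: one initial render, one after the change
```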
3. It's not done until it's fast (enough).
This meme is also something I agree with. Development can be done such that 1. Things work and then 2. They are made fast. Perhaps the wisdom the author is trying to impart is that in his experience 2 never happens and you must realistically reverse the order.
I liked the article, but I didn't understand the meaning of this paragraph:
> The string map< occurs 588 times in the WED code base. set< occurs 822 times! Approximately 100% of this code is running a lot slower than it could be. This performance problem is the cumulative result of using slow-constant-time coding techniques for years.
Why does 100% of this code run slower than it could? Maps and sets give you quick inserts and lookups, which arrays don't. Without knowing the context in which these 1400-odd structures are used, how can we really know that they're not the proper choice?
std::map and std::set are designed to support a large set of use cases, and as a result, cannot really be optimized.
In particular, the iterator invalidation semantics prevent implementers from using more cache-friendly data structures like B-trees.
I agree wholeheartedly.
Sadly, good performance is impossible to maintain for the code base I am working on. After years of very fast iteration without optimization (or even good architecture), and with a product team pushing for even more new features at an accelerated rate, the technical debt is spiraling out of control.
I am not sure what can be done in these conditions. After 18 months at that company, I will very probably move on to another job soonish, for many reasons, leaving mostly inexperienced engineers on my team, ready to reproduce even more mistakes.
> I am not sure what can be done in these conditions
Well, I'd say you are on the right track here:
> I will very probably move on to another job
;-) On a more serious note, it's important to a) have a plan for dealing with technical debt prepared beforehand and b) communicate clearly about what is caused by the unpaid debt. At some point the decision makers will want you to tell them what you can do about it, which is when you present a), and hopefully get enough time and budget to fix at least some of the most glaring problems. The only other option for a project - if you know there won't be any decision to deal with technical debt - is to fail. In which case moving on is the best course of action.
hahaha, yeah sadly it seems to be the reasonable choice.
I demonstrated an enormous memory leak in our app (not really difficult to do when you look for it with the tools provided by the SDK). I explained why it happens (shitty architecture is shitty) and proposed to prototype alternatives.
That was 8 months ago; no time to do it (for me or anybody else on the team), we have features to add.
Meanwhile, OutOfMemoryException is our main cause of crashes.
Sigh.