Free MIT Course: Performance Engineering of Software Systems

pclmulqdq · 2024-01-11T01:54:07.000000Z

As an engineer who has spent a lot of time on performance engineering, I have to say that this course has a lot of the "cool" parts of the job, but not a lot of the actual nuts and bolts of it. Practical performance engineering is primarily about measuring things.

In large software systems it's not obvious at all where your hotspots actually are and what you actually should do to take care of them. Further, once you do an optimization, you need to be able to make the right measurements to prove that your optimization actually worked. A lot of optimization also involves parameter tuning (eg how many threads to do what), and that is incredibly dependent on having good measurements.

Once you know what to do, the actual optimizations you need to do - even assembly-level tricks - are usually somewhat straightforward in comparison to the measurements you do to get there. If you don't do the measuring, you are going to end up doing a lot of high-effort, low-return work.

explaininjs · 2024-01-11T03:42:39.000000Z

I really don’t know where this mis conception keeps coming from. The course very explicitly makes clear the need for profiling and instrumentation, and the lab write ups require breaking down the impact of each change you made. But that’s all trivial compared to the main content of the class, it can be taught in a single lecture (+ hands on guiding in recitations/homework). After that is where the “ok X specific section is slow, now what?” comes in.

pclmulqdq · 2024-01-11T15:00:56.000000Z

The mistake in your comment - and probably the course - is the assumption that the measurement, which you limit to only profiling and a vague notion of "instrumentation" (which I assume usually means "reading the performance counters"), is the easy part of performance engineering. Actual performance engineering is rarely as simple as "this huge section of code is slow and hot and hasn't been hyper-optimized yet, let me hyper-optimize it." I have done projects like that in real systems, but they are few and far between compared to projects that involve a lot more measurement and a lot less assembly-level code. Usually, performance engineering either means (1) finding someone doing some O(n^2) shit and diplomatically telling them that they were stupid or (2) finding a relatively subtle and diffuse source of slowness in your system and making equally subtle changes that speed things up significantly.

See the following paper for an example of real life performance engineering, where the engineers involved slowed down their own code to create a significant improvement in general application performance: https://www.usenix.org/system/files/osdi21-hunter.pdf

As another example, performance engineering in trading systems often involves figuring out how to do non-invasive measurement of events that cause systemic tail latency so you can find the bottlenecks and slow parts. If you do the hamfisted things that most engineers think of ("let me chuck performance counter checks everywhere"), you will prevent your system from making money and often destroy the usable signal.

explaininjs · 2024-01-12T18:51:34.000000Z

Both 1 and 2 are solved by profiling. The course involves assembly because it’s taught in C, but the core concepts apply to any language - the particular language is just an implementation detail you (and several others) are getting needlessly hung up on.

Your trading system example sounds like a great topic for a masters thesis or similar graduate level work (or better yet - industry), not a core component of an introductory performance engineering class.

pclmulqdq · 2024-01-12T19:45:49.000000Z

> Both 1 and 2 are solved by profiling.

This is a very simplistic view of how software measurement works that is pretty pervasive in academia, but doesn't actually translate all that well. If you're sticking to only typical profiling methods (which tells you very little about things like latency and doesn't tell you at all about many sources of slowness), you still need to find a way to do it in a low-impact way if you actually want to measure a software system of any complexity. As an example of another technique, tracing (and even just reading logs) can give you a lot of interesting signal that profilers don't generally capture.

Most commercial software systems have a number of lines of code (and a code profile, by the way) comparable to the Linux kernel, so if you're just going to apply a simple profiling methodology to it, you're going to get a lot of crap data and you're going to slow things down a lot. Performance engineering is about extracting signal from the crap.

> The course involves assembly because it’s taught in C, but the core concepts apply to any language

I never said I wrote assembly, I said assembly-level code, which you can write in most languages. You can write it in Python if you are skilled enough at Python. Many people do it in Java and Go. I usually do it in C. Most of this kind of code these days is not actually written in assembly.

> Your trading system example sounds like a great topic for a masters thesis or similar graduate level work (or better yet - industry), not a core component of an introductory performance engineering class.

The primary point I am trying to make is that a class called "micro-optimization" should be teaching you how to micro-optimize code. A class called "performance engineering" should nominally be teaching you about how to actually do performance engineering, which is actually not all that related to micro-optimization.

explaininjs · 2024-01-12T22:16:22.000000Z

I’ve used techniques I both learned and taught in that class to dramatically speed up many real world large scale systems. Yes, not every aspect taught is applicable to every system (one thing we mentioned many times is that many of the micro-optimizations we explained were done by a modern compiler anyways, which is why we even brought up assembly). But those techniques are still useful historic context for the subject, which is exactly what you’d except an introduction class to include for the first lecture, along with a live demo that gets people excited about the class and willing to keep going with it.

The class is not about micro-optimization. The first couple lectures are. But people in this thread love to read their titles and stop there. I don’t get it. Here are the lecture notes for measurement and timing, you’ll notice they include what you say about separating signal from noise. I’m sure you, with tons of industry experience behind you, know more than this introductory undergraduate’s lecture in the topic provides (it’d be quite sad if you didn’t!), but that does not mean the class does not indeed provide an introduction to the topics. One that, again, I have personally built upon to dramatically speed up countless medium-to-large scale systems in a wide variety of contexts. https://ocw.mit.edu/courses/6-172-performance-engineering-of...

explaininjs · 2024-01-10T20:30:24.000000Z

I TA’d this course this semester (Fall 2018 - wow the time flies), happy to answer questions.

(Though I’m currently backpacking through the Andes so you may need to be patient)

dmlittle · 2024-01-10T20:49:59.000000Z

Thank you for TA'ing the class! I took it in 2015 and TAs really made the class for myself and most of my friends.

Is the end of semester Leisserchess competition still going? I believe I heard that the year you TA'd it (might have been a year before or after) a group finally compiled an opening playbook and beat everyone in the class

explaininjs · 2024-01-11T00:03:07.000000Z

I haven’t followed the course closely, but I’d assume there’s still some variation of the game going on, Charles was quite fond of it.

I’m not sure what you mean by “finally complied an opening book”, by the time I was involved with the class opening books were table stakes and we even included code to generate them as part of the starting distribution. They’re somewhat useful, but effective culling of the search space in multithreaded contexts is far more important. The opening book can only ever help for the first few ply, after that the engine which can consistently think a move ahead will likely win. (Indeed the staff designed the rules to optimize for that very property). A quick and informative heuristic function is of course also critical.

lagniappe · 2024-01-10T20:52:26.000000Z

I don't have any questions but do wish you well on your journey through the Andes :) Signing up for the course for fun, thanks!

flyer_go · 2024-01-10T22:41:53.000000Z

Do you have code for the course?

stuarto · 2024-01-11T00:51:10.000000Z

course repo with code and assignments is at: https://github.com/sourcery-ai-bot/MIT_OpenCourseWare-Perfor...

explaininjs · 2024-01-11T00:05:27.000000Z

Not my place to share it! Though it can be found online by those who seek.

noitamroftuo · 2024-01-11T01:06:11.000000Z

tell me about backpacking through the andes. one question - where do you sleep ?

dwrodri · 2024-01-11T18:13:19.000000Z

I am currently trying to prep for SWE roles where my focus would be performance / scalability, so this course is extra topical for me. Thank you for sharing.

Here are some other resources for people interested in improving their understanding of performance:

- "Computer Architecture: A Quantitative Approach" by Hennessy and Patterson. The "Computer Organization" book by the same authors is also good. Both of these resources were extra useful when building deeper intuitions about GPU performance for ML models at work and in graduate school.

- CMU's "Deep Learning Systems" Course is hosted online and has YouTube lectures online. While not generally relevant to software performance, it is especially useful for engineers interested in building strong fundamentals that will serve them well when taking ML models into production environments: https://dlsyscourse.org/

- Compiler Explorer is a tool that allows you easily input some code in and check how the assembly output maps to the source. I think this is exceptionally useful for beginner/intermediate programmers who are familiar with one compiled high-level language and have not been exposed to reading lots of assembly. It is also great for testing how different compiler flags affect assembly output. Many people used to coding in C and C++ probably know about this, but I still run into people who haven't so I share it whenever performance comes up: https://godbolt.org/

rashkov · 2024-01-10T20:24:43.000000Z

Another good course with exercises: https://github.com/dendibakh/perf-ninja

tayo42 · 2024-01-11T02:15:31.000000Z

Looks interesting, wish it touched on system and os level optimizations too.

j_not_j · 2024-01-11T21:21:57.000000Z

See also The Art of Computer Systems Performance Analysis by Jain.

The statistical parts are very good: self contained and with good explanations. Perhaps a weakness is there isn't much on distributed systems.

He did some Youtube videos as well.

jstanley · 2024-01-10T20:46:37.000000Z

The course looks really interesting, but in my experience of "performance engineering" you very rarely need to know this stuff.

If you want to improve the performance of some random software system you encounter at work, there is normally much lower-hanging fruit available.

First profile the application on as representative a workload as possible. Then fix the accidentally-quadratic loops, stop it from hitting the disk every iteration, add indexes for the worst database queries. If that doesn't at least double your throughput, you are working on an unusually-good application.

f1shy · 2024-01-10T21:56:34.000000Z

This is pretty much stated in the course. Basically is rare that you need it. But when you need it, is good to know.

oooyay · 2024-01-10T22:14:38.000000Z

More than good to know, sometimes knowing the thresholds of the nuts and bolts is the difference between tuning and rewrite it in a different language or more complex architecture.

a1o · 2024-01-10T21:01:40.000000Z

> there is normally much lower-hanging fruit

Yeap, usually it's more about properly profiling your application on the CPU and memory dimensions, occasionally instrumenting if necessary, experimenting different measurements and tools to have a good picture of your software bottlenecks and where there are gains.

Knowing the full picture and where the issues are you can then proceed to these specifics of optimizing your code for performance in these spots, or using a specific library that solves better that problem or what is usually an approach that gives better gain, rethinking knowing the limitations and finding a better design for the problem that part of the code is trying to solve.

explaininjs · 2024-01-11T00:07:47.000000Z

I do not know why this thread is dismissive of the course when that is precisely what the first few lectures of the course teach, and every subsequent assignment treats “profile and instrument the code” as the (sometimes inferred) step 0.

But it’s an 18 unit course, it’d be a disservice to the students (and the world at-large, considering the potential impact of performance engineering) if we stopped there.

(I TA’d this class)

jstanley · 2024-01-11T10:00:55.000000Z

I haven't taken the course, I was just going off the lecture titles. The first 3 lectures are entitled:

  Introduction and Matrix Multiplication
  Bentley Rules for Optimizing Work
  Bit Hacks

explaininjs · 2024-01-11T12:25:11.000000Z

“Measurement and Timing” is lecture 10. We started with small independent programs and worked up to systems, in much the same way you would not teach an individual to paint by telling them the Sistine Chapel needs a patch fixed.

But yes, I do agree profiling is step 0 to performance engineering.

sebastos · 2024-01-11T02:42:28.000000Z

The same can be said for many - maybe even most topics in engineering higher education. 99% of the stuff you'd learn in an advanced controls course ain't running anywhere, but it's a body of knowledge we want to maintain and grow because if it was applied everywhere, it could make everything better. Maybe the economics or the politics aren't there today, but you never know where it will come in handy.

I 100% get why you're saying it: the fear is that with all of these contraptions in mind, we send a generation of engineers out who immediately reach for their performance bazooka when all they needed was a pocketknife. But SOMEbody is doing actual high quality engineering and needs to reach for this stuff. It's MIT: there's a chance those people are in the room.

cozzyd · 2024-01-10T21:05:32.000000Z

it really depends on what you work on...

dennis_moore · 2024-01-10T21:01:09.000000Z

I think that the term "software system" can have a much broader meaning than the one you are using here.

jstanley · 2024-01-10T22:05:32.000000Z

For sure. I suppose the unspoken step is "work out which component is the bottleneck".

If you're lucky then it's CPU bound and everything runs on the same box and you can just look in htop.