Hacker News
The Model Specific Registers of EC2 (brendangregg.com)
79 points by deirdres on Sept 16, 2014 | 8 comments



Great stuff. I haven't tried figuring it out on cloud computers, but even when I have root access, turbo boost often enough seems to make precise measurements difficult. On Linux, with the exception of i7z (which he mentions) and 'turbostat' (which I usually prefer: https://github.com/torvalds/linux/blob/master/tools/power/x8...) CPU speeds are almost always reported using the nominal speed rather than the actual speed.
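For reference, the "actual speed" those tools report comes from two architectural counters: IA32_MPERF (0xE7) ticks at a fixed reference rate and IA32_APERF (0xE8) ticks at the real core clock, both only while the core is not idle. Roughly, turbostat's busy-MHz figure is the nominal speed scaled by their ratio. Below is a minimal sketch, assuming Linux with the `msr` module loaded and root access. `read_msr` and `effective_mhz` are hypothetical helper names, and `base_mhz` is whatever nominal speed your CPU advertises.

```python
import os
import struct

# Architectural MSR addresses from the Intel SDM.
IA32_MPERF = 0xE7  # increments at a fixed reference (nominal) rate while the core is in C0
IA32_APERF = 0xE8  # increments at the actual core clock while the core is in C0


def read_msr(cpu: int, reg: int) -> int:
    """Read one 64-bit MSR through the Linux msr driver (needs root and `modprobe msr`)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr driver uses the file offset as the MSR address.
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def effective_mhz(base_mhz: float, aperf0: int, mperf0: int,
                  aperf1: int, mperf1: int) -> float:
    """Average actual frequency between two samples: base * dAPERF / dMPERF."""
    d_mperf = mperf1 - mperf0
    if d_mperf <= 0:
        raise ValueError("MPERF did not advance between samples")
    return base_mhz * (aperf1 - aperf0) / d_mperf


# Example (root only; 2900 is a placeholder nominal speed):
#   a0, m0 = read_msr(0, IA32_APERF), read_msr(0, IA32_MPERF)
#   time.sleep(1.0)
#   a1, m1 = read_msr(0, IA32_APERF), read_msr(0, IA32_MPERF)
#   print(effective_mhz(2900, a0, m0, a1, m1))
```

A ratio above 1 means the core spent its busy time above nominal speed, i.e. Turbo Boost was active.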

And even those tools just report an average speed, which means the part of the program you care about might be running faster or slower than the rest. The new Haswell-EP chips even have both an additional AVX base and AVX turbo speed, and switch to those speeds for any portion of code that uses full-width vectors. As a result, his traditional solutions 2 (read the performance counters) and 3 (write a test loop) can be harder than they sound.
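One way around the whole-program average is to sample APERF/MPERF immediately before and after only the region you care about, with the thread pinned to the core whose counters you read. This is a minimal sketch under the same assumptions as any MSR read (Linux, root, `msr` module). `region_mhz` and its `reader`/`pin` parameters are hypothetical names; `reader` is injectable so the arithmetic can be exercised without hardware. The MSR reads are separate syscalls, so very short regions will be dominated by measurement overhead.

```python
import os
import struct
from typing import Callable

IA32_MPERF = 0xE7
IA32_APERF = 0xE8


def read_msr(cpu: int, reg: int) -> int:
    """Read one 64-bit MSR through the Linux msr driver (needs root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def region_mhz(fn: Callable[[], None], cpu: int, base_mhz: float,
               reader: Callable[[int, int], int] = read_msr,
               pin: bool = True) -> float:
    """Average frequency of `cpu` while fn() runs, not over the whole program."""
    old_mask = os.sched_getaffinity(0) if pin else None
    if pin:
        # Keep the measured code on the core whose counters we sample.
        os.sched_setaffinity(0, {cpu})
    try:
        a0, m0 = reader(cpu, IA32_APERF), reader(cpu, IA32_MPERF)
        fn()
        a1, m1 = reader(cpu, IA32_APERF), reader(cpu, IA32_MPERF)
    finally:
        if pin:
            os.sched_setaffinity(0, old_mask)
    d_mperf = m1 - m0
    if d_mperf <= 0:
        raise ValueError("MPERF did not advance; region too short or core idle")
    return base_mhz * (a1 - a0) / d_mperf
```

With AVX code in the mix, running this once around the vectorized loop and once around scalar code should show whether they really execute at different clocks.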

So while reading the MSRs can help a lot to diagnose the issue and to reduce the error, if you really want reliable measurements, the "turn it off in the BIOS" option is the best choice. But even this can be harder than it sounds, as it seems some BIOSes will lie and use it anyway. At least, I have one Dell Haswell machine to which I have root but not physical access, which i7z and turbostat show to be running Turbo Boost, although the otherwise reliable on-site admin swears he has checked several times that it's turned off in the BIOS.

It seems like it should be possible to turn Turbo Boost on or off by _writing_ to the appropriate MSR but I haven't succeeded with this yet. Is there a trick I'm missing? Has anyone else had success doing it this way? It would be nice to be able to turn it off as part of the timing routine, and then turn it back on for normal operation after.
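For anyone wanting to experiment: the Intel SDM documents bit 38 of IA32_MISC_ENABLE (0x1A0) as the IDA (Turbo Mode) disable bit, and bit 32 of IA32_PERF_CTL (0x199) as a per-core "IDA engage" override, which is another commonly cited route. Whether a write actually takes effect depends on the processor and firmware, and a hypervisor such as Xen will likely trap or ignore it from a guest. On bare metal running the `intel_pstate` driver, `/sys/devices/system/cpu/intel_pstate/no_turbo` is the supported knob. The sketch below is an untested assumption of how the MSR route looks (Linux, root, `msr` module); `set_turbo` and `with_turbo_disabled` are hypothetical names. It reads back after writing so a silently ignored write is at least detected.

```python
import os
import struct

IA32_MISC_ENABLE = 0x1A0
# Bit 38: documented as IDA (Turbo Mode) Disable on processors that support Turbo Boost.
TURBO_DISABLE = 1 << 38


def with_turbo_disabled(value: int, disabled: bool) -> int:
    """Return `value` with the turbo-disable bit set or cleared; other bits unchanged."""
    return value | TURBO_DISABLE if disabled else value & ~TURBO_DISABLE


def turbo_disabled(value: int) -> bool:
    return bool(value & TURBO_DISABLE)


def _msr_path(cpu: int) -> str:
    return f"/dev/cpu/{cpu}/msr"


def read_msr(cpu: int, reg: int) -> int:
    fd = os.open(_msr_path(cpu), os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def write_msr(cpu: int, reg: int, value: int) -> None:
    fd = os.open(_msr_path(cpu), os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", value), reg)
    finally:
        os.close(fd)


def set_turbo(cpus, enabled: bool) -> None:
    """Read-modify-write IA32_MISC_ENABLE on each CPU, then read back to verify."""
    for cpu in cpus:
        old = read_msr(cpu, IA32_MISC_ENABLE)
        write_msr(cpu, IA32_MISC_ENABLE, with_turbo_disabled(old, not enabled))
        if turbo_disabled(read_msr(cpu, IA32_MISC_ENABLE)) == enabled:
            raise RuntimeError(f"cpu {cpu}: write to IA32_MISC_ENABLE did not stick")
```

Even when the bit reads back as set, it's worth confirming with APERF/MPERF that the busy frequency actually stays at or below nominal.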


> Relatively cool: between 60 and 70 degrees Celsius.

That sounds a bit on the high end, considering that 65 is the highest my system (an early i7) reaches when fully loaded. But judging by the temperature graph, they seem to be really pushing these chips to the limit and letting thermal throttling take care of the rest; the spec for that particular model is ~80 max.

Since this is a VM, there's always a remote chance that they can trap the MSR reads and return whatever data they want. (Maybe it could even be used as a covert channel...?)


In a data center, it probably makes sense to cool this stuff only as much as necessary. I bet the 60-70 degree figure is the result of some extensive analysis and experimentation by Amazon.


The thermal throttle limit (PROCHOT), according to the MSRs, is 95 degrees Celsius. I did read through the various thermal status and thermal log MSRs, which were always zero, suggesting we aren't thermal throttling, although I'm not sure those MSR bits are exposed by Xen. (They probably are, since the other bits in the same MSR are.)
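For context on where numbers like the 95 come from: on recent Intel cores, TjMax (the PROCHOT activation temperature) sits in bits 23:16 of MSR_TEMPERATURE_TARGET (0x1A2, model-specific rather than architectural), and IA32_THERM_STATUS (0x19C) reports the current temperature as degrees below TjMax in bits 22:16, valid only when bit 31 is set. A minimal decoding sketch, assuming Linux with the `msr` module and root; `read_core_temp` and the other helpers are hypothetical names, and a hypervisor may of course return whatever it likes.

```python
import os
import struct
from typing import Optional

IA32_THERM_STATUS = 0x19C
MSR_TEMPERATURE_TARGET = 0x1A2  # model-specific; present on Nehalem-era and later Intel cores


def tjmax_c(temp_target: int) -> int:
    """TjMax in degrees C from MSR_TEMPERATURE_TARGET bits 23:16."""
    return (temp_target >> 16) & 0xFF


def core_temp_c(therm_status: int, tjmax: int) -> Optional[int]:
    """Core temperature from IA32_THERM_STATUS, or None if the reading is not valid."""
    if not (therm_status >> 31) & 1:  # bit 31: Reading Valid
        return None
    below_tjmax = (therm_status >> 16) & 0x7F  # bits 22:16: degrees C below TjMax
    return tjmax - below_tjmax


def read_msr(cpu: int, reg: int) -> int:
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def read_core_temp(cpu: int) -> Optional[int]:
    tj = tjmax_c(read_msr(cpu, MSR_TEMPERATURE_TARGET))
    return core_temp_c(read_msr(cpu, IA32_THERM_STATUS), tj)
```

So a core reading 30 below a TjMax of 95 is at 65 degrees, squarely in the 60-70 range from the article.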

Just checking the Intel manual docs:

"• Thermal Status (bit 0, RO) — This bit indicates whether the digital thermal sensor high-temperature output signal (PROCHOT#) is currently active. Bit 0 = 1 indicates the feature is active. This bit may not be written by software; it reflects the state of the digital thermal sensor.

• Thermal Status Log (bit 1, R/WC0) — This is a sticky bit that indicates the history of the thermal sensor high temperature output signal (PROCHOT#). Bit 1 = 1 if PROCHOT# has been asserted since a previous RESET or the last time software cleared the bit. Software may clear this bit by writing a zero."
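Checking those two bits is just a mask on the IA32_THERM_STATUS (0x19C) value. A minimal sketch, assuming Linux, root, and the `msr` module; `ThermalFlags`, `decode_thermal_flags`, and `ever_throttled` are hypothetical names. As noted in the thread, under Xen a zero here may only mean the hypervisor isn't passing the real bits through.

```python
import os
import struct
from typing import Dict, Iterable, NamedTuple

IA32_THERM_STATUS = 0x19C


class ThermalFlags(NamedTuple):
    prochot_active: bool  # bit 0, Thermal Status (RO)
    prochot_logged: bool  # bit 1, Thermal Status Log (sticky, R/WC0)


def decode_thermal_flags(therm_status: int) -> ThermalFlags:
    return ThermalFlags(prochot_active=bool(therm_status & 0b01),
                        prochot_logged=bool(therm_status & 0b10))


def read_msr(cpu: int, reg: int) -> int:
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def ever_throttled(cpus: Iterable[int]) -> Dict[int, bool]:
    """Map cpu -> True if PROCHOT# is active now or was logged since the last clear/reset."""
    result = {}
    for cpu in cpus:
        flags = decode_thermal_flags(read_msr(cpu, IA32_THERM_STATUS))
        result[cpu] = flags.prochot_active or flags.prochot_logged
    return result
```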

I was checking both, and both were zero.

So if we aren't thermal throttling, then I guess it still may be the fans...


> That sounds a bit on the high end, considering that 65 is the highest my system (early i7) goes when fully loaded

But their i7 is probably stuffed in a room with 10000 other machines.


Remember that Turbo Boost increases the core clock speed but not the bus clock or the speed of devices. A 10% increase in the core clock may yield a 10% increase in microbenchmark performance, but for macrobenchmarks the effect is liable to be significantly less.
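A rough way to see why is an Amdahl's-law-style estimate: split the run time into a fraction that scales with the core clock and a remainder (bus, memory beyond the core-clocked caches, devices) that doesn't, and only speed up the first part. This is a simplification that assumes the two parts don't overlap; `turbo_speedup` is a hypothetical helper, and the 50% split below is purely illustrative.

```python
def turbo_speedup(core_bound_fraction: float, clock_ratio: float = 1.10) -> float:
    """Overall speedup when only `core_bound_fraction` of run time scales with the clock."""
    if not 0.0 <= core_bound_fraction <= 1.0:
        raise ValueError("core_bound_fraction must be in [0, 1]")
    return 1.0 / ((1.0 - core_bound_fraction) + core_bound_fraction / clock_ratio)


# Fully core-bound code gets the whole 10%; half core-bound gets about 4.8%.
# turbo_speedup(1.0) ~= 1.10, turbo_speedup(0.5) ~= 1.048
```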


Still, I'm assuming all layers of your cache are running 10% faster as well. Definitely useful for some work. I'm working on a JavaScript JIT compiler and I would say that dynamic languages tend to generate much bulkier code in terms of instruction count because of dynamic type tests, inline caches and such, but they don't necessarily need to access more data (outside of program instructions). For this kind of bulky code, faster single core performance definitely does matter.


Anyone who screams at hard disks to see how vibration affects them is bound to discover lots of other interesting things.



