CPU Energy Meter: A tool for measuring energy consumption of Intel CPUs (github.com/sosy-lab)
170 points by todsacerdoti 3 months ago | 60 comments



Since kernel 3.13 or so, RAPL is also exposed through `/sys/devices/virtual/powercap/intel-rapl/*/energy_uj` in micro-joules (if not, `modprobe intel_rapl`). So if you want to do a quick power measurement, it can be done using just POSIX sh (root required):

    # prints milliwatts (1000 = 1 W) because shell arithmetic doesn't do floating point;
    # note the counter wraps at max_energy_range_uj, so long runs need wrap handling
    UJ=$(cat /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/energy_uj)
    while true; do
        sleep 1
        LAST_UJ=$UJ
        UJ=$(cat /sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/energy_uj)
        echo $(((UJ - LAST_UJ) / 1000))
    done
Despite the powercap name being intel-rapl, the powercap interface is also available on AMD machines.

For a more detailed reading of several more metrics about the CPU, I think pcm[1] may be a better tool (it's a successor to Intel Power Gadget, which this project was forked from). It only works on Intel CPUs, though.

[1]: https://github.com/intel/pcm


There is also perf:

  perf stat -e 'power/energy-pkg/' -I 1000 --interval-count 3
  #           time             counts   unit events
       1.001064377              11.00 Joules power/energy-pkg/
       2.002605466              10.98 Joules power/energy-pkg/
       3.003726824              11.01 Joules power/energy-pkg/
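
On some systems there are additional RAPL events (e.g. RAM or psys); I believe you can list what your machine exposes with an event glob:

  perf list 'power/energy*'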


AMD have an equivalent in uProf: https://www.amd.com/en/developer/uprof.html

Power profiling is listed as supported on all CPUs, though a bunch of features (including memory bandwidth, the one I wanted) are limited to EPYC CPUs and don't exist on Ryzen or Threadripper.


How do you do this on Ryzen? RAPL?


Another easy tool that may already be on your system is "turbostat".
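
For a quick package-power readout it can be run like this (a sketch; the --show column names, such as PkgWatt, vary with the turbostat/kernel version):

    turbostat --quiet --show PkgWatt,CorWatt --interval 1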


Highly recommended, very useful tool.



Erm what? So silly!

>Long story short, since last year the AMD Energy sensor information has been limited to root due to the PLATYPUS security vulnerability. HWMON maintainer Guenter Roeck proposed slightly limiting and randomizing the sensor data so it couldn't be used for nefarious purposes but still accurate enough for genuine use-cases and no longer needing to be root-only access. However, AMD engineers didn't like that approach.

>With the hardware monitoring subsystem maintainer not wanting the information to be restricted to root-only and AMD not wanting the limiting/randomization approach, Guenter went ahead and removed the driver.

So... we're better off without having this system at all than we would be if it were limited to root OR if it were randomized? Sounds like silly kernel politicking to me. "You don't like my plan? Oh well, I guess I'll take the ball and go home, have fun losers!"


What was the security issue? More spectre-like bullshit?



This only applies to hwmon, i.e. `sensors`. You can still read this through powercap/intel-rapl (even on AMD systems).
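
For example, to enumerate the RAPL domains (reading energy_uj requires root on recent kernels, precisely because of PLATYPUS):

    for d in /sys/class/powercap/intel-rapl:*; do
        printf '%s: %s uJ\n' "$(cat "$d/name")" "$(cat "$d/energy_uj")"
    done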


Are the energy consumption values reported by Intel CPUs accurate? Measuring energy consumption cheaply is hard, so I wonder whether they're rough approximations or whether there's some magic trick.


Yes. Much earlier architectures (e.g., Sandy Bridge) used event counters as a rough approximation of energy consumption. These days, however, we use calibrated current sensors, not approximations. These are rather accurate; accurate enough to enable a side-channel attack, too. If software opts in for security, we also add a little bit of randomness to the readings, to keep measurements from being so data-dependent that crypto would be broken (the PLATYPUS attack), but not enough to affect accuracy for normal use cases.



As far as I know RAPL is implemented entirely in the CPU and is an estimate of CPU power using a complex model of CPU state, temperature and such. I don't believe it's an actual power measurement like e.g. SVI telemetry is.


This was true for earlier implementations, but newer ones actually measure power. There is an ADC in there. At least for Intel. Not sure about AMD implementation.


Doesn't this vary from part to part based on how integrated the VR is in that generation? It seemed to go back and forth in some prior generations, or seemed more accurate with Xeon than with client parts. I still treat it as accurate enough but I have wondered.


Could it help to combine it with this?

"Identifying Compiler Options to Minimise Energy Consumption for Embedded platforms"

https://arxiv.org/pdf/1303.6485



In my opinion, Astron's PMT (Power Measurement Toolkit) is a much more useful tool than this, because it abstracts over Intel, AMD, and Nvidia (including Jetson): https://git.astron.nl/RD/pmt

There is also a paper about PMT: https://arxiv.org/pdf/2210.03724


I really wish there were a similar tool for measuring the energy consumption of the transceiver power amplifier (PA) inside wireless devices, because its efficiency is abysmal (less than 50% in real-life scenarios due to impedance matching, skin effect, etc.). That's not unlike the internal combustion engine (ICE), except the ICE at least doesn't have to deal with mismatch and high-frequency issues. The PA is increasingly the main culprit of energy waste in connected devices, especially wireless ones; it's normal for the PA to account for about 50% of the power consumption of the entire device. And with IoT and machine-to-machine (M2M) communications, where data transmissions are regular and frequent (unlike human communication, where people sleep at night), machines mostly never sleep, which makes the PA's inefficiency even more notorious.


Two systems I know from HPC that more usefully expose various architectures' RAPL etc. to userland via a daemon for application profiling are https://variorum.readthedocs.io/ and https://hpc.fau.de/research/tools/likwid/. Of course other sources of power consumption than CPU/uncore and GPU may be significant.

For whole-node power on typical racked systems, I'd expect to interrogate the power strips or similar supplies with SNMP or otherwise.
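
likwid-powermeter, for instance, can sample RAPL for a fixed duration or wrap a command (if I remember the CLI correctly; ./my_app is a placeholder):

    likwid-powermeter -s 10s      # sample supported RAPL domains for 10 seconds
    likwid-powermeter ./my_app    # run a command and report the energy it used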




Why measure just the CPU and not the whole machine?


Because PSUs (sadly) don't have a simple interface that transmits to your OS.


Many server PSUs can report those values. `ipmitool dcmi power reading` might give good results, but most server vendors have their own tooling that gives more detailed results.

If you are into overspending on PSUs you can get similar features in consumer PSUs, for example [1]. But at that point we have completely left the realm of "simple interface".

If you want a whole-system overview, the PSU values are obviously better. But the fans are among the biggest power consumers in many servers, and because they react to temperature sensors they always lag behind what's actually happening. That makes attributing a rise or fall in power tricky at sub-minute timescales. That's where CPU energy use is much more useful, even if it's a less complete picture.

1: https://www.thermaltake.com/toughpower-irgb-plus-850w-gold-t...


> because they react to temperature sensors they always lag

Why don't systems drive the fans based on the VR state instead of lagging temperature readings?


Ambient temps aren't always a constant? Nor is the end-to-end performance of the cooling solution?


I'm not suggesting that the fan control should be open-loop with respect to temperature, only that temperature is lagging but voltage and current are leading, so if the CPU is going to ask for a massive step change in power the platform can look at that and precharge the fans.


In the past, I've used things with load-dependent (instead of temperature-dependent) fan speeds.

It can seem like it is a simpler method of control (why measure temperature if we don't have to?). But they were really very annoying to be around. It wasn't so bad with constant loads, but bursty dynamic loads created an audible cacophony that was very distracting.

Such a system also doesn't necessarily take into consideration other things that affect cooling performance, like ambient temperature and dust.

At least one of the problems (the annoying sounds) can theoretically be worked around by slowing down the rate at which a fan ramps up and down. But by the time that is done, fan speed is back to lagging again -- we're heading back to where we were before.

And then we still have variables like ambient temperature, dust, degraded thermal interfaces, and others to contend with.

So, the simpler method seems to be what we broadly already do today: We change fan speeds based on the temperature of the thing being cooled.

This temperature already has everything we need integrated to begin with: It includes thermal output, ambient temp, dust, old thermal goop, and everything else.

If thing is cool (for whatever reasons it is cool -- maybe it's idling in a shed in January in Minnesota), then: Fan can go slow or even turn off.

If thing is hot (for whatever reasons it is hot -- maybe it's doing real work in a shed in August in Minnesota and full of cat hair), then: Fan can go faster.


I guess you could, but what would you gain? You can't dissipate the heat until it reaches the heatsink anyway. And for bursty workloads you might not even know whether or not you're going to need it. The point of the thermal solution is just to keep the chip from throttling or destroying itself, and direct measurement is plenty quick enough to accomplish that.



Why do you need to transmit it to your OS, though?


Presumably for ease of reference and integration. If you don't need your OS to see it, why not just plug your PSU into a watt meter?


Yes, I 100% agree, and that's why I don't understand why using a software CPU energy meter makes sense.


Well, that's the joy of this simplified scenario, but if you go back to the original objective, measuring the CPU's draw, you can't easily do that with a Watt meter.


Use hardware to measure hardware…makes sense.


Especially in the current IoT boom where you can get "smart plugs" with built-in power meters for $5-$10 a piece. If you get the zigbee version you get easy-ish API access without sending the data off into the cloud. But even the Wifi cloud-enabled versions usually have a semblance of an API. I use lots of them throughout the house without any intention of ever using them to actually toggle power.
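
With zigbee2mqtt, for example, plug readings arrive as JSON over MQTT, so watching them is a one-liner (assuming a broker on localhost and a plug named "server-plug", both hypothetical):

    mosquitto_sub -h localhost -t 'zigbee2mqtt/server-plug' | jq .power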


Well imagine your PSU has all the metrics, now what? You need to get them somewhere for software to use.


Use them on what? That's exactly what I was asking about in my original and subsequent questions.


You don't just want to measure CPU consumption, but whole-system power is only useful for application profiling if only one application of significance runs on the system. I'd expect to measure it anyway for system-management purposes.


Wild, I just came across this while doing some research on power consumption. I got an AMD 5950X and an Nvidia 4080 Super and I was concerned about using too much power on my 750 W power supply. lol.

This was yesterday. Wild.


Intel PMT could also be used to do this: https://github.com/intel/Intel-PMT


A tool for monitoring Intel RAPL data would probably be a bit more accurate, as this tool is not really measuring anything.


In general, how does CPU utilization correlate with CPU power draw?


More utilization = more power draw. Generally.


Precisely. Loosely speaking, dynamic CMOS power dissipation is governed by the equation P = alpha*C*f*V*V, where C is the switched capacitance, V is the supply voltage, f is the clock frequency, and alpha is an activity factor between 0.0 and 1.0, corresponding to how many transistors toggle. More utilization = more transistors toggling = more power draw. Generally.
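
As a toy illustration with entirely made-up numbers (none of these values describe a real chip):

    # P = alpha * C * f * V^2: 10% activity, 1 nF switched, 3 GHz, 1.2 V
    awk 'BEGIN { alpha=0.1; C=1e-9; f=3e9; V=1.2; printf "%.3f W\n", alpha*C*f*V*V }'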


And the more cores you load, the higher the total draw, but the lower the per-core draw.


Wouldn't the meter itself also consume extra energy?


It's just an energy counter you can read from an MSR. But yes, monitoring it inevitably consumes some amount of energy. It won't cost anything on a busy system, but waking up an idle system to read it will be more noticeable. This is why I no longer use background metrics monitors like atop or netdata. An Intel client CPU can idle below 100 mW if you leave it be, but something like netdata will raise that to 5 W or worse.
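
For instance, with msr-tools you can poll the raw counter yourself (Intel register addresses; assumes an Intel CPU and `modprobe msr`):

    rdmsr -p 0 0x606    # MSR_RAPL_POWER_UNIT: bits 12:8 encode the energy unit
    rdmsr -p 0 0x611    # MSR_PKG_ENERGY_STATUS: 32-bit wrapping energy counter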


Someone should make a monitor for the monitor


Add it to the list: RAPL, PSU metering, PDU metering, UPS metering, utility metering, etc...


[flagged]


> does not switch to 200MHz for a minute during video calls

I had a Dell work laptop that did the same thing. As far as I was able to tell, the system had a bug/fault that continuously asserted the CPU's BD PROCHOT line when the integrated webcam was active. I don't think it was an Intel bug; the CPU was just responding to the external signal that (falsely) indicated the system was overheating.


I have a similar problem on my Framework laptop; I suspect it's Intel's fault.

I've tailed the system's embedded controller log, and I see when PROCHOT gets triggered, but even while watching the temperatures quickly drop to safe levels, the condition never gets cleared. (It's possible it's a bug in the EC's firmware, but I'm afraid it might be something lower level.)


I think there's an MSR you can set to disable throttling when BD_PROCHOT is active.
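
If it's the one I'm thinking of, it's bit 0 of MSR 0x1FC on many Intel client parts, but that's an assumption; verify it against your generation's documentation before writing it:

    # assumption: MSR 0x1FC bit 0 enables BD PROCHOT on this CPU; verify first
    wrmsr -p 0 0x1fc $(( $(rdmsr -p 0 -u 0x1fc) & ~1 ))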


ThrottleStop is your friend.


What does this have to do with the content of the article?


What's the point? They already consume more than entry-level space heaters...


Yes, computers consistently sip 1200 W from the wall. That's a normal thing. Said no one ever.



