Which Programming Languages Use the Least Electricity? (thenewstack.io)
107 points by damagednoob on Oct 18, 2020 | 73 comments



One effect it doesn't capture: A language that's twice as efficient will go (roughly) twice as fast and use half the electricity. But if you have a fixed amount of time to compute the answer (such as one screen refresh interval), you can run the CPU slower so the efficient language finishes at the same time. CPUs get more energy efficient at lower speeds. For a given process, energy/op is roughly linear with clock speed.

So if you have a fixed deadline, a 2x efficient language can save you 4x the power.
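
A rough sketch of the arithmetic behind that 4x figure, assuming the linear energy/op vs. clock approximation above:

    % Language B needs half the ops of language A for the same result.
    % With a fixed deadline, B can run at half the clock: f_B = f_A / 2.
    % Using E_{op}(f) \propto f:
    E_A = N \cdot E_{op}(f_A), \qquad
    E_B = \frac{N}{2} \cdot E_{op}\!\left(\frac{f_A}{2}\right)
        = \frac{N}{2} \cdot \frac{E_{op}(f_A)}{2}
        = \frac{E_A}{4}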


> CPUs get more energy efficient at lower speeds.

Intuitively I agree, but how does this square with the 'race to idle' concept?

Would love to see some graphs of operations-per-watt for different CPUs/architectures.


‘race to idle’ doesn’t seem to be the preferred approach anymore. Instead we’re seeing CPU designers go down the path of multiple different sized cores, with different speeds and power consumption.

So in this world more "efficient" tasks can be run on slower, lower-powered cores. Stuff that needs to happen as quickly as possible is done on faster, higher-powered cores.

When you can, you just turn off the bigger cores completely.

I don’t have benchmarks, but there must be some good power savings in it if ARM has built its big.LITTLE architecture around it, and Apple followed suit with their processors.


race to idle is mainly for responsiveness, not efficiency.


> CPUs get more energy efficient at lower speeds. For a given process, energy/op is roughly linear with clock speed.

That is not true. For any given process node, with the same CPU design, there is an optimal point on the energy-efficiency curve with respect to clock speed. Going any lower doesn't save you energy per workload; going higher means you pay exponentially more energy.


Another effect: your computer is on for the entire time you're writing the program. Picture it: you have a .csv file and a laptop with one hour of battery life, and you need to run some stats on it. Are you more likely to complete your task in Python, or in C?


What? In CMOS energy / op is constant unless you have dynamic voltage scaling. Which processors have this?


Certainly my Ryzen 3700X has some degree of dynamic voltage scaling. You can see the voltage changing in Ryzen Master. I think Intel has it too, but I'm not sure.

Also, I am not sure it is correct that energy/op is constant, even at a given voltage? I'm no expert, but I read about this when I was thinking about overclocking my device. At a given voltage, higher frequencies still use more energy due to the capacitance of all the components on the device. This is why overclocked energy dissipation scales super-linearly with voltage: you aren't just increasing the voltage, you are increasing the frequency as well.
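
For reference, the usual first-order CMOS switching model (ignoring leakage) separates the two effects being discussed here:

    P_{dyn} \approx \alpha C V^2 f                         % dynamic (switching) power
    E_{op}  \approx \frac{P_{dyn}}{f} = \alpha C V^2       % energy per cycle, independent of f at a fixed voltage

In that simplified model, energy per operation at a fixed voltage doesn't depend on frequency; the super-linear cost of overclocking comes from the higher voltage needed to keep the chip stable at the higher clock.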


> What? In CMOS energy / op is constant unless you have dynamic voltage scaling. Which processors have this?

Approximately all computer processors.


Yeah, I looked into this some more. High-end ones seem to have it, but most MCUs do not. Anyway, there are many issues with taking advantage of this. I don't think it's very likely that you're going to get the ideal linear decrease with frequency. There are arguments to be made for running fast until you stop, because it reduces the effect of leakage current; see:

https://semiengineering.com/is-dvfs-worth-the-effort/


AFAIK low-power MCUs are always used in race-to-idle fashion because nothing beats <<1 µA sleep.

For computer processors, DVFS isn't used like OP suggests, it's really more like Dynamic Temperature And Power Scaling (DTAPS), because it's meant to extract maximum performance from available power and thermal budget. Also while CPUs _do_ change power and frequency at >1 kHz, they're intentionally designed _not to_ downclock in the example GP gave (inter-frame idle).


I was under the impression that CMOS consumes energy on gate transitions rather than constantly, so higher speed translates to more current consumption.


I don’t think the CPU can adjust its power like that. And if it could, you could use it the same way from C.


If you want to see a relatively radical take on energy efficient computing (I'm sure it's not that radical if you work in the embedded space, but that's not my background so I can't say for sure), here's Chuck Moore talking about his GreenArrays chip with built-in Forth support:

https://www.infoq.com/presentations/power-144-chip/


GA144 are near SF to me.

Exchanging Forth messages between cores seems so high level.

I wonder if Moore is still thinking about improving these.


I guess one could argue that the elegance of Forth is that it's so minimal that you can get "high level" programming at a "low level" cost. However, the small print of "low level" cost mentions that this includes not having any types whatsoever ;)


You can get a sense for how much bullshit this is from looking at the wildly varying results for Javascript (4.45) and Typescript (21.50). They should be basically identical.

This is because they used the source from the language benchmarks game, which measures some combination of how fast a language is and how much effort its fans are willing to put into micro-optimising the programs for the benchmark. In this case, JavaScript developers have clearly spent more time on it than TypeScript developers.


While the article argues that faster isn't necessarily greener, I would read the data they show as largely confirming that faster is greener. Of course it isn't always the case, but overall the rule holds up very well. And for smaller differences, the effect of the specific workload is probably much larger.

What I find odd is that the table shows Typescript as 7x slower than Javascript. That can't be measuring the same thing, so not sure what is going on there.


If it seems odd, it’s because they didn’t implement them the same way.

The performance of the same program written in TypeScript and JavaScript should always be identical. TypeScript doesn’t add anything to the runtime.
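
As a minimal illustration (a made-up snippet, not from the benchmark suite): the TypeScript compiler only erases the annotations, so the emitted JavaScript carries the same logic.

    // input.ts
    function dot(a: Float64Array, b: Float64Array): number {
        let sum = 0;
        for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
        return sum;
    }

    // output.js (tsc, target ES2017 or later): the same code, minus the types
    function dot(a, b) {
        let sum = 0;
        for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
        return sum;
    }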

If they got this simple thing wrong, what else did they get wrong?

Here’s the flawed study so you can see for yourself.

https://github.com/greensoftwarelab/Energy-Languages


> What I find odd is that the table shows Typescript as 7x slower than Javascript.

Including type checking and transpilation in the timing, I assume.


I don't think so, otherwise Rust wouldn't have done well at all. I think the actual difference is that, for some reason, the benchmarks game uses different code for JavaScript and TypeScript. E.g. the JavaScript version of spectral-norm is multithreaded, using worker threads, but the TypeScript version is single-threaded. Yeah, it is stupid.

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...


Please contribute a TypeScript equivalent of the multi-threaded JavaScript program.

I converted the ones I could, like:

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

but I couldn't get the spectral-norm program to type check.


Lol, ok, I would, but apparently I have to sign up to Debian Salsa and you can't sign up with a Gmail account. I don't think they really want contributions! Anyway, here is the code. I made some minor improvements that won't affect the runtime. You need to `npm install @types/node`.

    // The Computer Language Benchmarks Game
    // https://salsa.debian.org/benchmarksgame-team/benchmarksgame/
    //
    // contributed by Ian Osgood
    // Optimized by Roy Williams
    // modified for Node.js by Isaac Gouy
    // multi thread by Andrey Filatkin

    import { Worker as NodeWorker, isMainThread, parentPort, workerData } from 'worker_threads';
    import * as os from 'os';

    enum MessageVariant {
        Sab,
        Au,
        Atu,
        Exit,
    }

    interface SabMessage {
        variant: MessageVariant.Sab;
        data: SharedArrayBuffer;
    }

    interface AuMessage {
        variant: MessageVariant.Au;
        vec1: UVWField,
        vec2: UVWField,
    }

    interface AtuMessage {
        variant: MessageVariant.Atu;
        vec1: UVWField,
        vec2: UVWField,
    }

    interface ExitMessage {
        variant: MessageVariant.Exit;
    }

    type Message = SabMessage | AuMessage | AtuMessage | ExitMessage;

    interface UVW {
        u: Float64Array;
        v: Float64Array;
        w: Float64Array;
    }

    type UVWField = keyof UVW;

    const bytesPerFloat = Float64Array.BYTES_PER_ELEMENT;

    if (isMainThread) {
        mainThread(+process.argv[2]);
    } else {
        workerThread(workerData);
    }

    async function mainThread(n: number) {
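        // Lays u, v, w out back to back in one SharedArrayBuffer, spawns one worker per CPU,
        // runs 10 iterations of v = AᵀA·u then u = AᵀA·v, and reduces to sqrt(u·v / v·v).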
        const sab = new SharedArrayBuffer(3 * bytesPerFloat * n);
        const u = new Float64Array(sab, 0, n).fill(1);
        const v = new Float64Array(sab, bytesPerFloat * n, n);

        const workers = new Set<NodeWorker>();
        startWorkers();

        for (let i = 0; i < 10; i++) {
            await atAu('u', 'v', 'w');
            await atAu('v', 'u', 'w');
        }

        stopWorkers();

        let vBv = 0;
        let vv = 0;
        for (let i = 0; i < n; i++) {
            vBv += u[i] * v[i];
            vv += v[i] * v[i];
        }

        const result = Math.sqrt(vBv / vv);

        console.log(result.toFixed(9));

        async function atAu(u: UVWField, v: UVWField, w: UVWField) {
            await work({ variant: MessageVariant.Au, vec1: u, vec2: w });
            await work({ variant: MessageVariant.Atu, vec1: w, vec2: v });
        }

        function startWorkers() {
            const cpus = os.cpus().length;
            const chunk = Math.ceil(n / cpus);

            for (let i = 0; i < cpus; i++) {
                const start = i * chunk;
                let end = start + chunk;
                if (end > n) {
                    end = n;
                }
                const worker = new NodeWorker(__filename, {workerData: {n, start, end}});

                worker.postMessage({ variant: MessageVariant.Sab, data: sab });
                workers.add(worker);
            }
        }

        function work(message: Message) {
            return new Promise<void>(resolve => {
                let wait = 0;
                workers.forEach(worker => {
                    worker.postMessage(message);
                    worker.once('message', () => {
                        wait--;
                        if (wait === 0) {
                            resolve();
                        }
                    });
                    wait++;
                });
            });
        }

        function stopWorkers() {
            workers.forEach(worker => worker.postMessage({ variant: MessageVariant.Exit }));
        }
    }

    function workerThread({n, start, end}: {n: number, start: number, end: number}) {
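        // Receives the shared buffer once (Sab), then on each Au/Atu request applies A or Aᵀ
        // to this worker's row range [start, end) and replies with an empty message when done.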
        let data: UVW | undefined = undefined;

        if (parentPort === null) {
            return;
        }

        parentPort.on('message', (message: Message) => {
            switch (message.variant) {
                case MessageVariant.Sab:
                    data = {
                        u: new Float64Array(message.data, 0, n),
                        v: new Float64Array(message.data, bytesPerFloat * n, n),
                        w: new Float64Array(message.data, 2 * bytesPerFloat * n, n),
                    };
                    break;
                case MessageVariant.Au:
                    if (data === undefined) {
                        throw Error('Au received before Sab');
                    }
                    au(data[message.vec1], data[message.vec2]);
                    parentPort!.postMessage({});
                    break;
                case MessageVariant.Atu:
                    if (data === undefined) {
                        throw Error('Atu received before Sab');
                    }
                    atu(data[message.vec1], data[message.vec2]);
                    parentPort!.postMessage({});
                    break;
                case MessageVariant.Exit:
                    process.exit();
            }
        });

        function au(u: Float64Array, v: Float64Array) {
            for (let i = start; i < end; i++) {
                let t = 0;
                for (let j = 0; j < n; j++) {
                    t += u[j] / a(i, j);
                }
                v[i] = t;
            }
        }

        function atu(u: Float64Array, v: Float64Array) {
            for (let i = start; i < end; i++) {
                let t = 0;
                for (let j = 0; j < n; j++) {
                    t += u[j] / a(j, i);
                }
                v[i] = t;
            }
        }

        function a(i: number, j: number) {
            return ((i + j) * (i + j + 1) >>> 1) + i + 1;
        }
    }




That would explain a lot. The speed rankings seemed kinda off to me as well. I found it surprising that both Swift and Go are listed as ~2 times slower than Java, and C++ as 1.5 times slower than C.



That seems like an odd way to compare things if so - compilation/transpilation will only happen once, then the code will run many times.


I've wondered about how much energy gets pissed away by running huge OS×LangVersion Continuous Integration matrices upon every commit to repositories.


I wonder how much energy gets pissed away for our two comments.


yeah but if everyone migrates to rust, imagine how much more bitching we can do.


If only we could make that into proof of work for a cryptocoin.


Good question. It's nice that we're finally starting to measure computation energy costs.


We encrypt all our video and it's not even good encryption.


As an aside, this seems to use the Benchmarks Game for its testing, which has seen a lot of criticism for the exercises being more or less 1:1 ported rather than written idiomatically or efficiently in the different languages. Lua, in particular, gets dogged out and doesn't even offer LuaJIT as an option.


And yet, here we have someone complaining "… they didn’t implement them the same way."

https://news.ycombinator.com/item?id=24819884


The Lua benchmarks are a bit funny because many of them were tuned for LuaJIT, including tricks such as eval-based metaprogramming which work well in a JIT.


That a compiled program runs faster, that is, requires fewer CPU cycles and consequently uses less energy, is not a very surprising result. This comparison then basically just benchmarks the different compiler outputs for the given test cases. Benchmarking is always difficult and often controversial, especially when drawn from the benchmarks game. Consequently, C wins not only the speed category but also the energy category.

But that only tests the run time of a program. What this paper does not cover is, for example, the energy needed to compile the program and, of course, the energy needed to write and debug it. The whole reason not to write "everything" in C is that there is a real-life tradeoff between having the fastest possible program, which would also be the most efficient, and the effort to create it. Higher-level and dynamic languages were created to produce correct programs with less effort. Mostly the effort of expensive programmers, but development time obviously goes along with energy usage for the development machine, compiler runs, and debugging effort.


We write server software, running 24/7 on hundreds of thousands of servers, so even double the development time on a single machine would easily be offset hundreds or thousands of times over by a more efficient executable.

But you don't even need to go to C if you would like to achieve that; e.g., Rust gives you a nice level of abstraction and is highly efficient.


Right, there are many languages that give you "good enough" efficiency but are much better suited for development and stability.


Sure, but for anything long running or potentially just long lived that gets run a lot, the chart would approach roughly this one.


Yes, the longer a program runs (e.g. the kernel of a computer), the more the overall energy consumption approaches the ranking listed in the paper, as far as those benchmarks are meaningful. They are not very good at covering dynamic memory allocation in long-running programs, though.


In the "Is Faster Greener?" section, I'm surprised they didn't discuss threading. In the same wall-clock time, a benchmark that employs multi-threading heavily will presumably use much more power, due to more CPU cores being active.

Maybe their benchmarks were mostly single-threaded. But in general this is the first thing to look at.


Have you actually checked that your claim tends to be true? With n cores going almost n times faster, and presumably some fixed overhead from the rest of the computer, threading might come out ahead.


I said "in the same wall-clock time". If threading is done well, it will reduce wall-clock time, so I agree that threading could come out ahead.

What I was trying to get at is that if two languages perform the same task, but one of them uses multi-threading to accomplish it, "wall-clock time" is extremely likely to be a misleading proxy for power consumption.

It boils down to the fact that a CPU using all of its cores uses more power than a CPU using only one of its cores. This is easily checkable by listening to the fans on a fully loaded desktop PC, or by using software like "Core Temp" which can display CPU power consumption on some CPU models.


> It boils down to the fact that a CPU using all of its cores uses more power than a CPU using only one of its cores.

This is true if your cores are all the same and running at the same frequency, but voltage/frequency scaling makes things more complicated, and heterogeneous cores like in essentially every modern ARM processor completely destroy it. If you have a bunch of slow cores that are half as fast but use a quarter of the power [1], using threads to split up the work so you can still meet your deadline can definitely save power.

1: https://www.semanticscholar.org/paper/Power-aware-task-sched...
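
A back-of-the-envelope version of that, using the half-as-fast / quarter-the-power figures above purely as an illustration:

    % Big core: throughput s, power P. Little core: throughput s/2, power P/4.
    % Same work W under the same deadline T, i.e. the same required throughput.
    E_{big} = P \cdot T, \qquad
    E_{little} = 2 \cdot \frac{P}{4} \cdot T = \frac{P T}{2}   % two little cores match the big core's throughput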


Here’s a recent discussion of the paper:

https://news.ycombinator.com/item?id=24642134


If you're thinking about energy consumption in programming, this article is very interesting:

"C Is Not a Low-level Language" https://queue.acm.org/detail.cfm?id=3212479

> On a modern high-end core, the register rename engine is one of the largest consumers of die area and power. To make matters worse, it cannot be turned off or power gated while any instructions are running, which makes it inconvenient in a dark silicon era when transistors are cheap but powered transistors are an expensive resource.


Mmmhh, I wonder (I'm not a pro) if a few seconds of execution is enough for the garbage collectors to really kick in? I remember that, for example, Java's GC performs frequent "small" sweeps but also occasional "big" ones, right?


I'd also want to see a comparison on different CPU architectures.


Apart from the main subject of whether energy consumption tells us anything meaningful: whenever I see such benchmarks without D and Nim, I feel like readers will get a skewed view, and with each such benchmark this view will be reinforced. My gut feeling is that these two languages would land between C and C++ (at least some other benchmarks place them there), but it would be good to have it confirmed.


Yes, and another interesting benchmark could involve "how much energy and time does it take to compile (or start up) equivalent programs written in different languages?" Based on its design philosophy (fast compilation), I would expect D to rank quite well.

OTOH there could be a Jevons paradox where faster compilation results in more compiler consumption (e.g., iterative development). As you say (and I see a comment above addressing this), what are we optimizing for?


If you want to take this question to the next level (and beyond any current real-world applications), read up on Reversible Computing (https://en.wikipedia.org/wiki/Reversible_computing), which is the concept of performing computation at (near) zero energy cost.


Maybe use an embedded micropower system with C or Rust and an interrupt-based processing approach, rather than CPU-eating polling in something dynamic like Python, Ruby, or Node that never goes into standby.


Sounds interesting. Can you point me to some resources that compare the two, or explain how an `embedded micropower system` and an `interrupt-based approach` work?


Try programming an Arduino with interrupt handlers, sending the main loop into deep sleep.

Of course this won't work for every program. And with current tech you're unlikely to drive FB with a warehouse full of arduinos ;)

First DDG hit, only skimmed it, but it seems to cover the idea pretty well: https://circuitdigest.com/microcontroller-projects/arduino-s...

For my Tasmota-based devices, increasing the sleep time in the main loop to 250 ms decreases power draw by 40%. They now might miss button presses (it seems Tasmota polls?), but that's a non-issue for pure actors.


Hmm, if I am understanding correctly, it's not just about changing the runtime model of the event loop for languages like Node.js; we also need a fundamental change in how our hardware interacts with software.
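
For the Node side of that, here is a minimal sketch (hypothetical file path and helpers) of the polling-vs-event-driven distinction, which is the software analogue of polling vs. interrupts:

    // polling-vs-events.ts -- illustrative only, not from the article
    import { watch, statSync } from 'fs';

    const FILE = '/tmp/button-state'; // stand-in for a "sensor" we care about

    // Polling: the process wakes every 10 ms whether or not anything changed,
    // so the CPU never gets to stay in a deep idle state for long.
    function pollForChange(onChange: () => void) {
        let last = statSync(FILE).mtimeMs;
        setInterval(() => {
            const now = statSync(FILE).mtimeMs;
            if (now !== last) {
                last = now;
                onChange();
            }
        }, 10);
    }

    // Event-driven: fs.watch registers with the kernel (inotify/FSEvents/...);
    // the event loop then blocks and the process uses essentially no CPU until
    // a change notification arrives -- the software analogue of an interrupt.
    function waitForChange(onChange: () => void) {
        watch(FILE, () => onChange());
    }

    waitForChange(() => console.log('state changed'));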


It's used in embedded. For a server you have other consumers which are a pain to power down. And then you need to get some interrupt.

My home server can WoL on unicast packets, so that could be used as an "interrupt" to wake the machine from standby. But then you need a suitable workload that allows for substantial sleep time (e.g. wake up for 3s every 30s). Or you could schedule minutes precision polling by waking up via RTC.

Saving power when serving even a few HTTPS requests per second with a sub-10 ms response time - as I said, forget about it, at least with the x86 hardware we have today.


IIRC, when USB was first introduced, one common complaint was that it forced reliance on more central CPU resources [for polling?]. I.e., conveniently for Intel in terms of the need for CPU "power".


Sadly, the paper seems to be missing compiler versions and compilation flags.



Ah thanks. They compare apples and oranges since they haven't disabled runtime checking for Ada.

There are probably more options they missed for the other languages.


I love embedded and I love functional but never shall the two meet. :(


It's not really functional as many would recognise it, but you can write fairly fast functional code in D - i.e. statically enforced purity, metaprogramming, transitive const, lazy evaluation etc.

There are more features on top of this but many require the GC or other memory allocation, which I would say disqualifies them from the most limited of embedded contexts.


Not before there's a CPU that actually has FP instructions [that get translated to imperative microcode].


Lisp, Ocaml, and Haskell look really great in the comparison.


not java.


Java is in the top 5 in the energy usage chart. It's only at the bottom of the memory usage chart (this was expected).


It's super interesting how Java got into the top 5 in the energy usage chart, since the JVM is JIT-compiling the code as well as running it. It's doing more work than the other compiled languages (Pascal, Chapel, OCaml, Haskell), so its budget to come in using fewer watts is really tight. It's something worth exploring further.

In fact, in the updated version with Julia added [1], Julia takes a very long time to do anything, since the Julia warmup is so awful. But somehow, even though Julia takes 2.57 times longer (357%) to compute results than Rust and uses 93x more memory for fannkuch-redux, it uses only 23% more energy. All the while doing tracing JIT and rewriting memory segments to set them up for execution.

[1] https://sites.google.com/view/energy-efficiency-languages/up...


Java's JIT is fast and quite good. The only JIT-compiled language faster than Java (that I know of) is C#.


What makes you say that? Java is better than almost all other languages here for energy efficiency.


It was a joke: "java bad".


But the evidence being presented here says that joke doesn't make sense in this context. If it's a joke it's an ignorant one.



