Not that I'm a big fan of Jython, but including startup time in a benchmark is only useful for very short-running command-line tools. It says nothing about the speed of the JIT or the code that's being tested.
I thought about that, but decided against subtracting out startup time. This is a real-world benchmark of how long it takes to run some actual code that I care about. Startup time counts, but the cumulative execution time will be dominated by time taken to run the slower programs, not startup time in the trivial ones, so I don't think I'm counting it too much.
But I think next time I do this I'll increase the max runtime from 1 minute to 3. That will keep more of the slower programs in the benchmark, and make startup time count that much less. Without actually removing it, because that seems too artificial and contrived to me.
Sure, and I did mention command line utilities. The issue I see with the benchmark is that it compares apples and oranges. For some programs it tests JIT performance and for others it tests startup time. If I want to test startup time, I just use a hello world program. To test JIT/interpreter performance, I try to exclude startup time.
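For what it's worth, the usual way to exclude startup time is to take timestamps inside the process itself rather than timing the whole invocation from the shell. A minimal sketch, where `workload` is a hypothetical stand-in for the code under test:

```python
import time

def workload():
    # Hypothetical stand-in for the code being benchmarked.
    return sum(i * i for i in range(1_000_000))

# perf_counter starts after the interpreter (or JVM, under Jython) is
# already up, so startup cost is excluded from the measurement.
start = time.perf_counter()
result = workload()
elapsed = time.perf_counter() - start
print(f"workload took {elapsed:.3f}s (startup excluded)")
```

By contrast, `time python script.py` from the shell measures startup plus the workload, which is the apples-and-oranges mix being discussed here.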
Hello world programs only ever rely on the bare runtime, though; real utilities have to load libraries after the runtime loads, which incurs further delays (which might depend on JITing/interpreting speed if they're not pre-compiled).
The same can be said about benchmarking numeric vs. I/O code. Is it apples and oranges and bananas then? Most programs have a mix of everything. Whether Euler's problems are a good mix that represents your workload is up to you to decide.
In addition to what fauigerzigerk said, if you run a lot of command line utilities where startup time is relevant, there are established solutions to that (nailgun for example) so it still isn't a fair comparison. If you want a comparison of state of the art solutions, which this post obviously is, put nailgun into the mix.
Those utilities are often shell-scripted to run many times in a row. At least in interactive use, the multiplied startup times can grow annoyingly long.
You could use one of those Java background daemons to do that, but anyway, what I was trying to say is just that this benchmark doesn't test JIT or interpreter performance in the case of Jython.
Also, the HotSpot VM needs warmup to achieve maximum performance, since it takes some time to detect and JIT-compile the performance-critical parts of the code with all optimizations applied.
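A benchmark harness typically accounts for that by running and discarding some initial iterations before measuring. A minimal sketch of the idea (the `bench` helper and its parameters are illustrative, not from any particular library):

```python
import time

def bench(fn, warmup=5, runs=5):
    # Warmup iterations: give a JIT (e.g. HotSpot under Jython) a chance
    # to detect and compile hot code paths; results are discarded.
    for _ in range(warmup):
        fn()
    # Measured iterations: report the best time, which is least affected
    # by residual compilation work and other noise.
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times)
```

Real harnesses like JMH do this far more carefully (forked JVMs, statistical analysis), but the basic shape is the same.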