Looks like the same kind of problem that plagues interpreter loops: that top-lev...

acqq · on July 25, 2013

The problem is that most of the time the main switch of lex isn't entered in the loop, instead it is called from the many different points in the parser (a token is considered, something is done, then there's a call for a next token). The only case I see at the moment where there's loop over the switch is when doing the whitespaces, that can maybe be slightly improved.

The whole topic of lexing turning out to be slow is really a thing that should be measured. Really intriguing.

nkurz · on July 25, 2013

Glancing at the source, I think the whitespace parsing could be improved. Since whitespace will rarely if ever (?) be followed by more whitespace, and since most other things are likely to be followed by whitespace, just having both "parse_expecting_whitespace()" and "parse_probably_not_whitespace()" should improve the prediction.

Other tokens probably have similar but less strong "preferences". It's possible that a smart enough branch predictor will learn these on its own, but changing things so that each token ends with its own predicted branch would likely be a win.

Since the branch predictor uses the address of the branch as the 'key', the general goal is to have a separate branch instruction (if, switch) in each place where the probabilities are different enough to favor a different prediction. A wrong prediction costs about 15 cycles, but a correctly predicted branch costs less than 1, so you can put in quite a lot of these and still come out ahead.

Perhaps just ending every token handler with a best guess at what comes next?

  if (T->ptr[0] == most_likely_next) 
    handle_most_likely_next(T);
  else scan(T);

I don't have the syntax in my head, by I think you can use 'perf' to record mispredicted indirect branches and then display that sorted by caller.

acqq · on July 26, 2013

In every line there are more consecutive whitespaces at the begining unless the code isn't indented.

nkurz · on July 26, 2013

Great, that sounds like a fine opportunity for a "best guess": if newline, expect whitespace. Currently, I'd guess that 'whitespace except newline' is the default prediction for the switch() at line 451. I'd also guess that if not followed by a space or tab, a newline is frequently followed by another newline.

Maybe you could combine the case statements for space and newline, and do a branchless 'cmov' to increment loc.linenum if the match was a newline? This could be combined with loop to grab all the whitespace/newlines in one if you think whitespace is occurs in clumps.

acqq · on July 26, 2013

FWIW I'd never do a CPU dependent asm code in the lex.

nkurz · on July 26, 2013

I played with how to phrase that. You don't need to actually use a CMOV, just write code that allows your compiler to use one if supported.

  int tmp = real;
  if (foo == bar) {
    tmp = newval;
  }
  real = tmp;

I've seen it referred to as a 'hammock'[1], and at least for GCC and ICC it usually is a strong enough hint.

[1] http://people.engr.ncsu.edu/ericro/publications/conference_M...

jeremiep · on July 25, 2013

Indeed. At the very least, that scan() function could be broken down into smaller parts to be individually profiled.