Hacker News new | past | comments | ask | show | jobs | submit login

Looks like the same kind of problem that plagues interpreter loops: that top-level switch statement is causing a branch misprediction on almost every hit.

The solution i've seen was to have each opcode handler determine what the next handler would be and directly jump there.




The problem is that most of the time the main switch of lex isn't entered in the loop, instead it is called from the many different points in the parser (a token is considered, something is done, then there's a call for a next token). The only case I see at the moment where there's loop over the switch is when doing the whitespaces, that can maybe be slightly improved.

The whole topic of lexing turning out to be slow is really a thing that should be measured. Really intriguing.


Glancing at the source, I think the whitespace parsing could be improved. Since whitespace will rarely if ever (?) be followed by more whitespace, and since most other things are likely to be followed by whitespace, just having both "parse_expecting_whitespace()" and "parse_probably_not_whitespace()" should improve the prediction.

Other tokens probably have similar but less strong "preferences". It's possible that a smart enough branch predictor will learn these on its own, but changing things so that each token ends with its own predicted branch would likely be a win.

Since the branch predictor uses the address of the branch as the 'key', the general goal is to have a separate branch instruction (if, switch) in each place where the probabilities are different enough to favor a different prediction. A wrong prediction costs about 15 cycles, but a correctly predicted branch costs less than 1, so you can put in quite a lot of these and still come out ahead.

Perhaps just ending every token handler with a best guess at what comes next?

  if (T->ptr[0] == most_likely_next) 
    handle_most_likely_next(T);
  else scan(T);
I don't have the syntax in my head, by I think you can use 'perf' to record mispredicted indirect branches and then display that sorted by caller.


In every line there are more consecutive whitespaces at the begining unless the code isn't indented.


Great, that sounds like a fine opportunity for a "best guess": if newline, expect whitespace. Currently, I'd guess that 'whitespace except newline' is the default prediction for the switch() at line 451. I'd also guess that if not followed by a space or tab, a newline is frequently followed by another newline.

Maybe you could combine the case statements for space and newline, and do a branchless 'cmov' to increment loc.linenum if the match was a newline? This could be combined with loop to grab all the whitespace/newlines in one if you think whitespace is occurs in clumps.


FWIW I'd never do a CPU dependent asm code in the lex.


I played with how to phrase that. You don't need to actually use a CMOV, just write code that allows your compiler to use one if supported.

  int tmp = real;
  if (foo == bar) {
    tmp = newval;
  }
  real = tmp;
I've seen it referred to as a 'hammock'[1], and at least for GCC and ICC it usually is a strong enough hint.

[1] http://people.engr.ncsu.edu/ericro/publications/conference_M...


Indeed. At the very least, that scan() function could be broken down into smaller parts to be individually profiled.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: