I think they're deriving the additional features at prediction time. The test and patch don't contain all the information you need to compute the features, but they contain sufficient information when combined with a big static lookup table. At least that's the way I read it; agree it could be clearer.
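Roughly what I have in mind, as a sketch only: the table names, keys, counts, and the `derive_features` helper below are all hypothetical and not taken from the article. The point is just that a (TEST, PATCH) tuple can be expanded into the quoted features by joining against history that was precomputed offline.

```python
from pathlib import PurePosixPath

# Hypothetical static lookup tables, built offline from CI and VCS history.
# The keys and counts are made up for illustration.
FAILURES_WHEN_TOUCHED = {("tests/test_net.py", "src/net/socket.c"): 7}
COMODIFICATIONS = {("tests/test_net.py", "src/net/socket.c"): 12}


def directory_distance(a: str, b: str) -> int:
    """Directory steps between the folders that contain files a and b."""
    pa = PurePosixPath(a).parent.parts
    pb = PurePosixPath(b).parent.parts
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def derive_features(test: str, patch_files: list[str]) -> list[float]:
    """Expand a (TEST, PATCH) tuple into a numeric feature vector."""
    past_failures = sum(FAILURES_WHEN_TOUCHED.get((test, f), 0) for f in patch_files)
    comodified = sum(COMODIFICATIONS.get((test, f), 0) for f in patch_files)
    # default=0 covers an empty patch, which is an assumption of this sketch.
    min_distance = min((directory_distance(test, f) for f in patch_files), default=0)
    return [float(past_failures), float(comodified), float(min_distance)]
```

The resulting vector is what would actually be handed to the model, so the caller only ever supplies the test name and the list of files in the patch.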
> In the past, how often did this test fail when the same files were touched?
> How far in the directory tree are the source files from the test files?
> How often in the VCS history were the source files modified together with the test files?
But at prediction time all they input is a tuple (TEST, PATCH), and XGBoost works fine without the additional features?