Those are Unlabelled Attachment Score (UAS) results, but Redshift quietly computes the dependency labels along the way, while parser.py does not. Running Redshift in unlabelled mode gives very fast parse times, but costs about 1% accuracy. The labels are really useful, both as features and as a way to divide up the problem space.
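As an aside, the UAS/LAS distinction is easy to state in code. Here's a minimal sketch, assuming parses are given as per-token `(head, label)` pairs; the function name and input format are hypothetical, invented for illustration, not the output format of either parser:

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) for a parsed sentence.

    gold and pred are parallel lists of (head, label) pairs, one per
    token -- a hypothetical format for illustration only.
    """
    assert len(gold) == len(pred) and gold
    # UAS: fraction of tokens whose head is correct.
    uas = sum(g_head == p_head
              for (g_head, _), (p_head, _) in zip(gold, pred))
    # LAS: fraction of tokens where head AND label are both correct.
    las = sum(g == p for g, p in zip(gold, pred))
    return uas / len(gold), las / len(pred)


gold = [(2, 'nsubj'), (0, 'root'), (2, 'dobj')]
pred = [(2, 'nsubj'), (0, 'root'), (2, 'iobj')]
print(attachment_scores(gold, pred))  # (1.0, 0.666...)
```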
The evaluation uses the _Stanford_ labels, whereas the main results in Zhang and Nivre refer to MALT labels. Z&N do report a single Stanford-labels accuracy in their results, of 93.5% UAS.
Sentences per second should be just over 100. I use k=8 with some extra features referring to words further down the stack, whereas Z&N use k=64. Right at the bottom of the post, you can find the commit SHA and the commands for the experiment.
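The k here is the beam width: after each transition, only the k highest-scoring partial analyses are kept, which is why a smaller k buys speed. Here's a minimal, generic sketch of that pruning loop; `beam_search`, `score_action`, and the `(score, history)` state representation are stand-ins I'm inventing here, not Redshift's actual data structures:

```python
import heapq

def beam_search(n_steps, actions, score_action, k=8):
    """Keep the k best action sequences after each transition.

    n_steps: number of transitions to take (fixed here for simplicity;
             a real parser stops when each state reaches a terminal
             configuration).
    actions: the candidate transitions available at each step.
    score_action: score_action(history, action) -> float, the model score.
    """
    beam = [(0.0, [])]  # (cumulative score, action history)
    for _ in range(n_steps):
        candidates = (
            (score + score_action(history, action), history + [action])
            for score, history in beam
            for action in actions
        )
        # Prune to the k highest-scoring partial analyses.
        beam = heapq.nlargest(k, candidates, key=lambda c: c[0])
    return max(beam, key=lambda c: c[0])


# Toy usage: a scorer that slightly prefers 'S' (shift) over the arcs.
best = beam_search(4, ['S', 'L', 'R'],
                   lambda hist, a: 1.0 if a == 'S' else 0.5, k=8)
print(best)
```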