Those are Unlabelled Attachment Score (UAS) results, but Redshift quietly computes the dependency labels along the way, while parser.py does not. Running Redshift in unlabelled mode gives very fast parse times, but costs about 1% accuracy. The labels are really useful, both as features and as a way to divide up the problem space.
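As an aside, the UAS/LAS distinction is easy to state in code. Here's a minimal sketch, assuming parses are given as per-token `(head, label)` pairs; the function name and input format are hypothetical, invented for illustration, not the output format of either parser:

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) for a parsed sentence.

    gold and pred are parallel lists of (head, label) pairs, one per
    token -- a hypothetical format for illustration only.
    """
    assert len(gold) == len(pred) and gold
    # UAS: fraction of tokens whose head is correct.
    uas = sum(g_head == p_head
              for (g_head, _), (p_head, _) in zip(gold, pred))
    # LAS: fraction of tokens where head AND label are both correct.
    las = sum(g == p for g, p in zip(gold, pred))
    return uas / len(gold), las / len(pred)


gold = [(2, 'nsubj'), (0, 'root'), (2, 'dobj')]
pred = [(2, 'nsubj'), (0, 'root'), (2, 'iobj')]
print(attachment_scores(gold, pred))  # (1.0, 0.666...)
```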
The evaluation uses the _Stanford_ labels, whereas the main results in Zhang and Nivre refer to MALT labels. Z&N do report a single Stanford-labels accuracy in their results, of 93.5% UAS.
Sentences per second should be just over 100. I use k=8 with some extra features referring to words further down the stack, whereas Z&N use k=64. Right at the bottom of the post, you can find the commit SHA and the commands for the experiment.
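The k here is the beam width: after each transition, only the k highest-scoring partial analyses are kept, which is why a smaller k buys speed. Here's a minimal, generic sketch of that pruning loop; `beam_search`, `score_action`, and the `(score, history)` state representation are stand-ins I'm inventing here, not Redshift's actual data structures:

```python
import heapq

def beam_search(n_steps, actions, score_action, k=8):
    """Keep the k best action sequences after each transition.

    n_steps: number of transitions to take (fixed here for simplicity;
             a real parser stops when each state reaches a terminal
             configuration).
    actions: the candidate transitions available at each step.
    score_action: score_action(history, action) -> float, the model score.
    """
    beam = [(0.0, [])]  # (cumulative score, action history)
    for _ in range(n_steps):
        candidates = (
            (score + score_action(history, action), history + [action])
            for score, history in beam
            for action in actions
        )
        # Prune to the k highest-scoring partial analyses.
        beam = heapq.nlargest(k, candidates, key=lambda c: c[0])
    return max(beam, key=lambda c: c[0])


# Toy usage: a scorer that slightly prefers 'S' (shift) over the arcs.
best = beam_search(4, ['S', 'L', 'R'],
                   lambda hist, a: 1.0 if a == 'S' else 0.5, k=8)
print(best)
```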