That's good to hear. I didn't look into it in much depth. I just thought it was strange that 'the' ranked so high for C++, so I clicked on it to see example usage and got things like:
** use the contact form at http://qt.digia.co/contact-us.
furnished to do so, subject to the following conditions:
* This file is part of the LibreOffice project.
// with this library; see the file COPYING3. If not see
So I assumed licences had not been excluded.
Having had a brief look at the source, I think the licence-marking approach still leaves in quite a few lines from each licence (see the examples above).
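For comparison, a per-line keyword filter would drop lines like the ones quoted above even when they fall outside a marked licence block. This is a minimal sketch, and it is not what common-words actually does. The phrase list and the function name `strip_licence_lines` are hypothetical, chosen only to match the examples in this thread:

```python
import re

# Hypothetical phrases that often appear in licence headers. This is an
# illustrative assumption, not the list used by common-words.
LICENCE_PATTERNS = [
    r"\bcontact-us\b",
    r"\bsubject to the following conditions\b",
    r"\bthis file is part of\b",
    r"\bsee the file copying\d*\b",
    r"\blicen[cs]e\b",
    r"\bcopyright\b",
]
LICENCE_RE = re.compile("|".join(LICENCE_PATTERNS), re.IGNORECASE)


def strip_licence_lines(lines):
    """Return only the lines that match none of the licence phrases."""
    return [line for line in lines if not LICENCE_RE.search(line)]


if __name__ == "__main__":
    samples = [
        "** use the contact form at http://qt.digia.co/contact-us.",
        "furnished to do so, subject to the following conditions:",
        "* This file is part of the LibreOffice project.",
        "// with this library; see the file COPYING3. If not see",
        "int x = the_value;",
    ]
    # Only the ordinary code line survives the filter.
    print(strip_licence_lines(samples))
```

The trade-off is false positives: any real code or comment that mentions "licence" or "copyright" would also be dropped, so a per-line filter may be best used alongside block marking rather than as a replacement for it.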
Here is more information about how it works: https://github.com/anvaka/common-words#how