
I tried to exclude copyright lines as much as I could. I used "license markers" for that, but I might have missed something.

Here is more information about it: https://github.com/anvaka/common-words#how




That's good to hear. I didn't look into it in much depth; I just thought it was strange that 'the' ranked so high for C++, so I clicked on it to see example usage and got things like:

   ** use the contact form at http://qt.digia.co/contact-us.

   furnished to do so, subject to the following conditions:

   * This file is part of the LibreOffice project.

   // with this library; see the file COPYING3. If not see
    
So I assumed license text had not been excluded.

From a brief look at the source, I think the license-marker approach still leaves quite a few lines from each license in the data (see the examples above).
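A minimal sketch of why that can happen, assuming markers are matched one line at a time. The marker list, function names, and sample lines are hypothetical illustrations, not taken from the common-words source. A line like "furnished to do so, subject to the following conditions:" contains no marker on its own, so a per-line filter keeps it; dropping the whole comment block when any of its lines matches catches it.

```python
# Hypothetical marker list for illustration; the real common-words list may differ.
LICENSE_MARKERS = ("copyright", "license", "licence", "warranty", "gnu general public")


def has_marker(line: str) -> bool:
    """Case-insensitive check for any license marker in a single line."""
    lowered = line.lower()
    return any(marker in lowered for marker in LICENSE_MARKERS)


def filter_lines(comment_lines: list[str]) -> list[str]:
    """Per-line filter: drop only the lines that themselves contain a marker."""
    return [line for line in comment_lines if not has_marker(line)]


def filter_blocks(comment_blocks: list[list[str]]) -> list[list[str]]:
    """Per-block filter: drop an entire comment block if any line in it has a marker."""
    return [block for block in comment_blocks
            if not any(has_marker(line) for line in block)]


# Sample lines: an MIT-style header and an ordinary code comment.
mit_header = [
    "Permission is hereby granted, free of charge, to any person obtaining a copy",
    "furnished to do so, subject to the following conditions:",
    "The above copyright notice and this permission notice shall be included in",
]
code_comment = ["Returns the number of words in the buffer."]

# The per-line filter still leaks the first two license lines into the word counts.
print(filter_lines(mit_header))
# The per-block filter drops the whole header and keeps the ordinary comment.
print(filter_blocks([mit_header, code_comment]))
```

The trade-off is that block-level filtering can also throw away a legitimate comment that merely mentions "license", so it over-excludes a little in exchange for not leaking boilerplate.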



