I once wrote a license crawler for VersionEye that recognises the most common licenses in README files on GitHub. It doesn't crawl all of GitHub, though: it only looks at projects that are published through a package manager but don't provide license information there. For example, I use it to fill in the missing license information for RubyGem projects that don't declare a license on RubyGems.
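To give a rough idea of what "recognising a license in a README" can look like, here is a minimal sketch in Python. It is not the VersionEye code; the pattern table, the SPDX-style labels, and the `detect_license` function are all assumptions made up for illustration, and a real crawler would need many more patterns and better handling of ambiguous text.

```python
import re
from typing import Optional

# Hypothetical pattern table: SPDX-style label -> phrases that commonly
# appear in README files. Illustrative only, not VersionEye's actual rules.
LICENSE_PATTERNS = [
    ("MIT", re.compile(
        r"\bMIT licen[sc]e\b|permission is hereby granted, free of charge",
        re.IGNORECASE)),
    ("Apache-2.0", re.compile(
        r"apache licen[sc]e,?\s+(version\s+)?2(\.0)?",
        re.IGNORECASE)),
    ("GPL-3.0", re.compile(
        r"gnu general public licen[sc]e,?\s+(version\s+|v)?3|\bGPL-?v?3\b",
        re.IGNORECASE)),
    ("BSD", re.compile(
        r"\bBSD licen[sc]e\b|redistribution and use in source and binary forms",
        re.IGNORECASE)),
]


def detect_license(readme_text: str) -> Optional[str]:
    """Return the label of the first matching license pattern, or None."""
    # Collapse line breaks and repeated spaces so phrases that wrap
    # across lines in the README still match.
    normalized = " ".join(readme_text.split())
    for label, pattern in LICENSE_PATTERNS:
        if pattern.search(normalized):
            return label
    return None


if __name__ == "__main__":
    sample = "## License\n\nReleased under the MIT License."
    print(detect_license(sample))
```

The sketch returns the first match only, so a README that mentions several licenses would need extra logic to decide which one actually applies to the project.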