Yes, litigation and compliance tend to lead the way when it comes to extracting meaning from pools of legal data. In my opinion, the single biggest obstacle to getting legal knowledge to play nice with software is that it's all siloed, for three reasons: (1) it's locked up in MS Word format, (2) it's confidential, and (3) legal documents lack shared conventions/standards.
The good news, though, is that legal documents tend to stay within a fairly narrow range of variation once you isolate a particular practice area or document type (e.g., leases, sales of goods, service agreements, motions).
I've always wanted to run a huge number of documents through Bayesian filters or something similar to develop some interesting classification rules, but it's damn hard to get a pool of representative documents that isn't strictly confidential.
The only non-confidential word-processed files I can think of are contracts made in the past couple of decades between companies that have both since declared bankruptcy. Either that or public EULAs.
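For what it's worth, here's roughly what I mean by running documents through a Bayesian filter. This is only a minimal sketch: a plain multinomial naive Bayes classifier with add-one smoothing, written from scratch. The class name, the labels ("lease", "sale_of_goods") and the training snippets are all made up for illustration; a real experiment would need the large, representative document pool discussed above.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # label -> number of training docs
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()                       # every word seen in training

    def train(self, text, label):
        self.doc_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        # Words never seen in training carry no evidence, so skip them.
        words = [w for w in tokenize(text) if w in self.vocab]
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in words:
                # Smoothed log likelihood of this word given the label.
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, invented snippets standing in for a real (non-confidential) corpus.
clf = NaiveBayes()
clf.train("Tenant shall pay rent to Landlord for the premises each month", "lease")
clf.train("Landlord may terminate this lease if Tenant fails to pay rent", "lease")
clf.train("Seller shall deliver the goods to Buyer and Buyer shall pay the price", "sale_of_goods")
clf.train("Risk of loss of the goods passes to Buyer upon delivery", "sale_of_goods")

print(clf.classify("Tenant must vacate the premises when the lease ends"))  # lease
print(clf.classify("Buyer may reject goods that are damaged on delivery"))  # sale_of_goods
```

With four toy sentences this proves nothing, of course. The interesting part would be looking at which words end up carrying the most weight per document type once you feed it a few thousand real agreements.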
http://www.lexmachina.org/