Free the Law: all U.S. case law online (law.harvard.edu)
147 points by fitzwatermellow on Jan 22, 2016 | 44 comments



No, not "freely accessible online", not until the 8-year exclusivity agreement with Ravel expires. It's a pay service with a free tier.[1]

"Under the Harvard-Ravel agreement, Ravel is paying all of the costs of digitizing case law. HLS owns the resulting data, and Ravel has an obligation to offer free public access to all of the digitized case law on its site and to provide non-profit developers with free ongoing API access (Ravel may charge for-profit developers). Ravel will have a temporary exclusive commercial license for a maximum of eight years."

"For the duration of that commercial license, there will be a restriction on bulk download of the case law, with some notable exceptions. Harvard may provide bulk access to members of the Harvard community and to outside research scholars (so long as they accept contractual prohibitions on redistribution)."[2]

[1] https://www.ravellaw.com/plans

[2] http://lj.libraryjournal.com/2015/12/oa/harvard-launches-fre...


If you want free access to all the law, YC's Casetext.com provides it, free for everyone, right now.

Try searching for cases that interest you, and you can read about them and lots of comments from attorneys.

Disclaimer - I know most of the team, but I think anyone would agree after reviewing casetext.com, ravel and the other companies trying to do this that casetext is by far the most open and the only real free option for everyone.


There are actually several others. "Free the Law" is not even the first Ivy League entry in this space. Cornell has the Legal Information Institute (https://www.law.cornell.edu/), which is entirely free.

In the for-profit space, FindLaw has been around forever.

There's still nothing yet that really competes with Westlaw or LexisNexis for professionals though. It's not simply a matter of indexing all the decisions and adding a few hyperlinks. The stuff they are redacting for copyright reasons is stuff that you want when you are doing real research.


> There's still nothing yet that really competes with Westlaw or LexisNexis for professionals though.

I have a few friends at big law and they have 100% access to Westlaw or Lexis which means they do not need any of the alternatives at this point. That said, many, many, many small to mid-sized firms cannot afford access to Westlaw or Lexis which is why these various initiatives and companies are so important.

At a fundamental level, these services all help improve the odds you will win a case when hiring a smaller law firm. A lot of people comment here about how there are not society improving companies anymore (uber for food delivery), but I am a believer that each of these companies in this space are fundamentally important to our society.


You are missing the point. Why is it that in a "Democracy" we have to purchase access to the law through a private party? Ignorance of the law is no defense, but I have to pay to relieve that ignorance? Come on!


> Why is it that in a "Democracy" we have to purchase access to the law through a private party?

You don't. What you have to purchase through a private party is electronic access to the annotations that the private party adds to the cases. If you just want the public domain cases you don't need to purchase anything from a private party.

First, there is PACER [1]. PACER is not free, but it is pretty cheap and it is owned and operated by government, not by a private party. The documents retrieved through PACER are usually public domain, and so once someone gets a particular case they can legally share it. I don't have a link handy, but I believe there are startups that gather together cases contributed this way to build a growing body of PACER material that you can get without having to pay even the small PACER fees.

You can also get the material for free at government and school law libraries that are generally open to the public.

[1] https://www.pacer.gov


RECAP: https://www.recapthelaw.org/

Scrapes PACER when you use it, injects the docs into the Internet Archive.


PACER is not cheap, nor is it comprehensive. Furthermore, it is not feasible to find out what the law is on PACER. If you already know what cases are important, then you can look them up on PACER I suppose. Or you could go to your local law library (presumably you are a member for a couple hundred bucks a year, right?) and look them up on paper. Why should we need a comprehensive electronically searchable database of casetext or legislation when paper will do just fine? Frankly as a small-time lawyer, your argument that PACER is sufficient or cheap just does not ring true for me. If it did, I wouldn't pay so much for Lexis / Westlaw.


The laws and legal opinions are public domain. The issue is distribution. Pre-Internet, courts were not in the business of publishing books of their opinions. Making the opinions available at the court house was deemed to be sufficient public access. That treatment was consistent with pretty much every other kind of public record (you had to go into the county clerk's office to view deeds, etc). Post-Internet, many courts publish their opinions online. E.g. the Virginia Supreme Court and Court of Appeals have opinions going back to 1995: http://www.courts.state.va.us/opinions/home.html.

What private parties provide is a value-add. That's what you purchase access to. They spend the money and go to the effort of digitizing, indexing, and annotating, case law and statutes going back a couple of hundred years. No single public entity can do that, because the 51 individual federal and state court systems are part of 51 distinct sovereign entities, and as a practical matter individual courts even within a single sovereign operate largely autonomously of each other.


Casetext makes the law 100% free...


Big fan of Jake and his team. Pretty sure they're stuck licensing case data, like everybody else. That means no unrestricted bulk downloads of permissively licensed case data for end users.

That data is "free" insofar as you can read it on Casetext without payment. That's great---I do it a lot---but it's a far cry from what Public Resource aimed to do and what Harvard has now promised to do.

In some situations, like LexisNexis's relationship to the State of California, there were and are already "walled garden" interfaces where you can go to read and download case law one case at a time. That hasn't opened the playing field for viable, comprehensive alternatives to Wexis, which is probably why Wexis can be convinced to offer them.

If and when the data become truly open, I can't wait to see where Casetext goes.


Operations may not wind up being all that different in spirit than startups that build open source products. There's a free platform and then services, features and tools for sale that serve the needs of professionals. For example, an API probably isn't an ideal product for a typical law firm. Something that runs on Windows and iPhones might offer a better value proposition in that market segment. The analogy can be extended, of course until it breaks.

I intentionally didn't use a copyleft software example. Copyleft doesn't fit because the law itself is in the public domain [IANAL] and that is why the texts can be put under license by Harvard.

Anyway, it looks like an improvement over the current state of affairs. YMMV.


Note that most of the states use lexis or someone and have exclusive rights deals with them as publishers. You can't really easily get feeds of cases (you can get the reporters, and then scan them, but this is not up to date).

So whatever they do will be out of date.


If you haven't seen CourtListener, you should!

It's incomplete, in that most court website feeds don't have Bluebook citable pagination. The US Supreme Court and a few states with strong neutral citation rules are exceptions, but, alas, not the rule.


Ok, that's complicated enough that we took "freely accessible" out of the title above.


"Freemium?"


They have to digitize, in part, because all of the states have exclusive publication/etc agreements with westlaw or lexis or ....

So you can't get a feed of cases from pretty much anywhere, and often, you aren't allowed to bulk download, etc.

Plenty of folks have digitized all the data Harvard is talking about here. They are not first. Carl Malamud, for example, has scanned all the federal reporters and tons and tons of other cases. http://radar.oreilly.com/2007/08/carl-malamud-takes-on-westl...

and

https://bulk.resource.org/courts.gov/

(My experience here is from back in the early 2000's working on getting pacer/states/etc to open up all of this data, so we could get it into google scholar and elsewhere. Often, they were willing to sell it to us, but they would not let us pay them pretty much any amount of money to make it just open and freely available, which is what we really wanted. Things have not gotten better, sadly, and in fact, have gotten worse)


You need pre-internet cases from the era before courts had websites. Also, court rules generally require citation formats that use page numbers from the official reporters. In the past, West has actually tried, unsuccessfully, to claim copyright to its page numbers (West Publishing Co. v. Mead Data Central, 799 F.2d 1219).

There is a lot of other editorial content that actual legal researchers really kind of need, but which they are going to redact because it isn't part of the reported decisions and is owned by the publisher. (Things like unofficial syllabi, indexed lists of holdings, subject matter, etc.)
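To make the page-number point concrete, here is a minimal sketch of how a reporter citation like the one above breaks down into volume, reporter, and first page. The regular expression and the `Citation` and `parse_citation` names are my own simplifications for illustration, not anything a court or publisher specifies, and real citation formats have many more variations than this handles.

```python
import re
from typing import NamedTuple, Optional


class Citation(NamedTuple):
    volume: int      # reporter volume, e.g. 799
    reporter: str    # reporter abbreviation, e.g. "F.2d"
    page: int        # first page of the opinion in that volume


# Simplified, hypothetical pattern: <volume> <reporter abbreviation> <page>.
# Real citation grammars have many more cases than this handles.
_CITE_RE = re.compile(r"\b(\d+)\s+([A-Z][A-Za-z0-9. ]*?)\s+(\d+)\b")


def parse_citation(text: str) -> Optional[Citation]:
    """Return the first volume/reporter/page citation found in text, or None."""
    match = _CITE_RE.search(text)
    if match is None:
        return None
    volume, reporter, page = match.groups()
    return Citation(int(volume), reporter, int(page))


print(parse_citation("West Publishing Co. v. Mead Data Central, 799 F.2d 1219"))
```

The page field is the part that ties a citation to a specific printed reporter, which is why a feed of opinions without reporter pagination is hard to cite in a filing.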


I also tried and failed to "buy out" existing scans for permissive public licensing about five years ago. I actually approached the Law Library Microsoft Consortium, which none of the librarians I knew were even aware of, and couldn't get anywhere.

Frankly, even now, with Harvard's announcement in hand, I doubt I could. It just wasn't a thing the institution knew how to do. "Does not compute."

I'm sorry to hear things have gotten worse.


Autocorrection changed "Microfilm" to "Microsoft".


They are projecting to have Federal and CA, NY, MA, IL, TX done in 2016, and the rest of the states in 2017. I'm curious why those particular states are being done first.

In particular, I'd have expected Delaware to be in the first group, because so many public companies are incorporated there, and so the decisions of its courts on corporate and stockholder issues have major national importance.

Offhand, I can't think of why MA or TX would be worked on ahead of DE. Of course it is possible that the volume of material from each state is a factor...it could be that DE is being done in the first group but has a lot of material so won't finish in 2016. I've never taken a look at the volume of each state's output and so have no idea which state courts handle the most cases.


I'm surprised Delaware is even anywhere near the top. Notwithstanding corporate law, the number of cases handled annually by each of the listed states is at least one, perhaps two, orders of magnitude higher than Delaware's. New York and California, for example, have nearly 80 times as many lawyers as Delaware (https://lawschooltuitionbubble.wordpress.com/original-resear...). Also consider the target audience. Corporate law litigants are more likely to have access to private databases. Criminal law precedent should be a focus here, and most every other state in the union is going to have more criminal law precedent than Delaware.

[edit: acknowledging DE is in the hopper.]


"Ravel intends to make case law available as it comes through from HLS, and the California law should be online by the end of 2015. New York cases are next; Delaware, Massachusetts, Illinois, and Texas are close behind." [0]

Sounds like DE is up there.

[0] http://lj.libraryjournal.com/2015/12/oa/harvard-launches-fre...


My guess: MA because that's where Harvard is located. Others because they have large populations.


Large populations of corporations: Delaware, NY, CA, Illinois

Large populations of lawsuits: Texas (IP), MA (civil rights)

People populations are less relevant.


There is quite a bit of IP law handled in Texas thanks to patent trolls in East Texas. So I can see there being some interest in getting it covered. MA? Maybe it is interesting from a historical perspective since it is one of the older areas of the US.


Those patent cases are in federal court and so will be part of the Federal collection, not the Texas collection. The Texas collection is decisions of the Texas state courts, not Federal courts that are located in Texas.


Maybe because Harvard is in Massachusetts? ;-)


Ravellaw (https://www.ravellaw.com/) has built an interesting knowledge graph visualization tool for court cases.

https://vimeo.com/127559698

So one truly fascinating aspect of legal practice is that we tend to operate in the gray areas. However, the traditional way of researching case law – reviewing a list of cases returned based on your query – does little to help you sort through the mess.

With data visualization, you not only see the cases, but you see the relationship between cases, and how the cases work together. Among the most significant benefits, the data visualization elements of Ravel Law will help you narrow your research to the most relevant cases more quickly, while also helping you find those cases and arguments that, for whatever reason, didn’t rank in the top of your search.

http://www.thecyberadvocate.com/2015/09/30/data-visualizatio...

The value in this appears to relate concepts from one case to others through the visuals on the graph. The larger the circle, the more important the case will be. Lines connect one circle to another circle and it’s very easy to see which major cases are connected to other major cases. This is like a citator on steroids in my opinion as one can get to this point with a simple search. That means multiple steps in developing the analysis that finds the value and use of related cases. The snippets help immensely in determining which related cases are of value.

http://llb2.com/2014/02/04/a-very-brief-look-at-ravel-law/


I'm curious how LexisNexis is going to attack this breach of their monopoly. Do they have patents on case law search?


LexisNexis doesn't have a monopoly. First, Thomson West's database is just as comprehensive. Second, most of the decisions are indexed online elsewhere (Justia, Google Scholar). Third, almost all the underlying opinions are available elsewhere.

Contrary to popular belief, the dominance of Lexis/West doesn't come from having access to information other people don't. It comes from decades of experience in how to index, annotate, categorize, and cross-link case law, along with value-add services like being able to grab the actual scan of the printed page (OCR sometimes has errors!) or have a runner physically pull a document from a court docket.


No doubt LexisNexis and WestLaw have certain value added services that have certainly been developed through the years based on their positions as industry leaders, but you can't deny certain business practices that box out any potential free competitor.

Reflect back on your law school days. I bet you received a free student account to either or both LexisNexis and WestLaw (accounts worth thousands of dollars given for free through 3 years to train students on their programs); probably free training to your class from certified company representatives; donations to your school/creation of computer labs for exclusive rights to the students; etc...

I recognize your point about not being a monopoly, but that is in the Google - competition is only a click away - sense of the word, there are all kinds of anti-competitive practices and the marketplace reflects them.

If you want further indication of West's willingness to engage in anti-trust/anti-competitive behavior, take a look at Rodriguez v. West Publishing Corporation and Kaplan, Inc., where West was found to collude with Kaplan to create monopolies in Bar prep and LSAT review classes.


My agency just dumped Westlaw for cost reasons and now we are exclusively Lexis. Lexis is inferior in many ways. The subject indexes are not nearly as comprehensive or usable (makes sense, since West has a huge head start with this with its Key Number System.) The case summaries are too wordy.

Westlaw and Lexis both get big bucks to invest in their databases but then they are not at parity. So I doubt that some free (beer or speech) competitor is going to come along without those same resources and mount a serious challenge in the Lexis or Westlaw strongholds.

However, there is a reason Westlaw and Lexis are both going to Google-like interfaces with their next generation products (Westlaw Next and Lexis Advance). Attorneys are just going into Google now. Certainly if I'm looking for a news article I'm going to Google first, where years ago one might have done LexisNexis first. Even having to type my Lexis password is too big an impediment when I can get stuff on Google.

Also, when I don't need the firepower of Lexis I often go elsewhere. A lot of my work is just the CFR or US Code and the government has those for free, though the search capability is not as good.

So overall I doubt anti-competitive behavior explains much of what dominance Westlaw and Lexis still do have (and I do think it's shrinking.) They have invested lots of money and these are professional tools that need lots of investment. The free (beer or speech) competitors aren't going to compete on the elements that need lots of investment, but they will nibble away at the things that Westlaw and Lexis were overkill for.


Sure statutes and the USC are easier on Google. In fact I have never used Lexis or Westlaw to look those up. I disagree about articles, you can certainly find legal articles on google on various topics, but the results are a shadow of Westlaw and Lexis.

But try searching for a case on Google - not by citation, not by case style - by natural language or connectors like Lexis or Westlaw. Results are worse than useless, because you just lost time. Any startup could do case law search 10 times better than Google, but they could never compete in the market because lawyers were trained on Lexis and/or Westlaw, which are entrenched in law schools training the next generation of lawyers.


Their value is in categorization, summarization, searching, and "shepardizing." Shepardizing checks to see if a later case mentions, contradicts, conflicts, distinguishes etc. the case.

Currently most of this is actually done by humans. Eventually more and more will be done by computer analysis. But a quick Google-like algorithm doesn't really work. Google Scholar has most case law and it's just not effective.
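For anyone unfamiliar with the idea, the lookup side of "shepardizing" can be sketched as a reverse index from a cited case to the later cases that treat it. This is only a toy illustration: the case names are placeholders, the treatment labels and the `NEGATIVE_TREATMENTS` set are assumptions of mine, and the hard part (deciding the treatment in the first place) is exactly the human editorial work described above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (citing case, cited case, treatment). The labels are illustrative only;
# in practice they would be assigned by human editors.
Edge = Tuple[str, str, str]

# Hypothetical set of labels that should warn a researcher.
NEGATIVE_TREATMENTS = {"overruled", "criticized"}


def build_citator(edges: Iterable[Edge]) -> Dict[str, List[Tuple[str, str]]]:
    """Index every cited case by the later cases that mention it."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for citing, cited, treatment in edges:
        index[cited].append((citing, treatment))
    return dict(index)


def has_negative_treatment(index: Dict[str, List[Tuple[str, str]]], case: str) -> bool:
    """True if any later case treats `case` with a negative label."""
    return any(t in NEGATIVE_TREATMENTS for _, t in index.get(case, []))


# Placeholder data, not real cases.
edges = [
    ("Case B", "Case A", "followed"),
    ("Case C", "Case A", "distinguished"),
    ("Case D", "Case B", "overruled"),
]
citator = build_citator(edges)
print(citator["Case A"])
print(has_negative_treatment(citator, "Case B"))
```

The index itself is trivial to build once the edges exist; what Lexis and West sell is the labeled edges.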


Patents expire in 20 years, and Lexis-Nexis has been doing this for decades now.


And no company would ever create strategic business method patents at periodic intervals to keep their lock on an industry?

I can easily imagine the USPTO approving silly things like "A method to search case law using IPv6".


No response.

---

To: Erik Eckholm <eckholm@nytimes.com>

From: Aaron Greenspan

Date: October 30, 2015 at 1:31 PM

Subject: Concerns over Ravel/HLS Deal

Mr. Eckholm,

We just briefly spoke on the phone about your article (http://www.nytimes.com/2015/10/29/us/harvard-law-library-sac...). I am a Harvard College ’04-’05 alum, one of Professor Zittrain’s former students (I actually had to fight the administration to be permitted entry into his Law School course in 2001), and one of the first people Ravel tried to hire, because I am a programmer and I run a legal database called PlainSite (http://www.plainsite.org), which competes with them and receives about 16,000 unique hits daily worldwide. I was also a CodeX Fellow at Stanford Law School in 2012-2013, which is a program at Stanford that Daniel Lewis and Nik Reed are now also affiliated with. I tell you all of this only to point out that I am generally quite familiar with the principles, technologies and individuals involved here.

I’ve now corresponded with Jonathan Zittrain and Adam Ziegler at HLS, the latter by phone earlier today. I have brought to their attention a number of concerns, none of which have been resolved in my mind. They are as follows:

1. Harvard University is a Massachusetts not-for-profit organization. Its investment in Ravel, a for-profit corporation, via its XFund venture capital arm, and its subsequent contract with Ravel to earn "proceeds" (HLS’s term) from that relationship, involves profit. The University could in theory lose its tax-exempt status over this deal. This is not the same as the Harvard Management Corporation investing in for-profit corporations to further the University’s mission by earning capital gains and/or dividends—this is an exchange of cash for assets that Harvard claims it owns (even though case materials are public domain) and a contractual promise to monetize those assets through a for-profit company on an ongoing basis.

2. Worse yet, the deal involves profit from the withholding of public access to legal data, which is the precise ill that this relationship is nominally supposed to and claims to cure. In reality, it only exacerbates it by legitimizing, with all of Harvard’s imprimatur, the monopolistic legal information model that has dominated the nation’s judiciary for the past century and a half.

3. Professor Zittrain wrote an entire book on the dangers of internet lock-in and monopolies, yet his actions here are helping to create exactly the kind of monopoly he has become well known for warning about. According to Adam Ziegler’s recent post on the HLS Library blog (http://etseq.law.harvard.edu), there are to be "bulk access limitations" and "contractual prohibitions on redistribution." This is inconsistent with precedent concerning openness to court records and First Amendment law. That aside, what will these restrictions look like exactly? We don’t know, because…

4. ...Adam Ziegler told me that the contract with Ravel is not available for public examination and he did not know when it would be (if ever). He did read me a portion of the contract over the phone, which cited "non-commercial developers," and challenged me to come up with better wording. That’s easy. I don’t know what a "non-commercial developer" is, but I do know what a "non-profit organization" is. As an individual, I am a software developer who is the CEO of a for-profit corporation in a joint venture with a 501(c)(3) non-profit organization which together maintain PlainSite. Does that make me a "non-commercial developer?" Although Mr. Ziegler insisted that the contract was not subject to interpretation because it is simply clear enough already, I strongly disagree, as I expect any lawyer would. All contracts are subject to interpretation. The contract needs to be posted.

5. One of Ravel’s investors is Cooley LLP, a law firm in the Bay Area. Based on what Daniel and Nik have told me in the past, Cooley has early access to Ravel’s software. Essentially this means that Harvard Law School is giving one particular law firm an advantage, which I imagine must violate a number of its own policies, and seems wrong on the surface.

6. Professor Zittrain claims it would have taken 8 years to raise the money that Ravel is providing for this effort. This is extremely difficult to believe. Although Mr. Ziegler refused to disclose how much money is actually involved, we can safely assume it is in the $5 million range given that Ravel has only raised just under $10 million and has had employees to pay for several years. Recently, a single donor gave Harvard University’s engineering school $400 million, as your own newspaper reported (http://www.nytimes.com/2015/06/04/education/john-paulson-giv...). Harvard is also in the middle of a $6 billion-and-counting capital campaign, as reported by The Crimson (http://www.thecrimson.com/article/2015/9/18/capital-campaign...). Are we really to believe that the number one law school in the country (by some measures, anyway) could not scrape together the cash to buy its own scanners, or that it does not have scanners already? Are high speed scanners even that expensive? Here’s one on eBay for $1,450:

http://www.ebay.com/itm/KODAK-i610-PASS-THROUGH-HIGH-SPEED-D...

7. Mr. Ziegler could not answer my question as to why a consortium of non-profits was not consulted ahead of time. I know many that would have been eager to assist, likely including the Internet Archive in San Francisco, which already has several scanners.

8. Though I do not speak for them, I did notice that Harvard and Ravel seem to have nearly appropriated the name "Free Law Project," which is actually a project and non-profit organization at Berkeley that took over from work at Princeton. See http://www.freelawproject.org and http://www.courtlistener.com.

9. The Harvard Gazette has falsely reported, "The ‘Free the Law’ initiative will provide open, wide-ranging access to American case law for the first time in U.S. history." (See http://news.harvard.edu/gazette/story/2015/10/free-the-law-w...) I have been in regular contact with Jonathan Zittrain, Harry Lewis (an XFund Advisor who was Dean during my freshman year) and others at HLS about PlainSite since I brought the idea to them in 2011, almost as soon as I started working on it. Additionally, CourtListener (from the group at Berkeley) has also been in operation for years, offering open, wide-ranging access to American case law. There’s also Google Scholar, which is free and certainly more wide-ranging than Ravel.

10. Ravel is, to the best of my knowledge, unprofitable. It remains unclear why Harvard would place its bets on an unprofitable startup, rather than solicit donations for a project—as it is so adept at doing—in order to ensure maximum sustainability.

Mr. Ziegler attempted to dismiss the above concerns on the grounds that we still both agree on the greater goal of open access to law. I certainly have done all that I can to promote open access to legal information, including developing prototypes for digital legal data standards and suing the courts themselves (http://www.plainsite.org/dockets/29himg3wm/california-northe...). But if we both agree on this greater goal, then why has HLS been almost completely unresponsive to requests for cooperative assistance for the past four years, while this deal was being negotiated in secret?

To be clear, Harvard is not the only institution that has made highly questionable and insincere claims about its legal transparency efforts. Stanford CodeX claims to support open access to the law, yet it is now directly sponsored by Thomson Reuters, the parent company of West Publishing, and its "innovation contests" involve pledges not to redistribute case materials. But I would expect the Times to be able to distinguish between academic puffery and genuine efforts to improve the state of our incredibly broken legal system.

Aaron

PlainSite | http://www.plainsite.org


"Its investment in Ravel, a for-profit corporation, via its XFund venture capital arm, and its subsequent contract with Ravel to earn "proceeds" (HLS’s term) from that relationship, involves profit."

Unrelated Business Income? As far as I know, non-profits can invest in for-profit entities with no problem. Even 100% ownership of a for-profit corporation by a 501(c)(3) is ok.

Not to say I disagree with your sentiment, but I think that argument is the wrong approach.


It's unfortunate that Aaron's (thinkcomp's?) Harvard Law and Stanford Law educations did not include proper instruction on nonprofit tax law, or he would have known that this is a permissible activity for a nonprofit to engage in.

Though I guess I'm not surprised by this post. After all, Aaron does claim to be the co-founder of Facebook and in the past filed a number of nuisance lawsuits against more than a dozen companies...


You know, at this point I'm hardly surprised by ignorant and mis-informed comments on the internet such as this one, but just for the record, I have never filed a "nuisance lawsuit."


What would Aaron say?


Didn't a Stanford student already do this, posting XML of all federal/state judge opinions on his blog?

https://news.ycombinator.com/item?id=7026960 (discussion)

https://law.resource.org/pub/us/case/ (free mirror, looks like)

It's a great concept, and more/newer is better, but it seems odd for Harvard to act like they're the first to pull it off.


I was browsing around the directory listings, and found what looks like malware - https://law.resource.org/pub/us/case/govdocs/index.php (non-rendered)



