Docracy Terms of Service Tracker (docracy.com)
200 points by matt2000 on Feb 2, 2013 | 49 comments



Buy.com is now Rakuten.com? Talk about choosing the wrong domain name to consolidate under!


Indeed. Having recently sat in an executive-style naming competition meeting, I can confirm that multinational corporate decisions about brands and names are so intensely bureaucratic and complex that the rationale behind them is far removed from common sense. I imagine the process involved dozens of executives and at least two outsourced firms, marketing managers, and brand evangelists. Charts were produced and PowerPoints created, showing naming relevance and cultural sensitivities. SEO experts formulated brand uniqueness prospects for virality and search; self-declared linguists gauged locality-based pronunciation effectiveness. Translators estimated ease of use and consistency globally. In the end, it's obvious to them that they made the correct decision.


"We need to change this short, easily typed, descriptive domain name into something that sounds like a combo move in Street Fighter."


More like 'Rakuten means “positive spirit.” The name Rakuten Ichiba literally means a “market of positive spirit,” where shopping is entertainment. You see?'


Of course he made the right decision! The CEO is Harvard Business School educated, "fluent" in English [1], and has mandated that all company meetings shall be conducted in English (even in Japan) [2]. Out of touch you say? Nonsense!</sarcasm>

[1] The reality is far from this, of course.

[2]http://asiajin.com/blog/2010/05/21/english-please-rakuten-ba...


I think part of the problem is that most 日本人 (Japanese people) are 恥ずかしい (embarrassed) about their 英語能力 (English ability). So when someone isn't, they don't stop to think that it might simply be braggadocio.


I love the way they list out each member's Educational history http://www.rakuten.com/ct/aboutus.aspx?omadtrack=campaignint...

A very strict translation with no adaptation or nuance for the market they're entering.


While documenting small changes probably is OK, I would think that reproducing long passages of a site's TOS, even in the context of documenting changes as a public service, could run into copyright-infringement problems.


Fair use: commentary and criticism.


Yes, some of these TOS have clear copyright notices. Are these valid?


The copyright notices are valid, but fair-use exceptions still apply, and fair use tends to be read fairly broadly with this kind of document, compared to, say, lengthy excerpts from a novel. I'm not aware of any solid case law specifically on quoting excerpts from contracts or licenses, though.


IANAL, but I think a big factor would be whether it acts as a substitute. Since they aren't being used as a TOS on docracy, then it shouldn't be an issue. If you copy a TOS and use it on your own site, then it could be a problem.


[I work at Docracy] We believe that if everybody who visits and uses the company's site is implicitly held to these terms, then they should be able to have a copy of the terms. We would be very surprised if a company asked us to stop tracking these policies, and nobody has yet.


[deleted]


[I work at Docracy] The first part here isn't necessarily true. Many sites don't keep previous versions of their terms online, so the version of the terms you agreed to might no longer be available.


The looming threat would be if a company declared copyright infringement to get rid of an embarrassing earlier version of a ToS from your website.


You are right, and we believe that what we are doing represents "fair use", as per _delirium above.


Looks nice, but slightly buggy

1. https://www.docracy.com/doc/versions?docId=0b0kbmmpoon

version 1 looks pretty wrong (https://www.docracy.com/0b0kbmmpoon/local-com-privacy-policy...).

Wayback has versions from Jan 15th and Jan 16th (http://web.archive.org/web/*/http://www.local.com/privacy/), downloaded around the same time, and they look more normal.

2. Edit and download seems to pull the wrong version in some cases:

http://www.docracy.com/0xk2nizy6sk/fidelity-com-privacy-poli...

(It says version 1, displays version 1, Click edit and download, you get version 2)

3. The diff engine doesn't seem to try very hard in certain cases:

http://www.docracy.com/doc/diff?revisedId=0razhem25wh&or...

(The first paragraphs of these documents are a lot closer than it makes them seem.) It seems to have a bunch of stream alignment issues, which makes me think you are using a line-based diff here and post-processing the result.

Anyway, besides the above just found playing around, it looks otherwise nice.
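To illustrate the alignment point above, here is a minimal sketch using Python's standard `difflib`. It is not Docracy's engine; the sample sentences and the two helper names are made up for the example. It shows how a re-wrapped paragraph with a one-word change looks completely different to a line-based comparison but nearly identical to a word-based one.

```python
import difflib


def line_similarity(old: str, new: str) -> float:
    """Similarity ratio when the texts are compared line by line."""
    return difflib.SequenceMatcher(None, old.splitlines(), new.splitlines()).ratio()


def word_similarity(old: str, new: str) -> float:
    """Similarity ratio when the texts are compared word by word."""
    return difflib.SequenceMatcher(None, old.split(), new.split()).ratio()


# Hypothetical sample: the same sentence, re-wrapped, with one phrase changed.
old = "We collect your name and email\naddress when you register."
new = "We collect your name and\nemail address when you sign up."

print(line_similarity(old, new))  # 0.0: no line survives the re-wrap
print(word_similarity(old, new))  # about 0.86: 9 of the words still line up
```

A line-based diff that is post-processed afterwards starts from the first number, which would explain near-identical paragraphs being shown as wholesale replacements.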


Really rad implementation of a diff system. I've recently been working on a service for diff/versioning. If you're interested in checking it out, it's http://imnosy.com


Cool service. I've always used ChangeDetection, but wished they offered more control (better scheduling, frequency of polling, etc.). Are these features you're thinking about including?


Hey uptown, Long term I'd really love to make this service as robust and feature-complete as possible, while also maintaining an approachable and welcoming user experience.

Being able to schedule when you are sent notifications, batching them together (e.g. if you're watching two web pages, you get one email each day at a specified time), and increasing the frequency of polling (goal is 1/hour) are all on the list.

I'm working on a feature I see as unique in that it will allow you to choose which part of the page counts as a change, thus minimizing false-positives.

Can you think of anything else that would be useful? Is anything about ChangeDetection frustrating you?
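One way the "choose which part of the page counts as a change" idea could work is sketched below. This is only an assumption about the approach, not how imnosy.com is built: it fingerprints the text inside one element, picked by a hypothetical `id`, so changes elsewhere on the page (ads, dates, banners) do not trigger a notification. `RegionText` and `region_fingerprint` are illustrative names.

```python
import hashlib
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class RegionText(HTMLParser):
    """Collects the text inside the element whose id equals `target_id`."""

    def __init__(self, target_id: str):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # greater than 0 while inside the watched element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def region_fingerprint(html: str, target_id: str) -> str:
    """SHA-256 of the whitespace-normalized text of the watched element."""
    parser = RegionText(target_id)
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

A poller would store the fingerprint from the previous fetch and notify only when it differs. The depth counting assumes reasonably well-formed markup; pages with unclosed `<p>` or `<li>` tags would need a more forgiving parser.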


I think those are the big things. ChangeDetection doesn't tell you when it'll poll. I think it's driven by the setup time and their queue, but ideally I'd like the ability to set up a specific time to check a given page, with a recurring-check frequency, and a way to show the diff between the two in an intuitive way.

If-this-then-that integration might be cool - but that's probably an edge-case not useful to most of your potential clients.


Agreed. Frequency control would be really great, and it's on my list, along with an even more intuitive interface for seeing what changed really quickly :)


Is there a way to subscribe to change notifications on ToS docs from a specific company?


[I work at Docracy] There isn't, although you may be able to filter the main RSS feed. That's a good idea, though, we should add a feed to every document... something I will probably do... right now.


Would you guys be willing to submit the diffs to the Internet Archive in WARC format?

http://www.digitalpreservation.gov/formats/fdd/fdd000236.sht...


Really great work, +1 for a subscription mode.

As a further feature request, it would be really great to be able to flag/vote a diff as alarming, so it could be highlighted and more people would notice it. For example, this diff by Geico is pretty questionable:

https://www.docracy.com/doc/diff?originalId=0ihn8solvd3&...


Heh, I like that.


This is great, though there are loads of spurious differences caused by what look like encoding problems, e.g. the blogspot privacy policy https://www.docracy.com/0rl3vthb6b7/blogspot-com-privacy-pol... has the classic "WTF UTF?" †symbols in it. Previously they were long dashes or quote marks.


[I work at Docracy] C'mon, I wouldn't say "loads"! We are still working on a few lingering encoding problems, and we usually filter them out when we spot them. Other spurious changes are from the sites themselves. For example, every couple of days the IRS changes the date format in their privacy policy for some mysterious reason: https://www.docracy.com/doc/diff?originalId=1frr1ml4lt&r...


heh, weasel words are weasely. sorry. Skype had a lot of changes due to quote marks changing too. don't take it the wrong way though, diffs are tricky i guess. (mobile typing, forgive brevity)
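For what it's worth, both kinds of spurious change mentioned here (mis-decoded bytes, and straight versus curly quote marks) can often be neutralized before diffing. The sketch below is an assumption about one possible pre-processing step, not what Docracy does; `repair_mojibake`, `canonical` and the character table are illustrative.

```python
import unicodedata

# Typographic characters mapped to plain ASCII equivalents before comparing.
PUNCTUATION_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
    "\u00a0": " ",                  # non-breaking space
})


def repair_mojibake(text: str) -> str:
    """Undo UTF-8 text that was wrongly decoded as Windows-1252, if possible."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not this kind of damage; leave the text unchanged


def canonical(text: str) -> str:
    """Reduce text to a form in which cosmetic punctuation changes vanish."""
    text = repair_mojibake(text)
    text = unicodedata.normalize("NFKC", text)
    return text.translate(PUNCTUATION_MAP)


# Hypothetical sample: a clean string and the same string after a bad decode.
clean = "Skype\u2019s terms \u2014 updated"
damaged = clean.encode("utf-8").decode("cp1252")

print(canonical(clean) == canonical(damaged))  # True
```

Diffing `canonical(old)` against `canonical(new)` hides these changes. The round-trip repair is a heuristic: it only covers the UTF-8-read-as-Windows-1252 case, and it cannot recover text where the bad decode already dropped bytes.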


I once thought about building a similar kind of service, but per-user (you submit the URL of your thing and we'll track it). After some analysis, though, I never started. It's not trivial to do in a general, automated manner on a massive scale because of the problems with diffing PDFs, changing URLs, etc. (I was primarily considering tables of banking fees & provisions.) Maybe one day... ;)


Let's say I have the URL of a certain file. How would I go about comparing versions and highlighting the changes (in an automated way)?


Command-line tools for that [0] have existed for many years. It's easy with plain-text (ASCII) files, but HTML or binary files take a bit more work. For those you would probably extract the textual content first (e.g. by stripping HTML tags) and then compare the plain text. Alternatively, you could render the HTML/PDF file and do a visual comparison, then extract the diff text from the images.

By default diff programs create a line-based output, but you can change it to minimum per-word highlighting via options (e.g. 'git diff --color-words').

The thing with PDF is that even when you re-save the same PDF file in the same editor, you often get an entirely different file. I'm not a PDF expert, but from what I've learned, PDF is a file type that stores a kind of vector representation of glyphs and their placements, and is often unaware of what each glyph represents (this perhaps depends on the program used to create the PDF and its options). Importing a PDF back into e.g. OpenOffice is ugly work for the plugins.

There are some existing solutions for diffing PDFs [1]; however, I haven't really played with them.

[0] http://en.wikipedia.org/wiki/Diff

[1] http://stackoverflow.com/questions/887186/java-pdf-diff-libr...
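The "strip the tags, then compare the text" route for HTML can be sketched with nothing but the Python standard library. This is one possible approach under simple assumptions (fetching the two versions is left out, and `TextExtractor`, `html_to_words` and `word_diff` are names made up here). The output marks removals as `[-...-]` and additions as `{+...+}`, the same notation `git diff --word-diff` uses.

```python
import difflib
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keeps visible text and drops tags, scripts and styles."""

    def __init__(self):
        super().__init__()
        self.skip = 0       # greater than 0 while inside <script> or <style>
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip:
            self.words.extend(data.split())


def html_to_words(html: str) -> list:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.words


def word_diff(old_html: str, new_html: str) -> str:
    """Word-level diff of the visible text of two HTML documents."""
    old, new = html_to_words(old_html), html_to_words(new_html)
    out = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(old[i1:i2])
            continue
        if i2 > i1:
            out.append("[-" + " ".join(old[i1:i2]) + "-]")
        if j2 > j1:
            out.append("{+" + " ".join(new[j1:j2]) + "+}")
    return " ".join(out)


print(word_diff("<p>We do <b>not</b> share your data.</p>",
                "<div><p>We may share your data.</p></div>"))
# We [-do not-] {+may+} share your data.
```

Because only the text is compared, markup-only changes (the extra `<div>`, the dropped `<b>`) produce no diff at all, which may or may not be what you want.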


Oh look, another inferior change to a thread title.

I really wish you guys would knock that off.


This is a great use of technology and programming/diffs to both cool and useful ends. I wish all my agreements, insurance policies, etc. had something like this, cleanly in one place. Maybe there could be a policy standard for exactly that, so companies publish their terms that way and there are no copyright problems. Eventually, companies that don't do it would be seen as devious.


[I work at Docracy, prepare for a plug!] Well, note that we have a great negotiation and e-signing service that uses these same diff tools. Try it out! http://www.docracy.com/supersigning


This service is really cool. I think all companies should host diffs of their terms. Stripe, for example, has both the previous versions and diffs of their terms (https://stripe.com/us/terms) available at https://github.com/stripe/terms.


this is great, glad someone did this. i've often wanted such a diff instead of "dig through 27 pages and see what possibly changed" with iTunes' TOS for example.

that said some of these terms look scary.


Wow. I was just thinking of making something like this yesterday. Guess great ideas don't last long. I was also interested in seeing how similar terms of services are. I've been working on a paper studying how boilerplate terms seem to be growing more dense over time, and one of the things I've noticed looking at a lot of TOSes is how similar they all are. For example, there are really only two or three variations on the wordings of choice of law clauses, and almost all arbitration clauses look the same. Wonder if you can run a comparison on your database across various TOSes to see how similar they are (a la turnitin for essays).

I have a sneaking suspicion that the lawyers (or webmasters) are just copying and pasting a lot of these terms from standard repositories or from other services.
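A cheap first pass at the "turnitin for TOSes" comparison would be Jaccard similarity over word shingles, a standard near-duplicate measure. The sketch below is just that technique applied to made-up clauses; the function names and the shingle size of 3 are arbitrary choices, and nothing here reflects Docracy's database.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Set of overlapping word n-grams, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Shared shingles divided by all distinct shingles in either text."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical choice-of-law clauses, plus an unrelated sentence.
clause_a = "These terms are governed by the laws of the State of California."
clause_b = "These terms are governed by the laws of the State of New York."
clause_c = "You may cancel your subscription at any time."

print(jaccard(clause_a, clause_b))  # 0.75
print(jaccard(clause_a, clause_c))  # 0.0
```

Running this pairwise over clauses from many sites would surface the two or three recurring wordings. For a large corpus, the pairwise comparison is usually replaced by MinHash signatures, but the score being approximated is the same.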


Awesome idea!! Any chance someone could curate it and highlight the changes that were meaningful? For example, Geico just took out the following:

"We do not save this data nor disclose it to any third parties."

(anything but comforting...)


Very nice, this is very much needed. I had the exact same idea and worked on it for a while about a year ago, but as with lots of projects, it didn't get finished. :)

As others have commented, a discussion area for each change would be very interesting, especially if there are multiple changes happening at the same time.

I can imagine not everyone wants this focus on changed ToS, but it's very good that users can easily get the information.


Would you add a "top 20" list of the sites with the most words or characters changed/month? I'd like to see which sites like to churn their terms.


Using Docracy's unique document change analysis, etc.

I'm really curious to know what's so unique about this as compared to classic diff plus colours?


[I work at Docracy] What is unique is how we handle diffing within a hierarchical HTML structure and how our algorithm is tuned to display sensible diffs for written language text, which requires some more nuance than what is typically used to diff code lines. It's our own, homespun algorithm.


Please license this to the IRS and lobby for all civil and criminal laws to be subject to obvious revision history markup.


A similar service is run by the EFF:

http://www.tosback.org/timeline.php

Source code is available at:

https://github.com/pde/tosback2

Historical crawl data is available at:

https://github.com/pde/tosback2-data


The Amazon TOS notes it was `Last updated: January 28th, 2013` and includes the recent Elastic Transcoder terms. This precedes its announcement, and journalists may try to use this for scoops.

However, the timestamp on it is January 29. How often do you check for updates?


[I work at Docracy] We check once a day. We've seen some strange stuff going on with "Last updated:" though. For example, if you check out the history of Skype's ToS, they changed it in mid-January, but with a last-updated date of "February 2013"! Then, just a couple of days ago, they reverted it. Here's the first change: http://www.docracy.com/doc/diff?revisedId=0xgv5wfb72v&or..., then they changed it back: http://www.docracy.com/doc/diff?revisedId=0jxg6uxmx4d&or...
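The mismatch between a page's own "Last updated:" line and the day a crawler first saw the change can be checked mechanically. The sketch below is an assumption about how one might do it, not a description of Docracy's crawler. It only understands the "Month day, year" form quoted above, assumes English month names (`%B` depends on the locale), and returns `None` for anything else, including day-less forms like "February 2013".

```python
import re
from datetime import date, datetime

# Matches e.g. "Last updated: January 28th, 2013" or "Last updated: March 3, 2012".
LAST_UPDATED = re.compile(
    r"Last updated:\s*([A-Z][a-z]+ \d{1,2})(?:st|nd|rd|th)?,? (\d{4})"
)


def stated_date(text: str):
    """Date the document claims it was last updated, or None if not found."""
    m = LAST_UPDATED.search(text)
    if m is None:
        return None
    return datetime.strptime(f"{m.group(1)} {m.group(2)}", "%B %d %Y").date()


def days_between_stated_and_seen(text: str, first_seen: date):
    """Positive: seen after the stated date. Negative: the stated date is in the future."""
    stated = stated_date(text)
    if stated is None:
        return None
    return (first_seen - stated).days


print(days_between_stated_and_seen(
    "Last updated: January 28th, 2013", date(2013, 1, 29)))  # 1
```

With a once-a-day crawl, a gap of 0 or 1 is expected. A larger positive gap or any negative one is the kind of oddity described above.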


Really excellent idea. Thanks for doing this.



