My data is licensed only to Y Combinator and its affiliated companies (I would have preferred it to be licensed only for news.ycombinator.com), not to any other rando that can access it via a browser or an API.

Sematic (OP's employer; rebranded to Airtrain?) is a YC company. Not a lawyer but I assume that would be included in "affiliated" since YC presumably has some ownership of them.


Welp, there it is then.

If they are YC affiliated, nothing I can do.

But I do feel saddened and personally betrayed; I thought the licence I gave to YC was just for news.ycombinator.com to store and show my comments, not for any other purposes.

Silly, silly me.

All HN did was store and show your (public) comments.

What this other company did with that (public) data seems to me to be a separate issue that you should take up with that company, just like the fact that your public comments (which you explicitly gave permission to HN to show) have been indexed by Google, Bing, and probably thousands of other spiders, bots, scrapers, etc.

I'm curious how you expected this to work. Like if you only give HN permission to store and show your comments on the public web, then somehow no other entity out there will be able to do anything with them?

Yes, I expect it to work in the same way Instagram works, for example. If a commercial entity started yoinking photos from Instagram and using them for commercial purposes, shit will hit the fan.

Again, the fact my user data has been scrapped already doesn't mean it was scrapped legally. I'm ok with HN showing my comments, I'm not ok with anyone other than HN using my user data.

> If a commercial entity started yoinking photos from Instagram and using them for commercial purposes, shit will hit the fan.

Will it though? I would imagine Meta would block them and then posture with a C&D or a frivolous lawsuit, but if they share the photos you gave them on the public internet, they're publicly consumable, right? What law do you feel is broken there?

It's not Meta that would sue them (although they would), it's the copyright owners (the users) that will. Photos or comments, the user retains copyright on their content, and only licenses it to Meta or YC for specific purposes. Yes, that means Meta/YC and their affiliated companies can use the content for purposes other than displaying it in a browser, but 3rd parties 100% can't.

Well, are you going to sue? Are you going to sue Google and Bing and whoever else? Have you even bothered to contact the people at Airtrain to ask them to remove your content? Have you contacted HN to have them delete your account and comments?

Or are you complaining here (ironically) to make a point, but you don't actually care that much?

Shit hits the fan because Instagram considers user content a golden goose, and they have a vested interest in not letting it get outside their control. Not because they feel a particular obligation to protect user privacy. That's generally been the status quo for every social network.

HN cares a lot less; they're a tech comment site and don't actively discourage people using the dataset gleanable from the contents of the site for novel experimentation.

(Sidebar: I see "scrapped" coming up a lot in these conversations these days. Where is that neologism coming from? I'm familiar with people calling it "scraping" but it seems like the term has drifted for some reason?)

Anyone could (in a technical sense) scrape HN or access the data through the API and do whatever they want with it. It's unclear to me whether the license granted to HN by your use of the site gives someone doing so a license to your comments (I suspect not, but IANAL), but the general argument here is that this would fall under fair use. Certainly that seems the case if they didn't display the dataset itself. I'm not sure how it would fare though, given they are displaying the content.

