The Profile Engine has now been donated to the Internet Archive (profileengine.com)
127 points by randomdrake on March 31, 2018 | 53 comments



According to the creator’s Patreon:

https://www.patreon.com/profileengine

> I spent ten years on a project to build a free, public, highly advanced search engine for social network data. I obtained special permission from Facebook to index 420 million public profiles. When Facebook reneged on the deal and tried to destroy my business, I spent years of my life and a lot of money on legal costs and eventually obtained a settlement. I have continued to operate Profile engine for several years despite it not making money because it stands between Facebook and a monopoly over social data and because it helps educate people not to trust Facebook.

It seems like his main motive is to ruin Facebook — ostensibly by making it obvious to the world the dangers of FB data. And he’s doing this by releasing the accessible data — including photos — of every user who set their profile public.

The PR harm to Facebook is obvious. I’m not so sure the millions of people naive enough to not tighten their privacy settings will be completely understanding. Even if there is no legal danger, this doesn’t seem well thought out in terms of consequences.

Edit: A Quartz article from 2014, in which the service is described as “spammy”.

https://qz.com/279940/meet-profile-engine-the-spammy-faceboo...


"Edit: A Quartz article from 2014, in which the service is described as “spammy”."

Yeah, seems pretty sleazy. "I didn't get as rich as I wanted, so I'm going to release all this information about people who never wanted their information to be distributed this way, and I'm going to release it in a way that is uncontrollable."

Claydon writes: "I could have sold this data to some data broker for a lot of money and it would have been used by those with money for marketing or political purposes rather than freely available for the public good. Instead I donated it for free to the Internet Archive."

He's either naive or stupid.


How is it any worse than what the Internet Archive does normally when it scrapes sites itself? Most sites aren't able to stop scrapers the way FB could.


Have you seen Internet Archive try to monetize unauthorized scraped data? Only to release/leak it a DECADE later under the facade of free information.


Internet Archive would be sleazy if it started out by selling data before becoming what it is now? I don't see that the history of the company matters.


tbh I'm not crazy about some of the things the Internet Archive does, either.

What bugs me about Profile Engine is that there are probably a lot of people who realized their mistake of leaving their profiles wide open on Facebook, and tightened up the restrictions, whose details might still be wide open in this dataset without them even realizing.


I expect him to be sued to oblivion when the GDPR comes into effect. This is all of irresponsible, dangerous, and illegal.


I don't. The data doesn't even contain surnames of people.


Does ProfileEngine have a local instance in the EU or something? Are you thinking the "G" in GDPR means "global?"


The GDPR purports to apply to any EU person, no matter where they are. That's one of the many problems with it.

However, it is already the case that the EU casts a wide net. blekko got a demand for info from the EU regarding Google's dominance of the search space in relation to mobile apps. Most of the questions were about opinions, not facts. After consulting with our lawyers, they said that the small amount of advertising income we had from the EU was enough that the EU could demand that we answer, but we could ignore the first one and see if they asked again more firmly.


Would it be possible for the EU to get an arrest warrant and extradite the creator to the EU?

Or if an extradition isn't possible, what if the creator goes on vacation to the EU, would the creator get arrested then?


Doesn't this become the Internet Archive's problem, and not the creator's? All the Internet Archive would have to do is set up a way for people to remove their data from the archive.


No. Unless he goes to the EU he is not subject to the laws of another country.


Then how was Roman Seleznev arrested and charged under US laws even though he never visited the US, or even a country with an extradition treaty with the US?

https://www.youtube.com/watch?v=6Chp12sEnWk

https://en.wikipedia.org/wiki/Roman_Seleznev


Because small, weak countries like Maldives will kidnap and send anyone, anywhere if the country asking for the kidnapping victim is wealthy and/or powerful enough. A much more withering indictment could be made against New Zealand for their attempted extradition of Kim Dotcom to the USA.


Exactly. People tend to counter with "well the US did it". That's your countries' fault, people, I'm sorry, and as a US citizen I work on these things, but please stop blaming us for your weak governments. We have our own corruption problems, and if you think I mean the outsider who just wrecked two political dynasties and our MSM then I politely suggest you figure out who owns your news sources.

https://news.ycombinator.com/item?id=16661704

BTW that was Biden's OP for his pals. Creepy guy.


This guy most likely has no presence or assets in EU. So they will have no success in trying to enforce it.


And he probably won't be able to go there in the future without issues.


I doubt they will start flagging tourists for GDPR debts. But if they do, many Americans do just fine without ever visiting the EU.


GDPR is not in effect yet. Can it apply retroactively to previous events? Not sure, not a lawyer.


Very very few laws become retroactive.


SESTA/FOSTA was designed to be retroactive.


How is collating public profiles dangerous, or illegal? GDPR doesn’t cover it unless the people in question make a specific request either.


It's a bit difficult to make a request when a torrent of the data is released.

And the privacy settings on facebook in 2007 were "lol what's privacy?"


You’re not wrong, but you also didn’t answer any of my questions.


Very interesting. For those curious, this covers the period where Facebook grew from ~50m to ~500m monthly users (depending on what months are included). Some selected events from Facebook's history in this era:

2007/01 m.facebook.com launched 2007/05 Facebook Platform launched 2007/11 Facebook removes "is" from status updates 2008/06 Facebook settled with the Winklevii 2008/11 Facebook Credits launched 2009/02 The like button is added [1] 2009/09 Facebook announces they are cash flow positive 2009/09 Facebook launches @-tagging friends 2010/06 Comments now have like buttons 2010/10 Fincher's movie The Social Network is released

[1] I honestly had forgotten that this wasn't always part of Facebook, but it's apparently true: https://techcrunch.com/2009/02/09/facebook-activates-like-bu...


And that's what happens when noprocrast interacts with the lack of preview and time-limited editing. This is what I meant:

   2007/01 m.facebook.com launched
   2007/05 Facebook Platform launched
   2007/11 Facebook removes "is" from status updates
   2008/06 Facebook settled with the Winklevii
   2008/11 Facebook Credits launched
   2009/02 The like button is added [1]
   2009/09 Facebook announces they are cash flow positive 
   2009/09 Facebook launches @-tagging friends
   2010/06 Comments now have like buttons
   2010/10 Fincher's movie The Social Network is released
Ahem.


I remember having a lot of fun writing status updates that began with ’is’ back in the day. I can’t believe the last time was more than ten years ago...


I wonder when the "how you met" which used to generate all kinds of absurdist responses disappeared, perhaps to encourage more friending between people that actually had no clue how or if they'd met.

And no Facebook timeline should fail to mention when they removed the Top Gun quotes from the bottom of the page.


> We sued Facebook, fought hard in a David and Goliath battle and won a good settlement. One day, maybe we'll have time to tell the whole story - you'd be utterly shocked what goes on inside Facebook - what you've already heard is just the tip of the iceberg.

Well that's a very interesting statement....


edit: looks like settlement was confidential: https://www.quora.com/What-is-Profile-Engine-and-how-can-a-p...

edit: CNET article including PDF of complaint: https://www.cnet.com/news/facebook-unfriended-us-company-cla...


FWIW I haven’t seen any sign of collaboration by Internet Archive, either on their site or on their Twitter. Anyone can upload what they want to IA — it’s no indication of IA endorsement or of copyright status.

Edit: sorry if I was unclear. Anyone can create an account and create their own file archives. If you poke around enough you can find old movies and books that are still under copyright. I assume IA has to follow DMCA.


Considering it is hosted on archive.org I'll take his word for it.

Edit: didn't realize anyone can put something at a top level download link.


All downloadable items on the Internet Archive have a link like that.


The Internet Archive must follow the DMCA by no longer making the content available, but that doesn’t mean it isn’t archived.


Yes, anyone can upload anything.

1. I have seen "The item is not available due to issues with the item's content" many times while poking around. It's the Internet Archive's version of SIGSEGV or "Bad command or file name." This is due to DMCA mostly.

2. The kinds of dark items on the IA are extraordinary, for example https://archive.org/details/%77%68%61%74%63%64%63%72%61%77%6... (not NSFW; I've obfuscated the URL so it doesn't get indexed). If this makes no sense to you that's okay.


I don't know why you think you've obfuscated the url, as a search engine guy I assure you that crawlers are not confused by that sort of obfuscation.


I meant "not get indexed" as in preventing this comment from showing up in searches for the unobfuscated text.
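
To illustrate the exchange above: percent-encoding is a reversible transformation, so any client that decodes the path recovers the original text, while the plain text itself no longer appears in the comment. A minimal Python sketch, using a made-up string rather than the truncated URL in the comment; `percent_encode_all` is a hypothetical helper written for this example, not something either commenter used:

```python
from urllib.parse import unquote


def percent_encode_all(text: str) -> str:
    """Percent-encode every byte of text, including unreserved characters.

    Hypothetical helper for illustration only.
    """
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))


# Made-up sample value; the URL in the comment is truncated and is not reproduced here.
plain = "example"
obfuscated = percent_encode_all(plain)

# A crawler (or any URL parser) can undo the encoding with a standard decoder.
decoded = unquote(obfuscated)

print(obfuscated)
print(decoded)
```

Both points in the exchange hold at once: decoding is trivial for a crawler, but a text search for the plain string would not match the encoded form as written in the comment.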


A for-profit company makes a copy of user data, fights Facebook’s requests to delete that data, tries to monetise it, and then, when that fails (possibly because of incoming privacy legislation), freely distributes it as a data dump.

Who is the bad guy here?


Do we have to pick exactly one bad guy?


As my Mom is wont to say “Both people in a fight can be wrong. And they usually are.”


from the Readme :

<quote> "... We have donated the complete Profile Engine database to the Internet Archive with the current exclusion of the following sensitive fields:

Email address, Facebook user ID number, Facebook username, Surname, Profile Engine login password hash ... " </quote>

and this : <quote> "... What if this data is abused?

This data has already been publicly available, first from Facebook and then on many search engines (including Profile Engine) for up to 10 years, with the consent of the person who entered their information on Facebook. Anyone who wanted to misuse this information has probably already had access to it and already saved what they want ... " </quote>

so the main reason is that he probably realized that there is no way to make money with this as anybody who wanted to (mis)use the data already had their own copy of it.


As of today at least, the items have been removed from both the linked page and from archive.org: “The item is not available due to issues with the item's content.”

Fortunately, nothing vanishes permanently in today’s web, since we have this wonderful thing called archive.org… oh, wait…


Okay, I was made aware of this website a few days ago, and it was possible to search Facebook users by many criteria such as location, gender, age and IQ.

So if I'm getting this right, now we can all have the data that Cambridge Analytica was supposed to delete, right?


No. This is Facebook data from public profiles gathered in 2007 to 2010.

CA’s data was collected via a quiz app, from users and those users’ friends, sometime around 2013-2015.


Okay, any idea how this website was able to search by IQ?


No idea. But IQ is not a data point in FB’s system. So likely it’s a data field that’s part of some game/app?


According to this article Facebook can estimate user's IQ: https://www.ft.com/content/3dfa397c-9a73-11e4-8426-00144feab...

Do you think that this could be a FB data point as the creator of the archive claims to have had a deal with Facebook to scrape data?


I can’t read past the paywall. But you read a newspaper article in which someone (who? A FB official? An academic?) claims FB can derive something from user data? That’s something literally anyone can try to do. Doesn’t have to involve FB officially.


Sure, but I would love to see what FB derives from their vast data and userbase.


"Prior to April of 2008, Plaintiffs had written the first Survey, Petition, Polling, Quizzes and IQ Test applications to appear on Facebook and ranked as one of the largest Facebook application developers in the world."

Source:

http://www.techfirm.com/storage/ProfileFacebookComplaint.pdf


This is public profiles, not friend networks. It's the data that you weren't supposed to publish if you didn't want people to see it.


First time I've heard of this. Wow, what an interesting release!



