Tim Berners-Lee: Tell Facebook, Google you want your data back (cnet.com)
124 points by soitgoes on April 18, 2012 | 50 comments



When discussing data-portability of social networks, Facebook's data download feature is sometimes brought up.

I periodically use this feature to see if it has improved any since its initial disappointing release. As of a few weeks ago, it has not.

A true data download from Facebook would consist of: a machine-readable form of every action I've taken on Facebook (likes, friend requests sent/received, photos I added tags to, photos uploaded, status updates, messages sent/received, comments made, etc.) along with timestamps and at least URIs pointing to the objects referenced (photos, people, etc.) if not a copy of my view of those objects.

(I understand why Facebook might claim they shouldn't give me, for example, the dates that other people de-friended me, since that isn't accessible info. However, I don't think asking for copies of statuses I commented on and can still see is unreasonable.)
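
To make "machine-readable" concrete, here is a rough sketch of what one line per action could look like. Every field name and URI below is made up for illustration; nothing here reflects a format Facebook actually offers.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ActivityRecord:
    # Hypothetical schema: these field names are assumptions, not a real export format.
    action: str                     # e.g. "comment", "like", "photo_tag"
    timestamp: str                  # ISO 8601 string, UTC
    object_uri: str                 # URI of the photo, person, status, etc. acted on
    content: Optional[str] = None   # my own text, if the action carried any


def to_json_lines(records):
    """Serialize records as one JSON object per line, oldest first."""
    # ISO 8601 strings with the same UTC offset sort chronologically as text.
    ordered = sorted(records, key=lambda r: r.timestamp)
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in ordered)


records = [
    ActivityRecord(
        action="comment",
        timestamp=datetime(2012, 4, 1, 12, 0, tzinfo=timezone.utc).isoformat(),
        object_uri="https://example.com/photos/123",
        content="Nice shot!",
    ),
    ActivityRecord(
        action="like",
        timestamp=datetime(2011, 9, 15, 8, 30, tzinfo=timezone.utc).isoformat(),
        object_uri="https://example.com/status/456",
    ),
]

export = to_json_lines(records)
print(export)
```

A line-per-action format like this is easy to grep, diff between downloads, and load into another tool, which is the whole point of asking for it.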

What we have now is: A static HTML dump of your profile page, photo page, and messages that is massively incomplete. Since the switch to timeline, fewer actions I have taken in the past seem to qualify for inclusion on my page ("moxiemk1 commented on friend's photo" used to feature more often in my profile than it does now). Since the revamped messages/chat integration, the messages dump (which always eventually cut off at some point in the past) is even smaller, and harder to read.

I would indeed like to have copies of the data I've created, and would like to emphasize that Facebook's "effort" to do so is complete BS.


When you join facebook you're joining a free to use data-silo, most of which is not open to the internet - the site is predicated on hiding your data from the world (and esp. google), and then selling it on to advertisers and other businesses, sometimes anonymised, sometimes not. Beacon is a perfect example of the sort of uses you can expect them to put your data to in future. All the data from like buttons, your social graph etc, is invaluable to them, and invaluable to advertisers and retailers.

The logical conclusion of that is they have absolutely no incentive to give you your data back, in fact they view that as their data, earned by offering you the service of sharing stuff with your friends, without having to set up your own website. That data is their crown jewels, so I am amazed that anyone would expect them to give it up, or be surprised at their reluctance - this is the very essence of Facebook, and they've done very well out of it.

That's not to say that you should never use Facebook, but just that if you do use a free service like Facebook, you should expect to give up some of your privacy and control over your data in return. If you don't want to do that, it would be better to use another service (which doesn't rely on selling your data as their business model).


If you are not in the USA (and maybe Canada) and you sign up to Facebook, you are signing an agreement with Facebook Ireland Ltd., a company registered in the EU and, hence, subject to EU law.

One part of EU law is that people have the right to access all personal data that a company holds on them. Here's an example of how to make such a request. http://europe-v-facebook.org/

You might claim "people voluntarily choose to join a free service, what right do you have to demand anything?" however that's not how laws work. Facebook is legally obliged to give non-US customers their data.


That's really interesting, I had no idea they were incorporated in Ireland too and hence sometimes subject to EU law (in theory at least). Thanks for the link. I'm not actually a member, but I'd be interested to see what they produce on other people just to see what sort of data they're holding.


I believe it's done as a tax dodge, so they can book some of their income in Ireland (which has low corporate taxes). May not have fully realized the privacy-law implications at the time they set that up. Either that or the savings are enough to be worth it.


Facebook does not sell user data.


In the crude sense of collecting all your posts and sending them all to an advertiser, of course not. In the sense of selling your interests, friends, social position, age etc to advertisers as a datapoint, yes they do; that's how they target ads and make money. They also tried to harvest purchasing habits from other sites like Amazon (with Beacon), and give broad access to developers, some of whom abuse the privilege and have been caught selling data on. I'd expect that sort of activity to increase post-IPO. They're not alone in this of course - gmail does the same, without the data lock-in.


No, they do not sell interests to advertisers. What they do is allow advertisers to show their ads to people with those specific ages, interests, and such. It's a subtle and important difference: with this method, advertisers only know that their ads are being shown to someone who matches their criteria, not who. Advertisers are not able to correlate your identity with ad targeting.


They don't have to, they sell access to it. Why would they sell it when they can basically rent out time to access it?


No, they sell services using it. Advertisers and others who give Facebook money do not see user information; they just pay to leverage it for effective targeting.


Do you work at Facebook? :)

I think you're missing the point. Facebook doesn't have to hand out a text file with your name and a list of all your friends for it to be considered "selling your data".

It's considered "selling your data" when you go to an unrelated site and that site gives you customized content based on information you gave to Facebook.


That does not happen either; user data is not given to third parties. I do not work at Facebook; however, this has been clearly and repeatedly stated by them (including, I believe, in legal filings).


That's a fair criticism of the feature. I'm sad it hasn't gotten a lot of love. But it's not BS, or a cynical marketing ploy, or what have you. Most projects at FB are executed by a surprisingly small number of people.

FWIW, I'll take a look at the next hackathon.


If you need any ammunition to get it looked at more intently, here: http://www.identityblog.com/?p=1201


Someone has likely written a better download using the FB Graph API. If not I bet someone on HN will write one. http://developers.facebook.com/docs/reference/api/user/


Annoyingly, it seems the Graph API just can't do much of what it seems it ought to be able to. So far as I can tell, Facebook doesn't offer any convenient way for me to read my friends' information -- that I have access to, on Facebook -- by automated means. (Or other people's info that's publicly available.) This means I can't write a tool that makes a contact-info-list for me; if I want to assemble a list of phone numbers, for instance, I have to do it manually. (For phone numbers in particular -- but not for other things -- Facebook used to do this on the website, but this seems to be gone.) Nor can I do simple things like, say, given two people whose friends lists I can view, determine which friends they have in common. Facebook lets me see mutual friends of me and another person, but not of two other people. These are basic tools I should be able to write, but last I checked, the capability just wasn't there.
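
To show how basic the mutual-friends tool would be, here is a sketch that assumes the two friends lists have already been obtained somehow as plain lists of names. The names and the `mutual_friends` helper are hypothetical; getting the lists out of Facebook is exactly the part that's missing.

```python
def mutual_friends(friends_a, friends_b):
    """Return the friends two people have in common, sorted for stable output."""
    return sorted(set(friends_a) & set(friends_b))


# Hypothetical data standing in for two friends lists I can already view.
alice_friends = ["carol", "dave", "erin"]
bob_friends = ["dave", "erin", "frank"]

common = mutual_friends(alice_friends, bob_friends)
print(common)
```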


I'm somewhat pleased friends' phone numbers aren't available. It means you don't have to worry about a friend clicking "Click here to backup your phone numbers" scams. Your inability to extract data about your friends is a kind of privacy feature. Automated access to your data, good; automated access to other people's data, bad.


I once wrote myself a script that does this via screen-scraping. Of course, this is a shaky solution. On the other hand, it does not require me to request an ID from Facebook or give permissions to any third-party app.


G+'s Takeout seems to do better. I don't think it has all you want, like friend requests or photos you tagged, but it has a lot.


I had deactivated my Facebook account. After I read your comment I decided to log in and try to get this dump. I waited probably ten minutes and it still hadn't finished. After that period I lost interest in having my Facebook open so I suspended the account which probably canceled the background job. This feature is slow as fuck.


It took a few hours in my case. You don't need to be logged in, though. They sent me an email when it finished.

I assume they have a low priority batch system to handle these requests.


Twitter is one of the worst culprits. Currently there is no way to search for any tweet more than a few days old.

Google's realtime search used to provide the ability to search and retrieve tweets from a specific date and time in the past, but twitter cancelled that deal and haven't provided any decent replacement.

A publicly accessible archive has huge potential for research. A friend of mine used Google's realtime search to pin point and keep record of the first tweets out of Christchurch following the Christchurch earthquake. Sadly there is no way to do this now and twitter don't seem to care.


They're going to start selling archives of the past two years of tweets through DataSift, though for DataSift's usual hefty prices (for a real-time feed, they currently charge $0.10/tweet!): http://www.bbc.com/news/technology-17178022

Presumably the target customer is researchers of the finance-sector variety rather than researchers of the sociology or history-department variety...


$0.10/tweet? That is insane!


This wouldn't bother me at all if I didn't think they still had the data. The transient nature of tweets is one of my favourite features of Twitter.


This is precisely why The OpenPhoto Project exists. We're building a system where users give applications valet keys to their data.

Data ownership without portability is a moot point. Many services allow you to download a blob of your content but that's of little to no value for most users. What we really need (it's 2012 after all) is a more thought-out system where the user actually owns and controls their data and gives applications access to it.

This means multiple applications can leverage the same set of data and the user doesn't have to continue using any of them. Basically, there's no single point of failure in terms of data interoperability.

Currently, a user's Facebook content can be used by other services, but for that to continue the user must keep their Facebook account open.

We're solving this problem by letting the user grant OpenPhoto software access to their photos. Most likely it's a bunch of photos that reside in their dropbox account but could also be an S3 bucket or box.net account once those become more consumer friendly.

http://theopenphotoproject.org or https://openphoto.me if you want to sign up for a hosted account.


> where users give applications valet keys to their data

I love the metaphor. I want Facebook and Google to have my data, because they can't give me good service without it. Unfortunately, as a consumer, giving Facebook or Google access to your data is akin to the Native Americans welcoming the Europeans to their continent. You have no rights, and the balance of power is against you. They own your data now, and you may use whatever part of it they give you permission to use, in the manner they give you permission to use it.

Your metaphor is beautiful, instantly understandable, and puts them in their rightful place as service providers, not overlords.


Not sure I completely understood your first paragraph. Are you saying that as consumers if we give Google access to our data (even if it's limited - aka valet) then they ultimately "take over" the ownership of it?

There's a lot here. If they do get access to your data they'd most likely annotate it and provide you an enhanced experience on data that is owned by them (their annotations on your data). But I think that's okay. We have to be pragmatic here and say that in the example of photos if you give Google access to your photos to display in your G+ account and they collect +'s and comments perhaps it's okay that they "own" that content. As a user you still are better off than today and maybe it's a progression to where more and more data is portable and owned by you...but we have to start somewhere :)


If you maintain a copy of the data and give them access to it, then you aren't going to lose the data. You've lost control of it, since they can take it and do whatever they want with it, but you'll always have your copy. However, very few web apps work that way. The personal data that goes through Gmail is overwhelmingly generated in Gmail, and Google retains the only copy unless you back up all your data at home. Ditto for Facebook -- the only personal data that people are likely to retain a copy of is photographs, and again, that would only be for backup purposes, except for hobbyists who keep raw files or higher-quality images at home. For most people, Facebook serves as a very reliable repository for their personal snapshots.

We're completely at their mercy for all the data we create in their apps, and therefore we're completely dependent on two things:

1) They want to retain customer trust. 2) They are concerned that they might lose that trust if they misbehave.

They're constantly testing the limits of 2) (intentionally or unintentionally.) At some point 1) could cease to be true. They could become so necessary, or so powerful, that people become cynical about the possibility of defying them, and therefore start accepting abusive behavior.

Another scenario is that they could stop caring about customer trust not because they're too powerful but because customers have already abandoned them. Imagine that in 2022 Facebook is a has-been. It has been beaten in social networking, or social networking sounds ridiculously quaint in 2022, and Facebook is a failing company with no prospects for revival. Kind of like SCO. And it happens to possess, as its only asset, a little bit of intellectual "property" (a decade of personal information on a billion people.) Perhaps, in its death throes, it will go the way of SCO, abandon all moral restraint, and attempt to squeeze a windfall out of that "property" in the most cynical way possible. Maybe they'll charge you a ridiculous fee to retrieve your data in a useable form. If their user agreements make that impossible, someone in the company might go rogue and sell all your private data to a shady entrepreneur in a lawless country. They could sell the photos to Joe Francis and individual account histories to blackmailers and gossips. We know that won't happen today, because Facebook has the resources and the motivation to make sure it doesn't happen, but if the company goes to hell, we won't have those assurances anymore.



Those of you in the European Union can avail of EU law which says you're legally entitled to get a copy of all the personal data companies hold on you. People have made these requests to Facebook. Here's how: http://europe-v-facebook.org/EN/Get_your_Data_/get_your_data...


I can't find the original story, but when a guy asked for his data, some was missing, and when he asked about it, Facebook answered that it was their own trade secret and they could not give it to him.


Honestly, I was happy when I first discovered Google's data liberation project: http://www.dataliberation.org/. I was upset with FB a few years ago, but nowadays I have accepted that no private company is going to have a financial incentive to give back my data. At best, laws can be made, but loopholes will be found. I am moving all the stuff I care about to a VPS-hosted server. Probably a couple more months and even email should be moved. Yay :-)


It's interesting that people seem to have suddenly started talking about data ownership much more lately.

We have been working on a project called TheMux [1] that aims to create a data platform (for lack of a better term) which helps you to pull in and archive your data from various sources and apply some simple normalization. We're specifically working on ways to keep both content (blog posts, photos, status updates, etc) and data (health info, workouts, communication data, etc) in a form where you a) have full control of the raw data and b) can make select data available to external apps which do things like presentation and analysis.

Our goal is to create an open-source platform to address some of these questions around data ownership, access and portability. Imagine the day when you decide that you want to move your blog from Tumblebook to Posterpress and you can do that by simply creating a new account on Posterpress and granting it access to your existing data. Or you've been using JogKeeper but then you find this great new service called SprintTracker that you want to try out - and all you need to do is connect and it will have your years' worth of running data.

And we think that something like this will also make it much simpler for SaaS developers to compete not by customer lock-in but rather by providing superior products and continually working to make the customer happy.

We're taking it slowly right now to build this platform which we will open-source under a permissive license (as soon as it's a little more mature) by first building a few consumer-oriented services on top of it. Number 1 on our list is a blog-type website based on this MuxDB concept and that's what we're working on at the moment.

If you're interested in giving us your thoughts (or help, or tell us we're crazy, or whatever), my email is in my profile.

[1] http://theMux.com


I've been advocating (to friends) the need/inevitability of this kind of system.

One possible marketing phrase that's been thrown around is "data asymmetry", as in "TheMux aims to rectify the massive data asymmetries that give Facebook and Google orders of magnitude more power than their users."

Even better would be the development of a completely peer-to-peer version of Facebook (which Diaspora isn't).


Nice phrase, I'll have to keep that in mind - at least when talking to hackers.

Re: replacement of Facebook, I don't think it's realistic for someone to develop an actually decent federated replacement for a true social network. The difficulty of doing some of the stuff that Facebook users like, but trying to make it work when all of the nodes are on different networks with different constraints, seems to be very high.

I'm not so much in the "Facebook shouldn't have my data" camp... more like the "I want control of all my data TOO" camp.

Our concept for people writing apps that leverage TheMux doesn't require them to actually load data live from their user's instance of TheMux (because that's crazy, a single point of failure, and probably very inefficient) but that there is simply the agreement to interchange data.

To steal your phrase - we want to make the data symmetric.

So that no matter where I create or update my data, the systems involved make a best effort to return that data to my MuxDB, and then my MuxDB will make a best effort to get that data out to the sites I've authorized and who need the data to do what I've signed up for.

Obviously, we're still in the early stages, but it's been an exciting and challenging journey so far.


It would be cool if all your personal data was stored in some central repository. You can designate certain chunks of that data as publicly accessible, and you can grant permission to different sites for reading and writing data to other sections. That way all sites can contribute to one data set, kind of like a personal Wikipedia. Everybody wins.

It would have a set of standards that make it easy to request commonly used pieces of information: name, primary-email, avatar, etc. You could even store passwords in it, or more likely just sidestep the need for passwords altogether, since it could essentially be a global login. You would never have to fill out forms for registrations, credit card info, etc.

For sites that you don't want to give your personal information to, you just give access to a secondary set of data instead, with values for your internet nickname and persona.

Of course, there'd be big security challenges, but I think that would be a neat solution to issues with online identity and personal data.
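
A toy sketch of the permission model, just to make the idea concrete. The `PersonalDataStore` class, the section names, and the site names are all invented for illustration, and it ignores the real security problems (authentication, encryption, revocation) entirely.

```python
class PersonalDataStore:
    """Toy in-memory model: one owner's data, split into named sections."""

    def __init__(self):
        self._sections = {}   # section name -> dict of fields
        self._public = set()  # sections anyone may read
        self._grants = {}     # site -> {section: set of "read" / "write"}

    def put(self, section, fields):
        """Owner-side write; no permission check."""
        self._sections.setdefault(section, {}).update(fields)

    def make_public(self, section):
        self._public.add(section)

    def grant(self, site, section, *permissions):
        self._grants.setdefault(site, {}).setdefault(section, set()).update(permissions)

    def _allowed(self, site, section, permission):
        return permission in self._grants.get(site, {}).get(section, set())

    def read(self, site, section):
        if section in self._public or self._allowed(site, section, "read"):
            return dict(self._sections.get(section, {}))  # copy, so callers can't mutate
        raise PermissionError(f"{site} may not read {section}")

    def write(self, site, section, fields):
        if not self._allowed(site, section, "write"):
            raise PermissionError(f"{site} may not write {section}")
        self.put(section, fields)


store = PersonalDataStore()
store.put("profile", {"name": "Real Name", "primary-email": "me@example.com"})
store.put("persona", {"name": "nickname42"})  # secondary identity for untrusted sites
store.make_public("persona")
store.grant("shop.example", "profile", "read")
store.grant("blog.example", "posts", "read", "write")
store.write("blog.example", "posts", {"latest": "Hello"})
```

Here `shop.example` can read the real profile, any site can read the public persona, and only `blog.example` can contribute to the "posts" section.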


I really have to wonder, why is there no "peer to peer" social network? It seems ideally suited for this type of usage, and this would ensure that data is only in each node rather than on some central server. Does something exist like that already?


> why is there no "peer to peer" social network

Probably the closest there is is the blog-o-sphere and/or the internet in general? I can host my photos on a website, and email that to a friend? Someone can subscribe to my RSS feed of my blog, etc.


Depends on what you want a "peer" to be.

A peer being a Free Software server that anybody can host: Appleseed, Diaspora, Elgg, and many more.

A peer being like a file-sharing application running on each person's computer: XMPP would be a good start, but I don't know of any projects that have built something noteworthy on top of it. The big problems: Friends must be able to interact with me while I'm offline. Browser integration for "share buttons".


Eben Moglen from the FSF is trying to popularize a fully open source peer to peer social network run out of those tiny plug computers.

See:

http://www.freedomboxfoundation.org/


> He also told The Guardian that his habits on his computer indicate his health and places he's been.

Technology can definitely make a difference.

Last week you sat on the bus next to someone who has been diagnosed with Avian flu - would you like me to schedule a doctor's appointment for you?


Being able to download a blob != having your data available.


I would actually contend that for the majority of Facebook users, the current download is what they think of when they hear "download your data." It is blob-like, but after unzipping you can actually start just clicking and opening files. CSV, etc. are nice for programmers, but not the general public.


I think the sentiment went over your head.

Useful applications will pop up soon enough if they make truly open APIs and downloadable data. He's not suggesting Google/Facebook actually implement the statistics themselves. In fact the opposite, because that would still mean the data was closed.


What does == having your data available?



Being able to take that data and move it to another service?


"There are no programs that I can run on my computer...."

Major generation gap and sophistication gap here. This is why the issue of data ownership took so long to get traction. For most people, data is invisible and using it is magic. The only difference people noticed when they lost control of their data was that they could suddenly access it everywhere!

The difference between owning your data and not owning it is like the difference between living in a country with civil liberties and legal protections versus living in a country without civil liberties and effective legal protections. The difference is abstract and meaningless until it becomes a matter of life and death, when you end up in a situation that you only thought existed in paranoid fantasies.

The invisible change from owning your data to being owned by Facebook and Google happened to correspond to a very noticeable improvement in convenience for users. As in more dramatic examples from history, an authoritarian regime has delivered in a way that its predecessor could not, and as a result, people have embraced it. It doesn't mean that people accept its ideology. The way we think about Facebook, the way anybody thinks about Facebook who engages in these conversations about web industry economics, and therefore how Facebook thinks about itself, is much darker and more sinister than the aspect of Facebook that users embrace. Users embrace the effect that Facebook has had on their lives, which is overwhelmingly positive. The side we see is not something they embrace; it is something they have barely begun to consider. I have faith that they will react to it as we do.

We are not more sensitive to freedom than they are; we are merely face to face with the problem because we like to imagine ourselves in Facebook's shoes :-) We imagine what it's like to have that power; we are able to think in a predatory, profit-oriented mindset; we understand the temptation. We don't have to be evil to see the temptations that Facebook faces. And we understand that a person can withstand temptation, but a corporation cannot. A corporation cannot withstand the temptation to make money. Only as long as its profits exceed the wildest demands, anyway. When profits flag, a corporation will collapse to the moral lowest common denominator, because the people who resist immoral profits will be replaced.

Like I said, we don't have to worry that the average Facebook user is okay with the dire scenario of Facebook fully and amorally exploiting its power. It's completely imaginary at the moment. Those who worry about it worry alone, but only because we are uniquely positioned to imagine it. We can make other people understand. Right now Facebook's major sin has been to appropriate a dangerous and therefore immoral amount of power. They haven't abused it yet, not the way they could. I think with the right kind of education, users will revolt and demand rights and control before Facebook breaks down and really abuses the terrifying power it has amassed.

Am I too optimistic?


I do not want my data as much as I want to be able to delete everything from mm/dd/year to mm/dd/year. And I mean delete and truly forget. Tracking is the much bigger issue, my tweets out of twitter are worthless. Pictures posted are low res versions too.



