Data portability, the forgotten right of GDPR

somethingAlex · on May 25, 2021

What are consumers intuitively expecting compliance with this law to look like?

Data from one service may be in an entirely different schema than the one used by the service you want to import it into, let alone a different format. Service A may summarize your data and throw away the granular stuff, but service B runs on the granular data.

Are consumers going to implement ETL pipelines to achieve portability? Are they expecting to hook up streaming mechanisms for enormous swathes of data?

Just as an example, if I wanted to get a list of every song I liked on Spotify and import it into Apple Music, how would that even work? The songId of Spotify is undoubtedly different than the one Apple uses. Are Apple and Spotify supposed to agree on a common file format?

I agree with the intent of the law but I'm not surprised most services do not offer an automated way to take out data. It's a rare case, often a heavy workload, and there's really no way to guarantee the data you receive is actually portable.

izacus · on May 25, 2021

> Just as an example, if I wanted to get a list of every song I liked on Spotify and import it into Apple Music, how would that even work? The songId of Spotify is undoubtedly different than the one Apple uses. Are Apple and Spotify supposed to agree on a common file format?

Why wouldn't it work? Desktop apps had m3u playlist formats which could be read by multiple players - from Winamp on desktop, iTunes on a Mac or even car headunits. It's now kinda weird to say that rockstar engineers of Apple/Spotify can't find a way to export playlists and liked songs (global singletons, essentially) they got from the SAME publishers and probably ingest from the SAME content owner data sources.
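For reference, an extended M3U playlist is just a text file: a `#EXTM3U` header, then an `#EXTINF:<seconds>,<display title>` line before each media path. A minimal Python sketch that writes one is below; the track data, the paths and the `to_m3u` helper are made up for illustration.

```python
def to_m3u(tracks):
    """Render a list of track dicts as an extended M3U playlist string."""
    lines = ["#EXTM3U"]
    for t in tracks:
        # #EXTINF carries the duration in seconds and a display title
        lines.append(f"#EXTINF:{t['seconds']},{t['artist']} - {t['title']}")
        # the entry itself is only a path or URL to the media file
        lines.append(t["path"])
    return "\n".join(lines) + "\n"


# Hypothetical sample data
tracks = [
    {
        "seconds": 215,
        "artist": "Example Artist",
        "title": "Example Song",
        "path": "music/example_song.mp3",
    },
]
playlist = to_m3u(tracks)
print(playlist)
```

Note that each entry points at a file path, not at any identifier for the recording itself.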

sethhochberg · on May 25, 2021

m3u (or really anything that relies on file names to reference media assets) would be a potential disaster... all kinds of weird stuff makes it into your parser when you're dealing with a large enough catalog. I used to work in streaming media, including with some m3u-based legacy systems, and dealt with a pile of edge cases a mile high.

But thankfully, the industry solved this problem themselves: ISRC (International Standard Recording Code) is already used all over the royalty reporting side of the industry because it specifically solves the problem of referencing an individual recording of a work.
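To make that concrete, here is a rough Python sketch of matching an exported list against another catalog by ISRC. It assumes the usual 12-character ISRC layout (2 letters, 3 alphanumerics, 2-digit year, 5-digit designation), and it assumes the target catalog can be looked up as a plain `dict` from ISRC to that service's track ID. All codes, IDs and function names here are invented for the example.

```python
import re

# Assumed ISRC shape: CC + XXX + YY + NNNNN, written without hyphens
ISRC_RE = re.compile(r"^[A-Z]{2}[A-Z0-9]{3}\d{2}\d{5}$")


def normalize_isrc(raw):
    """Strip hyphens and spaces, uppercase; return None if not ISRC-shaped."""
    code = re.sub(r"[\s-]", "", raw or "").upper()
    return code if ISRC_RE.match(code) else None


def match_by_isrc(exported, target_catalog):
    """Split exported tracks into (matched target IDs, unmatched tracks)."""
    matched, unmatched = [], []
    for track in exported:
        isrc = normalize_isrc(track.get("isrc"))
        target_id = target_catalog.get(isrc) if isrc else None
        if target_id is not None:
            matched.append(target_id)
        else:
            unmatched.append(track)
    return matched, unmatched


# Hypothetical export from service A and catalog index from service B
exported = [
    {"title": "Song A", "isrc": "us-abc-21-00001"},
    {"title": "Song B", "isrc": "GBXYZ2000042"},
    {"title": "Song C", "isrc": "not-an-isrc"},
]
target_catalog = {"USABC2100001": "target-1"}

matched, unmatched = match_by_isrc(exported, target_catalog)
```

A real catalog may carry the same recording on several releases, so a one-to-one lookup like this is a simplification.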

DDEX is a content delivery manifest format the industry also uses for this kind of purpose (sharing complex metadata about recordings in a standardized way), but it's an 800lb gorilla of a format and not super consumer-friendly.

These are things that are all over the back offices of your favorite streaming service, but mostly transparent to the consumer.

capableweb · on May 25, 2021

This is a great comment, thanks for adding to the conversation.

This further gives proof that it is technically feasible for Spotify and Apple Music to be able to import/export playlists across both their services. Looking forward to seeing it happen in reality.

redwall_hp · on May 25, 2021

I was thinking the other day about that, sort of. Apple in the early 2000s was really into things like CalDAV and WebDAV. Safari had integrated RSS reading at one point. They embraced standardization and interoperability for many things, at least where important user data was concerned. Then something happened after the iPhone took off and iCloud became a thing, and they became all about vendor lock-in. I assume it comes from being a market leader instead of only having a relatively unpopular computing platform.

afiori · on May 27, 2021

There are also non-nefarious reasons for dropping open standards (obviously lock-in and profit were a big plus for them). One would be that with your own technology you can implement, drop, and/or modify whatever functionality you want.

dsr_ · on May 25, 2021

There are incentives for, say, Mastodon to be able to ingest your tweeting history, or for Linked-In to eat your Facebook social graph.

There's no incentive other than the law for Twitter or Facebook to make that data exportable.

StopHammoTime · on May 26, 2021

That’s why laws exist.

There’s generally no incentive to not kill someone except going to jail.

account42 · on May 28, 2021

> That’s why laws exist.

And we are discussing the law that fixes the lack of incentive of Twitter and Facebook to export the data in this very thread!

anshorei · on May 26, 2021

Ostracization by peers, revenge by members of the victim's family or tribe. Humans generally had reasons not to go out and kill one another long before laws.

selfhoster11 · on May 26, 2021

That's literally just a form of law being practiced without being formally codified.

brokenmachine · on May 26, 2021

>There’s generally no incentive to not kill someone except going to jail.

I'd let some of you survive.

Guillaume86 · on May 25, 2021

> Just as an example, if I wanted to get a list of every song I liked on Spotify and import it into Apple Music, how would that even work? The songId of Spotify is undoubtedly different than the one Apple uses. Are Apple and Spotify supposed to agree on a common file format?

I understand your point but FYI music is a poor example, as there are solutions to port metadata in that case. MusicBrainz aims to standardize music metadata and it is pretty commonly used. An example I know is the lastfm service; their APIs accept an optional mbid: https://www.last.fm/api/show/track.updateNowPlaying.

cbm-vic-20 · on May 25, 2021

Should music streaming services be compelled to support MusicBrainz to support this GDPR case simply because it is commonly used? Who decides that mbid is the GDPR-accepted track identifier?

iatt · on May 25, 2021

In contrast to requiring a particular standard to represent metadata, the better way is to require that the exported data be machine-readable and its contents documented. This gives providers the flexibility to innovate, adding their own identifiers or whatever useful data they want, but avoids legislators forcing everyone to use a common standard that, while useful today, can quickly go out of date. Competitors can then write converters or importers based on this documentation, as could motivated FOSS users.
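As a sketch of what "machine-readable and documented" could look like in practice: the export below is plain JSON that carries a format version and a short description of every field, and a small check flags fields that were exported without documentation. The field names, the layout and both helper functions are assumptions made up for this example, not anything a real service publishes.

```python
import json

# Hypothetical data dictionary shipped alongside the records
FIELD_DOCS = {
    "title": "Track title as shown in the exporting service",
    "artist": "Primary artist name",
    "isrc": "International Standard Recording Code, if known (may be null)",
    "liked_at": "ISO 8601 timestamp of when the user liked the track",
}


def build_export(liked_tracks):
    """Wrap records with a format version and per-field documentation."""
    return {
        "format_version": 1,
        "fields": FIELD_DOCS,
        "liked_tracks": liked_tracks,
    }


def undocumented_fields(export):
    """Return field names used in records but missing from the documentation."""
    documented = set(export["fields"])
    used = set()
    for record in export["liked_tracks"]:
        used.update(record)
    return sorted(used - documented)


export = build_export([
    {
        "title": "Example Song",
        "artist": "Example Artist",
        "isrc": None,
        "liked_at": "2021-05-25T12:00:00Z",
    },
])
text = json.dumps(export, indent=2)
```

A provider could add its own identifiers as extra fields, as long as they show up in the documentation too.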

kelnos · on May 25, 2021

> Just as an example, if I wanted to get a list of every song I liked on Spotify and import it into Apple Music, how would that even work? The songId of Spotify is undoubtedly different than the one Apple uses.

Artist + Song Title (+ Album and track number, if it's from an album) should be enough to disambiguate in enough cases for someone to consider this "portable".

Beyond that, we have music fingerprint IDs that a service could output in the data dump along with their own service-specific ID.
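A rough Python sketch of the artist + title approach: build a loose comparison key so that casing, accents and "(Remastered)"-style suffixes don't block a match. The normalization rules and the `match_key` name are my own assumptions, chosen for illustration.

```python
import re
import unicodedata


def match_key(artist, title):
    """Build a loose (artist, title) key for cross-service comparison."""
    def clean(s):
        # split accented characters and drop the combining marks
        s = unicodedata.normalize("NFKD", s)
        s = "".join(ch for ch in s if not unicodedata.combining(ch))
        s = s.casefold()
        # drop parenthetical or bracketed suffixes such as "(Remastered 2019)"
        s = re.sub(r"\s*[\(\[].*?[\)\]]", "", s)
        # collapse punctuation and whitespace runs into single spaces
        s = re.sub(r"\W+", " ", s)
        return s.strip()

    return (clean(artist), clean(title))


same = match_key("Beyoncé", "Halo (Remastered 2019)") == match_key("BEYONCE", "Halo")
```

Dropping parentheticals also merges live versions and remixes into the studio track, so this only covers the "enough cases" part; album and track number would be needed to narrow it further.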

> Are Apple and Spotify supposed to agree on a common file format?

For something as simple as this, yes, absolutely they should. It's bonkers that they don't and wouldn't, aside from garbage anti-competitive lock-in reasons.

908B64B197 · on May 25, 2021

> Just as an example, if I wanted to get a list of every song I liked on Spotify and import it into Apple Music, how would that even work? The songId of Spotify is undoubtedly different than the one Apple uses. Are Apple and Spotify supposed to agree on a common file format?

If I was Spotify I would export that as an SQLite DB. Maybe the metadata catalog as a standalone DB too.
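Something like this, as a minimal Python sketch using the standard `sqlite3` module. The table layout and column names are hypothetical, and an in-memory database stands in for the file a real export would write.

```python
import sqlite3


def export_likes(conn, likes):
    """Create the (hypothetical) liked_tracks table and insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS liked_tracks ("
        " service_track_id TEXT PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " artist TEXT NOT NULL,"
        " isrc TEXT,"
        " liked_at TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO liked_tracks"
        " VALUES (:service_track_id, :title, :artist, :isrc, :liked_at)",
        likes,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # a real export would use a file path
likes = [
    {
        "service_track_id": "abc123",
        "title": "Example Song",
        "artist": "Example Artist",
        "isrc": None,
        "liked_at": "2021-05-25T12:00:00Z",
    },
]
export_likes(conn, likes)
rows = conn.execute("SELECT title, artist FROM liked_tracks").fetchall()
```

The upside of a single-file DB is that the importer can query it directly instead of parsing a custom format.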

Apple Music has an API[0] so it's already mostly possible to import a list of songs in it.

> I agree with the intent of the law but I'm not surprised most services do not offer an automated way to take out data. It's a rare case, often a heavy workload, and there's really no way to guarantee the data you receive is actually portable.

"Data Portability" is so vaguely defined that I can't help but see it as yet another law that EU bureaucrats will use to fleece (American) "Evil Tech Giants".

[0] https://developer.apple.com/documentation/applemusicapi

anticristi · on May 25, 2021

"Data portability" is vague so that the law is stable and flexible. As a comparison, "drivers need to adapt driving speed to weather conditions" is equally vague. It would simply be infeasible to publish an hourly speed limit chart based on rain, fog, snow, etc. It is the responsibility of driving instructors to raise awareness on reasons to adapt the speed. Drivers need to then interpret that clause to their situation.

Similarly, it is up to industry -- either via standardization bodies or courts -- to clarify what exactly is "data portability".

FigmentEngine · on May 25, 2021

The phrase, maybe, but the GDPR article itself provides more: "Article 20

Right to data portability

1. The data subject shall have the right to receive the personal data concerning him or her, which he or she has provided to a controller, in a structured, commonly used and machine-readable format and have the right to transmit those data to another controller without hindrance from the controller to which the personal data have been provided, where:" https://eur-lex.europa.eu/eli/reg/2016/679/oj#d1e2753-1-1

908B64B197 · on May 25, 2021

> It would simply be infeasible to publish an hourly speed limit chart based on rain, fog, snow, etc.

Not really. Pretty sure tire makers have good data on traction at different speed, temperature and substrate.

afiori · on May 27, 2021

so the law would state that you need to constantly check your phone for the app that tells the current speed limit based on weather, road type, road size, car size, tire type, and traffic

hnick · on May 26, 2021

> Are consumers going to implement ETL pipelines to achieve portability? Are they expecting to hook up streaming mechanisms for enormous swathes of data?

As a dev I hate the fact that something like Zapier apparently has to exist in this messy world, but non-technical people like my wife tend to find it intuitive and relatively easy to use so that's one option.

Though for your example I'd argue that the ingester (Apple) has a vested interest in allowing import from many formats to poach customers, much like how Apple went to the effort of creating the Move to iOS app on Android. I wonder whether having the data exported with just a song ID would be sufficient under the law, because you could just normalise all useful data away and export a list of IDs to the customer, which seems clearly against the purpose of the law. A bare list of IDs is not my data; my data is the actual songs I like.
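To illustrate the difference between the two kinds of export, here is a small Python sketch that turns a bare list of IDs back into records a person or another service could use. The `catalog` dict stands in for the service's internal metadata store; its shape and the `denormalize` helper are assumptions for the example.

```python
def denormalize(liked_ids, catalog):
    """Replace opaque track IDs with the metadata they refer to."""
    out = []
    for track_id in liked_ids:
        meta = catalog.get(track_id, {})
        out.append({
            "service_track_id": track_id,  # keep the ID, it is still useful
            "title": meta.get("title"),
            "artist": meta.get("artist"),
            "album": meta.get("album"),
        })
    return out


# Hypothetical internal catalog and a user's liked-track IDs
catalog = {
    "t1": {"title": "Example Song", "artist": "Example Artist", "album": "Example Album"},
}
records = denormalize(["t1", "t2"], catalog)
```

An ID with no catalog entry comes out with empty metadata instead of being dropped, so the user can at least see that something is missing.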

capableweb · on May 25, 2021

> Just as an example, if I wanted to get a list of every song I liked on Spotify and import it into Apple Music, how would that even work? The songId of Spotify is undoubtedly different than the one Apple uses. Are Apple and Spotify supposed to agree on a common file format?

Yeah, that'd be great! We didn't get the web as we know it today until a bunch of people and companies got together and created standards for everyone to rely on. Why can't we do the same for SaaS businesses?

I think the test is something like: If the concept is the same, you should be able to import/export it. For example, you have a SaaS having photo upload + being able to put the photos into a custom gallery. Then you should be able to export that gallery in a format that you can recreate the same gallery in another SaaS that also has photo upload + custom galleries.

The article itself is clear that it's not always technically feasible to offer this import/export. For example, it doesn't make sense to be able to export Facebook posts and import them into Twitter, because those are two different formats with different restrictions.

This is from the actual article:

> In exercising his or her right to data portability pursuant to paragraph 1, the data subject shall have the right to have the personal data transmitted directly from one controller to another, where technically feasible.

The full article of "Data Portability" is not that long, you can read it here: https://gdpr-info.eu/art-20-gdpr/

Helmut10001 · on May 25, 2021

I agree, most SaaS concepts are similar and have large overlaps in feature and functionality. Just for Social Media, we've written a common data structure format (lbsn.vgiscience.org) where it is possible to import/export from all services (this one is specifically tailored for visual analytics and exploration of research/privacy questions). When working on the structure, it became clear that most Social Media concepts exist in a similar form on multiple sites. There is very little functionality that is unique to a single SaaS.

capableweb · on May 25, 2021

Indeed, common data structures across platforms feel more common than specialized ones; the biggest difference seems to mostly sit in the UI/UX layer at this point.

Data Transfer project is also trying to define some common data models that companies can use to ensure they export/import agreed data models, although it's still not very extensive: https://github.com/google/data-transfer-project/tree/master/...

Helmut10001 · on May 25, 2021

Yes, we had a look at the data transfer project, but it really felt more like an alibi. Looking at it now, I do not see much improvement. I don't think you can force companies to offer data exchange interfaces, if they have no benefit from it.

capableweb · on May 25, 2021

> Looking at it now, I do not see much improvement

Same here, very disappointing.

> I don't think you can force companies to offer data exchange interfaces, if they have no benefit from it.

Me neither, and neither do the people who came up with GDPR. That's why the "Data Portability" directive doesn't dictate exactly what format/model your exported data has to be in. It simply has to be exportable in a machine-readable format. The reason that it's just about exporting and not transferring is that companies are still on the fence about allowing users to move to different companies with their data. This directive does force those companies to finally behave in the favor of their users, instead of shareholders.

irrational · on May 25, 2021

Isn’t “Where technically feasible” a huge loophole?

capableweb · on May 25, 2021

Well, not really. These are directives, not laws. The laws themselves get passed in each country's courts, and then each infraction will be handled by the courts themselves. Law in general requires there to be space for interpretation as well, as not all cases are the same.

As outlined elsewhere in this submission, it'd be stupid to require Twitter to be able to import Facebook Posts as Tweets, so the directive is not aiming to require that.

But if you instead have two companies who both allow photo upload and to put those photos in a photo gallery, it's not so stupid anymore to require them to be compatible with each other via import/export, as they do exactly the same thing with the same data.

In the end, the goal of the directive is not to force data transfers between all data models in the world. The goal is to force companies to have a export functionality for their users that outputs machine-readable data.

The incentive for being able to import competitors data is already in the nature of doing business. Exporting, not so much, hence we're now getting laws passed to force companies to be more user-friendly.

M2Ys4U · on May 25, 2021

>Well, not really. These are directives, not laws.

The GDPR is not a Directive, it is a Regulation (the clue is in the name...).

Regulations are law, they have what is known as "direct effect"[0] and apply as-is throughout the EU without having to be written in to domestic law in each member state.

(Directives can also have direct effect in certain circumstances too,[1] although they usually can't confer rights that people can use against non-State entities.)

[0] https://en.wikipedia.org/wiki/Direct_effect_of_European_Unio...

[1] https://en.wikipedia.org/wiki/Direct_effect_of_European_Unio...

account42 · on May 28, 2021

The important part that this law should achieve is to get the data out of the service. This at least allows competing services to provide importers, which is in their interest.

anticristi · on May 25, 2021

I agree. I could develop my own ETL and wasn't sure what to do with this right. Where would I import my Klarna Checkout history and for what purpose?

I guess this law is there to ensure your images can be transferred from Dropbox to Google Drive to Apple Cloud, without any of them being tempted to pull the plug.

mulmen · on May 25, 2021

This is so sad. That we have already forgotten how easy this is. And that we do not see that data integration is an obvious case for open standards and development.

Back in the "bad" (aka glorious) days of p2p file sharing we had no problems keeping things straight. Even Windows XP natively knows how media libraries work. Any service that makes this hard will be at a disadvantage to ones that make it easy, and maybe get roasted in court on GDPR grounds.

The only reason services do not offer you data export in an easily digestible format is that they want you to stay in the app.

maxdo · on May 25, 2021

I'll rephrase. Imagine I'm a startup. If the government forces me to delete some data, it makes my life easier: no data, no privacy issues. If someone tells me they want to port their data to a competitor, because my UI is better than theirs but they still prefer the competitor, why should I care about this request? Why should I spend a single second of my engineers' time implementing that?

TeMPOraL · on May 25, 2021

> if someone tells me , I want to port my data to competitor, because my UI better then theirs, but they still prefer competitor, why should I care about this requests, why should i spent a single second of my engineers time to implement that?

Because you're a good person and care about providing value to your users, and not just extracting value from them.

But since in practice, we can't rely on every business to be run by good citizens, this needs to be made a legal requirement, to remove the competitive advantage from being predatory and locking users down.

ryandrake · on May 25, 2021

While I agree 100% with this response, let's for the sake of argument assume OP is not a good person, and doesn't actually care about providing value to users.

An answer to "why" that does not depend on voluntary goodness is: Enough people in the world generally think representative democracy is a good thing. We stand by that system for making the rules. Enough people in part of the world think there is enough of a problem to the point where a rule was made. If you want to do business in that part of the world, you need to be bound by that rule. That's why you should spend time on it. As incomprehensible as it might be, it's important enough to those citizens that they are willing to levy a penalty on you if you don't.

...and so far, at least three people actually took time out of their day to go find the "down" arrow on this obviously raving insane viewpoint :) I love you guys!

TeMPOraL · on May 25, 2021

Agreed.

I'll add another argument that doesn't depend on voluntary goodness, just on longer-term thinking: if you can establish a reputation that you're not making it hard for users to migrate away from you, people will be more likely to try your services out in the first place.

And yet another long-term thinking argument: if being able to easily export and migrate data between competing services becomes commonplace, then you'll not only have an outflow of users to competitors, you'll also have an influx of users migrating from your competitors. If you're trying to put a superior service on a market dominated by inferior incumbents, it's in your interest to promote data portability, as - if your service is truly as good as you think - user flow will predominantly go towards your business.

ClumsyPilot · on May 25, 2021

Democracy takes precedence over markets and profit, what kind of madness is this?

jpttsn · on May 25, 2021

Involuntary goodness.

Is the point that you should feel good about any regulation, by virtue of its being the result of a democratic process? For (counter)example, I might not normally feel good about implementing “Muslim ban” functionality even if I recognize the nominally democratic process that forces me to.

toolz · on May 25, 2021

the spirit of such a law is great, but there's a huge problem - what does the implementation even look like? Are we going to have regulatory committees oversee which types of data should be portable and when? Who writes the protocols?

The implementation of such a law is impossible as far as I can tell and opens up huge vulnerabilities to smaller companies.

Just imagine when large companies can hire lobbyists that can force a data protocol on the smaller businesses.

The spirit of many laws is great; the implementation is, unfortunately, what actually matters, and I don't see solutions to these hard problems.

Allow me to go on a soapbox here, but far too many laws are created with good intentions that are destroying competition and hurting the end users.

capableweb · on May 25, 2021

> The implementation of such a law is impossible as far as I can tell and opens up huge vulnerabilities to smaller companies.

Have you actually read the article from GDPR about "Data Portability"? (https://gdpr-info.eu/art-20-gdpr/)

It's easier than you think. Offer an endpoint that spits out a ZIP file with JSON/multimedia of all the data you have associated with the user. Now you're done, you don't have to do anything else.
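A minimal sketch of the archive-building part in Python, using only the standard library. The file names inside the archive, the sample data and the `build_export_zip` helper are assumptions for illustration; how the bytes get served (which web framework, which URL, what authentication) is left out.

```python
import io
import json
import zipfile


def build_export_zip(profile, posts, media):
    """Return ZIP bytes containing JSON documents plus raw media files."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("profile.json", json.dumps(profile, indent=2))
        zf.writestr("posts.json", json.dumps(posts, indent=2))
        for name, data in media.items():
            # media is stored as-is under a media/ folder
            zf.writestr(f"media/{name}", data)
    return buf.getvalue()


# Hypothetical user data
archive = build_export_zip(
    {"username": "alice"},
    [{"id": 1, "text": "hello"}],
    {"photo1.jpg": b"\xff\xd8fake-jpeg-bytes"},
)
```

For large accounts you would want to stream to disk or object storage instead of building the whole archive in memory.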

If possible, you should provide a good format (see my other comment https://news.ycombinator.com/item?id=27278816) but you're not strictly required to.

The intent of the article is not to allow people to import Facebook posts into Twitter, the intent of the article is to force businesses to allow people to export their data in a machine-readable format. What that entails exactly is up to each company to decide, and court of law to determine if it was followed properly.

toolz · on May 25, 2021

I hadn't read the law yet, thanks for the link, but I don't think that solves any problems at all and has potential for plenty of issues. The devil is in the details and the people already have the power to only use services that allow data exporting.

You're attempting to force companies to behave in a pro-social manner but if that company never wanted to behave in a pro-social manner we'll have just given them another attack surface with their lobbyists to use to kill their competitors.

I'll withhold judgement until I see how this plays out, it could end up being a great thing, the issue with laws isn't that they can't help - the issue is that laws that end up hurting almost never go away.

capableweb · on May 25, 2021

> but I don't think that solves any problems at all and has potential for plenty of issues

It does solve the problem with some businesses not offering exports in machine-readable formats in order to stop users from being able to move to other services together with their data. Or which problems do you think they are aiming to solve here?

> the people already have the power to only use services that allow data exporting.

Yes, but the directives are not meant to help people choose services; they're meant to help people already using a service move to a different one with their data. By forcing companies to follow these directives, users no longer have to choose an inferior product just because it offers exports, because all the products have to offer export.

> You're attempting to force companies to behave in a pro-social manner but if that company never wanted to behave in a pro-social manner we'll have just given them another attack surface with their lobbyists to use to kill their competitors.

I don't really understand this line of reasoning, but I'm interested in understanding it. We already have a bunch of laws and directives to make companies behave more ethically, since they made it clear that they need laws sometimes to do the right thing. How is this adding another attack vector to kill their competitors? If company A is "anti-social" (I guess), doesn't offer an export and wants to kill their competitor B (who does offer export), how does the export tie into company A being able to kill company B? As I understand it, company B is following the directives while company A isn't, so users of company A could sue that company, but that doesn't affect lawful company B.

But I might misunderstand something so please, elaborate :)

TeMPOraL · on May 25, 2021

> We already have bunch of laws and directives to make companies behave more ethical, since they made it clear that they need laws sometimes to do the right thing. How is this adding another attack vectors to kill their competitors?

I'd go as far as saying that such regulation fixes an attack vector. Before, a company behaving pro-socially was at a competitive disadvantage - their competitors that "never wanted to behave in a pro-social manner" could adopt antisocial strategies that the pro-social company couldn't. Banning those strategies levels the playing field.

toolz · on May 25, 2021

> It does solve the problem with some businesses not offering exports in machine-readable formats

and which data should businesses allow users to export in machine readable formats, every click, view, views on other sites with that sites cookie/callback?

what is a common machine readable format? Literally all data is machine readable - what if the "common" format is purposefully complex and hard to implement right and you have to use paid libraries to do it correctly? These are things big companies can afford to do that kill small competition.

and since they are a big company simply them using it makes it "common" by some definition since more people will use it by virtue of more people using their services.

> If company A is "anti-social" (I guess), doesn't offer an export and want to kill their competitor B (who does offer export), how does the export tie into company A being able to kill company B?

company A, being the dominant evil-corp, can pay lobbyists to define the protocol for export in a format they define... company B (the small good willed company) already exports in a format, but now they are forced to change their existing systems, resulting in a lot of work lost - that is effectively money stolen from company B

Now, a reasonably pro-social reaction would be to allow both exported formats, but how difficult would it be to have lobbyists convince a non-technical governing body that their format is superior and should be used?

Imagine a non-technical family member is overseeing some committee and facebook shows up with their amazing analytics and awesome data export tool with graphs, charts, everything. Do you think your non-tech family member will recognize that the underlying format is bad for small businesses? I don't think I'd expect a non-techie to understand the costs there.

edit: further, are there SLAs for export uptime? what happens when bad PR hits a company and data export laws effectively mean a company is expected to export terabytes of data within a day or so? Is that small company now legally liable because they can't handle that kind of load - which is further compounded by the fact they are getting data export requests because of bad PR to begin with? Does that company now have to choose between serving exports or keeping their service running?

I'm sure if I spend an hour thinking of scenarios that could hurt businesses that are otherwise doing the best they can I can come up with plenty.

TeMPOraL · on May 25, 2021

I think you're approaching GDPR with the wrong mindset, perhaps one rooted in the US legal system. EU countries tend to put more weight on the spirit of the law than the US does.

In GDPR, many of the things seem technically underspecified, because they aren't describing implementation details - they're describing the principle behind them.

For instance, what "common machine-readable format" means is obvious to everyone who does anything with digital data. For generic data, it's XML, JSON, CSV, you could probably get away with XLS(X) or DOC(X); for images it's BMP, PNG, JPG. Etc. If you think you have a valid reason to use something more niche, you can. If you're afraid someone will contest it, you can request an interpretation from the appropriate regulatory body. If someone contests your choice, you can justify yourself - but if you're being purposefully obtuse, the ruling will be against you. The legal system gives you plenty of time to prepare, seek clarification, complain, dispute, get reprimanded - and ultimately comply, or, if you stubbornly refuse, get punished.

Consider what would have happened if GDPR actually defined what a "common machine-readable format" is. Plenty of companies would have a valid reason to complain that the list of allowed formats is too narrow, and unsuitable for their particular use case. The law would have to be updated to reflect the fast-changing landscape of computing technology, or risk slowing progress by forcing everyone to maintain legacy technologies.

Instead, GDPR focuses on the guidelines to achieve the intended results, while leaving the implementation details for the industry to figure out. It's better this way than having regulators figure out the difference between "cookie" and "local storage".

capableweb · on May 25, 2021

> and which data should businesses allow users to export in machine readable formats, every click, view, views on other sites with that sites cookie/callback?

"‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;"

https://gdpr-info.eu/art-4-gdpr/ - (1)

> what is a common machine readable format?

JSON, XML and a few others are candidates that are generally considered common. If you haven't heard about the term before you can find more information here: https://en.wikipedia.org/wiki/Machine-readable_data

> what if the "common" format is purposefully complex and hard to implement right

Then I guess the company is shooting itself in the foot if they make it harder to build the export functionality than it has to be? The directive is not about being able to import data from any service, the directive is about being able to export your data in a machine-readable format. Not sure how much clearer I can make this.

> company A, being the dominate evil-corp can pay lobbyists to define the protocol for export in a format they define

Company A is allowed to export the data in whatever data model they want, no lobbyists required. What it has to be though, is machine-readable.

> company B (the small good willed company) already exports in a format, but now they are forced to change their existing systems resulting in a lot of work lost

No, neither the directives nor the laws around GDPR will force a small company to change their export format. The directives are aimed at larger businesses that don't allow export at all, to get those companies to actually become user-friendly instead of user-hostile.

You should really give reading the full GDPR a go, it's not that long nor complicated and explains everything you're worried about (seemingly at least).

Here is the full version: https://gdpr-info.eu/

And here is a simpler quickstart explaining broadly what GDPR is: https://termly.io/resources/articles/gdpr-for-dummies/

Edit:

> edit: further, are there SLAs for export uptime? what happens when bad PR hits a company and data export laws effectively mean a company is expected to export terrabytes of data within a day or so? Is that small company now legally liable because they can't handle that kind of load - which is further compounded by the fact they are getting data export requests because of bad PR to begin with? Does that company now have to choose between serving exports or keeping their service running?

Again, I invite you to actually read GDPR before commenting further, as both you and I have spent more time answering each other than it would have taken you to just read the resource you're commenting on.

Article 12 (3):

> 1 The controller shall provide information on action taken on a request under Articles 15 to 22 to the data subject without undue delay and in any event within one month of receipt of the request. 2 That period may be extended by two further months where necessary, taking into account the complexity and number of the requests. 3 The controller shall inform the data subject of any such extension within one month of receipt of the request, together with the reasons for the delay. 4 Where the data subject makes the request by electronic form means, the information shall be provided by electronic means where possible, unless otherwise requested by the data subject.

If you cannot run your service plus the export in such a way that people clicking export get their data within 30 days, I don't feel so bad about you actually just closing down your service instead, as the uptime in general must be very bad.
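To make the timeline concrete, here is a rough sketch of the two dates Article 12(3) implies for a given request. It assumes one simplistic reading of "one month" (same day of the following month, clamped to the last day of shorter months); how the period is counted legally may differ, and the helper names `add_months` and `response_deadlines` are made up for illustration.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Same day-of-month `months` later, clamped to the month's last day (assumed rule)."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def response_deadlines(received: date) -> dict:
    """Standard deadline (one month) and the latest date after the two-month extension."""
    return {
        "standard": add_months(received, 1),
        "extended": add_months(received, 3),
    }


if __name__ == "__main__":
    for label, deadline in response_deadlines(date(2021, 5, 25)).items():
        print(label, deadline.isoformat())
```

The extension is not automatic: per the quoted text, the controller has to tell the data subject about it, with reasons, within the first month.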

TeMPOraL · on May 25, 2021

> The devil is in the details and the people already have the power to only use services that allow data exporting.

The problem with this is that people are choosing products and services based on many different aspects simultaneously. In particular, price and (with Internet services) network effects are such strong factors that they pretty much override all other considerations. How this plays out in practice is, the whole market stops offering value along the "irrelevant" factors.

In case of GDPR - because abusing users' data makes money, and not abusing it costs money, everyone starts abusing it to reduce price (or their costs). You're not going to ditch Facebook if all your friends are there. You're not going to ditch your primary care provider because it plays fast and loose with your data - it's a big hassle, and there's no guarantee other providers aren't even worse.

Imagine switching this discussion to one about food safety regulation. If those regulations were suddenly all repealed, you can bet your top dollar that the quality of food would quickly degrade across the board. Even the most upstanding companies would start making sacrifices to keep up with their less ethical competitors, or risk getting outcompeted - relaxing standards allows them to drop the price (or increase and reinvest profits), which lets them keep this up through economies of scale, while companies standing their ground on quality lose customers, lose efficiency, and have to increase the price. Customers won't choose the increasingly more expensive quality food, because in a typical country most people can't afford expensive food.

The end result is the market locking into a new, much lower, food safety level.

There are certain patterns on the market that are very predictable, and which are impossible to fix from within. That's where regulations are needed. And they do seem onerous to businesses when introduced - that's because we usually realize the problem only when we're deep in it.

ohwanderu · on May 25, 2021

[flagged]

TeMPOraL · on May 25, 2021

No one, since you aren't using any of Emergynt's services. Also, nice low-effort doxx. Reminds me I need to update my LinkedIn profile.

golergka · on May 25, 2021

> Because you're a good person and care about providing value to your users, and not just extracting value from them.

That's a false dichotomy that also misrepresents the nature of a typical business transaction.

If you're a good person and a business owner, you're looking to make mutually beneficial business transactions. If someone is looking to move away from using your business, then it's them who's trying to extract value from you, without giving anything in return.

Of course, sometimes, as just a good person, you want to do good for other people without anything in return, but you can do it as a private person, putting your profits into charity funds. Separation of concerns is a good thing that makes things clear. Also, from any moral point of view, money spent on an engineer's salary that allows some food app user to migrate to a competitor is probably not spent as well as feeding the hungry or providing health care to the sick anyway.

TeMPOraL · on May 25, 2021

> If you're a good person and a business owner, you're looking to make mutually beneficial business transactions.

Exactly this. The transactions become much less mutually beneficial if the business is trying to hold my data hostage. Especially if they gave no indication of it previously, back when I was still evaluating the value of the transaction.

We had a good time, I got value from their services, they got my money and perhaps some extra benefits too - now it's time for us to part ways, I'm packing and I want my stuff back.

> If someone is looking to move away from using your business, then it's them who's trying to extract value from you, without giving anything in return.

Transactions on the market are supposed to be voluntary. This means I should be free to move away from using any business, after discharging all my obligations to it that I voluntarily accepted. By switching to a competitor, I'm not taking any more value from the company. As for asking for my data back, a business should consider my data loaned to them. It's their obligation - social, and now legal - to give it back.

GDPR specifies more than one way to do this. Controller to Subject transfers are a no-brainer. It's my data, I want it back. Controller to Controller transfers are gated by "where technically feasible". This point is there to ensure that businesses which don't have the necessary infrastructure in place aren't forced to spend time and effort to build it up. Only those that can do it at negligible costs are being forced to provide this option.

jrochkind1 · on May 25, 2021

golergka's daycare service: you can check your kids in, but you can never check them out. it just wouldn't be mutually beneficial, sorry.

colejohnson66 · on May 26, 2021

If you went with a hotel service, you could’ve made a Hotel California reference ;)

est31 · on May 25, 2021

If you are a startup, then such a law directly benefits you because you might want to convince users to migrate to your services. If the big established competitor of yours has to offer data exports, such a migration is made easier for you, enabling your startup to grow faster, and giving users the ability to enjoy more innovation in the market.

goodpoint · on May 25, 2021

Because removing an exit barrier means removing lock-in.

Not holding customers' data hostage can increase your service adoption.

E.g. many companies would not pay for a web-only email service where you cannot download and backup emails.

E.g. A lot of people pay for non-locked books (epubs) that can be carried over across different devices.

Governments across the world have been breaking lock-in mechanisms for decades (e.g. carrying phone numbers, being able to buy gas/car oil/car tires/PC components from independent vendors).

WA · on May 25, 2021

You don't write an API to port stuff to your competitor. You write a JSON or CSV export and competitors can then make an import tool for your data format (and vice versa).

Is this really an effort? It's basically a JOIN over a bunch of tables or maybe the JSON state tree of your SPA and that's about it.

Chances are, your startup works with all data of a user and has a way to request all data from the DB anyways.
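A minimal sketch of what such an export could look like, using only Python's standard `sqlite3` and `json` modules. The `users` and `posts` tables and the `export_user_data` function are invented for illustration; a real schema would have more tables, but the shape is the same: a few queries keyed on the user ID, dumped into one JSON document.

```python
import json
import sqlite3


def export_user_data(conn: sqlite3.Connection, user_id: int) -> str:
    """Return everything stored about one user as a JSON document."""
    conn.row_factory = sqlite3.Row  # rows become name-addressable
    user = conn.execute(
        "SELECT id, name, email FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if user is None:
        raise KeyError(f"no such user: {user_id}")
    posts = conn.execute(
        "SELECT id, created_at, body FROM posts WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()
    export = {"user": dict(user), "posts": [dict(p) for p in posts]}
    return json.dumps(export, indent=2, ensure_ascii=False)


def demo_connection() -> sqlite3.Connection:
    """In-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER,
                            created_at TEXT, body TEXT);
        INSERT INTO users VALUES (1, 'alice', 'alice@example.com');
        INSERT INTO users VALUES (2, 'bob', 'bob@example.com');
        INSERT INTO posts VALUES (10, 1, '2021-05-25', 'first post');
        INSERT INTO posts VALUES (11, 2, '2021-05-25', 'not mine');
        INSERT INTO posts VALUES (12, 1, '2021-05-26', 'second post');
        """
    )
    return conn


if __name__ == "__main__":
    print(export_user_data(demo_connection(), 1))
```

The hard part in practice is usually not the query but deciding which tables count as the user's data, which this sketch does not attempt to answer.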

dariosalvi78 · on May 25, 2021

A company that builds houses would very much avoid building those pointless and expensive security features. Why would they spend a second of their architects' time on that?

croes · on May 25, 2021

I'll re-phrase: why should I care about the requests of my users? Now you know why they prefer your competitor over your better UI. Your UI may be better, but your UX sucks.

jcelerier · on May 25, 2021

You're exactly the kind of person I hope my government protects me from. Companies are not meant to enrich yourself but to make the world better.

google234123 · on May 25, 2021

Companies are not meant to make the world better...

afiori · on May 27, 2021

capitalism is the belief they do

ClumsyPilot · on May 25, 2021

I'll rephrase. Imagine I'm a startup. If someone tells me, "I want to transfer my savings to a competitor", why should I care about this request?

The answer should be obvious, it's their data just like it's their savings.

jensus · on May 25, 2021

the mental shift seems to be to not regard your customers' data as your product but rather focus on your service as your product

kspacewalk2 · on May 25, 2021

That will make a whole lot of business models out there not feasible. The result will be fewer free services (to put it differently, fewer services and fewer choices). If you don't pay for stuff with your data, you can't have it for free. Are we sure we want to use government regulations to impose this on consumers of services, from the top down? Instead of, say, letting them decide?

(Yes, of course it's an industry talking point. The best kind - one that's true and valid, and so far not effectively refuted).

lolinder · on May 25, 2021

A business model does not have a right to exist just because individuals would choose to patronize it if legal. There are plenty of predatory business models that capitalize on market failures. "Free" definitely appears to be one of those business models.

kspacewalk2 · on May 25, 2021

To clarify, you are saying we should legislate away the right of a consumer to consent to a service whereby, in lieu of payment, the consumer is delivered targeted advertisement based on the data generated by their use of the service?

If this phrasing is incorrect, please correct it. It's just really helpful to be clear and precise in such discussions, because people sometimes hide the essence of their argument behind ambiguous verbiage.

shuntress · on May 25, 2021

To clarify, the sentiment seems to be that we should legislate the requirement that a consumer must explicitly consent to any service whereby, in lieu of payment, the consumer is delivered targeted advertisement based on the data generated by their use of the service rather than take the consumer use of the service as implicit consent.

lolinder · on May 26, 2021

I'm not proposing legislation outlawing any particular business model. If someone can make "free" work while respecting customer data ownership, more power to them.

What I am saying is that customer data should legally belong to the customer, and if that makes some business models infeasible, so be it.

hansvm · on May 25, 2021

Given the context of GDPR data portability, it seems more likely that they're saying that businesses shouldn't have a right to hold data hostage as a method of lock-in, especially in lieu of providing a service people like enough to voluntarily stick with. The "targeted advertising as payment" thing is a separate can of worms that they may or may not care about.

matheusmoreira · on May 25, 2021

> That will make a whole lot of business models out there not feasible.

So be it.

> If you don't pay for stuff with your data, you can't have it for free.

Okay. Charge us.

> Are we sure we want to use government regulations to impose this on consumers of services, from the top down?

Yes.

arrosenberg · on May 25, 2021

Free really means subsidized in this case. Those business models are anticompetitive, so it’s pretty easy to justify eliminating them.

jensus · on May 25, 2021

From a subjective view, I do not believe we want any business model to exist that survives on utilising your data beyond the core of the product, e.g. I wouldn't think we want anyone to sell your data to ad companies.

I do not believe there is a need for so much free stuff in general. But it should never be a situation where you have to pay for your data to be safe.

kspacewalk2 · on May 25, 2021

Those are your beliefs/values. I mostly share them. But is it right to impose them legislatively on everyone?

jensus · on May 25, 2021

what is legislation if not the opinion of the current society (and for some countries the opinion of corporations)?

As in yes, with my current understanding of personal data, I do believe we should have laws safeguarding it - even at the risk of businesses.

baq · on May 25, 2021

> That will make a whole lot of business models out there not feasible.

That’s the point.

JohnWhigham · on May 25, 2021

Good riddance to bad trash. It's a shitty business model to begin with.

M2Ys4U · on May 25, 2021

>That will make a whole lot of business models out there not feasible.

Good.

tomcooks · on May 25, 2021

Because it's not your data, it's mine?

cromulent · on May 25, 2021

Barriers to exit are also barriers to entry.

shuntress · on May 25, 2021

Because this is also required of your competitor and will allow users to port their data into your startup, which gives you a chance to compete.

print_goto_ten · on May 25, 2021

Being required and complying with that requirement are two different things.

shuntress · on May 25, 2021

That is how every law, rule, and norm works.

Are you trying to imply that it is important for the legal system to have effective overseers, investigators, lawyers, juries, and judges?

toomuchtodo · on May 25, 2021

Isn’t engineering time cheaper than legal counsel time when your customers file complaints with the government against your org for not adhering to the law?

JumpCrisscross · on May 25, 2021

> engineering time cheaper than legal counsel time

For a Silicon Valley based company hiring EU lawyers, no. Engineers are more expensive. Also, for a Silicon Valley company with limited or no EU presence, the time value of money may make incurring that deferred cost worth the saved near-term engineering time.

Laws should be followed. But laws must be enforced. OP’s point is valid. The EU passed a law and delegated enforcement to its various members, each of which has varying levels (and interpretations) of enforcement around different parts of the text.

Until that changes, GDPR compliance will remain a courtesy. Not a right.

toomuchtodo · on May 25, 2021

Good points, appreciate the reply.

MattGaiser · on May 25, 2021

Is any legal counsel time actually being spent on this? It seems like all the disability legislation. In theory it applies to websites. In practice, few give it a 2nd thought.

I have yet to hear of a company significantly harmed by failing to consider accessibility.

sam_lowry_ · on May 25, 2021

Yes, unless you already have lawyers on staff.

bombcar · on May 25, 2021

This covers a good argument as to why: https://www.joelonsoftware.com/2000/06/03/strategy-letter-ii...

And it's true - there are a number of services for work that we've never tried because there's no easy way "back".

matheusmoreira · on May 25, 2021

Why should your company be allowed to lock in other people's data in your company's computers and then refuse to give it back? This is obviously abusive. Why should your company be allowed to abuse its customers? Why should an abusive company even be allowed to exist?

_wmhc · on May 25, 2021

I bet there are more laws that a company would love not to follow, but it's the law and thus you'll need to spend time implementing it.

dbetteridge · on May 25, 2021

Because the data isn't yours, it belongs to the customer.

That is the opinion that GDPR encodes into law

pmlnr · on May 25, 2021

Erm... because you need to follow laws. Your company would file tax records, right? And follow fire and building regulations in the office, correct? So why would it not follow GDPR?

0xbadcafebee · on May 25, 2021

It's not your responsibility to help your customers use a competitor's service, so you definitely don't have to care about that. However, you might care if you practice "dogfooding".

The idea of eating one's own dog food is to understand the experience of the customer and improve the product. It demonstrates confidence in your product and helps you empathize with your customers. If you do this & are confident in your product, then a portability feature (to allow your customers to try out your competitors) should not be a threat.

Assuming you can convey to your customers why your product is superior, they won't have need of the porting tool. If one day they think, "Hmm, I wonder if the competitor is better", and try to use the porting tool to use the competitor, and find out it's a huge pain because the competitor's product isn't as good (or doesn't work the way yours does), they may decide they just don't feel like switching. People might also use your product just because they can switch if they ever need to.

Pandora is a great example of a shitty company that does not believe in its own product. If you use the free version, you are constantly bombarded with dark patterns and direct advertisements to get you to upgrade to their paid account. It's annoyware. If you eventually pay for the product, the only value add is fewer ads. There's no improved functionality, there's no easier experience, no better algorithm. Just slightly less pain. It's like upgrading from dogfood that tastes like shit, to dogfood that only smells like shit. If Pandora created a data portability tool, they would be screwing themselves, because they know their product is shit. If they had a great product, portability wouldn't be a threat to their business.

beyondcompute · on May 25, 2021

Absolutely! I remember asking to export my data from one of the services and the support pretty much ignored me (they replied in general but “forgot” to mention anything related to that question).

grishka · on May 25, 2021

I wanted to get my data out of ask.fm because I answered quite a lot of questions there back when it was fun. The GDPR export option was nowhere to be found. Opened a support ticket, they asked me for an EU ID... Well, yeah, I don't have one, I'm not an EU resident, I wanted to piggyback on the laws of countries that actually care about their people. But it just struck me that they hate their users this much. Even Facebook didn't go this low.

On an absolutely unrelated note, I reverse engineered ask.fm's client API back when I was actually using it.

wizzwizz4 · on May 25, 2021

Under GDPR I think they're not allowed to require an EU ID. So just say “I'm not required to give you my personal data for this”.

johndough · on May 25, 2021

Do you have a source for this?

In my experience, many large companies ask for ID. I am not quite sure which is correct since, on the one hand, they should verify that a request comes from the legitimate account holder, but on the other hand, they should practice data minimization.

sushibowl · on May 25, 2021

I suppose this is a UK source but it should apply to GDPR generally https://ico.org.uk/for-organisations/guide-to-data-protectio...

> You should also not request formal identification documents unless necessary. First you should think about other reasonable and proportionate ways you can verify an individual’s identity. You may already have verification measures in place which you can use, for example a username and password.

The GDPR doesn't state explicitly how to do identification for subject access requests, only that “The controller should use all reasonable measures to verify the identity of a data subject who requests access, in particular in the context of online services and online identifiers.” In the case of ask.fm it seems like if the person's identity can be verified by the fact that they can access their account, it's not reasonable to require an official ID.

afiori · on May 27, 2021

this is regarding identification; for grishka it was about proving whether the law was applicable

grishka · on May 25, 2021

> they should verify that a request comes from the legitimate account holder

Facebook and Google do this by asking you to enter your password again. The ID thing is clearly there to impose a limit based on your nationality.

scrollaway · on May 25, 2021

You can be an eu citizen with a non-eu ID so it makes no sense.

anticensor · on May 26, 2021

scrollaway · on May 26, 2021

If you have a right of permanent stay in the EU, you're a citizen, even if you're from a non eu country.

If you have dual nationality between an eu and non eu country you might have two IDs as well.

Lots of cases like these. I'd call them edge cases but they're really not.

DocTomoe · on May 25, 2021

Not identifying a data subject beyond reasonable doubt before sending out highly personal data is itself a GDPR violation - even a data breach which they would have to report to their GDPR officer.

_wmhc · on May 25, 2021

>beyond reasonable doubt

Sure, but on a website log-in info, email confirmation or 2FA is enough for that. Unless you already gave them your ID-card, they shouldn't have to use that to identify you.

DocTomoe · on May 26, 2021

I'm not sure about that. Information that is saved about a user might be more security-relevant than what someone - they or someone who hacked their account - might see in their account.

It clearly is something I would not want to have hours of meetings with legal counsel about, so I can see why some organisations may err on the safer side.

PoignardAzur · on May 26, 2021

You're playing devil's advocate.

If someone has access to your password and 2FA method, they can impersonate you and destroy your reputation, buy things in your name, consult all your old photos and learn everything about you, etc, and no platform will ever ask them for an EU ID at any point in the process.

The idea that a platform asks for an EU ID for any reason other than making the GDPR request process more painful is laughable.

afiori · on May 27, 2021

there is a possible reason, the gdpr applies to EU citizens, EU residents, and people within the EU, so it is reasonable they ask you to prove you are one of these categories.

PoignardAzur · on May 27, 2021

Yeah, no kidding. Of course they want you to prove you're an EU citizen, because they want to make as little effort as possible.

I don't consider that reasonable. Data portability should be a right. You shouldn't have to jump through hoops to exercise that right, and companies shouldn't be asking you "can you prove beyond a doubt that we're legally obligated to give you your data" before doing so.

afiori · on May 28, 2021

I agree, I would like them to offer this service to everyone, but the reason they ask for an EU ID is not "emails are too easy, have them suffer" but more likely "we really have not set up a process for this so we will not do this unless forced by the law".

The solution to this is to have their own government implement a GDPR-like policy.

mrweasel · on May 25, 2021

What do they mean by an EU ID? Passport, license? Those have my CPR number (think SSN, but different). There’s no way I’m showing that to export a playlist. It’s supposed to be kept secret. If your company can’t respect the GDPR, how am I to expect that you’ll safely handle the single most important piece of personal information I have?

varispeed · on May 25, 2021

Companies think that the data that is portable is your email address, profile picture, address, IP addresses - but other things like posts, comments are not. It is actually not well defined in GDPR and if portability means transferring your profile (e.g. username, email and some details about you only), then GDPR is pretty much useless in that regard.

account42 · on May 28, 2021

> Companies think

Which ones have you tried exporting your data from?

mxmilkiib · on May 25, 2021

Remembering http://dataportability.org etc

djdeutschebahn · on May 25, 2021

Maybe these two are also relevant:

https://github.com/google/data-transfer-project

https://datatransferproject.dev/

ot1138 · on May 25, 2021

Do you know what happened to them (and/or some of the other companies/projects/initiatives that launched with the same goals)?

mxmilkiib · on May 26, 2021

I've a messy collection of links (many of which I need to web.archive.org fix) on https://wiki.thingsandstuff.org/Open_social#DataPortability that you might be interested in.

Basically, companies thought it more profitable to not put any effort into letting users escape their service (or keep chat federated, etc.)

Various threads are still around though.

ot1138 · on May 26, 2021

Wow, this has been around for a while. Pretty interesting take on this in the link to Brad Fitzpatrick's article from 2007.

Your statement about companies being unmotivated to support anything like this rings true to me. It is exactly what I would expect. The question is, what might motivate them.

mxmilkiib · on May 26, 2021

Proof that it works, that it can lead to customers migrating in. Or some kind of legislation. Main point is for it to work though.

Personally, I'm waiting for some combo of SOLID, the Fediverse and Matrix. SOLID is like FOAF and XFN and *dav and etc on steroids, very JS orientated though and not the most friendly of UX.

tester34 · on May 25, 2021

where can I download my HN's data?

notRobot · on May 25, 2021

You can't. There's also no easy way to request all profile data deletion, unfortunately.

However, they do respond to privacy requests, see:

https://news.ycombinator.com/item?id=26959559

https://news.ycombinator.com/item?id=26410165

capableweb · on May 25, 2021

Have you tried emailing hn@ycombinator.com and been denied? Or what do you mean by there being no easy way to request data deletion? AFAIK they don't scrub the comments, but if you request it, your username will be replaced with [deleted] for all your comments.

basisword · on May 25, 2021

I remember trying that with an old account a few years ago (suggesting your solution of just hiding the username) and was denied. Maybe things have changed since then though.

murphy1312 · on May 25, 2021

that is by no means an easy way.

easy would be a button on the profile page for example.

capableweb · on May 25, 2021

Ah well, I guess "easy" is relative. I'm sure if you send them one email, they'll confirm it with you once within a month and then delete the data.

Compare that to Coinbase, which has forms, buttons and seems it's mostly an automated process instead of manual email, but I've tried getting Coinbase to delete my account + data for over 6 months now to no avail, multiple emails back and forward where they confirm the deletion, say it's in progress, I email back after a month and they ask me to confirm the deletion again.

So even a button doesn't mean the process is easy, and there is a lot more to consider than just how you initiate the request.

hungryforcodes · on May 25, 2021

Mind you, Coinbase probably has an obligation to keep your data for x number of years for both tax and auditing purposes.

Buttons840 · on May 25, 2021

This is such a grey area. Do emails others sent to me belong to them? Do my HN comments make the entire conversation partially mine? If one of my comments is "well said", and the parent deletes their comment, is not my comment diminished? What do we do about quotes? Etc.

capableweb · on May 25, 2021

Solved problem already: Hash the username + a salt and change that everywhere. Every comment is still from a unique author + the comment body is still there + all the replies are still there, but the author name has been removed.
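Roughly what that could look like, as a sketch rather than anything a real site is known to run. It uses Python's standard `hmac` and `hashlib`; the comment dictionaries, the `[deleted-…]` alias format, and the function names are assumptions made up for the example. The salt is treated as a secret key, since anyone who knows it could recompute the alias from a guessed username.

```python
import hashlib
import hmac
import secrets


def pseudonym(username: str, salt: bytes) -> str:
    """Derive a stable, non-reversible alias from a username and a secret salt."""
    digest = hmac.new(salt, username.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"[deleted-{digest[:12]}]"


def anonymize(comments: list, username: str, salt: bytes) -> list:
    """Return a copy of the thread with one author's name replaced by the alias.

    Comment bodies and other authors are left untouched.
    """
    alias = pseudonym(username, salt)
    return [
        {**comment, "author": alias} if comment["author"] == username else comment
        for comment in comments
    ]


if __name__ == "__main__":
    salt = secrets.token_bytes(32)  # keep secret, or discard after the rewrite
    thread = [
        {"author": "alice", "body": "original point"},
        {"author": "bob", "body": "well said"},
        {"author": "alice", "body": "thanks"},
    ]
    for comment in anonymize(thread, "alice", salt):
        print(comment["author"], "-", comment["body"])
```

Because the same username always maps to the same alias, a reader can still follow who replied to whom within a thread. This only touches the author field; whatever is written in the comment bodies stays as it is.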

Buttons840 · on May 25, 2021

That's a decent solution. But I think simply replacing the usernames with [deleted] is better. It leaves the comment but detaches the user and breaks the link between all the user's comments.

capableweb · on May 25, 2021

It becomes very hard to track conversations with N+2 users though, if more than one has the [deleted] username. Hence the hashing to get a unique [deleted] username for each user.

kuschku · on May 25, 2021

That's not legal either. If the comment body contains personal information anywhere, GDPR also applies to it.

newswasboring · on May 25, 2021

I have sent such emails for a previous account, the emails were ignored.

capableweb · on May 25, 2021

Last time I "archived" my account data on HN I used https://github.com/HackerNews/API which seems to be working well enough for my needs.
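For anyone wanting to do the same, a rough sketch of such an archive script follows. It relies on the `user/<id>.json` and `item/<id>.json` endpoints documented in that repository; the `archive_user` function, its `limit` parameter, and the injectable `fetch` argument are my own invention for the example, not part of the API.

```python
import json
from typing import Callable, Optional
from urllib.request import urlopen

API_ROOT = "https://hacker-news.firebaseio.com/v0"


def fetch_json(url: str):
    """Download one JSON document (the API answers `null` for unknown ids)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def archive_user(
    username: str,
    limit: Optional[int] = None,
    fetch: Callable[[str], object] = fetch_json,
) -> dict:
    """Collect a user's profile plus their submitted stories and comments."""
    profile = fetch(f"{API_ROOT}/user/{username}.json")
    if profile is None:
        raise KeyError(f"unknown user: {username}")
    item_ids = profile.get("submitted", [])[:limit]
    items = [fetch(f"{API_ROOT}/item/{item_id}.json") for item_id in item_ids]
    return {"profile": profile, "items": [i for i in items if i is not None]}


if __name__ == "__main__":
    # Replace "pg" with your own username; limit keeps the demo short.
    archive = archive_user("pg", limit=5)
    print(json.dumps(archive, indent=2))
```

This makes one request per item, so a full archive of a prolific account will take a while. It also only returns what is already public, not whatever else the site may store about an account.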

thatguy0900 · on May 25, 2021

Hn has no EU presence so doesn't have to follow EU laws, no? Or do they have to ip block Europeans? What would the EU actually do to hn if they did decide to enforce the rules here?

burntoutfire · on May 25, 2021

The typical approach is to issue a fine and then seize the assets in the EU that belong to HN's owners (if there are any).

alexaholic · on May 25, 2021

GDPR is about data, not companies. It applies to all entities regardless of where they are established as long as they're doing business in the EU or processing data of EU citizens.

dahart · on May 25, 2021

True, but GDPR does not automatically apply to global companies that just happen to get used by EU citizens. There are two separate conditions, either one is sufficient, but if neither is met then GDPR does not apply. The company must either offer services to EU citizens directly, or profile behavior of EU citizens, e.g. via direct advertising within Europe. See Recitals 23 and 24 https://gdpr.eu/Recital-23-Applicable-to-processors-not-esta...

alexaholic · on May 25, 2021

Yes, see also my other comment https://news.ycombinator.com/item?id=27278939

tremon · on May 25, 2021

Indeed, my answer would be no. But IANAL, IANYL and TINLA.

There's https://gdpr.eu/companies-outside-of-europe/ :

> Article 3.2 goes even further and applies the law to organizations that are not in the EU if two conditions are met: the organization offers goods or services to people in the EU, or the organization monitors their online behavior.

Recital 23 clarifies what is meant by the organization offers goods or services to people in the EU: https://gdpr.eu/Recital-23-Applicable-to-processors-not-esta...

> In order to determine whether such a controller or processor is offering goods or services to data subjects who are in the Union, [..] the mere accessibility of the controller’s, processor’s or an intermediary’s website in the Union, of an email address or of other contact details, or the use of a language generally used in the third country where the controller is established, is insufficient to ascertain such intention, factors such as the use of a language or a currency generally used in one or more Member States with the possibility of ordering goods and services in that other language, or the mentioning of customers or users who are in the Union, may make it apparent that the controller envisages offering goods or services to data subjects in the Union.

Profiling is clarified in recital 24: https://gdpr.eu/Recital-24-Applicable-to-processors-not-esta...

> it should be ascertained whether natural persons are tracked on the internet including potential subsequent use of personal data processing techniques which consist of profiling a natural person, particularly in order to take decisions concerning her or him or for analysing or predicting her or his personal preferences, behaviours and attitudes.

So, I'd say no. The mere fact that HN is accessible to people in the EU does not show intent. HN is an English forum, which is the native language of the country where it is established, and does not offer its services in additional European languages, and does not advertise products in the Euro currency. I'm unable to know for sure, but I don't believe HN is using my posts here to predict or analyse my personal preferences either.

alexaholic · on May 25, 2021

I'm inclined to say that's a wrong interpretation. You don't have to sell anything to be required to be compliant with GDPR. My understanding is any entity (not necessarily a company, mind you) collecting personal or behavioral data of EU citizens needs to comply with the GDPR. Were HN to collect such data, EU laws would apply. But take that with a pinch of salt, I'm no lawyer or anything.

vincnetas · on May 25, 2021

There is a public API for HN data

https://github.com/HackerNews/API

Does it count as the ability to download your data?
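As a rough illustration of what "downloading your data" through that API might look like, here is a minimal Python sketch. It assumes the endpoints documented in that repository (`/v0/user/<id>.json` and `/v0/item/<id>.json` on hacker-news.firebaseio.com); the names `fetch_json` and `export_user` are made up for this example, and it only reaches the public profile and submissions, not anything HN may hold server-side.

```python
import json
from urllib.request import urlopen

BASE = "https://hacker-news.firebaseio.com/v0"


def fetch_json(url):
    """Fetch one JSON document over HTTPS."""
    with urlopen(url) as resp:
        return json.load(resp)


def export_user(username, fetch=fetch_json, limit=None):
    """Collect a user's public profile and submitted items into one dict.

    `fetch` is injectable so the function can be tried without network access.
    """
    profile = fetch(f"{BASE}/user/{username}.json")
    if profile is None:  # the API answers `null` for unknown users
        raise ValueError(f"no such user: {username}")
    item_ids = profile.get("submitted", [])
    if limit is not None:
        item_ids = item_ids[:limit]
    # One request per item; missing items come back as None and are skipped.
    items = [fetch(f"{BASE}/item/{item_id}.json") for item_id in item_ids]
    return {"profile": profile, "items": [i for i in items if i is not None]}
```

Calling `export_user("some_username", limit=10)` and passing the result to `json.dump` would give a small local dump. It makes one request per item, so a long history would be slow.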

user-the-name · on May 25, 2021

No. It needs to be accessible to everyone, not just to programmers with lots of free time.

dahart · on May 25, 2021

Question: are you an EU citizen, and is there any way for HN to know whether you are an EU citizen? (Your public profile page has no personally identifiable information.)

GDPR is an EU law that applies to sites that market directly to EU citizens. How and whether it applies to sites outside the EU has been debated. GDPR can prevent a site from operating in the EU. But GDPR does not apply to a US citizen using a US-run web site.

https://en.wikipedia.org/wiki/General_Data_Protection_Regula...

https://gdpr-info.eu/art-3-gdpr/

Edit: speaking of personally identifiable information, GDPR defines the information that is subject to download as “personal” information, only when it can be identified. Do you have data on HN servers that is subject to GDPR even if you live in the EU? (I don’t think I do.)

See 4.1: https://gdpr-info.eu/art-4-gdpr/

_wmhc · on May 25, 2021

>only when it can be identified.

Note that it also includes indirect identification, which means that if combined with other data it would identify you. Recital 30 might be of use here too;

>Natural persons may be associated with online identifiers provided by their devices, applications, tools and protocols, such as internet protocol addresses, cookie identifiers or other identifiers such as radio frequency identification tags. This may leave traces which, in particular when combined with unique identifiers and other information received by the servers, may be used to create profiles of the natural persons and identify them.

Rygian · on May 25, 2021

My HN username (Rygian) is PII because it can be used to identify me indirectly (HN has a log of my username connecting from IP x.y.z.w, and my IP address is PII).

dahart · on May 25, 2021

In the US there is precedent (existing court rulings) against IP address being PII. Obviously, IP address is not very good PII, and never guaranteed to be able to identify someone.

Whether HN has a log of it is an assumption I don’t have a way to verify. Lots of privacy-conscious sites purge connection logs often and/or refuse to keep them for this very reason.

zorked · on May 25, 2021

IP addresses are PII in Europe.

nemoniac · on May 25, 2021

This may sound pedantic but PII is not even mentioned in the GDPR. It's a notion from U.S. law.

The GDPR refers to "personal data". Everything you say above about PII is true of personal data under GDPR.

pbhjpbhj · on May 25, 2021

Most people have an email address on their profile, that's PII. One could post one's name, that's definitely PII and AIUI that affects all the data then on the site, as it's now associated.

dahart · on May 25, 2021

Are you sure forum comments can count as PII, according to GDPR definitions?

M2Ys4U · on May 25, 2021

Article 4(1) of the GDPR states the relevant definition: "Personal data means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person".

That should be read in light of the recitals, for instance recital 26:

"The principles of data protection should apply to any information concerning an identified or identifiable natural person. Personal data which have undergone pseudonymisation, which could be attributed to a natural person by the use of additional information should be considered to be information on an identifiable natural person. To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly. To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments. The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes."

From these I think it should be obvious that forum comments may be (and even probably are) considered to be personal data under the GDPR.

dahart · on May 25, 2021

It’s also obvious that many comments might not be PII too, right? Whether they are depends entirely on whether the user has chosen to share PII, but in any case it’s not automatic, it’s not structured data that’s easily searchable in general, and typically depends on whether other identifying information is available. In other words HN doesn’t ask for PII, has no way to know what comments are PII in general, has no way to reliably identify EU citizens, does not operate in the EU or target EU citizens, has no structured way to profile EU citizens. I’m wildly in favor of online data protections, and I think the GDPR has done many good things, but this particular example does not seem to constitute a clear example of either GDPR applicability or (tangentially & IMO) of need for data control.

M2Ys4U · on May 25, 2021

>GDPR is an EU law that applies to sites that market directly to EU citizens.

That is wrong. The GDPR does not make reference to citizenship.

It explicitly notes that it applies either when the data subject is physically in the EU/EEA, or when the data controller/processor is based in the EU/EEA.

dahart · on May 25, 2021

You’re right, I described it incorrectly. GDPR applies to “subjects (natural persons) within the Union”. As an example, EU citizens living abroad are not covered by GDPR. Americans visiting HN from the Bay Area also shouldn’t expect to have the rights that GDPR grants to subjects within the Union, right?

nicbou · on May 25, 2021

There are still some issues with it (incomplete data, manually triggered data exports), but it's a notable improvement nonetheless.

It's particularly valuable when it lets you export instant messaging conversations and shared photo albums. It means that companies cannot hold your data hostage to keep you on their platform.

I use GDPR exports for a personal data thing I'm building [0][1]. It simply wouldn't work without GDPR, because public APIs are increasingly rare. Most of your personal data is locked and GDPR data exports are usually the only way to access it on your own terms.

[0] Intro: https://nicolasbouliane.com/projects/timeline

[1] Code: https://github.com/nicbou/timeline

jFriedensreich · on May 25, 2021

it took me fighting 6 months with viacom support to get my song plays for last.fm . spotify improved from 2 weeks to 2 days but its still ridiculous to call something true data portability that is not automatic and not instant. a lot of companies tried giving me semi obfuscated pdfs or html without classes or classes that were random strings, we need to improve the law to enforce instant availability and an industry standard format like json or xml. also this needs to be completely automatable without having to do it myself.

capableweb · on May 25, 2021

> it took me fighting 6 months with viacom support to get my song plays for last.fm

Sue them. It should be faster than 30 days according to GDPR.

> a lot of companies tried giving me semi obfuscated pdfs or html without classes or classes that were random strings

Giving you obfuscated data is also against GDPR as the data needs to be clearly machine-readable. Again, sue them as you now have two points against them.

jeroenhd · on May 25, 2021

The GDPR does not give you any way to sue them directly. You can report the company to your country's DPA, which should look into the issue and might take it up with the offending party in a court of law. That is assuming that the parent is actually an EU citizen or a foreign citizen living in the EU; if they aren't, the GDPR doesn't apply to them.

I see a lot of (mostly non-EU) commenters thinking that the GDPR is grounds for any individual to sue any company for practically anything because privacy is hard, (which is probably why everyone was so hyped to hate on the GDPR) but that's just not how it works.

As much as I value data portability, I'd much rather see a DPA sue the hell out of the companies that make those ridiculous, illegal cookie walls and popovers filled with dark patterns instead.

ot1138 · on May 25, 2021

Sue them under what law or jurisdiction?

xavdid · on May 25, 2021

For what it's worth Facebook and Instagram (also owned by FB, but is fairly separate product-wise) have pretty good export tools. You make a request in the web UI and a short time later, can download a zip with a bunch of JSON files. I was pleasantly surprised by how much they included.
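For anyone wanting to poke at such a download, here is a minimal Python sketch that lists the JSON files in an export zip along with their top-level keys. The function name `summarize_export` is made up for this example, and it assumes nothing about the actual layout of a Facebook or Instagram export beyond "a zip containing some .json files".

```python
import json
import zipfile


def summarize_export(zip_source):
    """Map each .json member of an export archive to its top-level keys.

    `zip_source` may be a path or a file-like object.
    """
    summary = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".json"):
                continue  # skip photos, HTML pages, and other non-JSON members
            with archive.open(name) as member:
                data = json.load(member)
            if isinstance(data, dict):
                summary[name] = sorted(data.keys())
            else:
                # Lists and scalars have no keys; record the type instead.
                summary[name] = [f"<{type(data).__name__}>"]
    return summary
```

Printing the returned dict is a quick way to see how descriptive the keys are before writing any real import code.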

fossislife · on May 25, 2021

Under GDPR, they have to include everything (they admit) they have about you, isn't that right?

anticensor · on May 26, 2021

Unsurprisingly enough, they only include what you entered yourself, but not derived data about you.

xavdid · on May 25, 2021

I think so, but I'm not sure how accessible it has to be. I was mostly commenting on how approachable the data format was. Formatted json with descriptive keys.

I have no idea what the law requires about the data format, so they could be doing the absolute minimum.

mehdim · on May 25, 2021

Co-author of the research here. The simplest, fastest, and most effective solution would be to impose API neutrality. As explained in the report, it would just oblige API providers to give users the same API access that they give to their partners. For instance, why do I get less data from Facebook if I ask for my personal data than if I create an app and ask for maximum app permissions (all OAuth scopes)? API neutrality already works. For instance, Open Banking in the UK and PSD2 in Europe apply API neutrality: any third party can access a bank's API if the user grants it permission to do so. After 2 years, for instance, up to 20% of the UK online banking population has benefited from it as "banking data portability via APIs". 20% is huge. If the data held by FAMGAs and all other big companies were accessible to users via "neutral APIs", data portability would be "a thing".

Also, the fact that you don't know what to do with your data dump in JSON is a blocker. With APIs, integrations by third parties are simpler and more user-oriented.

Last point: with API neutrality, there is no need to maximize "interoperability" (even if it is always useful and makes things simpler; we have seen with the DataTransferProject that it does not really work, as companies don't work with the same data model). Developers will do the matching work between the original app and the destination app, no worries; when the incentive is there, middleware glue will come. The problem these days is that the source of data is useless and has no value, so there is no incentive. You can look at this study of the value of Facebook GDPR data for developers: https://www.law.nyu.edu/centers/engelberg/pubs/2019-11-06-Da... The main question is: why does a Facebook GDPR data dump/takeout have no value for developers when the Facebook API has value for millions of application developers and businesses? With API neutrality it will have maximum value for users (as it already has value for partners) while minimizing the fatigue of implementing portability (an API is a lot more developer-friendly than a JSON dump that you receive within 30 days via email and that the user needs to upload somewhere).

robin_reala · on May 25, 2021

The best use of GDPR for data portability that I’ve ever seen was right here on HN: https://news.ycombinator.com/item?id=24764371

Long story short, Confiks takes Spotify to task for removing the API that SongShift used to retrieve playlist data; a short time and several factual emails later they restore API access.

okamiueru · on May 25, 2021

It's great that it made Spotify activate a useful API they had little good reason to terminate. However, that "victory" always left me thinking a lot of people missed the point, including the person who challenged Spotify, as well as the Spotify employee who responded to those emails, who confirmed they had little good reason to disable/remove the API.

To summarize, Spotify can of course terminate any API they so wish, and there is absolutely nothing in GDPR that compels them to keep it up. As long as they of course still follow the requirements by GDPR. As annoying as a workaround would be for Confiks, emailing the data within those 30 days to SongShift is acceptable and in compliance. There is nothing in GDPR that says "data transfer options once enabled cannot be removed and/or changed".

Quoting the clause "the data subject shall have the right to have the personal data transmitted directly from one controller to another, where technically feasible." always seemed off with what was being demanded, because they would be complying with that requirement by compiling an archive of the data, a big dump, and sending it to SongShift. "I demand that you re-enable the API", on the other hand, has no basis whatsoever in GDPR. The quote follows with "... or allow for some other method to allow me to exercise my rights under the GDPR", which is exactly right.

Spotify re-enabled the API because they chose to, not because they had to.

To clarify: I think it was great that Confiks convinced Spotify to re-enable the API. But, the arguments presented were not particularly convincing, and I took this as a case of Spotify doing the right thing, rather than a "David vs Goliath".

wizzwizz4 · on May 25, 2021

> Spotify re-enabled the API because they chose to, not because they had to.

Spotify re-enabled the API because:

• they already had the API, so they couldn't claim it was an undue development burden to re-enable it;

• it was cheaper than setting up another system based on manually emailing large chunks of the database; and

• it was good press.

okamiueru · on May 25, 2021

Indeed. But those are probably the reasons for why they chose to. My point is that Spotify chose to. A lot of people made that correspondence out to be that they had to, per GDPR. And, as this thread discusses, and your list correctly omits: GDPR was not one of them.

gpm · on May 25, 2021

Bullet points 1 and 2 in that reply only exist because of the GDPR...

okamiueru · on May 26, 2021

Regarding bullet point 1, this is where Spotify went wrong (if their intention was to not provide this API), because in the correspondence, they did make it apparent that enabling the API was trivial. However, they could have said that due to business reasons, they decided to no longer maintain that API, and there is nothing compelling them to do so.

Regarding the second bullet point, you would have to explain to me what that has to do with GDPR.

To get back to the original point, activating this API is suggested as the clear winning example of GDPR. And the counter argument I provided is that Spotify could have chosen not to activate the API, and still be in full compliance with GDPR. However, they evaluated the situation, and decided to activate the API.

I'm not making a point beyond this, so the responses so far have been a bit confusing. Yes, GDPR was part of the factors that pushed in the direction of enabling the API, and this is a good thing. However, the argument is only that GDPR did not compel them to do so.

gpm · on May 26, 2021

> Regarding the second bullet point, you would have to explain to me what that has to do with GDPR.

Without the gdpr the cheapest alternative is to dev null the email, with the gdpr the cheapest alternative is to re-enable the api.

> I'm not making a point beyond this, so the responses so far have been a bit confusing.

Welcome to the internet :P

I do think your point is a bit weak here to be honest, and that people are reacting to this. It relies on interpreting the original demand as strictly "re-enable the api" and not "comply with the GDPR, and btw we both know the easiest way for you to do that is to re-enable the api".

In the example being used it was really the latter, consider that all the followup emails included the phrase "or allow for some other method to allow me to exercise my rights under the GDPR". It's also not like the overall goal of exporting data to another provider would have not been achieved if they implemented some new method instead, you can be sure that songshift would still do the work to make it easy to switch from spotify to them.

Google234 · on May 26, 2021

It could easily be argued that it’s a burden to maintain them.

gpm · on May 25, 2021

Not that this impacts your broader point, but

> the data within those 30 days

This is a common misconception. The requirement is

> GDPR information must be provided without undue delay but at latest within one month. Only in reasoned cases may this one-month deadline be exceptionally exceeded.

The operative requirement is without undue delay, one month is a maximum on the definition of undue delay, not a minimum. Intentionally delaying compliance for the better part of a month after you have demonstrated the ability to send the data within seconds would undoubtedly constitute undue delay... (Some exceptions apply)

okamiueru · on May 26, 2021

I agree. Though, in the context where I mention this, I was pointing out the worst case, which is the one where a breach is easy to demonstrate. The Spotify employee in the correspondence made the mistake (and I mean this in the business sense, as I personally find it refreshing when correspondence is honest rather than filled with protective legalese) of suggesting that the API would be trivial to reactivate, thus making the case that not doing so would cause an undue delay. Without that, proving "undue delay" is not an easy thing to do.

Breach of this is not easy to prove, and I think that if this were to reach court, it would easily be dismissed if the user had their data transferred (in any form that can be processed, and not necessarily whatever was part of the original API spec) within 30 days.

Proving that undue delay was caused, is a much higher bar to meet. Again, less high due to the correspondence in question.

jbverschoor · on May 25, 2021

What about data-portability of in-game assets?

gpm · on May 25, 2021

You want a csv saying what "this account has this item" flags are set for your account in the database? Ya, you can probably get that, for all the good it will do you.

5560675260 · on May 25, 2021

Theoretically you could ask for your account data in a game and for it to be sent to the developers of another game. But there are very few, if any, incentives for the receiving party to honour your purchases from somewhere else. And even if they chose to do this, they would not be able to provide you the same assets (unless we are talking about some Unity store-bought models).

selfhoster11 · on May 26, 2021

Let's solve this with NFTs!

Partly joking, but partly serious. Maybe we could use something like this.

brutuscat · on May 25, 2021

ZKP all the way https://www.aepd.es/en/prensa-y-comunicacion/blog/encryption...

kijin · on May 25, 2021

Exporting your personal data is only half of the story. Importing is the other half.

Suppose I exported all of my posts, photos, contacts, and a bunch of metadata from social network A. Perhaps I could view the contacts in Excel and browse the photos in my favorite gallery app. But unless I can upload it all to social network B and continue as if I've been using B all along, the data is not really "portable". It's just a backup, a frozen snapshot that can't be unpacked anywhere else.

I'm not even sure if it makes any sense to import one's Twitter feed into Instagram or one's Facebook profile into Reddit. Edit: I'm not saying this is because of anti-competitive behavior on anyone's part. The services simply are so drastically different.

Mordisquitos · on May 25, 2021

As you say, the data domains of different services may simply be logically incompatible, which is fair enough. As critical as I am of social media platforms, I don't think they should have any obligation to implement the abstract ability to "import" data in general. As long as they provide their users with the ability to export their data in a reasonable fashion, in a well-defined and consistent open format, and as parseable as technically feasible, I believe they have done their duty.

What's important is that a Twitter, or Facebook, or Reddit alternative should be able to implement data import from their respective competitor, without facing intentional anti-competitive difficulties. Let "the market" decide which services want to implement the import of which kind of data.

ncallaway · on May 25, 2021

Absolutely. Service providers have a real incentive to implement _import_ functionality, since it means an opportunity for new users on the platform.

Since the incentives for the service providers line up with the users, I don't see a need for the government to regulate importing data.

Export is the exact opposite. The incentives for the service providers go directly against the user. After all, if the provider exports a user's data, then that user can just leave!

toyg · on May 25, 2021

Where there is a real need, something will come up - somebody will develop a service, a utility that can be a bridge between the exported data and the new target. Without the export, this would not even be possible.

dane-pgp · on May 25, 2021

> Exporting your personal data is only half of the story. Importing is the other half.

If we're wishing for things, I'd like to add the ability to have "live" exports/imports. By that I mean two services with comparable data types should be able to keep your accounts in synch, such that a change made on one service is reflected (after only a short delay) on the other service.

You would still have to visit Facebook if you wanted to see what your friends there were saying, but they would be able to see what you were posting on Mastodon, for example, without needing to create a Fediverse account.

BiteCode_dev · on May 25, 2021

Putting a square peg into a round hole does not make sense, but you may now be able to design a compatible square or round hole that one can move to.

beckingz · on May 25, 2021

On the other hand every single app at one point would happily import your contacts no matter how they were formatted...

The incompatibility is by design.

DocTomoe · on May 25, 2021

Contact data is relatively uniform - you got names, addresses, phone numbers, maybe a few dates. Even with the handling of different addresses per contact, different systems already show divergent behaviour, even when they use the vcard standard.

This becomes immeasurably more difficult with information from, let's say, Amazon: A competitor would not have the articles I've viewed, or the comments I have left under questions, or a compatible rating system.

Even social networking sites ... how useful is it to import twitter exports of my tweets answering to someone if neither that someone nor the content I answered to is available in the target system?

Sometimes, it is by design - in the majority of cases, it's just different people implementing different use-cases differently.

matharmin · on May 25, 2021

Without something like the GDPR, incentives are very different for exporting vs importing: Import functionality makes it easier for someone to switch to your service, while export functionality makes it easier for someone to switch away.

Additionally, if a service is missing import functionality, you can choose to not use it. Whereas you'd often only realize you need export functionality once you've already used a service for years.

Maybe the case of porting data between social media services doesn't make much practical sense, but there are many other services where it does. Fitness tracking apps as an example - you'd likely want to take your history with you if you switch services.

And usually these services would not be using the exact same data format for import vs export. But that's not as big an issue in practice - you'd often find third-party scripts or services that can do the conversion for you.
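To give an idea of how small such a conversion script can be, here is a Python sketch that turns a JSON export of activities into a CSV for import elsewhere. Everything specific in it is hypothetical: the source keys (`start_time`, `distance_m`, `sport`), the target columns (`date`, `activity`, `distance_km`), and the function name `convert` are made up for illustration and do not describe any real service's format.

```python
import csv
import io
import json


def convert(export_json):
    """Turn a JSON list of activities into CSV text for another service.

    Source keys and target columns are hypothetical placeholders.
    """
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["date", "activity", "distance_km"],
        lineterminator="\n",
    )
    writer.writeheader()
    for record in json.loads(export_json):
        writer.writerow({
            "date": record["start_time"],
            # Fall back to a placeholder when the source omits the sport.
            "activity": record.get("sport", "unknown"),
            # The source stores metres; the target expects kilometres.
            "distance_km": round(record["distance_m"] / 1000, 2),
        })
    return out.getvalue()
```

The real work in practice is the field mapping and unit conversion; the reading and writing are a few lines of standard library code.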

goodpoint · on May 25, 2021

Data from a popular social network is gold.

Other social networks would implement import functions in a day.

dariosalvi78 · on May 25, 2021

The gdpr actually says that data should be automatically transferred from one controller to another where technically feasible. https://gdpr-info.eu/art-20-gdpr/

dundarious · on May 25, 2021

2 of the 6 ways portability is broken are duplicates of each other.

capableweb · on May 25, 2021

Not sure it got forgotten, I think members of https://datatransferproject.dev/ didn't start actively moving until GDPR came into effect, and it seems they are doing _something_, although still it's very basic.

jvalencia · on May 25, 2021

https://github.com/google/data-transfer-project

Danski0 · on May 25, 2021

Forgotten? It's a basic GDPR article (article 20) that every somewhat serious EU company knows.

Link bait by a company making their profit off GDPR confusion.

mihaic · on May 25, 2021

Overall, I think GDPR is a positive force that protects consumers. It does have one major downside though, and that's that it treats entities of all sizes in the same way.

Placing the same regulatory burdens on start-ups as on big tech is a drag on innovation, and it's frustrating that there is no minimum number of users before GDPR comes into effect, given how the EU has constant exceptions for artisanal food and goods manufacturers.

Lawyers and third parties want to get a piece of the pie, so they'll present themselves as indispensable. It's almost as if this is the EU version of TurboTax.

dheera · on May 25, 2021

I think a much stronger solution would be to require all browsers to disable cookies by default, and let the user opt into websites that you need to sign into.

In its current state GDPR has effectively just littered the website with popups that don't let you disable "functional" cookies in the popup. Functional my ass. The pages work fine without them. I disable all cookies except on a short list of sites I need to log in. Unfortunate side effect is the goddamn GDPR popups keep popping up on all those news sites.

chrizel · on May 25, 2021

Yes, I don't understand why this is not handled on the browser level. Back then in the 90s browsers asked the user for every web page that wanted to store a cookie. The situation we have now is not much worse.

But now every web page (at least in the EU) makes some kind of ugly popover and many of them try to convince you to accept all tracking with some kind of dark patterns of UI design.

I don't understand why we won't stop all this nonsense and build this stuff into web browsers as opt-in or opt-out (let the user decide) and therefore ensure that the UI is always the same without any dark patterns.

capableweb · on May 25, 2021

> I don't understand why we won't stop all this nonsense and build this stuff into web browsers as opt-in or opt-out (let the user decide) and therefore ensure that the UI is always the same without any dark patterns.

Because GDPR covers everything, not just the web. If we had protections in the browsers, smartphone apps would still be affected. Instead, we fix it on a regulation level and no matter if it's web, apps or quantum-apps, we'll be covered.

TheManInThePub · on May 26, 2021

*sigh*

Here we go again.

The GDPR has no problem AT ALL with cookies. Use as many as you like with no need for popups. However, if you are using cookies to track or personally identify me (advertisers take a bow), then you need to ask my permission to do so. And so you should.

I am unaware of how a browser may possibly be used to block only personally identifying cookies, and besides, putting the onus to do so onto the data owner is against the principle of the GDPR; that personal data is MINE and you must ask my permission to use it.