This goes against my internal logic about personal privacy. The solution to online privacy and data mining is not collecting it all in a central repository, it is not collecting it at all.
Further, and I realise this will come off as alarmist, but, what then if the software suffers a 0-day? All that data will then be nicely aggregated for a bad actor. Somehow, knowing that there is perhaps a non trivial amount of work to be done to collect data and compile a profile from many different sources feels safer than putting it all in one place.
> All that data will then be nicely aggregated for a bad actor.
If I were to host my own backend for personal data on server/platform XYZ and there is a 0-day for platform XYZ, the bad actor needs to actively search out my server and get my data from it. But a nicely structured datadump is not particularly valuable if it's just one person. So you need to hunt down all other instances of XYZ and aggregate all data to get something someone would like to pay for. But this aggregation is stale and when xyz is patched and months have passed you just have gigs of data that has gone bad, and just like rotten fruit that won't sell for much. So I would say, in practicality, given enough decentralization, and a lot of competing platforms, the hypothetical bad actor in this scenario is much worse off than the non-hypothetical bad actors we have running around and fucking with our data right now: FAANG et al.
You make a good point. PDS solutions aim to get rid of "big data", and centralised data lakes that can be queried. It's not inherently a bad idea, but:
> a nicely structured datadump is not particularly valuable if it's just one person
> But this aggregation is stale and when xyz is patched and months have passed you just have gigs of data that has gone bad, and just like rotten fruit that won't sell for much.
Not really. First off, I would imagine it would be possible to script finding people's servers and scraping them for data. Ultimately these servers will have to be hosted somewhere, and systems like masscan make it easy to rapidly find servers hosting software that you can exploit. What's more, the person is now responsible for this risk level. Sure, a couple of experienced sysadmins like myself or you would know how to secure our data and make the server difficult to scan or probe, and difficult to access in the worst case, but how many users are actually going to be able to put in the time to learn system administration, to ensure that a server they are hosting is secure? It takes a lot of work, especially if you do not know the first thing about computers.
The end result of this will drive the introduction of businesses whose responsibility is to host these servers, and now you are back where you started, except worse! I can reasonably assume that just because my welfare data has been breached, that does not mean that they could access my medical records. Now however, that is not the case!
Secondly, even data that you would assume is stale can be important and viable. Old phone numbers, for example, are still valuable as they can be used to construct a history for the given person, and often identity confirmation procedures require listing old information along with new information (a friend recently had to list places they had lived at to confirm their identity, which meant that they were unable to confirm because it was requesting a full list of addresses they had lived at before they were ten (!)). Databases like medical records or your National Insurance Number do not tend to lose their value just because they aren't from this year. Often old security questions and passwords are just as valuable as new ones: old information can be used to construct a 'good enough' profile and either used to sniff out newer, more viable information, or used to aid the rapid generation of possible and likely passwords, among other things.
Thanks! Very valid points. I left out all the nuances to get some counterpoints, and yours are very valid. I think the biggest issue, as in most federated/decentralised scenarios, is the inevitable(?) backend/server hosting providers that will crop up. In this case there would be very large incentives to provide "easy solutions" that hide the technicalities, allowing for loopholes to aggregate and sell data. The individual datapoints might be encrypted, but a provider could still monitor what kinds of data consumers are attached to the PDS and how much activity they generate, then aggregate and sell that: e.g. pick out users with many active fitness-related data consumers and target them with ads for fitness equipment.
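To make the metadata concern concrete, here is a minimal Python sketch. It assumes a hypothetical access log in which the host sees only which consumer app touched which user's store, while the payloads themselves stay encrypted. The app names, categories and threshold are made up for illustration and do not reflect how Personium or any real PDS logs access.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from consumer app to interest category (assumed for illustration).
APP_CATEGORY = {
    "runtracker": "fitness",
    "gymlog": "fitness",
    "budgetapp": "finance",
}

def infer_segments(access_log, min_hits=3):
    """Group users into ad segments using only (user, app) access metadata."""
    hits = defaultdict(Counter)
    for user, app in access_log:
        category = APP_CATEGORY.get(app)
        if category is not None:
            hits[user][category] += 1
    segments = defaultdict(set)
    for user, counts in hits.items():
        for category, n in counts.items():
            if n >= min_hits:
                segments[category].add(user)
    return dict(segments)

# The host never decrypts a record, yet still learns who looks "into fitness".
log = [
    ("alice", "runtracker"), ("alice", "gymlog"), ("alice", "runtracker"),
    ("bob", "budgetapp"), ("bob", "runtracker"),
]
segments = infer_segments(log)
```

In this toy log only alice crosses the threshold, so she lands in the "fitness" segment without anyone ever reading her data.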
Disclaimer: I couldn't really grasp how Personium works from the "app screen demo" but it didn't stop me from commenting...
Classic Internet then, if it can't be perfect why bother at all. Let's just complain and look at all those stupid people sharing things on Facebook?
Alternatively we can set up our own servers that won't be monetized by Facebook or Google but since they can potentially be broken into, why bother?
Or do you mean we should keep our data on hard drives in a safe and plug them into an airgapped computer whenever we want to look at photos or listen to music?
This is a bit harsh but this is an important topic and at the moment I cannot come up with a better explanation.
I'm not advocating for anything specific here. :) And frankly, I take your point and actually agree wholeheartedly. The status quo of data mining and tracking is terrible, and leads to exactly what you're talking about: people changing their behaviour (not just online) because they feel like they're being watched[1].
I realise I'm not providing a solution. I wouldn't even feel confident at pointing a general direction. I'm merely pointing out that I don't believe the right way to solve this problem of personal data aggregation is consolidating all this personal meta-data into a single spot.
I understand this, and there's an argument to be made about ethics here, but these companies do have a non-trivial amount of work to do to collate the data into profiles. There are also ways of making the attribution of this data to an individual more difficult for these companies.
Yes, I agree. Creating more distributed forms of risky activity, doesn't make the activity much less risky. It just introduces new attack vectors. The answer is to not use services that collect data.
Of course. :) I don't mean to patronise, and I hope you're not trolling, but I feel like you're missing the point a little. Using a "cloud" to store potentially sensitive documents or information is not comparable to collating your digital movements, usage history, habits, purchase information, etc.
It's not just about controlling the information, it's about collection in the first place. I want to store photos, and I need to store documents that are sensitive in a place that I can access them easily, and securely. These things are important to me. To me – individually – because they relate to me.
Collecting information about my online activity is not important to me. It is, however, important to advertisers, data brokers, and other players on some arbitrary scale of nefariousness.
Solid is ok for building UI apps against a pretty layerable interface and open data, but they aren't investing enough in end-to-end encryption, which isn't surprising because it doesn't seem immediately required for inrupt.com's phase of startup.
Web UI will code against DIF-SDS or Solid.
If whatever format isn't encrypted, it could be translated to an encrypted blob, then stored in
* centralized/cheap/simple: DIF-SDS.
* decentralized/redundancy/distribution: DHTs or next-gen "Internet Computers"
While I like what PDS projects like this are aiming to achieve, the two big problems with the approach that I see are:
1) Traction. There's no reason for third parties to give up what they view as lucrative personal data on you to have it stored elsewhere. The status quo is profitable for them, and this is an unproven system with no market share - so they have a handy excuse not to cater for it.
2) PDS solutions don't only aggregate your data, but congregate it - centrally - for a bad actor to later exploit should the software ever be compromised.
I think the ideal solution here is something along the lines of what Blur and Apple Private Relay are doing.
As a side note, I'd really like to see Apple expand the private relay service beyond Sign in With Apple so that I can use private relay addresses with third party services that haven't already moved to support SIWA. Given the private relay e-mail is bound to the user by Apple, they should be able to make this slick enough to allow you to sign-in to the same account later if that service moves to support SIWA down the road.
Beyond that, I'd love them to expand to anonymization services. I would be quite happy to have packages or mail from third parties arrive addressed to "FAO: RELAY-AMAZON:GX43UJXKL56ASFHU" rather than to my actual name such that I don't need to give that information out.
With Apple's (or even Google's) clout, I could ultimately see either of them (if they wanted to do so) win over a lot of goodwill by pushing the private relay to a person's full identity rather than just their e-mail. They've shown before they can make moonshot disruptions work if they're so motivated (e.g. Apple Pay). Google is already doing this for phones with Google Voice. I'd pay for that service. I'd pay a lot for that, actually.
> There's no reason for third parties to give up what they view as lucrative personal data on you to have it stored elsewhere. The status quo is profitable for them, and this is an unproven system with no market share - so they have a handy excuse not to cater for it.
There are a lot of third parties, and for many of them, keeping your data is not their expertise nor business model, and neither is the protection of it. There's plenty of personal data sloshing around in badly protected databases of companies that need to use it for their primary competence, and for whom new legislation is increasingly turning it into a liability. Being able to use it without needing to be responsible for it can still be an attractive proposition.
(Disclosure: I work on what is presumably a competitor to this project, though opinions are my own.)
I look forward to the day we get there. From my own experience, companies aren't taking those responsibilities nearly seriously enough yet - although heftier punishments under new laws will hopefully change that.
Instead of giving your contacts to facebook, why can't facebook instead operate directly on them in your personal data store? Why does youtube know which videos I've liked but I don't unless youtube allows me? Why is sharing my github repo something github has to do, and not me? Why are all my posts randomly dispersed and locked across multiple networks and not available for me to query and analyse as I see fit?
I hope eventually we all own and control our data while platforms are given limited access to it, rather than the other way around.
1. What kind of "operate on" wouldn't necessitate sending your contacts to them?
2. Because they're YouTube videos that you watched on YouTube and pressed the like button on YouTube. If you "like" them using a 3rd party client (I think NewPipe can do this) then that information is saved locally.
3. "Why is sharing my GITHUB repo something GITHUB has to do, and not me?" - see the problem now?
4. Because there are like 50 of us that would actually care to do that and there's no money in it for them. FWIW, doing that is actually possible these days with all the GDPR data export options, just a little tedious.
> If you "like" them using a 3rd party client (I think NewPipe can do this) then that information is saved locally.
Exactly. When you export your NewPipe data it creates a db file with all your searches, stream history, subscriptions and more. All of this happens without you having a YouTube account and this data doesn't leave the device until you decide.
> FWIW, doing that is actually possible these days with all the GDPR data export options, just a little tedious
It's tedious to the point that it is something else. The same reason the web is scrape-based rather than api-based today. When something is ancillary and done for compliance alone, you rarely get what you are looking for.
Say everyone had their own personal, live, accurate, predictable, actual data tree:
There would likely be an ecosystem of new services that create new value for you from that data. Combining things, extracting insights, backing things up, sharing things, organizing things, AI'ing things.
> 2. Because they're YouTube videos that you watched on YouTube and pressed the like button on YouTube.
Yes, that's how it is now, but it can be that going to youtube is granting them access to a branch of your tree on your server (which could be an encrypted managed 3rd-party).
> 3. "Why is sharing my GITHUB repo something GITHUB has to do, and not me?" - see the problem now?
As above, github the service can be made to operate on data that is yours in your warehouses.
> there's no money in it for them
They are still the app, they make the interconnections, they have tons of the valuable data and meta data.
It is somewhat confusing that the documentation of a Personal Data Store server does not contain any reference to how a person would use it to store data.
From the demo video it seems more apt to consider it as a "Broker of Personal Data," based on the examples shown where data is already stored in third party services, and the role of Personium is more oriented towards the grant/deny/revoke logic of data consumers attempting to access data.
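As a rough illustration of that grant/deny/revoke role, here is a toy Python sketch. The class and method names are hypothetical and not taken from Personium's API; the data itself would live elsewhere, and the broker only answers "may this consumer read this scope?".

```python
class ConsentBroker:
    """Toy broker: tracks which consumer may read which data scope."""

    def __init__(self):
        # (consumer, scope) pairs the data owner has approved
        self._grants = set()

    def grant(self, consumer, scope):
        self._grants.add((consumer, scope))

    def revoke(self, consumer, scope):
        # discard() is a no-op if the grant never existed
        self._grants.discard((consumer, scope))

    def is_allowed(self, consumer, scope):
        # Anything not explicitly granted is denied.
        return (consumer, scope) in self._grants


broker = ConsentBroker()
broker.grant("fitness-app", "steps")
allowed_before = broker.is_allowed("fitness-app", "steps")
broker.revoke("fitness-app", "steps")
allowed_after = broker.is_allowed("fitness-app", "steps")
```

Deny-by-default is the design choice here: a consumer that was never granted a scope gets the same answer as one whose grant was revoked.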
My problem with all these "personal data tracking" systems is: if I have to enter the data manually, then I'm not going to do it.
It's 2021 and we still need to enter manually "ate 200gr of chicken for dinner". People, the problem is not how to store data, the problem is the input mechanism.
That's not how it works. For food, one person enters the data from the label on the back, other people scan the barcode with their camera and it's added as consumed. That's how MyFitnessPal works, as well as alternatives that use OpenFoodFacts under the hood. For other types of data, you have other aggregators that are used for a specific use case (Pocket for Articles, Last.fm for music, Trakt.tv for shows/movies, Goodreads for books, various device trackers for steps/heart rate/sleeping patterns etc).
Then you use tools that combine those things together, like Exist.io. Alternatively, you go through a middleman route like Zapier/IFTTT/Integromat/NodeRed to not have to fuck around with each API separately. Then you use a system that visualises it all.
Granted, there are some blind spots that will make you have to enter the data manually (example: with Netflix you have to manually download a CSV, and it only includes the title and a date, not the time of the day nor the watch time), but that's really not that time consuming. It takes me about two minutes per day to fill in the missing fields in my system, and I spend a few hours every two months or so fucking around with the data to find some correlation that I simply wouldn't be able to find otherwise.
Seriously, based on the answers here we should all be storing our data offline.
Realistically what will happen if we don't provide people with good self hosted free software is someone will instead provide them with Facebook and Google ad based software.
It's troubling to me that I cannot find anything on the Personium website that explains who is behind Personium. There is a link on the nav menu labelled <squiggle>, which might be Chinese characters, next to a link labelled "English" - so I suppose the project is run by some Chinese team.
Personium: Personium (JP) is an open source 'Personal Data Store Server' and is intended as a basis for organizations to build an operator with it. OpenID Connect (section 4.4.3) is applied here. Among other things, financial data, energy consumption and social data are processed in Personium applications. Typical users of Personium are banks, energy companies and advertisers. Personium itself does not provide services directly to the end user. Personium is, among other things, the basis of a few of Japan's "databases," a new IT-sector standardized role for companies.
I don't know... I find this sketchy as anything. Why do I need to keep track of my purchases at a particular shop? It's not like I don't already know which shop sells the cheapest bread; I don't want to track it at all.
I don't care how I drove or rode because I already did it and know. Keeping track of it for what? So my spouse can keep an eye on me? Or to keep track of children?
What this project is "aiming" for is IMO a solution to a problem that doesn't exist. Why? To build a market for themselves; otherwise all the "smarts" thrown at IoT are just used for data mining at a mass scale. Only projects like openHAB make it self-hosted and your own.
Nothing personal about the team behind this but I am not sold. Sorry. I'd rather my data not be anywhere in the first place than be in a central location which I now have to guard against people. Sorry.
> What this project is "aiming" for is IMO a solution to a problem that doesn't exist.
No, there are many people who like to track all kinds of data that would find this project useful. I use Perkeep, DayOne, IFTTT, LifeCycle, Exist, Gyroscope, RescueTime, Pinboard, Pocket, OwnTracks, and probably some others I've forgotten to track everything I possibly can and keep it logged. Why? Because a) I forget things and it's nice to have an external memory, and b) I like to play around with the data and do things like "what's the biggest geographical span I've visited per month in a given year?" or "which roads within a 2km radius have I not yet walked down?"
Just because you don't have a need for a project or understand why others might doesn't mean it's "a problem that doesn't exist".
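For what it's worth, the "biggest geographical span per month" question above can be sketched in a few lines of Python. This assumes location points are already exported as hypothetical ("YYYY-MM", lat, lon) tuples; the real export formats of OwnTracks and the other tools differ, and the sample coordinates are made up.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))  # 6371 km: mean Earth radius

def max_span_by_month(points):
    """points: iterable of ("YYYY-MM", lat, lon). Returns {month: widest pairwise distance in km}."""
    by_month = {}
    for month, lat, lon in points:
        by_month.setdefault(month, []).append((lat, lon))
    return {
        month: max((haversine_km(p, q) for p, q in combinations(locs, 2)), default=0.0)
        for month, locs in by_month.items()
    }

# Made-up sample: roughly London and Paris in January, London only in February.
sample = [
    ("2021-01", 51.5074, -0.1278),
    ("2021-01", 48.8566, 2.3522),
    ("2021-02", 51.5074, -0.1278),
]
spans = max_span_by_month(sample)
```

The pairwise comparison is quadratic in the number of points per month, so a real location history would want downsampling or a convex hull first.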
You can't imagine reasons why people would want to keep their data? Which includes posts, photos, likes, etc. Many people aren't that scared of their spouses :D [1]
I mean. If you explicitly don’t want the data, then maybe nobody needs to track it (unless you’re using a cc or similar). The point is that that falls on you and you can choose not to.
Seems quite a heavyweight implementation. E.g. Nextcloud can be installed in a shared hosting environment. One should also consider the costs for hosting such a personal data store!
To be frank, this is so enterprise-y in their documentation without ever explaining the meat and bones that I instantly distrust it. I can't even get a simple explanation of what this is in practice instead of "BaaS PDDs!".
Also the requirements are almost goofy if they think this is going to be hosted individually...