Kindle collects a surprisingly large amount of data (nullsweep.com)
545 points by BCharlie on Aug 25, 2020 | 379 comments



This statement - "None of these requests appear to be used for customer features like last read location." - bugs me, because it's fairly obviously false, and detracts from the real concerns.

To sync a "last read page" across devices, you need to send a location back to Amazon. It's also appropriate to tie a location to a device, so you can pick the appropriate device to sync your position from. And, when you highlight a word, the translation, definition, and wiki page are brought up, so of course it's being sent to Bing and Wikipedia.

There are valid concerns here (there's too much information being sent overall - the location data doesn't need to be sent with every page turn, for example), but these concerns are being buried behind FUD about none of this data needing to be transmitted.

EDIT: Can I also point out the ironic nature of griping about Amazon's analytics collection while running an analytics suite on the webpage yourself?

zql=Kindle%20Collects%20a%20Surprisingly%20Large%20Amount%20of%20Data pqo=1 xfg=1 xqi=946451 h=8 m=58 s=11 eqm=https%3A%2F%2Fnullsweep.com%2Fkindle-collects-a-surprisingly-large-amount-of-data%2F uel=https%3A%2F%2Fnews.ycombinator.com%2F nvn=b271bb7f9e0fe444 xpx=1598364493 bqq=2 oso=0 ajh=1598366510 lyz=1598364493 _ref=https%3A%2F%2Fnews.ycombinator.com%2F euq=0 cookie=1 res=1080x1920 fpr=429 rlp=xnxpI1


I mention that the data that appears to be used for those purposes is sent again in a separate request to a separate end point, so we have two types of requests: last read location, and reading analytics. Sorry it wasn't clear, I'll try to improve the wording.


Will you also be updating and noting that the requests to Wikipedia and Bing are for explicit customer-benefiting features?

Might be worth noting that you can opt out of their data collection (on the e-reader, at a minimum) as well. Settings > Device Options > Advanced Options > Privacy or in the device management console in your account on amazon.com


The text in question:

> Highlighting or tapping any word will send the requests with the text to Bing Translate and Wikipedia, as well as back to Amazon.

Is there a reason why that text needs to be sent before the user clicks the "translate" button? Is there a reason why it needs to be sent to Amazon?


> Is there a reason why that text needs to be sent before the user clicks the "translate" button?

Yes - UX latency. I would expect this kind of thing to take a few thousand milliseconds, and shaving off a few hundred milliseconds from between when the user highlights text and when they select "translate" is significant. The fact that this data is being sent to Wikipedia of all places further signals that the usage is likely to be innocuous.
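A rough sketch of the prefetch-on-highlight idea, purely illustrative: `fetch_translation` is a stand-in with a made-up 50 ms round trip, and none of the names reflect how the Kindle software actually works. The request starts when the word is highlighted, so by the time the user taps there is little or nothing left to wait for.

```python
import asyncio


async def fetch_translation(word):
    """Stand-in for a network call; the sleep is a made-up round-trip time."""
    await asyncio.sleep(0.05)
    return f"translation of {word!r}"


async def highlight_then_tap(word, think_time):
    # Start the request as soon as the word is highlighted...
    task = asyncio.create_task(fetch_translation(word))
    # ...while the user is still deciding whether to tap "translate".
    await asyncio.sleep(think_time)
    loop = asyncio.get_running_loop()
    tapped_at = loop.time()
    result = await task
    # Time the user actually spends waiting after the tap.
    waited = loop.time() - tapped_at
    return result, waited


result, waited = asyncio.run(highlight_then_tap("Buch", think_time=0.1))
print(result, f"(waited {waited:.3f}s after the tap)")
```

The privacy cost is the flip side of the same design: the word leaves the device even if the user never taps.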

Do I think that this is globally a good design decision? No, for both engineering and privacy reasons. There's definitely no good reason why it should be sent to Amazon at all.


> There's definitely no good reason why it should be sent to Amazon at all.

I was wracking my brain on this, and all I could come up with was "to independently verify the invoicing for Bing translations" and "how many times are people accessing the definition/translation and not highlighting". So, analytics, not something that explicitly benefits the user.


Can we stop pretending that analytics don't explicitly benefit the user? Product Engineering organizations rely on analytics to improve user experiences.


Analytics can be done less granularly and still benefit the user. Also, surely not every data point collected is used to benefit the user.

For example, Amazon doesn't need to know where I am when I request a definition or translation. If they're concerned about usage, they only need to know how many times I actually used one or both of those features per day, per week, or month. They don't need to know instantly every single time a word is highlighted.


> Analytics can be done less granularly and still benefit the user. Also, surely not every data point collected is used to benefit the user.

How? For all we know, it isn't granular - it might be aggregated at the server level to hide specific users' actions. But they'd still need to be sending in the data from the device to the server.


The device could keep a daily count of interesting actions, and sync that to analytics servers on a daily or weekly basis. That preserves 95% of legitimate use cases while leaking much less private data (like how my reading habits are distributed across the day)
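A minimal sketch of that idea, assuming hypothetical action names and a made-up payload shape (nothing here reflects Amazon's actual telemetry format): the device only increments per-day counters locally and uploads totals for completed days, so no per-event timestamps leave it.

```python
from collections import Counter
from datetime import date


class DailyActionCounter:
    """Counts actions per calendar day on-device; only totals are uploaded."""

    def __init__(self):
        # Maps a date to a Counter of action name -> count for that day.
        self._counts = {}

    def record(self, action, day):
        # Only the day is stored, never the time of day.
        self._counts.setdefault(day, Counter())[action] += 1

    def flush(self, today):
        """Return totals for completed days (before `today`) and forget them."""
        done = sorted(d for d in self._counts if d < today)
        return [
            {"day": d.isoformat(), "counts": dict(self._counts.pop(d))}
            for d in done
        ]


counter = DailyActionCounter()
counter.record("dictionary_lookup", date(2020, 8, 24))
counter.record("dictionary_lookup", date(2020, 8, 24))
counter.record("highlight", date(2020, 8, 24))
counter.record("highlight", date(2020, 8, 25))

# Only the finished day (Aug 24) is reported; Aug 25 is still accumulating.
payload = counter.flush(today=date(2020, 8, 25))
print(payload)
```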


I mean, you're still collecting most of the problematic data. And you might legitimately be interested in what you're leaving out - knowing time of day that people do things is actually important for plenty of use cases.


They can but they're often much more than that.

Also it should really be opt in. Or at least opt out. I hate Amazon looking over my shoulder while reading a book.


That's why I don't use their reading app and use a custom OS.


The word choice we want here is directly vs. indirectly.


I'm surprised you'd say that. Out of interest, how does analytics help websites not use blathery, unhelpful text in overly small fonts, rendered so pale as to be unreadable? A lot of UI failings are of this most basic kind.


When you play an online slot game where you bet money that some numbers will appear on screen, and they use analytics to "improve user experience" (read: engagement, read: you losing more money), is that benefiting you or is it benefiting them?


Kindle devices have a dictionary on device. By looking into which words are most frequently defined, they can add these to the local dictionary to help improve the speed of the UI.


The screen refresh rate on these devices is measured in seconds, so a few hundred millis of network latency is impossible to display.


This isn't universally true - Dan Luu's computer latency page[1] lists three Kindles, all below 900 ms of latency. And, since some devices have latency as low as 570 ms, it makes sense that they would use this optimization.

[1] https://danluu.com/input-lag/


have you actually used a kindle? it certainly doesn't take seconds for the definitions to pop up. a full-page refresh might take a second, but most page turns or UI interactions are partial draws and are much faster.


I suspect that they were overstating a limitation of these devices rather than speaking from inexperience. While it has been years since I've used a Kindle, I do use Kobo devices and the delays are perceptible. While changing a page may be quite quick, user interface elements (such as a box containing a definition) seem to take longer. I suspect that they have to be more aggressive when refreshing the screen before and after these user interface elements are displayed in order to make the ghosting less perceptible.

If you want to see what I mean by the ghosting of user interface elements being more perceptible, try using KOReader. The ghosting after using a menu can be quite noticeable (at least on Kobo devices, which are based on the same technology).


You're exaggerating how slow the screens are.

And the fact that the screens are slow should be motivation to make the rest of the system as responsive as possible. A good software engineer will work around bottlenecks, not shrug their shoulders and introduce new ones.


Also remember that "Kindle" can refer to an app on your phone or desktop computer, all of which may share code related to highlighting and translating.


That doesn’t seem right. Let’s consider the screen refresh to be like a subway station, where the train shows up every few seconds. We need the text we want to show to the user to be at the stop waiting when the train arrives. If we miss the train, we need to wait for the next train to get our text on the screen. The network latency delays when we show up to wait at the station.

If the refresh rate is 5 seconds, and the network response time is 500ms, then eliminating the 500ms response time means we are 10% less likely to miss the train. On average, the time for the text to appear on the screen decreases by 500ms.

All this assumes the refreshes happening on a static schedule. If the software can trigger the refresh, then it’s a lot simpler. The 500ms improvement in latency would apply equally to every engagement with the translate feature.
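To check the arithmetic, here is a small sketch under the static-schedule assumption: a refresh every 5 s, a request made at an evenly spread moment within one period, and the text shown at the first refresh after it is ready. The function names and numbers are illustrative only.

```python
import math


def time_to_display(request_time, latency, period):
    """Delay from request until the first refresh at or after the text is ready."""
    ready = request_time + latency
    next_refresh = math.ceil(ready / period) * period
    return next_refresh - request_time


def average_delay(latency, period, steps=10_000):
    """Average over request times spread evenly across one refresh period."""
    total = 0.0
    for i in range(steps):
        request_time = (i + 0.5) * period / steps
        total += time_to_display(request_time, latency, period)
    return total / steps


period = 5.0
with_network = average_delay(latency=0.5, period=period)
without_network = average_delay(latency=0.0, period=period)
print(with_network, without_network, with_network - without_network)
```

Under these assumptions the average gap between the two cases works out to the full 500 ms, even though only the requests landing in the last 10% of a period actually miss a train.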


There's no static schedule. It's an e-ink display. Refreshes happen when software tells it to display something new and take several hundred millis per blank - and a screen can be up to three blanks (because if it doesn't go white-black-display, then some pixels get stuck "on" or "off" or "halfway").


In that case, it’s clear that eliminating the network request before triggering the refresh directly reduces the amount of time the user has to wait to see the result.


There isn't a "translate button" - the selection of the word is the button for define/translate/wiki. You swipe between the three cards.

I like this, as a user. I don't want MORE buttons to tap through when I'm trying to define or translate a word. Especially since the Kindle eink screen and UI is not the most responsive.


This is literally my #1 used feature of my Kindle. I read texts in different languages to have a quick access to single-tap translations.

If it took 2 taps, I would switch platforms.


On the iOS app it all appears instantly-ish when I highlight, so I'm guessing it's just the same codebase.


> Might be worth noting that you can opt out of their data collection (on the e-reader, at a minimum) as well. Settings > Device Options > Advanced Options > Privacy or in the device management console in your account on amazon.com

Good tip, I'm going to give this a whirl. Unfortunately, all the network calls add a significant amount of latency even if one didn't care about privacy.


Can you provide the URLs so we can use pihole to block the requests?


(off-topic) What’re the advantages of pihole over /etc/hosts?


That it works for all devices on your network. Even ones that don't have an /etc/hosts :)


>(off-topic) What’re the advantages of pihole over /etc/hosts?

It's good for cases exactly like this - devices where you don't have control over /etc/hosts (or where you have lots of them and don't want to keep the hosts files in sync). I use it for my Samsung TV to keep them from phoning home (but still letting me use apps)

Edit: you can also set up a DoH endpoint and filter traffic while also allowing Encrypted SNI to work


> It's good for cases exactly like this - devices where you don't have control over /etc/hosts

Is the pihole a DNS server or a firewall? Sibling comments suggest it's a DNS server, but that doesn't answer this need at all -- if you don't control /etc/hosts, you don't control the device. It can do its resolution however it wants. Most obviously, it can include the domain names you don't want it to reach in its own /etc/hosts file, which you just said you didn't control.


In addition to sibling replies which point out network-wide usefulness... pihole (or any DNS server) can/will return NXDOMAIN, unlike /etc/hosts, which can only map a name to an IP. A DNS server can also be configured to match a domain and any subdomain (wildcard match) without having to specify each entry individually.
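A toy model of the wildcard matching, not Pi-hole's actual implementation: the hostnames are hypothetical, the IP is from a documentation range, and `upstream` is just a dict standing in for a real resolver. One blocklist entry covers the domain and every subdomain under it.

```python
def is_blocked(hostname, blocked_domains):
    """True if hostname equals a blocked domain or is any subdomain of one."""
    labels = hostname.lower().rstrip(".").split(".")
    # Check the full name, then each parent domain: a.b.example.com,
    # b.example.com, example.com, com.
    return any(".".join(labels[i:]) in blocked_domains for i in range(len(labels)))


def resolve(hostname, blocked_domains, upstream):
    """Toy resolver: NXDOMAIN for blocked names, otherwise ask `upstream`."""
    if is_blocked(hostname, blocked_domains):
        return "NXDOMAIN"
    return upstream.get(hostname, "NXDOMAIN")


# Hypothetical names and a documentation-range IP, not real Kindle endpoints.
blocked = {"metrics.example.com"}
upstream = {"books.example.com": "192.0.2.10"}

print(resolve("metrics.example.com", blocked, upstream))
print(resolve("eu.metrics.example.com", blocked, upstream))
print(resolve("books.example.com", blocked, upstream))
```

With /etc/hosts you would need a separate line for every subdomain you wanted to cover.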


They both work similarly if you're using them to block outbound requests, but a Pi-Hole would intercept and block outbound requests for every device on the network where it's installed, whereas editing /etc/hosts would only block requests on a single device (unless that device is your router, I guess?)


I liked the article. If you are going to update it, please consider also covering the technical side. Frankly, Amazon snooping on users is to be expected, but a short mention of which app on which platform you analysed, and with which tools, would be a welcome addition.


> Frankly, Amazon snooping on users is to be expected

Snooping on users during e-commerce transactions, sure.

But recording user's detailed interactions with every ebook? I hope that's a big surprise to your average Kindle user.

It would be great to see a data request response and how much of this data is retained and for how long. It's clearly not anonymized at the request level.

Very easy to see a future where just reading certain books or reading certain books too many times could flag you as dangerous or be used to support a mental incompetence hearing resulting in loss of rights.


> But recording user's detailed interactions with every ebook? I hope that's a big surprise to your average Kindle user.

I doubt it. Here are some features the Kindle phone app intentionally advertises to the user:

- prediction of how long the book will take to complete, based on your reading rate

- tracking of whether or not you read anything on any given day


I believe page location analytics are used for the amount of money that goes to Kindle Unlimited authors, also.

It can't just track the very last page in the book that you read, because authors were gaming that by encouraging people to immediately skip to the last page of very large works they didn't otherwise care about. Instead there's some kind of heuristic that tries to figure out if you've more-or-less-normally read the book.


A good point, since KU authors are paid per page read. Lots of fraud potential there.


I think the reason to send a sync every page turn is you don’t know if the device will be in contact when any alternate sync trigger happens so to keep it mostly up to date the best option is to constantly sync whenever you have connectivity.


I honestly don't mind the FUD as long as users don't have options. Amazon deserves the bad press in that case. Kindle is an awesome screen reader, but such features make it a bad device. A good device would just have an option "sync usage data to Amazon account" <yes/no>. People suggest it is a technical impossibility.

It is just a shame that you have no options. Had to quickly search if my kindle has GPS capabilities. Gladly it does not.

"Kindle Collects a Surprisingly Large Amount of Data" is a completely honest and in my opinion correct statement. So yes, companies are dishonest in their data collection practices and responding with exaggeration is maybe wrong. But I do care more about the data collection issue.


> A good device just had an option "sync usage data to Amazon account"

The Kindle has an option to "sync last page", which you can turn off -- that sounds like it could be exactly what you're asking for, but more experimentation would be needed to know for sure.

I didn't see any mention of this config in the OP, aside from mentioning that the feature exists, so it's unclear whether the data being sent is used just for that feature, or whether less data is sent if the sync feature is turned off.


I pointed this out in a thread, but with the e-reader devices at least, you do have an option. It's opt-out, which sucks, but it does exist.


> which sucks

Note that it doesn't just suck because you're giving up using the Kindle itself. It also sucks because you'll be losing your entire collection of Ebooks, which are DRM-encumbered and can not be ported to other non-Amazon devices/platforms/apps.

This makes it extremely difficult for other privacy-respecting platforms to compete on the market, since using them requires the user to either break the law by stripping DRM from their books, or to abandon their entire purchased library.

Future TOS/EULA/Privacy changes that might not have been in place when a user originally bought their Kindle can thus be forced on them by making it prohibitively expensive for the user to opt out or change ecosystems.


I think there's a bit of a misunderstanding - you can turn off analytics on your e-reader without giving up the kindle platform. It's also separate from whispersync (which can also be disabled independently).


Just for clarification -- is this something that actually turns off the collection itself?

I'm seeing conflicting things online that range from "just hit this toggle and you're good", to "you can disable some of it, but not all", to "this only opts out of data processing for ads/analytics".

If there really is an option to disable the collection entirely, then that would mitigate a large number of the problems I have with that practice. Of course I'd love for it to be opt-in, but just giving the option would still be better than many other devices like Smart TVs.


Kindles have airplane mode and allow you to load books onto them using the USB connection. The battery also lasts somewhat longer if you use them that way. Amazon directly offers a "Download & Transfer via USB" option for ebooks you purchase in their store, as well -- this is a relatively well-supported use case.

It does mean that if you want to be absolutely sure your Kindle isn't phoning home, you can't use the Kindle browser, and you need a laptop or similar to download the things you want to transfer over. It's not a perfect solution for everyone, but for the typical HN reader who is concerned about telemetry, it should work.


I've done this. Mine has been in aeroplane mode since the day I got it. I seem to remember having to allow it to connect to Amazon once when I first took it out of the box, but since then, no network connectivity at all, and zero problems as a result. It's been great.

I download the ebooks themselves using the Kindle application on my computer (if I'm using Amazon to get them, which I don't always), and then use Calibre to manage/import/convert/strip DRM from them. I don't need the sync functionality, or to be able to look things up on the internet (not being able to do that is a feature as far as I'm concerned!). I just want text on a page. I like the "e-reader" experience, and I have no desire to read books on a phone or tablet. I have one Kindle, and it comes with me if I think I'm going to have the opportunity to read when I'm out of the house.

Of course, if you're using Amazon to get your books they'll still build a profile of your reading habits, but there's something about tracking the exact parts of a book I'm reading, the bits I might linger on or reread, which feels extra intrusive to me, and which I categorically don't want.


> Mine has been in aeroplane mode since the day I got it. I seem to remember having to allow it to connect to Amazon once when I first took it out of the box, but since then, no network connectivity at all, and zero problems as a result. It's been great.

I also never connect my Kindle to the internet. (The phone app does connect.) You don't have to allow it to connect to Amazon once. Mine has never connected.


In isolation, "last read page" could surely be E2E encrypted. Amazon would know that I'm using a Kindle app or device, but everything else could be opaque.

There's no motive on Amazon's part to do it this way, it would be a hassle to implement, possibly not great for battery life, and I expect that users don't care much.

Frankly, I don't care much, in practice. In principle, yes; everything which can be kept private, should be. But Amazon knowing what page I'm on just doesn't discomfit me, the way the prospect of some company being able to read my messages does.


The pro-privacy crowd needs to choose its battles.

The most common response about online privacy is "what does it matter if X knows Y? I've got nothing to hide".

People already don't care, and I guarantee they also don't care that Amazon knows what page they are on in the book they are reading. There are much bigger issues to focus on.


Can't you do most of those things by sending encrypted data to Amazon, and getting back the encrypted data from them? They act as a storage in most cases, not as a server, no?


You'd have to figure out some kind of secure key sharing mechanism between phones, tablets, web browsers, and e-readers.

Or, you can trust that a position in a book (bookmarks, notes, etc.) is not sensitive information that really needs to be encrypted. This is my - perhaps overly pragmatic - position.


I think the books you read and your annotations should definitely be protected. Imagine reading about Tiananmen Square in China.


Simply purchasing/owning a book on that topic would be enough for an oppressive government like China, they wouldn't need to know where in the book you were exactly.


Some books sold in China are edited for that market. If you highlight a passage that shouldn't be in your book, you could be in trouble.


>You'd have to figure out some kind of secure key sharing mechanism between phones, tablets, web browsers, and e-readers.

Yeah, it's not like Amazon can afford security experts to work on this or anything.

>Or, you can trust that a position in a book (bookmarks, notes, etc.) is not sensitive information that really needs to be encrypted.

This is an ignorant position that has been proven wrong over, and over, and over again. Private data should be secure by default, because otherwise eventually someone will figure out how to abuse it. This is a lesson from a bazillion fraud schemes and social engineering hacks that everyone in tech should have learned by now.


Amazon could also afford to fill the Panama canal with dirt and reunite the American continents, but why would they? A dozen angry (potential) customers on HN is hardly motivation.


I've been thinking about PII in this context.

If all data is secured by default, then the identification of PII is not about deciding to secure that data, it is about identifying where we might impose (and often this isn't required, but now we can consider it) additional UX burden or complexity in order to add _additional_ security.


If I can't think of a way to abuse me by having my data, it doesn't mean that someone else doesn't. I would really rather avoid all this discussion by them not having my data to begin with.


If you know of an alternative that offers client-side encrypted sync, I'd love to hear it. I'm considering alternatives to the Kindle as well, even if for reasons unrelated to the analytics.


I wish I knew :-)


The data is encrypted.


Encrypted between me and Amazon (such that Amazon could see the content), or encrypted between my devices such that Amazon can't see the content (but only the encrypted form)?


>the location data doesn't need to be sent with every page turn, for example

why not? if i open a book on my phone that i stopped reading on my kindle, i want it to open to the last location i read to on my kindle. not ten pages back because it doesn't sync data every page turn for some imaginary privacy benefit.


>To sync a "last read page" across devices, you need to send a location back to Amazon. It's also appropriate to tie a location to a device, so you can pick the appropriate device to sync your position from.

Why is location needed for that? Shouldn't a device id and account work just fine? I don't need to share my location to sync other devices.


Why is syncing across devices not opt-in? Why doesn't Kindle tell you which data it sends and when?


Sync is opt-in.

And, good question. It would be nice, though I'm sure they've buried it in their multi-page privacy doc somewhere.

EDIT: No, it's not opt-in. Reading failure on my part.


>Sync is opt-in.

https://www.epubor.com/whispersync-for-kindle.html

"And "Whispersync for Books" is enabled on Kindle Fire, Kindle devices and apps by default."

https://smallbusiness.chron.com/amazon-whispernet-work-58992...

"Whispersync is on by default in all new Kindles, but you can turn off the option on individual devices if you have multiple readers attached to your account."

https://ebookfriendly.com/how-to-disable-data-collection-kin...

"How to disable data collection on your Kindle or Fire device"


Crud, I read that wrong; you're correct in that it's opt-out.

That said, given that it provides a high value to the end user (I use it daily), I personally don't mind.


The aggravating bit, beyond the fact that Amazon doesn't let you opt out, is that this sometimes affects performance. Switching over to the kindle app occasionally hangs. Killing the app and restarting it usually works, but there are times when I have to go to airplane mode and kill and restart the app just to open a book!


You can opt out, at least on the physical devices.

But yeah, the Kindle iOS app is crap in many ways - the one that bugs me is how hot it makes my phone. I mean, WTF?


As a former Kindle developer, I can say that most of what's mentioned in this article are metrics used to understand how the features are used (bookmarks, highlights, dictionary, etc.), how much they are used, and in which country. This allows the teams to focus on features that are actively used, and sometimes leads to discontinuing features that see little to no use. Hope that helps.


As many people here have echoed - this boils down to the fact the data is being captured without an opt out.

I don't doubt the developers are using it for 'morally acceptable' purposes, but I don't trust Amazon not to abuse that data later down the line!

I really don't feel that anyone needs to know precisely what pages I have viewed in a specific book.


The kindle e-readers do offer an opt-out from the metrics collection. It can be triggered from the website or the device itself.

That it's an opt-out and not opt-in is not a good thing, but it can be opted out of on the e-readers.


OK well that's something. An opt-in would be preferred but that's much better than nothing.

Is it confirmed though that these network requests definitely stop after that is switched?


What are the steps to do this?



Does not work, can you point to a tutorial? And does this include the Kindle app?


On my kindle Oasis:

- Go to the homescreen

- Open the hamburger menu

- Tap settings

- Device Options

- Advanced Options

- Privacy

- Disable


That data allows users to pick up where they left off as they change devices.

I rely on that regularly as I use both my phone and a Kindle device to read books.


So you should turn those features on. It doesn't mean I should have to tolerate it by default.


At least for EU citizens the GDPR requires this to be an opt-in, with the option to decline without service degradation.


Agree. Opt out at the minimum. How did software and features ever get done before telemetry?

Efficiency is not always the best humanistic approach. So maybe they support unused features and maybe they let some features wither that lots of people like. Maybe it would make things cost a little more. I think people would be ok with some of those inefficiencies.


>How did software and features ever get done before telemetry?

IMHO, The software today is miles better at UX.


Is that because of telemetry or just the field developing naturally, though?


No, I don't think it's just due to telemetry, I think it's a combination of multiple factors as you suggested.


The opt out is don't buy a Kindle.


That's how every company rationalizes the mass collection of user data. "Oh let's collect many terabytes of every user-action in case we need to one day discontinue a feature".

It's a book. You don't need to collect and track every fucking action I do to find out if your stupid highlighter is being used in Poland.


Whether you like it or not this collection does lead to better products - that is why you think every company does it because those that don’t usually die out. Understanding your users is vitally important.

Privacy LARPers are a tiny segment of the market, the average person doesn’t really care if their ‘usage of the highlighter function is tracked’


> Privacy LARPers are a tiny segment of the market, the average person doesn’t really care if their ‘usage of the highlighter function is tracked’

If so, why don't they loudly advertise the data collection and do it only with opt-in?

It's not that the average user doesn't care if they're tracked, it's that they're not aware that they're being tracked.


You think companies should loudly advertise something people don’t care about? That doesn’t make sense.

Plenty of companies are quite transparent about their data collection practices (set up an Apple device recently?)

Most people are aware of data collection, they care more about functionality though.


>Plenty of companies are quite transparent about their data collection practices (set up an Apple device recently?)

I have not, not recently, but what you say is simply bullshit. They're "transparent" in that they give you a ToS loaded with legalese that they know you couldn't easily read through to find just how much and where they're squeezing your life for information to store. In cases where they simplify this with some less legalistic declarations of data use, what you often see there are numerous weasel words and phrases to very ambiguously describe what's being done. You know, things like "We MAY collect some information for the sake of improving user experience" and blah blah....

Then of course, there's the outright lying, which also happens, in which big tech companies simply fail to mention some types of data collection anywhere (the Amazon Alexa voice recordings being listened to by humans is a good example of this).


This isn’t buried in a tos or legalese

https://www.groundctl.com/wp-content/uploads/2018/04/csm_IMG...

Apple prompts you for each piece of data collection during the setup of an iOS device (and lets you choose if you want to share).


You're holding up Apple, the shining example of responsibility with customer data in the corporate world, next to every other company and saying that everyone does it this way?

Most companies hide it in legalese. Some companies claim they're not sending any data and then send it anyway. Looking at you Philips Hue lights.


> You think companies should loudly advertise something people don’t care about?

It's not what I said.


> why don't they loudly advertise the data collection


This I wrote. I didn't write "companies should loudly advertise something people don’t care about" -> you added something to my sentence, taking it out of context.

I wrote my opinion already, but I'll repeat it anyway in case it was not clear. I think you can't know if people care about it or not, as long as they're not informed about it.


> If so, why don't they loudly advertise the data collection and do it only with opt-in?

But they do.

https://m.youtube.com/watch?v=yg70ojfWXnk


The video is about synch, while the conversation is about "collection does lead to better products" -> i.e, analytics.


What do you believe syncing means? This discussion talks about whispersync reporting last page read and most recent page read events. What do you think that's supposed to do?


Syncing and analytics are not identical, sorry.


You're the only one fabricating accusations about "analysing" in a discussion about how Kindles send data with whispersync, a system widely known to be used to sync data across devices.

More importantly, the only usecase mentioned in the discussion that resembles anything like analysis is synching page reads across devices, and tracking reading progress to compensate authors who make their books available through subscription services.

Either you know stuff about "analysing" that for some reason you're keeping a secret, or you're talking nonsense about stuff you have no grasp over.


Please read the message beginning this thread.

https://news.ycombinator.com/item?id=24271258

It's written there:

> "most of what's mentioned in this article are metrics used to understand how the features are used (bookmarks, highlights, dictionary, etc.), how much they are used, and in which country."

Besides, I don't appreciate phrases like "fabricating accusations" or "you're talking nonsense about stuff you have no grasp over". I may be wrong, it happens often, but even if I am, this aggressive tone is out of place. You can point out my mistakes politely if they exist, the same way as I do with yours.


> Privacy LARPers

This is an unnecessarily denigrating term at this point in the conversation. It's not LARPing to want to be able to read a book or take notes without being tracked.


> It's not LARPing to want to be able to read a book or take notes without being tracked.

Absolutely agree but it is LARPing to pretend this collection is for anything but improving a product. Nobody is out to get you and nobody particularly cares how often you specifically turn the page (the data is useful in aggregate).


Kindle's privacy FAQ[0] says:

> We also use it to develop and improve products and features for all our customers and to gain insights into how our products are being used, assess customer engagement, identify potential quality issues, analyze our business, and customize marketing offers.

Targeted marketing is, in itself, something that's reasonable for someone to want to block regardless of whether or not there's a mustached villain tracking you. Privacy is about more than stalkers, it's about the effects of data usage. For some people, targeted advertising is a harm regardless of whether or not the company knows their name.

To go a step farther, I also don't understand why it's LARPing to be worried about a company who is actively being investigated for misusing seller data.

I bring this up every time that one of these threads/stories gets posted, but there's (apologies, but for lack of a better word) some kind of weird gaslighting that always happens in these situations. Before it broke that Echo and Siri queries were sometimes listened to by 3rd-party contractors, if I had posted that suspicion on HN people would have called me paranoid. Once the story broke, the argument then shifted to, "well of course they're doing that, how else would you improve the service?" That kind of thinking applies to Amazon as well.

I don't know that it's likely, but I don't think it's outside the realm of possibility that Amazon might use this information in the future to help target pirates, change book rankings on their store, perform highly targeted advertising and book recommendations, or turn it over during government subpoenas. Those are completely reasonable usages that their privacy policy leaves them permission to do.

Similarly, I don't know that it's likely, but it's not outside the realm of possibility that this information might get sent to 3rd parties with less responsible data practices, or that employees might be given direct access to it in an unobfuscated form[1]. It's not something I'm losing sleep over, but I wouldn't be shocked to my core if someday all this information got leaked publicly and correlated to people's email addresses.

These are all situations where privacy matters regardless of the original intention. The "I only want to make my service better" defense applies to basically all data collection that most companies do. Even advertisers use that defense. It's reasonable for people to want to avoid being a part of that.

Of course, it's also reasonable for people not to care, to say that hacking is a risk they're willing to live with, and that they don't mind targeted ads, and that the books they read aren't sensitive. But it's not LARPing if someone has a different opinion on whether or not they want to tolerate that stuff.

[0]: https://www.amazon.com/gp/help/customer/display.html?nodeId=...

[1]: See, https://www.telegraph.co.uk/technology/2017/12/12/creepy-net.... Is it LARPing for me to be weirded out by a marketing department trolling over my reading/listening/watching habits looking for viral tweet material?


> Whether you like it or not this collection does lead to better products

Maybe it's just me but every tech product I use these days gets worse over time. If something does get better, two things get worse. They mostly try to optimize for user engagement and not user experience.

> Understanding your users is vitally important.

And the only way to understand people is spying on them?


There’s an important distinction to make: this tracking doesn’t necessarily lead to better products, it leads to better business metrics.

Sometimes a better product comes out of better business metrics, but other times they’re directly opposed.


This is not true. What if for example you want to make a change to the dictionary feature because you imagine that it’s not useful and should be less prominently accessible. How would you measure if this is a good idea or not without tracking its use? This has nothing to do with business and everything to do with making the product better.


Sure, there’s an example where best case the user experience is improved and business metrics aren’t affected. But I assure you if that app has a decent analytics setup they’ll also be tracking business metrics, and if for some reason business metrics went down with that change past some acceptable threshold, that change won’t be launched.

Now if you look at opposite case, where a feature is worse for user experience but helps business metrics, that feature will definitely be launched. A small, mostly harmless example: Ever tried to hide twitter’s recommended accounts? It gives you the option to “see less often”, but curiously there’s no option to stop seeing the window forever. Why? Because clearly it benefits twitter’s business on average to keep showing these recommendations.

I’ve built enough dark patterns at my last job to know it always comes down to business metrics.


Exactly. At the end of the day it's about profit and not necessarily a better product. Sometimes more profit means making a better product for the end user.


'Privacy LARPers are a tiny segment of the market, the average person doesn’t really care if their ‘usage of the highlighter function is tracked’'

Which is exactly why we have regulation that forbids these practices, to protect the gullible from themselves. Furthermore, do you think privacy should be the privilege of just those that are smart and keen enough to be aware and prepared to engage in a relentless and perpetual battle with the most dark of patterns with every click they make?


Do you have something to back up the claim that this kind of data collection leads to better products?


This comment is such cowed boot-licking of a giant corporation. Completely antithetical to the hacker ethos.


One can partake in the hacker ethos while not being a conspiracy nut - albeit I admit those sometimes go together.


Most of the world-famous libre software is built without its developers studying massively collected usage data ("telemetry").

I look at VLC as a great example to follow. Their stats show 3.4 billion downloads (https://www.videolan.org/vlc/stats/downloads.html), yet they do no telemetry at all. The product works great. It could be improved of course, but Outlook could also greatly be improved, and they have high-salary staff and a boatload of data they extract from users. Yet it's slow as hell and has lots of UX I disagree with.

I'm myself the author of a replacement of Windows "alt-tab" on macOS (https://alt-tab-macos.netlify.app/) which doesn't do any telemetry. I can lead the roadmap, with the help of the community, without spying on how users set their preferences and use the app.

As a matter of fact, it can be argued that acting that way can have negative value, as it reinforces popular usage or, from the power user's perspective, dumbs down the software. By definition, advanced features will have low usage. That doesn't mean they should be removed.

Lastly, think about non-software businesses. Many amazing products simply have no way to gather data once the products are in the users' homes. They rely on gathering data by talking to customers at the points of purchase, through customer care, or in various forums with enthusiast users. This model has shown great results, so it is in no way clearly to be avoided in favor of telemetry-everything.


> Most of the world-famous libre software is built without its developers studying massively collected usage data ("telemetry").

The sort of telemetry mentioned in the article is used for UX purposes, and God knows FLOSS sucks at UX.

And by the way, Debian collects and reports telemetry since the early 2000s, and Firefox is quite open on how much telemetry it collects.


TBH the argument that it reinforces popular usage is a valid one. At MS we were taught again and again how to design good experiments using telemetry, but in the end it's hard to support changes when your data shows that something is working properly, and UI changes tend to produce a dip in usage or satisfaction graphs until they catch up.


VLC’s UI is horrible.


It doesn't really matter does it? You don't collect data without consent, period.

Why is that so hard to understand?

Why don't developers ever push back against this sort of thing? Collectively we build this stuff, we are not 'soldiers following orders' which makes us responsible for what we create.

The current actual use is not relevant. Consent and the possible uses are relevant.


I think your comment is unfair.

Every webserver logs the IP address and the URL visited. Do you think most people know this? Do developers push against this?


strawman; you visit someone else’s server, and therefore they get data about your visit; with kindle, you’re using your own device and there’s no expectation that amazon will be snooping


"you visit someone else’s server, and therefore they get data about your visit"

I don't think the average person knows this. A lot of people even have no clue about the internet. So there is no consent most of the time. And we, the developers, just leave the logs running.

"with kindle, you’re using your own device and there’s no expectation that amazon will be snooping"

Well I would absolutely have this expectation. I expect a device that is connected to the internet to snoop on me. Then there is the Amazon brand. I absolutely don't trust them so I expect them to snoop on me.

But to be clear: I absolutely hate that my privacy is gone. I use all kinds of blockers to disable tracking and I also agree with jacquesm that snooping is wrong. But I still think his point is too black and white and therefore unfair.


>Every webserver logs the IP address and the URL visited.

I maintain a webserver - https://git.sr.ht/~ancarda/tls-redirector - that has no support for logging. If you wanted logs for some reason, you'd need to modify the source code to add that functionality.

Granted, tls-redirector isn't a general purpose webserver, but even in production I tend to turn off logging. I just don't see the need to have logs lying around that I never use.
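
For anyone wondering what "no logging" looks like in practice, here is a minimal Python sketch using the standard library's `http.server`. It is not tls-redirector's code, and the `QuietHandler` name is made up for illustration; it only shows that turning off access logs can be a one-method override.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class QuietHandler(BaseHTTPRequestHandler):
    """Answers GET requests without writing any access log (illustrative name)."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # The base class prints the client address and request line to stderr.
        # Overriding it with a no-op means no per-request record is written.
        pass
```

To try it, run `HTTPServer(("127.0.0.1", 8000), QuietHandler).serve_forever()`. Anything in front of the server (a reverse proxy, a firewall) may of course still keep its own logs.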


No, not every webserver does. This is something that you could easily configure.

Yes, most people know this by now.

Yes, some developers push against this.

Also: It's the law. Collecting data without consent is not always legal. Whether that particular bit of data rises to the level of requiring consent is left as an exercise for the reader for their particular jurisdiction and industry.


GDPR actually forces all websites to carefully keep track of what gets logged and for how long these logfiles are retained. So yes, legislators are pushing back against the common practice of logging everything just cause.


> You don't collect data without consent, period.

This.


I think the privacy-concerned end-user thinks, "Yes, I completely understand why this information is being tracked and how it would be useful to Amazon. But I still don't like it."


As a freedom-concerned citizen, I always completely understood the policies and methodology of dictators and tyrants, and how what they do is useful for them.


Quit LARPing - Amazon isn’t trying to take over the world by tracking how often you use the bookmark feature.


Or "It's all fine and dandy today, but what about in x years when there's a new person/group with different incentives in charge?"


I'm surprised no one brought up revenue sharing.

I was under the impression there was a revenue-allocation problem that Amazon needed to solve (Kindle Unlimited subscriptions?), that depended on reliable reading statistics. E.g. How many people read book A?

Wish I could find the article, but the implication was there were a ton of publishers attempting to game the system. For example, by publishing blank, very long "books" and having them "read" by software automation.


How does it make a difference?

First, if an entity wants my input and is going to use it, they should be decent enough to pay me for giving it. Why do users need to work for free for Amazon?

Second, is it opt-in? If not, then there's an ethical issue here, even if a manual opt-out option is given (is it?). If there's no opt-out, there's a double ethical issue.

Thirdly, is this data deleted once it has been used for the goals you mentioned, or is it kept, making it a risk both for leaking and for Amazon deciding to put it to a different use in the future?


You don't. You have 100% freedom to not work for Amazon. Don't buy a kindle. Don't use a kindle.


If I would have known that by buying Kindle I end up working for Amazon, I indeed wouldn't have bought one.

It's deception. Please put on the box a big warning, "THIS DEVICE COLLECTS YOUR DATA", similar to those on cigarette boxes.


Are you genuinely surprised at this point? Pretty much all big tech companies were caught outright lying about user data collection. Why would you assume by default they don't try to get as much as possible? They are all based on ML, of course they do.

A year or two ago Amazon was swearing that humans don't listen to Alexa conversations until we learned they actually do. IIRC Amazon tried to backpedal: "of course they do, it is their job, we meant humans don't listen _for fun_".

At this point just treat internet connectivity as such a warning.


> Pretty much all big tech companies were caught outright lying about user data collection.

You can strip the big here.


Of course I'm not surprised, but I refuse to accept this as normal.


But your refusal doesn’t change the reality.

Kinda like refusing to believe that climate change is real does not change the reality.


What? I didn't say I don't recognize the reality. I said I don't accept it as normal, meaning I work trying to change it.


There’s a plastic bag over the product saying don’t open it if you don’t agree with the terms of service and that it’s required to use the device.

Also, plenty of people just leave the kindle in airplane mode and use third party software like Calibre to manage their libraries.


FWIW, the website providing this breakdown also collects analytics data without a warning. So, there's that to consider as well.


It’s called the terms of service?


Terms of service are written to be understandable by lawyers, not average end-users. At this point, understanding every terms of service, privacy policy, etc. presented by every piece of software, website, etc. encountered by an average user would require them to spend hours per week on it. This is assuming that they even have the language skills necessary to decipher the document (think of non-native English speakers, people without higher education, and so on.)

Creative Commons was on the right track with their human-readable licenses, see e.g. this example [1]. Apple is on the right track with their App Store "nutrition labels" [2]. This is what we need for people to make informed decisions. For physical objects like a Kindle, I believe such "nutrition labels" should ideally be put on the box (physical store) and website (online stores), so the consumer is aware before they go home and turn on the device (this makes it easier to compare the Kindle to a Boox or Nook at the store).

[1]: https://creativecommons.org/licenses/by-nc/4.0/

[2]: https://mashable.com/article/apple-privacy-nutrition-labels-...


ToS are effectively useless for this purpose.

If the industry moved to a standardized disclosure form (e.g. something like the HUD-1 [1] in real estate sales), people would stop complaining about this.

[1] https://www.hud.gov/sites/documents/1.PDF


1. Nobody actually reads Terms of Service (well, governments and some major businesses do, but 99,99% of regular users don't).

2. Nobody reads them because most of the time they are explicitly user hostile, I'm pretty sure they are designed to prevent users from reading them.


Yes! Even when I try to read the terms of service, I find them hard to understand. I feel bad because it’s sort of shame on me for agreeing to stuff blindly. User hostile is a good way of putting it.


Are they printed on the box in a readable form before the customer buys the product?


Very different things.


Payment is a fair point on Kindles. I get why websites offer free services in return for commercials (and your data), but I paid for my Kindle and (most of) the content I read.


I don't think that will ease anyone with privacy concerns. People who are against government surveillance are not against the police catching criminals and solving cold murder cases. The Golden State Killer case was a very good use of DNA profiling and DNA databases to catch a criminal. The problem is that many don't trust the government to only use it for those cases, and many others don't trust the technology to have a low enough false positive rate to not cause harm to innocent people.

Understanding how the book reader features are used in practice is good. Selling the same data to an advertiser is bad. Profiling people into predefined groups is bad, and the technology risks producing false positives/negatives that reinforce stereotypes. The law has yet to catch up and treat information gathered by libraries and information gathered by a developer of e-readers as being very similar in risks.


We can step outside of government examples, too, and find cases where corporations getting all data sciencey with this information have accomplished some pretty ucky - and also impossible to anticipate - things.

An instructive case here is Target figuring out that they could use customer purchase history to detect, with a pretty decent degree of confidence, when a customer was pregnant. They then proceeded to use this model to send out mailings, and those mailings resulted in people being outed in rather compromising and potentially seriously harmful ways.



IP address, country, Goodreads account details, each page turn, exact page location, etc., seem unnecessary for that.


Page location and page turn are there for syncing across devices, and that's fine: ask the user 'sync across devices', and if they say yes, not a problem. If they say no, don't send the data. The data that is stored would be something like 'currentlocation[$bookid] = $location'. Storing historical information (user was at location 1219 at 2020-01-06-05:12:41) is not required for that function.

The philosophy should always be to store the minimum amount of data needed to provide the function that the user wants.
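
To make that concrete, here is a minimal Python sketch of the storage model described above. The class name, the `sync_enabled` flag, and the method name are all made up for illustration, and this is not a claim about how Whispersync is actually implemented.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SyncStore:
    """Keeps only the latest reading position per book, and only with consent."""

    sync_enabled: bool = False  # hypothetical opt-in flag, off by default
    current_location: Dict[str, int] = field(default_factory=dict)

    def report_location(self, book_id: str, location: int) -> bool:
        """Record the newest location; return True if anything was stored."""
        if not self.sync_enabled:
            return False  # user declined sync: nothing is stored
        # Overwrite in place: no timestamps, no per-page-turn history.
        self.current_location[book_id] = location
        return True
```

With sync on, calling `report_location("book-1", 1219)` and then `report_location("book-1", 1300)` leaves a single entry for that book, so there is no reading history to leak or repurpose later.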

IP address is transitory and shouldn't be kept longer than needed for the TCP session. Maybe it sticks in firewall logs, but that shouldn't be used for anything other than security.

Goodreads account details would only apply if you connect to Goodreads. I'm not sure what the benefit of that is, but I could see that 'user abc123 read this book' is useful data. Again, ask if you can send the data.


Fair enough. How do I turn it off?


The primary way that helps is to communicate that everyone on the team appeared to think this is perfectly acceptable to do without communicating it to the paying customer.

I mean, we already knew this, but it means any and all Amazon hardware must be considered potentially hostile.


Almost all hardware and all software (especially software as a service) should be considered potentially hostile.


It's not about how it is used, it's about how it can be used (especially when a less benevolent entity gains access to it.)


They have collected large amounts of data from pretty much day one on those devices.

Back when they had a cell phone in them, I was standing behind a guy who was supporting it. "Uh, let's bring up where you are at? It says you are 10 miles off the coast of Miami?...." "Oh yeah, I am calling from my yacht." "Do you see any cell towers?" "No." "It kinda needs those to work. I am surprised I got the location data."


Privacy concerns are usually about how information could be misused, not how it's used right now or routinely.


A Kindle comes with Kindlings, a lesser form of the book, where you are being read by Amazon while reading; you are working for Amazon in ways you might never understand.

The Kindling never leaves Amazon properties; it is not yours even though you paid almost the full price of a book.

If there is rule of law in the US and EU, these will eventually become free e-books, that is, separated from Amazon; they will regain the status and properties of the book.


This is why you keep your e-books stashed on media you control, and put copies onto your Kindle when you want to read them.

Same with any data you store on an iOS device. You never let a device you don't control have the only copy of any data important to you.


For example, readers might want to integrate their libraries into the knowledge base of their personal AI.


I don't care how they are used honestly, I care about options to disable it.


Yeah I came here to say the same. I'm about as tin-foil-paranoid-privacy-all-the-things as they come, but the "invasive" data mentioned in the post don't seem particularly invasive to me, and collecting that data seems perfectly appropriate for the purposes you mentioned.

With all that said, I do dream of a PINE64 E Ink device (or something that's open and hackable).


Remarkable is open and hackable.

https://github.com/reHackable


It also costs more than an iPad and has terrible response times


Yep, pretty consistent for e-ink.

Still, I think it has the best value proposition for an e-ink tablet at the moment, but I'd love to be proven wrong.


Probably true - I’ll snatch it up the moment color e-ink is a thing, color is vital for most of the papers I work with and for books I prefer a smaller form factor so from my perspective it sits in kinda an odd part of the market.


Color e-ink is close, which is really impressive imo. I did not expect to see it for years.

Who knows how long it will take to get good enough yields for affordable consumer products.

https://www.eink.com/color-technology.html


Yea analytics like this are really what I find to be so important, as a developer.

How much time and frustration do I potentially waste on something that no one ends up using?

Things like this are very useful and it's strange to me that people aren't sympathetic to that perspective.


I think a lot of people are sympathetic to that perspective while still wanting control over their privacy.

It's the difference between someone inviting you to come into their home for a visit, and you breaking in whenever you feel like to take notes on what they're doing.


It's strange to you people care more about their autonomy than your convenience?

Telemetry can tell you what users are doing. It doesn't tell you why.


I'm saying as someone who works in software I empathize with the idea of spending lots of time implementing a feature, tearing hair out over some technical issue, etc. only to realize no one uses that feature.

I'd rather people be able to opt-in, but conceptually I'm not really upset that people can see my usage patterns, etc.


I think most of us work in software. Asking for consent isn't hard.

Telemetry won't tell you nobody wants a feature you haven't implemented yet. User research might.


> the "invasive" data mentioned in the post doesn't seem particularly invasive to me[.]

Attempting to get the subnet IP address? That seems pretty invasive.

From the article:

> Attempt to get the IP address on the local network (a 10. address, which was incorrect for me)


What, exactly, will that do for them?


That's my point. The data is both A) Invasive and B) Pointless, unless trying to do things they shouldn't on your network. But they still collect it for some reason.


> don't seem particularly invasive to me, and collecting that data seems perfectly appropriate for the purposes you mentioned.

Fine. So you allow them to collect it. However, don't decide for others whether it's "invasive" or "perfectly appropriate" for them. Make it opt-in so that people who want to share their data can do that.

Oh yeah, and offer them payment for that. They deserve it.


There are some features in software I rarely use. But those times I do use them they are utterly essential. If I find such feature has been removed I am incensed.

Usefulness is NOT the same as usage.


> Usefulness is NOT the same as usage.

Metrics can tell that story though so you’re arguing a straw man.

Example: If you see that 99% of users have never used a function ever - you have a pretty good idea that it needs to be reworked or removed. You may also see a function that is used by 80% of users once a month, that you may opt to keep.
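
As a rough illustration of the kind of number being discussed here, this is a small Python sketch that computes, per feature, the share of users who used it at least once. The event shape and the feature names in the usage note are assumptions made up for the example, not anything taken from the article.

```python
from collections import defaultdict


def feature_reach(events, all_users):
    """Return, per feature, the fraction of users who used it at least once.

    `events` is an iterable of (user_id, feature_name) pairs. Features that
    never appear in `events` are simply absent from the result.
    """
    users_by_feature = defaultdict(set)
    for user_id, feature in events:
        users_by_feature[feature].add(user_id)
    total = len(all_users)
    return {name: len(users) / total for name, users in users_by_feature.items()}
```

For instance, with 100 users where 80 of them trigger a hypothetical "dictionary" event and one triggers "x-ray", the function reports 0.8 and 0.01. Note that this measures reach only; it says nothing about how much the one user who needs a feature depends on it.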


It's not so much that ubiquitous telemetry can't identify this, it's whether it's better for this than a focus group. You can have background telemetry with the focus group so you're not just giving customers what they say they want instead of what they need.


I'm not sure. While I understand that developer time needs to be cut down or restrained sometimes - though perhaps not at Amazon in this case, which concerns their core business -, your example could merely turn out to be a way of losing 1% of the users. Usage statistics alone cannot tell you whether your users hate or like a feature. Some features are always going to be used more than others.


What if that feature costs 30% of dev time? Without being able to measure you wouldn’t be able to make a good judgement. Imagine how science would work without experiments?


Wouldn't focus groups work better AND respect your users?

Devs think it is either telemetry or develop blind but in reality software was developed (and possibly was better) before telemetry using focus groups.


Don't care. Still hate it. Why not add in an opt-out of metrics in the preferences?


As a developer, that is how _your dev team_ used the data. Can you confidently say that the metrics weren't also being accessed by the marketing department for different purposes? Or that it wasn't being shared with Amazon's business partners?


I have quite often seen people here and on other tech forums assume that purchasing a Kindle means being locked into Amazon's ecosystem, giving up personal details, and having the risk that your books might be deleted. But you don't have to use the Kindle's internet connectivity: I have owned three generations of Kindle, and with each one I activated airplane mode the second I unboxed the device and I never turned airplane mode off. All my ebooks come from sources other than Amazon (mainly LibGen, for example), and they can be easily transferred over to the Kindle by USB because the Kindle appears as any ordinary USB drive to a computer.


If this practice ever gets widespread, I would guess that the developers will limit airplane mode in some way in order to ensure that the device will call home at some point.

But it is a pretty clever hack to get a hostile machine to not connect to the internet as airplane mode is (I assume) regulated behavior.


Even if the developers take the egregious step of nerfing airplane mode, you can still "opt out" by not giving the device credentials for your WiFi network.


Only a matter of time before devices come with 5G data connections...


AKA, the original whispersync.

Yup, this was once a thing - you didn't need wifi for sync or downloading books at all.


Kindles at one point apparently came with free cellular access

https://xkcd.com/548/


They still do - it's an option on more expensive devices (Paperwhite and Oasis).


I had a Kindle Keyboard and it had 3G. It worked in a bunch of countries, slowly though. I remember reading blogs where people were taking the SIM cards out and tethering using them.


To save money they could come with LoRa radios and sync to a LoRa gateway when the opportunity arises, including meshing with each other to aggregate data and increase the likelihood of encountering a gateway. LoRa modules are pretty cheap.

https://www.thethingsnetwork.org/


...which would require a valid SIM. So just don't add one. If the device comes with a pre-inserted/hardwired/virtual SIM, well... several countries in the world require KYC-style registration of the SIM owner before networks are allowed to activate the SIM, so there'd still be an opt-out path for the user in such countries.

eta: My point being: Now you're in a twisty little maze full of corner cases, all different. Not the sort of thing much loved by Amazon (or any of the GRAFT).


FWIW, the original kindle used a cellular connection to do position syncing and book downloading. No user-provided SIM needed.


Not in the IoT world. The 'owner' of the SIM, the company that sells the device, would have a deal with one or more network providers to allow access, and take care of facilitating data retention and identification regulation.


Does this work with Teslas?


Only until you come into range of any open WiFi which is every public place everywhere


Is that Kindle that promiscuous that it will literally connect to any open AP without prompting?


As of my current device (the Oasis), no, it does not appear to be this promiscuous. I can't speak to the analytics, but Whispersync and book downloading don't work unless you explicitly connect it to an AP.


Parent comment was

> if the developers take the egregious step of nerfing airplane mode,

and I was responding that IF the developers decide to nerf airplane mode, it's very possible they will start using any open AP; some TVs are reportedly doing this already


Ok so you don't know, you're just speculating without evidence.


No, the entire scenario is a hypothetical, the standard of evidence is inapplicable.


I don't understand the point then.

Literally any device you own with WiFi could be updated tomorrow to connect to any open access point.


Here on hn, I read several stories where smart TVs did exactly that: they tried all available wifi networks to see if one of them worked.

It doesn't seem so far-fetched that the Kindle might, too.


Sure, but I'm looking for an answer and not idle speculation.


I don't think they can. If they did this, the Kindle would not be allowed in the airplane cabin.


You're allowed to use WiFi on planes (other than take-off and landing, currently) and Aeroplane Mode often allows WiFi and Bluetooth these days. The rules are changing pretty fast.


I have also owned three generations of Kindle! Like you, I've never taken any of them online.

Never supply a wifi connection during setup, and instead immediately engage airplane mode. USB transfer is easy with something like Calibre, which also handily converts ePub to Mobi for Kindle use.

It used to be that you could buy Kindle books and download them to your computer for transfer to the Kindle via USB, but they seem to have made that more difficult in the last year or two. Other sources still work fine, though.


I tried to do this with a recent paperwhite but some features seemed to require registration - the main one I cared about being "collections". Had to make a fresh amazon account, register it, then put it into aeroplane mode never to be reconnected.


Hmm, maybe that's why collections doesn't work for me. I might enable wifi for a bit if that will help.


Same here (sans the Kindle). None of the eBook readers I have bought has ever been connected to a WiFi network. If I want to change the books, I do it via USB.

The fact that Amazon collects these very detailed metrics has been well known for a long time. You will find old discussions in the MobileRead forum. Here is a thread from 2013, "Block Big Brother":

https://www.mobileread.com/forums/showthread.php?t=205224


> I activated airplane mode the second I unboxed the device and I never turned airplane mode off.

Same; however, I had to connect my Kindle Oasis to the internet once after purchase. If I remember correctly, it was to download the dictionaries (for translation) I needed. I think there was also a feature that was missing until I connected it to the internet once (I used a new/temporary account for that), but I can't remember which feature that was.


You could have got those dictionaries from a filesharing community and simply copied them over to the Kindle via USB. No need to connect the Kindle to internet.


Yeah, I tried that first, but all the dictionaries I downloaded didn't work on my device for some reason, so in the end I gave up...


Why buy a kindle at all then?

Any cheap budget tablet can read ebooks and stay off the internet.


Any cheap budget tablet isn't e-ink, which matters for battery life and, at least for some people, reading pleasure. Also, I mainly use my Kindle for reading research papers in academia, into the hundreds of publications each year. After years of using these devices, the UI (which I find admirably simple and straightforward) is burned into my muscle memory, so switching to another series of devices would mean having to adjust to a new workflow that may well bring unwelcome complexities.


Do you have any tips for reading papers on Kindle? My experience with pdfs is they get pretty shrunk up.


There are many cheap e-ink readers on the market these days, though.


I've never seen one that has the bang for the buck of a basic paperwhite. I got my last one for under $100 and I never use the amazon nonsense. I just keep it in airplane mode and load my own books.


There are also Kindle competitors, like the Kobo. You can also bet that if this becomes a widespread concern, another e-ink reader may come to market that offers privacy and security, maybe some sort of open-source, secure-by-default e-reader. Some attempts at this have already been made [1], but it's not clear how strong the market demand for that is and whether it will be successful. If you really want a privacy-centered reading experience, the easiest way to do this is to just borrow the printed book from your local library.

[1]: https://hackaday.io/project/168761-the-open-book-feather


For people who read a lot, it makes sense to purchase a device that is optimized for reading.

E-ink gives you a better screen for text, much better battery life, no apps, no notifications, no video ads, no ads in general, nothing flashy.

And kindles are relatively cheap, and available almost everywhere.


One reason is value. They produce so many, the quality is decent and the price is subsidized so it's artificially low.

Why is it subsidized? Obviously to make it more fun to buy books, but also collecting valuable data on your reading habits. Obviously they know _what_ you're reading but it seems useful to them also to know what you bookmark etc.

They also have all the hardware they need for location history tracking by remembering wifi broadcasts seen. Is it known if that's being uploaded?


eInk is a lot nicer for reading books than LCD screens.

The downside is that eInk currently only supports black-and-white and turning pages is roughly only as fast as turning the page of a book.

Also, battery life is counted in days (and sometimes weeks) and not in hours.


> The downside is that eInk currently only supports black-and-white

E-ink Kaleido displays with 4096 colors at 100 ppi have been commercially available for the past few weeks.

https://www.youtube.com/watch?v=mqiCOheb1jo


Thank you! I did not know about this. This is a really cool development. Even if the saturation (as mentioned by rtkwe) is not the greatest, this is a big step towards reading more analytical texts with colored graphs.


Those are really neat, the big issue is they’re still pretty low saturation so the images don’t look as good as an actual LED/LCD screen.


The new Kindle Oasis is incredibly fast on page turns - feels almost instant coming from a Paperwhite


I haven't seen any cheap budget tablet that even comes close to the quality of a Kindle Paperwhite for reading.


Kindle is just a great reading device. The only feature that I _might_ consider using that requires connectivity is the Wikipedia lookup, and the verdict so far is that Airplane mode is more valuable than that.


Kindle is easily the cheapest and most functional reading device out there.


Same, although I just have the one old Kindle that I revived with a new battery. It's never had a network connection since I factory reset it. I just dump ebooks onto it via USB. It might be recording all sorts of analytics but I don't care because it'll never be connected to the outside world.

Plus, for all the people saying basically "it's for your own good", the battery lasts much longer on aeroplane mode. For this device, for me, WiFi is an anti-feature.


What formats does it handle? Can it handle EPUB for example?


For this I recommend Calibre: https://calibre-ebook.com/

It will convert any format of E-Book to a compatible format for the Kindle (usually MOBI) and allows you to upload it directly. I use it often and it's an amazing piece of software!


I actually prefer KindleGen[0] to convert EPUB to MOBI - I find it produces a superior e-book.

Edit: Oh no, Amazon removed KindleGen! When did that happen? I still have x86 copies for Linux and windows if anyone wants it. Supposedly "Kindle Previewer" can do the same thing, but a cursory glance looks like it no longer supports Linux...

[0] https://www.amazon.com/gp/feature.html?ie=UTF8&docId=1000765...


> Edit: Oh no, Amazon removed KindleGen! When did that happen?

We know where this is going, don't we? Offering interoperability in the beginning and then gradually taking it away - the old bait and switch trick. We consumers fall for it again and again.


Aghh, I have (had?) a project that relied on downloading kindlegen as part of the CI/CD step. I just downloaded it from Amazon each time.

I doubt I have a copy of it somewhere. Does anyone have / know of a copy of kindlegen for Linux anywhere?

Edit: Wayback machine has it, thank god. Had a very real sense of loss there. Unsure why it's a project I've hardly touched for years. https://github.com/wjdp/gotdict for the interested.



Again though, KindleGen had Linux support, it looks like this doesn't...


Damn, they removed KindleGen?! What a bummer!


It works pretty well except for the latest generation of Kindle DRM. AFAIK, it hasn't been cracked yet. There are workarounds, but they result in a lower quality book.


I second that. I have an older Kindle Paperwhite with only wifi, and I never connect it to wifi. Calibre is excellent for managing a Kindle archive without Amazon breathing down my neck.

Highly recommend.


No it cannot. It only supports Amazon's proprietary ebook formats: AZW and MOBI. I love the paperwhite, but the limited format support led me to choose a different ereader when I last bought a new one.

Edit: I know I could convert between formats, but that process is not always perfect and can lose important formatting.


Loading books using programs like calibre [1] allows you to convert EPUB to MOBI (the kindle format) seamlessly before transferring. In my experience this works perfectly.

[1] https://calibre-ebook.com/


The Kindle handles .mobi and .azw3. It is trivial to convert EPUB to MOBI before you send the book to the device over USB (it can even be done as part of a command-line script, for example).


Having to deal with the .epub conversion was the main reason I ditched my Voyage and got a Kobo Aura One instead.


Why ditch a device just for that reason? Again, the OP assumes that one will transfer books to the device over USB for privacy’s sake. In this case, creating a shell wrapper around cp to automatically convert any EPUB to MOBI upon copying the file (naming the command, say, kindlecp) is trivial.
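
For what it's worth, here is a minimal sketch of what such a wrapper could look like, written in Python rather than shell. It assumes Calibre is installed and its `ebook-convert` command-line tool is on the PATH; the function names `plan_copy` and `kindlecp` and the mount point in the usage note are made up for illustration.

```python
import subprocess
from pathlib import Path


def plan_copy(src: str, kindle_dir: str) -> list[str]:
    """Return the command that would place `src` in `kindle_dir`.

    Hypothetical helper: EPUB files are routed through Calibre's
    `ebook-convert` CLI (input path, then output path, with the output
    format inferred from the extension); anything else is a plain `cp`.
    """
    source = Path(src)
    target_dir = Path(kindle_dir)
    if source.suffix.lower() == ".epub":
        target = target_dir / (source.stem + ".mobi")
        return ["ebook-convert", str(source), str(target)]
    return ["cp", str(source), str(target_dir / source.name)]


def kindlecp(src: str, kindle_dir: str) -> None:
    """Run the planned command; raises if the tool is missing or fails."""
    subprocess.run(plan_copy(src, kindle_dir), check=True)
```

Something like `kindlecp("dune.epub", "/mnt/kindle/documents")` would then convert on the way over, where the mount path is only an example. Conversion settings are left at Calibre's defaults, so the formatting caveats raised elsewhere in the thread still apply.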


Conversion can occasionally barf and completely screw up the formatting. It's annoying to have to go back and re-convert with different settings for a problem that's non-existent when native support exists.


There are online converters for it all, the process is a little annoying but works fine and you can do it locally too with various tools:

https://www.epubconverter.com/epub-to-mobi-converter/


I now do the same. I never take it off airplane mode unless I send a PDF or something to the device.


Same here - I only take mine off airplane mode when I am sending a library book to it, leaving wifi on eats through the battery significantly faster.


I just tend to use non-Kindle applications/devices for this. It's always been extremely easy to get non-DRM ebooks into Apple's book reading app (formerly iBooks, now just Books, in Apple's ongoing quest to make most of their application names as boring as possible). Perhaps ironically this makes Books the "non-walled-garden" app for me.

The pitfall in all this, though, is that there are a lot of commercial books that are only available from publishers that use DRM, and personally I don't consider DRM a sufficient justification for piracy -- so that leaves me stuck with locked books regardless. Lately I've been buying them from Apple rather than Amazon, although if I actually jump through whatever hoops are required to set up DRM stripping with Calibre for Kindle books, assuming that's still possible, I may switch back.


Would you agree that your usage pattern of the device is very atypical? I suspect (no hard evidence) that 99% of Kindle purchasers use them primarily to read Amazon Kindle books.


Not sure about Kindle, but iPhone and Apple watch collect your location history even in airplane mode (by saving the list of wi-fi access points):

https://support.apple.com/en-us/HT207056, see "Significant locations"


That document doesn't mention anything about Airplane mode at all. Nor does it describe the Significant Locations feature as "saving the list of wifi access points"; in fact, I'm fairly sure that's not what they're talking about, and instead they're talking about the feature iOS uses to determine that you tend to go to the same place for lunch on Tuesdays or the same friend's house on Saturday afternoons and offer that as a Siri suggestion -- which is almost certainly GPS-based.

Last but not least, Significant Locations data is not just described as "end-to-end encrypted and cannot be read by Apple", it's clearly in the list of items under "By enabling Location Services, location-based system services such as these will also be enabled": e.g., if you're really, really bothered by this, you can turn it off.


Yes, this document does not tell how and when exactly it works. I took an Apple Watch and tested it myself before writing this.

Even if you trust that Apple does not use it for anything else, you cannot check this (no source code) and you cannot be sure that they won’t start using it in the future.

Opt-out tracking is not ethical. It should be opt-in.


I never turn off airplane mode, so my Kindle can collect all the data it likes! :)


Does the dictionary feature still work? It's the main reason I bought a Kindle.


The dictionaries are stored offline and will work fine in airplane mode. Wikipedia won't, though.


This is a bit unfortunate, because the kindle paperwhite is just phenomenal. It's easy on my eyes and it's a godsend for traveling. I suppose the solution here is to just keep it in offline mode when not syncing books.

[edit] as others have noted, it's possible to permanently use offline mode, and transfer books via usb cable.

> Unfortunately, in order to use a non-Kindle application, I have to buy DRM-Free books.

One can remove DRM for amazon's ebook format (.azw3 ?) via some python scripts. You didn't hear it from me though.



> Each request also isn't sent as soon as it's generated. A number of these records are created and stored locally, then uploaded (note the sequence_number field). Even if a person is offline while reading, this data is stored and sent when reconnected.

Keeping it in offline mode doesn't help.
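
To illustrate the store-and-forward behaviour the article describes, here is a rough sketch of how such a buffer could work. Only the `sequence_number` field name comes from the article; the class, the other field names, and the event types are assumptions of mine, not Amazon's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class EventBuffer:
    """Hypothetical local queue: events pile up offline, flush on reconnect."""

    next_sequence_number: int = 0
    pending: list = field(default_factory=list)

    def record(self, event_type: str, payload: dict) -> None:
        # Each record gets a monotonically increasing sequence number,
        # so the server can order events that were captured offline.
        self.pending.append({
            "sequence_number": self.next_sequence_number,
            "event_type": event_type,
            "payload": payload,
        })
        self.next_sequence_number += 1

    def flush(self, online: bool) -> list:
        """Return the batch to upload; nothing leaves while offline."""
        if not online:
            return []
        batch, self.pending = self.pending, []
        return batch
```

The point is that toggling airplane mode only delays `flush`; the records are still there the next time the device connects.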


> Keeping it in offline mode doesn't help.

Permanently keeping it offline and only transferring via USB does.


>One can remove DRM for amazon's ebook format (.azw3 ?) via some python scripts. You didn't hear it from me though.

Not for the new KFX format. The only way to get around that is to use an older version of the Kindle desktop app that downloads the AZW format. The workaround won't last long, though, and it won't work on newer Macs because the old version is a 32-bit app.


Apparently you can do the conversion with Calibre.[1]

1: https://epubor.com/how-to-convert-kindle-kfx-to-epubpdfmobi-...


Last I checked (a year ago?) KFX wasn't a great input format, as it's optimized for the Kindle readers and not for conversion/interoperability. That is, KFX is to AZW3 as PDF is to HTML.


Sure, but if the book you're looking for is only available on Kindle and your eReader is not a Kindle, then the conversion is better than nothing.

I've found some O'Reilly ebooks only available on amazon in the format "Kindle Edition" (ie. KFX). Pretty aggressive market strategy from amazon given EPUB3 is the technical standard, but there you have it.


On my Amazon account, I can download my purchased books as AZW3 from the following page: https://www.amazon.com/hz/mycd/myx ('Manage Your Content and Devices'). (As I understand it, AZW3 is mostly the same thing as EPUB3.)

(Either that, or the files I download from there aren't actually AZW3 files but just KFX files with an .azw3 extension.)


> One can remove DRM for amazon's ebook format (.azw3 ?) via some python scripts.

The fonts can be a pain to descramble though.


This is only one reason why I absolutely love my Kobo Aura HD, it's never been connected to WiFi. Its storage device is a standard SD card which can be swapped for a larger one. Oh, and it's not giving money to Amazon which is always a big win for me. It also happens to be a super nice piece of kit, and it has my warmest recommendations.


That's a sensible approach, but sadly Kobo probably does something similar for those who are less savvy than you:

> We collect Personal Information when you use or otherwise interact with the Kobo Services. For example, we collect information about how you use the Kobo Services, such as pages you view, the rate at which you consume e-content (how often and for how long), genres, authors or subject matter you prefer and searches you make or share, the ebooks or audiobooks you have liked, comments you have left and also websites you have viewed through links in the comments. [1]

It's depressing that the market will not stomach the true cost of "dumb" hardware anymore, so it's becoming harder and harder to find. Everything that can be subsidised with hoovering up data, or pushing content, is. If this is the thin end of the wedge, I dread to think where we're heading.

I have a 2010 Kindle Keyboard and naively thought that we wouldn't end up here. The closer we get, the less likely I am to "upgrade".

[1] https://authorize.kobo.com/terms/privacypolicy


Oh, I've actually never read that, I wrongly assumed that != Amazon == good guys.


My kindle is in airplane mode since I opened its box and I send books to it via usb. No one is forcing you to use amazon services, I didn't even pay for the ad free version but I've never seen an ad.


I've actually found it quite challenging to purchase books to put on my Kindle that aren't from Amazon, since they use a proprietary format.


I would say exactly the opposite. I regret buying a book from Amazon [0] in Kindle format, because it is DRM-protected and I am forced to use the "Amazon Kindle" application; otherwise I cannot open it. I am usually okay with DRM, but I regret not having bought it elsewhere with less annoying protection.

[0]: https://www.amazon.com/Designing-Data-Intensive-Applications...

Psst, "Designing Data Intensive Applications" was a very good read. Do you know similar books that focus on distributed systems?


I had completely forgotten that .mobi is proprietary!

In principle, you're absolutely right. In practice, .mobi is easy to generate, modify, fold, spindle, and mutilate with free/Free software.

Even Amazon's .azw is just mobi with * replacing $.


> I've actually found it quite challenging to purchase books to put on my Kindle that aren't from Amazon, since they use a proprietary format.

While MOBI began as one company's proprietary solution, the format is well over a decade old now and quite well understood by the Free Software community. Calibre can convert EPUB (or anything else, really) to MOBI, so you can buy or pirate your ebooks from anywhere and easily put them on a Kindle.


You need to get Calibre!


I think you mean "Predictably" rather than "Surprisingly".


“Obviously”.

What isn’t collecting “too much” data at this point?


I'm sure someone like me always has the same "hot take" in every thread regarding this, but I honestly still love reading physical books. After spending a day weary of interacting with screens all day, there is something nice about tapping in to this activity that humans have done for hundreds of years. Sure, e-ink is easier on the eyes, but isolating myself with a good book can be a near spiritual experience.


I agree.

E-Readers do a hell of a good job at emulating the experience with e-ink displays & you can't compete with the ability to carry 1000's of books in your bag, but there's something about the reading experience that I wish to keep completely 'analogue'!


I read, on average, about two books per week on Kindle.

I buy, on average, about one book per month on paper.

There's nothing quite like the smell and feel and experience of paper books, and there's nothing quite like the convenience of Kindle.


I love reading physical books too, the user experience of them is so much nicer.

I also like to go back to re-read books. With non-fiction I'll often want to go back to reference or quote something, and with fiction I love reimmersing myself in the worlds the authors create.

I've amassed quite a little library of books that I still enjoy having access to and it's lovely. But it's also /terribly/ inconvenient to move to a new apartment. It's also quite annoying when I'm visiting a place, and I'd love to pull up a favourite story but didn't think to bring it with me.

I've started moving to a hybrid solution - my absolute favourite stories I keep in paper because I enjoy the feel, but for most books having them digitally is much nicer.


I have great spatial memory for things I've read. I was able to pull up a quote from a book that I read the summer of 1992 seven years later because I remembered roughly where in the book and on the page the quote appeared. I could probably go to my library and find it still another 21 years later. I don't get that from e-books.


I do too but I also don't like to lug around the latest 10k page high fantasy epic I'm reading on a plane.

I think there's room for both.

I use my Kindle for reading my pop-fiction and stuff I like to read on the go or in bed.


Yes, and once read, books can stay with you on the shelves you live with in your habitat, reminding you who you are and what you know and believe.


It would be quite interesting to know how this data is actually used on Amazon's servers. It reminds me of the criticisms of government data collection programs, that they just hoover up every bit of data that's available without actually knowing what to do with it. Suppose you train some AI to predict what pages in a book will be most engaging to the reader. Since your interface to the book is still just going to be something where people can turn the pages what are you actually going to do with that information? It's a massive sacrifice of the privacy of the user for small gains at best in getting insight into the user's behavior. I wouldn't be surprised if this information is sitting in a database somewhere at Amazon completely unused.

The philosophy of Amazon appears to be to do as much as possible in the hopes that one day it will be useful. This is at odds with the principle of philosophical skepticism, that because we can't be sure of the consequences of our actions we should strive to do as little as possible. The data could be hacked and leak out, for example. There is tremendous uncertainty around things like that.


I formed my opinion before clicking the article, already working out some comments in my mind like "who's surprised?" After reading the article though, surprisingly my opinion changed. This doesn't seem all that bad. I don't doubt that Amazon is over-collecting, but the samples he posted seem like it's just information for syncing reading position and settings. Of all the nefarious things Amazon does with data, I don't think that's one of them.


I did some research on early Android sending a bunch of data back to Google's servers, a few months later the information was encoded/encrypted before being sent over the wire. I'd be curious if the next app version of Kindle started obfuscating what it was sending back home.


Why would you leave on wifi on an e-ink kindle, when not actively downloading a book? The battery lasts 3-4x as long with it disabled (on my 3rd gen device at least).

I doubt most users need a real-time sync of their book location to the cloud, unless they read on multiple devices.

Also, if you use the kindle to get loaned/library books on this particular model, they aren't removed even if the due-date is exceeded until you reconnect to wifi, which has been handy at times...


> Why would you leave on wifi on an e-ink kindle, when not actively downloading a book? The battery lasts 3-4x as long with it disabled (on my 3rd gen device at least).

I concur with keeping the wifi off while not downloading, because battery life is way better, but it doesn't help against data collection.

> Each request also isn't sent as soon as it's generated. A number of these records are created and stored locally, then uploaded (note the sequence_number field). Even if a person is offline while reading, this data is stored and sent when reconnected.


> Why would you leave on wifi on an e-ink kindle, when not actively downloading a book?

One of the much-advertised features of the Kindle is its ability to highlight a word and look it up against a dictionary, against Wikipedia, or against the web.


You don't need internet connectivity on the Kindle to look up a word in a dictionary. The Kindle supports dictionaries in Mobipocket format, so the dictionary lives right on the device. It is easy to find .mobi dictionaries for major languages freely available from torrent communities.

Using the Kindle's Wikipedia function actually requires going through Amazon's servers and is a privacy violation, so I would not recommend users do that.


Mine does not even have wifi. I prefer it that way.


I’m not surprised, but I suggest the Kobo e-reader to the OP. Can use multiple formats, easy to upload books to it, and some models have expandable memory. You can completely disconnect it from the internet if you want.


There is also alternative open source firmware available for Kobo devices: https://github.com/koreader/koreader


I tried to read the first link in the article, the link in the sentence

"There have been cases of Amazon removing specific books from customer accounts (and kindles)."

It redirected me from:

https://io9.gizmodo.com/amazon-secretly-removes-1984-from-th...

to

https://www.gizmodo.com.au/amazon-secretly-removes-1984-from...

So it seems I am not allowed to read up about this reference.

Or some underpaid developer messed up the redirects.

Either way this issue about data collection is interesting in its own right, but this other issue of global redirects also feels important, but I only say that as someone who tried to follow the news here.


Fascinating investigation and good article.

But this doesn't actually surprise anybody, right?


> Unfortunately, in order to use a non-Kindle application, I have to buy DRM-Free books

No. All you have to do is own an old Kindle (buy one on ebay if necessary). Then you can download DRM protected Kindle files from Amazon for this old device, and Calibre and the appropriate plugin can un-DRM them, and transform them in any other format (epub, mobi, text, rtf...) for you to use on your app of choice.

It's certainly better to buy DRM-free books directly if you can find them, but the above solution works quite well.


I use my Kindle Paperwhite completely offline. I factory reset it and haven't connected it to WiFi since, and I just sideload what I need (I did have to strip the DRM from my Kindle books to sideload them on the unregistered device). I never really used the online features when it was registered previously and kept it in airplane mode to help with battery life. Another bonus is that if a freshly reset Kindle never connects to the internet, you never get the ads.


How are the alternatives? I will miss my collection of books, but I'm going to be in the market for a different ebook reader next.


I wrote https://remy.grunblatt.org/blog/kobo-aura-h2o-hacking.html a while ago. At some point it sent ISBNs to Google.

The domains I extracted for my Kobo Aura:

  api.ipinfodb.com
  api.kobobooks.com
  auth.kobobooks.com
  authorize.kobo.com
  kbdownload1-a.akamaihd.net
  kbimages1-a.akamaihd.net
  mobile.kobobooks.com
  pool.ntp.org
  script.hotjar.com
  social.kobobooks.com
  ssl.google-analytics.com
  static.hotjar.com
  stats.g.doubleclick.net
  storeapi.kobo.com
  vars.hotjar.com
  www.google-analytics.com
  www.google.com
  www.google.fr
  www.googletagmanager.com
  www.msftncsi.com


Nice, thanks for that writeup


Kobos have comparable (even superior, IMO) hardware to the Kindle line. The thing that everyone who migrates from Kindle to Kobo seems to get hung up on is that it does not have an option to wirelessly sync books that have been sideloaded across devices. This is because Kobo does not give everyone a private cloud like Amazon does (I imagine it would be prohibitively expensive to do so for anyone but Amazon).

It's not a big deal for me, but apparently it's a dealbreaker for some Kindle refugees that they can't start reading a sideloaded book on their phone and pick up where they left off when they open their Kobo.


I have a $350 Kobo Forma and the UI is so slow compared to my $200 Kindle. It takes a long time to startup and it has horrible & slow touch detection which makes it really hard to highlight quotes properly.

Maybe other Kobo variants do better however.


I don't see why that should be expensive/difficult. Ebooks are mostly small files. It would be hard to rack up a gigabyte unless you end up with image-laden items such as PDFs.

Syncing can be an issue. I had one of the early Kindles, and it was fine until I hit a few hundred items. It would re-index and be completely unresponsive for 10 minutes at a go. That could have been done cloud side. In the end I decided I needed to purge loads of documents/titles to make it useful again, but I accidentally sat on it. So game over. Moved to a simple Nook and SD card loads.


I just switched from my Kindle Paperwhite to the Kobo Libra H2O and I really like it.

It's easier to hold with dedicated page turn buttons, good lighting, and fast screen response time. Also water resistant and good battery life.

So far I've been able to get all the books I've wanted, mostly from the Kobo store, but it can work with any open format.


I've switched from Paperwhite to Kobo (Aura I think?) and the highlighting feature is really making me miss my Paperwhite.

1. I can't highlight text across pages.

2. There's also an issue in which I navigate to some highlight and the text gets shown in a dark grey against black background, making it nearly impossible to read.

3. Since I can't highlight text properly (thanks to issue 1), I can't simply extract my highlights from a book, so I have to manually type it on a laptop, which is a painful experience thanks to issue #2.


What drove you to abandon your Kindle Paperwhite, MattPalmer1086?


The battery stopped holding its charge, so I looked around. I particularly wanted open formats, and I've been getting away from Amazon in general.


Though I haven't analyzed other devices (because I don't own them), they could easily have similar issues. I personally really want an open e-ink device, but I haven't seen one for sale unfortunately. For now, I use the Calibre OPDS server with the Marvin app on a phone, but it doesn't really compare.


I have an Onyx Nova 2 and I like it quite a lot. It runs android and has access to the android ecosystem, so I can read my webnovels and mangas and even kindle books without needing to use any external applications like Calibre.


I read this comment on my Nova 2. It's a very nice capable device for tasks like web browsing, email, and note taking (either with the pen or Bluetooth keyboard).


I don’t know the alternatives, but do know you have software like Calibre in order to keep your book collection despite changing your device.


Got my mother-in-law a Kobo Forma. Relatively pricey but I was able to walk her through how to check out a book from her local library via Cloud Library & transfer it to her device. Was a life-saver while the physical library was closed due to Covid-19. I was a little concerned as there were complaints about fabrication but her experience has been very positive.


My favourite ebook reader is:

Aluratech black and white https://m.youtube.com/watch?v=e2WoVRsap9Q

No DRM, supported all formats, held a charge for a week. No internet. Fits in a jeans pocket.

It came out in 2009... I wish they still made them.


Feels somewhat abandoned at times, but Apple Books is okay.


Would love to replace my Kindle with another device. Any recommendations? Also, I appreciate a local file on the Kindle that logs all my highlights (this file is called `My Clippings.txt`). I parse that file and have a wonderful summary of the books I read. Does any other ebook reader create a file like that?
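
In case it helps anyone, a minimal sketch of that kind of parsing follows. It assumes the commonly seen layout of the clippings file: a title line, a metadata line starting with "- Your Highlight ...", a blank line, the highlighted text, and a line of ten equals signs between entries. The function name `parse_clippings` is just for illustration, and the layout may differ between firmware versions.

```python
from collections import defaultdict

# Separator line assumed to sit between entries in the clippings file.
SEPARATOR = "=========="


def parse_clippings(text: str) -> dict[str, list[str]]:
    """Group highlight texts by the book title line they appear under."""
    highlights = defaultdict(list)
    for entry in text.split(SEPARATOR):
        lines = [line.strip() for line in entry.strip().splitlines()]
        lines = [line for line in lines if line]
        if len(lines) < 3:
            continue  # bookmarks or malformed entries carry no text
        title, _meta, *body = lines
        # The file may start with a byte-order mark; drop it from the title.
        highlights[title.lstrip("\ufeff")].append(" ".join(body))
    return dict(highlights)
```

Feeding it the file contents, e.g. `parse_clippings(open(path, encoding="utf-8").read())`, gives a dictionary that is easy to turn into a per-book summary.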


It will make people uncomfortable, but this is standard practice in terms of event collection for analytics. Many articles here write about discovery from the side of a particular app or site.

If people reviewed some analytics solutions (many trials are available), then they'd see how pervasive this is and what product vendors are encouraging. The likes of Amazon have much more scrutiny around the use of data collected than smaller organizations do. Obviously, they wield great market power so the concerns are broader, but an attacker has a much better chance of raiding smaller developers for volumes of data with much the same fidelity.


Some users also buy a Kindle which is subsidized by ads? I pay to avoid this and change the privacy settings.

If you are using a device designed to market to you - they almost all run ads and collect analytics. I guess this is technically not a user-facing feature, but it provides some user benefit (cheaper price).

Does anyone know the sales breakdowns? If everyone is concerned about privacy / not being marketed to, I guess the versions with ads are not selling. But what surprises me is not that marketing platforms collect data (the author's website did) but that most users don't care about this "abuse" that the author is so concerned about.


The early Kindles didn't do this. It used to annoy me to no end that I'd have to manually tell my Kindle to sync when I was done reading.

Originally, I didn't realize this. I learned this when I'd pull out my phone in a waiting room, or on a train, only to not be anywhere near where I last read the book on my physical Kindle.

Now, I'm quite happy that Kindle syncs aggressively. I use an old phone to read in my hot tub, and it's great that the book opens up to the last place I read it, no matter which phone I'm using.


I can't find a reference to it now, but I recently read something referencing the massive quantities of kindle data amazon give you when making a GDPR data subject access request. I think it was something like 100k rows of data for one user.

Perhaps I should do that myself.

Edit: You can request your kindle data here (UK version): https://www.amazon.co.uk/gp/privacycentral/dsar/preview.html


I still can't find the original post I read, but the guardian wrote about this recently [1]

[1]: https://www.theguardian.com/technology/2020/feb/03/amazon-ki...


Kindle is a great tech (e-ink) with a terribly expensive ecosystem (amazon store) for books.

I load all the books I get directly from my computer (Mostly from project Gutenberg).

Turning airplane mode on permanently now.


Are these requests sent to a separate domain? I may have missed it in the article but it’d be great to know whether we could null route these without disrupting functionality.


I reckon it's time to stop working around all data collection bullshit. No more technical solutions to political problems.

Applying technical workarounds is still supporting a company, and is giving them a thumbs-up to keep at it.


> I reckon it's time to stop working around all data collection bullshit. No more technical solutions to political problems.

I agree, generally. However, if you already have the hardware then it's wasteful to not make use of it.


That's a great idea! It looks like the 'bad' stuff goes to unagi-na.amazon.com


Exactly, I also added:

mobile-app-expan.amazon.com

cde-ta-g7g.amazon.com
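
A rough sketch of turning the three hostnames mentioned above into hosts-file style entries (a format DNS blockers such as Pi-hole can generally consume). The helper name `to_hosts_lines` and the `0.0.0.0` sink address are illustrative choices, and whether blocking these hosts breaks sync or store features is untested here.

```python
# Hostnames named by commenters in this thread; whether blocking them
# breaks sync or store features is untested here.
REPORTED_HOSTS = [
    "unagi-na.amazon.com",
    "mobile-app-expan.amazon.com",
    "cde-ta-g7g.amazon.com",
]


def to_hosts_lines(hostnames, sink="0.0.0.0"):
    """Return hosts-file style lines that point each hostname at `sink`."""
    seen = set()
    lines = []
    for name in hostnames:
        name = name.strip().lower()
        if name and name not in seen:  # skip blanks and duplicates
            seen.add(name)
            lines.append(f"{sink} {name}")
    return lines


if __name__ == "__main__":
    print("\n".join(to_hosts_lines(REPORTED_HOSTS)))
```

Keeping the list in one place makes it easy to remove a host again if blocking it turns out to disrupt something you actually use.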


Awesome, thanks for the quick reply. I'll add this to my pihole config.


Probably worth adding to the PiHole


My thoughts exactly!


I’ve found similar concerns in an official church scripture app which I will not name.

It was sending an enormous amount of data back to the church including what the user was reading and for how long, everything the user highlighted or bookmarked etc.

It was enough to really question the need for such data.

I really believe that if that data served a legitimate purpose to the functionality of the app (which I’m sure a lot of it did) then the data should have been saved locally on the users device.


As much dang money as Amazon makes off Kindle, why are they also spying? I guess "because they can" will always be a useful refrain, but I really wish there was a plain English version of what information they collect at any given company/web app/mobile app/OS, kind of like the attorney general's warning. Not something that is 20 miles long with legalese that no non-attorney can decipher.


After I put Pihole on the network, wife's kindle was almost immediately the biggest offender.

That said, the article appears to list activity type ( which is problematic in itself -- time stamp + person is reading now ). I can see a legitimate use for it, but I also hate the idea of being profiled in that way.

To be perfectly honest, Kindle does not seem to pull more than the average Android phone (though that is problematic in itself).


I have a 2015 Kindle Paperwhite. I put it in flight mode the day it arrived and it never went online again. Yes, loading new books takes slightly more effort (I use USB transfer with Calibre) but the peace of mind I get is more than worth it. Unlike in OP's case, Amazon can neither track my reading habits (beyond my ebook purchases) nor delete anything from my Kindle.


That doesn't seem like a large amount of data.

The character analytics stuff is probably contractual obligations they have to publishers. The publishers probably want to double check the way people read as well and ensure that they are paid out correctly.

The other logging, as someone else mentioned is probably analytics for their own product development.


I was always curious why Amazon's Dynamo was co-developed for Kindle. Kindle didn't seem like the sort of product that required its own scale-free key-value store. An object store, certainly (for the books themselves); and maybe a relatively-mundane sharded key-value store, for read positions.

But this kind of explains it, to me.


Amazon loses when users take the discounted kindle, never enable wifi and source books from libgen. These users would be addressing their privacy concerns and saving money. Perhaps it isn't the largest market, but Amazon isn't exactly incentivizing participation with these privacy policies.


This is why I am skeptical of Kindle. It's Orwellian to know all the details of a person's reading habits, and all the minutiae of a reading session.

This is why I download e-books from the dark web and read them on an airgapped machine, free from The-eye-of-Amazon


> The local IP is the only item on here that bothers me, though I couldn't find any other local network information that would be problematic.

It seems that the author is not really that surprised with the amount of data being collected.


So one way to avoid all data gathering might be to keep your Kindle on airplane mode permanently and load/remove books via USB. Battery would last longer too. It also kills ads on the cheaper version of the Kindle.


What is the surprise? Who doesn't collect data? As long as that data is anonymized and used for improving their product(s), I am fine. It will be scary if the data is used for selling ads/data itself.


OT: What is the app used in the screenshot to capture the HTTP requests?



I had a funny situation with my Kindle. It was connecting to the internet all the time, so I enabled airplane mode, and then it started complaining about that all the time.

Out of spite I added a password to my wifi (I didn't have one, and I had even named my hotspot something like "free" for my neighbors to use; I wouldn't do that now).

To my surprise, some 8 months later I discovered my Kindle happily connected to my wifi. I'm pretty sure I never entered the password there, because the Kindle was the reason I added a password to begin with. Maybe there is some saner explanation than "Kindle bruteforced my wifi", like a bug or some nuance in the authorization protocol?

edit: it happened 7 years ago with a 2013 Kindle Paperwhite.


Given how data driven Amazon is this is not really a surprise, is it?


Our local library does drive up pick up. Obviously not as instant as a download... but man it is nice to leave the house for a few minutes. Kills two birds with one stone.


Cool investigation. Thanks for sharing. Have you analyzed what data Marvin collects in each session? Before switching I'd want to see a comparison.


Just wait until they learn about the "behavioral reading" data collected by, oh I don't know, virtually every media site on the Internet.


The biggest difference in my mind is that the Kindle is hardware you purchase.

It has no need to be sending that much data, including attempting to find out the local IP.

The article stated that a few seconds of usage sent 100 requests to Amazon servers. I'm fairly certain that most websites don't make quite as many requests as the tablet did.


Well, I tried browsing without NoScript for a little while.

I stand corrected. New Reddit made 150 requests in about 30 seconds, not counting images/media/html.

That being said, It's easy to block many of these with NoScript/uBlock Origin.


Who cares if your local IP is sent somewhere?!??


Okay, let's rephrase it as:

"Large corporation collects massive amounts of data, including data that could only be useful if trying to do something malicious on someone else's network."


Most of the time my Kindle is on airplane mode - does anybody know if my Kindle will still send this data later all at once, when the wifi is on?


Surprisingly?

Legitimate or not, it seems obvious that Amazon would be heavily monitoring device use, especially with the ad-supplemented devices.


Can anyone provide a viable open source or non-privacy invasive alternative that isn’t something I need to assemble myself?


This is covered in the terms of service (you read those before using the device, right?):

https://www.amazon.com/gp/help/customer/display.html?nodeId=...

That doc also includes instructions for how to opt-out of this collection:

> you may opt out of processing of your personal data relating to the use of your Kindle e-reader collected by the operating system of that device ("device usage data") for marketing and product improvement purposes via All Settings > Device Options > Advanced Options > Privacy. If you turn this setting off, we will stop processing this device usage data for the purposes of serving you customized marketing offers and improving our products and features. Turning this setting off will not affect... your ability to use features of the device, such as data syncing or backup features or Special Offers we display if you purchased a device that includes Special Offers, as we will continue to collect and process your data to deliver those features to you

I'm interested to see whether this sort of biometric/behavioural data will ever be thought of as Personal Data under GDPR (since I bet you can identify someone from their browsing behaviour, just like you can using walking gait and typing cadence). If that was the case you'd need to present an opt-in when you first booted the device, which I think would resolve the complaints from most folks in this thread.


> The local IP is the only item on here that bothers me

What! Why? What about all the other data?


What irks me about it is that Amazon doesn't give me access to that data.


I just turn off the Kindle's wifi unless I actually need it.


Just use airplane mode? Will also increase the battery life.


From the article:

> Each request also isn't sent as soon as it's generated. A number of these records are created and stored locally, then uploaded (note the sequence_number field). Even if a person is offline while reading, this data is stored and sent when reconnected.

That being said, if you leave airplane mode on permanently and sideload books, you should be fine.
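
To make the store-and-forward behaviour quoted above concrete, here is a toy sketch of a client that numbers events locally and uploads them once it is back online. It is purely illustrative: the `EventBuffer` class, its methods, and the event names are hypothetical, and nothing here is taken from Amazon's actual implementation. Only the `sequence_number` field name comes from the article.

```python
class EventBuffer:
    """Hypothetical client-side buffer: record locally, upload when online."""

    def __init__(self, upload):
        self._upload = upload  # callable that receives a list of event dicts
        self._pending = []
        self._next_seq = 0

    def record(self, name, payload):
        """Store an event locally, tagged with a monotonically increasing number."""
        event = {"sequence_number": self._next_seq, "name": name, "payload": payload}
        self._next_seq += 1
        self._pending.append(event)

    def flush(self, online):
        """Upload everything pending if online; return how many events were sent."""
        if not online or not self._pending:
            return 0
        batch = list(self._pending)
        self._upload(batch)
        self._pending.clear()  # only reached if the upload did not raise
        return len(batch)


if __name__ == "__main__":
    sent = []
    buf = EventBuffer(sent.extend)
    buf.record("page_turn", {"position": 10})  # recorded while offline
    buf.record("page_turn", {"position": 11})
    print(buf.flush(online=False))  # nothing leaves the device yet
    print(buf.flush(online=True))   # backlog goes out on reconnect
```

The point of the sketch is that turning wifi off only delays such uploads; the backlog only stays on the device if it never reconnects.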


How is anyone still using a Kindle after the 1984 scandal?


As a shareholder, this is disappointing to read.


Maybe I am getting less fervent about privacy and data security but I don't see these metrics as PII.

This is a complete whataboutism but you gave Amazon a lot more information when you purchased the kindle from them.

I think the answer is Amazon should add an option to turn this off.


> you gave Amazon a lot more information when you purchased the kindle from them.

Kindles are sold in physical locations – at least in the EU, many Kindle owners got their device from a local electronics shop. You don't necessarily have to order them from Amazon. Then, when you unbox it, there is no requirement to register with Amazon or even connect to the internet at all.


definitely going to be selling my kindle now...


an interesting article

thanks for sharing this

thankfully Kindle is not selling very much (relatively) so it is not a big issue if they collect a lot of data


How was this achieved? I didn’t realize a Kindle supported HTTP proxies or installing root certs?



