This statement - "None of these requests appear to be used for customer features like last read location." - bugs me, because it's fairly obviously false, and detracts from the real concerns.
To sync a "last read page" across devices, you need to send a location back to Amazon. It's also appropriate to tie a location to a device, so you can pick the appropriate device to sync your position from. And, when you highlight a word, the translation, definition, and wiki page are brought up, so of course it's being sent to Bing and Wikipedia.
There are valid concerns here (there's too much information being sent overall - the location data doesn't need to be sent with every page turn, for example), but these concerns are being buried behind FUD about none of this data needing to be transmitted.
EDIT: Can I also point out the ironic nature of griping about Amazon's analytics collection while running an analytics suite on the webpage yourself?
I mention that the data that appears to be used for those purposes is sent again in a separate request to a separate end point, so we have two types of requests: last read location, and reading analytics. Sorry it wasn't clear, I'll try to improve the wording.
Will you also be updating and noting that the requests to Wikipedia and Bing are for explicit customer-benefiting features?
Might be worth noting that you can opt out of their data collection (on the e-reader, at a minimum) as well. Settings > Device Options > Advanced Options > Privacy or in the device management console in your account on amazon.com
> Is there a reason why that text needs to be sent before the user clicks the "translate" button?
Yes - UX latency. I would expect this kind of thing to take a few thousand milliseconds, and shaving off a few hundred milliseconds from between when the user highlights text and when they select "translate" is significant. The fact that this data is being sent to Wikipedia of all places further signals that the usage is likely to be innocuous.
Do I think that this is globally a good design decision? No, for both engineering and privacy reasons. There's definitely no good reason why it should be sent to Amazon at all.
> There's definitely no good reason why it should be sent to Amazon at all.
I was wracking my brain on this, and all I could come up with was "to independently verify the invoicing for Bing translations" and "how many times are people accessing the definition/translation and not highlighting". So, analytics, not something that explicitly benefits the user.
Can we stop pretending that analytics don't explicitly benefit the user? Product Engineering organizations rely on analytics to improve user experiences.
Analytics can be done less granularly and still benefit the user. Also, surely not every data point collected is used to benefit the user.
For example, Amazon doesn't need to know where I am when I request a definition or translation. If they're concerned about usage, they only need to know how many times I actually used one or both of those features per day, per week, or month. They don't need to know instantly every single time a word is highlighted.
> Analytics can be done less granularly and still benefit the user. Also, surely not every data point collected is used to benefit the user.
How? For all we know, it isn't granular - it might be aggregated at the server level to hide specific users' actions. But they'd still need to be sending in the data from the device to the server.
The device could keep a daily count of interesting actions, and sync that to analytics servers on a daily or weekly basis. That preserves 95% of legitimate use cases while leaking much less private data (like how my reading habits are distributed across the day)
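A minimal sketch of that idea, assuming a hypothetical on-device `DailyUsageCounter` (the class name, action names, and payload shape are all made up for illustration, not anything Amazon is known to use). It keeps only per-day totals, so the upload carries no timestamps and no highlighted text:

```python
from collections import Counter
from datetime import date


class DailyUsageCounter:
    """Counts feature uses per day on the device; stores no timestamps or content."""

    def __init__(self):
        self._counts = {}  # maps a date to a Counter of action names

    def record(self, action, day=None):
        """Increment the local count for one action (hypothetical action names)."""
        day = day or date.today()
        self._counts.setdefault(day, Counter())[action] += 1

    def flush(self):
        """Return the aggregated payload for a daily/weekly upload and clear local state."""
        payload = {
            day.isoformat(): dict(counter)
            for day, counter in sorted(self._counts.items())
        }
        self._counts.clear()
        return payload


counter = DailyUsageCounter()
counter.record("define", date(2020, 1, 6))
counter.record("define", date(2020, 1, 6))
counter.record("translate", date(2020, 1, 6))
print(counter.flush())  # {'2020-01-06': {'define': 2, 'translate': 1}}
```

The server still learns how often each feature is used, but not when during the day or on which word.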
I mean, you're still collecting most of the problematic data. And you might legitimately be interested in what you're leaving out - knowing time of day that people do things is actually important for plenty of use cases.
I'm surprised you'd say that. Out of interest, how does analytics help websites avoid blathery, unhelpful text in overly small fonts, set in colors too pale to be readable? A lot of UI failings are of this most basic kind.
When you play an online slot game where you bet money that some numbers will appear on screen, and they use analytics to "improve user experience" (read: engagement, read: you losing more money), is that benefiting you or is it benefiting them?
Kindle devices have a dictionary on device. By looking into which words are most frequently defined, they can add these to the local dictionary to help improve the speed of the UI.
This isn't universally true - Dan Luu's computer latency page[1] lists three Kindles, all below 900 ms of latency. And, since some devices have latency as low as 570 ms, it makes sense that they would use this optimization.
have you actually used a kindle? it certainly doesn't take seconds for the definitions to pop up. a full-page refresh might take a second, but most page turns or UI interactions are partial draws and are much faster.
I suspect that they were overstating a limitation of these devices rather than speaking from inexperience. While it has been years since I've used a Kindle, I do use Kobo devices and the delays are perceptible. While changing a page may be quite quick, user interface elements (such as a box containing a definition) seem to take longer. I suspect that they have to be more aggressive when refreshing the screen before and after these user interface elements are displayed in order to make the ghosting less perceptible.
If you want to see what I mean by the ghosting of user interface elements being more perceptible, try using KOReader. The ghosting after using a menu can be quite noticeable (at least on Kobo devices, which are based on the same technology).
And the fact that the screens are slow should be motivation to make the rest of the system as responsive as possible. A good software engineer will work around bottlenecks, not shrug their shoulders and introduce new ones.
Also remember that "Kindle" can refer to an app on your phone or desktop computer, all of which may share code related to highlighting and translating.
That doesn’t seem right. Let’s consider the screen refresh to be like a subway station, where the train shows up every few seconds. We need the text we want to show to the user to be at the stop waiting when the train arrives. If we miss the train, we need to wait for the next train to get our text on the screen. The network latency delays when we show up to wait at the station.
If the refresh rate is 5 seconds, and the network response time is 500ms, then eliminating the 500ms response time means we are 10% less likely to miss the train. On average, the time for the text to appear on the screen decreases by 500ms.
All this assumes the refreshes happening on a static schedule. If the software can trigger the refresh, then it’s a lot simpler. The 500ms improvement in latency would apply equally to every engagement with the translate feature.
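To check the subway arithmetic above, here is a small sketch under the same assumptions: a fixed 5-second refresh schedule, a 500ms network delay, and requests arriving at evenly spread moments within one refresh period. The function names are made up for illustration.

```python
import math


def wait_until_displayed(request_time, network_delay, refresh_period):
    """Seconds from the request until the first refresh at or after the text is ready."""
    ready = request_time + network_delay
    next_refresh = math.ceil(ready / refresh_period) * refresh_period
    return next_refresh - request_time


def average_wait(network_delay, refresh_period, steps=10_000):
    """Average the wait over request times spread evenly across one refresh period."""
    total = 0.0
    for i in range(steps):
        request_time = (i + 0.5) * refresh_period / steps
        total += wait_until_displayed(request_time, network_delay, refresh_period)
    return total / steps


period, delay = 5.0, 0.5
print(f"chance of missing the train: {delay / period:.0%}")
print(f"average wait with network delay:    {average_wait(delay, period):.2f} s")
print(f"average wait without network delay: {average_wait(0.0, period):.2f} s")
```

Under these assumptions the average wait works out to about 3.0 s with the delay and 2.5 s without it, so the saving matches the 500ms figure. If the software triggers the refresh itself, the wait is simply the network delay plus the redraw time, and the saving is the same 500ms on every lookup.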
There's no static schedule. It's an e-ink display. Refreshes happen when software tells it to display something new and take several hundred millis per blank - and a screen can be up to three blanks (because if it doesn't go white-black-display, then some pixels get stuck "on" or "off" or "halfway").
In that case, it’s clear that eliminating the network request before triggering the refresh directly reduces the amount of time the user has to wait to see the result.
There isn't a "translate button" - the selection of the word is the button for define/translate/wiki. You swipe between the three cards.
I like this, as a user. I don't want MORE buttons to tap through when I'm trying to define or translate a word. Especially since the Kindle eink screen and UI is not the most responsive.
> Might be worth noting that you can opt out of their data collection (on the e-reader, at a minimum) as well. Settings > Device Options > Advanced Options > Privacy or in the device management console in your account on amazon.com
Good tip, I'm going to give this a whirl. Unfortunately, all the network calls add a significant amount of latency even if one didn't care about privacy.
>(off-topic) What’re the advantages of pihole over /etc/hosts?
It's good for cases exactly like this - devices where you don't have control over /etc/hosts (or where you have lots of them and don't want to keep the hosts files in sync). I use it for my Samsung TV to keep them from phoning home (but still letting me use apps)
Edit: you can also set up a DoH endpoint and filter traffic while also allowing Encrypted SNI to work
> It's good for cases exactly like this - devices where you don't have control over /etc/hosts
Is the pihole a DNS server or a firewall? Sibling comments suggest it's a DNS server, but that doesn't answer this need at all -- if you don't control /etc/hosts, you don't control the device. It can do its resolution however it wants. Most obviously, it can include the domain names you don't want it to reach in its own /etc/hosts file, which you just said you didn't control.
In addition to sibling replies which point out network-wide usefulness... pihole (or any DNS server) can/will return NXDOMAIN, unlike /etc/hosts, which can only return an IP. A DNS server can also be configured to match a domain and any subdomain (wildcard match) without having to specify each entry individually.
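As a rough illustration of the wildcard point, here is a toy resolver (not how Pi-hole is actually implemented; the function names, the `upstream` dictionary, and the `.example` domains are all hypothetical). One blocklist entry covers the domain and every subdomain, and blocked names get NXDOMAIN rather than a placeholder address:

```python
def is_blocked(hostname, blocked_domains):
    """True if hostname equals a blocked domain or is a subdomain of one."""
    labels = hostname.lower().rstrip(".").split(".")
    # Check the name itself and each parent domain in turn, e.g.
    # a.b.example.com, b.example.com, example.com, com.
    return any(".".join(labels[i:]) in blocked_domains for i in range(len(labels)))


def resolve(hostname, blocked_domains, upstream):
    """Return ("NXDOMAIN", None) for blocked or unknown names, else the upstream answer."""
    if is_blocked(hostname, blocked_domains):
        return ("NXDOMAIN", None)
    address = upstream.get(hostname)
    return ("NOERROR", address) if address else ("NXDOMAIN", None)


blocked = {"telemetry.example"}                # one entry, covers all subdomains
upstream = {"books.example": "192.0.2.10"}     # stand-in for a real upstream resolver

print(resolve("metrics.telemetry.example", blocked, upstream))
print(resolve("books.example", blocked, upstream))
```

A hosts file would need a separate line for every subdomain, and each line would have to point at some address.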
They both work similarly if you're using them to block outbound requests, but a Pi-Hole would intercept and block outbound requests for every device on the network where it's installed, whereas editing /etc/hosts would only block requests on a single device (unless that device is your router, I guess?)
I liked the article. If you are going to update it, please consider also covering the technical side. Frankly, Amazon snooping on users is to be expected, but a short mention of which platform's app you analysed, and with which tools, would be a welcome addition.
> Frankly, Amazon snooping on users is to be expected
Snooping on users during e-commerce transactions, sure.
But recording users' detailed interactions with every ebook? I hope that's a big surprise to your average Kindle user.
It would be great to see a data request response and how much of this data is retained and for how long. It's clearly not anonymized at the request level.
Very easy to see a future where just reading certain books or reading certain books too many times could flag you as dangerous or be used to support a mental incompetence hearing resulting in loss of rights.
I believe page location analytics are used for the amount of money that goes to Kindle Unlimited authors, also.
It can't just track the very last page in the book that you read, because authors were gaming that by encouraging people to immediately skip to the last page of very large works they didn't otherwise care about. Instead there's some kind of heuristic that tries to figure out if you've more-or-less-normally read the book.
I think the reason to send a sync on every page turn is that you don’t know whether the device will be in contact when any alternate sync trigger happens, so to keep it mostly up to date, the best option is to sync constantly whenever you have connectivity.
I honestly don't mind the FUD as long as user don't have options. Amazon deserves the bad press in that case. Kindle is an awesome screen reader, but such features make it a bad device. A good device just had an option "sync usage data to Amazon account" <yes/no>. People suggest it is a technical impossibility.
It is just a shame that you have no options. Had to quickly search if my kindle has GPS capabilities. Gladly it does not.
"Kindle Collects a Surprisingly Large Amount of Data" is a completely honest and in my opinion correct statement. So yes, companies are dishonest in their data collection practices and responding with exaggeration is maybe wrong. But I do care more about the data collection issue.
> A good device just had an option "sync usage data to Amazon account"
The Kindle has an option to "sync last page", which you can turn off -- that sounds like it could be exactly what you're asking for, but more experimentation would be needed to know for sure.
I didn't see any mention of this config in the OP, aside from mentioning that the feature exists, so it's unclear whether the data being sent is used just for that feature, or whether less data is sent if the sync feature is turned off.
Note that it doesn't just suck because you're giving up using the Kindle itself. It also sucks because you'll be losing your entire collection of Ebooks, which are DRM-encumbered and can not be ported to other non-Amazon devices/platforms/apps.
This makes it extremely difficult for other privacy-respecting platforms to compete on the market, since using them requires the user to either break the law by stripping DRM from their books, or to abandon their entire purchased library.
Future TOS/EULA/Privacy changes that might not have been in place when a user originally bought their Kindle can thus be forced on them by making it prohibitively expensive for the user to opt out or change ecosystems.
I think there's a bit of a misunderstanding - you can turn off analytics on your e-reader without giving up the kindle platform. It's also separate from whispersync (which can also be disabled independently).
Just for clarification -- is this something that actually turns off the collection itself?
I'm seeing conflicting things online that range from "just hit this toggle and you're good", to "you can disable some of it, but not all", to "this only opts out of data processing for ads/analytics".
If there really is an option to disable the collection entirely, then that would mitigate a large number of the problems I have with that practice. Of course I'd love for it to be opt-in, but just giving the option would still be better than many other devices like Smart TVs.
Kindles have airplane mode and allow you to load books onto them using the USB connection. The battery also lasts somewhat longer if you use them that way. Amazon directly offers a "Download & Transfer via USB" option for ebooks you purchase in their store, as well -- this is a relatively well-supported use case.
It does mean that if you want to be absolutely sure your Kindle isn't phoning home, you can't use the Kindle browser, and you need a laptop or similar to download the things you want to transfer over. It's not a perfect solution for everyone, but for the typical HN reader who is concerned about telemetry, it should work.
I've done this. Mine has been in aeroplane mode since the day I got it. I seem to remember having to allow it to connect to Amazon once when I first took it out of the box, but since then, no network connectivity at all, and zero problems as a result. It's been great.
I download the ebooks themselves using the Kindle application on my computer (if I'm using Amazon to get them, which I don't always), and then use Calibre to manage/import/convert/strip DRM from them. I don't need the sync functionality, or to be able to look things up on the internet (not being able to do that is a feature as far as I'm concerned!). I just want text on a page. I like the "e-reader" experience, and I have no desire to read books on a phone or tablet. I have one Kindle, and it comes with me if I think I'm going to have the opportunity to read when I'm out of the house.
Of course, if you're using Amazon to get your books they'll still build a profile of your reading habits, but there's something about tracking the exact parts of a book I'm reading, the bits I might linger on or reread, which feels extra intrusive to me, and which I categorically don't want.
> Mine has been in aeroplane mode since the day I got it. I seem to remember having to allow it to connect to Amazon once when I first took it out of the box, but since then, no network connectivity at all, and zero problems as a result. It's been great.
I also never connect my Kindle to the internet. (The phone app does connect.) You don't have to allow it to connect to Amazon once. Mine has never connected.
In isolation, "last read page" could surely be E2E encrypted. Amazon would know that I'm using a Kindle app or device, but everything else could be opaque.
There's no motive on Amazon's part to do it this way, it would be a hassle to implement, possibly not great for battery life, and I expect that users don't care much.
Frankly, I don't care much, in practice. In principle, yes; everything which can be kept private, should be. But Amazon knowing what page I'm on just doesn't discomfit me, the way the prospect of some company being able to read my messages does.
The pro-privacy crowd needs to choose its battles.
The most common response about online privacy is "what does it matter if X knows Y? I've got nothing to hide".
People already don't care, and I guarantee they also don't care that Amazon knows what page they are on in the book they are reading. There are much bigger issues to focus on.
Can't you do lots of those things by sending encrypted data to Amazon, and getting back the encrypted data from them? They act as a storage in most cases, not as a server, no?
You'd have to figure out some kind of secure key sharing mechanism between phones, tablets, web browsers, and e-readers.
Or, you can trust that a position in a book (bookmarks, notes, etc.) is not sensitive information that really needs to be encrypted. This is my - perhaps overly pragmatic - position.
Simply purchasing/owning a book on that topic would be enough for an oppressive government like China, they wouldn't need to know where in the book you were exactly.
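On the key-sharing point above, one possible approach (an assumption for illustration, not something Amazon is known to do) is to have every device derive the same key from a passphrase the user types in, combined with a non-secret per-account salt that the server can store. The function name, salt, and iteration count below are hypothetical:

```python
import hashlib


def derive_sync_key(passphrase, account_salt, iterations=600_000):
    """Derive a 32-byte key; every device with the same inputs gets the same key."""
    return hashlib.pbkdf2_hmac(
        "sha256",
        passphrase.encode("utf-8"),
        account_salt,
        iterations,
        dklen=32,
    )


# The salt is not secret; the server could hand it to each new device.
salt = bytes.fromhex("00112233445566778899aabbccddeeff")

key_on_phone = derive_sync_key("correct horse battery staple", salt)
key_on_ereader = derive_sync_key("correct horse battery staple", salt)
print(key_on_phone == key_on_ereader)  # True
```

This only covers agreeing on a key. The actual encryption of positions, bookmarks, and notes would need a vetted authenticated cipher from an established library, and typing a passphrase on an e-ink keyboard is its own UX cost.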
>You'd have to figure out some kind of secure key sharing mechanism between phones, tablets, web browsers, and e-readers.
Yeah, it's not like Amazon can afford security experts to work on this or anything.
>Or, you can trust that a position in a book (bookmarks, notes, etc.) is not sensitive information that really needs to be encrypted.
This is an ignorant position that has been proven wrong over, and over, and over again. Private data should be secure by default, because otherwise eventually someone will figure out how to abuse it. This is a lesson from a bazillion fraud schemes and social engineering hacks that everyone in tech should have learned by now.
Amazon could also afford to fill the Panama canal with dirt and reunite the American continents, but why would they? A dozen angry (potential) customers on HN is hardly motivation.
If all data is secured by default, then the identification of PII is not about deciding to secure that data, it is about identifying where we might impose (and often this isn't required, but now we can consider it) additional UX burden or complexity in order to add _additional_ security.
If I can't think of a way to abuse me by having my data, it doesn't mean that someone else can't. I would really rather avoid all this discussion by them not having my data to begin with.
If you know of an alternative that offers client-side encrypted sync, I'd love to hear it. I'm considering alternatives to the Kindle as well, even if for reasons unrelated to the analytics.
Encrypted between me and Amazon (such that Amazon could see the content), or encrypted between my devices such that Amazon can't see the content (but only the encrypted form)?
>the location data doesn't need to be sent with every page turn, for example
why not? if i open a book on my phone that i stopped reading on my kindle, i want it to open to the last location i read to on my kindle. not ten pages back because it doesn't sync data every page turn for some imaginary privacy benefit.
>To sync a "last read page" across devices, you need to send a location back to Amazon. It's also appropriate to tie a location to a device, so you can pick the appropriate device to sync your position from.
Why is location needed for that? Shouldn't a device id and account work just fine? I don't need to share my location to sync other devices.
"Whispersync is on by default in all new Kindles, but you can turn off the option on individual devices if you have multiple readers attached to your account."
The aggravating bit - beyond the fact that Amazon doesn't let you opt out, is that this sometimes affects performance. Switching over to the kindle app occasionally hangs. Killing the app and restarting it usually works, but there are times when I have to go to airplane mode and kill and restart the app just to open a book!
As a former Kindle developer, I can say that most of what's mentioned in this article are metrics used to understand how the features are used (bookmarks, highlights, dictionary, etc.), how much they are used, and in which country.
This allows the teams to focus on features that are actively used, and sometimes leads to discontinuing features that see little to no use.
Hope that helps.
Agree. Opt out at the minimum. How did software and features ever get done before telemetry?
Efficiency is not always the best humanistic approach. So maybe they support unused features and maybe they let some features wither that lots of people like. Maybe it would make things cost a little more. I think people would be ok with some of those inefficiencies.
That's how every company rationalizes the mass collection of user data. "Oh let's collect many terabytes of every user-action in case we need to one day discontinue a feature".
It's a book. You don't need to collect and track every fucking action I do to find out if your stupid highlighter is being used in Poland.
Whether you like it or not this collection does lead to better products - that is why you think every company does it because those that don’t usually die out. Understanding your users is vitally important.
Privacy LARPers are a tiny segment of the market, the average person doesn’t really care if their ‘usage of the highlighter function is tracked’
>Plenty of companies are quite transparent about their data collection practices (set up an Apple device recently?)
I have not, not recently, but what you say is simply bullshit. They're "transparent" in that they give you a ToS loaded with legalese that they know you couldn't easily read through to find just how much and where they're squeezing your life for information to store. In cases where they simplify this with some less legalistic declarations of data use, what you often see there are numerous weasel words and phrases to very ambiguously describe what's being done. You know, things like "We MAY collect some information for the sake of improving user experience" and blah blah....
Then of course, there's the outright lying, which also happens, in which big tech companies simply fail to mention some types of data collection anywhere (the Amazon Alexa voice recordings being listened to by humans is a good example of this).
You're equating the corporate world's shining example of responsibility with customer data, Apple, with every other company and saying that everyone does it this way?
Most companies hide it in legalese. Some companies claim they're not sending any data and then send it anyway. Looking at you Philips Hue lights.
This I wrote. I didn't write "companies should loudly advertise something people don’t care about" -> you added something to my sentence, taking it out of context.
I wrote my opinion already, but I'll repeat it anyway in case it was not clear. I think you can't know if people care about it or not, as long as they're not informed about it.
What do you believe syncing means? This discussion talks about whispersync reporting last page read and most recent page read events. What do you think that's supposed to do?
You're the only one fabricating accusations about "analysing" in a discussion about how Kindles send data with whispersync, a system widely known to be used to sync data across devices.
More importantly, the only use case mentioned in the discussion that resembles anything like analysis is syncing page reads across devices, and tracking reading progress to compensate authors who make their books available through subscription services.
Either you know stuff about "analysing" that for some reason you're keeping a secret, or you're talking nonsense about stuff you have no grasp over.
> "most of what's mentioned in this article are metrics used to understand how the features are used (bookmarks, highlights, dictionnary, etc.), how much they are used, and in which country."
Besides, I don't appreciate phrases like "fabricating accusations" or "you're talking nonsense about stuff you have no grasp over". I may be wrong, it happens often, but even if I am, this aggressive tone is out of place. You can point out my mistakes politely if they exist, the same way I do with yours.
This is an unnecessarily denigrating term at this point in the conversation. It's not LARPing to want to be able to read a book or take notes without being tracked.
> It's not LARPing to want to be able to read a book or take notes without being tracked.
Absolutely agree but it is LARPing to pretend this collection is for anything but improving a product. Nobody is out to get you and nobody particularly cares how often you specifically turn the page (the data is useful in aggregate).
> We also use it to develop and improve products and features for all our customers and to gain insights into how our products are being used, assess customer engagement, identify potential quality issues, analyze our business, and customize marketing offers.
Targeted marketing is, in itself, something that's reasonable for someone to want to block regardless of whether or not there's a mustached villain tracking you. Privacy is about more than stalkers, it's about the effects of data usage. For some people, targeted advertising is a harm regardless of whether or not the company knows their name.
To go a step farther, I also don't understand why it's LARPing to be worried about a company who is actively being investigated for misusing seller data.
I bring this up every time that one of these threads/stories gets posted, but there's (apologies, but for lack of a better word) some kind of weird gaslighting that always happens in these situations. Before it broke that Echo and Siri queries were sometimes listened to by 3rd-party contractors, if I had posted that suspicion on HN people would have called me paranoid. Once the story broke, the argument then shifted to, "well of course they're doing that, how else would you improve the service?" That kind of thinking applies to Amazon as well.
I don't know that it's likely, but I don't think it's outside the realm of possibility that Amazon might use this information in the future to help target pirates, change book rankings on their store, perform highly targeted advertising and book recommendations, or turn it over during government subpoenas. Those are completely reasonable usages that their privacy policy leaves them permission to do.
Similarly, I don't know that it's likely, but it's not outside the realm of possibility that this information might get sent to 3rd parties with less responsible data practices, or that employees might be given direct access to it in an unobfuscated form[1]. It's not something I'm losing sleep over, but I wouldn't be shocked to my core if someday all this information got leaked publicly and correlated to people's email addresses.
These are all situations where privacy matters regardless of the original intention. The "I only want to make my service better" defense applies to basically all data collection that most companies do. Even advertisers use that defense. It's reasonable for people to want to avoid being a part of that.
Of course, it's also reasonable for people not to care, to say that hacking is a risk they're willing to live with, and that they don't mind targeted ads, and that the books they read aren't sensitive. But it's not LARPing if someone has a different opinion on whether or not they want to tolerate that stuff.
> Whether you like it or not this collection does lead to better products
Maybe it's just me but every tech product I use these days gets worse over time. If something does get better, two things get worse. They mostly try to optimize for user engagement and not user experience.
> Understanding your users is vitally important.
And the only way to understand people is spying on them?
This is not true. What if for example you want to make a change to the dictionary feature because you imagine that it’s not useful and should be less prominently accessible. How would you measure if this is a good idea or not without tracking its use? This has nothing to do with business and everything to do with making the product better.
Sure, there’s an example where best case the user experience is improved and business metrics aren’t affected. But I assure you if that app has a decent analytics setup they’ll also be tracking business metrics, and if for some reason business metrics went down with that change past some acceptable threshold, that change won’t be launched.
Now if you look at opposite case, where a feature is worse for user experience but helps business metrics, that feature will definitely be launched. A small, mostly harmless example: Ever tried to hide twitter’s recommended accounts? It gives you the option to “see less often”, but curiously there’s no option to stop seeing the window forever. Why? Because clearly it benefits twitter’s business on average to keep showing these recommendations.
I’ve built enough dark patterns at my last job to know it always comes down to business metrics.
Exactly. At the end of the day it's about profit and not necessarily a better product. Sometimes more profit means making a better product for the end user.
'Privacy LARPers are a tiny segment of the market, the average person doesn’t really care if their ‘usage of the highlighter function is tracked’'
Which is exactly why we have regulation that forbids these practices, to protect the gullible from themselves. Furthermore, do you think privacy should be the privilege of just those that are smart and keen enough to be aware and prepared to engage in a relentless and perpetual battle with the most dark of patterns with every click they make?
Most of the world-famous libre software is built without its developers studying massively collected usage data ("telemetry").
I look at VLC as a great example to follow. Their stats show 3.4 billion downloads (https://www.videolan.org/vlc/stats/downloads.html), yet they do no telemetry at all. The product works great. It could be improved of course, but Outlook could also greatly be improved, and they have high-salary staff and a boatload of data they extract from users. Yet it's slow as hell and has lots of UX I disagree with.
I'm myself the author of a replacement of Windows "alt-tab" on macOS (https://alt-tab-macos.netlify.app/) which doesn't do any telemetry. I can lead the roadmap, with the help of the community, without spying on how users set their preferences and use the app.
As a matter of fact, it can be argued that acting that way can have negative value, as it reinforces popular usage; or, from the power users' perspective, dumbs down the software. By definition, advanced features will have low usage. It doesn't mean they should be removed.
Lastly, think about non-software businesses. Many amazing products have simply no way to gather data when the products are in the users' homes. They rely on gathering data by talking to customers at the points of purchase, through customer care, and in various forums with enthusiast users. This model has shown great results, so it is in no way clearly to be avoided in favor of telemetry-everything.
TBH the argument that it reinforces popular usage is a valid one. At MS we were taught again and again how to design good experiments using telemetry, but in the end it's hard to support changes when your data shows that something is working properly, and UI changes tend to produce a dip in usage or satisfaction graphs until they catch up.
It doesn't really matter does it? You don't collect data without consent, period.
Why is that so hard to understand?
Why don't developers ever push back against this sort of thing? Collectively we build this stuff, we are not 'soldiers following orders' which makes us responsible for what we create.
The current actual use is not relevant. Consent and the possible uses are relevant.
strawman; you visit someone else’s server, and therefore they get data about your visit; with kindle, you’re using your own device and there’s no expectation that amazon will be snooping
"you visit someone else’s server, and therefore they get data about your visit"
I don't think the average person knows this. A lot of people have no clue about how the internet works, so there is no consent most of the time. And we, the developers, just leave the logs running.
"with kindle, you’re using your own device and there’s no expectation that amazon will be snooping"
Well, I would absolutely have this expectation. I expect a device that is connected to the internet to snoop on me. Then there is the Amazon brand: I absolutely don't trust them, so I expect them to snoop on me.
But to be clear: I absolutely hate that my privacy is gone. I use all kinds of blockers to disable tracking, and I also agree with jacquesm that snooping is wrong. But I still think his point is too black and white and therefore unfair.
>Every webserver logs the IP address and the URL visited.
I maintain a webserver - https://git.sr.ht/~ancarda/tls-redirector - that has no support for logging. If you wanted logs for some reason, you'd need to modify the source code to add that functionality.
Granted, tls-redirector isn't a general purpose webserver, but even in production I tend to turn off logging. I just don't see the need to have logs lying around that I never use.
No, not every webserver does. This is something that you could easily configure.
Yes, most people know this by now.
Yes, some developers push against this.
Also: It's the law. Collecting data without consent is not always legal. Whether that particular bit of data rises to the level of requiring consent is left as an exercise for the reader for their particular jurisdiction and industry.
GDPR actually forces all websites to carefully keep track of what gets logged and for how long these logfiles are retained. So yes, legislators are pushing back against the common practice of logging everything just because.
I think the privacy-concerned end-user thinks, "Yes, I completely understand why this information is being tracked and how it would be useful to Amazon. But I still don't like it."
As a freedom-concerned citizen, I always completely understood the policies and methodology of dictators and tyrants, and how what they do is useful for them.
I was under the impression there was a revenue-allocation problem that Amazon needed to solve (Kindle Unlimited subscriptions?), that depended on reliable reading statistics. E.g. How many people read book A?
Wish I could find the article, but the implication was there were a ton of publishers attempting to game the system. For example, by publishing blank, very long "books" and having them "read" by software automation.
First, if an entity wants my input and is going to use it, it should be decent enough to pay me for giving it. Why do users need to work for free for Amazon?
Second, is it opt-in? If not, then there's an ethical issue here, even if a manual opt-out option is given (is one?). If there's no opt-out, there's a double ethical issue.
Thirdly, is this data deleted once it has been used for the goals you mentioned, or is it kept, making it a risk both for leaking and for Amazon deciding to put it to a different use in the future?
Are you genuinely surprised at this point? Pretty much all big tech companies were caught outright lying about user data collection. Why would you assume by default they don't try to get as much as possible? They are all based on ML, of course they do.
A year or two ago Amazon was swearing that humans don't listen to Alexa conversations until we learned they actually do. IIRC Amazon tried to backpedal: "of course they do, it is their job, we meant humans don't listen _for fun_".
At this point, just treat internet connectivity itself as the warning.
Terms of service are written to be understandable by lawyers, not average end-users. At this point, understanding every terms of service, privacy policy, etc. presented by every piece of software, website, etc. encountered by an average user would require them to spend hours per week on it. This is assuming that they even have the language skills necessary to decipher the document (think of non-native English speakers, people without higher education, and so on.)
Creative Commons was on the right track with their human-readable licenses, see e.g. this example [1]. Apple is on the right track with their App Store "nutrition labels" [2]. This is what we need for people to make informed decisions. For physical objects like a Kindle, I believe such "nutrition labels" should ideally be put on the box (physical store) and website (online stores), so the consumer is aware before they go home and turn on the device (this makes it easier to compare the Kindle to a Boox or Nook at the store).
If the industry moved to a standardized disclosure form (e.g. something like the HUD-1 [1] in real estate sales), people would stop complaining about this.
Yes! Even when I try to read the terms of service, I find them hard to understand. I feel bad because it’s sort of shame on me for agreeing to stuff blindly. User hostile is a good way of putting it.
Payment is a fair point on Kindles. I get why websites offer free services in return for ads (and your data), but I paid for my Kindle and (most of) the content I read.
I don't think that will ease anyone with privacy concerns. People who are against government surveillance are not against the police catching criminals and solving cold murder cases. The Golden State Killer case was a very good use of DNA profiling and DNA databases to catch a criminal. The problem is that many don't trust the government to only use it for those cases, and many others don't trust the technology to have a low enough false positive rate to not cause harm to innocent people.
Understanding how the book reader features are used in practice is good. Selling the same data to an advertiser is bad. Profiling people into predefined groups is bad, and the technology carries a risk of false positives/negatives that reinforce stereotypes. The law has yet to catch up and treat information gathered by libraries and information gathered by a developer of e-readers as being very similar in risk.
We can step outside of government examples, too, and find cases where corporations getting all data sciencey with this information have accomplished some pretty ucky - and also impossible to anticipate - things.
An instructive case here is Target figuring out that they could use customer purchase history to detect, with a pretty decent degree of confidence, when a customer was pregnant. They then proceeded to use this model to send out mailings, and those mailings resulted in people being outed in rather compromising and potentially seriously harmful ways.
Page location and page turns are in there for syncing across devices, and that's fine: ask the user 'sync across devices?'; if they say yes, not a problem. If they say no, don't send the data. The data that is stored would be something like 'currentlocation[$bookid] = $location'. Storing historical information (user was at location 1219 at 2020-01-06-05:12:41) is not required for that function.
The philosophy should always be to store the minimum amount of data needed to provide the function that the user wants.
The IP address is transitory and shouldn't be kept longer than needed for the TCP session. Maybe it sticks in firewall logs, but that shouldn't be used for anything other than security.
Goodreads account details would only apply if you connect to Goodreads. I'm not sure what the benefit of that is, but I could see that 'user abc123 read this book' is useful data. Again, ask if you can send the data.
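To make the 'currentlocation[$bookid] = $location' idea concrete, here is a minimal sketch in Python. The class and method names (PositionStore, record, last_read) are made up for illustration and say nothing about how Amazon actually stores this; the point is only that syncing needs one value per book, gated on consent, and no timestamped history.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PositionStore:
    """Hypothetical sync store: one current location per book, no history."""

    sync_enabled: bool = False  # the user's answer to 'sync across devices?'
    current_location: Dict[str, int] = field(default_factory=dict)

    def record(self, book_id: str, location: int) -> bool:
        """Store the latest location; do nothing if the user declined sync."""
        if not self.sync_enabled:
            return False
        # Overwrite instead of append: the previous location is discarded.
        self.current_location[book_id] = location
        return True

    def last_read(self, book_id: str) -> Optional[int]:
        """Return the only thing another device needs in order to sync."""
        return self.current_location.get(book_id)
```

Because record() overwrites, the store never grows beyond one entry per book, which is about the smallest amount of data the sync feature can work with.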
The primary way that helps is to communicate that everyone on the team appeared to think this is perfectly acceptable to do without communicating it to the paying customer.
I mean, we already knew this, but it means any and all Amazon hardware must be considered potentially hostile.
They have collected large amounts of data from pretty much day one on those devices.
Back when they had a cell phone in them, I was standing behind a guy who was supporting it. "Uh, let's bring up where you are at. It says you are 10 miles off the coast of Miami?..." "Oh yeah, I am calling from my yacht." "Do you see any cell towers?" "No." "It kinda needs those to work. I am surprised I got the location data."
A Kindle comes with Kindlings, a lesser form of the book, where you are being read by Amazon while reading; you are working for Amazon in ways you might never understand.
The Kindling never leaves Amazon properties; it is not yours even though you paid almost the full price of a book.
If there is rule of law in the US and EU, these will eventually become free e-books, that is, separated from Amazon; they will regain the status and properties of the book.
Yeah I came here to say the same. I'm about as tin-foil-paranoid-privacy-all-the-things as they come, but the "invasive" data mentioned in the post don't seem particularly invasive to me, and collecting that data seems perfectly appropriate for the purposes you mentioned.
With all that said, I do dream of a PINE64 E Ink device (or something that's open and hackable).
Probably true - I'll snatch it up the moment color e-ink is a thing. Color is vital for most of the papers I work with, and for books I prefer a smaller form factor, so from my perspective it sits in kind of an odd part of the market.
I think a lot of people are sympathetic to that perspective while still wanting control over their privacy.
It's the difference between someone inviting you to come into their home for a visit, and you breaking in whenever you feel like to take notes on what they're doing.
I'm saying as someone who works in software I empathize with the idea of spending lots of time implementing a feature, tearing hair out over some technical issue, etc. only to realize no one uses that feature.
I'd rather people be able to opt-in, but conceptually I'm not really upset that people can see my usage patterns, etc.
That's my point. The data is both A) Invasive and B) Pointless, unless trying to do things they shouldn't on your network. But they still collect it for some reason.
> don't seem particularly invasive to me, and collecting that data seems perfectly appropriate for the purposes you mentioned.
Fine. So you allow them to collect it. However, don't decide for others whether it's "invasive" or "perfectly appropriate" for them. Make it opt-in, so that people who want to share their data can do so.
Oh yeah, and offer them payment for that. They deserve it.
There are some features in software I rarely use. But those times I do use them, they are utterly essential. If I find such a feature has been removed, I am incensed.
Metrics can tell that story though so you’re arguing a straw man.
Example: If you see that 99% of users have never used a function ever - you have a pretty good idea that it needs to be reworked or removed. You may also see a function that is used by 80% of users once a month, that you may opt to keep.
It's not so much that ubiquitous telemetry can't identify this, it's whether it's better for this than a focus group. You can have background telemetry with the focus group so you're not just giving customers what they say they want instead of what they need.
I'm not sure. While I understand that developer time needs to be cut down or restrained sometimes (though perhaps not at Amazon in this case, which concerns their core business), your example could merely turn out to be a way of losing 1% of the users. Usage statistics alone cannot tell you whether your users hate or like a feature. Some features are always going to be used more than others.
What if that feature costs 30% of dev time? Without being able to measure you wouldn’t be able to make a good judgement. Imagine how science would work without experiments?
Wouldn't focus groups work better AND respect your users?
Devs think it is either telemetry or developing blind, but in reality software was developed (and was possibly better) before telemetry, using focus groups.
As a developer, that is how _your dev team_ used the data. Can you confidently say that the metrics weren't also being accessed by the marketing department for different purposes? Or that it wasn't being shared with Amazon's business partners?
I have quite often seen people here and on other tech forums assume that purchasing a Kindle means being locked into Amazon's ecosystem, giving up personal details, and having the risk that your books might be deleted. But you don't have to use the Kindle's internet connectivity: I have owned three generations of Kindle, and with each one I activated airplane mode the second I unboxed the device and I never turned airplane mode off. All my ebooks come from sources other than Amazon (mainly LibGen, for example), and they can be easily transferred over to the Kindle by USB because the Kindle appears as any ordinary USB drive to a computer.
If this practice ever gets widespread, I would guess that the developers will limit airplane mode in some way in order to ensure that the device will call home at some point.
But it is a pretty clever hack to get a hostile machine to not connect to the internet as airplane mode is (I assume) regulated behavior.
Even if the developers take the egregious step of nerfing airplane mode, you can still "opt out" by not giving the device credentials for your WiFi network.
I had a kindle keyboard and it had 3g. It worked in a bunch of countries--slowly though. I remember reading blogs where people were taking the sim cards out and tethering using them.
To save money, they could come with LoRa radios and sync to a LoRa gateway when the opportunity arises, including meshing with each other to aggregate data and increase the likelihood of encountering a gateway. LoRa modules are pretty cheap.
...which would require a valid SIM. So just don't add one. If the device comes with a pre-inserted/hardwired/virtual SIM, well... several countries in the world require KYC-style registration of the SIM owner before networks are allowed to activate the SIM, so there'd still be an opt-out path for the user in such countries.
eta: My point being: Now you're in a twisty little maze full of corner cases, all different. Not the sort of thing much loved by Amazon (or any of the GRAFT).
Not in the IoT world. The 'owner' of the SIM, the company that sells the device, would have a deal with one or more network providers to allow access, and would take care of complying with data retention and identification regulations.
As of my current device (the Oasis), no, it does not appear to be this promiscuous. I can't speak to the analytics, but Whispersync and book downloading don't work unless you explicitly connect it to an AP.
> if the developers take the egregious step of nerfing airplane mode,
and I was responding that IF the developers decide to nerf airplane mode, it's very possible they will start using any open AP; some TVs are reportedly doing this already
You're allowed to use WiFi on planes (other than take-off and landing, currently) and Aeroplane Mode often allows WiFi and Bluetooth these days. The rules are changing pretty fast.
I have also owned three generations of Kindle! Like you, I've never taken any of them online.
Never supply a wifi connection during setup, and instead immediately engage airplane mode. USB transfer is easy with something like Calibre, which also handily converts ePub to Mobi for Kindle use.
It used to be that you could buy Kindle books and download them to your computer for transfer to the Kindle via USB, but they seem to have made that more difficult in the last year or two. Other sources still work fine, though.
I tried to do this with a recent paperwhite but some features seemed to require registration - the main one I cared about being "collections". Had to make a fresh amazon account, register it, then put it into aeroplane mode never to be reconnected.
Same here (sans the Kindle). All eBook readers I have bought have never been connected to a WiFi network.
If I want to change the books, I do it via USB.
The fact that Amazon collects these very detailed metrics has been well known for a long time. You will find old discussions in the MobileRead forum. Here is a thread from 2013 "Block Big Brother":
> I activated airplane mode the second I unboxed the device and I never turned airplane mode off.
Same. However, I had to connect my Kindle Oasis to the internet once after purchase; if I remember correctly, it was to download the dictionaries (for translation) I needed.
And I think there was a feature that was missing until I connected it to the internet once (I used a new/temporary account for that), but I can't remember what feature that was.
You could have got those dictionaries from a filesharing community and simply copied them over to the Kindle via USB. No need to connect the Kindle to internet.
Any cheap budget tablet isn't e-ink, which matters for battery life and, at least for some people, reading pleasure. Also, I mainly use my Kindle for reading research papers in academia, into the hundreds of publications each year. So, after years of using these devices, their UI (which I find admirably simple and straightforward) is burned into my muscle memory. Switching to another series of devices would mean having to adjust to a new workflow that may well bring unwelcome complexities.
I've never seen one that has the bang for the buck of a basic paperwhite. I got my last one for under $100 and I never use the amazon nonsense. I just keep it in airplane mode and load my own books.
There are also Kindle competitors, like the Kobo. You can also bet that if this becomes a widespread concern, another e-ink reader may come to market that offers privacy and security, maybe some sort of open-source, secure-by-default e-reader. Some attempts at this have already been made [1], but it's not clear how strong the market demand for that is and whether it will be successful. If you really want a privacy-centered reading experience, the easiest way is just to borrow the printed book from your local library.
One reason is value. They produce so many, the quality is decent and the price is subsidized so it's artificially low.
Why is it subsidized? Obviously to make it more fun to buy books, but also collecting valuable data on your reading habits. Obviously they know _what_ you're reading but it seems useful to them also to know what you bookmark etc.
They also have all the hardware they need for location history tracking by remembering wifi broadcasts seen. Is it known if that's being uploaded?
Thank you! I did not know about this. This is a really cool development. Even if the saturation (as mentioned by rtkwe) is not the greatest, this is a big step towards reading more analytical texts with colored graphs.
Kindle is just a great reading device. The only feature that I _might_ consider using that requires connectivity is the Wikipedia lookup, and the verdict so far is that Airplane mode is more valuable than that.
Same, although I just have the one old Kindle that I revived with a new battery. It's never had a network connection since I factory reset it. I just dump ebooks onto it via USB. It might be recording all sorts of analytics but I don't care because it'll never be connected to the outside world.
Plus, for all the people saying basically "it's for your own good", the battery lasts much longer on aeroplane mode. For this device, for me, WiFi is an anti-feature.
It will convert any format of E-Book to a compatible format for the Kindle (usually MOBI) and allows you to upload it directly. I use it often and it's an amazing piece of software!
I actually prefer KindleGen[0] to convert EPUB to MOBI - I find it produces a superior e-book.
Edit: Oh no, Amazon removed KindleGen! When did that happen? I still have x86 copies for Linux and windows if anyone wants it. Supposedly "Kindle Previewer" can do the same thing, but a cursory glance looks like it no longer supports Linux...
> Edit: Oh no, Amazon removed KindleGen! When did that happen?
We know where this is going, don't we? Offering interoperability in the beginning and then gradually taking it away - the old bait and switch trick. We consumers fall for it again and again.
Aghh, I have (had?) a project that relied on downloading kindlegen as part of the CI/CD step. I just downloaded it from Amazon each time.
I doubt I have a copy of it somewhere. Does anyone have / know of a copy of kindlegen for Linux anywhere?
Edit: The Wayback Machine has it, thank god. Had a very real sense of loss there. Unsure why; it's a project I've hardly touched for years. https://github.com/wjdp/gotdict for the interested.
It works pretty well except for the latest generation of Kindle DRM. AFAIK, it hasn't been cracked yet. There are workarounds, but they result in a lower-quality book.
I second that. I have an older Paper Kindle with only wifi.
And I never connect it to wifi. Calibre is excellent for managing a Kindle archive without Amazon breathing down my neck.
No it cannot. It only supports Amazon's proprietary ebook formats: AZW and MOBI.
I love the paperwhite, but the limited format support led me to choose a different ereader when I last bought a new one.
Edit: I know I could convert between formats, but that process is not always perfect and can lose important formatting.
Loading books using programs like calibre [1] allows you to convert EPUB to MOBI (the Kindle format) seamlessly before transferring. In my experience this works perfectly.
The Kindle handles .mobi and .azw3. It is trivial to convert EPUB to MOBI before you send the book to the device over USB (it can even be done as part of a command-line script, for example).
Why ditch a device just for that reason? Again, the OP assumes that one will transfer books to the device over USB for privacy’s sake. In this case, creating a shell wrapper around cp to automatically convert any EPUB to MOBI upon copying the file (naming the command, say, kindlecp) is trivial.
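For anyone wondering what such a "kindlecp" wrapper might look like, here is a rough sketch in Python rather than shell. It assumes Calibre is installed and its ebook-convert command-line tool is on your PATH; the names kindlecp and conversion_command are just illustrative, and the Kindle's mount point is whatever directory you pass in.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def conversion_command(src: Path, dest_dir: Path) -> Optional[List[str]]:
    """Build the ebook-convert call for an EPUB; return None for other files."""
    if src.suffix.lower() != ".epub":
        return None
    target = dest_dir / (src.stem + ".mobi")
    # Calibre's CLI takes the input file first, then the output file;
    # the output format is inferred from the output file's extension.
    return ["ebook-convert", str(src), str(target)]


def kindlecp(src: Path, dest_dir: Path) -> None:
    """Copy a file to the Kindle directory, converting EPUB to MOBI on the way."""
    cmd = conversion_command(src, dest_dir)
    if cmd is None:
        # Not an EPUB: behave like a plain cp.
        shutil.copy2(src, dest_dir)
    else:
        # Assumes ebook-convert is installed; raises if the conversion fails.
        subprocess.run(cmd, check=True)
```

Calling kindlecp(Path("Book.epub"), Path("/path/to/kindle/documents")) would write Book.mobi into that directory, while a .mobi or .pdf file is copied unchanged. The path here is a placeholder for wherever the Kindle mounts on your machine.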
Conversion can occasionally barf and completely screw up the formatting. It's annoying to have to go back and re-convert with different settings for a problem that's non-existent when native support exists.
I just tend to use non-Kindle applications/devices for this. It's always been extremely easy to get non-DRM ebooks into Apple's book reading app (formerly iBooks, now just Books, in Apple's ongoing quest to make most of their application names as boring as possible). Perhaps ironically this makes Books the "non-walled-garden" app for me.
The pitfall in all this, though, is that there are a lot of commercial books that are only available from publishers that use DRM, and personally I don't consider DRM a sufficient justification for piracy -- so that leaves me stuck with locked books regardless. Lately I've been buying them from Apple rather than Amazon, although if I actually jump through whatever hoops are required to set up DRM stripping with Calibre for Kindle books, assuming that's still possible, I may switch back.
Would you agree that your usage pattern of the device is very atypical? I suspect (no hard evidence) that 99% of Kindle purchasers use them primarily to read Amazon Kindle books.
That document doesn't mention anything about Airplane mode at all. Nor does it describe the Significant Locations feature as "saving the list of wifi access points"; in fact, I'm fairly sure that's not what they're talking about, and instead they're talking about the feature iOS uses to determine that you tend to go to the same place for lunch on Tuesdays or the same friend's house on Saturday afternoons and offer that as a Siri suggestion -- which is almost certainly GPS-based.
Last but not least, Significant Locations data is not just described as "end-to-end encrypted and cannot be read by Apple", it's clearly in the list of items under "By enabling Location Services, location-based system services such as these will also be enabled": i.e., if you're really, really bothered by this, you can turn it off.
Yes, this document does not tell how and when exactly it works. I took an Apple Watch and tested it myself before writing this.
Even if you trust that Apple does not use it for anything else, you cannot check this (no source code) and you cannot be sure that they won’t start using it in the future.
Opt-out tracking is not ethical. It should be opt-in.
This is a bit unfortunate, because the kindle paperwhite is just phenomenal. It's easy on my eyes and it's a godsend for traveling. I suppose the solution here is to just keep it in offline mode when not syncing books.
[edit] as others have noted, it's possible to permanently use offline mode, and transfer books via usb cable.
> Unfortunately, in order to use a non-Kindle application, I have to buy DRM-Free books.
One can remove DRM for amazon's ebook format (.azw3 ?) via some python scripts. You didn't hear it from me though.
> Each request also isn't sent as soon as it's generated. A number of these records are created and stored locally, then uploaded (note the sequence_number field). Even if a person is offline while reading, this data is stored and sent when reconnected.
>One can remove DRM for amazon's ebook format (.azw3 ?) via some python scripts. You didn't hear it from me though.
Not for the new KFX format. The only way to get around that is to use an older version of the Kindle desktop app that downloads the AZW format. The workaround won't last long though, and it won't work on newer Macs because the old version is a 32-bit app.
Last I checked (a year ago?) KFX wasn't a great input format, as it's optimized for the Kindle readers and not for conversion/interoperability. That is, KFX is to AZW3 as PDF is to HTML.
Sure, but if the book you're looking for is only available on Kindle and your eReader is not a Kindle, then the conversion is better than nothing.
I've found some O'Reilly ebooks only available on amazon in the format "Kindle Edition" (ie. KFX). Pretty aggressive market strategy from amazon given EPUB3 is the technical standard, but there you have it.
On my Amazon account, I can download my purchased books as AZW3 from the following page: https://www.amazon.com/hz/mycd/myx ('Manage Your Content and Devices'). (As I understand it, AZW3 is mostly the same thing as EPUB3.)
(Either that, or the files I download from there aren't actually AZW3 files but just KFX files with an .azw3 extension.)
This is only one reason why I absolutely love my Kobo Aura HD, it's never been connected to WiFi. Its storage device is a standard SD card which can be swapped for a larger one. Oh, and it's not giving money to Amazon which is always a big win for me.
It also happens to be a super nice piece of kit, and it has my warmest recommendations.
That's a sensible approach, but sadly Kobo probably does something similar for those who are less savvy than you:
> We collect Personal Information when you use or otherwise interact with the Kobo Services. For example, we collect information about how you use the Kobo Services, such as pages you view, the rate at which you consume e-content (how often and for how long), genres, authors or subject matter you prefer and searches you make or share, the ebooks or audiobooks you have liked, comments you have left and also websites you have viewed through links in the comments. [1]
It's depressing that the market will not stomach the true cost of "dumb" hardware anymore, so it's becoming harder and harder to find. Everything that can be subsidised with hoovering up data, or pushing content, is. If this is the thin end of the wedge, I dread to think where we're heading.
I have a 2010 Kindle Keyboard and naively thought that we wouldn't end up here. The closer we get, the less likely I am to "upgrade".
My Kindle has been in airplane mode since I opened its box, and I send books to it via USB. No one is forcing you to use Amazon services. I didn't even pay for the ad-free version, but I've never seen an ad.
I would say exactly the opposite. I regret buying a book from Amazon [0] intended for Kindle use, because it is DRM-protected and I am forced to use the "Amazon Kindle" application; otherwise I cannot open it. I am usually okay with DRM, but I regret not having bought it elsewhere with less annoying protection.
> I've actually found it quite challenging to purchase books to put on my Kindle that aren't from Amazon, since they use a proprietary format.
While MOBI began as one company's proprietary solution, the format is well over a decade old now and quite well understood by the Free Software community. Calibre can convert EPUB (or anything else, really) to MOBI, so you can buy or pirate your ebooks from anywhere and easily put them on a Kindle.
I'm sure someone like me always has the same "hot take" in every thread regarding this, but I honestly still love reading physical books. After a day of wearily interacting with screens, there is something nice about tapping into this activity that humans have done for hundreds of years. Sure, e-ink is easier on the eyes, but isolating myself with a good book can be a near spiritual experience.
E-Readers do a hell of a good job at emulating the experience with e-ink displays & you can't compete with the ability to carry 1000's of books in your bag, but there's something about the reading experience that I wish to keep completely 'analogue'!
I love reading physical books too, the user experience of them is so much nicer.
I also like to go back to re-read books.
With non-fiction I'll often want to go back to reference or quote something, and with fiction I love reimmersing myself in the worlds the authors create.
I've amassed quite a little library of books that I still enjoy having access to and it's lovely. But it's also /terribly/ inconvenient to move to a new apartment.
It's also quite annoying when I'm visiting a place, and I'd love to pull up a favourite story but didn't think to bring it with me.
I've started moving to a hybrid solution - my absolute favourite stories I keep in paper because I enjoy the feel, but for most books having them digitally is much nicer.
I have great spatial memory for things I've read. I was able to pull up a quote from a book that I read the summer of 1992 seven years later because I remembered roughly where in the book and on the page the quote appeared. I could probably go to my library and find it still another 21 years later. I don't get that from e-books.
It would be quite interesting to know how this data is actually used on Amazon's servers. It reminds me of the criticisms of government data collection programs, that they just hoover up every bit of data that's available without actually knowing what to do with it. Suppose you train some AI to predict what pages in a book will be most engaging to the reader. Since your interface to the book is still just going to be something where people can turn the pages what are you actually going to do with that information? It's a massive sacrifice of the privacy of the user for small gains at best in getting insight into the user's behavior. I wouldn't be surprised if this information is sitting in a database somewhere at Amazon completely unused.
The philosophy of Amazon appears to be to do as much as possible in the hopes that one day it will be useful. This is at odds with the principle of philosophical skepticism, that because we can't be sure of the consequences of our actions we should strive to do as little as possible. The data could be hacked and leak out, for example. There is tremendous uncertainty around things like that.
I formed my opinion before clicking the article, already working out some comments in my mind like "who's surprised?" After reading the article though, surprisingly my opinion changed. This doesn't seem all that bad. I don't doubt that Amazon is over-collecting, but the samples he posted seem like it's just information for syncing reading position and settings. Of all the nefarious things Amazon does with data, I don't think that's one of them.
I did some research on early Android sending a bunch of data back to Google's servers; a few months later the information was encoded/encrypted before being sent over the wire. I'd be curious whether the next version of the Kindle app starts obfuscating what it sends back home.
Why would you leave on wifi on an e-ink kindle, when not actively downloading a book? The battery lasts 3-4x as long with it disabled (on my 3rd gen device at least).
I doubt most users need a real-time sync of their book location to the cloud, unless they read on multiple devices.
Also, if you use the kindle to get loaned/library books on this particular model, they aren't removed even if the due-date is exceeded until you reconnect to wifi, which has been handy at times...
> Why would you leave on wifi on an e-ink kindle, when not actively downloading a book? The battery lasts 3-4x as long with it disabled (on my 3rd gen device at least).
I concur with keeping the wifi off while not downloading, because battery life is way better, but it doesn't help against data collection.
> Each request also isn't sent as soon as it's generated. A number of these records are created and stored locally, then uploaded (note the sequence_number field). Even if a person is offline while reading, this data is stored and sent when reconnected.
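For anyone wondering what "stored locally, then uploaded" might look like in practice, here is a rough sketch of a store-and-forward event buffer. Only the `sequence_number` field name comes from the article's quote; the class names, the `kind` and `payload` fields, and the `upload` callback are hypothetical, and this is not a claim about how the Kindle firmware actually does it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Event:
    sequence_number: int  # field name taken from the quoted article text
    kind: str             # hypothetical label, e.g. "page_turn"
    payload: dict         # hypothetical event details


@dataclass
class OfflineEventBuffer:
    """Keeps events locally and hands them to an uploader once online."""
    pending: List[Event] = field(default_factory=list)
    next_sequence: int = 0

    def record(self, kind: str, payload: dict) -> Event:
        # Every event gets the next sequence number, online or not.
        event = Event(self.next_sequence, kind, payload)
        self.next_sequence += 1
        self.pending.append(event)
        return event

    def flush(self, online: bool, upload: Callable[[List[Event]], None]) -> int:
        """Send all pending events if online; return how many were sent."""
        if not online or not self.pending:
            return 0
        batch = list(self.pending)
        upload(batch)         # the queue is cleared only after upload returns
        self.pending.clear()
        return len(batch)
```

The point of the sketch is that turning wifi off only delays `flush`; nothing in `record` depends on connectivity, so the backlog goes out the next time the device is online.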
> Why would you leave on wifi on an e-ink kindle, when not actively downloading a book?
One of the much-advertised features of the Kindle is its ability to highlight a word and look it up against a dictionary, against Wikipedia, or against the web.
You don't need internet connectivity on the Kindle to look up a word in a dictionary. The Kindle supports dictionaries in Mobipocket format, so the dictionary lives right on the device. It is easy to find .mobi dictionaries for major languages freely available from torrent communities.
Using the Kindle's Wikipedia function actually requires going through Amazon's servers and is a privacy violation, so I would not recommend users do that.
I’m not surprised, but I suggest the Kobo e-reader to the OP. Can use multiple formats, easy to upload books to it, and some models have expandable memory. You can completely disconnect it from the internet if you want.
So it seems I am not allowed to read up about this reference.
Or some underpaid developer messed up the redirects.
Either way this issue about data collection is interesting in its own right, but this other issue of global redirects also feels important, but I only say that as someone who tried to follow the news here.
> Unfortunately, in order to use a non-Kindle application, I have to buy DRM-Free books
No. All you have to do is own an old Kindle (buy one on ebay if necessary). Then you can download DRM protected Kindle files from Amazon for this old device, and Calibre and the appropriate plugin can un-DRM them, and transform them in any other format (epub, mobi, text, rtf...) for you to use on your app of choice.
It's certainly better to buy DRM-free books directly if you can find them, but the above solution works quite well.
I use my Kindle Paperwhite completely offline. I factory reset it and haven't connected it to WiFi since, and just sideload what I need (I did have to strip the DRM from my Kindle books to sideload them on the unregistered device). I never really used the online features when it was registered previously and kept it in airplane mode to help with battery life. Another bonus is that if a freshly reset Kindle never connects to the internet, you never get the ads.
Kobos have comparable (even superior, IMO) hardware to the Kindle line. The thing that everyone who migrates from Kindle to Kobo seems to get hung up on is that it does not have an option to wirelessly sync books that have been sideloaded across devices. This is because Kobo does not give everyone a private cloud like Amazon does (I imagine it would be prohibitively expensive to do so for anyone but Amazon).
It's not a big deal for me, but apparently it's a dealbreaker for some Kindle refugees that they can't start reading a sideloaded book on their phone and pick up where they left off when they open their Kobo.
I have a $350 Kobo Forma and the UI is so slow compared to my $200 Kindle. It takes a long time to start up, and its touch detection is slow and inaccurate, which makes it really hard to highlight quotes properly.
I don't see why that should be expensive/difficult. Ebooks are mostly small files. It would be hard to ramp up a gigabyte unless you end up with image laden items such as pdfs.
Syncing can be an issue. I had one of the early Kindles, and it was fine until I hit a few hundred items. It would re-index and be completely unresponsive for 10 minutes at a go. That could have been done cloud side. In the end I decided I needed to purge loads of documents/titles to get it useful again. But I accidentally sat on it. So game over. Moved to a simple Nook and SD card loads.
I've switched from Paperwhite to Kobo (Aura I think?) and the highlighting feature is really making me miss my Paperwhite.
1. I can't highlight text across pages.
2. There's also an issue in which I navigate to some highlight and the text gets shown in a dark grey against black background, making it nearly impossible to read.
3. Since I can't highlight text properly (thanks to issue 1), I can't simply extract my highlights from a book, so I have to manually type it on a laptop, which is a painful experience thanks to issue #2.
Though I haven't analyzed other devices (because I don't own them), they could easily have similar issues. I personally really want an open e-ink device, but I haven't seen one for sale unfortunately. For now, I run a Calibre OPDS server with the Marvin app on a phone, but it doesn't really compare.
I have an Onyx Nova 2 and I like it quite a lot. It runs android and has access to the android ecosystem, so I can read my webnovels and mangas and even kindle books without needing to use any external applications like Calibre.
I read this comment on my Nova 2. It's a very nice capable device for tasks like web browsing, email, and note taking (either with the pen or Bluetooth keyboard).
Got my mother-in-law a Kobo Forma. Relatively pricey but I was able to walk her through how to check out a book from her local library via Cloud Library & transfer it to her device. Was a life-saver while the physical library was closed due to Covid-19. I was a little concerned as there were complaints about fabrication but her experience has been very positive.
Would love to replace my Kindle with another device. Any recommendations? Also, I appreciate a local file on the Kindle that logs all my highlights (this file is called `My Clippings.txt`). I parse that file and have a wonderful summary of the books I read. Any other ebook reader that creates a file like that?
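In case it helps anyone comparing devices, this is roughly the kind of parsing involved. It is a minimal sketch that assumes the commonly seen layout of the clippings file: entries separated by a line of ten `=` characters, a title line, a metadata line starting with "- Your Highlight ...", a blank line, then the highlighted text. The function name is made up, and firmware versions or languages may format the metadata line differently.

```python
from collections import defaultdict
from typing import Dict, List

SEPARATOR = "=========="  # assumed entry delimiter


def parse_clippings(text: str) -> Dict[str, List[str]]:
    """Group highlight text by the title line of each entry."""
    highlights: Dict[str, List[str]] = defaultdict(list)
    for raw_entry in text.split(SEPARATOR):
        lines = [line.strip() for line in raw_entry.strip().splitlines()]
        if len(lines) < 3:
            continue  # empty chunk, or an entry with no body (e.g. a bookmark)
        title = lines[0].lstrip("\ufeff")  # drop a byte-order mark if present
        meta = lines[1]
        body = "\n".join(line for line in lines[2:] if line)
        if "Highlight" in meta and body:
            highlights[title].append(body)
    return dict(highlights)
```

Usage would be something like `parse_clippings(open(path, encoding="utf-8").read())`, where `path` points at the clippings file copied off the device over USB.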
It will make people uncomfortable, but this is standard practice in terms of event collection for analytics. Many articles here write about discovery from the side of a particular app or site.
If people reviewed some analytics solutions (many trials are available), then they'd see how pervasive this is and what product vendors are encouraging. The likes of Amazon face much more scrutiny around the use of collected data than smaller organizations do. Obviously, they wield great market power so the concerns are broader, but an attacker has a much better chance of raiding smaller developers for volumes of data with much the same fidelity.
Don't some users also buy a Kindle which is subsidized by ads? I pay to avoid this and change the privacy settings.
If you are using a device designed to market to you - they almost all run ads and collect analytics. I guess this is technically not a user facing feature, but it provides some user benefit (cheaper price).
Does anyone know the sales breakdowns? If everyone is concerned about privacy / not being marketed to, I guess the versions with ads are not selling. What surprises me is not that marketing platforms collect data (the author's website did) but that most users don't care about this "abuse" that the author is so concerned about.
The early Kindles didn't do this. It used to annoy me to no end that I'd have to manually tell my Kindle to sync when I was done reading.
Originally, I didn't realize this. I learned this when I'd pull out my phone in a waiting room, or on a train, only to not be anywhere near where I last read the book on my physical Kindle.
Now, I'm quite happy that Kindle syncs aggressively. I use an old phone to read in my hot tub, and it's great that the book opens up to the last place I read it, no matter which phone I'm using.
I can't find a reference to it now, but I recently read something referencing the massive quantities of kindle data amazon give you when making a GDPR data subject access request. I think it was something like 100k rows of data for one user.
Are these requests sent to a separate domain? I may have missed it in the article but it’d be great to know whether we could null route these without disrupting functionality.
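If the analytics calls do turn out to live on their own hostnames (the article excerpts quoted in this thread don't settle that), null routing could be as simple as hosts-file or DNS-blocklist entries. A minimal sketch of generating them follows; the function name is made up and the domains in the usage note are placeholders, not real Amazon endpoints.

```python
from typing import Iterable, List


def hosts_block_entries(domains: Iterable[str], sink: str = "0.0.0.0") -> List[str]:
    """Build hosts-file lines that point each domain at a sink address."""
    seen = set()
    entries: List[str] = []
    for domain in domains:
        # Normalize case and a trailing dot so duplicates collapse.
        name = domain.strip().lower().rstrip(".")
        if not name or name in seen:
            continue
        seen.add(name)
        entries.append(f"{sink} {name}")
    return entries
```

For example, `hosts_block_entries(["telemetry.example.invalid"])` yields one line suitable for a hosts file on the router or DNS box. If sync and analytics share a hostname, this approach would break syncing too, so it only helps when the endpoints really are separate.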
I’ve found similar concerns in an official church scripture app which I will not name.
It was sending an enormous amount of data back to the church including what the user was reading and for how long, everything the user highlighted or bookmarked etc.
It was enough to really question the need for such data.
I really believe that if that data served a legitimate purpose to the functionality of the app (which I’m sure a lot of it did) then the data should have been saved locally on the user's device.
As much dang money as Amazon makes off Kindle, why are they also spying? I guess "because they can" will always be a useful refrain, but I really wish there was a plain English version of what information any given company/web app/mobile app/OS collects, kind of like the surgeon general's warning style of label. Not something that is 20 miles long, full of legalese that only an attorney can decipher.
After I put Pi-hole on the network, my wife's Kindle was almost immediately the biggest offender.
That said, the article appears to list activity type (which is problematic in itself: time stamp + person is reading now). I can see a legitimate use for it, but I also hate the idea of being profiled in that way.
To be perfectly honest, the Kindle does not seem to pull more than the average Android phone (though that is problematic in itself).
I have a 2015 Kindle Paperwhite. I've put it on flight mode the day it arrived and it never went online again. Yes, loading new books takes slightly more effort (I use USB transfer with Calibre) but the peace of mind I get is more than worth it. Unlike OP, Amazon can neither track my reading habits (beyond my ebook purchases) nor delete anything from my Kindle.
The character analytics stuff probably stems from contractual obligations they have to publishers. The publishers probably want to double check the way people read as well and ensure that they are paid out correctly.
The other logging, as someone else mentioned, is probably analytics for their own product development.
I was always curious why Amazon's Dynamo was co-developed for Kindle. Kindle didn't seem like the sort of product that required its own scale-free key-value store. An object store, certainly (for the books themselves); and maybe a relatively-mundane sharded key-value store, for read positions.
Amazon loses when users take the discounted kindle, never enable wifi and source books from libgen. These users would be addressing their privacy concerns and saving money. Perhaps it isn't the largest market, but Amazon isn't exactly incentivizing participation with these privacy policies.
So one way to avoid all data gathering might be to keep your Kindle on airplane mode permanently and load/remove books via USB. Battery would last longer too. It also kills ads on the cheaper version of the Kindle.
What is the surprise? Who doesn't collect data? As long as that data is anonymized and used for improving their product(s), I am fine. It will be scary if the data is used for selling ads/data itself.
I had a funny situation with my Kindle. It was connecting to the internet all the time, so I enabled airplane mode, and then it started complaining about that all the time.
Out of spite I added a password to my wifi (I didn't have one, and I had even named my hotspot something like "free" for my neighbors to use; I wouldn't do that now).
To my surprise, some ~8 months later I discovered my Kindle happily connected to my wifi. I'm pretty sure I would never have entered the password there, because the Kindle was the reason I added the password to begin with. Maybe there is some saner explanation than "Kindle bruteforced my wifi", like a bug or some nuance in the authorization protocol?
edit: it happened 7 years ago with kindle 2013 paperwhite.
Our local library does drive up pick up. Obviously not as instant as a download... but man it is nice to leave the house for a few minutes. Kills two birds with one stone.
The biggest difference in my mind is that the Kindle is hardware you purchase.
It has no need to be sending that much data, including attempting to find out the local IP.
The article stated that a few seconds of usage sent 100 requests to Amazon servers. I'm fairly certain that most websites don't make quite as many requests as the tablet did.
"Large corporation collects massive amounts of data, including data that could only be useful if trying to do something malicious on someone else's network."
That doc also includes instructions for how to opt-out of this collection:
> you may opt out of processing of your personal data relating to the use of your Kindle e-reader collected by the operating system of that device ("device usage data") for marketing and product improvement purposes via All Settings > Device Options > Advanced Options > Privacy. If you turn this setting off, we will stop processing this device usage data for the purposes of serving you customized marketing offers and improving our products and features. Turning this setting off will not affect... your ability to use features of the device, such as data syncing or backup features or Special Offers we display if you purchased a device that includes Special Offers, as we will continue to collect and process your data to deliver those features to you
I'm interested to see whether this sort of biometric/behavioural data will ever be thought of as Personal Data under GDPR (since I bet you can identify someone from their browsing behaviour, just like you can using walking gait and typing cadence). If that was the case you'd need to present an opt-in when you first booted the device, which I think would resolve the complaints from most folks in this thread.
> Each request also isn't sent as soon as it's generated. A number of these records are created and stored locally, then uploaded (note the sequence_number field). Even if a person is offline while reading, this data is stored and sent when reconnected.
That being said, if you leave airplane mode on permanently and sideload books, you should be fine.
> you gave Amazon a lot more information when you purchased the kindle from them.
Kindles are sold in physical locations – at least in the EU, many Kindle owners got their device from a local electronics shop. You don't necessarily have to order them from Amazon. Then, when you unbox it, there is no requirement to register with Amazon or even connect to the internet at all.