If you make a search query but have not clicked on any results, you have a privacy expectation that the web servers of the search results you have not clicked on will not know that you performed this query, nor your IP address, cookies, etc. For example, if you search for [headache] and then close the window, mayoclinic.com knowing that you made this query would probably be a surprising result.
With naive preloading, you would preload a search result from that origin. Your browser would make an HTTP request to that site, sending an IP address, the URL you are preloading, and any cookies you may have set on that origin. So, this approach would violate your expectation of privacy.
Instead, if the page is delivered from Google's own cache, the HTTP request goes to Google instead of the publisher. Google already knows that you have made this query and that you are going to preload the result (the search results page instructed your browser to do so in the first place). The request will not have any cookies in it except for Google's origin cookies, which Google already knows as well. Therefore this type of preload does not reveal anything new about you to any party, even Google.
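To make the difference concrete, here is a rough toy model of the two preload paths. It is only a sketch: `PreloadRequest`, `build_preload`, the cookie values, and the use of `google.com` as the cache origin are all made up for illustration, and a real browser's cookie handling is far more involved.

```python
from dataclasses import dataclass, field


@dataclass
class PreloadRequest:
    host: str  # the server that actually receives the HTTP request
    url: str   # the document being preloaded
    cookies: dict = field(default_factory=dict)


def build_preload(result_url, result_origin, cookie_jar, cache_origin=None):
    """Model the request sent to preload one search result.

    Hypothetical helper: cookies are attached per receiving host only,
    so a cache origin never sees the publisher's cookies.
    """
    host = cache_origin or result_origin
    return PreloadRequest(host=host, url=result_url,
                          cookies=dict(cookie_jar.get(host, {})))


# Illustrative cookie jar; names and values are invented.
cookie_jar = {
    "mayoclinic.com": {"session": "abc"},
    "google.com": {"NID": "xyz"},
}

url = "https://mayoclinic.com/headache"
naive = build_preload(url, "mayoclinic.com", cookie_jar)
cached = build_preload(url, "mayoclinic.com", cookie_jar,
                       cache_origin="google.com")

print(naive.host, naive.cookies)    # publisher receives the request and its cookies
print(cached.host, cached.cookies)  # only the cache origin is contacted
```

In the naive case the publisher receives a request (and its own cookies) before any click. In the cached case the only host contacted is the one that already served the search results page.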
AMP has been doing this for a long time in order to preload results before you click them. However, until Signed Exchanges, the only way to do this was for the clicked page to be served from a Google-owned cache URL (google.com/amp/...). With Signed Exchanges, that can be fixed. The network events are essentially the same.
Note that once the page has been clicked on, the expectation of privacy from the publisher is no longer there. The page itself can then load resources directly from the publisher's origin, etc.
To your last point, if someone posts a link on Twitter to an AMP page on a publisher domain, and then you click it, your browser will make a network request to the publisher's origin. Google will not be involved in this transaction in any way. If someone explicitly posts a link to a Google AMP Cache Signed Exchange, then yes, this will trigger a request to Google, but this will be far less likely going forward as these URLs will never be shown in a browser. For example, try loading https://amppackageexample-com.cdn.ampproject.org/wp/s/amppac... using Chrome 73 or later. This is a signed exchange from one domain being delivered from another. You'll never see that URL in the URL bar for more than a moment, so it's unlikely to ever be shared, like I'm doing now.
Thanks, this was very informative. I'm not a fan of AMP at all, but this helps me understand the reasoning a little bit better and why Google hosting the AMP cache is necessary for preserving privacy.
At its root, I think my objections to AMP boil down to a few things:
On a technical level:
1. It's buggy and weird on iOS.
2. I'm not convinced I care about a few seconds of loading time enough to justify the added complexity of making this kind of prefetching possible. Additionally, this seems like a stop-gap that will be rendered unnecessary by increasingly wide pipes for data.
On a philosophical level:
3. It gives Google way too much power over content.
4. I want the option to turn it off completely because of points [1] and [3], and because I fundamentally want to feel in control of my internet experience.
Edit: The point about SXG making AMP URLs less likely to get copy/pasted to other mediums is a key benefit I hadn't considered and will likely make avoiding AMP outside of Google search easier.
2. How many URLs do you load in a day? My browsing history over the last 10 years averages to 417 pages per day. 2 seconds per URL is 35 days of my life...
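For anyone who wants to check that figure, the arithmetic works out as below. This is just a quick sketch; the only assumption beyond the numbers quoted above is a flat 365-day year.

```python
PAGES_PER_DAY = 417          # average from the browsing history above
YEARS = 10
SECONDS_PER_PAGE = 2         # load time assumed per URL
DAYS_PER_YEAR = 365          # assumption: leap days ignored
SECONDS_PER_DAY = 86_400

total_pages = PAGES_PER_DAY * DAYS_PER_YEAR * YEARS
total_seconds = total_pages * SECONDS_PER_PAGE
days_spent_waiting = total_seconds / SECONDS_PER_DAY

print(total_pages)                    # 1522050
print(round(days_spent_waiting, 1))   # 35.2
```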
Bandwidth increases do not fix latency. If a document has to round trip from the other side of the planet, that adds about 200 milliseconds until we break the speed of light. If that same document must make several round trips to be able to initially load (very common!) this adds up rather quickly. The only solutions are localized caching and prefetching.
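A back-of-the-envelope sketch of that latency floor follows. The distance (about 20,000 km to the far side of the planet), the 2/3 speed factor for light in fiber, and the count of four round trips are rough assumptions for illustration, not measurements.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
HALF_CIRCUMFERENCE_KM = 20_000   # assumption: roughly half of Earth's circumference
FIBER_SPEED_FACTOR = 2 / 3       # assumption: light in fiber travels at about 2/3 c


def round_trip_ms(distance_km, speed_factor=1.0):
    """Minimum round-trip time in milliseconds over a one-way distance."""
    speed_km_s = SPEED_OF_LIGHT_KM_S * speed_factor
    return 2 * distance_km / speed_km_s * 1000


def load_delay_ms(round_trips, distance_km, speed_factor=FIBER_SPEED_FACTOR):
    """Latency floor for a load that needs several sequential round trips."""
    return round_trips * round_trip_ms(distance_km, speed_factor)


vacuum_rtt = round_trip_ms(HALF_CIRCUMFERENCE_KM)
fiber_rtt = round_trip_ms(HALF_CIRCUMFERENCE_KM, FIBER_SPEED_FACTOR)
four_trips = load_delay_ms(4, HALF_CIRCUMFERENCE_KM)

print(f"vacuum: {vacuum_rtt:.0f} ms, fiber: {fiber_rtt:.0f} ms")
print(f"four sequential round trips over fiber: {four_trips:.0f} ms")
```

Under these assumptions a single round trip over fiber lands at about 200 ms, and no amount of extra bandwidth changes that number. Only moving the document closer (caching) or fetching it earlier (prefetching) does.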
Yes, this is exactly why people think this is creepy. I also expect you not to use my battery rendering shit in the background that I haven’t asked you to, or data that, again, you don’t have permission to use. Just because the majority of users don’t care doesn’t mean you, gregable, are not corrupting the foundation of a free web. I still feel you’re making the web super creepy and grabbing extra data, and the whole goal of the project could be accomplished without this embrace and extend: deranking slow pages more aggressively doesn’t lead to a two-tier web and doesn’t tie everyone further into Google’s brainwashing algorithms. At least for now, with this “solution”, Chrome doesn’t visit Google for these new-style links from elsewhere, so that is some improvement. But the fact that this whole project should not exist, adds zero value, and can’t be opted out of is a massive problem for me.