AMP pages displaying your own domain

IgorPartola · on April 17, 2019

I can’t believe there is no way to opt out of AMP as the end user. The UX is so terrible. Often times I will search for something and have a Reddit result come back. When I tap the link, I get the AMP page which:

* does not show all comments, often ones I am actually looking for

* does not let me collapse comment sections

* uses the default white background theme which burns my retinas if I am looking at my phone in a dark environment

* shows overlay ads for the Reddit app that cover about 40% of the screen for no goddamn reason

* requires 2-3 separate actions to get to the original page

Yet I cannot find a browser extension or setting to tell AMP to fuck off. Honestly AMP might be what finally gets me to switch search engines after many years of using Google.

runn1ng · on April 17, 2019

Frankly reddit is the website that has the worst AMP implementation by far.

In contrast, say, Urban Dictionary is undistinguishable from the real thing.

smilliken · on April 17, 2019

You won't see AMP if you switch search engines.

bo1024 · on April 17, 2019

Using DuckDuckGo with a backup of !g (send search to google), I don't think I've ever hit an AMP page in search results in my life. Maybe because I only use !g for really technical searches.

vanadium · on April 17, 2019

!s takes you to Startpage, which, if you're trying to avoid Google, gets you where you want to go by proxy.

jdgoesmarching · on April 17, 2019

Just this morning I got pissed off by an AMP page and was considering a search engine switch. Maybe this is my sign.

maxwell · on April 17, 2019

AMP was what made me abandon Google Search in January 2018 for DuckDuckGo.

Surprising how little I've noticed the change, after using Google Search for over 15 years. I try queries on google.com maybe once or twice a week if I don't find what I'm looking for on DDG. If it's anything media or product related, I feel like I'm on an old, crowded MySpace page. DDG feels more like the old Google.

IgorPartola · on April 17, 2019

You convinced me. Switching now. Fuck this noise.

m-p-3 · on April 18, 2019

You can even tweak your DuckDuckGo settings (dark theme, etc) and save them in DuckDuckGo using a passphrase of your choice, which you can restore and keep them in sync across those devices.

dbbk · on April 17, 2019

Bing implements AMP, and there's nothing technically stopping other search engines from adding support.

Also, links in the Twitter app default to AMP as well.

IgorPartola · on April 17, 2019

I wouldn't mind AMP as a feature as long as Google let me specify that I don't want it.

josteink · on April 17, 2019

Good for most people they use neither of those things then.

zavi · on April 17, 2019

Sounds like a criticism of Reddit moreso than AMP.

IgorPartola · on April 17, 2019

I pick on Reddit because it has the most glaring issues. But there are others. Certain tech sites like Gizmodo come to mind. So do some news sites. It's especially weird when the result is a page that contains a video, and the video is what I actually want but because of AMP it doesn't load, and it's not immediately apparent what's going on.

AMP is straight up broken technology. Imagine if you subscribed to a print version of the NYT but instead of getting the Sunday edition you got a ransom note looking summary of some of the articles from Clipper Magazine. Would you be OK with that?

m-p-3 · on April 17, 2019

I use Firefox, DuckDuckGo and Redirect AMP to HTML https://addons.mozilla.org/firefox/addon/amp2html/ everywhere I can, including mobile.

Yetanfou · on April 17, 2019

Use something like Searx [1], either self-hosted or hosted by someone or some organisation you trust. You can still get Google search results if you feel the need, either by explicitly asking for them (!go for search, !goi for images, !gon for news, !gos for scholar, !gov for video) or by enabling the Google engines in your config. In the latter case you get Google results mixed up with the other enabled engines. The results are presented as normal links without redirection through the originating search engine.

Searx can be extended so it would be possible to create a plugin which rewrites AMP links into non-AMP equivalents, where available. It can already do things like Open Access DOI rewrite (Avoid paywalls by redirecting to open-access versions of publications when available) so the ground work has been done. I'm currently working on improving (and fixing, where necessary) the image search engines and will probably start on such a plugin if nobody else beats me to it.

[1] https://asciimoo.github.io/searx/

frostbyte3 · on April 30, 2019

It's not really google's fault Reddit's amp sucks. AMP sucks, IK, I'm working on my company's AMP pages right now and they are a PAIN. But with enough tweaking, they can be gotten right. So I wouldn't blame Google for Reddit's devs.

Tepix · on April 17, 2019

Using the signed exchange mechanism means you allow anyone to serve your content. You will no longer know when it has been served and by whom. Instead, Google will know more about what your users are consuming on your website than you - despite HTTPS!

Also, there is no mechanism to limit who is allowed to serve your content for you.

I see no technical reason why the content has to be prefetched from Google instead of your own server.

It's also confusing for users and administrators. Want to block access to a website in your network? Guess what: Your block will not be effective because Google will proxy the data unbeknownst to the firewall.

zackbloom · on April 17, 2019

The reason it has to be prefetched from not-you is to protect the users privacy. Until they click a link it is not considered acceptable to leak their search to the potential destination. Links have to be fetched from a third-party who the search engine trusts not to share the data, that at the moment is Google but will hopefully expand.

codedokode · on April 17, 2019

Google already can do this by preloading a cached page from its own domain. So this specification is unnecessary.

I think the real reason is that Google wants to build a walled garden, but doesn't want the walls to be noticeable. Even with AMP, they display a header that looks like a browser's address bar [1]

Also, on that page Google admits that it uses AMP Viewer to collect information about users:

> Data collection by Google is governed by Google’s privacy policy.

Which is probably their real motication for creating AMP.

[1] https://developers.google.com/search/docs/guides/about-amp

Ajedi32 · on April 17, 2019

> Google already can do this by preloading a cached page from its own domain.

That's what AMP already did. This spec is better because it ensures publishers retain control over their own content, and doesn't confuse users by showing "www.google.com" in the URL bar for content that didn't originate from Google.

codedokode · on April 17, 2019

Publisher might want to display their URL in the address bar. But as a user I want to see the actual URL, not what Google or publisher want to show me. I don't want to see "example.com" in the address bar while I am actually connected to Google via a TLS connection signed by Google's key and my IP address is collected according to Google's privacy policy.

What confuses users is Google displaying a fake address bar [1] or browser displaying the wrong URL.

[1] https://developers.google.com/search/docs/guides/images/amp0...

Ajedi32 · on April 17, 2019

The URL you see _is_ the actual URL. It doesn't matter where the content was initially loaded from because the page is signed by the publisher's private key (the publisher has full control over the page contents, Google can't alter it).

codedokode · on April 17, 2019

The content is served from Google's servers according to Google's (not publisher's) privacy policy. While Google cannot alter the content, it sees the unencrypted HTTP request. I don't want neither Google nor publisher to control contents of my address bar.

Ajedi32 · on April 17, 2019

Google already knows the unencrypted contents of the page, and they know you clicked on a link to it (from their search results page). The signed exchanges system doesn't reveal any information to Google _or_ the publisher that they don't already know.

Your browser controls the contents of the URL bar, not Google or the publisher.

pvorb · on April 17, 2019

But Google controls my browser.

codewiz · on April 18, 2019

Have you tried changing browser? -- Written from my Chromium browser installed from the Fedora Linux package repository.

pvorb · on April 18, 2019

I actually use Firefox and avoid Google products where possible, but for the majority of users Google is controlling their browser.

grrowl · on April 17, 2019

I copy your post, but make it available further up the thread. Even though I sync your comment’s edits to mine several times a day, I also control Hacker News so I get them to display your username in place of mine, as to not confuse readers.

Page and DNS prefetching exists, HTML exists, why not just link to the page on the original domain?

pxtail · on April 17, 2019

> I think the real reason is that Google wants to build a walled garden

Exactly, this is the real reason why this abomination came into existence - all of this is masked as work for greater good all for those poor kids with limited network speed. As end effect everyone will suffer - user will never leave google ecosystem, he will remain on search page without even knowing about it, creator will lose control over his own content

pvorb · on April 17, 2019

Exactly: Google built a walled garden and now replaces the fence by glass, because people don't like the view from inside their cage. The worst thing is, that it'll get away with it.

Tepix · on April 18, 2019

„Considered acceptable“ by whom?

Why should the user‘s privacy be protected toward the content provider instead of the search provider? The search provider already knows more about me.

lucb1e · on April 17, 2019

Edit: I completely misread the comment, but can't delete it anymore a minute later because someone already commented. So oops.

hknd · on April 17, 2019

The caching doesn't necessary need to be by Google.

https://blog.cloudflare.com/announcing-amp-real-url/

singron · on April 17, 2019

I think that product is still cached by Google. Cloudflare is just providing the cryptography for Web Packaging so that the browser will show the url from the original page instead of the Google cache.

codedokode · on April 17, 2019

Caching not by Google doesn't give you a priority placement in the search carousel so it is pretty useless.

ihuman · on April 17, 2019

> You will no longer know when it has been served and by whom

I'm OK with publishers knowing less about the people seeing their content.

jononor · on April 17, 2019

They just have to access the data via Google. Who can cross-link it with all the other data they have. No real privacy gain here.

Ajedi32 · on April 17, 2019

That statement is also false, because publishers can still track when the page is served via JS.

jimktrains2 · on April 17, 2019

Amp does allow Google analytics and other analytics services. Unfortunately, most places don't use server logs for much :(

Tepix · on April 17, 2019

Chances are if you don't want Google to serve your content to protect the privacy of your users, you don't want to use Google Analytics either.

Btw your account seems rather active for an account with the description "Inactive. Deletion Requested." :-)

atoav · on April 17, 2019

Or if you want that stuff to work e.g. in China.

tracker1 · on April 17, 2019

A bit more on this... there are a LOT of secondary bots out there, either searching for security holes to exploit or otherwise slurping content for reasons other than search.

JS based analytics (google or otherwise) is generally a better option for detecting actual usage. Yeah, you lose maybe 2% of actual users. You also lose 99% of the various bots. You still have to filter google's and bing's bots that execute JS though.

jsnell · on April 17, 2019

This is such a strange reaction from HN. The AMP cache URLs have been a top 3 complaint about AMP here. "I can't copy-paste URLs, it's hard for users to understand which site they are on, it looks like the content is provided by Google rather than the real provider", etc.

Now there's a solution that preserves the preloading and validation benefits of AMP caches but maintains the original URLs, in a way that's cryptographically sound, in the process of being standardized, and controlled by the publisher. This gets launched much faster than one would have expected. And suddenly everyone pretends that the AMP cache URLs were never a problem and this is some kind of a power-grab.

gnomewascool · on April 17, 2019

I think that most people are worried about Google using a controversial[0], draft web "standard" (Signed HTTP Exchanges), that introduces a major change in how the web works, in mass production, without trying to first resolve the problems raised with the proposal.

[0] For instance, Mozilla considers the current specification to be harmful[1].

[1] https://mozilla.github.io/standards-positions/

eudora · on April 17, 2019

It also puts a _lot_ of work on the publisher to implement these changes. Again.

(Unless you're using CloudFlare)

gregable · on April 17, 2019

Currently, it's difficult to implement however unlike rewriting a page in AMP, signing the page is a purely mechanical operation. All that's required is to improve the tooling, it is theoretically possible to be a one-click change for any website out there. Initially adding gzip support to a web server was difficult and out the reach of many webmasters, now it's basically universal.

tytso · on April 17, 2019

There was an attempt to address Mozilla's concerns[1], but Mozilla never responded, unfortunately. If the Mozilla community chooses not to respond, that might cause people to consider whether or not their position should be given much weight.

[1] https://github.com/mozilla/standards-positions/issues/29#iss...

RubenSandwich · on April 17, 2019

What do you mean they never responded? They say they are working on a response[0]. Taking time to respond and informing the other party that it will take a while is not "never responded".

[0] https://github.com/mozilla/standards-positions/issues/29#iss...

tytso · on April 19, 2019

3 months is a long time.... how long is someone supposed to wait for a response before you just move ahead? If the answer is "forever", it becomes trivial to perform a denial of service attack on a standard. Mozilla specifically said, "this is not high priority for us". If it's not high priority for them to respond, that's fine, but waiting forever doesn't seem like a reasonable thing to require.

codedokode · on April 17, 2019

I am against this standard. First, I want to see the real URL in the address bar. Second, I don't want Mozilla to spend resources to implement specification that is made by Google for its own purposes.

RobertRoberts · on April 17, 2019

I simply don't trust Google to not change the rules later.

What will stop Google from down-grading 2nd class URLs (ie, not hosted with google) to page 2 results?

It's effectively the same thing as having no AMP at all, yet they cleverly got everyone on board with this tactic.

Edit: I just skimmed through this... this looks _WORSE_ than having Google show their domain. This is some of the sneakiest most deceitful garbage I could have ever imagined.

Just no way. Need convincing? Look at the animated gif half way down:

https://3.bp.blogspot.com/-Xqfy7IhiTzc/XLY7goySWzI/AAAAAAAAD...

eudora · on April 17, 2019

Sneaky because now you don't know what server a web page is coming from?

Because yes that's true, although cryptography it's maybe half true.

hedora · on April 17, 2019

Sneaky because (especially for news articles), the most common web-based attack is google (or fb, etc) slurping up my information.

Now, they want to remove the remaining user interface element that says they’re spying on me!

Also, this makes it even harder to ad block their junk at the network layer (is foo.com down, or is this more amp bs?)

codedokode · on April 17, 2019

On their page about AMP Viewer Google admits that they are collecting user's data when they view AMP pages [1]:

> The Google AMP Viewer is a hybrid environment where you can collect data about the user. Data collection by Google is governed by Google’s privacy policy.

With replaced URL it will be more difficult to spot.

[1] https://developers.google.com/search/docs/guides/about-amp

loudtieblahblah · on April 17, 2019

Because literally the entire private reason for AMP is a power grab.

Not the public reason, but absolutely the private reason.

If Google, Apple, Amazon, Microsoft, or whatever publicly traded company makes a move its for money and power and preferably power, since that yields even more money.

AMP on web and email is the perfection of embrace, extend and extinguish

tantalor · on April 17, 2019

Power grab from whom? Bing serves AMP too.

https://blogs.bing.com/Webmaster-Blog/September-2018/Introdu...

rum3 · on April 17, 2019

They also push a browser based on Chromium. I wonder how many intentional incompatibility issues there will be with other web browsers that are not developed by G$$gle.

mtberatwork · on April 17, 2019

Well, I wouldn't call this a solution just yet. If you read through the documentation, you'll find that this won't work on shared hosts and requires a TLS certificate "that supports the CanSignHttpEchanges flag. As of April 2019, only DigiCert provides this extension." [1] Plus, as if the lift of transforming HTML into AMP HTML wasn't already big enough for your average web site owner, implementing signed exchanges will be over the head of 99% of the folks building web pages on the Web.

IMO, while the URL problem was a big issue, the bigger issue is that AMP's restrictions and limitations gives your users a neutered user experience in the final end. As others have pointed out, if it wasn't for Google's implicit requirement to implement AMP (e.g. to get into their carousel and other locations), AMP would have been DOA.

[1] https://amp.dev/documentation/guides-and-tutorials/optimize-...

jefftk · on April 17, 2019

> as if the lift of transforming HTML into AMP HTML wasn't already big enough for your average web site owner, implementing signed exchanges will be over the head of 99% of the folks building web pages on the Web

Converting web pages into AMP isn't something you can automate, but supporting signed exchanges is. You need certificate authorities to support the flag and web servers support the protocol, but if this catches on then the only thing you'll need from the site owner is the decision on whether to allow it.

(Disclosure: I work for Google)

tcd · on April 17, 2019

Well, it's disappointing DigiCert didn't tell Google to fuck off. I hope this never comes to something like Let's Encrypt, so the vast majority of developers can never use this.

Sometimes, Google needs a gentle nudge from users saying "we don't like this" and hope they reconsider (I doubt it).

jefftk · on April 17, 2019

> I hope this never comes to something like Let's Encrypt, so the vast majority of developers can never use this.

Let's Encrypt's response:

    I think it’s likely too early in
    this draft’s development for Let’s
    Encrypt to prioritize implementation.
    It looks like it has a ways to go
    within the IETF before it would be
    an internet standard.

https://community.letsencrypt.org/t/cansignhttpexchanges-ext...

hedora · on April 17, 2019

Honestly, if comcast and friends started blocking this crap by default (with an opt in for people that want to be spied on by google) I’d take back at least half the mean things I’ve said about Pai.

fartcannon · on April 17, 2019

How do you personally feel about AMP? It looks like an attempt to make the web a walled garden.

jefftk · on April 17, 2019

AMP has a lot of things all together, some of which I like:

* I like that when AMP is used for ads then the ads are fully declarative. Advertisers getting to run custom javascript, even in a cross-domain iframe, isn't great.

* I like that AMP allows sites (currently primarily search engines) to trigger preloading in a way that doesn't leak information to the site that is being preloaded.

* I like the way things like "sorry AMP only allows us to use 50k of CSS" can give developers leverage to push back against bad site designs.

* I like that it centralizes some measurements: instead of every ad provider using their own custom polling system to determine if the ad is on screen they can all subscribe to events triggered by a single well written system. This doesn't affect the amount of tracking (there's lots either way) but it makes it hurt the user experience less.

On the other hand, I don't like that:

* AMP uses a ton of JS, and if all you want is a simple website it's going to slow things down in the non-preloaded case. For example, taking a random post on my site (https://www.jefftk.com/p/trycontra-implementation and https://www.jefftk.com/p/trycontra-implementation.amp) I see a median speed index of 1.611s on non-AMP but 2.051s on AMP: https://www.webpagetest.org/result/190417_XB_22673cb98ce390a... https://www.webpagetest.org/result/190417_PS_1a60378762d87fb...

* A lot of people that don't want to implement AMP are doing it because then they get more search traffic. I understand how there isn't currently a non-AMP way of doing preloading in a way that doesn't leak information to the site (see above) but I think Web Packaging should be extended to support this in the general case and allow publishers to use AMP only if they want to.

* The interaction between AMP and content blockers isn't great. If you have a content blocker set to allow some JS but not all (for example, no third party JS) then it's not going to run the AMP JS or the contents of the <noscript> block, and AMP pages will render with 8s of white screen before the CSS times out. This is a pain, but I'm not sure what the right way to fix it would be. (I wish content blockers were smart enough to figure out which <noscript> tags to run, but that's probably asking too much.)

If you wanted to expand on how AMP seems like an attempt at a walled garden I would be interested in reading it; I haven't previously read any explanations that made sense.

fartcannon · on April 17, 2019

I guess the question is: Do you trust google to treat non-AMP pages the same as AMP pages?

If they don't/won't, no matter what your justification for why is (you believe it will provide speed, security, whatever), that's one of the walls.

Sure, you can not use it, but does that limit your ability to be found on the internet? If yes, then there's that wall again.

They're in the extend stage of Microsoft's favourite strategy.

jefftk · on April 17, 2019

> Do you trust google to treat non-AMP pages the same as AMP pages?

Google clearly doesn't treat AMP and non-AMP pages the same way: only AMP pages are eligible for the carousel in Google search, and there's a little icon.

Once there's a way for non-AMP pages be safely preloaded I would be very surprised if Google search didn't start doing that, though. (Speaking only for myself, not the company.)

codedokode · on April 17, 2019

And replacing the URL bar contents makes it more difficult to spot the walls [1].

[1] https://developers.google.com/search/docs/guides/images/amp0...

andy_ppp · on April 17, 2019

Fair enough but there is now zero need to load them from the AMP cache at all - this security model could allow News Carousel to load them from the originating site and still have access to the pre-rendering instant load magic/lies that AMP provides.

It feels a little dodgy to me this standard and a bit embrace extend but I'll see how it plays out and reserve judgement until we see this happening in the wild and how well it works. Personally I'd like to be informed in the browser chrome that it was being served via this mechanism rather than me visiting the original site.

Can you maybe see that people feel the browser is now lying to them about where the content is coming from?

jsnell · on April 17, 2019

If you're loading the content from the originating site, surely there's no benefit at all to signing. If you're loading the content directly from the site, the browser just needs TLS to verify the integrity of the content.

And you're also back to the situation where you can't preload the content in a controlled manner or privacy-preserving manner, nor have the page-speed guarantees since the version being served to the user is not the version that Google crawled.

It's kind of the opposite. The cache is where the actual benefits come from. That's not the part you want to get rid of. The AMP spec was just a vehicle for making the caching possible in a secure manner.

This model would theoretically allow the validation, caching and prefetching to be done for all (signed, so opt-in by the publisher) HTML pages. Which is another one of the historical top complaints about AMP: why can't light, fast-loading, mobile-friendly HTML get the same treatment in search results.

> Can you maybe see that people feel the browser is now lying to them about where the content is coming from?

I can see that they are feeling like that, I just don't understand how they arrived there.

How is this different from a e.g. company X's website being behind Cloudflare? The browser didn't contact the actual server that company X hosted the content on. Instead the browser contacted a server run by Cloudflare that could prove cryptographically (via TLS) that it was authorized to serve content on behalf of the actual site.

AlexandrB · on April 17, 2019

> And you're also back to the situation where you can't preload the content in a controlled manner or privacy-preserving manner...

A few people have pointed out the privacy-preserving aspect of AMP. I'm not sure I get how that's the case. Is this referring to the fact that the page is not being pre-loaded from the content owner's own webserver? The main privacy violators on the internet are Google and Facebook. How is loading something from Google cache protecting my privacy?

Worse still, if someone posts an amp link on Twitter or a chat client Google now gets to know when I access a specific website even though they are an unrelated third party[1].

Edit: [1] In practice this was probably already the case since Google Analytics is so popular. But still.

gregable · on April 17, 2019

Good question.

If you make a search query, but have not clicked on any results, you have a privacy expectation that the web servers of the search results you have not clicked on will not know you performed this query, your ip address, cookie, etc. For example, if you search for [headache] and then close the window, mayoclinic.com knowing that you made this query would probably be a surprising result.

With naive preloading, you would preload a search result from that origin. Your browser would make an HTTP request to the site and that site (sending an ip address, the URL you are preloading, and any cookies you may have set on that origin). So, this approach would violate your expectation of privacy.

Instead, if the page is delivered from Google's own cache, the HTTP request goes to Google instead of the publisher. Google already knows that you have made this query, and are going to preload it (the search results page instructed your browser to do so in the first place). The request will not have any cookies in it except for Google's origin cookies, which Google already knows as well. Therefore this type of preload does not reveal anything new about you to any party, even Google.

AMP has been doing this for a long time in order to preload results before you click them. However, until Signed Exchanges the only way to do this was that on click the page would need to be from a Google owned cache URL (google.com/amp/...). With Signed Exchanges, that can be fixed. The network events are essentially the same.

Note that once the page has been clicked on, the expectation of privacy from the publisher is no longer there. The page itself can then load resources directly from the publishers origin, etc.

To your last point, if someone posts a link on twitter to an AMP page on a publisher domain, and then you click it, your browser will make a network request to the publisher's origin. Google will not be involved in this transaction in any way. If someone explicitly posts a link to an Google AMP Cache Signed Exchange, then yes this will trigger a request to Google but this will be far less likely going forward as these URLs will never be shown in a browser. For example, try loading https://amppackageexample-com.cdn.ampproject.org/wp/s/amppac... using Chrome 73 or later. This is a signed exchange from one domain being delivered from another. You'll never see that URL in the URL bar for more than a moment, so it's unlikely to ever be shared, like I'm doing now.

AlexandrB · on April 17, 2019

Thanks, this was very informative. I'm not a fan of AMP at all, but this helps me understand the reasoning a little bit better and why Google hosting the AMP cache is necessary for preserving privacy.

At its root, I think my objections to AMP boil down to a few things:

On a technical level:

1. It's buggy and weird on iOS.

2. I'm not convinced I care about a few seconds of loading time enough to justify the added complexity of making this kind of prefetching possible. Additionally, this seems like a stop-gap that will be rendered unnecessary by increasingly wide pipes for data.

On a philosophical level:

3. It gives Google way too much power over content.

4. I want the option to turn it off completely because of points [1] and [3], and because I fundamentally want to feel in control of my internet experience.

Edit: The point about SXG making AMP URLs less likely to get copy/pasted to other mediums is a key benefit I hadn't considered and will likely make avoiding AMP outside of Google search easier.

londons_explore · on April 17, 2019

2. How many URL's do you load in a day? My browsing history over the last 10 years averages to 417 pages per day. 2 seconds per URL is 35 days of my life...

I totally want that time saved if possible.

andy_ppp · on April 17, 2019

Making everything faster won't give you more time.

themacguffinman · on April 17, 2019

That's literally not true.

It looks like you were trying to make some deeper philosophical point, but you'll have to be clearer because your statement makes no sense.

gregable · on April 17, 2019

Bandwidth increases do not fix latency. If a document has to round trip from the other side of the planet, that adds about 200 milliseconds until we break the speed of light. If that same document must make several round trips to be able to initially load (very common!) this adds up rather quickly. The only solutions are localized caching and prefetching.

andy_ppp · on April 17, 2019

Yes, exactly why people think this is creepy. I also expect you not to start using battery rendering shit I haven’t asked you to in the background or data that again you don’t have permission to use. Just because the majority of users don’t care doesn’t mean you gregable are not corrupting the foundation of a free web. I still feel you’re making the web super creepy, grabbing extra data and the whole project focused could be accomplished without this embrace and extend - derank slow pages more aggressively doesn’t lead to a two tier web and doesn’t tie everyone further into Google’s brain washing algorithms. But this “solution” to the problem at least for now Chrome doesn’t visit Google for these new style links from elsewhere so at least that is some improvement. The fact this whole project should not exist and adds zero value and I can’t opt out is a massive problem for me.

jsnell · on April 17, 2019

If the browser were to prefetch search results, it would leak information to all the result pages about the user having done that search. (I once had a blog post accidentally rank on the first page for "XXX". I really don't want to know who is searching for that particular term.)

Google has to know what you're searching for to compute and show the results. So there are few additional privacy implications from the preload.

And your last case is exactly what will no longer happen. People will now copy-paste the original URL rather than the cache URL. Click on the link, and you're taken to the original site.

andy_ppp · on April 17, 2019

Then don’t cache stuff that you haven’t told the user you are caching from third party sites.

andy_ppp · on April 17, 2019

> If you're loading the content from the originating site, surely there's no benefit at all to signing. If you're loading the content directly from the site, the browser just needs TLS to verify the integrity of the content.

The browser security model stops them from doing this, but presumably in this new world they could allow this to work and not host the content in the carousel themselves.

I think the argument about content suddenly becoming "slow" and no longer AMP validated if it's not served from the AMP cache is a poor one.

Finally I'm willing to postpone judgement but I did just explain why people feel that Google is embracing and extending the web if you can't understand why people are worried about this that's not something I can help you with ;-)

Cloudflare does not have the same scope, power, monopoly or scale that Google have - I can change CDN provider if they start doing weird stuff, no problem, but I can never really get away from Google.

snarfy · on April 17, 2019

Most people agree that URL spoofing is bad. I'm not sure why google should get a pass.

cannedslime · on April 17, 2019

My biggest quarrel with this is that its just another way for google to take control over the internet. Does any other search provider than google use AMP? Does any browser other than googles own support this? How busy are you? You can't wait 0.5 seconds for an HTTP request? And you think its worth feeding google with more precise data about your movements online than they already have? And as a business integrating AMP, loose control over your own content and platform? Why?

delroth · on April 17, 2019

https://blogs.bing.com/Webmaster-Blog/September-2018/Introdu...

Disclaimer: I work at Google, nothing related to AMP or search.

ajmurmann · on April 17, 2019

Bing is the best thing that ever happened to Google. It's the fig leave that protects you from antitrust.

andrekandre · on April 17, 2019

my counter argument to this would be: we don’t need more corporate control of the internet and standards, we need less... bing throwing their weight in isn’t any better imo

IanCal · on April 17, 2019

> You can't wait 0.5 seconds for an HTTP request?

I don't have links to hand but everything I've seen shows real dropoffs in users as you increase the time. Once you're looking at low numbers of seconds you're looking at significant numbers of users simply abandoning the site. Half a second extra is not insignificant, and the user experience changes a lot between things that feel instant and things that have a noticeable wait.

cannedslime · on April 17, 2019

Yes but you don't need AMP to have a fast loading website or even one that applies the same principles as AMP when it comes to having inline CSS, loading scripts async etc. The biggest problem in all of this is usually ads and analytics anyways.

codedokode · on April 17, 2019

Also Google controlling AMP specifications means Google can decide what widgets (from what companies) can be there on the page, what ad networks and analytics systems can be used.

codedokode · on April 17, 2019

This is actually the opposite: users are deceived because they think they connect to publisher's site but in fact they are still inside Google's walled garden. Their data are collected according to Google's privacy policy but it is difficult to spot looking at the address bar.

Also, Google controlling AMP means that Google decides what analytic systems and ad networks are allowed on the AMP page. With Google having its own ads and analytics business, doesn't this tempt them to make life little easier for its own products and little more difficult for competitors'?

dbbk · on April 17, 2019

As bad as the URLs were, at least you could edit them to get back to the non-AMP version if you were technically literate enough. Now there'll be no distinction, you could get sent to an AMP link from Google which is a lesser experience than the 'real' site and have no way of getting out.

jefftk · on April 17, 2019

> have no way of getting out

I believe if you refresh the page it triggers a request to the original site, which will probably then choose to give you the non-AMP version of the site.

jedieaston · on April 17, 2019

Can't you just click the link at the top right that will send you to the real page as it does today?

dbbk · on April 17, 2019

I believe they said that bar would be going away once they rolled out 'real URLs'.

teddyh · on April 17, 2019

What reason does Google now have for keeping the link there?

48309248302 · on April 17, 2019

It only works in Chrome. The Web has now been split in two, and you now have to use Google Chrome to be on the faster version. Google is shamefully abusing its power in several places here.

gruturo · on April 17, 2019

If by not using Chrome you don't end up on an AMP page, I consider that a feature.

48309248302 · on April 17, 2019

Google is unethically abusing their power against non-Chromium browsers like Firefox. Speed matters in the eyes of users, even if we individually block AMP. See the link below for a general pattern.

https://www.zdnet.com/article/former-mozilla-exec-google-has...

nebulaserfer · on April 17, 2019

Google just gives users what they want. I've checked the link you provided and the website is total wreck in terms of user experience (subscription popup, large obtrusive ad banners and so on).

I have push notification disabled but it wouldn't be surprise for me if they asking to subscribe for push notifications on the first page view.

Current era of content websites is a disaster except few cases like medium and maybe reddit with a discount.

AMP is an only solution for general users who just want to google a cooking recipe or latest news in their town.

e12e · on April 17, 2019

First, everyone go out of their way to break REST[1] caching by eliminating proxies from SSL (for some good, some bad - reasons).

And now we're trying to shoehorn it back in?

It used to be that a local caching squid proxy was a great way to make load times of various "front pages of the Internet" bearable on a shared low bandwidth uplink (local/national news sites etc typically being served from the cache/lan).

New ssl/tls kinda-sorta breaks that (there's no middle ground - either install intercepting cert that catches everything, or abandon caching on everything. Either cache CNN. com and medical records, email(webmail) and Facebook messages - or neither).

AMP might be a bridge too far - but some kind of (semi) public "signed, not encrypted" would still be a good fit for hypertext applications/documents - because of the caching benefits.

[1] As excellently outlined and contrasted by Fielding in his thesis: https://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm

nialv7 · on April 17, 2019

"Oh, you don't like the AMP url? OK, how about, we solve this problem, BY MAKING YOUR BROWSER LIE TO YOU. How does that sound?" - Google, probably.

If you look back, Google has made multiple attempts at "improving" the url. It starts to become clear what they are trying to do now.

api · on April 17, 2019

I think everyone sees clearly that AMP is a power grab but fishes for proximate reasons to reject it instead of just sticking to that.

I don't care how AMP works. It's a power grab. Done.

jxdxbx · on April 17, 2019

the preloading and caching seems like a marginal speed boost. the main win from AMP is just the stripped-down format, which does not require the cache.

jononor · on April 17, 2019

People here have been complaining that AMP is a power-grab from the very start. So I do not see this as a strange reaction.

matt4077 · on April 17, 2019

It works best if you try understanding everything from the perspective of someone tail’ing Apache logs and fiddling with their kernel parameters.

wayneftw · on April 17, 2019

Different groups of people are commenting at different times.

Also, Mozilla members rally around a ton of stuff here on HN. That's why you see so many posts about Rust despite the fact that it's not really that popular. That's also why the top comments on stories about MS Edge switching to Chrome where lamenting the fact that they didn't choose Firefox, despite the fact that hardly anybody uses Firefox.

albertgoeswoof · on April 17, 2019

Let’s hope AMP, like most google products, is shut down within the next 2-3 years

mrweasel · on April 17, 2019

What's wrong with it exactly... beside being weird. I'm not a fan of manipulating the URL the way they do with this change, but couldn't you just opt to not use AMP if you don't like it?

Ideally people would develop fast sites on their own, but apparently they need the help of Google.

untog · on April 17, 2019

If you don't use AMP your search engine placement suffers. Often dramatically, as all the pages in Google's top-most carousel are all AMP pages.

And AMP is a pain in the ass. It's sold as being "just HTML" but it isn't, really. You can't even use an <img> tag, it has to be <amp-img>. So you have to generate two versions of every page. Achievable for large companies but if you don't have a lot of resources that's a big overhead. As is so often the case, it helps concentrate all web traffic to a smaller and smaller number of sites/publishers and shutting the rest out. That's not good.

nerdjon · on April 17, 2019

The issue is that you can't, or you risk your site being basically blacklisted from google. Especially if your a news site.

Users have no control outside of not using google. If google were to provide a setting for the user to never see AMP, I would have less issue with this. But they don't

Instead, they basically force publishers to use this because if they don't the news carousel will not show their article. It just gives Google more control over the web for minimal at best benefits

galaxyLogic · on April 17, 2019

I don't known much about AMP so my question is why can't it be a standard?

If there are some benefits to it why shouldn't those benefits be standardized? Is Google preventing the standardization of AMP?

vxNsr · on April 17, 2019

> but couldn't you just opt to not use AMP if you don't like it?

This is false. as a user I cannot easily opt out of using AMP

sieabahlpark · on April 17, 2019

Nah I see AMP staying around long enough to capture a significant portion of web share that they mine data from. Essentially is just a way to insert themselves as "the internet"

dreamcompiler · on April 17, 2019

Google products that suck the most tend to last a long time. See Google+.

ganeshkrishnan · on April 17, 2019

Amen!

Kiro · on April 17, 2019

Why is that? I think AMP is great.

zavi · on April 17, 2019

Users love it so unlikely.

Illniyar · on April 17, 2019

We need to decouple two things that are mashed together in this post:

Web packaging and Signed exchanges seems benign and beneficial, you can sign a particular page inside a package (let's say a zipped folder of some kind) and now anyone can cache that data and show it, while both the browser and the user knows that it's safe to display it. Since the AMP format is similar, it seems quite beneficial to now have all your AMP content support this feature. And anyone who made some of their pages AMP can use that same process to support other Signed Exchanges (such as p2p networks or CDNs) . This is great since it makes distributed caching much easier.

The bad part is that google search uses this signed exchange format not to show the actual URL but rather put it in an iframe inside chrome (and only chrome). The real question is whether we will be able to use this functionality outside search, if I have my own site and show a large iframe with signed exchange page, will I also be able to change the browser url bar? mmph, probably not.

gregable · on April 17, 2019

There is a little confusion here, understandable. Google search will not show these signed exchanges in an iframe, the pages are full frame.

Try it for yourself. Using Chrome 73 or later (you probably already have this), and a mobile browser (either a phone or mobile emulation), try the query [amp dev success stories].

It will only use signed exchanges in Chrome because currently only Chrome supports signed exchanges. The search engine explicitly looks for the browser to state that it supports signed exchanges in an Accept header, like any other new technology.

Yes, any page can use this. So, for example if you went and fetched a signed exchange from https://amppackageexample.com/ (or any other site that supports one, this is just an example), you could then serve that from your own server, more or less just like any other file (the less is that you need to set the right Content-Type header, but it otherwise works just like serving an image or a zip file).

Then, if a user visited the URL on your site https://yoursite.com/cached-copy-of-amppackageexample.com/ then the browser would display https://amppackageexample.com/ in the URL bar, as though that URL had 301 redirected, but without the extra network fetch.

Google search does exactly this, just loading a cached copy of the Signed Exchange, and any other cache (or even any website) can do the same.

tW4r · on April 17, 2019

Forgive my lack of know-how, but does this theoretically mean I could download this _signed package_ to my computer along with the signature and use it later to prove that the information was provided by the source according to the signature?

themacguffinman · on April 17, 2019

Yes. You can see this among other planned use cases for the web packaging spec here https://wicg.github.io/webpackage/draft-yasskin-webpackage-u...

IanCal · on April 17, 2019

I'm not quite following which parts were or weren't needed for what's been enabled in the post here, for the usecase of delivering a single offline package that can be opened like a website, is there something that works yet? Or a repo I should be following other than the spec?

Once I can create webpackages and deliver them to clients a lot of thing I want to do become hugely easier and nicer.

themacguffinman · on April 17, 2019

I believe Chrome has already shipped an implementation, I don't know any more details unfortunately. It's still in the standardization process.

I know it's not exactly easy to follow but the only implementation repo I can think of to follow right now is the Chromium repo.

IanCal · on April 17, 2019

Oh interesting! I'll see what I can find there, thanks!

I also had a look in the blog and the "progressive web apps" might be the right thing to look at. There's probably something subtle that's different but I think I can use these to solve the actual problem I have.

https://developers.google.com/web/updates/2019/03/nic73?hl=h...

edit - damn, I don't think this is right at all. Frustrating as it seems pretty perfect but I have to serve from my own domain for 30s before a user can install it :( I just want a single file way of delivering web content! It seems like all the features are basically there, just with restrictions to focus on different use cases.

gregable · on April 17, 2019

You could prove the document was signed using the source's private key. That does prove the document was signed by the source if you can prove that only the source had access to the key.

abraham · on April 17, 2019

rum3 · on April 17, 2019

How does the browser verify that the AMP is up to date?

gregable · on April 17, 2019

Good question. The publisher signs an expiration timestamp in the Signed HTTP Exchange. The publisher can choose this timestamp and the browser will not respect signatures with expirations in the past. Note also that the specification requires, and browsers enforce, that the expiration cannot be more than 7 days in the future.

e12e · on April 17, 2019

Wouldn't it be better to borrow from HTTP and allow a head request to the original source - with a reply of a current signature?

Isn't this whole exercise really just adapting public key signatures on top of old school caching?

With a http proxy you ask for an url, the proxy fetches or serves on behalf of the owner. This adds some circumvention around the way tls/ssl breaks that type of caching. But it should still be able to do a head-like request for a current signature - with no need to download the content again if it is unchanged?

gregable · on April 17, 2019

There is in fact some draft language around this kind of a mechanism to update a signature to extend the lifetime of the document by fetching a remote URL. See https://tools.ietf.org/id/draft-yasskin-http-origin-signed-r... .

Doing this on every page load breaks either user privacy (by making the origin fetch before the user clicks) or the preload performance gain itself (by blocking load while waiting for this round trip).

e12e · on April 18, 2019

But if the signature is expired, preload would fail anyway, which would trigger a regular load "on click" - but that click should maybe result in a head request for possibly just getting an updated signature?

gregable · on April 20, 2019

The intermediary (Google in this case) can choose not to serve an expired exchange.

codedokode · on April 17, 2019

It's ridiculous. Google wants to keep users at its domain so much that it invents a whole technology to substitute address bar contents. This shows how harmful it is when a company has a significant market share in several different areas (browsers and search engines).

I hope at least Mozilla doesn't adopt this technology and will show the true URL.

This technology is complicated. Browser vendors have to implement all of this only to please Google.

lucb1e · on April 17, 2019

Indeed, Google just has too much market share.

Last week I blocked Google from my domains (blog: lucb1e.com/!130), hopefully others will follow suit and degrade the search quality until people get better results (at least for some more obscure content) elsewhere, or perhaps until Google notices we are really not okay with their behaviour.

spartas · on April 17, 2019

Are you blocking GoogleBot by IP range or User-Agent match? Why aren't you using your robots.txt file to block GoogleBot instead or in addition to your server-side logic?

lucb1e · on April 17, 2019

Robots.txt was my first thought as well, but that is said to not actually block your site from appearing in the results. They'll gather from other sites what the page is about (think <a href=mysite/somepage>how to knit a sweater</a>) and show that as title without page summary. Maybe if it looks like the site is down, they won't bother.

Blocking is based on user agent, they seem to set that reliably and the IP addresses change. You can do some reverse lookup magic but this was way easier than looking up every single IP that visits my site.

Ajedi32 · on April 17, 2019

This is the exact opposite of "keeping users at its domain". That was the situation _before_ they implemented this standard. Now users will get sent to the publisher's domain instead (via a prefetched page load).

proaralyst · on April 17, 2019

But it's served from Google, so they still control all the analytics

Ajedi32 · on April 17, 2019

No they don't. The page contents are controlled by the publisher and cryptographically signed so Google can't alter it. Another improvement over the previous situation.

josteink · on April 17, 2019

Since it’s not fetched from the publisher they will have to use Google analytics or have nothing.

Guess what they’re going to choose.

Ajedi32 · on April 17, 2019

Google Analytics isn't the only analytics solution out there. Publishers can use literally _any_ method of gathering analytics that's not server logs.

strictnein · on April 17, 2019

Remember the talk about how the Chrome team was going to "rethink" the navbar, and what domain and site identity really mean? And people were a little worried about this?

Turns out people were right to be suspicious. This is hot garbage. You can no longer ask a user "What URL does your navbar say you're at?". It is no longer a source of truth. They will actively be lied to.

Agebor · on April 17, 2019

But what does it mean that you are on a particular URL?

For a long time already it's not being connecter to a particular physical server. Now it's the next step - to be completely decoupled from the server and just mean content instead.

propogandist · on April 17, 2019

This is meant to offload tracking from just Google Analytics and SERP clicks, which is used to track user behavior (but can be blocked) into services that cannot be blocked beyond Google domains.

If Google hosts the website and is masking the resulting url, they're able to have more visibility than Google analytics. They'll likely give this AMP some SEO boost temporarily and that will get web admins to adopt the technology.

It's just like reCaptcha, which is used to track users across the web (requires google.com + gstatic.com urls to load, which drops its own cookies or scans existing ones), blocking recaptcha will break core web functionality... and recaptcha v3 is even worse.

48309248302 · on April 17, 2019

Web publishers don't necessarily want their content decoupled from their own servers, but they don't have a choice now if they depend on traffic from Google.

codedokode · on April 17, 2019

You are not decoupled from the server. Google still sees HTTP request you make in plaintext and collects your data according to their privacy policy. It just won't be obvious because of publisher's URL in the address bar.

notatoad · on April 17, 2019

there was no need to be suspicious. google wasn't being sneaky about it, they have been actively talking about, promoting, and openly developing this feature for at least a year.

48309248302 · on April 17, 2019

This sounds terrible. Does it mean that browsers will begin lying to users and say that the users are visiting the website's server when they are really visiting a restricted version of the website that is hosted in Google's cache? I don't want my content restricted or hosted in Google's cache.

AMP doesn't load in a privacy sensitive way. It's on Google's servers and it takes many seconds to load if you have JavaScript disabled.

Also, the feature only works on Google Chrome and possibly Edge, which gives another point to the article below.

https://www.zdnet.com/article/former-mozilla-exec-google-has...

AMP is a fundamentally bad idea that needs to disappear.

Edit: Mozilla has marked Signed HTTP Exchanges as harmful.

https://mozilla.github.io/standards-positions/

gregable · on April 17, 2019

The browser displays the URL from the origin that digitally signed the unmodified content.

A browser already doesn't show you what server delivered the content. That would be your wifi AP, cell phone tower, or ISP node. The internet has already long established that we can trust content without trusting intermediaries.

There are two elements that are important: integrity and privacy. The content integrity is protected via a digital signature, the "signed" part of "signed http exchanges". The signature proves that the document hasn't been tampered with.

Regarding privacy: The intermediary (a search engine in this case) already has the content being delivered as a result of crawling it. It also knows the user clicked on a link to get that content, and knows the user's ip address. Even without AMP or Signed Exchanges, the privacy situation is the same. Once the page is loaded, all further interactions with the origin are normal https traffic, so later requests are not different in privacy either.

What this enables, for search results, is the ability to load the bytes of the content before the user clicks a search result. If the browser prefetched those bytes with the origin's awareness, then the user's privacy with respect to the search query would be violated, making prefetch problematic. With this setup, documents can be prefetched while preserving user privacy and after the user clicks all browser behavior continues as normal from that point forward.

rando444 · on April 17, 2019

AMP allows Google to see exactly how you interact with every page on the internet.

Just from the text of the pages you visit they can build a profile around you. What your interests are, how much of an article you're likely to finish, whether you're the type of person to highlight text as you read, etc.

Unless you live on an island with a poor satellite connection AMP is useless as anything more than a corporate user data collection tool.

gregable · on April 17, 2019

AMP documents don't share user data with Google, which can be trivially seen by inspecting the network events that the page generates.

If the publisher chooses, they can send logging to Google Analytics, but this is not part of AMP.

The typical argument otherwise is that the AMP javascript is loaded from Google's cache, however these javascript resources allow for a very long cache lifetime (1yr if the page came from the Google Cache), so relatively few page loads will actually end up fetching them from the network for most users.

Edit: These resources are also on cookieless domains.

Mindwipe · on April 17, 2019

> The typical argument otherwise is that the AMP javascript is loaded from Google's cache, however these javascript resources allow for a very long cache lifetime (1yr if the page came from the Google Cache), so relatively few page loads will actually end up fetching them from the network for most users.

Christ this is thin as a privacy argument.

codedokode · on April 17, 2019

> AMP documents don't share user data with Google, which can be trivially seen by inspecting the network events that the page generates.

Is there anything preventing Google from changing this later?

andrerm · on April 17, 2019

No, if Google can change the way web works from day one they can change anything they want. Don't forget Google is killing imap and dns already. Why not http to?

codedokode · on April 17, 2019

Also, Google explicitly states that it is collecting data in AMP Viewer [1]:

> The Google AMP Viewer is a hybrid environment where you can collect data about the user. Data collection by Google is governed by Google’s privacy policy.

I assume they collect information from HTTP request the browser sends when requesting an AMP page.

[1] https://developers.google.com/search/docs/guides/about-amp#a...

cced · on April 17, 2019

> AMP documents don’t share user data with Google

They might not now, but could ‘t Google start creating unique URLs on each page, allowing them to track you that way?

meowface · on April 17, 2019

They can already do that, and are doing so, through Search, Analytics (maybe), ads, etc. That war is long lost.

duozerk · on April 17, 2019

They can't if you block all their shitty domains and don't use google services. Things that many privacy-conscious users do.

dwild · on April 17, 2019

We are talking about their AMP cache. If you don't use Google Service, except if you like to prepends their amp cache URL before your links, you'll never get there.

Their AMP cache happens only on their search service. They already know which links you click... having an AMP cache on top doesn't give them MORE information than they already get. The use of that cache also make sure the website doesn't get more information because it's preloaded.

duozerk · on April 17, 2019

That's not entirely true though, is it ? any link shared on reddit, or here, on on any social network by a chrome user can be an amp one.

snaky · on April 17, 2019

If (or when) the share of that privacy-conscious users will rise, Google might motivate webmasters to compile GA scripts in the main JS script, and considering pretty much any website now a days just doesn't show content with no Javascript enabled, it would be much harder to avoid.

duozerk · on April 17, 2019

I browse mostly without javascript on and that's not true; easily more than half of websites work just fine without it, and that number goes far up if you accept some lack of features. Though there are some that indeed don't work at all.

Although your point is well taken that there could be ways to sneakily track users eventually despite the aforementioned measures, and potentially even without javascript being required (though I doubt that share of privacy-concious users will ever raise significantly - most people simply don't care).

gsich · on April 17, 2019

No excuse.

48309248302 · on April 17, 2019

Google can't tell if a link has been clicked if JavaScript is off and the `ping` attribute is removed, so AMP removes privacy there.

By forcing web publishers to host their content on a Google cache, they lose their server-side logging and the ability to determine how they set up they way they serve their own sites.

Also, why do you artificially slow page loads on AMP pages to 8 seconds when JavaScript is disabled? That is a privacy issue.

gregable · on April 17, 2019

The linker (google in this case) could rewrite the link to use a redirector if they choose. If Javascript is off, AMP and thus Signed Exchanges are disabled on Google search results anyway.

You misunderstand the 8 second CSS animation in the AMP boilerplate. Here's the code (simplified):

  <style>
    body { animation:-amp-start 8s steps(1,end) 0s 1 normal both}
    @keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}
  </style>
  <noscript>
    <style amp-boilerplate>
      body{animation:none}
    </style>
  </noscript>

See the noscript section: if javascript is disabled, the CSS displays the body immediately. If Javascript is enabled, but for some reason the AMP javascript fails to load, after 8 seconds, the page is displayed anyway. The page is probably somewhat broken without the javascript loading, but the 8s is a fallback, not code to slow down non-javascript browsers.

48309248302 · on April 17, 2019

There are legitimate (privacy/speed) reasons to not load AMP's JavaScript while still not turning of JavaScript entirely. Google does have the capability to know when you're on an AMP page, because the JS loads from ampproject.org, which is registered by Google.

An 8-second delay seems like an intentional "bug" to coerce users to turn on JavaScript (and advertising).

gregable · on April 17, 2019

The javascript is heavily cached, so will not give a request on every page load.

That is not the intention. If javascript is disabled entirely, Google Search won't even load AMP pages. The scenario you describe of a user loading an AMP page directly without javascript enabled is somewhat rare.

48309248302 · on April 17, 2019

Many people use tools to block third party JS from loading. AMP can't be called privacy-friendly while making it extremely difficult to use when tracking (AMP Analytics) is blocked. The 8-second delay happens to me every time I accidentally click an AMP URL in my browser.

lucideer · on April 17, 2019

I don't use Google Search, and I frequently get sent to Google's AMP cache via other link sources (e.g. HN).

I don't have javascript blocked, but I do have Google's tracking blocked via standard tracking protection (which is now a built-in feature in most non-Google browsers), which means <noscript> tags are not triggered, and I get the 8 second delay due to non-loading JS resources.

I don't think my setup is as rare as you make out.

UweSchmidt · on April 17, 2019

It is clear that the current developments on the web are worrysome and we need real privacy. We need to be able to find a website and visit it completely anonymous, unless we actively submit information to said website or a court order is issued.

A cell phone tower or ISP node is ideally just infrastructure, "plumbing". Google seems to be trying to advance their strategic position in that direction. Rather than just being one search engine among several, they are trying to become part of the infrastructure. This could prevent future privacy solutions (and even prevent competitions between search engines).

codedokode · on April 17, 2019

The real reason to make this spec is not to improve integrity o privacy or something else but to make users stay on Google's domain instead of going to other site. Google wants to build its little walled garden, and this spec is needed to make users think that the walls aren't there.

> With this setup, documents can be prefetched while preserving user privacy and after the user clicks all browser behavior continues as normal from that point forward.

But Google can already preload and show cached version of the page without this spec. The only difference would be that address bar shows "google.com" instead of publisher's domain. There is no need for this specification.

josteink · on April 17, 2019

> A browser already doesn't show you what server delivered the content. That would be your wifi AP, cell phone tower, or ISP node.

No. Incorrect. Completely backwards. Factually wrong. You just failed your networking-exam.

Those things you mentioned would be transparent networking nodes forwarding your TCP-packets and they have nothing to do with any layers above that.

The fact that you don’t even know this completely invalidates any other point you may have.

ddalex · on April 17, 2019

I think you're missing the point of the GP. It says that you don't know and you don't care which particular server returns your content - is it a self hosted machine, is it a cloud machine, is it a CDN? No way of knowing unless you inspect the deeper stack. What you see very visible is which BRAND (I. E. URL) returned your content.

So this Amp exhange technology changes nothing in this regard. It's like Google provides its own Free CDN, it is just not done in a traditional manner.

josteink · on April 17, 2019

> It says that you don't know and you don't care which particular server returns your content

Which is plain wrong. I care.

When the URL-bar says I’m looking at company.com, I expect my browser to have used my OS’s DNS-resolver to look that name up, connect to the IP-given and nothing else.

I certainly don’t expect it to send traffic to certainly-not-the-nsa.com which are MITMing my traffic and tracking/monitoring it.

If I can’t trust my browsers URL-bar to exclusively and accurately reflect what is actually requested, it is effectively lying to me, the user, it’s owner.

And then suddenly all URLs are phishing URLs because Google made URLs no longer matter or mean anything.

Completely unacceptable.

ddalex · on April 20, 2019

My point is that even if you look at the URL bar currently and it says company.com, you don't know what you're connecting to. Probably you're connecting to CloudFlare/CloudFront/Akamai/Fastly/any other CDN which is set up with good-enough certs to impersonate the domain. Therefore you're not trusting a particular server, you're trusting a relationship that the domain owner built with her's service providers.

The proposed scheme is just another way to extend this kind of relationship that the publisher builds, a new mechanism if you will. There is nothing in there that requires more or less trust from your part than before.

You're complaining that need URLs to reflect what is requested - in fact, I argue that you want the URL to tell you what is being served. But this is not what's currently happening.

URLs are already lying to you.

I doubt that you WHOIS-lookup all DNS resolved-IPs to verify that the IP presenting a cert is assigned to the organisational entity that you want to connect to, and have a whitelist of those entities that you actually allow your browser to connect to. Because that's what currently required to make sure you don't go through CDNs and other intermediaries between you and the publisher.

josteink · on April 21, 2019

Using a CDN currently means the company use trusted mechanisms like DNS to delegate certain traffic to other providers (like with Cloudflare). And it does so for everyone.

In which case the URL serves what was requested.

What AMP does is provide google.com content and lie to the user and says it comes from company.com.

Which isn’t true, and it only does so for users coming from google.com. Where I’m sure google will be happy for the additional tracking data.

This is NOT the url the user was lead to believe he requested. This is not what everyone else is served.

This is malware.

chii · on April 17, 2019

> I don't want my content restricted or hosted in Google's cache.

how is this different than using your own domain, but pointing it to a github.io page? Or using medium, but with your own domain (but still being served from medium's servers)?

Is it just google you're adverse to, or the entire idea of someone else hosting your content?

48309248302 · on April 17, 2019

1) I want full control over my servers and to not be penalized in search engines for not hosting my sites on Google. Where are the server-side logs?

2) I want full control over how I publish my sites with real web standards. AMP is not a web standard, it's a Google format that they are strong-arming people into using.

3) Mozilla considers Signed HTTP Exchanges harmful. This technology is as bad as what Microsoft was doing with IE in the old days.

4) I don't publish on Github pages, but if I did, I would still have a choice over which servers I put the sites on.

5) There shouldn't be a single company (or few companies) that dictates how we publish online.

6) Shame on the people who are splitting the web with this fake-opensource technology. There's even a Google engineer over here referring to the Web like it's a Google product. https://news.ycombinator.com/item?id=19631136

Operyl · on April 17, 2019

As per point 6, I wouldn’t take what was said there as a statement from Google, or potentially even an employee of Google. They did it as a throwaway .. anybody wishing to kick the hornets nest could have posted that, employee or not.

48309248302 · on April 17, 2019

It's not written like someone trying to kick a hornet's nest. It's written like someone who has been conditioned inside of a culture that has begun to view the Web as a Google product on some level.

Operyl · on April 17, 2019

And if somebody was wanting to kick a hornets nest, that’s exactly how you’d want to write it :).

My point is, you cannot just blindly trust anonymous comments to be who they say they are, it’s an easy way to get yourself in trouble.

laggyluke · on April 17, 2019

But if the comment was, say, digitally signed, on the other hand... ;)

millstone · on April 17, 2019

DNS is the answer to the first two questions.

However the last question is a fair point - nobody complains about CloudFlare's caching of your web page as you designed it.

The critique of AMP is that it receives privileged placement in search results, and that content authors are being pressured into adopting this de-facto Google-controlled spec, where they host your content and control its presentation. Anything that furthers AMP helps Google in this effort.

skybrian · on April 17, 2019

That's a good point! Domain owners can host their websites wherever they like, and yes that includes Google's cloud.

If they go through a content network like Cloudflare, you can't even tell who's hosting the site by looking at the IP address.

It drives home the point that websites are abstractions that have no necessary relationship to any particular physical hardware. Network tools may or may not tell you a bit more about the source, depending on if there are any leaks in the abstraction.

48309248302 · on April 17, 2019

There is a difference between the web publisher controlling that abstraction and a web publisher that has been strong armed into one abstraction or another.

skybrian · on April 17, 2019

There are incentives, but publishers still make their own decisions.

48309248302 · on April 17, 2019

Being penalized in the search results is outright coercion, not an incentive.

young_unixer · on April 17, 2019

I didn't even know about "HTTP Exchanges", and I'm more interested than ~98% of the population about this kind of stuff.

Showing the name of the "signer" in the address bar, instead of the server where the content is actually hosted goes against decades of browser UI design.

Good on Mozilla for marking it as harmful.

jedberg · on April 17, 2019

> Showing the name of the "signer" in the address bar, instead of the server where the content is actually hosted goes against decades of browser UI design

Does it though? If you use Cloudflare or Akamai or Cloudfront or Netlify or etc. etc. then what shows up in the URL bar is not the server where the content is actually hosted. Well, it is the server where it is hosted, it's just one of the many domains hosted by that server.

luckylion · on April 17, 2019

That has never been different. Cloudflare & co are reverse proxies, for all intents and purposes from a user agent view, they are where the content is coming from. They are the ones pointed to in DNS, and they have valid SSL certs.

jedberg · on April 17, 2019

And how is this all that much different? In fact I would say it's more secure. DNS can be spoofed pretty easily. This is a cryptographically signed package. If anything, I'd have more faith in this changing my URL than a proxy via DNS.

Just because Google invented it doesn't make it bad.

lol768 · on April 17, 2019

> In fact I would say it's more secure. DNS can be spoofed pretty easily. This is a cryptographically signed package

How is it more secure? If, as you say, DNS can be spoofed easily - I can easily get a certificate issued with the required extension and make a "cryptographically signed package".

jefftk · on April 17, 2019

> If, as you say, DNS can be spoofed easily - I can easily get a certificate issued with the required extension and make a "cryptographically signed package".

Spoofing DNS to clients is much easier than spoofing DNS to certificate authorities. Otherwise domain-validated HTTPS certs wouldn't mean much.

luckylion · on April 17, 2019

> And how is this all that much different?

It changes the meaning of the address bar from "this is who I'm talking to" to "this is who (at some point in time) signed this content".

jedberg · on April 17, 2019

But when there is a CDN there, "who I'm talking to" is really just an intermediary who pretends to be you, and may have in fact modified the content. With this, it is still an intermediary pretending to be you, but at least now the package is signed and can be verified.

luckylion · on April 17, 2019

The CDN is you, for all intents and purposes. It's your agent in the back and forth, as much as your hosting provider would be. A third-party cache isn't.

I don't mind that you can sign and verify content, that's fine and useful. I'm just not a fan of changing the address bar's meaning.

jedberg · on April 17, 2019

But what I'm saying is that the meaning that you ascribe to the address bar is incorrect -- it already only tells you who published the content, not who you are actually connected to.

What I'm saying is that this does not change the meaning of what's in the URL bar. It's the same as before. It tells you who published the content originally.

luckylion · on April 17, 2019

> it already only tells you who published the content

No, it tells you the origin of the document. If you are the creator, and you choose to put your content on server X it will tell you "I've got this from server X". Whether that server is a reverse proxy or a shared webhost or a dedicated server in a DC or a raspberry pi running on your desk doesn't matter - it's the designated original that you, the owner of example.org chose.

That's what it always meant, and it changes when you do a redirect, and it shows you the current URL even if there is a canonical header of http-equiv. I can put a reverse proxy on my host and proxy example.com to example.org - the address bar tells you that you're reading example.com, not example.org, as it should, because you're connected to me, not to example.org.

dwild · on April 17, 2019

This is just semantic.

Do a trace route on any domain and you'll see that the server isn't the one that give you the answer, but some intermediary. Sure in that case when you did the request, the content is fresh and the server answered RIGHT NOW, but that cache still get the content from the server, it's just a bit older.

jazoom · on April 17, 2019

I decided it's time to give DuckDuckGo another shot. I just realised it's a lot nicer to scroll through its results than Google is now.