
HTTP/2 seems to be designed for (graphical, interactive) webpages.

The maintainer of a popular webserver has suggested HTTP/2 is slower than HTTP/1.1 for file download.

https://stackoverflow.com/questions/44019565/http-2-file-dow...

As I stated, I use HTTP/1.1 pipelining every day. I use it for a variety of information retrieval tasks, even retrieving bulk DNS data. To give an arbitrary example, sometimes I will download a website's sitemaps. This usually involves downloading a cascade of XML files. For example, there might be a main XML file called "index.xml". This file then lists hundreds more sitemap XML files, e.g., archive-2002-1.xml, archive-2002-2.xml, which together contain every content URL on the website from some prior year up to the present day.

Using a real-world example, index.xml contains 246 URLs. Using HTTP/1.1 pipelining I can retrieve all of them into a single file using a single TCP connection. Then I retrieve batches of the URLs contained in that file, again over a single TCP connection. Many websites allow thousands of HTTP requests to be pipelined over a single TCP connection, but I usually keep the batch size at around 500-1000 max. Of course I want the responses in the same order as the requests.
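On the wire, pipelining is simple: the client writes the requests back-to-back on the single connection before reading any responses, and the server returns the responses in the same order. A simplified sketch of the request stream (not the exact output of my tools):

    GET /sitemaps/archive-2002-1.xml HTTP/1.1
    Host: [domainname]
    Connection: keep-alive

    GET /sitemaps/archive-2002-2.xml HTTP/1.1
    Host: [domainname]
    Connection: keep-alive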

The process looks something like this:

    ftp -4o 1 https://[domainname]/sitemaps/index.xml
    yy030 < 1|(ka;nc0) > 2
    yy030 < 2|wc -l

    1337855

1337855 is the number of URLs for [domainname]. Content URLs, not JavaScript, CSS or other garbage.

yy030 is a C program that filters URLs from standard input
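yy030 itself is my own code, but as a rough stand-in, URLs can be extracted from XML on standard input with something like:

    grep -Eo 'https?://[^<"[:space:]]+'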

ka is a shell alias that sets an environment variable read by the yy025 program to control an HTTP header; in this case it sets the "Connection:" header to "keep-alive" instead of "close" (a companion alias, ka-, sets it back to "close")
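In shape it is something like the following, though the variable name here is illustrative, not the real one:

    alias ka='export YY025_CONNECTION=keep-alive'    # illustrative variable name
    alias ka-='export YY025_CONNECTION=close'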

nc0 is a one line shell script

    yy025|nc -vv h1b 80|yy045

yy025 is a C program that accepts URLs on stdin, anywhere from dozens to thousands at a time, and outputs customised HTTP requests
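As a rough stand-in for what yy025 emits, a pipelined request stream can be generated from a list of URLs with awk (this sketch ignores details such as switching the final request to "Connection: close"):

    awk '{
        split($0, u, "/"); host = u[3]
        path = substr($0, index($0, host) + length(host))
        if (path == "") path = "/"
        printf "GET %s HTTP/1.1\r\nHost: %s\r\nConnection: keep-alive\r\n\r\n", path, host
    }'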

h1b is a HOSTS file entry containing the address of a localhost-bound forward TLS proxy

yy045 is a C program that removes chunked transfer encoding from standard input
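De-chunking is simple in principle: read a hexadecimal chunk-size line, copy that many bytes, and repeat until a zero-size chunk. A minimal sketch of the idea in C, assuming the headers have already been stripped (the real yy045 may well differ):

    /* decode a chunked HTTP body on stdin to stdout */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        char line[1024];
        for (;;) {
            if (!fgets(line, sizeof line, stdin))   /* chunk-size line */
                break;
            long n = strtol(line, NULL, 16);        /* hex; stops at ';' extensions */
            if (n <= 0)                             /* zero-size chunk = end */
                break;
            while (n-- > 0) {
                int c = getchar();
                if (c == EOF)
                    return 1;                       /* truncated input */
                putchar(c);
            }
            fgets(line, sizeof line, stdin);        /* consume trailing CRLF */
        }
        return 0;
    }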

To verify the download, I can look at the HTTP headers in file "2". I can also look at the log from the TLS proxy. I have it configured to log all HTTP requests and responses.
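For example, as a rough check, the number of response status lines in file "2" should match the number of requests sent:

    grep -c '^HTTP/1.1' 2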

Is this a job for HTTP/2? It does not seem like it.

This type of pipelining using only a single TCP connection is not possible using curl or libcurl, nor is it possible using nghttp. Look around the web and one will see people opening up dozens, maybe hundreds, of TCP connections and running jobs in parallel, trying to improve speed, and often getting banned. As the Jetty maintainer's comment suggests, I suspect using HTTP/2 would actually be slower for this type of transfer. It is overkill.

IMHO, HTTP in the general sense is not just for requesting webpages and the resources they embed.

I find HTTP/1.1 to be very useful. It is certainly not just for requesting webpages full of JS, CSS, images and the like. That is only one way I might use it. Perhaps HTTP/2 is the better choice for webpages. TBH, if using a "modern" graphical browser, I would be inclined to let it use HTTP/2. Most of the time I am not using a graphical browser.

