Hello everyone, it's the author here. I initially created 13ft as a proof of concept, simply to test whether the idea would work. I never anticipated it would gain this much traction or become as popular as it has. I'm thrilled that so many of you have found it useful, and I'm truly grateful for all the support.
Regarding the limitations of this approach, I'm fully aware that it isn't perfect, and it was never intended to be. It was just a quick experiment to see if the concept was feasible—and it seems that, at least sometimes, it is. Thank you all for the continued support.
Running a server just to set the User-Agent header to the Googlebot one for some requests feels a bit heavyweight.
But perhaps it’s necessary, as it seems Firefox no longer has an about:config option to override the user agent…am I missing it somewhere?
Edit: The about:config option general.useragent.override can be created and will be used for all requests (I just tested). I was confused because that config key doesn’t exist in a fresh install of Firefox. The user agent header string from this repo is: "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
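The whole trick really is just that one header. A minimal Python sketch of what the server (or a browser override) does, using the user agent string from the repo; the target URL is a placeholder:

```python
import urllib.request

# Googlebot user agent string from the repo ("W.X.Y.Z" is a literal
# placeholder that appears in Google's documented string).
GOOGLEBOT_UA = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 "
    "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

# Build a request carrying the spoofed header; example.com is a stand-in
# for whatever paywalled article you want to fetch.
req = urllib.request.Request(
    "https://example.com/some-article",
    headers={"User-Agent": GOOGLEBOT_UA},
)
# urllib.request.urlopen(req) would then fetch the page with that header set.
```

That's all 13ft does at its core, just wrapped in a small web service.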
Yes, where "cares" means "the lost revenue exceeds the cost of development, QA, computational/network/storage overhead, and the added complexity of a function that figures out whether people are faking their user agent."
That cost is probably orders of magnitude greater than the revenue loss from the tiny minority of people doing such things, especially given that not everyone who uses tools like these would become a subscriber if blocked, which cuts the "lost" revenue down even further.
Even if it's not worth an actual site operator's time to implement such a system themselves, WAFs like Cloudflare could easily check the IP address of clients claiming to be Googlebot/Bingbot and send them to CAPTCHA Hell on the site's behalf if they're lying. That's pretty low-hanging fruit for a WAF; I would be surprised if they don't do that.
Edit: Indeed, I just tried curling cloudflare.com with Googlebot's user agent and they immediately gave me the finger (403) on the very first request.
I also think the antitrust suit (and many more) need to happen for more obvious things like buying out competitors. However, how does publishing a list of valid IPs for their web crawlers constitute anticompetitive behavior? Anyone can publish a similar list, and any company can choose to reference those lists.
Hmm, the robots.txt, IP blocking, and user agent blocking are all policies chosen by the web server hosting the data. If web admins choose to block Google competitors, I'm not sure that's on Google. Can you clarify?
A nice example is the recent Reddit-Google deal, which gives Google's crawler exclusive access to Reddit's data. This just serves to give Google a competitive advantage over other search engines.
Well yes, the Reddit-Google deal might be found to violate antitrust. Probably will, because it is so blatantly anticompetitive. But if a publication decides to give special access to search engines so they can enforce their paywall but still be findable by search, I don't think the regulators would worry about that, provided that there's a way for competing search engines to get the same access.
This is false, the deal cuts all other search engines off from accessing Reddit. Go to Bing and search for "news site:reddit.com" and filter results by date from the past week - 0 results.
It kind of is. If Google divested search and the new company provided utility style access to that data feed, I would agree with you. Webmasters allow a limited number of crawlers based on who had market share in a specific window of time, which serves to lock in the dominance of a small number of competitors.
It may not be the kind of explicit anticompetitive behavior we normally see, but it needs to be regulated on the same grounds.
Regardless of whether Google has broken the law, the arrangement is clearly anticompetitive. It is not dissimilar to owning the telephone or power wires 100 years ago. Building operators were not willing to install redundant connections for the same service for each operator, and webmasters are not willing to allow unlimited numbers of crawlers on their sites. If we continue to believe in competitive and robust markets, we can't allow a monopolistic corporation to act as a private regulator of a key service that powers the modern economy.
The law may need more time to catch up, but search indexing will eventually be made a utility.
It's clearly meant to starve out competitors. Why else would they want website operators to definitively "know" if it's a Googlebot IP, other than so that they can differentiate it and treat it differently?
It's all under the guise of feel-good language like "make sure it's Google, and not some abusive scraper." But the end result is pretty clear. Just because they have a parallel construction of a valid reason for doing something doesn't mean they don't enjoy the convenient benefits it brings.
If this is all it's doing then you could also just use this extension: https://requestly.com/
Create a rule to replace user agent with "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
It used to be a good extension. Now it is crapware tied to web services. I don't want any web services, I don't want to see ads for paid features; I want a free extension that works entirely locally and doesn't phone home.
It says it blocks ads and other things, too. I imagine the use case is someone wanting this for multiple devices/people so they don't have to set up an extension on every platform/device individually. I have no idea how effective it is.
You can make a search-related function in Firefox by right-clicking on that box and choosing 'Add a keyword for this search': https://i.imgur.com/AkMxqIj.png
Then, in your browser, just type the letter you assigned: for example, I have 'i' mapped to the IMDB search box, so typing 'i [movie]' in my URL bar brings up the IMDB search for that movie. https://i.imgur.com/dXdwsbA.png
So you can just assign 'a' to that search box and type 'a [URL]' in your address bar, and it will submit it to your little thing.
Eh, pretty minimal risk unless you use a guessable hostname and/or the URL gets published somewhere.
If the install is under "eixk3.somedomain.com/ladderforfriends" and it sits behind a reverse proxy, it might as well be invisible to the internet, unless your DNS provider is an idiot and allows zone transfers, or you are on a network where someone is snarfing up DNS requests and distributing that info to third parties. If you restrict it to TLS 1.3 with Encrypted Client Hello, even someone sniffing traffic from one of your friends won't be able to glean anything useful, because the requested hostname is never sent in plaintext (without ECH, the SNI hostname still goes out in the clear, even on TLS 1.3).
Rotate the hostname if/when it becomes necessary...
As soon as you get a cert for that domain, you will start getting requests to it because of certificate transparency reports. Everyone will know immediately the site exists.
It seems to me that google should not allow a site to serve different content to their bot than they serve to their users. If the content is unavailable to me, it should not be in the search results.
It obviously doesn't seem that way to Google, or to the sites providing the content.
They are doing what works for them without ethical constraints (Google definitely; many content providers, e.g. the NYT, too). Is it fair game to do what works for you (e.g. 13ft)?
> It seems to me that google should not allow a site to serve different content to their bot than they serve to their users.
That would be the fair thing to do and was Google's policy for many years, and still is for all I know. But modern Google stopped caring about fairness and similar concerns many years ago.
The policy was that if a user lands on the page from the Google search results page, then they should be shown the full content, same as Googlebot (“First Click Free”). But that policy was abandoned in 2017:
I disagree, I think the current approach actually makes for a better and more open web long term. The fact is that either you pay for content, or the content has to pay for itself (which means it's either sponsored content or full of ads). Real journalism costs money, there's no way around that. So we're left with a few options:
Option a) The NYT and other news sites make their news open to everyone without a paywall. To finance themselves they become full of ads and disgusting.
Option b) The NYT and other news sites become fully walled gardens, letting no one in (including Google's bots). They won't be indexed by Google and other search sites, and we won't be able to find their content freely. It's like a Discord server or Facebook groups: there's a treasure trove of information out there, but you won't be able to find it when you need it.
Option c) The NYT and other news sites let Google and other search sites index their content, but ask the user to pay to access it.
> Real journalism costs money, there's no way around that
I agree, but journals should let you pay X to read a single article, where X is much, much lower than the usual amount Y they charge for a subscription. Example: X could be 0.10 USD, while Y is usually around 5-20 USD.
And in this day and age there are ways to make this kind of micropayment work, for example Lightning. An example of a website built around this idea: http://stacker.news
I don't think this will work reliably as other commenters pointed out. A better solution could be to pass the URL through an archiver, such as archive.today:
I agree. Though there is a counterpoint: a Russian host isn't going to respect a DMCA request. On the flip side, it's a Russian replacement for GitHub, possibly based on Gogs, Gitea, or even Forgejo. So yeah, YMMV.
Nice effort, but after one successful NYT session, it fails and treats the access as though it were an end user. But don't take my word for it: try it. One access succeeds. Two or more... fail.
The reason is that the staff at the NYT appear to be very well versed in the technical tricks people use to gain access.
They probably asynchronously verify that the IP address actually belongs to googlebot, then ban the IP when it fails.
Synchronously verifying it would probably be too slow.
You can verify googlebot authenticity by doing a reverse dns lookup, then checking that reverse dns name resolves correctly to the expected IP address[0].
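That two-step check (reverse lookup, then forward confirmation) can be sketched in a few lines of Python; the accepted domain suffixes below follow Google's documented crawler domains:

```python
import socket

# Domains Google documents for its crawlers.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")

def is_real_googlebot(ip: str) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-confirm it."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)  # reverse DNS lookup
    except OSError:
        return False
    if not host.endswith(GOOGLE_CRAWLER_SUFFIXES):
        return False  # claims to be Googlebot but isn't on a Google domain
    try:
        forward_ips = socket.gethostbyname_ex(host)[2]  # forward lookup
    except OSError:
        return False
    return ip in forward_ips  # forward record must include the original IP
```

Any IP that fails a step gets treated as an ordinary (lying) client; the forward confirmation is what stops someone from pointing their own reverse DNS at a google.com name.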
Which leads to the possibility of triggering a self-inflicted DoS. I am behind a CGNAT right now. You reckon that if I set myself to Googlebot and loaded NYT, they'd ban the entire o2 mobile network in Germany? (or possibly shared infrastructure with Deutsche Telekom - not sure)
Not to mention the possibility of just filling up the banned IP table.
Hypothetically if they were doing that, they’d only be ‘banning’ that mobile network in the ‘paywall-relaxing-for-Googlebot’ code - not banning the IP traffic or serving a 403 or anything. They ordinarily throw paywalls at those users anyway.
There are easily installable databases of IP block info, super easy to do it synchronously, especially if it’s stored in memory. I run a small group of servers that each have to do it thousands of times per second.
We need a P2P swarm for content, just like BitTorrent back in the day. Pop in your favorite news article (or paraphrase it yourself), and everyone gets it.
With recommender systems, attention graph modeling, etc. it'd probably be a perfect information ingestion and curation engine. And nobody else could modify the algorithm on my behalf.
"The reason is the staff at the NYT appear to be very well versed in the technical tricks people use to gain access."
It appears anyone can read any new NYT article in the Internet Archive. I use a text-only browser. I am not using Javascript. I do not send a User-Agent header. Don't take my word for it. Here is an example:
If I am not mistaken, the NYT recently had their entire private GitHub repository, a very large one, made public without their consent. This despite the staff at the NYT being "well versed" in whatever it is the HN commenter thinks they are well versed in.
I had to learn more; it sounds like a pretty bad breach. But I'm still pretty impressed by the NYT's technical staff, for the most part, for the things they do accomplish, like the interactive web design of some very complicated data visualizations.
Rich Harris, creator of Svelte, worked at the Guardian. Svelte has been adopted at NYT however [0].
You might be thinking of Mike Bostock, the creator of D3.js and ObservableHQ.com, who led data visualization work at NYT for several years [1]. I'm not sure if they have people of that magnitude working for them now.
Rich did work at the NYT. I thought there was some Mandela effect going on for a second, because you misled me into believing you had actually googled it by providing sources.
Yeah, my bad. I shouldn't have relied solely on the Wikipedia article and my (sketchy) memory. Rich Harris is still listed as a graphics editor on the investigative team at NYT: https://www.nytimes.com/by/rich-harris
Might want to update the Wiki article on Svelte, which strongly implies Rich worked at The Guardian, not NYT. The only source I could quickly find that seems to corroborate what you're saying is a LinkedIn page, but because of its bullshit paywall, there's context missing.
I would hope that they’re in IA on purpose. It would be exceptionally lame if NYT didn’t let their articles be archived by the Internet’s best and most definitive archive. It would be scary to me if they had only been allowing IA archiving because they were too stupid to know about it.
FWIW, if you happen to be based in the U.S., you might find that your local public library provides 3-day NYT subscriptions free of charge, which, whilst annoying, is probably easier than fighting the paywall. Of course this only applies to the NYT.
In the Netherlands the library provides free access to thousands of newspapers for 5 days after visiting, including The Economist and WSJ, which actually have paywalls that aren't trivial to bypass.
PressReader allows reading various newspapers (in their newspaper form), the short sub to NYT I mentioned is a bit different and gives you access to the online version.
I continue my search for a paywall remover that will work with The Information. I'm honestly impressed that I've never been able to read an Information article in full.
There's definitely a hit to search traffic if you go this route, as Google is unlikely to rank you above a competing article for a competitive keyword based on only an excerpt. The Information simply doesn't care.
They have an expensive subscription ($400/year) that I'd guess targets VCs and tech execs, which is a very specific segment that's best reached via means other than Google search, anyway.
But yes, to your point, successfully paywalling a media site in a way that's impossible to bypass is trivially easy to do. Most media sites just don't think it's worth it.
I would run headless Chrome to fetch the website and block all requests matching the uBlock Origin lists. This would give you a "remote" ad blocker that works anywhere.
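The filtering half of that idea is simple; here's a minimal Python sketch assuming a toy substring blocklist (real uBlock Origin lists use a much richer filter syntax), which a headless-Chrome request-interception hook could consult before allowing each request:

```python
# Toy blocklist; real uBlock Origin lists contain tens of thousands of
# filters with their own matching syntax, so treat this as illustrative only.
BLOCKLIST = ["doubleclick.net", "adservice.", "/ads/"]

def should_block(url: str) -> bool:
    """Return True if the request URL matches any blocklist pattern."""
    return any(pattern in url for pattern in BLOCKLIST)

# A request-interception hook (e.g. via Playwright routing or the Chrome
# DevTools Protocol) would call should_block(request.url) and abort matches.
```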
How is its addon support now though, in general? I stopped using it last year since it was pretty slow on iOS but more so because its addon support was very wonky.
> On iOS there are no web browsers other than Safari, per the app store rules. "Chrome" / "Firefox" / etc on iOS are just basically skins on top of Webkit.
I’m more inclined to use archive(.org|.ph). But this is a decent workaround when archive is unavailable.
Side note: paywalls are annoying but most publications are often available for free via public library.
For example, the NYT is free via my public library, which offers 3-day subs. A few other decent publications are available as well. Availability of publications varies by library, too.
It is obviously clumsy on purpose in the sense that if you want to access the NYT on a regular basis, you need to go through the procedure again once the 72h pass expires. If you are a regular reader it might be worth paying for the membership.
The docker-compose.yaml file is where you specify the ports you want to expose. It looks like by default it's 5000:5000 (5000 outside and inside the container). You will need to change it and then run docker-compose up -d.
You can change it to something like 5133:5000 and access the instance through localhost:5133
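For reference, the relevant fragment of the docker-compose.yaml would look something like this (the service name here is an assumption; use whatever the repo defines):

```yaml
services:
  ladder:            # hypothetical service name
    ports:
      - "5133:5000"  # host port 5133 -> container port 5000
```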
Thank you for the tip! I ended up editing the port parameter in the app.run() call within portable.py and it worked. Felt like it might be a good idea to add this as a runtime argument for easier customization.
>> This is a simple self hosted server that has a simple but powerful interface to block ads, paywalls, and other nonsense. Specially for sites like medium, new york times which have paid articles that you normally cannot read. Now I do want you to support the creators you benefit from but if you just wanna see one single article and move on with your day then this might be helpful
Personally I'm not a fan of this attitude. I've read and digested the arguments for it, but, for me, it runs close to "theft".
For example, read the sentence again, but in the context of a restaurant. Sure I wanna support the creators, but what if I just want a single meal and then get on with my day?
Businesses, including news web sites, need to monetize their content. There are a variety of ways they do that.
You are free to consume their content or not. You either accept their monetization method as desirable or you do not.
The "I just want to read one article" argument doesn't fly. If the article is so compelling, then follow their rules for accessing it.
Yes, some sites behave badly. So stop visiting them. There is lots of free content on the web that is well presented and lacks corporate malfeasance. Read some of that instead.
I get that I'm gonna get downvoted to oblivion with this post. HN readers are in love with ad blockers and paywall bypasses. But just because you can do something, just because you think it should be "free, no ads", does not make it right.
Creators create. They get to choose how the world sees their creation. Support it, don't support it, that's up to you. Deciding to just take it anyway, on your terms (however righteous you feel you are) is not ok.
1. Stop comparing digital products (data) to physical products (food).
2. Don't use the word "take", nothing is taken, only copied.
3. "They get to choose how the world sees their creation" Not necessarily. This is a pretty big assumption that lies at the heart of the conflict between the rights of the author and the rights of the public.
> 1. Stop comparing digital products (data) to physical products (food).
That's not a fallacy. Perhaps the way they are compared is wrong, but they can be compared. If you want an example: e-books vs books, mp3s vs cds, Netflix vs DVDs, online banking vs your local branch office.
> 2. Don't use the word "take", nothing is taken, only copied.
You do take it, except when you intend "take" to mean "remove from a place." You can take a nap, you can take a breath, etc.
> 3. ... This is a pretty big assumption
Copyright owners do have the right to restrict access, legally and morally, although the latter is IMO, of course.
I agree with you. Where the gap lies for me is that I can’t just buy one meal, I have to sign up for the yearly meal plan even if I just want one meal.
A few years ago I tracked how many times I visited some of the various paywalled sites for their articles, and it was typically between 5 and 10 times per year. One was 30, so I paid for a subscription to them, but I can't justify several dozens of dollars for 5 articles on many other sites. If I can't access their content because a bypass doesn't work, then so be it; I wasn't willing to pay for that content either. I feel like it's the classic misconception the movie industry has about piracy: I wasn't willing to pay money for it in the first place, so it's not lost revenue (unfortunately).
I was actually discussing this overall problem with my wife the other day, and I came to the conclusion that I basically want a “Netflix” but for news - I subscribe to one place and I get access to a whole range of publications. That’s worth it to me. I very much don’t see it happening though, sadly.
I'll counter your one-meal-vs-subscription analogy with another:
I don't want to buy this Ferrari, I just want to drive it for the day. The dealership wasn't interested (they directed me to a different business with a different business model).
Yes you want a Netflix for news. But even Netflix isn't enough. You also need Amazon, Disney+, Apple TV and so on.
Indeed, and all of them only partially aggregate - much of their content (if not all of it) is in-house production.
Yes, micropayments per article would be nice, but endless startups have proved this approach works for neither suppliers nor consumers.
There's no place to rent a Ferrari in my town. That doesn't make it ok to just "borrow" the dealer's.
The world does not have to magically conform to our desires, our ethics. Sometimes we can't get what we want, and just taking it anyway is not ok.
Supercar hire businesses do exist though, and I can certainly rent one for a day in many places all over the world.
Regarding Netflix - I’m referring to OG Netflix which really did seem to aggregate everything under one subscription.
In any case, I do agree that micro transactions for articles mostly do fail, hence my leaning towards a more “Netflix”-style approach that lowers the risk for consumers. I don’t expect to get what I want here, but publishers also can’t simply get what they want either.
Yes super-car rentals exist, but only in a small number of locations. My point was that not having one conveniently available doesn't make alternative approaches ok.
An aggregator like the original Netflix would be nice but I suspect that model would not work for long. (As evidenced by current Netflix et al).
Publishers can certainly do anything they like with their content, and they set the rules for accessing it.
Assuming what they want is piles of money, I expect they take that into account when setting the rules.
But it's their content. You don't get to break the rules just because you don't like them.
The problem is that this might end with a bad incentive structure that I think would lead to bad quality: it would push you to write the kind of articles that everyone wants to click on/pay for. So mostly clickbait. Emotional content instead of factual content. It's unlikely that this could finance the long-term, high-quality investigative journalism that actually defines high-quality journals.
Newspapers have been dealing with this issue since the nineteenth century. I don't know how things work where you are in the world, but in the UK and Australia, newspapers are separated into broadsheets which have better-quality journalism and tabloids which are clickbait nonsense. (The terms come from the paper sizes they used to be printed on.) In the UK, the tabloids are further divided by the colours of their mastheads: the black-tops are less sensational; the red-tops are more sensational.
Yes. My understanding is that Spotify is reliant on a system used for radio broadcasts. Paying pennies to broadcast songs was already the structure. Artists make their money from merch and live gigs as it always has been. But now they can also target me for advertising based on the bands I listen to, to sell me merch and live gigs.
Back to newspapers. Traditionally, if a newspaper article took my fancy, I paid a small fee to access that paper without establishing a long-term subscription to just that paper. A Spotify-type service could easily hand out credits in the same way.
> They get to choose how the world sees their creation
Then they should not publicize it. They could license only to Google, but Google isn't interested. Instead, publishers need to publicize, which is... expected? Once they publicize, they can't claim the public is not allowed to read. It's like sharing a printed newspaper with my friends. Publishers shouldn't be able to prevent it.
Yes. It's funny how people claim they only block ads because they allegedly want to pay for good content but cannot, or claim that piracy is just a service problem. Yet when asked to put their money where their mouth is, they instead just continue to openly try to get stuff for free. It's pure entitlement and disrespect for others' labor.
As for the "I can; therefore I will" justifications: I could steal from my local corner shop. It's very unlikely they'd catch me. Yet I do not.
I just came from chicagotribune.com where they tried to entice me with a Flash Sale of one year’s access for a total of $1. Sounds great, but I took advantage of it a year or so back and regretted it due to how annoying they were with advertisements, newsletters, etc…. It’s pretty amazing that the tactics can be so annoying that they can make me regret a $1 purchase.
Upshot: Numerous legitimate reasons. I'd very much like to see universal syndication / superbundling on a distributed progressive tax and/or ISP tollkeeper basis, with some additional details for the fiddly bits.
As for subscribing to content submitted to general discussion sites, such as HN or others:
As of 21 June 2023, there were 52,642 distinct sites submitted to the HN front page.
Counting those with 100 or more appearances, that falls to 149.
Doing a manual classification of news sites, there are 146.
Even at a modest annual subscription rate of $50/year ($1/week per source), that's a $7,300 subscriptions budget just to be able to discuss what's appearing on Hacker News from mainstream news sources.
Oh, and if you want per-article access at, say, $0.50 per article, that's $5,475 to read a year's worth of HN front-page submissions (10,950 articles/year), and that is just based on what is captured on the archive. In practice far more articles will appear, if only briefly, on the front page each day.
Which is among the reasons I find the "just subscribe" argument untenable. Some sort of bundling payment arrangement is required.
I'm not suggesting this out of any sense that they are good-faith participants in society or even commercial space.
They are however the one point of authenticated and payment-based contact between all Internet users and the greater cloud.
So if there's going to be some form of payment, it's highly likely to come either from the ISP or some collection of government-imposed fees or taxes. The latter could come at multiple levels, e.g., a city, county, state, or federal levy.
>They are however the one point of authenticated and payment-based contact between all Internet users and the greater cloud.
Yeah, but we need to diminish their contact with authentication and payments. They have little to no understanding of security.
Like, you might be thinking "Oh, just have Comcast do this," and they could probably cobble something together with only a massive data breach once every 5 years.
But think about all the guys in the sticks dealing with Joe Bloggs' Wisporium or Texas Pete's Flying Internoobs. These solo operators often can't work out telephony or IPTV. Having them middleman payments for an extra tax or service is a huge extra risk. I am not even super comfortable with them holding PII or credit cards as it stands.
Mostly because they see themselves as invisible, have basically no security posture, and are often either operated by the owner, or by a bunch of foreign tech support goons who have unlimited access to your data.
3. Have a long sordid history of despicable practices, including price gouging, rampant upselling, shoddy customer service, manipulating search and Web results, selling customer browsing history or patterns, and worse.
4. Are typically at or near the top of lists of companies / industry sectors with which the public trusts least and hates most.
They have enormous power over individuals, collect absolute gobs of data, and act capriciously, carelessly, and overwhelmingly against their subscribers' personal interests with it.
This attitude is why journalism is dying. There is value in an undissected payment to the publisher that gives them revenue surety and lets them fund a variety of reporting, even if you don't personally find all of it interesting or relevant. This is exactly how journalism and publishing worked with the "rivers of gold" from classifieds/advertising during the golden age (it is also exactly how Bell Labs, Google 20% time, etc. were/are funded; the incessant need to pay for only the small portion you directly consume or find interesting kills this sort of thing).
Interesting thoughts. I can't refute or support your assertions about the cause of journalism's demise offhand, but I am very curious whether a publication could find success in a model where one could pay a nominal fee to access a given article (in addition to the normal subscription setup).
I don’t pay for NYT. I don’t want to, because of the cancellation stories I see repeated.
If I could pay $1 here and there to access an article, though? I’d do that.
And NYT would get some money from me they aren’t getting now.
I see the point you're making, but I'm not sure it's a fair assessment that my attitude is why journalism is dying. I'd almost go so far as to say we're making the same point.
See, back in the "good olde days", I could subscribe to 1 or 2 sources of news (in my case the local paper and the big city paper) and get something like 80-90% of my general news. I guess largely through the magic of syndication. When someone shared a story with me, it was physically clipped, no paywall. And I get the impression that advertising was fairly effective.
The attitude that is killing journalism is, IMHO, the publishers' attitude that the world still operates the same way it did 40 years ago: buy a subscription to us, and advertisements work.
One of the big reasons I don't subscribe to, say NYT, is that in a given month there are only a few articles there that I seem to be reading. There are maybe 5-7 sources like that, and, when I'm honest with myself, my life isn't enriched by $100/mo subscribing to them. And advertisements just don't seem to work in today's world.
For example: I do pay for The Guardian because they:
- Don't paywall their articles.
- They pop up a reminder that I've looked at X articles.
- I can pay what I want (I probably average paying them around $1 or $2 per article I read).
While this is generally true for legacy publications (impossible to cancel!), I mostly enjoy paying for niche-topic newsletters from a single source. A great example is a former newspaper journalist who was laid off and now produces his own newsletter focused on a single college football team. He probably makes more now than he did as a newspaper employee. I am a happy subscriber. I pay for a handful of these. I also subscribe to newsletters like "Morning Brew," which, while free and ad-supported, is well done.
A decade or two ago, there were talks in several countries about creating a global licensing agreement where people would make a single tax payment per year and have access to everything without being called pirates.
But media and arts publishers weren't happy with that idea.
Articles should come with license agreements, just like open source software nowadays. Free for personal entertainment, but if you try to make money from the information in the article or otherwise commercialize it, you can fuck right off.
> Free for personal entertainment, but if you try to make money from the information in the article or otherwise commercialize it, you can fuck right off.
Note that such a license would not be considered open source. Open source and free software allow commercialization because they do not allow discrimination against particular interest groups or fields of endeavor. The only restriction open source really permits is preventing people from further restricting the information, which relates to commercialization, but only partially.
It once was Google's requirement that you'd serve the same content to the Google crawler as to any other user. No surprise that Google is full of shit these days.
Counterpoint - if you like the content enough to go through this - just pay for it. Monetary support of journalism or content you like is a great way to encourage more of it.
Countercounterpoint - Maybe I have news subscriptions for periodicals I regularly read, but don't feel like paying for a monthly subscription to read one random article from some news outlet I don't regularly read that someone linked on social media or HN.
So back out of the webpage and don't read it. That is a constructive way of letting a content producer know their user experience is not worth the "expense" of consuming their product. But if the content is worth your time and energy to consume, pay the "price" of admission.
I back out of the webpage and go to 12ft.io, which lets me both read the article and, at the same time, use that constructive way of letting the publisher know that their product is not worth its price.
And then 12ft-dot-io throws an error, but still shows its own ad in the bottom right corner! But you probably knew that since you constructively use them.
The three articles I read from the NYT a year are not worth the price of a monthly subscription.
My choices are:
1) Use archive.ph to read the three articles.
2) Never read a NYT article again.
3) Pay for a subscription for the NYT.
I think you need to be approaching this from an exceptionally legalistic perspective to think that anything but Option 1 is reasonable. If I could pay the five cents of value those three articles are worth, I would, but I can't so I won't.
Standing at an empty intersection, I'm not going to start lecturing someone for looking both ways and crossing the street when the traffic light signals "Don't Walk".
I understand that you might feel that journalism is underfunded and that this scofflaw, ne'er-do-well attitude is further jeopardizing it. The fact is that the reasons newspapers are failing are complex and have less to do with consumer behaviour than with other factors, not least of which are market consolidation and lax antitrust laws. I spend hundreds of dollars a year on newspaper subscriptions and I refuse to believe that I'm the reason any of that is happening.
I guess we are going down a rabbit hole that 12ft-dot-io doesn't specifically address — it doesn't bypass paywalls. Regardless, #2 is an option. And the choice is entirely yours.
I get more peeved at the entitlement many feel to use ad blockers and rail against content producers monetizing their sites, when the choice to not consume the content is an option. Ask my why I gave up twitter a few weeks ago :)
> Regardless, #2 is an option. And the choice is entirely yours.
I can also choose not to read over the shoulder of someone reading an article on the train or averting my eyes at the headlines displayed at a newsstand. Somehow, I can't find in me the slavish devotion to the media industry margins required to do so.
> I get more peeved at the entitlement many feel to use ad blockers and rail against content producers monetizing their sites, when the choice to not consume the content is an option.
This is such a confusing opinion, and an even more baffling one to thrust onto others.
The best thing one can do for one's computer safety is to run an ad blocker, as acknowledged even by the FBI[0]. Profiling by ad companies makes our world more insecure and inequitable. I deeply despise selling client data as a business model, as it seems you might as well.
So, your position is that I should lodge my complaint against their unfair dealings by not consuming their website, but that it is also unjust for me to evade tracking and block ads because it hurts their bottom line, which is unethical to begin with. This sorta feels like being chastised for walking out of the room while the TV ads run, then returning to watch the rest of the programme.
It’s baffling to me why you would insist on consuming content produced by such dangerous abusers of your security and privacy. And then thrusting your opinion that all content should be free onto all sites monetized by ads is further confusing.
That’s glib. It is possible to discern websites that are safe, respect privacy and are generally pleasing to visit without an ad blocker. If you deem them unsafe, leave, don’t log entirely off the internet.
I’m not saying you are telling me to. I’m pointing out that you are depriving sites from their chosen method of monetization while continuing to consume their content. Effectively “averting your eyes” from their ads, instead of just not visiting the site.
I’m not accusing you of anything. It’s just simply what you are doing. It’s the mental gymnastics these threads are always full of justifying the wholesale disavowal of all ad-supported content that is hard to follow.
This assumes their presence has no effect on me. It takes time to click a page and let it load, and more time to dig through all of the results when all of them are unreadable. Maybe if there were a tag like [ungodlyamountofads] on each, it would help. But even then I'd still have to scroll through them.
I guess I fail to see how one can entirely remove how fully voluntary the visiting of a webpage is. It is how the web works! And how all kinds of "free" media has worked for eons.
I don't mean to excuse incredibly poor user experience design, and certainly not abusive tactics. But sorry if I have zero empathy for your clicking, loading and scrolling pain. Leave the website! It is amazing how many people are defending a site that claims to "Remove popups, banners, and ads" while:
1 - failing to even work.
and:
2 - showing its own ad on the resulting page!
While we likely agree there are egregious abusers of both user experience and privacy, I don't believe I have a fundamental right to define how a website is allowed to present their content and/or monetize it. But I do retain the right, which I frequently practice, to leave a webpage and utilize alternate sources in that moment and in the future.
The majority of the internet is your "leave the webpage" example, so by allowing shady ad-tech sites to use these tactics you're just promoting the proliferation of a shittier internet. Being subjective in this case makes no sense to me unless you have skin in the game, so I'll assume you do.
As an exaggerated albeit relevant comparison; this is like saying you don't want police even though there are lots of criminals, you can always just walk away if things look suspicious. This assumes you have the eye to determine what is suspicious. I was hoping I wouldn't have to worry about crime in the first place.
Absolutely I have skin in the game. Do you never benefit from free content, tools or services that exist only because the opportunity to monetize through advertising is possible?
I display a single banner ad on a website that offers a free business tool, as an example.
I also do the same on a free business tool where I also offer a paid, advanced, ad-free version. If a user sticks around for 30 seconds, which most do (average time on both ad-supported sites is more than six minutes), then the freemium site pops up a message alerting them to the paid option.
No obligations and no restrictions on the free versions.
I don't make significant amounts from ads or subscriptions, but I would have no incentive beyond this to continue to offer these services, which many appear to find valuable and use for commercial purposes.
I frequent many free sites/tools that benefit from my visit, and I benefit from their offering for both business and personal reasons. I understand and agree to the transaction occurring.
Outlandish comparisons like you offer completely miss the mark and dilute the legitimate arguments for the use of ad-blockers, which I do believe exist. But I will offer an equally outlandish counterpoint: You prefer a world where over-policing would occur and round up innocent victims with criminals? "Most crimes are committed by males aged 18-25, if we round them all up, we will drastically reduce crime!" Hyperbole, I know. But probably more applicable than your argument for the use of ad blockers.
As I said before, I am not accusing anyone of wrongdoing. Using an adblocker allows for a cleaner, safer internet for the user. No doubt about that. It also, it has to be acknowledged, sweeps the good under the rug with the bad. Period. All-or-nothing enforcement is your proposition. Again, that simply has to be acknowledged. There is no debate there. If you believe that will ultimately lead to a better internet, then that is where we can disagree, as that is entirely subjective.
I'm not saying you're hiding anything; it's just easy to see why you have this opinion. My example was not outlandish and is relevant, versus the argument you made, which was a purposefully dishonest analogy.
My hope is not to state that ads are evil, as I don't believe that, just to point out that you are a person who serves ads. I also never stated any of the opinions or beliefs you say I did. Have a nice day!
> But if the content is worth your time and energy to consume, pay the "price" of admission.
This assumes that the "time and energy to consume" is equivalent to the "price". What if it is worth the time to install 12ft or whatever, but not worth the price they want to charge?
I mean, sure, if you insist and make site-level negotiations with yourself about the value of the content.
Here’s a simple example for me:
I search Google for how to perform an operation in an Excel spreadsheet. I skip past the obvious ads at the top first. I click on a promising result on a user forum, but first have to click through a popup and then have a banner covering a third of the screen and a small inset screen with a video. That’s too much for me. I stop and go back to Google. I pick another option. And I may remember that forum is not worth the click in the future.
We make decisions like this online and offline every day. The fact is there are many valuable sites and services that are ad supported and done so responsibly. Not all, but many. Ad blockers are a blunt tool. Installing one on grandma’s browser is a constructive use, but not just because “ads are bad.”
^ This describes my experience as well. And there are certain outlets where I'll read an interesting article if someone links it, but don't want to give them money due to my objection with <xyz> editorial practices.
Paying for it doesn’t make the site less miserable to use. One of the stupid things about piracy is that it tends to also be the best available version of the thing. You’re actively worse off having paid for it. (Ads, denial, DRM in general, MBs of irrelevant JS, etc don’t go away with money, but do with piracy)
I agree, but would like for a way to pay for an article, or a single day, week, or month of access. Just like I could buy a single one-off issue of a publication a couple of times before starting a long term relationship with it.
Not all publications support this, and some like the NY Times require chatting with a representative to cancel the subscription.
I see a lot of talk about physical media around film and music, but the inability to buy single issues of any magazine or newspaper anonymously when the circumstances call for it is a great loss for public discourse.
I feel like there were companies in the past that did try this, where you would chuck $5 or whatever in an account, and then each page you went to that supported the service would extract a micropayment from the account.
Never took off. It should have. e.g. in Santa Cruz there is https://lookout.co , which is pretty good, but extremely pricey for what it is. There has to be a middle ground between "pay and get everything" and "ignore/go straight to 12ft.io".
As of 21 June 2023, there were 52,642 distinct sites submitted to the front page.
Counting those with 100 or more appearances, that falls to 149.
Doing a manual classification of news sites, there are 146.
Even at a modest annual subscription rate of $50/year ($1/week per source), that's a $7,300 subscriptions budget just to be able to discuss what's appearing on Hacker News from mainstream news sources.
Oh, and if you want per-article access at, say, $0.50 per article, that's $5,475 to read a year's worth of HN front-page submissions (10,950 articles/year), and that is just based on what is captured in the archive. In practice far more articles will appear, if only briefly, on the front page each day.
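As a quick sanity check on the arithmetic above (a minimal sketch; all figures are this comment's own estimates, not verified data):

```python
# Back-of-the-envelope subscription math from the figures above.
news_sites = 146                   # news sources with 100+ front-page appearances
annual_sub = 50                    # $50/year, roughly $1/week per source
subscribe_to_everything = news_sites * annual_sub

articles_per_year = 30 * 365       # ~30 front-page slots/day over a year
per_article_price = 0.50           # hypothetical pay-per-article rate
pay_per_article = articles_per_year * per_article_price

print(subscribe_to_everything)     # 7300
print(pay_per_article)             # 5475.0
```

Either way, reading along with the HN front page costs thousands of dollars a year under per-source or per-article pricing.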
Which is among the reasons I find the "just subscribe" argument untenable. Some sort of bundling payment arrangement is required.
My alternative suggestion is global access to content through a universal content syndication tax, or fee assessed through ISPs, on a progressive basis. See:
"Why won't (some) people pay for the news?" (2022)
Countries outside of the US exist, some of them with extremely low incomes that nevertheless hold segments of the population that are technically competent enough to not only understand what Docker is, but to use it on a regular basis.
The NYT is from the US, so framing the question this way is not surprising, and the comparison of someone who can't afford the NYT but knows what Docker is is interesting even without your addition.
There are other things we could mention like, maybe there are many people who can afford NYT but still don't want to pay for it, but that's not what we were talking about. That being said, thanks for the reminder about other countries... I'm sure everyone on HN forgot about globes.
I myself don't wish or want free content, I can create my own entertainment without the need of a large media corporation or a keyboard warrior spoon feeding me.
I don't think comparing Journalism to Netflix or ESPN is relevant since they provide quality entertainment (at least in the minds of their base) vs Journalists who stretch out 2 bits of searchable information into a 10 page ad puzzle full of psychological poking.
Yes, most journalism is less valuable than the critically acclaimed fantasy horror Stranger Things. This doesn't mean Journalism is less important or that good journalism doesn't exist. Honestly it's crazy to me Journalism doesn't see more critique. Most just sensationalize, fearmonger, and point fingers.
It's not unusual for Americans to live insular lives where the rest of the world doesn't exist in their worldview. The globe snark is unnecessary and frankly not worthy of a HN comment.
And assuming that no one outside of the US could possibly be interested in US-oriented articles in the NYT - not to mention their world news - is just another example of the insular attitude I'm referring to.
Is there a way to pay for journalistic content that doesn't involve participating in the extensive tracking that those websites perform on their visitors?
I love to read the news but I don't love that the news reads me.
> Is there a way to pay for journalistic content that doesn't involve participating in the extensive tracking that those websites perform on their visitors?
Well you could buy physical newspapers/magazines. (Or access content via OTA TV / the library.)
100%. And sometimes that form of payment is putting up with ads, etc. I routinely back out of sites that suddenly take over the screen with a popup or take up large chunks with video or animations. Same as opting not to go in a particular store. But I also stick around and occasionally use products advertised to me. Shocking, I know.
I fully agree with the sentiment! I support and do pay for sources I read frequently.
Sadly payment models are incompatible with how most people consume content – which is to read a small number of articles from a large number of sources.
No. Paywalled content should not be indexed by search engines. The implicit contract I have with the search engine is that it is showing me things that I can see. The publishers and search engines pulled a bait and switch here by whitelisting googlebot. So it's fair game to view the publisher's website with googlebot. That's what the engineers spent their time working on. It would be unfair to let that go to waste.
It is an affront to the open web to serve one set of content to one person and different content to someone else (unless there is a user experience benefit to the customization I suppose).
I place most of the blame on the publishers for doing the bait and switch, but Google gets some blame too. They used to penalize websites that sent Googlebot different results (or at the very least they used to say they penalized that). Now, they seem fine with it.
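For what it's worth, catching spoofed Googlebots is cheap for any site operator, not just a WAF. Google documents a reverse-then-forward DNS check for verifying Googlebot; here is a minimal sketch in Python (the function names are my own, not from any library):

```python
import socket

def host_is_google_crawler(host: str) -> bool:
    # Google's documented rule: the reverse-DNS hostname must be
    # under googlebot.com or google.com.
    return host.endswith(".googlebot.com") or host.endswith(".google.com")

def is_verified_googlebot(ip: str) -> bool:
    """Reverse-then-forward DNS check: look up the PTR record for the
    client IP, confirm the hostname is under a Google crawler domain,
    then resolve that hostname and confirm it maps back to the same IP."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False  # no PTR record: not a real Googlebot
    if not host_is_google_crawler(host):
        return False
    try:
        # Forward-confirm: the name must resolve back to the client IP,
        # otherwise the PTR record itself could be forged.
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False
```

Anyone merely setting a Googlebot user-agent string fails this check immediately, since their IP won't reverse-resolve into Google's crawler domains.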
I dunno, it seems more like there should be a user-configurable setting to hide/show paywalled content.
If you're looking for something, and it's only available behind a paywall (even a paywall you pay for!), how are you going to find it if it's not indexed?
I'll do that as soon as one-click-to-cancel becomes law. I refuse to subject myself to predatory business practices so they won't see my money until a legislative body starts working on behalf of the people.
But when somebody shares an article with me and I want to see what I’ve been sent, I’m not going to buy a $15 monthly subscription to some small-town newspaper in Ohio just because they’ve decided to paywall their content in that way.
I wasn't even thinking about paywalls; the first thing I did was check to see if cookie banners and "Sign in with Google" popups went away. There are so many user-unfriendly things that you constantly deal with that any amount of browsing is just a bad experience without putting up defenses like this.
The neural network is not scraping content for its own purposes, it is for the purpose of the people who are running/training it.
And yes, one person reading a piece of content without paying money for it is far, far better than one person/corporation scraping all of the world's content in order to profit off of it.
It’s probably about the same. The difference with sites like e.g. Perplexity is that they have a business model which requires “acquiring” said content for free whereas a single person is just a single person.
> Somewhat related comparison, Is a human choosing to do this theft really better than a neural network scraping content for its own purposes?
Here’s a similar comparison: “Is a human recording a movie at the theatre to rewatch at home really better than the one who shares the recording online?”
Seeing as you’re calling it “theft”, presumably what you mean by “better” is “causes less harm / loss of revenue to the work’s author / publisher”.
I’d say the answer is pretty clear. How many people go through the trouble of bypassing paywalls VS how many use LLMs?
Saying “a neural network scraping content for its own purposes” doesn’t even begin to tell the whole story. Setting aside that the neural network is unlikely to be doing the scraping (it’s being trained on the result), it’s not “for its own purpose”: it didn’t choose to scrape the internet willy-nilly; it was ordered to by a human (typically) intending to profit from it.
why pay a monthly subscription if we're going to be bombarded by legally required popups and other internal promotional stuff that hooks you to the site anyway?