Hacker News
How paywalled sites permit access to visitors from social media sites and apps (elaineou.com)
184 points by kevinphy on July 18, 2017 | 94 comments



This bookmarklet works too: javascript:location.href='http://facebook.com/l.php?u='+location.href


    javascript:location.href='http://facebook.com/l.php?u='+encodeURIComponent(location.href)
encodeURIComponent because the page URL might contain special characters that would otherwise break the query string
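
For example, without the encoding everything after the first "&" would be parsed as a separate parameter of l.php rather than as part of u=:

    encodeURIComponent('https://example.com/a?b=1&c=2')
    // -> 'https%3A%2F%2Fexample.com%2Fa%3Fb%3D1%26c%3D2'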


In exchange, Facebook gets to track, and potentially intercept and modify, every page you look at this way.


Which, for many people, will be preferable to the obvious alternative way to get through the paywall...


Actually it is not, but most people lack the knowledge and understanding to care enough.


    if (details.url.includes(url)) {
So, if I want to detect whether you have this plugin installed, I can just load an image with ?plugin-test=wsj.com in its URL, since the substring match triggers no matter where in the URL it appears.

Might want to improve this...
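
For instance (just a sketch, assuming the VIA_TWITTER entries are bare domains like "wsj.com"), anchor the check to the request's hostname instead of substring-searching the whole URL:

  // match the hostname (and its subdomains) exactly, so a
  // "?plugin-test=wsj.com" query string no longer triggers it
  var host = new URL(details.url).hostname;
  var useTwitter = VIA_TWITTER.some(function(url) {
    return host === url || host.endsWith("." + url);
  });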


If you add this to the manifest "permissions" section and go to http://pbb.nkcss.com/ you can see it's easily detected:

    "http://pbb.nkcss.com/*"
And if I can do it in a few minutes, I'm sure those who have a paywall can do so as well.
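
The server side of such a check can be tiny. A hypothetical sketch (Express-style; since the extension rewrites the Referer and strips cookies on matching requests, a probe endpoint only has to look at what actually arrives):

  // hypothetical probe endpoint: a request the extension touched
  // arrives with a spoofed Referer and without cookies
  app.get("/probe", function(req, res) {
    var ref = req.get("Referer") || "";
    res.json({ detected: /twitter\.com|facebook\.com/.test(ref) });
  });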


Some code review just in case anyone is interested. (I don’t expect it to make it into the article as it was written six months ago.)

This pattern:

  if (details.url.includes(url)) {
    return true;
  }
  return false;
should be replaced by this:

  return details.url.includes(url);
This pattern:

  array.map(someFunction).reduce(function(a, b) { return a || b; }, false)
should be replaced by this:

  array.some(someFunction)
(Note the semantics are slightly different—`.some` will break early, so it’s more efficient and equivalent provided there are no side-effects in the map function.)
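
To see the early exit (nothing past the first match is visited):

  [1, 2, 3].some(function(x) { console.log(x); return x === 1; });
  // logs only 1; the map/reduce version would visit all three elements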

Taking both of these, the following:

  var useTwitter = VIA_TWITTER.map(function(url) {
    if (details.url.includes(url)) {
      return true;
    }
    return false;
  })
  .reduce(function(a, b) { return a || b}, false);
can be rewritten much more simply as:

  var useTwitter = VIA_TWITTER.some(function(url) {
    return details.url.includes(url);
  });
You could even do it thus if you desired:

  var useTwitter = VIA_TWITTER.some(details.url.includes.bind(details.url));
… but that's probably harder to read, and subtly buggy: `.some` passes the element's index as the callback's second argument, which `includes` would treat as a start position. I will mention arrow functions, however, which are pretty:

  var useTwitter = VIA_TWITTER.some(url => details.url.includes(url));
This part:

  details.requestHeaders.filter(function(header) {

    // block cookies by default
    if (header.name !== "Cookie") {
      return header;
    } 

  })
`.filter` only cares about truthiness in its return value—as this code stands, undefined is falsy and a header object is truthy. But you could simplify it:

  details.requestHeaders.filter(function(header) {
    // block cookies by default
    return header.name !== "Cookie";
  })
Also in the original code’s usage of map, it’s not actually changing the values, only things inside them, so using `map` is wasteful (as it entails allocating a new array). You could just use `forEach`:

  var reqHeaders = …;
  reqHeaders.forEach(function(header) {
    if (header.name === "Referer") {
      header.value = setRefer(useTwitter);
      foundReferer = true;
    }
    if (header.name === "User-Agent") {
      header.value = setUserAgent(useTwitter);
      foundUA = true;
    }
  });
A remark on fine-tuning performance: when you access properties inside an object multiple times, it’s optimal to store it as a local variable to save having to look it up multiple times. (This is especially the case if the property is expensive to access.) Take the `blockCookies` method:

  function blockCookies(details) {
    for (var i = 0; i < details.responseHeaders.length; ++i) {
      if (details.responseHeaders[i].name === "Set-Cookie") {
        details.responseHeaders.splice(i, 1);
      }
    }
    return {responseHeaders: details.responseHeaders};
  }
This is accessing `details.responseHeaders` many times when it only needs to access it once. It is also reading its `length` member on every iteration rather than caching it. Normally I’d say “store the length once up front,” but in this case the code changes the array’s length inside the loop, so that would actually break things.

On that note, the code as published is actually missing some cookies, because it removes an item from the array and then skips past the element that shifts into that index. To fix that in a forward loop, you need to move the `++i` into the loop body so it can be skipped whenever you splice the array.

Alternatively, to avoid reading the length property many times, you could iterate in reverse instead of forwards. Then the decrement can be unconditional, because splicing at index `i` only shifts elements you have already visited. I might write the whole function like this:

  function blockCookies(details) {
    var headers = details.responseHeaders;
    // iterate in reverse so a splice only shifts
    // elements we have already visited
    var i = headers.length - 1;
    while (i > -1) {
      if (headers[i].name === "Set-Cookie") {
        headers.splice(i, 1);
      }
      i--;
    }
    return {responseHeaders: headers};
  }


Awesome code review, even more valuable than the plugin itself. Thanks :)


You can actually just do this:

    for (var i = 0, l = details.responseHeaders.length; i < l; ++i) {


I would have done that, but with the splicing a forward loop can’t just ++i every time; that’s why I rewrote it to iterate in reverse.

Even then you could do it as a plain for loop, of course. Or even use a one-liner for loop with no body:

  function blockCookies(details) {
    for (var headers = details.responseHeaders, i = headers.length - 1; i > -1; headers[i].name === "Set-Cookie" && headers.splice(i, 1), i--);
    return {responseHeaders: headers};
  }
But that’s crazy talk. Leave optimisations like that to the minifier, if it feels so disposed.


I remember this trick being useful many years ago for bypassing download sites that otherwise obliged you to use their adware-filled "download manager" to get files, or that gave bonuses to users of it (via a special User-Agent). The words "User-Agent: MEGAUPLOAD 2.0" might bring back interesting memories for some here. ;-)

It's trivial to do this with a filtering proxy, which means it works in all browsers. On the other hand, I've unknowingly embedded images from image-hosting sites that didn't allow "hotlinking", only to be told by other users that they couldn't see them, because of the Referer headers (or lack thereof) my usual configuration sends.
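
For the curious, a minimal sketch of such a proxy in Node (plain HTTP only, no CONNECT/HTTPS handling; the spoofed header values are illustrative):

  var http = require('http');
  var url = require('url');

  // a tiny forward proxy that rewrites Referer and User-Agent
  // on every request passing through it
  http.createServer(function(clientReq, clientRes) {
    var target = url.parse(clientReq.url); // proxies receive absolute URLs
    var headers = Object.assign({}, clientReq.headers, {
      'referer': 'https://t.co/',          // illustrative spoofed values
      'user-agent': 'Twitterbot/1.0'
    });
    var proxyReq = http.request({
      hostname: target.hostname,
      port: target.port || 80,
      path: target.path,
      method: clientReq.method,
      headers: headers
    }, function(proxyRes) {
      clientRes.writeHead(proxyRes.statusCode, proxyRes.headers);
      proxyRes.pipe(clientRes);
    });
    clientReq.pipe(proxyReq);
  }).listen(8888);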

IMHO things like this shouldn't be spread too widely... much like the fight between adblockers and anti-adblockers, it can only eventually lead to a more hostile computing environment.


Paywalled sites (particularly news/media sites) are playing a messy game here. It’s similar to free vs. paid music. In the radio-and-records days, free radio play drove record-sales revenue. Ideally they would sell records to anyone willing to pay, but give music away to anyone willing to listen. In the actual world you can’t do this perfectly, so you need another way for free and paid to co-exist. Once the medium changed to digital, all the rules got thrown out. Attempts to force digital to play by radio-and-records rules have been a slog.

Many internet generations ago news sites were mad at Google for showing ads alongside their headlines on news.google. They even got some lawmakers to agree. Google offered to de-index them. Stalemate.

Similar issue here. News sites want paid subscriptions if they can get them, so paywall. They also want the readers who won’t pay, so no paywall.

Overall, I think paywalling is a semi-dead end. I don’t mean that it won’t work for any site, but it’ll probably be a niche revenue source, like a print-only publication. Most news sites want to be part of the greatest, most relevant discussions. Those happen on the internet as a whole, not inside walled gardens. This free-for-some, paid-for-others mess is too much of a kludge to be the model.


It's fine talking about messes, but _someone_ has to pay for the content, and display ads don't cut it. Paywalls will be the dominant form of monetization for most mainstream news orgs in the next 5 years. The second most important one will be the Guardian's supporter model, which is a paywall lite. It's still not clear that model will work, as the Guardian is still bleeding money.

I think eventually you'll see a re-arranging of press wire cooperatives such as AP and CP to limit dissemination of wire content for free without strings attached, forcing the hand of many.


The content is usually crap used to brainwash people, so let the shareholders pay for it. Besides, once _someone_ has paid, no one else ever needs to pay again, thanks to sharing and digital copies.

The business model is wrong and broken, and this failed business model is not our problem.


I honestly prefer not to use paywalled websites. I also run an adblocker, but in whitelist mode: sites I visit often, find entertaining, and consider worthy of money from my activity get whitelisted.


I am curious as to how these sorts of guides are taken as innocuous fun reads while a guide to, say, shoplifting would seem less than legitimate.


But this is about an arbitrary distinction between two groups of non-paying customers!

However I dispute the "customer" analogy too. In fact, turning this tired and one-sided analogy of a "store" on its head, how about it's my browser, and while they're in it, they can play by my rules?

There is nothing about a client-server paradigm (especially nowadays with thick clients) causing me to be "in their premises" or causing that to be a better analogy than their being in my premises[1]. The "online store" was just an analogy, and was how the www was sold to millions of profiteering dullards starting in the 90s, so naturally the idea has gotten a lot of traction and you can be forgiven for still thinking of it that way.

[1]Neither is accurate of course - The truth of the matter is that my robot handshakes with their robot across space. (And then their robot asks the Twitter robot whether my robot has the special Twitter street cred badge.)


I think a better analogy is sneaking into a movie theater, for which there are a large number of "innocuous fun" articles out there [1] (although not featured on HN so much)

[1] https://www.google.com/search?q=how+to+sneak+into+a+movie+th...


Because, unlike a news article, the marginal cost of the shoplifted good is nowhere near zero.


And how about a guide to illegally copying Spotify tracks without an account?


link?


Just hypothetical. I'm curious about the perceived value of music versus news articles.


I wouldn't be surprised if a shoplifting guide made it to the front page of HN.

I mean, how many HNers have read the MIT lockpicking guide or the Anarchist's Cookbook? At least half?


As someone who has worked on building two paywalls and is still involved with them, please bear in mind a few things.

1. Publishers have the ability to configure access on paywalls as they see fit.

Whether based on referrer, UA, or a whole host of other attributes, history, and so on.

2. Publishers don't care about a degree of paywall evasion.

Studies show that people willing to pay will pay, and those who go to great lengths to evade won't ever subscribe. The question then is: do you want to waste expensive developer resources in an arms race against people who'll never give you a cent, or do you want to spend that developer time enhancing the experience for those who will subscribe?

3. Full locked-down paywalls are known to be bad.

Publishers still want to ensure their content is part of the public conversation, and that means their content has to be accessible in some form - or you strategically choose to follow a different business model. See: https://techcrunch.com/2010/11/02/times-paywall-4-million-re...


This is the kind of thing that eager prosecutors will turn into a CFAA charge.


Another way to get free access to the WSJ, which doesn't require stepping into a legally/ethically grey area, is to use one of the handful of apps/extensions that have been granted free access.

The WSJ approached me and offered this access to a project I'm building, Read Across The Aisle [1]. I've built an iOS app and Chrome extension, both of which are free. I don't know exactly what the WSJ gets out of this deal (we have not given them, or anyone else, any user data), but I think it's that they want to be associated with post-filter bubble projects.

1: http://www.readacrosstheaisle.com


So why isn't this on the Chrome store?


Because the sites that have set up the paywall will report the extension and get it taken down; this happened to an extension I made a while back that had a similar idea.


Why is it take-down-able?


Because it allows people to circumvent the paywall and WSJ will complain to Google. And then because Google can take down whatever they want...


Which is why you don't publish it as a Chrome extension, but as a Tampermonkey script.


What is the best site, in your opinion, to find Tampermonkey/Greasemonkey scripts? I remember the landscape being fragmented some time ago.


Yeah. Seems like when the main site went down, the scene never got things back on track with a proper aggregator or central site.


https://greasyfork.org/ - it's been a year or so since I've made use of it, but after userscripts.org became unusable this was my replacement. They made some sensible decisions about what's acceptable in a script and whitelisted external libraries: https://greasyfork.org/en/help/code-rules


Not really. There are lots of extensions that do this:

https://chrome.google.com/webstore/search/paywall%20remover?...

I'm sure many get taken down by request, but it's usually on a technicality (a flimsy trademark claim) or a sloppy assessment on Google's part.


A Paywall Extension, hmmm? What if it were part of a dev tool bundle?

It's arguably unfair that users of Facebook and Twitter and other social media sites should have access to more content than someone who has decided not to use those platforms.


Private websites have no duty to allow "fair access" to their content. Non-users of Facebook is not a legally protected class.


Firefox and ModifyHeaders is nice for this.


While I think these kinds of posts are really fun to read, they should also say a little more about why these sites are paywalled. People need to understand that the traditional newspaper is dying, and that in order to support real journalists, they need to make money. I am neither working for any news organisation nor talking about a specific news outlet.

As far as I can see, they are trying hard not to annoy people but at the same time try to make some money for their work.


Even with paid news, the NYTimes, WSJ, etc. are using this new model of 'share-ability' to spread their articles in hopes of gaining new subscribers. Unfortunately this means that they are still highly incentivized to generate click-bait and inflammatory articles.

Better to write claptrap for a single political view that will spread like wildfire in its echo chamber than a well-reasoned article properly contrasting the opposing viewpoints. People don't share articles that challenge them and their social circle; they share what they agree with.


These are all the headlines from https://www.wsj.com as of right now. Please point out which you consider "click-bait" and/or "inflammatory":

- Republicans Ditch Senate Health Bill, Plan Repeal Vote

- Trump Certifies Iran Is Complying With Nuclear Deal

- Trump Unveils Blueprint for Nafta

- BNP Paribas Fined by Fed Over Currency Manipulation

- Netflix Surprises With Big Subscriber Gains, Shares Soar

- Activist Proxy Fight Puts Focus on P&G’s Cost-Cutting Effort

- Proxy Fights Are a Rarity for Peltz’s Trian

- Trian Launches Proxy Fight Against P&G

- Artificial Intelligence Won’t Take Over Wall Street

- U.K., EU Resume Brexit Talks

- Bond Trading in Focus Ahead of Goldman Sachs Earnings

- U.K. Inflation Eases Unexpectedly in June

- What Macron Sees in Trump

- The EU Needs to Get Tougher on Hezbollah

- Why Europeans Oppose the Russia Sanctions Bill

- Unable to Buy U.S. Military Drones, Allies Place Orders With China


Meanwhile, on the WSJ Facebook Page:

- The Trumps and the Truth

- Millennials as seen by Corporate America

- Trump administration slaps Iran with additional sanctions

- Is Religion Still Taboo? Not Everywhere

- Pay for College Interns is so 2020

- "Cereal for lunch, candy with 'new flavors and textures,' selfie-friendly makeup: what CEOs believe millennials want"

- Beachcombers Hunting for New York's Last Unspoiled Neighborhood

Still some real stuff in there, but a lot more noise.


The front page isn't what is shared. See sibling comment of the Facebook share page.


They might write some clickbait to attract new customers, but they rely on subscribers to stay in business. If there were only clickbait, as you said, people like me would unsubscribe in a minute.


I'm wondering what keeps e.g. Google from implementing micro-payments. If I can get access to an article or website for 20 cents by simply clicking a (universal) payment button, I would do so in many cases.


Like the old adage goes, they won't do this because their livelihood depends on not doing it.

A single user can be exposed to 3-4 (or more) ads per page. Each ad can earn them some money depending on the relevance of the ad to the viewer. In this case, the price they charge the advertiser is more guesswork/auction than science. They don't have to/can't prove to anyone that the ad truly worked. Advertisers tweak and A/B test ads to achieve higher engagements and better results.

A fixed-rate model like you're suggesting limits the scalability of selling the same pair of eyeballs to 3-4+ advertisers. It forces Google and content creators to attach a hard number to each visitor, which is essentially upper-bounded by people's spending habits - if people are feeling poor, they can simply stop visiting websites to make ends meet!

For these reasons, I don't believe any advertising-based company is ever truly going to get behind pay-as-you-go content. I've made my bet (see my profile for deets aka FD) and I'm in the process of quitting my job to put my money where my mouth is.


I don't think these two approaches are diametrically opposed. In the traditional print journalism model, I pay for my copy of the newspaper/magazine, and I get a lot of ads -- most magazines are > 50% ad content by area. Even online, I am a subscriber to the NYTimes and WaPo -- I still get ads when I view the site. Paying does not turn off ads.

I am sure I'm in the minority on this site, but I believe that it's possible to do advertising-supported content in a tasteful way, where the ads add to the experience, or at least do not detract from it.


Print has a natural limit - you cannot expect to print and deliver unlimited sheets interspersed with ads. Websites can show you content from a decade ago with ads thrown in, if the content is relevant to you. Print ads can be skipped far more easily than web ads.

With NYT and WaPo, your subscription is supposed to support quality journalism. The tradeoff that you're agreeing to - pay but also see ads - is not going to work for say a music subscription service or a stock tips service. My argument holds for those businesses. I'd say that WaPo, WSJ, and NYT are outliers, not the mainstream.

Tasteful ads:

Web advertising is so democratic that even the smallest company or startup can spend a few hundred dollars and get in front of you. Demanding tastefulness from such advertisers puts us at risk of "corporatising" ads again, imho.

Anyway, I don't want to commandeer this conversation. I'd love to hear other opinions from HN.


Likewise cable TV. You pay for a subscription and you also see a lot of ads (more than on OTA TV, which is one of many reasons I cut the cord.) In the early days of cable networks, many had no ads because it was thought they could survive on subscription revenue. That idea didn't last long.


As a whole, broadcast and cable TV is (used to be) a captive market. It was very possible for channels to synchronize or nearly synchronize their commercial breaks which meant that they could get away with charging for content AND showing ads.

The proof is that the minute add-ons like DVRs and set-top boxes like Roku came along, the first thing people did was bail on the old TV model.


Today's newspaper wraps tomorrow's fish.

Internet ads don't.


This is why it shouldn't be an advertising company, but a brand-new company, that does this!


Amen! And this is what we are trying to do with Datajoy.



See https://contributor.google.com/v/beta

I think they've tried this before but recently re-launched.


I think that's the wrong approach. I don't want to pay to remove ads (there are adblockers for that). I would gladly pay for articles that I can otherwise not access. The requirements are:

- simple button

- no hassle; only one subscription, which can be used on all websites

- low cost per article (on the order of tens of cents)

And perhaps:

- a free preview of the first couple of paragraphs of the article

- discount when I use the service a lot


Adblockers have their limits, too. Some news providers take images embedded in the article and ads from the same server, making the two rather hard to distinguish. You either end up with false positives or false negatives.

Quite a few pages will not show you anything if they detect your adblocker, too. And don’t even get me started on all those scandals in which certain adblockers were shown not to block certain ads because they were paid not to... I don’t see adblockers as the answer, not least because the consumer-provider interaction should not feel like an arms race but like a trade that leaves both sides happy.


Perhaps a crowdsourced approach could help here, where the raw text and relevant images are identified/extracted by the first few readers.


Sign up for Blendle. It's exactly what they do and all the big publishers are on board.


I wouldn't say it's exactly what they do. First of all, you end up reading the news through Blendle, where you have to see all kinds of news you don't want to see (in particular, giving Blendle the opportunity to become a gigantic tracker); then, it's only available in a few countries so far, and by far not all big publishers are on board; and finally, it's not transparent how much of what you pay on Blendle makes it back to the original newspaper.

edit: Having to go through the Blendle site also means if you're looking at an article on wsj.com from a few months ago you won't be able to jump to the Blendle version to the best of my knowledge. Archives aren't Blendle's thing.

edit: Basically, I'd like to say: Don't sign up for Blendle.


Yes, also I'd highly prefer a universal solution by a big company like Google (who already tracks everything anyway, but can be trusted to a high degree).


I think we're still very far from peak tracking if there is such a thing. Imagine I know all the books you've bought within the last 10 years, how scary is that? Certainly a bit scary if I know the shopping habits of millions of customers to which I can then relate you and (maybe because they've filled out surveys that you haven't?) learn the approximate income of your household (so that maybe I'll start showing you higher prices?).

It gets quite a bit scarier if I have the computing power and access to the full texts of all those books (like Amazon and Google Books do) to search for patterns in those texts that tell me about your fears, political affiliation, sexual desires, etc.

edit: I'm not sure what you mean by trusting Google. I trust Google to get the technical side of things right.


> I'm not sure what you mean by trusting Google. I trust Google to get the technical side of things right.

Yes, that's what I mean. They are less likely to get hacked.


They don't get paid via ads like they used to?


Most people block ads, as a side effect of the ads being nasty and irrelevant.

Maybe they could start anew if they used bearable ads - something like The Deck (no affiliation, just an example).


People block ads, and for good reason. If a big website wants to make money from ads, they need to force the advertisers' hand sometimes. Outbrain and Taboola are the worst.


Bear in mind that ads are just one revenue stream. There are times the ad market has collapsed and prices have dropped. Having a second (or more) reliable and recurring revenue stream makes solid business sense and allows them to move away from some of the crappier ad practices.


They can start a Patreon instead if they need support, rather than closing their site to everyone but those who pay.


That's an interesting point of view, though it is not shared by everyone. I tend to agree with the view that investigative journalism died when newspapers got bought up by a handful of ultra-rich people from the 1% (or is it the 0.1%?).

Then there is the point of view that newspapers were dying even before the web and that it all started when newspaper got funded by including advertising.

One way or the other I'd like to see how effective paywalls are at supporting investigative journalism.


I am okay with that. The quality of news produced is already so low it doesn't matter.


With the real news gone, the fake news will no longer seem fake...


This only works, of course, for paywalls that want links in Twitter feeds to bypass their paywalls. I'd at least expect an ad at that point.


Interestingly the FT's paywall is immune to this in my testing. Device fingerprinting?


Not that I condone doing it, but the Google Bot trick still works for the FT.


The FT does not allow free access to all articles from a t.co link.


You can download my Firefox plugin to bypass the WSJ paywall (also bypasses FT paywall): https://addons.mozilla.org/en-US/firefox/addon/bypasspaywall...

And if you want the Chrome version you will need to manually download it: http://bypasspaywalls.weebly.com/


How Google’s Web Crawler Bypasses Paywalls

https://elaineou.com/2016/02/19/how-to-use-chrome-extensions...


That article links back to the one published on HN:

"Update: A newer version of the chrome extension is available here."


I want bot neutrality damn it!


tl;dr:

The Wall Street Journal stopped allowing search engines special access through its paywall. By spoofing the Twitter app's Referer and User-Agent, access is still possible; an included Chrome extension script implements this idea.
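
For the curious, the core of the trick fits in a few lines of a Chrome extension background script. A rough sketch (illustrative header values, not the article's exact code; it needs the webRequest/webRequestBlocking permissions plus a wsj.com host permission):

  // rewrite headers on every request to wsj.com before it is sent
  chrome.webRequest.onBeforeSendHeaders.addListener(
    function(details) {
      var headers = details.requestHeaders.filter(function(h) {
        // drop cookies and the real Referer/User-Agent
        return ["Cookie", "Referer", "User-Agent"].indexOf(h.name) === -1;
      });
      // pretend the click came from the Twitter app
      headers.push({ name: "Referer", value: "https://t.co/" });
      headers.push({ name: "User-Agent", value: "Twitterbot/1.0" });
      return { requestHeaders: headers };
    },
    { urls: ["*://*.wsj.com/*"] },
    ["blocking", "requestHeaders"]
  );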


I thought that Google didn't allow websites to alter their appearance for Google's UA.


I think the rule was that you could show the whole page to the Google search engine so long as you showed the same page to someone clicking through from the search results. That's why the old trick was to search for the article on Google.


I think the WSJ is just showing Google the reduced version as well.


Nope. Never been the case.


<rant> Personally, I'm finding the increased use of tl;dr here on HN annoying. I feel that, like myself, HN readers are intelligent enough to read and understand the articles without someone coming along and posting a summary simply for upvotes.

HN is all about the articles, and then the discussion on top. If people find the articles so hard to penetrate that they need a tl;dr summary, then maybe HN isn't the site for them? </rant>


This is exactly the kind of article I expect a tl;dr on. I actually looked for one to save myself opening the article, scanning for "referer", and then closing it immediately (which I ended up doing anyway because the tl;dr was so far down).

I think this is a backlash against the self-congratulatory, self-indulgent tone Medium has perpetuated, not by design I'm sure.

Sometimes one just wants to validate what one expects/knows and move on; personally, I don't need more than 200 words to elaborate on this headline.

edit: typo


Sometimes a tl;dr might be also a subtle and polite hint: The original post could be much shorter and/or its headline could be more expressive and/or less click-baity.


I find the tl;dr's quite useful. Even after reading the article I sometimes see the tl;dr and compare or at least confirm what I just learned in case I missed something.

I admit when they first appeared I felt the same way and didn't like them, but they have since saved me time now and then.


I have found that the average HN'er who summarizes does a better job than the original author, in many cases. Maybe they have a better understanding of the information, or are just better writers?

I don't like reading wordy articles. I don't like reading rushed or poorly researched articles behind a paywall.

Traditional journalists/authors have another problem on their hands these days.

They have people out there summarizing their information, many times better and more concisely than the original article--for free. This whole change in the way things were once done is hurting all of us financially.

There are so many times I come here and get such a better understanding of said article by reading the comments.

(I do think publications need to up their game a bit. It seems like too many decided to hire new college graduates on the cheap, who seem to reluctantly spit out an article, or they hire wordy authors who can't write well but got the job because they know someone at the organization. This is not the time for any publication to hire employees' kids or practice any form of nepotism. I understand they can't pay like they used to, but this is the time to up your game, especially if you want us to pay. Even if they up their game, they might have lost the war. And in certain cities/locales a lot of important issues/information will go under the radar, which I find sad. Maybe the federal government should step in and fund certain publications, but with a hands-off approach?)


Is it just me or are websites that implement these kinds of selective paywalls rarely worth visiting, let alone worth spending effort to get through the paywall?


The Wall Street Journal is a very, very good newspaper that brought us such amazing series as "What They Know", which I believe started the practice of referring to the use of trackers as "spying" (much to the dismay of advertisers), which has certainly helped deliver the message that tracking is something users should worry about.


It is just you.


It's not just you. For me, there is so much more to life than news. Not worth any extra effort.



