Evolving “nofollow” – new ways to identify the nature of links (googleblog.com)
130 points by doener on Sept 10, 2019 | 61 comments



It's the crisis of the hyperlink: it's diminished to the point where Google is afraid of losing its previously dominant signal for good.

Personal publishing has moved almost entirely to platforms that `nofollow` links. And commercial sites do whatever it takes to avoid linking anywhere but back, deeper into their own site.

There have been some cases where I think people are behaving somewhat irresponsibly with "nofollow": the German Wikipedia, for example, applies it to all outbound links.

That's understandable from a spam-fighting perspective. But if search engines actually enforce it strictly, it deprives the search index of vast amounts of collaborative filtering.

I actually just checked, and couldn't find nofollows on either HN or Reddit links. Are they somehow setting them via some method other than the actual link tag?

In any case, I was going to suggest that nofollow makes sense for the "new" queue, but it wouldn't hurt and possibly help to remove it once user-generated content has reached certain milestones, e.g. user karma or hitting the front page.


A related source of great frustration: you put some good stuff on the web. Lots of people link to it, but almost entirely from social media platforms where every link is a nofollow. Somebody else, working on gaming the system with SEO, makes a less valuable and useful thing but obtains (by subterfuge or diligent direct asking) some non-nofollow links. Your high-quality content will be outranked in search results by the other, lower-quality content.

A reasonable improvement to this, to reinvigorate the hyperlink: social media platforms could stop using nofollow for links put in by users who themselves have built up a degree of reputation.


It really does seem hard to get visitors these days, even for original niche content. The first page of Google search results is half social/video/news carousels instead of genuine long-term content, and a lot of the other links point to repetitive articles published by generic big-name websites that never really add anything new.


This is the point of nofollow: to get rid of small websites and to prevent new ones. The spam story is nonsense; it does nothing related to spam protection.


The thing is, all this really does is promote spamming and self-promotion; it's of no real direct benefit to the user.


The English-language Wikipedia used to take an... interesting approach to nofollow, not sure if they still do. External links to most websites were nofollow with the exception of cross-wiki links to approved wikis including links to Jimmy Wales' commercial website Wikia. So the net result was that the trusted sources which Wikipedia heavily relied on got no boost but Wikia got a substantial boost in ranking.


> couldn't find nofollows on either HN or Reddit links

HN nofollows job ads and posts under ~10 points (it's 5 in the open-source code, but it seems HN has been modified there). Old Reddit has rel="nofollow" but then removes it via JavaScript for some reason. New Reddit serves Old Reddit if it detects that Googlebot is asking.
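For the curious, here's a minimal sketch of that old-Reddit pattern; the markup, class name, and script are hypothetical illustrations, not Reddit's actual code:

    <!-- As served: crawlers parsing the raw HTML see a nofollow link -->
    <a class="outbound" href="https://example.com" rel="nofollow">example</a>

    <script>
      // After load, strip nofollow so anything reading the live DOM
      // sees a plain link; crawlers that don't run JS never see this.
      document.querySelectorAll('a.outbound[rel~="nofollow"]')
        .forEach(function (a) { a.removeAttribute('rel'); });
    </script>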


> It's diminished to the point where Google is afraid of losing its previously dominant signal for good.

Is Google afraid of losing that signal? I don't have any data to back this up, but my guess is that Google's reliance on PageRank is greatly diminished now, and they probably use search result click-through rate and website engagement data (from Google Analytics and ads) as the primary signals for website relevance and quality.


PageRank is not an important signal anymore, but Google still uses links in other ways.

https://moz.com/blog/2019-study-on-link-ranking


> but it wouldn't hurt and possibly help to remove it once user-generated content has reached certain milestones, e.g. user karma or hitting the front page.

Wouldn't that also give a lot more incentive to reach those milestones illegitimately?


It's funny because Google created this monster by attaching a value to links.


I don't plan on implementing these, and neither should you, unless they get added as part of an official HTML specification. Maybe they are, but I don't see them on this list of link types [0]. Until they get formally specified, to me this is just Google doing Google things and fragmenting the web.

[0] https://developer.mozilla.org/en-US/docs/Web/HTML/Link_types
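For reference, the three values the announcement describes look like this in markup (the hrefs here are just placeholders):

    <!-- Paid or sponsored placement -->
    <a href="https://example.com/offer" rel="sponsored">advertiser link</a>

    <!-- Link posted by a user (comments, forum posts) -->
    <a href="https://example.com/thread" rel="ugc">user link</a>

    <!-- No endorsement, no ranking credit -->
    <a href="https://example.com/unvetted" rel="nofollow">unendorsed link</a>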


> unless they get added as part of an official HTML specification. Maybe they are, but I don't see them on this list of link types [0]

Extending link types is in the HTML specification:

https://html.spec.whatwg.org/multipage/links.html#other-link...

> Extensions to the predefined set of link types may be registered in the microformats wiki existing-rel-values page [http://microformats.org/wiki/existing-rel-values#HTML5_link_...].

where "may" is defined in the usual way (they aren't in there, but the page hasn't been updated for over a year, so I'm not sure if anyone cares about it anymore after already adding so many extension values).

It's common to have user agent (in this case googlebot) specific rel values.


That's a fair point, thanks. I do still feel that the attribute is woefully underspecified (evidenced by the microformats list not being updated, and by the fact that it's a wiki), so in my opinion the least Google could do here is shore that up. If they contributed to the wiki or formalized it a little (like the HSTS list or the public suffix list), it would go a long way toward improving how I feel about their additions to the attribute. I realize that's quite a lot to ask for "just" a few new link types, though.


To be fair, the WHATWG HTML "Standard" is also nothing more than a Wiki, or even just a collaborative space where everything can change at any time.


> To be fair, the WHATWG HTML "Standard" is also nothing more than a Wiki, or even just a collaborative space where everything can change at any time.

Well, unless you're waiting for the second coming of someone official to come along and bless an RFC, all standards are born of a collaborative space where everything can change at any time. Versions are just snapshots, and good luck if you want to stick with one forever (how's that TLS 1.0 server going to work out after March 2020?).

Anyways, if you want an official© W3 spec, here it is in HTML5 https://www.w3.org/TR/html50/links.html#other-link-types, HTML 4.01 https://www.w3.org/TR/html401/types.html#type-links, or hey, even back to HTML 2 https://tools.ietf.org/html/rfc1866#section-5.7.3


I'm well aware of the various W3C and WHATWG HTML specs (see my site at [1]); that was kind of my point ;)

[1]: http://sgmljs.net/


It feels like yet another way for Google to offload the hard work of indexing the web onto other people.

See also: microdata, and the monthly alerts I get from Google about the content of my sites being malformed, yet they validate just fine in a dozen other tools.

If this keeps up, eventually we'll have to log in to Google to provide it with the URLs of new content, and fill in all of the metadata about each page. All in the name of whatever buzzword Big G comes up with that month.


>If this keeps up, eventually we'll have to log in to Google to provide it with the URLs of new content, and fill in all of the metadata about each page

FWIW, most people who care about their search ranking would absolutely love a tool like this. The point of the automated crawl isn't just to find the information, it's because if you let people submit their own content to the index they'll lie about what it is.


> hard work of indexing the web

IMO if you mean that Google should check linked-to content rather than relying on link qualifiers, Google has been doing a bad job in recent years, as I can't find useful material among an ocean of low-effort clickbait most of the time. OTOH, relying on metadata by publishers won't solve this problem.


Dare I say that a better indexed web is a public good that we all benefit from immensely.

Logging into Google and providing it with the URLs isn't analogous, because that only benefits Google; it raises a barrier to entry for competitors, which isn't good for us users.


> Dare I say that a better indexed web is a public good that we all benefit from immensely.

Sure, if it were used for the benefit of the public, rather than the benefit of Google.


I mean we collectively did this to ourselves by letting browsers be super permissive about invalid markup.

The only reason that Google is able to do this is because things that don't know what rel="ugc" means will silently ignore it.


They would have found another, probably less elegant, way to do the same thing. E.g. they'd put the extra info in HTML comments.


Oh, hi Microsoft.

<!--[if gte IE 6 ]><![endif]-->


You're confusing forward-compatibility (ignore markup with tags you don't recognize, which is awesome) with accepting invalid markup (which is stupid).


I run large forums and mark my links "nofollow". I see no reason or benefit to me to change to or add "ugc".

It's not clear that there are any benefits for me.

And it's vague enough that I don't know that there are not downsides.

Seems best to do nothing.


Complete agreement. Furthermore, bad actors out there will abuse "ugc": they will mark terabytes of robot-generated content as "user-generated". It will not be possible to rely on it as an indication of organic content or anything of the sort. Search engines won't be able to use that for ranking, for instance.


This change is not for you. This change allows them to give weight to facebook/twitter/reddit links if the other signals are present.


If I know that I'm traversing and processing facebook/twitter/reddit links, doesn't "nofollow" already indicate user generated content? (Even if some "nofollow" links do not do that, I can probably separate those based on their position in the documents.)

If you're going to treat specific sites specially, then go all the way and be prepared to have your logic understand any and all aspects of their structure. Or else, don't bother.


>> "...no reason or benefit..."

I donno, seems legit to me. Adding "nofollow" just means "I'm not accountable for this" -- it's overloaded to cover both "My site is linking here but I didn't vet it first" (the UGC version) and "This isn't actually me, it's an advertiser borrowing space on my site" (the sponsored version).

Your site/forum might want "credit" for being a sort of attention aggregation hub without necessarily taking responsibility for all the content your users post -- you may want a softer middle ground if such a thing becomes available. But in the advertiser version you're just renting out page real estate; your relationship with those links ends the moment the ad is displayed.

Giving you a way to differentiate between the two could let you benefit from ranking and placement that more appropriately takes your user content into consideration while better filtering out noise from ad content.
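Concretely, a forum could then mark the two cases differently; a sketch with made-up URLs:

    <!-- A link a user posted: unvetted, but genuine attention -->
    <a href="https://example.com/cool-article" rel="ugc">check this out</a>

    <!-- A banner an advertiser is renting: relationship ends at display -->
    <a href="https://ads.example.net/widgets" rel="sponsored">buy widgets</a>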


So... this is pretty much saying they will now analyze all links, irregardless of "nofollow". Previously they outright ignored "nofollow".


Not enough signals when everything is tagged "nofollow"


Nitpick: There's no such word as "irregardless". Escorts myself out


Slightly tangential, but this comes to mind:

We are not here concerned with so-called computer 'languages', which resemble human languages (English, Sanskrit, Malayalam, Mandarin Chinese, Twi or Shoshone etc.) in some ways but are forever totally unlike human languages in that they do not grow out of the unconscious but directly out of consciousness. Computer language rules ('grammar') are stated first and thereafter used. The 'rules' of grammar in natural human languages are used first and can be abstracted from usage and stated explicitly in words only with difficulty and never completely.

--Walter Ong [Orality And Literacy]


For every word in existence, there was someone who used it first.


I don't think it's that simple. There's a chicken & egg paradox at play.


Are you trying to start an inflame war?


Nitpick: there is such a word according to MW [1]

[1] https://www.merriam-webster.com/dictionary/irregardless


1. I only recognise the Oxford English Dictionary, not those upstart Americans ;-)

2. Even the Americans say (from your link):

"Irregardless was popularized in dialectal American speech in the early 20th century. Its increasingly widespread spoken use called it to the attention of usage commentators as early as 1927. The most frequently repeated remark about it is that "there is no such word." There is such a word, however. It is still used primarily in speech, although it can be found from time to time in edited prose. Its reputation has not risen over the years, and it is still a long way from general acceptance. Use regardless instead."

That doesn't read as accepting the 'word' beyond the most technical noting that some people use it.


Dictionaries are historians of usage not legislators of language. At least in English, where we have no equivalent to the "Académie française" (suck it, Jonathan Swift).


Though it considers it "non-standard" the word also exists in the Oxford English Dictionary: https://www.oed.com/view/Entry/99668


It actually sounds like they've already been analyzing all links, regardless of "nofollow". Which is not surprising in any way.


Seems so. They note at the end that they still respect noindex and of course robots.txt, though.


From the article:

> rel="nofollow": Use this attribute for cases where you want to link to a page but don't want to imply any type of endorsement, including passing along ranking credit to another page.

This means the meaning of 'nofollow' is changing? That seems a horrible idea. Previously 'nofollow' meant exactly that - "don't follow this link please, googlebot" - now it will mean "follow this link, but don't pass my site's ranking credit to the destination." That's a VERY different use case. I can't see all the millions of existing 'nofollow' tags being changed by site owners to any of these new tags. Surely a 'nogrant' or some such would be a better option, leaving 'nofollow' alone.


Question that wasn't answered: should user-generated content links that may or may not be sponsored (suppose the site owner neither knows nor cares) be marked "ugc", "ugc sponsored", or "ugc nofollow"?


I think just "ugc" from Google's perspective, but they mention that you can use "ugc nofollow" for other services that don't understand "ugc".
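Since rel takes a space-separated list of values, the backward-compatible form is just both tokens on one link (placeholder URL):

    <!-- Crawlers that understand "ugc" read it; older ones fall back to "nofollow" -->
    <a href="https://example.com/user-post" rel="ugc nofollow">user link</a>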


ugc: basically "someone I didn't know posted this". Treat it as neutral.

sponsored: "someone paid/traded for this". Treat it as distrustful.

Both: "someone I don't know posted this and I made money". Treat it as fake.


Nofollow and its interpretation have always been interesting. Professional SEOs analyze sites' "link profiles": the number of nofollow, follow, anchored, raw, etc. URLs that point to your site.

Consider a super-spammy site that eschews any link but a "follow": well, that's weird - that may be an indication they're a spammer.

So while this (quite likely correctly) states that "nofollow" links carry no search-rank weight, it's not the entire story.


Not sure that many sponsors will be happy to keep sponsoring content if they get a "sponsored" (pretty much nofollow) link.


Seems like a net win.


It's not a win, it just means nobody will use that tag.


I also propose a semantic <ad> HTML tag, which would behave like <section>, to represent parts of a document that are an ad.

With an optional do-not-block attribute.
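Something like this - to be clear, an entirely hypothetical sketch that no browser implements:

    <ad do-not-block>
      <!-- hypothetical semantic wrapper marking this section as an ad -->
      <a href="https://ads.example.net/offer" rel="sponsored">buy widgets</a>
    </ad>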


> With an optional do-not-block attribute.

Which would be honored just like Do-Not-Track? Seems only fair.


I'm curious if any attempt was made to submit this as a standard inclusion to the IANA link relations registry:

https://www.iana.org/assignments/link-relations/link-relatio...


I appreciate nofollow and ugc. I run a site with user-generated content, and worry about spammers overwriting the site with junk content. Having a way to indicate that a link is from someone else is useful for eliminating the incentives for spammers.


"Do more work so that we have more info. No, it doesn't give you any benefit."


Nofollow made sense, but if everyone is using it for all links there is no value in it and the tag should be ignored.

I like the change: it allows them to ignore nofollow for facebook/twitter/reddit, or for smaller sites that nofollow all links.


I'm pretty sure google has already been treating some nofollow links as if they weren't nofollow for some time now.


It makes organic indexing impossible; there is no point making your own website like this. It's even insulting to call hard-worked content spam.


nofollow is 15 years old? Holy crap I'm old!!!



