Track HN: Survival Rate of Show HN Stories (nami.land)
230 points by namiwang on June 13, 2023 | 73 comments



I'm happy to say that the reports of my death here are greatly exaggerated :)

I'm the owner of both #4 and #140 on the Top-scoring Show HN Stories that Didn’t Survive... but both are very much alive!

#4 StackSort was a Github.com page, but in 2021 they made it so only Github.io works. If dang sees this, I'd really appreciate it if you could change the URL for https://news.ycombinator.com/item?id=5395463 to use github.io!

#140 ReadMe has the same io/com issue, in the opposite direction! We redirect readme.io to readme.com now, which seems to be why it's flagged.


How on earth did you get readme.com?

I'm assuming someone else owned it, whenever I see that and all the "make an offer" links I move on and ignore it. Was the process easy?


It wasn't called ReadMe originally... I happened to be browsing HN, and came across a post with someone offering readme.io for free, and I was like "oh that's a great name!" (I ultimately paid $3k as a thank you)

https://news.ycombinator.com/item?id=6397526

For the first few years, we used readme.io as our domain. When we did our Series A, I finally bought the .com for $170k. By that point I knew we were successful, and I figured the longer I waited the more it'd cost.


Wow, I can't fathom the .com being worth that much. Did you manage to do any sort of math on whether the .com has helped bring in $170k of sales, or how many years it would take to break even?


It’s an asset they can resell, and he validated the $170k price himself by paying that much.


It's only worth that much to a company named Readme. If no such company exists, and no company is willing to rename itself to that, then it's not worth that much. It's worth $170k times the probability that there is or will be a company that wants to buy it for that much.


> It's only worth that much to a company named Readme

It's worth that much to any business that wants readme.com and believes it to be worth 170k. You don't need to be called Readme to want that domain. There are plenty of tech companies that want vanity domains to point to parts of their offering. I can easily imagine Microsoft, Jetbrains, Atlassian, or Replit making an offer for it.

That said, by far the most likely way any company would acquire it would be as part of acquiring Readme as a whole. If you want to exit by selling (rather than IPOing) the value of your assets still plays a small part.


It’s certainly been worth it. For better or for worse, the .com projects a level of professionalism the .io does not. We work with many large companies, and I’m absolutely confident we’ve made our money back on it.


Someone has to be the first one.

I once registered web.site. I had it for a week or two before I tried pointing it at a server at which time someone noticed and took it off me.


> at which time someone noticed and took it off me.

This... isn't how domain registration works.


I mean it absolutely could be. For a lot of these weird TLDs, it's a single company that makes its own rules as to who owns a domain. Not saying the person you're replying to is right or wrong, but it's not out of the realm of possibility.


It was taken off me by the same people that I registered it with. They may have broken rules to do so. I just know they said I shouldn't have been given it.


.site is run by Radix. They run quite a few reasonably popular top-level domains (.tech, .space, .online, etc.). AFAIK their rules are entirely standard.


“Someone”…?


The main page https://gkoberger.github.io/ that the https://gkoberger.github.com/ link suggests going to gives a 404 as well. Could be a good idea to add a main page for https://gkoberger.github.io/ that links the StackSort page and anything else


Good call! I'll fix that up today.


If we're going by vote count, my "show HN" should be #30, but it's not on the list at all.


If you mean https://news.ycombinator.com/item?id=23965787, that still survives, right? Which is why it's not in the list of "Top-scoring Show HN Stories that Didn’t Survive" (where it would be #31).


> Extra: ChatGPT Gave a Wrong Regex

> I consulted ChatGPT for a regex to extract domains from urls, and it gave a flawed one:

^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n?]+).

It even gave reasonable detailed explanations which convinced me. Later tests revealed that this regex doesn’t work for url with @ in path, such as https://foo.com/@./bar. The correct one should be

^(?:https?:\/\/)?(?:[^@\/\n]+@)?(?:www\.)?([^:\/?\n]+).
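A quick way to see the difference between the two quoted patterns is to run both against the problem URL. This is only a sketch: it uses Python's `re` module, drops the trailing period on each pattern on the assumption that it is sentence punctuation rather than part of the regex, and the names `FLAWED`, `FIXED`, and `extract` are made up for illustration.

```python
import re

# Patterns copied from the quoted article, minus the trailing period
# (assumed here to be sentence punctuation, not part of the regex).
FLAWED = re.compile(r"^(?:https?:\/\/)?(?:[^@\n]+@)?(?:www\.)?([^:\/\n?]+)")
FIXED = re.compile(r"^(?:https?:\/\/)?(?:[^@\/\n]+@)?(?:www\.)?([^:\/?\n]+)")


def extract(pattern, url):
    """Return the first capture group (the supposed domain), or None."""
    match = pattern.match(url)
    return match.group(1) if match else None


url = "https://foo.com/@./bar"
print(extract(FLAWED, url))  # the userinfo group swallows "foo.com/" up to the "@"
print(extract(FIXED, url))   # the userinfo group may no longer cross a "/"
```

The only change between the two is the extra `\/` in the userinfo character class, which stops that optional group from reaching into the path.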

---------------------

The trick is to ask ChatGPT what the right tool for the job is in your language of choice. For python, ChatGPT will happily give you:

  from urllib.parse import urlparse
  extract_domain = lambda url: urlparse(url).netloc.replace('www.', '', 1)
  # Example usage
  url = 'https://foo.com/@./bar'
  domain = extract_domain(url)
  print(domain)  # Output: foo.com
-------------

I don't think RegEx is typically the "most" correct tool for the job for things which likely have built-in parser libraries (XML, HTML, URLs, JSON, etc)
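As a rough illustration of the parser-library route, here is a slightly more defensive take on the snippet above. It is a sketch rather than a definitive implementation: `extract_domain` is reused only as a convenient name, and stripping a leading `www.` is carried over from the snippet as an assumption about what counts as the domain. It reads `hostname` instead of `netloc`, so credentials and ports are dropped, and it removes `www.` only as a prefix.

```python
from urllib.parse import urlparse


def extract_domain(url: str) -> str:
    """Return the lowercased host of a URL without a leading 'www.'."""
    # urlparse only fills in the host when the URL has a scheme or starts with "//".
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    host = urlparse(url).hostname or ""
    # Strip "www." only at the start, so "awww.example.com" is left alone.
    return host[4:] if host.startswith("www.") else host


print(extract_domain("https://foo.com/@./bar"))
```

Using `replace('www.', '', 1)` on `netloc` would also rewrite a host such as `awww.example.com` and would keep any port or `user@` prefix, which is why this version checks the prefix explicitly.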


Nice work!

I'd actually be interested in factors that make a Show HN a success vs failure.

Objectively, there's an obvious one in your dataset: time of submission. Tuesday afternoon (which timezone? I assume US west coast?) seems to be key. No way this correlates with the quality of submissions.

Subjectively: it seems to have become much harder recently. A couple of years ago I managed to reach the front page for a short time with an Android app; now I'm barely able to get above 20 points, even though the product is (again, subjectively) cooler and has a possibly wider audience (https://news.ycombinator.com/item?id=35671245).

Not complaining, but perhaps nowadays Show HN is not an easy way anymore to "get the word out" and get some early user feedback for and from indie hackers? Any other sites that might be of interest?


Its badge on a product's home page is to me a negative signal, but partly since it does still happen (quite a lot) - people do seem to use ProductHunt.

(I suppose I'd use it - and pretty much anything - but just not put 'omg #1' badge on my site, if I had something to launch myself.)

Completely tangential now, but I think its problem is right in the title - who is hunting a product? It's a complete echo chamber, surely nobody who doesn't have something to launch is actively using it - 'it's Wednesday so I need a new Gmail-integrating Jira spline reticulator'.


> which timezone?

I’m wondering the same. Earlier in the article he mentions UTC.

So it’s either afternoon or early in the morning Pacific time.
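To make the conversion concrete, here is a small sketch. The 15:00 UTC hour is purely illustrative (it is not taken from the article), and a fixed UTC-7 offset stands in for Pacific Daylight Time so the example needs no time zone database.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-7 offset approximates Pacific Daylight Time.
PDT = timezone(timedelta(hours=-7), "PDT")

# Hypothetical heat-map cell: Tuesday 15:00 UTC (the hour is made up).
utc_slot = datetime(2023, 6, 13, 15, 0, tzinfo=timezone.utc)
pacific_slot = utc_slot.astimezone(PDT)

print(utc_slot.strftime("%A %H:%M %Z"))
print(pacific_slot.strftime("%A %H:%M %Z"))
```

An afternoon cell in UTC lands in the Pacific morning, whereas an afternoon peak in Pacific time would show up in the late-evening cells of a UTC heat map.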


No affiliation, but the second to top deceased site is still alive and kicking [0]

Spot checking the top results might give a better estimate for how many are actually alive vs. just using bot protection.

[0]https://news.ycombinator.com/item?id=35543668


It just errors out right now. How can we differentiate: always errors out vs dead?
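One hedged way to approach this is to check each URL several times, spread over days, and only call a site dead when every attempt fails in the same hard way. The sketch below works on already-recorded outcomes rather than making requests; the outcome labels and the `classify` helper are invented for illustration and are not how the article's crawler works.

```python
from collections import Counter

# Hypothetical outcome labels recorded by some separate checker.
OK = "ok"                  # 2xx/3xx response
BLOCKED = "blocked"        # 403/429 or a bot-protection challenge
TLS_ERROR = "tls_error"    # e.g. expired certificate
SERVER_ERROR = "5xx"       # site or hosting provider is erroring
UNREACHABLE = "no_dns"     # domain does not resolve


def classify(outcomes):
    """Summarise repeated checks of one URL as alive, degraded, unknown, or dead."""
    counts = Counter(outcomes)
    if not outcomes:
        return "unknown"
    if counts[OK] > 0:
        return "alive"
    if counts[BLOCKED] or counts[TLS_ERROR]:
        # Something is still answering, just not serving the crawler cleanly.
        return "degraded"
    if counts[UNREACHABLE] == len(outcomes):
        return "dead"
    # Only server errors, or a mix: could be an outage, so do not decide yet.
    return "unknown"


print(classify([SERVER_ERROR, SERVER_ERROR, OK]))
print(classify([UNREACHABLE, UNREACHABLE, UNREACHABLE]))
```

The design choice is to be conservative: a single failed check, or a run of 5xx errors during a hosting outage, stays "unknown" instead of being counted as a death.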


Vercel (and AWS) are down right now, hence the error.


https://harvestsignal.com/ is also still alive, but the site certificate expired.


Thanks for making and sharing this - although I'm surprised it's not a "Show HN" itself!

I was curious about the top post that didn't survive - an HTML5 game called "airma.sh" - and I wanted to check it out. I think I found a working mirror: https://www.crazygames.com/game/airmash

It's possible that this is a different game, but it seems to fit the description.

Interestingly, the person who submitted that post stopped being active on HN after that discussion.


Airmash lives very well on this community hosted site: https://airmash.online/

The original author was never to be heard from again.


I know you mention there are lots of reasons for false positives and negatives, but does your methodology account for length of time at all? Meaning, if a project was posted to HN in 2009, it could have been successful for 14 years and then closed down, or just changed URLs somewhere along the way, and in that case it would be counted as a failure even though it wasn't. Likewise, if it was posted in May, 2023 and is still around, that doesn't mean much because it's still flying the Grand Opening banner, practically.


Exactly. Some of these graphs are really flawed. Like the heatmap for the top 1% which pretty much mirrors the submission heatmap. I want to see what portion of submissions for that time slot reached 1%, not of all submissions. There could be time slots that perform exceedingly well outside of popular times.
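The rate being asked for can be sketched as follows: for each time slot, divide the number of top-1% submissions by all submissions in that slot. The rows below are made-up placeholders, not the article's data, and `top_rate_by_slot` is a hypothetical helper.

```python
from collections import defaultdict


def top_rate_by_slot(submissions):
    """Map each (weekday, hour) slot to the share of its submissions flagged as top 1%."""
    totals = defaultdict(int)
    tops = defaultdict(int)
    for weekday, hour, is_top in submissions:
        totals[(weekday, hour)] += 1
        if is_top:
            tops[(weekday, hour)] += 1
    return {slot: tops[slot] / totals[slot] for slot in totals}


# Made-up rows: (weekday, hour, reached_top_1_percent)
sample = (
    [("Tue", 15, True)] * 2 + [("Tue", 15, False)] * 6
    + [("Sun", 3, True)] + [("Sun", 3, False)]
)
rates = top_rate_by_slot(sample)
print(rates)
```

In this toy data the busy Tuesday slot has more top posts in absolute terms, but the quiet Sunday slot has the higher rate, which is exactly what a heatmap of raw counts would hide.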


The top 250 has 8 dead projects from 2023. Of those 8, 5 are not dead at all, 1 is alive but has an expired certificate and only 2 (the lowest ranked) are dead. This does not seem like useful data.
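Taking the counts in this comment at face value, the share of flagged 2023 projects that are really gone works out as below. This is just the comment's arithmetic restated; whether the expired-certificate site counts as alive is a judgment call.

```python
# Counts quoted in the comment for 2023 entries in the top 250.
flagged_dead = 8
actually_alive = 5
alive_expired_cert = 1
actually_dead = 2

assert actually_alive + alive_expired_cert + actually_dead == flagged_dead

# Share of "dead" flags that are correct, counting the expired-certificate site as alive.
precision = actually_dead / flagged_dead
print(f"{precision:.0%} of the flagged 2023 projects are actually dead")
```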


That's definitely a red flag, although I'd expect the 2023 data to have a disproportionate number of false negatives relative to true negatives (since the vast majority of 2023 projects are still alive).


Airmash still lives at https://airmash.online/ and there’s also a space mod - Starmash - at https://airmash.cc/

I apologise in advance for the hours you’ll lose to these (again?)


> Looking for a Sponsor to Host the Database Publicly

> In the meantime, it’d be great if anyone can query the database. I tried to host a public database and real-time query interface online, but couldn’t afford the bill for a smooth Postgres instance to hold around 20G (40M rows plus indices) data. While a $20 instance could suffice, it’s pretty slow from usable, comparing to the local one on my M2 MacBook Air.

Here is the database with publicly available SQL endpoint: https://play.clickhouse.com/play?user=play#U0VMRUNUICogRlJPT...


Nice, but seems to be last updated 2022-12-12 and funnily the IDs that don't exist have a time of 1970-01-01 00:00:00
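The 1970 date is most likely the Unix epoch: a timestamp of 0, which is what a missing time often ends up stored as, renders as midnight on January 1, 1970 UTC. That the dataset uses 0 for missing rows is an assumption here, not something confirmed in the thread.

```python
from datetime import datetime, timezone

# Assumption: rows for nonexistent IDs carry a default timestamp of 0.
missing_time = datetime.fromtimestamp(0, tz=timezone.utc)
print(missing_time.strftime("%Y-%m-%d %H:%M:%S"))
```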


Great visualisation. I was quite surprised that the submission dates and times appeared unimodal around an American morning peak.


Using a stacked barchart for dead vs alive isn't a great choice in my mind. Normalize to 100% please.


n=1 but I know at least one non-american who has stayed up late so that the submission coincides with this peak time


Regarding database hosting, if you would consider giving the data away, I would suggest converting it to an SQLite database and sharing it over Torrent.


I'm guessing OP wants to share a database that's always up to date.

A torrent containing a single sqlite file would be good for a snapshot in time, but each update would require a new torrent, even if it only contains the updates since the base or last release.

IIRC IPFS can be used to distribute files that change over time, with only the changes being transferred, although of course there would need to be a place where OP publishes the hash of the most recent file.

In either case, someone would need to seed the file to guarantee it's always available.


I second this. You've done a great service to collect this data. I'm guessing the file must be much smaller than 20GB when compressed.


I also did an experiment by generating and searching embeddings for all the comments on HN. Here is the walkthrough: https://www.youtube.com/watch?v=hGRNcftpqAk


It is only around 5 GB in ClickHouse. Details: https://github.com/ClickHouse/ClickHouse/issues/29693


Neat idea, thanks for sharing.

Curious choice to highlight Show HNs that didn't survive, but not the ones that did.

Is there a reason for this?


Same, I read the article twice in case I missed it, but no, nothing about the ones that did survive, even on the "more data" section.


> Send me your interesting queries

I'd be interested to see what the top Show HN posts were, after adjusting for the growing size of the HN community. That is, posts from 10 years ago would not have garnered as many upvotes simply because the community was smaller, and presumably posts were upvoted less back then, in general.

I don't know the best way to measure this; it could be normed based on the median number of upvotes for the top story each week, bucketed by month. Probably someone has a better idea for this.
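One way to sketch the normalization proposed here is to divide each story's score by the median score of stories from the same month. The data is invented, the monthly median over all stories is a simplified stand-in for the "median top story each week" idea, and `normalize_scores` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import median


def normalize_scores(stories):
    """Return (title, score / median score of that month) for each story."""
    by_month = defaultdict(list)
    for title, month, score in stories:
        by_month[month].append(score)
    baselines = {month: median(scores) for month, scores in by_month.items()}
    return [(title, score / baselines[month]) for title, month, score in stories]


# Invented rows: (title, "YYYY-MM", score)
stories = [
    ("old hit", "2013-06", 300), ("old typical", "2013-06", 50), ("old small", "2013-06", 10),
    ("new hit", "2023-05", 600), ("new typical", "2023-05", 200), ("new small", "2023-05", 40),
]
for title, relative in normalize_scores(stories):
    print(title, relative)
```

Here the older hit has the lower raw score but the higher relative score, which is the effect the adjustment is meant to surface.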


I am also, along with gkoberger, happy to say that we didn't die after our Show HN (Show HN: A Covid-19 testing location site that a group of us are building)

https://news.ycombinator.com/item?id=22650725

In fact we were so successful that we were able to shut it down less than a year after we started (It's on the list as a very reasonable Type II error ;))

Thanks to the HN community for helping us get an amazing Temporary product out and shut down successfully


Recently I was browsing through old threads where users showed off their personal websites and blogs. I wanted to find some inspiration for my own website.

What I found instead were about 3/4 dead links – even though the threads were all from the last 4-5 years. I found that quite sad, because people often talked with great passion about their websites and they sounded really cool. Also, I LOVE those small, personal islands in the big, commercialized and in many ways centralized web.


Sadly that is nothing new. I used to run a website gallery, and the rate of link rot is incredibly high.

Same is true for another couple of projects I’m running now. I’m collecting personal websites and quirky small web experiments and the same is happening there.

Somewhat related is the phenomenon of dead blogs. Plenty of those with a couple of interesting posts and then abandoned.


> So I’m looking for a sponsor to host the database publicly. I need one mediocre VM for a Rails stack app and a semi-powerful hosted Postgres instance. Contact me if you’re interested

The Oracle Cloud Free tier is a great deal. They give you 4 Ampere A1 Cores + 24 GB RAM + 200GB storage for free. More than enough for a 20G (40M rows plus indices) Postgres instance.


Is there a way to see how long a link stays on the HN front page on average, and whether that average is rising or falling over time? I read that the average time a hashtag spends on the Twitter trending page has been falling year over year, indicating that people are paying less attention to any one thing.


I'd love to get some correlation with rank, or even filtering of lower scoring posts.

From what I know, HN posts are often used as a signal for viability of a project. In that case, you can't make a conclusion on the effectiveness of Show HN posts, because some of them will die off by design.


Just a silly aside with regards to the regex to extract domains from URLs, my little tool called unfurl [0] exists to solve that exact sort of problem :)

[0]https://github.com/tomnomnom/unfurl


bagder (of curl) also made trurl to address URL manipulation:

https://github.com/curl/trurl


Phind (#2 on your list) is still up and running also (https://www.phind.com/search?q=false%20negative&source=searc...).


How do you have 40mm rows of data on Show HN for only ~126,000 stories?


Comments and the stories that are not "SHOW HN".

From TFA:

> For this analyze, I considered submissions made before May 31, 2023, 23:59 UTC. The dataset consists of 4,714,023 stories and 30,363,533 comments from 867,097 users.


My Show HN from 2013 is still alive but it's listed as dead (#590). Probably because the link from the post uses https but my 301 redirect only works using http.


Oh, Airmash is dead. I remember seeing it on HN and then spending half of my workday that day playing it.


The community revived it to https://airmash.online/ pretty sharpish, does this count as dead?


This is neat! One of my sites is on this list - I'm gonna have to put up a 418 on it as well.


Is this why HN was so slow yesterday?


Very cool!

Did you have any conclusions?

I had a look at the page, couldn't see anything you'd written up :-)


What is the timezone for the heat maps? I assume UTC but wanted to check.


You're telling me substack.com doesn't even make the top 100?


If you're referring to the domains, it's by submission count. Presumably only one Show HN was linked to substack.com.


It's just strange to me because medium.com comes up as #4, but in recent years Substack links get posted very often.


People post Show HNs and link to substack.com? I guess I don't understand why medium.com would show up either, but I can't recall seeing a substack link for a Show HN.


I guess you're right. I checked out the substack domain on HN and only found one obvious Show HN thread for it, and the overall list of threads was surprisingly short. I feel like I've seen enough Substack threads come up here that, if Medium were linked to for a lot of Show HNs, that Substack should be on that list somewhere even if it's dead last. My impression that Substack links were competing with Medium links on HN was definitely not correct.


What about low ranking Show HN that did survive?


What timezone is used for the submission heatmap?


The pandemic really got the activity going during 2020 (first bar chart), but maybe that's not so surprising with everyone pivoting to remote work. And obviously all the discussions about vaccines and how different governments were handling things.


Phind, the 2nd entry, is alive and well.



