ChangeDetection, monitor any website change (github.com/dgtlmoon)
253 points by serhack_ on Sept 1, 2023 | 56 comments

A few years ago, when I was a junior engineer, I worked for a company that gave us free tickets to any sports team in our city. I'm not a sports fan, but my stepdad loves hockey. When the tickets went up, they got reserved pretty fast.

I noticed that the site we used to reserve the tickets had a predictable slug like: `{company}-hockey-tickets-2016-season`, and `{company}-hockey-tickets-2015-season`. The sites also didn't change between each refresh.

So, when it was nearly time for the 2017 season to start, I wrote a script which hashed a GET request of the site at `{company}-hockey-tickets-2017-season`. If the hash changed, it would send me a text message using Twilio, and I would know to immediately get on a machine and reserve the tickets we wanted. After a couple of false positives, the page eventually went up before it was announced and I reserved the tickets I wanted.
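For anyone curious what a script like that might look like, here is a minimal sketch, not the original. It hashes the response body with SHA-256 and polls until the hash differs from the first one seen. The function names, the polling interval, and the `notify` callback are all assumptions made for illustration; the text-message step is left as a plain callable rather than a real Twilio call.

```python
import hashlib
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def fetch(url: str) -> bytes:
    """Return the body of a GET request, including the body of an error page."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read()
    except urllib.error.HTTPError as error:
        # A 404 page is still a page: hashing its body means that the real
        # page going live later shows up as a change.
        return error.read()


def content_hash(body: bytes) -> str:
    """Hex SHA-256 digest of the response body."""
    return hashlib.sha256(body).hexdigest()


def watch(
    url: str,
    fetch_page: Callable[[str], bytes] = fetch,
    notify: Callable[[str], None] = print,
    interval_seconds: float = 60.0,
    max_checks: Optional[int] = None,
) -> bool:
    """Poll `url`; call `notify` once and return True when the body hash changes.

    Returns False if `max_checks` polls pass without a change.
    """
    baseline = content_hash(fetch_page(url))
    checks = 0
    while max_checks is None or checks < max_checks:
        time.sleep(interval_seconds)
        checks += 1
        try:
            current = content_hash(fetch_page(url))
        except urllib.error.URLError:
            # Transient network failure: skip this round instead of alerting.
            continue
        if current != baseline:
            notify(f"{url} changed")
            return True
    return False


# Hypothetical usage (placeholder host, slug pattern from the comment above):
# watch("https://example.com/{company}-hockey-tickets-2017-season")
```

Any byte that differs between two fetches (a timestamp, a CSRF token) changes the hash, which is one plausible source of the false positives mentioned above.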

Unfortunately, the office manager who did these things told me I couldn't jump the gun and un-reserved my tickets. After the official announcement went up, the spots I had originally reserved quickly went to someone else.

Yeah, that office manager probably knew who would be getting the tickets in advance.

Working hard: writing a script to detect when the website goes live.

Working smart: bring the office manager a pastry, ask them about their day, and drop hints about how excited you are for the tickets this year.

Engineer vs Non-Engineer approach.

You mean social engineering.

I'd rather pay scalper prices than do that

And that's when I'd make that script and the knowledge behind it available publicly and advertise it on social media :)

It seems like bad company policy to let a bunch of techies fight over a limited resource via web requests. It's like network inspector gladiators. Maybe that was the point? Lol

Why is the hash of the request better than comparing strings?

I assume they meant they were hashing the response rather than hashing the request. Then you don't have to keep the entire response around for comparison, only its hash.

Content on the page with slug `{company}-hockey-tickets-2016-season` changed, not the slug itself.

I think mhb's point is that the size of the request content is so small, it can be compared byte-for-byte directly instead of hashing and comparing hashes. Many of us have a kind of muscle memory where we always hash everything for comparison, but for small data there is not really a point.
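To make the trade-off concrete, here is a small sketch of both approaches side by side. The class names and the sample page bodies are made up for illustration. Both detectors give the same answer; the only difference is what has to be kept between checks: the whole previous body, or a fixed 32-byte SHA-256 digest.

```python
import hashlib


class RawSnapshot:
    """Keeps the full previous body and compares byte-for-byte."""

    def __init__(self, body: bytes) -> None:
        self.stored = body

    def changed(self, body: bytes) -> bool:
        return body != self.stored


class HashSnapshot:
    """Keeps only a SHA-256 digest of the previous body."""

    def __init__(self, body: bytes) -> None:
        self.stored = hashlib.sha256(body).digest()

    def changed(self, body: bytes) -> bool:
        return hashlib.sha256(body).digest() != self.stored


# Made-up page bodies, purely for illustration.
old = b"<html>Tickets are not available yet.</html>"
new = b"<html>Reserve your tickets now!</html>"

raw, hashed = RawSnapshot(old), HashSnapshot(old)
print("raw detector:", raw.changed(new), "| hash detector:", hashed.changed(new))
print("bytes kept:", len(raw.stored), "(raw) vs", len(hashed.stored), "(hash)")
```

For a page this small the raw copy is about the same size as the digest, so hashing buys nothing. It starts to matter when the body is large or when you would rather not store the content at all.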

Good stuff. Niche idea for ya: consider government contractors. Gov sites tend to be atrocious. Manual change tracking can mean you find a $1M RFP weeks earlier, which increases chance of winning. (Consider mda.mil. There's no email signup. What they consider a 'newsletter' is in fact a list of PDFs hosted on their site.)

And that's federal gov contracting. State/local is even more of a Wild West. So much so that many, maany companies just never have the bandwidth to even find RFPs and navigate those ancient sites.

While that may be true if contractors/competitors for contracts were generally unaware of the contract until posting, that’s usually not the reality. A lot of government contracting advice from experienced contractors is that if you’re discovering it via a public source (i.e. the website) you’re too late.

This I am afraid is not true either, though. I know many dual-use companies who discovered an RFP (not just on SAM), late -- i.e. with a proposal deathmarch -- and got through.

My understanding is that those scenarios are the exception, and finding obscure contracts that have gone undiscovered isn’t really an untapped fount of free, missed, opportunity.

That said, aggregators like bonfire etc. can get you pretty far if that’s your strategy!

Yeah it's probably true that such things are the exception. But for a startup, a chance at e.g. a $256k SBIR Phase I and then $1.9M Phase II and the modern follow-on options in e.g. the Army (called the CATALYST program and other things) -- this one exception turns the company into something real.

So because the amounts are large (compared to commercial, in terms of when a customer first buys from a small company), those cases really make a huge difference.

Oh yeah, grants are great, but a completely different beast!

I have to counter yet again, apologies. :) SBIRs yes are considered grants, but I've seen the same phenomenon for e.g. CRADAs and OTAs. Functionally this is all similar: we find them often on bad gov websites, and an automated tool to track changes can greatly help many companies to win them.

How can you be "too late?" Isn't the point of RFP postings that they're the first step in a process of soliciting proposals (bids) for the requested project? They're certainly not supposed to be granted on any kind of first-come, first-served basis... combatting that sort of corrupt insider dealing is the whole reason there are legal requirements to post them in the first place. So while I don't doubt the premise of your cynical truism that "if you're discovering it via a public source you're too late," I hope that this is a separate issue from the RFP process itself, because any project where that's the case must be a project where the RFP was meaningless in the first place. And that behavior is also explicitly illegal, even if it's unfortunately prevalent and infrequently prosecuted.

Funny how much better government might get if we just let their employees post on Craigslist

I'm OP. I'm not the developer of the project, but I think it deserved a post on HN because it's a pretty useful tool to track website changes. Setup can be done in a few minutes and the lead developer is always looking for solid feedback.

It cannot be a Show HN then:


Please change the title.

Oops, sorry

First time I've noticed a GH repo having so much SEO in the About section! Not sure how I feel about it..

It's actually refreshing to see end-user documentation and answering the basic questions of “what can this do” and “why should I care” in the readme. Most projects jump straight into how to build it locally, with not even a screenshot of what they’ve created.

Right? First time I run into a README file on Github and I think "Oh, this text is for search engines, not readers".

They should add a URL to the description text. It will render as a backlink, and there are many pages out there that replicate GitHub topic pages, which will then give your site many links for free.

IIRC, GitHub adds the nofollow attribute to all user-supplied links.

I actually found it via the about section so I guess it helped haha

I've been self-hosting this for a year or so and it's pretty neat.

It happens quite often that I need to wait for something which doesn't have a builtin alert.

Currently I'm for example using it to get an alert when a flutter package has been updated on pub.dev

I've used https://urlwatch.readthedocs.io but this definitely seems much easier to use—though possibly not quite as powerful regarding page filtering. But at least ChangeDetection supports jq which is already quite a nice feature in that department.
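To illustrate why a jq-style filter helps, here is a rough Python approximation of the idea, not ChangeDetection's actual implementation. Only the selected field is compared, so unrelated churn elsewhere in the document does not trigger an alert. The `select` helper, the key path, and the sample documents are all hypothetical.

```python
import json
from typing import Any, Sequence, Union

PathKey = Union[str, int]


def select(document: Any, path: Sequence[PathKey]) -> Any:
    """Walk `path` through nested dicts/lists, similar to a jq path like `.latest.version`."""
    current = document
    for key in path:
        current = current[key]
    return current


def filtered_change(old_json: str, new_json: str, path: Sequence[PathKey]) -> bool:
    """True only if the value at `path` differs between the two JSON documents."""
    return select(json.loads(old_json), path) != select(json.loads(new_json), path)


# Made-up documents: only the timestamp differs between the first two.
before = '{"name": "example_package", "latest": {"version": "1.2.0"}, "fetched_at": "10:00"}'
noise = '{"name": "example_package", "latest": {"version": "1.2.0"}, "fetched_at": "10:05"}'
update = '{"name": "example_package", "latest": {"version": "1.3.0"}, "fetched_at": "10:10"}'

print(filtered_change(before, noise, ["latest", "version"]))
print(filtered_change(before, update, ["latest", "version"]))
```

Comparing the whole body would flag both fetches as changed. With the filter, only the version bump counts.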

https://visualping.io does a nice similar job and is free for moderate personal use.

On the theme of alternatives, I've been using https://www.followthatpage.com/ on and off for years. In particular for my partner's job hunt, watching some local trade association billboards or single school's `/jobs.html` pages.

Also has a reasonable free plan for personal use (20 pages daily + 20 pages weekly + 1 page hourly).

Definitely much simpler in terms of diffing (only text), but it has this 2000s vibe.
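For reference, a text-only diff of two page snapshots takes only a few lines with Python's standard `difflib`. This is a generic sketch, not how followthatpage.com works internally; the function name and sample text are made up.

```python
import difflib
from typing import List


def text_diff(old: str, new: str) -> List[str]:
    """Unified diff of two text snapshots, one list entry per output line."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",
        )
    )


# Made-up snapshots of a jobs page.
previous = "Open positions\nNone at the moment"
current = "Open positions\nMath teacher wanted"

for line in text_diff(previous, current):
    print(line)
```

Identical snapshots produce an empty list, so "did anything change" is simply whether the diff is non-empty.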

Curious how this handles sites behind CloudFlare using bot detection.

It’s not too difficult to get past cloudflare using puppeteer. Being normal is the key.

Yeah, +1. Even vanilla puppeteer is pretty successful against Cloudflare

I see selenium and playwright in the dependencies. I would’ve just gone with tls-client

Kudos, this is a very feature rich solution. I like it. 8.99 seems a little bit much, but I guess it's probably the sweet spot on the curve.

You can install it for free on your machine. I think it's a decent price, considering that they let you check a website at intervals as short as a few seconds.

For Windows, there is the OG https://www.aignes.com/

Here's an addon for Firefox that also checks websites in regular intervals to detect changes:


Not affiliated with it and I don't know if it still works, haven't used it lately. When I used it, it worked to my satisfaction.

my favorite one is visualping.io

Interesting! I've been making some scripts privately for specific tasks, like an Nvidia video card I wanted to buy. However, many sites are really hostile to scraping these days. I'll give it a try.

camelcamelcamel is a really old service, but it's still around and still works well for product price changes.

Yes but they collaborate with Amazon. For example during the pandemic they turned it all off for a few months because Amazon wanted it.

They're still around because Amazon lets them be around.


This is interesting. I haven't logged into it for ages, but I sure still get the alerts for my existing trackers. That's why I assumed that they are still around.

How does this handle A/B tests that sites are running?

Wouldn't that depend entirely on which group its request got put in, which is something it would have no control over?

I have used it for some time now. It has been very useful for me.

Some use cases for me:

* price changes

* calendar changes for sport events

* document changes for local gov

* new firmware releases

* terms and conditions changes

Has anyone used this tool to monitor Amazon prices (with login)?

Use CamelCamelCamel

As far as I know, it doesn't work for Amazon.br (Brazil). Thanks for the reply.

Have you tried keepa?

