Simple and privacy-friendly alternative to Google Analytics (github.com/plausible)
175 points by rajeshrajappan on March 7, 2021 | 87 comments



I would posit that almost any remotely hosted analytics system (privacy-oriented or not) is eventually a target for privacy-centric browsers. If not now, then in the future.

I mean, let's be honest - the days are fast coming when anything that looks like remotely hosted javascript is going to be blocked, no matter how benign it is.

So could it be that the future is home-grown analytics subsystems that reside in your own stack?

That way, people who need deeper kinds of tracking can have them, while those who only need shallow analytics are covered too.

It certainly seems to be heading in that direction.


We're hoping to start a conversation about this with browsers such as Brave and Firefox, and with blocklist maintainers.

One way to incentivize even more sites to move off GA et al. would be to create some kind of privacy criteria and whitelist the analytics tools that fulfill them (open source, minimal data, no personal data, no cookies/persistent identifiers, no cross-site/device tracking, no connection to adtech, etc.).

Site owners want analytics. We offer a self-hosted option, but most sites don't want to deal with managing an analytics server, as it is not an easy job. So when every analytics tool is blocked (good or bad), the incentive for site owners shifts toward avoiding the blocks rather than toward moving to something more privacy-friendly.

(I'm the Plausible co-founder)


I doubt that would end up well with users (and Plausible is already blocked in blocklists). It is like the whole "VPN for privacy" debacle all over again. There's no way a tracking company can prove to me that the tracking isn't logging things it shouldn't (today or in five years), no matter how open source it is. As long as you can't prove it, it isn't trustworthy - just like VPNs, which have turned out to be a privacy nightmare, with lots of companies claiming not to log while in reality they often do.

IMO all this will do is produce yet more lists for adblockers, and not only do we already have a huge mess with those, we are also seeing them strangled by API changes like those in Chromium. Personally I'd much rather visit a site that uses GA (because I know I can block it) than go in and "hope for the best" as with ad blocklists. Whitelists would either have to be bulletproof (i.e. back to the proof-of-privacy problem) or they would be like cookie pop-ups, where most people have no idea which to use and trust. I most definitely do not trust someone who builds a Chromium derivative to decide what to whitelist. Whitelists belong in users' hands, where they already are, not with some remote company that is bleeding money. We have seen how that works out with a certain adblocker already.

I'm a site owner for a small business with zero tracking scripts and zero external connections from the site, so I know for a fact that tracking is unnecessary even in areas with lots of online competition. Sure, I could do a lot of tracking to make more money for the business, but that's the rub, isn't it? Tracking is about greed. Webservers already tell us enough otherwise.

Edit: I'll also just add that anyone who is in the tracking business and uses CNAME fiddling is by definition not trustworthy.


> Tracking is about greed. Webservers already tell us enough otherwise.

There are a few aspects that can't be tracked from server logs, for example screen size. I think this can be fairly important for UX reasons.

There's some other tracking that can be useful as well; for example, if you're considering removing a button or feature, it's useful to know how many people are actually using it. If it's a JS-only feature (like, say, sorting a table in JS) then you need some JS tracking for this - see the sketch below.
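
For illustration, a minimal sketch of that kind of client-side-only measurement (the /collect endpoint and the selector are hypothetical):

    // Report via navigator.sendBeacon so the request survives page unload.
    function report(event, data) {
      navigator.sendBeacon("/collect", JSON.stringify({ event, ...data }));
    }

    // Viewport size: invisible to server logs, useful for UX decisions.
    report("pageview", { width: window.innerWidth, height: window.innerHeight });

    // Count real use of a JS-only feature (e.g. table sorting) before removing it.
    document.querySelector("th.sortable")?.addEventListener("click", () => {
      report("sort-table", {});
    });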

In short, I feel lumping all "tracking" into one category is a mistake. It's all about how you use it and what you do with it. This applies to most technology, really.

I do agree that trust is a big concern; I don't really have a clear comprehensive solution to this.


Everything you say is good and true. Unfortunately, the trust has been broken, and now everyone loses. The onus is on the people doing the “good” tracking to prove that they’re deserving. “This is why we can’t have nice things.”

Or, more realistically, the tracking will move into the browser, and since the dominant force in online advertising is also the dominant force in the browser market, they'll continue to dig their moat and track us all.


> The onus is on the people doing the “good” tracking to prove that they’re deserving.

That's exactly what your GGP is proposing ("create some kind of privacy criteria and whitelist those analytics that fulfill it")


Client Hints is a draft standard (currently supported in Chromium browsers) that allows servers to request some details like viewport-width.

https://developers.google.com/web/fundamentals/performance/o...
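
As a sketch of how that flow works (header names follow the draft as shipped in Chromium around 2021 - they have since been renamed to Sec-CH-* variants, and Accept-CH is only honored over HTTPS):

    const http = require("http");

    http.createServer((req, res) => {
      // Opt in: ask the browser to include these hints on later requests.
      res.setHeader("Accept-CH", "Viewport-Width, DPR");
      const width = req.headers["viewport-width"]; // absent on the very first request
      res.end(width ? `Viewport width: ${width}px` : "No hint yet - reload.");
    }).listen(8080);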

* I work at Google but not on Chrome


BrendanEich has some ideas on the "trust but verify" aspects of this. Plausible is 100% open source with no proprietary parts, but we'd love to work with Brave (and Firefox/EasyList/uBlock Origin) to provide proof and get verified and unblocked by them. It would be a very effective way to get many more sites/businesses to remove GA.


The thing is, say that you were exempted by the blockers.

The way they work is not by downloading and checksumming scripts to see whether they are allowed or not. They just downright refuse to download what is blocked.

So someone could use your special whitelist status to get their creepy tracking into visitors' web browsers.

That is not something blockers can sensibly allow.

Hence, you will continue to be blocked.

Great effort, though. I wish this were the future of analytics.


But being "open source" doesn't really guarantee anything. How do I know that anything I send to mysite.plausible.io gets processed the way you say it does? How do I know that the code running on Plausible.io is the same code that's on your GitHub? Hell, even if I can verify your code then how do I know your proxy doesn't syphon it off to a second "secret" service?

Don't get me wrong, I have no reason to doubt your claims and do trust you specifically, but basing the entire system on "I decide to trust Marko from Plausible" doesn't really scale.

I am in the same boat as you as I run GoatCounter: I know I do everything as I say I do, but I also know there's nothing preventing me from doing any of the above and actually collecting much more than I say I do. It's not hard to set up and no one will ever find out. Theoretically there are legal limits on this; in practice that is a very weak guarantee. This is a big reason why self-hosting has always been a first-class supported use case.

Theoretically there are some technical things you can do to improve matters; for example a per-domain device ID generated by the browser (or JS, doesn't really matter actually). But then you run into legal limits due to the way the GDPR is phrased, even though it's more privacy-friendly and not really in the spirit of what the GDPR is about :-/ We talked a bit about this over email last year, IIRC.
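
A minimal sketch of that device ID idea, assuming browser JS with localStorage (just the concept, not a shipped GoatCounter feature; the storage key is made up):

    // Random per-origin ID: derived from nothing identifying, but the GDPR
    // still treats it as a persistent identifier.
    function deviceId() {
      let id = localStorage.getItem("analytics-device-id"); // hypothetical key
      if (!id) {
        id = crypto.randomUUID();
        localStorage.setItem("analytics-device-id", id);
      }
      return id;
    }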

The real crux is finding something that's practical, usable, and will actually be implemented/used. We can all think of some idealized system, but if it's not realistic that it'll be implemented then it's a pretty academic exercise. In practice this means that any browser solution will need buy-in from at least the Chrome and Safari teams to really be useful, and I don't rate the chances of that happening any time soon as very high.

This isn't even because I subscribe to some "Big Evil Google and Their Nefarious Dark Plans" view, but just because they have little incentive to do any of this and it's quite a lot of work to do it well. It's easier to just block the lot and, arguably, this is perhaps better than doing nothing. If GoatCounter is impacted by this then so be it. At the end of the day site owners are not the customers of Safari and Chrome: people using those browsers are.


Someone from Chrome has proposed something similar in the form of a "privacy budget". Each fingerprintable surface gets a score and each origin has a budget. Once you go over, something(?) happens.

https://github.com/bslassey/privacy-budget
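
A toy sketch of the idea as I read it (the costs, budget size, and over-budget behavior here are all illustrative, not from the actual proposal):

    const SURFACE_COST_BITS = { screenSize: 4, timezone: 5, installedFonts: 12 };
    const BUDGET_BITS = 16; // per-origin allowance of identifying information
    const spentByOrigin = new Map();

    function readSurface(origin, surface) {
      const spent = spentByOrigin.get(origin) ?? 0;
      const cost = SURFACE_COST_BITS[surface];
      if (spent + cost > BUDGET_BITS) {
        return null; // over budget: the "something(?) happens" step
      }
      spentByOrigin.set(origin, spent + cost);
      return surface; // stand-in for the real fingerprintable value
    }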


The fact that it comes from the Chrome team is why you know it should be discarded. The Chrome team's sole job is to protect Google's targeted advertising business.


I think that’s a great idea and I’m firmly on your side.

However, we're still on that slippery slope as described, I think.

At some point Firefox, Chrome and Safari are going to start blocking almost everything by default - or at best severely restrict it.

The question is, how can we move to some kind of embedded analytics? It’s already kind of there in most of the larger platforms.


>whitelist those analytics that fulfill it (open source, minimal data, no personal data, no cookies/persistent identifiers, no cross-site/device tracking, no connection to adtech etc)

Unpopular Opinion on HN.

This is hard. And it sort of forces everyone into the same privacy bracket. As a contrarian, I actually want to know returning visitors (it doesn't even need to be 100% accurate). Right now, tech uses "privacy" as a word for "anonymous". In a real-world analogy, the current privacy definition means I would be intruding on my customer's privacy if I recognised the same customer coming into my coffee shop every day at roughly the same time, ordering the same latte with oat milk.

I don't need to know who they are, and I shouldn't be able to buy whatever set of data to match their profile, or be able to sell my data for others to match them. That is what I think is wrong with current tracking and adtech. But knowing my customers should not be part of it.


The thing that stops many people moving away from GA is AdWords. If I want to advertise via AdWords, do I have any choice but to use GA?


You can advertise using AdWords without having GA.


You can. Poorly.


> I mean, let's be honest - the days are fast coming when anything that looks like remotely hosted javascript is going to be blocked, no matter how benign it is.

I'm sorry, but what? Remotely hosted JS seems to keep growing in popularity, at least in the SaaS business and related areas. Can I ask what industry you're in where you see more and more people blocking _all_ JS, not just analytics/tracking? (The HN bubble doesn't count as an "industry".)

I have a really hard time seeing your statement as "the truth". People today seem more willing than ever to accept arbitrary JS running in their browser.


I've run a SaaS platform for many years that hosts a few thousand customers who are definitely not the HN crowd.

Individuals are not blocking. It’s the browsers that are heading in that direction.

I'm just reading the tea leaves, brah.


> Individuals are not blocking. It’s the browsers that are heading in that direction.

Where do you see any indication that browsers would prohibit executing cross-origin or cross-site JavaScript? (Browsers are limiting all sorts of things, but this is not one I'd expect or have heard discussion of.)

(Additionally, this is really easy for site owners to get around through CNAME or proxying)


And you're seeing fewer and fewer people having JS enabled at all? Are you actively steering people to use Lynx or something?


We see browsers restricting cross-origin everything more and more - take, for example, Firefox's recent efforts to segment cookies, or how browser caches are now segmented instead of global.

I agree with GP, I can certainly see a future where browsers make a move to block cross-origin JavaScript files.


I agree that browsers are getting more and more aggressive about blocking external content. I am all for good privacy protections and I understand the reasons why they are doing it.

But as a user, I am starting to get frustrated by how often I visit "normal" sites that just don't work in certain browsers because of all this blocking. There are plenty of legitimate reasons for people to load external content on websites, and it's important not to throw the baby out with the bath water here.

I am also concerned that certain big tech companies, notably including Google though it is hardly the only one, have a tendency to shoot first and not even ask questions later in terms of collateral damage when they deem something to be the appropriate course of action. Chrome has historically had no problem with killing off major functionality that some useful sites required, and other browsers have often followed. Apple has a tendency to do the same (or to achieve the same result by refusing to support functionality in the first place) particularly on iOS. It's always done in the name of improving privacy or security or reliability or some other worthy cause, but it still effectively removes useful content from the Web based on the decision of a handful of people who work on browsers. We should be very wary of that kind of power, particularly when it is wielded by people with little accountability or oversight.


I just set up Plausible analytics on my blog a couple of days ago and can confirm that at least uBlock Origin blocks it by default.


Why homegrown? The pretty obvious solution for big players is to run small servers on your own infra that proxy all the communication. Given that by embedding remote JS you already give third parties access to your data, it doesn't really open you up to that much extra risk.

It's shortsighted IMO to fight remotely hosted JavaScript. It makes things more complex, but doesn't really help with privacy that much in the longer term.


I think the solution is to integrate it at the build-tool level - a Webpack plugin, say. Scripts bundled that way are much harder to block without actively rewriting the JavaScript code paths. The requests should also be proxied back through your own middleware to side-step DNS-level blocks, as in the sketch below. I can see a model where a third-party analytics platform offers support for the most popular web frameworks.
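
Something like this, assuming Node 18+ (global fetch) and Express; the endpoint path and upstream URL are made up:

    const express = require("express");
    const app = express();
    app.use(express.json());

    // First-party endpoint: the browser only ever talks to your own origin.
    app.post("/api/event", async (req, res) => {
      // Relay the event to the third-party backend server-side, so
      // DNS/CNAME-level blockers never see the vendor's domain.
      await fetch("https://analytics.example.com/api/event", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(req.body),
      });
      res.sendStatus(202);
    });

    app.listen(3000);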


One neat thing about Plausible is that you can point a CNAME of your domain to their servers and serve the analytics request that way. To the browser, it looks the same as if you hosted it yourself.


The same is true for server-side Google Analytics, but that's already been blocked by ad/analytics blockers.


Except Pi-hole has CNAME blocking by default, so that will still not work.


Not if you are proxying.


You can proxy on your regular domain; we're talking about using a CNAME.


Yes, I would think that one of the selling points of a solution such as Plausible is that it can be hosted on the same domain. A huge step in the right direction if you ask me.


I'm happy with https://goaccess.io/ via raw nginx logs. No additional requests for the end user.


We don’t need any of the advanced features Google Analytics offers. But when I look at the replacements, they also lack basic ones.

A simple one would be showing me the visitors to /blog/ and its subdirectories, and from there allowing me to drill down into them.

And from a UX perspective, none of them seem to support searching for a specific page to display the stats for. Yes, you can edit the URL, but that’s a horrible way to do it.

edit: To add, they are also very expensive. Above 1 million views/month (which I would say is still a pretty small commercial site) GoatCounter is already in "ask us" territory and Plausible wants $69/month. As the value-add seems very small, we'd rather use our own homegrown, bare-bones analytics system for anyone who doesn't consent to analytics.


Plausible can actually do that, and we have the exact same use case showing in our live demo (filtering by /blog/ only and allowing filtering per blog post). See https://plausible.io/plausible.io and click on "Visit /blog*" in the Goal Conversions section.

(I'm the co-founder of Plausible)


It does not. I used /blog/ as an example because it's what Plausible has. But no sub-pages of /blog/ are listed, for example /blog/growing-saas-mrr.


Look into the Top Pages report on the linked page for the full list of all blog posts ranked by popularity. Click on the individual post to filter the dashboard by traffic to that post only.


I guess what Semaphor is saying is that there is no nesting. Taking a look at "docs" as an example, you have these entries in "Top Pages" from https://plausible.io/plausible.io:

- /docs/self-hosting 4.1k 5.7k 67%

- /docs/ 2.1k 2.7k 30%

- /docs 1.6k 2.1k 15%

- /docs/self-hosting-configuration 1.2k 1.8k 57%

You have to select "/docs/self-hosting" directly, and once you've done that, you don't see the subpages of that page anymore. If you select "/docs" you only see that exact page, not "/docs" plus its subpages, so there's no way to see, say, the most popular page among everything under "/docs".


We haven't set it up for our docs on the live demo, but we have set it up for /blog for this exact use case. On our live demo scroll all the way down to Goal Conversions and click on "Visit /blog*". This gives you a filtered dashboard, and if you then look at the Top Pages report you will see only the blog posts ranked by popularity and no other content outside of /blog.


Ah, now I understand. Seems a bit convoluted, but then so is almost everything in GA ;)


Agree. This started as a feature for our custom events/goals, as people wanted to see conversions on dynamically generated checkout pages for ecommerce. But it turns out to be useful for the use cases we're discussing here too, so we plan to make it easier to discover/work with in the future.


Wow, I just learned about this feature. I feel that if that is the intended use case, then Goal Conversions should be (or at least have the option to be) at the top, above the graph.

Right now if you want to drill into /blog you have to scroll down to the bottom and scroll back up again to see the results.


yeah, i know what you mean. we'll see if we can somehow put some kind of a custom filter on the top of the dashboard to allow this usecase or perhaps add a search box to the top pages report. need to decide which way is best.


Hey! I remember writing that ticket. I really appreciated how you handled the matter back then and left myself a note to come back in 6 months, rather than look for something else.


thanks, glad to hear that! and hope you enjoy this feature!


I'm about to have another look at it


> Above 1 million views/month (which I would say is still a pretty small commercial site) goatcounter already is in "ask us" territory

I can give some context on this: determining good pricing for this kind of thing is rather tricky. In principle it's easy: "cost + markup". But costs are actually pretty variable and not simply a function of pageview count as such: serving 1 million pageviews on a single path is cheap, while serving them spread out over 1 million paths is much more expensive.

For smaller sites this is not a big deal, but above a certain amount this can matter a lot. I don't want to overcharge "light" users, but I also don't want to undercharge "heavy" users.

Basically, figuring out good pricing is just hard. One of the goals is to be a viable alternative to GA for at least a bunch of use cases (though not all), and being cheap is part of that. This is the entire reason there's a free plan in the first place: when I started working on this there was no real alternative - you either had to shell out money or self-host, which is too high a barrier for many people's blogs and whatnot (especially if they're not technical and will have trouble self-hosting). It's all a bit of a balancing act.

> And from a UX perspective, none of them seem to support searching for a specific page to display the stats for. Yes, you can edit the URL, but that’s a horrible way to do it.

That is supported, unless I misunderstand what you mean?


1 Million for $36: https://pirsch.io/ and you can click on any path to filter the statistics.


I use Plausible, it's very nice and completely painless to get started with. Transfers 701 bytes(!) to load the JS on a page, which is super impressive.


While that *is* impressive, the bigger performance cost is usually the DNS resolution time.


Not knocking this, but I've seen what feels like dozens of script-based solutions coming up on HN recently. If the goal is privacy, what would this give you over something that runs on the server?


As in server logs? Accuracy would be the main thing. I did a study and found a huge number of bots in server logs (AWStats shows an 18 times higher number of page views than Plausible on my own website, for instance). See https://plausible.io/blog/server-log-analysis
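
The mechanism is simple: most bots don't execute JS, so a script-based counter never sees them, while log-based tools count everything. A naive User-Agent filter over log lines (patterns are illustrative) only catches the bots that announce themselves:

    const BOT_PATTERNS = /bot|crawler|spider|curl|wget|python-requests|headless/i;

    // Misses any headless client that sends a browser-like User-Agent,
    // which is why raw log counts still run high even after filtering.
    function looksHuman(userAgent) {
      return Boolean(userAgent) && !BOT_PATTERNS.test(userAgent);
    }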

(I'm the co-founder of Plausible)


That is data, though. You might want to know more about those bot hits. You might want to IP-ban certain ranges that aggressive traffic comes from.


Yeah, but the target audience for the data is different. For example, I recommended Plausible to the head of marketing at my job because she cares about conversion rates, how long people use the site, etc., and she gets that from Google Analytics.

The sysadmins, meanwhile, use Kibana/Grafana to view data; for them, the sort of data analytics provides - where and how people are navigating, whether people are converting, bounce rate - isn't important. The raw stats are what matter. I would not recommend Plausible to the sysadmins.


True. There's a lot of value in the data server logs provide for many use cases. But for sites that use Google Analytics right now, in the majority of cases they don't want that data, and that data makes their dashboard so noisy that it is no longer usable for the purposes they use GA for.


I tested Plausible a few months ago and couldn't get it up and running on my own server. The docs stated that self-hosting was possible but not supported at all. It was a real bummer; I hope they've worked on this in the last few months.


How did you try to get it running? They have a docker-compose setup[1] that was super easy to use. It looks to be 5 months old, so if you tried before then, good news: it's easy now!

[1] https://github.com/plausible/hosting


I cannot use the docker-compose setup they provide on my server, because it interferes with my existing Docker containers. Setting up the ClickHouse container never worked because Plausible wasn't able to connect to it properly. After researching for hours, I gave up. Not sure what the issue was.


Have you tried putting it into a different user namespace than your current docker containers? https://docs.docker.com/engine/security/userns-remap/


Nope, I just gave up after some hours.


I've been self-hosting Plausible for a few months now, but you're right - there's a bit of fiddling to get everything working. I ended up creating dedicated hosts for the ClickHouse and Postgres databases, and then playing with the connection strings in the Plausible container to connect to them.


I just set it up using the recommended Docker setup and it was a breeze. You might want to give it another try - that said, the hosted version is really cheap, comparable to self-hosting.


I am now looking at https://nullitics.com (https://github.com/nullitics/nullitics) as a way to jump away from Google Analytics, and quite like it. For a self-hosted option I would probably use GoAccess. In general, I like the trend of more and more alternatives appearing; competition is never a bad thing.


I don't know which one came first, but Nullitics and Plausible are pretty much clones of each other. Almost identical layout but with different styling.


Definitely neither of them was first. SimpleAnalytics and Fathom also have a similar layout and came before both.


I came across Plausible from this article on Test Double https://blog.testdouble.com/posts/2021-03-02-why-privacy-min...


Thanks for submitting Plausible! We have a feature that sends alerts for traffic spikes, so I just got an alert that we have a large number of visitors thanks to this post!

(I'm the co-founder of Plausible)


No problem, I will be trying this out in my projects. I have been looking for alternatives to Google Analytics because of the privacy concerns, and also because I don't require all the features they offer.


I left Plausible for GoatCounter. It is better designed privacy-wise and much easier to self-host (a single statically built binary, no dependencies): https://goatcounter.com/


Has anyone here tried out http://offen.dev yet?

I stumbled upon it while implementing a self-hosted Matomo solution. I like the approach, which allows for a much nicer UX in their consent process (of course, I would prefer not having to have one in the first place).


Loving the Plausible paid/hosted version; just upgraded yesterday. Exactly what I need and nothing more. Totally painless. As a dev I really can't think of anything I dislike! I don't do any A/B testing or social-engineering nonsense; I just want to know where people are clicking from and when.

Being able to create password-protected links for my colleagues is really nice, though!


Question for you lot: self-hosting sounds like a better pro-privacy solution (if cross-origin JavaScript gets blocked), but what's stopping somebody from having a server-side script funnel all that information to somebody else?

I don’t think it helps.

Perhaps tracking should be done by regulated bodies who must abide by the rules.


Regulate the shit out of everything, because why not?


What about the claim that if you drop GA, your site will be degraded by Google and you'll lose traffic?


We have no evidence of that happening. Also Google themselves are on record saying that they don't use GA data for their search algorithm. I wrote a post on this topic: https://plausible.io/blog/google-analytics-seo

(I'm the Plausible co-founder)


In theory, that shouldn't happen. The gist being, Google wants to make you (the person searching) happy. So much so, it wants to personalize its suggestions. That is, if you knew what Google knows, what would you recommend to you? :) The algorithm wants to be human. It wants to be you.

In fact, in that context, if you're a pro-privacy person then Google should - again in theory - be more prone to recommend privacy-respecting sites to you. True, that's counterintuitive given Google's biz model. However, that biz model breaks down even faster when people stop returning because they are unhappy w/ search results.

Put another way, it's only a matter of time before DDG and the like use the pro-privacy signal to move sites up the SERP. DDG already knows you probably prefer that, as that's why you use DDG. It certainly would be a helpful icon DDG could add to their SERPs.


I love Plausible and use it on multiple sites. I like the simplicity, the great UX, and the fact that it's open source and self-hostable (even though I don't host it myself, knowing that it's possible is great).


Re: unlimited websites

Is there a way to invite clients and give them access to only their own site's stats?

Effectively, I'd pay for it for myself and would add (mostly) pro-bono NPO client sites.


Yes. Take a look at our shared links feature (private, secure, and can be password-protected). Those with the shared link only get access to the individual dashboard that you shared and nothing else.

https://plausible.io/docs/shared-links

(I'm the co-founder of Plausible)


Great. Thank you. One more question while I have your attention:

Can I still use utm_* codes? Or something similar appended to a link (in the query string) in order to add attributes to a visit? And then analyze / filter by those attributes?


Yes. UTM tags (utm_source, utm_medium and utm_campaign) are fully supported, and you can filter the dashboard by any of them. You can also see which of them refer traffic that ends up converting on any of your custom events/goals. You can try it out on our live demo: https://plausible.io/plausible.io
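
For concreteness, a UTM-tagged link built with the standard URL API (the page and parameter values are made up):

    const url = new URL("https://example.com/pricing");
    url.searchParams.set("utm_source", "newsletter");
    url.searchParams.set("utm_medium", "email");
    url.searchParams.set("utm_campaign", "march-launch");
    console.log(url.toString());
    // https://example.com/pricing?utm_source=newsletter&utm_medium=email&utm_campaign=march-launch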


I use https://www.goatcounter.com/ and I'm very happy with it. It's open source and can be self-hosted. Since there's no user tracking, it's GDPR compliant out of the box.


GoatCounter claims to be GDPR compliant, but also says it collects:

URL of the visited page. Referer header. User-Agent header. Screen size. Country name based on IP address. A hash of the IP address, User-Agent, and random number

As such, it seems to be processing user data that could be linked to an individual person.

I'd be cautious about the claim that GoatCounter is totally GDPR compliant (without a consent notice). You're safe, for now, on the basis that this doesn't seem likely to be tested in law.
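
For reference, that last item is typically a salted hash along these lines (a sketch in Node; GoatCounter itself is written in Go and its actual scheme may differ):

    const crypto = require("crypto");

    // The salt is rotated periodically and kept only in memory, so hashes
    // can't be linked across rotation periods.
    let salt = crypto.randomBytes(32);

    function visitorHash(ip, userAgent) {
      return crypto.createHash("sha256")
        .update(salt).update(ip).update(userAgent)
        .digest("hex");
    }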


> GoatCounter claims to be GDPR compliant

It claims it's probably GDPR compliant, but it's pretty transparent about various possible caveats and such on the GDPR page[1].

[1]: https://www.goatcounter.com/gdpr


Ah, I read another page that frames this slightly differently, e.g. https://www.goatcounter.com/why

"There should always be an option to add GoatCounter to your site without requiring a GDPR consent notice."


It's tricky writing these kinds of things, haha. You can actually disable data collection for a number of parameters: disable "User-Agents", for example, and they will never be stored to disk. I should probably update a few of these pages.


GoatCounter is great.



