Google Analytics alternative that protects your data and your customers' privacy (matomo.org)
208 points by doener on May 7, 2023 | 109 comments



Matomo is decent, but my main issue with it is the performance when run at any sort of scale. It's PHP/MySQL, which is nice for ease of self-hosting, but it means a lot of things need to be pre-calculated. Most of the newer and more performant GA alternatives out there are using things like ClickHouse.

ClickHouse: Piwik PRO, Plausible, PostHog, Yandex, Cloudflare

Snowflake: Amplitude, Piano, Snowplow

SingleStore: Fathom

I've written a book on the subject including evaluating the 15 most widely used options: https://gaalternatives.guide


What kind of scale are you looking at?

I'm building UXWizz, which is on the same stack, and in my experience on a single server you can track up to 10-15 million sessions total (~1M sessions per month, with 1 year retention). To scale, for now I made it so you can have one database per domain, so the data is sharded by domain (if you are tracking multiple domains).

The biggest bottleneck is querying a large dataset, the storing of data seems to work pretty well. You could of course archive the data and automatically create stats using cron jobs, but this leads to a lot less filtering options and not so fresh data.
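
A minimal sketch of the "one database per domain" idea described above. The naming scheme, the `analytics_` prefix and the function name are assumptions made for illustration, not how UXWizz actually does it:

```python
def database_for_domain(domain: str, prefix: str = "analytics_") -> str:
    """Map a tracked domain to the name of its own database (hypothetical scheme)."""
    normalized = domain.strip().lower().rstrip(".")
    # Treat www.example.com and example.com as the same site.
    if normalized.startswith("www."):
        normalized = normalized[4:]
    # Keep only characters that are safe in a database identifier.
    # Note: "a-b.com" and "a.b.com" would collide here; a real setup
    # would need a lookup table or a hash suffix to avoid that.
    safe = "".join(ch if ch.isalnum() else "_" for ch in normalized)
    return f"{prefix}{safe}"


shard = database_for_domain("WWW.Example.com")
```

Every write and every report for a domain then goes to that one database, so a large site cannot slow down queries for the others.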


The scaling issues I'm talking about are on the reporting side, not the measurement side. With ClickHouse you can run complex analytics queries on live data; with MySQL, as you say, you may need to pre-generate reports with cron jobs, etc.


Makes sense. My question was what volume of data we are talking about. For the average website, receiving fewer than 100k monthly sessions, a single MySQL instance is more than enough for real-time queries with sub-100ms query times.


> Matomo is decent, but my main issue with it is the performance when run at any sort of scale. It's PHP/MySQL, which is nice for ease of self-hosting, but it means a lot of things need to be pre-calculated.

I've never actually run into performance issues, neither when using it in production professionally, nor for my self-hosted sites (with Matomo always running on-prem). I'd say the performance of PHP and MySQL/MariaDB is most likely decent as long as you don't go too far into specialized workloads, for example log aggregation/tracing; though even some APM solutions like Apache Skywalking also support using traditional RDBMSes for this purpose as well: https://skywalking.apache.org/docs/main/v9.0.0/en/setup/back...

That said, I can't help but wonder at what actual scale (number of logged events/second, given certain hardware) you'd run into issues. Luckily, because adding basic analytics is usually quite easy, testing this for your own workloads shouldn't be out of the question - then you can let the data speak for itself.


The performance issues aren't with the measurement requests but with reporting.

When I eval'd it for my book last fall there were big delays in reporting waiting for segments and then also issues with custom reports. I think they have changed the default behavior to get around some of the former, but with MySQL it's always going to be tough for larger queries.

(if there's any performance issue on the measurement side it has more to do with the JavaScript payload because they include a lot in their standard JS bundle).


> with MySQL it's always going to be tough for larger queries

MySQL was too slow for running analytics on a single website? How much data are we talking about?


For self-hosted, if you are getting into the tens of thousands of pageviews per day you may want to turn off real-time and switch to auto-archiving, which pre-generates reports. That threshold depends a lot, though, on the report period you are running against and how big you've provisioned your MySQL server. YMMV and I haven't benchmarked at multiple traffic levels.
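
To make "pre-generating reports" concrete, here is a rough sketch of a rollup job. It uses SQLite as a stand-in for MySQL, and the `pageviews` and `daily_pageviews` tables are invented for the example; this is not Matomo's actual archiving schema:

```python
import sqlite3


def build_daily_rollup(conn: sqlite3.Connection) -> None:
    """Aggregate raw pageviews into one row per (day, url)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS daily_pageviews (
               day TEXT, url TEXT, views INTEGER,
               PRIMARY KEY (day, url))"""
    )
    # Re-running the job overwrites earlier counts for the same (day, url).
    conn.execute(
        """INSERT OR REPLACE INTO daily_pageviews (day, url, views)
           SELECT date(ts), url, COUNT(*)
           FROM pageviews
           GROUP BY date(ts), url"""
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pageviews (ts TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO pageviews VALUES (?, ?)",
    [
        ("2023-05-07 10:00:00", "/"),
        ("2023-05-07 11:30:00", "/"),
        ("2023-05-07 12:00:00", "/pricing"),
        ("2023-05-08 09:00:00", "/"),
    ],
)
build_daily_rollup(conn)
rows = conn.execute(
    "SELECT day, url, views FROM daily_pageviews ORDER BY day, url"
).fetchall()
```

Reports then read the small rollup table instead of scanning raw hits. The trade-off is the one mentioned upthread: you can only filter on the dimensions you chose to aggregate by, and the data is only as fresh as the last run.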


A while back I built out a quick guide comparing all of these alternatives, because the core value prop was pretty similar and it was annoying to compare between pricing plans. (My personal vote goes to Fathom.)

https://buttondown.email/comparison-guides/google-analytics-...


Fathom is run by some goofy marketer who has openly slandered (on HN) other analytics products in this space. Sadly, can’t support anyone who does that. They’re not open-source either.


I wouldn't call Jack Ellis a goofy marketer. His technical expertise is vast, as are his writing abilities [0].

[0]: https://usefathom.com/blog/ddos-attack


Last I checked, Fathom’s open source product hadn’t been updated for a couple of years. So, I switched to Plausible which is more reasonably updated.


How does Plausible compare to Google's Universal Analytics? And are there any SEO effects?

GA4 migration seems not aimed at general users, so I'm looking at alternatives. Ideally could import my data.


No SEO effects. If anything there could be a slight advantage to using Plausible over GA because the JS tracker is much smaller, though in practice I would not expect any SEO impact.

Plausible is much simpler than UA, all the reporting is on one page and it has much more limited config options. It also can import your UA data. If you're looking for a simplified GA alternative it's a very good choice. (Full disclosure, I recently published an article on their blog about UA vs GA4: https://plausible.io/blog/ua-vs-ga4)


Thanks! Hmm, can you drill down into sets of pages like you can in UA? That was a pretty key feature.

I'll have a look, seems promising.


You've probably already discovered this, but the only way you drill down is with filters. You can ad-hoc define a page filter with multiple URLs, but you can't (to my knowledge) save that filter set, and content grouping UA-style doesn't exist. Their live demo is with their own site's data: https://plausible.io/plausible.io

Good luck!


Stay far away from fathom. Bro culture bullshit at the worst. Don’t believe a word they say.


"GA's interface is complex and confusing, especially for basic use cases."

As I said in another comment, it's been eight years since I used that accursed interface, and I'll be ready to try it again once the flashbacks go away.


I'd love for you to add Amplitude (amplitude.com) to your guide. Ensuring we have the same easy out-of-the-box experience GA does is a top priority for us this year. Let me know how I can get us added!


I'm using the self-hosted version of Matomo and like it overall. I love that you can track all outbound clicks without having to specifically add DOM elements to outbound links to make this possible. Unfortunately Matomo is blocked just like Google Analytics by every ad/tracking blocker. It doesn't matter if you host it yourself and only track global stats vs tracking users across the web like GA does. The only solution seems to be writing your own analytics.


At this point in history, tracking on the web is no longer a trusted activity where people can assume that the person behind the tracking is doing it for benevolent purposes. It's the same thing with email and spam, especially when attachments are involved.

Writing your own analytics can give some additional benefits in that you are only collecting what you need while taking your users' needs into consideration. I expect, however, that in time browsers will block more and more by default, similar to how email clients and services have progressed in their arms race with spam.


Why spend time undermining people's preferences?


Because I think most people who use something like ublock don't want to see ads or have their privacy violated by being followed around the web using third party trackers.

A site owner observing some general, anonymized stats like visitor and page count, which outbound links are clicked, os, screen size, time on page and what have you is quite different. I understand a blocker must go all the way and cannot distinguish between these cases. Hence my effort to find an alternative.


Most people who are against trackers are not against the website they visit getting valuable information about which pages they use or not, or the order in which they use each page to figure out which paths work or not, etc.

They are against the website choosing to not pay for it and instead getting it for free in exchange for giving all that data to a 3rd party (like GA / Google), who then uses it for its own purposes.

Doesn't mean no people are against that first scenario too, but then they better not make an account, visit several pages in a row on the same website or want to use a cart, or essentially anything beyond a static website.

Both scenarios are widely different, and convincing people on both sides (even both extremes) of that line that the line doesn't exist is one of the greatest and most successful tricks tracking companies have played.


An ad/tracking blocker could discriminate between privacy-protecting trackers and spyware, but it would not be worth the time in practice.

Such a distinction would need an option and have to be on by default. Most people use the “out of the box” config, so only a few people (like me) would enable honest tracking.

The blockers would have to keep up with this option to make sure the thing they allow hadn’t switched to evil mode.

And so on. Basically another case where bad actors like google poisoned the well.


If I build a web site, and it is my preference to know what pages get clicks on what elements (presumably, so I can make my site better)... whose preference gets priority; mine or my users? It's not as black and white as your question makes it sound.


It kind of is black and white, from a technology point of view.

You, the website owner, can control what your server does in response to HTTP requests a client makes. You control what data is sent, and under what conditions you'll send that data (i.e. presence of a valid session cookie, correct username/password, cryptographically signed request, etc.).

I, the user owning a computer, get to control what my computer does. I run a web browser, and can choose what happens in response to data your site sends me via HTTP.

Most notably, your site can send some javascript, but my computer doesn't have to run it. My computer can also selectively block what it does, including limiting its access to initiate web requests to other sites.

Anything beyond this is artificial, such as laws like DMCA or CFAA.


Your response seems to completely miss the point of the thread you're replying to. The discussion in question was, effectively

>>> You can write your own code to gather statistics

>> You should respect your user's desires and not gather statistics

> The users aren't the only ones with desires

Sure, whether or not you "can" do it is black and white (and a game of whack-a-mole many times), but whether or not you "should" do it is very much a gray area.


The users have the ultimate authority whether you like it or not: they don’t have to read your whole page, they don’t have to look at that image (or even load it), they don’t even have to go to your site if their friends tell them not to.

It’s like going to pee when an ad appeared on TV back when TV was a thing. The broadcaster and advertiser had no control.

I am sympathetic to your desire (I assume your desire comes from a good place),* but at the end of the day I think we want to live in a world where the people are the important part.

* in my experience the best sales people really do believe the prospective customer does want what they are selling, be it pantyhose, homeopathic drugs, or specially formulated window washing fluid.


I don’t get it, how can they stop you from recording this on your own server?

Are you talking about CNAME cloaking? Pretty sure Apple only cares if one specific server gets all the CNAMEs. It doesn’t block CNAMEs in general.


I thought that was the whole point of what was being said; that things like metrics (what on the page gets clicked on) are getting blocked. Bear in mind, I'm not just talking about what pages get loaded. There's more to "clicked on the page" than just page loading.


ITP now also degrades first-party server-set cookies to 7 days where the first part of the IPs don't match. So if you're using CNAMEs for your measurement and you have a.a.x.x and b.b.x.x, it will downgrade.
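
A rough sketch of the kind of comparison described above, for IPv4 only. It assumes "first part" means the first two octets, as in the a.a.x.x vs b.b.x.x example; the function name is made up, and this is not WebKit's code:

```python
import ipaddress


def same_first_half(ip_a: str, ip_b: str) -> bool:
    """Return True when two IPv4 addresses share their first 16 bits."""
    a = ipaddress.IPv4Address(ip_a)
    b = ipaddress.IPv4Address(ip_b)
    # Shifting away the low 16 bits leaves only the first two octets.
    return int(a) >> 16 == int(b) >> 16


site_ip = "203.0.113.10"
matching = same_first_half(site_ip, "203.0.200.7")
mismatched = same_first_half(site_ip, "198.51.100.7")
```

Under this reading, a measurement endpoint on a CNAME that resolves into a different block than the main site would fall into the capped case.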


Link?



Wow. OK, so the servers can just proxy the requests. Then what's Safari gonna do? Unless they totally eliminate all state and cookies in browsers, they can't prevent that!


You can usually rename the tracker to something that's not on the blocklist.


That used to work but current block filters analyse js variables and url parameters and are much harder to circumvent.


Is it also blocked when you don't even enable cookies? You lose some accuracy, but clients can't prevent ending up in your logs and they have to share some info with the server.


> tracking users across the web like GA does

What does this mean?


At least in prior Google Analytics versions, a third party cookie was used, giving the possibility to link you to every site that implements Google Analytics. But Google explicitly states not to do this, so you are correct in calling me out here.


GA4 still uses the doubleclick cookie. It also encourages the use of Google Signals and runs measurement requests off of the main google.com domain to help it track users based upon their Google login.


If site A and site B both use GA, then GA tracks them across both internally for their stats (and it helps Google figure out that the same user has interest A and interest B).

Matomo promises not to do the same linking across properties on their cloud-hosted version.


Analytics platforms are a dime a dozen, and they all suffer from the same flaws. Either they are too expensive for everyone, or they add too much overhead for everyone. I've actually been debating getting into this space for a while due to this fact, especially after interviewing with one of the top companies in this space and noting how cocky the CTO was about their product.

Google Analytics is FREE and won't cost most of us a dime. If you want to charge me a dime, you need to make your product worth that extra dime to ME. Note that even healthcare companies use Google Analytics here in the US.


I tried Matomo.

Self hosting is easy. That's a plus.

I also like the interface. Took a while to get used to it but after that, I liked it even better than Google Analytics.

But one problem that seems unsurmountable is that it tries to be clever. And while trying, it messes up your data.

If you have pages with a parameter in the querystring that is called "q", Matomo does not count those as pageviews. It tries to be clever and only count those as "searches". Probably because many site searches use a parameter "q" for what the user is searching for.

Even if a page is a search result page, it should be counted as a pageview.

The problem gets even worse when you have users bookmarking pages with a "q" parameter. Then things get really messy when you try to understand which pages users use, where they come from etc.

I have searched a lot, but have found no way to disable this "cleverness". And no way to retroactively fix the data.


Record tracking pings independently of any solution and figure things out later. A 1x1 GIF can absorb an unlimited number of query strings, and S3 storage for log files is extremely economical. It doesn't matter which analytics solution you end up with, you'll always be throwing away data. IIRC even GA enterprise doesn't allow pulling the original click log back out of it, only some form of sampling or another. Put a 1x1 GIF in an S3 bucket, create a CloudFront distribution and enable logging for it into another bucket. Tada, unlimited scale achieved for pennies.

Matomo and GA's trackers both allow sending the query to a URL of your choice. Treat the analytics system as an index over the logs, rather than treating the logs as an auxiliary artefact of the analytics system.
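
For the "figure things out later" part, a small sketch of pulling the fields back out of a logged pixel request. The `/pixel.gif` path and the `page`, `ref` and `w` parameters are invented for the example, and real CloudFront log lines have their own column layout and escaping that would need handling first:

```python
from urllib.parse import parse_qs, urlsplit


def parse_ping(request_uri: str) -> dict[str, str]:
    """Extract tracking fields from the query string of a pixel request."""
    query = urlsplit(request_uri).query
    parsed = parse_qs(query, keep_blank_values=True)
    # If a parameter is repeated, keep its last value.
    return {key: values[-1] for key, values in parsed.items()}


ping = parse_ping(
    "/pixel.gif?page=%2Fpricing&ref=https%3A%2F%2Fnews.ycombinator.com%2F&w=1440"
)
```

Because the raw request is kept, you can add a new field to the pixel URL today and decide how to report on it months later.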


That's an odd choice as WordPress, which is by far the most popular CMS, uses ?s= as a search query.

I would expect those pages to be included in the data. They could offer some sort of segmentation if they think they can separate out searches, though.


Just disable website search or change the search parameter inside website settings to stop Matomo from interpreting the 'q' parameter as search.


It is a bad default, but nice that it is configurable.


Matomo is not trivial to run on-prem. There are lots of things that do not work on larger installs unless you do lots of manual optimization, and what those optimizations are is not obvious. The problems only show after some time, when you have to redo reports for multiyear periods or handle a hug of death.

That said people love analytics, it is a powerful tool.


Any links? I'm assuming you mean something more than the periodic rollups?


The biggest issue is actually that the analytic part is so removed from the performance part. There is no good way to know what will be a fast analytics report, no feedback in the UI for the creator of the report. I am stuck with policing that.

Not sure what you mean with rollups. And not sure what I am supposed to link.

These are just some of the things you have to manually install, this is except all the plugins you have to buy. Upgrade Matomo and not miss analytics. Good db performance. Secure installation. What is both bogging down performance.


I wish some of the privacy focused GA alternatives had SOC 2 reports, or ISO 27001. We’re working towards our first SOC 2, which makes it hard to incorporate anything without one into our product.

On-prem is a lot of work, and not something I want to approach lightly.


Why? Having gone through a few SOC 2s, I don't see any value in it other than it being a racket.


Yes, it’s a huge racket that likely does little to solve the problems it was enacted to prevent. But have you tried making deals with large SOC 2 companies without your own certification?


Having gone through ISO 27001 and PCI DSS level 2 I kind of assumed all of these security focussed compliance standards are just that. Anyone have any exceptions?


Piwik Pro is SOC2 certified.


> Google Analytics alternative that protects your data and your customers' privacy

It's not your data, this is data about me.

> Your customers will love you because their valuable personal data is protected.

I guarantee you your customers will not love you for tracking them.


Your customers might love you for making a better website, and this is hard to do without feedback, of which analytics is one kind.


>hard to do without feedback, of which analytics is one kind

[citation needed]


Ideally the different self-hosted web stacks would have built-in analytics that would not have to hit the client with JavaScript. But they don't, or if they do, each has its own inconsistent approach as to what data is collected and how it is presented. So the second best, if you care about your users' privacy (and, if applicable, your own commercial or institutional privacy), is something like Matomo.


C'mon there's like a thousand threads on these already.


In all these threads, there is never a project manager from a large establishment telling

- Thanks. we will migrate - we did and it was <good/bad>

More or less everyone is after $.



Half of the "alternatives" there are dead (website is offline) and the rest is not free. Those are hardly alternatives, more like band-aid solutions.


A paid alternative to a free service is still an alternative.

They may not be alternatives you like, but they are alternatives. And for many people a paid option may be better once they start looking at why the free thing is free.


I'm going through the list at random (I'm on the market for such a service) and none of them appear offline so far.


Yeah, you're right actually. From my phone most of them appeared offline, but from my computer, on the same network, they're all fine.


This website is off to a bad start when the first item says

> Because it does not use cookies there is no need to show cookie banner for this service.

which is a blatant lie/misinformation. The ‘cookie law’ has nothing to do with the actual use of cookies.


To defend this site, that is the claim of the vendor and I wouldn't expect a site that focuses on listing EU alternatives to be critically evaluating a claim like that which hasn't been explicitly nay-sayed by any regulatory agency. Plausible uses a visitor id based upon a hashed + salted user agent plus IP address where the salt is rotated daily. The choice of whether consent is required for that is for the individual implementing site to make up their mind upon, but I don't think the vendor claim is unreasonable.

A similar (but better, IMHO) site that focuses just on analytics is: https://newmetrics.io/
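
A minimal sketch of the general technique described above (hash of salt + IP + user agent, with the salt thrown away every day). The class name, the use of SHA-256 and the 16-character truncation are assumptions for illustration and not Plausible's actual implementation:

```python
import hashlib
import secrets


class DailyVisitorHasher:
    """Derive a visitor ID that cannot be linked across days."""

    def __init__(self) -> None:
        self._salts: dict[str, bytes] = {}

    def _salt_for(self, day: str) -> bytes:
        if day not in self._salts:
            # Replacing the dict discards the old salt, so IDs from
            # previous days can no longer be recomputed or matched.
            self._salts = {day: secrets.token_bytes(16)}
        return self._salts[day]

    def visitor_id(self, ip: str, user_agent: str, day: str) -> str:
        salt = self._salt_for(day)
        # The zero byte separates the two fields unambiguously.
        payload = salt + ip.encode() + b"\x00" + user_agent.encode()
        return hashlib.sha256(payload).hexdigest()[:16]


hasher = DailyVisitorHasher()
first = hasher.visitor_id("203.0.113.10", "Mozilla/5.0", "2023-05-07")
again = hasher.visitor_id("203.0.113.10", "Mozilla/5.0", "2023-05-07")
next_day = hasher.visitor_id("203.0.113.10", "Mozilla/5.0", "2023-05-08")
```

The same visitor gets the same ID within a day, which is enough to count unique visitors, but a different one the next day. Whether that removes the need for consent is the legal question being argued in this subthread, not something the code settles.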


The cookie law™ clearly lays out that technically required cookies can be set without explicit consent. Performance measurements are not a requirement though.

So if you are not using anything but essential cookies on your site, you don't need a banner.


I just want analytics that don't require a Ph.D. in obscure user interfaces to get anything out of them. TBF, I haven't used GA in 8 years, maybe it's gotten better -- but I still have flashbacks.


I have reviewed many Google Analytics replacements in terms of features and capabilities. Matomo may be suitable for you, but its data presentation is not user-friendly. If you only need basic metrics to track, there are many alternatives that present analytics data more clearly. For more information, see https://algustionesa.com/google-analytics-alternatives/.


It is really easy to protect everyone's privacy by not using advanced analytics platforms at all.


Their usage of word "On-Premise" instead of "on prem" or "on premises"!!!


The terms on-premises and on-premise and on-prem are synonymous within enterprise lingo.


The one that grinds my gears is "bottoms up" instead of "bottom up."


>The one that grinds my gears is "bottoms up" instead of "bottom up."

Why does that "grind your gears?"

The term "bottoms up" is (unless I'm missing something important -- am I?) just the plural of "bottom up."

And given that the phrase is an admonition to drink what's in your container (hence seeing the bottom of the glass pointing up), it's usually (although I suppose it could be more prevalent when drinking alone, but I guess we'll never know) offered among a group (two or more) of folks, making "bottoms up" appropriate as it refers to more than one "bottom."

Again, I could be missing something important here. If so, please enlighten me.

Edit: Fixed typo.


It's not a plural, it's a different expression that happens to sound similar.

"bottoms up" means to turn your alcohol glass upside down, so the bottom is up, a euphemism for drinking from your glass

"bottom up" is an order of operation, starting at the bottom, working your way up


> It's not a plural, it's a different expression that happens to sound similar.

>"bottoms up" means to turn your alcohol glass upside down, so the bottom is up, >a euphemism for drinking from your glass

>"bottom up" is an order of operation, starting at the bottom, working your way up

That makes more sense now. Thanks! Makes me wish I hadn't taken a tops-down approach to my initial reply. :)


Rudderstack claims to be a GA alternative [1] and accepts server-side data allowing this to be a 1st party integration skipping the consent complexity. Any thoughts on this one? It also made it to the Thoughtworks Tech Radar [2].

[1] https://www.rudderstack.com/replace-google-analytics-4-guide... [2] https://www.thoughtworks.com/en-us/radar/platforms/ruddersta...


Why is this at the top of HN?


I'm guessing because of this:

https://blog.google/products/marketingplatform/analytics/pre...

> All standard Universal Analytics properties will stop processing new hits on July 1, 2023.


So what is the main difference between Universal Analytics and Google Analytics 4?

We currently use Google Analytics to understand how users move through our app. We also used Matomo (previously Piwik) 5 years ago.

Now Google Analytics on iOS will stop working for users unless they update our app? It doesn’t seem to say anything: https://developers.google.com/analytics/devguides/collection...


Analytics are widely used in communication departments in European enterprise, and where that previously was very often Google Analytics, it’s hard to use it because of Google’s inability/unwillingness to change their enterprise targeting business model to be GDPR compliant. I’m not personally convinced you really need an analytics tool in most European communications departments. As long as saying something like that is akin to heresy, however, I think it’s safe to say that a lot of people are interested in alternatives to Google Analytics.

It’s likely not just in Europe anymore. Privacy seems to be a trend that is on the increase everywhere. But as I understand it, things move to the top of HN if they are interesting to a lot of people, and privacy is interesting to a lot of people these days. Not just to the “nerds” either; at least I tend to see more and more discussion of it outside of tech circles. In the EU specifically you do have the very real “motivation” of dropping Google Analytics because using it puts you in the lovely area of breaking the law.


Google bad


I have integrated my site with matomo.

The matomo analytics are captured and stored on-premises on my server (nothing goes to the cloud).

Performance is good with my configuration. You can see page performance for yourself by loading this page: https://freesoftware.life/how-to-install-kubuntu-23-04/


You can also run Matomo without tracking Javascript and instead feed in log files [1]. This works with the Cloudfront log files (and many others).

[1] https://matomo.org/faq/general/requirements-for-log-analytic...


Haven't used it in six years but back in the days it was Piwik it was ideal: easy to set up locally, a good range of features and a friendly community (v. responsive to an upgrade issue we experienced but apart from that everything worked exactly as expected).


Back in my day it was awstats. Still works great. I have 18 years of data.

https://www.awstats.org/


Loved AWStats! Still can be useful — but bots, client side caching, CDNs, and did I mention bots..? have made the data hard to rely on for much. A while ago I switched from AWStats to GoAccess (https://goaccess.io/) for this kind of thing. I prefer its interface, and it's way way faster to churn through big log files (C vs. Perl).


Try usermaven.com as an alternative to GA. I am sure you'll love the simplicity and ease of use/maintenance because of the auto-tracked events functionality.


I'm not an engineer. Can someone please specifically explain to me how this protects data and privacy more than Google?

Does it use cookies and browser storage?


One thing is that it doesn't send any data about your users to Google.


Beware, Matomo by default isn't very privacy friendly, you will need the GDPR banner for any advanced features.

If you want a GDPR compliant analytics you have to disable many of its flagship features, or use something else like Plausible, designed to work with no consent.


You need a GDPR banner for sharing information with third parties. Why would you need one for self-hosted Matomo?


You'll need consent for any data collection not essential to your site's functionality, even if you host stuff yourself.

If all you do is collect how often your pages are being visited then you're not collecting any PII and you don't need a banner, but if you're tracking visitors based on unique identifiers (cookies, IP addresses, etc.) you'll need to get consent first.


It's not only that, it's also about the data collected.

Real-time data like visitor data and heatmaps isn't allowed, and IP tracking isn't allowed either.

Because Matomo can be very powerful, more powerful than Google Analytics.

You can let it assign a unique ID to every visitor once they have visited a subdomain and logged in, so you can know exactly who each visitor is on all of your sites.

https://matomo.org/faq/how-to/how-do-i-configure-matomo-with...


Thoughts on Microsoft Clarity? https://clarity.microsoft.com


I think we learnt enough when big tech offers something for “free” and when they call it “absolutely” free it just means you are absolutely the product.

So thanks but No thanks


Clarity is awesome--the metrics and the way it combines a visualization of our site with user session data is amazing. It shows you the actual locations on the page that users visit as well as the path they follow to get there. The insights are far more actionable than Google Analytics from my experience. (We use both.)

p.s., Under the covers Clarity runs on ClickHouse.


https://umami.is/ does the same, free tier available.


Usermaven.com does the same and covers product insights as well. Free tier is pretty generous (1M events per month).


Thanks for sharing this! Just got it set up on one of my domains and very pleased.


Let's all just stop tracking altogether.

We don't need tracking at all and bloating up and slowing down websites.


This would be nice. I don't track users on my personal blog. I don't give a flying duck about what people do there.

However I make money from running a website which is really useful to a lot of people. I absolutely need to know what works and what doesn't. I can't write and edit in the dark, possibly missing by a mile what my readers really need. It would be like flying a plane without instruments.

For instance, I see that a lot of people use the search to find a single guide, which should definitely be linked on the home page. Without basic tracking, I wouldn't even know which pages are important to my users.

There are many gaping holes in your website that you could be completely blind to without a basic sense of what your users do on your website.

I also caught many illegal copies of my website through referrer tracking. Three of them were phishing websites, and I got them shut down.

So there are many legitimate reasons to have basic traffic counters, and you can have those while respecting your users' privacy and following the spirit and the letter of GDPR.


I wonder if someone's made an AdNauseam for tracking libraries yet. Going on the defensive clearly doesn't work, some more offensive action is required.

Send a whole bunch of plausible events, pretending to click every link, changing your identifiers and stuff like resolution every time, make it impossible to determine what data is real and what isn't. Bonus points for leaving websites alone if they don't load tracking scripts until you've consented.

We can't stop trackers, but we can try to make them useless. Even if they filter out such tracking they shouldn't be able to figure out what data was real, making their tracking attempts worthless.


I believe AdNauseam uses EasyList, so if it doesn't include the EasyPrivacy part of that (which contains the trackers) by default it seems like it would be easy to add.

That said, I don't think this is an effective strategy at all. Safari has placed a big giant hole in tracking (like 20% of users) and lots of sites are still proceeding like nothing has changed. Google referrer spam was run at mega-scale, dumping billions (at least) of spam hits into millions of profiles, and didn't affect tracking efforts.

A plugin run by .0001% of users or whatever that adds in a bunch of slop to the numbers just makes more analysts pull out their hair rather than leading to change.


Worth noting that it seems to be only "open core", there's a bunch of paywalled features that I presume aren't open source. https://matomo.org/pricing/


We (cronitor.io) have a really great all-in-one solution to analytics and website monitoring with a generous free tier.

https://cronitor.io/real-user-monitoring



