Matomo is decent, but my main issue with it is the performance when run at any sort of scale. It's PHP/MySQL, which is nice for ease of self-hosting, but it means a lot of things need to be pre-calculated. Most of the newer and more performant GA alternatives out there are using things like ClickHouse.
I'm building UXWizz, which is on the same stack, and in my experience on a single server you can track up to 10-15 million sessions total (~1M sessions per month, with 1 year retention). To scale, for now I made it so you can have one database per domain, so the data is sharded by domain (if you are tracking multiple domains).
The biggest bottleneck is querying a large dataset; storing the data works pretty well. You could of course archive the data and automatically generate stats using cron jobs, but that leaves you with far fewer filtering options and less fresh data.
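The cron-job pre-aggregation idea can be sketched with a daily rollup table. Here's a minimal sketch using SQLite as a stand-in for MySQL; the table and column names are hypothetical, not any product's actual schema:

```python
import sqlite3

# In-memory stand-in for a MySQL analytics schema (names are made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pageviews (url TEXT, visitor_id TEXT, viewed_at TEXT);
CREATE TABLE daily_rollup (day TEXT, url TEXT, views INTEGER, visitors INTEGER,
                           PRIMARY KEY (day, url));
""")
conn.executemany(
    "INSERT INTO pageviews VALUES (?, ?, ?)",
    [("/home", "a", "2023-05-01 09:00"),
     ("/home", "b", "2023-05-01 10:00"),
     ("/docs", "a", "2023-05-01 11:00")],
)

# A nightly cron job would run this for the previous day:
conn.execute("""
INSERT OR REPLACE INTO daily_rollup
SELECT date(viewed_at), url, COUNT(*), COUNT(DISTINCT visitor_id)
FROM pageviews
WHERE date(viewed_at) = '2023-05-01'
GROUP BY date(viewed_at), url
""")

# Reports then read the small rollup table instead of scanning raw pageviews.
for row in conn.execute("SELECT * FROM daily_rollup ORDER BY url"):
    print(row)
# ('2023-05-01', '/docs', 1, 1)
# ('2023-05-01', '/home', 2, 2)
```

The trade-off is exactly the one described above: the rollup only answers the questions you anticipated, so ad-hoc filters on raw data are lost.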
The scaling issues I'm talking about are on the reporting side, not the measurement side. With Clickhouse you can do complex analytics queries on live data, with MySQL as you say you may need to pre-gen with cronjobs, etc.
Makes sense. My question was what magnitude of data we're talking about, because for the average website, receiving fewer than 100k monthly sessions, a single MySQL instance is more than enough for real-time queries with sub-100ms query times.
> Matomo is decent, but my main issue with it is the performance when run at any sort of scale. It's PHP/MySQL, which is nice for ease of self-hosting, but it means a lot of things need to be pre-calculated.
I've never actually run into performance issues, neither when using it in production professionally, nor for my self-hosted sites (with Matomo always running on-prem). I'd say the performance of PHP and MySQL/MariaDB is most likely decent as long as you don't go too far into specialized workloads, for example log aggregation/tracing; though even some APM solutions like Apache Skywalking support using traditional RDBMSes for this purpose: https://skywalking.apache.org/docs/main/v9.0.0/en/setup/back...
That said, I can't help but wonder at what actual scale (number of logged events/second, given certain hardware) you'd run into issues. Luckily, because adding basic analytics is usually quite easy, testing this for your own workloads shouldn't be out of the question - then you can let the data speak for itself.
The performance issues aren't with the measurement requests but with reporting.
When I eval'd it for my book last fall there were big delays in reporting waiting for segments and then also issues with custom reports. I think they have changed the default behavior to get around some of the former, but with MySQL it's always going to be tough for larger queries.
(if there's any performance issue on the measurement side it has more to do with the JavaScript payload because they include a lot in their standard JS bundle).
For self-hosted, if you are getting into the tens of thousands of pageviews per day you may want to turn off real-time reporting and switch to auto-archiving, which pre-generates reports. That threshold depends a lot on the report period you are running against and how big you've provisioned your MySQL server. YMMV; I haven't benchmarked at multiple traffic levels.
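The auto-archiving mentioned above is set up via cron in Matomo; a typical crontab entry looks something like this (the user, paths, and URL are placeholders for your own install):

```shell
# Pre-generate Matomo reports every hour so the UI reads cached results
# instead of crunching raw data on page load.
5 * * * * www-data /usr/bin/php /var/www/matomo/console core:archive \
  --url=https://example.org/matomo/ > /var/log/matomo-archive.log 2>&1
```

Once this is running, you'd also disable browser-triggered archiving in Matomo's general settings so that viewing a report stops kicking off the expensive queries.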
A while back I built out a quick guide comparing all of these alternatives, because the core value prop was pretty similar and it was annoying to compare between pricing plans. (My personal vote goes to Fathom.)
Fathom is run by some goofy marketer who has openly slandered (on HN) other analytics products in this space. Sadly, can’t support anyone who does that. They’re not open-source either.
No SEO effects. If anything there could be a slight advantage to using Plausible over GA because the JS tracker is much smaller, though in practice I would not expect any SEO impact.
Plausible is much simpler than UA, all the reporting is on one page and it has much more limited config options. It also can import your UA data. If you're looking for a simplified GA alternative it's a very good choice. (Full disclosure, I recently published an article on their blog about UA vs GA4: https://plausible.io/blog/ua-vs-ga4)
You've probably already discovered this, but the only way you drill down is with filters. You can ad-hoc define a page filter with multiple URLs, but you can't (to my knowledge) save that filter set, and content grouping UA-style doesn't exist. Their live demo is with their own site's data: https://plausible.io/plausible.io
"GA's interface is complex and confusing, especially for basic use cases."
As I said in another comment, it's been eight years since I used that accursed interface, and I'll be ready to try it again once the flashbacks go away.
I'd love for you to add Amplitude (amplitude.com) to your guide. Ensuring we have the same easy out-of-the-box experience GA does is a top priority for us this year. Let me know how I can get us added!
I'm using the Matomo self-hosted version and like it overall. I love that you can track all outbound clicks without having to add specific DOM attributes to outbound links to make this possible.
Unfortunately matomo is blocked just like Google Analytics by every ad/tracking blocker. Doesn't matter if you host it yourself and only track global stats vs tracking users across the web like GA does. The only solution seems to be writing your own analytics.
At this point in history, tracking on the web is no longer a trusted activity where people can assume that the person behind the tracking is doing it for benevolent purposes. It's the same thing with email and spam, especially when attachments are involved.
Writing your own analytics can give some additional benefits in that you only collect what you need, while taking your users' needs into consideration. I expect, however, that in time browsers will block more and more by default, similar to how email clients and services have progressed in their arms race with spam.
Because I think most people who use something like ublock don't want to see ads or have their privacy violated by being followed around the web using third party trackers.
A site owner observing some general, anonymized stats like visitor and page counts, which outbound links are clicked, OS, screen size, time on page, and what have you is quite different.
I understand a blocker must go all the way and cannot distinguish between these cases. Hence my effort to find an alternative.
Most people who are against trackers are not against the website they visit getting valuable information about which pages they use or not, or the order in which they use each page to figure out which paths work or not, etc.
They are against the website choosing not to pay for it and instead getting it for free in exchange for giving all that data to a third party (like GA / Google), who then uses it for its own purposes.
That doesn't mean no one is against the first scenario too, but then they'd better not make an account, visit several pages in a row on the same website, or use a cart; essentially anything beyond a static website.
Both scenarios are wildly different, and convincing people on both sides (even both extremes) of that line that the line doesn't exist is one of the greatest and most successful tricks tracking companies have played.
An ad/tracking blocker could discriminate between privacy-protecting trackers and spyware, but it would not be worth the time in practice.
Such a distinction would need an option and have to be on by default. Most people use the “out of the box” config, so only a few people (like me) would enable honest tracking.
The blockers would have to keep up with this option to make sure the thing they allow hadn’t switched to evil mode.
And so on. Basically another case where bad actors like Google poisoned the well.
If I build a web site, and it is my preference to know what pages get clicks on what elements (presumably, so I can make my site better)... whose preference gets priority; mine or my users? It's not as black and white as your question makes it sound.
It kind of is black and white, from a technology point of view.
You, the website owner, can control what your server does in response to HTTP requests a client makes. You control what data is sent, and under what conditions you'll send that data (i.e. presence of a valid session cookie, correct username/password, cryptographically signed request, etc).
I, the user owning a computer, get to control what my computer does. I run a web browser, and can choose what happens in response to data your site sends me via HTTP.
Most notably, your site can send some javascript, but my computer doesn't have to run it. My computer can also selectively block what it does, including limiting its access to initiate web requests to other sites.
Anything beyond this is artificial, such as laws like DMCA or CFAA.
Your response seems to completely miss the point of the thread you're replying to. The discussion in question was, effectively
>>> You can write your own code to gather statistics
>> You should respect your user's desires and not gather statistics
> The users aren't the only ones with desires
Sure, whether or not you "can" do it is black and white (and a game of whack-a-mole many times), but whether or not you "should" do it is very much a gray area.
The users have the ultimate authority whether you like it or not: they don’t have to read your whole page, they don’t have to look at that image (or even load it), they don’t even have to go to your site if their friends tell them not to.
It’s like going to pee when an ad appeared on TV back when TV was a thing. The broadcaster and advertiser had no control.
I am sympathetic to your desire (I assume your desire comes from a good place),* but at the end of the day I think we want to live in a world where the people are the important part.
* in my experience the best sales people really do believe the prospective customer does want what they are selling, be it pantyhose, homeopathic drugs, or specially formulated window washing fluid.
I thought that was the whole point of what was being said; that things like metrics (what on the page gets clicked on) are getting blocked. Bear in mind, I'm not just talking about what pages get loaded. There's more to "clicked on the page" than just page loading.
ITP now also degrades first-party server-set cookies to 7 days where the first parts of the IPs don't match. So if you're using CNAMEs for your measurement and you have a.a.x.x and b.b.x.x, it will downgrade.
Wow. OK, so the servers can just proxy the requests. Then what's Safari gonna do? Unless they totally eliminate all state and cookies in browsers, they can't prevent that!
Is it also blocked when you don't even enable cookies? You lose some accuracy, but clients can't prevent ending up in your logs, and they have to share some info with the server.
At least in prior Google Analytics versions, a third-party cookie was used, giving Google the ability to link you across every site that implements Google Analytics. But Google explicitly states it does not do this, so you are correct in calling me out here.
GA4 still uses the doubleclick cookie. It also encourages the use of Google Signals and runs measurement requests off of the main google.com domain to help it track users based upon their Google login.
If site A and site B both use GA, then GA tracks users across both internally for its stats (and it helps Google figure out that the same user has interest A and interest B).
Matomo promises to not do the same link across properties on their cloud hosted version.
Analytics platforms are a dime a dozen, and they all suffer from the same flaws: either they are too expensive, or they add too much overhead. I've actually been debating getting into this space for a while due to this fact, especially after interviewing with one of the top companies in this space and noting how cocky the CTO was about their product.
Google Analytics is FREE and won't cost most of us a dime. If you want to charge me a dime, you need to make your product worth that extra dime to ME. Note that even healthcare companies use Google Analytics here in the US.
I also like the interface. Took a while to get used to it but after that, I liked it even better than Google Analytics.
But one problem that seems unsurmountable is that it tries to be clever. And while trying, it messes up your data.
If you have pages with a parameter in the querystring that is called "q", Matomo does not count those as pageviews. It tries to be clever and only count those as "searches". Probably because many site searches use a parameter "q" for what the user is searching for.
Even if a page is a search result page, it should be counted as a pageview.
The problem gets even worse when you have users bookmarking pages with a "q" parameter. Then things get really messy when you try to understand which pages users use, where they come from etc.
I have searched a lot, but have found no way to disable this "cleverness". And no way to retroactively fix the data.
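If the detection really can't be turned off, one workaround (a sketch, assuming your tracker lets you report a custom URL, e.g. Matomo's JS `setCustomUrl`) is to rename the offending parameter before it reaches the tracker. `page_q` here is an arbitrary placeholder name:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def rename_param(url: str, old: str = "q", new: str = "page_q") -> str:
    """Rename a query parameter so a tracker that treats 'q' as a site-search
    keyword by default no longer reclassifies the pageview."""
    parts = urlsplit(url)
    query = [(new if k == old else k, v) for k, v in parse_qsl(parts.query)]
    return urlunsplit(parts._replace(query=urlencode(query)))

print(rename_param("https://example.com/list?q=shoes&page=2"))
# https://example.com/list?page_q=shoes&page=2
```

The rewritten URL is what you'd hand to the tracker instead of the real location; your raw server logs still keep the original, so nothing is lost.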
Record tracking pings independently of any solution and figure things out later; a 1x1 GIF can absorb an unlimited number of query strings, and S3 storage for log files is extremely economical. It doesn't matter which analytics solution you end up with, you'll always be throwing away data. IIRC even GA enterprise doesn't allow pulling the original click log back out of it, only some form of sampling or another. Put a 1x1 GIF in an S3 bucket, create a CloudFront distribution, and enable logging for it into another bucket. Tada, unlimited scale achieved for pennies.
Matomo's and GA's trackers both allow sending the hit to a URL of your choice. Treat the analytics system as an index over the logs, rather than treating the logs as an auxiliary artefact of the analytics system.
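The pixel-plus-logs idea boils down to: every hit on the GIF writes a log line, and the query string carries the event payload. A minimal sketch of reading such a line back out (the log line here is synthetic and simpler than real CloudFront logs, which have more fields):

```python
from urllib.parse import parse_qs

# Hypothetical tab-separated access-log line for a 1x1 GIF hit.
line = ("2023-05-01\t09:14:02\t203.0.113.7\tGET\t/pixel.gif\t"
        "event=pageview&path=%2Fpricing&ref=news.ycombinator.com")

date, time, ip, method, path, query = line.split("\t")
# Flatten single-valued params; the query string is the whole payload.
params = {k: v[0] for k, v in parse_qs(query).items()}
print(params)
# {'event': 'pageview', 'path': '/pricing', 'ref': 'news.ycombinator.com'}
```

Because the raw lines are never discarded, you can re-run any analysis later with a different tool, which is exactly the "index over the logs" framing above.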
That's an odd choice, as WordPress, which is by far the most popular CMS, uses ?s= as its search query parameter.
I would expect those pages to be included in the data. They could offer some sort of segmentation if they think they they can separate out searches, though.
Matomo is not trivial to run on prem; there's a lot that doesn't work on larger installs unless you do lots of manual optimization, and what those optimizations are is not obvious. The problems only show up after some time, when you have to redo reports for multi-year periods or handle a hug of death.
That said people love analytics, it is a powerful tool.
The biggest issue is actually that the analytics part is so removed from the performance part. There is no good way to know whether an analytics report will be fast, and no feedback in the UI for the creator of the report. I am stuck with policing that.
Not sure what you mean with rollups. And not sure what I am supposed to link.
These are just some of the things you have to handle manually, and that's apart from all the plugins you have to buy: upgrading Matomo without losing analytics, good DB performance, a secure installation, and figuring out what is bogging down performance.
I wish some of the privacy focused GA alternatives had SOC 2 reports, or ISO 27001. We’re working towards our first SOC 2, which makes it hard to incorporate anything without one into our product.
On prem is a lot of work, and not something i want to approach lightly.
Yes, it's a huge racket that likely does little to solve the problems it was enacted to prevent. But have you tried making deals with large SOC 2 companies without your own certification?
Having gone through ISO 27001 and PCI DSS Level 2, I kind of assumed all of these security-focused compliance standards are just that. Anyone have any exceptions?
Ideally the different self-hosted web stacks would have built-in analytics that would not have to hit the client with JavaScript. But they don't, or if they do, each has its own inconsistent approach as to what data is collected and how it is presented. So the second best, if you care about your users' privacy (and, if applicable, your own commercial or institutional privacy), is something like Matomo.
A paid alternative to a free service is still an alternative.
They may not be alternatives you like, but they are alternatives. And for many people a paid option may be better once they start looking at why the free thing is free.
To defend this site, that is the claim of the vendor and I wouldn't expect a site that focuses on listing EU alternatives to be critically evaluating a claim like that which hasn't been explicitly nay-sayed by any regulatory agency. Plausible uses a visitor id based upon a hashed + salted user agent plus IP address where the salt is rotated daily. The choice of whether consent is required for that is for the individual implementing site to make up their mind upon, but I don't think the vendor claim is unreasonable.
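The scheme described above (hash of salted IP plus user agent, with the salt rotated daily) can be sketched like this; the names are illustrative, not Plausible's actual implementation:

```python
import hashlib
import secrets
from datetime import date

# One random salt per day; yesterday's salt is discarded, so IDs
# cannot be linked across days.
_daily_salts = {}

def visitor_id(ip, user_agent, day=None):
    day = day or date.today()
    salt = _daily_salts.setdefault(day, secrets.token_bytes(16))
    h = hashlib.sha256(salt + ip.encode() + user_agent.encode())
    return h.hexdigest()[:16]

a = visitor_id("203.0.113.7", "Mozilla/5.0", date(2023, 5, 1))
b = visitor_id("203.0.113.7", "Mozilla/5.0", date(2023, 5, 1))
c = visitor_id("203.0.113.7", "Mozilla/5.0", date(2023, 5, 2))
print(a == b, a == c)  # True False: stable within a day, unlinkable across days
```

The key property is that nothing personally identifying is stored: the raw IP and UA never leave memory, and once the salt rotates, the old IDs can't be recomputed even by the operator.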
A similar (but better, IMHO) site that focuses just on analytics is: https://newmetrics.io/
The cookie law™ clearly lays out that technically required cookies can be set without explicit consent. Performance measurements are not a requirement though.
So if you are not using anything but essential cookies on your site, you don't need a banner.
I just want analytics that don't require a Ph.D. in obscure user interfaces to get anything out of them. TBF, I haven't used GA in 8 years, maybe it's gotten better -- but I still have flashbacks.
I have reviewed many Google Analytics replacements in terms of features and capabilities. Matomo may be suitable for you, but its data presentation is not user-friendly. If you only need basic metrics to track, there are many alternatives that present analytics data more clearly. For more information, see https://algustionesa.com/google-analytics-alternatives/.
>The one that grinds my gears is "bottoms up" instead of "bottom up."
Why does that "grind your gears?"
The term "bottoms up" is (unless I'm missing something important -- am I?) just the plural of "bottom up."
And given that the phrase is an admonition to drink what's in your container (hence seeing the bottom of the glass pointing up), it's usually offered among a group of two or more people (though I suppose it could be said while drinking alone, but I guess we'll never know), making "bottoms up" appropriate as it refers to more than one "bottom."
Again, I could be missing something important here. If so, please enlighten me.
Rudderstack claims to be a GA alternative [1] and accepts server-side data allowing this to be a 1st party integration skipping the consent complexity. Any thoughts on this one? It also made it to the Thoughtworks Tech Radar [2].
Analytics are widely used in communication departments in European enterprise, and where that previously was very often Google Analytics, it’s hard to use it because of Google’s inability/unwillingness to change their enterprise targeting business model to be GDPR compliant. I’m not personally convinced you really need an analytics tool in most European communications departments. As long as saying something like that is akin to heresy, however, I think it’s safe to say that a lot of people are interested in alternatives to Google Analytics.
It's likely not just in Europe anymore. Privacy seems to be a trend that is on the increase everywhere. But as I understand it, things move to the top of HN if they are interesting to a lot of people, and privacy is interesting to a lot of people these years. Not just to the "nerds" either; at least I tend to see more and more discussion on it outside of tech circles. In the EU specifically, you do have the very real "motivation" of dropping Google Analytics, because using it puts you in the lovely area of breaking the law.
Haven't used it in six years but back in the days it was Piwik it was ideal: easy to set up locally, a good range of features and a friendly community (v. responsive to an upgrade issue we experienced but apart from that everything worked exactly as expected).
Loved AWStats! Still can be useful — but bots, client side caching, CDNs, and did I mention bots..? have made the data hard to rely on for much. A while ago I switched from AWStats to GoAccess (https://goaccess.io/) for this kind of thing. I prefer its interface, and it's way way faster to churn through big log files (C vs. Perl).
Try usermaven.com as an alternative to GA; I am sure you'll love the simplicity and ease of use/maintenance because of the auto-tracked events functionality.
Beware, Matomo by default isn't very privacy friendly, you will need the GDPR banner for any advanced features.
If you want GDPR-compliant analytics, you have to disable many of its flagship features, or use something else like Plausible, which is designed to work with no consent.
You'll need consent for any data collection not essential to your site's functionality, even if you host stuff yourself.
If all you do is collect how often your pages are being visited, then you're not collecting any PII and you don't need a banner; but if you're tracking visitors based on unique identifiers (cookies, IP addresses, etc.), you'll need to get consent first.
It's not only that, it's also about the data collected.
Real-time data like visitor data and heatmaps isn't allowed, and IP tracking is not allowed either.
Because Matomo can be very powerful, more powerful than Google Analytics.
You can let it assign a unique ID to every visitor who visits a subdomain and logs in, so you know exactly who each visitor is across all of your sites.
I think we learnt enough when big tech offers something for “free” and when they call it “absolutely” free it just means you are absolutely the product.
Clarity is awesome--the metrics and the way it combines a visualization of our site with user session data is amazing. It shows you the actual locations on the page that users visit as well as the path they follow to get there. The insights are far more actionable than Google Analytics from my experience. (We use both.)
p.s., Under the covers Clarity runs on ClickHouse.
This would be nice. I don't track users on my personal blog. I don't give a flying duck about what people do there.
However I make money from running a website which is really useful to a lot of people. I absolutely need to know what works and what doesn't. I can't write and edit in the dark, possibly missing by a mile what my readers really need. It would be like flying a plane without instruments.
For instance, I see that a lot of people use the search to find a single guide, which should definitely be linked on the home page. Without basic tracking, I wouldn't even know which pages are important to my users.
There are many gaping holes in your website that you could be completely blind to without a basic sense of what your users do on your website.
I also caught many illegal copies of my website through referrer tracking. Three of them were phishing websites, and I got them shut down.
So there are many legitimate reasons to have basic traffic counters, and you can have those while respecting your users' privacy and following the spirit and the letter of GDPR.
I wonder if someone's made an AdNauseam for tracking libraries yet. Going on the defensive clearly doesn't work, some more offensive action is required.
Send a whole bunch of plausible events, pretending to click every link, changing your identifiers and stuff like resolution every time, make it impossible to determine what data is real and what isn't. Bonus points for leaving websites alone if they don't load tracking scripts until you've consented.
We can't stop trackers, but we can try to make them useless. Even if they filter out such tracking they shouldn't be able to figure out what data was real, making their tracking attempts worthless.
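A sketch of what such a poisoning extension might generate; the event fields and values here are made up for illustration, and a real plugin would POST them to whatever tracking endpoint the page loads:

```python
import random

# Plausible-looking but randomized screen resolutions to cycle through.
RESOLUTIONS = ["1920x1080", "1366x768", "2560x1440", "390x844"]

def fake_event(links):
    """Build one fake click event with a fresh random identity."""
    return {
        "visitor_id": "%016x" % random.getrandbits(64),  # new ID every event
        "event": "click",
        "target": random.choice(links),
        "resolution": random.choice(RESOLUTIONS),
    }

events = [fake_event(["/pricing", "/docs", "/signup"]) for _ in range(5)]
for e in events:
    print(e)
```

Since every event carries a fresh identifier and randomized fingerprint fields, the tracker can't cluster them into sessions, and there's no statistical handle to separate them from real traffic.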
I believe AdNauseam uses EasyList, so if it doesn't include the EasyPrivacy part of that (which contains the trackers) by default it seems like it would be easy to add.
That said, I don't think this is an effective strategy at all. Safari has blown a giant hole in tracking (like 20% of users) and lots of sites are still proceeding like nothing has changed. Google referrer spam ran at mega-scale, dumping billions (at least) of spam hits into millions of profiles, and didn't affect tracking efforts.
A plugin run by .0001% of users or whatever that adds in a bunch of slop to the numbers just makes more analysts pull out their hair rather than leading to change.
Worth noting that it seems to be only "open core", there's a bunch of paywalled features that I presume aren't open source. https://matomo.org/pricing/
ClickHouse: Piwik PRO, Plausible, PostHog, Yandex, Cloudflare
Snowflake: Amplitude, Piano, Snowplow
SingleStore: Fathom
I've written a book on the subject including evaluating the 15 most widely used options: https://gaalternatives.guide