Our local police force has set up a site for 'anonymous' reports from rape victims, and it had GA tracking on every page (plus Google CDN content, another issue). I wrote them to explain why this wasn't the best idea and how Piwik was a better choice.
Crimestoppers (a UK charity) are doing this too, and I wrote them to explain the potential for privacy issues. 'You've outsourced crime victims' privacy to an ad company' was my basic message.
I told both the local police and Crimestoppers how easy Piwik was, and how I thought it was a better idea, but I don't think I got my point across. There is a gap in understanding: the site owner doesn't see the raw data (but Google does), and so they think it's okay.
Anyhow, interesting privacy issues and it may be that I am overlooking something or being overly cautious.
My clients are already using Google Webmaster Tools and Adwords and want everything integrated, plus they want the reliability of Google.
But another FOSS analytics platform is Snowplow (http://snowplowanalytics.com), and while it wouldn't replace GA for them, it might replace another commercial analytics package. Few high volume ecommerce sites use just GA these days.
Snowplow co-founder here. Thanks for mentioning us liquidcool :-)
Snowplow is a little different from Piwik - Piwik is a LAMP-stack opensource app which replicates a GA-style analytics experience.
Snowplow is more of a scalable event analytics platform - it is built on AWS (CloudFront, Elastic MapReduce, Redshift), does _not_ have a UI but has a very clean & simple event model[1] and scales horizontally to billions of events.
To date, Snowplow is mostly used by web companies that want to warehouse their granular event data to build custom analyses, segment users, personalize sites etc.
I want to take this opportunity to praise Snowplow.
They allow for collection from just about anywhere. Options include Javascript, GIF beacons for email embed, Arduino, and anything else that can send HTTP GET requests.
Snowplow's collection can be performed by Amazon CloudFront, which results in absurdly low latency world-wide, for cheap. I think I estimated around $0.60 per million hits. However, Snowplow is very decoupled, so collection can also occur on NodeJS servers, or anything else that conforms to their spec. In fact, everything is fairly neatly decoupled, so if you think something would run better in [FAVORITE_LANGUAGE/PLATFORM] you are free to implement it.
I run Snowplow on my blog, but only run collection for now. Eventually, when I get around to setting up the rest of Snowplow, I'll be able to crunch the numbers. This is huge - most analytics services require processing and collection to be closely tied.
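Since anything that can send an HTTP GET request can act as a tracker, here is a minimal sketch of what a hand-rolled beacon might look like. The collector host, the `/pixel.gif` path and the field names (`event`, `page_url`, `app_id`) are placeholders I made up for illustration; they are not Snowplow's actual tracker protocol, so check its documentation for the real field names.

```javascript
// Build a GET URL carrying event fields as query-string parameters.
// NOTE: the path and parameter names are hypothetical placeholders,
// not Snowplow's real tracker protocol.
function buildBeaconUrl(collectorBase, fields) {
  const url = new URL('/pixel.gif', collectorBase);
  for (const [key, value] of Object.entries(fields)) {
    // searchParams.set() percent-encodes values for us.
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const beacon = buildBeaconUrl('https://collector.example.com', {
  event: 'page_view',
  page_url: 'https://blog.example.com/post?id=1&ref=hn',
  app_id: 'my-blog',
});

// In a browser you could fire it with: new Image().src = beacon;
// In an email you would embed it as the src of a 1x1 <img> tag.
```

Anything that can issue that request (a script, an email client loading an image, a microcontroller) can then report events in the same way.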
All that said, Snowplow isn't plug and play like Google Analytics or even Piwik is. You need to be willing to patiently wade through the Snowplow documentation.
Both Piwik and Snowplow emulate Google Analytics too closely. Using the same syntax is silly, and limits their potential. I'm still waiting for an entirely JSON based analytics service.
On the JSON/GA point - thanks for the feedback. We are slowly but surely evolving from our GA-style API, e.g. with our tracker support for new Mixpanel-style unstructured events (though they are not - yet - supported in our ETL/storage):
We are moving our event model to be Avro-based later this year and then yes, we may look at complementing our existing tracker protocol with something more JSON-centric. So lots on the horizon!
wow! May I ask where you advertised Snowplow so far? I can't believe that I don't know it.
Awesome product. It looks a little hard to set up, due to the Hadoop linkage, but definitely worth using on my next projects. Maybe worth making a VM appliance, Puppet/Chef recipe, or Docker script for easier deployment.
Why haven't I ever heard of it? I usually "snowplow" the web for all kinds of software, but it appears that I haven't crossed the marketing channels you've targeted.
Thanks for the kind words X4! Marketing-wise - we are just growing organically, slowly building the userbase as more people come into contact with what we're offering.
Deployment-wise - it is a bit fiddly to set up. A community member is hopefully going to work on some Opscode Chef and Amazon CloudFormation scripts so that should make it a lot easier.
The setup is pretty annoying. I still haven't figured out what permissions need to be granted to IAM users on AWS to do things like spin up the Hadoop jobs.
Having your granular data end up in Redshift, ready to be enriched with other biz data sources, is a beautiful, beautiful thing, and completely worth it.
Ironically, could the DDoS actually be because of Piwik? With the sudden increase in traffic, it's probably generating a massive amount of data and overloading the database.
Piwik is a pretty good alternative and, unlike GA, it gives instant analytics and is extensible.
However, it tends to get clunky after some time, depending on the number of sites, the traffic and the server where you host it. This is most visible when you try to display a larger date range or just dig into the history.
There is one more important thing to consider: it collects IP addresses of all your visitors out of the box, which might be in conflict with your local laws. Be sure to check that out before adding it to the site.
I think they use sampling once your traffic gets to a certain size, so though they report figures down to individual hits, it's not actually as accurate as that might imply.
Yes, it uses sampling. And the numbers might be way off. We had times when the stats fluctuated by 40% between the numbers from yesterday and the numbers taken more than 48h after the fact.
> There is one more important thing to consider: it collects IP addresses of all your visitors out of the box, which might be in conflict with your local laws. Be sure to check that out before adding it to the site.
Our attempts to install and use Piwik have always been disappointing in the past. It seems to fall apart (slow display etc.) quickly for medium to high traffic sites (50+ million page views/month). Are there any people here who are using it successfully at this order of magnitude and are willing to share some configuration hints?
We gave up on it for the same reason. It is absurd for a simple web analytics app to require massively more powerful hardware than our actual app does. And since it is a mysql mess, any updates that touch the schema put your stats offline for hours. I really can't fathom how using mysql is still considered acceptable in 2013.
I completely agree with the statement. I love Piwik and we use it to monitor a lot of our sites. But I hate the fact that it's LAMP. The archive script runs out of memory at least once a week.
We run Piwik on 40M page views per month on a dedicated server. Archiving data takes a few hours, but the UI is fast and it works. We used the tips in: http://piwik.org/docs/optimize/how-to/
Anyone fancy telling the lot at gov.uk about this? It would be nice if they weren't using foreign owned (and likely foreign-located) servers to record and analyse what UK citizens do on UK government websites.
It seems to be entirely cookie-focussed though, and doesn't even consider that data on UK citizens' interaction with government, regardless of cookies, may be something to keep private.
I mean this - "despite the fact that no personal data was collected, it was good practice not to share analytics information with third parties in order to reassure government websites’ users." is ludicrous.
It's trivial for a service with as many hooks into everything as Google to correlate cookies and IP addresses of visitors to GA-using sites with their Google accounts and other tracking data. It's almost as ludicrous as the answer I got direct from them: "we don't allow google to use the data".
You don't allow it? How exactly is it that you stop them keeping records when every time I visit your pages my computer also tells them what I'm doing?
Hi, I found out about this post on HN thanks to Piwik.
I am the author of the post Tony links to at the top of his post, and I also use Piwik, which is where I saw that my article had 300+ visits instead of the 20 or so it gets daily. :)
Piwik is great, and I use it to track visits to a few sites I own. All of them get around 6,000 page views per day, so no real traffic.
My site is powered by Jekyll, and I also use Vanilla Forums as the commenting system, so no Disqus or Intense Debate either :)
Have a nice day, and thanks Tony for the link and credit.
Piwik is great until you get a decent amount of traffic. I had it on a client site as an experiment. At 100k page views per hour Piwik nuked the server :(
I'm a lover of Piwik; I've switched from GA to Piwik across all the domains I operate for myself and for some clients. One install can handle multiple domains/sites/accounts/groups.
Many of the clients appreciate the increased "privacy". And for applications (internal/public/private) it just makes more sense to me than GA.
My favourite part is that it's doing server-side log analysis for my traffic - not using those JS based widgets.
It's got event tracking (sweet) if you choose to use a JS based tickler to do that kind of thing.
One thing I feel you miss out if you don't use GA is 'Google-juice'. Maybe it's just like blowing into a Super Nintendo cartridge, but I think using GA increases your SEO with Google.
The reason why this probably isn't true is that it would be a regulator's/anti-trust-buster's dream come true. Imagine it: Google ranks you lower if you don't use other Google products. Don't use GA? Ranked lower. Don't pay for ads? Maybe your organic results drop a bit. . .
Google has to be very careful about its organic search results. Giving extra weight to sites that use things like Google Analytics would be the "confirmation" that regulators would need. As such, it seems like Google wouldn't risk its core business over something like this.
I totally understand the logic: if I use Google Analytics, Google knows that my site is getting traffic and it makes a certain sense to take that into consideration. However, I think the opposite side of that (if sites don't give Google whatever Google wants, they're going to be ranked lower) is a can of worms that Google doesn't want to be seen dipping into.
Authorship on your sites alters your site's appearance on SERPs, thereby giving you a distinct advantage or disadvantage that you can only gain by using Google's product.
As mdasen mentioned, Google Analytics probably doesn't have much of an effect on SEO due to regulatory concerns.
However, not using it probably does handicap you if you use other Google services, such as Adwords. Google Adwords heavily leverages Analytics data to optimize your marketing campaigns, and doesn't offer the ability to interface Adwords with a third party like Piwik. So without Analytics, Adwords flies blind to user behavior on your site and can't tell which leads were useful and which weren't.
Piwik is great and so much better than Analytics. It works especially well for us when we want to track sources of sales, as sales are handled by a 3rd party reseller and Analytics goals are of no use.
In Piwik you have the visitors log at a glance. We just match IP/time to the log and BAM - we know where the buyer came from, which pages they visited and how long they stayed there.
On top of this, our site traffic is hidden from the eyes of Google.
Piwik can be set to follow the Do Not Track header.
It rigidly follows DNT, not storing the request at all.
It looks like Google and other analytics players are going to refuse to follow DNT, after a hilariously weak proposal by the DAA was rejected by the W3C committee.
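To make "not storing the request at all" concrete, here is a minimal sketch of a server-side DNT check. This is not Piwik's own code (Piwik is PHP); `shouldTrack` is a hypothetical helper, and the only assumption is that the browser sends the standard `DNT: 1` request header when the user has opted out.

```javascript
// Hypothetical helper: returns false when the visitor sent "DNT: 1",
// in which case the caller should drop the hit without storing anything.
function shouldTrack(headers) {
  for (const [name, value] of Object.entries(headers)) {
    // HTTP header names are case-insensitive, so compare in lower case.
    if (name.toLowerCase() === 'dnt') {
      return String(value).trim() !== '1';
    }
  }
  // No DNT header at all: the visitor expressed no preference.
  return true;
}
```

The point of checking before anything else is that an opted-out request never reaches the log or the database, rather than being stored and filtered out later.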
If you want to give Piwik a try on your own laptop, AWS or Azure, we (BitNami) have free one-click installers, VMs and cloud images: http://bitnami.com/stack/piwik
Edit: Never mind - I just read to the end, Piwik can auto-update to the latest 1.12. Missed that when cross-reading and trying it myself while reading.
The guide and the mentioned Github repo use Piwik 1.5.1. There are several security issues with this version (perhaps many more): http://www.cvedetails.com/vulnerability-list/vendor_id-9612/...
Latest version is 1.12 - I advise against this "simple" solution to use Piwik. Perhaps there is a github repo with the latest Piwik?
I've used Piwik myself for years and swear by it for all of my personal projects or sites where I need full data privacy. It provides all the basic data I need for my clients and then some. Also, once it was set up I've found maintenance to be pretty simple. I use it for regular web sites, WordPress sites and MediaWiki installations.
Granted, I miss the days of being able to use tools like Analog but it's so badly out of date and not maintained anymore so I only use it when I need to process raw traffic numbers from a server.
I've used Piwik for years and it is incredibly simple to use and set up but this post makes it much more complex than it needs to be. In all honesty, it's just as simple as setting up Wordpress. Drop the Piwik folder on a server somewhere, run the installation (connecting to your database and if I recall correctly you don't need to use the root user, just a user with sufficient privileges), and you're done.
I want to love Piwik, and I do like it a lot, but I do have some problems. Piwik gets slow after a while. This may partly have to do with the server it's running on, but over time the software will slow down, especially if you try to pull out somewhat longer date ranges.
It isn't as pretty as GA. I know this is petty and that it's themeable, but the UI was important to me. Keeping it up to date and maintaining it also requires vigilance. It isn't hard to update but you have to make sure to check for updates. Sounds simple but you'd be surprised how lazy one can be. Also, integration with Webmaster Tools isn't available, which is kind of a bummer.
On the plus side there's very little that GA offers that Piwik doesn't. There's even a great mobile app which GA doesn't yet have to my knowledge. You can monitor multiple sites on different servers using a simple JavaScript snippet just like GA, and it breaks down the data in just about every way you'd want.
In the end, despite really wanting to use Piwik long term, I wasn't able to do it. I don't see a problem with using Google Analytics for tracking purposes. Google has the power to abuse the data they collect but I trust them not to. I'm not running a site where visitor privacy is a big priority. If I were running such a site I'd reconsider this position.
But from an ethical standpoint, if it's somehow not okay for Google to collect tracking data on your visitors (and promise not to exploit it), why is it okay for any of us to use Piwik and collect that data ourselves? Google has far more data that can do far more damage, but they also have far more resources to put into security than most of us. I can take a pledge not to exploit my users' data, but Google does too. I know I can trust myself, but my users don't. My users might even prefer that, if I were to use analytics software, I use software that comes from Google, a name they know and trust, rather than from me, a guy they know a little bit but who doesn't have a reputation that can even remotely compete with Google's. To me, that's the more interesting aspect of Piwik: the question of why running your own analytics software is more ethical than using Google.
Edit: When I said I wasn't running a site that made visitor privacy a priority I was excluding the site I run that actually does make user privacy a huge priority. I'm aware I look like a hypocrite now and I think I might actually spend some time thinking about whether or not to switch over to a self-hosted analytics solution for that site. I'm still not sure that a self-hosted service is preferable in my case but I'm open to the idea.
By default, Piwik will aggregate data when you (a) make an API request or (b) load the dashboard (which in fact calls the API). Cron archiving makes this process faster by processing all the data beforehand so that the API can simply request it from the DB.
I just want to mention that Cron archiving really does wonders for performance.
I've got a site that is receiving nearly 10,000 daily uniques since Google's last [panda|penguin|whatever] update in May, and while Piwik's performance was not terrible, setting up the cron a few weeks ago made the whole experience very snappy.
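For anyone wondering why the cron makes such a difference, here is a toy model of the two modes. It is only an illustration of the idea: Piwik's real archiving is PHP on top of MySQL, and the `Reports` class, `runArchive` and the hit format are all made up for this sketch.

```javascript
// Count page views per day by scanning every raw hit (the expensive part).
function aggregate(hits) {
  const perDay = new Map();
  for (const hit of hits) {
    perDay.set(hit.day, (perDay.get(hit.day) || 0) + 1);
  }
  return perDay;
}

class Reports {
  constructor(hits) {
    this.hits = hits;
    this.archive = null; // filled in by the scheduled job, if there is one
  }

  // What a cron job would trigger: do the heavy scan ahead of time.
  runArchive() {
    this.archive = aggregate(this.hits);
  }

  // What the dashboard/API would call.
  pageViews(day) {
    // Without an archive we rescan the raw hits on every request.
    const source = this.archive || aggregate(this.hits);
    return source.get(day) || 0;
  }
}
```

Both paths return the same numbers; the difference is whether the scan over raw hits happens while someone is waiting on the dashboard or beforehand in the background.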
Apologies for hijacking the thread.
I am trying to convince one of my clients to do exactly this: switch from GA to Piwik for their site, which is deployed on Adobe CQ5.
I know I can use client side tracking easily (hence my recommendation). However, do you know if there is support for server side CQ5 logs?
I'm tracking sites by loading their Apache access logs into the database. I'm not sure what the CQ5 log structure is, but it should be possible (though it might require some modifications).
Edit: It should be noted that you will get lots of bot requests showing up when you load access logs (the regular tracker relies on JavaScript, which gets rid of most of the bots). I have a small awk script that cleans the access logs prior to importing them, by trying to detect bots. I can upload that if you're interested.
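Not the awk script mentioned above, but a rough sketch of the same idea in JavaScript, assuming Apache's combined log format (where the user agent is the last quoted field on each line). The pattern list is a deliberately small assumption and will miss bots that don't identify themselves.

```javascript
// Assumed, non-exhaustive list of substrings that self-identifying bots use.
const BOT_PATTERN = /bot|crawler|spider|slurp/i;

// In the combined log format the line ends with: "referer" "user-agent"
function userAgentOf(line) {
  const match = line.match(/"([^"]*)"\s*$/);
  return match ? match[1] : '';
}

// Keep only the lines whose user agent does not look like a bot.
function dropBotLines(lines) {
  return lines.filter((line) => !BOT_PATTERN.test(userAgentOf(line)));
}
```

You would run the raw log through `dropBotLines` before handing it to the importer; lines with a missing user agent are kept, which errs on the side of not throwing away real visits.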
I think the important distinction is that trust will be either first-party or third-party, and most of your users will never think about it.
If I'm using your site or service I've already decided to trust you somewhat. Self-hosted analytics is just an additional baby step.
If you put a third-party resource on your site (and most of us do) then that third-party is going to have their own entirely separate terms & conditions, which you as webmaster have little or no control over.
It's down to whether you (and/or your users) are okay with farming out visitor privacy to a third-party. Once that question is satisfied it's just a question of performance.
I'm on iOS and I used a bunch of the GA apps. They all sucked or asked you to pay, which I'm not opposed to, but it's not worth it for me. Maybe things in iOS land have gotten better since I last tried one.
> It isn't as pretty as GA. I know this is petty and that its themeable but the UI was important to me.
I explored this too--it turns out that they make money off of the design of custom Piwik UIs for sale. A site license for their white-labeling plugin costs just under 2,000 euro.
My product allows users to create and launch their own website. I'd quite like to be able to quickly provide basic statistics for users (alongside Google analytics if they want it).
We run a multi-tenant application, so we have thousands of sites running from the same codebase; however, each site owner would need its own statistics. At the moment, we just let users provide their own Google Analytics, but it would be nice to report to Piwik, I think, and give them their own preconfigured stats area?
I've used it in the past to do something a bit similar. I had a sort of "master account" that was installed on all domains and then a second individual one per domain. Piwik worked really nicely and gave the end user the ability to see a sort of "combined stats" as well as per-domain stats. We couldn't get GA to work in this way (I think that has since changed).
This was about two years ago now and I've followed the development on and off and it's certainly come on leaps and bounds. It's definitely worth some investigation for your use-case.
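A rough sketch of the "master account plus per-domain account" setup on the client side, using the tracker's `Piwik.getTracker(trackerUrl, siteId)` and `trackPageView()` calls. The function name and the site IDs are hypothetical, and the `Piwik` object is passed in as an argument here only so the snippet doesn't depend on piwik.js being loaded.

```javascript
// Record the same page view against two Piwik sites: a shared "master"
// site and the site that belongs to this particular domain.
function trackToBothSites(Piwik, trackerUrl, masterSiteId, domainSiteId) {
  const trackers = [masterSiteId, domainSiteId].map(
    (siteId) => Piwik.getTracker(trackerUrl, siteId)
  );
  trackers.forEach((tracker) => tracker.trackPageView());
  return trackers.length;
}

// On a real page, after piwik.js has loaded (URL and IDs are examples):
// trackToBothSites(Piwik, 'https://stats.example.com/piwik.php', 1, 42);
```

The master site then shows the combined stats, while each domain's own site ID only ever sees its own traffic.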
I've done this on the product I'm developing. You can set it up programmatically and it works exceptionally well. That being said, there are some performance considerations, especially if the sites you're hosting are high traffic.
I'd recommend reading the documentation and doing a bit of searching for a configuration that will fit your needs before you begin. It will save you a lot of time.
Good luck and let me know if you have any specific questions!
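As an illustration of the programmatic setup, here is a sketch that builds a call to Piwik's HTTP API method `SitesManager.addSite`. The base URL, token and site details are placeholders, `buildAddSiteUrl` is a made-up helper, and you should check the API reference for your Piwik version before relying on the exact parameters.

```javascript
// Build the API URL that registers a new site for one tenant.
// piwikBaseUrl must end with a slash, e.g. "https://stats.example.com/piwik/".
function buildAddSiteUrl(piwikBaseUrl, tokenAuth, siteName, siteUrl) {
  const url = new URL('index.php', piwikBaseUrl);
  url.search = new URLSearchParams({
    module: 'API',
    method: 'SitesManager.addSite',
    siteName: siteName,
    'urls[0]': siteUrl, // the API takes a list of URLs per site
    format: 'JSON',
    token_auth: tokenAuth, // placeholder; must belong to a super user
  }).toString();
  return url.toString();
}

// Example (all values are placeholders):
// buildAddSiteUrl('https://stats.example.com/piwik/', 'YOUR_TOKEN',
//                 'Tenant 1', 'http://tenant1.example.com');
```

Requesting that URL from your provisioning code should return the new site's ID, which you can then drop into the tracking snippet for that tenant.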
I would not recommend Piwik over GA. The #1 reason is that Piwik does not track the (not provided) Google keyword searches. Google now provides the (not provided) keywords with the site page attached. For example: (np - /pricing), so you at least have an understanding of what they searched for. This is a big factor if you're serious about SEO.
I have implemented Piwik widgets in my latest project for tracking visitors to personal pages. When you visit this page, http://reminderof.me/ruggero, I see the insight on my dashboard.
IMHO Piwik is a valid open-source alternative to Google Analytics and will erode its application marketplace.
For hosting providers (or those nice people who share their servers with friends), you can have Piwik automatically installed by default when creating new sites.
This is impossible to do with GA, as GA requires the creation of personal accounts and injection of code into the customer's/user's website.
I tried Piwik on a number of sites to move away from GA, but I was simply not able to match the minimal impact on my load-times compared to Google. Self-hosting your analytics solution still requires an optimized web-server - things like TTFB were a real issue for me.
I believe he is referring to an event such as a javascript based event when a user clicks a button etc. I'm also interested in event tracking because my WebGL site uses it to track user interaction since users only visit the WebGL view and don't visit multiple pages.
Piwik calls those "goals". You can definitely kick those off from javascript. We run piwik on our web solitaire site (http://greenfelt.net) and kick off goals when users do things like complete a game. The code looks like this:
    // Guard with typeof so a blocked or unloaded tracker doesn't throw.
    if (typeof piwikTracker !== 'undefined') {
        piwikTracker.trackGoal(2);
    }
My only annoyance is that you have to reference things by numbers and not some nice string that reads well in the code... But it works and it's something you set up once and forget about, so I got over it.
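One way around the numbers-only annoyance is a small lookup table of your own. This is just a sketch: `GOALS` and `trackNamedGoal` are made-up names, and the IDs in the table have to match whatever goals you created in the Piwik admin UI (2 is taken from the snippet above).

```javascript
// Hypothetical name-to-ID table; the IDs must mirror your Piwik goal setup.
const GOALS = Object.freeze({
  gameCompleted: 2,
});

// Fire a goal by name. Returns false if the tracker is missing
// (e.g. blocked by the browser) or the name isn't in the table.
function trackNamedGoal(tracker, name) {
  if (!tracker || !Object.prototype.hasOwnProperty.call(GOALS, name)) {
    return false;
  }
  tracker.trackGoal(GOALS[name]);
  return true;
}

// Usage on a page where piwikTracker may or may not exist:
// trackNamedGoal(typeof piwikTracker !== 'undefined' ? piwikTracker : null,
//                'gameCompleted');
```

The numbers still live in one place, but the call sites read like what they mean.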
I posted a related question over on Stack Exchange if anyone's interested in providing some feedback there. http://webmasters.stackexchange.com/q/47069