It took us only a few weeks to write our home-brew
analytics package. Nothing super fancy yet, but now we have
an internal dashboard that shows the entire company much
of what we used analytics for anyway - and with some
nice integration with some of our other systems too.
I never quite grasp how the above isn't just a matter of intuition to anyone working in the tech sector. Google Analytics thrives on developers' laziness in my opinion.
And to echo other posters: SpiderOak deserve thanks. If I find myself with any need for a service like theirs, I know I'll be looking at them.
>>> I never quite grasp how the above isn't just a matter of intuition to anyone working in the tech sector. Google Analytics thrives on developers' laziness in my opinion.
Ah, the "not invented here" syndrome!
There are tons of things that you could do "in a couple of weeks" that more or less work. However, it doesn't mean you have to or even that it would be a good idea.
If all developers adopted the attitude that you have expressed, there would be thousands of sad, sad developers who need to maintain shitty in-house analytics systems because someone once said "I could do it in a week". There are tons of awful CMSes already because someone once said "I could do better than wordpress" / "I could create a better framework" / etc.
In a lot of cases, GA is just good enough. Sure, you might need to spend some time exploring its features (custom dimensions, etc); there's more to GA than the number of pageviews for a given day. There are cases when GA is not enough. Fair enough. But that's definitely not the majority of cases.
Sure, it makes sense for SpiderOak given its target audience. However, there's no need to make such a generic statement about 'anyone working in the tech sector'.
Then the question is: do you really want to maintain the infrastructure required to run the analytics smoothly? Especially if your company has tens of millions of pageviews a month and needs real-time reporting (extra infrastructure to support that).
Are you familiar enough with the stack that you could have a high degree of confidence in fixing the production issues which are inevitable? Quite often, an honest answer here is 'no'. Then can you afford to lose a few hours/days/weeks (whatever it would take to fix the issue) of data? Again, often the answer here is 'no'.
Of course, you have hosted solutions. But they are no better than GA in terms of privacy.
Paid support exists too but the cost can skyrocket pretty quickly, on top of paying for the infrastructure and maintaining it.
Processing logs is a lot cheaper than the javascript download and other additional http requests needed for google analytics, not to mention the privacy costs. Cheaper for the website, the user, and the web in general.
Not to mention you get perfectly accurate analytics, with no loss due to request blockers or disabled javascript.
The code for this is generic. An open source solution costs nothing beyond some CPU to process the logs and a database to store the analytics.
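As a rough illustration of the log-processing approach described above, here's a minimal sketch that counts pageviews per day and path from an access log, assuming the standard Apache/nginx "combined" log format (the regex and field names are illustrative, not taken from any particular open source package):

```python
import re
from collections import Counter

# Apache/nginx "combined" log format:
# IP ident user [timestamp] "request" status bytes "referer" "user-agent"
LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def pageviews(log_lines):
    """Count successful GET requests per (day, path)."""
    counts = Counter()
    for line in log_lines:
        m = LINE.match(line)
        if not m:
            continue  # skip malformed lines
        if m.group("status").startswith("2") and m.group("method") == "GET":
            counts[(m.group("day"), m.group("path"))] += 1
    return counts
```

This is the "costs nothing beyond some CPU" case: no JavaScript shipped to visitors, no extra HTTP requests, and blocked or script-disabled visitors still show up.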
It's been a while since I've used GA, but being able to segment into age, gender, and interests(1) is something you can't do without paying a marketing aggregator hundreds of thousands of dollars a month or using GA. You can do some geolocation classification and things like campaign effectiveness, bounce rate, etc, but since Google has so much aggregate data on hand, being able to classify user-x as "Male, 40s, Interests-similar-to-demographic-we-sell-to"(2) is invaluable whether you're selling seats of enterprise software, high-fashion luxury items, or cheapo stoner knick-knacks. You can't really market segment with your own software.
Obviously, they're using the same information that's helping you calibrate your campaigns to add to the hive-mind, so they can further data-mine. You're sacrificing the anonymity of your end-users in doing so. Obviously they're offering it so that they can refine their profile of you more accurately to sell ads / direct more relevant traffic to you better. I'm not an industrial engineer but I've been reading about it for the last few weeks. I turned off Adblock for a while and even with my opt-out plugins(a,b) I started getting ads for $4,500 Fluke multimeters. The combination of one's search history plus a fairly comprehensive history of the sites you visit(b) profiles you to a terrifying degree, but at the same time, the average business with only a few million dollars a year going towards both sales and marketing can't really approach Quantcast and ask for access to their API.
a: https://tools.google.com/dlpage/gaoptout
b: https://chrome.google.com/webstore/detail/do-not-track/ckdcp...
b: I don't have the study off-hand, but IIRC some guy after finishing his masters at Stanford wanted to assess how much information Google had re: an average user's browser history. The findings, based off Common Crawl data of the top 100k sites + presence of GA.js, yielded something like ~> 75% of the web being tracked (not to be confused with how much of an end-user's traffic is tracked; that number will be far higher) based on sites with a GA.js history factoring in Referer tags. Those were unweighted numbers, i.e., I bet more than one out of two 45 year old women's traffic can be analyzed to a 95% degree of completeness based entirely off of Pinterest, Facebook, search history and the outbound links from her e-mail.
Interesting points. I think there are many ways to use Google Analytics that go beyond what many people want from "visitor data". Some of the questions GA can answer can only be answered if one is willing to collude in destroying (meaningful) privacy.
I've had "simple foss analytics" on my todo-list for quite some time. I'm hoping one can build on what Piwik has collected wrt bot agent strings, IPs etc - and combine that with a simpler collector (adding PHP to the stack just for analytics isn't very appealing, never mind a PHP codebase of somewhat questionable quality).
Snowplow looks good, but I'm not sure if they have a supported "self-host" stack yet (they started out very awz/s3 centric).
I actually think there's room for a new product, one that puts a little bit more thought into what questions it makes sense to ask, and how best to answer them (eg: does collecting metrics on every visitor even make sense if you can answer the same questions just as well by doing random sampling? You might want to quantify where your bandwidth goes - but simple log analysis might do that easily enough - and it might have very little to do with your human visitors etc).
If you make decisions with money riding on the answers, it costs a lot more than CPU and DB.
Perhaps systems administration is somehow very cheap for you, but I'm willing to bet it is still not "nothing" - even if the cost is you personally not watching a TV show you like because you're patching the web server on your analytics box for your personal vanity domain, that's still a cost.
For most operations, sysadmins are somewhat expensive, and because of that, busy. This is why Urchin was such a good idea, and why Google bought them - the proposition is to trade your users' privacy for the admin time it takes to support another internal app. There's an absolute no-brainer, assuming you don't care about your users' privacy (IIRC, they were going to sell the service before Google ate them, but that's ancient and trivial history).
>because you're patching the web server on your analytics box
If your business is so small that an additional low-volume web server just to display your analytics (you don't need one for the actual tracking) is a big deal, then the same web server that serves your product can serve your analytics. Not a big deal.
I don't think it's a matter of laziness. It's more a question of where it's best to spend your expensive/valuable developer resources: on the product, or on some home-baked analytics framework?
I applaud SpiderOak, but they are much different from most other sites. They have privacy conscious customers to begin with, this is something that is good press for them and probably a net positive on their bottom line for doing it, not the case with most other sites. Also it's something they are doing after having a very mature product for many years, clearly not the first or most important thing they needed to tackle as a company.
Agreed - for some cases just pasting the GA snippet onto a site is sufficient. For others you should add events and such. For others you must roll your own.
It's not laziness, it's opportunity cost. For SpiderOak, it makes sense to spend a few weeks of a few developers' time to roll their own analytics. For me, it doesn't. Our customers aren't privacy-focussed. In fact, our app depends on them explicitly sharing [quite a lot of very personal] data with us. I would rather spend that time building something that delivers value to them and us than indulging my personal beliefs about privacy.
Piwik is incredible. But it should be noted that it does provide a scaling challenge for high traffic use cases (> hundred million actions per month), and hosting your own analytics is expensive.
I bring this up because people had been slamming moot for using GA on 4chan instead of piwik without understanding why.
We have much lower traffic than that and our Piwik servers, with paid support from the Piwik team, often struggle to generate reports etc. Not convinced Piwik is that easy to scale.
People have scaled it to over a billion actions per month. No clue how much of that includes customizations though... It sounds way past the out-of-the-box limit.
Look at the comments from sandfox and afterlastangel in this thread. afterlastangel is pushing a billion, sandfox is around 300 MM per month.
I'm looking into replacing GA Premium ever since Easylist blocked GA tracking for Adblocked users and self-hosted Piwik seems like the best solution. I'd be well into the billions.
They're using an open source analytics software package to analyse the very data it was designed to analyse.
I don't find its use of poorly implemented hashing in the administrative interface to be at all relevant to what they're doing, or a reason why they shouldn't be using it.
Information on who visits WikiLeaks - and what they read and upload - is an incredibly high value target. I don't see how you can argue otherwise, when Britain's top intel agency has an expensive line item in their budget just to get at that info.
Given these known security flaws, it's not a stretch to assume anyone who can see the GCHQ's Piwik server can have that data too, regardless of whether they are authorized.
See below for a small preview of what an attacker could exfiltrate (dissident IPs redacted for a reason):
While we're talking about poor security practices: the privileged username in the screenshot is apparently still the default ("admin"), so I hope the password isn't still "changeMe" ... http://piwik.org/faq/how-to/faq_191/
Strangely, Microsoft's offering is missing: Application Insights.
Pretty much works like Google Analytics but utilises both client JavaScript and embedded runtime code to generate a richer picture of what is going on.
Too bad the interface on the Azure Portal is terrible. They spent too much time making it look fancy, and not enough time getting the 101s of usability right (which is a criticism I'd lay at the feet of the new Azure portal in general).
Probably the vendors of the software concerned. Perhaps it started out as a list of three with a major bias towards a particular product. And then the competitors responded, moderators did their thing, and eventually an accurate list evolved.
Self-hosted means that it will be served from your own servers, and thereby your own domain. So unless your domain is on a block list, it will be loaded.
EDIT: Sorry, I've been dealing with uBlock Matrix for too long, and forgot how advanced the other blockers' pattern matching is. See the many responses to this for better information.
The EasyPrivacy block list contains an entry that will block the piwik.js file. Of course, when you're self-hosting, it's trivial to serve that file with a non-default name.
That's an interesting choice. I mean, it's not like you can hide from the web server that you are making the request. But then again, I'm assuming -- by the sheer necessity of having a JS file -- that they are collecting some additional metrics not available to the server in the request.
And I never quite grasp why many people working in the tech sector are insistent on reinventing things that already exist. Such thinking thrives on developers' personal sense of exceptionalism in my opinion.
Yeah, a nontrivial app comprises so many parts that if you tried to reinvent a few of them yourself you'd never get anywhere. Also, try looking at the commit history and issue lists of seemingly trivial libraries. It's incredibly easy to underestimate how complex something that looks simple at first can be.
That starts going down the path of the "not invented here" mindset. You could then attribute not hand-rolling every bit of infrastructure yourself as "laziness". Yes, I am lazy to the point that I don't want to hand-roll an industrial-strength RDBMS myself, or the operating system, or the networking protocol, or the key/value store, etc etc.
If all you want to know is who accessed a site, with which browser, how long for, and which pages they looked at then you could get all that from your webserver's log files without writing any code. On the other hand, to build something that's robust, relatively scalable, works across browsers and devices, and can give you an event watching platform like GAnalytics gives you (eg the useful bit), that is far from trivial.
Most developers don't develop (major) libraries, languages and OSs in house, it doesn't mean they are lazy, it means the company need to focus limited resources on their core business.
>> Google Analytics thrives on developers' laziness in my opinion.
Every service does. Pingdom, GA, Olark, Github...
It took them a few weeks to write their own analytics. What features did they not implement? How many people worked on it?
Does your 1 or 2 person startup have 4 weeks to write their own analytics package or do you have more important stuff to do? (I'm betting you do. Like launching your product instead of re-inventing the wheel with analytics)
Isn't GA's main draw its close integration with adwords and whatnot? The dashboard and UI seem pretty clearly aimed at someone who needs to manage their spending on google marketing services, not on someone who needs to count pageviews.
So it's not hard to imagine marketing wanting it; presumably it provides them a lot of value that wouldn't be easy to recreate in-house.
If you can reimplement GA in a few weeks, you need to do this over December, then enjoy your FU money.
GA is rather deep, with tons of integration and ways to slice and segment data.
Yeah, maybe in a few weeks you can get _something_ that'll give you something that'll make some manager not too unhappy. Seems like a terrible value prop for almost all companies since, unfortunately, approximately no one cares (or they run adblock anyways).
I mean implementing an analytics tool that does what you need. If you do it just for yourself, you don't need all those fancy things, so it is often doable in a few weeks.
If it takes you more than a few days to put together a basic analytics platform and reporting system, you're a script kiddie.
Not hard to track page hits, time on, time off, and arbitrary events.
EDIT:
Seriously? Folks, it's a table for analytics events, a few SQL queries to do basic reporting (at least in Postgres), a little bit of client-side JS to post the events, and a bit of server-side code to create the routes and maybe display the report page.
I guess if it doesn't include Kafka, Mesos, Kubernetes, Neo4j, and Docker, it isn't delivering business value.
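For what the comment above describes - an events table, a few reporting queries, and a bit of server-side code - a minimal sketch might look like this, using SQLite in place of Postgres just to keep it self-contained (the table layout and event names are made up for illustration):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")  # stand-in for a Postgres connection
conn.execute("""CREATE TABLE events (
    ts REAL, visitor TEXT, name TEXT, page TEXT)""")

def track(visitor, name, page):
    # A server-side route would call this when the client-side JS
    # posts an event.
    conn.execute("INSERT INTO events VALUES (?, ?, ?, ?)",
                 (time.time(), visitor, name, page))

def pageviews_by_page():
    # One of the "few SQL queries" for basic reporting.
    return conn.execute(
        "SELECT page, COUNT(*) FROM events "
        "WHERE name = 'pageview' GROUP BY page ORDER BY 2 DESC").fetchall()

track("v1", "pageview", "/")
track("v2", "pageview", "/")
track("v1", "pageview", "/pricing")
```

This is the "few days" version; scaling it (see the replies below about write volume) is where the real work starts.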
It is quite costly to write to the database for each hit; I guess most downvotes are because of this. If you limit writes by keeping them in some memory cache, it's doable for slightly higher loads.
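The memory-cache idea above can be sketched roughly as follows; the class name, threshold, and `flush_fn` hook are all illustrative, with the assumption that `flush_fn` does one bulk INSERT per batch:

```python
import threading

class EventBuffer:
    """Buffer hits in memory and flush them in batches, so each
    pageview doesn't cost a database write. Note the trade-off the
    parent comment implies: events still in the buffer are lost on
    a crash."""

    def __init__(self, flush_fn, max_events=500):
        self._events = []
        self._flush_fn = flush_fn
        self._max = max_events
        self._lock = threading.Lock()

    def add(self, event):
        with self._lock:
            self._events.append(event)
            if len(self._events) < self._max:
                return
            batch, self._events = self._events, []
        # Flush outside the lock: one bulk write instead of many small ones.
        self._flush_fn(batch)
```

A production version would also flush on a timer so low-traffic periods don't leave events sitting in memory indefinitely.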
You criticize people for _premature optimization_ while in the same breath advocating rolling your own, shitty implementation for page views? Right...
Frankly, most of what I read out of the tech world these days seems to be about pandering to developer laziness.
All manner of APIs and services seem to exist in their current form simply to extract rent from developers that don't want to do back end "dirty work".
Being paid for doing work has nothing to do with extracting rent, which is the practice of inserting yourself as a middleman so other people have to pay you "rent"[1] where none should be required.
The entire idea behind writing a Service as a Software Substitute[2] is about extracting rent.
I understand Stallman's dislike of SaaSS in [2], but I fail to see how it meets any definition of rent-seeking. People who provide SaaSS are using economies of scale to offer services that are desirable to some, because they're offered at a cost that is less than the cost of developing and maintaining their own private solution. There is certainly a loss of freedom in using these services, as Stallman points out. But rent-seeking, not so much. Users of SaaSS need to decide whether the cost savings of using SaaSS is outweighed by the freedom they give up. Nothing more, so far as I can tell.
Perhaps you should have read that wikipedia page before so helpfully linking to it. There's nothing about "middlemen" there. "Rent" is political economy jargon; it's not just a synonym for "distasteful practices". Adam Smith wasn't complaining about shopkeepers or shipping companies, and he certainly wasn't talking about "back end" software services. There is no royal decree enforcing how such services shall be provided. If you don't like AWS then use GCP.
I feel like someone needs to rewrite Stallman's missives to eliminate the term redefinition and the connotation management. His usage of these rhetorical techniques is far too ham-handed to be persuasive to those who aren't already convinced, even when his message is important.
I would add wisdom to that list. Wisdom to know which modifications will allow you to be lazy in the future and produce the best results before the user realizes they needed them. I think wisdom is a very important one.
> I never quite grasp how the above isn't just a matter of intuition to anyone working in the tech sector. Google Analytics thrives on developers' laziness in my opinion.
Unless I'm mistaken, one big difference is that not using Google Analytics means you don't know which Google search pages people used to access your website. That can be a really important difference for some websites.
A lot of people are replying to the suggestion of implementing your own analytics by calling out its NIHness.
I've recently been faced with this problem, and a solution doesn't have to be too complex.
There are roughly two parts to an analytics solution: event logging and, well, the actual analytics.
Writing your own logger in javascript is super simple; you're just sending off JSON objects to be inserted into an Elasticsearch cluster. Since you have to define that logging anyhow, the only extra work you need to do is the layer to do the actual AJAX requests.
What's left is running and defining your queries in elasticsearch.
BAM! Analytics
I realize it's not fit to be used for every situation, but it can do some pretty complex things this way without the hugest amount of effort ...
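The ingestion half of the approach above can be sketched against Elasticsearch's `_bulk` API, which takes newline-delimited JSON (one action line, one document line per event). The index name and event shape here are assumptions for illustration:

```python
import json

def bulk_payload(events, index="events"):
    """Format a batch of client events as an Elasticsearch _bulk
    request body. POST the result to http://<cluster>:9200/_bulk
    with Content-Type: application/x-ndjson."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(event))
    # The bulk API requires a trailing newline.
    return "\n".join(lines) + "\n"
```

The "actual analytics" half is then Elasticsearch aggregation queries over the indexed events.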
I get what you are trying to say and I was one of the NIH-sayers, it totally makes sense in some cases and looks like it made sense in your case. Great! :)
I don't think anyone was saying that GA is always better, it's just more often than not it is. It takes some skill and quite a bit of experience to draw the line at a reasonable place and correctly recognize the trade offs.
I've replaced Google Analytics in all my projects with my CouchDB-only web analytics service, Microanalytics[1], which I could access from a CLI[2] and worked very well.
But then I started to fall short on disk space for storing too many events. This is a problem.
Don't the ad blockers disable Google Analytics by default? If I am not wrong, I think uBlock Origin does.
So, I think, as more and more people will start using ad blockers, site owners will start getting less and less accurate stats from Google Analytics, forcing them to implement their own solutions. Hopefully, open source solutions will start providing the best features that Google does.
Anything that is widely used (open source or not) will be blocked because of common names or other patterns that can be recognized and blocked. If you need exact statistics you need to roll your own sooner or later. Or at least heavily customize some other product.
And GA is inscrutable. I don't use it very much because it's got way too many layers of abstraction. It was fine before as Urchin. Maybe this is a category like email clients — there should be a sustainable paid product that doesn't suck.
Not strictly on topic so I apologise if this is unwanted but I thought I'd share my experience with SpiderOak in case anyone here was thinking of purchasing one of their plans.
In February SpiderOak dropped its pricing to $12/month for 1TB of data. Having several hundred gigabytes of photos to backup I took advantage and bought a year long subscription ($129). I had access to a symmetric gigabit fibre connection so I connected, set up the SpiderOak client and started uploading.
However I noticed something odd. According to my Mac's activity monitor, SpiderOak was only uploading in short bursts [0] of ~2MB/s. I did some test uploads to other services (Google Drive, Amazon) to verify that things were fine with my connection (they were) and then contacted support (Feb 10).
What followed was nearly __6 months__ of "support", first claiming that it might be a server side issue and moving me "to a new host" (Feb 17) then when that didn't resolve my issue, they ignored me for a couple of months then handed me over to an engineer (Apr 28) who told me:
"we may have your uploads running at the maximum speed we can offer you at the moment. Additional changes to storage network configuration will not improve the situation much. There is an overhead limitation when the client encrypts, deduplicates, and compresses the files you are uploading"
At this point I ran a basic test (cat /dev/urandom | gzip -c | openssl enc -aes-256-cbc -pass pass:spideroak | pv | shasum -a 256 > /dev/zero) that showed my laptop was easily capable of hashing and encrypting the data much faster than SpiderOak was handling it (Apr 30) after which I was simply ignored for a full month until I opened another ticket asking for a refund (Jul 9).
I really love the idea of secure, private storage but SpiderOak's client is barely functional and their customer support is rather bad.
Many of these types of services seem to intentionally cap upload speeds to reduce their potential storage liability (since they're likely over-selling storage to be able to offer 1 TB for $12 with the level of redundancy, staffing costs, etc, needed).
I wonder if that is happening in this specific case? Although if it were the case the vendor should still be honest about it. Just saying they limit uploads to 2 Mbps is better than giving the run-around.
It's to reduce the maximum bandwidth capacity they need. I don't see it as a problem, considering their price points. They're selling you storage, not "slam 1TB of your data into our storage system in a day". If you're looking for that, ship a hard drive to Iron Mountain.
EDIT: Even AWS limits how fast you can upload to S3, and built an appliance for you to rent and ship back and forth if you need to move data faster. That station wagon full of tape is still alive and well.
> Even AWS limits how fast you can upload to S3...
I'm on gigabit fiber and use S3 to back up hundreds of gigs per month. I've never seen them limit upload speeds; it clearly saturates the connection for the entire duration of my upload. I would expect that because I am paying for the storage, they would be happy to let me write data to their machines as fast as I like. Is there a citation you can provide from their docs that supports your statement? Genuinely curious, because my experience has been different.
To the point that some of these sync or backup providers limit bandwidth, I have definitely experienced that. Tested SpiderOak and Dropbox and upload speed was horrid. Dropbox in particular was disappointing because they can't even claim to have the extra encryption overhead SpiderOak does, it was just shit speed every day. I'm paying a premium for gigabit fiber to the home and you really can tell who over-promises and under-delivers quickly. Fortunately my 'roll your own' backup + sync works well and is price competitive so I'll stick with that.
> I would expect that because I am paying for the storage, they would be happy to let me write data to their machines as fast as I like.
I don't understand why you'd think this. You're paying for storage, not an SLA as to how fast you can fill it.
> I'm paying a premium for gigabit fiber to the home and you really can tell who over-promises and under-delivers quickly. Fortunately my 'roll your own' backup + sync works well and is price competitive so I'll stick with that.
This is the preferred solution if a) commercial services are too slow for you and b) you're willing to spend the time to implement and manage it. It appears, based on commercial services out there, that there is no competition based on upload speeds.
He thinks this because it's in Amazon's interest to let him dump as much data as possible. It's not a matter of an agreement, it's a matter of aligned incentives.
> It's to reduce the maximum bandwidth capacity they need.
They should be looking to partner with someone who has bandwidth problems in the other direction. By combining a backup service's upload bandwidth and a streaming video service's download bandwidth into one AS, you can get a more balanced stream, and qualify for free peering.
Yeah, agreed. The problem is, you're limited to partners in the same DC as you (unless you're going to bite the bullet and start using fiber loops between datacenters to accomplish this). Backblaze (for example only) is only in one DC in Northern California if I recall, which limits them to whomever is in that datacenter.
A great model would be to partner with CDNs; they pour content out to eyeball networks, but you could run a distributed network of your storage system across all of their POPs.
> if they are selling 1TB of storage, shouldn't we get 1TB of storage?
You do, they're just not allowing you to store it in 24 hours. Some services (Backblaze, if I recall) allow you to ship a drive to get around this limitation.
Notice that all services do this? If you can do better, build one! Prepare to go broke from the peak bandwidth requirements you'll need to build your networking architecture to support such transfer rates, but I always encourage experimentation and learning lessons over complaints.
The appliance is so that you don't need to send terabytes of data over a 10 Gbit/sec connection for example to their datacenter.
The limitation is actually the pipe that connects you to Amazon, not an inherent limitation within S3 or other services within Amazon on connection speed. If you have a good enough connection, or peering with Amazon things go amazingly fast.
When I worked at an ISP, we slammed about 20 Gbit/sec into S3 without issues, but even then the data we were backing up -- about 300 TB a day -- took 1.4 days to upload at that rate, so we ended up backing it up in-house instead. (we needed to store the data for 7 days, after that it went bye bye).
> When I worked at an ISP, we slammed about 20 Gbit/sec into S3 without issues, but even then the data we were backing up -- about 300 TB a day -- took 1.4 days to upload at that rate, so we ended up backing it up in-house instead. (we needed to store the data for 7 days, after that it went bye bye).
Seems like the perfect usecase for S3; inbound transfer is free, and you're only paying for a rolling 7 day window of storage with lifecycle rules :/
A good upsell, yes. But initial seeding to "affordably priced" online services at full data rate can never be economically viable to the provider. Bandwidth is cheap(er) these days, but routers which can handle big bandwidth are still big bucks.
Hold on, this is hacker news. VCs, this is a great idea!
No, no of course it's not. Initial seeding is a competitive moat for the first mover. Moving a few hundred gigs to a new backup company just to save a few bucks? I don't think I could be bothered, because I KNOW how long it will take.
Pricing is falling rapidly for storage. Consider that S3 - IA is $15/mo for a TB, and backblaze B2 can offer 1 TB for $5/mo. I would assume both are making some profit at those price points, so $12/TB/mo should be workable if the service is doing their own hardware.
Backup services especially have low operational requirements for their hardware and network connection, since once the files are uploaded they only need to be verified periodically.
> Many of these types of services seem to intentionally cap upload speeds to reduce their potential storage liability (since they're likely over-selling storage to be able to offer 1 TB for $12 with the level of redundancy, staffing costs, etc, needed).
SpiderOak is definitely overselling the 1TB, as well as another plan that pops up once in a while called the "unlimited" plan for $149 a year. This is clear from the disproportionate pricing structure - $79 a year for 30GB that jumps to $129 a year for 1TB and then to $279 a year for 5TB - which entices users to go for the higher amounts because they appear to be great deals. What people with residential broadband connections may not realize is that a) uploading even 1TB of data will take a long time and b) SpiderOak cannot, and does not, provide any minimum guarantees on the upload or download speeds (assuming everything else in between SpiderOak and the user looks fine).
The thing that is silly about that relates to the cost of acquiring and retaining customers. A company that can take in data faster is more valuable to the customer and will most likely be used and retained. Organizations offering storage as a solution while trying to minimize its cost by minimizing utilization are exchanging fixed costs associated with storage (that should be easily built into pricing) for large variable costs related to customer acquisition, retention, and branding.
Yup, I've noticed the same with Wuala. The uploads were pretty slow. I've heard similar complaints from people using OneDrive. I would be very willing to switch to a smaller competitor even if it meant paying more than I do at Dropbox. But from my experience Dropbox is the only provider capable of synchronizing large amounts of data 24/7.
It's definitely possible to offer that on a monthly basis if you model that each customer stays for 36-39 months. Also, I doubt that they are using replicated storage, but are using erasure coding instead. Also, they dedupe before upload, so more cost savings there.
Spideroak can and does dedupe client side before uploading. It can't dedupe across multiple clients, but it does dedupe within the client. It also tracks syncs so that data synced between multiple client machines only has to be stored once (with appropriate redundancy).
That doesn't sound good. On the other hand, I use SpiderOak with not a lot of cloud storage use, with clients on OS X, Linux, and until this morning Windows 10. The only problem I ever had was more or less my fault - trying to register a new laptop with a previously named setup.
BTW, why store photos and videos on encrypted storage? For that I use Office 365's OneDrive: everyone in my family gets a terabyte for $99/year and I really like the web versions of Office 365 because when I am on Linux and someone sends me an Excel or Word file, no problem, and I don't use up local disk space (with SSD drives, something to consider).
I prefer to store photos and videos on encrypted storage because I want to control who sees them. Storing them on unencrypted storage means I don't have that control, the storage provider does and is kind enough to let me make suggestions.
As for OneDrive, I tried it for a while but it didn't work out. Their clients and web interface were terrible and their API was severely lacking. I expect more functionality when I'm sacrificing my privacy.
I ended up going with Google Drive in the end, as you can get 1TB for $9/month with an Apps for Work Unlimited account (I actually seem to have Unlimited under that plan, which isn't supposed to happen until 4 users). That of course means sacrificing encryption but I trust Google enough to make the privacy tradeoff in exchange for extra features (OCR, Google Photos etc.).
I also buy extra storage from Google but I have had some problems downloading large backup files (50 GB, or so) that I have stored on Google Drive, so no system is perfect.
A little off topic, but Google really seems to be upping their consumer game lately with Google Music, YouTube Red, Google Movies + TV, etc. I am now less a user of other services like Gmail and Search, but Google gets those monthly consumer app payments from me. I have the same kind of praise for Microsoft with Office 365.
This has been my experience as well, not to mention how much the client slowed down my machine. It's been really slow going but the client is getting better.
I never tried doing the encryption on my side, though. They also do diffs on each file you upload, so I imagine that has something to do with the lag.
I still use SpiderOak; they're the only company I'm aware of that encrypts locally, and they've done a lot to advance personal security for all of us.
So I've gotten used to the slow speeds and buggy software; it keeps getting better, so that's a big plus :)
I was going to post a comment about how cloud storage is more of a means to move data around rather than back it up, until I dug a little deeper and saw that SpiderOak actually pitches itself primarily as a backup provider. I agree, it needs to be much faster than that.
Is it possible that they are working on batches, and not doing any hashing/compression in parallel with the uploading? It seems feasible from your screenshot that they are getting ~10GB of data at a time, compressing(?) and hashing, and then uploading, and then starting on the next ~10GB.
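If that's what's happening, overlapping the CPU-bound work with the network-bound upload is the usual fix. A minimal sketch, assuming a producer/consumer split; the batch handling and `upload` callback are hypothetical, not SpiderOak's client:

```python
import hashlib
import queue
import threading

def process_batches(batches, upload):
    """Hash batches on one thread while uploading on another, so the
    CPU-bound work overlaps with network I/O instead of alternating."""
    q = queue.Queue(maxsize=2)  # hash at most 2 batches ahead of the upload

    def hasher():
        for data in batches:
            q.put((hashlib.sha256(data).hexdigest(), data))
        q.put(None)  # sentinel: no more batches

    t = threading.Thread(target=hasher)
    t.start()
    while (item := q.get()) is not None:
        digest, data = item
        upload(digest, data)  # network-bound; runs while hasher works ahead
    t.join()

# Example with a stand-in "upload" that just records the digests it sent:
sent = []
process_batches([b"chunk-a", b"chunk-b"], lambda digest, body: sent.append(digest))
print(sent == [hashlib.sha256(b"chunk-a").hexdigest(),
               hashlib.sha256(b"chunk-b").hexdigest()])  # True
```

A strictly sequential compress-then-upload loop leaves the network idle during hashing and the CPU idle during uploads, which would match the stop-and-go pattern described.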
The only issue I have, which is similar to what I see with some other providers, is that the first non-free plan is a huge jump in storage space and price. If I want a Dropbox replacement, I'd be looking at a 25GB or 50GB plan (just comparing with what I have after all kinds of free storage bonuses accumulated over the years). Having some more "in-between" plans that are more linear in storage and price would've been an incentive to try this out, since I'm not willing to fork over $49 a year for 500GB while knowing that my Dropbox usage is less than one-tenth of that.
It is off-topic, yes. For me personally it was very valuable however since I’m in the market for a backup application, and I will definitely take Veratyr’s comment into consideration when choosing between the available offerings.
> easily capable of hashing and encrypting the data much faster than SpiderOak was handling it
I can believe that there was upstream congestion somewhere outside my network (speeds to Google and Amazon indicated there were no issues inside) or that their server was overloaded, but the engineer who investigated seemed to attribute it to the client:
> "Additional changes to storage network configuration will not improve the situation much. There is an overhead limitation when the client encrypts, deduplicates, and compresses the files you are uploading."
Trivial to set up, immune to adblockers affecting the completeness of data, prevents tracking cookies from being written, and leaves the data and utility of the GA dashboard mostly intact (you lose client capabilities and some session-based metrics).
One may argue that Google will still be aware of page views, but the argument presented in the article is constructed around the use of the tracking cookie and that would no longer apply.
I'm shifting to server-push to restore completeness; I'm presently estimating that client-side GA captures barely 25% of my page views (according to a quick analysis of server logs for a 24hr period). I'm looking for insight into how my site is used rather than the capabilities of the client, so this works for what I want.
I agree. Server-side analytics were actually fairly mature before Google came along. It's just more complicated in some cases, but manageable. The biggest downside these days would be SPAs, since they don't necessarily touch the server in any regular way.
People don't care about the cookie or any of the details of the implementation. They care about being tracked across the whole internet. If you are still contributing to that then you are disrespecting your customers. I hope that I am not one of them.
Except that basically nobody cares about "being tracked across the whole internet", as shown by GA, Facebook, etc being on virtually every popular website and nobody noticing or caring at all. If you care enough to make even the most trivial change in behavior, then you're optimistically 1 in 1000.
I said that I care and that I hope that I am not a customer of businesses that track me and contribute to Google's tracking. In that case I am 1 in 1000, and if a site doesn't work without GA and I don't have to use it (as in, have to in order to file my taxes), then I won't; I'll purchase from a competitor.
EDIT
Most people do notice and do care; this has come up in countless conversations. They just accept it as a necessary evil that they can't do anything about, and (wrongly) accept that as individuals they can't change the world.
You will have no GA cookie from any of my sites; I am not recording client-identifying things or capabilities. It is a server-side push to GA and avoids all client-side interactions.
It is merely, "A page has been viewed, this one: /foo/bar?bash".
There's nothing in there that is tracking you. I'm not even embracing the session management aspect.
I get to use the tool that is best-in-class, in a way that lacks capability to track you.
Without any "client identifying things", how would GA be able to chain several page hits into a session? That is, do a basic visits-vs-hits split.
If you are in fact anonymizing everything about a client as you claim you do, then it won't be able to. Unless, of course, you are feeding GA some opaque client ID that you then internally map to and from actual clients that hit your server. However something tells me that you aren't doing that, or you would've mentioned it already.
(edit) I re-read your comment. You aren't apparently interested in session counts. But what good is the GA summary then if you can't tell 10 bounced visitors from one visitor with 10 hits? This makes no sense. If you want to look at just page hit numbers, there are dramatically simpler ways to do that.
In the test I've done, sending no session/user data over, I lose all sense of a "session".
But I do retain insight into what content has been viewed, how much, what is rising and falling, etc.
The question really is: what info are you really reporting on? AdBlockers make us blind and tracking is horrible, but I get to have a far more complete view of the simple stuff Urchin used to be great at.
Ah, so you are passing some client IDs over the GA after all. An IP address perhaps? You know that's a leading question, right?
Incidentally, I ran a similar experiment with gaug.es a few years ago - pulled on their tracking API from our server side. While it worked as expected, these sorts of shenanigans are good for only one thing - hiding the fact that you are using 3rd party analytics from your visitors.
On a more general note - the thing is that you either care about other people's privacy or you don't. It's not a grayscale, it's binary. And if you do, there's no place for GA in the picture.
I am not passing IP. I am not passing a client-id. I am not passing any kind of correlation identifier from which a session can be inferred or created. I am not passing user-agent information. I am not passing a cookie ID.
I am only passing a page view event. "Page /foo/bar?bash has been viewed".
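For reference, GA's Measurement Protocol formally requires a client id (`cid`), so a server-side hit along these lines would have to send a throwaway one. A hedged sketch of what such a hit might look like; the tracking id is a placeholder and the function is illustrative, not the commenter's actual code:

```python
import os
import urllib.parse

def pageview_payload(tracking_id, page_path):
    """Build a GA Measurement Protocol v1 pageview hit. The protocol
    requires a client id (cid); generating a fresh random one per hit
    means no stable identifier ever leaves the server."""
    params = {
        "v": "1",                     # protocol version
        "tid": tracking_id,           # property id, e.g. "UA-XXXXX-Y"
        "cid": os.urandom(16).hex(),  # throwaway client id, new every hit
        "t": "pageview",
        "dp": page_path,              # the only real data: the page viewed
    }
    return urllib.parse.urlencode(params)

# POST this body to https://www.google-analytics.com/collect
payload = pageview_payload("UA-XXXXX-Y", "/foo/bar?bash")
print("dp=%2Ffoo%2Fbar%3Fbash" in payload)  # True
```

With a random `cid` per hit, GA cannot chain hits into sessions, which is consistent with the trade-off discussed above.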
But isn't that the same kind of data you could extract from Apache logs? What you describe is basically a log of all your requests.
GA has many uses: mainly to follow the user and see the funnel they go through, and second to monitor marketing campaigns. If you don't need these, then Apache logs + Webalizer is perfect for everyone.
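Indeed, plain page-view counts come straight out of the access log. A rough sketch assuming the default common/combined log format; the sample log lines are made up:

```shell
# Count page views per URL from an Apache access log; field 7 ($7) is the
# request path in the default common/combined log format.
cat > /tmp/sample_access.log <<'EOF'
1.2.3.4 - - [01/Jan/2016:00:00:01 +0000] "GET /foo HTTP/1.1" 200 123
1.2.3.4 - - [01/Jan/2016:00:00:02 +0000] "GET /bar HTTP/1.1" 200 456
5.6.7.8 - - [01/Jan/2016:00:00:03 +0000] "GET /foo HTTP/1.1" 200 123
EOF
awk '{ print $7 }' /tmp/sample_access.log | sort | uniq -c | sort -rn
```

This is essentially what log-based tools like Webalizer automate, with bots and asset requests filtered out on top.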
I persist with GA, because every now and then I work with partners who would like to verify the activity on my websites (and yes my user agreements and privacy policy allow this) and have a means to compare this with historical data or data from other sites.
Those partners frustrate me, in that they won't trust me to provide stats generated from server logs, but they all trust GA by default.
This technique allows me to use GA, produce the view of the content they need, export the PDF, and share that... and they trust it.
GA is the de facto store of trusted data when it comes to web site activity. For my sites that is tracking content page views.
This whole conversation started with you asking why abandon GA when you can use it without compromising clients' privacy. The exchange that followed shows that one can't actually derive the same function from GA that way, or indeed virtually any function at all. Yes, you can feed data in, but the usefulness of what you can get back out is next to zero. What am I missing?
From your opening comment:
> Why not move to push GA data server-side?
Because it renders GA largely useless if clients' privacy is actually observed.
> I am only passing a page view event. "Page /foo/bar?bash has been viewed".
I would like to say, as someone extremely hostile to tracking of any kind, that if this is all you're sending to Google, that sounds perfectly fine from a privacy perspective. (Google gets your information, but that's between you and Google.)
Thank you for choosing a method that respects the privacy of your readers.
> (edit) I re-read your comment. You aren't apparently interested in session counts. But what good is the GA summary then if you can't tell 10 bounced visitors from one visitor with 10 hits? This makes no sense. If you want to look at just page hit numbers, there are dramatically simpler ways to do that.
I do not care to track users/sessions, page views are enough for me. I am tracking content and content views... and I get this big tool that is awesome at slicing data and presenting trend information... for free.
The only issue I can see with this is a lot of HTTPS connections to your analytics platform from your web service. If you choose to use a work queue/proxy to do it, that's additional work, another point of failure, etc. It's not as 'simple' as adding a JS snippet at the bottom of your page.
How about open-sourcing your product before worrying about improving other products? SpiderOak has been "investigating a number of licensing options, and do expect to make the SpiderOak client code open source in the not-distant future" for a very, very long time now. It's no trivial thing to have a closed source client for a "zero knowledge" service.
I came here for this exact thing. They said they were going to go open source in 2014 IIRC, and failed to deliver. I have stopped using SpiderOak - how am I supposed to trust them with my most private files when I can't verify that they're not doing anything shady on my machine?
The opening line of this post is amusing. They ought to give thought to fixing their core product first.
I am also concerned about that. That message has been there unchanged for some time now. To be fair, there's a lot of stuff on the GitHub page, including the Android client under an Apache license. Although as far as I can tell, the desktop client is not there yet.
The other thing is that Google Analytics is on many adblockers' lists, precisely for that reason. As adblockers become widespread, the analytics are going blind.
I've been running a blocker to block GA and other junk on my PC, but I imagine I'm in a statistically insignificant minority. And I still can't block them on my iPhone unless I disable JavaScript entirely (though I'm running iOS 9, I'm not able to install a blocker for some reason; I guess Apple arbitrarily doesn't support them on my older iPhone model).
Ah, is that the differentiator? I see. Still strikes me as somewhat arbitrary, though - is content blocking such a strenuous task that it requires a 64-bit CPU? Wouldn't using a blocker cause the CPU to do less work in most cases since it doesn't have to download so many ad media files or execute as much JavaScript?
Yeah, I guess it's just time to get a friggin' new phone already, but this one ain't broke yet, ya know?
If anyone is looking for a good blocker for stuff like this, I recommend ghostery. I set it to block everything by default, and whitelist the few things I want. It doesn't block scripts served by the site you are on, so it doesn't totally break your browsing experience, like others do.
If your device is jailbroken (not sure if there's a jailbreak for iOS 9), you could add entries for GA to its hosts file. I use these on my desktop PC:
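Typical hosts-file entries for blocking GA look like this; these are the commonly blocked hostnames, and the commenter's actual list may have differed:

```
0.0.0.0 google-analytics.com
0.0.0.0 www.google-analytics.com
0.0.0.0 ssl.google-analytics.com
0.0.0.0 stats.g.doubleclick.net
```

Pointing the hostnames at `0.0.0.0` makes the tracker lookups fail fast without relying on an in-browser blocker.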
Eh. The analytics data is pretty low value as far as hacker targets, and this can be mostly mitigated anyways by sane segregation of the admin backend from the publicly accessible site.
There's an open ticket for it, but it looks like it hasn't been addressed in a while since they don't want to break all existing passwords.
A low value target maybe, but having a critical security ticket open for seven years is unforgivable. If they don't want to break compatibility it's pretty simple: use something like PHPass and upgrade the hash when the user next logs in. i.e. what every halfway sensible web app did at least five years ago.
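A minimal sketch of that upgrade-on-login pattern, using the standard library's PBKDF2 in place of PHPass; the function names, storage format, and the unsalted-MD5 legacy scheme are assumptions for illustration, not Piwik's actual code:

```python
import hashlib
import hmac
import os

def pbkdf2_hash(password, salt=None):
    """Produce a salted, slow hash in a self-describing format."""
    salt = salt or os.urandom(16)
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return "pbkdf2$" + salt.hex() + "$" + dk.hex()

def verify_and_upgrade(password, stored, save):
    """Accept either a legacy MD5 hash or a PBKDF2 hash; on a successful
    legacy login, rehash with PBKDF2 and persist the stronger hash."""
    if stored.startswith("pbkdf2$"):
        _, salt_hex, dk_hex = stored.split("$")
        dk = hashlib.pbkdf2_hmac("sha256", password.encode(),
                                 bytes.fromhex(salt_hex), 100_000)
        return hmac.compare_digest(dk.hex(), dk_hex)
    # Legacy path: unsalted MD5, as in old installs.
    if hmac.compare_digest(hashlib.md5(password.encode()).hexdigest(), stored):
        save(pbkdf2_hash(password))  # transparent upgrade, no password reset
        return True
    return False

# Usage: a legacy user logs in once and is silently upgraded.
db = {"hash": hashlib.md5(b"hunter2").hexdigest()}
ok = verify_and_upgrade("hunter2", db["hash"], lambda h: db.update(hash=h))
print(ok, db["hash"].startswith("pbkdf2$"))  # True True
```

No existing password breaks: old hashes keep verifying until their owners log in, at which point the plaintext is available to rehash.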
I'm not interested in further dehumanizing myself with participation in a bug bounty program.
I'll write an exploit for it (the general case, not just Piwik in particular) and drop it on OSS Sec some day, but here's a theoretical attack:
1. Guess a username somehow. Maybe "admin"? Whatever, we're interested in the security of the hash function. Let's assume we have the username for our target.
2. Calculate a bunch of guess passwords, such that we have one hash output for each possible value for the first N hexits.
3. Send these guess passwords repeatedly and use timing information to make an educated guess at the leading hexits of the stored MD5 hash.
4. Iterate steps 2 and 3 until you have the first N bytes of the MD5 hash for the password.
5. Use offline methods to generate password guesses against a partial hash.
The end result: A timing attack that consequently allows an optimized offline guess. So even if their entire codebase is immune to SQL injection, you can still launch a semi-blind cracking attempt against them.
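The attack hinges on a comparison that short-circuits at the first mismatching byte. A hedged sketch of the vulnerable pattern and the standard fix; this is illustrative, not Piwik's actual code:

```python
import hmac

def naive_equal(a, b):
    # Returns at the first mismatching byte, so comparison time leaks
    # how long the matching prefix is - the signal the timing attack
    # above accumulates over repeated requests.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

def constant_time_equal(a, b):
    # Examines every byte regardless of where the first mismatch falls,
    # so timing reveals nothing about the prefix.
    return hmac.compare_digest(a, b)

print(naive_equal(b"deadbeef", b"deadbee0"))          # False
print(constant_time_equal(b"deadbeef", b"deadbeef"))  # True
```

Constant-time comparison closes the timing channel, but it does nothing about unsalted fast hashes; both fixes are needed.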
It's nowhere near as centralized as Google Analytics though - at least if you're self-hosting, that data is confined to the silo of your own analytics, rather than Google being able to aggregate it with a user's behaviour on every other site they visit.
That silo is still aggregating data. Trying to argue it's "less" centralized by quantifying the amount of centralization is still cognitive dissonance. Clearly people here don't agree with this, but that's to be expected when the topic is so polarizing. Traffic analytics must be important, so we rationalize our actions, or inactions, around how we collect them.
Any centralized solution, at any scale, can possibly violate someone's privacy. Period. If we want to really fix things, we should stop circle jerking ourselves and do something about it.
Not at all. The entire point is that Google is able to track one person across many, many sites. That is simply not possible if each site had its own self-hosted analytics.
It's more than just the tracking cookie, though. It's also about Google aggregating all its website data into a unified profile. The data they have on everyone is frightening—all because of free services like GA.
Yes, thank you SpiderOak, even though I don't use you: high-profile companies quitting GA makes us aware of alternative solutions. Today, I've learned about http://piwik.org .
SpiderOak user here. I stopped using Dropbox and started using SpiderOak about 18 months ago. I really like the product. It's not as good as Dropbox in some ways (like automatically syncing photos from my phone) but it really is easy to use. I still have a mobile client on Android and I can keep my files in sync across multiple computers. I pay for the larger storage size and I'm not even close to using it all.
It syncs fast too. Just thought I'd share my experience with people.
It is. It's no big deal to stop using Google Analytics. It is, however, a big deal not to use Google Search, something I am considering for my company.
Well, the title says they stopped using Google analytics, and the article explained that they stopped using Google analytics, why they did it, and what they're doing instead. You may not find it interesting, but the title clearly reflects the content, so I'm not sure how it's click bait.
> Like lots of other companies with high traffic websites, we are a technology company; one with a deep team of software developer expertise. It took us only a few weeks to write our home-brew analytics package.
I'm a little curious why they decided to go this route instead of using one of the open-source solutions. Aren't there good solutions to this problem already?
I was curious as well and just assumed the usual NIH (not invented here) syndrome. Web analytics was already mature before Google bought Urchin and turned it into Google Analytics. Since that time countless open source projects have sprung up (Piwik was the first that came to mind). Googling for open source alternatives brings up thousands of pages of projects.
Writing your own is easy for the basic stuff. When you want to move beyond the basics, as SpiderOak will find, it becomes much more difficult.
I'm doing my part. I'm moving to DuckDuckGo for searching more and more. It's a process. Google does have better results. For work I still rely on Google, for private stuff I use https://duckduckgo.com/
And for the sake of ducks, I'm eating less meat as well. No more chicken - too many antibiotics - and as little meat as possible, only when it's worth it: great taste and good quality.
I'm a big DDG fan too. I don't really notice their results being "worse" than Google's (but maybe that's just because I haven't used Google for so long). The Bang feature is also very handy once you get in the habit of remembering to use it. https://duckduckgo.com/bang
Do you also experience slower response times at DDG?
Here in Europe 'ping -c 5' gives an average of about 10ms for google.com and 30ms for duckduckgo.com. Since search is such a fundamental part of browsing, this is very noticeable.
> Sadly, we didn’t like the answer to that question. “Yes, by using Google Analytics, we are furthering the erosion of privacy on the web.”
The only thing "wrong" with using an analytics service to better understand your customers is that it places all knowledge of visits, including ones that wished to be private, in a centralized location. This can be useful in providing correlation data across all visitors in aggregate, such as which browser you should make sure your site supports most of the time.
In other words, there exists some data in aggregate that is valuable to all of us, but the cost is a loss of privacy for smaller sets of personal data.
If individuals don't want certain behaviors analyzed by others, then they shouldn't use centralized services which exist outside their realm of control. These individuals would be better off using a "website" that is hosted by themselves, inside their own four walls, running on their own equipment. A simple way for SpiderOak to address this is to put their website on IPFS or something similar.
I appreciate the fact that SpiderOak is thinking about these things. It's important!
>why does Google and their advertisers need to know about it I would ask
Google is pretty clear about this. The only reason they track you is for advertising, and there isn't any evidence of them using the info for anything else. In fact there is a lot of evidence pointing the other way, such as their insistence on encrypting data flowing between their datacenters.
This is Google we are talking about, not Kazakhstan, China or Russia.
Google could eventually use this information to determine your eligibility for a home loan. They have already dipped their toes in this area [1]. With all this data, we have to ensure that it is used fairly (or not at all). There is enough concern about digital redlining that a 2014 report to the White House covers it [2]. As we know, machine learning is quite capable of inferring sensitive attributes [3].
This inference doesn't even need to be intentional; machine learning is capable of accidentally picking up on latent variables. Even if your neighborhood (the basis of the original redlining) isn't a feature in the model, it could be inferred from the other variables.
TL;DR: Your surfing behavior could be used to deny you a home loan one day.
Lots of speculation about what they might do. You could also say that the US government could use all of your data to spy on people who criticise the government, so they shouldn't have any of that data either.
Their other purpose is "don't be evil". There may be some debate about that at times, but they certainly aren't going to screw their customers. They know that their customer base would evaporate pretty quickly if they tried.
> It took us only a few weeks to write our home-brew analytics package.
Unfortunately, there's no way to replicate what Google Analytics currently offers (for free!) within a couple of weeks (or even months). Not with big data sets. Yes, GA enforces sampling if you don't pay for GA Premium, but the free edition is still one hell of a deal (if you don't care about privacy).
If you only use Google Analytics as a hit counter, sure, you can do that yourself within a couple of minutes. The advanced features are way more complicated, though (think segmentation and custom reports).
I suspect most of the people saying "you don't need Google Analytics! Do it yourself!" have never used GA for anything that meaningful. As you begin to really familiarize yourself with your website traffic and understand how to look at your clickstream data in a more investigative and analytical way, you'll start to see how nice GA is and how easy it is to answer your questions.
You also underestimate how ubiquitous GA is because it's free and extremely popular. I'd consider myself an intermediate to advanced user of GA, but for people less experienced, I can easily share stuff with them for complicated tasks or they know how to do a lot of the basics themselves.
In hiring digital marketing people, GA is pretty much on par with Word in terms of familiarity. It's something a lot of people have a basic competence with.
To me, it is the cost that matters. Most other analytics services cost $30 - $50 per million pageviews/datapoints. To me this is expensive. Even when you scale to 100M it will still cost ~$20/million.
Piwik doesn't scale. At least, it doesn't scale unless you spend lots of resources tinkering with it. Its cloud edition is even more expensive than GoSquared, which I consider to be a much better product.
What we basically need is a simple, effective, and cheap enough alternative to GA. And so far there are simply none.
Instead of rolling your own look at Piwik. It works very well and is basically a GA clone. I actually like it better than GA in some ways. It's easy to set up and you can run it on your own site so you're not contributing to a global tracking fabric.
I don't get it. SpiderOak states that they dropped GA because it furthers "the erosion of privacy on the web", but then they just started tracking in-house.
How is tracking in house more private than GA? The user is still being tracked.
I believe their point was that they still want to track their traffic, but when they use a third party like Google, Google doesn't just provide tracking services for SpiderOak - it also tracks you for its own purposes, which SpiderOak has no control over.
With it in house it is under their control: they can anonymize it, they can decline to collect certain information, and they can't cross-index it with your traffic from other sites, etc.
I haven't checked my GA in months, since it became clear that Google won't bother doing anything to fix the referrer spam problem that makes the stats useless if you don't have a high-volume site. It's not like these abusers are hard to track down, but I'll be damned if I'm going to manually add filters to get rid of them every time they come in from a new domain.
For anyone here looking for a really good, free, self-hosted, hackable, open source alternative to Google Analytics that's been around for a long time, please consider Piwik.org.
I've been using it for prob 8-10 years and it has never missed a beat. I use it on all my personal / business sites as well as some client websites that are super high traffic.
Analytics, fonts, CSS - we include them everywhere by default. Then I realized, hey, we are all giving away too much. My sites have happily run self-hosted Piwik for the last six or so months.
I won't be surprised if in the coming years we hear much more about Google Fonts being used as a basis for counting site accesses where no analytics is in place.
It should also be noted that SpiderOak has open-sourced many components of their product stack, including Crypton, the encryption framework underpinning many of their clients.
Usually I start with Google Analytics but continue to add to our own in-house analytics solution targeting the specific metrics we're interested in tracking. GA often doesn't provide us with the real insights we're looking for, but it's good for the vanity stats.
Random fact: the GA cookie is distinct from the AdWords (google.com) cookie, and it is illegal for Google to join them (not sure if it is even technically feasible).
At Cloudron, our vision is to let companies host their own apps easily. We dogfood this and don't use Google. We don't use analytics on our website (a conscious decision). Our email is based on IMAP servers and we use Thunderbird. We self-host everything other than email (which is on Gandi).
Cloudron and Sandstorm are similar projects. I think the main difference is the user experience (also how we handle domains, how apps are packaged, etc). You can see a demo of the Cloudron here - https://my-demo.cloudron.me/ (username: cloudron, password: cloudron). All apps use the same credentials (because of single sign-on).
How exactly would they stop themselves from being listed on Google? A right to be forgotten request? Their decision was to ditch Google Analytics, not to disappear from the web.
Robots.txt. I think it's interesting that you say removing yourself from Google is the same as "disappearing from the web", implying that the web is Google. It's not. Perhaps they should use alternative forms of outreach, something I am considering with my company.
Google is not the web, you are correct; but for all intents and purposes, it's the web's phonebook. Remove yourself from the phonebook and you make it very difficult for people to find you or your business.
There are other ways to promote a business outside of search engines. The point is you can't say you have "ditched Google" while still being part of their systems that collect user data.
For a less obnoxious answer: if users choose to use Google, it's not Spideroak's problem. By dropping GA, they're no longer forcing users to be subject to Google's tracking.
How does being listed on a search engine affect their users’ privacy? The users chose to use Google themselves, they could well use something like DuckDuckGo if they wanted to. They’re avoiding 3rd party tracking on their users, not boycotting Google.
I don't understand your complaint. 1) They freely admit that they use Google products, and 2) Google search is a way to be discovered by curious users. SpiderOak is looking to avoid giving excessive user data to Google, which is a totally separate issue.
The irony for me is that I am mostly invisible to Google Analytics, and thus the companies that rely on GA, because I mostly browse with JavaScript disabled. (When I need JavaScript, I usually crank up an incognito instance of Chrome and close it immediately when done, so I'm mostly anonymous to GA even then.)
When they go "old fashioned" and datamine their web server logs, they uncloak me. :-/