> The turnaround time also imposes a welcome pressure on experimental design. People are more likely to think carefully about how their controls work and how they set up their measurements when there's no promise of immediate feedback.
This seems like a cranky rationalization of the lack of a fairly ordinary system.
Sure, you shouldn't draw conclusions about potentially small effects from < 24 hours of data. But of course if you've done any real world AB testing, much less any statistics training, you should already know that.
What this means is you can't tell whether an experiment launch has gone badly wrong. Small effect size experiments are one thing, but you can surely tell if you've badly broken something in short order.
Contrary to encouraging people to be careful, it can make people risk averse for fear of breaking something. And it slows down the process of running experiments a lot. Every time you want to launch something, you probably have to launch it at a very small % of traffic, then wait a full 24-36 hours to know whether you've broken anything, then increase the experiment size. Versus some semi-realtime system: launch, wait 30 minutes, did we break anything? No? OK, let's crank up the group sizes...

Without semi-realtime, you basically have to add two full days, times one plus the probability of doing something wrong and requiring a relaunch (compounding, of course), to the development time of everything you want to try. Plus, if you have the confidence that you haven't broken anything, you can run much larger experiment sizes, so you get significant results much faster.
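To make that back-of-the-envelope math concrete, here is a minimal sketch; the 36-hour wait, the 30-minute check, and the 20% relaunch probability are illustrative assumptions, not numbers from anyone's actual system.

```python
def expected_ramp_delay(wait_hours: float, relaunch_prob: float) -> float:
    """Expected total waiting time before you trust a launch.

    Each ramp step costs wait_hours; with probability relaunch_prob you
    broke something and start over, and failures compound geometrically,
    so the expected number of attempts is 1 / (1 - relaunch_prob).
    """
    return wait_hours / (1.0 - relaunch_prob)

# Daily-batch analytics: ~36 h per check, 20% chance of needing a relaunch.
print(expected_ramp_delay(36.0, 0.2))   # 45.0 hours of waiting per experiment

# Semi-realtime analytics: ~0.5 h per check, same failure rate.
print(expected_ramp_delay(0.5, 0.2))    # 0.625 hours
```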
If the people running experiments do not know and cannot be told how to do the fundamental thing that they are trying to do, then you have bigger problems.
Which is ultimately what this post points to: the author doesn't trust his team and isn't listening to them and doesn't expect they will listen to him. Regardless of the degree to which the author is correct in his assumptions, the problem is more than just engineering.
You can totally break some product functionality somehow without necessarily triggering a software exception or server crash! You really do need to know the target events per experiment group.
You can break your product without noticeably affecting the things like the http error 500 rate, cpu utilization %, etc. that you would likely see on some ops dashboard.
On our ops dashboard we see stuff like number of ID syncs, number of events processed (by type), etc. I'd argue that if something is truly "broken", you see it there.
If you're using funnel analytics to decide that the product is broken, I'd say you're probably doing something wrong.
> What this means is you can't tell whether an experiment launch has gone badly wrong.
Personally I prefer automated testing to tell me if a feature has gone badly wrong, not conversion numbers. Then I find out before I launch too.
Or do you mean that the UX is so badly designed the users cannot use your software anymore? In which case, maybe there are bigger problems than real time analytics
When I started my business, I looked at sales number every day.
I got my hopes up when I sold 15 copies of my app on a good day, only to feel completely devastated when I sold only 4 copies the next day.
After some time I stopped looking at daily sales numbers, and switched to weekly numbers instead. But even that was too often.
Now I fetch my numbers roughly every other week, and don't worry at all about individual numbers. Only by looking at trends over a longer timeframe can you make sensible decisions... The numbers just fluctuate too much from one week to the next.
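A minimal sketch of that kind of smoothing, turning noisy daily counts into a trailing average so the trend is visible; the window size and the sample figures are placeholders, not real sales data.

```python
def trailing_average(daily_sales, window=28):
    """Smooth noisy daily counts into a trailing mean so the trend stands out."""
    smoothed = []
    for i in range(len(daily_sales)):
        chunk = daily_sales[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Eight weeks of made-up daily sales that swing between 3 and 15 copies.
daily = [15, 4, 9, 7, 12, 3, 8] * 8
print(trailing_average(daily)[-1])  # a single slow-moving number, ~8.3
```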
That's definitely one of these things you have to learn as part of running a business. You tend to over-index on things you can measure and under-index on things that you can't. After a while you realize there's a lot more randomness in what happens than you might expect and so you can't weight any particular data point too highly. If you look at patio11's blogging of Bingo Card Creator, you'll notice he has little idea as to why his userbase grows or shrinks any particular year: http://www.kalzumeus.com/2012/12/29/bingo-card-creator-and-o... And he's one of the most sophisticated product analytics experts out there!
I used to spend a lot of time reading news stories and checking stock prices every day. Turns out almost all of that is random noise that doesn't add up to actionable patterns.
Better to ignore the hourly and daily stuff and look at longer timeframes with long-form articles and books. Even fiction can be pretty good at identifying real macro patterns.
Do you have any way to tell if any free users actually entered their AWS keys into your app? There is a much higher level of trust needed for your app over the average mobile app. To actually use it with production data when I'd never heard of you before, I'd probably generate a honey pot set of keys and check if they leaked, and other tests before I could trust it with real keys. Security apps are really tricky to bootstrap.
The keys are staying on the device, so I have no way to check if people actually entered them. I'm also not running any analytics.
I mean, I just guess that if people weren't using the app, they would uninstall it eventually?
I've had a retention rate of 60%, which I thought was pretty good and made me think that people actually use the app.
That being said, I completely understand the trust issue you mention. It's even worse for the online version of the same service, iamproxy.com
Do you have a suggestion how I can improve the trust?
I'm serious. I'm not a mobile app publisher, I don't know anything about app discovery through the Play store, but I have experienced the paradox of increasing sales by increasing prices in other channels.
I'm not super up on the psychology at play, but I've read it's because people use price as a proxy for quality/utility in uncertain markets.
Here's my metric: I do complicated merges once in a while. If Beyond Compare (price = 60$) helps me resolve in a few minutes conflicts that used to take an hour of cursing and sweating, I pay them and consider the money well spent, because I can spend that hour (and a lot of others the tool freed up for me) earning money. The tool basically pays for itself.
If it's a niche market, a low price is rarely a viable approach because you won't have a lot of volume. Also, all the info you have about app stores and revenue is based on the total mass, or on the top-100 lists, and niche apps fall completely outside of it. There are actually a few narrow categories where you can get away with a high price.
"But unless the intention is to make decisions with this data, one might wonder what the purpose of such a system could possibly be."
Speaking as an analyst, you'd be amazed (or not) at how much analytics is done either for resume padding, because one guy thinks it's "cool", or to enable marketing or executives to chase their own tails.
Indeed, with the general phenomenon of bullshit jobs, infinite, instantaneous, always-changing information is a welcome smokescreen, because it always allows people to "justify doing something". With such numbers, there is always something for them to do :p
They don't really mean anything, you can't draw actionable conclusions, and no one will even read the report after the first week, but you gather the metrics anyway to satiate the Metrics God's hunger.
This has been my experience working in enterprise. There is an enormous amount of "tactically motivated analytics" going on, and not just within the realm of marketing.
Almost every metric can be abused by increasing the frequency, resolution and dimensions in which it is measured.
In terms of some real world examples:
Labour force statistics are almost always reported at the instance level by commentators/media, but it's the trend that's recommended/meaningful.
Net promoter scores and breakdowns are consistently done and reported too frequently on too small a base to establish a trend, and they're usually cross-tabulated and sliced across far too many dimensions.
Customer opinions and employee ratings/surveys.
Generally it's just fundamental statistics: don't reason too much about the general from the specific. And as you get greater resolution and move towards real time events, your specific becomes smaller and smaller, which means that extrapolations onto the general result in greater and greater errors...
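The arithmetic behind that last point: the standard error of an observed proportion scales roughly as 1/sqrt(n), so every extra slice or shorter window shrinks n and widens the error bars. A quick illustration with made-up numbers:

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of an observed proportion p measured on n samples."""
    return math.sqrt(p * (1 - p) / n)

# The same underlying 5% rate, measured on progressively smaller slices:
for n in (100_000, 10_000, 1_000, 100):
    half_width = 1.96 * standard_error(0.05, n) * 100
    print(f"n={n:>7}: 5% +/- {half_width:.2f} percentage points (95% interval)")
```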
Interesting. Thanks for sharing. Do you know of any good writing, on the web or in print, that talks more about such abuses in the analytics and corporate world? I'm interested to learn how to spot when kooky tactics are at play in a tech/corporate setting, especially when it's willful.
The closing paragraph alone is worth reading the entire article for:
> Real-time web analytics is a seductive concept. It appeals to our desire for instant gratification. But the truth is that there are very few product decisions that can be made in real time, if there are any at all. Analysis is difficult enough already, without attempting to do it at speed.
This is so true for so many situations. One of the hardest things to understand on the 'other' side of a browser is the dimensionality of the stuff you are measuring. I recall an A/B test we did at Blekko that simply uncovered the presence of a 'click bot' that was always clicking on something like the 15th link on a page.
Firstly, I'm super biased, as I'm the CEO of a product analytics company where one of the value props is getting the data in real time. I agree with this post that people will look at single data points out of context and weight the evidence much more strongly than they should. Analytics should be one of many tools you use to inform your understanding of how customers are using your product. I also agree that I see a lot of early stage startups invest way too much in building out real-time analytics stores without thinking about what value they get out of it. That's not something you should do until you're in the 100+ engineer range.
That said, this is a post-hoc justification of why having real-time analytics is bad. Delaying the data by 24 hours doesn't automatically make significance testing better or force people to incorporate more context when interpreting their data. If that's the issue, fix that problem, don't blame the tools.
There's a ton of positive value to having real time data. Just off the top of my head:
1) If you've instrumented something incorrectly you can see that and fix it right away
2) Even worse, if you've accidentally messed up a feature with a release, you can know about it right away (a sketch of that kind of check follows this list). This happened to one of our customers recently, and without a real-time analytics system they wouldn't have caught it quickly (and this is a tech startup that's well regarded for its engineering and whose name everyone here would know).
3) You can observe significant changes to your user base right away, eg during a launch or if you're getting a lot of new users from a specific channel.
4) It allows you to have more confidence in deploying multiple times a day. It's a little crazy to me that the deployment of a product is faster than our ability to measure it.
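A minimal sketch of the kind of post-release check points 1 and 2 describe, comparing an event's per-minute rate just after a deploy to its rate just before; the event name, window lengths, and 50% threshold are assumptions for illustration.

```python
def rate_dropped(events_before: int, events_after: int,
                 minutes_before: float, minutes_after: float,
                 max_drop: float = 0.5) -> bool:
    """Flag a deploy if an event's per-minute rate fell by more than max_drop."""
    before_rate = events_before / minutes_before
    after_rate = events_after / minutes_after
    if before_rate == 0:
        return False  # nothing to compare against yet
    return after_rate < (1.0 - max_drop) * before_rate

# Hypothetical numbers: 1,200 'checkout_completed' events in the hour before
# the release, 180 in the 30 minutes after it (20/min down to 6/min).
if rate_dropped(1200, 180, minutes_before=60, minutes_after=30):
    print("checkout_completed rate dropped sharply after the deploy; investigate")
```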
I think the real issue is that we're in the early days of analytics and people don't have a great understanding of how to leverage their tools properly. It's like web search before Google or file sync before Dropbox. People will look at single data points and conclude totally crazy things. They won't have an understanding of basic things like the amount of fluctuation on a week to week basis. The largest analytics provider in the world (Google) gives away their product as an ancillary service to drive you to purchase more ads, not to have a better understanding of how your customers are using your product. I'm hopeful this will change over the next 5 years though (and for us to be a part of that!)
Asking people to become data analysts when they just want to run a business and have a million other things to worry about is exactly the wrong way to think about this.
People shouldn't need to become data analysts to understand the data. The software should be able to analyze it and give simple advice when it has some to give.
I have designed my share of analytics software from the classical dashboards to more complex things like a market replay for Nasdaq.
At the end of the day, analytics primarily is useful if it can be used to make decisions with.
If you really want to build a useful analytics tool, you have to build something that does the analysis and either does things for you automatically (like noticing that a lot of people are suddenly coming to your site because of a keyword and quickly buying some Google advertising) or tells you something you can act on (e.g. you should buy extra hotdogs for Sunday because you are about to get a lot of customers, thanks to bad weather before/after the football match).
Agree with everything you're saying about data. Given how hard an analytics system is to set up properly, it's one of the last pieces you should be adding in to get an understanding of how to build your business.
> There's a ton of positive value to having real time data. Just off the top of my head: 1) If you've instrumented something incorrectly ... 2) Even worse, if you've accidentally messed up ... 3) You can observe significant changes ... 4) It allows you to have more confidence in deploying multiple times ...
Embracing both your and the author's perspectives, I think I have a compromise. It seems to me the author's gripe is more specifically that his customers want to "do the same large window analytics in realtime", whereas here you highlight particular use cases where realtime analytics are simple.
In a previous life I was producing near-realtime analytics for network activity for populations numbering 10s of millions, all the time, in realtime, except the definition of realtime was stretched to allow a 15 minute latency.
Reducing this 15 minute latency was oft requested e.g to give help-desk operators an instant view of what is happening for a customer or, to automate responses to particular well-known system issues in realtime.
It wouldn't have been impossible to reduce our 15 minute window, but given various architectural limitations of the time it would have been expensive, and upon analysing these customer requirements it made more sense to simply siphon off the realtime event-data at our monitoring points into a parallel dataflow.
TL;DR you do different things with realtime vs aggregate data. Do you really need the expense of a system that does both equally well?
Finding product breakages through analytics is very backwards. If a team is deploying code that is untested (or with that poor a process), could you really trust them to make informed judgements either? That needs to be fixed earlier than your product.
UX improvements or mistakes in a conversion pipeline, sure though. Fair points there. :)
I believe that's naive. There is always a non-zero chance that deploying new code will cause an issue that is not covered by your unit tests or integration tests.
Good testing methodology means adding to your tests when you find such a case, not promising that all possible cases are covered all the time, because that's unreasonable, and the assumption that there's no way it could break on deploy is the kind of hubris that leads to breakages when you deploy.
The conclusions of the article are probably correct, but it's important for analytics systems to have real time capabilities for debugging and iteration. When you're wiring up some new events or funnels you need to be able to click through and sanity check that you're collecting the data you expect.
Maybe some of the desire for whole systems to be real time comes from this frustration.
If you're running case management for a large corporation with tight SLAs, you by definition need real-time analytics.
Similarly if you're tracking outages, downtime, etc.
All that said, the organizations I've been in do a good job of distinguishing operational metrics vs high level analytics done for quarterly reviews or metrics pulled from across the enterprise for the big wigs.
Maybe I've been lucky working for companies with solid reputation, but if anything I think sometimes they should have more numbers ready to shoot from the hip.
Unscheduled maintenance? It would probably be good to get an idea of how many people access the site/open cases/etc. during that particular time.
If setting up notifications is "real-time analytics" I need to rewrite parts of my resume.
If your notifications are issued based on aggregated metrics (e.g., the number of incoming cases grew 500% in the last 15 minutes) then you probably could. If your notifications are issued on someone’s action (e.g., user submitted a case) then that’s another level of complexity.
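A minimal sketch of the aggregated-metric case; the 5x factor echoes the 500% example above, while the window lengths and counts are hypothetical.

```python
def spike_alert(current_count: int, baseline_count: int, factor: float = 5.0) -> bool:
    """Alert when the count in the current window is at least factor times the baseline."""
    if baseline_count == 0:
        return current_count > 0
    return current_count >= factor * baseline_count

# Cases opened in the last 15 minutes vs. the same 15-minute window an hour earlier.
if spike_alert(current_count=120, baseline_count=20):
    print("Incoming cases grew at least 5x; page the on-call team")
```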
This is definitely the use case I meant - those notifications are based on analytics across different systems.
Not sure, given the context of the conversation, why you'd think I meant adding yourself to an email list when a case gets created, or when a dashboard that's already built reaches some threshold.
Real-time data gives that nice warm confirmation that the change you just deployed is working as expected. Code change, site failover, any change at all.
It drives me batty that Google Analytics' update interval is unpredictable, so I look at the real-time numbers.
Certainly it’s foolish to jump to conclusions from a too-small sample. Increasing latency does nothing to solve this. I’d recommend an introductory stats course instead.
Some real time analytics are useful. For example, pricing of virtual currency (how deep to run an online "sale") can be much more profitable of you have a real time measurement of price elasticity. I've built tools like that for people who actually need them, and it works in those cases.
However if you don't do real time business, you don't need real time analytics. What would you do with it? Have new wireframes generated and implemented in real time every five minutes? Switch database engines five times a day?
I feel the article makes the second argument well, but perhaps hasn't seen the case where the first argument holds. Try telling a currency trader that they don't need real time analytics and see how far that takes you!
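To ground the elasticity point above, here is a minimal sketch of an arc-elasticity calculation you might run across two recent pricing windows; the pack price, sale price, and unit counts are made up.

```python
def arc_elasticity(p1: float, q1: float, p2: float, q2: float) -> float:
    """Arc price elasticity of demand between two (price, quantity) observations."""
    pct_quantity = (q2 - q1) / ((q1 + q2) / 2)
    pct_price = (p2 - p1) / ((p1 + p2) / 2)
    return pct_quantity / pct_price

# Hypothetical: a currency pack sold 400 units/hour at $4.99,
# then 700 units/hour during a $3.49 sale.
e = arc_elasticity(4.99, 400, 3.49, 700)
print(f"elasticity ~ {e:.2f}")  # below -1 (elastic), so the sale likely raised revenue
```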
Generally really good points. However one great use of realtime analytics is as a way to immediately catch production errors (particularly when connected to alarms, e.g. Splunk).
My thought is that realtime analytics systems as discussed can be either an exceedingly difficult engineering problem or a trivial business problem (use Google or another hosted solution).
Personally, any engineer who would try to reinvent the wheel on this one (rather than leverage any of the many incredibly refined existing technologies) should not serve in a decision-making capacity.
> It's important to divorce the concepts of operational metrics and product analytics. Confusing how we do things with how we decide which things to do is a fatal mistake.
It sounds like you agree with this quote from the article?
Indeed, "real-time" is one of the most requested features for S3stat, even though there's really not much you'd be able to do with faster data. In our case, Amazon doesn't even deliver their logfiles until 8 hours or so after the fact, so "real-time" reports would just be a pretty moving picture of the past.
I toy around with the idea of building it out as a feature anyway, just so that I can charge a premium to customers who want to turn it on.
It's what news orgs have been doing for a long time. The number of people addicted to this crap is much more than the number of people who can do something useful with it.
Why that is has more to do with psychology and the evolution of the mind than it has to do with the actual real time data or news.
Eh. Now that you mention it, watching an optimizer battle out a 0.1% objective function improvement every couple iterations on a run that took a few hours total ...
(But that was in another job, and besides, the code is dead.)
Early stopping is just fine if you are Bayesian. Not fine if you are doing null-hypothesis significance testing.
Multi-armed bandits beat the heck out of A/B testing because they dynamically balance making money now with taking risks that might mean you make more money later.
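For concreteness, a minimal sketch of that dynamic using Beta-Bernoulli Thompson sampling, one common bandit strategy; the two conversion rates below are invented.

```python
import random

def thompson_pick(successes, failures):
    """Pick the arm whose sampled conversion rate is highest (explores and exploits)."""
    samples = [random.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return samples.index(max(samples))

# Two variants with hidden conversion rates; traffic shifts toward the
# better arm as evidence accumulates, instead of a fixed 50/50 split.
true_rates = [0.04, 0.05]
successes, failures = [0, 0], [0, 0]
for _ in range(10_000):
    arm = thompson_pick(successes, failures)
    if random.random() < true_rates[arm]:
        successes[arm] += 1
    else:
        failures[arm] += 1
print(successes, failures)  # most of the 10,000 impressions land on the 5% arm
```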
Real-time data is crucial to commercial publishing, and ads, and financial services, and costing serverless, and supply chain, and probably a whole bunch of other domains.
In my experience, 99% of the value of real-time analytics has been in identifying service disruptions that monitoring tools don't find, and 1% of the value has been in informing business decisions. If some change badly broke your HTML rendering, but didn't throw any errors, you may not see it in service monitoring but you will definitely see it in the signups per day.
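A minimal sketch of that kind of day-level check, comparing today's signups against the same weekday over recent weeks; the 50% threshold and the counts are assumptions, not anyone's real numbers.

```python
def signups_look_broken(today: int, same_weekday_history: list[int],
                        min_fraction: float = 0.5) -> bool:
    """Flag when today's signups fall below min_fraction of the recent same-weekday average."""
    baseline = sum(same_weekday_history) / len(same_weekday_history)
    return today < min_fraction * baseline

# Hypothetical: the last four Tuesdays averaged ~152 signups; today shows 41.
print(signups_look_broken(today=41, same_weekday_history=[150, 162, 141, 155]))  # True
```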
"No sampling" is just another way of saying "the queries will be so slow that you will never do any ad-hoc queries. It's also short for "I don't understand statistics".
Sampling can be really bad for web analytics. Take Google Analytics for starters. It heavily samples above a certain threshold of hits. At more granular drill downs, certain reports are at best unusable even directionally and at worst are misleading. Small efforts get lost in the noise of the rest of the data and often show up as zero values.
Random sampling is the foundation of statistical inference.
It works really well for estimating population statistics from sampling.
It works less well for predicting individual user behaviour (this tends to be where you need almost all the data rather than a sample).
Your errors of estimation grow as your sample size shrinks (roughly as 1/sqrt(n)), but that error is quantifiable, so random sampling is normally the best way to approximate the data (e.g. for real-time metrics).
You may wish to consider different sampling strategies, however. For some things, sampling persistently by user (or another dimension of interest) makes a lot of sense. For tracking service health, you probably want to sample on requests.
You may also wish to over-sample particular groups, if they are important to you (like maybe an F2P game would sample for most users, but store a larger proportion for paying users).
tl;dr random sampling is awesome, and should be used broadly.
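A minimal sketch of persistent, stratified sampling along those lines, assuming events carry a user ID and a payer flag; the 10% base rate and the 100% payer rate are made-up parameters.

```python
import hashlib

def keep_event(user_id: str, is_paying: bool,
               base_rate: float = 0.10, payer_rate: float = 1.0) -> bool:
    """Persistent per-user sampling: hashing the user ID keeps each user
    consistently in or out, and paying users are kept at a higher rate."""
    rate = payer_rate if is_paying else base_rate
    bucket = int(hashlib.sha1(user_id.encode()).hexdigest(), 16) % 10_000
    return bucket < rate * 10_000

print(keep_event("user-12345", is_paying=False))  # same answer every time for this user
print(keep_event("user-67890", is_paying=True))   # always True at payer_rate=1.0
```

When aggregating, you would weight each kept event by one over its group's sampling rate so the stratified sample still estimates the full population.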
Hate to point this out but it is funny how the Manifesto author equates accuracy to precision: "Accuracy (how precise the data is). Everything should be accurate."
I deal with the subtleties of precision/accuracy every day.