I wish the diagrams were bigger; they are hard to read and a bit blurry.
One of the interesting points that is often overlooked in ML is model deployment. They mention TensorFlow, which has a model export feature you can use as long as your client can run the TensorFlow runtime. But they don't seem to be using that, because they said they just exported the weights and are using them from Go, which would seem to imply they did some type of agnostic export of raw weight values. The nice part of the TF export feature is that it can be used to recreate your architecture on the client. But they did mention Keras too, which allows you to export your architecture in a more agnostic way, as it works on many platforms such as Apple's new CoreML, which can run Keras models.
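For what it's worth, the difference is roughly this (hypothetical Keras model and file names, just to illustrate weights-only export vs. exporting the full architecture):

```python
# Hypothetical Keras model, just to illustrate the difference between
# exporting raw weights and exporting the full architecture + weights.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.LSTM(32, input_shape=(28, 1)),
    keras.layers.Dense(1),
])

# Option 1: raw weight values only -- the client has to know how to
# rebuild the architecture itself (this sounds like what they described).
model.save_weights("forecast_weights.h5")

# Option 2: full export (architecture + weights) -- a runtime or converter
# that understands the format (TF Serving, CoreML via a converter, etc.)
# can recreate the model without the original Python code.
model.save("forecast_model.h5")
```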
Warning: I'm a vendor. Take everything I say with a grain of salt. I will try to sell you something.
One biased perspective I have here: infra is often a different team from data science, and the data scientists don't always do the deploying.
Beyond "some sort of serving thing", the data scientists might not necessarily know what's being deployed.
This is not true at every organization, but it is typical of most companies we sell to. There are usually ML platform teams that do the "real" deployment (especially at sizable scale).
Another characteristic of production is that it's "boring". "Production" is a mix of databases to track model accuracy over time and possibly microservices, depending on how deployment is "done", plus established ways of giving feedback when a model is wrong, experiment tracking, and model maintenance, among other things.
A lot of these things are typically very specific to the company's infrastructure.
The "fun" and "sharable" part that people (especially ML people) is usually related to "what neural net did they use?"
The other thing to think about here: "production" isn't just "TF Serving/CoreML and you're done". There are typically security concerns, different data sources, and so on that are often specific to a company's infrastructure. There also might be different deployment mechanisms for each potential model deployment, e.g. mobile vs. cloud.
Grain-of-salt sales pitch here: we usually see the "deployment" side of things, where it's a completely different set of best practices that happen to overlap with data scientists' experiments. This includes latency timing, persisting data pipelines as JSON, GPU resource management, Kerberos auth for accessing data, managing databases and an associated schema for auditing a model in production (including data governance), connecting to an actual app/dashboard like the ELK stack, and so on.
TLDR: The deployment model would be its own blog post.
The Google paper Machine Learning: The High Interest Credit Card of Technical Debt [1] offers a semi-rigorous introduction to the topic of real-world ML model engineering/deployment considerations and best practice. (If anyone else knows of similar work I'd be grateful to hear about it.)
If you have to rely on model serialization schemes you have a problem, because they express the model in terms of low-level operations.
You probably want to run experiments with multiple model variants, or tweak your model and fine-tune from deployed weights. To do that you need a way to recreate it from layer-level objects instead of the add/reshape operations TensorFlow and its kin store internally.
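Concretely, I mean something like this (hypothetical Keras sketch): keep the architecture as layer-level code so you can rebuild and tweak it, load the deployed weights into it, and keep training.

```python
# Hypothetical sketch: rebuild the model from layer-level objects,
# load the weights that were deployed, and fine-tune from there.
from tensorflow import keras

def build_model():
    # The architecture lives in code, not in a low-level op graph,
    # so it's easy to tweak (change layer sizes, add layers, etc.).
    return keras.Sequential([
        keras.layers.LSTM(32, input_shape=(28, 1)),
        keras.layers.Dense(1),
    ])

model = build_model()
model.load_weights("forecast_weights.h5")   # weights from the deployed model
model.compile(optimizer="adam", loss="mse")
# model.fit(new_x, new_y, epochs=5)         # fine-tune on new data
```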
That example is essentially building (and ostensibly training) an albeit trivial model in Go. Typically you have a more complicated architecture, so your deployment has two parts:
But the very first sentence on that page says: "These APIs are particularly well-suited to loading models created in Python and executing them within a Go application."
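In other words, the intended workflow is presumably: define and train in Python, then export a SavedModel that the Go runtime just loads and executes. Something along these lines (hypothetical sketch, modern TF 2-style API):

```python
# Hypothetical sketch of the Python side: train, then export a SavedModel
# directory that the Go bindings can load and execute.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(28, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
# model.fit(x_train, y_train, epochs=10)

# Export the whole graph + weights; the Go side only needs the directory.
tf.saved_model.save(model, "export/forecast_model/1")
```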
I wonder how much they could enlist others to solve this by creating something like an 'Uber Auction House' to basically buy and sell the right to reap Uber's cut for a ride. They could clean up on exchange fees while everyone solves this problem for them.
I work in Forecasting for Amazon and I've often wondered the same thing. Almost any business with some (moderate) degree of uncertainty about the future could be "securitized". I think doing so in a way that preserves privacy, security, and is defensible against disintermediation could be valuable.
All that said, the company that outsources (and that's really what you're proposing) such a core component of their business is probably taking on way too much risk.
This is interesting, could it potentially reduce surge pricing?
One thing I thought: it would be really convenient if Uber could amortize their surge pricing over the month/year, in order not to hit customers with unexpected rates and essentially offer a flat, predictable fee over the whole period. The problem with that is you can't really plan future demand to calculate how much you need to save/dip into. Could an auction house help to hedge the bets?
It's unlikely to affect surge pricing. You can think of surge as a tool to move drivers to where the riders are. Even if you knew there would be heavy demand, you need to incentivize drivers to actually go there. Why would I, as a driver, go 20 minutes out of my way to be in Katy Perry concert traffic when I can keep picking up passengers at a reasonable clip on the other side of town?
Surge is only really solved once autonomous vehicles can be preemptively positioned near demand.
The author might be suggesting that you could have surge-priced compensation for the drivers to incentivize them to move to the demand, but also amortize that cost for the consumer.
That's an interesting problem because (if I'm understanding you correctly) forecasting would need to be done at an individual level. A consumer getting rides primarily during off-hours should (imo rightly) pay less than a rider booking rides primarily from busy locations. Amortizing that cost fairly while also making a safe profit is a tricky balance to strike, I'd think.
I don't understand if they use windowing as a fixed computational step that is active at both training and scoring time, or if they use sliding windows only to chop up the training data.
Also, I wonder if they checked how a feed-forward NN that operates on the contents of a sliding window (e.g. as in the first approach above) compares with their RNN results. I am curious about this, as it would give us a hint whether the RNN's internal state encodes something that is not a simple transformation of the window contents. If this turns out to be the case, I'd then be interested in figuring out what the internal state "means"; i.e. whether there is anything there that we humans can recognize.
I wasn't very sure what the sliding window part was about either. I think they were just saying that they trained on a sliding window using the "output window" as part of their loss function.
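My (possibly wrong) mental model of the windowing is something like this: slide a fixed-size input window over the series and use the window right after it as the target, so the loss is computed over the whole output window.

```python
# Rough sketch of how I imagine the sliding-window training data is built
# (my guess, not necessarily what Uber actually does).
import numpy as np

def make_windows(series, input_len=28, output_len=7, step=1):
    """Chop a 1-D series into (input_window, output_window) training pairs."""
    xs, ys = [], []
    for start in range(0, len(series) - input_len - output_len + 1, step):
        xs.append(series[start : start + input_len])
        ys.append(series[start + input_len : start + input_len + output_len])
    return np.array(xs), np.array(ys)

series = np.sin(np.linspace(0, 20, 500))      # toy stand-in for trip counts
x, y = make_windows(series)
print(x.shape, y.shape)                       # (466, 28) (466, 7)
```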
A feed-forward NN wouldn't do much because it doesn't hold a state variable which you need to be able to understand context in time series data. There are probably some pieces of the state that you'd be able to interpret but the majority of it would mean nothing to us.
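If someone wanted to actually run that comparison, it would look roughly like this (hypothetical Keras sketch, reusing the window shapes from above): same inputs and targets for both models, so any gap in accuracy hints at what the recurrent state is adding.

```python
# Hypothetical comparison: a feed-forward net on the flattened window
# vs. an LSTM that consumes the window as a sequence.
from tensorflow import keras

input_len, output_len = 28, 7

ff_model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(input_len,)),
    keras.layers.Dense(output_len),
])

rnn_model = keras.Sequential([
    keras.layers.LSTM(64, input_shape=(input_len, 1)),
    keras.layers.Dense(output_len),
])

for m in (ff_model, rnn_model):
    m.compile(optimizer="adam", loss="mse")
# ff_model.fit(x, y, ...)                      # x shaped (n, 28)
# rnn_model.fit(x[..., None], y, ...)          # x reshaped to (n, 28, 1)
```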
It makes very little sense to me that their sliding window does not appear to contain previous holidays. I'm no expert, but I'm pretty sure holidays are a seasonal trend, and it would benefit them to train their models on previous holidays as opposed to the 3 months before a holiday, right?
Whenever I see a post or announcement by a major company that they're using "machine learning", I'm reminded of what CGP Grey said: it seems like nowadays machine learning is something you add in to your product just so you can seem hip by saying that it has machine learning, and not for a legitimate technical reason.
There are undoubtedly things that machine learning is right for, however to me it seems like it's become a buzzword more than anything else.
Resume-driven development is an organizational pattern, and like all patterns, it exists for better reasons than you'd first think.
The lifetime compensation of developers isn't just tied to how much salary they have at the moment. Getting into a dead-end and not developing your skills will definitely set you back over the long term. Like, there's a reason people will pay you a premium for working with COBOL. So there's a very rational pressure to get better total compensation out of a role by choosing resume-developing tools and technologies.
On the other end, organizations mostly just care about getting the task done. They've got the choice of doing it in a boring way and paying lots of money for developers who don't want to grow their resume, or indulging the developers' fancies and getting it done more cheaply.
tl;dr - resume driven development is a way for companies to pay for projects with "you'll get experience".
Interesting stuff, but all they've managed to do so far is find models that fit historical data better. Would be interested to read a follow up a year later to see how their models actually performed.
I wonder how they are quantifying uncertainty around their predictions. Having a point estimate without some notion of a confidence interval seems much less useful. Is there a natural way to do this through LSTMs?
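One approach I've seen for getting intervals out of an LSTM is Monte Carlo dropout: keep dropout active at prediction time and sample repeatedly (no idea if that's what Uber does). Rough sketch, with a hypothetical model:

```python
# Hypothetical sketch of MC dropout for prediction intervals with an LSTM
# (just one common technique -- not claiming this is what Uber does).
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.LSTM(64, input_shape=(28, 1)),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(7),
])
model.compile(optimizer="adam", loss="mse")
# model.fit(x_train, y_train, ...)

def predict_with_uncertainty(model, x, n_samples=100):
    # training=True keeps dropout active, so each pass is a different sample
    preds = np.stack([model(x, training=True).numpy() for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)   # point estimate + spread

# mean, std = predict_with_uncertainty(model, x_test)
```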
Also, some actual benchmarking would be great. Say, against Facebook's Prophet (which also deals with covariates and holiday effects).
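For reference, a Prophet baseline with holiday effects is only a few lines (sketch; the input file is hypothetical, and the ds/y column names follow Prophet's convention):

```python
# Rough sketch of a Prophet baseline with holiday effects, for comparison.
# (The package was called fbprophet in older releases; newer ones use `prophet`.)
import pandas as pd
from prophet import Prophet

holidays = pd.DataFrame({
    "holiday": "new_years",
    "ds": pd.to_datetime(["2015-01-01", "2016-01-01", "2017-01-01"]),
    "lower_window": 0,
    "upper_window": 1,
})

# df must have columns: ds (timestamps) and y (the series to forecast)
df = pd.read_csv("trips_per_day.csv")          # hypothetical input file

m = Prophet(holidays=holidays)
m.fit(df)
future = m.make_future_dataframe(periods=30)
forecast = m.predict(future)                   # yhat, yhat_lower, yhat_upper
```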