Hacker News
Scaling Machine Learning at Uber with Michelangelo (uber.com)
112 points by dpandya on Nov 4, 2018 | 59 comments



I love what Uber does with machines (ML), hate what it (currently) does to people.

We recently ported some models from Stan to Pyro (SVI on PyTorch), and it's been really exciting (except for the dark corner of poutines); it really has the performance of something being used in production, apart from the occasional NaN explosion.

edit: we are lazy and use our GitLab CI/CD to drive model development iteration. It's not as fully featured as what's in the article, but it's a zero-effort start.


What does Uber currently do to people that you hate? Uber currently provides people with more than 2 Million jobs [1]. Uber drivers/couriers made almost $13 Billion in the US alone last year [2].

[1] https://medium.com/@gc/ubers-path-forward-b59ec9bd4ef6 [2] https://www.sfchronicle.com/business/article/Uber-drivers-in...


Treat people like freelancers while paying them like low-wage waiters? $13bn/2mn jobs is $6k/job/yr, not so impressive compared to welfare.


Personally I think Uber drivers/couriers are freelancers. Uber drivers can work whatever hours they like. If they want to work 1 hour a day, they can. If they want to work 10 hours a day, they can. That's not something that non-freelancers can do.

For the same reason, I don't think it's fair to compare the average Uber driver salary to a full-time salary. Some Uber drivers work full-time, but I'd guess most don't. Lots probably only work a few hours a week. Uber provides students/parents/anyone with a way to make extra money on the side.

Also, from the article I linked to, 900k US drivers make $13 Billion a year. So $14k-$15k/job/year. When you factor in that many (probably most) Uber drivers are only working part-time, that's significant income from a super flexible job.


> I think Uber drivers/couriers are freelancers

yep, so they should be compensated more, not less, in light of the precarity of their job

> When you factor in that many (probably most) Uber drivers are only working part-time, that's significant income from a super flexible job

super flexible for whom? the drivers? or for Uber?

There are two sides to gig-style work: on one side a corp that's got teams of PhDs and a cloud calculating its optimal risk-reward strategy, and on the other side some poor people trying to make money. The idea that this can possibly turn out well for the latter is a pipe dream.


Freelancers are compensated whatever the market values them at. If they can get better pay/work elsewhere, then they do.

Driving for Uber is as flexible a job as it gets. I don't see how it's flexible for Uber: Uber can only offer a ride if a driver decides of their own free will to accept it.

For many people (2+ Million), driving for Uber is worth the money. They decided that it's a better gig than their other options. If they decide that it's not worth it, then they stop. And because it's such a flexible job, they don't even need to give two weeks notice.

Even though you think that driving for Uber isn't worth the money, millions of people around the world do.


Begging for money in the street is even more flexible — no car required — but I doubt you’d say those doing so are satisfied.

> free will to accept a ride

Whenever money is involved, free will goes out the window. It’s naive to think of all these people as rational actors.


I like what Uber (currently) does to people. Gets passengers from point A to point B efficiently while saving them significant money in the process over alternatives. Metaphorically puts dinner on the table of hundreds of thousands of drivers. Literally puts dinner on the table of millions (UberEats). Has a business model that doesn't rely on exposing more eyeballs to more ads, corrupting the press, media, and privacy in the process. Reduces car ownership and dependence. Is moving towards encouraging people to ride green vehicles. Literally saves lives (reducing DUI). Yeah, I'm okay with the Uber of 2018.*

*Disclaimer: I work at Uber, and my opinions are solely my own. We're hiring.


"Silicon Valley innovation now is directly aimed at oppressing the underclass, and everybody knows it and can see it. They hate Uber. People hate Uber. It means the death of the era of good feelings that came with this constant Moore's Law style innovation.

And that was an unforced error, by Silicon Valley. It was in their DNA. They didn't have to give Travis Kalanick, a guy they despised and never trusted, for good reason—They didn't have to give him all that venture capital.

But they saw him as an expendable probe, so they cynically gave him money, to see how much law-breaking he could get away with in the name of their disruption activities.

That was hubris—and nemesis is well on the way."

- NEXT17 | Bruce Sterling | Live from 2027


In the same talk, Bruce Sterling also said, "Do what China says. It's the ascendant model. It destroys the California ideology. The Silicon Valley companies can't get a toe-hold there."

As far as I can tell, the guy doesn't like America or even representative democracy very much. Take that for what you will.


Is it Uber / gig-economy apps you don't like, or the general idea of low income relatively unskilled labor jobs?


It's the idea of companies externalizing costs onto their labor force, because they refuse to recognize their labor force as "workers".

They avoid responsibility to the communities they generate profits in, by exporting negative externalities at a much higher level than traditional businesses.

Also, I don't think Uber drivers are 'unskilled'. The lowest rung is filtered out by the requirement to bring your own $20,000 vehicle to participate.


Merely owning a vehicle is an odd definition of skilled labor. They actually do a lot of community engagement (Uber now has huge operations teams all over the world) so that point is either ignorant or outdated. And you seem to object to the idea of independent contractors entirely (unless you can elaborate further), which is your right, but not really a unique strike against Uber.


> Merely owning a vehicle is an odd definition of skilled labor.

Having witnessed countless uber/lyft drivers do their thing I have to agree with your "skilled" assessment.

More on point: my main objection is that they are paying the "driver-partners" at basically cost, once you add up all the expenses. Basically, though many will disagree, all drivers are doing is taking the equity out of their vehicle now instead of at resale time.


> Has a business model that doesn't rely exposing more eyeballs to more ads, corrupting the press, media, and privacy in the process.

Though it does have a business model that flagrantly disregarded (disregards?) the law in pretty much every market it moved into.

And we'll see how the privacy thing turns out when they figure out the data they have on millions/billions of people is worth a bunch of money and Wall Street is demanding "more cowbell".


Yes, Uber suffers from original sin, but you don't become the fastest growing company of all time by avoiding any toe-stepping. Look at the pathetic state of "Jump Bikes" in SF. Uber is judiciously following all the regulations these days (in all areas), and SF is "generously" upping the Jump Bike limit from roughly 200 to 400 total bikes. It's really a pathetic number of bikes and doesn't come close to meeting demand.


> you don't become the fastest growing company of all time by avoiding any toe-stepping

this is purified hubris, how can you not vomit on your keyboard while writing that?

> roughly 200 to 400 total bikes. It's really a pathetic number of bikes and doesn't come close to meeting demand

maybe the mandate of SF is not to satisfy demand or Uber's profit incentive but to keep public interest in mind, e.g. ensure that Uber doesn't develop a monopoly on whatever a jump bike is.


>this is purified hubris, how can you not vomit on your keyboard while writing that?

I'm curious how you get around. Do you own a car? It's a fairly privileged view that only the well-off should have access to point-to-point transportation and maybe some civil disobedience was in order to rectify this injustice.

>maybe the mandate of SF is not to satisfy demand or Uber's profit incentive but to keep public interest in mind, e.g. ensure that Uber doesn't develop a monopoly on whatever a jump bike is.

Prior restraint on free enterprise that lacks negative externalities opens the door to crony capitalism replete with bribes, donations, and rent-seeking. In government, never ascribe to benevolence that which can be better explained by greed or power-seeking.


> ...

all off topic generalities.


Sorry, I don't believe any of that. It reads like "let them eat cake".


Can you elaborate a bit more about your usage of GitLab CI/CD for model management/development. I am currently working on a platform [1] that tries to solve some of the issues mentioned in the article, i.e. improving data scientists' productivity and velocity, compare models, solve reproducibility issues...

[1] https://github.com/polyaxon/polyaxon


We uh treat models as code, but also have NFS shares set up for storage and a GitLab runner talking to a Slurm cluster to run the models. Results and cross-validation upload to GitLab. The main thing we haven't built out yet is performance dashboards showing improvement across commits, but with the GitLab APIs that's a script away (currently we do it by hand)
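A hedged sketch of what such a GitLab CI job driving Slurm might look like; the job name, partition, stage, and script names here are invented for illustration, not the poster's actual config:

```yaml
# Hypothetical .gitlab-ci.yml fragment: a shell-executor runner on the
# cluster's login node submits the fit to Slurm and collects results.
fit-model:
  stage: fit
  tags: [slurm]
  script:
    # --wait blocks until the Slurm job finishes, so the CI job status
    # reflects whether the fit actually succeeded
    - sbatch --wait --partition=gpu fit_model.slurm
    - python collect_results.py --out results/
  artifacts:
    paths:
      - results/
```

The `artifacts` section is what lets results and cross-validation output appear attached to the pipeline in GitLab.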


Thanks for your reply.

Actually the question was more around "how do you create your models, and what do you mean by treating them as code", "why Slurm and not something like Airflow", "what is the test/performance setup: backtesting, smoke tests", etc.

The Gitlab stuff is easier to understand.


Ah right,

> how do you create your models and what do you mean treating them as code

we start with local Jupyter notebooks, and refactor bits of code into modules that get tested, which for our models mainly means recovering parameters from simulations, and then test them on real data, where we assess performance with LOO approximations for Bayesian models (notably PSIS) and some labeling from experts (which is not taken too seriously tbh)

> why slurm and not something like airflow

because the HPC resources we have access to are built with Slurm, which is super fast, supports DAGs of jobs, schedules our jobs reliably and quickly. I don't really want the other stuff on the Airflow feature list to be honest.


super interesting. thanks for sharing.

>we start with local Jupyter notebooks, and refactor bits of code into modules that get tested, which for our models mainly means recovering parameters from simulations, and then test them on real data

This is the part that everyone seems to be reinventing. Have you looked at PyML (https://eng.uber.com/michelangelo-pyml/)? What are some of your learnings around Jupyter -> production code? A lot of these are around conventions: "write a function called train(), fit(), test()". Is that the basis of your pipeline as well?


It’s not so simple for our models (hierarchical Bayesian time series models, often nonlinear, which may not be typical): we spend a lot of time digging through the data itself, forward simulations of the model, and refactoring/tweaking model structure. PyML (as described in the link you provided) doesn’t appear to support the first two parts, which are prerequisites to improving the model IMO.

Usually when we are doing more of the train/fit/test cycle, there’s an argparse script to quickly try different parameter values succinctly (which is run and tracked by the above CI setup)
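A minimal sketch of what such an argparse driver might look like; the hyperparameter names (--lr, --num-steps, --seed) are invented for illustration, not the poster's actual flags:

```python
# Hypothetical argparse driver for trying parameter values from the CLI,
# so each run (and its arguments) can be launched and tracked by CI.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(
        description="Fit a model with the given hyperparameters")
    parser.add_argument("--lr", type=float, default=0.01,
                        help="learning rate for the optimizer")
    parser.add_argument("--num-steps", type=int, default=1000,
                        help="number of optimization steps")
    parser.add_argument("--seed", type=int, default=0,
                        help="RNG seed for reproducibility")
    return parser

# Parse an explicit argument list rather than sys.argv, for illustration:
args = build_parser().parse_args(["--lr", "0.05", "--num-steps", "200"])
print(args.lr, args.num_steps, args.seed)
```

The CI config then only needs to vary the argument list per job, and the chosen values are recorded in the pipeline log for free.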

I wouldn’t say we’re reinventing since a better solution isn’t very clear (though PyML et al look interesting)

edit: forward simulation isn't a frequent thing in posts on generic ML algorithms, so just as an example: suppose you run a model and see an oscillatory component along a temporal dimension in your residual error, you add an oscillatory component to your model, and you rerun it but still see a residual with an oscillation. You can run a forward simulation of your model to see what frequency it's predicting, check against what's seen in the data, and fix it. This is a contrived example, but when you have multiple competing priors or model components, it's an effective way to debug their behavior.
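The residual-frequency check described above can be sketched with nothing but the standard library; the residual signal and its 0.1-cycles-per-step oscillation are invented for illustration:

```python
# Toy version of the debugging pattern: find the dominant frequency in a
# residual, then compare it against what a forward simulation of the
# model's oscillatory component would predict.
import cmath
import math

def dominant_frequency(x, dt=1.0):
    """Return the frequency (cycles per unit time) with the largest
    DFT magnitude, ignoring the DC component. Naive O(n^2) DFT."""
    n = len(x)
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                   for t in range(n))
        if abs(coef) > best_mag:
            best_k, best_mag = k, abs(coef)
    return best_k / (n * dt)

# A residual carrying an oscillation the model failed to capture:
residual = [math.sin(2 * math.pi * 0.1 * t) for t in range(100)]
print(dominant_frequency(residual))  # 0.1, matching the injected frequency
```

In practice you would compare this number against the frequency seen when forward-simulating the model component alone; a mismatch points at the misbehaving prior or component.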


this article (https://towardsdatascience.com/uber-introduces-pyml-their-se...) does a better job motivating PyML, or maybe I'm just more awake now. In any case, I see what you mean. The GitLab CI setup we have builds Docker images out of our models, and we use branch names to target datasets, so "production" usage is "just" creating a branch, watching it run, checking results, etc.

Maybe a missing detail is that our models are run-once: once results are QA'd, they are sent to the relevant practitioner, so Uber's queries-per-second concerns are irrelevant for us (for now), which I can see simplifies the deployment question enormously.
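A minimal sketch of what branch-name-to-dataset targeting could look like; the `dataset/<name>` convention and the NFS paths are invented for illustration:

```python
# Hypothetical mapping from a CI branch name (CI_COMMIT_REF_NAME in
# GitLab) to the dataset directory the pipeline should run against.
def dataset_for_branch(branch, base="/nfs/datasets"):
    if branch.startswith("dataset/"):
        # "dataset/<name>" branches target a specific named dataset
        name = branch.split("/", 1)[1]
    else:
        # anything else falls back to a small development sample
        name = "dev-sample"
    return f"{base}/{name}"

print(dataset_for_branch("dataset/trial-2018-q3"))
# -> /nfs/datasets/trial-2018-q3
```

With a convention like this, "production" usage really is just creating a branch: the branch name alone tells the pipeline which data to run on.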


Hello, Community Advocate from GitLab here. I was reading through your comments and it's great to hear how you use GitLab for your setup. Thanks for sharing your story with the community and we'd love to hear more from you on how GitLab helps you.


> we'd love to hear more from you on how GitLab helps you.

do you have specific questions?


We wanted to hear what features you like using the most and how those features help you with setting up your project. However, you wrote https://news.ycombinator.com/item?id=18384804 which answers a lot of the questions. Thanks!


Interesting. In that case, why do you even use Docker? Does it make distribution of models easier?

Would love to know more about your packaging setup - the branch name to divide datasets is a nice trick (I'll use it as well).

How does your CI know where to find models? I'm betting you are using some kind of convention here: one model per .py file... so package each .py file in a Docker container.

If it is possible, would love to see the skeleton structure of one of your pre-packaged files.

TL;DR: it seems you invented something like PyML as well. Are the deployment scripts + model skeletons open source?


Our GitLab instance has a lot of projects and it’s been helpful for the users to have a set of template projects each with their own Docker image. Some of those images are many gigabytes in size, tricky env vars etc. Docker “democratized” CI for most of our scientific personnel who aren’t devs, since they can hit the Fork button and have a working CI config to base their project on.

In the ML projects, it serves mainly to package dependencies and to enforce some basic security constraints: raw datasets are accessible read-only, ensuring that if we suspect some issue with cached results (because our inner orchestrator is Make...) we can nuke all the results and start over from scratch, confident the raw data is intact.

The models and arguments are in the CI config. No magic there, but since it’s all in the repo I’m ok with it.

This whole setup was put together for an upcoming clinical trial as steps toward ISO quality norms compliance, and I can’t share it now. I do intend to reproduce it in an open form alongside our existing software (GitHub.com/the-virtual-brain) when it’s ready.

In any case I appreciate your questions a lot: they drove me to think a little harder and see why stuff like Michelangelo and PyML is something even our (academic/clinical) group should be using... if we can find the time to do it.


What do you think are the best dashboard options for showing improvements?


I'd probably set up Grafana talking to Elasticsearch or Postgres, but I haven't thought too hard about it.


Polyaxon looks nice but we don’t admin the majority of the GPU resources we use (which is why being able to tell GitLab-runner to invoke Slurm is cool)

Pachyderm is another one I’ve looked at but we don’t have the sys admin bandwidth for that stuff right now.


Why would you do this instead of using pymc3?


PyMC3 didn’t run well on GPUs last I tried. That may have changed but I find PyTorch easier to work with than Theano or TensorFlow.


Just in case other readers stumble by, neither of these perceptions of pymc is accurate.

GPU operability is well-supported, and much like Keras, pymc provides well-designed abstractions over top of TensorFlow, making the downsides of raw TensorFlow mostly irrelevant.

I like PyTorch a lot too, but any time I see someone say PyTorch is easier than TensorFlow, it usually just means that person only tried PyTorch, learned some special knowledge about it, and now they don’t want to admit using a different framework might be the better choice, even if it requires giving up some of what’s nice about PyTorch.


That’s a fairly aggressive response.

Both TF and Theano require a static graph, while PyTorch lets you use Python's regular control flow (if, for, while, etc.). This makes building modular model components much easier, since you can reason about execution mostly as if it were normal numerical Python code.

I have tried running PyMC3 models on GPUs (when they were on Theano; not sure if they have transitioned since) and it was slower than CPUs, not for small models but for the big, SIMD-wide ones. When I ported the same thing to Pyro/PyTorch, it was clearly making good use of the GPU, not bottlenecked by useless CPU-GPU transfers.

Maybe that’s changed now, so, as they say, the only useful benchmark is your own code.


> “I have tried running PyMC3 models on GPUs (when they were on Theano; not sure if they have transitioned since) and it is slower than CPUs, not for small models but the big, SIMD-wide ones.”

Can you post a link to your code with some synthetic data of the sizes you’re talking about to demonstrate this? I hear it as a criticism a lot, but have never found it to be true (full disclosure: I work on a large-scale production system that uses pymc for huge Bayesian logistic regression and huge hierarchical models, both in GPU mode out of necessity).

> “Both TF and Theano require static graph while PyTorch lets you use Python’s regular control flows (if, for, while, etc). This makes building modular model components much easier, since you can reason about execution mostly as if it’s normal numerical Python code.”

I can’t tell if you’ve looked into pymc or not based on this (or Keras either, for that matter), since in pymc, GPU mode is just a Theano setting; you don’t actually write any Theano code, manipulate any graphs or sessions directly, or anything else. You just call pm.sample with the appropriate mode settings and it is executed on the GPU.

Much like with Keras, where you can also easily use Python native control flow, context managers and so on, pymc doesn’t require low-level usage of underlying computation graph abstractions.

Again, I really like PyTorch too, but people seem to have only ever tried PyTorch, liked one or two things about it, forgiven the parts that are bad about it (like needing to explicitly write a wrapper for the backwards calculation for custom layers, which you don’t need to do in Keras, for example), and generalized to criticize other tools.


I’ve contributed to pymc actually (https://docs.pymc.io/api/distributions/timeseries.html#pymc3...) and used it in research projects. So when I say I find Pyro/PyTorch easier to use, it’s not wishful thinking.

I don’t have the pymc code anymore since we moved to Stan, and have now started porting code to Pyro.

> forgive the parts that are bad about it (like needing to explicitly write a wrapper for the backwards calculation for custom layers

Why do that when AD does it for you?


> like needing to explicitly write a wrapper for the backwards calculation for custom layers, which you don’t need to do in Keras for example

Not sure I understand: you will need to write a backwards pass regardless of whether you use Keras, PyTorch, or anything else. With Keras, you would need to modify the underlying backend code (e.g. with tf.RegisterGradient or tf.custom_gradient). With PyTorch you write the backward() function, which is about the same amount of effort.


You missed the point entirely. When you compose operations in Keras, it automatically generates the backpropagation implementation, you do not need RegisterGradient, custom_gradient or anything else if you are making new operations or layers as the composition of existing operations (whether that is logical indexing, concatenation, math functions, whatever).

In PyTorch, you still do have to define the backward function and worry about bookkeeping the gradient, clearing gradient values at the appropriate time, and explicitly calling to calculate these things in verbose optimizer invocation code.

I encourage you to check out how this works in Keras, because it is simply just factually different than what you are saying, in ways that are specifically designed to remove certain types of boilerplate or overhead or bookkeeping that are required by PyTorch.


No, you're wrong about Pytorch. If your custom op is a combination of existing ops, you don't need to define a custom backward pass. This is true for any DL framework with autodiff. For more details, look at this answer [1].

Regarding more verbose Pytorch code for the update step, compare:

In TensorFlow:

  loss = tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=output_logits)
  optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)
  sess.run(optimizer)

In PyTorch:

  optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
  loss = nn.CrossEntropyLoss()(output, label)
  optimizer.zero_grad()
  loss.backward()
  optimizer.step()

In my opinion, PyTorch makes the parameter update process a lot easier to understand, control, and modify (if needed). For example what if you want to modify gradients right before the weight update? In PyTorch I'd do it right here in my code after the loss.backward() statement, while in TF I'd have to modify the optimizer code. Which option would you prefer?

[1] https://stackoverflow.com/questions/44428784/when-is-a-pytor...
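The "modify gradients between backward() and step()" point can be illustrated without any framework at all; this is a toy scalar example (made-up loss and data, plain Python rather than actual PyTorch), showing only the control flow being argued about:

```python
# Minimal gradient-descent loop mirroring the PyTorch shape of
# backward() -> (modify gradients) -> step(), on 0.5*(w*x - y)**2.
def grad(w, x, y):
    # analytic gradient of the squared error with respect to w
    return (w * x - y) * x

w, lr = 0.0, 0.1
for x, y in [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]:
    g = grad(w, x, y)            # analogous to loss.backward()
    g = max(-1.0, min(1.0, g))   # modify the gradient before the update
    w -= lr * g                  # analogous to optimizer.step()
print(w)
```

Because the gradient computation and the parameter update are separate steps, the clipping line has an obvious place to live; with a fused minimize()-style API, the same change means reaching into the optimizer.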


> PyTorch, you still do have to define the backward function and worry about bookkeeping the gradient, clearing gradient values at the appropriate time, and explicitly calling to calculate these things in verbose optimizer invocation code

I’ve definitely never had to do that. Where do you get this from?


would love to know what your model development iteration looks like, especially how you do testing, etc.


See my comment here, but I can answer other questions if you have them

https://news.ycombinator.com/item?id=18376567


This is not a product, nor is it open sourced, so this is basically just a PR stunt. Or am I missing something?


Looks like a blog post about an internal tool. Not sure why this is interesting to people


It's kinda funny they tout their usage of GPS. I use Uber on a near-daily basis and drivers by and large use Google Maps. They have outright said "Uber sucks for directions".

And if you use Express Pool it will always tell you to go to the wrong side of an intersection. I like Uber because of the drivers, but their fancy technology is flawed.


Please do not conflate GPS with navigation. There is a massive set of problems you can solve with high-fidelity GPS data (Uber knows it is a driver in a car, verifies it with another GPS entity (the rider app reports GPS too), etc.). There is not that much overlap between great GPS data and great maps: no amount of great GPS data will give you a good basemap. Please let me know if I am not making sense; I am more than happy to provide examples / explain further!


Can you expand on the difference between GPS and navigation?


GPS is a system for determining your position in the world, usually in latitude and longitude.

Navigation is pathfinding in the real world plus directions. GPS is a useful (but far from the only) system for determining where you are. Navigation is often implemented as a route from point A (in lat, lng) to point B (in lat, lng), found by running an algorithm (such as Dijkstra or A*, but usually something far more advanced) from A to B. The algorithm runs on a routing graph of some kind, produced by processing real-world map data.
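To make the pathfinding step concrete, here is a textbook Dijkstra sketch over a toy road graph; the graph and edge costs are invented, and real routing engines preprocess far larger graphs (e.g. with contraction hierarchies):

```python
# Shortest-path cost over {node: [(neighbor, cost), ...]} with a heap.
import heapq

def dijkstra(graph, start, goal):
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, already found a shorter path
        for nbr, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return None  # goal unreachable

roads = {"A": [("B", 4.0), ("C", 1.0)], "C": [("B", 2.0)], "B": []}
print(dijkstra(roads, "A", "B"))  # 3.0, via C, beating the direct 4.0 edge
```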


All that a GPS device gives you is your coordinates (latitude and longitude). It does not tell you whether these coordinates correspond to a street address (you need a map for that). It does not tell you how to move between two points (you need a pretty complicated set of routing algorithms and historical/real-time traffic congestion data to do that).
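One of the few things raw coordinates do give you is straight-line (great-circle) distance, via the standard haversine formula; this sketch (with illustrative coordinates roughly near the SF Ferry Building and SFO) shows how little that is compared to an actual route:

```python
# Great-circle distance between two (lat, lon) points in degrees.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Roughly SF Ferry Building to SFO: about 19 km as the crow flies,
# but that number says nothing about which streets to drive.
print(haversine_km(37.7955, -122.3937, 37.6213, -122.3790))
```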


I've never seen an Uber driver not use Waze in London.


I was in Colombia and South Africa last year. Those Uber drivers also used Waze.


In Mexico City I always see them use Waze.


I believe they’re using GPS data here more for analytics, rather than navigation.

They can use GPS data to chart usage metrics, plan pool rides, check for anomalies, and harass journalists, for example.


Indeed. What is even stranger about the use of Google Maps is that Uber bought part of Bing Maps, I am sure for a hefty sum.



