Instagram Makes a Smooth Move to Python 3

dmalvarado · on June 15, 2017

“Yeah, Python is great in so many ways, too bad it’s not really scalable.”

I'm not even sure what this means anymore. I just don't see how any language, when used correctly, could be inherently unscalable. My guess is that statements like this come from a time when monoliths were the application design of choice? Now, assuming Instagram has just 1,000 photo-handling servers, each one is only responsible for 95,000 photos a day.
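The per-server figure is simple division. Here is a quick sketch that also turns it into a per-second rate. It assumes the comment's hypothetical 1,000 servers and the roughly 95 million daily uploads implied by 1,000 × 95,000:

```python
# Hypothetical fleet size from the comment; the daily total is implied by 1,000 x 95,000.
PHOTOS_PER_DAY = 95_000_000
SERVERS = 1_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

per_server_per_day = PHOTOS_PER_DAY // SERVERS
per_server_per_second = per_server_per_day / SECONDS_PER_DAY

print(per_server_per_day)               # 95000
print(round(per_server_per_second, 2))  # 1.1
```

Roughly one photo per server per second is a modest load for any language runtime.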

Of course, that's not to say that Instagram doesn't have CAP issues. It does, especially in the "C" area, but again, not a problem inherent in the language.

mahyarm · on June 15, 2017

The threading story isn't great. It's not statically typed, which can make working on a large codebase with many people more painful. It also forces you to write more unit tests to make up for the lack of a compiler checking things for you. Raw performance is poor compared to Go or Java.

Once you reach a certain scale, developers can, in some cases, be cheaper than your servers. That is when performance becomes worth paying for.

ubernostrum · on June 15, 2017

All of these are decades-old talking points which have been largely shown to be factually incorrect. Why are you repeating them without justification?

bpicolo · on June 15, 2017

And at that scale you can make additional microservices for the critical path where it matters.

mahyarm · on June 15, 2017

Not really. You'll notice a pattern in most tech bigcos where they move from dynamic lang X to a statically typed language that can multithread properly.

A few examples: Ruby on Rails to Java (Twitter). Java & C++ (Google). Java (LinkedIn). Python/Node -> Java/Go (Uber). Or they start doing silly things like making a new VM (Facebook).

stuartaxelowen · on June 15, 2017

I heard through the grapevine that Go was only chosen at Uber to help with hiring.

stock_toaster · on June 15, 2017

That seems silly. There are far more Java devs out there than Go devs.

Also, I would assume Uber probably has hiring problems in general at this point.

sidlls · on June 15, 2017

> That seems silly. There are far more Java devs out there than Go devs.

I wouldn't underestimate how much hype-driven development exists in this area. These days, "chasing the new shiny" can sometimes look like a line item on a resume.

nailer · on June 16, 2017

There are more good Go devs. A lot of Java programmers are enterprise.

mvid · on June 19, 2017

I would imagine that Node would draw more enthusiasm than Go.

plafl · on June 15, 2017

I wonder if there is some survivor bias there. I'm sure there must be cases where everything went wrong but nobody writes about it.

adrianratnapala · on June 15, 2017

Wouldn't survivor bias explain why BigCo X can be found still using whatever questionable choice it originally made, but not why lots of BigCos have changed their practices?

This might explain Facebook and PHP, but it doesn't explain the stuff mentioned in the previous comment.

bpicolo · on June 15, 2017

Google uses plenty of Python, no?

I don't think threading is a relevant factor for many of those decisions. Perf and static typing, sure.

treebog · on June 16, 2017

Not a Googler, but I have heard that most of the Python at Google is under YouTube, which Google got in an acquisition.

crdoconnor · on June 16, 2017

From python to go and back again https://news.ycombinator.com/item?id=10402307 (mozilla)

zepolen · on June 15, 2017

Don't be obtuse, it's not because of multithreading.

They move to static typing because their teams have 100+ developers, none of whom knows the entire code base, and static typing helps reduce the number of possible bugs.

jrs95 · on June 15, 2017

Plus Facebook has a lot of C++ and some Java services as well, in addition to the Hack/HHVM stuff.

digitalzombie · on June 16, 2017

LinkedIn was mostly Scala, and then they did some stuff with Node.js >___>

Walmart also moved to Node.js.

Hype was real.

petre · on June 15, 2017

There is Ruby on Rails on JRuby, which has native threads, plus you can call Java methods from Ruby.

crdoconnor · on June 16, 2017

"It's not statically typed, which can make working on a large codebase with more people more painful. It also forces you to write more unit testing code coverage to make up for the lack of a static compiler checking things for you."

I never missed static typing on large code bases, but I had numerous bugs caused by Python's implicit type conversions: a string to an iterable of characters ["h", "e", "l", "l", "o"], None to False, the string "no" to True, 0 to False, "" to False, etc.
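Here are a few of these conversions in plain Python 3. The `add_tags` function is hypothetical. It is written only to show how a bare string slips through where a list was expected:

```python
def add_tags(existing, new_tags):
    """Hypothetical helper: expects new_tags to be an iterable of tag strings."""
    for tag in new_tags:
        existing.append(tag)
    return existing


# Passing a single string instead of a list "works", one character at a time.
tags = add_tags([], "hello")
print(tags)  # ['h', 'e', 'l', 'l', 'o']

# Truthiness: each of these silently acts as False or True in an `if`.
print(bool(None), bool(0), bool(""))  # False False False
print(bool("no"))                     # True (any non-empty string is truthy)
```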

A lot of this mirrors C's infuriating implicit type conversions.

brianwawok · on June 17, 2017

For what it's worth, string-to-iterable-of-chars bugs happen to me plenty in Scala. It's a hard problem to fix: array of strings or string of chars?

crdoconnor · on June 17, 2017

I don't think it is that hard. Simply stop making strings implicitly iterable and make chars a different type.
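Python itself can't be changed this way, but a codebase can approximate the suggestion with a guard at API boundaries. This is a minimal sketch. `as_str_list` is a hypothetical helper, not a standard-library function:

```python
from typing import Iterable, List


def as_str_list(values: Iterable[str]) -> List[str]:
    """Hypothetical guard: accept an iterable of strings, but reject a bare str or bytes."""
    if isinstance(values, (str, bytes)):
        raise TypeError("expected an iterable of strings, got a single string")
    result = list(values)
    for v in result:
        if not isinstance(v, str):
            raise TypeError(f"expected str elements, got {type(v).__name__}")
    return result


print(as_str_list(["hello", "world"]))  # ['hello', 'world']
try:
    as_str_list("hello")
except TypeError as exc:
    print(exc)  # expected an iterable of strings, got a single string
```

A static checker alone won't catch this case, because `str` is itself an `Iterable[str]`. That is why the guard checks at runtime.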

jeeyoungk · on June 15, 2017

See issues like https://engineering.instagram.com/dismissing-python-garbage-...

If you have to turn off a crucial language feature to increase performance, I'm not sure whether a language is considered "scalable".
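Some background for readers who haven't seen the linked post: CPython frees most objects by reference counting. A separate cyclic garbage collector handles only reference cycles, and that collector can be switched off. The sketch below shows what turning it off means, using the standard `gc` module. How Instagram actually did it is described in their post, so treat this as illustrative only:

```python
import gc
import weakref


class Node:
    """A trivial object that can take part in a reference cycle."""

    def __init__(self):
        self.other = None


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a  # a <-> b: reference counting alone can never free these
    return weakref.ref(a)    # a weak reference lets us check whether `a` is still alive


was_enabled = gc.isenabled()
gc.disable()  # turn off automatic cyclic collection for this process
try:
    probe = make_cycle()
    survived_without_gc = probe() is not None  # the cycle is still in memory
    gc.collect()                                # an explicit collection still runs
    freed_after_collect = probe() is None
finally:
    if was_enabled:
        gc.enable()

print(survived_without_gc, freed_after_collect)  # True True
```

This is the trade-off the comment points at. With the collector off, any cycles the program creates stay in memory until something collects them explicitly.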

nawitus · on June 15, 2017

You can "brute force" scale pretty much anything just by throwing more computing resources at the problem, but that means the language/framework/library is less scalable than something else that requires fewer resources. If the difference in scalability is large enough, I suppose one can claim that one of the alternatives is "not really scalable".

I suppose that quote could use better wording.

adventured · on June 15, 2017

As one example, PHP 5.x as-is (without caching) isn't a language that scales well at all. It becomes a horrible bottleneck under heavy load.

thehardsphere · on June 15, 2017

Even with caching it's quite poor, especially if you are using frameworks that aren’t written to properly take into account its unique means of execution (which the vast majority do not).

threeseed · on June 15, 2017

It's old but still useful: https://www.techempower.com/benchmarks/

Almost every language can handle typical performance requirements. But when people say slow or unscalable it's almost always in relation to other languages.

Dowwie · on June 15, 2017

Members of the Py3 transition team (the authors of that article) gave a talk about the project at PyCon 2017: https://www.youtube.com/watch?v=66XoCk79kjM

bifrost · on June 15, 2017

I can't say I think this has ever been really true: "Performance speed is no longer the primary worry. Time to market speed is."

Performance has been a concern, but programmatic load balancing has been around for decades. When I worked at MSN/Linkexchange back in the late '90s, we never worried much about the performance of the language we used (Perl) because we could scale out servers. Perl isn't that speedy, but it sure was easy to develop in. Before I left, we served a billion and a half clicks per month with 8-10 machines from a single datacenter, with Perl.

mixmastamyk · on June 15, 2017

Right, you hear a lot of griping about moving from Python 2 to 3, but I personally didn't have as much trouble as expected. Some of my projects just worked. One small tip I don't think they mentioned: start using the logging module instead of print, and it will eliminate one class of potential issues.

vhost- · on June 15, 2017

I've heard people complain about print being a function more than a handful of times, and my response is always "are you really using print that much in your code base?"

MatthewWilkes · on June 16, 2017

You still use it in debuggers and in the REPL. Using it seldom makes it harder to relearn the muscle memory, and only hitting it when you're debugging makes it more likely that you're already frustrated when it happens.

dangayle · on June 15, 2017

This is what I encourage our whole team to do; it just seems like a no-brainer. Having stray "testing" strings pop up in our logs without any source is hella annoying.

passive · on June 16, 2017

I've been writing mostly small Python projects for 15 years, starting with 2.2.

I've had no issues in the migration to 3.

I did some building with it around 3.3, starting to commit around 3.4, and with 3.5 I build everything in it.

I don't have the performance challenges Instagram has, but my experience with application development in general is that 98% of performance challenges can be solved with (not-too) clever engineering. This applies to projects in every language.

There are a vanishingly small number of scenarios where the performance of your runtime actually dictates your performance limits.

If you're working on something and are worried about Python's performance, or which Python to use, don't. Use 3, optimize later.

MatthewWilkes · on June 16, 2017

I watched the PyCon keynote on this topic, and while it's nice to hear they've moved to Python3 their approach probably shouldn't be copied.

For example, their codebase had ambiguity between bytestrings and unicode strings. Python3 tries to prevent exactly this, to fix a big footgun from Python2.

The right fix here is to be consistent in your use of strings. Sometimes that is tricky because of how third party libraries have decided to implement their 2/3 compatibility, but it helps prevent shooting yourself in the foot with unicode bugs down the line.
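Python 3 refuses to mix `bytes` and `str`. In practice, being consistent means decoding once, where data enters the program, and working with `str` everywhere inside. This is a minimal sketch that assumes UTF-8 input. `read_username` is a hypothetical boundary function:

```python
raw = b"caf\xc3\xa9"  # bytes as they might arrive from a socket or file (UTF-8 for "café")

# Python 3 refuses to combine bytes and text implicitly. In Python 2, mixing them
# either silently produced bytes or triggered an implicit ASCII decode that could
# fail far from the real cause.
try:
    "Hello, " + raw
    mixed_ok = True
except TypeError:
    mixed_ok = False


def read_username(data: bytes) -> str:
    """Hypothetical boundary function: decode once, on the way in."""
    return data.decode("utf-8")


name = read_username(raw)
greeting = "Hello, " + name
print(mixed_ok, greeting)  # False Hello, café
```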

Instagram did not do this. They created utility functions to force their data into the format they wanted at the point it is used. In other places they used tuple() to make sure that map calls that had side effects were fully iterated over.
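The `tuple()` trick exists because `map()` became lazy in Python 3. In Python 2 it built a list immediately, so any side effects in the mapped function ran right away. This minimal sketch shows the difference. `record` is a hypothetical side-effecting function:

```python
seen = []


def record(x):
    """Hypothetical side-effecting function."""
    seen.append(x)
    return x * 2


# Python 3: map() returns a lazy iterator, so nothing has run yet.
lazy = map(record, [1, 2, 3])
seen_before_forcing = list(seen)
print(seen_before_forcing)  # []

# Forcing the iterator with tuple(), as the comment says Instagram did, runs every call.
tuple(lazy)
print(seen)  # [1, 2, 3]

# For side effects, a plain loop states the intent more clearly.
seen.clear()
for x in [1, 2, 3]:
    record(x)
print(seen)  # [1, 2, 3]
```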

In short, they had bad Python2 code and now have bad Python3 code. Sometimes, at large scale, that's your only choice. But for smaller companies looking at this, it's a bad idea: you're setting a precedent in your code that it's okay to make the same mistakes Python3 tried to prevent.

richard_todd · on June 15, 2017

So their server needs are growing faster than their user-base to the extent that they considered switching languages. PHP didn't seem to perform much better, so they stuck with python and got a ~12% CPU usage improvement by moving to python3. It doesn't seem like a 12% one-time improvement actually solves the original problem, though. Perhaps pypy would have been better?

williamstein · on June 15, 2017

Instagram also uses Cython a lot (so I've heard from a talk), so switching away from CPython might not provide as much of a speed up as one might otherwise expect, as one can get C levels of speed (and concurrency) with significant effort using Cython. Also, Cython and PyPy might not play so well together...

eggie5 · on June 15, 2017

Types in Python 3.5?! I had no idea -- that's exciting.

williamstein · on June 15, 2017

This was a big contribution from Dropbox: http://blog.zulip.org/2016/10/13/static-types-in-python-oh-m...

bpicolo · on June 15, 2017

The language itself supports annotations, without any actual typechecking. You use http://mypy-lang.org/ to get the static checking.
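Here is a minimal sketch of that split. The interpreter accepts and runs the second call below even though it contradicts the annotation. A static checker such as mypy, run separately over the file, is expected to report the mismatch. `double` is a made-up example function:

```python
def double(x: int) -> int:
    """Annotated to take and return an int; CPython does not enforce this."""
    return x * 2


print(double(21))    # 42
print(double("ab"))  # abab (runs without complaint despite the annotation)

# The annotations are simply stored on the function object.
print(double.__annotations__)  # {'x': <class 'int'>, 'return': <class 'int'>}
```

The exact wording of mypy's error for the `double("ab")` call depends on the mypy version.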

shalabhc · on June 15, 2017

There is also Google's pytype: https://github.com/google/pytype

bpicolo · on June 16, 2017

That looks like it adds types rather than checks?

shalabhc · on June 16, 2017

It does both. It can do some inference, but it definitely does type checking. Here's a PyCon talk from last year about pytype: https://youtu.be/IDm_YIQihhs

bpicolo · on June 16, 2017

It badly needs more documentation, heh. Why would I use it over mypy, which appears to have a lot more community investment, thorough docs, and is somewhat official? It also doesn't support Python 3.6, so it doesn't know about format strings =/

shalabhc · on June 19, 2017

pytype might work better in some cases, because it can do inference. But yes, it needs more docs.

joobus · on June 15, 2017

Note they are type annotations. They are for tooling/development only; the runtime doesn't care about the types at all.

dragonwriter · on June 15, 2017

> Note they are type annotations. They are for tooling/development only; the runtime doesn't care about the types at all.

Neither does, say, Haskell's runtime.

nrinaudo · on June 15, 2017

You say that like it's a bad thing, but that's what a static language is - types are checked at compile time and, hopefully, forgotten about at runtime.

jrs95 · on June 15, 2017

That's not entirely true. The information from the type annotations is available at runtime. ApiStar takes advantage of this, for example.

ubernostrum · on June 15, 2017

The information from annotations is exposed to the runtime, but nothing in the runtime treats them as types or performs any type checking. The feature was introduced as a generic way to annotate functions, without being constrained to a single use case (type checking), and all the tools which actually do type-checking based on annotations are third-party.
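A third-party tool can read the annotations at runtime (here via `typing.get_type_hints`) and choose to enforce them. The `enforce_types` decorator below is a hypothetical, deliberately simplified sketch. It is not how any particular library does it, and it only checks plain classes such as `int` or `str`, not generics like `List[int]`:

```python
import functools
import inspect
import typing


def enforce_types(func):
    """Hypothetical decorator: check plain-class annotations with isinstance at call time."""
    hints = typing.get_type_hints(func)
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes such as int or str are checked in this sketch.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(f"{name} must be {expected.__name__}, got {type(value).__name__}")
        return func(*args, **kwargs)

    return wrapper


@enforce_types
def double(x: int) -> int:
    return x * 2


print(double(21))  # 42
try:
    double("ab")
except TypeError as exc:
    print(exc)  # x must be int, got str
```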

randyrand · on June 15, 2017

C++ also doesn't do any type checking at runtime. Compile-time checking is all that really matters.

ubernostrum · on June 15, 2017

The CPython implementation does have a "compile" step (to produce bytecode, which is what actually gets executed, by a simple stack-based virtual machine), and does not do any type checking in that step no matter how many annotations you give it.
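You can see this with the built-in `compile()`, which produces the same kind of code object CPython runs. Source whose annotations are plainly contradicted still compiles and executes. The snippet is a made-up example:

```python
# Module source whose variable annotations are plainly violated.
source = '''
count: int = "definitely not an int"
label: str = 42
'''

code = compile(source, "<demo>", "exec")  # bytecode compilation: no type checking happens
namespace = {}
exec(code, namespace)                     # runs without error

print(namespace["count"], namespace["label"])  # definitely not an int 42
```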

petre · on June 15, 2017

Hmm, Perl6 has real types, although they're completely optional.

ice109 · on June 15, 2017

Which language's runtime cares about types?

nrinaudo · on June 15, 2017

Any dynamic language - by definition, they check types at runtime.

Or, unfortunately, a lot of static languages. Take any static language that allows type casting: checking at runtime is the only way it can tell whether a cast from a type to one of its descendants is valid (e.g., in Java, casting an Object to, say, a URL).

Chyzwar · on June 15, 2017

Type information is used in JS engines to compile into native code. If the type information is too vague or changes, you can get deoptimization [0].

[0]http://jayconrod.com/posts/54/a-tour-of-v8-crankshaft-the-op...

stefanpie · on June 15, 2017

The types don't actually speed up execution, but I find them useful in various parts of my codebase when I work on a project (for example, when defining classes that represent more advanced and complex data types for my specific project). They help me stay organized and make sure I'm not ambiguously passing in the wrong data types. PyCharm also uses them to understand the custom classes and functions you define and to provide suggestions.
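Here is a minimal sketch of that style, using `typing.NamedTuple` (the class syntax needs Python 3.6+). The `Photo` type and its fields are invented for illustration. Editors and checkers can read these annotations, but the interpreter does not enforce them:

```python
from typing import List, NamedTuple, Optional


class Photo(NamedTuple):
    """Hypothetical record type: the annotations document each field's type."""
    photo_id: int
    owner: str
    tags: List[str]
    caption: Optional[str] = None


def tagged_with(photos: List[Photo], tag: str) -> List[Photo]:
    """Return the photos carrying a given tag."""
    return [p for p in photos if tag in p.tags]


photos = [
    Photo(1, "alice", ["beach", "sunset"]),
    Photo(2, "bob", ["city"], caption="Downtown"),
]
print([p.photo_id for p in tagged_with(photos, "beach")])  # [1]
```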

seanwilson · on June 15, 2017

Anybody have experience of using this? More info: https://docs.python.org/3/library/typing.html

I've been using TypeScript a lot recently and it has had a big impact on reducing bugs and making refactoring easier so something similar for Python looks great.

shalabhc · on June 15, 2017

See https://youtu.be/7ZbwZgrXnwY for a talk from the mypy team.

Description: https://us.pycon.org/2017/schedule/presentation/678/

nyangosling · on June 15, 2017

>It made sense that, if we were going to stay on Python for the next ten years, we should invest in the latest version of the language.

I don't know why this stands out to me in particular, but a 10-year commitment is definitely a big decision. I suppose I've never had to make a similar decision, so perhaps this is more common than I think.

treve · on June 15, 2017

Most successful applications I've worked on have either already been around for 10 years or went on to become 10 years old. The ones that didn't were usually not successful.

It's a good reason to avoid the latest and greatest and go for tech that has (some) track record of being maintained and used for a few years.