Instagram Makes a Smooth Move to Python 3 (thenewstack.io)
218 points by fjordan on June 15, 2017 | 61 comments



“Yeah, Python is great in so many ways, too bad it’s not really scalable.”

I'm not even sure what this means anymore. I guess I'm just not sure how any language, when used correctly, could be inherently unscalable. My guess is statements like this came from a time when monoliths were the application design of choice? Now, assuming Instagram has just 1,000 photo handling servers, each one is only responsible for 95,000 photos a day.
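To put that number in per-second terms, here is a rough back-of-the-envelope calculation. It uses only the figures from the paragraph above (1,000 servers, 95,000 photos per server per day) and assumes the load is spread evenly over the day, which real traffic is not.

```python
# Back-of-the-envelope load per server, using the figures quoted above.
PHOTOS_PER_SERVER_PER_DAY = 95_000   # figure from the comment
SERVERS = 1_000                      # the comment's assumed fleet size
SECONDS_PER_DAY = 24 * 60 * 60

# Implied fleet-wide total and the evenly spread per-server rate.
total_per_day = PHOTOS_PER_SERVER_PER_DAY * SERVERS
per_server_per_second = PHOTOS_PER_SERVER_PER_DAY / SECONDS_PER_DAY

print(f"{total_per_day:,} photos/day across the fleet")
print(f"{per_server_per_second:.2f} photos/second per server")
```

That works out to roughly one photo per second per server on average, before accounting for peaks.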

Of course, that's not to say that Instagram doesn't have CAP issues. It does, especially in the "C" area, but again, not a problem inherent in the language.


The threading story isn't great. It's not statically typed, which can make working on a large codebase with more people more painful. It also forces you to write more unit testing code coverage to make up for the lack of a static compiler checking things for you. Raw performance is not good compared to golang or java.

When you start getting to a certain scale, developers are cheaper than your server costs in some cases. That is when something being performant is more worth it.


All of these are decades-old talking points which have been largely shown to be factually incorrect. Why are you repeating them without justification?


And at that scale you can make additional microservices for the critical path where it matters.


Not really. You'll notice a pattern in most tech bigcos where they move from dynamic lang X to a statically typed language that can multithread properly.

A few examples: Ruby on rails to java (twitter). Java & c++ (google). Java (linked in). Python/Node -> Java/Go (uber). Or they start doing silly things like make a new VM (facebook).


I heard through the grapevine that Go was only chosen at Uber to help with hiring.


That seems silly. There are far more Java devs out there than Go devs.

Also, I would assume Uber probably has hiring problems in general at this point.


> That seems silly. There are far more Java devs out there than Go devs.

I wouldn't underestimate the level of hype-driven development that exists in this area. "Chasing the new shiny" seems like it could be a line item in a resume, these days, sometimes.


There are more good Go devs. A lot of Java programmers are enterprise.


I would imagine that node would draw more enthusiasm than Go


I wonder if there is some survivor bias there. I'm sure there must be cases where everything went wrong but nobody writes about it.


Wouldn't survivor bias explain why BigCo X can still be found using whatever questionable choice they originally made, but not why lots of BigCos have changed their practices?

This might explain Facebook and PHP, but it doesn't explain the stuff mentioned in the previous comment.


Google uses plenty of Python, no?

I don't think threading is a relevant factor for many of those decisions. Perf and static typing, sure.


Not a googler, but I have heard that most of the python at google is under Youtube, which Google got in an acquisition.


From python to go and back again https://news.ycombinator.com/item?id=10402307 (mozilla)


Don't be obtuse, it's not because of multithreading.

They move to statically typed languages because their team is 100+ developers, none of whom knows the entire code base, and static typing helps reduce the number of possible bugs.


Plus Facebook has a lot of C++ and some Java services as well, in addition to the Hack/HHVM stuff.


Linkedin was mostly scala and then they did some stuff with node.js >___>

Walmart also moved to node.js.

Hype was real.


There is RoR for Jruby which has native threads, plus you can call Java methods from Ruby.


"It's not statically typed, which can make working on a large codebase with more people more painful. It also forces you to write more unit testing code coverage to make up for the lack of a static compiler checking things for you."

I never missed static typing on large code bases but I had numerous bugs caused by python's implicit type conversions - string to iterable of characters ["h", "e", "l", "l", "o"], None to False, string "no" to True, 0 to False, "" to False, etc.
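A small sketch of the two kinds of bug I mean. Strictly speaking these are iteration and truth-value testing rather than conversions, and `tag_all` is a made-up helper for illustration.

```python
def tag_all(item, tags):
    """Return (item, tag) pairs; `tags` is meant to be a list of strings."""
    return [(item, tag) for tag in tags]

# Passing a bare string instead of a list is accepted silently,
# because a str is itself an iterable of one-character strings.
pairs_ok = tag_all("photo", ["hello"])
pairs_bug = tag_all("photo", "hello")

# Truthiness: any non-empty string is true, including "no";
# None, 0 and "" are all false in a boolean context.
truthy = [bool(v) for v in ("no", None, 0, "")]

print(pairs_ok)        # one pair
print(len(pairs_bug))  # five pairs, one per character
print(truthy)
```

Neither case raises an error, which is why these tend to surface late.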

A lot of this mirrors C's infuriating implicit type conversions.


For what it is worth string to iterable char bugs happen to me plenty in scala. Hard problem to fix. Array of strings or string of chars?


I don't think it is that hard. Simply stop making strings implicitly iterable and make chars a different type.


See issues like https://engineering.instagram.com/dismissing-python-garbage-...

If you have to turn off a crucial language feature to increase performance, I'm not sure whether a language is considered "scalable".
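Going by the visible part of that link, the feature in question is the garbage collector. As a minimal sketch of what turning it off means in CPython (using only the standard `gc` module, and not claiming this is what Instagram's code looks like):

```python
import gc

was_enabled = gc.isenabled()
gc.disable()                      # stop automatic cycle collection
enabled_after_disable = gc.isenabled()

# Reference counting still frees non-cyclic objects immediately;
# only reference cycles now wait for an explicit collection.
a, b = [], []
a.append(b)
b.append(a)
del a, b                          # the two lists now form unreachable garbage
collected = gc.collect()          # manual pass; returns unreachable objects found

# Restore the previous setting.
if was_enabled:
    gc.enable()
```

So "off" here means cycles are no longer cleaned up automatically, not that memory is never freed.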


You can "brute force" scale pretty much anything just by throwing more computing resources at the problem - but that means the language/framework/library is less scalable than something else which requires fewer resources. If the difference in scalability is large enough, I suppose one can claim that one of the alternatives is "not really scalable".

I suppose that quote could use better wording.


As one example, PHP 5.x as-is (without caching) isn't a language that scales well at all. It becomes a horrible bottleneck under heavy load.


Even with caching it's quite poor, especially if you are using frameworks that aren’t written to properly take into account its unique means of execution (which the vast majority do not).


It's old but still useful: https://www.techempower.com/benchmarks/

Almost every language can handle typical performance requirements. But when people say slow or unscalable it's almost always in relation to other languages.


Members of the Py3 transition team (the authors of that article) gave a talk about the project at PyCon 2017: https://www.youtube.com/watch?v=66XoCk79kjM


I can't say I think this has ever been really true: "Performance speed is no longer the primary worry. Time to market speed is."

Performance has been a concern, but programmatic load balancing has been around for decades. When I worked at MSN/Linkexchange back in the late 90's we never really worried heavily about the performance of the language we used (Perl) because we could scale out servers. Perl isn't that speedy but it sure was easy to develop in. We served a billion and a half clicks per month with 8-10 machines from a single datacenter before I left, with Perl.
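For a sense of scale, here is the rough arithmetic behind those figures. It assumes a 30-day month and perfectly even traffic, neither of which is exact.

```python
# Rough request rate implied by 1.5 billion clicks/month on 8-10 machines.
CLICKS_PER_MONTH = 1_500_000_000
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumption: 30-day month

clicks_per_second = CLICKS_PER_MONTH / SECONDS_PER_MONTH
per_machine_low = clicks_per_second / 10    # spread over 10 machines
per_machine_high = clicks_per_second / 8    # spread over 8 machines

print(f"{clicks_per_second:.0f} clicks/second overall")
print(f"{per_machine_low:.0f} to {per_machine_high:.0f} clicks/second per machine")
```

That is a few hundred requests per second overall, and well under a hundred per machine on average.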


Right, you hear a lot of griping about moving from Python 2 to 3 but I personally didn't have as much trouble as expected. Some of my projects just worked. One small tip I don't think they mentioned. Start using the logging module instead of print and it will eliminate one class of potential issues.
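A minimal sketch of what I mean, using the standard `logging` module. The logger name is made up, and the `StringIO` stream just stands in for wherever your logs actually go.

```python
import io
import logging

# Hypothetical logger name, for illustration only.
log = logging.getLogger("photos.upload")

stream = io.StringIO()   # stands in for a real log destination
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
log.addHandler(handler)
log.setLevel(logging.INFO)

# Same call syntax on Python 2 and 3, unlike the `print` statement.
log.info("resized %d photos", 3)
log.debug("not emitted at INFO level")

output = stream.getvalue()
```

Every message also carries the logger name and level, so you can tell where it came from.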


I've heard people complain about print being a function now more than a handful of times and my response is always "are you really using print that much in your code base?"


You still use it in debuggers and in the REPL. Using it seldom makes it harder to relearn the muscle memory, and only hitting it when you're debugging makes it more likely that you're already frustrated when it happens.


This is what I encourage all of our team to do, it just seems like a no brainer. Having stray "testing" strings pop up in our logs without any source is hella annoying.


I've been writing mostly small python projects for 15 years, starting with 2.2.

I've had no issues in the migration to 3.

I did some building with it around 3.3, starting to commit around 3.4, and with 3.5 I build everything in it.

I don't have the performance challenges Instagram has, but my experience with application development in general is that 98% of performance challenges can be solved with (not-too) clever engineering. This applies to projects in every language.

There are a vanishingly small number of scenarios where the performance of your runtime actually dictates your performance limits.

If you're working on something and are worried about Python's performance, or which Python to use, don't. Use 3, optimize later.


I watched the PyCon keynote on this topic, and while it's nice to hear they've moved to Python3 their approach probably shouldn't be copied.

For example, in their codebase they had ambiguity between bytestrings and unicode strings. Python3 tries to prevent you from doing this, to resolve a big footgun from Python2.

The right fix here is to be consistent in your use of strings. Sometimes that is tricky because of how third party libraries have decided to implement their 2/3 compatibility, but it helps prevent shooting yourself in the foot with unicode bugs down the line.
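As a rough sketch of what "consistent" can look like: decode once where bytes enter the program, use `str` everywhere inside, and encode once on the way out. The sample data and the UTF-8 encoding are assumptions for illustration.

```python
raw = "café".encode("utf-8")   # bytes, as they would arrive from a socket or file

# Python 3 refuses to mix the two types instead of guessing an encoding.
try:
    "caption: " + raw
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Decode once at the input boundary, work with str inside,
# and encode once on the way out.
text = raw.decode("utf-8")
caption = "caption: " + text
wire = caption.encode("utf-8")
```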

Instagram did not do this. They created utility functions to force their data into the format they wanted at the point it is used. In other places they used tuple() to make sure that map calls that had side effects were fully iterated over.
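The `tuple()` trick presumably exists because `map()` became lazy in Python 3. A minimal sketch of the behaviour (my own toy example, not their code):

```python
seen = []

# In Python 3, map() returns a lazy iterator: nothing runs yet.
lazy = map(seen.append, [1, 2, 3])
before = list(seen)

# Wrapping in tuple() consumes the iterator, so the side effects happen.
tuple(lazy)
after = list(seen)

# A plain loop states the intent more directly.
for n in [4, 5]:
    seen.append(n)
```

Here `before` is still empty, and `after` holds all three items only because `tuple()` forced the iteration.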

In short, they had bad Python2 code and now have bad Python3 code. Sometimes, at large scale, it's your only choice. But to smaller companies looking at this it's a bad idea. You're setting a precedent in your code that it's okay to make the same mistakes that Python3 tried to prevent.


So their server needs are growing faster than their user-base to the extent that they considered switching languages. PHP didn't seem to perform much better, so they stuck with python and got a ~12% CPU usage improvement by moving to python3. It doesn't seem like a 12% one-time improvement actually solves the original problem, though. Perhaps pypy would have been better?


Instagram also uses Cython a lot (so I've heard from a talk), so switching away from CPython might not provide as much of a speed up as one might otherwise expect, as one can get C levels of speed (and concurrency) with significant effort using Cython. Also, Cython and PyPy might not play so well together...


types in python 3.5?! I had no idea -- that's exciting.


This was a big contribution from Dropbox: http://blog.zulip.org/2016/10/13/static-types-in-python-oh-m...


The language itself supports annotations, without any actual typechecking. You use http://mypy-lang.org/ to get the static checking.
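A tiny illustration, with a made-up function: the interpreter runs this without complaint, and it is only a separate checker such as mypy that would flag the call as passing a `str` where an `int` is declared.

```python
def double(n: int) -> int:
    return n * 2

# The annotation says int, but CPython does not enforce it:
result = double("ab")   # runs without error, repeating the string

print(result)
```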


There is also Google's pytype: https://github.com/google/pytype


That looks like it adds types rather than checks?


It does both. It can do some inference but definitely does type checking. Here's a Pycon talk from last year about pytype: https://youtu.be/IDm_YIQihhs


It needs severely more documentation, heh. Why would I use it over mypy, which appears to have a lot more community investment, thorough docs, and is somewhat official? It also doesn't support python3.6 so it doesn't know about format strings =/


pytype might work better in some cases, because it can do inference. But yes, it needs more docs.


Note they are type annotations. They are for tooling/development only; the runtime doesn't care about the types at all.


> Note they are type annotations. They are for tooling/development only; the runtime doesn't care about the types at all.

Neither does, say, Haskell's runtime.


You say that like it's a bad thing, but that's what a static language is - types are checked at compile time and, hopefully, forgotten about at runtime.


That's not entirely true. The information from the type annotations is available at runtime. ApiStar takes advantage of this, for example.


The information from annotations is exposed to the runtime, but nothing in the runtime treats them as types or performs any type checking. The feature was introduced as a generic way to annotate functions, without being constrained to a single use case (type checking), and all the tools which actually do type-checking based on annotations are third-party.
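To make that concrete, here is a sketch of how a third-party tool could read annotations at runtime. `resize` and `check_call` are hypothetical names, the check only handles plain classes (not generics like `List[int]`), and this is not how ApiStar or any particular library is implemented.

```python
import typing

def resize(width: int, height: int, name: str = "thumb") -> bytes:
    """Hypothetical function used only to carry annotations."""
    return b""

# The annotations are ordinary data attached to the function object.
hints = typing.get_type_hints(resize)

def check_call(func, **kwargs):
    """Illustrative runtime check built on the annotations (plain classes only)."""
    expected = typing.get_type_hints(func)
    for arg, value in kwargs.items():
        if arg in expected and not isinstance(value, expected[arg]):
            raise TypeError(f"{arg} should be {expected[arg].__name__}")
    return func(**kwargs)
```

Calling `resize(width="1", height=2)` directly still works; only going through `check_call` raises `TypeError`.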


C++ also doesn't do any type checking at runtime. Compile time checking is all that really matters.


The CPython implementation does have a "compile" step (to produce bytecode, which is what actually gets executed, by a simple stack-based virtual machine), and does not do any type checking in that step no matter how many annotations you give it.


Hmm, Perl6 has real types, although they're completely optional.


which language's run time cares about types?


Any dynamic language - by definition, they check types at runtime.

Or, unfortunately, a lot of static languages. Any static language that allows type casting, for instance - that's the only way they can check whether a cast from a type to one of its descendants is valid (eg, in Java, casting an Object to a, say, URL).


Type information is used in JS engines to compile into native code. If type information is too vague or changes, you can have deoptimization[0].

[0]http://jayconrod.com/posts/54/a-tour-of-v8-crankshaft-the-op...


The types don't actually speed up execution, but I find them useful in various parts of my codebase when I work on a project (for example when defining classes which represent more advanced and complex data types for my specific project). They help you stay organized and make sure you're not ambiguously passing in wrong data types. Also pycharm uses them to understand custom classes and functions you define and provide suggestions.


Anybody have experience of using this? More info: https://docs.python.org/3/library/typing.html

I've been using TypeScript a lot recently and it has had a big impact on reducing bugs and making refactoring easier so something similar for Python looks great.



>It made sense that, if we were going to stay on Python for the next ten years, we should invest in the latest version of the language.

I don't know why this stands out to me in particular, but the 10-year commitment is definitely a big decision. I suppose I've never had to make a similar decision so perhaps this is more common than I think.


Most successful applications I've worked on have either already been around for 10 years, or have lived on to become 10 years old. The ones that didn't were usually not successful.

It's a good reason to be careful to avoid the latest and greatest, but go for tech that has (some) track record of being maintained and used for a few years.



