“Yeah, Python is great in so many ways, too bad it’s not really scalable.”
I'm not even sure what this means anymore. I guess I'm just not sure how any language, when used correctly, could be inherently unscalable. My guess is statements like this came from a time when monoliths were the application design of choice? Now, assuming Instagram has just 1,000 photo handling servers, each one is only responsible for 95,000 photos a day.
Of course, that's not to say that Instagram doesn't have CAP issues. It does, especially in the "C" area, but again, not a problem inherent in the language.
The threading story isn't great. It's not statically typed, which can make working on a large codebase with more people more painful. It also forces you to write more unit tests to make up for the lack of a static compiler checking things for you. Raw performance is not good compared to Go or Java.
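To make the threading point concrete, here's a minimal sketch (the function names are mine, not from the thread) of a CPU-bound job where threads serialize on the GIL while a process pool actually uses multiple cores:

    import time
    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

    def busy(n=5000000):
        # CPU-bound pure-Python loop, so it holds the GIL the whole time
        total = 0
        for i in range(n):
            total += i
        return total

    def timed(executor_cls):
        start = time.perf_counter()
        with executor_cls(max_workers=4) as ex:
            list(ex.map(busy, [5000000] * 4))
        return time.perf_counter() - start

    if __name__ == "__main__":
        print("threads:  ", timed(ThreadPoolExecutor))    # roughly 4x one run of busy()
        print("processes:", timed(ProcessPoolExecutor))   # roughly one run, plus overhead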
When you start getting to a certain scale, in some cases developers become cheaper than your server costs. That's when making something performant becomes worth it.
Not really. You'll notice a pattern in most tech bigcos where they move from dynamic lang X to a statically typed language that can multithread properly.
A few examples: Ruby on Rails to Java (Twitter). Java & C++ (Google). Java (LinkedIn). Python/Node -> Java/Go (Uber). Or they start doing silly things like making a new VM (Facebook).
> That seems silly. There are far more Java devs out there than Go devs.
I wouldn't underestimate the level of hype-driven development that exists in this area. "Chasing the new shiny" sometimes seems like it could be a line item on a resume these days.
Wouldn't survivor bias explain why BigCo X can be found using whatever questionable choice they originally made, but not explain why lots of BigCos have changed their practices?
This might explain Facebook and PHP, but it doesn't explain the stuff mentioned in the previous comment.
Don't be obtuse, it's not because of multithreading.
They move to static typing because their team is 100+ developers, none of whom knows the entire code base - and static typing helps reduce the bugs that are possible.
"It's not statically typed, which can make working on a large codebase with more people more painful. It also forces you to write more unit testing code coverage to make up for the lack of a static compiler checking things for you."
I never missed static typing on large code bases, but I've had numerous bugs caused by Python's implicit conversions - a string treated as an iterable of characters ["h", "e", "l", "l", "o"], None to False, the string "no" to True, 0 to False, "" to False, etc.
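For anyone who hasn't been bitten by these, a quick sketch of the behaviours being described (the examples are mine):

    # Strings are iterable, so code expecting a list of items quietly
    # iterates over characters instead
    print(list("hello"))                   # ['h', 'e', 'l', 'l', 'o']

    # Truthiness collapses several distinct values into "falsy"...
    print(bool(None), bool(0), bool(""))   # False False False

    # ...while any non-empty string is truthy, including "no" and "false"
    print(bool("no"), bool("false"))       # True True

    # Typical bug: `if not result:` can't distinguish "no result" (None)
    # from "empty result" ([])
    def handle(result):
        if not result:
            return "nothing to do"
        return "processing %d items" % len(result)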
A lot of this mirrors C's infuriating implicit type conversions.
You can "brute force" scale pretty much anything just by throwing up more computing resources at the problem - but that means the language/framework/library is less scalable than something else which requires less resources. If the different in scalability is large enough, I suppose one can claim that one of the alternatives is "not really scalable".
Even with caching it's quite poor, especially if you are using frameworks that aren’t written to properly take into account its unique means of execution (which the vast majority do not).
Almost every language can handle typical performance requirements. But when people say slow or unscalable it's almost always in relation to other languages.
I can't say I think this has ever been really true:
"Performance speed is no longer the primary worry. Time to market speed is."
Performance has been a concern, but programmatic load balancing has been around for decades. When I worked at MSN/LinkExchange back in the late 90's we never really worried heavily about the performance of the language we used (Perl) because we could scale out servers. Perl isn't that speedy, but it sure was easy to develop in. We served a billion and a half clicks per month with 8-10 machines from a single datacenter before I left, with Perl.
Right, you hear a lot of griping about moving from Python 2 to 3 but I personally didn't have as much trouble as expected. Some of my projects just worked. One small tip I don't think they mentioned. Start using the logging module instead of print and it will eliminate one class of potential issues.
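A minimal sketch of that tip (the function name is made up): logging gives you levels, timestamps, and one place to redirect or silence output, which stray print() calls never do.

    import logging

    # Configure once, e.g. in the entry point of the application
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    )
    log = logging.getLogger(__name__)

    def resize_photo(path):
        log.debug("resizing %s", path)   # hidden unless the level is DEBUG
        log.info("resized %s", path)     # shows up with a timestamp and source
        # versus print("resized", path): no level, no source, no off switch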
I've heard people complain about print being a function now more than a handful of times, and my response is always "are you really using print that much in your code base?"
You still use it in debuggers and in the REPL. Using it seldom makes it harder to relearn the muscle memory, and only hitting it when you're debugging makes it more likely that you're already frustrated when it happens.
This is what I encourage all of our team to do, it just seems like a no brainer. Having stray "testing" strings pop up in our logs without any source is hella annoying.
I've been writing mostly small python projects for 15 years, starting with 2.2.
I've had no issues in the migration to 3.
I did some building with it around 3.3, started committing to it around 3.4, and since 3.5 I build everything in it.
I don't have the performance challenges Instagram has, but my experience with application development in general is that 98% of performance challenges can be solved with (not-too) clever engineering. This applies to projects in every language.
There are a vanishingly small number of scenarios where the performance of your runtime actually dictates your performance limits.
If you're working on something and are worried about Python's performance, or which Python to use, don't. Use 3, optimize later.
I watched the PyCon keynote on this topic, and while it's nice to hear they've moved to Python3 their approach probably shouldn't be copied.
For example, in their codebase they had ambiguity between bytestrings and unicode strings - exactly the kind of thing Python 3 tries to prevent, in order to close a big footgun from Python 2.
The right fix here is to be consistent in your use of strings. Sometimes that is tricky because of how third party libraries have decided to implement their 2/3 compatibility, but it helps prevent shooting yourself in the foot with unicode bugs down the line.
Instagram did not do this. They created utility functions to force their data into the format they wanted at the point it is used. In other places they used tuple() to make sure that map calls that had side effects were fully iterated over.
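Roughly what those two workarounds look like - my reconstruction, not Instagram's actual code:

    def ensure_text(value, encoding="utf-8"):
        # Coerce whatever arrives (bytes or str) into str at the point of use,
        # instead of keeping the codebase consistent about which type it passes around
        if isinstance(value, bytes):
            return value.decode(encoding)
        return value

    seen = []

    def record(item):
        # Stand-in for a map() call used purely for its side effects
        seen.append(item)

    # Python 2's map() was eager; Python 3's is lazy, so the side effects only
    # happen when something consumes the iterator. Wrapping it in tuple() forces
    # the iteration without removing the side-effecting map in the first place.
    tuple(map(record, ["a", "b", "c"]))
    assert seen == ["a", "b", "c"]

    print(ensure_text(b"caf\xc3\xa9"))   # 'café'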
In short, they had bad Python2 code and now have bad Python3 code. Sometimes, at large scale, that's your only choice. But for smaller companies looking at this, it's a bad idea. You're setting a precedent in your code that it's okay to make the same mistakes that Python3 tried to prevent.
So their server needs are growing faster than their user-base to the extent that they considered switching languages. PHP didn't seem to perform much better, so they stuck with python and got a ~12% CPU usage improvement by moving to python3. It doesn't seem like a 12% one-time improvement actually solves the original problem, though. Perhaps pypy would have been better?
Instagram also uses Cython a lot (so I've heard from a talk), so switching away from CPython might not provide as much of a speed up as one might otherwise expect, as one can get C levels of speed (and concurrency) with significant effort using Cython. Also, Cython and PyPy might not play so well together...
It badly needs more documentation, heh. Why would I use it over mypy, which appears to have a lot more community investment, thorough docs, and is somewhat official? It also doesn't support Python 3.6, so it doesn't know about format strings =/
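For context, the 3.6 feature in question is formatted string literals (f-strings), which a checker that only parses 3.5 syntax will reject outright:

    name, count = "photos", 95000

    # Python 3.6+ formatted string literal
    print(f"{name}: {count}")

    # Pre-3.6 equivalent that any tool can parse
    print("{}: {}".format(name, count))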
You say that like it's a bad thing, but that's what a static language is - types are checked at compile time and, hopefully, forgotten about at runtime.
The information from annotations is exposed to the runtime, but nothing in the runtime treats them as types or performs any type checking. The feature was introduced as a generic way to annotate functions, without being constrained to a single use case (type checking), and all the tools which actually do type-checking based on annotations are third-party.
The CPython implementation does have a "compile" step (to produce bytecode, which is what actually gets executed, by a simple stack-based virtual machine), and does not do any type checking in that step no matter how many annotations you give it.
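A quick sketch of what "exposed to the runtime but not enforced" means in practice:

    def scale(value: int, factor: float = 2.0) -> float:
        return value * factor

    # The annotations are just stored on the function object...
    print(scale.__annotations__)
    # {'value': <class 'int'>, 'factor': <class 'float'>, 'return': <class 'float'>}

    # ...and nothing in the interpreter checks them:
    print(scale(3))          # 6.0
    print(scale("ab", 2))    # 'abab' - the int annotation is silently ignored;
                             # only an external checker like mypy would flag this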
Any dynamic language - by definition, they check types at runtime.
Or, unfortunately, a lot of static languages. Any static language that allows type casting, for instance - that's the only way they can check whether a cast from a type to one of its descendants is valid (eg, in Java, casting an Object to a, say, URL).
The types don't actually speed up execution, but I find them useful in various parts of my codebase when I work on a project (for example when defining classes which represent more advanced and complex data types specific to my project). It helps me stay organized and make sure I'm not ambiguously passing in wrong data types. Also, PyCharm uses them to understand custom classes and functions you define and to provide suggestions.
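A sketch of the kind of annotations being described, with made-up class names; they cost nothing at runtime, but a checker or PyCharm can use them for completion and warnings:

    from typing import List, Optional

    class Photo:
        def __init__(self, url: str, width: int, height: int) -> None:
            self.url = url
            self.width = width
            self.height = height

    class Feed:
        def __init__(self, photos: List[Photo]) -> None:
            self.photos = photos

        def latest(self) -> Optional[Photo]:
            # The annotation tells both the reader and the IDE that callers
            # must handle None, at no runtime cost
            return self.photos[-1] if self.photos else None

    feed = Feed([Photo("https://example.com/a.jpg", 640, 640)])
    photo = feed.latest()
    if photo is not None:
        print(photo.url)   # PyCharm can autocomplete .url from the annotations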
I've been using TypeScript a lot recently and it has had a big impact on reducing bugs and making refactoring easier so something similar for Python looks great.
>It made sense that, if we were going to stay on Python for the next ten years, we should invest in the latest version of the language.
I don't know why this stands out to me in particular, but the 10-year commitment is definitely a big decision. I suppose I've never had to make a similar decision, so perhaps this is more common than I think.
Most successful applications I've worked on have either already been around for 10 years or have lived on to become 10 years old. The ones that didn't were usually not successful.
It's a good reason to be careful to avoid the latest and greatest, and instead go for tech that has (some) track record of being maintained and used for a few years.