Yes, yes, yes! This is what I've been waiting for for so long.
Python has prioritized readability of the reference implementation over the practical benefits of better performance for everyone, and the community is really hurting for it.
Please CPython, upstream this or something like it. There comes a point where this whole zeal about readability becomes idealistic and hurts real-world pragmatic goals. It's hurting the language and we can do better.
Python can be ahead in the performance race. We just need to get real.
> Python can be ahead in the performance race. We just need to get real.
I don't know if this is the case. Python's entire value proposition is that it is "executable pseudocode" with an extremely low barrier to entry - so somebody like a scientist or business analyst can solve problems without a deep understanding of computer science. Those goals are always going to run counter to performance, and python would have to re-invent itself at a foundational level to actually compete on performance with languages which were optimized for it from the ground up. Python's already winning in a lot of super relevant ways from playing to its strengths, and I don't think it makes sense to compromise those strengths to reach middle-of-the-pack performance.
If you ask me, if the python community really wanted to advance their interests, the thing to focus on would be dependency management and project encapsulation. If I could have a ux like npm or cargo for python, I would surely be tempted to use it more outside of jupyter. But it would not be because of performance ;)
You and the parent are mistaken about the cause of Python’s performance problems. Python isn’t slow because it’s readable—it could be quite a lot faster without compromising readability. Python is slow because it exposes the entire CPython interpreter as the C-extension API, which means they can’t change much about the interpreter without breaking compatibility with some extensions (and they are unwilling to do so). Since performance is so bad, the community leans hard on C-extensions, which worsens the problem.
I suspect one of the reasons packaging is so bad is that for each node in the dependency tree, you need to download and execute setup.py in order to discover the node's direct dependencies, and to get the transitive dependencies you must download and execute the direct dependencies' setup.py files. Since these files aren't deterministic, we can't reliably cache the results (these scripts could return different sets of dependencies based on the environment or even the current time or anything else). This at least seems to be the reason all reproducible package managers are slow (by which I mean 30+ minutes to resolve dependency versions for a small non-toy project, even if the resolved dependencies at their proper versions are already cached to disk).
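To illustrate (a contrived sketch; the package and its dependencies are made up), a setup.py is free to do things like this, which is exactly why a resolver can't trust cached answers:

```python
# Hypothetical setup.py showing why dependency resolution can't be cached:
# the declared dependencies can differ from run to run.
import sys
from datetime import date

from setuptools import setup

deps = ["requests>=2.0"]
if sys.platform == "win32":    # environment-dependent dependency
    deps.append("pywin32")
if date.today().year >= 2022:  # even time-dependent, in the worst case
    deps.append("some-backport")

setup(name="example", version="0.1", install_requires=deps)
```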
I think you and the first poster are talking about different things. They are talking about readability of the CPython compiler, and you are talking about readability of Python code.
Python has historically refused to implement many optimizations because they would reduce the readability of the compiler.
> Python's entire value proposition is that it is "executable pseudocode" (...) so somebody like a scientist (...)
LOL'd hard at that.
What kind of scientist would write "import numpy as np" in their pseudocode? Or multiply matrices with the "@" operator? From the point of view of a scientist, Matlab/Octave, or even Fortran, is executable pseudocode. Numerical stuff in Python seems like an ugly kludge.
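For instance, a least-squares solve via the normal equations ends up looking like this in numpy (a contrived sketch with made-up data):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])

# x = (A^T A)^{-1} A^T b -- judge for yourself whether this is "pseudocode"
x = np.linalg.inv(A.T @ A) @ A.T @ b
```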
As a scientist who had to use MATLAB up until about 2013 because that's what everyone else used, it was such a relief to move to Python. It's true that you can implement a linear algebra routine in a couple fewer characters in MATLAB, but unless that's literally all you're doing, Python is much nicer to work with. The data structures in MATLAB are just a nightmare for general purpose programming, which makes things like loading and parsing data -- things that scientists often need to do -- just terrible. The fact that the notion of the "matrix" (i.e. a 2D array rather than a general ND array) is so deeply baked into everything is a huge headache (a scalar is a 1x1 matrix in MATLAB!). I'm also surprised anyone is particularly bothered by the @ operator. The asterisk for matrix multiplication seems roughly equally unheard of in nicely typeset / handwritten math (you would just write e.g. Ab for a matrix-vector multiplication with no inset operator).
I shudder when I think of my Matlab days, but I've programmed long enough in enough other languages to know that we have better options than either Matlab or Numpy/Pandas with respect to API design. We can also get much better performance than Python with many other languages.
I don't know why you got downvoted, but I agree that matrix manipulation in numpy looks quite foreign to somebody new to python/numpy. I had heard this complaint from several colleagues (mostly former Matlab users). Several colleagues have migrated to Julia, which does quite a good job of producing easily readable numerical code. For instance, the dot product can be written with the Unicode dot symbol (typed as \cdot<tab>).
> matrix manipulation in numpy looks quite foreign to somebody new to python/numpy
Not only to newcomers. I've been using python+numpy for more than 10 years, nearly every day, and it drives me to despair as much as it did on day one, if not more. I relish the few moments when I get to write numeric algorithms in any other language.
Julia's startup time is not short, but it's getting significantly shorter with every release. Right now, the startup overhead is only 0.13 seconds for me.
Some packages still take a while to load, e.g. for me to load the plotting package and produce the first plot takes about 8 seconds (all subsequent plots in that session are fast). This is down from something like 30 seconds last year.
One can also bundle hefty packages like Plots.jl into their Julia system image so they don't have to recompile all that machinery every time they restart Julia.
The startup performance issue is just a regular difficult issue, as far as I'm aware. I don't think it has much to do with the goal of "being readable pseudocode" or something. It's getting better with time too.
The "readable pseudocode" kind of code is exactly the sort of code in Julia that you almost always expect to be compiled down to native code quite efficiently. The kind of pseudocode I usually see is either straightforward loops iterating over something where the compiler can infer all the necessary information, or calls to library functions where somebody else has already made sure it's good. I use this a lot in my own code, and, like I said, I don't think there is a tradeoff.
So in my mind, the main things you would need to compete with npm would be:
1. near-universal adoption
2. "just works" experience: i.e. I can clone any random git repo, and run `npm start|build|whatever` and it's always going to work, without having to know anything about my environment or fiddling with configurations.
In most npm projects, you don't depend on C extensions. For python projects that don't depend on C extensions, you already have your "just works" experience with poetry.
If you really want a "just works" experience when dealing with C extensions, you should freeze your development environment one layer up (e.g. VMs, vagrant images, what have you) so that you can always successfully install and compile your C extensions because the underlying system is also kept under version control. And this is independent of whether or not you are working on a Poetry or an NPM based project.
It's not that hard. I suspect that you're just used to simpler conditions.
From my experience, with repos that have Poetry set up, I do get a "it just works" experience that I've previously been lacking when working with python. AFAIK there is no "scripts" section (for e.g. "poetry start" like "npm start") in the config file, or at least no one is using them. You'll have to take a quick peek into the README, but the same is true for npm projects, as the script verbs aren't really standardized.
You can use a "scripts" section in pyproject.toml (it's what poetry reads, and more and more Python tools are leveraging it, which is good), but in general each repository/project exposes whatever commands it wants; there's nothing comparable to the de facto standard verbs for npm and similar. Personally I have used it to give easy access to the "main thing I want to execute" in my own weird projects.
I'm pretty happy using Poetry, and agree with your first point, partially with the second. There can be some rough edges with packages involving anything binary (to be fair, that is to be expected), and a big issue (for me) is locking the resolution of some libraries, like boto (since botocore has hundreds of patch releases and the resolver can get pretty crazy unless you do a bit of manual bisection). But this only hits you on poetry lock or poetry add when developing, and only in some cases; I think it's a fair price to pay for a reproducible build.
> AFAIK there is no "scripts" section (for e.g. "poetry start" like "npm start") in the config file, or at least no one is using them.
There is a scripts section in pyproject.toml, which is leveraged by "poetry run". This is different from npm, though, since these aren't dev-environment scripts but the executable scripts that are installed with the project.
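Something like this (package and function names are made up):

```toml
# pyproject.toml: "mytool" becomes a console script installed with the
# package, runnable during development via `poetry run mytool`.
[tool.poetry.scripts]
mytool = "mypackage.cli:main"
```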
For the case of Python packages that primarily expose a command-line script, alluded to in the `start|build|whatever`, there is pipx: https://github.com/pipxproject/pipx
It seems a lot faster than pipenv.
From the little I've tried, it seems a lot more sane - but I'm biased - I hated pipenv and never saw the point of it.
It used to be incredibly slow, but these days I find it's not so bad. It even has parallel pip installs now, which, while occasionally triggering some issues in projects with overcomplicated setup.pys, is usually a lot faster than a good old pip install -r requirements.txt.
So have you considered the possibility that it may not be the fault of these package managers, but that there may be a different problem underneath it all that a package manager cannot fix for you?
It's a very common misconception that readability is a tradeoff against performance.
The true underpinning of readability is language malleability, nothing more and nothing less. The whole "dynamic languages are automatically more readable" impression is a misleading consequence of the much more general claim "malleable languages are more readable", with a malleability = dynamism substitution.
Malleability is key because readability is really just languages being as close as possible to the problems they are used to solve. This means one of two things:
1) Either the language comes pre-equipped with the concepts and semantics of the problem domain built into its fabric; this is the DSL approach. (This will always fail if you aspire to be a "general purpose language"; you simply can't hope to match the sheer number of contexts people want to use your language in. One radical conclusion is to abandon the "general purpose language" myth absolutely, make all application development language-building, and focus on building the tools to make language building easy.)
2) Or you make the language malleable and stretchy enough that a tiny handful of the rare programmer-domain-expert breed can construct the whole domain inside the language, _with the language_ (no transpiler, preprocessor, etc., though this requires the rare programmer-expert to be an even rarer programmer-expert-language-hacker), then cover up all the low-level machinery with syntactic elements that mirror the domain vocabulary.
That's it. Any language that allows you to do the above is a malleable language. Any malleable language is a candidate for being a readable language (and a horribly unreadable language, if you give in to irresponsible abstractions). Dynamic languages are malleable because they give you extremely powerful hooks into their semantics; they are very... well, dynamic. The details differ; the two poster children are Python and Ruby, which have a grab bag of features ranging from operator overloading in Python and free-form syntax in Ruby to extreme dynamic dispatch and resolution rules in both. The last feature is a common theme in all dynamic languages, and it happens to be a performance killer.
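To make the Python case concrete, here's a minimal sketch (my own toy example) of how operator overloading bends the surface syntax toward a domain vocabulary, paid for with dynamic dispatch on every use:

```python
class Vec:
    """A toy vector type that borrows Python's operators for domain syntax."""

    def __init__(self, *xs):
        self.xs = xs

    def __add__(self, other):     # overloads `+` as elementwise addition
        return Vec(*(a + b for a, b in zip(self.xs, other.xs)))

    def __matmul__(self, other):  # overloads `@` as a dot product
        return sum(a * b for a, b in zip(self.xs, other.xs))

v = Vec(1, 2) + Vec(3, 4)  # domain-flavored syntax...
d = Vec(1, 2) @ Vec(3, 4)  # ...resolved dynamically, at runtime, every time
```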
But you absolutely don't have to be dynamic to be malleable. Lisps have been doing it since forever, and though Lisps are traditionally dynamic, it's macros that make them stretchy, not dynamism. And there are loads of other mechanisms that can make a language malleable without making it unpredictable. It can be as simple as Scala allowing Unicode in its source, or as involved as Haskell making lazy evaluation the default so that control structures are just a library. Any abstraction with a "meta" flavor, with hooks extending into the language environment and doing various things depending on how you wield them, makes the language more malleable.
Malleability is allowing the language's source to take different shapes and semantics according to the whims of the programmer; performance comes from massaging the source until it fits comfortably into machine semantics without much runtime shenanigans. They are absolutely not in tension; making the code do everything at runtime is just the easy way out.
So much stuff just from the readme would introduce breaking changes to the Python ecosystem.
Part of the reason Instagram can get away with this is that they likely have almost complete control over their dependencies and the like. But the changes they've decided to make would not just work for everyone.
That being said, I think we could get a lot of neat PEPs from this.
EDIT: this is just my opinion, but in a world where we have type annotations, JITs feel like a massive step back. Stuff like mypyc could get us way further into high performance stuff (and no black-box hand-waving for perf things).
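For the unfamiliar, mypyc compiles ordinary annotated Python to a C extension; roughly, a fully typed function like this trivial sketch can be compiled with `mypyc fib.py` and still runs unchanged under plain CPython:

```python
# Plain typed Python: mypyc can use these annotations to generate a C
# extension, while CPython simply ignores them at runtime.
def fib(n: int) -> int:
    a: int = 0
    b: int = 1
    for _ in range(n):
        a, b = b, a + b
    return a
```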
> So much stuff just from the readme would introduce breaking changes to the Python ecosystem.
Being compatible with the rest of the Python ecosystem is the main reason why Cinder is built on top of CPython. Although yes, some features are indeed very experimental.
> in a world where we have type annotations, JITs feel like a massive step back. Stuff like mypyc could get us way further into high performance stuff
Ah, but that introduces a separate compilation step, which may not be tolerable in every situation.
Why would developers have to interact with a mypyc step any more than the pyc step? Why is “developers might have to interact with it” some kind of non-starter, as though having a compile phase is a worse evil than a hyper-slow language?
FWIW, I think we could probably buy ourselves a lot of latitude to optimize CPython by designating a much smaller API surface (like HPy) and then optimizations largely won't have to worry about breaking compatibility with C extensions (which seems to be the biggest reason CPython is unoptimized).
But in general I’ve lost faith in the maintainers’ leadership to drive through this kind of change (or similarly, to fix package management), so I’ve moved on to greener pastures (Go for the most part, with some Rust here and there) and everything is just so easy nowadays compared to my ~15 years as a Python developer.
> Why is “developers might have to interact with it” some kind of non-starter, as though having a compile phase is a worse evil than a hyper-slow language?
For big monoliths (like ours at IG), the server start-up can take more than 10sec, which is already super high for an "edit -> refresh" workflow. Introducing a Cython-like compilation step is really a major drawback for every single developer.
For smaller projects, Cython works extremely well (and we do use it for places where we need to interface with C/C++).
> For big monoliths (like ours at IG), the server start-up can take more than 10sec, which is already super high for an "edit -> refresh" workflow. Introducing a Cython-like compilation step is really a major drawback for every single developer.
So we weren’t talking about Cython specifically, but something Cython-like, i.e., we’re not talking about Cython’s special syntax but rather ordinary Python. This is important because it means that dev builds execute against CPython directly (i.e., your code begins executing immediately) while production builds use our hypothetical AOT compiler.
According to the PyPy people, the type annotations also don't provide the kind of information the JIT wants. The JIT cares about things like "this integer fits into a machine word" and "this parameter can be None but is almost always not None" and so on. The type system doesn't concern itself with that.
It’s definitely true that there’s stuff a JIT would love to know that the type system can’t tell you. But that doesn’t mean there isn’t useful information available in type annotations. In particular it is possible to speed up attribute access and method/function calls a lot when target types are statically known.
Our approach to “int fits in a machine word” in Static Python is to require machine ints to be explicitly annotated as such, and then you opt in to limited size ints with overflow etc, in exchange for getting really fast arithmetic.
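Roughly (a simplified sketch; see the Cinder README for real examples, and note this only compiles under Cinder's Static Python, not plain CPython):

```python
from __static__ import int64  # Cinder-only import; a machine-word integer

def checksum(n: int64) -> int64:
    total: int64 = 0
    i: int64 = 0
    while i < n:
        total = total + i  # machine-word arithmetic with overflow semantics
        i = i + 1
    return total
```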
I mean if you care about perf you can do it in one step. It’s not that huge of an ask, and engineering wise “compile some typed code” is a hell of a lot more straightforward than “try to guess what I should speculatively compile at runtime”
There was a piece by V8 devs a while back where they ended up turning down JIT aggressiveness because it was making normal web pages slower (except for the likes of Google Docs).
This is just me backseat-BDFLing “why do the hard thing when we could do the easy thing” tho
Typing a large legacy codebase written in a highly dynamic language is not as easy. Instagram is pretty large.
I have seen a very large codebase in a dynamic language at [redacted] that has been mostly converted to use a sophisticated type system. Certain core things have not been converted so far, though, but just marked with a lot of "proceed with caution" and "TODO" red flags. Making them typesafe in any conceivable way would break compatibility for large subsystems.
Sometimes a rewrite is the only way out of such a situation, but very few can afford it.
> Typing a large legacy codebase written in a highly dynamic language is not as easy.
> very few can afford it.
Starting a project in a highly dynamic language is a conscious choice with well-known properties. The second part is interesting, however. Essentially you are saying that gains from development velocity do not offset losses from technical debt, which is a strange observation. I would say it sort of hints at time-to-MVP being an irrelevant business metric (startup runway/survivability aside) at best. This is contrary to typical startup truths.
At a certain scale, they do. Case in point: Twitter. They went through a painful transition but they did not have much choice.
But sometimes they don't. Case in point: YouTube, Instagram. Both of them are slowly migrating more and more Python code to different languages, but the amount of Python is still large, and AFAIK neither has plans to eschew it completely.
Time to MVP is a very relevant business metric. But the same thing that speeds you up in the beginning slows you down later on. If you architect your code past MVP to help replace the implementations of critical paths, it may help you down the line. (Micro)service architecture is one way to do that.
With Static Python we use type annotations in compilation, and we require them to be correct (they are runtime checked at boundaries with non-Static code and throw TypeError if wrong types are passed in.)
We haven’t gone the hidden classes route so far because it’s simpler to just look at attributes assigned in __init__ and annotated on the class and lock those into slots. If you’re writing typed Python you probably don’t do a lot of tacking extra ad hoc attributes onto instances, since type checkers don’t like that either.
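In effect the compiler derives what you'd otherwise write by hand with __slots__ in plain Python; a rough sketch of the equivalent (my example, not Cinder-specific syntax):

```python
class Point:
    __slots__ = ("x", "y")  # what locking annotated attributes into slots
                            # amounts to in plain Python

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

p = Point(1.0, 2.0)
# p.z = 3.0  # would raise AttributeError: no tacking on ad hoc attributes
```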
> Please CPython, upstream this or something like it. There comes a point where this whole zeal about readability becomes idealistic and hurts real-world pragmatic goals. It's hurting the language and we can do better.
I don't know where you are getting this from. What's hurting performance is not readability, it is the flexibility of what you can do (hence: strict modules/Static Python in Cinder), which means it is really hard to perform optimizations, as no assumptions hold unless you turn off the more permissive language features.
Another one is backwards compatibility, and I might get some flak from people still complaining about it more than 10 years after python 3.0, but things are still 99.9% compatible with decades-old code in python 3.9, and you can’t throw that away.
Finally, the number of people dedicated to improving CPython performance in an upstreamable fashion is ridiculously low compared to other languages, and especially compared to the number of businesses using it. There are quite a few people working on performance, but not many of them appear to be doing it in the open with the goal of upstreaming it in a non-breaking manner (kudos to the Instagram engineers, I guess).
> Finally, the number of people dedicated to improving CPython performance in an upstreamable fashion is ridiculously low compared to other languages
Yeah, because Guido and the maintainers have proclaimed multiple times that they prefer a simple implementation for teachability over performance-minded design. That's where this entire complaint is coming from.
This is why I see Julia's uptake as positive: if that isn't going to change the community's aversion to these attempts, and they'd rather keep writing C, nothing else will.
I was doing a HackerRank puzzle yesterday in Python, and after I optimised it, it still timed out on one input. Looked at the chat, saw a comment "just run it with PyPy", and naturally it then passed all parts.
It's easy to forget that CPython is not that optimised generally, but as soon as you need to deal with algorithmic code, it often falls short.
I'd even say that the sheer number of C implementations in libraries shows that the lack of optimisation for the sake of clean compiler code just manifests as readability/complexity issues elsewhere.
> It's easy to forget that CPython is not that optimised generally,
"Not that optimised" is actually an understatement. In my experience, CPU-heavy code typically becomes 10-100x faster if you port it from Python to a language like C (without even trying to be clever or using SIMD assembly).
That would indeed be fantastic. V8 has had so much optimization whereas CPython has had very little of it, though PyPy deserves a mention here. They've been doing fantastic work, but I guess there's always the curse of the reference implementation being the popular one.
I disagree; it was the Nim syntax that kept me away from the language back in uni when I tried it out. If anything, C syntax is barely as easy to read.