Cython 3.0 Released (cython.readthedocs.io)
221 points by ngoldbaum on July 18, 2023 | 82 comments



Amazing to read that most comments here are about broken build dependencies. I completely get the frustration, but it's somewhat sad to see so little talk about the actual improvements.

There seem to be a lot of cool things in this. For example, semantics for division, the power operator, print, classes, types and subscripting are now identical to Python 3: https://cython.readthedocs.io/en/latest/src/changes.html#imp...

Improved interaction with numpy by moving that part of the code into numpy itself: https://cython.readthedocs.io/en/latest/src/changes.html#int...

Improved support for const and volatile in C, and improved support for C++ expressions like std::move: https://cython.readthedocs.io/en/latest/src/changes.html#id6

I love how the changelog also shows the bugfixes and improvements that each of these changes enabled. Hats off to the team behind this release.


A project I work on is currently working through a bunch of performance issues that arose because we had inlined functions in `pxd` files that we did not declare `noexcept nogil`. So, heads up if you see similar regressions! We saw a ridiculous slowdown (from 7 seconds to 100 minutes for one suite) because they were inside some awfully tight loops. But the fix was pretty straightforward once we worked it out.
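
For anyone hitting the same thing, here's a minimal sketch of the kind of change we made (the file and function names are made up for illustration):

    # fast_math.pxd (hypothetical)
    # Cython 3.0 lets cdef functions returning C types propagate Python
    # exceptions by default, which can add an exception check (and, for
    # nogil functions, a brief GIL acquisition) after calls. Declaring the
    # function `noexcept` restores the old check-free behaviour in tight loops.
    cdef inline double clamp(double x, double lo, double hi) noexcept nogil:
        return lo if x < lo else (hi if x > hi else x)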


Yep, I also spent yesterday figuring out this issue and changing these annotations. Heads up that the default exception forwarding also breaks function pointer types if you share them with the wrapped C code.
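
A hypothetical illustration of the mismatch (names invented): the exception specification is now part of the function type, so a callback you hand to C code has to agree with the typedef.

    # callbacks.pyx (hypothetical)
    # The wrapped C library expects a plain callback, so mark the typedef
    # noexcept; a Cython function assigned to it must then also be noexcept,
    # otherwise the function types no longer match under Cython 3's defaults.
    ctypedef double (*unary_op)(double) noexcept nogil

    cdef double square(double x) noexcept nogil:
        return x * x

    cdef unary_op op = square  # OK: both sides agree no exception can propagate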


And chaos ensued. Unable to install aws-cli via pip: https://github.com/aws/aws-cli/issues/8036


Mostly because of PyYAML: https://github.com/yaml/pyyaml/issues/724

I was pretty annoyed to discover that it had a separate build-time dependency on cython that you can't control from your requirements.txt.


That's unfortunately a pip limitation. You can either disable build isolation so that the build requirements are resolved against your current environment, or you can use the PIP_CONSTRAINT env var, which is passed through to the isolated build invocation and lets you constrain the build requirements. In this case PyYAML did release a 6.0.1 version which adds an upper-bound constraint on Cython.


I wonder whether Cython thought of checking compatibility with their biggest "customers".


PyYAML knew about the breakage since January 2022 [0], and nothing really happened. After a year and a half with lots of alphas and betas, I don't think there is much Cython could do, short of fixing PyYAML themselves.

[0]: https://github.com/yaml/pyyaml/issues/601


That's what I ended up doing just for the pyyaml installation.


In Spack [1] we can express all these constraints for the dependency solver, and we also try to always re-cythonize sources [2]. The latter is because bundled cythonized files are sometimes forward-incompatible with newer Python versions, so it's better to just regenerate them with an up-to-date Cython.

[1] https://github.com/spack/spack/ [2] https://github.com/spack/spack/pull/35995


That's what PyYAML does as well. It uses PEP 518 [1] to specify the build dependencies, which for PyYAML include Cython [2]. It's just that previous releases had no upper bound there, so pip and other tools simply selected the latest version, which was incompatible. In the past PyYAML included the cythonised .c files in the sdist, but as of 5.4.0 they went the PEP 518 route, which ensures the client cythonises them when installing from the sdist.

[1] - https://peps.python.org/pep-0518/ [2] - https://github.com/yaml/pyyaml/blob/release/6.0/pyproject.to...


My impression is that the Python ecosystem rarely specifies upper bounds on dependencies, even when they follow semver.

In the Julia ecosystem it's the default, and you basically have to release a new patch version for updated compat bounds with your dependencies. This is much more stable. It works because the process is mostly automated: a dependency releases a new version => a bot opens a PR on your repo updating the compat bound, and you just merge it.


Neat! I had no idea that Spack could be used to cythonize dependencies. I use it to manage my local compiler/CUDA/Trilinos stack but never considered it for non-C++ things.


Oh great, yet _another_ Python package manager.


It’s not a python package manager. It is a generalized package manager written in python.


Spack is a lot more than just a Python package manager :)


I like how the log in the issue also has another warning about breaking changes in it:

    The license_file parameter is deprecated, use license_files instead.
  
    By 2023-Oct-30, you need to update your project and remove deprecated calls
    or your builds will no longer be supported.
  
    See https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for details.
The level of churn and breakage in these foundational Python packaging tools is honestly insane.


I am reluctant to refer you to aws-cli v2 because it is way more involved to install. No idea why AWS moved away from their original distribution method... However, v2 is going to be the way to go moving forward :/


>No idea why aws moved away from their original distribution method

Didn't they move away exactly because of issues like this? As I understand it, aws-cli v2 is still written in Python but ships everything it needs self-contained so it can avoid Python dependency hell[1]

1: https://www.youtube.com/watch?v=U5y7JI_mHk8#t=3m40


Funnily enough, I didn’t have issues with Python’s dependency hell before working for AWS, not because we have more dependencies, but because… reasons.


Is it special files that begin with a capital letter, version sets, version filter instances, and massive symlink farms?


If you already know that much then you don’t need a response ;)


Indeed!


    cat shell.nix
    { pkgs ? import <nixpkgs> {}
    }:
    pkgs.mkShell {
        name = "dev-env";
        buildInputs = with pkgs; [
            awscli2
        ];
    }
And then you have it available when entering a shell with nix-shell.


And this is why you pin your dependencies + never install the latest version of anything.


The lesson here is different: package developers must put upper bounds on major versions of dependencies that follow semver.


Should this be the takeaway? It looks to me like people are building pipelines without taking into account that they change roles every time they have to build a package, from package consumer to package packager. Even if the package developers set an upper bound on the build dependency, it is the packager's responsibility to provide a deterministic build environment.


The release of this pretty much hosed my morning at work. Suddenly CI wouldn’t build because this broke things in pyyaml and awscli.

And then I got to re-learn how bad Poetry and Python dependency management are. I was trying to update pyyaml to 6.0.1, and Poetry systematically downloaded every single major, minor, and patch version of another dependency trying to find one that would fit the version constraints. It took half an hour.


Why are you blindly upgrading major versions of your tools?

Sounds like a fundamental design issue more than anything.


It's a fundamental design issue with Python package management that makes it impossible to pin the version of build-time dependencies. You can do everything seemingly right, with pip-tools creating lockfiles containing hashes of downloaded packages, and stuff seems stable for a year or more. And then suddenly your build starts failing, because the lockfiles only contain runtime dependencies, not build-time dependencies, and some library you depend on was silently downloading the latest version of some package you've never heard of, until it broke.

The only semi-sane workaround I know of is to use pip's `--only-binary` option to prevent any automatic builds at package installation time. Then you will usually also need your own package server for storing precompiled wheels for those third-party dependencies that don't publish compiled wheels on PyPI. This way you build the packages with some random tool version only once, so if it works the first time it will keep working.

In other language ecosystems where the package management isn't built on a tower of shit, this problem doesn't exist in the first place.


>some library you depend on was silently downloading the latest version of some package you've never heard of, until it broke.

How? Are you running `pip install` every time you run your pipeline? Why do you need to constantly be reinstalling your packages?


Doesn't poetry have a solution for this?


The usual methods to control versions with a lockfile don’t work here because it’s a build dependency. You’d need to disable build isolation and install the right build tools in your environment before installing pyyaml but that can become a mess real quick.


Build-dependency versions aren't propagated and can be outside your local control. As an example from elsewhere in this thread: PyYAML did not put an upper bound on its Cython build dependency before this, so if your project depended on a pinned version of PyYAML, it would break when building it after Cython 3.0.0 was released.


Why are you constantly rebuilding dependencies? Surely you should only rebuild if there is a new (and vetted) version


Lmao you’d think. But this is not the default or recommended pip/setuptools behavior: https://pip.pypa.io/en/stable/reference/build-system/pyproje...


You'd hope people would learn this time, but they'll probably just keep blaming their tools...


If you're using poetry, why not just add a `cython = "<3.0"` to your pyproject.toml file? You can lock down subdependency versions like that, and presumably the versions of everything that worked yesterday still work today.


If a top-level dependency has build-time dependencies, then those do not respect the version locks of your pyproject.toml unless you disable build isolation.


It’s not part of my dependencies and it wasn’t on my radar. What I learned was that pyyaml had a new version that fixed the error being thrown by pyyaml.


> Poetry systematically downloaded every single major, minor, and patch version of another dependency trying to find one that would fit the version constraints.

Great news! Better to do that than to have to do it manually.

But yes, it sucks. Still, Poetry is much better than pip (which just gives up and tells you to go deal with the issue yourself!).

Now, why people care about Cython, especially in things like dependencies, I don't understand. awscli should be an independent package! "Oh, but it is faster": are you counting the time it takes to fix things when they break because you are depending on something fragile?


Other systems can do this with far less effort. It’s really a total joke.


Well, the way npm solves this is by giving every package its own copy of its dependencies. Maybe that's the way Python should go, but I'm not sure other package managers (say, Cargo in Rust) are much better than that.


And that's pretty much why you use binary package distributions like conda: despite all its flaws, a sudden release of Cython will never break your dependencies' builds, because there's nothing to build.


You can force pip to only use wheels. Then if something fails, build your own wheel in your own (pinned) environment. This could be done in a multi-stage Docker build, for example.

If one of the build deps has no wheel then you're fucked, though. At some level you have to rely on a toolchain.


I think it doesn't have a choice but to download all the versions if they use a setup.py config file. You have to fetch the sources to run it and get the package's metadata.


I get that if I adorn my Python 3 code with types the right way, Cython can generate better C, but if I don't add annotations and only use core imports (no pip), do I get any benefit?

I have some humongous (for me) dict lookups I am doing on strings, counting in ints, and if I could get 2x or 3x faster without having to recode I'd love it. The Cython site is directed at people willing to recode; I am ultimately willing, but I wish there was some low-hanging-fruit evidence: is this even plausibly going to get faster, or is the dict() walk cost just "it is what it is"?

Oddly, it looks like I get some improvement across a pickle dump/load. Does restoring from pickle consume less memory than building the data up from scratch?


You will likely get some improvement by putting your Python code directly in a .pyx file and compiling it (without using any cdef functions). In most cases it will be as if you were calling the Python C API directly from a compiled C file. In my experience, the biggest difference is a reduction in function call overhead, so it helps most in tight loops.

I wouldn't expect a full 2x in every case. But it can also be very significant in cases where Cython deduces the type of local variables.
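
If it helps, the build step for that is tiny. A minimal sketch, assuming your unchanged code lives in a hypothetical lookups.pyx:

    # setup.py (minimal sketch; "lookups.pyx" is a made-up module name)
    from setuptools import setup
    from Cython.Build import cythonize

    # Compiles the plain-Python .pyx into a C extension; no cdef required.
    setup(ext_modules=cythonize("lookups.pyx"))

Then `python setup.py build_ext --inplace` (or `pip install .` with a matching pyproject.toml) and `import lookups` works exactly like the pure-Python version did.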


Cython 3 also uses Python type annotations to infer C types now, so if you add Python types you might see surprising Cython speedups.
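
A rough sketch of what that looks like in pure-Python mode (the function and variable names are made up); the annotations let Cython treat the counter as a plain C int while the file still runs unchanged under CPython:

    # hist.py (hypothetical) -- runs under CPython as-is, faster when compiled
    import cython

    def count_hits(words: list, table: dict) -> cython.int:
        n: cython.int = 0
        for w in words:
            # Dict lookups still go through the CPython API, but the counter
            # arithmetic compiles down to plain C.
            if w in table:
                n += 1
        return n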


Coming back to this thread, I am running one now. It's too early to say if there's a speedup.

But I want to note it is ridiculously under-documented how to make what I would call a functional a.out from this toolset.

You are basically assumed to run python3, then >>> import mything, and then mything.main(args), instead of getting a functional command-line tool which "just runs". The step over is small, true, but I would have thought the PRIMARY use case here is a command-line executable, not "embed a .so in my interpreter".

There are a tonne of Stack Overflow "how do I" posts, and all of them are really whacked: everyone trips over the same "main undefined" problem and a swag of other issues trying to call (g)cc themselves.


I guess it’s mismatched expectations. IMO the primary use-case is making python wrappers for C/C++ code or converting prototype Python code into a fast C extension for production use by adding types or progressively converting code into a language that is a mix of Python and C.


That's fair. Many happy customers so it's very likely I misunderstood the mission and vision.

My primary experience in Python3 is to minimise REPL and get to commands which do stdin -> <process it> -> stdout and gang them up, or uplift discrete commands into yield() pipes inside the same interpreter.

I guess I fell into the trap of going from the particular to the general.


It’s designed for building libraries really. You write functions in the foo.pyx file, Cython generates a foo.c file which gets compiled to foo.cpython-39m-x86_64-linux-gnu and then in your script you do “import foo”


I think in your case you might get more benefit from using PyPy and its JIT.

The way I look at Cython is that it is a tool that allows me to write C code using Python syntax. The biggest gains come when you use annotations to specify simple C types, accepting their drawbacks (like overflow), and/or trade away Python features like exceptions and the GIL.


Thanks. I'm exploring that option, fixing bugs as I go too!


If you're happy to add some types you might also get some speed-ups using mypyc. Another option might be to switch your dicts to pandas Series: you might be iterating over several lookups in Python, and with pandas you might be able to turn those into functions written and compiled in C.
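
For instance, a hedged sketch of what a bulk lookup looks like as a pandas Series (the data here is invented): one vectorised call into pandas' compiled code instead of a Python-level loop of dict lookups.

    import pandas as pd

    # A dict-like mapping held as a Series (made-up data).
    counts = pd.Series({"foo": 3, "bar": 7, "baz": 1})
    keys = ["foo", "baz", "missing"]

    # reindex does all the lookups at once; absent keys become NaN,
    # which fillna(0) then replaces.
    looked_up = counts.reindex(keys).fillna(0)
    print(looked_up.to_list())  # [3.0, 1.0, 0.0]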


Did you try pypy.org ?


Yes, and I am still exploring. It is possible I have some scaling issues in the dictionary which are not about compiled vs. not, but about the cost of mutating data in a dict and GC costs.


Things like key lookup are already well optimized. So finding a more efficient algorithm might yield easier performance gains.

If you have for loops inside for loops, that's one place to attack. dict iterators, etc.
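
For example (a generic illustration, nothing specific to your code): hoisting a repeated membership scan into a set turns a quadratic double loop into roughly linear work.

    # Made-up data, just to show the shape of the change.
    words = ["alpha", "beta", "gamma", "delta"] * 1000
    targets_list = ["beta", "delta"]

    # Quadratic-ish: scans targets_list once per word.
    slow_hits = [w for w in words if w in targets_list]

    # Near-linear: build the set once; each membership test is O(1) on average.
    targets = set(targets_list)
    fast_hits = [w for w in words if w in targets]

    assert slow_hits == fast_hits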


From https://cython.readthedocs.io/en/latest/src/changes.html#com... :

> Since Cython 3.0.0 started development, CPython 3.8-3.11 were released. All these are supported in Cython, including experimental support for the in-development CPython 3.12


This is a somewhat tangential question to the new release, but there might be folks here that can answer this question.

Having used SWIG to create Python bindings for C++ code over 10 years ago, what's the recommended way to do this in 2023? There's still SWIG, there's Cython, pybind11 and a few others. I do know SWIG and how complicated it can get, with footguns abounding when things grow more complex.

Is Cython the way to go? How does it hold up against the alternatives? A Google search gives many articles on the topic, but most are typical SEO-optimized, low-value pieces, and those that do show a bit of depth focus on the basic mechanics, not really on how things hold up for larger projects…


I haven't used Cython too much; it does look really interesting, but the translation layer worries me a little bit for more complex modules. However, I've been using pybind11 extensively and it's a delight. Well designed, documented, predictable, removes a massive amount of boilerplate, integrates perfectly well with C++ idioms (e.g., smart pointers), and doesn't completely lock you in, as you can still call the C API in a regular way.


Thank you! - I’ll give pybind11 a go.


Cython is the most general tool. It can be used for anything from making bindings from C/C++ to Python, or Python to C/C++, to writing compiled "Python-like" code in an intermediate layer that can be used for managing your wrappers or just writing performant code.

If you just want to provide a Python interface to a C/C++ library, pybind11 will get you there in fewer LoC than Cython. nanobind is an even lighter-weight option.

I’ve heard Swig is a pain to use.


Thank you! SWIG indeed can be a pain, but having used it before I have become somewhat blind to it. Still, e.g. smart pointers are not easy to deal with well, I've found out recently… I'll have a look at pybind11. I've worked on Cython codebases too, which indeed let you really nicely compile Python code and interact with C code. It does get weird when using e.g. PyQt and native Qt…


If all you need/want is to call c++ code from python then pybind11 is the way to go. Cython really comes into its own when you have some existing python code you want to 'port' to a C extension.


Thank you! For now I am just binding c++ to Python but I expect/fear the lines might start blurring, so cython might come in handy then.


It’s a little easier to write idiomatic python bindings for a C/C++ library in Cython IMO, because you’re writing the bindings in a language that’s almost python.


The problem with the SWIG bindings I've used is that they don't have any type hints. They also don't offer context managers to handle resources, so they're a pain to use safely from Python.

From the user POV, the best bindings I’ve seen were wrappers with a Python API that calls C++ using Cython.


Cython is really great at a small scale, but it has issues scaling up to large codebases (of Cython), where I think it's an anti-pattern. None of these issues are the fault of the Cython creators; it's really an extremely useful tool. They're just inherent in the design space: the things that make it so cool (transparently mixing C and Python!) come with trade-offs.

1. Memory unsafety. It's still C or C++ in the end, the more you shift your code in that direction the easier it is to screw up.

2. Two compiler passes: first Cython->C, then C->machine code. This means some errors only get caught in the second pass, when it's much harder to match them back to the original code. Extra bad when using C++. Perhaps Cython 3 made this better, but it's a very hard problem to solve.

3. Lack of tooling. IDE support, linting, autoformatting... it's all much less extensive than alternatives.

4. Python only. Polars is written in Rust, so you can use it in Rust and Python, and there's work on JavaScript and R bindings. Large Cython code bases are Python only, which makes them less useful.

Long version: https://pythonspeed.com/articles/cython-limitations/


The tooling support isn't ideal, but there are jump-to-definition extensions for Cython in VS Code.


Comments confirmed: Python packaging is broken.


This is probably one of the most anticipated releases in recent times for the whole Python ecosystem. A sincere thanks to the handful (literally) of people who made this possible despite all kinds of funding challenges.

Never mind the "foo broke bar" comments here.


This broke a lot of stuff for us at work today. Building the wheels or something, I can't remember. The fix was forcing pip not to use 3.x and to use something older, I think.


Cython did everything right here since it used a major version number to signify the change.

Other projects not pinning their dependencies is not Cython's fault!



Pinning depends on the requirements you as the consumer have, and on the specific package manager (pip, conda, poetry, etc.), so it's hard to give guidance that's helpful when you're a library maintainer. Generally, though, when consuming any dependency you should pin...


Just so anyone who reads this has the version: 0.29.36 is the most recent pre-3.0 release, so `cython<=0.29.36` (or `cython<3`) is the pin you want.


As always: pin your dependencies.


Sadly, it's not always that simple. If one of your well-pinned packages has a build dependency which is not correctly pinned, then you're subject to the problem.


You raise a great point. I believe cached builds, say with Nix, are the way to go. What do you do?




