



Why did you ignore the link I gave you above?

If you follow that link, you'll see that polars and parquet are a large, highly configurable collection of tools for format manipulation across many HPC formats. Debian maintainers possibly don't want to bundle the entirety, as it would be vast.

Might this help you, though?

https://cloudsmith.io/~opencpn/repos/polar-prod/packages/det...



My question is "why isn't it in Debian?" I ask because Debian has rather high standards, and the absence from Debian suggests some quality issue in the available libraries for the format, or in the format itself.

Are there dark secrets?


Could the dark secret you seek be "Debian isn't for bleeding edge packages"?

It's very modern and perhaps hasn't been around long enough for Debian maintainers to feel it's vetted.

For instance, documentation for the Python bindings is more advanced than for the Rust bindings, but the package itself uses Rust at the low level.


Parquet is what, 12 years old? Hardly cutting edge. What you say may well be true for polars (I'm not familiar with it). If/when it (or something else) does get packaged I'll give parquet another look ...


Pandas is probably in Debian and it can read parquet files. Polars is fairly new and under active development. It's a Python library; I install those in $HOME/.local, as opposed to system-wide. One can also install it in a venv. With pip you can also uninstall packages and keep things fairly tidy.
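
To see where such an install would land, here is a minimal sketch using only the standard library. The helper name `describe_install_target` is made up for illustration; `site.getusersitepackages()` reports the per-user directory (under $HOME/.local on Linux by default), and comparing `sys.prefix` with `sys.base_prefix` is the usual way to tell whether a venv is active.

```python
import site
import sys


def describe_install_target() -> dict:
    """Report whether a venv is active and where the per-user site-packages lives."""
    return {
        # True inside a venv: the active prefix differs from the base interpreter's.
        "in_venv": sys.prefix != sys.base_prefix,
        # Per-user site-packages directory, under $HOME/.local on Linux by default.
        "user_site": site.getusersitepackages(),
        # Root of the active environment (the venv directory when in_venv is True).
        "prefix": sys.prefix,
    }


if __name__ == "__main__":
    for key, value in describe_install_target().items():
        print(f"{key}: {value}")
```

Running it once with the system interpreter and once inside a venv should show the two layouts side by side.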


Pandas is in Debian, but it cannot read parquet files itself; it uses third-party "engines" for that purpose, and those are not available in Debian:

  Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
  Type "help", "copyright", "credits" or "license" for more information.
  >>> import pandas
  >>> pandas.read_parquet('sample3.parquet')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/usr/lib/python3/dist-packages/pandas/io/parquet.py", line 493, in read_parquet
      impl = get_engine(engine)
    File "/usr/lib/python3/dist-packages/pandas/io/parquet.py", line 53, in get_engine
      raise ImportError(
  ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.
  A suitable version of pyarrow or fastparquet is required for parquet support.
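
A quick way to check the same thing without triggering the traceback is sketched below. The helper name `available_parquet_engines` is hypothetical; the two module names are the ones listed in the ImportError. Note this only tests whether the modules can be located. pandas additionally wants a suitable version, which this sketch does not check.

```python
from importlib.util import find_spec

# The two engine names that appear in the ImportError above.
CANDIDATE_ENGINES = ("pyarrow", "fastparquet")


def available_parquet_engines(candidates=CANDIDATE_ENGINES):
    """Return the candidate engine modules that can be located by this interpreter."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in candidates if find_spec(name) is not None]


if __name__ == "__main__":
    found = available_parquet_engines()
    print("usable parquet engines:", found or "none")
```

On a stock Debian install with only the packaged pandas, I would expect this to report none, matching the error.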


Then the options you are left with are either polars via pip or a third-party parquet-tools Debian package.

https://github.com/hangxie/parquet-tools/blob/main/USAGE.md#...


polars via Rust (cargo) also


Yes, I wasn't clear: it's the polars library that's actively changing, so that might be the issue, or just the vast set of optional components configurable on installation, which isn't the normal package manager experience.

FWIW I think I share your general aversion to _not_ using packages, just for the tidiness of installs and removals, though I'm on Fedora and macOS.



