
> This directly shows a clear advantage over Pandas for instance, where there is no clear distinction between a float NaN and missing data, where they really should represent different things.

Not true anymore:

> Starting from pandas 1.0, an experimental pd.NA value (singleton) is available to represent scalar missing values. At this moment, it is used in the nullable integer, boolean and dedicated string data types as the missing value indicator.

> The goal of pd.NA is to provide a “missing” indicator that can be used consistently across data types (instead of np.nan, None or pd.NaT depending on the data type).

(https://pandas.pydata.org/pandas-docs/stable/user_guide/miss...)
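
To make the quoted behavior concrete, here is a minimal sketch, assuming pandas 1.0 or later is installed. The variable names are mine, and only the nullable integer and boolean dtypes from the quote are shown.

```python
import pandas as pd

# Nullable extension dtypes use pd.NA as the missing-value marker.
ints = pd.array([1, None, 3], dtype="Int64")
bools = pd.array([True, None, False], dtype="boolean")

for arr in (ints, bools):
    # Both arrays hand back the same pd.NA singleton for the missing slot.
    print(arr.dtype, arr[1] is pd.NA)

# Without a nullable dtype, the missing value forces a cast to float64.
default_series = pd.Series([1, None, 3])
print(default_series.dtype)

# With the nullable dtype, the integer column stays an integer column.
nullable_series = pd.Series(ints)
print(nullable_series.dtype)
```

The point of the comparison at the end is that the default path still stores the gap as a float NaN, while the opt-in dtype keeps integers and marks the gap separately.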




I wonder how pandas can both be at version 1.0 and have an experimental feature for something so central. Honest question.


Because Pandas is built on top of NumPy and NumPy has never had a proper NA value. I would call that a serious design problem in NumPy, but it seems to be difficult to fix. There have been multiple NEPs (NumPy Enhancement Proposals) over the years, but they haven't gone anywhere. Probably since things are not moving along in NumPy, a lot of development that should logically happen at the NumPy level is now happening in Pandas. But, I agree, I find it baffling how Python has gotten so big in data science and been around so long without having proper NA support.

https://numpy.org/neps/#deferred-and-superseded-neps
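
A small sketch of what the missing NA value means in practice at the NumPy level, assuming only that NumPy is installed. The variable names are illustrative.

```python
import numpy as np

# Mixing integers with NaN silently upcasts the whole array to float64,
# because NaN is a floating point value and not a generic "missing" marker.
floats = np.array([1, 2, np.nan])
print(floats.dtype)

# An existing integer array cannot store NaN at all.
ints = np.array([1, 2, 3])
try:
    ints[1] = np.nan
    assigned = True
except ValueError as exc:
    assigned = False
    print("cannot store NaN in an integer array:", exc)
```

This is the gap that the pandas extension dtypes work around by tracking missing entries themselves.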


It's at version 1.0 because it has a mature and stable interface. That does not mean that it cannot have experimental features which are not part of that stable interface.


And much more recently (December 26, 2020): https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.2.0...

> Experimental nullable data types for float data

> We’ve added Float32Dtype / Float64Dtype and FloatingArray. These are extension data types dedicated to floating point data that can hold the pd.NA missing value indicator (GH32265, GH34307).

> While the default float data type already supports missing values using np.nan, these new data types use pd.NA (and its corresponding behavior) as the missing value indicator, in line with the already existing nullable integer and boolean data types.
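
As a rough illustration of "its corresponding behavior", here is a sketch assuming pandas 1.2 or later. The series names are mine. The default float dtype treats a comparison against a missing value as False, while the nullable dtype propagates the missing value.

```python
import pandas as pd

# Default float64: the missing entry is stored as np.nan.
default = pd.Series([1.5, None])

# Nullable Float64: the missing entry is stored as pd.NA.
nullable = pd.Series([1.5, None], dtype="Float64")

# Comparing against NaN yields a plain False.
default_cmp = default > 1.0
print(default_cmp.tolist())

# Comparing against pd.NA yields pd.NA, so "unknown" stays unknown.
nullable_cmp = nullable > 1.0
print(nullable_cmp.dtype, nullable_cmp.iloc[1] is pd.NA)
```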



