Arrow v1.0: After 8 years, even better dates and times for Python (github.com/arrow-py)
216 points by krisfremen on Feb 27, 2021 | 76 comments



The only working datetime solution is JSR310.

It has separate types for

• LocalTime (milliseconds after midnight)

• LocalDate (julian day),

• LocalDateTime (julian day and milliseconds after midnight),

• Instant (nanoseconds since EPOCH),

• ZonedDateTime (which is a point in time with a timezone, and exposes both the localdatetime and the instant APIs)

All of these types have a reason to exist, and definitely shouldn't be mixed together (like python does).

• If I program an alarm clock, I'll want to use LocalTime. Even if DST starts/ends, you still wanna get woken up at 8am.

• My birthday is a LocalDate. It usually has no time information, and usually doesn't depend on a timezone. An anniversary is the same.

• The moment when a loggable event happened is an Instant. The event doesn't care about timezones, only the exact time it happened.

• A calendar event is a ZonedDateTime: the meeting will happen at 10am CET, regardless of which timezone I'm in at that point.

All these types describe different concepts, and they shouldn't be mixed together. Every language should adopt JSR310. Yes, it's relatively complicated, but it's extremely precise and accurate.

And you can convert them all!

• LocalDate + LocalTime becomes LocalDateTime

• LocalDateTime.atZone(ZoneId or ZoneOffset) becomes ZonedDateTime

• Instant.atZone(ZoneId or ZoneOffset) becomes ZonedDateTime

• ZonedDateTime.toInstant() or ZonedDateTime.toLocalDateTime() exposes local datetime and instant.

• LocalDate can easily be converted to JapaneseDate, HebrewDate, ArabicDate.
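
For Python readers, the conversions listed above can be loosely approximated with the standard library's `datetime` module. This is only a sketch, not an implementation of JSR-310: Python has no separate Instant type, so an aware datetime in UTC stands in for one here, and the fixed `cet` offset is an assumption standing in for a ZoneOffset.

```python
from datetime import date, time, datetime, timedelta, timezone

# LocalDate + LocalTime -> LocalDateTime (a naive datetime in Python)
local_date = date(2021, 2, 27)
local_time = time(8, 0)
local_dt = datetime.combine(local_date, local_time)

# LocalDateTime.atZone(ZoneOffset) -> ZonedDateTime (an aware datetime).
# The fixed +01:00 offset is an illustrative assumption, not a full time zone.
cet = timezone(timedelta(hours=1), "CET")
zoned = local_dt.replace(tzinfo=cet)

# ZonedDateTime.toInstant(): the same moment, expressed in UTC
instant = zoned.astimezone(timezone.utc)

# ZonedDateTime.toLocalDateTime(): drop the offset, keep the wall-clock fields
back_to_local = zoned.replace(tzinfo=None)

# Instant.atZone(ZoneOffset): view a UTC moment at another offset
viewed_in_cet = instant.astimezone(cet)
```

Note that `local_dt`, `zoned` and `instant` all share the single `datetime` class here, which is the mixing of concepts this comment objects to.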


JSR 310's Instant class and notion of absolute time has always left me wondering whether any time library in any language attempts to handle relativity. Does anybody know of any?


If I'm not mistaken "chrono"[1] behaves almost the same. It calls Local[Date|Time|DateTime] Naive[Date|Time|DateTime]. ZonedDateTime is just called DateTime. It doesn't have a separate type for Instants but can convert (zoned) DateTimes to and from UNIX timestamps.

[1] https://docs.rs/chrono/0.4.19/chrono/index.html

Edit: it seems like chrono's design took lessons learned from other time libraries, including JSR-310, into account, so that explains why they are so similar.


Joda's Instant corresponds to `std::time::SystemTime` in Rust. It was never intentional (source: I designed Chrono and it existed before `std::time`), but currently Chrono fills the calendar date and time while the standard library fills the monotonic and wall-clock timestamps.

To be honest, I think JSR-310 is not correct in every regard and its implementation of every ISO 8601 format is a mistake. I even started out Chrono without referring to JSR-310 (because I have designed other date and time libraries in the past). But it is much better than, say, Python's datetime, to which it is being compared.


IMO SystemTime should be avoided in its current state. The API is extremely limited and annoying to use for no good reason. And what's worse is that both precision and range depend on the platform.


I'm interested in learning more about the ISO 8601 format issue. Can I read more about that anywhere?


JSR-310 contains less useful fragments of ISO 8601: MonthDay, Period, Year, YearMonth, Month. They might be useful if they were agnostic to calendar systems, but they are actually just for ISO 8601.

Period is doubly problematic. It is important to realize that ISO 8601 is the standard for interchange formats, not the standard for computer data types (compare to, say, IEEE 754). As a result you need additional information to make it a proper data type standard. For ISO 8601 period you would need the actual algorithm to add it to a time point. What happens if you add 1 month (P1M) to January 30, 2021 (2021-01-30)? Would it be 2021-02-28 or 2021-03-02? Should we give both options to end users? You need a set of pluggable policies to make it work. JSR-310 Period class is pretty lacking in this regard. (In fact, pretty much every implementation of "relative delta" has the same problem. It can't be readily used in the business logic.)
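
The P1M question can be made concrete with a small Python sketch. Both function names are hypothetical and neither is taken from JSR-310 or any other library; they only illustrate two policies that give the two answers mentioned above.

```python
import calendar
from datetime import date, timedelta


def _target_month(d, months):
    """Return (year, month) reached by moving `months` months from d."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    return year, month_index + 1


def add_months_clamp(d, months):
    """Policy A: clamp the day to the last day of the target month."""
    year, month = _target_month(d, months)
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def add_months_overflow(d, months):
    """Policy B: let surplus days spill over into the following month."""
    year, month = _target_month(d, months)
    return date(year, month, 1) + timedelta(days=d.day - 1)


start = date(2021, 1, 30)
clamped = add_months_clamp(start, 1)        # 2021-02-28
overflowed = add_months_overflow(start, 1)  # 2021-03-02
```

Which policy is right depends on the business rule, which is why a single built-in "add period" operation is hard to get right for everyone.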


Yeah, Chrono is also really nice, indeed. Didn’t know it was inspired by JSR310, but it makes sense now that I think about it :)

The new ES Temporal API is also inspired by the same API.


I think this crate should be used now https://time-rs.github.io/time/index.html


I haven't heard of it before! Do you know what sets it apart from chrono? It doesn't mention anything in the readme as far as I can tell.


The biggest difference between time-rs and Chrono to me is the inlined time zone type parameter. Chrono's `DateTime<FixedOffset>` is called `OffsetDateTime` in time-rs. The intention was to be able to specialize `DateTime` to the permanent UTC time zone and avoid any offset calculation, but this idea has never caught on. Also, time-rs has the luxury of procedural macros, which didn't exist when Chrono was first designed.


> LocalDate (julian day)

Here, does julian day mean day number since 4713 BC or day number since the beginning of current year?

https://en.wikipedia.org/wiki/Julian_day


LocalDate is defined as the day number since 4713 BC, yes.

This allows you to specify year, month, and day easily, and expose it as hebrew date, japanese date, julian date or gregorian date easily.

For a day relative to the current year, there’s a separate type.


How do you nicely represent all these in the database layer, where most things end up living at the end of the day? I haven't really found a good recommendation on that, to be honest, especially for the whole LocalDate/Time thing. Does anyone have recommendations or documents to help guide that?


I find that storing ZonedDateTime in your database as a UTC epochTime (long or int datatype depending on precision required) handles the use case where you need to store a precise moment in time, and is guaranteed to work across DB implementations.
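
A minimal Python sketch of that epoch-based approach, assuming millisecond precision is enough. The helper names `to_epoch_millis` and `from_epoch_millis` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt):
    """Convert an aware datetime to integer milliseconds since the UNIX epoch."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("expected a timezone-aware datetime")
    # Integer division of timedeltas avoids float rounding from timestamp()
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(millis):
    """Rebuild the moment as an aware datetime in UTC."""
    return EPOCH + timedelta(milliseconds=millis)


meeting = datetime(2021, 2, 27, 10, 0, tzinfo=timezone(timedelta(hours=1)))
stored = to_epoch_millis(meeting)     # an int, suitable for a BIGINT column
restored = from_epoch_millis(stored)  # same moment, now expressed in UTC
```

The round trip preserves the moment but not the original offset, so if the zone matters (as for a calendar event) it would need its own column.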


Alternatively you can use the TIMESTAMP and TIMESTAMPTZ types in Postgres for Instant and ZonedDateTime, and serialize LocalDate, LocalTime, and LocalDateTime either to ISO 8601 or to their underlying fields (millis after midnight and julian date).


What other runtime libraries besides Java implement JSR310?


The JS Temporal type is basically a clone of it, there’s a C# implementation, and rust’s chrono is very similar (but without the problematic parts).


Not to be confused with Apache Arrow's Python bindings (pyarrow)... https://arrow.apache.org/docs/python/


After a quick search seems like this lib actually predates `pyarrow` by quite a bit.


but does it predate Apache Arrow?


Yes, by 2-3 years, as its first release was in 2012, and Arrow was still a concept around 2013 (you can see what Wes was talking about during the talks he gave then).


Too many modules: datetime, time, calendar, dateutil, pytz and more

Yes. [1] Back in 2012, I filed a Python bug report that "datetime" has a formatter for ISO 8601 format strings, but no parser. I'd found four parsers, all with serious bugs.

After six years of discussion, it was fixed.[1]

[1] https://bugs.python.org/issue15873


It still does not parse all ISO 8601 format strings: "ISO 8601, RFC 3339 and datetime.isoformat() are three slightly different and in some senses incompatible datetime serialization formats" https://bugs.python.org/issue35829 For example it rejects typical strings produced by Javascript using Z instead of +00:00 as the timezone suffix.


Is there a bug filed for this?


IIRC this was considered out of scope. fromisoformat is defined as the functional inverse of isoformat, and since the latter never emits Zulu, the former does not parse it. The only “reasonable” fix would be to rename those functions to something else, but every report on this I’ve seen stops right here. Core maintainers are not really interested in changing this until someone makes them, and nobody really demanded the rename.
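
A common workaround for the `Z` suffix, sketched in Python. The helper name `parse_iso_z` is hypothetical; it rewrites the suffix before calling `datetime.fromisoformat`, so it should behave the same whether or not the interpreter's own `fromisoformat` accepts `Z`.

```python
from datetime import datetime, timezone


def parse_iso_z(text):
    """Parse an ISO-style timestamp, treating a trailing 'Z' as +00:00."""
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


# The shape of string that JavaScript's Date.prototype.toISOString() produces
from_js = parse_iso_z("2021-02-27T10:00:00.000Z")
```

This handles only the `Z` case; it does not make `fromisoformat` a general ISO 8601 or RFC 3339 parser.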


Neat, I had an impact on that. I found your bug after also needing a matching parser and went on the mailing list to ask for it. Had to come back twice more to complain and get people to focus and get the ball rolling. Perfect is the enemy of the good in a nutshell.

(8601 is a huge spec, and we only needed a matching parser to start with. It is now free to grow.)


This subject does seem to attract bikeshedding.


IIRC it was, what about weeks? Time periods? Offsets vs zones? (Other esoteric features.)

"Just have it parse .isoformat() and ship it!!"


The actual specification ISO 8601 costs like 300 bucks[0] to get a copy of, so hardly any developers implementing time libraries have ever read it.

The reason ISO 8601 handling is broken is because all these libraries are implementing against a wikipedia summary of a spec.

It's a strong argument for free, open standards.

[0] https://shop.bsigroup.com/ProductDetail/?pid=000000000030078...


If four popular parsers all had serious bugs, 6 years seems not too shabby.


Arrow is the only Python library with date-aware ceiling and flooring methods.

I wouldn't have been able to design and implement Scaleway’s billing pipeline in 2013 without those.

More details at: https://kevin.deldycke.com/2020/10/billing-pipeline-critical...


That's a nice write up, Antoine's post is insightful as well!

Spice must flow!


> Too many modules: datetime, time, calendar, dateutil, pytz and more

Worse...too many modules included in the standard Python library with slightly incompatible implementations of the same functions.

I spent quite a while trying to understand why strptime was not working like I thought it should, before figuring out that I was using time.strptime when I should have been using datetime.strptime.

The difference that got me was that time.strptime was ignoring timezones in the format string and input string and assuming the local timezone. It was not an error to include %z in the format string and give it a time string that includes a timezone. It would simply silently be ignored and the local timezone used instead.
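
For comparison, a short sketch of the `datetime.strptime` side, where `%z` yields a timezone-aware result. The sample string and format are made up for illustration.

```python
from datetime import datetime, timedelta

stamp = "2021-02-27 10:00:00 +0500"

# datetime.strptime keeps the parsed offset as a fixed-offset tzinfo
parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S %z")
offset = parsed.utcoffset()
```

Checking `parsed.tzinfo is not None` right after parsing is a cheap guard against silently ending up with a naive value.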


I am an amateur programmer who started with Python some 10 years ago. I had two revelations when it comes to modules:

- requests

- arrow

Dates in Python are so fucked up that it is beyond imagination. I understand that there are philosophical reasons for that but I do not care. I need to compare dates, add dates, do sometime < now < someothertime, etc.

All this without a PhD in Time Pythonlology.

When I discovered arrow I fell in love and now I use it for anything. Need to output an ISO date? arrow. Need to output a timestamp? arrow. Need to compare two dates? arrow. arrow. arrow.

Yes, this is a hammer for any date nail *but it works*.

There are some great programs (such as Borg) that were hit by the monstrosities in Python dates and ended up with some half-baked semi-ISO output without timezones. Had the author used arrow by default he would not have gone wrong.

The main dev is great and a bit stubborn (he closed a Very Important Issue Of Mine and I had to fix it myself :/) but I will forever admire his work (and the work of other contributors).

I am waiting now for the next announcement about requests.

EDIT: I also tried pendulum and delorean but always ended up back with arrow. I now need to understand how to move from moment.js to luxon.js and my life with dates will be complete.


I've recommended people stay far away from arrow for many years now. The basic problem is on their "features" list:

> [the standard library has] too many types

No. That's just not at all true. In fact I would argue it's the opposite. You can make some very bad mistakes in Python with dates because they silently coerce into each other in nonsensical ways. Arrow makes this much worse. A date is not the same as the millisecond at the start of that date!


> You can make some very bad mistakes in python with dates because they silently coerce into each other in nonsensical ways

The Big Problem in Python's datetime library is the naive/tzaware thing. The same type is used for two very different concepts.


You get type errors if you try to combine two values with different timezones. I think of naive as a null timezone and it makes sense to me. Why is it so different?


> You get type errors if you try to combine two values with different timezones.

What kind of type errors? Maybe you mean naive and tzaware objects? Working on two tzaware objects with different offsets should not yield any error. Otherwise, it's worse than I thought.

> null timezone

What is a null timezone? Can you make an example of how it would be used in the real world?

If you want something for a point in time, you need an attached offset. That's an Instant in java.time.

If you need something more "human", then you get a LocalDateTime (in java.time parlance). Which is not a point in time unless you attach an offset to it, or localize it through a timezone.

Some operations just don't make sense between LocalDateTime. Example: can you measure the number of hours between one localdatetime and another localdatetime (or even an instant) ? No; because you don't know either offset.

Python doesn't prevent such operations; it tries to make things work in a simpler, but wrong way. Date/time handling is one of the hard things to get always right in software engineering; simplification doesn't work.
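
What the standard library actually does in the cases discussed here can be checked with a short sketch. The variable names and the +01:00 offset are illustrative only.

```python
from datetime import datetime, timedelta, timezone

utc_noon = datetime(2021, 2, 27, 12, 0, tzinfo=timezone.utc)
plus_one_1pm = datetime(2021, 2, 27, 13, 0, tzinfo=timezone(timedelta(hours=1)))
naive_noon = datetime(2021, 2, 27, 12, 0)
naive_8pm = datetime(2021, 2, 27, 20, 0)

# Two aware values with different offsets: arithmetic works on the instants
aware_gap = plus_one_1pm - utc_noon  # zero, both denote the same moment

# Mixing naive and aware values raises TypeError
try:
    naive_noon - utc_noon
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

# Two naive values: the subtraction is allowed even though neither offset is known
naive_gap = naive_8pm - naive_noon
```

So the error appears only when naive and aware values are mixed; naive-to-naive arithmetic goes through silently, which is the case the comment above objects to.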


> tries to make things work in a simpler, but wrong way

To be fair, it is an old language. Before Python 3, it used the old assumptions about text being just 1-byte characters. We all know how much trouble it was to switch to Unicode. At least it treats integers as true integers rather than your CPU's concept of an integer.


I wonder if this is Arrow or working with datetimes in programming in general.

Libraries such as Arrow, like moment.js, are good for UIs. But if you require plenty of date manipulation then I’d keep it as native as possible, with plenty of tests.

An example would be good as well.

I think dates have been my worst experience working in programming across a multitude of languages.


A date specifying a day is a Duration. A date that is a truncated/rounded timestamp is the same as the millisecond at the start of that date.

2021-02-27T00:00:00.000Z === 2021-02-27 if truncating.

But the day 2021-02-27, first of all starts and ends at different times, depending on your local timezone, but for me it's (in ISO8601 format) 2021-02-27T00:00:00.000+11/P24H.

To cover the whole world, it would be

2021-02-27T00:00+12/2021-02-27T23:59:59-12 or

2021-02-26T12:00Z/2021-02-28T11:59:59Z which is 48 hour duration :)
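
The arithmetic above can be checked with a few lines of Python, using the +12 and -12 offsets from the comment as fixed offsets.

```python
from datetime import datetime, timedelta, timezone

earliest_offset = timezone(timedelta(hours=12))
latest_offset = timezone(timedelta(hours=-12))

# Where 2021-02-27 starts first, and where it ends last
first_start = datetime(2021, 2, 27, 0, 0, tzinfo=earliest_offset)
last_end = datetime(2021, 2, 27, 23, 59, 59, tzinfo=latest_offset)

# The same two moments expressed in UTC
first_start_utc = first_start.astimezone(timezone.utc)
last_end_utc = last_end.astimezone(timezone.utc)

span = last_end - first_start  # one second short of 48 hours
```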


Mind giving an example?


Just seems like the age old problem of Python not having types built in.


What does the absence of (static) types have to do with the issue here?


How does Arrow compare to Pendulum?


Thanks for bringing this onto my radar. I've already settled with arrow, but when I can go without it, I combine datetime with pytz, which is so much faster at applying/changing timezones than arrow. I still need to check out 3.9's built-in timezones.


I have only used Pendulum and skimmed Arrow's API just now.

Pendulum does more. It has tools for manipulating a date. Add days, compare to other dates, etc.

Arrow appears primarily focused on parsing and formatting.


No, Arrow is much more than parsing/formatting.

Each Arrow object has a `shift` method, which is how you can add or subtract units of time. Greater/less/equal comparisons work exactly as you'd expect them to.


Pendulum is better


I always forget what arrow is called when I need it, but back when it came out it was the only way I could do anything useful with time. I eventually got good enough with the standard library to do what I need to do, but it's so ugly. I have to remember arrow; I wonder if the name is a reference to the TNG episode? I'm just going to try to associate Python time with Data's head and the snake cane.

I really wish improvements to python focused on things like this. Maybe an official higher level standard library for wrappers that focus on being nice to use like arrow, requests, envelopes, etc. (I know envelopes is not maintained but it was so nice)


"arrow of time" might be a helpful mnemonic to remind you of the name.


There's also the Martin Amis novel Time's Arrow



First, your link should be to libtai. Second, DJB never adequately addressed the complex problem of parsing and formatting human-readable dates (only providing the fixed ISO 8601 format), nor time zones (it doesn't support the local time zone, which is fine [1], but it doesn't support the specified time zone as well). It represents an incredibly naive understanding of date and time processing.

[1] This needs elaboration. The local time zone is typically implemented by at least one of two ways: you read the central repository (`/etc/localtime` in Linux for example) every time local time is requested, or you listen to the time zone change event (`WM_SETTINGCHANGE` event in Windows for example). This is no different from your typical pan-process synchronization problem, which is complex and costly.


> Timezone-aware and UTC by default

This is the main problem with Python's builtin datetime module: the default is a kind of "localdate" or "localdatetime", and when you add a timezone it becomes an instant. It's a bad idea to conflate two such concepts.

OTOH arrow seems to think that "one size fits all". Which is bad as well :-/ I think joda time / java time design is good!


Any thoughts on comparisons with Pandas Timestamp?

It parses from ISO8601, it has appropriate guards against coercing tz naive/aware instances and, unlike datetime, it behaves like ZonedDateTime, so tz aware instances are tied to instants. I would guess that it's faster than arrow-py or Pendulum, too, since it's part of Pandas, but don't have proof.


Pandas silently converts to different dates within the same column

https://github.com/pandas-dev/pandas/issues/12585


Does arrow have support for multi-language date parsing like dateparser? I like dateparser but it's slow if you allow many languages.


[0] There is for month name and AM/PM, but not for day of week.

[0]https://arrow.readthedocs.io/en/stable/#supported-tokens


Nice, love Arrow. We use it in Interana to build calendar-aware buckets for time series aggregations.


The library looks nice, but the name doesn't sit well with me since I don't endorse the namesake concept :p

> The arrow of time, also called time's arrow, is the concept positing the "one-way direction" or "asymmetry" of time.


Time flies like an arrow.

Fruit flies like bananas.

Okok I'm showing myself out...


Also I thought it had something to do with pyarrow :/



What's the best .js equivalent?


Not usable right now, but the TC39 "Temporal" proposal looks promising, fixing a bunch of issues with Date and introducing a clear API: https://tc39.es/proposal-temporal/docs/

There is a polyfill, but they recommend against production use.


I’ve usually used moment.js in the past (https://momentjs.com/).


moment.js is in maintenance mode and the library authors themselves try to push people away to other libs[1]. Seems like most people have moved to date-fns or day.js. Moment authors seem to point to Luxon, which I didn't know about.

[1] https://momentjs.com/docs/#/-project-status/


Yep, I was aware that they moved to maintenance mode but I mistakenly thought they were still good for new projects.


Either that or https://date-fns.org/


I’ve evaluated a bunch and there’s no clear winner: luxon is probably my favorite that I’ve used, but it’s missing some crucial locale stuff. I’ve never used it, but I think js-joda looks the best on paper: it’s based on the API for a fairly popular JVM date library which I used a lot.

One issue, though, is that the “correct” date-time representation is often use-case dependent.


I like dayjs best.

https://day.js.org/


I switched to dayjs after using momentjs for many years. I'm happy about that switch; the only thing that bothers me a bit is that essential things like UTC, timezones, and durations are plug-ins. If you're using script tags with a CDN, each plug-in is an HTTP fetch.


That is probably intentional to keep the core small and lightweight. You can try bundling all the plugins you use during build if you want to reduce the number of fetches.


Pet peeve: people should stop saying "even better" in their announcements as if simply saying "better" implied that your previous version was crappy. Be more lucid in your self assessments, more humble in your comms and just say you improved it.



