jammycrisp's comments

Yep! Both msgspec (https://jcristharif.com/msgspec/supported-types.html#datacla...) and orjson support encoding dataclasses to JSON natively.


> we measure the number of instructions and memory/cache accesses through CPU instrumentation performed with Valgrind. This approach gives repeatable and consistent results that couldn’t be obtained with a time based statistical approach, especially in extremely noisy CI and cloud environments.

This is a neat approach! I'm curious how well it maps to actual perf degradations though. Valgrind models an old CPU with a more naive branch predictor. For low-level branch-y code (say a native JSON parser), I'd be curious how well valgrind's simulated numbers map to real world measurements?

My probably naive intuition guesses that some low-level branchy code that valgrind thinks may be slower may run fine on a modern CPU (better branch predictor, deeper cache hierarchy). I'd expect false negatives to be rarer though - if valgrind thinks it's faster it probably is? What's your experience been like here?


>This is a neat approach! I'm curious how well it maps to actual perf degradations though. Valgrind models an old CPU with a more naive branch predictor. For low-level branch-y code (say a native JSON parser), I'd be curious how well valgrind's simulated numbers map to real world measurements?

We didn't try it on this specific case, but we found that on branchy code valgrind does exaggerate the branching cost. We could probably mitigate this by collecting more branching-related data and incorporating it into our reported numbers, to map more accurately to reality and to more recent CPUs.

>My probably naive intuition guesses that some low-level branchy code that valgrind thinks may be slower may run fine on a modern CPU (better branch predictor, deeper cache hierarchy). I'd expect false negatives to be rarer though - if valgrind thinks it's faster it probably is? What's your experience been like here?

Totally! We haven't encountered a false positive in the reports yet, but as you mentioned, since valgrind models an old CPU, it's likely to happen eventually. Even though the simulated cache is quite old, it still improves the relevance of our measurements. When we have some time, we'd really enjoy refreshing valgrind's cache simulation, since that would probably eliminate some edge cases and reflect memory accesses more accurately.


starlite was the original name; it was recently renamed to litestar due to comments about how easily "starlette" and "starlite" are confused.


data point in favor of the renaming: while reading GP, I assumed "starlite" was a typo for "starlette."


+1 for litestar[1]. The higher bus-factor is nice, and I like that they're working to embrace a wider set of technologies than just pydantic. The framework currently lets you model objects using msgspec[2] (they actually use msgspec for all serialization), pydantic, or attrs[3], and the upcoming release adds some new mechanisms for handling additional types. I really appreciate the flexibility in modeling APIs; not everything fits well into a pydantic shaped box.

[1]: https://litestar.dev/

[2]: https://github.com/jcrist/msgspec

[3]: https://www.attrs.org/en/stable/


If you like cattrs, you _might_ be interested in trying out my msgspec library [1].

It works out-of-the-box with attrs objects (as well as its own faster `Struct` types), while being ~10-15x faster than cattrs for encoding/decoding/validating JSON. The hope is that it's easy to integrate msgspec with other tools (like attrs!) rather than forcing the user to rewrite code to fit a new validation/serialization framework. It may not fit every use case, but if msgspec works for you it should generally be an order of magnitude faster than other Python options.

[1]: https://github.com/jcrist/msgspec

</blatant-evangelism>


This looks like exactly what I've been looking for. I just want strong typing, json <-> struct conversion, and validation. It seems to tick all the boxes, plus the speed benefits, which are always nice. I especially find it useful that I can use messagepack for internal service chatter but still support json for external stuff, and dump astuple to sqlite.


> Maybe it was very slow before

That is at least partly the case. I maintain msgspec[1], another Python JSON validation library. Pydantic V1 was ~100x slower at encoding/decoding/validating JSON than msgspec, which was more a testament to Pydantic's performance issues than msgspec's speed. Pydantic V2 is definitely faster than V1, but it's still ~10x slower than msgspec, and up to 2x slower than other pure-python implementations like mashumaro.

Recent benchmark here: https://gist.github.com/jcrist/d62f450594164d284fbea957fd48b...

[1]: https://github.com/jcrist/msgspec


Eeey hello :D

Eeh come on, I think it's a bit unfair to compare, because msgspec doesn't support regular python union types… which are the number one source of slowness… at least in my real-world use case. I've got hundreds of classes with abundant nesting and unions.

In pydantic v2 they did the same thing I've been doing in typedload for a few versions already: check the field annotated with a Literal and directly pick the correct type, rather than using trial and error. So union speed has gotten better.
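That Literal-dispatch trick can be sketched in plain Python (hypothetical classes; the real libraries build this lookup table automatically from the type annotations):

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class Cat:
    kind: Literal["cat"]
    lives: int


@dataclass
class Dog:
    kind: Literal["dog"]
    good: bool


# One-time lookup table from the Literal tag to the class,
# instead of try/except-ing every member of the union.
DISPATCH = {"cat": Cat, "dog": Dog}


def load_animal(data: dict):
    cls = DISPATCH[data["kind"]]  # O(1) pick, no trial-and-error
    return cls(**data)


print(load_animal({"kind": "dog", "good": True}))
```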

Even so, for being binary vs pure python, I'd have expected much more.


While it's definitely much faster than pydantic V1 (which is a huge accomplishment!), it's still not exactly what I'd call "fast".

I maintain msgspec (https://github.com/jcrist/msgspec), a serialization/validation library which provides similar functionality to pydantic. Recent benchmarks of pydantic V2 against msgspec show msgspec is still 15-30x faster at JSON encoding, and 6-15x faster at JSON decoding/validating.

Benchmark (and conversation with Samuel) here: https://gist.github.com/jcrist/d62f450594164d284fbea957fd48b...

This is not to diminish the work of the pydantic team! For many users pydantic will be more than fast enough, and is definitely a more feature-filled tool. It's a good library, and people will be happy using it! But pydantic is not the only tool in this space, and rubbing some rust on it doesn't necessarily make it "fast".


Thanks, glad you like it!


While I agree that there are ways to write a faster validation library in python, there are also benefits to moving the logic to native code.

msgspec[1] is another parsing/validation library, written in C. It's on average 50-80x faster than pydantic for parsing and validating JSON [2]. This speedup is only possible because we make use of native code, letting us parse JSON directly and efficiently into the proper python types, removing any unnecessary allocations.

It's my understanding that pydantic V2 currently doesn't do this (they still have some unnecessary intermediate allocations during parsing), but having the validation logic already in compiled code makes integrating this with the parser theoretically possible later on. With the logic in python this efficiency gain wouldn't be possible.

[1]: https://github.com/jcrist/msgspec

[2]: https://jcristharif.com/msgspec/benchmarks.html#benchmark-sc...


Definitely true. I've just soured on the POV that native code is the first thing one should reach for. I was surprised that it only took a few days of optimization to make my validation library significantly faster than pydantic, when pydantic was already largely compiled via cython.

If you're interested in both efficiency and maintainability, I think you need to start by optimizing the language of origin. It seems to me that with pydantic, the choice has consistently been to jump to compilation (cython, now rust) without much attempt at optimizing within Python.

I'm not super-familiar with how things are being done on an issue-to-issue / line-to-line basis, but I see this rust effort taking something like a year+, when my intuition is that some simpler speedups in python could have landed in a matter of days or weeks (which is not to say they would be of the same magnitude of performance gain).


Two things may preclude optimization in pure Python when producing a library for general public. Having a nice / ergonomic interface is one. Keeping things backwards-compatible is another.


I see that msgspec also uses native code to achieve the speed.

But the fact that it's faster than orjson (another native-code implementation) is cool.


It looks like pydantic-core is distributing musllinux wheels, which should work fine on alpine. Fwiw tooling like cibuildwheel makes building and publishing wheels for all the common platforms fairly straightforward now.

