
You can process streams of many millions of JSON objects pretty much as fast as the IO can feed them... most of the time, in situations like this, you're constrained by IO speed, not CPU... assuming you're working with a stream that has flow control.

I tend to dump data structures out as line-terminated JSON, and it works really well for streams; you can even gzip it almost transparently. Parse/stringify has never been the bottleneck... it's usually memory (to hold all the objects being processed, unless you block/push back on the stream), or IO (the feed/source of said stream can't keep up).
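A minimal sketch of that pattern in TypeScript on Node (hypothetical file name events.ndjson.gz, one JSON object per line): the gunzip step is just another pipe, and awaiting each line gives you flow control for free.

    // read a gzipped newline-delimited JSON stream, one object at a time
    import { createReadStream } from "node:fs";
    import { createGunzip } from "node:zlib";
    import { createInterface } from "node:readline";

    async function main() {
      const lines = createInterface({
        input: createReadStream("events.ndjson.gz").pipe(createGunzip()),
        crlfDelay: Infinity,
      });

      let count = 0;
      for await (const line of lines) {
        if (line.trim() === "") continue;   // tolerate blank lines
        const obj = JSON.parse(line);       // one object per line
        count++;                            // ...process obj here
      }
      console.log(`processed ${count} objects`);
    }

    main().catch(console.error);

The for await loop keeps memory flat: the underlying stream stays paused while each object is handled, so you only pay for the objects currently in flight.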




Even if printing and parsing are computationally cheap, memory allocation is less so.

If you expose JSON, each serialize/deserialize round trip produces new instances of the objects, holding the same data.

The architecture of PowerShell means commands in the pipeline can process the same instances, without duplicating them.
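A toy illustration of the difference (TypeScript, hypothetical object): a JSON round trip always allocates a fresh copy, while handing the object itself to the next stage does not.

    const original = { name: "report.txt", size: 1024 };

    // serialize/deserialize: a new instance holding the same data
    const copy = JSON.parse(JSON.stringify(original));
    console.log(copy === original);             // false - separate allocation

    // passing the live object to the next pipeline stage: same instance
    const stage = (o: typeof original) => o;
    console.log(stage(original) === original);  // true - no duplication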

Another good thing about passing raw objects instead of JSON: live objects can contain stuff that is expensive or impossible to serialize, like an OS handle to an open file. Sure, with JSON you can pass file names instead, but that means commands in your pipeline need to open/close those files themselves. Not only is this slower (opening a file requires a kernel call, which in turn does various security checks for the user's group membership and the file system's inherited permissions), it can even cause sharing-violation errors when two commands in the pipeline try to access the same file.
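A minimal sketch of the handle case (TypeScript on Node, hypothetical stage functions and file name): the file is opened once, permission-checked once, and every stage reuses the same handle instead of reopening by name.

    import { open, type FileHandle } from "node:fs/promises";

    // each "pipeline stage" receives the already-open handle
    async function countBytes(fh: FileHandle): Promise<number> {
      return (await fh.stat()).size;
    }

    async function firstChunk(fh: FileHandle): Promise<string> {
      const { buffer, bytesRead } =
        await fh.read({ buffer: Buffer.alloc(256), position: 0 });
      return buffer.toString("utf8", 0, bytesRead);
    }

    async function main() {
      const fh = await open("report.txt", "r"); // one kernel open, one ACL check
      try {
        console.log(await countBytes(fh));
        console.log(await firstChunk(fh));      // same handle, no second open
      } finally {
        await fh.close();
      }
    }

    main().catch(console.error);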


And just think how much faster it would be if there were no serialization and I/O involved at all...


And my method can work over distributed systems with network streams... There are advantages to streams of text.


What advantages?

PowerShell works over networks just fine, thanks to the standardized ISO/IEC 17963:2013 protocol, a.k.a. WS-Management.



