
Not "PQ falls over with 1Ks of items," but rather "the M language does not do well with accumulation patterns on tables; naive approaches can hit significant performance issues in the 1Ks of records and sophisticated approaches struggle with 100Ks."

These are two very different statements. I've happily used PQ to ingest GBs of data. Its streaming semantics are fine to great for some types of processing and introduce performance cliffs for others. There's no binary judgment to be made here. Laziness is neither a fundamental flaw nor an unmitigated good.

I've already shared one specific pattern above. I can share some mocked-up data if you need me to, but that might take a day or two. Also, feel free to reach out via email (in my profile).




>I've already shared one specific pattern above

If you mean this:

"Say you have a table of inventory movements and want instead a snapshot table of inventory at point in time"

Then I can make my own data to play with - I only want to be clear about the constraints. Would 500K records be enough to make the distinction between naive and non-naive approaches apparent? Can you quantify (not precisely) "struggle"?

I have used Table.Buffer, but I probably don't thoroughly understand its use yet.
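The way I've used it so far is just wrapping an upstream step so later steps don't re-stream the source - something like this (the path is made up):

    let
        Source = Csv.Document(File.Contents("C:\data\movements.csv")),
        Promoted = Table.PromoteHeaders(Source),
        // Table.Buffer materializes the table in memory once, so steps
        // that reference it repeatedly don't re-read the source
        Buffered = Table.Buffer(Promoted)
    in
        Buffered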

(I belatedly realized your problem is something I've done with SharePoint list history recently, but not with that many records, so I'm going to look for a public dataset to try)

P.P.S. I guess it also makes me think - I'm frequently getting my data from an Oracle database, so if something is easier done there, I'd do it in the SQL. Analytic functions are convenient.

P.P.P.S. Aha! I found a file of parking meter transactions for 2020 in San Diego, which is about 140MB and almost 2 million records. This seems like a good test because not only is it well over the number you said was problematic, but it's well over the number of rows you can have directly in one Excel sheet.

https://data.sandiego.gov/datasets/parking-meters-transactio...


Ok, I agree that PQ is slow. It is possible to calculate a running total of a column in a million-row table before the sun burns out, though.

I am very much not an algorithms person, but I got a huge speedup from a "parallel prefix sum" instead of the obvious sequential approach or the even worse N^2 one.

I translated this to M by rote and trial and error (page 2): https://www.cs.utexas.edu/~plaxton/c/337/05f/slides/Parallel...
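Roughly the shape of what I ended up with, in case anyone's curious - this is a from-memory sketch of the pairwise recursive scan over a plain list, not my exact code, and the function name is mine:

    PrefixSum = (values as list) as list =>
        let
            n = List.Count(values),
            result =
                if n <= 1 then
                    values
                else
                    let
                        // pair up adjacent elements: x{2i} + x{2i+1}
                        pairSums = List.Transform(
                            {0 .. Number.IntegerDivide(n, 2) - 1},
                            (i) => values{2 * i} + values{2 * i + 1}
                        ),
                        // keep the last element if the count is odd
                        tail = if Number.Mod(n, 2) = 1 then {List.Last(values)} else {},
                        // recurse on the half-length list of pair sums
                        inner = @PrefixSum(pairSums & tail),
                        // odd positions come straight from the recursion;
                        // even positions are the previous pair sum plus x{j}
                        expanded = List.Transform(
                            {0 .. n - 1},
                            (j) =>
                                if Number.Mod(j, 2) = 1 then
                                    inner{Number.IntegerDivide(j, 2)}
                                else if j = 0 then
                                    values{0}
                                else
                                    inner{Number.IntegerDivide(j, 2) - 1} + values{j}
                        )
                    in
                        expanded
        in
            result

Each level pairs adjacent elements and recurses on a list half the size, so you never re-sum a prefix from scratch the way the N^2 version does.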

Implementing the parallel, recursive solution got me a million rows in about three and a half minutes.

Fill down (which I had to do anyway to compare) was about 10 seconds.

So...probably not the first choice in this scenario but could be worse?



