Apache Beam for Search: Getting Started by Hacking Time (shopify.engineering)
87 points by clandry94 on Jan 8, 2021 | 8 comments



It would be helpful if an example this thorough could live on the Apache Beam website; I think it would avoid a lot of confusion. I certainly found the way it handles windowing with triggers to be quite different from, say, Spark.


Yeah, I agree. I basically wrote this blog post out of my own challenges learning this content. It involved a lot of code spelunking and trial and error to figure out precisely what these concepts meant. I do find Beam powerful, but also a bit esoteric at times, and it's difficult to follow how watermarks, windows, and triggers all work. And we sometimes encounter unexpected behavior, which frequently causes us to revise our understanding of these concepts.

There are a fair number of highly voted Stack Overflow answers out there along the lines of "I dunno, try this trigger, see if it works", without much understanding of how everything works underneath. Probably because it's tricky to grok.


I'm not even sure the core Beam engineers understand it all! Look at how Kafka offset acks are handled now:

https://github.com/apache/beam/blob/master/sdks/java/io/kafk...


This seems to be a general problem with many projects under the Apache Software Foundation. You look at the landing page and end up not figuring out what the project is or what it's for. This seems to be especially true with projects in the whole "big data"/"stream processing" sphere.


The link for the Apache Beam project has the hostname and domain transposed. The correct URL is: https://beam.apache.org/


Author here, thanks. I'll get it fixed.


You got it fixed in no time at all! :)


Does it support data lineage?



