Apache Flink is also a good alternative and works very well. We have used it in production for a while to generate live reports. I wrote a simple example [1]; have a look at the docs if you want more detail [2]. I'm definitely going to try Kafka's version too; its approach to stream processing [3] looks interesting as well.

[1] https://medium.com/@mustafaakin/flink-streaming-sql-example-...

[2] https://ci.apache.org/projects/flink/flink-docs-release-1.3/...

[3] http://docs.confluent.io/current/streams/index.html




I'm one of the authors of Kafka. I've outlined some differences between Flink's support for streaming SQL and KSQL in this Twitter thread - https://twitter.com/juliusvolz/status/902283513382051840

Here's a summary:

- KSQL has a completely interactive SQL interface, so you don't have to switch between DSL code and SQL.

- KSQL supports local, distributed, and embedded modes. It is tightly integrated with Kafka's Streams API and Kafka itself and doesn't reinvent the wheel, so it is simple to use and deploy.

- KSQL has no external dependencies for orchestration, deployment, etc.

- KSQL has native support for Kafka's exactly-once processing semantics and supports stream-table joins.
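
For readers unfamiliar with the term, a stream-table join enriches each incoming event with the latest value currently held in a table with the same key. The sketch below illustrates the idea in plain Python only; it is not KSQL or the Kafka Streams API, and the record layout and the name `stream_table_join` are assumptions made up for this example.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical record layout: (source, key, value), where source is
# "table" (an update to the table) or "stream" (an event to enrich).
Record = Tuple[str, str, str]


def stream_table_join(records: Iterable[Record]) -> Iterator[Tuple[str, str, str]]:
    """Inner-join each stream event against the latest table row for its key."""
    table: Dict[str, str] = {}
    for source, key, value in records:
        if source == "table":
            # Table updates are upserts: the newest value per key wins.
            table[key] = value
        elif key in table:
            # Emit the event together with the table value seen so far.
            yield key, value, table[key]
        # Stream events with no matching table row are dropped (inner join).


records = [
    ("table", "alice", "DE"),
    ("stream", "alice", "click"),
    ("stream", "bob", "click"),   # no table row for bob yet
    ("table", "alice", "FR"),     # later update changes the join result
    ("stream", "alice", "view"),
]
joined = list(stream_table_join(records))
```

Each event is joined against the table state at the moment it arrives, which is why the two events for the same key can pick up different table values.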


Disclaimer: I am one of the Flink committers.

While Flink has in fact no direct SQL entry point right now (and many users simply wrap the API entry points themselves to form a SQL entry point), the other statements are actually not quite right.

  - Flink as a whole (SQL sits directly on top of the DataStream API) also works in local, distributed, and embedded modes.

  - Flink does not have any external dependencies, not even Kafka/ZooKeeper; it is self-contained. One can even just receive a data stream via a socket if that works for the use case.

  - Flink itself has always had exactly-once semantics, and it also works exactly-once with Kafka.


@neha - where do you think Kafka is going to evolve in the world of data processing?

I'm very bullish on Kafka. Today we use Spark for batch data computation and have already switched some of our streaming workloads to Kafka.

Do you see yourselves entering the batch processing space anytime? Google has officially said that Flink is "compelling" because of its compatibility with the Beam model.

If I can step on thin ice... is it easier for Flink to commandeer Kafka or for Kafka to win over batch processing?


What do you mean by "batch processing"? Personally, I find that term confusing.

I believe if Kafka can do streaming then it effectively can do batch as batch is a subset of streaming.
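
To make the "batch is a subset of streaming" point concrete, here is a minimal sketch in plain Python. It is not Kafka or Flink code, and the function name `running_sum` is made up for illustration. The same incremental operator runs unchanged on a bounded input (a batch) and on an unbounded one; the batch answer is just the last value emitted once the input ends.

```python
import itertools
from typing import Iterable, Iterator


def running_sum(stream: Iterable[int]) -> Iterator[int]:
    """Streaming operator: emit the updated total after every element."""
    total = 0
    for x in stream:
        total += x
        yield total


# Bounded input ("batch"): the stream ends, so a final answer exists.
batch = [1, 2, 3, 4]
batch_result = list(running_sum(batch))[-1]

# Unbounded input: the stream never ends, so we only ever look at a prefix.
unbounded = itertools.count(1)  # 1, 2, 3, ...
first_four = list(itertools.islice(running_sum(unbounded), 4))
```

The sketch ignores what real batch engines add on top, such as scheduling and optimizations that rely on knowing the full input up front.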



