Hacker News | apurvamehta's comments

From the very same blog post:

> Is this Magical Pixie Dust I can sprinkle on my application?

> No, not quite. Exactly-once processing is an end-to-end guarantee and the application has to be designed to not violate the property as well. If you are using the consumer API, this means ensuring that you commit changes to your application state concordant with your offsets as described here.

I think that is a pretty clear statement that end-to-end exactly-once semantics don't come for free. It states that additional application logic is needed to achieve this, and it also specifies what must be done.
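To make "commit changes to your application state concordant with your offsets" concrete, here is a minimal sketch of one way to do it: store the next offset in the same database transaction as the state change, so a redelivered message can be detected and skipped. This is not Kafka's API. SQLite stands in for the application's store, and the table names and the helpers `open_store`, `next_offset`, and `apply_message` are hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the (hypothetical) application store holding both state and offsets."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS totals (key TEXT PRIMARY KEY, total INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS offsets (part_id INTEGER PRIMARY KEY, next_offset INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def next_offset(conn, part_id):
    """Return the first offset not yet applied for this partition (0 if none)."""
    row = conn.execute(
        "SELECT next_offset FROM offsets WHERE part_id = ?", (part_id,)
    ).fetchone()
    return row[0] if row else 0


def apply_message(conn, part_id, offset, key, amount):
    """Apply one message and record its offset atomically.

    Returns False if the message was already applied (a replay).
    """
    with conn:  # one transaction: commits on success, rolls back on error
        if offset < next_offset(conn, part_id):
            return False  # duplicate delivery, state already reflects it
        cur = conn.execute(
            "UPDATE totals SET total = total + ? WHERE key = ?", (amount, key)
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO totals (key, total) VALUES (?, ?)", (key, amount)
            )
        conn.execute(
            "INSERT OR REPLACE INTO offsets (part_id, next_offset) VALUES (?, ?)",
            (part_id, offset + 1),
        )
    return True


# Simulated deliveries as (part_id, offset, key, amount); offset 1 arrives twice.
conn = open_store()
deliveries = [(0, 0, "a", 5), (0, 1, "a", 7), (0, 1, "a", 7), (0, 2, "b", 1)]
applied = [apply_message(conn, *d) for d in deliveries]
totals = dict(conn.execute("SELECT key, total FROM totals"))
```

With a real consumer, the same idea would presumably mean turning off automatic offset commits and seeking to the stored offset on startup; that wiring is left out here.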


Right. But that's at the end of a separate article, while the post which this HN discussion is about throws around the words "exactly once" a lot more casually. The argument is over the use of the words "exactly once". They should just refer to the feature as "transactions" or "idempotent producer".


Personally, I would welcome Aphyr trying to break the Kafka EOS guarantees. Like all software, there will be bugs, and the exposure will only make it stronger and more viable.

I write as one of the implementors of exactly once guarantees in Kafka.


Points 1 & 2. There is no direct communication between a producer and consumer. The producer writes to the broker, the consumer reads from the broker. There is a detailed flow diagram for the producer side operations in the article, and this deck has more of the details of how both the producer and consumer work: https://www.slideshare.net/apurva2/introducing-exactly-once-...

3. Yes, it has been tested empirically. Quoting from the article:

> We wrote over 15,000 LOC of tests, including distributed tests running with real failures under load and ran them every night for several weeks looking for problems. This uncovered all manner of issues, ranging from basic coding mistakes to esoteric NTP synchronization issues in our test harness. A subset of these were distributed chaos tests, where we bring up a full Kafka cluster with multiple transactional clients, produce message transactionally, read these messages concurrently, and hard kill clients and servers during the process to ensure that data is neither lost nor duplicated.


We did our experiments directly on hardware. I don't think that AWS VMs simulate multiple physical sockets. If they don't, then this article will not apply to them.


That would be true if we were using C++. Unfortunately, all our code is in Scala and we use Java NIO libraries to memory map our files. AFAIK, they don't give us the option of using these POSIX calls.


Cassandra binds to posix_fadvise to do exactly this when writing out new SSTables:

https://github.com/apache/cassandra/blob/trunk/src/java/org/...


Wow, that's great to know. We will definitely investigate this approach. Thanks for sharing! :)


Thanks, the 400% number is wrong. It was a last-minute edit; I should learn not to do that. I have updated the post to say that the error rates have dropped by 1/4th.


Yikes. I meant that they dropped TO 1/4th the original.


This is exactly right :)


Then it dropped by 75%.
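The "by 1/4th" versus "to 1/4th" distinction is easy to check with arithmetic. A small sketch, using a made-up starting error count of 1000 (the real figures are not given in the thread); `relative_drop` is a hypothetical helper:

```python
def relative_drop(before, after):
    """Fraction by which a value fell, e.g. 0.75 means a 75% drop."""
    return (before - after) / before


before = 1000.0  # hypothetical error count, for illustration only

# "Dropped TO 1/4th the original": a quarter of the errors remain.
drop_to_quarter = relative_drop(before, before / 4)

# "Dropped BY 1/4th": three quarters of the errors remain.
drop_by_quarter = relative_drop(before, before - before / 4)

print(f"to 1/4th -> {drop_to_quarter:.0%} drop")
print(f"by 1/4th -> {drop_by_quarter:.0%} drop")
```

A drop can never exceed 100%, which is why "400%" could not have been right; a fall to a quarter of the original is a 4x reduction, or a 75% drop.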


Yes. It was a blunder. The post has been updated to reflect this.


Hi, post author here.

> Also, is there a reason not to use large pages directly for the mmap'd sets if you know you're going to have them hot at all times? (I assume they read the entire file on start?)

We could use large pages directly. But, as I mentioned in the article, the performance gains would be negligible compared to the gains that come from having things in memory in the first place. These are not very large memory systems, and the page table / TLB miss overhead doesn't seem to be biting us. We are just following the mantra 'premature optimization is the root of all evil' :)


In my experience, most people don't know they have TLB problems because, effectively, it's always bad.

It's only when you start getting to the metal to see what your hardware is actually capable of that the TLB stands out as a glaring source of inefficiency.

Put another way: yeah, the TLB is making your app slow, but it's doing so always, so you don't notice. Instead, you mistakenly think your hardware is just slower than it really is.


> $1,200/month or less gets you your own private room in a shared apartment in NYC, Boston, or just about anywhere else.

$1400/month gets a one-bed apartment in Mountain View. I used to pay $700/mo for a private room and bathroom in a 2-bed apartment in Mountain View. Those prices seem overhyped.


I feel like this is overkill. I have been using RailsReady (https://github.com/joshfng/railsready) to set up several Mac and Ubuntu boxes and it has never been more than one click.


To be fair, he said he didn't necessarily want to duplicate the effort of others. I'm sure he will garner any information he can from those projects to help make that app awesome.

