I haven't looked at the problem closely enough to answer, but could we start from the other direction: what makes you think that memory I/O would be the bottleneck?
From my limited understanding, we sequentially bring a large text file into L1 and then do a single read for each value. On most processors we can do two of these per cycle. The slow part will be bringing it into L1 from RAM, but sequential reads are pretty fast.
We then do some processing on each read. At a glance, maybe 4 cycles worth in this optimized version? Then we need to write the result somewhere, presumably with a random read (or two?) first. Is this the part you are thinking is going to be the I/O bottleneck?
I'm not saying it's obviously CPU limited, but it doesn't seem obvious that it wouldn't be.
Edit: I hadn't considered that you might have meant "disk I/O". As others have said, that's not really a factor here.
It's quite a bit more than that. The code discussed in the post alone is around 20 instructions, and there are a bunch of other concerns, like finding the delimiter between the name and the temperature, and the hashtable operations. All put together, it comes to around 80 cycles per row.
When explaining the timing of 1.5 seconds, one must take into account that it's parallelized across 8 CPU cores.
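To make the parallelism point concrete, here is a rough back-of-envelope sketch. Only the 1.5 seconds and the 8 cores come from this thread; the clock speed and row count are placeholder assumptions, and the function names are made up for illustration. It also assumes every core stays busy for the whole run.

```python
def core_seconds(wall_seconds: float, cores: int) -> float:
    """Total CPU time spent, assuming all cores stay busy for the whole run."""
    return wall_seconds * cores


def cycles_per_row(wall_seconds: float, cores: int, clock_hz: float, rows: int) -> float:
    """Average cycle budget per row implied by a parallel run."""
    return core_seconds(wall_seconds, cores) * clock_hz / rows


# 1.5 s and 8 cores are the figures quoted in the thread.
print(core_seconds(1.5, 8))  # total core-seconds of work across all cores

# ASSUMPTION: clock speed and row count are placeholders, not values from the thread.
ASSUMED_CLOCK_HZ = 3.0e9
ASSUMED_ROWS = 1_000_000_000
print(cycles_per_row(1.5, 8, ASSUMED_CLOCK_HZ, ASSUMED_ROWS))
```

The result swings a lot with the assumed clock speed, the row count, and whether hardware threads are counted as cores, so don't expect it to reproduce the 80-cycle figure exactly. The point is only that the per-row budget has to be computed from core-seconds, not wall-clock seconds.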
You are right. In my defense, I meant to say "about 4 cycles per byte of input" but in my editing I messed this up. I'd just deleted a sentence talking about the number of bytes per cycle we could bring in from RAM, but was worried my estimate was outdated. I started trying to research the current answer, then gave up and deleted it, leaving the other sentence wrong.
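For what it's worth, converting between the two units is a one-liner. This is just a sketch: the average row length is a placeholder assumption (nobody in the thread has given one), and the function name is made up.

```python
def cycles_per_byte(cycles_per_row: float, avg_row_bytes: float) -> float:
    """Convert a per-row cycle cost into a per-byte cost."""
    if avg_row_bytes <= 0:
        raise ValueError("avg_row_bytes must be positive")
    return cycles_per_row / avg_row_bytes


# ASSUMPTION: 16 bytes per row is a placeholder, not a measured value.
ASSUMED_AVG_ROW_BYTES = 16.0

# 80 cycles per row is the figure quoted in the reply above.
print(cycles_per_byte(80, ASSUMED_AVG_ROW_BYTES))
```

So a per-row figure and a per-byte figure can only be compared once the average row length is known.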