Hacker News new | past | comments | ask | show | jobs | submit login
Why is printing “B” dramatically slower than printing “#”? (2014) (stackoverflow.com)
324 points by retox on Sept 8, 2016 | hide | past | favorite | 54 comments



Best comment:

"The real answer is clearly because hashes are faster than b trees."


This reminds me of a bug I had back in ‘98 or so. I was using a Parallax BASIC stamp and could not understand why when I put it to sleep it would sometimes using only 30 uA, and other times 300uA. I kept narrowing down my code until it was utter nonsense, and finally I made it to where if I removed or added a byte of code it would toggle between the two.

I got on the phone with the engineer responsible for the firmware and sent him my code, but I never did find out what the problem was (sorry for the letdown, maybe someone at Parallax remembers and reads Hacker News?). They acknowledged it as a bug, and made a fix. Unfortunately the fix made was to always draw 300uA at sleep!

Fast forward 18 years (!) and from their website it looks like their latest modal draws 50uA, so it was a happy ending after all. The end.


I love it when somebody immediately knows the answer to something seemingly obscure.


Sometimes I think these guys know something obscure, and then post a question about it just to answer it themselves.

SO actually allows this, Q and A from the same account. Some people use sock-puppets though.


I've done the self-answer thing a couple of times. Sometimes you just know that someone is going to run into the same problem, but you'll never see the question, and you can preempt it.


I think I've posted on StackOverflow/ServerFault three times in my life. If I've hit the point where I'm posting, I'm desperate and I've probably already spent 8+ hours scouring Google and trying everything I can think of.

If I do end up figuring out a solution, I definitely post it, and I'm glad that SO allows this, otherwise there would still be no answer to some of these questions online... One of these issues previously only turned up a single bash mailing list post from a decade ago with no resolution. Now searching for a few relevant search terms turns up my answer/solution in the top few results for all of them.

I don't care about imaginary internet points, I just figured I'd help the next guy out.


Some of these have been absolute God sends for me.

You have a cross between misinterpreting what something means, on obscure hardware, with a useless error message and you are normally stuck for days.


If I resolve my question before someone else in the community I always self answer to help the next guy.


More than once I've solved the problem by the time I am 80% of the way through formulating a question. As long as it's not something trivial I finish and post the answer immediately.


Not only is this allowed, it is encouraged.

Why would people use sock-puppets, I don't know. It seems that the SO team has special ways to detect voting frauds.


People use sock puppets to do this because although you're correct that it's with the SO guidelines, it feels a bit off, and not uncommon for it to get pushback from those who either don't agree with the guidelines or are not aware of them.

Also, while voting fraud is against the guidelines and actively countered, creating sock puppet accounts is not.


> Also, while voting fraud is against the guidelines and actively countered, creating sock puppet accounts is not.

You're right. For anyone interested here is the relevant post: http://meta.stackexchange.com/q/57682/297429


Doesn't that say that "Answering your own questions with the other account(s)" is one of the things that a sock puppet account should not be used for?


Yes, but the parent comment said "creating sock puppet accounts", which is okay because there might be legitimate uses.


You use a sock-puppet to ask the question, and your real account to answer it. Then you can look smart.


Sometimes the act of writing out a question is enough to spur the right ideas to solve the problem. It's the same as whiteboarding or asking to use someone as a sounding board.


Also known as rubber duck debugging.


Still if it helps others this is cool interesting thing.


He must had a hard time figuring it out when he first encountered the problem..


Or he's written a terminal that had to figure out how to do that functionality and remembered how it worked.


According to the comments on TJ's post, this would be correct.

http://stackoverflow.com/questions/21947452/why-is-printing-...

> There are some answers that come from hard learned experience. T.J. and I (since we are friends) grew up in the days of the Apple ][ and the zx80/81. There was no built in word wrap back then. So we both ended up writing our own — more than once. And those lessons stick with you, they get burned in to your lizard brain. But if you leaned to code after that, when your environment word wraps everything, or you do it manually prior to runtime, it is harder to run into the issues with word wrap. – JockM


100% I am the guy who wrote that comment. Sometimes there are benefits to getting older ;)


I am amused when someone comes up with obscure problems. I am disappointed when that person doesn't brief about the scenario that led them to such a problem.


This is a great example of a "10x engineer". Someone less experienced would take 10x or 100x the time to investigate and resolve this problem.


No, it just means someone has encountered that particular problem in the past.


The 10X immediately knows the answer. The 100X immediately knows the answer, but asks "That is an interesting observation and I'd really like to know why too. But is writing 1 million characters to the console essential to the problem your solving?"


Indeed.

Then the 1000x engineer explains:

"You are confused because you do not understand Tao. Only a fool expects rational behavior from his fellow humans. Why do you expect it from a machine that humans have constructed? Computers simulate determinism; only Tao is perfect.

The rules of programming are transitory; only Tao is eternal. Therefore, you must contemplate Tao before you receive Enlightenment."

"But how will I know when I have received Enlightenment?" asks the 100x engineer.

"Your program will run correctly," replies the 1000x engineer.


Meanwhile, the 10kx engineer sits here on HN, doing nothing while requirements change again and again - waiting to implement the thing when the requirements finally settle.


It's really an example that we should teach people to use a sampling profiler. anyone can answer this question in a minute or so, no speculation necessary, if they just know what tool to use.


Except the insight of the answer is that the bottleneck is in a different program than suspected. Running the suspect program under a profiler wouldn't yield any answers.

Sure, anyone can find the answer if they already know where to look.


Running the suspect program would show blocked states; even without that, you would simply see no difference, conclude that the problem must lie elsewhere, and switch to looking at a full system profile.

FWIW, I always start with a full system profile including all thread states when debugging mysterious performance issues, precisely because "the problem is in another process" is such a common occurrence. This is also why it's important to use a sampling profiler specifically.


Heh, reminds me of a bug I ran into a couple of weeks ago. Users on Wikipedia were complaining that typing into the editor <textarea> on certain pages was sluggish. I eventually narrowed it down to the presence or absence of an invisible left-to-right or right-to-left mark control characters elsewhere in the page. See https://bugzilla.mozilla.org/show_bug.cgi?id=1296050 for the gory details.


My guess was going to be Unicode, the reason why 'sort' on Linux can be so slow if you're in certain locales.

https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/846...


So, in effect, with word-wrapping on in the terminal, this scenario effectively produces a "Shlemiel the painter's algorithm" [0] in that for every character output on the same line (when it exceeds the line length), you have go backtracking in vain to find a place to break - where this is none - so outputting N characters on a line is basically an O(N^2) operation.

0. http://www.joelonsoftware.com/articles/fog0000000319.html


I wonder if this is also why the default terminal in Ubuntu slows to a crawl if you write a ~4kb long line (I was working on a program which was processing some SDR data, outputting 0/1 to the screen without any line breaks).


Probably that or some other case of selecting an algorithm without consideration of the /worst possible/ performance cases.

It may be better to try a generally good algorithm first, then if it fails after X iterations to fall back to an algorithm with a better bound on the worst case.


Maybe it's just not reflected in the question, but did the asker not immediately try other characters to see if there's a pattern? Given the answer, other letters would probably have had the same result and other characters would not. They could have at least narrowed the problem down to letters.


While we're talking about obscure speed differences, I recently needed to traverse large directories and collect files with a certain suffix and found out that using "find" as a subprocess and parsing its output line-by-line was orders of magnitudes faster than any direct directory traversal I wrote and any other method I've tried, including other Unix utilities.

I don't know how it works but it's insanely fast.


The find utility is probably smarter than the other methods. I know two things to watch out for: sorting the directory entries and calling stat on each entry.

You might like to read the rationale section of PEP 471: https://www.python.org/dev/peps/pep-0471/


So printing B isn't slower: terminal rendering of lines of text is just less obvious than thought.


Any time I'm seeing really strange results involving the terminal, I try to look for terminal-related weirdness. There is a lot of stuff going on that people generally ignore or simply don't know about, because it generally just works.

And when it doesn't, if you're not at least a little familiar with what's going on, it can be a deeply confusing and obscure little world.


I don't know if it's still valid, but SBCL has a readme entry indicating that building in xterm is much faster than building in gnome-terminal. SBCL's build output is incredibly noisy and the compiler is quite fast, so the combination of the two makes for an incredible rater of output to the terminal.


I remember how confusing IO was in college. Even the implicit flush requirement to show stuff. Utter dread. Few years ago I read an old lisp book (something written before Common Lisp), they don't touch IO until the end, for transparency/bootstraping reasons, and since it's insignificant for thinking it's a straightforward model without fancy features, I wish people taught CS this way.


For what it's worth, the difference still exists when printing with Netbeans 8.1 - 41 seconds for 'B' and 2 seconds for '#'


That also has to do with line-wrapping of word characters.


A lot of comments previously on HN here:

https://news.ycombinator.com/item?id=7294186


My first guess was that for most fonts the glyph "B" has splines, while "#" does not (only segments) and in line of principle rendering splines is more expensive than rendering segments. But probably this is nowhere that relevant.


According to my understanding, the rendered character is cached by the OS, so this probably shouldn't have any effect.


That was my first thought too, but I expect glyphs will probably be cached for a while after being rendered, which should make any difference in rendering speed negligible.


A good first step for investigating this would have been to redirect output to a file.


Let's talk about any potential application this could have.

The first thing I can think of is that if you're writing a flood-filling algorithm (finding connected components) you should fill with #s instaed of Bs, even if filling with Bs is more conventional. Specially if you're doing competitive programming and need to overkill the optimization process in order to pass the run time.

As for production coding, I can't think of anything. Any ideas?

BTW, this is probably speculation, but it's still worth discussing in my humble opinion.


Things like printing out progress bars for long operations stand out here. Lots of programs have an option to do such as a quick and dirty way to give feedback.


While stuck at a meeting today I learned you can type faster than notepad can render by making the text really big. (AltSpace X, Alt O, F, S, 600, Enter, spam buttons) The text lags just slow enough so you can use it as a ghetto marquee.


Because # is only straight lines, but for b you gotta do the whole bezier curve to make it look right.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: