
At my favorite tech interview we debugged a real prod issue they had a few months before. The interviewer spent a few minutes sketching the basic architecture of the system on the whiteboard and then started with the customer complaint:

"Sometimes users stop getting chat updates, refreshing the page fixes it."

From there I explained my debugging steps, and he acted as an oracle whenever I took an action:

Me: "Have messages been lost or does a page refresh always fix it?"

"Messages haven't been lost"

Me: "I'd check our logs for any obvious errors"

"Nope, everything appears normal"

...

Me: "What sort of logging do we have with the websocket vendor?"

"They have a live console but don't provide any persistent logs"

Me: "Can we scrape that to get logs we can correlate to the errors?"

"We did that, didn't find any errors around the time a user had an issue"

....

Me: "Can we try X Y Z to reproduce?"

"When we did that we discovered that the disconnect only happens after a user has opened a navbar menu."

... "As it turns out there was a click handler on all navbar buttons that disconnected from the websocket. The buttons used to directly link to different pages, now some of them had submenus and opening that submenu caused chat to hang."
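For anyone who wants to picture the bug, here is a rough sketch of how a handler like that could go wrong. It is not the actual code from the interview; every name here (ChatSocket, NavButton, buggyClick, fixedClick) is made up for illustration, and the socket is a plain object rather than a real websocket client.

```typescript
// Hypothetical model of the bug; none of these names come from the real system.
interface ChatSocket {
  connected: boolean;
  disconnect(): void;
}

interface NavButton {
  label: string;
  // true if clicking leaves the page, false if it only opens a submenu
  navigates: boolean;
}

function makeSocket(): ChatSocket {
  return {
    connected: true,
    disconnect() {
      this.connected = false;
    },
  };
}

// Buggy handler: written back when every navbar button linked to another page,
// so disconnecting on every click was harmless.
function buggyClick(_button: NavButton, socket: ChatSocket): void {
  socket.disconnect();
}

// One possible fix: only disconnect when the click actually leaves the page.
function fixedClick(button: NavButton, socket: ChatSocket): void {
  if (button.navigates) {
    socket.disconnect();
  }
}

const submenuButton: NavButton = { label: "Settings", navigates: false };

const buggySocket = makeSocket();
buggyClick(submenuButton, buggySocket);

const fixedSocket = makeSocket();
fixedClick(submenuButton, fixedSocket);

console.log(buggySocket.connected, fixedSocket.connected); // prints: false true
```

With the buggy handler, opening a submenu drops the connection while the user stays on the page, which matches the symptom: chat hangs until a refresh reconnects.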



