Deep Learning Interviews book: Hundreds of fully solved job interview questions (github.com/boltzmannentropy)
582 points by piccogabriele on Jan 10, 2022 | 147 comments



I have been working as an ML Engineer for a few years now and I am baffled by the bar to entry for these positions in the industry.

Not only do I need to perform at the Software Engineer level expected for the position (with your standard leetcode-style interviews), but I also need to pass extra ML-specific (theory and practice) rounds. Meanwhile the vast majority of my work consists of getting systems production-ready and hunting bugs.

If I have to jump through so many hoops when changing jobs I'll seriously consider a regular non-ML position.


There's a bunch of gatekeeping to get into ML. Part of it is that ML people don't want non-ML people to know just how much of what they do is drudgery and how little of it is exciting math, or to face competition from people with similar skills. And those roles come with a lot of prestige.

I went through all that and am a SWE again instead of an ML engineer. The one thing I learned from all that? "The very best models are distilled from postdoc tears".


Getting state of the art performance in ML requires a lot of intuition about equations though. I've seen some of the top ML engineers work at Google; they all have a really good understanding of math, how formulas translate into measurable results, etc. An ML education or research background seems less important: if you have that intuition from studying physics or math or anything else, it still translates.

I feel the biggest problem for people without an ML background is that you'd think "I don't know what I'm doing, I can't get hired for this job!", but the fact is that people with ML backgrounds mostly don't know what they are doing either. They just get standard results by applying standard libraries; any programmer with some math skills could do the same. It is no harder than learning a frontend or backend framework, people just think it would be harder, so they lack confidence about it. There are some gotchas you have to learn, but there are a lot of gotchas in both backend and frontend as well.


And the same can be said of non-ML IT! You always have better perspective when you understand the whole history behind why you write something a certain way, even if you could just learn it on the job by seeing it over and over. It's like how they teach proper sorting by giving you all the bad ways first.

Also, it's not often but you do have to show creativity at times, to solve a new problem or something, and having an intuitive theoretical understanding goes a long way vs someone who learned via base mimicry.

I think instead of gatekeeping we could build bridges: be very clear that salary / responsibilities will be lower at first and judge on results. If an ML person is brilliant, he won't be threatened by an idiot Java dev. And if a Java dev is able to produce good results, even if the way he reached them is less graceful, then an ML engineer should probably start shifting into second gear :D


Sure, I've been doing that "intuitions about equations" thing since 1993 (my undergrad thesis was on using gradient descent to train the weights of a dynamic programming algorithm that found E. coli genes). I generally agree that to be a top ML researcher, you need those skills in excess of the average (I worked with quite a few of those people at Google). To do state of the art work? Mostly hard work, lucky guesses, lots of compute power, and a huge support apparatus to make rapid experimentation easier.

But the vast majority of people working in ML don't need that. Sadly, most of the work I did for one of the world's most powerful machine learning systems was literally computing frequencies and then sorting by the frequency, so features that were more common were encoded in smaller varints, saving lots of disk space.
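
Roughly the idea, as a toy sketch (not the actual system; the feature names and details here are made up): count feature frequencies, give the most common features the smallest IDs, and the small IDs then encode to the fewest varint bytes.

    from collections import Counter

    def varint(n):
        # LEB128-style varint: 7 bits per byte, high bit means "more bytes follow".
        out = bytearray()
        while True:
            byte = n & 0x7F
            n >>= 7
            if n:
                out.append(byte | 0x80)
            else:
                out.append(byte)
                return bytes(out)

    features = ["f3", "f1", "f1", "f2", "f1", "f2"]   # hypothetical feature stream
    # Most frequent features get the smallest IDs, hence the shortest varints.
    ids = {f: i for i, (f, _) in enumerate(Counter(features).most_common())}
    encoded = b"".join(varint(ids[f]) for f in features)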


I agree with you, but I also wish more ML people knew more math. Though I think there's a difference between research and production (I'm in research).


I, on the other hand, am baffled by the lack of basic engineering skills of most ML workers at my company. Their code is unmaintainable, and they seem to lack the usual problem-solving skills I look for in software engineers.

Then again, my company's business model leads to terrible hires anyway.


This is the same for any specialized software engineering role. Compilers, GPGPU, embedded systems, computer graphics, image processing, etc. In an interview panel for any of these roles, you will be expected to be a competent software engineer and have domain knowledge about the sub-field.


> ... I'll seriously consider a regular non-ML position.

What about asking for more money at the end? A complex multi-stage interview process eliminates more candidates. Some, like you say, will opt for a developer gig instead, probably because ML wasn't something they were interested in to begin with. That narrows down the list of candidates even more. Either "play the game" and ask for more money, or don't play the game at all. Let employers pay extra for polished candidates.


If what I want is money I think I'm better off getting competing offers as a regular Software Engineer and pumping the numbers.


As an "ML Engineer," this is both true and very funny


And most likely will be paid <= software-engineers


Just like Software Engineers have to pass a leetcode round and a system design round. I also doubt that ML theory and practice is much harder than system design (theory and practice). Besides that, you get paid more than a Software Engineer.


I can tell you we are not paid more than Software Engineers, but that might be only my company.

One startup asked me this. They gave me a very vague problem statement, and in 2 days I had to find a couple of recent articles relevant to the problem and prepare a presentation explaining my solution and justifying my decisions.


This book has fun problems! Example:

During the cold war, the U.S.A developed a speech to text (STT) algorithm that could theoretically detect the hidden dialects of Russian sleeper agents. These agents (Fig. 3.7), were trained to speak English in Russia and subsequently sent to the US to gather intelligence. The FBI was able to apprehend ten such hidden Russian spies and accused them of being "sleeper" agents.

The Algorithm relied on the acoustic properties of Russian pronunciation of the word (v-o-k-s-a-l) which was borrowed from English V-a-u-x-h-a-l-l. It was alleged that it is impossible for Russians to completely hide their accent and hence when a Russian would say V-a-u-x-h-a-l-l, the algorithm would yield the text "v-o-k-s-a-l". To test the algorithm at a diplomatic gathering where 20% of participants are Sleeper agents and the rest Americans, a data scientist randomly chooses a person and asks him to say V-a-u-x-h-a-l-l. A single letter is then chosen randomly from the word that was generated by the algorithm, which is observed to be an "l". What is the probability that the person is indeed a Russian sleeper agent?


Bayes rule with odd ratios makes it pretty easy.

    base odds: 20:80 = 1:4  
    relative odds = (1 letter/6 letters) : (2 letters / 8 letters) = 2/3  
    posterior odds = 1:4*2:3 = 1:6  
    Final probability = 1/(6+1) = 1/7 or roughly 14.3%
Bayes rule with raw probabilities is a lot more involved.


I don't know about "a lot more". It is essentially the same calculation without having to know 3 new terms. Let:

A = the event they are a spy
B = the event that an l appears

And ^c denote the complement of these events. Then,

P(A) = 1/5

P(A^c) = 4/5

P(B|A) = 1/6

P(B|A^c) = 1/4

P(A|B) = P(B|A)P(A)/P(B)

By law of total probability,

P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)

Which is very standard formulation and really just your equation as you can rewrite everything I have done as:

P(A|B) = 1/(1 + P(B|A^c)P(A^c)/P(B|A)P(A))

Which is the base odds, posterior odds, and odds-to-probability conversion all in one. The reason this method is strictly better, in my opinion, is that the odds formulation breaks down as soon as we introduce a third type of person who doesn't pronounce l's. Also, after doing one homework's worth of these problems, you just skip to the final equation, in which case my post is just as short as yours.


Hmm, a bit more involved maybe, but not that much. But your calculation sure seems short.

With S = sleeper, and L = letter L, and remembering "total probability":

   P(L) = P(L|S)P(S) + P(L|-S)P(-S), 
(where -S is not S), we have by Bayes

   P(S|L)
 = P(L|S) P(S) / P(L)
 = P(L|S) P(S) / (P(L|S)P(S) + P(L|-S)P(-S))
 = 1/6 * 1/5 / (1/6*1/5 + 1/4*4/5) 
 = 1/30 / (1/30 + 6/30) 
 = 1/7
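
For anyone who wants to sanity-check numerically, a quick plain-Python sketch:

    # Toy check of the sleeper-agent posterior via Bayes' rule.
    p_spy = 0.20                  # prior: 20% of guests are sleeper agents
    p_l_given_spy = 1 / 6         # "voksal" has one 'l' out of 6 letters
    p_l_given_american = 2 / 8    # "Vauxhall" has two 'l's out of 8 letters

    p_l = p_l_given_spy * p_spy + p_l_given_american * (1 - p_spy)
    print(p_l_given_spy * p_spy / p_l)   # 0.142857..., i.e. 1/7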


Odds are usually represented with a colon -- the base odds are 1:4 (20%), not 1/4 (25%).


Assuming that the algorithm is 100% accurate!


I was also distracted by the fact that you can't (usually) hear the difference between English words written with one 'l' and those with two consecutive 'l's.

"Voksal" and "Vauxhall" seem like they should each have six phonemes.


Likewise, in the military, countersigns have been designed to make non-native speakers stand out, should the countersign be compromised. For example, in WW2, Americans would use "Lollapalooza", as Japanese speakers really struggled with that word.


That's more of a shibboleth than a secret, which is literally a practice as old as the Bible - "And the Gileadites took the passages of Jordan before the Ephraimites: and it was so, that when those Ephraimites which were escaped said, Let me go over; that the men of Gilead said unto him, Art thou an Ephraimite? If he said, Nay; Then said they unto him, Say now Shibboleth: and he said Sibboleth: for he could not frame to pronounce it right. Then they took him, and slew him at the passages of Jordan: and there fell at that time of the Ephraimites forty and two thousand."


If you combine a shibboleth with a secret, what do you call it? Shibbecret?


Hmm, I'd think that in a rhotic accent a word like "furlstrengths" or "fatherlands" would work better? In Japanese they sound like [ɸɯ̟ɾɯ̟ɾɯ̟sɯ̟tɯ̟ɾiĩsɯ̟] or [haɾɯ̟sɯ̟tɯ̟ɾiĩsɯ̟] and [hazaɾɯ̟randozɯ̟] respectively, rather than the native [fɚɹłstɹiŋθs] or [fɚɹłstɹiŋkθs] and [faðɚlændz]. Adjacent /rl/ pairs are a special challenge, there are multiple unvoiced fricatives that don't exist at all in Japanese, and consonant clusters totally violate Japanese phonotactics to the point where it's hard for Japanese people to even detect the presence of some of the consonants. By contrast Japanese [ɾaɾapaɾɯ̟za] is only slightly wrong, requiring a little bit more bilateral bypass on the voiced taps and a slight rounding of the ɯ̟ sound.

Some Japanese-American soldiers would be SOL tho.


A single letter is chosen randomly? Huh? Why would you do that?


Seems a bit pointless to ask. You want them to make up a story? "The data scientist's radio link degrades to static while he waits for the answer and all he hears is the letter 'l'". There.


It's just a bit funny to come up with a clever justification for 50% of the problem only to quit at the last moment with tacked-on math problem stuff.


Haha fair enough.


Really small?

How many Russians in America are actually sleeper agents?


That would be a good argument in the general case, but the premise that all Russians present at the diplomatic gathering are sleeper agents is clearly stated.


ML/DS positions are highly competitive these days. I don't get why ML positions require harder interview preparation than other CS positions when you do similar things. People expect you to know a lot of theory, from statistics, probability, and algorithms to linear algebra. I am OK with knowing the basics of these topics, which are the foundations of ML and DL. But I don't get being asked about eigenvectors and challenging algorithm problems for an ML Engineering position when you have already proven yourself with a Master's degree and enough professional experience. I am not defending my PhD there. We will just build some DL models, maybe we will read some DL papers and maybe try to implement some of those. The theory is only 10% of the job; the rest is engineering, data cleaning, etc. Honestly I am looking for a soft way to get back to Software Engineering.


In part because ML fails silently by design. Even if the code runs flawlessly with no errors, the outputs could be completely bunk, useless, or even harmful, and you won't have any idea whether that is true just from watching The Number go down during training. It's not enough to know how to build it; you also need to know how it works. It's the difference between designing the JWST and assembling it.


ML doesn't just fail silently by design. Because ML is based on error minimization, it fails in a way that is maximally hard to tell apart from random garbage. This is, remarkably, a subtlety that is lost on most people, which is a real surprise -- my introduction to this was in structural biology, where you always do hold-outs and check performance on the hold-out set, because overfitting is such a problem.
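
For anyone who hasn't done this before, the hold-out check itself is only a few lines. A minimal sklearn sketch (dataset and model purely illustrative):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Synthetic data standing in for a real problem.
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("train accuracy:", model.score(X_train, y_train))
    print("hold-out accuracy:", model.score(X_test, y_test))  # the number that actually matters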


Absolutely, the result you get is "the best you can do given the baked-in assumptions". But of course the assumptions can be wrong. And it takes time to learn how to evaluate and revise your assumptions in any analytical field, hard or soft.


But the OP was asking something different, that is, why someone should excessively focus on theory when, by the way, DL theory is very far from being solid, and trial and error is the common way of operating in ML and AI.

The "model is in place, but I have no clue what's doing and so it can fail without me understanding when and how is straw-man". Especially for supervised learning, that is, we have a label for data, it is immediately clear whether the output of the model is "bunk, useless, or even harmful". There is no "fail silently by design".

I have been working in the field for almost 20 years, in academia and in industry, and it is not that I start every PCA thinking about eigenvectors and eigenvalues; if you asked me right now, without preparation, what those are, I would land somewhere between approximately right and wrong. But I have fit many, many very accurate models.


You are considering only the technical aspects of the model. While of course important to understand, those are less interesting when considering potential harms than the downstream effects of the inference pipeline, particularly when it comes to interpretations of outputs. What is absolutely the worst possible MO is to offload the interpretation portion of a pipeline to a machine using proxy metrics without an exceptional model which justifies the approach unequivocally.

For instance, if we put an MSE loss function on a classification NN with sigmoid outputs, and used a classification dataset, we could generate an entire zoo of "many, many very accurate models" as measured by MSE. But once your model returns outputs, how do you interpret them to predict a label for some input data? You could hack some algorithm together (eg argmax of the highest value) which is indistinguishable from the "correct" procedure but the described probabilities are so incorrect that no ML professional would be comfortable trusting anything it says, not least because of the violation of the condition that the probabilities are non-negative and sum to one. But being able to explain why we use MSE or cross-entropy or any other loss function and which output activations (hint: and probability distributions) they are typically associated with actually has a very deep origin in the foundations of probability theory which blows open a whole new way of thinking about statistical modelling that is not made available in any of the programs whose materials I've been exposed to.
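
To make the sigmoid-plus-MSE point concrete, a tiny illustrative sketch (the logits are made up):

    import numpy as np

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    logits = np.array([2.0, 0.5, -1.0])   # hypothetical outputs for a 3-class problem

    sig, soft = sigmoid(logits), softmax(logits)
    print(sig, sig.sum())    # ~[0.88 0.62 0.27], sums to ~1.77 -- not a distribution
    print(soft, soft.sum())  # sums to 1.0 -- a proper categorical distribution

    # argmax "decoding" happens to give the same label either way here,
    # but only the softmax values can be read as class probabilities.
    print(np.argmax(sig), np.argmax(soft))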


"But being able to explain why we use MSE or cross-entropy or any other loss function and which output activations (hint: and probability distributions) they are typically associated with actually has a very deep origin in the foundations of probability theory which blows open a whole new way of thinking about statistical modelling that is not made available in any of the programs whose materials I've been exposed to. "

What is the "very deep origin"? What is this "new way of thinking"? And what's so wrong with using argmax to make a classifier, if I don't care about estimating probabilities and just want the answer?


A lot of processes downstream of inference benefit from having a minimum of care put into the system design. We're talking 80/20-rule stuff here. It's a simple reorientation versus a janky argmax classifier, but it results in assumptions being obeyed broadly, in a max-entropy sense.

The key insight is that all prediction models can equally be framed as energy-based models (y = f(x) -> E = g(x, y)) and the job of ML is to estimate the joint distribution of x and y with suitable max-entropy surrogate distributions, and performing MLE on this variational distribution vs some training data. All the math in the theory follows from this (perhaps excluding causal stuff but actually I am not familiar enough with those techniques to say for sure). Things get a little more complicated when you consider e.g. autoencoders but above still holds.
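
As one concrete instance of that framing (my gloss, not necessarily the parent's): for classification with a categorical surrogate q_theta(y|x) = softmax(f_theta(x)), maximum likelihood on the training data is exactly the usual cross-entropy loss:

    \mathcal{L}(\theta)
      = -\sum_{i=1}^{N} \log q_\theta(y_i \mid x_i)
      = -\sum_{i=1}^{N} \log \big[\mathrm{softmax}(f_\theta(x_i))\big]_{y_i}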

Obviously with the choice of a poor surrogate distribution, your predictions will on average be worse. Yes, even if you don't care about probabilities and just want max-likelihood predictions -- your predictions will on average be worse. By construction, analysis proceeds by framing the problem as this and following through. A janky argmax-classifier is not exempted from this -- it, too, already implies a surrogate distribution, but you know, statistically speaking, it's probably a pretty bad one. So it makes sense to put a tiny bit more effort to get way closer to representing the space that your data lives in.

Naturally, you could easily find a janky model that outperforms some relatively unoptimized principled model on a specific use case, and many do get lucky with this. But the principled model has a lot more headroom specifically in terms of the information it can hold, because if the design is more or less correct to the problem specification then the inductive bias built into the model matches closely with the structure of the data which is observed.


Very little of ML is "principled" (e.g., taking into account the probability distributions, priors, bounds on the values of parameters, etc.); in practice it is most of the time a brute-force approach that lets modelers avoid "thinking" about probability distributions, transformations, etc.

I did a lot of the "principled" modeling you talk about, in Stan, TMB, and JAGS, back in the day. But outside of the need for an "explanation" of model behavior, which is a scientific need much more than an engineering need (mind you, not having an explanation here does not mean having no idea what the model does; it is about the relationship between x and y, both in how we reach the estimates of the parameters and in how we interpret the parameters themselves), I would almost always favor a "brutish" approach for prediction in industry, out of (1) convenience, (2) accuracy, which is almost always better for ML models even with un-principled methods, and (3) the fact that, outside of proper causal inference, predictions are what matters; even when people demand an "interpretation", causality, when the data and model are not up for that kind of analysis, is just a guess anyway.


Scientific vs engineering needs is a false dichotomy. Explanation of model behavior matters a lot in many, many matters of engineering, but my point is trying to go further.

You may be thinking narrow-mindedly about what is meant by "interpretation". Or rather, conflating "interpretation of predictions of ML system", which is the common understanding in professional circles, with "interpretation of the real system whose aspects we are predicting with ML", which is a more colloquial frame. I hold you to no fault as I have been ambiguous in my usage and the two overlap quite substantially, particularly at the outputs of the ML system.

An alleged association between homosexuality and passport photos, for instance, is an interpretation of the ways humans exist and what they are fundamentally (read: physiognomy). Automating this association encodes a specific human-level interpretation about what is true about people into the ML system. But this joint distribution between homosexuality and the way a face looks when you record a picture of it is bogus in ways that are hard to put into words. The principle is lacking completely. And this kind of system can very easily be used for extreme harm in the wrong hands.

Nevertheless, surely someone motivated would (1) consider this approach convenient, (2) would have an accurate (vs data) model after the training completes, and (3) would use the raw predictions as they think those "are what matters".

I find, not only for myself but others as well, that being aware of the technical foundations opens the space of cognition to other perspectives of thinking about these issues which find synthesis between the technical and the social impacts of design decisions.


Do you have a reference to a paper that demonstrates the empirical superiority of energy-based models to well-tuned "janky argmax-classifiers"? I find it a little hard to believe there's a free lunch here given the relative popularity of basic argmax stuff – if energy-based models were obviously better, it seems like they'd be used more. But I am open to evidence on this point!


What you described seems to me pretty standard in ML and even more in statistical modeling. Maybe because I am coming from applied math and statistics.


> In part because ML fails silently by design.

That's why there's so much iteration and feedback gathering (e.g. A/B tests) as a part of DS/ML, which incidentally is rarely a part of the interview loop.

Anyone who claims they can get a good model the first time they train it is dangerously optimistic. Even the "how it works" aspect has become more and more marginal due to black boxing.


I'm sure this happens, but do you think the problem is actually one of mathematical savvy?

My guess would be that more machine learning projects go off the rails for want of understanding the data or the {business, research} problem.


My experience is that the bulk of the problem is insufficient monitoring. ML systems need heavy monitoring and should be sending lots of metrics to something like Prometheus/Grafana. There should also be validation/consistency checks for all data pipeline/feature transformations. And you should strongly avoid duplicating logic for things like feature preprocessing. I've seen people implement the "same" feature preprocessing pipeline twice (once in Python, once in Java), and it is common to keep finding edge-case bugs for a long time, especially when those bugs only slightly impact model behavior (a sketch of the kind of consistency check I mean is below).
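
Something like this, as a rough sketch (the two preprocessing functions are hypothetical stand-ins for a reference implementation and a port):

    import numpy as np

    def preprocess_reference(age, income):
        # Hypothetical "reference" Python feature transform.
        return [np.log1p(income), age / 100.0]

    def preprocess_ported(age, income):
        # Hypothetical port of the same transform (e.g. re-implemented for serving).
        return [np.log(income + 1.0), age * 0.01]

    # Consistency check over a sample of inputs: the two implementations should agree.
    rng = np.random.default_rng(0)
    for _ in range(1000):
        age = rng.integers(0, 120)
        income = rng.uniform(0, 1e6)
        a, b = preprocess_reference(age, income), preprocess_ported(age, income)
        assert np.allclose(a, b, rtol=1e-6), (age, income, a, b)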

Another issue is proliferation of data pipelines. The more distinct pipelines you have, the more painful they become to monitor. It is much better to minimize pipelines and do views on a small number. I think proliferations of models is a similar issue. It is often easier to build 4 models instead of 1 multi-task model, but monitoring/operational tasks grow more and more painful as you manage more models.


Not necessarily mathematical savvy though a lot of deeper understanding can follow from a strong grasp on the fundamentals. I think it has more to do with the alignment between intuitions and outcomes, and this is not taught well in most academic programs as far as I can tell.


And you learn literally zero about a candidate's ability to understand when and why things work by asking questions about eigenvectors. Someone can understand what an eigenvector is and still not have any clue about how you figure out whether a system is working, why it's working, what is likely to happen in production, how you test the limits of your method's ability to generalize, how you take a real problem and find something that you can productively use ML on, etc.

People say things like "you need to know how it works" but "it" doesn't work using your knowledge of eigenvectors. If you want to test how "it" works, test that, literally. Put up a model on the board and a dataset. Ask people about what might happen when you apply one to the other. What changes they would make in response to changes in the data. What they would do in response to the following training curves, budget limitations, etc.

These interviews are terrible and they select for people that regurgitate facts.


The "trivia" tests, when used (IMO) correctly, are not for testing whether or not the candidate recognizes the term and can regurgitate a definition. I prefer to listen to how they phrase their response to get a sense of the intuition behind the understanding of the concept as well as how it may fit in to a larger mathematical framework (i.e. their internal model for mathematical analysis).

I am not looking for someone to answer the question correctly, but to answer the question in a way that demonstrates deeper insights, which helps immensely in research settings as re-using properties of mathematical constructs in novel ways is often how theory and practice both are advanced.

I would be much less interested in someone giving a precise definition of eigenvalues than to describe them in such a way that they understand e.g. what can be deduced about an operator when one of its eigenvalues is zero.
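
For example, the kind of deduction in play for a zero eigenvalue (standard linear algebra, just spelled out here):

    \lambda = 0 \;\Rightarrow\; \exists\, v \neq 0 : A v = 0
    \;\Rightarrow\; \ker A \neq \{0\} \;\Rightarrow\; \det A = 0,\ \text{so } A \text{ is not invertible and collapses the direction } v.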


Maybe "eigenvectors" is a bad example, because it's a pretty foundational linear algebra concept.

But there is a threshold where it stops being a test of foundational knowledge and starts being a test of arbitrary trivia, and favors whoever has the most free time to study and memorize said trivia.


The difference between trivia and meaty knowledge is somewhat contextually dependent, but an understanding of how core probability and statistics concepts are integrated into the framework of machine learning by means of linear algebra and the other analytical tools is pretty damn useful to have substantive conversations about ML design decisions. Helps when everyone in the team speaks that language to keep up the momentum.


Having recently completed an MLE interview loop successfully at a top company, I'm wondering where you are getting asked complicated linear algebra questions in interview?


Hopefully you aren't equating "eigenvectors" to "complicated linear algebra question".

But I agree, a lot of MLE roles don't get asked such things.

I think the OP's guide is closer to interviews I've seen for phd programs.


> Hopefully you aren't equating "eigenvectors" to "complicated linear algebra question".

They explicitly say something harder than eigenvectors in the GP.

I was imagining something involving the spectral theorem or something like that, ie. beyond the most basic linear algebra.

OPs guide seems to cover plenty of things I'd expect someone to learn in undergrad, I think I touched on almost all of this - except for stuff involving jax and recent CNN architectures, both of which can easily be supplemented online.


A reason for such requirements is similar to the reason software engineers need to grind leetcode hard: supply and demand. Prestigious companies get hundreds, if not thousands, of applications every day. The companies can afford to look for candidates with raw talent, such as the capability of mastering many concepts and being able to solve hard mathematical problems in a short time. Case in point: you may not need to use eigenvectors directly on the job, but the concept is so essential in linear algebra that I, as a hiring manager, would expect a candidate to be able to explain and apply it in their sleep. That is, knowing eigenvectors is an indirect filter for people who are deeply geeky. Is it the best strategy for a company? That's up for discussion. I'm just explaining the motives behind such requirements.


I can’t help but think there’s been a ton of filters used in the past to figure out if someone is deeply geeky, and we’ll continue to invent more in the future.

It’s really looking like another rat race. Especially since there’s no central authority, every hiring manager has the potential to invent their own filter, and make it arbitrarily harder or easier based on supply and demand (and then the filter drifts away from the intended purposes).


It becomes a rat race when there are so many interview books and courses and websites. It was not a rat race before 2005, when there were only two reasons one could solve problems like Pirate Coins or Queen Killing Infidel Husbands: either the person was so mathematically mature that such problems were easy for them, or the person was so geeky that they read Scientific American or Gardner's columns and remembered everything they read.


You're missing the third category: people like myself who absolutely love this kind of riddle and destroy them in a few minutes, without that saying anything about their actual work abilities.

I don't think I'm a bad engineer, but I'm certainly not the rock star you absolutely need for your team, yet when it comes to these kinds of “cleverness” tests, I'm really really good.

I've had the “Queen Killing Infidel Husbands" problem (under another name) in an interview last year and I aced it in a few minutes. I didn't know about "Pirate Coins", but when I read your comment HN said it was posted "35 minutes ago" and now it says "40 minutes", which means I googled the problem, figured out the solution, and then found the answer online to check that I was right, all in less than 6 minutes, and while putting my son to bed!

It's really sad because there are many engineers much better at their job than me who will get rejected because of pointless tests like this…


If somebody asked me logic/brainteaser questions like that, I would politely stop them, explain that if they're asking me that question I'm not a good match for the company, and if they would like to ask a better question, I'm open to it, but otherwise, we can end the application process now. I did that recently with a junior eng who asked me a leetcode question literally with the same exact test data as the leetcode page. I ended up explaining to the CEO that at the very least his engineers should be creative enough to come up with different test data, but that realistically, if "recognize the need for, and implement binary search in 45 minutes" is your go-to question, I'm not gonna be a match at your company.

I had to fight my way into Google by doing every bit of prep and practice to solve stupid questions and code quicksort, but when I joined, nothing I did in the 12 years I was there required any of that. And I wrote high-performance programs that ran on millions of cores (I did know some folks who needed that skill, like the search engine developers, or the maps engine, or the core scheduling algorithms in Borg). The entire time I was there I tried to get people to understand that the questions they're asking are just not good indicators of programming ability, but it was repeatedly pointed out that the goal is to minimize false-positive hires.

I do admire your ability to solve problems like that quickly, always wished I could.


> If somebody asked me logic/brainteaser questions like that, I would politely stop them, explain that if they're asking me that question I'm not a good match for the company

This is exactly what I started to do after I was asked a leetcode-based question for a SRE manager position.

It turned out that by making my "profile" clear, I stopped getting bullshit interviews and started getting ones more aligned with actual daily work.


The Queen problem first showed up in a Putnam Math Contest. If you solved it in no time, then you're mathematically talented, which puts you in the first category.


I'm not questioning the fact that I'm kind of gifted when it comes to mathematics (I actually ranked #72 in a nationwide math contest in France when I was 10), but you were talking about "maturity" and not innate skill. Since I don't have a math degree and I haven't done math in more than a decade, I'm definitely far from "mature" from any mathematical perspective that matters for a job.

And after ten years working in the industry, I can assure you that it is not a skill I can leverage a lot in my job…


But if there is an abundance of supply, the company has to use some kind of filter.

Testing for geekiness and the ability to solve tricky coding and math problems seems like a rational way to do that.

If companies were starving for talent because 'nobody could pass the test' - it would be another thing.

But they have to set the bar on something, somewhere.

I can't speak to AI/ML but I would imagine it might be hard to hire there, given the very deep and broad concepts, alongside grungy engineering.

I've rarely had such fascination and interest in a field that I would never actually want to work in.


There’s an abundance of supply of people with masters degrees in machine learning? How’s that possible? I thought this shit was supposed to be hard.

Has humanity just scaled way too hard or something, because if we’re having an abundance of supply in difficult cutting edge fields to the point where they also have their own version of Leetcode, then what hope do average people have of getting any job in this world?

Or, is it at all possible that companies are disrespecting the candidate pool by being stingy and picky?

Maybe the truth is gray.


I currently work as an ML engineer and have interviewed on both sides for some well known companies.

The absolute demand in number of people is small compared to the field's popularity. It would not surprise me at all if many computer science master's programs had a majority of the students studying machine learning. I remember in undergrad we had to ration computer science classes due to too much demand from students; I think the school's number of CS majors roughly tripled over a couple of years.

The number of ML engineers needed is much smaller than the number of software engineers overall. When a lot of students decide ML is the coolest, we get an imbalanced CS pool with too many wanting to do ML. Especially since, for ML to work, you normally need good data engineering, backend engineering, and infra, and the actual ML is only a small part of the service using it.

At the same time, the supply of experienced ML engineers is still low due to the recent growth of the field. Hiring ML engineers with 5+ years of professional experience is more challenging. The main place where supply is excessive is new graduates.


> There’s an abundance of supply of people with masters degrees in machine learning? How’s that possible? I thought this shit was supposed to be hard.

I think it's just a matter of proliferation of these types of programs, as well as a large supply of students.

Also, the average qualification of people working in ML is probably no longer a Ph.D, like it used to be. This is arguably because deep learning techniques require less involved math to understand, and are more focused on computational methods that work well.

So the field has probably saturated. When I got involved with ML for the first time (well, really, statistical signal processing) in the mid 2000s, the field was kind of dead, and very highly qualified postdocs had a tough time finding jobs.


> There’s an abundance of supply of people with masters degrees in machine learning? How’s that possible?

I don't know for ML, but there are almost 12k Masters CS degrees awarded per year and 1.1k PhDs. If my university is any indication, then there's a good portion of those that are ML or doing some sort of ML in their research. But even if it was just 10%, that's a lot of people per year that are being added. This is just the US btw.

https://datausa.io/profile/cip/computer-science-110701


> Case in point: you may not need to use eigenvectors directly on the job, but the concept is so essential in linear algebra that I, as a hiring manager, would expect a candidate to be able to explain and apply it in their sleep.

Exactly. Whenever eigenvectors come up during interviews, it’s usually in the context of asking a candidate to explain how something elementary like principal components analysis works. If they claim on their CV to understand PCA, then they’d better understand what eigenvectors are. If not, it means they don’t actually know how PCA works, and the knowledge they profess on their CV is superficial at best.

That said, if they don’t claim to know PCA or SVD or other analysis techniques requiring some (generalized) form of eigendecomposition, then I won’t ask them about eigenvectors. But given how fundamental these techniques are, this is rare.


Given that PCA is heavily antiquated these days, I'd say that asking your candidates to know algebraic topology (the basis behind many much more effective non linear DR algorithms like UMAP) is far better... But in spite of the field having long ago advanced beyond PCA, you're still using it to gatekeep.


The initialization strategy for UMAP is important enough that asking about that in practice is probably more important than anything out of Ghrist's book as an interview question

cf. https://twitter.com/hippopedoid/status/1356906342439669761


UMAP (and t-SNE) aren't the same as PCA. UMAP is pretty close to t-SNE, and I think expanding PCA (Principal Component Analysis) and t-SNE (t-distributed Stochastic Neighbor Embedding) explains the difference. Neighbor embedding is a visualization technique and not the same as determining principal components. PCA preserves global properties while t-SNE and UMAP don't. They are good techniques for _visual_ dimensionality reduction, but they aren't going to tell you the dominant eigenvectors of the data, or give you general _dimensionality reduction_. This is a bit of a pet peeve of mine.

There's some more in this SE post https://stats.stackexchange.com/questions/238538/are-there-c...


> asking your candidates to know algebraic topology

Congratulations, you've eliminated 99% of the ML research community.


and yet we're also told that tech companies can't get enough people.


> But I don't get being asked about eigenvectors and challenging algorithm problems for an ML Engineering position when you have already proven yourself with a Master's degree and enough professional experience.

People know pity passes exist for Master's degrees. You can't trust that someone actually knows what they should know just because they have a degree. Ditto professional experience. The entire reason FizzBuzz exists is that people with years of professional experience can't program.


We aren't talking about FizzBuzz here, but rather the fashionable practice of subjecting people to 4-6 hours of grilling on "medium-to-hard" problems that you absolutely cannot fail, or even be slightly halting in your delivery on. And which can only be effectively prepared for by investing substantial amounts of time in by-the-book cramming.

On top of the fact that these problems are often poorly selected, poorly communicated, conducted under completely unrealistic time pressure, often as pile-ons (with 3-4 strangers as if just to add pressure and distraction), and (these days) over video conferencing (so you have to stare in the camera and pretend to make eye contact with people while supposedly thinking about your problem, on top of shitty acoustics), etc, etc.

It's just fucking ridiculous.


I'm quite happy these places make it so clear they're not places I would be happy to work. I always ask about the interview process and tell the recruiters I'm not interested if they expect a really lengthy process. I'm fine with things dragging out if they have additional questions after the initial interviews, but not if their default starting position is that they need that.


I figure the best way to prepare for an ML job is to pull out the nastiest working rat’s nest of if statements you’ve ever written & claim it was autogenerated by an adversarial network (which was you fighting with your coworkers over your spaghetti code).


This really made me laugh, thanks.


I've interviewed well over 100 people for DL/ML positions. This may be a good roadmap to what some people ask, but it's a terrible guide to what you should ask. It's like a collection of class exam questions.

Just as in programming, the world is full of people who can recite facts but don't understand them. There is no point in asking what an L1 norm is and asking for its equation. Or say, giving someone the C++ code that corresponds to computing the norm of a vector and asking them "what does this do". Or even worse, showing them some picture of some cross-validation scheme and asking them to name it. Yes, your candidates should be able to do this, but positive answers to these kinds of questions are nearly useless. These are the kinds of questions you get answers to by Googling.

It's far more critical to know what your candidate can do, practically. Create a hypothetical dataset from your domain where the answer is that they need to use an L1 norm. Do they realize this? Do they even realize that the distance metric matters? Are they proposing reasonable distance metrics? Do they understand what goes wrong with different distance metrics? etc. Or problems where they need to use a network but say, padding matters a lot. Or where the particulars of cross validation matter a lot.
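
As a toy illustration of "the distance metric matters" (made-up numbers, just a sketch):

    import numpy as np

    # Query point and two candidate neighbors; b matches q except for one corrupted feature.
    q = np.array([0.0, 0.0, 0.0, 0.0])
    a = np.array([3.0, 3.0, 3.0, 3.0])   # moderately far in every dimension
    b = np.array([0.0, 0.0, 0.0, 9.0])   # identical except one outlier dimension

    l2 = lambda u, v: np.sqrt(((u - v) ** 2).sum())
    l1 = lambda u, v: np.abs(u - v).sum()

    print(l2(q, a), l2(q, b))   # 6.0 vs 9.0 -> under L2, a is the nearer neighbor
    print(l1(q, a), l1(q, b))   # 12.0 vs 9.0 -> under L1, b is nearer; the metric flips the answer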

This also gives you depth. "Name this cross-validation scheme" gives you a binary answer, "yes, they can do it" or "no, they can't", and you're done. If you have a hypothetical dataset, you can keep prodding: "OK, but what if I unbalance the data", "what if we now need to fine-tune", "what if the payoffs for precision and recall change in our domain", "what if my budget is limited", etc. It also lets you transition smoothly to other kinds of questions, and to discover areas of deeper expertise than you expected. For example, even for the cross-validation questions, if you ask the binary version, you might never discover that a candidate knows how to use generalized cross-validation, which might actually be very useful for your problem.

The uninformative tedious mess that we see in programming interviews? This is the equivalent for ML/DL interviews!


Do you have any books/material that can help the learner acquire this deeper understanding?


I know one good reference.

https://www.deeplearningbook.org/

Also there are various courses and lectures, but those need time and effort. There are no shortcuts like the book posted by OP.


I believe "shortcut books" like the one posted by OP appeal to some of us because it's a succinct source of, basically... lookup queries. That is, when we run across a question we can't answer, we'd prefer to Google that topic on our own and learn in our own nonlinear style. Don't give me a pile of textbooks corresponding to 6-10 semesters of classes, give me a single book like this and let me research by myself everything it refers to.


Yeah. You just have to build models, experiment, intentionally make bad decisions, and get a feel for how things work. There's no clear shortcut.

But, this is also what you will practically be doing.


I make a point of never doing "projects" during a recruitment process. The fact is that there are too many good opportunities out there that will not require me to spend a full day, or even several days in a week, completing a job application without compensation. Whenever I hear that there is a "project" to complete, I just tell them that I already have tons of projects to work on and pass on the "opportunity".


A problem with these questions is that a lot of them can be answered without knowing ML/DL; admittedly these are cherry-picked, but still.

For example what is the definition of two events being independent in probability?

Or the L1 norm example: 'Which norm does the following equation represent? |x1 − x2| + |y1 − y2|'

Find the Taylor series expansion of e^x (this is high school maths).

Find the partial derivatives of f(x, y) = 3 sin²(x − y)

Limits, etc. (two of these are worked below, for reference)
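
For reference, the textbook answers to two of those (my own working, worth double-checking):

    e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots

    f(x, y) = 3\sin^2(x - y):\quad
    \frac{\partial f}{\partial x} = 6\sin(x - y)\cos(x - y) = 3\sin\big(2(x - y)\big),\qquad
    \frac{\partial f}{\partial y} = -3\sin\big(2(x - y)\big)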

These aren't specific to deep learning or machine learning, not that I claim to be a practitioner.


Exactly, I thought the same. Not sure what a really good alternative is. But you may risk getting bad candidates, since they might just be the ones with the best interview practice.

Maybe that kind of question is OK for people without experience, but not for seniors.


Data science and ML interviews can be tough because it's very difficult to prepare for everything and cover all the theory. A lot of the value you add comes from knowing the theory so it's understandable to test it but it's still hard to prepare well. And you have a take-home and/or LC style problem(s) in addition to the theory interview.


The hard questions in DS/ML interviews I've received over the years aren't the theory questions (which I rarely get asked), but the trick SQL questions that often depend on obscure syntax and/or dialect-specific features, or "implement binary search" when I'm not in the mindset for that as that isn't what DS/ML is in the real world.


I think they're fine as long as you know the format and have an opportunity to prepare or just get in the right mindset for it. And some things (like binary search) should be easy to write anyway.

The SQL questions can also be a symptom of the type of job - Facebook's first data science round focuses a lot on SQL but that's because it's a very product/analytics/decision-making focused role without that much coding or ML. With data science you have to be more careful about these things when searching for a job; you can't just use the job title as a descriptor.


> And some things (like binary search) should be easy to write anyway.

It's a different story when a) your mind is set on statistics/linear algebra, b) you've never had to actually implement binary search by hand since college, and c) even if you do implement the algorithm and demonstrate that you have a general understanding, it must work perfectly and pass test cases, otherwise it doesn't count.

FWIW I was rarely asked about algorithmic complexity, which is more relevant in DS/ML; when it did come up, it was usually in the context of whiteboarding another algorithm, with the interviewer mocking me for doing it in O(n) instead of O(log n).


Binary search in particular is surprisingly tricky, which is precisely what makes it useful for telling if someone knows how to program. To a significant extent, though, you can cheat by studying binary search itself, which is a surprisingly beautiful thing.

I like this formulation for finding the first index in a half-open range where p is true, assuming p stays true thereafter:

    bsearch p i j :=
     i                   if i == j else
     bsearch p i       m if p m    else
     bsearch p (m + 1) j
     where m := i + (j - i)//2
Or in Python:

    def bsearch(p, i, j):
        m = i + (j - i) // 2
        return (i if i == j
                else bsearch(p, i, m) if p(m)
                else bsearch(p, m+1, j))
The only tricky thing about this formulation is that m < j if i < j, thus the asymmetric +1 in only one case to ensure progress. If invoked with a p such as a[m] >= k it gives the usual binary search on an array without early termination. The i + (j - i) // 2 formulation is not needed in modern Python, but historically an overflowing (i + j) // 2 was a bug in lots of binary search library functions, notably in Java and C.
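
A hypothetical usage example (array and key made up) to show the calling convention:

    a = [1, 3, 3, 5, 8, 13]
    k = 5
    # First index i in [0, len(a)) with a[i] >= k; returns len(a) if no such index exists.
    print(bsearch(lambda m: a[m] >= k, 0, len(a)))  # 3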

(Correction: I said a[m] <= k. This formulation is less tricky than the usual ones, but it's still tricky!)


> Binary search in particular is surprisingly tricky, which is precisely what makes it useful for telling if someone knows how to program.

That's the problem. There are many other ways to do that without risking false negatives and annoying potential candidates (e.g. I would not reapply to places that have rejected me out of skepticism about my programming abilities based on tests blatantly irrelevant to day-to-day work, because it's a bad indication of the engineering culture).

Even FizzBuzz is better at accomplishing that task.


FizzBuzz (or equivalent) is actually great IMO. It weeds out the people who lied on their resume, without punishing the people who never learned CS because they were too busy learning things that were actually useful to DS, like statistics or data visualization.


I've actually been given fizzbuzz in a DS interview! Up to that point I thought that fizzbuzz was just a meme because it's obviously too easy.


I tried to write FizzBuzz on paper when I first heard of it, and it had a bug printing fizzbuzzfizzbuzz on 15.

If you want a correct program without a compiler/computer, I don't think anything is too easy. Maybe something like "make a function returning the sum of two float parameters".
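
For what it's worth, the standard structure that avoids exactly that double-print bug is to check the combined case first; a minimal sketch:

    for n in range(1, 101):
        if n % 15 == 0:       # handle the combined case before the individual ones
            print("fizzbuzz")
        elif n % 3 == 0:
            print("fizz")
        elif n % 5 == 0:
            print("buzz")
        else:
            print(n)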


That would just test syntax, though. Fizzbuzz tests logic. Your bug was a logic bug.

To a certain extent you can dispense with mental logic by using a compiler. But the feedback loop is much slower. Thinking your logic through before feeding it to a compiler is like looking at a map when you're driving a car; you can cut off whole branches of exploration.

Binary search is a particularly tricky logic problem in part because it's so deceptively simple. In a continuous domain it's easy to get right, but the discrete domain introduces three or four boundary cases you can easily get wrong.

But the great-great-grandparent is surely correct that many programming jobs don't require that level of thinking about program logic. Many that do, it's because the codebase is shitty, not because they're working in an inherently mentally challenging domain.


Yeah, I meant running it and then correcting the error.

Concerning binary search, I actually implemented that in an ECU for message sorting. It took like a whole day, including getting the outer boundaries one off too big in the first test run. Funnily enough, the vehicle ran fine anyway.

I would never pull that algorithm off correctly in an interview without training for it, I think.


Take a look at the downvoted-to-0 formulation I gave upthread, then see if you can program it that way tomorrow without looking, and then think it through to see if it could possibly be wrong, and once you're satisfied it's correct, try testing it. Probably you'll never need to implement binary search yourself again, but it's a good exercise for thinking through algorithms. You can probably get it working that way in under an hour instead of a whole day.


There are levels of not knowing how to program that go beyond FizzBuzz. But sure, many programming jobs don't require them.


If that's the case for the DS/ML domain, then a short take-home exam should provide a better signal of practical coding ability (the common counterargument that "take-home exams can be gamed" is a strawman; a gameable exam is more the interviewer's fault for creating a flawed exam).

In my case, I typically got the "implement binary search" questions in a technical interview after I passed a take-home exam, which just makes me extra annoyed.


Agreed.

If you're gaming the take-home exam by looking up the answer on Stack Overflow, you could game the same exam in person by reading books of interview questions ahead of time, and the interviewer can avoid that by making up new questions. (OTOH if you're gaming the take-home exam by paying someone else to solve the problem for you, that might be harder to tell.)


Why is that gaming the exam? What sort of professional doesn't look up the solutions to potential problems online, even if it is just to verify that you're correct? Outside of incredibly trivial things, I would expect this of everyone.


I suppose it depends on whether the purpose of the exam is to see if you know how to write working code to solve new problems or how to look up known solutions to well-known problems. Both are valuable skills, but they are definitely not the same skill. Perhaps telling the difference is one reason interviews frequently include in-person programming challenges rather than using take-home exams.

In most cases the right way to do a binary search is not to copy and paste a binary search implementation from Stack Overflow (https://stackoverflow.com/a/41956372 is similar to the formulation I gave above), but to call a binary-search library function. If calling a library function isn't the solution the interviewer is looking for, probably they wouldn't be satisfied with you searching for it on Stack Overflow either.


Facebook Product Data Science has always been a Product Analyst role more than anything else. I did the interviews a while back, and it was a pretty fun experience, but it's not what a lot of people call data science.


> but it's not what a lot of people call data science

I think that's changed a bit over time and the term has expanded to mean more things. In addition to Facebook, another great example is this article from Lyft in 2018 where they say that they're renaming all their data analysts to data scientists and all their data scientists to research scientists - https://medium.com/@chamandy/whats-in-a-name-ce42f419d16c


This is called title inflation.

Like the hilarious thing about Facebook and Data Science is that the term was invented there, and they needed to retitle all of their analysts (like Product Data Science) as they couldn't hire any analytical people with an analyst title in SV (or so I have been told).

Like, data science was defined back in the day as a social science PhD who could run experiments and write MapReduce jobs. I'm pretty sure that most people would disagree with this definition these days.


Yes. My point is that the person running experiments and mapreduce jobs is still called a data scientist. But so is the product analytics person (btw product data scientists run experiments too). And there are some other data scientist job profiles too (more focus on research, more focus on engineering etc). So it's not really a complete redefinition of the term, it's more of an expansion of the types of jobs it covers.


In my experience, it varied greatly from team to team.


I had an "implement binary search" interview once. I came away feeling like I was being interviewed for the wrong role. I don't understand how anyone could think that's an appropriate interview task for a DS position.


I'm an MLE and I get asked much harder questions than that. Implement a binary search seems ... fine?


Implementing anything even a little tricky under pressure can be tough. Unless you've practiced with bit or pointer twiddling regularly, you are mostly validating whether they did interview prep or not. That probably selects for more serious candidates, so it probably works. But I was tripped up by a simple binary search problem the other day, even after I'd just solved several harder problems quite quickly. It's just the nature of algorithmic problem solving — until you've done a lot of prep, it's dicey whether a novel problem will take five minutes or five hours to solve.


But it makes sense for MLE! IMO you should ask a stats or probability question in a DS interview.


The distinction between the two roles isn't that clear. Some data science jobs are very focused on engineering.


Agreed. MLE in very ML-heavy companies tends to mean SWE who work on ML systems, and sometimes, that can mean as much working on stuff like infrastructure as modeling.


I actually bought this as a physical book on Amazon. Naturally it came as a print-on-demand book. Unfortunately it has many problems in this format. E.g. the lack of margins makes it hard to read the ends of sentences towards the gutter. Also some of the text overlaps. Not sure what source file format you have to provide to Amazon, but it's certainly not the PDF provided in the repo.

Edit:

It seems the overlapping text also occurs on some pdf readers: https://github.com/BoltzmannEntropy/interviews.ai/issues/2


The last 5 textbooks I bought new on amazon had similar problems. Totally unacceptable. I started returning them and (because most were exclusive to amazon) started buying them new on ebay with great results.


It's really a hit and miss. This [1] book also came as print-on-demand but looks perfectly fine. Good layout and clean colors.

[1] https://mml-book.github.io/book/mml-book.pdf


Fisher Information is under the "Kindergarten" section?

Maybe I've just been interviewing at the wrong places, I'd be very curious if anyone here has been asked to even explain Fisher information in any DS interview?

It's not that Fisher information is a particularly tricky topic, but I certainly wouldn't put it as a "must know" for even the most junior of data scientists. Not that I wouldn't mind living in a world where this was the case... just not sure I live in the same world as the authors.


When I was a mathematician it was pretty common to make jokes whenever we actually had to evaluate an integral, along the lines of 'think back to your elementary-school calculus...'


“integrate by parts, like you learned in middle school”

tf middle school did you go to?!


It's a joke. Like, we joke that the more math you learn the less arithmetic you can do (ok, maybe that one isn't a joke).


In my undergrad abstract algebra class our professor asked us a question about finding the order of a group that involved dividing 32/8 and we all just sat there for ten seconds before someone bravely ventured "...four?"


I've experienced that many times among groups of electrical engineers - we're all fine discussing equations, but once it's time to plug in the numbers no one wants to volunteer an answer.


This is amazing. I am ecstatic.

I've been looking for something exactly like this – and it's executed better than I could have imagined.

(Needs a good proofreader still, though! Also, whatever custom LaTeX template the authors are using is misbehaving a bit in various places. Still great content.)


In my 20s, I was doing data science at a very high level spanning multiple disciplines. Truly state of the art. I would like to think I was quite good at my job.

I am 99% certain I would not have passed the interview bars set today. More specifically, the breadth they expect you to master is very puzzling (and seemingly unrealistic).


My problems with this line of numerous shallow books and courses are:

  1) They are written by people who have no experience in industry or who are not working on "real" machine learning jobs.

  2) They think the standard in industry is pretty low and any BS works. For example, the concept of the Lagrange multiplier is missing from the book. One needs this concept to understand training convergence guarantees (a standard form is sketched below).
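
For anyone who hasn't seen it, the standard equality-constrained setup the comment is referring to (my own sketch in LaTeX notation, not taken from the book):

    \min_{x} f(x) \quad \text{s.t.} \quad g(x) = 0
    \mathcal{L}(x,\lambda) = f(x) + \lambda\, g(x)
    \nabla f(x^{*}) + \lambda^{*}\,\nabla g(x^{*}) = 0, \qquad g(x^{*}) = 0

Stationary points of the Lagrangian give the candidate constrained optima; the inequality-constrained version of this machinery (KKT conditions) is what typically shows up in duality and convergence arguments.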


Side question: using arXiv to distribute such interview questions seems inappropriate to me. Is there an SEO trick behind it?


Yes, I was also surprised that this is hosted on arXiv. Can someone explain why this is OK? It is definitely not a scholarly article.


I'm really enjoying the discussion here, as I've been thinking a lot about what a full modern ML/DS curriculum would look like.

I currently work for a non-profit investigating making a free high quality set of courses in this space, and would love to talk to as many people either working in ML/DS or looking to get into the field. (I have ideas but would prefer to ground them in as many real-world experiences as I can collect.)

If anyone here wouldn't mind chatting about this, or even just sharing an experience or opinion, please drop me an email (in my profile).

EDIT: We already have Intro to DS, and a Deep RL sequence far along in our pipeline, but are looking to see where we can help the most with available resources.

I really appreciate this Interviews book as an example of what topics might be necessary (and at what level), taking into account the qualifying discussion here, of course.


Are there deep learning roles that focus more on software engineering and using the tools rather than having a deep understanding of statistics?


> having a deep understanding of statistics?

As someone with a strong background in statistics, please tell me where I can find DS jobs that require this.

My statistics friends in DS and I find much more frustration in how hard it is to pass DS interviews when you understand problems at a deeper level than "use XGBoost". I have found that very few data scientists really even understand basic statistics; I failed an interview once because the interviewer did not believe that logistic regression could be used to solve statistical inference questions (when it, and more generally the GLM, is the workhorse of statistical work).
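
To make that concrete, a minimal sketch of logistic regression used for inference rather than prediction (the data is simulated purely for illustration, and the statsmodels calls are just one way to do it):

    # Minimal sketch: logistic regression as an inferential tool (a GLM with a
    # logit link), not just a classifier. Simulated data, illustrative only.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))    # true log-odds are 0.5 + 1.2*x
    y = rng.binomial(1, p)

    X = sm.add_constant(x)                     # intercept + predictor
    fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()

    print(fit.summary())                       # estimates, std errors, z-stats, CIs
    print(np.exp(fit.params))                  # coefficients as odds ratios

The summary gives coefficient estimates, standard errors, and confidence intervals, i.e. exactly the inferential output people seem to forget logistic regression provides.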

And to answer your question, whenever I'm in a hiring manager position I very strongly value strong software engineering skills. DS teams made up of people that are closer to curious engineers tend to greatly outperform teams made up of researchers that don't know you can write code outside of a notebook.


A good conceptual understanding of statistics is always helpful.

It's not really tested for in most places though, where they regard a DS as a service that produces models.


There are. But

1) the titles will vary a lot (software engineer, ML engineer, research engineer, data scientist, etc.), which makes it hard to locate those jobs and to move in the job market in general

2) you still need a reasonable amount of theory (not necessarily too much statistics) to use the tools well. And in all likelihood you will be tested on it in some way during the interviews.

3) the interviews/job descriptions that don't emphasise the theory will often be for jobs where you get a title like Machine Learning Engineer but focus more on infrastructure than on the ML code


I would say on average MLE roles tend to be more SWE-heavy. But some roles are as much about creating infrastructure as about running the tools.


I think they're called research engineering or ML engineering roles.


Now someone just train a model using these questions and answers and we will let the model take all future interviews.


This would be great resource for creating a DL/AI course. Or chapter quizzes for such a course.

However, one of the important things when interviewing someone is that the person has not seen the question before. So as an interviewer my impulse would be to first ensure that my question is NOT in this book :)

Or perhaps even if it is in the book, if the question is advanced enough, I could test how they articulate and reason through the solution, so I know they are not simply regurgitating the answer?


I think I know the answer to this, but how bad should I feel for being a software engineer with little-to-no knowledge of deep learning? I suspect it's not bad at all, since the software engineering field has split into a few camps, and mine - backend systems work - isn't in the same universe as the machine learning one, for the most part.


Not bad at all. I'm a data scientist and my not knowing React doesn't affect me one bit.


I commend the author’s effort, but this is not reflective of any interviews I’ve been part of, and I’ve been part of many, across several industries and levels. Bayesian Deep Learning? Chapter 2, in Kindergarten? If anyone asked me a question on that, I would kindly ask them to eat shit.


Wow, nice resource! Wish it had some sections about (deep) reinforcement learning and its algorithms. Looks like it is in the plan though.


RL is still kind of niche - the number of companies that ship anything using RL and the number of jobs that require it are both quite low.


Just a clarification: I think you are confusing RL with robotics. RL algorithms can be used anywhere: ads, NLP, computer vision, etc.


Why are all the em dashes missing from the PDF?


This may be a rendering issue. Some interaction of the Computer Modern font, the TeX layout algorithm, and Chrome's rendering engine sometimes ends up making em-dashes and minus signs invisible.


I'm not using Chrome's rendering engine, is he?


Wow! Great resource! Thank you!



