> In this course, we’ll be looking for the following trifecta: (i) ideas that are non-obvious, even to the well-trained computer scientist, so that we’re not wasting your time; (ii) conceptually simple — realistically, these are the only ideas that you might remember a year or more from now, when you’re a start-up founder, senior software engineer, PhD student, etc. (iii) fundamental, meaning that there is some chance that the idea will prove useful to you in the future.
In the first lesson, they discuss consistent hashing, and they seem to have achieved their goals.
ETA: Must include this anecdote on the history of the algorithm:
1. 1997: The implementation of consistent hashing given in this lecture first appeared in a research paper in STOC (“Symposium on the Theory of Computing”) [...] Ironically, the paper had previously been rejected from a theoretical computer science conference because at least one reviewer felt that “it had no hope of being practical.”
2. 1998: Akamai is founded.
3. March 31, 1999: A trailer for “Star Wars: The Phantom Menace” is released online, with Apple the exclusive official distributor. apple.com goes down almost immediately due to the overwhelming number of download requests. For a good part of the day, the only place to watch (an unauthorized copy?) of the trailer is via Akamai’s Web caches. This put Akamai on the map.
4. April 1, 1999: Steve Jobs, having noticed Akamai’s performance the day before, calls Akamai’s President Paul Sagan to talk. Sagan hangs up on Jobs, thinking it’s an April Fool’s prank by one of the co-founders, Danny Lewin or Tom Leighton.
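For anyone who hasn't seen the algorithm itself, here's a minimal sketch of the idea (my own toy illustration, not the lecture's code, and without the usual virtual-node refinement): hash both servers and keys onto the same circle, and send each key to the first server clockwise from it.

    import hashlib
    from bisect import bisect_right

    def circle_pos(value: str) -> int:
        # Map a string to a point on the circle [0, 2^32).
        return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2**32)

    class ConsistentHashRing:
        def __init__(self, servers):
            # Sort server positions so "first server clockwise" is a binary search.
            self.ring = sorted((circle_pos(s), s) for s in servers)
            self.points = [p for p, _ in self.ring]

        def lookup(self, key: str) -> str:
            i = bisect_right(self.points, circle_pos(key)) % len(self.ring)
            return self.ring[i][1]

    ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
    print(ring.lookup("phantom-menace-trailer.mov"))

The payoff over the naive hash(key) % num_servers scheme is that adding or removing one cache only remaps the keys on that server's arc of the circle, instead of reshuffling nearly everything.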
I love this channel; I've been watching their videos for a few weeks. The guy has a great ability to break things down and explain complex topics with ease.
Sorry, I left something out. The authors mention earlier in the lecture that
> [The algorithm (Consistent Hashing)] has real applications [and] gave birth to Akamai, which to this day is a major player in the Internet. [...] (Quantitatively, Akamai serves 10-30% of all internet traffic, and has a market cap ≈ $19B.)
> (i) ideas that are non-obvious, even to the well-trained computer scientist
> In the first lesson, they discuss consistent hashing, and they seem to have achieved their goals.
I was really excited when I read your comment here before clicking the link, but having had a look at the rest of the curriculum, I'm slightly underwhelmed.
- Regularization. The polynomial embedding and random projection, L2 regularization, and L1 regularization as a computationally tractable surrogate for L0 regularization.
- Understanding Principal Component Analysis (PCA). ... The simple geometry of "diagonals in disguise." The power iteration algorithm.
- Low-rank matrix approximations. The singular value decomposition (SVD), applications to matrix compression, de-noising, and matrix completion (i.e. recovering missing entries).
- Graphs as matrices and the Laplacian of a graph. Interpretations of the largest and smallest eigenvectors/eigenvalues of the Laplacian. Spectral embeddings. Interpretations of the second eigenvalue.
- Markov Chains, stationary distributions. Markov Chain Monte Carlo (MCMC)
- Fourier methods
- Compressive sensing
- Linear and convex programming. Matrix completion
- Differential privacy
I was thinking that the vast majority of those topics should be pretty much standard knowledge for a mathematically trained computer scientist (not being an expert in each of them, but knowing the basics of what they do, how they work and where they're applied). What level is this course at?
No disrespect to the parent, but I would strongly discourage any aspiring computer scientist or software engineer from concluding that the above list of topics is in any way remedial or “lower division”.
Just over a decade ago Netflix paid out a million bucks (that was real money then) for a modest improvement over their existing recommender based on an SVD-like gradient descent for a low-rank approximate factorization: https://en.m.wikipedia.org/wiki/Netflix_Prize. Many of the competing teams had exemplary academic and industrial pedigree.
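For a sense of what that entails, here's a toy sketch of that style of method (my own illustration, nothing to do with the actual prize-winning system): factor a ratings matrix R into low-rank factors U and V by gradient descent on the observed entries only, then use U Vᵀ to predict the missing ones.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy ratings matrix; 0 means "unrated".
    R = np.array([[5, 3, 0, 1],
                  [4, 0, 0, 1],
                  [1, 1, 0, 5],
                  [0, 1, 5, 4]], dtype=float)
    observed = R > 0

    k, lr, reg = 2, 0.01, 0.1                  # rank, step size, L2 penalty
    U = 0.1 * rng.standard_normal((R.shape[0], k))
    V = 0.1 * rng.standard_normal((R.shape[1], k))

    for _ in range(5000):
        E = observed * (R - U @ V.T)           # error on observed entries only
        U += lr * (E @ V - reg * U)            # gradient steps with L2 regularization
        V += lr * (E.T @ U - reg * V)

    print(np.round(U @ V.T, 1))                # the zeros are now "predicted" ratings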
Many (if not most) ML practitioners would profit from thinking more about the information-theoretic links between lossy compression, regularization via norm, and channel capacity to this day.
If I correctly gather that this is a syllabus for elite undergraduates, they’ll be grading on a curve.
fully agree. I remember at one point I studied the Perron-Frobenius theorem because I was analyzing a problem in a different domain but mathematically not completely dissimilar to PageRank, and it was crazy to think that a trillion-dollar company had come out of that bit of math.
> Many (if not most) ML practitioners would profit from thinking more about the information-theoretic links between lossy compression, regularization via norm, and channel capacity to this day.
I am a mathematically trained computer scientist (by qualification, not by current practice) who has kept up an interest in both fields for many many years, and I don’t know most of the stuff in that list. I’ll happily accept that I’m just not all that good, but I offer myself as evidence that there might be a market for this course ;)
We like to keep abreast of undergrad classes here, it's useful for knowing what incoming software engineers have been learning as well as to fill any gaps in our own education.
fair enough. also useful for (both sides of) job interviews, I guess. do you keep up to date about relevant research as well? I see a lot of Stable Diffusion posts, but that's arguably not because of the paper but because the published model is hackable. I guess keeping abreast of the papers leading up to Stable Diffusion would have been more important if you wanted to be the entrepreneur who created the tool?
The Stanford CS curriculum only distinguishes on the first digit: 1XX for undergrads and 2XX and above for graduate students. (Undergrads often take one or more graduate courses as they progress.)
Most students taking this course are advanced 2nd-year or 3rd- or 4th-year students who have taken the introduction-to-algorithms course. Many, if not most, CS majors never take this class at all.
All those topics are unimpressive? I’ve spent quite a bit of time trying to teach myself them.
I’ve seen several Stanford courses (e.g. CS229), so the low course number of 168 would seem to indicate that it’s taken within the first couple years of college.
At Stanford, 100 level courses are intended for undergraduates while 200 level courses are intended for mixed undergrad/grad. 100 doesn't necessarily indicate freshman/sophomore.
At the very least this would belong to a second or third algorithms class for undergrads (after the usual meat and potatoes: big/little-o notation, searching and sorting, DP, greedy algorithms, and so on). Depending on the difficulty of the assignments, this could easily be a graduate course.
Remember that no matter how difficult, obscure, or new a technical field is, you will have people here claiming they took a class about it as a freshman 10 years ago and wrote some papers about it. Sometimes it may be true, but most of the time it isn't.
I love this book. As opposed to the way linear algebra is typically introduced, this book focuses on concrete applications (like text analysis, image/signal processing, finance, ML, etc.) and eschews more arcane concepts (like eigendecompositions). To me, building practical intuition is the best way to learn the subject.
A more advanced but still accessible manuscript by Boyd et al: Generalized Low Rank Models (establishes connections between PCA/SVD and many other matrix factorization methods, and shows you how to roll your own): https://web.stanford.edu/~boyd/papers/pdf/glrm.pdf
Another excellent paper on the similarities between all those linear models is “A Unifying Review of Linear Gaussian Models”:
> Factor analysis, principal component analysis, mixtures of gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model.
Interesting to me is that a lot of the material you could also find in a theoretical machine learning course or in an applied mathematics course. Before seeing the syllabus I was expecting more on streaming algorithms, randomized algorithms, etc. In my experience, most computer science students get a bit freaked out when, e.g., concentration inequalities enter the scene.
"Nothing I learned in CS has an application to industry programming" is a regular complaint in discussion on education here and elsewhere. This is a good course to point to for people who have this very limited view on CS.
>"Nothing I learned in CS has an application to industry programming" is a regular complaint in discussion on education here and elsewhere. This is a good course to point to for people who have this very limited view on CS.
> Lecture 11 (Mon 5/2): Graphs as matrices and the Laplacian of a graph. Interpretations of the largest and smallest eigenvectors/eigenvalues of the Laplacian. Spectral embeddings, and an overview of applications (e.g. graph coloring, spectral clustering.)
What industry fields does this apply to? Machine Learning?
The web is a graph. Each node in the graph is a web page; each edge is a hyperlink between pages. You can represent a graph as a matrix.
If you randomly click on links, you'll end up visiting some web pages (the more connected ones) more often than others. You can define a probability distribution over pages as "what is the probability I'll end up at this web page after an infinite number of clicks?" This is the stationary distribution of a Markov chain.
The stationary distribution is the eigenvector associated with the largest eigenvalue of the (stochastic) matrix that represents the graph. This gives a way to compute the stationary distribution via well-studied algorithms.
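A minimal sketch of that computation (my own illustration, with the standard damping trick so the chain has a unique stationary distribution): start from a uniform distribution over pages and repeatedly multiply by the transition matrix until it stops changing. This is the power method.

    import numpy as np

    # Tiny toy web: adjacency[i, j] = 1 if page i links to page j.
    adjacency = np.array([[0, 1, 1, 0],
                          [0, 0, 1, 0],
                          [1, 0, 0, 1],
                          [0, 0, 1, 0]], dtype=float)

    # Row-stochastic transition matrix: from each page, follow a random outlink.
    P = adjacency / adjacency.sum(axis=1, keepdims=True)

    n = P.shape[0]
    damping = 0.85
    G = damping * P + (1 - damping) / n        # with prob 0.15, jump to a random page

    rank = np.full(n, 1.0 / n)                 # uniform starting distribution
    for _ in range(200):                       # power iteration
        new_rank = rank @ G
        if np.abs(new_rank - rank).sum() < 1e-10:
            break
        rank = new_rank

    print(rank)                                # stationary distribution = page importance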
You can use this idea to
1) assign importance to pages based on the magnitude of the probability, to improve web search. You might call this algorithm PageRank.
2) found a company called Google that is currently worth US$ trillions.
1) use the raw count of inbound links, weighted by the count of inbound links of the linking pages, normalize by the total number of links on that page, and run it for a few iterations to get a first approximation of the probability distribution. This is a pretty intuitive approach, would almost certainly have been good enough for Google, and avoids all the jargon, probability, and "well studied algorithms".
2) found a company called Google that is currently worth US$ trillions.
I love algorithms as much as the next guy, but PageRank is not a fantastic example of success-because-I-knew-eigenvectors despite what their PR folks would have you believe. One can't deny, though, that Google's PR has definitely attracted them a lot of super smart engineers over the decades, especially in the early days!
I think it's pretty easy to construct adversarial examples to your (1) that are dealt with cleanly by real pagerank.
e.g. if A is a Huge Important website, and A -> B -> C, then locally looking at {B, C} will underweight C significantly. (And, sure, you might say to look at k-th order inbound links for your iterative approach, but the adversary can just move the weight to k+1).
Perhaps, as you claim, your approach would have been good enough, but clearly understanding the theory got them something better.
This argument that it’s pretty easy to construct adversarial examples would be much more convincing if…it were not what PageRank actually is. PageRank itself initially ran iteratively and would have had exactly this same problem.
In any case, I’m not sure your example is actually adversarial, as there’s not an action that an adversary could take implied by it? Maybe you meant “poorly handled case”? But yes, run it iteratively. 10 steps? 20? Until convergence for some epsilon?
Nothing against theory, but I think they did fine without it.
Huh, interesting. In the ML and CS theory literature I’ve only ever seen “adversarial example” used to mean an input explicitly designed to produce a specific unexpected output, not just a worst-case output that isn’t what you want.
Do you have an example use of this sense that I could look at to update myself?
1. The algorithm you're attempting to describe in 1) above is either exactly PageRank or an approximation to it, so I don't get the point you're trying to make.
2. The point of jargon and well studied algorithms is that you can recognize when your problem is a problem someone else has already tackled. In the specific case of PageRank, if you recognise it's an eigenvalue problem you can
1) Find efficient algorithms to implement it. In the case of PageRank there is a matrix factorization that can make the algorithm much faster.
2) Find convergence bounds for approximate solutions such as the power method, so you can appropriately trade off compute time versus accuracy.
3) Find numerical techniques to avoid calculating a matrix full of NaNs.
4) Benefit from the decades of work in optimizing numerical linear algebra.
3. When Google was first released it was clearly significantly better than the competition (mostly AltaVista). I assume that was due to PageRank. I wouldn't be surprised if PageRank is much less important to Google now, but to get to this point it had to have that initial advantage.
0. Hah, yeah. Pretty sure you’re the first person to accuse me of being anti-intellectual...I can’t say for certain, but I think this interpretation here might be on you.
1. Indeed I think you missed my point: the causality implied in your earlier post reads as “well, if you know math, PageRank pops right out!” — my point is, you don’t need to know anything about eigenvalues for PageRank to be an intuitive solution; the insight isn’t eigenvalues or even probability distributions of pages, it’s ranking by relative rank of inbound links. There are many ways to operationalize that insight, and your post made gatekeeping allusions about what math it takes to do that. That’s what I’m responding to.
2. Sure, but you were originally responding to a post about when graph algorithms might ever be useful in industry. Obviously all things being equal, knowing about graph algorithms > not knowing about them. But again, you don’t need to know about them in advance.
3. No doubt Google’s early success is due to PageRank. But PageRank’s success is due to the insight behind it — following web links backward (in fact the precursor to PageRank was called “backrub”) — and not due to eigenvalues.
To be clear, I’m not at all anti-intellectual, and I love a good math puzzle or algorithm as much as the next guy — and I said as much in my other post.
But I’m also not in love with the fetishization of formalisms that is common in technical and academic circles, and the attendant value system that is so eager to prove itself that it tries to fit any and all adjacent insights into its own paradigm — just as you’ve done here.
I wasn't using jargon for jargon's sake. The original question quoted this little passage:
"Graphs as matrices and the Laplacian of a graph. Interpretations of the largest and smallest eigenvectors/eigenvalues of the Laplacian. Spectral embeddings, and an overview of applications (e.g. graph coloring, spectral clustering.)"
My response used the same terms to quickly sketch how they relate to PageRank, and hence try to show how the theory relates to an application. If I was trying to explain PageRank to someone who wasn't familiar with the maths I would take a different approach, but this is a tiny little textbox and there are already many very nice visualizations and other explanations that one can Google.
Oh, I wasn't accusing you of using jargon for jargon's sake, and nor do I believe that you were -- you used the appropriate jargon appropriately and correctly! And certainly representing graphs as matrices and then applying matrix math to them is valuable, I am absolutely not arguing against that.
I'm just taking issue with the (probably tongue-in-cheek!) framing of your "quick sketch": rather than merely demonstrate the connection ("hey look, PageRank uses this stuff!"), the sketch implied that if one knew theory about graphs and their representation as matrices, then PageRank is pretty obvious and falls out of the math (and then you could start a US$trillions business!). I was content to just leave it at that with a snarky version of "hey, sure, this math is relevant, but it wasn't a requirement to come up with the PageRank insight" -- but then I got triggered by your "anti-intellectual" comment. :)
So now I feel compelled to say, in case any impressionable young minds read this thread: that sketch (which I likely grossly misrepresented above) is a fantasy portrayal of how the insights behind technologies like PageRank come about; matrix knowledge isn't necessary (but not unhelpful, of course) to create PageRank or similar things. I've met too many (not necessarily you!) who tell themselves some version of this story: "PageRank? That's just the largest eigenvector the edge matrix" followed by some variant of "Those guys just got lucky! / Those guys just commercialized some known math! / I do math, and math can be worth billions!"
That story ignores the actual insights required to get to PageRank, even if you already know the theory; it ignores the hard work of actually building a company and commercializing such an insight.
Anyway, thanks for coming to my therapy session. </soapbox>
I'm not the person you're responding to, and perhaps it was their tone more than their content that you're replying to, but...
I think there's a lot of value in a KISS philosophy, and using the simplest most accessible algorithms possible. I don't think that's anti-intellectual so much as minimalist. Less charitably, one could say that reaching for something more complex than is needed is navel gazing or gate keeping.
I'm far from an expert on the topic, so I'm not trying to assert if either of you is more correct than the other as far as what the more appropriate model is.
I'm not into mathematics or algorithms and the like at all, but if someone explains things like PageRank in layman's terms, I'm like, "yeah I get that". That's been the recurring theme with me and mathematics, if it's just the theory, language and formulas I'm like "what?", but give me a practical example / use case, a way to visualize it, and I get it. I struggled with linear algebra in school - I don't even know why it's called that, and I can't explain it in proper terms - but if it would be called "the math of video games" then it instantly becomes a lot more accessible and visualizable (to me).
Anyway. I think I've intuited plenty of CS algorithms in my career, but don't ask me to explain the theory.
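In that spirit, here's a concrete "math of video games" example (my addition, not the commenter's): rotating a point on screen is just a matrix-vector multiply.

    import numpy as np

    def rotation(theta):
        # 2D rotation matrix for an angle theta, in radians.
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    point = np.array([1.0, 0.0])           # a sprite sitting at (1, 0)
    print(rotation(np.pi / 2) @ point)     # ~[0, 1]: rotated 90 degrees counter-clockwise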
In my experience, there’s a language of mathematics that makes an almost comically strong attempt to make itself inaccessible through jargon, single-letter variable names, and an over-reliance on symbolic manipulation.
If you talk to most mathematicians or algorithm researchers, you’ll find that they are some of the most intuitive people you’ll ever meet, and they use visualization and other intuitive techniques all the time.
And then the classes are all about symbol manipulation.
I swear there are millions of people out there who could be mathematicians but just couldn’t get excited by all the symbol manipulation.
Photogrammetry & image processing, anything to do with dimension reduction (basically every field with real-world sensor data), ontology/semantic data processing...
In lecture note 6 on regularization, section 3.4, it says:

> the l1 norm of a vector is simply the sum of the absolute values of the coordinates, and hence it is continuous (and linear).

I don't think the l1 norm is linear, since |x + y| ≠ |x| + |y| in general.
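A quick numeric check (my own example): the l1 norm satisfies the triangle inequality |x + y| ≤ |x| + |y|, but the inequality can be strict, so the norm is not additive, and hence not linear.

    # The l1 norm in one dimension is just the absolute value.
    x, y = 1.0, -1.0
    print(abs(x + y), abs(x) + abs(y))   # 0.0 vs 2.0: strictly less, not equal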
It would be cool to be an expert at algorithms, apply them to hard problems, and build cool things. I feel like, for me, there's no point in learning them since the work I get is always "put a button here, update the DB" type stuff.
I have been able to do some algorithmic work in my 4 decades (mostly lately for generative art), but most of my career was How Do I Ship This Complicated App With Too Few People In So Little Time. This usually requires creative thinking and experience, but rarely advanced knowledge of many of the topics mentioned. I never had a CS education, so I had to learn things myself, but it still makes sense to me today to understand complex topics: even if you never use them, being able to understand them is a benefit when tackling the complex things you will have to do.
try a side project? when I think of fun ideas, I always notice how easy it is to bump into computationally hard or complicated stats/ML/math problems left and right, even if it's just a simple game.
I remember one time when I realized this: a TV show character suggested building an app that lets you take a picture of someone's shoes and then give you links where you could buy them. It sounded a) like a pretty good business idea and b) crazy hard to build (the time the show aired was before the Deep Learning proliferation iirc).
I suppose even then you'd use a library which solves the problem rather than figure out and build the entire thing yourself.
I do agree that algorithms are fun and the basic ones are probably useful in everyday work, but for anything complicated you'd rather use a proper library. It's also definitely good to have knowledge of their internal workings though.
I think this is the reality for the majority of software developers, although it's not to be underestimated because it's not just a one-off button, it ends up being dozens, hundreds of "simple" things like that. Keeping all of them up to date and consistent is where the challenge is in that regard, and not many people manage - in my experience, a lot of systems as you describe them are rewritten every 5-10 years.
I'm not someone who graduated college, but I do my best to study independently.
I'm somewhat familiar with all of the topics (except privacy preservation), but I'm a bit surprised to see these all lumped together. They're all certainly useful, but they seem somewhat sparsely related.
Can someone tie together
- Probabilistic data structures & hashing
- PCA/SVD/Sampling/Compressed Sensing
- Privacy preservation
for me? They seem like they should be three independent courses.
For anyone interested in PCA/SVD/compressed sensing: Steve Brunton[0] (a professor at UW) has a YouTube channel full of lectures on vector calculus, PCA, SVD, dynamical systems, and more.
Even when I was a student and loved a particular class, there was the constant distraction of having to attend and do homework for four or five other classes.
To answer your strawman, nah for me I got zero network from "college" but I enjoyed the education. Financially the biggest benefit is not getting rejected from jobs that require it.
My (heretical?) opinion is that videos are a waste of time compared to text for the majority of content and the majority of people. There are some ideas that can be very nicely visualized and some people (e.g. dyslexics, non-native speakers) who struggle with text. For the rest I think text is better.
I disagree strongly with this opinion. Videos and text can play different roles in the comprehension of ideas.
My experience is that if I'm comfortable with a topic, text is far more effective as a medium to absorb knowledge from. For learning programming languages, for example, I've found it much more effective to read books than to sit through lectures or online videos.
However, for topics like complex mathematics or algorithms, especially with dense terminology, text often reads like an impenetrable wall to me. Good video lectures help me follow the professor's thought process and divide the problem into smaller, more comprehensible parts.
In my opinion, the strongest advantage of video content is that it includes an element of guided attention. A good video can forcefully mute the secondary details and highlight the essence of a concept or process.
I can see how you can spend hours on videos and not learn a lot, though. After getting an insight from a video, the concepts need to be practiced through exercises, which is active compared to passive watching; otherwise the advantages are lost, because the information is easily forgotten.
I find videos terrible for learning on their own. But combined with taking notes, reviewing them after, and using them to kick off a bit of reading or (better) experimentation, they can be an invaluable part of the learning process.
This is "mostly" maths for "big data" (aka statistical efficiency over huge set of data or "qsort-ized" algos) but kind of taught the wrong way: first you learn maths, then you code algos based on current hardware architecture.
I had a doubt when I did realize I could not find one of the most important topics of computer science: proof of accuracy of floating point computations.
I see this more like an index of a library of various algorithms, and that, I think it's good.
Are there algorithm courses that take into account how hardware affects algorithms? For example, with databases you have to implement theoretically inefficient algorithms which are faster in practice (mostly because they use sequential access).
The problem with accounting for hardware is: which hardware do you account for? If you really want to get into the performance-optimization weeds, you end up relying on, e.g., specific characteristics of a vendor's CPU. The Mechanical Sympathy blog had posts that do this, and the problem is that it doesn't generalize. Hardware changes (e.g. SSDs have different performance characteristics compared to spinning disks), and you can even design hardware to efficiently implement particular algorithms.
That said, there are areas within algorithms that do look at optimizing for typical hardware. E.g. cache-oblivious algorithms are designed to take advantage of the caches found in modern CPUs.
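A rough way to see that effect on whatever machine you have handy (a toy benchmark of my own, not from the thread): sum the same matrix once along its contiguous rows and once along its strided columns.

    import time
    import numpy as np

    a = np.random.rand(4096, 4096)             # row-major (C-order) layout

    start = time.perf_counter()
    for i in range(a.shape[0]):
        a[i, :].sum()                          # contiguous rows: sequential access
    t_rows = time.perf_counter() - start

    start = time.perf_counter()
    for j in range(a.shape[1]):
        a[:, j].sum()                          # strided columns: cache-unfriendly
    t_cols = time.perf_counter() - start

    print(f"rows: {t_rows:.3f}s   columns: {t_cols:.3f}s")

Same number of additions either way; on most machines the column pass is noticeably slower because each access lands in a different cache line.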
Rather than insisting on things that generalize with no additional work, it may be useful to use a certain modern architecture as a target for learning techniques and then rely on the wetware to generalize.
A lot of his work is in the journal "Experimental Algorithmics", so I guess there are a substantial number of people working in this way. I don't know this field very well, and I don't know if there are general principles that have been extracted from their work.
There's this Advanced Data Structures course [1] which, while not accounting for any particular hardware, covers some interesting "cache-oblivious" algorithms (i.e. algorithms designed to make the best use of your cache no matter the cache sizes). Is that the type of work you are thinking of?
This area is less mature than traditional algorithms work. Cache-friendliness and minimization of data-dependency chains are a huge deal for hyper-efficient modern algorithms, but this hasn't been the case for very long, so there is a less rich history of ideas to pull from.
This course is also more about solving problems in the first place rather than implementation efficiency.
Usually that lives in more applied courses, I guess partially because there is less useful theoretical treatment of that and the usually-taught tools don't really fit it. "A cache miss is 1000x slower" is just a constant factor after all ;)
Ooh, this looks nice. Although teaching a course with such an emphasis on decomposition in 2022 that doesn't include VAE/NN/transformer methods seems... behind the times?
It presupposes the basics of CS (such as hashing) and mathematics (such as some linear algebra). The prerequisites are CS107 (Computer Organization & Systems) and CS161 (Design and Analysis of Algorithms), which have prerequisites in turn.
The lecturers write:
> We welcome all comers — there’s a zillion courses you could be taking, and we’ll be happy and flattered if you decide to take this one.
> That said, to prepare a coherent lecture, it’s helpful to have a target audience in mind. We view the canonical student in the class as a senior-year computer science major. As you can see in this lecture, we assume a certain degree of “computer science maturity,” taking for granted that you know and care about concepts like caching, hashing, balanced search trees, and so on. We also assume sufficient programming maturity to translate the high-level descriptions given in lecture to working implementations.
> Our ambition is for this to be the coolest computer science course you’ve ever taken. Seriously!
This is Za. Wish my school had Stanford-level classes; I'm stuck at a T20 research-focused university that isn't an industry pillar of CS (Cal, MIT, Stanford), and they don't seem motivated to update the classes to be challenging and relevant, just dated and unnecessarily difficult niche topics.
It's a class for undergrads. Nothing can ever be explored in its full depth. If people want to spend an entire semester on a single problem, they should take a grad course.
The course description is especially ironic as it claims rigor.
> This course will provide a rigorous and hands-on introduction to the central ideas and algorithms that constitute the core of the modern algorithms toolkit.
The result, of course, will be a bunch of undergrads with a very hazy recollection, and zero real knowledge.
Do you have any one particular topic in mind you think isn't covered rigorously enough? If so, what's missing from it? That way someone can use "is that covered" as a heuristic for evaluating other resources in the future.
One week (two lectures, one discussion, one homework) for Fourier transforms?
Seriously?
Having been a TA for a course with similar pretensions: students will be miserable because they aren't given the tools to deal with the homework, they will learn very little, and they will forget it all very quickly.