
I read the beginning sections of each chapter until I understood conceptually what the chapter would implement by the end. (I needed more assistance for the initial chapters, not knowing what the output of scanning or parsing would be or where to start, but by the later chapters it becomes pretty clear what the output of implementing, say, classes is.)

Then I would try to implement it on my own without reading further, keeping Chapter 3 (the language spec) open in a different tab for reference. I chose Java because (1) I wasn't very familiar with it, and (2) I wanted the option to use the book's code snippets.

My goal was to pass all of the edge-case tests the book mentions. So finally I would read the chapter and take some notes on the design choices and test cases I had missed. The reading goes quickly once you've already implemented the chapter, because you know what to look for.


Regardless of which text you choose, I recommend writing proofs (full sentences and all) for the solutions that were harder to come by or that you have difficulty expressing clearly. And if you can, get some feedback on the proofs. (Happy to take a look at a few if you DM me.)

The process of writing will hopefully help you:

  - build awareness of when your arguments are not airtight or when you make false assumptions
  - modularize your thinking 
  - become more fluent with logical "vocabulary"

As for books, my personal favorites were Problem-Solving Strategies (Engel) and The Art and Craft of Problem Solving (Zeitz). They're both really approachable, have plenty of examples, and will give you a different perspective on what math can be about.
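
For what it's worth, here is the kind of write-up I mean, on a deliberately trivial claim (my own toy example, not from either book):

  \textbf{Claim.} If $a$ and $b$ are odd integers, then $a + b$ is even.

  \textbf{Proof.} Since $a$ is odd, we can write $a = 2m + 1$ for some integer $m$, and similarly $b = 2n + 1$ for some integer $n$. Then $a + b = (2m + 1) + (2n + 1) = 2(m + n + 1)$. Since $m + n + 1$ is an integer, $a + b$ is even. $\blacksquare$

Even for something this simple, forcing yourself to name the integers and justify the last step builds the habit that pays off on harder problems.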


For anyone interested in a more technical summary of some of the main ideas:

Problem Statement

  The Erdős conjecture states that every primitive set S (a set of integers > 1 in which no element divides another) satisfies f(S) <= f(PRIMES), where f(S) = sum_{s in S} 1/(s log s) and PRIMES is the set of all primes.
Definitions

  Define A_q to be a primitive set in which every element a is divisible by the prime q, but by no prime less than q.
   * Example: A_3 contains multiples of 3, but no even numbers.

  Define A*_q to be the set of all integers divisible only by primes >= q.
   * Example: A_q is a subset of A*_q.
   * Example: A_q is also a subset of q \* A*_q (i.e. multiply each element by q).

  Define g(a) = 1/a * Product_{p<LargestPrimeDivisor(a)}[(1 - 1/p)]. 
   * Example: if a=7, then the Product term (excluding the 1/a) is the "density" of A*_7. This is because (1-1/2) of integers aren't divisible by 2, (1-1/3) aren't divisible by 3, and (1-1/5) aren't divisible by 5.
   * Example: if a=7, then g(7) is the density of all multiples of 7 that aren't divisible by 2,3,5.
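
As a quick sanity check on these definitions (my own illustration, not from either paper), here's a small Python sketch that computes g(a) exactly:

  from fractions import Fraction

  def primes_below(n):
      """Primes strictly less than n, by trial division (fine for small n)."""
      return [p for p in range(2, n) if all(p % d for d in range(2, int(p ** 0.5) + 1))]

  def largest_prime_divisor(a):
      """Largest prime factor of a (a >= 2), by trial division."""
      largest, d = 1, 2
      while d * d <= a:
          while a % d == 0:
              largest, a = d, a // d
          d += 1
      return max(largest, a)

  def g(a):
      """g(a) = (1/a) * product over primes p < LargestPrimeDivisor(a) of (1 - 1/p)."""
      result = Fraction(1, a)
      for p in primes_below(largest_prime_divisor(a)):
          result *= 1 - Fraction(1, p)
      return result

  # a = 7: the product term (1 - 1/2)(1 - 1/3)(1 - 1/5) = 4/15 is the density of A*_7,
  # and g(7) = (1/7) * 4/15 = 4/105, the density of multiples of 7 not divisible by 2, 3, or 5.
  print(g(7))  # 4/105
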
Results

  In [1] it is proved that for every A_q with A_q \neq {q}: f(A_q) < k\*\sum_{a\in A_q} g(a) <= k\*g(q), for a constant k. 
   * The first half of the inequality comes from a cited result: 1/(a log(2a)) < k\*g(a).
   * The second half comes from a clever construction of the sets S_a = (a \* A*_{LargestPrimeDivisor(a)}), for all a in A_q. These sets are constructed so that the natural density of S_a is exactly g(a). Because (1) S_a and S_b are disjoint for a \neq b (this is where primitivity is used) and (2) each S_a is a subset of q \* A*_q, whose density is g(q), it follows that \sum_{a\in A_q} g(a) = \sum_a density(S_a) <= density(q \* A*_q) = g(q).

  In [2] it is shown that k\*g(q) <= 1/(q log q). Note that the right-hand side of the inequality is equal to f({q}). 
   * This uses a pretty clever partitioning of the primitive set A into A_2 U A_3 U A_5 U ... U A_q U .... Combined with [1], this gives f(A_q) <= f({q}) for every prime q (with equality only possible when A_q = {q}), so summing over the partition yields f(A) <= f(PRIMES).
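
Chaining the two results (my own condensed restatement, in the notation above): for each prime q with A_q \neq {q},

  f(A_q) < k\*\sum_{a\in A_q} g(a) <= k\*g(q) <= 1/(q log q) = f({q}),

while trivially f(A_q) = f({q}) when A_q = {q}. Summing over the partition A = A_2 U A_3 U A_5 U ... then gives

  f(A) = \sum_q f(A_q) <= \sum_q f({q}) = f(PRIMES).
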
[1] https://arxiv.org/pdf/1806.02250.pdf
[2] https://arxiv.org/pdf/2202.02384.pdf


Interesting, could you give some examples of concepts from relational algebra that have helped inform your understanding of schema design?

Asking as a practitioner interested in learning the conceptual underpinnings :)


I think there is a simple, almost obvious, but very powerful concept that gets clearer when learning relational algebra: Everything is a relation.

One important part of using an SQL DB is the engineering side of things, such as access patterns, indexing, efficient storage and so on. And then there is the side that is about relational expressions.

An intuition of mine is that people who feel more comfortable with the second part lean on SQL (and DB features in general) to do the work for them, while eschewing things like ORMs.


For me, I think I could tell myself to be more productive for 1 week and succeed at it, but I'd feel a lot more stressed or cynical about work as a result. I feel like a lazy person because I know I'm not doing what I could if I tried my best, but I also feel that wellbeing and health are more important in the long run.


One piece of advice I hear a lot is "review your games", but how do you actually do that without a stronger player? I sometimes use an engine and it points out moves I hadn't considered before, but without understanding the plan or positional ideas behind them, I often find this pretty opaque.


I found that bit of advice similarly daunting. However, in trying to understand where things go wrong in a game, you might notice patterns emerging after you've analyzed several of your own games, which should give you something concrete to work on for improvement.

At my level, that basically amounted to identifying blind spots I fall prey to (at one point it was discovered attacks along a particular diagonal). A master, expert, or higher-level class player will be concerned with entirely different things when they review their games.


One thing that helps me is to play against a stronger chess engine, dialed down to close to my skill level, and play for a bit until I really get stuck. Then I take back a bunch of moves, figure out where I went wrong and why, and play the game out until I get stuck again. Or I'll go back and try to see if I can find a better way to accomplish my goals. In general, creating a low-risk environment to learn, where I can compare my original thinking to my later thinking, has been key.

I haven't played more than a handful of games since pre-covid times, so I'm back to being pretty clumsy, and just started "rewinding" games again. It seems to help a lot.


I forgot to add that simply writing down the moves when I play someone else makes a huge difference in my play. I'm much less likely to blunder, for one thing.


Before internet chess it was very common to analyze games in a group, either at the tournament or later at a club, usually with some stronger players around.

To do it yourself, the best explanation and framework I think is found in Yermolinsky’s “Road to Chess Improvement”. It’s very helpful in systemizing this and also has thorough explanations of his experience in analyzing his own games.


My general strategy is to review with a computer, and if I don't understand the move the computer is suggesting, I follow the PV (principal variation) 3 or 4 moves deep. Generally that is enough either to show me what I should have seen, or to conclude "oh, the computer is thinking way above my level and I can probably ignore this."


At least on Lichess there is a "Learn from your mistakes" button where you need to guess a move that doesn't lose points. Try not to just make random guesses, but think hard when you don't see it.


There ought to be a rule that any "curated" list of N books should also include N books in the same category that you should not read.


An example of following a similar rule (briefly listing other standard books and why the recommended ones are preferred): https://www.lesswrong.com/posts/xg3hXCYQPJkwHyik2/the-best-t...


Interesting idea. I love the recognition of current hiring asymmetries. Some of the practices by companies simply shouldn't be tolerated, like recruiter ghosting, opaqueness about salary ranges, exploding offers, misleading listings or qualifications, lack of feedback, etc., and I wonder if some of these could be solved by a company like TB that facilitates the hiring market and pipelines the process.

So hiring through TripleByte would be conducted in fixed-length time intervals (e.g. 1 month). Companies specify the skills they're hiring for, interested engineers interview for the skills that TripleByte can screen for and supply a minimum salary (invisible to companies), and the companies hiring that month bid against each other for the engineer, starting at that minimum salary.

If the market determines the hiring timeline, recruiters can't put you on hold if they're interested-but-not-sure, and can't use exploding offer tactics. It also pushes companies to be more honest about salary ranges and required skills. TripleByte would be responsible for providing feedback rather than the companies.
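
To make the bidding idea concrete, here's a toy sketch of how a single round could close (names and structure entirely made up, just to illustrate the mechanism I'm proposing):

  from dataclasses import dataclass

  @dataclass
  class Engineer:
      name: str
      skills: set          # skills TripleByte has screened for
      min_salary: int      # invisible to companies; acts as the reserve price

  @dataclass
  class Bid:
      company: str
      amount: int

  def close_round(engineer, bids):
      """End of the fixed-length round: the highest bid at or above the
      engineer's hidden minimum wins; otherwise there's no match this round."""
      valid = [b for b in bids if b.amount >= engineer.min_salary]
      return max(valid, key=lambda b: b.amount, default=None)

  eng = Engineer("alice", {"backend", "ml"}, min_salary=150_000)
  print(close_round(eng, [Bid("acme", 140_000), Bid("globex", 165_000)]))
  # Bid(company='globex', amount=165000)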


Can you share more about some of the technical problems Streamline works on?


Sure!

Our domain is precision oncology. In brief, this is about matching cancer patients with available targeted therapies by examining the genome of their cancer. This is different from how the majority of cancer patients are treated today: surgery and {chemo,radio}therapy.

Here is a problem in our domain where tech is one of the limiting factors.

If you look at the DNA of any cell and compare it against the reference genome, you'll find a lot of differences, aka variants. Typically even more so if you're looking at a sequenced tumour (~1e6 variants). This is your haystack, and a variant that can be medically targeted to treat the cancer is the needle. The definition of which variants are "clinically relevant" is layered, context-dependent, and (partially) regulated. Software is responsible for automating away the majority of variants, say down to 10-100, in a justifiable, traceable way. It's also responsible for giving the clinician tools to deal with the remaining ones. This manual step typically involves an informed line of questioning about each variant, backed by hundreds of supporting data points about it.

Without these two tech pieces, interpreting a single molecular pathology report can and does take many hours of (expensive) expert time, instead of minutes. For a rough sense of scale: the human genome has ~3e9 nucleotides (ACTG), ~3e4 known genes, ~1e6 known gene interactions, and ~1e8 known variants. A typical whole-genome sequencing run produces >30 GB of raw data (compressed).
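
To give a flavour of what the "justifiable, traceable" automation step could look like, here is a toy sketch (not our actual pipeline; the filter and its threshold are invented) where every variant carries a record of why it was kept or discarded:

  from dataclasses import dataclass, field

  @dataclass
  class Variant:
      gene: str
      change: str                    # e.g. a protein change like "V600E"
      population_frequency: float    # how common the variant is in healthy populations
      trace: list = field(default_factory=list)   # audit trail of filter decisions

  def apply_filters(variants):
      """Automate away the bulk of variants, recording a justification for each
      decision; whatever survives goes to the clinician for manual review."""
      kept = []
      for v in variants:
          if v.population_frequency > 0.01:   # invented threshold
              v.trace.append("discarded: common in healthy populations, unlikely driver")
              continue
          v.trace.append("kept: rare variant, forwarded for clinician review")
          kept.append(v)
      return kept

  variants = [Variant("BRAF", "V600E", 0.00001), Variant("SOMEGENE", "A123T", 0.25)]
  kept = apply_filters(variants)
  # kept contains only the BRAF variant; both variants carry a human-readable trace entry.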

This is probably the first problem everyone runs into. There are plenty of other ones, some more challenging and interesting than others. Feel free to send me an e-mail if you'd like to discuss this more! amir[at]streamlinegenomics.com


The authors of Tackling Climate Change with Machine Learning spun off a forum at climatechange.ai with tons of related resources.

