The biggest difference is that when you feed a sequence into a decoder-only model, each token can only attend to itself and the tokens before it when its hidden state is computed. So the hidden state for the nth token is based only on tokens at positions ≤ n. This is where the talk about "causal masking" comes from: the attention matrix is masked to enforce this restriction. Encoder architectures, on the other hand, allow each position in the sequence to attend to every other position in the sequence.
Encoder architectures have been used for semantic analysis and feature extraction of sequences, and decoder-only architectures for generation (i.e. next-token prediction).
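To make the masking concrete, here's a minimal numpy sketch of causal attention weights. The scores are random and the names are mine, but the lower-triangular mask plus masked softmax is the standard mechanism:

    import numpy as np

    seq_len = 4
    scores = np.random.randn(seq_len, seq_len)  # toy scores: rows = query position, cols = key position

    # Causal mask: query position i may only attend to key positions <= i.
    mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    masked = np.where(mask, scores, -np.inf)

    # Softmax over keys; the -inf entries get zero weight, so token i never "sees" later tokens.
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # An encoder skips the mask entirely, so every position attends to every other position.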
Linear attention is terrible for chatbot-style request-response applications, but if you're giving the model the prompt and then letting it scan the codebase and fill in the middle, linear attention should work pretty decently. The performance benefit should also have a much bigger impact, since you're reprocessing the same code over and over again.
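For context on why that reprocessing gets cheap, here's a rough numpy sketch of the usual linear-attention recurrence (Katharopoulos-et-al style); the feature map phi and all the names here are illustrative, not any particular model's implementation:

    import numpy as np

    def linear_attention_stream(qs, ks, vs, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
        d = ks.shape[1]
        S = np.zeros((d, vs.shape[1]))   # running sum of phi(k_i) v_i^T
        z = np.zeros(d)                  # running sum of phi(k_i), for normalization
        outputs = []
        for q, k, v in zip(qs, ks, vs):
            S += np.outer(phi(k), v)
            z += phi(k)
            # Each new token costs O(d^2) no matter how long the prefix already is,
            # so repeatedly re-scanning the same code stays cheap.
            outputs.append(phi(q) @ S / (phi(q) @ z))
        return np.stack(outputs)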
I’m similar. Never been very overweight, but I’ve put on 20-ish lbs in the last few years. My wife and I started tracking calories and exercising a month ago and it has been so beneficial. I lost about ten lbs, but even better than losing the weight has been building a better intuition for which foods are calorie-rich, higher in protein, etc.
This is fun. It would be interesting to build a single graph of concepts that all users contribute to. Then you wouldn't have to run LLM inference on every request, just the novel ones, plus you could publish the complete graph which would be something like an embedding space.
I just went back and did some new combinations with early ones, and I'm still getting intermittent delays even though all the early combinations must already be done. So I assume part of this is just the server itself being a little overloaded, and even responses that are cached remotely but not locally may experience delays.
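Something like this is what I'm imagining for the shared graph, purely as a hypothetical sketch (combine_with_llm and the cache layer are placeholders, not the game's real backend): the cache keys double as edges of the global concept graph, and only novel pairs ever hit the model.

    GLOBAL_CACHE: dict[tuple[str, str], str] = {}

    def combine_with_llm(a: str, b: str) -> str:
        # Placeholder for the actual LLM call.
        return f"{a}+{b}"

    def combine(a: str, b: str) -> str:
        key = tuple(sorted((a, b)))          # canonical edge in the shared concept graph
        if key not in GLOBAL_CACHE:          # only novel combinations trigger inference
            GLOBAL_CACHE[key] = combine_with_llm(*key)
        return GLOBAL_CACHE[key]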
I've been a long-time listener of his show, so I don't necessarily disagree with your points. It just seems like he punches above his weight when it comes to guests.
This is perfect. We need more hobbyist apps. Minimalist design, simple and solid functionality, and no ads in sight. Shake off the shackles of the advertising dystopia!
For what it’s worth, my undergraduate was in Economics with an emphasis in econometrics and this article touched on probably 80% of the curriculum.
The only problem is that by the time I graduated I was somewhat disillusioned with most causal inference methods. It takes a perfect-storm natural experiment to get any good results. Plus, every 5 years a paper comes out that refutes all previous papers using whatever method was in vogue at the time.
This article makes me want to get back into this type of thinking though. It’s refreshing after years of reading hand-wavy deep learning papers where SOTA is king and most theoretical thinking seems to occur post hoc, the day of the submission deadline.
Yeah, the only common theme I see in causal inference research is that every method and analysis eventually succumbs to a more thorough analysis that uncovers serious issues in the assumptions.
Take, for instance, the running example of Catholic schooling's effect on test scores used by the book Counterfactuals and Causal Inference. Subsequent chapters re-treat this example with increasingly sophisticated techniques and more complex assumptions about causal mechanisms, and each time they uncover a flaw in the analysis done with the techniques from previous chapters.
My lesson from this: the outcomes of causal inference are very dependent on assumptions and methodologies, of which the options are many. This is a great setting for publishing new research, but it's the opposite of what you want in an industry setting, where the bias is/should be towards methods that are relatively quick to test, validate, and put into production.
I see researchers at large tech companies pushing for causal methodologies, but I'm not convinced they're doing anything particularly useful, since I have yet to see convincing validation on production data showing their methods beat simpler alternatives, which tend to be more robust.
> My lesson from this: the outcomes of causal inference are very dependent on assumptions and methodologies, of which the options are many.
This seems like a natural feature of any sensitive method; I'm not sure why it's something to complain about. If you want your model to always give the answer you expected, you don't actually have to bother collecting data in the first place; just write the analysis the way pundits do.
Because with real-world data, like in production in tech, there are so many factors to account for. Brittle methods are more susceptible to unexpected changes in the data, or to unexpected ways in which complex assumptions about the data fail.
From my experience, propensity scores + IPW really don't get you far in practice. Propensity scoring models rarely balance all the covariates well (more often, one or two are marginally better and some may be worse than before). On top of that, IPW either assumes you don't have any cases of extreme imbalance, or, if you do, you end up trimming the weights to avoid adding extra variance, and in some cases you still add variance even with trimmed weights.
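For anyone who hasn't worked with it, this is roughly the workflow I mean; a bare-bones sketch with sklearn and numpy, where the logistic model and the clipping bounds are illustrative choices rather than recommendations:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def ipw_ate(X, treatment, outcome, clip=(0.05, 0.95)):
        # 1. Fit a propensity model P(T=1 | X).
        ps = LogisticRegression(max_iter=1000).fit(X, treatment).predict_proba(X)[:, 1]

        # 2. Trim extreme propensities: untrimmed weights blow up the variance
        #    when some units have near-0 or near-1 treatment probability.
        ps = np.clip(ps, *clip)

        # 3. Horvitz-Thompson style IPW estimate of the average treatment effect.
        return np.mean(treatment * outcome / ps) - np.mean((1 - treatment) * outcome / (1 - ps))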