The biggest difference is that when you feed a sequence into a decoder-only model, each token can only attend to itself and the tokens before it when its hidden state is computed. So the hidden state for the nth token is based only on tokens at positions ≤ n. This is where the talk about "causal masking" comes from: the attention matrix is masked to enforce this restriction. Encoder architectures, on the other hand, allow each position in the sequence to attend to every other position in the sequence.
Encoder architectures have been used for semantic analysis and feature extraction of sequences, and decoder-only architectures for generation (i.e. next-token prediction).
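To make the masking concrete, here's a minimal numpy sketch of causal attention weights. The scores are random and the names are mine, but the lower-triangular mask plus masked softmax is the standard mechanism:

    import numpy as np

    seq_len = 4
    scores = np.random.randn(seq_len, seq_len)  # toy scores: rows = query position, cols = key position

    # Causal mask: query position i may only attend to key positions <= i.
    mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    masked = np.where(mask, scores, -np.inf)

    # Softmax over keys; the -inf entries get zero weight, so token i never "sees" later tokens.
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # An encoder skips the mask entirely, so every position attends to every other position.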
Linear attention is terrible for chatbot-style request-response applications, but if you're giving the model the prompt and then letting it scan the codebase and fill in the middle, linear attention should work pretty decently. The performance benefit should also have a much bigger impact, since you're reprocessing the same code over and over again.
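For context on why that reprocessing gets cheap, here's a rough numpy sketch of the usual linear-attention recurrence (Katharopoulos-et-al style); the feature map phi and all the names here are illustrative, not any particular model's implementation:

    import numpy as np

    def linear_attention_stream(qs, ks, vs, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
        d = ks.shape[1]
        S = np.zeros((d, vs.shape[1]))   # running sum of phi(k_i) v_i^T
        z = np.zeros(d)                  # running sum of phi(k_i), for normalization
        outputs = []
        for q, k, v in zip(qs, ks, vs):
            S += np.outer(phi(k), v)
            z += phi(k)
            # Each new token costs O(d^2) no matter how long the prefix already is,
            # so repeatedly re-scanning the same code stays cheap.
            outputs.append(phi(q) @ S / (phi(q) @ z))
        return np.stack(outputs)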
I’m similar. Never been very overweight, but I’ve put on 20-ish lbs in the last few years. My wife and I started tracking calories and exercising a month ago and it has been so beneficial. I lost about ten lbs, but even better than losing the weight has been building a better intuition for which foods are calorie-rich, higher in protein, etc.
This is fun. It would be interesting to build a single graph of concepts that all users contribute to. Then you wouldn't have to run LLM inference on every request, just the novel ones, plus you could publish the complete graph which would be something like an embedding space.
I just went back and did some new combinations with early ones, and I'm still getting intermittent delays even though all the early combinations must already be done. So I assume part of this is just the server itself being a little overloaded, and even responses that are cached remotely but not locally may experience delays.
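Something like this is what I'm imagining for the shared graph, purely as a hypothetical sketch (combine_with_llm and the cache layer are placeholders, not the game's real backend): the cache keys double as edges of the global concept graph, and only novel pairs ever hit the model.

    GLOBAL_CACHE: dict[tuple[str, str], str] = {}

    def combine_with_llm(a: str, b: str) -> str:
        # Placeholder for the actual LLM call.
        return f"{a}+{b}"

    def combine(a: str, b: str) -> str:
        key = tuple(sorted((a, b)))          # canonical edge in the shared concept graph
        if key not in GLOBAL_CACHE:          # only novel combinations trigger inference
            GLOBAL_CACHE[key] = combine_with_llm(*key)
        return GLOBAL_CACHE[key]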
I've been a long-time listener of his show, so I don't necessarily disagree with your points. It just seems like he punches above his weight when it comes to guests.
This is perfect. We need more hobbyist apps. Minimalist design, simple and solid functionality, and no ads in sight. Shake off the shackles of the advertising dystopia!
For what it’s worth, my undergraduate was in Economics with an emphasis in econometrics and this article touched on probably 80% of the curriculum.
The only problem is that by the time I graduated I was somewhat disillusioned with most causal inference methods. It takes a perfect-storm natural experiment to get any good results. Plus, every 5 years a paper comes out that refutes all previous papers using whatever method was in vogue at the time.
This article makes me want to get back into this type of thinking though. It’s refreshing after years of reading hand-wavy deep learning papers where SOTA is king and most theoretical thinking seems to occur post hoc, the day of the submission deadline.
Yeah, the only common theme I see in causal inference research is that every method and analysis eventually succumbs to a more thorough analysis that uncovers serious issues in the assumptions.
Take, for instance, the running example of Catholic schooling's effect on test scores used by the book Counterfactuals and Causal Inference. Subsequent chapters re-treat this example with increasingly sophisticated techniques and more complex assumptions about causal mechanisms, and each time they uncover a flaw in the analysis done with the techniques from previous chapters.
My lesson from this: the outcomes of causal inference are very dependent on assumptions and methodologies, of which the options are many. This is a great setting for publishing new research, but it's the opposite of what you want in an industry setting, where the bias is/should be towards methods that are relatively quick to test, validate, and put into production.
I see researchers at large tech companies pushing for causal methodologies, but I'm not convinced they're doing anything particularly useful, since I have yet to see convincing validation on production data showing their methods beat simpler alternatives, which tend to be more robust.
> My lesson from this: the outcomes of causal inference are very dependent on assumptions and methodologies, of which the options are many.
This seems like a natural feature of any sensitive method; I'm not sure why it's something to complain about. If you want your model to always give the answer you expected, you don't actually have to bother collecting data in the first place; just write the analysis the way pundits do.
Because with real-world data, like in production in tech, there are so many factors to account for. Brittle methods are more susceptible to unexpected changes in the data, or to unexpected ways in which complex assumptions about the data fail.
From my experience, propensity scores + IPW really don't get you far in practice. Propensity scoring models rarely balance all the covariates well (more often, one or two are marginally better and some may be worse than before). On top of that, IPW either assumes you don't have any cases of extreme imbalance, or, if you do, you end up trimming the weights to avoid adding extra variance, and in some cases you still add variance even with trimmed weights.
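For anyone who hasn't worked with it, this is roughly the workflow I mean; a bare-bones sketch with sklearn and numpy, where the logistic model and the clipping bounds are illustrative choices rather than recommendations:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def ipw_ate(X, treatment, outcome, clip=(0.05, 0.95)):
        # 1. Fit a propensity model P(T=1 | X).
        ps = LogisticRegression(max_iter=1000).fit(X, treatment).predict_proba(X)[:, 1]

        # 2. Trim extreme propensities: untrimmed weights blow up the variance
        #    when some units have near-0 or near-1 treatment probability.
        ps = np.clip(ps, *clip)

        # 3. Horvitz-Thompson style IPW estimate of the average treatment effect.
        return np.mean(treatment * outcome / ps) - np.mean((1 - treatment) * outcome / (1 - ps))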