Hacker News
LLMs and GPT: Some of my favorite learning materials (gist.github.com)
280 points by rain1 on March 28, 2023 | 24 comments



For self-directed learning, my favorite has been actually using ChatGPT (GPT-4, especially), because you can just ask questions as you go along. Some questions I asked:

    I have a pytorch ml llm gpt-style model and it has many layers, called "attention" and "feed forward". Can you explain to someone who is highly technical, understands software engineering, but isn't deeply familiar with ML terms or linear algebra what these layers are for?
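For what it's worth, the "attention" layer that prompt asks about is small enough to sketch in plain numpy. This is a minimal single-head, scaled dot-product version; the shapes and variable names here are illustrative, not taken from any actual model:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: each output row is a weighted average
    # of the rows of V, weighted by query-key similarity.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (n_q, n_k) similarity matrix
    weights = softmax(scores)        # each row sums to 1
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = attention(Q, K, V)           # shape (4, 8): one mixed row per query
```

The "feed forward" layer, by contrast, is just two linear maps with a nonlinearity between them, applied to each position independently.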

    Where can I get all the jargon for this AI/ML stuff? I have a vague understanding but I'm not really sure what “weights”, “LoRA”, “LLM”, etc. are, so it's hard to see where each tool and concept fits in. Explain to a knowledgeable software engineer with limited context on ML and linear algebra.

    In gpt/llm world, what's pre-training vs fine-tuning?

    in huggingface transformers what's the difference between batch size vs microbatch size when training?
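On the batch-size question: the micro-batch is what fits on one device per forward/backward pass, and gradients are accumulated across several micro-batches before each optimizer update. Sketching the arithmetic (the numbers here are made up):

```python
micro_batch_size = 4               # examples per device per forward/backward pass
gradient_accumulation_steps = 32   # micro-batches accumulated before one optimizer step
n_devices = 1                      # GPUs training in data parallel

# The effective (optimizer-level) batch size:
effective_batch = micro_batch_size * gradient_accumulation_steps * n_devices
print(effective_batch)  # → 128
```

So the micro-batch trades memory for step latency, while the effective batch is what actually shapes the gradient statistics.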

    how do I free cuda memory after training using huggingface transformers?
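On the CUDA-memory question, the usual recipe (a sketch, assuming PyTorch; note that `empty_cache` only returns PyTorch's cached blocks to the driver, it cannot free tensors that are still referenced) is to drop your references first, then:

```python
import gc

def free_cuda_memory():
    """Call after `del trainer, model` (or letting them go out of scope).

    Collects unreferenced Python objects, then, if PyTorch with CUDA is
    available, returns PyTorch's cached GPU blocks to the driver.
    Returns True only when a CUDA cache was actually emptied.
    """
    gc.collect()
    try:
        import torch
    except ImportError:
        return False
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        return True
    return False
```

Usage: `del trainer, model; free_cuda_memory()`. If memory still looks pinned, some reference (an optimizer, a cached output, a notebook variable) is usually still alive.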

    I'm launching gradio with `demo.queue().launch()` from `main.py`. How can I allow passing command line arguments for port and share=True?
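For the gradio question, one standard pattern is plain argparse; a sketch, where the flag names are my own invention and I'm assuming `launch()`'s `server_port`/`share` keyword arguments:

```python
import argparse

def parse_args(argv=None):
    # Hypothetical flag names -- pick whatever suits your script.
    p = argparse.ArgumentParser()
    p.add_argument("--port", type=int, default=7860)  # gradio's usual default port
    p.add_argument("--share", action="store_true")    # create a public share link
    return p.parse_args(argv)

# In main.py, something along the lines of:
#   args = parse_args()
#   demo.queue().launch(server_port=args.port, share=args.share)
```

Then `python main.py --port 8080 --share` passes both through to the launch call.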

    Comment each of these arguments with explanations of what they do (this is huggingface transformers)

        args=transformers.TrainingArguments(
            per_device_train_batch_size=micro_batch_size,
            gradient_accumulation_steps=gradient_accumulation_steps,
            warmup_steps=100,
            max_steps=max_steps,
            num_train_epochs=epochs,
            learning_rate=learning_rate,
            fp16=True,
            logging_steps=20,
            output_dir=output_dir,
            save_total_limit=3,
        ),
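An answer in the spirit of that prompt might look like this. The glosses are mine, from memory of the transformers docs, so verify against the current API:

```python
args = transformers.TrainingArguments(
    per_device_train_batch_size=micro_batch_size,  # examples per GPU per forward/backward pass
    gradient_accumulation_steps=gradient_accumulation_steps,  # micro-batches summed before each optimizer step
    warmup_steps=100,              # steps over which the learning rate ramps up from zero
    max_steps=max_steps,           # hard cap on optimizer steps (overrides epochs when > 0)
    num_train_epochs=epochs,       # passes over the training set
    learning_rate=learning_rate,   # peak learning rate reached after warmup
    fp16=True,                     # 16-bit mixed-precision training
    logging_steps=20,              # log metrics every 20 steps
    output_dir=output_dir,         # where checkpoints and logs are written
    save_total_limit=3,            # keep at most 3 checkpoints, deleting the oldest
)
```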
    
    In the context of ML, can you explain with examples what "LoRA", "LLM", "weights" are in relation to machine learning, specifically gpt-style language models?
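The LoRA part of that last question is compact enough to sketch: instead of updating a full weight matrix W, you learn a low-rank pair A, B and add the scaled product BA, so only a small fraction of parameters train. A toy numpy version (shapes, zero-init of B, and the alpha/r scaling follow my reading of the LoRA paper; the values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 4, 8                  # hidden size, LoRA rank, scaling factor

W = rng.standard_normal((d, d))         # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01  # trainable low-rank "down" projection
B = np.zeros((d, r))                    # trainable "up" projection, zero-init so the update starts at 0

def lora_forward(x):
    # Base path plus low-rank update; only A and B would receive gradients.
    return x @ W.T + (x @ A.T) @ B.T * (alpha / r)

x = rng.standard_normal((2, d))
# With B zero-initialized, the adapted model matches the base model exactly:
assert np.allclose(lora_forward(x), x @ W.T)
```

The parameter savings are the point: the full matrix has d*d = 4096 entries here, while A and B together have 2*d*r = 512.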


I don't trust an LLM to answer accurately on topics I don't already know well. It's useful as a multiplier on things I do know, since I can correct mistakes and spot hallucinations, but on something I don't have knowledge of it might be confidently incorrect, and I wouldn't notice or check its answers.

I'm pretty sceptical of people using it to learn a foreign language or something they aren't familiar enough with.


It can be very useful for things you don't know as well, as long as you can verify it.


One approach that I think is pretty safe is to ask it for relevant papers to read. If they don't exist then you'll find out pretty quick.


I’ve found GPT 3.5 and 4 to be far better than other learning resources as well.

It just gets to the point immediately.

No chapters of books you need to read first, where there is no clear view of how it comes together or if it ever will.

No documentation to wade through, which notoriously never answers your specific question and just requires you to conform to that developer's attempt at teaching their own concept.

No long winded youtube video or video educational course.

Just “explain this concept”, and it goes out of its way to explain things you didn't even think to ask.


I've found myself asking questions I previously felt too embarrassed to ask a coworker.


I have compiled a list of some of the materials that I found best for learning how LLMs work. I hope this is useful to you all. I will continue to update it as I find new things.


I'm learning the basics of transformers from Andrew Ng's courses; your post should help me learn them faster. Thanks!


Sharing is caring! Nice one, thanks


This is great, I've loved Karpathy's videos. Perhaps this is out of scope, but have you considered an "Applied LLM" section?

I know prompt engineering is a lesser art, but I think some of the literature on in-context, self-consistency, self ask, CoT, etc might be useful. Here's my favorite lit review on the subject (no affiliation): https://lilianweng.github.io/posts/2023-03-15-prompt-enginee...


Realistically speaking, how long would it take someone with college-level linear algebra and multivariable calculus knowledge, and rusty familiarity with ML (via Andrew Ng's Matlab course), to learn the concepts behind LLMs and the state-of-the-art algorithms behind SDs and GPTs?

Would it even make sense if one's interest is not image or text generation?


Only a month or two for the basics, if you're willing to take some stochastic calculus in the SDs for granted.

Part of what's eluded public awareness so far is that these algorithms are simple.


The algorithms are, but the architecture isn't.


For example, cf. ZionEX and Pathways.

AI accelerators are important.


A couple of months. Also, there is a world of difference between knowing the topic on a superficial level and training and evaluating a model yourself.


I mean, to learn all the details it'd probably take the equivalent of doing a PhD plus a research fellowship or two. But I have the qualifications you cite, and I'm doing the fast.ai courses to get a working knowledge of things; that seems to take 4-8 weeks depending on your pace.


Not really. These algorithms are generally quite simple and you are not coming up with them from scratch.

It might take a PhD to build the JVM, but you don't need one to understand what it does (at least most parts of it).


I have not prettied mine up (I'd take PRs from any volunteers!), but here's my equivalent repo of reading materials in case it helps: https://github.com/sw-yx/ai-notes/tree/main#top-ai-reads


I think one section that should be included is "interfacing with LLMs". I know most of the stuff on your list, but without ever having used an LLM. A lot of the few-shot/prompt-engineering, fine-tuning, LoRA, and 8-bit quantization stuff would be the most useful to me. Practical knowledge of how to use them or adapt them to a domain seems more scattered and harder to find, since it's all pretty much new.


> Ted Chiang, ChatGPT Is a Blurry JPEG of the Web

I found an interesting counter perspective on the Mindscape podcast[0]:

  And Ted Chiang, in that article, suggests that ChatGPT and language models generally can be thought of as a kind of blurry jpeg of the web, where they get trained to compress the web and then at inference time, when you're generating a text with these models, it's a form of lossy decompression that involves interpolation in the same way. And I think it's a very interesting test case for intuitions because I think this metaphor, this analogy, parts of it are pumping the right intuitions.

  There is definitely a deep connection between machine learning and compression that has long been observed and studied. [...] I think comparing it to lossy image decompression is pumping the wrong intuitions because, again, that suggests that all it's doing is this kind of shallow interpolation that amounts to a form of approximate memorization where you have memorized some parts of the data and then you are loosely interpolating what's in between.

  [...] the intuition is that there would be a way, presumably, to characterize what large language models and image generation models are doing when they generate images and texts as involving a form of interpolation but this form of interpolation would be very very different from what we might think of when we think of nearest neighbour pixel interpolation in lossy image decompression. So different, in fact, that this analogy is very unhelpful to understand what generative models are doing because, again, instead of being analogous to brute force memorization, there's something much more genuinely novel and generative about the process of inference in these models.
[0] https://www.preposterousuniverse.com/podcast/2023/03/20/230-...


Maybe consider https://generative.ink/posts/simulators/ for the Philosophy of GPT section. I think that one is by far the most insightful take.


Fast.ai YouTube course is a great intro to practical machine learning. I'm not sure if Jeremy Howard has made any lectures specifically about transformers but the course has plenty of good practical info that's really well explained.


I can't seem to find the right one. Care to post the link?


Thank you! Very useful reference.



