For self-directed learning, my favorite approach has actually been using ChatGPT (GPT-4, especially), because you can just ask questions as you go along. Some questions I asked:
I have a pytorch ml llm gpt-style model and it has many layers, called "attention" and "feed forward". Can you explain to someone who is highly technical, understands software engineering, but isn't deeply familiar with ML terms or linear algebra what these layers are for?
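(For context, the gist is: attention layers mix information across token positions, while feed-forward layers transform each position independently. A minimal sketch of one such block in PyTorch, written from my own notes rather than quoted from ChatGPT, with causal masking omitted for brevity:)

    import torch.nn as nn

    class GPTBlock(nn.Module):
        def __init__(self, d_model=768, n_heads=12):
            super().__init__()
            self.ln1 = nn.LayerNorm(d_model)
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.ln2 = nn.LayerNorm(d_model)
            self.ff = nn.Sequential(
                nn.Linear(d_model, 4 * d_model),  # expand each token's vector
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),  # project back down
            )

        def forward(self, x):  # x: (batch, seq_len, d_model)
            h = self.ln1(x)
            attn_out, _ = self.attn(h, h, h)  # each token looks at the other tokens
            x = x + attn_out                  # residual connection
            x = x + self.ff(self.ln2(x))      # per-token transformation, residual again
            return x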
Where can I get all the jargon for this AI/ML stuff? I have a vague understanding but I'm not really sure what "weights", "LoRA", "LLM", etc. actually are, which makes it hard to see where each tool and concept fits in. Explain to a knowledgeable software engineer with limited context on ML and linear algebra.
In gpt/llm world, what's pre-training vs fine-tuning?
in huggingface transformers what's the difference between batch size vs micro batch size when training?
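(The short version, as I understand it: the micro batch is what actually fits on the GPU per forward/backward pass, and gradients are accumulated across micro batches until the logical batch is reached. With made-up numbers:)

    batch_size = 128       # the "logical" batch the optimizer sees per step
    micro_batch_size = 4   # what fits in GPU memory at once
    gradient_accumulation_steps = batch_size // micro_batch_size  # = 32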
how do I free cuda memory after training using huggingface transformers?
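(The usual recipe, assuming the big objects are still referenced by names like `trainer` and `model`, both names mine:)

    import gc
    import torch

    del trainer, model        # drop the last Python references
    gc.collect()              # make sure the objects are actually collected
    torch.cuda.empty_cache()  # hand cached GPU memory back to the driver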
I'm launching gradio with `demo.queue().launch()` from `main.py`. How can I allow passing command line arguments for port and share=True?
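(Which comes down to standard argparse plus gradio's `server_port` and `share` launch arguments; a minimal sketch with a placeholder UI:)

    import argparse
    import gradio as gr

    parser = argparse.ArgumentParser()
    parser.add_argument("--port", type=int, default=7860)
    parser.add_argument("--share", action="store_true")
    cli_args = parser.parse_args()

    with gr.Blocks() as demo:
        gr.Markdown("placeholder UI")  # stand-in for the real app

    demo.queue().launch(server_port=cli_args.port, share=cli_args.share)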
Comment each of these arguments with explanations of what they do (this is huggingface transformers)
args=transformers.TrainingArguments(
    per_device_train_batch_size=micro_batch_size,  # examples per GPU per forward/backward pass
    gradient_accumulation_steps=gradient_accumulation_steps,  # micro-batches accumulated before each optimizer step
    warmup_steps=100,  # ramp the learning rate up over the first 100 steps
    max_steps=max_steps,  # hard cap on total optimizer steps (takes precedence over epochs when set)
    num_train_epochs=epochs,  # number of full passes over the training data
    learning_rate=learning_rate,  # peak learning rate reached after warmup
    fp16=True,  # mixed 16-bit precision: less memory, faster on modern GPUs
    logging_steps=20,  # log training metrics every 20 steps
    output_dir=output_dir,  # where checkpoints and logs are written
    save_total_limit=3,  # keep only the 3 most recent checkpoints
),
In the context of ML, can you explain with examples what "LoRA", "LLM", "weights" are in relation to machine learning, specifically gpt-style language models?
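(The mental model that stuck for me: weights are the learned parameter tensors, an LLM is a very large stack of them, and LoRA freezes those weights and trains small low-rank add-on matrices instead. A sketch using the `peft` library, assuming `model` is an already-loaded causal LM:)

    from peft import LoraConfig, get_peft_model

    lora_config = LoraConfig(
        r=8,                                  # rank of the low-rank update matrices
        lora_alpha=16,                        # scaling applied to the update
        target_modules=["q_proj", "v_proj"],  # which weight matrices get adapters
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)  # base weights frozen; only adapters train
    model.print_trainable_parameters()          # typically well under 1% of the total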
I don't trust an LLM to answer accurately on topics I don't already know well. It's useful as a multiplier on things I do know, since I can correct mistakes and spot hallucinations, but on something I have no knowledge of it could be confidently incorrect and I wouldn't notice or think to check its answers.
I'm pretty sceptical of people using it to learn a foreign language or something they aren't familiar enough with.
I’ve found GPT 3.5 and 4 to be far better than other learning resources as well.
It just gets to the point immediately.
No chapters of books you need to read first, with no clear view of how it all comes together, or whether it ever will.
No documentation to wade through, which is notorious for never answering your specific question and just requires you to conform to that developer's attempt at teaching their own concept.
No long-winded YouTube videos or video courses.
Just “explain this concept”, and it goes out of its way to give good answers, even to things you didn't ask.