Napkin Math For Fine Tuning Part 2
Johno Whitaker answers follow-up questions about the first Napkin Math for Fine Tuning video and about his research.
If you enjoyed this talk, subscribe to receive updates on new educational content about LLMs.
Chapters
00:20 Introduction
Johno Whitaker introduces this talk, aiming to clarify points and answer follow-up questions from Part 1.
01:01 Saturating GPUs
Discussion of whether you should always fully saturate your GPUs: keeping them busy is clearly best when you are memory-bandwidth-bound, but the compute-bound case is more nuanced.
04:17 Cost/Complexity Trade-off
Exploring the balance between cost savings and added complexity for different GPU configurations.
09:44 Hyperparameter Tuning
Johno explains his approach to hyperparameter tuning, suggesting that default hyperparameters usually work fine unless the final few percent of improvement is crucial.
11:31 Fine-Tuning
Johno discusses the role fine-tuning plays in his R&D, describing it as an option he would explore if prompt engineering proves insufficient.
15:37 TPUs
A brief look at how TPUs or other non-GPU accelerators fit into the napkin math.
18:52 Optimizing Llama-3 with LoRA
Practical tips for reducing memory usage when fine-tuning Llama 3 with LoRA (see the memory-estimate sketch after the chapter list).
22:55 Sequence Length
Developing an intuition for the sequence length parameter and tips for optimizing it.
27:45 Quick Development Loop
Walkthrough of how to start small and build when working with LLMs.
29:35 Tools
What to look for when choosing tools to build and run models.
31:29 CPU Offloading
Discussion of offloading to the CPU when GPU VRAM runs out, why it is not common, and when it might be useful.
34:43 Learning Styles
Johno shares how he discovers new information and continues learning, emphasizing learning through application.
39:00 Hardware Lottery Theory
Discussion about how the symbiotic development of algorithms and hardware might limit future breakthroughs.
42:27 Direction of Research
Dan, Johno, and Hamel discuss different approaches to exploring new avenues of research and development, and how to keep pace with the industry.
48:10 LLM Impact on Research
Discussion of how LLM-based tools have enabled Johno’s research and possible future development of tools.
49:57 Quick Questions
Johno gives quick responses to final questions touching on diffusion models, evolutionary AI, coding styles, and QLoRA overhead.
58:16 1.58-bit Quantization
Detailed discussion of what quantization below 4 bits means, its benefits, and how it might be used (see the worked example after the chapter list).
1:02:20 Alternative Architectures
Brief dive into alternatives to transformers, such as state space models (SSMs) and recurrent models.
1:03:44 Conclusion
Johno wraps up the talk.
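To make the Llama 3 / LoRA chapter concrete, here is a minimal napkin-math sketch of the VRAM needed for full fine-tuning versus a QLoRA-style setup. The parameter count (~8B), the ~1% trainable-parameter fraction, and the per-weight byte costs are illustrative assumptions rather than figures from the talk, and activations, KV cache, and framework overhead are ignored.

```python
# Rough, back-of-the-envelope VRAM estimate for fine-tuning an ~8B-parameter
# model (e.g. Llama-3-8B) with LoRA vs. full fine-tuning. All numbers are
# illustrative assumptions in the spirit of the napkin math from Part 1.

def full_finetune_gb(n_params: float, bytes_per_weight: int = 2) -> float:
    """bf16 weights + bf16 gradients + Adam states (two fp32 moments)."""
    weights = n_params * bytes_per_weight
    grads = n_params * bytes_per_weight
    adam_states = n_params * 4 * 2  # two fp32 moments per parameter
    return (weights + grads + adam_states) / 1e9

def lora_finetune_gb(n_params: float, trainable_frac: float = 0.01,
                     base_bytes_per_weight: float = 0.5) -> float:
    """4-bit frozen base weights + grads/optimizer only for LoRA params."""
    base = n_params * base_bytes_per_weight      # frozen, quantized base model
    lora = n_params * trainable_frac             # assumed number of LoRA params
    lora_train = lora * (2 + 2 + 4 * 2)          # weights + grads + Adam moments
    return (base + lora_train) / 1e9

n = 8e9  # ~8B parameters (assumed)
print(f"Full fine-tune : ~{full_finetune_gb(n):.0f} GB")  # ~96 GB
print(f"QLoRA-style    : ~{lora_finetune_gb(n):.0f} GB")  # ~5 GB
```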
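Similarly, for the 1.58-bit quantization chapter: "1.58 bits" comes from ternary weights in {-1, 0, +1}, which carry log2(3) ≈ 1.58 bits of information each. The short sketch below compares approximate weight storage at fp16, 4-bit, and ternary precision for an assumed ~8B-parameter model; the model size is an assumption for illustration only.

```python
import math

# Napkin math for sub-4-bit quantization: compare the weight footprint of an
# assumed ~8B-parameter model at different precisions.
n_params = 8e9
bits_per_weight = {
    "fp16": 16,
    "int4": 4,
    "ternary (1.58-bit)": math.log2(3),  # ~1.585 bits per ternary weight
}

for name, bits in bits_per_weight.items():
    gb = n_params * bits / 8 / 1e9  # bits -> bytes -> GB
    print(f"{name:>20}: ~{gb:.1f} GB of weights")
# fp16 ~16 GB, int4 ~4 GB, ternary ~1.6 GB
```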
Resources
Links to resources mentioned in the talk:
- Johnowhitaker.dev: Personal website for Johno Whitaker.
- FSDP+QLoRA Benchmarks: Ballpark costs for different hardware configurations.
- Napkin Math for Fine Tuning Part 1: Napkin math explanation of the fine-tuning process.
- Google Context Caching: Overview of Google's context caching on Vertex AI.
- Sakana AI Evolutionary Model Merge: Sakana AI's approach to merging models using evolutionary algorithms.
- Undermind - AI Powered Document Search: AI-powered document search tool by Undermind.
- ChatPDF - Chatbot Tuned on Research Papers: ChatPDF's chatbot designed for interacting with research papers.
- nbdev - Documentation and Source Control from Notebooks: nbdev's tools for managing documentation and source control directly from notebooks.
- Mobius Labs on 1 Bit and 1.58 Bit LLMs: Blog post by Mobius Labs discussing 1-bit and 1.58-bit quantization for LLMs.
- Mamba (State Space Model): GitHub repository for the Mamba state space model.