Napkin Math For Fine Tuning Part 2
Johno Whitaker answers follow-up questions about the first Napkin Math for Fine Tuning video and about his research.
If you enjoyed this talk, subscribe to receive updates on new educational content about LLMs.
Chapters
00:20 Introduction
Johno Whitaker introduces this talk, aiming to clarify points and answer follow-up questions from Part 1.
01:01 Saturating GPUs
Discussion of whether you should always fully saturate your GPUs: keeping them busy is clearly best when you are memory-bandwidth-bound, but the compute-bound case is more nuanced.
04:17 Cost/Complexity Trade-off
Exploring the balance between cost savings and added complexity for different GPU configurations.
09:44 Hyperparameter Tuning
Johno explains his approach to hyperparameter tuning, suggesting that default hyperparameters usually work fine unless the final few percent of improvement is crucial.
11:31 Fine-Tuning
Johno discusses the role fine-tuning plays in his R&D, describing it as an option he would explore if prompt engineering proves insufficient.
15:37 TPUs
A brief look at how TPUs or other non-GPU accelerators fit into the napkin math.
18:52 Optimizing Llama-3 with LoRA
Practical tips for reducing memory usage when fine-tuning Llama 3 with LoRA (see the memory-estimate sketch after the chapter list).
22:55 Sequence Length
Developing an intuition for the sequence length parameter and tips for optimizing it.
27:45 Quick Development Loop
Walkthrough of how to start small and build when working with LLMs.
29:35 Tools
What to look for when choosing tools to build and run models.
31:29 CPU Offloading
Discussion of offloading to the CPU when GPU VRAM runs out, why it is not common, and when it might be useful.
34:43 Learning Styles
Johno shares how he discovers new information and continues learning, emphasizing learning through application.
39:00 Hardware Lottery Theory
Discussion about how the symbiotic development of algorithms and hardware might limit future breakthroughs.
42:27 Direction of Research
Dan, Johno, and Hamel discuss different approaches to exploring new avenues of research and development, and how to keep pace with the industry.
48:10 LLM Impact on Research
Discussion of how LLM-based tools have enabled Johno’s research and possible future development of tools.
49:57 Quick Questions
Johno gives quick responses to final questions touching on diffusion models, evolutionary AI, coding styles, and QLoRA overhead.
58:16 1.58-bit Quantization
Detailed discussion of what quantization below 4 bits means, its benefits, and how it might be used (see the worked example after the chapter list).
1:02:20 Alternative Architectures
Brief dive into alternatives to transformers, such as state space models (SSMs) and recurrent models.
1:03:44 Conclusion
Johno wraps up the talk.
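To make the Llama 3 / LoRA chapter concrete, here is a minimal napkin-math sketch of the VRAM needed for full fine-tuning versus a QLoRA-style setup. The parameter count (~8B), the ~1% trainable-parameter fraction, and the per-weight byte costs are illustrative assumptions rather than figures from the talk, and activations, KV cache, and framework overhead are ignored.

```python
# Rough, back-of-the-envelope VRAM estimate for fine-tuning an ~8B-parameter
# model (e.g. Llama-3-8B) with LoRA vs. full fine-tuning. All numbers are
# illustrative assumptions in the spirit of the napkin math from Part 1.

def full_finetune_gb(n_params: float, bytes_per_weight: int = 2) -> float:
    """bf16 weights + bf16 gradients + Adam states (two fp32 moments)."""
    weights = n_params * bytes_per_weight
    grads = n_params * bytes_per_weight
    adam_states = n_params * 4 * 2  # two fp32 moments per parameter
    return (weights + grads + adam_states) / 1e9

def lora_finetune_gb(n_params: float, trainable_frac: float = 0.01,
                     base_bytes_per_weight: float = 0.5) -> float:
    """4-bit frozen base weights + grads/optimizer only for LoRA params."""
    base = n_params * base_bytes_per_weight      # frozen, quantized base model
    lora = n_params * trainable_frac             # assumed number of LoRA params
    lora_train = lora * (2 + 2 + 4 * 2)          # weights + grads + Adam moments
    return (base + lora_train) / 1e9

n = 8e9  # ~8B parameters (assumed)
print(f"Full fine-tune : ~{full_finetune_gb(n):.0f} GB")  # ~96 GB
print(f"QLoRA-style    : ~{lora_finetune_gb(n):.0f} GB")  # ~5 GB
```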
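Similarly, for the 1.58-bit quantization chapter: "1.58 bits" comes from ternary weights in {-1, 0, +1}, which carry log2(3) ≈ 1.58 bits of information each. The short sketch below compares approximate weight storage at fp16, 4-bit, and ternary precision for an assumed ~8B-parameter model; the model size is an assumption for illustration only.

```python
import math

# Napkin math for sub-4-bit quantization: compare the weight footprint of an
# assumed ~8B-parameter model at different precisions.
n_params = 8e9
bits_per_weight = {
    "fp16": 16,
    "int4": 4,
    "ternary (1.58-bit)": math.log2(3),  # ~1.585 bits per ternary weight
}

for name, bits in bits_per_weight.items():
    gb = n_params * bits / 8 / 1e9  # bits -> bytes -> GB
    print(f"{name:>20}: ~{gb:.1f} GB of weights")
# fp16 ~16 GB, int4 ~4 GB, ternary ~1.6 GB
```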
Resources
Links to resources mentioned in the talk:
- Johnowhitaker.dev: Personal website for Johno Whitaker.
- FSDP+QLoRA Benchmarks: Ballpark costs for different hardware configurations.
- Napkin Math for Fine Tuning Part 1: Napkin math explanation of the fine-tuning process.
- Google Context Caching: Overview of Google's context caching on Vertex AI.
- Sakana AI Evolutionary Model Merge: Sakana AI's approach to merging models using evolutionary algorithms.
- Undermind - AI Powered Document Search: AI-powered document search tool by Undermind.
- ChatPDF - Chatbot Tuned on Research Papers: ChatPDF's chatbot designed for interacting with research papers.
- nbdev - Documentation and Source Control from Notebooks: nbdev's tools for managing documentation and source control directly from notebooks.
- Mobius Labs on 1 Bit and 1.58 Bit LLMs: Blog post by Mobius Labs discussing 1-bit and 1.58-bit quantization for LLMs.
- Mamba (State Space Model): GitHub repository for the Mamba state space model.