Napkin Math For Fine Tuning Part 2

Published

July 29, 2024

Abstract

Johno Whitaker answers follow-up questions about the first Napkin Math for Fine Tuning video and about his research.

    Chapters

    00:20 Introduction

    Johno Whitaker introduces this talk, aiming to clarify points and answer follow-up questions from Part 1.

    01:01 Saturating GPUs

    Discussion of whether you should always saturate your GPUs completely: full saturation is the right call when you are memory-bandwidth-bound, while the compute-bound case is more nuanced (a napkin-math sketch of this check follows the chapter list).

    04:17 Cost/Complexity Trade-off

    Exploring the balance between cost savings and added complexity for different GPU configurations.

    09:44 Hyperparameter Tuning

    Johno explains his approach to hyperparameter tuning, suggesting that the default parameters usually work fine unless the final few percent of improvement is crucial.

    11:31 Fine-Tuning

    Johno discusses the role fine-tuning plays in his R&D, describing it as an alternative he explores when prompt engineering is insufficient.

    15:37 TPUs

    A brief look at how TPUs or other non-GPU accelerators fit into the napkin math.

    18:52 Optimizing Llama-3 with LoRA

    Practical tips for reducing memory usage when fine-tuning Llama 3 with LoRA (a rough memory estimate follows the chapter list).

    22:55 Sequence Length

    Developing an intuition for the sequence length parameter and tips for optimizing it.

    27:45 Quick Development Loop

    Walkthrough of how to start small and build up when working with LLMs.

    29:35 Tools

    What to look for when choosing tools to build and run models.

    31:29 CPU Offloading

    Discussion of offloading to CPU when GPU VRAM runs out, why it is not common, and when it might be useful.

    34:43 Learning Styles

    Johno shares how he discovers new information and continues learning, emphasizing learning through application.

    39:00 Hardware Lottery Theory

    Discussion about how the symbiotic development of algorithms and hardware might limit future breakthroughs.

    42:27 Direction of Research

    Dan, Johno, and Hamel discuss different approaches to exploring new avenues of research and development, and how to move with the industry.

    48:10 LLM Impact on Research

    Discussion of how LLM-based tools have enabled Johno’s research and possible future development of tools.

    49:57 Quick Questions

    Johno gives quick responses to final questions touching on diffusion models, evolutionary AI, coding styles, and QLoRA overhead.

    58:16 1.58-bit Quantization

    Detailed discussion of what quantization below 4 bits means, its benefits, and how it might be used.

    1:02:20 Alternative Architectures

    Brief dive into alternatives to transformers, such as state space models (SSMs) and recurrent models.

    1:03:44 Conclusion

    Johno wraps up the talk.
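
Two of the chapters above lend themselves to a quick worked example. First, the saturation question (01:01) comes down to comparing a GPU's compute-to-bandwidth ratio with the workload's arithmetic intensity. The sketch below is illustrative only: the A100 figures are approximate published specs, and the 2-FLOPs-per-parameter-per-token rule of thumb is an assumption for batch-size-1 generation, not a number from the talk.

```python
# Napkin-math sketch (approximate hardware figures, illustrative only):
# is batch-1 LLM text generation memory-bandwidth-bound or compute-bound on an A100?

PEAK_TFLOPS_BF16 = 312     # A100 dense bf16 peak throughput (approximate)
PEAK_BANDWIDTH_TBS = 2.0   # A100 80GB HBM bandwidth in TB/s (approximate)

# Hardware balance point: how many FLOPs the chip can do per byte it reads.
hardware_flops_per_byte = (PEAK_TFLOPS_BF16 * 1e12) / (PEAK_BANDWIDTH_TBS * 1e12)

# Rule-of-thumb workload intensity: ~2 FLOPs per parameter per generated token,
# while each bf16 parameter is 2 bytes that must be streamed from memory.
batch_size = 1
workload_flops_per_byte = 2 * batch_size / 2

print(f"GPU can do ~{hardware_flops_per_byte:.0f} FLOPs per byte read")
print(f"Batch-{batch_size} generation supplies ~{workload_flops_per_byte:.0f} FLOP per byte")
print("Far below the hardware ratio -> memory-bandwidth-bound; "
      "larger batches push the workload toward compute-bound.")
```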
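
Second, for the LoRA memory chapter (18:52), here is a minimal napkin-math estimate of the VRAM needed to fine-tune Llama 3 8B fully versus with LoRA and QLoRA. The bytes-per-parameter breakdown and the 1% trainable-parameter fraction are assumptions chosen for illustration; real numbers depend on the optimizer, LoRA rank, target modules, and activation memory.

```python
# Napkin-math sketch (illustrative assumptions, not measurements):
# rough VRAM needed to fine-tune Llama 3 8B fully vs. with LoRA / QLoRA.

N_PARAMS = 8e9   # Llama 3 8B parameter count (approximate)
GB = 1e9

def full_finetune_gb(n=N_PARAMS):
    # bf16 weights (2 B) + bf16 grads (2 B) + Adam optimizer states in fp32 (8 B)
    return n * (2 + 2 + 8) / GB

def lora_gb(n=N_PARAMS, weight_bytes=2.0, trainable_fraction=0.01):
    # Frozen base weights plus a small set of trainable adapter parameters.
    # trainable_fraction ~1% is an assumption; it depends on LoRA rank and targets.
    trainable = n * trainable_fraction
    return (n * weight_bytes + trainable * (2 + 2 + 8)) / GB

print(f"Full fine-tune    : ~{full_finetune_gb():.0f} GB before activations")
print(f"LoRA (bf16 base)  : ~{lora_gb(weight_bytes=2.0):.0f} GB")
print(f"QLoRA (4-bit base): ~{lora_gb(weight_bytes=0.5):.0f} GB")
```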

    Resources

    Links to resources mentioned in the talk: