Modal: Simple Scalable Serverless Services
Modal makes it easy to run code in the cloud. In this talk, we will deliver a guided tour of how to use it, with an emphasis on the applications that matter for LLM fine-tuners.
If you enjoyed this content, subscribe to receive updates on new educational content for LLMs.
Chapters
00:00 Introduction Charles introduces Modal at a high level, explaining how it can be used throughout the various stages of taking an LLM from development to production.
01:06 Components of Scalable Services Designing scalable services involves considering how data will be obtained, stored, and processed. Scaling often requires distributing the service across multiple machines or data centers.
04:59 How Modal Makes Scaling Easier Modal offers simple and intuitive Pythonic methods to deploy scalable services.
07:40 Modal’s Distributed Compute Charles demonstrates the Modal function’s compute dashboard, highlighting its ability to display resource and traffic utilization. Modal can also run Cron jobs.
10:37 Databases for Modal Currently, Modal does not offer its own database primitives but connects seamlessly with external databases.
13:35 Modal Storage Modal’s Volumes are distributed file systems that appear as local files to Python code. Designed with fewer writes and more reads in mind, Volumes are ideal for model weights and datasets.
18:30 Modal Interface (I/O) FastAPI is recommended for I/O due to its asynchronous concurrent nature and excellent documentation. Modal provides web endpoints that convert functions to URLs based on FastAPI and supports both ASGI and WSGI.
23:58 Mitigating DDoS Attacks While Modal does not offer native DDoS protection, putting endpoints behind authentication can help mitigate such attacks.
26:46 Mount vs Volume Mounts allow local files or directories to be accessible to applications running on a system, whereas Volumes offer persistent storage accessible across different environments.
27:42 Modal as a Serverless Service As a serverless platform, Modal is often cost-effective. Provisioning resources for peak usage ahead of time is expensive; autoscaling tools like Kubernetes adjust resources to match utilization, and serverless platforms go further by scaling to zero, which makes them more economical and easier to operate.
33:08 Why Use Third-Party Serverless Offerings Serverless platforms like Modal efficiently manage resources by aggregating multiple users’ profiles, reducing the impact of individual fluctuations, and offering economic advantages and scalability that individual users cannot achieve.
35:08 Remote Procedure Calling (RPC) Remote procedure calling (RPC) allows local code to call functions on a different machine. Modal uses gRPC for this, making remote calls feel like local Python code by running your script’s functions on their servers. This requires careful management of the global scope and imports.
37:08 Demo Charles discusses MiniModal, a demo available on GitHub, showing how code can run in multiple virtual environments.
38:46 Conclusion and Recap of Modal Charles concludes by explaining why Suno.ai uses Modal, detailing their use of distributed primitives like dictionaries and queues, functions, crons, Volumes for storage, and web endpoints, as described in a blog post about their platform’s capabilities.
Resources
Links to resources mentioned in the talk:
- Modal: a serverless platform for generative AI
- How Suno shaved 4 months off their launch timeline with Modal
- MiniModal: A toy project showcasing the basic features of Modal for Python developers
Full Transcript
[0:03] Charles Frye: Yeah, thanks for coming, everyone. I wanted to walk through a deeper dive on Modal, and to talk about sort of like the broad vision of what Modal can be used for, focusing less on like fine-tuning LLMs in particular. And the sort of vision with Modal is to allow people to deploy scalable services in a cost-efficient, serverless manner, and to do all that in a way that is simple and easy. By the way, if you want to check them out, the slides for this are available at the link in that QR code. Great.
[0:51] Charles Frye: So I’m going to break down each of the pieces of that alliterative title and what they mean to me and what I think they mean to you. So the goal is to create scalable services. And what do computing services need to do? There’s kind of like three things that you’re going to need to do with a service. First, maybe, is what’s all the way on the right here, input-output, connecting your service to the world and the world to information.
[1:22] Charles Frye: So this means that rather than having a Jupyter notebook that just runs on your machine, or a script that runs if somebody happens to, you know, be able to download the Docker container image and run it, you have something where people can connect via the internet or some other network to your service and put information into it or get information out of it. So you need that input-output. That’s maybe the most important part.
[1:55] Charles Frye: But what’s on the other side of that input-output is what happens to that information inside of the system. So one of the most important things that happens to information is just storing it, holding onto it from the place it was created and then using it somewhere else. So you need to store or preserve information using something like a database or storing files somewhere. And then finally, you want to do something usually with that information, even if it’s as simple as making the storage of that information easy, speedy, and fast.
[2:28] Charles Frye: So it’s stored on files, but it’s accessible almost as fast as if it were in memory. So you need to compute and manipulate that information. So the collection of these things together is what we use to define the services that we make available to people. So I’ve been kind of talking about services from more of a databases perspective as I’ve been describing these things, which is the way something like Amazon works. You click a button to indicate your interest in purchasing an item that’s stored somewhere.
[3:01] Charles Frye: The related items are computed and then displayed back to the user. But that service could also be LLM inference. You put in a description of the code that you want, you get out the code that you want. So it’s like a fairly generic set of problems that need to be solved. And one of the big challenges with doing something like this is doing it in a scalable manner. Is setting it up so that those connections that are coming in can come from all around the world.
[3:34] Charles Frye: Or can come from hundreds or thousands or millions of people at once. That that storage can scale up to gigabytes, petabytes, or maybe even more in size, and that that compute can scale, that you aren’t limited to just what can be done by a single machine, but what can be done by many machines. So scalability, when defining a service, is increasingly becoming table stakes. People expect to be able to get your service from, like, all around the world on day one. Or
[4:16] Charles Frye: people expect to be able to share something you’ve posted publicly on a social media network and draw 100,000 of their closest friends to your service for a day. And going down when that happens is something of a rite of passage, but it’s also a bit embarrassing. So defining this in a way that it can scale is really critical. And the way that people figured out to make these things scale is to distribute them, to spread them across many machines, possibly in many different data centers.
[4:46] Charles Frye: So this is one of the most important ways that applications and services are developed as distributed services in the Cloud. The problem with that is that it’s really, really hard. And so one of the goals of modal is to give you the ability to define these scalable services in a simple way in the sort of like comfy, like velour chair that is Python development. So, Modal has, like, kind of Pythonic versions of all of these, like, all these things that you need in order to build your scalable service.
[5:27] Charles Frye: So we have a very, like, simple and Pythonic way to define web endpoints or web servers. So you can just wrap any arbitrary web server that you want and turn it into something that can be distributed and scaled up with, you know, no YAML files, and only the configuration you need inside of your code for defining that service. For storage, I think, in the end, distributed databases, like managed database services, are kind of the way a lot of people build these things.
[6:01] Charles Frye: So we don’t yet really have a database primitive, but there’s other kinds of storage that you need. So one is, like, sort of caching, or, like, communication between workers: once you scale something out, now all of a sudden you’ve got hundreds of computers involved in delivering your service. You can’t just store stuff on a disk or store stuff in memory and share it between processes. You need a distributed version of storage. And so we have dictionaries and queues that are on our service and natively distributed.
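For concreteness, here is a rough sketch (mine, not from the talk) of what those distributed dictionaries and queues look like with the modal client; the object names are made up, and the exact constructors have shifted a bit across client versions:

```python
import modal

# Named, distributed objects; "shared-results" and "work-queue" are hypothetical names.
results = modal.Dict.from_name("shared-results", create_if_missing=True)
jobs = modal.Queue.from_name("work-queue", create_if_missing=True)

# One worker (or your laptop) can enqueue work...
jobs.put({"prompt": "write a haiku about GPUs"})

# ...and a completely different worker can pull it off and record a result,
# without the two ever sharing a disk or a process.
job = jobs.get()
results[job["prompt"]] = "generated text goes here"
```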
[6:34] Charles Frye: And then you also can’t write things to the file system at one point and expect them to be available at another one. It’s a little bit naughty to… do that when you’re building a web server anyway, but that used to kind of fly and work, but that’s increasingly untenable. And so there’s, like, Modal has volumes for, like, creating something that looks like a file system but can be accessed from many different locations, many different workers. And then lastly, and maybe most prominently, Modal has a bunch of facilities for defining compute.
[7:10] Charles Frye: So the scalable units of work or things you might want to do, so like a Python function is kind of the unit of work that we offer. And then you can say, okay, this is the Python function that I want to run when somebody hits this endpoint. Or when I’m running my ETL job, here’s a Python function that I want to run. And yeah, so let me dive in a little bit deeper on each one of these as we go.
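As a minimal sketch of that unit of work (illustrative only; the app and function names are made up, and older client versions spell modal.App as modal.Stub):

```python
import modal

app = modal.App("example-app")

@app.function(gpu="any")  # ask for a GPU-backed container for this unit of work
def embed(text: str) -> list[float]:
    # placeholder for real model inference
    return [float(len(text))]

@app.local_entrypoint()
def main():
    # Looks like a local call, but embed() runs on Modal's machines.
    print(embed.remote("hello, world"))
    # .map() fans the same function out over many inputs in parallel.
    print(list(embed.map(["a", "bb", "ccc"])))
```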
[7:42] Charles Frye: So, I’m going to have some screenshots from the Modal dashboard and kind of walk you through them. So, for example, you might have a model inference function. So this sort of encapsulates all the work that needs to be done to do inference. And then as people, like, hit your service and consume your service, that function gets called. So that’s this function call graph up here at the top of this diagram. And then as that happens, you’ll sort of scale up and down and consume different amounts of resources.
[8:13] Charles Frye: I think this is maybe a great example for sort of scaling up and down. You can see that it’s consuming different amounts of CPU resources at different times. So when there’s like lots of requests at once, the CPU works harder. The amount of memory being used increases. This one seems like it fit on a single GPU and we never needed to scale up from one GPU to more than one. But we have all those resources available for this function when it’s called and we can bring more to bear.
[8:46] Charles Frye: If a bunch of calls come in at once and the function can’t service them as quickly as they’re coming in, then another function gets scaled up. So this one, I think with a lot of model inference functions, you define them in such a way that they can take in lots of inputs, more than one input at once. So that’s why this one has many inputs coming in, but only one function available. Right. And then on the bottom here, another way that functions get used is as the sort of like really easy cron jobs.
[9:17] Charles Frye: So something that runs regularly on a period. In this case, this is for a little tiny retro display called a Tidbyt in the Modal office. It was showing a poem written by Mixtral every hour. So at 45 minutes past the hour, so 15 minutes before the hour, I would spin up Mixtral, write a poem, save it somewhere, and then display it on the display for the next hour. And both pushing the poem to the display and writing the poem were done via these cron jobs, these regularly running periodic functions.
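A hedged sketch of what a scheduled function like that looks like (the schedules and function names here are illustrative, not the actual poem code):

```python
import modal

app = modal.App("scheduled-jobs")

# Run at 45 minutes past every hour, like the poem example above.
@app.function(schedule=modal.Cron("45 * * * *"))
def write_poem():
    ...  # spin up the model, generate a poem, save it somewhere

# Or run on a fixed period, e.g. a nightly retraining or eval refresh.
@app.function(schedule=modal.Period(days=1))
def nightly_job():
    ...
```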
[9:53] Charles Frye: So those are great for pulling information from a production database into a data warehouse, or running a regular analysis on data that’s in a data warehouse, or regularly fine-tuning and retraining a model on a certain cadence as more data comes in from production, or rerunning your evals with live new user data to check for drift in production. All these things are well supported by some kind of regular execution cadence. I’m going to quickly check the Q&A section to see if there are some questions. Oh, yeah, Wade had an interesting question from a couple of minutes ago.
[10:37] Charles Frye: Is a database service coming? Would love to deploy a serverless Postgres alongside a serverless API app. So I would say that’s not something that we have planned immediately. And in particular, serverless Postgres is more challenging. There’s kind of like two types of databases that people use these days. OLTP transaction processing databases and OLAP analytics processing databases. It’s relatively straightforward, and we have some examples of how to run analytic workloads on modal. So that’s like your database is in a big collection of parquet files somewhere. You run analytics on it regularly.
[11:17] Charles Frye: Maybe you do it with one of these cron jobs to pull down these parquet files that you’re writing to S3, run some analysis on it. So there’s not a Modal primitive for that, but we do have some examples of how to use your favorites, DuckDB, or maybe we don’t have a Polars example, but how to use those. I think we have a DuckDB example pulling Parquet files or Arrow files from S3 that shows you how to do that. For transaction processing, the workloads look very different, and it’s much more challenging to scale out
[11:56] Charles Frye: transaction processing. It’s much harder to build a distributed database that operates on this row-by-row level where you’re like, join seven tables together to pick out this particular row that represents the user whose shopping cart I’m updating and then update that row. It’s doable to do distributed databases, but it’s much more challenging. And so for that… We very much lean on other services for hosting. So that would be something like Neon would be a great serverless Postgres option. There’s another one and I can’t believe I’m blanking on it. What’s the other really good serverless Postgres? Supabase.
[12:38] Charles Frye: Right. I’ve actually used Supabase more than Neon. But Supabase and Neon both have… I think Supabase has scale-to-zero semantics and pricing, which we’ll talk about in a second when we get to what serverless means. But in both cases, they work as a great external service for giving you Postgres, the most popular, long-standing open source transactional database, as something you can talk to from your serverless API app. And I’ve deployed that on Modal many times and had a great time with it.
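Two hedged sketches of the patterns described above (mine, not Modal’s official examples; the bucket, secret, and table names are all hypothetical). First, the analytic pattern, with DuckDB reading Parquet files straight out of S3:

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")  # S3 support; credentials come from the environment
con.execute("LOAD httpfs")
daily_counts = con.execute(
    "SELECT event_date, count(*) AS n "
    "FROM read_parquet('s3://my-bucket/events/*.parquet') "  # hypothetical bucket/path
    "GROUP BY event_date"
).fetchall()
```

And the transactional pattern, talking to an external serverless Postgres (Neon, Supabase, etc.) from a Modal function, with the connection string kept in a Modal Secret:

```python
import os

import modal

app = modal.App("external-db")
image = modal.Image.debian_slim().pip_install("psycopg2-binary")

@app.function(image=image, secrets=[modal.Secret.from_name("pg-credentials")])  # hypothetical secret
def count_users() -> int:
    import psycopg2

    conn = psycopg2.connect(os.environ["DATABASE_URL"])  # provided by the secret
    with conn, conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM users")  # hypothetical table
        return cur.fetchone()[0]
```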
[13:17] Charles Frye: Great, yeah, great comment from Prasad as well. Okay, some of the other questions are bigger, and I’ll answer them, like, you know, as we go through the talk, but please keep them coming and I’ll make sure that they all get answered live. Great. So, like, computing is fun and good, but computing is not super useful without storage. Like, data is what ends up defining the value of a lot of, like, applications these days. And so, Modal has lots of features for sort of, like, storing data.
[13:56] Charles Frye: So, I’m going to focus more on the long-term storage side rather than the dictionary and queue stuff. If there are questions or interest in, like, job queues and dictionaries in Modal, please post in the Q&A and I’ll review it. But the part that’s more interesting or more important for people who are doing LLM fine-tuning and ML inference and ML training workloads are these file system abstractions, Volumes, which you can use to store weights and datasets. So these are just some of the Volumes that I have stored on Modal. I have…
[14:39] Charles Frye: Those model weights are, I think, from Axolotl fine-tuning runs, but there’s some training runs that I’ve run. You can see I’ve got CIFAR-10 data. I think that’s actually CIFAR-10 data and models in a single little volume or file system. Wikipedia, like a raw Wikipedia dataset from Hugging Face Datasets as Arrow files, if I remember correctly, for that one. So, like, a reasonably sized dataset, maybe in the low gigabytes. And we have some examples to show you how to store really, really big terabyte-scale, maybe even low-petabyte-scale datasets on Modal.
[15:22] Charles Frye: Those are in our examples, recently added, actually since the course started: some examples of how to store really large datasets. The important thing about this is that it’s a form of storage that looks to your Python code like local files, but is in fact distributed, you know, like a distributed file system, like a nice, robust, cloud-native file system. So the primary, like, catch, or
[15:54] Charles Frye: the design choice that we made with the Volumes is that they’re designed for writing a very small number of times, but reading a very large number of times. Or WORM: write once, read many workloads. So that’s pretty common for datasets. You don’t repeatedly overwrite the same dataset element over and over and over again. And with model weights, like, you don’t overwrite the same model weight version over and over again. You might repeatedly write new weights, but you aren’t overwriting the same weights.
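A hedged sketch of that Volume workflow (illustrative; the volume name and paths are made up, and the explicit commit/reload calls reflect how recent client versions propagate writes between workers):

```python
import modal

app = modal.App("weights-demo")
weights = modal.Volume.from_name("model-weights", create_if_missing=True)

@app.function(volumes={"/weights": weights})
def save_checkpoint(blob: bytes):
    # Looks like an ordinary local file write...
    with open("/weights/checkpoint.bin", "wb") as f:
        f.write(blob)
    weights.commit()  # ...but the write is committed so other workers can see it

@app.function(volumes={"/weights": weights})
def load_checkpoint() -> bytes:
    weights.reload()  # pick up the latest committed state
    with open("/weights/checkpoint.bin", "rb") as f:
        return f.read()
```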
[16:26] Charles Frye: And the nice thing about that is it’s much easier to scale out many readers than it is to scale many writers. And so these Volumes work really well for those kinds of workloads. That’s one reason why it can be easier to run these, like, analytic databases with Modal as the, like, backing
[16:45] Charles Frye: cloud platform for it, because they frequently have write-once, read-many workloads, where you dump, like, a big file and then you read it from, like, many workers to run in parallel. As opposed to, like, writing from many different places: if you’re running, like, distributed Redis or distributed Postgres, you’ve got, like, writes coming in from multiple write replicas. Right. Okay. Yeah. So let me check Q&A, see if anybody was, if people were thirsting for discussion of dictionaries and queues.
[17:21] Charles Frye: They’re not, but I do see a good relevant question from Philip Thomas. Will you guys charge for storage? I can’t see pricing on it now. We will eventually have to charge for storage, I think. And the goal is to charge for storage at roughly the price of S3, which is the price that we’re basically getting it at. So we do things like look around to find storage primitives that are available that are S3 compatible that allow us to run this service more cheaply. But there is a limit and storage is not free. So I think…
[18:03] Charles Frye: Yeah, but for now, we aren’t charging for it, and you can store lots of stuff there. I think if you were to start sending us petabytes of data every day, we might have to have a conversation about what your plans are with that data. But yeah, great question. And yeah, so other good questions, but not anything that I see on storage. So I’ll keep going forward. And then, so last but certainly not least, there’s input and output to your computing storage, the ability to interface with this stuff.
[18:40] Charles Frye: So I think kind of the most blessed way to use that in Modal is with FastAPI apps. So I have the Swagger docs for a little FastAPI app that’s kind of like a fake version of Twitter. They’re on the right-hand side. So, like, FastAPI is really nice. You get asynchronous concurrent Python without as much of the agonizing pain of writing async await. And it’s got great documentation.
[19:10] Charles Frye: And it’s really great to do that with Modal in particular, because async, like, sort of plays more nicely often with kind of scaling out and running lots of different workers, and gets you more juice out of any individual worker. And… anything else with async that’s important on our platform? Oh, yeah, and it’s also like Modal’s designed to be able to run sort of, like, weird compositions of synchronous and asynchronous functions without throwing some of the weird error messages you might be used to seeing if you run async.
[19:46] Charles Frye: So, like, you just write FastAPI code, which is, like, very Pythonic and clean. And then you don’t have to worry about running your own event loop. We run the event loop for you, and we keep any of our code sort of, like, out of it. So you get good performance with it. But if you don’t care about event loops or you’re not thinking about it, you just run synchronous functions and nothing goes poorly. And you can pretty much just start slapping asyncs around in your Modal code.
[20:19] Charles Frye: At least your, like, Modal-decorated code, without needing to, like, refactor your app. Or at least, you might need to refactor it to get performance, but that’s kind of the way async works. Yeah, we have some nice async features for avoiding, you know, coroutines not being awaited. If you’re a concurrent Python person, that sounds exciting to you. So, yeah, so we have web endpoints. FastAPI is an example of an asynchronous server gateway interface, ASGI, web framework. So we actually have some, like, nice features that are, like, FastAPI-centric, like…
[21:05] Charles Frye: FastAPI is a dependency of Modal, so it comes along for the ride, and when you install Modal, you don’t need to think about managing another dependency. And then we have web endpoints, which just take a Python function and turn it into a URL you can hit. And that’s, for now, based off of FastAPI. But really, what makes FastAPI tick underneath is this asynchronous server gateway interface protocol for defining a web service as something that interacts with other things.
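A hedged sketch of both levels (my illustration; the routes and names are made up, and the decorator names vary slightly between client versions):

```python
import modal

app = modal.App("web-demo")
image = modal.Image.debian_slim().pip_install("fastapi")

# Simplest case: one Python function becomes one URL.
@app.function(image=image)
@modal.web_endpoint(method="GET")
def hello(name: str = "world"):
    return {"greeting": f"hello, {name}"}

# More flexible: serve an entire FastAPI (ASGI) app behind one Modal function.
@app.function(image=image)
@modal.asgi_app()
def fake_twitter():
    from fastapi import FastAPI

    web = FastAPI()

    @web.get("/tweets/{tweet_id}")
    def read_tweet(tweet_id: int):
        return {"id": tweet_id, "text": "placeholder"}

    return web
```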
[21:39] Charles Frye: And so any ASGI app can be run on Modal using, like, the ASGI app decorator. And so that’s, like, one level up of additional flexibility, but with some complexity. Parallel to that, the other framework for serving web Python is WSGI. The most famous or popular WSGI Python framework is Flask. If you’ve ever made a Flask app, you’ve made a WSGI app. A WSGI Flask, a whiskey flask. It’s a joke that almost nobody gets, I think. So WSGI is based off of no asynchrony, no concurrent Python. And the tooling there is older and more mature.
[22:34] Charles Frye: And so I think Django is all sync and has a WSGI component. I haven’t looked at that part of Django in a minute. But so, like, you might find more batteries-included projects, and, like, you have this thick stack that covers everything with WSGI, and you don’t have to worry about all the stuff I was just talking about with async and await and coroutines. But you might leave some performance on the table.
[23:05] Charles Frye: And some of the hottest, newest libraries with cool new features are sort of more focused on the ASGI stuff. So ASGI, WSGI, these are, like, server gateway interfaces. It’s, like, a way to define a web service generically. We also just let you run any web server you want. And it doesn’t even have to be Python. There are people running, like, Go servers and stuff. And that is our, like, final sort of exit point of, like, oh, I just want to run any old web server.
[23:36] Charles Frye: I’m going to call it with subprocess, so it’s as though I’m, like, calling it from the command line with bash. And I just want Modal to handle, like, scaling this thing up and down and, like, keeping it alive and, like, HTTPS and all that kind of stuff. Yeah, some comments on this one. So let me go through these. Yeah: for web endpoints, is there any mechanism to prevent something like a DDoS attack from modal.com by default? Can we expect something like that in the future?
[24:10] Charles Frye: Yeah, exposing web endpoints on a public website and giving you H100 GPUs, yeah, you can get people coming in and trying to use them without bringing you any benefit. So we don’t have any protection like that built in. I think that is a good idea and something we could offer in the future. So I’ll definitely bring that up as a potential product feature. I think right now, you can wrap authentication if you’re using FastAPI or Flask; there’s middleware for authentication. Really, in the end, preventing DDoS attacks…
[24:47] Charles Frye: There are ways to prevent suspicious traffic patterns that can help with this. But in the end, it ends up meaning putting things behind a login. YouTube and Twitter are contemplating… Twitter’s already made it so that you have to log in to read stuff. YouTube’s contemplating it. And that’s the only way to prevent people from accessing your API without you thinking about it if it’s out there. So yeah, but something like Cloudflare DDoS protection is a cool idea and definitely worth exploring. All right, there’s a question about WebSocket interactions with the max execution time.
[25:27] Charles Frye: I haven’t used the WebSocket stuff very much, so I’d recommend bringing that up in the Modal Slack. You can probably get some good answers on that. And let’s see. A point of clarification from Hugh Brown: Django has support for writing async views, along with an entirely async-enabled request stack, if you’re running under ASGI. Okay, interesting. All right, yeah, yeah. Thanks for the call-out, Hugh, on that. I haven’t tried async Django. And then there was a point of clarification on storage stuff. Some other questions came in. Petabyte-sized datasets living within your storage?
[26:11] Charles Frye: I actually, I think that one might be backed by S3, like raw S3 rather than modal’s like volumes that are on top of Cloud Storage. And yeah, another question about petabytes, data sets, does current pricing or lack of it also apply to data transport costs? Yeah, we currently aren’t charging on ingress egress. If that starts to be a large cost for us, because people are doing big, like, you know, doing huge jobs for it, we’ll have to implement pricing.
[26:40] Charles Frye: But again, the goal is to stay, you know, just above the raw storage that we’re being charged for. How does a mount differ from a volume? Is mount meant for local dirs? Yeah, I kind of skimmed, just breezed right past this one. A Mount takes something that you have on your local computer and makes it available to the stuff running on Modal. So that’s, like, we mount your Python file in order to be able to run it, for example, every time. But you can also mount your own stuff.
[27:15] Charles Frye: Like, oh, I need this, like, file for my static site. I need to mount my assets. That’s what mounts are for. Great. Okay. So there’s some cool stuff that I want to make sure to get to. And yeah, but great questions. And if I don’t answer your question here, please do like post it in the discord or in the modal Slack and I’ll make sure it gets answered. Okay, and the last piece about these scalable services that we’ve kind of been dancing around a little bit is that they’re serverless, simple, scalable serverless services.
[27:49] Charles Frye: I can’t say it five times fast. So serverless, like, this is kind of an important part of what makes Modal, even though it has, like, a higher sticker price, often an economical choice, and not just, like, a developer experience and ergonomics-based choice for teams. So the key thing here is that if you run any kind of service, you’ll find that your resource utilization is variable over time. So you might have slow traffic patterns of, like, oh, people in particular time zones use my service in particular ways and cause resource utilization to increase.
[28:24] Charles Frye: You might have big spikes, like when things end up on the front page of Hacker News or whatever. And the sort of classic way to handle this is to provision ahead of time for whatever you think your peak usage is going to need. So the pink bar there is how many resources you provision. If you’re not doing cloud development and you’re doing on-prem or co-located or running your own data centers, you have no choice but to provision for peaks ahead of time.
[28:57] Charles Frye: And the bad news is that resources generally cost money as a function of time. And so if you’ve provisioned resources up for the peak and then you aren’t using it at peak the whole time, you’re going to be paying for it. So you have to get really clever with resource utilization like Amazon did. And we’re like, wow, we need all these computers for Christmas. We don’t need them the rest of the time. Might as well invent cloud computing to make use of that.
[29:21] Charles Frye: But not everybody has that ability to, or has something else they can do with their resources off-peak. Hamel, I saw you come off camera. You got a hot take to drop? No, I was just getting ready for questions. So I thought this was a good time to… Don’t get distracted by me. Sorry. Great. You never know when Hamel’s about to drop a truth banger on you. So, yeah. So, yeah. So then the other thing people do, if you are not, like, buying resources, you can manually provision things on the cloud.
[30:00] Charles Frye: So when you start to see your, like, resource utilization go up, you say, oh, well, let’s spin up some more stuff. Then when you see it go down, you spin things down. And then you have an oh-shit moment when you hit the front page of Hacker News and you suddenly need to spin up like 100 copies of your service. And then you wait for a while to see whether the traffic spike is really gone before spinning back down. So this works well to reduce your excess costs.
[30:29] Charles Frye: As you can see, the area under the curve or area between the two curves has gone way down when we switch over to manual provisioning. But we’re also starting to get a lot more area sort of like above the curve here where like, oh, we’re missing some stuff. Like we didn’t have enough resources. And that usually leads to like poor user experience. And this is also very stressful on teams. So you can do automatic provisioning and auto scaling.
[30:56] Charles Frye: And like one of the primary tools for doing this is like Kubernetes has auto scaling in it and will auto scale up containerized services on clouds. And so that can get your resource utilization to kind of like match. Often slightly lagging, just like with the manual provision, there’s a lag. You can see this kind of time lag of the pink curve relative to the green one. but computers are generally faster than people and can be programmed to be nearly as good as them at resource utilization. And so you tend to track it more closely.
[31:33] Charles Frye: But with auto-scaling, it will be easier for you to match this curve, the sort of smaller your units of auto-scaling are. And so when you run the… Yeah, what’s achieved by this is that your costs match your actual resource utilization. And if your resource utilization that you are… What’s the way to put this? If your resources scale to match your needs, and they can also scale all the way down to zero, that combination is what people call serverless.
[32:11] Charles Frye: The usual way that that’s achieved is by packaging things up as something much smaller than a whole virtual machine, but just as a function. So it’s also called functions as a service, though that term has fallen out of favor in favor of serverless. And so the nice thing about this is you are like…
[32:34] Charles Frye: Generally, like, you’re paying a lot less for the resources that you need and delivering a superior user experience. So what makes this possible? Like, why? Like, you can do this yourself, but there’s a lot of engineering effort required. So that’s one reason: it’s like, does it make sense to expend this engineering effort? But infrastructure is important, and it’s like, why would I go to somebody like Modal to do this autoscaling for me and run these resources for me? Like, infrastructure is an important part of my business.
[33:06] Charles Frye: Why would I let them do it? The key idea is that… Oops, this thing should not be there. Like, if you look at the resource utilization of a serverless platform like Modal, we have many people’s, like, resource utilization profiles superimposed on top of each other. So our baseline is way higher. We don’t have to worry about going all the way down to zero, which can be especially painful. And those fluctuations that are big for any individual, like, consumer of the platform are small relative to the entire platform.
[33:39] Charles Frye: And so those fluctuations that were really challenging and speedy for, like, an underlying user become much less so; the engineering complexity is much less and is amortized over a much larger workload. And so there are, like, things that can be engineered at that scale and with those timescales that can’t be done at the lower level. And so that’s the sort of argument put forward about the economics of serverless computing in this Berkeley View paper from Eric Jonas and others.
[34:15] Charles Frye: It’s a great paper; it kind of changed a lot of the way that I think about cloud computing and computing in general. All these Berkeley View papers that have the little Campanile on the front are bangers. Ion Stoica and David Patterson are on most of them, and they’re all really incredible. Great. Okay, so this is set to end at 10:15, right, Hamel? It’s ending at 10:15, so we have about 7 minutes or so. Got it. Okay, I do want to show one thing. I will stick around for questions.
[34:57] Charles Frye: And for another, like, 15 minutes at least after the set time, but there’s one more point I wanted to get to, which is, like, the core idea with what’s going on with functions as a service. Like, I think in general, but certainly on Modal, it’s, like, an old idea being revisited: remote procedure calling. The general idea with remote procedure calling is that your local code calls a function.
[35:21] Charles Frye: And instead of that function being like code, like binary code that lives somewhere else on your machine or like some other process or something, it’s like it lives on a totally different machine.
[35:31] Charles Frye: and the goal of remote procedure calling is trying to make that as transparent as possible, but no more transparent. And that’s the idea with Modal, and that’s basically, like, we use gRPC as our, like, framework for doing that. And it’s, like, it can feel very confusing and weird when you run into this piece of the system, because Modal otherwise makes it feel so much like local Python coding. So it’s worth kind of walking through in this deeper dive. The idea is your code is on your machine. That’s the script you’re writing.
[36:06] Charles Frye: And then you ask to run a Modal function. If that Modal function is defined inside your script, when you do modal run, we’re grabbing that function, pushing it up to our machine so that it’s ready to be run from your local code dynamically when it’s needed. So that’s the source of kind of like, why does Modal have all this stuff? Why do I need to be careful about the global scope? Why do I need to put imports inside functions?
[36:32] Charles Frye: It’s so that we can do this magical sort of, like, remote procedure calling of a function that’s defined inside the same file. But the core idea is just to get that code onto our machines, where we’re running the Python environment that you asked us to run, and have that code available so we can run it on a GPU, even though your local machine doesn’t have a GPU. And so we run your code on our machines for that.
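A hedged sketch of why the imports live inside the function (illustrative; torch and the sizes are arbitrary. The point is that the remote environment, not your laptop, needs the dependency and the GPU):

```python
import modal

app = modal.App("rpc-style")
image = modal.Image.debian_slim().pip_install("torch")  # defines the remote environment

@app.function(image=image, gpu="any")
def remote_matmul(n: int) -> float:
    # Imported inside the function: this only ever runs on Modal's machines,
    # so it doesn't need to be importable in your local global scope.
    import torch

    x = torch.randn(n, n, device="cuda")
    return float((x @ x).sum())

@app.local_entrypoint()
def main():
    # Feels like a local Python call, but it's a remote procedure call underneath.
    print(remote_matmul.remote(1024))
```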
[37:01] Charles Frye: So I wanted to show this quick demo I built of this in only a couple hundred lines of code. But I don’t think we’ll have time to do that. So this MiniModal shows that core piece of it, as 200 lines of code that runs just in a separate Python process locally, but shows the basic structure of having an app and using decorators to send code somewhere else to run. So if you’re interested in it, some people have asked some questions about the guts, how does this work? That’s all at MiniModal on my GitHub.
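This isn’t the actual MiniModal code, but here’s a local-only sketch of the same basic structure: a decorator that ships a function to a different Python interpreter (for example, another virtualenv’s) via a subprocess. It assumes cloudpickle is installed in both environments.

```python
import pickle
import subprocess
import sys

import cloudpickle  # serializes the function by value, so the child doesn't need to import your script

# Child process: read (function, args) from stdin, run it, write the result to stdout.
RUNNER = (
    "import pickle, sys; "
    "fn, args = pickle.load(sys.stdin.buffer); "
    "pickle.dump(fn(*args), sys.stdout.buffer)"
)

def remote(python=sys.executable):
    """Run the decorated function in a separate Python process (e.g. another venv's interpreter)."""
    def decorator(fn):
        def wrapper(*args):
            proc = subprocess.run(
                [python, "-c", RUNNER],
                input=cloudpickle.dumps((fn, args)),
                capture_output=True,
                check=True,
            )
            return pickle.loads(proc.stdout)
        return wrapper
    return decorator

@remote()  # could point at ".venv-gpu/bin/python" or any other interpreter
def add(a, b):
    return a + b

print(add(2, 3))  # computed in the child process, printed locally
```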
[37:41] Charles Frye: If you’re interested in that and want to dive deeper, please hit me up. Okay. So you create your own Modal? Like, kind of to understand how it works, in a way? Yeah, yeah. Local-only, local-only, very dumb version of Modal. And, like, the only thing it does is sort of, like, separate out virtual environments. So it’s, like, almost useful, to, like, be able to call code from multiple Python virtual environments. Yeah. What is this symbol here? Oh, that’s my dumb version of the Modal logo.
[38:15] Charles Frye: I asked GPT to make an SVG of two cubes, and I knew it would get it wrong, and it made this delightful, bad modal logo. Yeah. All right, so I want to make sure to answer some of the questions that came in, because there were a lot of really good ones. Yeah, okay, great question from Korean at a high level. Yeah, let me… Get to the end there. All right. So how is modal hosting Suno.ai? And it’s like, hooray, it says my favorite Gen.ai application for the last three to four months.
[38:56] Charles Frye: Yeah, so there’s a blog post where Suno talks a little bit about why they chose to use Modal. The details are kind of interesting. Like, they’re making heavy use of Modal’s platform. Like, there’s things like, we have all these remote storage things like dictionaries and queues, so that’s used to manage work. And yeah, but fundamentally, they’re using functions, crons, Volumes for storage, setting things up as web endpoints, using all of the stuff that we talked about. We need to leave at least a minute or so for the transition between. We’ve learned.
[39:44] Charles Frye: So, in between different sessions, because, like, when it’s back to back. Oh, right. Yeah. Um, there’s another session immediately after this one. Yeah, there’s another session from your favorite company immediately after. Oh, yeah. Replicate has their office hours. Okay, so I can’t stick around here in the Zoom. Let me jump into the Discord voice chat. Yeah, totally. We’ll probably do that. You’re an admin, so you can create one. Yeah. Cool. All right. I’m going to quickly screenshot everybody’s questions lo-fi style and drop into the voice chat. Great. Thanks, everyone.
[40:32] Charles Frye: Thanks for great questions. Hope to be able to answer them. If I didn’t answer it and you can’t make it to the voice chat, please post it in the channel on Modal. I wasn’t able to watch the Discord, so I’ll also answer any questions that came in live there. Hope you enjoy the platform, and you know where to find me on Discord and on Twitter, charles_irl. Look forward to seeing what you build. These are $1,000 credits. Yeah, use your $1,000.
[40:56] Charles Frye: The other $500, if you didn’t get it last week, will be coming out today, midnight UTC, más o menos. So make sure to use Modal by then. All right. Take care, y’all. All right. Thanks.