Speed matters when loading large language models, especially for the Llama 2 series. Loading a model into GPU memory today can take up to 10 minutes and involves several steps: getting a node from the cluster, pulling down the Docker image, setting up the environment, fetching the weights from S3, decoding the model, and transferring it to GPU memory. Because these steps run sequentially and disk I/O is the bottleneck, the overall process is slow. To address this, techniques such as parallel downloading with multiple threads, streaming data directly into GPU memory, and optimizing CPU memory usage are applied. The Anyscale Model Loader combines these ideas, using concurrent multi-threaded downloads and caching data on disk for later reuse, removing network bandwidth as the bottleneck and achieving a speedup of more than 20x. A sketch of the core technique follows below.
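To make the parallel-download and direct-to-GPU ideas concrete, here is a minimal sketch, not Anyscale's actual implementation: it assumes boto3 for ranged S3 GETs and PyTorch for a pinned-memory staging buffer and the device copy. The bucket name, object key, thread count, and chunk size are all hypothetical placeholders.

```python
# Sketch: multi-threaded ranged S3 download into one preallocated
# pinned-memory buffer, followed by a single bulk copy to GPU memory.
from concurrent.futures import ThreadPoolExecutor

import boto3
import torch

BUCKET = "my-model-bucket"       # hypothetical
KEY = "llama-2-70b/shard-0.bin"  # hypothetical
NUM_THREADS = 16
CHUNK = 64 * 1024 * 1024         # 64 MiB per ranged GET

s3 = boto3.client("s3")  # botocore clients are thread-safe for calls
size = s3.head_object(Bucket=BUCKET, Key=KEY)["ContentLength"]

# Pinned (page-locked) CPU buffer: skips the extra pageable-memory copy
# and lets the host-to-device transfer run asynchronously.
buf = torch.empty(size, dtype=torch.uint8, pin_memory=True)

def fetch(offset: int) -> None:
    """Download one byte range and write it into its slot in the buffer."""
    end = min(offset + CHUNK, size) - 1
    body = s3.get_object(
        Bucket=BUCKET, Key=KEY, Range=f"bytes={offset}-{end}"
    )["Body"].read()
    buf[offset : offset + len(body)] = torch.frombuffer(
        bytearray(body), dtype=torch.uint8
    )

# Ranged GETs run concurrently, so aggregate throughput is limited by
# total network bandwidth rather than a single connection or the disk.
with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:
    list(pool.map(fetch, range(0, size, CHUNK)))  # list() surfaces errors

# One bulk transfer from pinned CPU memory into GPU memory.
gpu_bytes = buf.to("cuda", non_blocking=True)
```

Staging chunks directly into a shared preallocated buffer avoids ever touching disk on the first load, which is the key design choice: the download and the eventual GPU copy stream through memory instead of serializing behind disk I/O.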