Company
Date Published
Author
Vihan Lakshman, Pratik Pranav, Siddharth Jain, Tharun Medini
Word count
1643
Language
English
Hacker News points
78

Summary

ThirdAI Corp, a startup, has developed BOLT, a deep learning framework that trains large models efficiently on standard CPU hardware by treating sparsity as a first-class design principle. The company used Ray to distribute training of its models, achieving near-linear scaling on terabyte-scale datasets and billion-parameter models. By using Ray's distributed data-parallel engine, ThirdAI was able to quickly build an industry-grade solution with fault tolerance, multiple modes of communication, and seamless scalability. Because no specialized hardware is required, this approach reduces cost and energy consumption, which helps democratize deep learning in a sustainable way. ThirdAI has also simplified its developer experience by moving from Ray Core to Ray Trainer, which provides a streamlined training pipeline, enhanced fault tolerance, and refined automatic scaling. Experimental results on various benchmarks show that BOLT is competitively efficient on CPUs.