Company
Date Published
Author
Stephen Balaban
Word count
431
Language
English
Hacker News points
None

Summary

Lambda Echelon is a turn-key GPU cluster designed for AI and deep learning tasks, offering a scalable solution from single rack to data center scale. It features high-speed communication fabric, leveraging InfiniBand HDR 200 Gb/s or 100 Gb/s Ethernet, and supports a wide range of compute nodes, including Lambda's Hyperplane and Blade servers. The cluster offers multiple networks for compute, storage, management, and out-of-band management, with options for proprietary and open-source storage solutions. With seasoned HPC experts providing phone support directly from AI infrastructure engineers, Echelon aims to accelerate training, hyperparameter search, and inference tasks, as demonstrated by the ability to train BERT on Wikipedia in minutes instead of days.