How LinkedIn Scales Its Analytical Data Platform
LinkedIn operates one of the largest analytics platforms in the world, with approximately 20,000 Hadoop nodes storing one exabyte of data. The company's largest data cluster comprises 10,000 nodes storing 500 PB of data including one billion objects (directories, files and blocks), all using the Hadoop Distributed File System (HDFS). LinkedIn has developed various tools to optimize its storage and compute performance, such as Azkaban for workflow job scheduling, Robin for load balancing, and DynoYARN for performance forecasting. The company is currently in the process of migrating its on-premises Hadoop clusters to the Azure public cloud owned by parent Microsoft.
Company
Acceldata
Date published
March 31, 2022
Author(s)
Acceldata Product Team
Word count
2308
Hacker News points
None found.
Language
English