/plushcap/analysis/acceldata/data-engineering-best-practices-linkedin

How LinkedIn Scales Its Analytical Data Platform

What's this blog post about?

LinkedIn operates one of the largest analytics platforms in the world, with approximately 20,000 Hadoop nodes storing one exabyte of data. The company's largest data cluster comprises 10,000 nodes storing 500 PB of data including one billion objects (directories, files and blocks), all using the Hadoop Distributed File System (HDFS). LinkedIn has developed various tools to optimize its storage and compute performance, such as Azkaban for workflow job scheduling, Robin for load balancing, and DynoYARN for performance forecasting. The company is currently in the process of migrating its on-premises Hadoop clusters to the Azure public cloud owned by parent Microsoft.

Company
Acceldata

Date published
March 31, 2022

Author(s)
Acceldata Product Team

Word count
2308

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.