
How (And Why) To Move From Spark on YARN to Kubernetes

What's this blog post about?

Apache Spark is a popular open source distributed computing framework that lets data engineers process large amounts of data across multiple machines. It is widely used for batch processing and is well suited to machine learning and AI workloads. Traditionally, companies have used Hadoop YARN, which runs on the Java Virtual Machine (JVM), to manage their Spark clusters. With the rise of Kubernetes and cloud-native computing, however, many organizations are moving their Spark clusters from YARN to Kubernetes. Kubernetes offers potential benefits such as scalability, open source flexibility, and compatibility with various infrastructure types. For Spark specifically, the transition can provide better dependency management, better resource management, and access to a rich ecosystem of integrations. Key steps in the migration include determining the complexity of jobs, evaluating data connectivity needs, analyzing compute and storage latency, and auditing monitoring and security policies. For data engineers, the payoff is simpler dependency and resource management, value-added integrations, and opportunities for cost savings.
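To make the YARN-to-Kubernetes switch concrete, here is a minimal sketch of how the same job submission might differ between the two cluster managers. It is not taken from the original post. The helper `build_spark_submit`, the API server address, the image name, the namespace, and the file paths are all hypothetical placeholders. The `spark-submit` options and `spark.kubernetes.*` properties shown are standard Spark settings, but the values a real cluster needs will differ.

```python
from typing import Dict, List, Optional


def build_spark_submit(
    master: str, app: str, conf: Optional[Dict[str, str]] = None
) -> List[str]:
    """Assemble a spark-submit argument list (hypothetical helper, for illustration)."""
    args = ["spark-submit", "--master", master, "--deploy-mode", "cluster"]
    # Emit --conf pairs in sorted order so the resulting command is deterministic.
    for key, value in sorted((conf or {}).items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# On YARN, the master is simply "yarn"; the cluster location comes from the
# Hadoop configuration available on the submitting machine.
yarn_cmd = build_spark_submit(
    "yarn",
    "hdfs:///jobs/etl.py",  # placeholder application path
    {"spark.executor.instances": "4"},
)

# On Kubernetes, the master points at the API server, and the job's
# dependencies travel inside a container image instead of living on the nodes.
k8s_cmd = build_spark_submit(
    "k8s://https://k8s.example.com:6443",  # placeholder API server address
    "local:///opt/spark/app/etl.py",  # path inside the (assumed) image
    {
        "spark.executor.instances": "4",
        "spark.kubernetes.container.image": "example.com/spark-etl:1.0",  # placeholder image
        "spark.kubernetes.namespace": "data-eng",  # placeholder namespace
    },
)

print(" ".join(yarn_cmd))
print(" ".join(k8s_cmd))
```

The container image setting is where the dependency management benefit shows up: each job can ship its own libraries in its image, so it does not depend on what is installed on shared cluster nodes.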

Company
Acceldata

Date published
Nov. 4, 2021

Author(s)
Rohit Choudhary

Word count
1078

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.