Getting started with HDFS on Kubernetes
This post walks through running the Hadoop Distributed File System (HDFS) on Kubernetes. It begins with the basic architecture of HDFS and then covers how to map that architecture onto Kubernetes. The author highlights challenges encountered while deploying HDFS on Kubernetes, such as pods going down and coming back up with different IP addresses, and proposes solutions: wrapping the namenode in a Service resource so it keeps a stable address, and using StatefulSets to give datanodes stable identities. They also demonstrate how to run fully distributed HDFS on a single node using Kubernetes PersistentVolume (PV) resources. The post concludes by noting that a follow-up blog post will show how to deploy Apache Spark on Kubernetes to process data stored in the new k8s HDFS.
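The two fixes described above can be sketched as Kubernetes manifests. This is a minimal illustrative fragment, not taken from the original post: all names, ports, and the container image are assumptions, and a real deployment would also need a namenode StatefulSet or Deployment and HDFS configuration.

```yaml
# Illustrative sketch only; names, ports, and image are placeholders.
# A headless Service gives the namenode a stable DNS name, so datanodes
# can find it even after its pod restarts with a new IP.
apiVersion: v1
kind: Service
metadata:
  name: hdfs-namenode
spec:
  clusterIP: None          # headless: DNS resolves straight to the pod
  selector:
    app: hdfs-namenode
  ports:
    - port: 8020           # default HDFS RPC port
---
# A StatefulSet gives each datanode a stable identity
# (hdfs-datanode-0, hdfs-datanode-1, ...) and a PersistentVolumeClaim
# that survives pod rescheduling.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: hdfs-datanode
spec:
  serviceName: hdfs-datanode
  replicas: 3
  selector:
    matchLabels:
      app: hdfs-datanode
  template:
    metadata:
      labels:
        app: hdfs-datanode
    spec:
      containers:
        - name: datanode
          image: example/hadoop-hdfs:latest   # placeholder image
          volumeMounts:
            - name: data
              mountPath: /hadoop/dfs/data     # datanode block storage
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

For the single-node setup mentioned in the summary, the `volumeClaimTemplates` above would bind to locally provisioned PersistentVolumes instead of a cloud storage class.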
Company
Hasura
Date published
Feb. 13, 2018
Author(s)
Tirumarai Selvan
Word count
1041
Hacker News points
None found.
Language
English