Getting started with HDFS on Kubernetes
This post walks through running the Hadoop Distributed File System (HDFS) on Kubernetes. It begins with the basic architecture of HDFS and then covers how to map that architecture onto Kubernetes. The author highlights challenges encountered while deploying HDFS on Kubernetes, such as pods going down and coming back up with different IP addresses, and proposes solutions: wrapping the namenode in a Service resource so it keeps a stable address, and using StatefulSets to give datanodes stable identities. They also demonstrate how to run fully distributed HDFS on a single node using Kubernetes PersistentVolume (PV) resources. The post concludes by noting that a follow-up blog post will show how to deploy Apache Spark on Kubernetes to process data stored in the new k8s HDFS.
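The two fixes described above can be sketched as Kubernetes manifests. This is a minimal illustrative fragment, not taken from the original post: all names, ports, and the container image are assumptions, and a real deployment would also need a namenode StatefulSet or Deployment and HDFS configuration.

```yaml
# Illustrative sketch only; names, ports, and image are placeholders.
# A headless Service gives the namenode a stable DNS name, so datanodes
# can find it even after its pod restarts with a new IP.
apiVersion: v1
kind: Service
metadata:
  name: hdfs-namenode
spec:
  clusterIP: None          # headless: DNS resolves straight to the pod
  selector:
    app: hdfs-namenode
  ports:
    - port: 8020           # default HDFS RPC port
---
# A StatefulSet gives each datanode a stable identity
# (hdfs-datanode-0, hdfs-datanode-1, ...) and a PersistentVolumeClaim
# that survives pod rescheduling.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: hdfs-datanode
spec:
  serviceName: hdfs-datanode
  replicas: 3
  selector:
    matchLabels:
      app: hdfs-datanode
  template:
    metadata:
      labels:
        app: hdfs-datanode
    spec:
      containers:
        - name: datanode
          image: example/hadoop-hdfs:latest   # placeholder image
          volumeMounts:
            - name: data
              mountPath: /hadoop/dfs/data     # datanode block storage
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

For the single-node setup mentioned in the summary, the `volumeClaimTemplates` above would bind to locally provisioned PersistentVolumes instead of a cloud storage class.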
Company
Hasura
Date published
Feb. 13, 2018
Author(s)
Tirumarai Selvan
Word count
1041
Hacker News points
None found.
Language
English