Cassandra File System Design
The Cassandra File System (CFS) is an HDFS-compatible filesystem designed to replace the traditional Hadoop NameNode, Secondary NameNode, and DataNode daemons. It reduces operational overhead by removing the NameNode as a single point of failure and gives Cassandra users straightforward Hadoop integration. CFS is modeled as a keyspace with two column families in Cassandra: "inode" and "sblocks". The "inode" column family holds a file's metadata, while the "sblocks" column family stores the file's actual contents. The metadata includes the filename, parent path, user, group, permissions, file type, and the list of block IDs that make up the file. Because CFS relies on Thrift, which does not support streaming, it splits each block into sub-blocks so that a node is never asked to handle a large amount of data at once. When a read arrives for a file or part of a file, CFS executes a custom Thrift call that returns either the requested sub-block data or, if the call was made on a node that holds the data locally, the file and offset information of the Cassandra SSTable file containing the sub-block. Sub-blocks are compressed and decompressed on the client side, which cuts down network traffic between nodes.
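The two-table layout and the block/sub-block split described above can be illustrated with a small in-memory sketch. This is not the CFS implementation or its Thrift API: the names `ToyCFS`, `Inode`, and `get_subblock`, the default user and group, and the tiny block sizes are all assumptions made for illustration, and plain Python dictionaries stand in for the "inode" and "sblocks" column families.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
import posixpath
import uuid


@dataclass
class Inode:
    """Row in the toy "inode" table: file metadata plus ordered block IDs."""
    filename: str
    parent_path: str
    user: str
    group: str
    permissions: int
    filetype: str
    block_ids: List[str] = field(default_factory=list)


class ToyCFS:
    """Hypothetical in-memory model of the inode/sblocks layout."""

    def __init__(self, block_size: int, subblock_size: int) -> None:
        self.block_size = block_size
        self.subblock_size = subblock_size
        self.inode: Dict[str, Inode] = {}          # path -> metadata
        self.sblocks: Dict[str, List[bytes]] = {}  # block ID -> sub-blocks

    def write(self, path: str, data: bytes,
              user: str = "hadoop", group: str = "hadoop") -> Inode:
        node = Inode(posixpath.basename(path), posixpath.dirname(path),
                     user, group, 0o644, "file")
        # Cut the file into blocks, then cut each block into sub-blocks.
        for start in range(0, len(data), self.block_size):
            block = data[start:start + self.block_size]
            block_id = uuid.uuid4().hex
            self.sblocks[block_id] = [
                block[i:i + self.subblock_size]
                for i in range(0, len(block), self.subblock_size)
            ]
            node.block_ids.append(block_id)
        self.inode[path] = node
        return node

    def get_subblock(self, block_id: str, index: int) -> bytes:
        # Stand-in for the per-sub-block fetch: one small piece per call.
        return self.sblocks[block_id][index]

    def read(self, path: str, offset: int = 0,
             length: Optional[int] = None) -> bytes:
        node = self.inode[path]
        end = None if length is None else offset + length
        out = bytearray()
        pos = 0  # file offset of the current sub-block's first byte
        for block_id in node.block_ids:
            for index in range(len(self.sblocks[block_id])):
                chunk = self.get_subblock(block_id, index)
                chunk_start, pos = pos, pos + len(chunk)
                if pos <= offset:
                    continue  # sub-block lies entirely before the range
                if end is not None and chunk_start >= end:
                    return bytes(out)  # past the requested range
                lo = max(offset - chunk_start, 0)
                hi = len(chunk) if end is None else min(end - chunk_start, len(chunk))
                out += chunk[lo:hi]
        return bytes(out)
```

For example, with `ToyCFS(block_size=8, subblock_size=3)`, writing a 20-byte file produces three block IDs in the inode row, and a partial read only assembles the sub-blocks that overlap the requested byte range. For simplicity the sketch still fetches the sub-blocks that precede the range; a real store could skip them using the known sub-block size.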
Company
DataStax
Date published
Feb. 11, 2012
Author(s)
Jake Luciani
Word count
767
Language
English
Hacker News points
None found.