Cassandra File System Design

What's this blog post about?

The Cassandra File System (CFS) is an HDFS-compatible filesystem designed to replace the traditional Hadoop NameNode, Secondary NameNode, and DataNode daemons. It reduces operational overhead by removing the NameNode as a single point of failure and gives Cassandra users easy Hadoop integration. CFS is modeled as a keyspace with two column families in Cassandra: "inode" and "sblocks". The "inode" column family holds metadata about a file: filename, parent path, user, group, permissions, file type, and the list of block IDs that make up the file. The "sblocks" column family stores the actual contents of the file. Because CFS relies on Thrift, which does not support streaming, it splits each block into sub-blocks so that a node is never handed a large amount of data at once. When a read comes in for a file or part of a file, CFS executes a custom Thrift call that returns either the specified sub-block data or, if the call was made on a node that holds the data locally, the file and offset information of the Cassandra SSTable file containing the sub-block. Sub-blocks are compressed and decompressed on the client side, which cuts down network traffic between nodes.
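The layout described above can be illustrated with a small in-memory sketch. This is not the CFS implementation: the two dictionaries merely stand in for the "inode" and "sblocks" column families, and the names `write_file` and `read_file`, the tiny block sizes, the random UUID block IDs, and the use of `zlib` for compression are all assumptions made for the example.

```python
import uuid
import zlib
from dataclasses import dataclass, field

# Illustrative sizes only (assumption); real blocks would be far larger.
BLOCK_SIZE = 8       # bytes per block
SUB_BLOCK_SIZE = 3   # bytes per sub-block


@dataclass
class Inode:
    """Metadata fields listed in the summary for one file."""
    filename: str
    parent_path: str
    user: str
    group: str
    permissions: int
    filetype: str
    block_ids: list = field(default_factory=list)


# Hypothetical stand-ins for the two column families.
inode_cf = {}    # path -> Inode
sblocks_cf = {}  # block ID -> {sub-block index: compressed bytes}


def write_file(path, data, user="hadoop", group="hadoop"):
    """Split data into blocks, split each block into sub-blocks, store both."""
    parent, _, name = path.rpartition("/")
    inode = Inode(name, parent or "/", user, group, 0o644, "file")
    for start in range(0, len(data), BLOCK_SIZE):
        block = data[start:start + BLOCK_SIZE]
        block_id = uuid.uuid4()  # assumption: any unique ID works for the sketch
        # Compress each sub-block on the "client" before it is stored.
        sblocks_cf[block_id] = {
            index: zlib.compress(block[offset:offset + SUB_BLOCK_SIZE])
            for index, offset in enumerate(range(0, len(block), SUB_BLOCK_SIZE))
        }
        inode.block_ids.append(block_id)
    inode_cf[path] = inode
    return inode


def read_file(path):
    """Look up the inode, then fetch and decompress sub-blocks in order."""
    inode = inode_cf[path]
    out = bytearray()
    for block_id in inode.block_ids:
        row = sblocks_cf[block_id]
        for index in sorted(row):
            out += zlib.decompress(row[index])
    return bytes(out)
```

Calling `write_file("/data/log.txt", payload)` followed by `read_file("/data/log.txt")` returns the original bytes. The point of the sketch is that only small, individually compressed sub-blocks ever move between the two "column families", never a whole block at once.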

Company
DataStax

Date published
Feb. 11, 2012

Author(s)
Jake Luciani

Word count
767

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.