Kindling: An Introduction to Spark with Cassandra (Part 1)

Post Details

Company

DataStax

Date Published

Jan. 20, 2015

Author

Erich Ess

Word Count

1,329

Language

English

Hacker News Points

-

Source URL

www.datastax.com/blog/kindling-introduction-spark-cassandra-part-1

Summary

Erich Ess is the CTO of SimpleRelevance, a company that specializes in dynamic content personalization using data science tools. Before joining SimpleRelevance, he worked on scalable distributed architectures; in college he studied mathematics and computer graphics. He enjoys studying category theory and functional languages such as F# and Clojure. This article introduces Apache Spark, which it presents as a recent successor to Hadoop that supports both batch and stream processing, offers APIs in multiple programming languages (Scala, Java, Python), and provides in-memory computation, an interactive shell, and an API that is easier to use than Hadoop's. The article focuses on setting up Spark with Cassandra and walks through a small example of what Spark can do. Erich describes his experience learning Spark and how he used it in a talk at the Chicago Cassandra meetup. He emphasizes the benefits of in-memory computation, the interactive shell, and Spark's API design, which is based on Resilient Distributed Datasets (RDDs). The article also gives steps for setting up a server that hosts both a Spark node and a Cassandra node, along with instructions for connecting Spark to a Cassandra cluster using the Cassandra Connector.