Kindling: An Introduction to Spark with Cassandra (Part 1)
Erich Ess is the CTO of SimpleRelevance, a company that specializes in dynamic content personalization using data science tools. Before joining SimpleRelevance, he worked on scalable distributed architectures, and he studied mathematics and computer graphics in college. He enjoys studying category theory and functional languages like F# and Clojure. This article introduces Apache Spark, a recent successor to Hadoop that supports both batch and stream processing, multiple programming languages (Scala, Java, Python), in-memory computation, an interactive shell, and an API that is easier to use than Hadoop's. The article focuses on setting up Spark with Cassandra and gives a small example of what can be done with Spark. Erich shares his experience learning Spark and describes how he used it in a talk at the Chicago Cassandra meetup. He emphasizes the benefits of in-memory computation, the interactive shell, and Spark's API design, which is built around Resilient Distributed Datasets (RDDs). The article also walks through setting up a server that hosts both a Spark node and a Cassandra node, and explains how to connect Spark to a Cassandra cluster using the Cassandra Connector.
Company
DataStax
Date published
Jan. 20, 2015
Author(s)
Erich Ess
Word count
1329
Hacker News points
None found.
Language
English