
How to Process GitHub Data with Kafka Streams

What's this blog post about?

This post shows how to track activity in a large codebase by pulling events from GitHub's data sources (the REST and GraphQL APIs) into Apache Kafka. It explains how to configure the Confluent GitHub source connector to land GitHub events in a Kafka topic, and then how to process those events with a Kafka Streams topology. Along the way, the authors give an overview of data pipelines, sources, and sinks, and walk through implementing a state store in Kafka Streams. The post also covers extending the project by adding a sink, and points to further resources on Kafka demos, Flink SQL tutorials, and resolving the "unknown magic byte" error.
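To illustrate the kind of processing the post describes, the following is a minimal Kafka Streams sketch in Java. It is not the authors' actual topology: the topic names ("github-events", "github-commits"), the JSON-string serialization, and the commit-filtering predicate are assumptions for illustration, and the state store discussed in the post is omitted here.

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class GitHubEventsTopology {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "github-events-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // Read raw GitHub events (assumed written by the source connector) as JSON strings.
        KStream<String, String> events = builder.stream(
                "github-events",
                Consumed.with(Serdes.String(), Serdes.String()));

        // Keep only commit-related events (naive string check for illustration)
        // and forward them to a downstream topic.
        events.filter((key, value) -> value != null && value.contains("\"commit\""))
              .to("github-commits",
                  Produced.with(Serdes.String(), Serdes.String()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Close the topology cleanly on shutdown.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

In the post itself, the events come from the Confluent GitHub source connector rather than being produced manually, and the topology additionally uses a state store to keep track of processed events before results are sent on to a sink.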

Company
Confluent

Date published
March 26, 2024

Author(s)
Lucia Cerchie, Bill Bejeck

Word count
1528

Language
English

Hacker News points
None found.


By Matt Makai. 2021-2024.