
ClickHouse and the MTA Data Challenge

What's this blog post about?

The Metropolitan Transportation Authority (MTA) has launched an Open Data Challenge that invites developers and data enthusiasts to build projects using MTA datasets. One of the largest available datasets is the turnstile dataset, which records entry and exit counter values for turnstiles in New York City over several years. ClickHouse, an OLAP database designed for scale, has made this dataset available in its new playground, where anyone can query the data for free. The post walks through loading and cleaning the MTA transit dataset in ClickHouse, covering schema improvements, the handling of cumulative counter values and outliers, and missing or inconsistent station names.
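Because turnstile counters are cumulative, per-interval ridership has to be derived as the difference between consecutive readings, and implausible differences must be filtered out. The sketch below illustrates that idea in plain Python for a single turnstile. It is not the post's ClickHouse SQL. It assumes the readings are already sorted by time, and the `MAX_PLAUSIBLE_DELTA` cutoff and the `cumulative_to_deltas` function are hypothetical names chosen for illustration, with an arbitrary threshold.

```python
from typing import Iterable, List, Optional

# Hypothetical cutoff (an assumption, not from the post): any single interval
# that reports more than this many passages is treated as an outlier.
MAX_PLAUSIBLE_DELTA = 10_000


def cumulative_to_deltas(
    readings: Iterable[int], max_delta: int = MAX_PLAUSIBLE_DELTA
) -> List[Optional[int]]:
    """Turn time-ordered cumulative readings for one turnstile into per-interval counts.

    The first reading has no predecessor, so it produces no delta. Negative
    deltas (for example, counter resets) and jumps above max_delta are
    returned as None so a later step can drop or impute them.
    """
    deltas: List[Optional[int]] = []
    previous: Optional[int] = None
    for value in readings:
        if previous is not None:
            delta = value - previous
            # Keep only non-negative, plausibly sized differences.
            deltas.append(delta if 0 <= delta <= max_delta else None)
        previous = value
    return deltas


if __name__ == "__main__":
    # A counter reset between 1030 and 5 yields None for that interval.
    print(cumulative_to_deltas([1000, 1012, 1030, 5, 20]))  # [12, 18, None, 15]
```

In a database the same logic would typically be written as a window-function difference, partitioned by turnstile and ordered by timestamp. Returning `None` rather than silently dropping a row keeps the interval visible, which makes any later imputation choice explicit.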

Company
ClickHouse

Date published
Oct. 24, 2024

Author(s)
The PME Team

Word count
3433

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.