Exploring Global Internet Speeds using Apache Iceberg and ClickHouse
This post explores Apache Iceberg, an open table format for large analytic datasets, and shows how to query Iceberg data with SQL from ClickHouse, a column-oriented database. The examples use a real-world dataset, the Ookla internet speed dataset. The post begins by explaining what Iceberg is and why it is useful for storing large amounts of data, then describes how to import and query data stored in an Iceberg table using ClickHouse SQL. Next, the author covers more advanced topics in working with geospatial data using UDFs (user-defined functions) in ClickHouse, and shows how to create a view that simplifies the SQL queries used to visualize internet speeds across regions worldwide. This involves computing centroid coordinates and generating colors based on download speed values. In conclusion, the post gives an overview of Iceberg, demonstrates its utility for working with large datasets in ClickHouse, and showcases a practical application of SQL and UDFs to geospatial analysis.
Company
ClickHouse
Date published
Feb. 8, 2024
Author(s)
Dale McDiarmid
Word count
5501
Language
English
Hacker News points
None found.