Company:
Date Published:
Author: Rick Spencer
Word count: 1821
Language: English
Hacker News points: None

Summary

InfluxDB 3.0 achieves substantial improvements in data ingest efficiency and compression by introducing a new data model that persists data by table rather than by individual time series, and by using the Apache Parquet file format for efficient storage and querying. By default, the database generates a new Parquet file every 15 minutes, with each file covering one day of data for a single measurement and capped at 100 megabytes. InfluxDB 3.0 also optimizes analytical queries through custom partitioning, which lets users define their own partitioning scheme based on tag keys and values to speed up specific query types. The ingest process is streamlined, requiring fewer compute resources than previous versions, and uses a write-ahead log (WAL) to ensure durability and availability. Additionally, InfluxDB 3.0 optimizes leading-edge queries by building on the Apache Arrow ecosystem, including Parquet, for high-performance analytical queries on large datasets. Together, these features yield significant compression gains, enabling users to store more data in less space for a fraction of the cost.