Company:
Date Published:
Author: Rick Spencer
Word count: 1821
Language: English
Hacker News points: None

Summary

InfluxDB 3.0 achieves substantial improvements in data ingest efficiency and compression by introducing a new data model that persists data by table rather than by individual time series, and by using the Apache Parquet file format for efficient storage and querying. By default, the database generates a new Parquet file every 15 minutes, with each file covering one day of data for a single measurement and capped at 100 megabytes. InfluxDB 3.0 also optimizes analytical queries through custom partitioning, which lets users define their own partitioning scheme based on tag keys and values to speed up specific query types. The ingest process is streamlined, requiring fewer compute resources than previous versions, and uses a write-ahead log (WAL) to ensure durability and availability. Additionally, InfluxDB 3.0 optimizes leading-edge queries by building on the Apache Arrow ecosystem, including Parquet, for high-performance analytical queries on large datasets. Together, these features yield significant compression gains, enabling users to store more data in less space for a fraction of the cost.