InfluxDB Internals 101 - Part Two

Company

InfluxData

Date Published

Nov. 27, 2017

Author

Ryan Betts

Word count

1462

Language

English

Hacker News points

None

URL

www.influxdata.com/blog/influxdb-internals-101-part-two

Summary

InfluxDB is queried using a SQL dialect called InfluxQL. The post focuses on the query engine's internal workflow rather than on the semantics of the language itself. The database supports two main querying patterns, windowing and searching for specific points, and both filter data by criteria applied to dimensions stored as tagsets. To make filtering by tagsets efficient, InfluxDB maintains an index that maps measurement names to field keys, series IDs, and tags. At the time of the post, the default index was held in memory, and a new index structure called Time Series Index (TSI) was being developed to store the index on SSD and so support higher-cardinality datasets. The query engine parses, plans, and executes each query: it determines the type of query, separates the time range from the condition expressions, and validates semantic correctness. Unlike traditional SQL columnar OLAP databases, which use pre-defined dimension tables in a star schema, InfluxDB uses flexible schema-on-write tagsets. Retention policies enforce a time-to-live on data. DELETE and DROP statements can also remove unwanted points, but they can be expensive because they require undoing work already done on disk. InfluxDB does not support an UPDATE statement; instead, re-inserting a fully qualified series key at an existing timestamp replaces the old point's field value with the new one.
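
To make the tagset index and the overwrite behaviour concrete, here is a minimal Python sketch. It is a toy model, not InfluxDB's actual implementation: the class `TagIndex`, its methods, and the in-memory dictionaries are all hypothetical names chosen for illustration. It assumes an inverted index from (measurement, tag key, tag value) to series IDs, and a point store keyed by series, field, and timestamp.

```python
from collections import defaultdict


class TagIndex:
    """Toy in-memory model (hypothetical, not InfluxDB's real data structures)."""

    def __init__(self):
        self._postings = defaultdict(set)  # (measurement, tag key, tag value) -> series IDs
        self._series_ids = {}              # (measurement, sorted tagset) -> series ID
        self._points = {}                  # (series ID, field, timestamp) -> value

    def series_id(self, measurement, tags):
        """Return the ID for a series, registering its tags on first sight."""
        key = (measurement, tuple(sorted(tags.items())))
        if key not in self._series_ids:
            sid = len(self._series_ids)
            self._series_ids[key] = sid
            for tag_key, tag_value in tags.items():
                self._postings[(measurement, tag_key, tag_value)].add(sid)
        return self._series_ids[key]

    def write(self, measurement, tags, field, timestamp, value):
        """Store a point; the same series, field, and timestamp overwrites."""
        sid = self.series_id(measurement, tags)
        self._points[(sid, field, timestamp)] = value
        return sid

    def read(self, sid, field, timestamp):
        return self._points.get((sid, field, timestamp))

    def lookup(self, measurement, **tag_filter):
        """Series IDs matching every given tag (at least one tag expected)."""
        sets = [self._postings.get((measurement, k, v), set())
                for k, v in tag_filter.items()]
        return set.intersection(*sets) if sets else set()


idx = TagIndex()
a = idx.write("cpu", {"host": "a", "region": "west"}, "usage", 10, 0.5)
b = idx.write("cpu", {"host": "b", "region": "west"}, "usage", 10, 0.7)

west = idx.lookup("cpu", region="west")               # both series
west_b = idx.lookup("cpu", region="west", host="b")   # only series b

# No UPDATE: writing the same series key and timestamp again replaces the value.
idx.write("cpu", {"host": "a", "region": "west"}, "usage", 10, 0.9)
latest = idx.read(a, "usage", 10)
```

In this sketch, filtering by tags never scans point data: it only intersects small sets of series IDs. That is the property the index is described as providing, and it also suggests why high series cardinality puts pressure on an index held entirely in memory.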