Why Use K-Means for Time Series Data? (Part Two)

Company

InfluxData

Date Published

Oct. 2, 2018

Author

Anais Dotis-Georgiou

Word count

1338

Language

English

Hacker News points

None

URL

www.influxdata.com/blog/why-use-k-means-for-time-series-data-part-two

Summary

K-Means can be used for anomaly detection in time series data by first windowing the data into segments and then clustering those segments. Each cluster centroid represents one of the characteristic shapes, or polynomials, that the data takes. By analyzing the shape of each cluster and its position in the 32-dimensional space that the segments occupy, it is possible to detect anomalies in the data. K-Means has limitations, however. It is only guaranteed to converge to a local minimum, so poorly placed initial centroids can lead to poor clustering and predictions. In addition, Euclidean distance can be a misleading similarity measure, especially when dealing with non-uniform time steps or sensor data.
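The windowing and clustering steps described in the summary can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the article's implementation: the sine-wave data, the window length of 32 (picked to match the 32-dimensional space mentioned above), the step of 8, k = 4, and every function name are hypothetical. The farthest-point initialization is a deterministic stand-in chosen here to sidestep the poor-initialization problem, and the anomaly score is simply the Euclidean distance from a window to its nearest centroid.

```python
import math

WINDOW = 32  # assumed segment length: each window is a point in 32-dimensional space


def make_windows(series, size=WINDOW, step=None):
    """Split a series into fixed-length segments (non-overlapping by default)."""
    step = step or size
    return [series[i:i + size] for i in range(0, len(series) - size + 1, step)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length windows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic init (an assumption of this sketch): start from the first
    window, then repeatedly add the window farthest from all chosen centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, iters=50):
    """Lloyd's algorithm: assign each window to its nearest centroid, then
    move each centroid to the mean of its members, until nothing changes."""
    centroids = farthest_point_init(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[idx].append(p)
        new_centroids = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def nearest_distance(window, centroids):
    """Anomaly score: Euclidean distance to the closest centroid."""
    return math.sqrt(min(sq_dist(window, c) for c in centroids))


# Synthetic "normal" data: a sine wave with a period of 32 samples.
series = [math.sin(2 * math.pi * t / 32) for t in range(32 * 40)]
train = make_windows(series, step=8)  # overlapping windows at four phases
centroids = kmeans(train, k=4)

normal = series[:WINDOW]
anomalous = list(normal)
for i in range(12, 20):
    anomalous[i] = 0.0  # simulate a flat-lined sensor for 8 samples

normal_score = nearest_distance(normal, centroids)
anomaly_score = nearest_distance(anomalous, centroids)
print(f"normal: {normal_score:.3g}  anomalous: {anomaly_score:.3g}")
```

A window whose score is far above the scores seen on normal data would be flagged as anomalous. Swapping the initializer for a random choice of starting centroids is an easy way to observe the local-minimum behavior the summary warns about.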