
The future of data pipelines

What's this blog post about?

The future of data pipelines will be defined by massive scale. As technology evolves, the volume of generated data grows exponentially, and the data itself changes in velocity, purpose, trajectory, and format. Data pipelines must adapt to this growth across six dimensions: functionality, design, compliance, usability, performance, and scalability.

Functionally, future pipelines will need to support core-to-endpoint systems, handle near real-time data, and be capable of auto-scaling, sharding, and partition tolerance with minimal human intervention. They should be troubleshootable and configurable on the fly, agnostic to data formats, and equipped with robust error handling. Analytics pipelines will increasingly serve as the funnel and conduit for data used to train AI and ML models.

Design considerations include incorporating a kill switch, making in-flight data modellable on the fly, and building on immutable, ordered event logs. Compliance with regulations such as GDPR is crucial, and several methods can be used to secure personal data inside an event log. On usability, GUIs will cover the entire lifecycle of a pipeline; on performance, the focus will be on reducing latency and increasing mean time to failure. Scalability will require deciding which data to keep volatile and temporary and which to store persistently, along with massive autoscaling capabilities.
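
The error-handling requirement above is often met with a dead-letter pattern: a record that fails processing is diverted to a separate topic rather than halting the whole pipeline. The sketch below uses the kafka-python client; the topic names, broker address, and validation logic are illustrative assumptions, not details from the original post.

    # Sketch: dead-letter routing so one bad record doesn't stall the pipeline.
    # Topic names and broker address are illustrative assumptions.
    # Requires: pip install kafka-python
    import json
    from kafka import KafkaConsumer, KafkaProducer

    consumer = KafkaConsumer("events", bootstrap_servers="localhost:9092")
    producer = KafkaProducer(bootstrap_servers="localhost:9092")

    def process(raw: bytes) -> None:
        record = json.loads(raw)   # raises on malformed input
        record["user_id"]          # raises if a required field is missing
        # ... downstream transformation would go here ...

    for message in consumer:
        try:
            process(message.value)
        except Exception as exc:
            # Divert the poison record with error context; keep consuming.
            producer.send("events.dead-letter", value=message.value,
                          headers=[("error", str(exc).encode())])

Because failed records are preserved with their error context, they can later be inspected, repaired, and replayed, which supports the post's call for pipelines that are troubleshootable on the fly.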
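The immutable, ordered event log intersects directly with the GDPR point: an immutable log cannot be rewritten to erase a user. A common answer is crypto-shredding, in which each data subject's payload is encrypted with a per-subject key, and an erasure request is honored by deleting the key. Below is a minimal Python sketch of the idea; the in-memory key store and record shape are assumptions made for illustration (real systems would use a dedicated key management service).

    # Sketch: crypto-shredding for GDPR erasure in an immutable event log.
    # Events are never rewritten; deleting a user's key renders their
    # encrypted payloads permanently unreadable.
    # Requires: pip install cryptography
    from cryptography.fernet import Fernet

    key_store = {}  # illustrative in-memory key store; use a KMS in practice

    def encrypt_event(user_id: str, payload: bytes) -> bytes:
        # Encrypt with the user's key, creating the key on first use.
        key = key_store.setdefault(user_id, Fernet.generate_key())
        return Fernet(key).encrypt(payload)

    def read_event(user_id: str, ciphertext: bytes) -> bytes | None:
        # Decrypt only if the user's key still exists; None after erasure.
        key = key_store.get(user_id)
        return Fernet(key).decrypt(ciphertext) if key else None

    def erase_user(user_id: str) -> None:
        # "Right to be forgotten": drop the key, not the log entries.
        key_store.pop(user_id, None)

    event = encrypt_event("user-42", b'{"action": "login"}')
    print(read_event("user-42", event))   # b'{"action": "login"}'
    erase_user("user-42")
    print(read_event("user-42", event))   # None: data is unrecoverable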

Company
Aiven

Date published
Nov. 23, 2018

Author(s)

Word count
2614

Hacker News points
None found.

Language
English


By Matt Makai. 2021-2024.