A History of HTML Parsing at Cloudflare: Part 2
In 2017, developers using Cloudflare's edge compute platform, Workers, wanted HTML rewriting capabilities similar to those Cloudflare used internally. To meet this demand, Cloudflare built a streaming HTML rewriter/parser in Rust with a CSS-selector-based API and open-sourced it as LOL HTML. The major change from the previous rewriter, LazyHTML, is a dual-parser architecture, which was needed to avoid the performance overhead of wrapping and unwrapping every token passed to the Workers runtime. Compared with LazyHTML, the new approach significantly speeds up parsing and reduces output latency and memory consumption.
Company
Cloudflare
Date published
Nov. 29, 2019
Author(s)
Andrew Galloni, Ivan Nikulin
Word count
3142
Hacker News points
None found.
Language
English