
Improving compression with a preset DEFLATE dictionary

What's this blog post about?

Google proposed an HTTP compression method called SDCH (Shared Dictionary Compression over HTTP, pronounced "sandwich") that builds dictionaries of long strings appearing on many pages of the same domain or in popular search results. The compressor replaces these strings with references into the dictionary, which can reduce file size significantly. The drawbacks are that the dictionary files are large and that a dictionary is of limited use outside the set of pages it was built from. CloudFlare faces the added challenge of supporting millions of domains with widely varying content, yet better compression still means smaller payloads and faster content delivery. Besides SDCH, the common HTTP compression methods are gzip and DEFLATE, which use the same compression algorithm and differ only in their headers and error-detection checksums. DEFLATE has two stages: the LZ77 algorithm replaces repeated strings with back references to earlier occurrences, and Huffman coding then compresses the result further. The compression level determines how hard the algorithm searches for matches. A preset DEFLATE dictionary gives the compressor text to back-reference before any input has been seen, which improves the compression ratio. An experiment with 16KB and 32KB dictionaries showed significant compression improvements without a substantial performance hit. The utility for building a DEFLATE dictionary and the optimized version of zlib used by CloudFlare are available at https://github.com/vkrasnov/dictator and https://github.com/cloudflare/zlib, respectively.
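The effect of a preset dictionary can be sketched with Python's standard zlib module, which accepts a dictionary through the zdict parameter. This is only an illustration under stated assumptions: the dictionary contents, the sample page, and the helper names compress and decompress are hypothetical, and a real dictionary would be built from representative traffic (for example with the dictator utility mentioned above) rather than written by hand.

```python
import zlib

# Hypothetical hand-written dictionary of strings common to many HTML pages.
# A real one would be generated from a corpus of representative pages.
PRESET_DICT = (
    b'<!DOCTYPE html><html><head><meta charset="utf-8"><title></title>'
    b'<link rel="stylesheet" href="</head><body><div class="</div></body></html>'
)


def compress(data, zdict=None):
    """Compress data in zlib format, optionally primed with a preset dictionary."""
    if zdict is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(data) + comp.flush()


def decompress(blob, zdict=None):
    """Decompress; the same dictionary must be supplied if one was used."""
    if zdict is None:
        decomp = zlib.decompressobj()
    else:
        decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


# Hypothetical small page: short inputs have little internal repetition,
# so most of the savings come from matches against the dictionary.
page = (
    b'<!DOCTYPE html><html><head><meta charset="utf-8"><title>Hi</title>'
    b'</head><body><div class="post">Hello</div></body></html>'
)

plain = compress(page)
primed = compress(page, PRESET_DICT)

print(len(page), len(plain), len(primed))
```

Both sides must hold the identical dictionary: the decompressor cannot resolve back references into a dictionary it does not have, which is why decompress takes the same zdict argument.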

Company
Cloudflare

Date published
March 30, 2015

Author(s)
Vlad Krasnov

Word count
1287

Hacker News points
4

Language
English


By Matt Makai. 2021-2024.