Improving compression with a preset DEFLATE dictionary
Google proposed an HTTP compression method called SDCH (Shared Dictionary Compression over HTTP, pronounced "sandwich") that builds dictionaries of long strings appearing on many pages of the same domain or in popular search results. Compression then replaces those strings with references into the dictionary, which can reduce file size significantly. The drawbacks are that the dictionary files are large and that a dictionary is of limited use outside the set of pages it was built for. At Cloudflare, the challenge is supporting millions of domains with widely varying content, where better compression means smaller payloads and faster content delivery. Besides SDCH, the common HTTP compression methods are gzip and DEFLATE; both use the same compression algorithm and differ only in their headers and error-detection checksums. DEFLATE has two stages: LZ77, which replaces repeated strings with back references to earlier occurrences, and Huffman coding, which compresses the result further. The compression level determines how hard the algorithm searches for matches. A preset DEFLATE dictionary serves as initial data for back references, so even the start of a stream can match against it, which improves the compression ratio. An experiment with 16 KB and 32 KB dictionaries showed significant compression improvements without a substantial performance hit. The utility for building a DEFLATE dictionary is available at https://github.com/vkrasnov/dictator, and the optimized version of zlib used by Cloudflare is available at https://github.com/cloudflare/zlib.
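To make the preset-dictionary idea concrete, here is a minimal sketch using Python's standard `zlib` module, which accepts a dictionary through its `zdict` argument. It is not the code from the post: the helper names, the sample dictionary, and the sample page are all hypothetical, and a real dictionary would be built from representative content (for example with the dictator utility) rather than written by hand.

```python
import zlib

def compress_with_dict(data: bytes, zdict: bytes, level: int = 6) -> bytes:
    """Compress data with a preset dictionary (zlib-wrapped DEFLATE stream)."""
    comp = zlib.compressobj(level, zlib.DEFLATED, zlib.MAX_WBITS, zdict=zdict)
    return comp.compress(data) + comp.flush()

def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """Decompress a stream produced with the same preset dictionary."""
    decomp = zlib.decompressobj(zlib.MAX_WBITS, zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()

# Hypothetical dictionary: boilerplate expected at the start of many pages.
SAMPLE_DICT = (
    b'<!DOCTYPE html><html><head><meta charset="utf-8">'
    b'<title></title></head><body><div class="content">'
)

# Hypothetical page that shares most of its markup with the dictionary.
SAMPLE_PAGE = (
    b'<!DOCTYPE html><html><head><meta charset="utf-8">'
    b'<title>Hi</title></head><body><div class="content">Hello</div></body></html>'
)

plain = zlib.compress(SAMPLE_PAGE, 6)
primed = compress_with_dict(SAMPLE_PAGE, SAMPLE_DICT)
print(len(SAMPLE_PAGE), len(plain), len(primed))
```

Both sides must hold exactly the same dictionary bytes; decompressing with a different dictionary fails. The gain is largest for small responses, where an ordinary DEFLATE stream has little earlier data to refer back to.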
Company
Cloudflare
Date published
March 30, 2015
Author(s)
Vlad Krasnov
Word count
1287
Hacker News points
4
Language
English