TOO_FAR in zlib Is Not Too Far

The zlib source file deflate.c defines the macro TOO_FAR with the value 4096. In zlib's implementation of the LZ77 algorithm, matches of length 3 (the shortest match deflate can encode) are discarded if their distance exceeds TOO_FAR. The value can be overridden by recompiling zlib with the macro redefined (e.g. with the option -DTOO_FAR=16384 on most command-line compilers).

According to Jean-loup Gailly, the author of gzip and one of the co-authors of zlib, the value of TOO_FAR was tuned mostly on ASCII text, for which 4K seemed to be optimal. It may well not be optimal for binary files.

Experiments with PNG files showed that many files compress slightly better when TOO_FAR is increased, with the best results at TOO_FAR=32767 (the maximum possible value). Glenn Randers-Pehrson measured a size gain of 1.20% on the Kodak ColorSet, 0% on the Waterloo Bragzone ColorSet, and 0.4% on a larger sample of over 200 images taken from The Art of Lossless Image Compression collection. For a very few images (such as phoenix.png), a smaller value of TOO_FAR (1024) gave better results.

Glenn also reported that compression takes about 7% longer when TOO_FAR has the maximum value.

Conclusion

Spending 5-10% more compression time for output that is about 0.5% smaller on average is not a bad tradeoff. pngcrush, for instance, is compiled with TOO_FAR=32767.

