A guide to PNG optimization

1. Background

1.1 The PNG file format

Portable Network Graphics (PNG) is a format for storing compressed raster graphics. Its compression engine is based on the Deflate method [RFC1951], designed by PKWare and originally used in PKZIP.

The PNG format is defined by the PNG Specification. This specification was developed by an ad-hoc group named the PNG Development Group, and it is both an International Standard (published under the formal name ISO/IEC 15948) and a W3C Recommendation.

PNG was initially intended as a superior, patent-free replacement for GIF. The final outcome is a modern, extensible, reliable image format, capable of handling an impressive number of image types (from 1-bit black-and-white images up to 48-bit RGB images with a full 16-bit alpha channel), and powered by a significantly stronger lossless compression engine (typically 5-25% better than GIF).

Unlike other lossless compression schemes, PNG compression does not depend solely on the statistics of the input; it may also vary within wide limits, depending on the compressor's implementation. A good PNG encoder must be able to make informed decisions about the factors that affect the size of the output. The purpose of this article is to provide information about these factors, and to give advice on implementing efficient PNG encoders.

1.2 The PNG compression

PNG compression works as a pipeline.

In the first stage, the image pixels are passed through a lossless arithmetic transformation named delta filtering, or simply filtering, and sent further as a (filtered) byte sequence. Filtering does not compress or otherwise reduce the size of the data, but it makes the data more compressible.

In the second stage, the filtered byte sequence is passed through the Ziv-Lempel algorithm (LZ77), producing LZ77 codes that are further compressed by the Huffman algorithm in the third and final stage. The combination of the last two stages is referred to as the Deflate compression, a widely-used, patent-free algorithm for universal, lossless data compression. The maximum size of the LZ77 sliding window in Deflate is 32768 bytes, and the LZ77 matches can be between 3 and 258 bytes long.
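The three stages can be seen end to end in a minimal sketch of an encoder. It is not a reference implementation: it assumes an 8-bit grayscale, non-interlaced image, always uses the None filter, and relies on Python's standard zlib module for Deflate. The names make_chunk and encode_gray8 are hypothetical helpers introduced only for this illustration.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def make_chunk(chunk_type, data):
    """Build a chunk: length, type, data, and a CRC-32 over type and data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

def encode_gray8(rows):
    """Encode equal-length rows of 8-bit gray samples as a non-interlaced PNG."""
    height, width = len(rows), len(rows[0])
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression method 0, filter method 0, interlace method 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Stage 1: filtering. Each row is prefixed by its filter type; 0 means None.
    filtered = b"".join(b"\x00" + bytes(row) for row in rows)
    # Stages 2 and 3: LZ77 followed by Huffman coding, i.e. Deflate,
    # wrapped in a zlib stream as required inside IDAT.
    idat = zlib.compress(filtered, 9)
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))

# Made-up test image: 16 identical rows of a repeating horizontal ramp.
rows = [[(x * 4) % 256 for x in range(64)] for _ in range(16)]
png = encode_gray8(rows)
print(len(png), "bytes")
```

A real encoder would choose a filter per row and tune the Deflate parameters, which is what the rest of this guide is about.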

A complete description of the PNG compression is beyond the scope of this guide. The PNG Specification describes the format completely, and provides a complete list of references to the underlying technologies.

2. Factors that affect the PNG file size

Like any other compression scheme, PNG compression depends on the statistics of the input data. In addition, it depends on the following PNG-specific parameters:
  1. The PNG image type
  2. The PNG delta filters
  3. The strategy of searching LZ77 matches
  4. The size of the Huffman buffers inside the Deflate encoder

Depending on how these parameters are chosen by the implementation, PNG compression may vary within wide limits. The process of selecting the best configuration is computationally infeasible, but heuristics to select a satisfactory configuration are available. The problem of improving these heuristics constitutes an interesting subject for research.

2.1 The PNG image type

The type of a PNG image is defined in the IHDR image header. The image has a certain bit depth, up to 16 bits per sample, and a certain color type, from Grayscale to RGB+Alpha. If two PNG files of different types represent exactly the same image, each file can be regarded as a lossless transformation of the other. A lossless transformation can reduce the uncompressed stream, and such a transformation is named image reduction. In most cases, image reductions are capable of reducing the compressed stream (which is, in fact, our interest), as an indirect effect of reducing the size of the compressor's input.

The possible image reductions are:

There are, however, a few cases when some image type reductions do not necessarily lead to the reduction of the compressed stream. The PNG-Tech site contains experimental analyses of these possibilities; for example, see the article 8 bits per pixel in paletted images.
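As a rough sketch of how an encoder might look for image reductions, the following code lists the image types that can represent a given set of 8-bit RGB pixels without loss, together with the number of bits each type needs per pixel. The function names are hypothetical, transparency is ignored, and the size of the PLTE chunk is not counted, so this is only a first approximation of the real cost.

```python
def candidate_types(pixels):
    """Map each lossless image type for 8-bit (r, g, b) pixels to its bits per pixel."""
    colors = set(pixels)
    candidates = {"RGB, 8 bits per sample": 24}
    if len(colors) <= 256:
        # Smallest legal palette index depth that can address every color.
        depth = next(d for d in (1, 2, 4, 8) if len(colors) <= 2 ** d)
        candidates["Palette, %d-bit" % depth] = depth
    if all(r == g == b for r, g, b in colors):
        grays = {r for r, _, _ in colors}
        # A gray depth d is usable if every value is a multiple of 255 / (2^d - 1).
        for depth, step in ((1, 255), (2, 85), (4, 17), (8, 1)):
            if all(v % step == 0 for v in grays):
                candidates["Grayscale, %d-bit" % depth] = depth
                break
    return candidates

def raw_size(width, height, bits_per_pixel):
    """Size of the filtered stream: one filter byte plus the packed row, per row."""
    return height * (1 + (width * bits_per_pixel + 7) // 8)

black_and_white = [(0, 0, 0), (255, 255, 255)] * 8
print(candidate_types(black_and_white))
print(raw_size(16, 1, 24), raw_size(16, 1, 1))  # 49 3
```

The smallest uncompressed stream is usually, but not always, the one that compresses best, so an optimizer may still want to try more than one candidate.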

Interlacing, useful for a faster, progressive rendering, is another component of the PNG image type that affects compression. In an interlaced stream, the samples corresponding to neighboring pixels are stored far apart, hence the data in it is less correlated and less compressible. Unlike JPEG, where interlacing may improve the compression slightly, PNG interlacing degrades the compression significantly.

2.2 The PNG delta filters

The role of filtering can be illustrated in the following example. Assume the sequence 2, 3, 4, 5, 6, 7, 8, 9. Although it has much redundancy, the sequence is not compressible by a Ziv-Lempel compressor, nor by a Huffman compressor. However, if one makes a simple and reversible transformation, replacing each value with the numerical difference between it and the value to its left, the sequence becomes 2, 1, 1, 1, 1, 1, 1, 1, which is highly compressible.
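The example above can be reproduced in a few lines. This is a minimal sketch: the function names are hypothetical, the value to the left of the first element is assumed to be 0, and the arithmetic is done modulo 256 as in PNG. The second part repeats the experiment on a longer ramp and measures the effect with Python's zlib module.

```python
import zlib

def delta_encode(values):
    """Replace each value with its difference from the value to its left (mod 256)."""
    out = []
    left = 0  # assumed value "to the left" of the first element
    for v in values:
        out.append((v - left) % 256)
        left = v
    return out

def delta_decode(deltas):
    """Inverse transformation: a running sum (mod 256) restores the original."""
    out = []
    left = 0
    for d in deltas:
        left = (left + d) % 256
        out.append(left)
    return out

sequence = [2, 3, 4, 5, 6, 7, 8, 9]
filtered = delta_encode(sequence)
print(filtered)                            # [2, 1, 1, 1, 1, 1, 1, 1]
print(delta_decode(filtered) == sequence)  # True

# The same effect on a longer ramp of 256 distinct values.
ramp = list(range(256))
raw_size = len(zlib.compress(bytes(ramp), 9))
filtered_size = len(zlib.compress(bytes(delta_encode(ramp)), 9))
print(raw_size, filtered_size)
```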

The PNG format employs five types of filters: None, Left (named Sub in the PNG Specification), Up, Average, and Paeth. The first filter leaves the original data intact, and the other four subtract from each pixel a value that involves the neighbor pixels from the left, up, and/or the upper left.
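A sketch of the five filters applied to a single row, together with the inverse operation, is given below. The function names are hypothetical; bpp stands for the number of bytes per complete pixel (at least 1), prev is the unfiltered previous row (all zeros for the first row), and all arithmetic is modulo 256.

```python
def paeth_predictor(a, b, c):
    """a = left, b = up, c = upper left; return the neighbor closest to a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c

def predict(ftype, a, b, c):
    if ftype == 0:
        return 0                    # None
    if ftype == 1:
        return a                    # Left
    if ftype == 2:
        return b                    # Up
    if ftype == 3:
        return (a + b) // 2         # Average
    return paeth_predictor(a, b, c) # Paeth

def filter_row(ftype, row, prev, bpp):
    """Subtract the prediction from every byte of the row."""
    out = bytearray(len(row))
    for i, x in enumerate(row):
        a = row[i - bpp] if i >= bpp else 0
        c = prev[i - bpp] if i >= bpp else 0
        out[i] = (x - predict(ftype, a, prev[i], c)) % 256
    return bytes(out)

def unfilter_row(ftype, filtered, prev, bpp):
    """Add the prediction back, using the bytes already reconstructed."""
    row = bytearray(len(filtered))
    for i, x in enumerate(filtered):
        a = row[i - bpp] if i >= bpp else 0
        c = prev[i - bpp] if i >= bpp else 0
        row[i] = (x + predict(ftype, a, prev[i], c)) % 256
    return bytes(row)

prev = bytes([10, 20, 30, 40])
row = bytes([12, 22, 33, 41])
print(list(filter_row(1, row, prev, 1)))  # Left: [12, 10, 11, 8]
print(list(filter_row(2, row, prev, 1)))  # Up:   [2, 2, 3, 1]
```

Because the decoder reconstructs each byte from neighbors it has already decoded, every filter is exactly reversible.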

A certain filter is assigned to each row, and is applied to all pixels from that row. Therefore, an image can be delta-filtered in a huge number of possible configurations (5 ^ height), and each configuration leads to a different compressed output. Two different filter configurations may differ in compressed file size by a factor of two or more, so a careful choice of filters is of paramount importance.

It is possible to apply a single filter to all rows, or to apply different filters to different rows. In the former case, the filtering process is fixed; in the latter, it is adaptive.

While an exhaustive search is infeasible, the PNG Specification suggests a heuristic filtering strategy:

Cases where the above heuristics are less than optimal are shown on the PNG-Tech site; for example, see Brute-force vs. heuristic filtering.
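One heuristic described in the PNG Specification for grayscale and truecolor images of bit depth 8 or more is to try all five filters on each row and keep the one whose output has the smallest sum of absolute values, reading each output byte as a signed number. For palette images and for bit depths below 8, the Specification recommends filter None. The following is a minimal sketch of the per-row selection, not the code of any particular encoder; the function names are hypothetical and the test image is made up.

```python
def paeth(a, b, c):
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    return a if pa <= pb and pa <= pc else (b if pb <= pc else c)

# Predictors indexed by filter type; a = left, b = up, c = upper left.
PREDICTORS = [
    lambda a, b, c: 0,             # 0: None
    lambda a, b, c: a,             # 1: Left
    lambda a, b, c: b,             # 2: Up
    lambda a, b, c: (a + b) // 2,  # 3: Average
    paeth,                         # 4: Paeth
]

def filter_row(ftype, row, prev, bpp):
    predict = PREDICTORS[ftype]
    out = bytearray()
    for i, x in enumerate(row):
        a = row[i - bpp] if i >= bpp else 0
        c = prev[i - bpp] if i >= bpp else 0
        out.append((x - predict(a, prev[i], c)) % 256)
    return bytes(out)

def cost(filtered):
    """Sum of absolute values, reading each byte as a signed 8-bit number."""
    return sum(b if b < 128 else 256 - b for b in filtered)

def adaptive_filter(rows, bpp):
    """Return (filtered stream, chosen filter per row); ties go to the lower type."""
    prev = bytes(len(rows[0]))  # the row above the first row is all zeros
    stream = bytearray()
    choices = []
    for row in rows:
        best = min(range(5), key=lambda f: cost(filter_row(f, row, prev, bpp)))
        choices.append(best)
        stream.append(best)  # each row is prefixed by its filter type
        stream += filter_row(best, row, prev, bpp)
        prev = row
    return bytes(stream), choices

# Made-up 64x8 grayscale image with a diagonal gradient.
rows = [bytes((x + y) % 256 for x in range(64)) for y in range(8)]
stream, choices = adaptive_filter(rows, bpp=1)
print(choices)
```

The heuristic looks at one row at a time and never consults the compressor, which is why it can be beaten by a search that measures the compressed output directly.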

2.3 The strategy of searching LZ77 matches

The Ziv-Lempel algorithm works under the assumption that contiguous sequences appear repeatedly in the input stream. If the sequence to be encoded matches one or more sequences already present in the sliding history window, the encoder sends an LZ77 pair (distance, length) that points to the closest match. In most LZ77 incarnations, including Deflate, smaller distance codes are encoded more concisely.

In Deflate, in particular, the regular (non-matched) symbols, and the match lengths, are sent to the same Huffman coder, while the match distances are sent to a separate Huffman coder. If the LZ77 matches fall between the accepted boundaries (i.e. they are not shorter than 3 and not longer than 258), a greedy strategy will accept them as a replacement for the symbols to which they correspond.
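The greedy strategy can be illustrated with a brute-force sketch. This is not how zlib searches (zlib uses hash chains and is far faster); the function name is hypothetical, and the code only shows which literals and (distance, length) pairs a greedy parser would emit, using the Deflate limits mentioned earlier.

```python
def lz77_greedy(data, window=32768, min_len=3, max_len=258):
    """Return a list of literal byte values and (distance, length) pairs."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Scan candidate start positions from the closest to the farthest, so
        # that among equally long matches the smallest distance is kept.
        for j in range(i - 1, max(i - window, 0) - 1, -1):
            length = 0
            while (length < max_len and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1  # a match may overlap the position being encoded
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_len:
            tokens.append((best_dist, best_len))  # greedy: always take the match
            i += best_len
        else:
            tokens.append(data[i])                # no usable match: emit a literal
            i += 1
    return tokens

print(lz77_greedy(b"abcabcabc"))   # [97, 98, 99, (3, 6)]
print(lz77_greedy(b"aaaaaaaaaa"))  # [97, (1, 9)]
```

In Deflate, the literals and the match lengths from such a token list would go to one Huffman coder and the distances to another.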

The greedy strategy is preferable when compressing text files, or many types of binary files, but it may be suboptimal when compressing filtered data, such as the byte strings that come from a PNG filter. Filtered data consist mostly of small values with a pseudo-random distribution. Therefore, in certain situations, it may be desirable to favor the encoding of individual symbols, even if matches that may replace these symbols exist.

The zlib Reference Library is a reference implementation of Deflate, which is further used by the PNG Reference Library. By default, zlib selects the greedy strategy, but the user is able to specify his or her custom preference via the strategy parameter. This parameter can take one of the following values:
- Z_DEFAULT_STRATEGY = 0, the default greedy search strategy.
- Z_FILTERED = 1, a strategy in which the matches are accepted only if their length is 6 or bigger.
- Z_HUFFMAN_ONLY = 2, a fast strategy in which the Ziv-Lempel algorithm is entirely bypassed, and all the symbols from the input are encoded directly by the Huffman coder.
- Z_RLE = 3 (appeared in the zlib-1.2.x series), a fast strategy in which the LZ77 algorithm is essentially reduced to the Run-Length Encoding algorithm. In other words, the matches are accepted only if their distance is 1. For example, the 10-symbol sequence "aaaaaaaaaa" can be LZ77-encoded as ['a', (distance=1, length=9)]; by removing distance=1 from the picture, this encoding can be regarded as a peculiar run-length encoding (which differs from the classic RLE by using length=9 instead of count=10).

The strategy parameter affects only the compression ratio. It does not affect the correctness of the compressed output, even if it is set to an inappropriate value.
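The effect of the strategy parameter can be observed through Python's standard zlib module, whose compressobj function passes the strategy on to the underlying zlib library. This is a minimal sketch: the sample data is made up to resemble filtered bytes (mostly small values), and the Z_RLE constant falls back to the numeric value 3 listed above in case the Python build does not export it.

```python
import random
import zlib

STRATEGIES = {
    "Z_DEFAULT_STRATEGY": zlib.Z_DEFAULT_STRATEGY,
    "Z_FILTERED": zlib.Z_FILTERED,
    "Z_HUFFMAN_ONLY": zlib.Z_HUFFMAN_ONLY,
    "Z_RLE": getattr(zlib, "Z_RLE", 3),
}

def deflate(data, strategy, level=9, mem_level=8):
    """Compress with an explicit strategy; window bits fixed at 15 (32768 bytes)."""
    c = zlib.compressobj(level, zlib.DEFLATED, 15, mem_level, strategy)
    return c.compress(data) + c.flush()

# Made-up "filtered" data: small positive and negative differences, mod 256.
rng = random.Random(1)
sample = bytes(rng.choice([0, 0, 0, 1, 255, 2, 254]) for _ in range(4096))

for name, strategy in STRATEGIES.items():
    stream = deflate(sample, strategy)
    # Whatever the strategy, a standard decoder recovers the same bytes.
    assert zlib.decompress(stream) == sample
    print(name, len(stream))
```

The sizes printed depend on the data and on the zlib version, which is why no particular ranking of the strategies is claimed here.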

It was experimentally observed that the LZ77 search is occasionally capable of producing smaller PNGs if it is less exhaustive. The reason behind this fact resides in the same category of "strategic searches" discussed here. Unfortunately, there is no known method of anticipating which search level (from the fastest and the least exhaustive, to the slowest and the most exhaustive) is better, other than assuming "the most exhaustive is better in most cases".

Unfortunately, even a "filtered" strategy does not always produce better results than a "greedy" strategy on filtered input, and the only known method to obtain the best combination is by multiple trials. Experiments and measurements can, again, be found on the PNG-Tech site; for example, see the original Z_RLE strategy proposal.

2.4 The size of Huffman buffers

As mentioned earlier, the entropy encoder inside the Deflate method is the static Huffman algorithm. The output of LZ77 is fed into a buffer which is occasionally flushed by sending a static Huffman tree followed by all the Huffman codes, to the output of Deflate. After this, both the buffer and the Huffman tree are reset, waiting for the subsequent LZ77 codes to come and refill the buffer.

The Deflate specification refers to dynamic Huffman codes. However, this is a misnomer, in which the term dynamic is used in contrast to the fixed Huffman codes. The fixed Huffman codes are simply built according to a predefined Huffman tree, without regard to the actual symbol frequencies. The dynamic Huffman codes referred to by the Deflate specification are NOT built by the dynamic Huffman algorithm, as defined, for example, by Faller, Gallager and Knuth (the FGK algorithm), or by Vitter (the V algorithm). The predefined Huffman tree was introduced in PKZIP as a fast compression alternative, but it produces poor results even on text, and it is almost useless in PNG compression. Still, a PNG stream that contains codes built by the fixed (predefined) Huffman tree is a valid stream, and a compliant PNG reader must decode this stream correctly.

It is desirable to establish the buffer boundaries so that sequences conforming to the same probability model fit in the same Huffman buffer. Methods for approaching these boundaries exist, but they are not used in the mainstream Deflate implementation(s). Instead, the buffers are flushed when a limit (typically, 16k LZ77 codes) is reached. This is, however, a fast approach, and the results are satisfactory.

The size of Huffman buffers is indirectly determined by the encoder's memory (usage) level. For this reason, certain memory levels might be good for certain types of images.

3. PNG (lossless) optimization programs

The multitude of PNG encoding programs is listed at http://www.libpng.org/pub/png/pngapps.html. Their performance varies as much as the range of possible compression ratios; good encoders at least apply the filtering heuristics described briefly in the PNG Specification and illustrated above.
Some programs gain extra compression by discarding some of the data in the input images (so these programs are lossy!).

This section contains the small list of PNG optimization programs that show a particular concern towards obtaining a file size as small as possible. They work by performing repeated compression trials, applying various parameter sets, and selecting the parameter set that yields the smallest compressed output.
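The trial-based approach described above can be sketched as a plain search over Deflate parameters. This is an illustration under stated assumptions, not the algorithm of any specific optimizer: the parameter grid is arbitrary, the function names are hypothetical, the input is made-up data standing in for an already filtered image, and a real tool would also vary the image type and the filters.

```python
import itertools
import random
import zlib

def deflate(data, level, mem_level, strategy):
    c = zlib.compressobj(level, zlib.DEFLATED, 15, mem_level, strategy)
    return c.compress(data) + c.flush()

def best_trial(data,
               levels=(4, 6, 9),
               mem_levels=(7, 8, 9),
               strategies=(zlib.Z_DEFAULT_STRATEGY, zlib.Z_FILTERED,
                           zlib.Z_HUFFMAN_ONLY, getattr(zlib, "Z_RLE", 3))):
    """Compress with every parameter set and return (parameters, smallest stream)."""
    best_params, best_stream = None, None
    for params in itertools.product(levels, mem_levels, strategies):
        stream = deflate(data, *params)
        if best_stream is None or len(stream) < len(best_stream):
            best_params, best_stream = params, stream
    return best_params, best_stream

# Made-up stand-in for filtered image data: small signed differences, mod 256.
rng = random.Random(7)
data = bytes(rng.choice([0, 0, 1, 255, 3, 253]) for _ in range(20000))

params, stream = best_trial(data)
print("level=%d memLevel=%d strategy=%d" % params, len(stream), "bytes")
```

The cost of this approach is one full compression pass per parameter set, so the running time grows with the size of the grid.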

4. An extra note on losslessness

What is lossless PNG optimization, after all? This is a straightforward question, whose answer is intuitive, yet not so straightforward.

Losslessness in the strictest sense, where no information whatsoever is lost, can only be achieved by leaving the original file (any file) intact, or by transforming it (e.g. compressing it, encrypting it) in such a way that there is an inverse transformation which recovers it completely, bit by bit.

In the case of PNG images, this condition of strict losslessness has little relevance to the casual graphics user, and is, therefore, too strong. There are instances where strict losslessness is required; for example, when handling certified PNG files whose integrity is guaranteed by an external checksum like MD5 or SHA, or by a digital signature such as dSIG. Most of the time, however, it is desirable to relax the notion of PNG losslessness, to the extent of not losing any information that pertains to the rendered image and to the semantic value of the metadata that accompanies the image. This allows the user to concentrate on what is really important when it comes to preserving the contents of a PNG image, and enables the concept of PNG optimization tools.

A lossless transform of a PNG image file is a transform which fully preserves the rendered RGB triples (the RGB triples that come either directly, or from a palette index, or from a gray->RGB expansion), the rendered transparency (the alpha samples that come either directly, or from a tRNS chunk, or the implicit 100% opacity assumed due to the lack of any explicit transparency information), the order of rendering (sequential or interlaced), and the semantics contained by the ancillary chunks.
This definition allows the execution of the above-mentioned image reduction operations, and the recompression of IDAT. It also allows the alteration or the elimination of other pieces of information that are technically valid, but have no influence on any presentation of the image pixels:

If any of the discardable information is important in a particular application, and lossless PNG optimization is still desirable, it is recommended to store this information in ancillary chunks, rather than hacking it into critical chunks. For example, if sterile palette entries are necessary (e.g. for later editing stages), it is recommended to store them inside a suggested palette (sPLT) chunk, rather than keeping them inside PLTE.
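The pixel-related part of the definition of a lossless transform given earlier can be checked mechanically: two encodings are equivalent if they render to the same RGB triples and the same alpha samples. The sketch below assumes the samples have already been decoded; the function names are hypothetical, and the rendering order and the ancillary chunks are not covered.

```python
def render_palette(indices, plte, trns=()):
    """Expand palette indices to RGBA; entries without a tRNS value are opaque."""
    return [plte[i] + ((trns[i] if i < len(trns) else 255),) for i in indices]

def render_gray(samples, bit_depth):
    """Expand grayscale samples of depth 1, 2, 4 or 8 to opaque 8-bit RGBA."""
    scale = 255 // ((1 << bit_depth) - 1)
    return [(s * scale,) * 3 + (255,) for s in samples]

# The same 4-pixel image stored as 1-bit grayscale and as a 2-entry palette.
as_gray = render_gray([0, 1, 1, 0], bit_depth=1)
as_palette = render_palette([0, 1, 1, 0], [(0, 0, 0), (255, 255, 255)])
print(as_gray == as_palette)  # True

# A sterile (unused) palette entry does not change the rendering.
with_sterile = render_palette([0, 1, 1, 0],
                              [(0, 0, 0), (255, 255, 255), (12, 34, 56)])
print(with_sterile == as_palette)  # True
```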

5. Selective bibliography

Besides the discussed specifications, the references below provide essential information necessary to comprehend the contents of this article.


Copyright © 2003-2008 Cosmin Truţa. Permission to distribute freely.
Appeared: 7 Apr 2003.
Last updated: 10 May 2008.