A guide to PNG optimization

1. Background

1.1 The PNG file format

Portable Network Graphics (PNG) is a format for storing compressed raster graphics. Its compression engine is based on the Deflate method [RFC1951], designed by PKWare and originally used in PKZIP.

The PNG format is defined by the PNG Specification. This specification was developed by an ad-hoc group named the PNG Development Group, and it is both an International Standard (published under the formal name ISO/IEC 15948) and a W3C Recommendation.

PNG was initially intended as a superior, patent-free replacement for GIF. The final outcome is a modern, extensible, reliable image format, capable of handling an impressive number of image types (from 1-bit black-and-white images up to 48-bit RGB images with a full 16-bit alpha channel), and powered by a significantly stronger lossless compression engine (typically 5-25% better than GIF).

Unlike in other lossless compression schemes, the result of PNG compression does not depend solely on the statistics of the input; it may vary within wide limits, depending on the compressor's implementation. A good PNG encoder must be able to make informed decisions about the factors that affect the size of the output. The purpose of this article is to provide information about these factors, and to give advice on implementing efficient PNG encoders.

1.2 The PNG compression

The PNG compression works in a pipeline manner.

In the first stage, the image pixels are passed through a lossless arithmetic transformation named delta filtering, or simply filtering, and sent further as a (filtered) byte sequence. Filtering does not compress or otherwise reduce the size of the data, but it makes the data more compressible.

In the second stage, the filtered byte sequence is passed through the Ziv-Lempel algorithm (LZ77), producing LZ77 codes that are further compressed by the Huffman algorithm in the third and final stage. The combination of the last two stages is referred to as the Deflate compression, a widely-used, patent-free algorithm for universal, lossless data compression. The maximum size of the LZ77 sliding window in Deflate is 32768 bytes, and the LZ77 matches can be between 3 and 258 bytes long.
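
As a rough illustration of the second and third stages, the sketch below runs Deflate through Python's standard zlib module with the largest window (wbits=15, i.e. 2^15 = 32768 bytes). The helper names deflate and noise, and the test data, are made up for this example. It only shows that a repetition is exploited when it lies within the sliding window, and missed when it lies beyond it.

```python
import random
import zlib

def deflate(data, wbits=15):
    """Compress with a 2**wbits-byte LZ77 window (15 is the Deflate maximum)."""
    comp = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return comp.compress(data) + comp.flush()

def noise(n, seed=0):
    """Return n pseudo-random bytes; on their own they are incompressible."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))

near = noise(20000) * 2   # the copy starts 20000 bytes back: inside the window
far = noise(40000) * 2    # the copy starts 40000 bytes back: outside the window

print(len(near), len(deflate(near)))  # the second half is coded as LZ77 matches
print(len(far), len(deflate(far)))    # no match is reachable, so nothing is saved
```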

A complete description of the PNG compression is beyond the scope of this guide. The PNG Specification describes the format completely, and provides a complete list of references to the underlying technologies.

2. Factors that affect the PNG file size

Like any other compression scheme, PNG compression depends on the statistics of the input data. In addition, it depends on the following PNG-specific parameters:

The PNG image type
The PNG delta filters
The strategy of searching LZ77 matches
The size of the Huffman buffers inside the Deflate encoder

Depending on how these parameters are chosen by the implementation, PNG compression may vary within wide limits. The process of selecting the best configuration is computationally infeasible, but heuristics to select a satisfactory configuration are available. The problem of improving these heuristics constitutes an interesting subject for research.

2.1 The PNG image type

The type of a PNG image is defined in the IHDR image header. The image has a certain bit depth, up to 16 bits per sample, and a certain color type, from Grayscale to RGB+Alpha. If two PNG files of different types represent exactly the same image, each file can be regarded as a lossless transformation of the other. A lossless transformation can reduce the uncompressed stream, and such a transformation is named image reduction. In most cases, image reductions are capable of reducing the compressed stream (which is, in fact, our interest), as an indirect effect of reducing the size of the compressor's input.

The possible image reductions are:

Bit depth reduction
The bit depth can be reduced to a minimum value that is acceptable for all samples. For example, if all sample values in a 16-bit image have the form (256+1)*n, (e.g. #0000, #2323, #FFFF), then the bit depth can be reduced to 8, and the new sample values will become n, (e.g. #00, #23, #FF).
Color type reduction
- If an RGB image has 256 distinct colors or fewer, it can be re-encoded as a Palette image.
- If an RGB or Palette image has only gray pixels, it can be re-encoded as Grayscale.
A color type reduction can also enable a bit depth reduction.
Color palette reduction
If the color palette contains redundant entries (i.e. duplicate entries that indicate the same RGB value) or sterile entries (i.e. entries that do not have a correspondent in the raw pixel data), these entries can be removed.
A color palette reduction can also enable a bit depth reduction.
Alpha channel reduction
If all pixels in a Grayscale+Alpha or an RGB+Alpha image are fully opaque (i.e. all alpha components are equal to 2^bitdepth - 1), or if the transparency information can be stored entirely in a (much cheaper) tRNS chunk, the alpha channel can be stripped.
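
Two of these reductions can be sketched in a few lines of Python. The function names and the data layout (plain lists of samples, RGB tuples and palette indices) are assumptions made for this illustration, and the palette routine deliberately ignores tRNS: two entries with the same RGB value but different alpha are not redundant and would need separate handling.

```python
def reduce_16_to_8(samples):
    """Return 8-bit samples if every 16-bit sample has the form 257*n, else None."""
    if all(s % 257 == 0 for s in samples):
        return [s // 257 for s in samples]
    return None

def reduce_palette(palette, indices):
    """Drop duplicate and unused palette entries, remapping the pixel indices.

    palette is a list of (r, g, b) tuples; indices holds one entry per pixel.
    Entries are kept in order of first use. tRNS is NOT handled here.
    """
    new_palette = []
    position = {}          # RGB triple -> index in new_palette
    new_indices = []
    for i in indices:
        rgb = palette[i]
        if rgb not in position:
            position[rgb] = len(new_palette)
            new_palette.append(rgb)
        new_indices.append(position[rgb])
    return new_palette, new_indices

print(reduce_16_to_8([0x0000, 0x2323, 0xFFFF]))
print(reduce_palette([(0, 0, 0), (255, 0, 0), (0, 0, 0), (0, 255, 0)],
                     [2, 1, 0, 1]))
```

In the second call the palette shrinks from four entries to two, so the indices would fit in 1 bit per pixel instead of 2, which is the kind of follow-up bit depth reduction mentioned above.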

There are, however, a few cases when some image type reductions do not necessarily lead to the reduction of the compressed stream. The PNG-Tech site contains experimental analyses of these possibilities; for example, see the article 8 bits per pixel in paletted images.

Interlacing, useful for a faster, progressive rendering, is another component of the PNG image type that affects compression. In an interlaced stream, the samples corresponding to neighboring pixels are stored far apart, hence the data is less correlated and less compressible. Unlike in JPEG, where interlacing may improve the compression slightly, PNG interlacing degrades the compression significantly.

2.2 The PNG delta filters

The role of filtering can be illustrated in the following example. Assume the sequence 2, 3, 4, 5, 6, 7, 8, 9. Although it has much redundancy, the sequence is not compressible by a Ziv-Lempel compressor, nor by a Huffman compressor. However, if one makes a simple and reversible transformation, replacing each value with the numerical difference between it and the value to its left, the sequence becomes 2, 1, 1, 1, 1, 1, 1, 1, which is highly compressible.
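
The effect can be reproduced with Python's standard zlib module. This is only a sketch: the helpers delta and undelta are names invented here, arithmetic is done modulo 256 as for bytes, and a longer ramp (0 to 255) is used for the size comparison because eight bytes are too few to measure anything.

```python
import zlib

def delta(values):
    """Replace each value with its difference from the value to its left."""
    out, prev = [], 0
    for v in values:
        out.append((v - prev) % 256)
        prev = v
    return out

def undelta(deltas):
    """Inverse of delta(): a running sum modulo 256."""
    out, prev = [], 0
    for d in deltas:
        prev = (prev + d) % 256
        out.append(prev)
    return out

print(delta([2, 3, 4, 5, 6, 7, 8, 9]))   # the example from the text

ramp = list(range(256))                   # 256 distinct, steadily rising values
plain = zlib.compress(bytes(ramp), 9)
filtered = zlib.compress(bytes(delta(ramp)), 9)
print(len(plain), len(filtered))          # unfiltered vs. filtered size in bytes
```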

The PNG format employs five types of filters: None, Sub (which uses the left neighbor), Up, Average, and Paeth. The first filter leaves the original data intact, and the other four subtract from each pixel a value that involves the neighboring pixels to the left, above, and/or to the upper left.

A filter is assigned to each row and applied to all pixels in that row. Therefore, an image can be delta-filtered in a huge number of possible configurations (5^height), and each configuration leads to a different compressed output. Two different filter configurations may differ in compressed file size by a factor of two or more, so a careful choice of filters is of paramount importance.

It is possible to apply a single filter to all rows, or to apply different filters to different rows. In the former case, the filtering process is fixed; in the latter, it is adaptive.

While an exhaustive search is unfeasible, the PNG Specification suggests a heuristic filtering strategy:

If the image type is Palette, or the bit depth is smaller than 8, then do not filter the image (i.e. use fixed filtering, with the filter None).
Otherwise (the image type is Grayscale or RGB, with or without Alpha, and the bit depth is at least 8), use adaptive filtering as follows: independently for each row, apply all five filters and select the filter that produces the smallest sum of absolute values per row.
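
A minimal sketch of the adaptive rule follows. It assumes filter types numbered 0 to 4 (None, Sub, Up, Average, Paeth in the PNG Specification, where Sub is the filter that subtracts the left neighbor), rows given as bytes, and bpp as the number of bytes per complete pixel. The function names are illustrative, and filtered bytes are interpreted as signed values when summing, as the heuristic requires.

```python
def paeth(a, b, c):
    """Paeth predictor: pick whichever of left (a), up (b), upper-left (c)
    is closest to a + b - c, preferring a, then b, on ties."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c

def filter_row(ftype, row, prev, bpp):
    """Apply one filter type to a row; prev is the unfiltered row above
    (all zeros for the first row)."""
    out = bytearray(len(row))
    for i, x in enumerate(row):
        a = row[i - bpp] if i >= bpp else 0
        b = prev[i]
        c = prev[i - bpp] if i >= bpp else 0
        if ftype == 0:
            pred = 0
        elif ftype == 1:
            pred = a
        elif ftype == 2:
            pred = b
        elif ftype == 3:
            pred = (a + b) // 2
        else:
            pred = paeth(a, b, c)
        out[i] = (x - pred) & 0xFF
    return bytes(out)

def abs_sum(filtered):
    """Sum of absolute values, reading each byte as a signed quantity."""
    return sum(v if v < 128 else 256 - v for v in filtered)

def choose_filter(row, prev, bpp):
    """Return (filter type, filtered row) with the smallest absolute sum;
    ties go to the lower-numbered filter."""
    best = None
    for ftype in range(5):
        filtered = filter_row(ftype, row, prev, bpp)
        score = abs_sum(filtered)
        if best is None or score < best[0]:
            best = (score, ftype, filtered)
    return best[1], best[2]

print(choose_filter(bytes([10, 20, 30, 40]), bytes(4), 1))
```

A complete encoder would call choose_filter once per row, passing the previous unfiltered row, and would skip the search entirely (always using filter 0) for Palette images or bit depths below 8.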

Cases where the above heuristics are less than optimal are shown on the PNG-Tech site; for example, see Brute-force vs. heuristic filtering.

2.3 The strategy of searching LZ77 matches

The Ziv-Lempel algorithm works under the assumption that contiguous sequences appear repeatedly in the input stream. If the sequence to be encoded matches one or more sequences already present in the sliding history window, the encoder sends an LZ77 pair (distance, length) that points to the closest match. In most LZ77 incarnations, including Deflate, smaller distance codes are encoded more concisely.

In Deflate, in particular, the regular (non-matched) symbols, and the match lengths, are sent to the same Huffman coder, while the match distances are sent to a separate Huffman coder. If the LZ77 matches fall between the accepted boundaries (i.e. they are not shorter than 3 and not longer than 258), a greedy strategy will accept them as a replacement for the symbols to which they correspond.

The greedy strategy is preferable when compressing text files, or many types of binary files, but it may be suboptimal when compressing filtered data, such as the byte strings that come from a PNG filter. Filtered data consist mostly of small values with a pseudo-random distribution. Therefore, in certain situations, it may be desirable to favor the encoding of individual symbols, even if matches that may replace these symbols exist.

The zlib Reference Library is a reference implementation of Deflate, which is further used by the PNG Reference Library. By default, zlib selects the greedy strategy, but the user is able to specify his or her custom preference via the strategy parameter. This parameter can take one of the following values:
- Z_DEFAULT_STRATEGY = 0, the default greedy search strategy.
- Z_FILTERED = 1, a strategy in which the matches are accepted only if their length is 6 or bigger.
- Z_HUFFMAN_ONLY = 2, a fast strategy in which the Ziv-Lempel algorithm is entirely bypassed, and all the symbols from the input are encoded directly by the Huffman coder.
- Z_RLE = 3 (appeared in the zlib-1.2.x series), a fast strategy in which the LZ77 algorithm is essentially reduced to the Run-Length Encoding algorithm. In other words, the matches are accepted only if their distance is 1. For example, the 10-symbol sequence "aaaaaaaaaa" can be LZ77-encoded as ['a', (distance=1, length=9)]; by removing distance=1 from the picture, this encoding can be regarded as a peculiar run-length encoding (which differs from the classic RLE by using length=9 instead of count=10).
The strategy parameter affects only the compression ratio. It does not affect the correctness of the compressed output, even if it is set to an inappropriate value.
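
The strategies can be compared from Python, whose standard zlib module exposes the same parameter through zlib.compressobj. This is a sketch under a few assumptions: the helper deflate and the sample input are invented for the example, and Z_RLE is looked up with a fallback to its numeric value 3 in case the module in use does not export the constant.

```python
import zlib

STRATEGIES = {
    "Z_DEFAULT_STRATEGY": zlib.Z_DEFAULT_STRATEGY,
    "Z_FILTERED": zlib.Z_FILTERED,
    "Z_HUFFMAN_ONLY": zlib.Z_HUFFMAN_ONLY,
    "Z_RLE": getattr(zlib, "Z_RLE", 3),   # fallback value is an assumption
}

def deflate(data, strategy, level=9, mem_level=8):
    """Compress data with the given zlib strategy."""
    comp = zlib.compressobj(level, zlib.DEFLATED, 15, mem_level, strategy)
    return comp.compress(data) + comp.flush()

sample = b"a" * 1000      # a long run, the best case for Z_RLE
sizes = {name: len(deflate(sample, s)) for name, s in STRATEGIES.items()}
for name, size in sizes.items():
    print(name, size)
```

Every output decompresses to the same bytes; only the sizes differ. On this input Z_HUFFMAN_ONLY is expected to do worst, since it cannot replace the run with a match.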

It was experimentally observed that the LZ77 search is occasionally capable of producing smaller PNGs if it is less exhaustive. The reason behind this fact resides in the same category of "strategic searches" discussed here. Unfortunately, there is no known method of anticipating which search level (from the fastest and the least exhaustive, to the slowest and the most exhaustive) is better, other than assuming "the most exhaustive is better in most cases".

Unfortunately, even a "filtered" strategy does not always produce better results than a "greedy" strategy on filtered input, and the only known method to obtain the best combination is by multiple trials. Experiments and measurements can, again, be found on the PNG-Tech site; for example, see the original Z_RLE strategy proposal.

2.4 The size of Huffman buffers

As mentioned earlier, the entropy encoder inside the Deflate method is the static Huffman algorithm. The output of LZ77 is fed into a buffer which is occasionally flushed by sending a static Huffman tree followed by all the Huffman codes, to the output of Deflate. After this, both the buffer and the Huffman tree are reset, waiting for the subsequent LZ77 codes to come and refill the buffer.

The Deflate specification refers to dynamic Huffman codes. However, this is a misnomer, in which the term dynamic is used in contrast to the fixed Huffman codes. The fixed Huffman codes are simply built according to a predefined Huffman tree, without regard to the actual symbol frequencies. The dynamic Huffman codes referred to by the Deflate specification are NOT built by the dynamic Huffman algorithm, as defined, for example, by Faller, Gallager and Knuth (the FGK algorithm), or by Vitter (the V algorithm). The predefined Huffman tree was introduced in PKZIP as a fast compression alternative, but it produces poor results even on text, and it is almost useless in PNG compression. Still, a PNG stream that contains codes built by the fixed (predefined) Huffman tree, is a valid stream, and a compliant PNG reader must decode this stream correctly.
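
The gap between fixed and per-block ("dynamic") Huffman codes can be observed with zlib's Z_FIXED strategy, which forces the predefined tree. The sketch below is illustrative only: the skewed test data merely imitates filtered image bytes, and Z_FIXED is looked up with a fallback to its numeric value 4 in case the Python zlib module in use does not export the constant.

```python
import random
import zlib

Z_FIXED = getattr(zlib, "Z_FIXED", 4)     # fallback value is an assumption

def deflate(data, strategy):
    """Compress at level 9 with the given strategy."""
    comp = zlib.compressobj(9, zlib.DEFLATED, 15, 8, strategy)
    return comp.compress(data) + comp.flush()

# Skewed data: mostly zeros and small positive/negative differences.
rng = random.Random(1)
data = bytes(rng.choices([0, 1, 255, 2, 254], weights=[50, 20, 20, 5, 5], k=20000))

dynamic = deflate(data, zlib.Z_DEFAULT_STRATEGY)  # tree built from block statistics
fixed = deflate(data, Z_FIXED)                    # predefined tree only
print(len(dynamic), len(fixed))
```

Both streams are valid and decode to the same bytes; the one built with the predefined tree is simply expected to be larger on data like this.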

It is desirable to establish the buffer boundaries so that sequences conforming to the same probability model fit in the same Huffman buffer. Methods for approaching these boundaries exist, but they are not used in the mainstream Deflate implementation(s). Instead, the buffers are flushed when a limit (typically, 16k LZ77 codes) is reached. This is, however, a fast approach, and the results are satisfactory.

The size of Huffman buffers is indirectly determined by the encoder's memory (usage) level. For this reason, certain memory levels might be good for certain types of images.
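
In zlib the memory level is the memLevel parameter (1 to 9, default 8), and the default corresponds to the 16k-code buffer mentioned above. The following sketch compresses the same data at every memory level. The helper name and the synthetic input, whose statistics change halfway through, are assumptions made for the illustration, and no particular ordering of the resulting sizes should be expected.

```python
import random
import zlib

def deflate_mem(data, mem_level):
    """Compress at level 9 with the given zlib memory level (1..9)."""
    comp = zlib.compressobj(9, zlib.DEFLATED, 15, mem_level)
    return comp.compress(data) + comp.flush()

# Two segments that follow different probability models.
rng = random.Random(2)
part_a = bytes(rng.choices([0, 1, 255], weights=[8, 1, 1], k=30000))
part_b = bytes(rng.choices(range(64), k=30000))
data = part_a + part_b

sizes = {m: len(deflate_mem(data, m)) for m in range(1, 10)}
for mem_level, size in sizes.items():
    print(mem_level, size)
```

Note that in zlib the memory level also sizes the match-finding hash table, so a size difference between levels is not due to the Huffman buffer alone.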

3. PNG (lossless) optimization programs

A large number of PNG encoding programs are listed at http://www.libpng.org/pub/png/pngapps.html. Their performance varies as much as the range of possible compression ratios; the good encoders apply at least the filtering heuristics described briefly in the PNG Specification and illustrated above.
Some programs gain extra compression by discarding some of the data in the input images (so these programs are lossy!).

This section contains the small list of PNG optimization programs that show a particular concern towards obtaining a file size as small as possible. They work by performing repeated compression trials, applying various parameter sets, and selecting the parameter set that yields the smallest compressed output.
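
The trial-based approach can be sketched as a small search over zlib parameters. This is not the algorithm of any of the programs listed below, only an illustration of the idea: the function name best_deflate and the default parameter grid are arbitrary choices, and a real optimizer would also vary the PNG filters and then rewrite the IDAT chunks.

```python
import itertools
import zlib

STRATEGIES = (
    zlib.Z_DEFAULT_STRATEGY,
    zlib.Z_FILTERED,
    zlib.Z_HUFFMAN_ONLY,
    getattr(zlib, "Z_RLE", 3),   # fallback value is an assumption
)

def best_deflate(data, levels=(9,), mem_levels=(8, 9), strategies=STRATEGIES):
    """Try every parameter combination and keep the smallest output.

    Returns (compressed bytes, dict of the winning parameters).
    """
    best = None
    for level, mem_level, strategy in itertools.product(levels, mem_levels, strategies):
        comp = zlib.compressobj(level, zlib.DEFLATED, 15, mem_level, strategy)
        out = comp.compress(data) + comp.flush()
        if best is None or len(out) < len(best[0]):
            params = {"level": level, "memLevel": mem_level, "strategy": strategy}
            best = (out, params)
    return best

data = bytes((i * i) % 7 for i in range(50000))   # stand-in for filtered pixels
out, params = best_deflate(data)
print(len(out), params)
```

The default grid above runs 8 trials; widening levels and mem_levels turns it into the kind of brute-force traversal whose diminishing returns are discussed below.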

pngrewrite by Jason Summers, available at http://www.pobox.com/~jason1/pngrewrite, is an open-source program that performs lossless image reductions. It works best in conjunction with pngcrush (see below); the user should run pngcrush after pngrewrite.

pngcrush by Glenn Randers-Pehrson, available at http://pmt.sourceforge.net/pngcrush, is an open-source program that iterates over PNG filters and zlib (Deflate) parameters, compresses the image repeatedly using each parameter configuration, and chooses the configuration that yields the smallest compressed (IDAT) output. At the user's option, the program can explore few (below 10) or many (a brute-force traversal over more than 100) configurations. The method of selecting the parameters for "few" trials is particularly effective, and the use of a brute-force traversal is generally not recommended.

In addition, pngcrush offers a multitude of extra features, such as recovery of erroneous PNG files (e.g. files containing bad CRCs), and chunk-level editing of PNG meta-data.

OptiPNG by Cosmin Truţa, available at http://www.cs.toronto.edu/pngtech/optipng, is a newer open-source program, inspired by pngcrush, but designed to be more flexible and to run faster. Unlike pngcrush, OptiPNG performs the trials entirely in memory, and writes only the final output file to the disk. Moreover, it offers multiple optimization presets to the user, who can choose among a range of options from "very few trials" to "very many trials" (in contrast to the coarser "smart vs. brute" option offered by pngcrush).

It is important to mention that the achieved compression ratio is less and less likely to improve when higher-level presets (triggering more trials) are used. Even if the program is capable of searching automatically over more than 200 configurations (and the advanced users have access to more than 1000 configurations!), a preset that selects around 10 trials should be satisfactory for most users. Furthermore, a preset that selects between 30 and 40 trials should be satisfactory for all users, for it is very, very unlikely to be beaten significantly by any wider search. The rest of the trial configurations are offered rather as a curiosity (but they were used in the experimentation from which we concluded they are indeed useless!).

AdvanceCOMP by Andrea Mazzoleni is a set of tools for optimizing ZIP/GZIP, PNG and MNG files, based on the powerful 7-Zip deflation engine. The name of the PNG optimization tool is AdvPNG. At the time of this writing, AdvPNG does not perform image reductions, so the use of pngrewrite or OptiPNG prior to optimization may be necessary. However, given the effectiveness of 7-Zip deflation, AdvanceCOMP is a powerful contender.

The AdvanceCOMP tool set is a part of the AdvanceMAME project, available at http://advancemame.sourceforge.net.

PNGOut by Ken Silverman, available at http://advsys.net/ken/utils.htm, is a freely-available compiled program (no source code), running on Windows and Linux. According to our tests, the compression ratio achieved by PNGOut is comparable to that of AdvPNG. Unfortunately, due to the lack of information, we cannot say much about this tool.

A nice GUI frontend for PNGOut, named PNGGauntlet, is available at http://www.numbera.com/software/pnggauntlet.aspx.

4. An extra note on losslessness

What is lossless PNG optimization, after all? This is a straightforward question, whose answer is intuitive, yet not so straightforward.

Losslessness in the strictest sense, where no information whatsoever is lost, can only be achieved by leaving the original file (any file) intact, or by transforming it (e.g. compressing it, encrypting it) in such a way that there is an inverse transformation which recovers it completely, bit by bit.

In the case of PNG images, this condition of strict losslessness has little relevance to the casual graphics user, and is, therefore, too strong. There are instances where strict losslessness is required; for example, when handling certified PNG files whose integrity is guaranteed by an external checksum like MD5 or SHA, or by a digital signature such as dSIG. Most of the time, however, it is desirable to relax the notion of PNG losslessness, to the extent of not losing any information that pertains to the rendered image and to the semantic value of the metadata that accompanies the image. This allows the user to concentrate on what is really important when it comes to preserving the contents of a PNG image, and enables the concept of PNG optimization tools.

A lossless transform of a PNG image file is a transform which fully preserves the rendered RGB triples (the RGB triples that come either directly, or from a palette index, or from a gray->RGB expansion), the rendered transparency (the alpha samples that come either directly, or from a tRNS chunk, or the implicit 100% opacity assumed due to the lack of any explicit transparency information), the order of rendering (sequential or interlaced), and the semantics contained by the ancillary chunks.

This definition allows the execution of the above-mentioned image reduction operations, and the recompression of IDAT. It also allows the alteration or the elimination of other pieces of information that are technically valid, but have no influence on any presentation of the image pixels:

The information that pertains to Deflate streams, either inside IDAT, or in other compressed chunks like zTXt, iTXt or iCCP; e.g. the LZ77 window size, the type and size of Deflate blocks, etc. (The only thing that matters is that the decompressed byte sequence must remain the same.)
The order of palette entries inside a PLTE chunk. (When changing this order, the information that depends on it, such as the palette-encoded pixels or the tRNS information, must be updated accordingly.)
RGB triples that do not correspond to any pixel in the actual image, but are stored in a tRNS chunk.
Fully opaque tRNS entries in a palette image.
Gamma correction (gAMA) or significant bit (sBIT) information inside an image that consists exclusively of samples whose intensity is either minimum (0) or maximum (2^bitdepth - 1).
The fact that a textual comment is stored uncompressed in a tEXt chunk, or compressed in a zTXt chunk, or with no translation in an iTXt chunk.
Etcetera.
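
The first item in this list, recompressing IDAT while keeping the decompressed byte sequence unchanged, can be sketched with the standard struct and zlib modules. The function names are invented for this example, error handling is minimal, and all other chunks are copied through untouched.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def read_chunks(png):
    """Yield (type, data) for each chunk, verifying the CRCs."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(png):
        length, ctype = struct.unpack(">I4s", png[pos:pos + 8])
        data = png[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", png[pos + 8 + length:pos + 12 + length])
        if zlib.crc32(ctype + data) != crc:
            raise ValueError("bad CRC in %r chunk" % ctype)
        yield ctype, data
        pos += 12 + length

def make_chunk(ctype, data):
    """Serialize one chunk: length, type, data, CRC."""
    crc = zlib.crc32(ctype + data)
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)

def filtered_bytes(png):
    """Return the decompressed (still filtered) image data from all IDATs."""
    idat = b"".join(d for t, d in read_chunks(png) if t == b"IDAT")
    return zlib.decompress(idat)

def recompress_idat(png, level=9):
    """Rewrite the file with a single, freshly compressed IDAT chunk."""
    raw = filtered_bytes(png)
    out = [PNG_SIGNATURE]
    written = False
    for ctype, data in read_chunks(png):
        if ctype != b"IDAT":
            out.append(make_chunk(ctype, data))
        elif not written:
            out.append(make_chunk(b"IDAT", zlib.compress(raw, level)))
            written = True
    return b"".join(out)
```

For any valid input, filtered_bytes(recompress_idat(png)) equals filtered_bytes(png), which is exactly the condition stated in the first item: the Deflate stream may change, the decompressed byte sequence may not.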

If any of the discardable information is important in a particular application, and lossless PNG optimization is still desirable, it is recommended to store this information in ancillary chunks, rather than hack it inside critical chunks. For example, if sterile palette entries are necessary (e.g. for later editing stages), it is recommended to store them inside a suggested palette (sPLT) chunk, rather than keeping them inside PLTE.

5. Selective bibliography

Besides the discussed specifications, the references below provide essential information necessary to comprehend the contents of this article.

Thomas Boutell, Glenn Randers-Pehrson et al. Portable Network Graphics (PNG) Specification, Second Edition. ISO/IEC 15948:2003(E); W3C Recommendation 10 November 2003.
David A. Huffman. A method for the construction of minimum redundancy codes. In Proceedings of the Institute of Radio Engineers, vol. 40, no. 9, pp. 1098-1101, September 1952.
Jacob Ziv and Abraham Lempel. A universal algorithm for data compression. IEEE Transactions on Information Theory, vol. IT-23, no. 3, pp. 337-343, May 1977.
Due to a historical accident, the famous algorithm is better-known as the "Lempel-Ziv (LZ) algorithm", even though the "Ziv-Lempel algorithm" is a more legitimate name.
Greg Roelofs. PNG: The definitive guide. O'Reilly and Associates, 1999.