Portable Network Graphics (PNG) is a format for storing compressed raster graphics. The compression engine is based on the Deflate method [RFC1951], designed by PKWare and originally used in PKZIP.
The PNG format is defined by the PNG Specification. This specification was developed by an ad-hoc group named the PNG Development Group, and it is both an International Standard (published under the formal name ISO/IEC 15948) and a W3C Recommendation.
PNG was initially intended as a superior, patent-free replacement for GIF. The final outcome is a modern, extensible, reliable image format, capable of handling an impressive number of image types (from 1-bit black-and-white images up to 48-bit RGB images with a full 16-bit alpha channel), and powered by a significantly stronger lossless compression engine (typically 5-25% better than GIF).
Unlike other lossless compression schemes, PNG compression does not depend solely on the statistics of the input; it may vary within wide limits, depending on the compressor's implementation. A good PNG encoder must be able to make informed decisions about the factors that affect the size of the output. The purpose of this article is to provide information about these factors, and to give advice on implementing efficient PNG encoders.
PNG compression works as a pipeline.
In the first stage, the image pixels are passed through a lossless arithmetic transformation named delta filtering, or simply filtering, and sent further as a (filtered) byte sequence. Filtering does not compress or otherwise reduce the size of the data, but it makes the data more compressible.
In the second stage, the filtered byte sequence is passed through the Ziv-Lempel algorithm (LZ77), producing LZ77 codes that are further compressed by the Huffman algorithm in the third and final stage. The combination of the last two stages is referred to as the Deflate compression, a widely-used, patent-free algorithm for universal, lossless data compression. The maximum size of the LZ77 sliding window in Deflate is 32768 bytes, and the LZ77 matches can be between 3 and 258 bytes long.
A complete description of the PNG compression is beyond the scope of this guide. The PNG Specification describes the format completely, and provides a complete list of references to the underlying technologies.
Depending on how these parameters are chosen by the implementation, PNG compression may vary within wide limits. The process of selecting the best configuration is computationally infeasible, but heuristics to select a satisfactory configuration are available. The problem of improving these heuristics constitutes an interesting subject for research.
The type of a PNG image is defined in the IHDR image header. The image has a certain bit depth, up to 16 bits per sample, and a certain color type, from Grayscale to RGB+Alpha. If two PNG files of different types represent exactly the same image, each file can be regarded as a lossless transformation of the other. A lossless transformation can reduce the uncompressed stream, and such a transformation is named image reduction. In most cases, image reductions are capable of reducing the compressed stream (which is, in fact, our interest), as an indirect effect of reducing the size of the compressor's input.
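As an illustration, the sketch below tries two common image type reductions on 8-bit RGB pixels: RGB to grayscale, and RGB to a palette of at most 256 entries. The function names and the tuple-based pixel representation are assumptions made for this example; a real encoder works on packed scanlines and considers more reductions than these two.

```python
def reduce_rgb8(pixels):
    """Try to reduce a list of 8-bit (r, g, b) tuples to a smaller image type.

    Returns (kind, samples, palette); palette is None unless kind == "palette".
    """
    # Grayscale: every pixel has identical R, G and B samples (1 byte per pixel).
    if all(r == g == b for r, g, b in pixels):
        return "grayscale", [r for r, _, _ in pixels], None
    # Palette: at most 256 distinct colors (1 byte per pixel plus the palette).
    colors = sorted(set(pixels))
    if len(colors) <= 256:
        index = {color: i for i, color in enumerate(colors)}
        return "palette", [index[p] for p in pixels], colors
    # No reduction applies: keep 3 bytes per pixel.
    return "rgb", list(pixels), None


def expand(kind, samples, palette):
    """Inverse transformation: recover the original RGB triples."""
    if kind == "grayscale":
        return [(v, v, v) for v in samples]
    if kind == "palette":
        return [palette[i] for i in samples]
    return list(samples)
```

Because `expand` recovers the original triples exactly, each reduction is lossless; it only shrinks the byte sequence handed to the compressor.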
tRNS chunk, the alpha channel can be stripped.
There are, however, a few cases when some image type reductions do not necessarily lead to the reduction of the compressed stream. The PNG-Tech site contains experimental analyses of these possibilities; for example, see the article 8 bits per pixel in paletted images.
Interlacing, useful for a faster, progressive rendering, is another component of the PNG image type that affects compression. In an interlaced stream, the samples corresponding to neighboring pixels are stored far apart, hence the data is less correlated and less compressible. Unlike JPEG, where interlacing may improve the compression slightly, PNG interlacing degrades the compression significantly.
The role of filtering can be illustrated in the following example. Assume the sequence 2, 3, 4, 5, 6, 7, 8, 9. Although it has much redundancy, the sequence is not compressible by a Ziv-Lempel compressor, nor by a Huffman compressor. However, if one makes a simple and reversible transformation, replacing each value with the numerical difference between it and the value to its left, the sequence becomes 2, 1, 1, 1, 1, 1, 1, 1, which is highly compressible.
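The example can be reproduced with a short sketch. It assumes byte-valued samples with differences taken modulo 256, and uses Python's standard zlib module as the Deflate compressor; the function names are hypothetical.

```python
import zlib


def delta_encode(values):
    """Replace each byte with its difference from the byte to its left."""
    out, prev = [], 0
    for v in values:
        out.append((v - prev) % 256)
        prev = v
    return out


def delta_decode(deltas):
    """Inverse of delta_encode: a running sum modulo 256."""
    out, prev = [], 0
    for d in deltas:
        prev = (prev + d) % 256
        out.append(prev)
    return out


sequence = [2, 3, 4, 5, 6, 7, 8, 9]
filtered = delta_encode(sequence)      # [2, 1, 1, 1, 1, 1, 1, 1]

# The same effect on a longer ramp, measured with Deflate.
ramp = bytes(range(256))
raw_size = len(zlib.compress(ramp, 9))
filtered_size = len(zlib.compress(bytes(delta_encode(ramp)), 9))
```

On the ramp, `filtered_size` comes out far smaller than `raw_size`, although the filter itself leaves the byte count unchanged.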
The PNG format employs five types of filters: None, Left (named Sub in the PNG Specification), Up, Average, and Paeth. The first filter leaves the original data intact, and the other four subtract from each pixel a value that involves the neighboring pixels to the left, above, and/or to the upper left.
A certain filter is assigned to each row, and is applied to all pixels from that row. Therefore, an image can be delta-filtered in a huge number of possible configurations (5 ^ height), and each configuration leads to a different compressed output. Two different filter configurations may lead to compressed file sizes that differ by a considerable factor, so a careful choice of filters is of paramount importance.
It is possible to apply a single filter to all rows, or to apply different filters to different rows. In the former case, the filtering process is fixed; in the latter, it is adaptive.
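The five filters, and one way of choosing among them adaptively, can be sketched as follows. The filter arithmetic follows the definitions in the PNG Specification (filter type bytes 0 to 4); the selection rule is the "minimum sum of absolute differences" heuristic suggested there, shown as one possible choice rather than the best one. The function names, and the convention that `prev` is an all-zero row for the first scanline, are assumptions of this sketch.

```python
def paeth(a, b, c):
    """Paeth predictor: pick whichever of left (a), up (b), upper-left (c)
    is closest to the initial estimate a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def _predict(ftype, a, b, c):
    # 0 = None, 1 = Left (Sub), 2 = Up, 3 = Average, 4 = Paeth
    return (0, a, b, (a + b) // 2, paeth(a, b, c))[ftype]


def filter_row(ftype, row, prev, bpp):
    """Filter one scanline; bpp is the number of bytes per complete pixel."""
    out = bytearray(len(row))
    for i, x in enumerate(row):
        a = row[i - bpp] if i >= bpp else 0
        c = prev[i - bpp] if i >= bpp else 0
        out[i] = (x - _predict(ftype, a, prev[i], c)) & 0xFF
    return bytes(out)


def unfilter_row(ftype, filtered, prev, bpp):
    """Inverse of filter_row, using the bytes reconstructed so far."""
    row = bytearray(len(filtered))
    for i, x in enumerate(filtered):
        a = row[i - bpp] if i >= bpp else 0
        c = prev[i - bpp] if i >= bpp else 0
        row[i] = (x + _predict(ftype, a, prev[i], c)) & 0xFF
    return bytes(row)


def choose_filter(row, prev, bpp):
    """Adaptive choice: the filter whose output, read as signed bytes,
    has the smallest sum of absolute values."""
    def cost(ftype):
        return sum(min(v, 256 - v) for v in filter_row(ftype, row, prev, bpp))
    return min(range(5), key=cost)
```

Fixed filtering corresponds to calling `filter_row` with the same `ftype` for every row; adaptive filtering calls `choose_filter` once per row.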
Cases where the above heuristics are less than optimal are shown on the PNG-Tech site; for example, see Brute-force vs. heuristic filtering.
The Ziv-Lempel algorithm works under the assumption that contiguous sequences appear repeatedly in the input stream. If the sequence to be encoded matches one or more sequences already present in the sliding history window, the encoder sends an LZ77 pair (distance, length) that points to the closest match. In most LZ77 incarnations, including Deflate, smaller distance codes are encoded more concisely.
In Deflate, in particular, the regular (non-matched) symbols, and the match lengths, are sent to the same Huffman coder, while the match distances are sent to a separate Huffman coder. If the LZ77 matches fall between the accepted boundaries (i.e. they are not shorter than 3 and not longer than 258), a greedy strategy will accept them as a replacement for the symbols to which they correspond.
The greedy strategy is preferable when compressing text files, or many types of binary files, but it may be suboptimal when compressing filtered data, such as the byte strings that come from a PNG filter. Filtered data consist mostly of small values with a pseudo-random distribution. Therefore, in certain situations, it may be desirable to favor the encoding of individual symbols, even if matches that may replace these symbols exist.
The zlib Reference Library is a reference implementation of Deflate, which is further used by the PNG Reference Library. By default, zlib selects the greedy strategy, but the user can specify a different preference via the strategy parameter. This parameter can take one of the following values:
- Z_DEFAULT_STRATEGY = 0, the default greedy search strategy.
- Z_FILTERED = 1, a strategy in which the matches are accepted only if their length is 6 or bigger.
- Z_HUFFMAN_ONLY = 2, a fast strategy in which the Ziv-Lempel algorithm is entirely bypassed, and all the symbols from the input are encoded directly by the Huffman coder.
- Z_RLE = 3 (appeared in the zlib-1.2.x series), a fast strategy in which the LZ77 algorithm is essentially reduced to the Run-Length Encoding algorithm. In other words, the matches are accepted only if their distance is 1. For example, the 10-symbol sequence "aaaaaaaaaa" can be LZ77-encoded as ['a', (distance=1, length=9)]; by removing distance=1 from the picture, this encoding can be regarded as a peculiar run-length encoding (which differs from the classic RLE by using length=9 instead of count=10).
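The run-length example in the last item can be made concrete with a toy encoder and decoder. This is a sketch of the idea only, not zlib's implementation: tokens are Python objects rather than Huffman-coded bits, and the function names are hypothetical. The length limits (3 to 258) are the Deflate limits quoted earlier.

```python
def lz77_decode(tokens):
    """Decode a list of literals and (distance, length) pairs.

    Copying symbol by symbol lets a match overlap its own output,
    which is what makes distance=1 behave like a run-length code.
    """
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            distance, length = token
            for _ in range(length):
                out.append(out[-distance])
        else:
            out.append(token)
    return "".join(out)


def rle_as_lz77(text, min_len=3, max_len=258):
    """Encode runs as a literal followed by (distance=1, length) pairs."""
    tokens, i = [], 0
    while i < len(text):
        j = i + 1
        while j < len(text) and text[j] == text[i]:
            j += 1
        tokens.append(text[i])        # first symbol of the run, as a literal
        run = j - i - 1               # repetitions still to be encoded
        while run >= min_len:
            n = min(run, max_len)
            tokens.append((1, n))
            run -= n
        tokens.extend(text[i] * run)  # leftovers too short for a match
        i = j
    return tokens
```

For the sequence in the text, `rle_as_lz77("aaaaaaaaaa")` yields one literal followed by the pair (1, 9).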
The strategy parameter affects only the compression ratio. It does not affect the correctness of the compressed output, even if it is set to an inappropriate value.
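A quick way to observe this is to compress the same bytes under each strategy and verify that every output inflates back to the input. The sketch uses Python's standard zlib binding, whose `compressobj` accepts the strategy as its fifth argument; the numeric values are those listed above, and the sample data is an arbitrary stand-in for filtered image bytes.

```python
import zlib

# Numeric strategy values as listed in the text above.
STRATEGIES = {
    "Z_DEFAULT_STRATEGY": 0,
    "Z_FILTERED": 1,
    "Z_HUFFMAN_ONLY": 2,
    "Z_RLE": 3,
}


def deflate(data, strategy, level=9, mem_level=8):
    """Compress with an explicit strategy (window size fixed at 2**15 bytes)."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, 15, mem_level, strategy)
    return compressor.compress(data) + compressor.flush()


def sizes_by_strategy(data):
    """Compressed size in bytes for each strategy."""
    return {name: len(deflate(data, value)) for name, value in STRATEGIES.items()}


# Arbitrary sample: mostly small values, loosely imitating filtered data.
sample = bytes((i * i) % 7 for i in range(5000))
sizes = sizes_by_strategy(sample)
```

Which strategy gives the smallest size depends on the data; only the round trip is guaranteed.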
It was experimentally observed that the LZ77 search is occasionally capable of producing smaller PNGs if it is less exhaustive. The reason behind this fact resides in the same category of "strategic searches" discussed here. Unfortunately, there is no known method of anticipating which search level (from the fastest and least exhaustive to the slowest and most exhaustive) is better, other than assuming "the most exhaustive is better in most cases".
Unfortunately, even a "filtered" strategy does not always produce better results than a "greedy" strategy on filtered input, and the only known method to obtain the best combination is by multiple trials. Experiments and measurements can, again, be found on the PNG-Tech site; for example, see the original Z_RLE strategy proposal.
As mentioned earlier, the entropy encoder inside the Deflate method is the static Huffman algorithm. The output of LZ77 is fed into a buffer which is occasionally flushed by sending a static Huffman tree followed by all the Huffman codes, to the output of Deflate. After this, both the buffer and the Huffman tree are reset, waiting for the subsequent LZ77 codes to come and refill the buffer.
The Deflate specification refers to dynamic Huffman codes. However, this is a misnomer, in which the term dynamic is used in contrast to the fixed Huffman codes. The fixed Huffman codes are simply built according to a predefined Huffman tree, without regard to the actual symbol frequencies. The dynamic Huffman codes referred to by the Deflate specification are NOT built by the dynamic Huffman algorithm, as defined, for example, by Faller, Gallager and Knuth (the FGK algorithm), or by Vitter (the V algorithm). The predefined Huffman tree was introduced in PKZIP as a fast compression alternative, but it produces poor results even on text, and it is almost useless in PNG compression. Still, a PNG stream that contains codes built by the fixed (predefined) Huffman tree is a valid stream, and a compliant PNG reader must decode this stream correctly.
It is desirable to establish the buffer boundaries so that sequences conforming to the same probability model are fit in the same Huffman buffer. Methods for approaching these boundaries exist, but they are not used in the mainstream Deflate implementation(s). Instead, the buffers are flushed when a limit (typically, 16k LZ77 codes) is reached. This is, however, a fast approach, and the results are satisfactory.
The size of Huffman buffers is indirectly determined by the encoder's memory (usage) level. For this reason, certain memory levels might be good for certain types of images.
A large number of PNG encoding programs are listed at http://www.libpng.org/pub/png/pngapps.html. Their performance varies as much as the range of possible compression ratios; the good encoders at least apply the filtering heuristics described briefly in the PNG Specification and illustrated above. Some programs gain extra compression by discarding some of the data in the input images (so these programs are lossy!).
This section contains the small list of PNG optimization programs that show a particular concern towards obtaining a file size as small as possible. They work by performing repeated compression trials, applying various parameter sets, and selecting the parameter set that yields the smallest compressed output.
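The trial-based approach can be sketched in a few lines. The parameter grid below is hypothetical and much smaller than what the tools in this section explore; in particular, it varies only the Deflate parameters (level, memory level, strategy) and takes the filtered byte stream as given, whereas the real optimizers also iterate over PNG filters.

```python
import itertools
import zlib


def best_trial(data, levels=(6, 9), mem_levels=(8, 9), strategies=(0, 1, 2, 3)):
    """Compress data with every parameter combination; keep the smallest output.

    Returns ((level, mem_level, strategy), compressed_bytes).
    """
    best = None
    for level, mem_level, strategy in itertools.product(levels, mem_levels, strategies):
        compressor = zlib.compressobj(level, zlib.DEFLATED, 15, mem_level, strategy)
        output = compressor.compress(data) + compressor.flush()
        if best is None or len(output) < len(best[1]):
            best = ((level, mem_level, strategy), output)
    return best


# Arbitrary stand-in for a filtered IDAT byte stream.
stream = bytes((i % 13) * (i % 3) for i in range(8000))
params, smallest = best_trial(stream)
```

Since the grid includes the default settings (level 9, memory level 8, default strategy), the result can never be larger than what `zlib.compress(stream, 9)` produces; the cost is one full compression pass per trial.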
pngrewrite by Jason Summers, available at http://www.pobox.com/~jason1/pngrewrite, is an open-source program that performs lossless image reductions. It works best in conjunction with pngcrush (see below); the user should run pngcrush after pngrewrite.
pngcrush by Glenn Randers-Pehrson, available at http://pmt.sourceforge.net/pngcrush, is an open-source program that iterates over PNG filters and zlib (Deflate) parameters, compresses the image repeatedly using each parameter configuration, and chooses the configuration that yields the smallest compressed (IDAT) output. At the user's option, the program can explore few (below 10) or many (a brute-force traversal over more than 100) configurations. The method of selecting the parameters for "few" trials is particularly effective, and the use of a brute-force traversal is generally not recommended.
In addition, pngcrush offers a multitude of extra features, such as recovery of erroneous PNG files (e.g. files containing bad CRCs), and chunk-level editing of PNG meta-data.
OptiPNG by Cosmin Truţa, available at http://www.cs.toronto.edu/pngtech/optipng, is a newer open-source program, inspired by pngcrush, but designed to be more flexible and to run faster. Unlike pngcrush, OptiPNG performs the trials entirely in memory, and writes only the final output file to disk. Moreover, it offers multiple optimization presets to the user, who can choose among a range of options from "very few trials" to "very many trials" (in contrast to the coarser "smart vs. brute" option offered by pngcrush).
It is important to mention that the achieved compression ratio is less and less likely to improve when higher-level presets (triggering more trials) are used. Even if the program is capable of searching automatically over more than 200 configurations (and advanced users have access to more than 1000 configurations!), a preset that selects around 10 trials should be satisfactory for most users. Furthermore, a preset that selects between 30 and 40 trials should be satisfactory for all users, for it is very unlikely to be beaten significantly by any wider search. The remaining trial configurations are offered rather as a curiosity (but they were used in the experimentation from which we concluded they are indeed useless!).
AdvanceCOMP by Andrea Mazzoleni is a set of tools for optimizing ZIP/GZIP, PNG and MNG files, based on the powerful 7-Zip deflation engine. The name of the PNG optimization tool is AdvPNG. At the time of this writing, AdvPNG does not perform image reductions, so the use of pngrewrite or OptiPNG prior to optimization may be necessary. However, given the effectiveness of 7-Zip deflation, AdvanceCOMP is a powerful contender.
The AdvanceCOMP tool set is a part of the AdvanceMAME project, available at http://advancemame.sourceforge.net.
PNGOut by Ken Silverman, available at http://advsys.net/ken/utils.htm, is a freely-available compiled program (no source code), running on Windows and Linux. According to our tests, the compression ratio achieved by PNGOut is comparable to that of AdvPNG. Unfortunately, due to the lack of information, we cannot say much about this tool.
A nice GUI frontend for PNGOut, named PNGGauntlet, is available at http://www.numbera.com/software/pnggauntlet.aspx.
What is lossless PNG optimization, after all? This is a straightforward question, whose answer is intuitive, yet not so straightforward.
Losslessness in the strictest sense, where no information whatsoever is lost, can only be achieved by leaving the original file (any file) intact, or by transforming it (e.g. compressing it, encrypting it) in such a way that there is an inverse transformation which recovers it completely, bit by bit.
In the case of PNG images, this condition of strict losslessness has little relevance to the casual graphics user, and is, therefore, too strong. There are instances where strict losslessness is required; for example, when handling certified PNG files whose integrity is guaranteed by an external checksum like MD5 or SHA, or by a digital signature such as dSIG. Most of the time, however, it is desirable to relax the notion of PNG losslessness, to the extent of not losing any information that pertains to the rendered image and to the semantic value of the metadata that accompanies the image. This allows the user to concentrate on what is really important when it comes to preserving the contents of a PNG image, and enables the concept of PNG optimization tools.
A lossless transform of a PNG image file is a transform which fully preserves the rendered RGB triples (the RGB triples that come either directly, or from a palette index, or from a gray->RGB expansion), the rendered transparency (the alpha samples that come either directly, or from a tRNS chunk, or the implicit 100% opacity assumed due to the lack of any explicit transparency information), the order of rendering (sequential or interlaced), and the semantics contained by the ancillary chunks.
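The first two conditions of this definition can be checked mechanically once both files are decoded. The sketch below covers only 8-bit palette and grayscale images, and compares only rendered RGB triples and transparency; the order of rendering and the ancillary chunks are outside its scope. The function names and the list-based image representation are assumptions of this example.

```python
def render_palette(indices, plte, trns=()):
    """Expand palette indices to RGBA. Palette entries beyond the end of
    the tRNS list are fully opaque (alpha 255)."""
    rendered = []
    for i in indices:
        r, g, b = plte[i]
        alpha = trns[i] if i < len(trns) else 255
        rendered.append((r, g, b, alpha))
    return rendered


def render_gray(samples):
    """Expand 8-bit gray samples to opaque RGBA (gray->RGB expansion)."""
    return [(v, v, v, 255) for v in samples]


def same_rendering(rendered_a, rendered_b):
    """True if both images render to identical RGB triples and alpha."""
    return rendered_a == rendered_b


# Original: three palette entries, one of them (index 1) never used.
original = render_palette([0, 2, 2, 0],
                          [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
                          trns=[128])
# Candidate: unused entry dropped, palette reordered, indices and tRNS remapped.
candidate = render_palette([1, 0, 0, 1],
                           [(0, 0, 255), (255, 0, 0)],
                           trns=[255, 128])
```

Here `same_rendering(original, candidate)` holds, so the candidate qualifies as lossless with respect to the rendered pixels even though its palette and index bytes differ.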
IDAT. It also allows the alteration or the elimination of other pieces of information that are technically valid, but have no influence on any presentation of the image pixels:
IDAT, or in other compressed chunks like zTXt, iTXt or iCCP; e.g. the LZ77 window size, the type and size of Deflate blocks, etc. (The only thing that matters is that the decompressed byte sequence must remain the same.)
PLTE chunk. (When changing this order, the information that depends on it, such as the palette-encoded pixels or the tRNS information, must be updated accordingly.)
tRNS chunk.
tRNS entries in a palette image.
gAMA) or significant bit (sBIT) information inside an image that consists exclusively of samples whose intensity is either minimum (0) or maximum (2^bitdepth-1).
tEXt chunk, or compressed in a zTXt chunk, or with no translation in an iTXt chunk.
If any of the discardable information is important in a particular application, and lossless PNG optimization is still desirable, it is recommended to store this information in ancillary chunks, rather than hack it inside critical chunks. For example, if sterile palette entries are necessary (e.g. for later editing stages), it is recommended to store them inside a suggested palette (sPLT) chunk, rather than keeping them inside PLTE.
Besides the discussed specifications, the references below provide essential information necessary to comprehend the contents of this article.