ZLIB Spec Correction Proposal
The current ZLIB Data Format Specification version 3.3, as presented in
RFC1950, defines the CINFO
(Compression Info) field in Section 2.2. Data Format:
For CM = 8, CINFO is the base-2 logarithm of the LZ77 window size,
minus eight (CINFO=7 indicates a 32K window size). Values of CINFO above
7 are not allowed in this version of the specification. CINFO is
not defined in this specification for CM not equal to 8.
For the reasons presented below, this text should be changed to:
For CM = 8, CINFO is an upper bound of the base-2 logarithm of the
biggest distance code, minus eight. CINFO = 7 indicates that the distance
codes can be at most 32768, and values of CINFO above 7 are not allowed
in this version of the specification. CINFO is not defined in this
specification for CM not equal to 8.
Also, in Section 2.3. Compliance, the following text should be added:
A compliant compressor must not output distance codes bigger than
2^(CINFO+8).
...
A compliant decompressor must give an error indication if it detects
distance codes bigger than 2^(CINFO+8) in the compressed stream.
The term "the LZ77 window size" is defined by the LZ77 algorithm, but
there are some unadressed issues, such as whether to allow matches longer
than the window size, or whether to allow matches in the immediate neighborhood
of the window margins. These issues are defined by applications, based
mainly on the compression ratio vs. execution speed tradeoff. ZLIB is an
LZ77 application, but it doesn't specifically define these issues. This
may cause unwanted events, such as the events presented here.
- The following encoder optimizations should be correct, but they
are not allowed by the current wording of the ZLIB spec:
-
Correct Optimization #1:
If a (non-streamable) encoder with a 32K window detects that its output
is identical to the output of any valid encoder with a 2K window (because
distance codes bigger than 2K do not occur), CINFO can be optimized from
7 to 3. For this reason, this patch
can be applied to the zlib/libz (reference library version 1.1.3) to fix
the 256-byte window bug.
-
Correct Optimization #2:
There may be reasons for encoders to employ windows of sizes other
than 256, 512, 1K, 2K, 4K, 8K, 16K, 32K. For example, the proposed Z_RLE
strategy uses a window of size 1, but encoder implementors may be confused
about the value of CINFO: -8 (incorrect because no negative values are
allowed?), or 0 (incorrect because 2^(CINFO+8) != 1 ?), or a window of
size 1 is merely disallowed? In fact, windows of size 1 should be allowed,
and CINFO should be 0 in this case (any other values are also valid, but
sub-optimal). This cannot be inferred from the current spec. Interpretation
problems also occur when the window size is not a power of two, because
CINFO must be an integer in the range 0..7: it can be misinterpreted that
such window sizes are disallowed.
- The following decoder optimizations should be incorrect, but they
are allowed by the current wording of the ZLIB spec:
-
Incorrect Optimization #1:
Since it is not stated, nor can it be inferred that the window size
does not restrict the match lengths, decoder implementors might assume
the contrary and optimize the decoders thinking that 257- and 258-byte
matches will not appear in 256-byte windows. Such decoders will have
unpredicted behavior, which may yield an incorrectly decoded (altered) stream,
and/or a system crash.
-
Incorrect Optimization #2:
The term the LZ77 sliding window does not assure that the distance
codes can go exactly up to the size of this window. In general, LZ77
applications might decide to have a faster algorithm that keeps the entire
match in the window, and this might imply
match_length + match_distance <= window_size.
In spite that the ZLIB reference library implements the encoder in this
fashion, the ZLIB specification does not impose this restriction, so the
relation does not hold for ZLIB encoders in general.
However, decoder implementors might assume the contrary and optimize the
decoders thinking that match_distance <= window_size - 3 (for example,
perform double-byte or quad-byte memory copying that is unsafe when window_size
- match_distance < 3). This could again yield at least to altered output.
Back to PNG-Tech Home