Tuesday, February 22, 2011

Huff0 : Early detection offers better speed (v0.7)

 In my earlier release, i skipped the integration of a simple to understand but more difficult to execute feature : early detection of border-cases segments.

This definition regroups both "not compressible segments" and "single character segments". The previous algorithm was correctly detecting these cases, but during the construction of the Huffman tree.

The new algorithm ensures these situations are detected before any operation is done to build the tree. It does so by counting differently, in a new structure created for this purpose. The net result is a speed improvement for files which feature such situations :

Detailed performance assessment :


Huff0 v0.6

Huff0 v0.7

RatioCompress DecodingRatioCompressDecoding
Not compressible





enwik8.7z1.000810 MB/s1.93 GB/s1.000870 MB/s1.93 GB/s
Hardly compressible





win98-lz4hc-lit1.024465 MB/s600 MB/s1.024485 MB/s600 MB/s
audio11.070285 MB/s280 MB/s1.070285 MB/s280 MB/s
Distributed





enwik8-lz4hc-lit1.290204 MB/s194 MB/s1.290204 MB/s194 MB/s
Lightly Ordered





enwik8-lz4hc-offh1.131197 MB/s184 MB/s1.131197 MB/s184 MB/s
Ordered





enwik8-lz4hc-ml2.309214 MB/s195 MB/s2.309214 MB/s195 MB/s
Squeezed





office-lz4hc-run3.152218 MB/s202 MB/s3.152218 MB/s202 MB/s
enwik8-lz4hc-run4.959245 MB/s224 MB/s4.959245 MB/s224 MB/s
Ultra compressible





loong275785 MB/s2.93 GB/s275860 MB/s2.93 GB/s



This (mostly) closes the gap with Range0 regarding the detection speed of not compressible segments, which is the main case to consider for real-life situations.

You can download and test the new version here :
http://fastcompression.blogspot.com/p/huff0-range0-entropy-coders.html

No comments:

Post a Comment