LZMA compression settings details

According to Wikipedia, no complete natural-language specification of the compressed format seems to exist. The configuration settings, however, are documented.

While working with the LZMA SDK, I found the following compression settings in the CLzmaEncProps and CLzma2EncProps structure types:

LZMA Options:

level

Description: The compression level.
Range: [0;9].
Default: 5.

dictSize

Description: The dictionary size.
Range: [1<<12;1<<27] for 32-bit version or [1<<12;1<<30] for 64-bit version.
Default: 1<<24.

lc

Description: The number of high bits of the previous byte to use as a context for literal encoding.
Range: [0;8].
Default: 3.
lc = 4 sometimes improves compression of big files.
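To make lc concrete: the literal coder uses the top lc bits of the previous byte as its context. The Python sketch below is a hypothetical helper, not SDK code. It computes that context value and shows that with lc = 3 every lowercase ASCII letter (0x61 to 0x7A) falls into the same context, while lc = 4 splits them in two.

```python
def literal_context(prev_byte: int, lc: int) -> int:
    """Return the context taken from the previous byte: its top `lc` bits."""
    if not 0 <= lc <= 8:
        raise ValueError("lc must be in [0;8]")
    if not 0 <= prev_byte <= 0xFF:
        raise ValueError("prev_byte must be a single byte value")
    # Shifting right by (8 - lc) keeps only the lc most significant bits.
    return prev_byte >> (8 - lc)


letters = b"abcdefghijklmnopqrstuvwxyz"
# lc = 3: all lowercase letters share one context (top bits 011).
contexts_lc3 = {literal_context(b, 3) for b in letters}
# lc = 4: 'a'..'o' and 'p'..'z' end up in two different contexts.
contexts_lc4 = {literal_context(b, 4) for b in letters}
```

Each extra bit of lc doubles the number of literal contexts (1 << lc). That plausibly explains why lc = 4 pays off mainly on big files: the finer contexts need more data before their statistics become useful.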

lp

Description: The number of low bits of the dictionary position to include in literal_pos_state.
Range: [0;4].
Default: 0.
It is intended for periodic data whose period is 2^value (where lp = value). For example, for 32-bit (4-byte) periodic data you can use lp = 2. When you change lp, it is often better to set lc = 0.
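The reference decoder in the LZMA specification that ships with the SDK combines lp and lc into one literal state index, `((pos & ((1 << lp) - 1)) << lc) + (prevByte >> (8 - lc))`, and gives each state a table of 0x300 probabilities. Below is a Python transcription of that formula as a sketch (the function names are my own). With lp = 2 and lc = 0, positions 0, 4, 8, ... share one state, which is what makes lp = 2 fit 4-byte records. The table also grows as 0x300 << (lc + lp).

```python
def literal_state(pos: int, prev_byte: int, lc: int, lp: int) -> int:
    """Index of the literal probability table used at uncompressed position `pos`."""
    pos_bits = pos & ((1 << lp) - 1)   # low lp bits of the position
    byte_bits = prev_byte >> (8 - lc)   # high lc bits of the previous byte
    return (pos_bits << lc) + byte_bits


def literal_prob_count(lc: int, lp: int) -> int:
    """Total number of literal probabilities: 0x300 per literal state."""
    return 0x300 << (lc + lp)


# lp = 2, lc = 0: the state depends only on pos % 4 (32-bit periodic data).
states = [literal_state(pos, 0xAB, lc=0, lp=2) for pos in range(8)]
```

This is one way to read the advice to set lc = 0 together with lp: all literal statistics are then split by position within the record, not further divided by the previous byte.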

pb

Description: The number of low bits of the dictionary position to include in pos_state.
Range: [0;4].
Default: 2.
It is intended for periodic data whose period is 2^value (where pb = value).
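lc, lp and pb are stored together in the first byte of the classic .lzma header as (pb * 5 + lp) * 9 + lc. That encoding fits the ranges above: lc stays below 9, and lp and pb stay below 5. The Python sketch below encodes and decodes that byte and also computes pos_state from the low pb bits. For the defaults (lc = 3, lp = 0, pb = 2) the byte is 0x5D. The cross-check uses Python's lzma module, which wraps liblzma rather than the LZMA SDK.

```python
import lzma
from typing import Tuple


def encode_props(lc: int, lp: int, pb: int) -> int:
    """Pack lc, lp and pb into the single properties byte of an .lzma header."""
    if not (0 <= lc <= 8 and 0 <= lp <= 4 and 0 <= pb <= 4):
        raise ValueError("need lc in [0;8], lp in [0;4], pb in [0;4]")
    return (pb * 5 + lp) * 9 + lc


def decode_props(props: int) -> Tuple[int, int, int]:
    """Inverse of encode_props: return (lc, lp, pb)."""
    if not 0 <= props < 9 * 5 * 5:
        raise ValueError("invalid properties byte")
    lc = props % 9
    props //= 9
    return lc, props % 5, props // 5


def pos_state(pos: int, pb: int) -> int:
    """pos_state: the low pb bits of the position."""
    return pos & ((1 << pb) - 1)


default_props = encode_props(lc=3, lp=0, pb=2)
# liblzma's default settings also use lc=3, lp=0, pb=2, so its header starts the same way.
header_byte = lzma.compress(b"example", format=lzma.FORMAT_ALONE)[0]
```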

algo

Description: Sets the compression mode.
Options: 0 = fast, 1 = normal.
Default: 1.

fb

Description: Sets the number of fast bytes for the LZMA encoder.
Range: [5;273].
Default: 32.
A larger value usually gives a slightly better compression ratio but slower compression. A large number of fast bytes can significantly increase the compression ratio for files that contain long repeated sequences of bytes.

btMode

Description: Selects the match finder type for LZMA.
Options: 0 = hashChain mode, 1 = binTree mode.
Default: 1.
With the defaults (btMode = 1, numHashBytes = 4) the match finder is bt4. Hash-chain (hc*) match finders don't provide as good a compression ratio, but they are often quite fast, especially in combination with fast mode (algo = 0).

numHashBytes

Description: Number of hash bytes used by the match finder. See the mf={MF_ID} section of the 7-Zip method switches documentation for details.
Options: 2, 3 or 4.
Default: 4.

mc

Description: Sets the number of cycles (passes) for the match finder.
Range: [1;1<<30].
Default: 32.
If you specify mc = 0, LZMA uses the default value. A larger value usually gives a slightly better compression ratio but slower compression. For example, mf=HC4 with mc=10000 can provide almost the same compression ratio as mf=BT4.
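The LZMA SDK itself is a C library, but Python's standard lzma module, which wraps liblzma from XZ Utils, exposes closely related LZMA1 filter options. The sketch below translates CLzmaEncProps-style values into such a filter. The mapping is my own assumption, not an official correspondence: algo to `mode`, fb to `nice_len`, btMode plus numHashBytes to `mf`, and mc to `depth`. In liblzma, depth = 0 means "choose automatically", much like mc = 0 here. liblzma only offers hash-chain finders with 3 or 4 hash bytes, so other combinations are rejected.

```python
import lzma

# Hypothetical lookup: (btMode, numHashBytes) -> liblzma match finder.
_MATCH_FINDERS = {
    (1, 2): lzma.MF_BT2,
    (1, 3): lzma.MF_BT3,
    (1, 4): lzma.MF_BT4,
    (0, 3): lzma.MF_HC3,
    (0, 4): lzma.MF_HC4,
}


def to_liblzma_filter(*, fb, dict_size=1 << 24, lc=3, lp=0, pb=2, algo=1,
                      bt_mode=1, num_hash_bytes=4, mc=32):
    """Approximate CLzmaEncProps-style settings as a liblzma LZMA1 filter dict."""
    try:
        mf = _MATCH_FINDERS[(bt_mode, num_hash_bytes)]
    except KeyError:
        raise ValueError("liblzma has no matching match finder") from None
    return {
        "id": lzma.FILTER_LZMA1,
        "dict_size": dict_size,
        "lc": lc, "lp": lp, "pb": pb,
        "mode": lzma.MODE_NORMAL if algo == 1 else lzma.MODE_FAST,
        "nice_len": fb,   # liblzma's counterpart of "fast bytes"
        "mf": mf,
        "depth": mc,      # 0 lets liblzma pick a depth, like mc = 0
    }


data = b"LZMA settings demo. " * 500
# hc4 in fast mode with a deep search, echoing the mf=HC4 / mc=10000 example.
# A 1 MiB dictionary keeps memory use low for this small input.
filt = to_liblzma_filter(fb=64, dict_size=1 << 20, algo=0,
                         bt_mode=0, num_hash_bytes=4, mc=10000)
packed = lzma.compress(data, format=lzma.FORMAT_ALONE, filters=[filt])
restored = lzma.decompress(packed, format=lzma.FORMAT_ALONE)
```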

writeEndMark

Description: Whether to write the end marker (EOPM) at the end of the compressed stream.
Options: 0 = do not write EOPM, 1 = write EOPM.
Default: 0.
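Whether the end marker is needed depends on whether the decoder knows the uncompressed size. The classic 13-byte .lzma header contains 1 properties byte, a 4-byte little-endian dictionary size and an 8-byte little-endian uncompressed size. A size field of all 0xFF bytes means "unknown", and the stream must then end with the end marker. The Python sketch below parses that header (the helper name is hypothetical). The sample comes from Python's lzma module (liblzma), whose .lzma encoder always writes an unknown size and therefore always uses the end marker.

```python
import lzma
import struct

UNKNOWN_SIZE = 0xFFFFFFFFFFFFFFFF


def parse_lzma_header(blob: bytes):
    """Split the 13-byte .lzma header into (props, dict_size, size or None)."""
    if len(blob) < 13:
        raise ValueError("truncated .lzma header")
    props, dict_size, size = struct.unpack("<BIQ", blob[:13])
    return props, dict_size, (None if size == UNKNOWN_SIZE else size)


filters = [{"id": lzma.FILTER_LZMA1, "dict_size": 1 << 20}]
blob = lzma.compress(b"end marker demo", format=lzma.FORMAT_ALONE, filters=filters)
props, dict_size, size = parse_lzma_header(blob)
# size is None here, so the decoder has to stop at the end marker (EOPM).
```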

numThreads

Description: Number of threads.
Options: 1 or 2.
Default: 2.

LZMA2 Options:

LZMA2 is a modified version of LZMA. It provides the following advantages over LZMA:

Better compression ratio for data that can't be compressed: LZMA2 can store such blocks in uncompressed form, and it also decompresses such data faster.
Better multithreading support: when you compress a big file, LZMA2 can split it into chunks and compress those chunks in multiple threads.
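The first advantage is easy to observe with Python's lzma module, whose .xz format uses LZMA2 (liblzma, not the LZMA SDK). Random bytes cannot be compressed, so the encoder falls back to uncompressed chunks and the output is only slightly larger than the input. The 1 KiB bound used in the check is a generous assumption on my part, not a documented limit.

```python
import lzma
import os

raw = os.urandom(200_000)  # random bytes: practically incompressible

xz_blob = lzma.compress(raw, format=lzma.FORMAT_XZ, check=lzma.CHECK_NONE)
overhead = len(xz_blob) - len(raw)  # container and chunk headers only
print(f"LZMA2 (.xz) overhead: {overhead} bytes")
```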

Note: LZMA2 also supports all LZMA parameters, but lp + lc cannot be larger than 4.
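The constraint can be expressed as a small check. The Python helper below is hypothetical (not part of the SDK). It only encodes the lc, lp and pb ranges listed above plus the LZMA2 rule that lc + lp must not exceed 4.

```python
def check_literal_props(lc: int, lp: int, pb: int, lzma2: bool = False) -> None:
    """Raise ValueError if lc/lp/pb are outside the documented ranges."""
    if not 0 <= lc <= 8:
        raise ValueError(f"lc={lc} is outside [0;8]")
    if not 0 <= lp <= 4:
        raise ValueError(f"lp={lp} is outside [0;4]")
    if not 0 <= pb <= 4:
        raise ValueError(f"pb={pb} is outside [0;4]")
    if lzma2 and lc + lp > 4:
        raise ValueError(f"LZMA2 requires lc + lp <= 4, got {lc + lp}")


def is_valid(lc: int, lp: int, pb: int, lzma2: bool = False) -> bool:
    """Convenience wrapper: True if check_literal_props accepts the values."""
    try:
        check_literal_props(lc, lp, pb, lzma2)
    except ValueError:
        return False
    return True
```

So lc = 4 stays usable with LZMA2 as long as lp = 0, while lp = 2 (the 32-bit data setting) leaves room for at most lc = 2.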

blockSize

Description: Sets the chunk size.
Default: dictSize * 4.

numBlockThreads

Description: Sets the number of threads per chunk (block).

numTotalThreads

Description: The maximum number of threads LZMA2 can use.

Note: LZMA2 uses 1 thread per chunk in x1 and x3 modes and 2 threads per chunk in x5, x7 and x9 modes (the 7-Zip compression levels). If LZMA2 is allowed to use only as many threads as one chunk requires, it does not split the stream into chunks, so the compression ratio can differ depending on the number of threads.
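The chunking rules above can be modelled roughly as follows. This is a simplified Python sketch under stated assumptions, not the SDK's scheduler. It takes the default block size to be dictSize * 4 as described above and treats a block size of 0 as "use the default". When numTotalThreads is not larger than numBlockThreads, it keeps the stream as a single chunk. The real encoder may adjust these values further.

```python
from typing import NamedTuple


class Lzma2Plan(NamedTuple):
    block_size: int       # bytes per chunk
    num_chunks: int       # how many chunks the input is split into
    parallel_chunks: int  # chunks compressed at the same time


def plan_lzma2(input_size: int, dict_size: int, num_block_threads: int,
               num_total_threads: int, block_size: int = 0) -> Lzma2Plan:
    """Hypothetical model of how LZMA2 splits its input across threads."""
    if block_size == 0:
        block_size = dict_size * 4  # default described above
    # Only enough threads for one chunk: the stream is not split.
    if num_total_threads <= num_block_threads:
        return Lzma2Plan(input_size, 1, 1)
    num_chunks = max(1, -(-input_size // block_size))  # integer ceiling
    parallel = min(num_chunks, num_total_threads // num_block_threads)
    return Lzma2Plan(block_size, num_chunks, parallel)


# 1 GiB input, 16 MiB dictionary, 2 threads per chunk (x5 and above), 8 threads.
plan = plan_lzma2(1 << 30, 1 << 24, num_block_threads=2, num_total_threads=8)
```

The single-chunk branch is where the thread count starts to affect the ratio. Presumably each chunk is compressed independently and cannot reference data in other chunks, so splitting usually costs a little compression.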

To learn more about this subject, I think you have to study LZMA itself in more depth. There are very few examples on the internet, and the documentation is quite incomplete.

More Info Here:

http://sevenzip.sourceforge.jp/chm/cmdline/switches/method.htm

http://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Markov_chain_algorithm

http://linux.die.net/man/1/lzma