According to Wikipedia, no complete natural-language specification of the compressed format seems to exist. However, the configuration settings are specified.
While working with the LZMA SDK, I found the following compression settings in the CLzmaEncProps and CLzma2EncProps structure types:
LZMA Options:
level
- Description: The compression level.
- Range: [0;9].
- Default: 5.
dictSize
- Description: The dictionary size.
- Range: [1<<12;1<<27] for the 32-bit version or [1<<12;1<<30] for the 64-bit version.
- Default: 1<<24.
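The shift notation is easier to read with the sizes written out. Here is a minimal Python sketch that checks a dictSize value against the documented 32-bit and 64-bit ranges. The helper names `dict_size_ok` and `human` are hypothetical and not part of the SDK.

```python
KIB = 1 << 10
MIB = 1 << 20

DICT_MIN = 1 << 12        # 4 KiB
DICT_MAX_32BIT = 1 << 27  # 128 MiB
DICT_MAX_64BIT = 1 << 30  # 1 GiB


def dict_size_ok(dict_size: int, is_64bit: bool) -> bool:
    """Hypothetical check: is dict_size inside the documented range for this build?"""
    upper = DICT_MAX_64BIT if is_64bit else DICT_MAX_32BIT
    return DICT_MIN <= dict_size <= upper


def human(size: int) -> str:
    """Format a byte count as KiB or MiB for readability."""
    if size >= MIB:
        return f"{size / MIB:g} MiB"
    return f"{size / KIB:g} KiB"


print(human(1 << 24))                # default dictSize: 16 MiB
print(dict_size_ok(1 << 28, False))  # False: too large for a 32-bit build
print(dict_size_ok(1 << 28, True))   # True: fine for a 64-bit build
```

In other words, 1<<12 is 4 KiB, the default 1<<24 is 16 MiB, 1<<27 is 128 MiB and 1<<30 is 1 GiB.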
lc
- Description: The number of high bits of the previous byte to use as a context for literal encoding.
- Range: [0;8].
- Default: 3.
- Sometimes lc=4 gives better compression for big files.
lp
- Description: The number of low bits of the dictionary position to include in literal_pos_state.
- Range: [0;4].
- Default: 0.
- It is intended for periodic data whose period is 2^lp bytes. For example, for 32-bit (4-byte) periodic data you can use lp=2. If you change lp, it is often better to set lc=0.
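To make lc and lp concrete: as I understand the LZMA reference decoder, the literal coder for each byte is picked by combining the lp low bits of the position with the lc high bits of the previous byte, and each literal coder holds 0x300 probabilities. A small Python sketch of that selection (the function names are my own and hypothetical):

```python
def literal_state(pos: int, prev_byte: int, lc: int, lp: int) -> int:
    """Index of the literal coder used for the byte at position `pos`."""
    low_pos_bits = pos & ((1 << lp) - 1)    # lp low bits of the position
    high_prev_bits = prev_byte >> (8 - lc)  # lc high bits of the previous byte
    return (low_pos_bits << lc) + high_prev_bits


def literal_table_size(lc: int, lp: int) -> int:
    """Total literal probabilities: 2**(lc + lp) coders of 0x300 each."""
    return 0x300 << (lc + lp)


# Defaults (lc=3, lp=0): only the top 3 bits of the previous byte matter.
print(literal_state(1000, 0xE1, lc=3, lp=0))                   # 7
# lc=0, lp=2: the coder depends only on the position modulo 4.
print([literal_state(p, 0xFF, lc=0, lp=2) for p in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
print(literal_table_size(3, 0))                                # 6144
```

With lc=0 and lp=2, each of the four byte positions in a 32-bit record gets its own literal coder regardless of the previous byte, which is why that combination suits 4-byte periodic data. The table also doubles in size with every extra bit of lc + lp.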
pb
- Description: The number of low bits of the dictionary position to include in pos_state.
- Range: [0;4].
- Default: 2.
- It is intended for periodic data whose period is 2^pb bytes. For example, for 32-bit (4-byte) periodic data you can use pb=2.
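pos_state follows the same idea with pb: it is simply the pb low bits of the position. The sketch below (hypothetical helper names) shows why pb=2 fits 4-byte periodic data, since byte k of every record always lands in the same pos_state.

```python
def pos_state(pos: int, pb: int) -> int:
    """The pb low bits of the position."""
    return pos & ((1 << pb) - 1)


def bits_for_period(period: int) -> int:
    """pb (or lp) value for data with the given period; it must be a power of two."""
    if period <= 0 or period & (period - 1):
        raise ValueError("period must be a power of two")
    return period.bit_length() - 1


pb = bits_for_period(4)                      # 32-bit records -> pb = 2
print(pb)                                    # 2
print([pos_state(p, pb) for p in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```

Since both pb and lp are limited to [0;4], this only helps with periods of up to 16 bytes.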
algo
- Description: Sets the compression mode.
- Options: 0 = fast, 1 = normal.
- Default: 1.
fb
- Description: Sets the number of fast bytes for the LZMA encoder.
- Range: [5;273].
- Default: 32 (64 for level 7 and above).
- Usually, a larger value gives a slightly better compression ratio but slower compression. A large number of fast bytes can significantly increase the compression ratio for files that contain long identical sequences of bytes.
btMode
- Description: Selects the match finder for LZMA.
- Options: 0 = hashChain mode, 1 = binTree mode.
- Default: 1.
- The default match finder is bt4. Match finders from the hc* group don't provide as good a compression ratio, but they often work quite fast in combination with fast mode.
numHashBytes
- Description: The number of hash bytes. See the mf={MF_ID} section of the 7-Zip method switch documentation (first link under More Info) for details.
- Options: 2, 3 or 4.
- Default: 4.
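As far as I can tell from the SDK source I looked at, btMode and numHashBytes together pick the match finder named in 7-Zip's mf={MF_ID} list. Treat the mapping below as an assumption, in particular the part where hash-chain mode always means hc4; other SDK versions may offer more variants. The function name is hypothetical.

```python
def match_finder_name(bt_mode: int, num_hash_bytes: int) -> str:
    """Assumed mapping from (btMode, numHashBytes) to a 7-Zip mf={MF_ID} name."""
    if num_hash_bytes not in (2, 3, 4):
        raise ValueError("numHashBytes must be 2, 3 or 4")
    if not bt_mode:
        # Assumption: hash-chain mode only comes in a 4-byte variant (hc4).
        return "hc4"
    return f"bt{num_hash_bytes}"


print(match_finder_name(1, 4))  # bt4, the default
print(match_finder_name(0, 4))  # hc4
```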
mc
- Description: Sets the number of cycles (passes) for the match finder.
- Range: [1;1<<30].
- Default: 32.
- If you specify mc=0, LZMA uses the default value. Usually, a larger value gives a slightly better compression ratio but slower compression. For example, mf=HC4 with mc=10000 can provide almost the same compression ratio as mf=BT4.
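My reading of LzmaEncProps_Normalize in the SDK is that mc=0 gets replaced by a value derived from fb and btMode. This is an assumption and may differ between SDK versions, and `effective_mc` is a hypothetical name. With fb=32 in binTree mode the formula yields the 32 listed above.

```python
def effective_mc(mc: int, fb: int, bt_mode: int) -> int:
    """Assumed resolution of mc: 0 means 'derive from fb', halved for hashChain."""
    if mc != 0:
        return mc  # an explicit value is used as-is
    return (16 + (fb >> 1)) >> (0 if bt_mode else 1)


print(effective_mc(0, fb=32, bt_mode=1))      # 32
print(effective_mc(0, fb=32, bt_mode=0))      # 16
print(effective_mc(10000, fb=32, bt_mode=0))  # 10000
```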
writeEndMark
- Description: Whether to write the end marker.
- Options: 0 = do not write the end-of-payload marker (EOPM), 1 = write EOPM.
- Default: 0.
numThreads
- Description: Number of threads.
- Options: 1 or 2.
- Default: 2.
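If you just want to experiment with these settings, Python's standard `lzma` module (a binding to liblzma from XZ Utils, not to the LZMA SDK) exposes similar knobs. The mapping in this sketch is my own approximation: nice_len plays the role of fb, mf combines btMode and numHashBytes, and depth is the closest analogue of mc. There is no numThreads equivalent here, and `LZMA1_FILTER`, `compress_raw` and `decompress_raw` are hypothetical names.

```python
import lzma

# Approximate liblzma counterparts of the CLzmaEncProps defaults (my own mapping).
LZMA1_FILTER = {
    "id": lzma.FILTER_LZMA1,
    "dict_size": 1 << 24,      # dictSize
    "lc": 3,                   # lc
    "lp": 0,                   # lp
    "pb": 2,                   # pb
    "mode": lzma.MODE_NORMAL,  # algo = 1 (lzma.MODE_FAST ~ algo = 0)
    "nice_len": 32,            # fb, the number of "fast bytes"
    "mf": lzma.MF_BT4,         # btMode = 1 with numHashBytes = 4
    "depth": 32,               # mc; 0 would let liblzma choose
}


def compress_raw(data: bytes) -> bytes:
    """Compress into a raw LZMA stream (no .lzma/.xz container header)."""
    return lzma.compress(data, format=lzma.FORMAT_RAW, filters=[LZMA1_FILTER])


def decompress_raw(blob: bytes) -> bytes:
    """Raw streams carry no properties, so the same filter must be passed again."""
    return lzma.decompress(blob, format=lzma.FORMAT_RAW, filters=[LZMA1_FILTER])


sample = b"periodic data " * 1000
packed = compress_raw(sample)
print(len(sample), len(packed))
```

For the fast mode (algo = 0) you would switch mode to lzma.MODE_FAST and, typically, mf to lzma.MF_HC4.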
LZMA2 Options:
LZMA2 is a modified version of LZMA. It provides the following advantages over LZMA:
- Better compression ratio for data that can't be compressed. LZMA2 can store such blocks of data in uncompressed form, and it also decompresses such data faster.
- Better multithreading support. If you compress a big file, LZMA2 can split that file into chunks and compress these chunks in multiple threads.
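The first advantage is easy to observe with Python's `lzma` module (liblzma rather than the LZMA SDK, but the same two formats). A minimal sketch, assuming random bytes are a fair stand-in for incompressible data; `raw_size` is a hypothetical helper:

```python
import lzma
import os


def raw_size(data: bytes, filter_id: int) -> int:
    """Size of a raw (headerless) stream produced with the given filter."""
    blob = lzma.compress(data, format=lzma.FORMAT_RAW,
                         filters=[{"id": filter_id, "preset": 6}])
    return len(blob)


# Random bytes stand in for "data that can't be compressed".
sample = os.urandom(200_000)
lzma1_size = raw_size(sample, lzma.FILTER_LZMA1)
lzma2_size = raw_size(sample, lzma.FILTER_LZMA2)
print(len(sample), lzma1_size, lzma2_size)
```

On such input the raw LZMA stream ends up somewhat larger than the input, while the LZMA2 stream is only slightly larger, because the encoder falls back to storing chunks uncompressed.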
Note: LZMA2 also supports all LZMA parameters, but lc + lp cannot be larger than 4.
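A quick validity check for this LZMA2 restriction, written as a hypothetical helper (not an SDK function) on top of the per-parameter ranges listed for LZMA:

```python
def check_lzma2_props(lc: int, lp: int, pb: int) -> None:
    """Hypothetical check: raise ValueError if (lc, lp, pb) is not valid for LZMA2."""
    if not 0 <= lc <= 8:
        raise ValueError("lc must be in [0;8]")
    if not 0 <= lp <= 4:
        raise ValueError("lp must be in [0;4]")
    if not 0 <= pb <= 4:
        raise ValueError("pb must be in [0;4]")
    if lc + lp > 4:
        raise ValueError("LZMA2 requires lc + lp <= 4")


def is_valid_lzma2(lc: int, lp: int, pb: int) -> bool:
    try:
        check_lzma2_props(lc, lp, pb)
    except ValueError:
        return False
    return True


print(is_valid_lzma2(3, 0, 2))  # True: the LZMA defaults
print(is_valid_lzma2(0, 2, 2))  # True: the lp=2 / lc=0 combination
print(is_valid_lzma2(4, 1, 2))  # False: lc + lp = 5
```

So the lc=0, lp=2 combination for 32-bit data still works, but the lc=4 tip only works with lp=0 under LZMA2.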
blockSize
- Description: Sets the chunk size.
- Default: dictSize * 4.
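With the default blockSize, the number of chunks (and therefore the room for parallelism) follows from simple arithmetic. A hypothetical sketch that ignores any clamping the SDK might apply to the default:

```python
def default_block_size(dict_size: int) -> int:
    """Default chunk size as stated above: dictSize * 4 (no clamping modelled)."""
    return dict_size * 4


def chunk_count(input_size: int, block_size: int) -> int:
    """Number of chunks needed to cover input_size bytes (ceiling division)."""
    return -(-input_size // block_size)


block = default_block_size(1 << 24)  # default dictSize (16 MiB)
print(block >> 20)                   # 64 (MiB)
print(chunk_count(1 << 30, block))   # 16 chunks for a 1 GiB file
```

With the default dictSize of 1<<24 (16 MiB), chunks are 64 MiB, so a 1 GiB file splits into 16 chunks, while anything under 64 MiB fits in one chunk.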
numBlockThreads
- Description: Sets the number of threads per chunk (block).
numTotalThreads
- Description: The maximum number of threads LZMA2 can use.
Note: LZMA2 uses 1 thread for each chunk in x1 and x3 modes, and 2 threads for each chunk in x5, x7 and x9 modes. If LZMA2 is set to use only as many threads as one chunk requires, it doesn't split the stream into chunks. So the compression ratio can differ depending on the number of threads.
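The thread rule above can be written out as a hypothetical planning helper. It only encodes the documented x1/x3 versus x5/x7/x9 behaviour, not the SDK's actual scheduling.

```python
def threads_per_chunk(level: int) -> int:
    """Threads per chunk according to the note above."""
    if level in (1, 3):
        return 1
    if level in (5, 7, 9):
        return 2
    raise ValueError("the note only covers x1, x3, x5, x7 and x9")


def parallel_chunks(level: int, total_threads: int) -> int:
    """Chunks that can be compressed at the same time (at least one)."""
    return max(1, total_threads // threads_per_chunk(level))


def splits_stream(level: int, total_threads: int) -> bool:
    """Per the note, the stream is only split when more than one chunk fits."""
    return parallel_chunks(level, total_threads) > 1


print(parallel_chunks(5, 8))  # 4
print(splits_stream(5, 2))    # False: 2 threads are exactly one chunk's worth
print(splits_stream(1, 2))    # True
```

For example, at x5 with 2 total threads only one chunk fits, so the stream is not split; with 8 threads, 4 chunks can be compressed in parallel.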
I think that to get more information on this subject, you have to study LZMA in more depth. There are very few examples of it on the internet, and the documentation is quite incomplete.
More Info Here:
http://sevenzip.sourceforge.jp/chm/cmdline/switches/method.htm
http://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Markov_chain_algorithm
http://linux.die.net/man/1/lzma