WiredTiger includes a number of optional compression techniques. Configuring compression generally decreases on-disk and in-memory resource requirements and the amount of I/O, and increases CPU cost when data are read and written.
Configuring compression may change application throughput. For example, in applications using solid-state drives (where I/O is less expensive), turning off compression may increase application performance by reducing CPU costs; in applications where I/O costs are more expensive, turning on compression may increase application performance by reducing the overall number of I/O operations.
An example of turning on row-store key prefix compression:
An example of turning on row-store or column-store dictionary compression:
WiredTiger provides two methods of compressing your data when using block compression: the raw and noraw methods. These methods change how WiredTiger works to fit data into the blocks that are stored on disk.
Noraw compression is the traditional compression model where a fixed amount of data is given to the compression system, then turned into a compressed block of data. The amount of data chosen to compress is the data needed to fill the uncompressed block. Thus when compressed, the block will be smaller than the normal data size and the sizes written to disk will often vary depending on how compressible the data being stored is. Algorithms using noraw compression include zlib-noraw, lz4-noraw and snappy.
WiredTiger's raw compression takes advantage of compressors that provide a streaming compression API. Using the streaming API WiredTiger will try to fit as much data as possible into one block. This means that blocks created with raw compression should be of similar size. Using a streaming compression method should also make for less overhead in compression, as the setup and initial work for compressing is done fewer times compared to the amount of data stored. Algorithms using raw compression include zlib, lz4.
When looking at which compression method to use the biggest consideration is that raw compression will normally provide higher compression levels while using more CPU for compression.
An additional consideration is that raw compression may provide a performance advantage in workloads where data is accessed sequentially. That is because more data is generally packed into each block on disk. Conversely, noraw compression may perform better for workloads with random access patterns because each block will tend to be smaller and require less work to read and decompress.
See File formats and compression for more information on available compression techniques.
See Compressors for information on how to configure and enable compression.