When loading a large amount of data into a new object, using a cursor with the bulk
configuration string enabled and loading the data in sorted order will be much faster than doing out-of-order inserts.
WiredTiger cursors can be configured for bulk-load using the bulk
configuration keyword to WT_SESSION::open_cursor. Bulk-load is a "fast
path" for quickly loading a large number of rows. Bulk-load may only be used on newly created objects, and an object being bulk-loaded is not accessible from other cursors.
Cursors configured for bulk-load only support the WT_CURSOR::insert and WT_CURSOR::close methods. Bulk load inserts are non-transactional: they cannot be rolled back and ignore the transactional state of the WT_SESSION in which they are opened.
When doing a bulk-load insert, keys must be inserted in sorted order. When doing a bulk-load insert into a column-store object, any skipped records will be created as already-deleted rows. If a column-store bulk-load cursor is configured with append
, the cursor key will be ignored and each inserted row will be assigned the next sequential record number.
When using the sort
utility on a Linux or other POSIX-like system to pre-sort keys, the locale specified by the environment affects the sort order and may not match the default sort order used by WiredTiger. Set LC_ALL=C
in the process' environment to configure the traditional sort order that uses native byte values.
When bulk-loading fixed-length column store objects, the bulk
configuration string value bitmap
allows chunks of a memory resident bitmap to be loaded directly into an object. This is done by passing a WT_ITEM to WT_CURSOR::set_value, where the size field indicates the number of records in the bitmap (not the number of bytes) and the data pointer points to the proper number of bits, packed into bytes with no padding, most significant bits first. For example, if the value format is 3t
, the topmost three bits of the first byte hold the first value, the next three the second value, and to load the values 7, 6, 5, 4, 3, 2, 1, 0 one would use the three bytes 250, 195, 136.
The bitmap must be loaded starting at an aligned record number such that the data does not need to be shifted before being placed in the database; that is, the first record number to load must be a record that appears at the beginning of a byte. Since the first record number is 1, the first such record is 1, and the next is given by the bit packing. For example, if the value format is 3t
, every 8 records pack into 3 bytes, so legal starting record numbers are 1, 9, 17, etc. In general, the load record number less 1, multiplied by the bit size, should be a multiple of 8.
The number of records in each load should also in general be chosen to load a whole number of bytes, so that after loading one chunk the cursor is correctly positioned for loading another. This consideration can of course be ignored for the last chunk.