When loading a large amount of data into a new object, using a cursor with the bulk
configuration string enabled and loading the data in sorted order will be much faster than doing out-of-order inserts.
WiredTiger cursors can be configured for bulk-load using the bulk
configuration keyword to WT_SESSION::open_cursor. Bulk-load is a "fast
path" for quickly loading a large number of rows. Bulk-load may only be used on newly created objects, and an object being bulk-loaded is not accessible from other cursors.
Cursors configured for bulk-load only support the WT_CURSOR::insert and WT_CURSOR::close methods. Bulk load inserts are non-transactional: they cannot be rolled back and ignore the transactional state of the WT_SESSION in which they are opened.
When doing a bulk-load insert, keys must be inserted in sorted order. When doing a bulk-load insert into a column-store object, any skipped records will be created as already-deleted rows. If a column-store bulk-load cursor is configured with append
, the cursor key will be ignored and each inserted row will be assigned the next sequential record number.
When using the sort
utility on a Linux or other POSIX-like system to pre-sort keys, the locale specified by the environment affects the sort order and may not match the default sort order used by WiredTiger. Set LC_ALL=C
in the process' environment to configure the traditional sort order that uses native byte values.
When bulk-loading fixed-length column store objects, the bulk
configuration string value bitmap
allows chunks of a memory resident bitmap to be loaded directly into an object by passing a WT_ITEM to WT_CURSOR::set_value, where the size field indicates the number of records in the bitmap (as specified by the object's value_format
configuration). Bulk-loaded bitmap values must end on a byte boundary relative to the bit count (except for the last set of values loaded).