Version 3.1.0
System buffer cache

Direct I/O

WiredTiger optionally supports direct I/O. Configuring direct I/O may be useful for applications wanting to:

  • minimize the operating system cache effects of I/O to and from WiredTiger's buffer cache,
  • avoid double-buffering of blocks in WiredTiger's cache and the operating system buffer cache, and
  • avoid stalling underlying solid-state drives by writing a large number of dirty blocks.

Direct I/O is configured using the "direct_io" configuration string to the wiredtiger_open function. An example of configuring direct I/O for WiredTiger's data files:

error_check(wiredtiger_open(
    home, NULL, "create,direct_io=[data]", &conn));

On Windows, the "direct_io" configuration string controls whether the operating system cache is used to buffer reads and writes to disk, that is, whether files are opened with FILE_FLAG_NO_BUFFERING. When "direct_io" is off, Windows uses free RAM to cache file access. This may have adverse effects because Windows may page out the WiredTiger buffer cache instead of its own file cache. An additional configuration string, "write_through", controls whether the disk is allowed to cache writes. Enabling it increases write latency because the drive must ensure all writes are persisted to disk, but it ensures write durability. To get the equivalent of O_DIRECT on Windows, both "direct_io" and "write_through" must be set.

With direct I/O, a writing thread waits for each write to complete, which is slower than writing into the system buffer cache, so configuring direct I/O is likely to decrease overall application performance.

Many Linux systems do not support mixing O_DIRECT and memory mapping or normal I/O to the same file, and attempting to do so can result in data loss or corruption. For this reason:

  • WiredTiger silently ignores the setting of the mmap configuration to the wiredtiger_open function in those cases, and will never memory map a file which is configured for direct I/O.
  • If O_DIRECT is configured for data files on Linux systems, any system utilities used to copy data files for the purposes of backup should also specify O_DIRECT when configuring their file access. A standard Linux system utility that supports O_DIRECT is the dd utility, when using the iflag=direct command-line option.

Additionally, Windows and many Linux systems require specific alignment for buffers used for I/O when direct I/O is configured, and using the wrong alignment can cause data loss or corruption. When direct I/O is configured on Windows and Linux systems, WiredTiger aligns I/O buffers to 4KB; if your system requires a different alignment, set the buffer_alignment configuration to the wiredtiger_open call to the correct value.
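To make the alignment requirement concrete, the following is a minimal sketch in plain POSIX C of allocating an I/O buffer on a 4KB boundary. It is not WiredTiger's implementation: alloc_aligned and is_aligned are hypothetical helper names introduced only for illustration, and the 4KB value simply mirrors the default described above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for posix_memalign */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the WiredTiger API): allocate a zeroed
 * buffer whose address is a multiple of `alignment` bytes. Returns NULL on
 * failure; the caller releases the buffer with free().
 */
void *
alloc_aligned(size_t alignment, size_t size)
{
    void *buf = NULL;

    /* posix_memalign needs a power of two that is a multiple of sizeof(void *). */
    assert(alignment >= sizeof(void *) && (alignment & (alignment - 1)) == 0);

    if (posix_memalign(&buf, alignment, size) != 0)
        return (NULL);
    memset(buf, 0, size);
    return (buf);
}

/* Return 1 if the address is a multiple of `alignment`, 0 otherwise. */
int
is_aligned(const void *p, size_t alignment)
{
    return (((uintptr_t)p % alignment) == 0);
}
```

A caller would request, for example, alloc_aligned(4096, 8192) and pass the resulting buffer to reads and writes on a file opened for direct I/O.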

Finally, if direct I/O is configured on any system, WiredTiger will silently change the file unit allocation size and the maximum leaf and internal page sizes to be at least as large as the buffer_alignment value as well as a multiple of that value.
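The size adjustment can be pictured with a small arithmetic sketch. The text above states only the result (each size is at least the buffer_alignment value and a multiple of it); rounding up, rather than down, is an assumption made here, and align_size is a hypothetical name, not a WiredTiger function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the smallest multiple of `alignment` that is
 * greater than or equal to both `size` and `alignment`. Rounding up is an
 * assumption; the documentation only states the resulting constraints.
 */
uint64_t
align_size(uint64_t size, uint64_t alignment)
{
    assert(alignment > 0);

    if (size < alignment)
        return (alignment);
    /* Round up to the next multiple of alignment (no-op if already one). */
    return (((size + alignment - 1) / alignment) * alignment);
}
```

Under this assumption and a 4KB alignment, a configured size of 512 bytes becomes 4096, 5000 becomes 8192, and 32768 is left unchanged.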

Direct I/O is based on the non-standard O_DIRECT flag to the POSIX 1003.1 open system call and may not be available on all platforms.

os_cache_dirty_max

As well as direct I/O, WiredTiger supports two additional configuration options related to the system buffer cache:

The first is os_cache_dirty_max, the maximum dirty bytes an object is allowed to have in the system buffer cache. Once this many bytes from an object are written into the system buffer cache, WiredTiger will attempt to schedule writes for all of the dirty blocks the object has in the system buffer cache. This configuration option allows applications to flush dirty blocks from the object, avoiding stalling any underlying drives when the object is subsequently flushed to disk as part of a durability operation.

An example of configuring os_cache_dirty_max:

error_check(session->create(
    session, "table:mytable", "os_cache_dirty_max=500MB"));

The os_cache_dirty_max configuration may not be used in combination with direct I/O.

The os_cache_dirty_max configuration is based on the non-standard Linux sync_file_range system call and will be ignored if set and that call is not available.
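The behavior described for os_cache_dirty_max can be sketched outside WiredTiger as a per-file byte counter that schedules write-out once a threshold is reached. This is an illustration only, not WiredTiger's code: the dirty_tracker structure and dirty_tracked_write function are hypothetical, and the choice of flags (SYNC_FILE_RANGE_WRITE over the whole file) is an assumption. Where sync_file_range is unavailable, the sketch skips the call, matching the "ignored if not available" behavior described above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* sync_file_range is a Linux extension */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-file state; not a WiredTiger structure. */
typedef struct {
    int fd;               /* open file descriptor */
    uint64_t dirty_max;   /* threshold in bytes; 0 disables tracking */
    uint64_t dirty_bytes; /* bytes written since write-out was last scheduled */
} dirty_tracker;

/*
 * Write len bytes, then account for them. Returns 1 if the threshold was
 * reached and write-out was scheduled, 0 if not, -1 on error.
 */
int
dirty_tracked_write(dirty_tracker *t, const void *buf, size_t len)
{
    ssize_t n = write(t->fd, buf, len);

    if (n < 0)
        return (-1);
    if (t->dirty_max == 0)
        return (0);

    t->dirty_bytes += (uint64_t)n;
    if (t->dirty_bytes < t->dirty_max)
        return (0);
    t->dirty_bytes = 0;

#if defined(__linux__) && defined(SYNC_FILE_RANGE_WRITE)
    /*
     * Offset 0 with nbytes 0 covers the whole file. SYNC_FILE_RANGE_WRITE
     * starts write-out of dirty pages without waiting for completion.
     * ENOSYS (call not implemented) is treated as "ignored".
     */
    if (sync_file_range(t->fd, 0, 0, SYNC_FILE_RANGE_WRITE) != 0 &&
        errno != ENOSYS)
        return (-1);
#endif
    return (1);
}
```

Because the call only starts write-out, a later durability operation still has to wait for the data to reach disk, but it should find fewer dirty blocks outstanding.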

os_cache_max

The second configuration option related to the system buffer cache is os_cache_max, the maximum bytes an object is allowed to have in the system buffer cache. Once this many bytes from an object are either read into or written from the system buffer cache, WiredTiger will attempt to evict all of the object's blocks from the buffer cache. This configuration option allows applications to evict blocks from the system buffer cache to limit double-buffering and system buffer cache overhead.

An example of configuring os_cache_max:

error_check(session->create(
    session, "table:mytable", "os_cache_max=1GB"));

The os_cache_max configuration may not be used in combination with direct I/O.

The os_cache_max configuration is based on the POSIX 1003.1 standard posix_fadvise system call and may not be available on all platforms.
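As a rough illustration of the mechanism behind os_cache_max, the sketch below counts bytes of I/O on a file and, once a threshold is reached, advises the kernel that the file's cached blocks are no longer needed. It is not WiredTiger's implementation: cache_tracker, cache_tracker_account and cache_tracked_read are hypothetical names, and the use of POSIX_FADV_DONTNEED over the whole file is an assumption. On platforms that do not define POSIX_FADV_DONTNEED, the advice step is compiled out.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for posix_fadvise */
#endif
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-file state; not a WiredTiger structure. */
typedef struct {
    int fd;             /* open file descriptor */
    uint64_t cache_max; /* threshold in bytes; 0 disables tracking */
    uint64_t io_bytes;  /* bytes read or written since the last request */
} cache_tracker;

/*
 * Account for len bytes of I/O. Returns 1 if the threshold was reached and
 * eviction was requested, 0 if not, -1 on error.
 */
int
cache_tracker_account(cache_tracker *t, uint64_t len)
{
    if (t->cache_max == 0)
        return (0);

    t->io_bytes += len;
    if (t->io_bytes < t->cache_max)
        return (0);
    t->io_bytes = 0;

#if defined(POSIX_FADV_DONTNEED)
    /*
     * Offset 0 with length 0 covers the whole file. posix_fadvise returns
     * an error number directly rather than setting errno.
     */
    if (posix_fadvise(t->fd, 0, 0, POSIX_FADV_DONTNEED) != 0)
        return (-1);
#endif
    return (1);
}

/* Read up to len bytes, then account for the bytes actually read. */
int
cache_tracked_read(cache_tracker *t, void *buf, size_t len)
{
    ssize_t n = read(t->fd, buf, len);

    if (n < 0)
        return (-1);
    return (cache_tracker_account(t, (uint64_t)n));
}
```

The same accounting function would be called after writes as well, since the limit described above applies to bytes moving in either direction. The advice is only a hint, so the kernel may keep some blocks cached, for example blocks that are still dirty.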

Configuring any of direct I/O, os_cache_dirty_max or os_cache_max has the side effect of turning off memory mapping of objects in WiredTiger.