Version 11.1.0
System buffer cache

Direct I/O

WiredTiger optionally supports direct I/O. Configuring direct I/O may be useful for applications wanting to:

  • minimize the operating system cache effects of I/O to and from WiredTiger's buffer cache,
  • avoid double-buffering of blocks in WiredTiger's cache and the operating system buffer cache, and
  • avoid stalling underlying solid-state drives by writing a large number of dirty blocks.
Warning
Using direct I/O is almost never a good idea. Direct I/O implies that a writing thread waits for the write to complete (a slower operation than writing into the system buffer cache), so configuring direct I/O is likely to decrease overall application performance.
Direct I/O may not be available on all platforms.

Direct I/O can be configured separately for log files, data files, and blocks read using a (read-only) checkpoint cursor. Configuring direct I/O for data files also configures direct I/O for all internal data files created by WiredTiger itself, as well as for application-owned data files.

Direct I/O is configured using the direct_io configuration string to the wiredtiger_open function. An example of configuring direct I/O for data files:

error_check(wiredtiger_open(home, NULL, "create,direct_io=[data]", &conn));

Systems require alignment for I/O buffers when direct I/O is configured, and using the wrong alignment can cause data loss or corruption. If direct I/O is configured, WiredTiger aligns I/O buffers to a default of 4KB; if different alignment is required by your system, the buffer_alignment configuration to the wiredtiger_open call should be set to the correct value. If buffer alignment is configured explicitly or as a result of configuring direct I/O, WiredTiger will silently increase file allocation and page sizes to be at least as large as the buffer_alignment value. In all cases, file allocation and page sizes must be a multiple of the buffer_alignment value.
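The size rule above (sizes raised to at least the alignment, and always a multiple of it) can be illustrated with a small arithmetic sketch. The helper names align_up and size_is_aligned are hypothetical and are not part of the WiredTiger API; this is not WiredTiger's internal code, only a way to check candidate allocation and page sizes against a chosen buffer_alignment value.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: round size up to the next multiple of align.
 * A size smaller than align is raised to align itself.
 */
static size_t
align_up(size_t size, size_t align)
{
    assert(align != 0);
    return (((size + align - 1) / align) * align);
}

/* Hypothetical helper: nonzero if size is a multiple of align. */
static int
size_is_aligned(size_t size, size_t align)
{
    return (align != 0 && size % align == 0);
}
```

With a 4KB alignment, a requested size of 1000 bytes would round up to 4096, while a size of 6144 bytes would fail the multiple-of-alignment check.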

On Windows, the direct_io configuration string controls whether the operating system cache is used to buffer reads from and writes to disk (that is, FILE_FLAG_NO_BUFFERING). When direct I/O is off, Windows will use free RAM to cache access to files. This may have adverse effects because Windows may page out the WiredTiger buffer cache instead of its file cache. An additional configuration string, write_through, controls whether the disk is allowed to cache the writes. Enabling this flag increases write latency, as the drive must ensure all writes are persisted to disk, but it ensures write durability. To get the equivalent of O_DIRECT on Windows, both direct_io and write_through must be set.
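As a sketch of setting both options together, the following builds a configuration string that could be passed to wiredtiger_open. The helper name build_direct_config is hypothetical, and the example assumes that write_through takes the same list syntax as direct_io with the file types "data" and "log"; check the wiredtiger_open configuration reference for your release before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a wiredtiger_open configuration string that
 * sets direct_io and write_through for the same file types.
 * file_types is assumed to be "data", "log" or "data,log".
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int
build_direct_config(char *buf, size_t len, const char *file_types)
{
    int n;

    assert(buf != NULL && file_types != NULL);
    n = snprintf(buf, len, "create,direct_io=[%s],write_through=[%s]",
        file_types, file_types);
    return (n >= 0 && (size_t)n < len ? 0 : -1);
}
```

Calling the helper with "data" yields the string "create,direct_io=[data],write_through=[data]", which would then be passed as the third argument to wiredtiger_open.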

Some Linux systems do not support mixing O_DIRECT and memory mapping or normal I/O to the same file, and attempting to do so can result in data loss or corruption. For this reason:

  • WiredTiger silently ignores the setting of the mmap configuration to the wiredtiger_open function in those cases, and will never memory map a file which is configured for direct I/O.
  • If O_DIRECT is configured for data files on Linux systems, any system utilities used to copy data files for the purposes of backup should also specify O_DIRECT when configuring their file access. A standard Linux system utility that supports O_DIRECT is the dd utility, when using the iflag=direct command-line option.
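For backup tools written in C rather than built on dd, the same idea can be sketched as a copy routine that opens the source file with O_DIRECT and reads it through an aligned buffer. This is a minimal illustration, not a WiredTiger utility: the function name copy_file_direct, the 4KB alignment and the 256KB chunk size are assumptions, the destination is written with ordinary buffered I/O, and interrupted system calls are not retried.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* Exposes O_DIRECT in <fcntl.h> on Linux/glibc. */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0 /* Assumption: fall back to buffered reads if unavailable. */
#endif

#define COPY_ALIGN 4096         /* Assumed alignment (the 4KB default). */
#define COPY_CHUNK (256 * 1024) /* Read size; a multiple of COPY_ALIGN. */

/* Copy src to dst, reading src with O_DIRECT. Returns 0 or -1 on error. */
static int
copy_file_direct(const char *src, const char *dst)
{
    void *buf = NULL;
    ssize_t nr;
    int in = -1, out = -1, ret = -1;

    assert(src != NULL && dst != NULL);
    if ((in = open(src, O_RDONLY | O_DIRECT)) < 0)
        goto done;
    if ((out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0)
        goto done;
    /* Direct I/O requires an aligned buffer. */
    if (posix_memalign(&buf, COPY_ALIGN, COPY_CHUNK) != 0) {
        buf = NULL;
        goto done;
    }
    while ((nr = read(in, buf, COPY_CHUNK)) > 0) {
        char *p = buf;
        /* Write out everything that was read, handling short writes. */
        while (nr > 0) {
            ssize_t nw = write(out, p, (size_t)nr);
            if (nw < 0)
                goto done;
            p += nw;
            nr -= nw;
        }
    }
    if (nr == 0) /* A final read of 0 means end of file was reached. */
        ret = 0;
done:
    free(buf);
    if (in >= 0)
        (void)close(in);
    if (out >= 0)
        (void)close(out);
    return (ret);
}
```

Some file systems reject O_DIRECT at open time, in which case this sketch returns -1 rather than silently falling back to buffered reads.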

os_cache_dirty_max

As well as direct I/O, WiredTiger supports two additional configuration options related to the system buffer cache:

The first is os_cache_dirty_max, the maximum dirty bytes an object is allowed to have in the system buffer cache. Once this many bytes from an object are written into the system buffer cache, WiredTiger will attempt to schedule writes for all of the dirty blocks the object has in the system buffer cache. This configuration option allows applications to flush dirty blocks from the object, avoiding stalling any underlying drives when the object is subsequently flushed to disk as part of a durability operation.

An example of configuring os_cache_dirty_max:

error_check(session->create(session, "table:mytable", "os_cache_dirty_max=500MB"));

The os_cache_dirty_max configuration may not be used in combination with direct I/O.

The os_cache_dirty_max configuration is based on the non-standard Linux sync_file_range system call and will be ignored if set and that call is not available.
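The threshold behavior described above can be approximated outside WiredTiger with a small accounting sketch. The dirty_tracker type and dirty_tracker_write function are hypothetical and do not reflect WiredTiger's internal bookkeeping; resetting the counter after each request is an assumption. Where sync_file_range is not exposed by the platform headers, the sketch does nothing at the threshold, mirroring the "ignored if not available" behavior.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* Exposes sync_file_range on Linux/glibc. */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-object accounting; not a WiredTiger structure. */
typedef struct {
    uint64_t dirty_max;   /* Threshold in bytes; 0 disables the check. */
    uint64_t dirty_bytes; /* Bytes written since the last request. */
} dirty_tracker;

/*
 * Record a write of nbytes to fd. Returns 1 if the threshold was reached
 * (and write-out was requested where supported), otherwise 0.
 */
static int
dirty_tracker_write(dirty_tracker *t, int fd, uint64_t nbytes)
{
    assert(t != NULL);
    t->dirty_bytes += nbytes;
    if (t->dirty_max == 0 || t->dirty_bytes < t->dirty_max)
        return (0);
#ifdef SYNC_FILE_RANGE_WRITE
    /*
     * Offset 0 with length 0 covers the whole file. The flag initiates
     * write-out of dirty pages; errors are ignored in this sketch.
     */
    (void)sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
#else
    (void)fd; /* Call not available: nothing is scheduled. */
#endif
    t->dirty_bytes = 0;
    return (1);
}
```

A caller would invoke dirty_tracker_write after each successful write to the file, so that dirty pages are pushed out in increments instead of accumulating until a later durability operation.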

os_cache_max

The second configuration option related to the system buffer cache is os_cache_max, the maximum bytes an object is allowed to have in the system buffer cache. Once this many bytes from an object are either read into or written from the system buffer cache, WiredTiger will attempt to evict all of the object's blocks from the buffer cache. This configuration option allows applications to evict blocks from the system buffer cache to limit double-buffering and system buffer cache overhead.

An example of configuring os_cache_max:

error_check(session->create(session, "table:mytable", "os_cache_max=1GB"));

The os_cache_max configuration may not be used in combination with direct I/O.

The os_cache_max configuration is based on the POSIX 1003.1 standard posix_fadvise system call and may not be available on all platforms.
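For reference, a minimal sketch of the underlying call is shown below. The function name os_cache_evict is hypothetical, and the use of the POSIX_FADV_DONTNEED advice is an assumption about how such eviction is typically requested; it is not taken from WiredTiger's source.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper: ask the operating system to drop cached pages for
 * the byte range starting at offset; a len of 0 means through end of file.
 * Returns 0 on success or an error number (ENOTSUP if the advice is not
 * available on this platform).
 */
static int
os_cache_evict(int fd, off_t offset, off_t len)
{
    assert(offset >= 0 && len >= 0);
#ifdef POSIX_FADV_DONTNEED
    /* posix_fadvise returns the error number directly, not via errno. */
    return (posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED));
#else
    (void)fd;
    return (ENOTSUP);
#endif
}
```

On Linux, this advice does not discard pages that are still dirty, so a caller wanting a full eviction would typically flush the file (for example with fdatasync) before calling the helper.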

Configuring any of direct I/O, os_cache_dirty_max or os_cache_max has the side effect of turning off memory-mapping of objects in WiredTiger.
