There are some considerations when configuring commit-level durability that can affect performance.
WiredTiger automatically groups the flush operations for threads that commit concurrently into single calls. This usually means multi-threaded workloads will achieve higher throughput than single-threaded workloads because the operating system can flush data more efficiently to the disk. No application-level configuration is required for this feature.
By default, log records are written to an in-memory buffer before WT_SESSION::commit_transaction returns, giving the highest performance but not ensuring durability. Stricter durability guarantees can be configured, at some cost in performance.
If transaction_sync=(enabled=false) is configured to wiredtiger_open, log records may be buffered in memory and are only flushed to disk by checkpoints, when log files switch, or by calls to WT_SESSION::commit_transaction with sync=on. (Note that any call to WT_SESSION::commit_transaction with sync=on will flush the log records for all committed transactions, not just the transaction where the configuration is set.) This provides the minimal guarantees, but will be significantly faster than other configurations.
If transaction_sync=(enabled=true), transaction_sync=(method) further configures the method used to flush log records to disk. By default, the configured value is fsync, which calls the operating system's fsync call (or fdatasync if available) as each commit completes.
If the value is set to dsync, the O_DSYNC or O_SYNC flag will be specified to the operating system's open call when the file is opened. (The durability guarantees of the fsync and dsync configurations are the same, and in our experience the open flags are slower; this configuration is only included for systems where that may not be the case.)
If the value is set to none, the operating system's write call is called as each commit completes, but the data is not flushed to disk. This setting gives durability at the application level but not at the system level.
When a log file fills and the system moves to the next log file, the previous log file will always be flushed to disk prior to close. So when running in a durability mode that does not flush to disk, the risk is bounded by the most recent log file change.
Here is the expected performance of durability modes, in order from the fastest to the slowest (and from the fewest durability guarantees to the most durability guarantees).
Durability Mode | Notes |
---|---|
log=(enabled=false) | checkpoint-level durability |
log=(enabled),transaction_sync=(enabled=false) | in-memory buffered logging configured; updates durable after checkpoint or after sync is set in WT_SESSION::begin_transaction |
log=(enabled),transaction_sync=(enabled=true,method=none) | logging configured; updates durable after application failure, but not after system failure |
log=(enabled),transaction_sync=(enabled=true,method=fsync) | logging configured; updates durable on application or system failure |
log=(enabled),transaction_sync=(enabled=true,method=dsync) | logging configured; updates durable on application or system failure |
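
As a minimal sketch, the rows of the table can be kept in application code as a lookup from a mode to its configuration string. The enum, its constants, and both helper functions below are hypothetical names introduced for illustration and are not part of the WiredTiger API; only the configuration strings come from the table. The returned string is intended to be included in the configuration passed to wiredtiger_open.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum naming the five table rows, fastest first. */
enum durability_mode {
    DUR_CHECKPOINT_ONLY, /* no logging */
    DUR_LOG_BUFFERED,    /* logging, in-memory buffering */
    DUR_LOG_WRITE,       /* logging, write() per commit, no flush */
    DUR_LOG_FSYNC,       /* logging, fsync per commit */
    DUR_LOG_DSYNC,       /* logging, O_DSYNC/O_SYNC open flags */
    DUR_MODE_COUNT
};

/* Return the configuration string for a mode, copied from the table. */
static const char *
durability_config(enum durability_mode mode)
{
    static const char *const configs[DUR_MODE_COUNT] = {
        "log=(enabled=false)",
        "log=(enabled),transaction_sync=(enabled=false)",
        "log=(enabled),transaction_sync=(enabled=true,method=none)",
        "log=(enabled),transaction_sync=(enabled=true,method=fsync)",
        "log=(enabled),transaction_sync=(enabled=true,method=dsync)",
    };

    assert((int)mode >= 0 && (int)mode < DUR_MODE_COUNT);
    return configs[mode];
}

/*
 * Per the table, only the fsync and dsync rows keep committed updates
 * durable across a system failure.
 */
static int
durability_survives_system_failure(enum durability_mode mode)
{
    const char *cfg = durability_config(mode);

    return strstr(cfg, "method=fsync") != NULL ||
        strstr(cfg, "method=dsync") != NULL;
}
```

Keeping the strings in one place makes it easier to switch modes when benchmarking, since the modes differ only in this part of the configuration.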
The durability setting can also be controlled directly on a per-transaction basis via the WT_SESSION::commit_transaction method. WT_SESSION::commit_transaction supports several durability modes through its sync configuration, which override the connection-level settings.
If sync=on is configured then this commit operation will wait for its log records, and all earlier ones, to be durable to the extent specified by the transaction_sync=(method) setting before returning.
If sync=off is configured then this commit operation will write its records into the in-memory buffer and return immediately.
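
The interaction between the connection-level setting and the per-commit override can be summarized with a small model. This is an illustrative sketch of the rules described above, not WiredTiger code: every type and function name here is hypothetical, and the model assumes that a commit without a sync configuration simply follows the connection-level transaction_sync setting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; none of these exist in the WiredTiger API. */
enum sync_method { METHOD_FSYNC, METHOD_DSYNC, METHOD_NONE };
enum commit_sync { COMMIT_SYNC_UNSET, COMMIT_SYNC_ON, COMMIT_SYNC_OFF };
enum commit_action {
    ACTION_BUFFER, /* records stay in the in-memory buffer */
    ACTION_WRITE,  /* records written to the OS, not flushed to disk */
    ACTION_FLUSH   /* records durable on disk before commit returns */
};

/*
 * Decide what a commit does, given the connection-level
 * transaction_sync=(enabled, method) values and the per-commit sync
 * configuration. The per-commit value, when set, takes precedence over
 * the connection-level enabled flag.
 */
static enum commit_action
resolve_commit_action(
    bool conn_sync_enabled, enum sync_method method, enum commit_sync sync)
{
    bool wait;

    assert(method == METHOD_FSYNC || method == METHOD_DSYNC ||
        method == METHOD_NONE);

    if (sync == COMMIT_SYNC_ON)
        wait = true;
    else if (sync == COMMIT_SYNC_OFF)
        wait = false;
    else
        wait = conn_sync_enabled; /* assumption: inherit the connection setting */

    if (!wait)
        return ACTION_BUFFER;

    /* The extent of durability is set by the configured method. */
    return method == METHOD_NONE ? ACTION_WRITE : ACTION_FLUSH;
}
```

For example, under this model a commit with sync=on on a connection opened with transaction_sync=(enabled=false) and the default fsync method resolves to a flush, while sync=off on a connection with syncing enabled resolves to buffering only.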
The durability of the write-ahead log can also be controlled independently via the WT_SESSION::log_flush method. WT_SESSION::log_flush supports several durability modes through its sync configuration, which act upon the log immediately.
If sync=on is configured then this flush will force the current log and all earlier records to be durable on disk before returning. This method call overrides the transaction_sync setting and forces the data out via fsync.
If sync=off is configured then this flush operation will force the logging subsystem to write any outstanding in-memory buffers to the file system before returning.