Version 10.0.2
Commit-level durability

WiredTiger supports checkpoint durability by default, and optionally commit-level durability when logging is enabled. In most applications, commit-level durability impacts performance more than checkpoint durability; checkpoints offer basic operation durability across application or system failure without impacting performance (although the creation of each checkpoint is a relatively heavy-weight operation). See Checkpoint durability for information on checkpoint durability.

Commit-level durability is implemented using a write-ahead log and is enabled using the log=(enabled) configuration to wiredtiger_open. When logging is enabled, WiredTiger writes records to the log for each transaction.

Transactions define which updates are made durable together; see Transactions for details. By default, log records are buffered in memory when written by WT_SESSION::commit_transaction. See Group commit for information about durability options.

When the transactional log is enabled, calling wiredtiger_open automatically performs a recovery step when opening the database that applies whatever changes from the log are required to bring the database up to date with the most recent transactional state. This recovery step may require extensions be available when it runs (for example, collators and compression). Therefore, applications doing recovery must configure extensions with the extensions keyword to wiredtiger_open consistently whenever re-opening the database.

Recovery is required after the failure of any thread of control in the application, where the failed thread might have been executing inside of the WiredTiger library or open WiredTiger handles have been lost. In most applications, if any thread of control exits unexpectedly, the application will close and re-open the database.

Checkpoints

Checkpoints of the database should still be performed periodically when commit-level durability is configured, either explicitly from the application or periodically based on elapsed time or data size with the checkpoint configuration to wiredtiger_open.

Database checkpoints are necessary for two reasons: First, log files can only be archived after a checkpoint completes, and so the frequency of checkpoints determines the disk space required by log files. Second, checkpoints bound the time required for recovery to complete after application or system failure by limiting the log records that need to be processed.
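As an illustration of periodic checkpoints alongside logging, the sketch below builds a configuration string that could be passed to wiredtiger_open. The helper name checkpoint_config is hypothetical, and the wait (seconds) and log_size (bytes) keys are taken from the wiredtiger_open checkpoint configuration as we understand it; check them against the wiredtiger_open reference for your release before relying on them.

```c
#include <assert.h>
#include <stdio.h>

/*
 * checkpoint_config (hypothetical helper) --
 *     Build a wiredtiger_open configuration string that enables logging and
 *     requests a checkpoint every wait_seconds seconds or after
 *     log_size_bytes bytes of log have been written.
 *     Returns 0 on success, -1 if the buffer is too small.
 */
static int
checkpoint_config(char *buf, size_t len, unsigned wait_seconds, unsigned long log_size_bytes)
{
    int n;

    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "log=(enabled),checkpoint=(wait=%u,log_size=%lu)",
        wait_seconds, log_size_bytes);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Shorter intervals keep fewer log files on disk and shorten recovery, at the cost of more frequent checkpoint work.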

Backups

With logging enabled, partial backups (backups where not all of the database objects are copied) may result in error messages during recovery, because data files referenced in the logs might not be found. Applications should either copy all objects and log files, if commit-level durability of the copied database is required, or copy only selected objects and no log files at all, then fall back to checkpoint durability when switching to the backup.

Bulk loads

Bulk-loads are not commit-level durable; that is, the creation and bulk-load of an object will not appear in the database log files. For this reason, applications doing incremental backups after a full backup should repeat the full backup step after doing a bulk-load to make the bulk-load durable. In addition, incremental backups after a bulk-load can cause recovery to report errors because there are log records that apply to data files which don't appear in the backup.

Log file archival

WiredTiger log files are named "WiredTigerLog.[number]" where "[number]" is a 10-digit value, for example, "WiredTigerLog.0000000001". The log file with the largest number in its name is the most recent log file written. The log file size can be set using the log configuration to wiredtiger_open.
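The naming rule can be sketched in a few lines of C. This is an illustrative helper set, not part of the WiredTiger API: the function names log_file_name, log_file_number and most_recent_log are hypothetical, and the code only assumes the "WiredTigerLog." prefix followed by exactly 10 decimal digits described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LOG_PREFIX "WiredTigerLog."
#define LOG_DIGITS 10

/* Format the log file name for a log number; returns 0, or -1 if buf is too small. */
static int
log_file_name(unsigned long long number, char *buf, size_t len)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len, LOG_PREFIX "%0*llu", LOG_DIGITS, number);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Parse a log file name; returns 0 and sets *number, or -1 if the name does not match. */
static int
log_file_number(const char *name, unsigned long long *number)
{
    size_t prefix_len = strlen(LOG_PREFIX);
    unsigned long long value = 0;
    const char *p;
    int digits = 0;

    if (strncmp(name, LOG_PREFIX, prefix_len) != 0)
        return -1;
    for (p = name + prefix_len; *p != '\0'; ++p, ++digits) {
        if (*p < '0' || *p > '9')
            return -1;
        value = value * 10 + (unsigned long long)(*p - '0');
    }
    if (digits != LOG_DIGITS)
        return -1;
    *number = value;
    return 0;
}

/* Return the index of the most recent log file in names, or -1 if none is a log file. */
static int
most_recent_log(const char *const *names, int count)
{
    unsigned long long best = 0, number;
    int i, best_index = -1;

    for (i = 0; i < count; ++i) {
        if (log_file_number(names[i], &number) != 0)
            continue; /* Not a log file, skip it. */
        if (best_index < 0 || number > best) {
            best = number;
            best_index = i;
        }
    }
    return best_index;
}
```

Because the number is zero-padded to a fixed width, sorting the file names lexicographically gives the same order as comparing the parsed numbers.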

By default, WiredTiger automatically removes log files no longer required for recovery. Applications wanting to archive log files instead must disable log file removal using the log=(archive=false) configuration to wiredtiger_open.

Log files may be removed or archived after a checkpoint has completed, as long as there's not a backup in progress. When performing Log-based Incremental backup, WT_SESSION::truncate can be used to remove log files after completing each incremental backup.

Immediately after the checkpoint has completed, only the most recent log file is needed for recovery, and all other log files can be removed or archived. Note that there must always be at least one log file for the database.

Open log cursors prevent WiredTiger from automatically removing log files. Therefore, we recommend proactively closing log cursors when done with them. Applications manually removing log files should take care that no log cursors are open when removing files; otherwise, errors may occur when trying to read a log record in a file that was removed.

Tuning commit-level durability

Group commit

WiredTiger automatically groups the flush operations for threads that commit concurrently into single calls. This usually means multi-threaded workloads will achieve higher throughput than single-threaded workloads because the operating system can flush data more efficiently to the disk. No application-level configuration is required for this feature.

Flush call configuration

By default, log records are written to an in-memory buffer before WT_SESSION::commit_transaction returns, giving highest performance but not ensuring durability. Stricter durability guarantees can be configured, at a cost in performance.

If transaction_sync=(enabled=false) is configured to wiredtiger_open, log records may be buffered in memory, and are only flushed to disk by checkpoints, when log files switch, or by calls to WT_SESSION::commit_transaction with sync=on. (Note that any call to WT_SESSION::commit_transaction with sync=on will flush the log records for all committed transactions, not just the transaction where the configuration is set.) This provides the minimal guarantees, but will be significantly faster than other configurations.

If transaction_sync=(enabled=true), transaction_sync=(method) further configures the method used to flush log records to disk. By default, the configured value is fsync, which calls the operating system's fsync call (or fdatasync if available) as each commit completes.

If the value is set to dsync, the O_DSYNC or O_SYNC flag to the operating system's open call will be specified when the file is opened. (The durability guarantees of the fsync and dsync configurations are the same, and in our experience the open flags are slower; this configuration is only included for systems where that may not be the case.)

If the value is set to none, the operating system's write call will be called as each commit completes but does not flush to disk. This setting gives durability at the application level but not at the system level.
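The difference between the none, fsync and dsync methods can be illustrated at the operating-system level. The following is a minimal POSIX sketch of the system calls involved, not WiredTiger's implementation; the names flush_method, open_log_file and append_record are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum flush_method { FLUSH_NONE, FLUSH_FSYNC };

/*
 * Open a file for appending. With use_dsync set, ask the operating system to
 * make every write synchronous (O_DSYNC where defined, otherwise O_SYNC).
 */
static int
open_log_file(const char *path, int use_dsync)
{
    int flags = O_WRONLY | O_CREAT | O_APPEND;

    if (use_dsync) {
#ifdef O_DSYNC
        flags |= O_DSYNC;
#else
        flags |= O_SYNC;
#endif
    }
    return open(path, flags, 0644);
}

/*
 * Append one record. FLUSH_NONE stops after write(): the data is in the
 * operating system's cache, so it survives an application crash but not
 * necessarily a system crash. FLUSH_FSYNC also calls fsync() so the data
 * reaches stable storage before returning. Returns 0 on success, -1 on error.
 */
static int
append_record(int fd, const void *rec, size_t len, enum flush_method method)
{
    const char *p = rec;
    ssize_t n;

    assert(fd >= 0 && rec != NULL);
    while (len > 0) {
        n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* Interrupted before writing anything, retry. */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    if (method == FLUSH_FSYNC && fsync(fd) != 0)
        return -1;
    return 0;
}
```

A file opened with use_dsync set needs no separate fsync call, which is why the dsync and fsync methods offer the same guarantee through different mechanisms.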

When a log file fills and the system moves to the next log file, the previous log file will always be flushed to disk prior to close. So when running in a durability mode that does not flush to disk, the risk is bounded by the most recent log file change.

Here is the expected performance of durability modes, in order from the fastest to the slowest (and from the fewest durability guarantees to the most durability guarantees).

Durability Mode                                               Notes
log=(enabled=false)                                           checkpoint-level durability
log=(enabled),transaction_sync=(enabled=false)                in-memory buffered logging configured; updates durable after checkpoint or after sync is set in WT_SESSION::begin_transaction
log=(enabled),transaction_sync=(enabled=true,method=none)     logging configured; updates durable after application failure, but not after system failure
log=(enabled),transaction_sync=(enabled=true,method=fsync)    logging configured; updates durable on application or system failure
log=(enabled),transaction_sync=(enabled=true,method=dsync)    logging configured; updates durable on application or system failure
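One way to use the table above is to pick the fastest configuration that still meets a durability requirement. The sketch below encodes the rows in table order (fastest first); the struct, the array and the function fastest_mode_for are hypothetical names, and the two boolean columns are our reading of the Notes column: whether every committed update survives an application failure or a system failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct durability_mode {
    const char *config;           /* Configuration string for wiredtiger_open. */
    bool survives_app_failure;    /* Committed updates survive application failure. */
    bool survives_system_failure; /* Committed updates survive system failure. */
};

/* Rows in the same order as the table: fastest first, strongest guarantees last. */
static const struct durability_mode durability_modes[] = {
    {"log=(enabled=false)", false, false},
    {"log=(enabled),transaction_sync=(enabled=false)", false, false},
    {"log=(enabled),transaction_sync=(enabled=true,method=none)", true, false},
    {"log=(enabled),transaction_sync=(enabled=true,method=fsync)", true, true},
    {"log=(enabled),transaction_sync=(enabled=true,method=dsync)", true, true},
};

/* Return the first (fastest) mode that provides every requested guarantee. */
static const struct durability_mode *
fastest_mode_for(bool need_app, bool need_system)
{
    size_t i, count = sizeof(durability_modes) / sizeof(durability_modes[0]);

    for (i = 0; i < count; ++i) {
        const struct durability_mode *m = &durability_modes[i];
        if ((!need_app || m->survives_app_failure) &&
            (!need_system || m->survives_system_failure))
            return m;
    }
    /* Not reached: the fsync and dsync rows satisfy every combination. */
    assert(!"no durability mode satisfies the request");
    return NULL;
}
```

The first two rows are marked false in both columns because updates committed since the last checkpoint or log flush may be lost in either kind of failure.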

The durability setting can also be controlled directly on a per-transaction basis via the WT_SESSION::commit_transaction method, whose sync configuration overrides the connection-level settings.

If sync=on is configured then this commit operation will wait for its log records, and all earlier ones, to be durable to the extent specified by the transaction_sync=(method) setting before returning.

If sync=off is configured then this commit operation will write its records into the in-memory buffer and return immediately.
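The interaction between the connection-level transaction_sync setting and the per-commit sync configuration can be modeled as a small decision function. This is a sketch of the behavior as described in the paragraphs above, not WiredTiger's code; all type and function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum commit_sync { SYNC_DEFAULT, SYNC_ON, SYNC_OFF };          /* Per-commit sync setting. */
enum sync_method { METHOD_NONE, METHOD_FSYNC, METHOD_DSYNC };  /* transaction_sync=(method). */

enum commit_result {
    COMMIT_BUFFERED, /* Records are in the in-memory buffer only. */
    COMMIT_WRITTEN,  /* Records were handed to the operating system (method=none). */
    COMMIT_ON_DISK   /* Records are on stable storage (method=fsync or dsync). */
};

/*
 * Model what a commit waits for. An explicit per-commit setting overrides the
 * connection-level transaction_sync=(enabled) value; when the commit waits,
 * the extent of durability is taken from transaction_sync=(method).
 */
static enum commit_result
commit_result_for(bool conn_sync_enabled, enum sync_method method, enum commit_sync sync)
{
    bool wait;

    assert(method == METHOD_NONE || method == METHOD_FSYNC || method == METHOD_DSYNC);
    wait = sync == SYNC_ON || (sync == SYNC_DEFAULT && conn_sync_enabled);
    if (!wait)
        return COMMIT_BUFFERED;
    return method == METHOD_NONE ? COMMIT_WRITTEN : COMMIT_ON_DISK;
}
```

For example, with transaction_sync=(enabled=false) on the connection, a single critical transaction can still be committed with sync=on while all other commits stay buffered.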

The durability of the write-ahead log can also be controlled independently via the WT_SESSION::log_flush method, whose sync configuration selects a durability mode that acts on the log immediately.

If sync=on is configured then this flush will force the current log and all earlier records to be durable on disk before returning. This method call overrides the transaction_sync setting and forces the data out via fsync.

If sync=off is configured then this flush operation will force the logging subsystem to write any outstanding in-memory buffers to the file system before returning.