WiredTiger supports checkpoint durability by default, and optionally commit-level durability when logging is enabled. In most applications, commit-level durability impacts performance more than checkpoint durability; checkpoints offer basic operation durability across application or system failure without impacting performance (although the creation of each checkpoint is a relatively heavy-weight operation). See Checkpoint durability in Java for information on checkpoint durability.
Commit-level durability is implemented using a write-ahead log and is enabled using the log=
(enabled) configuration to wiredtiger.open
. When logging is enabled, WiredTiger writes records to the log for each transaction.
Transactions define which updates are made durable together; see Transactions in Java for details. By default, log records are buffered in memory when written by Session.commit_transaction. See Group commit for information about durability options.
When the transactional log is enabled, calling wiredtiger.open
automatically performs a recovery step when opening the database that applies whatever changes from the log are required to bring the database up to date with the most recent transactional state. This recovery step may require extensions be available when it runs (for example, collators and compression). Therefore, applications doing recovery must configure extensions with the extensions
keyword to wiredtiger.open
consistently whenever re-opening the database.
Recovery is required after the failure of any thread of control in the application, where the failed thread might have been executing inside of the WiredTiger library or open WiredTiger handles have been lost. In most applications, if any thread of control exits unexpectedly, the application will close and re-open the database.
Checkpoints of the database should still be performed periodically when commit-level durability is configured, either explicitly from the application or periodically based on elapsed time or data size with the checkpoint
configuration to wiredtiger.open
.
Database checkpoints are necessary for two reasons: First, log files can only be archived after a checkpoint completes, and so the frequency of checkpoints determines the disk space required by log files. Second, checkpoints bound the time required for recovery to complete after application or system failure by limiting the log records that need to be processed.
With logging enabled, partial backups (backups where not all of the database objects are copied), may result in error messages during recovery, because data files referenced in the logs might not be found. Applications should either copy all objects and log files if commit-level durability of the copied database is required, or alternatively, copy only selected objects when backing up and not copy log files at all, then fall back to checkpoint durability when switching to the backup.
Bulk-loads are not commit-level durable, that is, the creation and bulk-load of an object will not appear in the database log files. For this reason, applications doing incremental backups after a full backup should repeat the full backup step after doing a bulk-load to make the bulk-load durable. In addition, incremental backups after a bulk-load can cause recovery to report errors because there are log records that apply to data files which don't appear in the backup.
WiredTiger log files are named "WiredTigerLog.[number]" where "[number]" is a 10-digit value, for example, WiredTigerLog.0000000001". The log file with the largest number in its name is the most recent log file written. The log file size can be set using the log
configuration to wiredtiger.open
.
By default, WiredTiger automatically removes log files no longer required for recovery. Applications wanting to archive log files instead must disable log file removal using the log=
(archive=false) configuration to wiredtiger.open
.
Log files may be removed or archived after a checkpoint has completed, as long as there's not a backup in progress. Immediately after the checkpoint has completed, only the most recent log file is needed for recovery, and all other log files can be removed or archived. Note that there must always be at least one log file for the database.
Open log cursors prevents WiredTiger from automatically removing log files. Therefore, we recommend proactively closing log cursors when done with them. Applications manually removing log files should take care that no log cursors are opened in the log when removing files or errors may occur when trying to read a log record in a file that was removed.
WiredTiger automatically groups the flush operations for threads that commit concurrently into single calls. This usually means multi-threaded workloads will achieve higher throughput than single-threaded workloads because the operating system can flush data more efficiently to the disk. No application-level configuration is required for this feature.
By default, log records are written to an in-memory buffer before WT_SESSION::commit_transaction returns, giving highest performance but not ensuring durability. The durability guarantees can be stricter but will impact performance.
If transaction_sync=
(enabled=false) is configured to wiredtiger_open, log records may be buffered in memory, and only flushed to disk by checkpoints, when log files switch or calls to WT_SESSION::commit_transaction with sync=on
. (Note that any call to WT_SESSION::commit_transaction with sync=on
will flush the log records for all committed transactions, not just the transaction where the configuration is set.) This provides the minimal guarantees, but will be significantly faster than other configurations.
If transaction_sync=
(enabled=true), transaction_sync=
(method) further configures the method used to flush log records to disk. By default, the configured value is fsync
, which calls the operating system's fsync
call (of fdatasync
if available) as each commit completes.
If the value is set to dsync
, the O_DSYNC
or O_SYNC
flag to the operating system's open
call will be specified when the file is opened. (The durability guarantee of the fsync
and dsync
configurations are the same, and in our experience the open
flags are slower, this configuration is only included for systems where that may not be the case.)
If the value is set to none
, the operating system's write
call will be called as each commit completes but does not flush to disk. This setting gives durability at the application level but not at the system level.
When a log file fills and the system moves to the next log file, the previous log file will always be flushed to disk prior to close. So when running in a durability mode that does not flush to disk, the risk is bounded by the most recent log file change.
Here is the expected performance of durability modes, in order from the fastest to the slowest (and from the fewest durability guarantees to the most durability guarantees).
Durability Mode | Notes |
---|---|
log=(enabled=false) | checkpoint-level durability |
log=(enabled),transaction_sync=(enabled=false) | in-memory buffered logging configured; updates durable after checkpoint or after sync is set in WT_SESSION::begin_transaction |
log=(enabled),transaction_sync=(enabled=true,method=none) | logging configured; updates durable after application failure, but not after system failure |
log=(enabled),transaction_sync=(enabled=true,method=fsync) | logging configured; updates durable on application or system failure |
log=(enabled),transaction_sync=(enabled=true,method=dsync) | logging configured; updates durable on application or system failure |
The durability setting can also be controlled directly on a per-transaction basis via the WT_SESSION::commit_transaction method. The WT_SESSION::commit_transaction supports several durability modes with the sync
configuration that override the connection level settings.
If sync=on
is configured then this commit operation will wait for its log records, and all earlier ones, to be durable to the extent specified by the transaction_sync=
(method) setting before returning.
If sync=off
is configured then this commit operation will write its records into the in-memory buffer and return immediately.
If sync=background
is configured then this commit operation will write its record to an in-memory buffer, and will return. Prior to returning it will signal an internal WiredTiger worker thread to synchronize this log record. The caller may then check the status of that background synchronization with the WT_SESSION::transaction_sync method.
The durability of the write-ahead log can be controlled independently as well via the WT_SESSION::log_flush method. The WT_SESSION::log_flush supports several durability modes with the sync
configuration that immediately act upon the log.
If sync=on
is configured then this flush will force the current log and all earlier records to be durable on disk before returning. This method call overrides the transaction_sync
setting and forces the data out via fsync
.
If sync=off
is configured then this flush operation will force the logging subsystem to write any outstanding in-memory buffers to the file system before returning.
If sync=background
is configured then this flush operation will force the signalling of a background synchronization operation. The caller may then check the status of that background synchronization with the WT_SESSION::transaction_sync method.