Version 3.0.0
Checkpoint durability in Java

WiredTiger supports checkpoint durability by default, and optionally commit-level durability when logging is enabled. In most applications, commit-level durability impacts performance more than checkpoint durability; checkpoints offer basic operation durability across application or system failure without impacting performance (although the creation of each checkpoint is a relatively heavy-weight operation). See Commit-level durability in Java for information on commit-level durability.

Checkpoints with transactional logging enabled allow logging to archive older log files (if configured to do so) while still retaining commit-level durability and the ability to run recovery.

A checkpoint is automatically created whenever a modified data source is closed. Additionally, checkpoints of one or more data sources, or the entire database, can be explicitly created using the Session.checkpoint method. Checkpoints can also be scheduled periodically based on elapsed time or data size with the checkpoint configuration to wiredtiger.open.
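As a rough illustration of the explicit form, the sketch below builds configuration strings that could be passed to Session.checkpoint. CheckpointConfig and forTargets are hypothetical helper names, not part of WiredTiger, and the table URIs are placeholders. The target key with a quoted URI list follows the WiredTiger configuration string syntax. The call itself appears only in a comment, since it needs an open Session.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper (not part of WiredTiger) that builds Session.checkpoint configuration strings. */
final class CheckpointConfig {
    private CheckpointConfig() {}

    /**
     * Builds a target list such as target=("table:a","table:b").
     * An empty list yields null, which asks for a checkpoint of the entire database.
     */
    static String forTargets(List<String> uris) {
        if (uris.isEmpty()) {
            return null;
        }
        return uris.stream()
                .map(uri -> "\"" + uri + "\"")
                .collect(Collectors.joining(",", "target=(", ")"));
    }

    public static void main(String[] args) {
        // Placeholder URIs; substitute the application's own data sources.
        String config = forTargets(Arrays.asList("table:orders", "table:customers"));
        System.out.println(config);
        // With an open com.wiredtiger.db.Session named session, the call would be:
        //   session.checkpoint(config);   // checkpoint only the listed data sources
        //   session.checkpoint(null);     // checkpoint the entire database
    }
}
```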

All transactional updates committed before a checkpoint are made durable by the checkpoint; the frequency of checkpoints therefore limits the volume of data that may be lost due to application or system failure. This guarantee has one exception: if a crash occurs while a backup cursor is open, the system will be restored to the most recent checkpoint taken before the backup cursor was opened, even if later database checkpoints were completed.

Data sources that are involved in an exclusive operation when the checkpoint starts, including bulk load, verify or salvage, will be skipped by the checkpoint. Operations requiring exclusive access may fail with an EBUSY error if attempted during a checkpoint.

When data sources are first opened, they are opened in the state of the most recent checkpoint taken on the file; in other words, updates made after the most recent checkpoint will not appear in the data source. If no checkpoint is found when the data source is opened, the data source will appear empty.

Automatic checkpoints

WiredTiger will automatically checkpoint the entire database when the checkpoint configuration parameter to wiredtiger.open is set. When this configuration is used, an internal server thread is created.

The period between checkpoints can be defined in seconds via wait, or, if logging is enabled, as the number of bytes written to the log since the last checkpoint via log_size, or both. If both are defined, the checkpoint occurs as soon as either threshold is reached, and both are reset once the checkpoint is complete. If using log_size, we recommend that the size selected be a multiple of the log file size for archiving purposes.
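The sketch below shows one way these settings might be put together. AutoCheckpointConfig and its methods are hypothetical helpers, not part of WiredTiger, and the numeric values are arbitrary. It rounds a desired log_size up to a multiple of the log file size, builds a wiredtiger.open configuration string using the checkpoint and log keys, and models the "either threshold" rule as a simple predicate. The predicate assumes both thresholds are set to positive values.

```java
/** Hypothetical helper (not part of WiredTiger) for automatic checkpoint settings. */
final class AutoCheckpointConfig {
    private AutoCheckpointConfig() {}

    /** Rounds a desired log_size up to the next multiple of the log file size. */
    static long roundUpToLogFiles(long desiredBytes, long logFileBytes) {
        if (desiredBytes <= 0 || logFileBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long files = (desiredBytes + logFileBytes - 1) / logFileBytes;
        return files * logFileBytes;
    }

    /** Builds a wiredtiger.open configuration with logging and both checkpoint thresholds. */
    static String openConfig(long waitSeconds, long logSizeBytes, long logFileBytes) {
        return "create,log=(enabled,file_max=" + logFileBytes + "),"
                + "checkpoint=(wait=" + waitSeconds + ",log_size=" + logSizeBytes + ")";
    }

    /** Toy model of the trigger rule: a checkpoint is due once either threshold is reached. */
    static boolean checkpointDue(long elapsedSeconds, long logBytesWritten,
                                 long waitSeconds, long logSizeBytes) {
        return elapsedSeconds >= waitSeconds || logBytesWritten >= logSizeBytes;
    }

    public static void main(String[] args) {
        long logFileBytes = 100L * 1024 * 1024;                              // assumed log file size
        long logSize = roundUpToLogFiles(250L * 1024 * 1024, logFileBytes);  // rounds up to 3 log files
        String config = openConfig(60, logSize, logFileBytes);
        System.out.println(config);
        // With the WiredTiger Java library on the classpath, the call would be:
        //   Connection conn = wiredtiger.open("WT_HOME", config);
    }
}
```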

Checkpoint cursors

Cursors are normally opened in the most recent version of a data source. However, a checkpoint configuration string may be provided to Session.open_cursor, opening a read-only, static view of the data source. This provides a limited form of time-travel, as the static view is not changed by subsequent checkpoints, and will persist until the checkpoint cursor is closed.
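A minimal sketch of the configuration string involved, assuming a checkpoint with the placeholder name "midnight" already exists. CheckpointCursorConfig is a hypothetical helper, not part of WiredTiger; the Session.open_cursor call is shown only in a comment because it requires an open Session.

```java
/** Hypothetical helper (not part of WiredTiger) for checkpoint cursor configuration strings. */
final class CheckpointCursorConfig {
    private CheckpointCursorConfig() {}

    /** Builds the configuration that asks Session.open_cursor for a view of the given checkpoint. */
    static String forCheckpoint(String checkpointName) {
        if (checkpointName == null || checkpointName.isEmpty()) {
            throw new IllegalArgumentException("checkpoint name is required");
        }
        return "checkpoint=" + checkpointName;
    }

    public static void main(String[] args) {
        String config = forCheckpoint("midnight"); // placeholder checkpoint name
        System.out.println(config);
        // With an open com.wiredtiger.db.Session named session:
        //   Cursor cursor = session.open_cursor("table:orders", null, config);
        //   ... read from the static, read-only view ...
        //   cursor.close();   // the view persists until the cursor is closed
    }
}
```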

Checkpoint naming

Checkpoints that do not include LSM trees may optionally be given names by the application. Because named checkpoints persist until discarded or replaced, they can be used to periodically snapshot data for later use.

Checkpoints named by the application persist until explicitly discarded or the application creates a new checkpoint with the same name (which replaces the previous checkpoint of that name). If the previous checkpoint cannot be replaced, either because a cursor is reading from the previous checkpoint, or backups are in progress, the checkpoint will fail.

Internal checkpoints (that is, checkpoints not named by the application) use the reserved name "WiredTigerCheckpoint". Applications can open the most recent of these checkpoints by specifying "WiredTigerCheckpoint" as the checkpoint name to Session.open_cursor. Creating a new internal checkpoint drops all previous internal checkpoints, if possible; if a previous internal checkpoint cannot be dropped for any reason, the checkpoint will ignore the previous checkpoint and continue. Subsequent checkpoints will drop those ignored checkpoints when it becomes possible.
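The replacement rules above can be hard to keep straight, so here is a toy in-memory model of them. It is not WiredTiger code: CheckpointNamingModel, Entry, and the pinned flag (standing in for "a cursor is reading from this checkpoint") are all invented for illustration, and the model ignores backups and the actual on-disk naming of internal checkpoints.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory model of the checkpoint naming rules; it is not WiredTiger code. */
final class CheckpointNamingModel {
    static final String INTERNAL = "WiredTigerCheckpoint";

    /** One checkpoint: its name, a creation sequence number, and whether a cursor is reading from it. */
    static final class Entry {
        final String name;
        final int id;
        boolean pinned;

        Entry(String name, int id) {
            this.name = name;
            this.id = id;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private int nextId = 1;

    /**
     * Creates a checkpoint. A null name means an internal checkpoint.
     * A named checkpoint fails if the previous one of that name is pinned;
     * an internal checkpoint skips pinned predecessors and drops the rest.
     */
    Entry checkpoint(String name) {
        boolean internal = (name == null);
        String effective = internal ? INTERNAL : name;
        if (!internal) {
            for (Entry e : entries) {
                if (e.name.equals(effective) && e.pinned) {
                    throw new IllegalStateException("cannot replace checkpoint in use: " + effective);
                }
            }
        }
        // Drop every same-named predecessor that is not pinned.
        entries.removeIf(e -> e.name.equals(effective) && !e.pinned);
        Entry created = new Entry(effective, nextId++);
        entries.add(created);
        return created;
    }

    /** Counts the checkpoints currently held under the given name. */
    long count(String name) {
        return entries.stream().filter(e -> e.name.equals(name)).count();
    }

    public static void main(String[] args) {
        CheckpointNamingModel model = new CheckpointNamingModel();
        Entry first = model.checkpoint(null);
        first.pinned = true;              // a cursor is reading the first internal checkpoint
        model.checkpoint(null);           // the pinned predecessor is skipped, not dropped
        System.out.println(model.count(INTERNAL));
        first.pinned = false;             // the cursor is closed
        model.checkpoint(null);           // earlier internal checkpoints can now be dropped
        System.out.println(model.count(INTERNAL));
    }
}
```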

The -c option to the list command of the wt command-line utility lists a data source's checkpoints, with timestamps, in a human-readable format.

Checkpoints and file compaction

Checkpoints share file blocks, and dropping a checkpoint may or may not make file blocks available for re-use, depending on whether the dropped checkpoint contained the last reference to a file block. Because checkpoints named by the application are not discarded until explicitly discarded or replaced, they may prevent Session.compact from accomplishing anything due to shared file blocks.
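To make the block-sharing point concrete, the following toy model treats each checkpoint as a set of block numbers and reports which blocks become reusable when a checkpoint is dropped. SharedBlockModel and the block numbers are invented for illustration; WiredTiger's real block manager is considerably more involved.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of checkpoints sharing file blocks; it is not WiredTiger code. */
final class SharedBlockModel {
    private final Map<String, Set<Integer>> checkpoints = new LinkedHashMap<>();

    /** Records a checkpoint and the blocks it references. */
    void add(String name, Integer... blocks) {
        checkpoints.put(name, new TreeSet<>(Arrays.asList(blocks)));
    }

    /** Drops a checkpoint and returns the blocks no remaining checkpoint references. */
    Set<Integer> drop(String name) {
        Set<Integer> dropped = checkpoints.remove(name);
        if (dropped == null) {
            throw new IllegalArgumentException("no such checkpoint: " + name);
        }
        Set<Integer> freed = new TreeSet<>(dropped);
        for (Set<Integer> remaining : checkpoints.values()) {
            freed.removeAll(remaining); // still referenced elsewhere, so not reusable
        }
        return freed;
    }

    public static void main(String[] args) {
        SharedBlockModel model = new SharedBlockModel();
        model.add("midnight", 1, 2, 3);              // named checkpoint kept by the application
        model.add("WiredTigerCheckpoint", 2, 3, 4);  // later internal checkpoint sharing blocks 2 and 3
        // Only block 4 is freed; blocks 2 and 3 are still held by "midnight".
        System.out.println(model.drop("WiredTigerCheckpoint"));
        // Dropping the named checkpoint releases the last references to blocks 1, 2 and 3.
        System.out.println(model.drop("midnight"));
    }
}
```

In this model, as in the text above, a long-lived named checkpoint keeps its blocks referenced, which is why compaction may have nothing to reclaim until that checkpoint is discarded or replaced.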