Version 10.0.2
Managing the global timestamp state

Applications using timestamps need to manage the global timestamp state. These timestamps are queried using the WT_CONNECTION::query_timestamp method and set using the WT_CONNECTION::set_timestamp method.
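As a rough illustration of how these two methods are usually driven from C, the sketch below builds a configuration string of the kind passed to WT_CONNECTION::set_timestamp and decodes a string of the kind returned by WT_CONNECTION::query_timestamp. It assumes timestamps are exchanged as hexadecimal strings; the helper names format_global_timestamps and parse_hex_timestamp are hypothetical and are not part of WiredTiger.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not a WiredTiger function): build a configuration
 * string such as "oldest_timestamp=a,stable_timestamp=ff".
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int
format_global_timestamps(char *buf, size_t len, uint64_t oldest, uint64_t stable)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len,
        "oldest_timestamp=%" PRIx64 ",stable_timestamp=%" PRIx64, oldest, stable);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: decode a hexadecimal timestamp string.
 * Returns 0 on success, -1 if the string is empty or has trailing
 * characters that are not hexadecimal digits.
 */
static int
parse_hex_timestamp(const char *hex, uint64_t *tsp)
{
    char *end;

    assert(tsp != NULL);
    if (hex == NULL || strlen(hex) == 0)
        return -1;
    *tsp = strtoull(hex, &end, 16);
    return *end == '\0' ? 0 : -1;
}
```

Assuming an open connection handle named conn, the formatted string would be passed as the configuration argument of conn->set_timestamp, and the buffer filled in by conn->query_timestamp could be decoded with parse_hex_timestamp.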

Applications will generally focus on two of these global timestamps: first, the beginning of the database's read window, called oldest_timestamp, and second, the application time at which data is known to be stable and will never be rolled back, called stable_timestamp.

The oldest timestamp is the oldest time at which a transaction is allowed to read. The historic values of data modified before this time can no longer be read by new transactions. (Transactions already in progress are not affected when the oldest_timestamp changes.)

The stable timestamp is the earliest time at which data is considered stable. (Data is said to be stable when it is not only durable, but additionally, transactions committed at or before the stable time cannot be rolled back by application-level transaction management.) The stable timestamp is saved along with every checkpoint, and that saved time is the point to which the database is recovered after a crash. It is also the earliest point to which the database can be returned via an explicit WT_CONNECTION::rollback_to_stable call. (See Using rollback-to-stable with timestamps.) All transactions must commit after the current stable timestamp.

Applications are responsible for managing these timestamps and periodically updating them. Applications must move both timestamps forward relative to the amount of data being written; not updating them can result in pinning large amounts of data in the cache, which can in turn affect application performance.

Note that updating the stable timestamp does not itself trigger a database checkpoint or change data durability; updates before the new stable timestamp become stable (and thus visible after recovery) as part of the next checkpoint.

Setting global timestamps

The following table lists the global timestamps an application can set using the WT_CONNECTION::set_timestamp method, including constraints.

| Timestamp | Constraints | Description |
|---|---|---|
| durable_timestamp | None | Reset the maximum durable timestamp (see Using transaction prepare with timestamps for discussion of the durable timestamp). |
| oldest_timestamp | <= stable; may not move backward; set to the value as of the last checkpoint during recovery | Inform the system that future reads and writes will never be earlier than the specified timestamp. |
| stable_timestamp | May not move backward; set to the recovery timestamp during recovery | Inform the system that checkpoints should not include commits newer than the specified timestamp. |

Setting the global "durable_timestamp" timestamp

The global durable_timestamp functions as application input into the timestamp returned by querying the all_durable timestamp (see below). It is not otherwise used; in particular, moving it forward does not force data to disk. To force data to disk, advance the stable timestamp as desired and take a checkpoint.

Warning
The durable_timestamp is not changed when the application calls the WT_CONNECTION::rollback_to_stable method, and should be reset by the application in that case.

Setting the "oldest_timestamp" timestamp

Setting oldest_timestamp indicates future read timestamps will be at least as recent as the timestamp, allowing WiredTiger to discard history before the specified point. It is not required that there be no currently active readers at earlier timestamps: this setting only indicates future application needs. In other words, as active readers age out of the system, historic data up to the oldest timestamp will be discarded, but no historic data at or after the oldest timestamp will be discarded. It is critical the application update the oldest timestamp frequently as retaining history can be expensive in terms of both cache and I/O resources.

During recovery, the oldest_timestamp is set to its value as of the checkpoint to which recovery is done.

Attempting to set the oldest_timestamp to a value earlier than its current value will be silently ignored.

The oldest_timestamp must be less than or equal to the stable_timestamp.

Setting the "stable_timestamp" timestamp

The stable_timestamp determines the timestamp for subsequent checkpoints. In other words, updates to an object committed after the stable timestamp are not included in a checkpoint until the stable timestamp advances past them. Because tables that use timestamps generally rely on checkpoint durability, the stable_timestamp also determines the point to which recovery will be done after failures.
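To make the checkpoint rule concrete, the following simplified model counts which updates a checkpoint would capture for a given stable timestamp. It assumes each update is described only by its commit timestamp and ignores prepared transactions and durable timestamps; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified model: an update is captured by a checkpoint only if its
 * commit timestamp is not newer than the stable timestamp.
 * Returns the number of updates that would be captured.
 */
static size_t
count_checkpointed(const uint64_t *commit_ts, size_t n, uint64_t stable)
{
    size_t i, included = 0;

    assert(commit_ts != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (commit_ts[i] <= stable)
            ++included;
    return included;
}
```

With commit timestamps 10, 20, 30 and 40 and a stable timestamp of 30, the model reports three captured updates; the update at 40 would be lost after a crash and recovery from that checkpoint.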

During recovery, the stable_timestamp is set to the value to which recovery is performed.

It is possible to explicitly roll back to a time after, or equal to, the current stable_timestamp using the WT_CONNECTION::rollback_to_stable method. (See Using rollback-to-stable with timestamps.)

The stable_timestamp should be updated frequently. It generally determines how much work must be done after recovery to bring the system online, and advancing it allows WiredTiger to write data from the cache, so it affects overall performance.

Attempting to set the stable_timestamp to a value earlier than its current value will be silently ignored.

Using the stable timestamp in the checkpoint is not required, and can be overridden using the use_timestamp=false configuration of the WT_SESSION::checkpoint call. This is not intended for general use, but can be useful for backup scenarios where rolling back to a stable timestamp isn't possible and it's useful for a checkpoint to contain the most recent possible data.

Forcing global timestamps

Warning
The WT_CONNECTION::set_timestamp method includes a force flag which allows applications to violate the usual constraints. For example, this can be used to move stable_timestamp backward. Violating these constraints can lead to undefined or incorrect behavior. For example, while moving oldest_timestamp backward with this mechanism may make some historical data accessible again, other obsolete historical data may have already been discarded, and no guarantees of consistency are made.

Querying global timestamps

The following table lists the global timestamps an application can query using the WT_CONNECTION::query_timestamp method, including constraints.

| Timestamp | Constraints | Description |
|---|---|---|
| all_durable | None | The largest timestamp such that all timestamps up to that value have been made durable (see Using transaction prepare with timestamps for discussion of the durable timestamp). |
| last_checkpoint | <= stable | The stable timestamp at which the last checkpoint ran (or 0 if no checkpoints have run). |
| oldest_reader | None | The timestamp of the oldest currently active read transaction. |
| oldest_timestamp | <= stable | The current application-set oldest_timestamp value. |
| pinned | <= oldest | The minimum of the oldest_timestamp and the oldest active reader. |
| recovery | <= stable | The stable timestamp used in the most recent checkpoint prior to the last shutdown (or 0 if none available). |
| stable_timestamp | None | The current application-set stable_timestamp value. |

In all cases, WT_NOTFOUND is returned if there is no matching timestamp.
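A query selects one of the timestamps in the table above by name. The sketch below builds the configuration string for such a query, assuming the selection is expressed as get=<name>; the helper name and the name list (copied from the table) are illustrative and not part of WiredTiger. The buffer length constant reflects 16 hexadecimal digits for a 64-bit timestamp plus a terminating NUL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Room for a hex-encoded 64-bit timestamp and its NUL terminator. */
#define HEX_TS_BUFLEN 17

/* Names listed as queryable in the table above. */
static const char *const queryable[] = {
    "all_durable", "last_checkpoint", "oldest_reader", "oldest_timestamp",
    "pinned", "recovery", "stable_timestamp"
};

/*
 * Hypothetical helper: write "get=<name>" into buf if name is one of the
 * queryable timestamps. Returns 0 on success, -1 if the name is unknown or
 * the buffer is too small.
 */
static int
format_query_config(char *buf, size_t len, const char *name)
{
    size_t i;
    int n;

    assert(buf != NULL && name != NULL);
    for (i = 0; i < sizeof(queryable) / sizeof(queryable[0]); i++)
        if (strcmp(name, queryable[i]) == 0) {
            n = snprintf(buf, len, "get=%s", name);
            return (n < 0 || (size_t)n >= len) ? -1 : 0;
        }
    return -1;
}
```

Assuming a connection handle named conn, the result would be used as the configuration argument of conn->query_timestamp together with a char buffer of HEX_TS_BUFLEN bytes, checking the return value for WT_NOTFOUND.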

Reading the "all_durable" timestamp

The all_durable timestamp is the minimum of the global durable_timestamp (as set by the application), and the durable timestamps of all currently running transactions in the system.
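The definition above amounts to a minimum over a set of values, which can be written out as a small model. This is an illustration of the stated rule only, with a hypothetical function name; it does not reflect how WiredTiger tracks running transactions internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model of all_durable: the minimum of the application-set global
 * durable_timestamp and the durable timestamps of the running transactions.
 * With no running transactions the global value is returned unchanged.
 */
static uint64_t
model_all_durable(uint64_t global_durable, const uint64_t *txn_durable, size_t n)
{
    uint64_t result = global_durable;
    size_t i;

    assert(txn_durable != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (txn_durable[i] < result)
            result = txn_durable[i];
    return result;
}
```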

Warning
Reading the all_durable timestamp does not prevent additional prepared but uncommitted transactions from appearing. In particular it is possible for the all_durable value read by the application to be obsolete by the time the application receives it, if a transaction prepares setting an earlier durable timestamp. Applications using this value to make operational decisions are responsible for coordinating their own actions (for example by using high-level locks) to ensure its validity.

The all_durable timestamp is read-only.

Reading the "last_checkpoint" timestamp

The last_checkpoint timestamp is the stable timestamp of the previous checkpoint (or 0 if no checkpoints have run). It generalizes the recovery timestamp: the recovery timestamp reflects the checkpoint used at startup, while last_checkpoint reflects the most recent checkpoint. Generally, applications are expected to use the recovery timestamp and not the last_checkpoint timestamp.

The last_checkpoint timestamp is read-only.

Reading the "oldest_reader" timestamp

The oldest_reader timestamp is the oldest transaction read timestamp active, including the read timestamp of any running checkpoint.

The oldest_reader timestamp is read-only.

Reading the "oldest_timestamp" timestamp

The oldest_timestamp is the current oldest_timestamp as set by the application.

Reading the "pinned" timestamp

The pinned timestamp is the minimum of oldest_timestamp and the oldest active reader, including any running checkpoint. It is not the same as oldest_timestamp because the oldest timestamp can be advanced past currently active readers, leaving a reader as the earliest timestamp in the system. Applications can use the pinned timestamp to understand the earliest data required by any reader in the system.
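A minimal model of this rule follows, showing how a reader that started before the oldest timestamp was advanced keeps the pinned value behind it. The function name is hypothetical, and the reader list is assumed to already include the read timestamp of any running checkpoint.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model of the pinned timestamp: the minimum of oldest_timestamp and the
 * read timestamps of all active readers. With no active readers the result
 * is oldest_timestamp itself.
 */
static uint64_t
model_pinned(uint64_t oldest, const uint64_t *reader_ts, size_t n)
{
    uint64_t pinned = oldest;
    size_t i;

    assert(reader_ts != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (reader_ts[i] < pinned)
            pinned = reader_ts[i];
    return pinned;
}
```

For example, with oldest_timestamp at 50 and active readers at 40 and 70, the model yields 40: history back to 40 must be retained until the older reader finishes.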

The pinned timestamp is read-only.

Reading the "recovery" timestamp

The recovery timestamp is the stable timestamp to which recovery was performed on startup. Applications can use the recovery timestamp to retrieve the value the stable timestamp had at system startup.

The recovery timestamp is read-only.

Reading the "stable_timestamp" timestamp

The stable_timestamp is the current stable_timestamp as set by the application.