Applications using timestamps need to manage the global timestamp state. These timestamps are queried using the WT_CONNECTION::query_timestamp method and set using the WT_CONNECTION::set_timestamp method.
Applications will generally focus on two of these global timestamps: first, the beginning of the database's read window, called oldest_timestamp, and second, the application time at which data is known to be stable and will never be rolled back, called stable_timestamp.
The oldest timestamp is the oldest time at which a transaction is allowed to read. The historic values of data modified before this time can no longer be read by new transactions. (Transactions already in progress are not affected when the oldest_timestamp changes.)
The stable timestamp is the earliest time at which data is considered stable. (Data is said to be stable when it is not only durable, but additionally, transactions committed at or before the stable time cannot be rolled back by application-level transaction management.) The stable timestamp is saved along with every checkpoint, and that saved time is the point to which the database is recovered after a crash. It is also the earliest point to which the database can be returned via an explicit WT_CONNECTION::rollback_to_stable call. (See Using rollback-to-stable with timestamps.) All transactions must commit after the current stable timestamp.
Applications are responsible for managing these timestamps and periodically updating them. Applications must move both timestamps forward relative to the amount of data being written; not updating them can result in pinning large amounts of data in the cache, which can in turn affect application performance.
Note that updating the stable timestamp does not itself trigger a database checkpoint or change data durability; updates before the new stable timestamp become stable (and thus visible after recovery) as part of the next checkpoint.
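The relationship between the stable timestamp, checkpoints, and recovery described above can be sketched with a small in-memory model. This is an illustration only, not WiredTiger code: the `StableModel` class and its methods are hypothetical names, and the model tracks nothing but the two timestamps involved.

```python
class StableModel:
    """Toy model of the stable timestamp and the value saved by the last checkpoint."""

    def __init__(self):
        self.stable = 0            # current application-set stable timestamp
        self.last_checkpoint = 0   # stable timestamp saved with the last checkpoint (0 if none)

    def set_stable(self, ts):
        # Advancing stable writes nothing by itself; it only changes
        # what the next checkpoint will capture.
        self.stable = ts

    def checkpoint(self):
        # The stable timestamp is saved along with every checkpoint.
        self.last_checkpoint = self.stable

    def recover(self):
        # After a crash, the database returns to the stable timestamp
        # saved by the most recent checkpoint.
        self.stable = self.last_checkpoint
        return self.stable


model = StableModel()
model.set_stable(10)
model.checkpoint()
model.set_stable(20)          # stable advanced, but no checkpoint taken at 20
recovered = model.recover()   # a crash here recovers to 10, not 20
```

In this model, advancing the stable timestamp to 20 has no effect on what survives a crash until `checkpoint()` runs again.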
The following table lists the global timestamps an application can set using the WT_CONNECTION::set_timestamp method, including constraints.
Timestamp | Constraints | Description |
---|---|---|
durable_timestamp | None | Reset the maximum durable timestamp (see Using transaction prepare with timestamps for discussion of the durable timestamp). |
oldest_timestamp | <= stable; may not move backward, set to the value as of the last checkpoint during recovery | Inform the system future reads and writes will never be earlier than the specified timestamp. |
stable_timestamp | may not move backward, set to the recovery timestamp during recovery | Inform the system checkpoints should not include commits newer than the specified timestamp. |
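As a sketch of how these values might be passed, the helper below builds a configuration string for WT_CONNECTION::set_timestamp. It assumes WiredTiger's convention that timestamps in configuration strings are written in hexadecimal without a prefix. The helper name `set_timestamp_config` is hypothetical, and the commented-out call assumes an open connection object from WiredTiger's Python bindings.

```python
def set_timestamp_config(oldest=None, stable=None, durable=None):
    """Build a set_timestamp configuration string from integer timestamps."""
    parts = []
    for key, value in (("oldest_timestamp", oldest),
                       ("stable_timestamp", stable),
                       ("durable_timestamp", durable)):
        if value is not None:
            parts.append(f"{key}={value:x}")   # hexadecimal, no 0x prefix
    return ",".join(parts)


config = set_timestamp_config(oldest=10, stable=255)
# conn.set_timestamp(config)   # assumed usage; requires an open connection
```

The helper only formats values; it does not check the constraints listed in the table.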
The global durable_timestamp functions as application input into the timestamp returned by querying the all_durable timestamp (see below). It is not otherwise used; in particular, moving it forward does not force data to disk. To force data to disk, advance the stable timestamp as desired and take a checkpoint.
The durable_timestamp is not changed when the application calls the WT_CONNECTION::rollback_to_stable method, and should be reset by the application in that case.
Setting oldest_timestamp indicates future read timestamps will be at least as recent as the timestamp, allowing WiredTiger to discard history before the specified point. It is not required that there be no currently active readers at earlier timestamps: this setting only indicates future application needs. In other words, as active readers age out of the system, historic data up to the oldest timestamp will be discarded, but no historic data at or after the oldest timestamp will be discarded. It is critical that the application update the oldest timestamp frequently, as retaining history can be expensive in terms of both cache and I/O resources.
During recovery, the oldest_timestamp is set to its value as of the checkpoint to which recovery is done.
Attempting to set the oldest_timestamp to a value earlier than its current value will be silently ignored.
The oldest_timestamp must be less than or equal to the stable_timestamp.
The stable_timestamp determines the timestamp for subsequent checkpoints. In other words, updates to an object after the stable timestamp will not be included in a future checkpoint. Because tables in a timestamp world are generally using checkpoint durability, the stable_timestamp also determines the point to which recovery will be done after failures.
During recovery, the stable_timestamp is set to the value to which recovery is performed.
It is possible to explicitly roll back to a time after, or equal to, the current stable_timestamp using the WT_CONNECTION::rollback_to_stable method. (See Using rollback-to-stable with timestamps.)
The stable_timestamp should be updated frequently, as it generally determines the work to be done after recovery to bring the system online, and also allows WiredTiger to write data from the cache; for both reasons it affects overall performance.
Attempting to set the stable_timestamp to a value earlier than its current value will be silently ignored.
Using the stable timestamp in the checkpoint is not required, and can be overridden using the use_timestamp=false configuration of the WT_SESSION::checkpoint call. This is not intended for general use, but can be useful for backup scenarios where rolling back to a stable timestamp isn't possible and it's useful for a checkpoint to contain the most recent possible data.
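The ordering rules for the two application-managed timestamps can be summarized in a short model. This is a sketch under stated assumptions, not WiredTiger's implementation: the `GlobalTimestamps` class is hypothetical, and raising `ValueError` when oldest would exceed stable is an assumption about how a violation is reported.

```python
class GlobalTimestamps:
    """Toy model of the oldest/stable ordering rules described above."""

    def __init__(self):
        self.oldest = 0
        self.stable = 0

    def set_stable(self, ts):
        if ts < self.stable:
            return                      # backward move: silently ignored
        self.stable = ts

    def set_oldest(self, ts):
        if ts < self.oldest:
            return                      # backward move: silently ignored
        if ts > self.stable:
            # Assumption: the model reports this violation as an exception.
            raise ValueError("oldest_timestamp must be <= stable_timestamp")
        self.oldest = ts


ts = GlobalTimestamps()
ts.set_stable(20)
ts.set_oldest(10)
ts.set_oldest(5)     # ignored: earlier than the current oldest (10)
ts.set_stable(15)    # ignored: earlier than the current stable (20)
```

In this model the application advances stable first and then oldest, which keeps oldest at or behind stable at every step.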
The WT_CONNECTION::set_timestamp method includes a force flag which allows applications to violate the usual constraints. For example, this can be used to move stable_timestamp backward. Violating these constraints can lead to undefined or incorrect behavior. For example, while moving oldest_timestamp backward with this mechanism may make some historical data accessible again, other obsolete historical data may have already been discarded, and no guarantees of consistency are made.
The following table lists the global timestamps an application can query using the WT_CONNECTION::query_timestamp method, including constraints.
Timestamp | Constraints | Description |
---|---|---|
all_durable | None | The largest timestamp such that all timestamps up to that value have been made durable (see Using transaction prepare with timestamps for discussion of the durable timestamp). |
last_checkpoint | <= stable | The stable timestamp at which the last checkpoint ran (or 0 if no checkpoints have run). |
oldest_reader | None | The timestamp of the oldest currently active read transaction. |
oldest_timestamp | <= stable | The current application-set oldest_timestamp value. |
pinned | <= oldest | The minimum of the oldest_timestamp and the oldest active reader. |
recovery | <= stable | The stable timestamp used in the most recent checkpoint prior to the last shutdown (or 0 if none available). |
stable_timestamp | None | The current application-set stable_timestamp value. |
In all cases, WT_NOTFOUND is returned if there is no matching timestamp.
The all_durable timestamp is the minimum of the global durable_timestamp (as set by the application), and the durable timestamps of all currently running transactions in the system.
The all_durable timestamp does not prevent additional prepared but uncommitted transactions from appearing. In particular, it is possible for the all_durable value read by the application to be obsolete by the time the application receives it, if a transaction prepares setting an earlier durable timestamp. Applications using this value to make operational decisions are responsible for coordinating their own actions (for example by using high-level locks) to ensure its validity.
The all_durable timestamp is read-only.
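The definition of all_durable, and the way a value read earlier can become obsolete, can be illustrated with a few lines of Python. The function name and the plain list of per-transaction durable timestamps are hypothetical stand-ins for state that WiredTiger tracks internally.

```python
def all_durable(global_durable, running_durable_timestamps):
    """Minimum of the global durable timestamp and the durable
    timestamps of all currently running transactions."""
    return min([global_durable, *running_durable_timestamps])


running = [60, 55]                  # durable timestamps of running transactions
before = all_durable(50, running)   # limited by the global durable_timestamp
running.append(40)                  # a transaction prepares with an earlier durable timestamp
after = all_durable(50, running)    # the value read earlier is now obsolete
```

Here `before` is 50 and `after` is 40, which is why an application acting on the earlier value needs its own coordination.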
The last_checkpoint timestamp is the stable timestamp of the previous checkpoint (or 0 if no checkpoints have run). This timestamp is a general case of the recovery timestamp. Generally, applications are expected to use the recovery timestamp and not the last_checkpoint timestamp.
The last_checkpoint timestamp is read-only.
The oldest_reader timestamp is the oldest transaction read timestamp active, including the read timestamp of any running checkpoint.
The oldest_reader timestamp is read-only.
The oldest_timestamp is the current oldest_timestamp as set by the application.
The pinned timestamp is the minimum of oldest_timestamp and the oldest active reader, including any running checkpoint. It is not the same as oldest_timestamp because the oldest timestamp can be advanced past currently active readers, leaving a reader as the earliest timestamp in the system. Applications can use the pinned timestamp to understand the earliest data required by any reader in the system.
The pinned timestamp is read-only.
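A minimal sketch of how pinned relates to oldest_timestamp and the oldest active reader follows. The function names are hypothetical, the reader list is assumed to include the read timestamp of any running checkpoint, and returning `None` when there are no readers is a modeling choice rather than WiredTiger behavior.

```python
def oldest_reader(reader_timestamps):
    """Oldest active read timestamp, or None if nothing is reading (model choice)."""
    return min(reader_timestamps) if reader_timestamps else None


def pinned(oldest_timestamp, reader_timestamps):
    """Minimum of oldest_timestamp and the oldest active reader."""
    reader = oldest_reader(reader_timestamps)
    if reader is None:
        return oldest_timestamp
    return min(oldest_timestamp, reader)


# oldest_timestamp was advanced to 30 while a reader at 20 is still active.
pinned_with_old_reader = pinned(30, [20, 45])
# Once all readers finish, only oldest_timestamp pins history.
pinned_without_readers = pinned(30, [])
```

With the reader at 20 still active, the pinned value is 20 even though oldest_timestamp has moved to 30; once the readers are gone it is 30.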
The recovery timestamp is the stable timestamp to which recovery was performed on startup. Applications can use the recovery timestamp to retrieve the value the stable timestamp had at system startup.
The recovery timestamp is read-only.
The stable_timestamp is the current stable_timestamp as set by the application.