Applications using timestamps need to manage the global timestamp state. These timestamps are queried using the WT_CONNECTION::query_timestamp method and set using the WT_CONNECTION::set_timestamp method.
Applications will generally focus on two of these global timestamps: first, the beginning of the database's read window, called oldest_timestamp
, and second, the application time at which data is known to be stable and will never be rolled back, called stable_timestamp
.
The oldest timestamp is the oldest time at which a transaction is allowed to read. The historic values of data modified before this time can no longer be read by new transactions. Transactions already in progress are not affected when the oldest_timestamp
changes.
The stable timestamp is the earliest time at which data is considered stable. (Data is said to be stable when it is not only durable, but additionally, transactions committed at or before the stable time cannot be rolled back by application-level transaction management.) The stable timestamp is saved along with every checkpoint, and that saved time is the point to which the database is recovered after a crash. It is also the earliest point to which the database can be returned via an explicit WT_CONNECTION::rollback_to_stable call. (See Using rollback-to-stable with timestamps.) All transactions must commit after the current stable
timestamp.
Applications are responsible for managing these timestamps and periodically updating them.
Note that updating the stable timestamp does not itself trigger a database checkpoint or change data durability; updates before the new stable timestamp become stable (and thus visible after recovery) as part of the next checkpoint.
The following table lists the global timestamps an application can set using the WT_CONNECTION::set_timestamp method, including constraints.
Timestamp | Constraints | Description |
---|---|---|
durable_timestamp | none | temporarily set the system's maximum durable timestamp, bounding the timestamp returned by all_durable . |
oldest_timestamp | <= stable; may not move backward, set to the value as of the last checkpoint during recovery | Inform the system future reads and writes will never be earlier than the specified timestamp. |
stable_timestamp | may not move backward, set to the stable timestamp during rollback-to-stable and recovery | Inform the system checkpoints should not include commits newer than the specified timestamp. |
The global durable_timestamp
temporarily sets the system's maximum durable timestamp, bounding the timestamp returned when querying the all_durable
timestamp (see below). Note that the "system's maximum durable timestamp" is the maximum durable timestamp of any committed transaction, and not the timestamp of data changes which have made it to stable storage.
The durable_timestamp
is advanced to the transaction's durable timestamp (or the transaction's commit timestamp, if no durable timestamp was set), by any transaction commit with a durable (commit) timestamp after the current value.
The durable_timestamp
is set to the stable timestamp whenever the WT_CONNECTION::rollback_to_stable method is called.
durable
timestamp is used by the MongoDB server and is generally not expected to be useful to other applications.Setting oldest_timestamp
indicates future read timestamps will be at least as recent as the timestamp, allowing WiredTiger to discard history before the specified point. It is not required that there be no currently active readers at earlier timestamps: this setting only indicates future application needs. In other words, as active readers age out of the system, historic data up to the oldest timestamp may be discarded, but no historic data at or after the oldest
timestamp will be discarded. Retaining history can be expensive in terms of both disk and I/O resources.
During recovery, the oldest_timestamp
is set to its value as of the checkpoint to which recovery is done.
Attempting to set the oldest_timestamp
to a value earlier than its current value will be silently ignored.
The oldest_timestamp
must be less than or equal to the stable_timestamp
.
The stable_timestamp
determines the timestamp for subsequent checkpoints. In other words, updates to an object after the stable timestamp will not be included in a future checkpoint. Because tables in a timestamp world are generally using checkpoint durability, the stable_timestamp
also determines the point to which recovery will be done after failures.
During recovery, the stable_timestamp
is set to the value to which recovery is performed.
It is possible to explicitly roll back to a time after, or equal to, the current stable_timestamp
using the WT_CONNECTION::rollback_to_stable method. (See Using rollback-to-stable with timestamps.)
The stable_timestamp
should be updated frequently as it determines the amount of work required after recovery to bring the system online.
Attempting to set the stable_timestamp
to a value earlier than its current value will be silently ignored.
Using the stable timestamp in the checkpoint is not required, and can be overridden using the use_timestamp=false
configuration of the WT_SESSION::checkpoint call. This is not intended for general use, but can be useful for backup scenarios where rolling back to a stable timestamp isn't possible and it's useful for a checkpoint to contain the most recent possible data.
The following table lists the global timestamps an application can query using the WT_CONNECTION::query_timestamp method, including constraints. In all cases, a timestamp of 0 is returned if the timestamp is not available or has not been set, for example, the last_checkpoint
timestamp if no checkpoints have run, or oldest_reader
if there are no active transactions.
Timestamp | Constraints | Description |
---|---|---|
all_durable | None | The largest timestamp such that all timestamps up to that value have been made durable, bounded by the durable timestamp. |
last_checkpoint | <= stable | The stable timestamp at which the last checkpoint ran. |
oldest_reader | None | The timestamp of the oldest currently active read transaction. |
oldest_timestamp | <= stable | The current application-set oldest_timestamp value. |
pinned | <= oldest | The minimum of the oldest_timestamp and the oldest active reader. |
recovery | <= stable | The stable timestamp used in the most recent checkpoint prior to the last shutdown. |
stable_timestamp | None | The current application-set stable_timestamp value. |
The all_durable
timestamp is the minimum of the global durable_timestamp
value and a durable timestamp before any unresolved transaction in the system.
For example:
all_durable
will return 50.all_durable
will return 39.durable_timestamp
to 30, all_durable
will return 30.durable_timestamp
to 30, all_durable
will return 19.durable_timestamp
to 30, and a transaction then commits with a durable timestamp of 50, all_durable
will return 50.all_durable
value read by the application to be obsolete by the time the application receives it and applications using this value to make operational decisions are responsible for coordinating their own actions (for example by using high-level locks) to ensure its validity. Specifically, any transaction commit with a durable timestamp (or a commit timestamp if no durable timestamp was set), larger than the current durable_timestamp
value will set the durable_timestamp
to the transaction's durable (commit) timestamp. Calling the WT_CONNECTION::rollback_to_stable method will reset the durable_timestamp
to the stable timestamp.all_durable
timestamp does not prevent additional prepared but uncommitted transactions from appearing if a transaction prepares, setting an earlier durable timestamp.all_durable
query is used by the MongoDB server and is generally not expected to be useful to other applications.The all_durable
timestamp is read-only.
The last_checkpoint
timestamp is the stable timestamp of the previous checkpoint (or 0 if no checkpoints have run). This timestamp is a general case of the recovery
timestamp. Generally, applications are expected to use the recovery
timestamp and not the last_checkpoint
timestamp.
The last_checkpoint
timestamp is read-only.
The oldest_reader
timestamp is the oldest transaction read timestamp active, including the read timestamp of any running checkpoint.
The oldest_reader
timestamp is read-only.
The oldest_timestamp
is the current oldest_timestamp
as set by the application.
The pinned
timestamp is the minimum of oldest_timestamp
and the oldest active reader, including any running checkpoint. It is not the same as oldest_timestamp
because the oldest timestamp can be advanced past currently active readers, leaving a reader as the earliest timestamp in the system. Applications can use the pinned
timestamp to understand the earliest data required by any reader in the system.
The pinned
timestamp is read-only.
The recovery
timestamp is the stable timestamp to which recovery was performed on startup. Applications can use the recovery
timestamp to retrieve the value the stable timestamp had at system startup.
The recovery
timestamp is read-only.
The stable_timestamp
is the current stable_timestamp
as set by the application.