Version 11.1.0
Using transaction prepare with timestamps

Applications configuring timestamps can use the WT_SESSION::prepare_transaction call as the pre-commit step in the implementation of a two-phase commit protocol. (WiredTiger currently only permits transactions to be prepared when timestamps are in use.)

The WT_SESSION::prepare_transaction method assigns a prepare timestamp to the transaction, which will be used for visibility checks until the transaction is committed or aborted. Once a transaction has been prepared no further data operations are permitted, and the transaction must next be resolved by calling WT_SESSION::commit_transaction or WT_SESSION::rollback_transaction. Calling WT_SESSION::prepare_transaction only guarantees that a subsequent WT_SESSION::commit_transaction will succeed and specifically does not guarantee the transaction's updates are durable.

If a read operation encounters an update from a prepared transaction, it returns the error WT_PREPARE_CONFLICT, indicating that no version of the data can be chosen until the prepared transaction is resolved. Retrying such failed operations is reasonable, assuming prepared transactions are expected to be resolved quickly.
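The retry advice above can be sketched as a small bounded loop. This is only an illustration under stated assumptions: retry_on_prepare_conflict, read_attempt_fn and the simulated read are hypothetical names, not WiredTiger APIs, and the WT_PREPARE_CONFLICT definition is a placeholder so the sketch compiles on its own (real code should take it from <wiredtiger.h>).

```c
/*
 * Placeholder so this sketch compiles without the library. Real code should
 * take WT_PREPARE_CONFLICT from <wiredtiger.h> rather than rely on this value.
 */
#ifndef WT_PREPARE_CONFLICT
#define WT_PREPARE_CONFLICT (-31808)
#endif

/* Hypothetical callback type: one attempt at a read, returning an error code. */
typedef int (*read_attempt_fn)(void *arg);

/*
 * Call attempt() until it returns something other than WT_PREPARE_CONFLICT or
 * until max_attempts calls have been made. Returns the last result, or
 * WT_PREPARE_CONFLICT if max_attempts is not positive.
 */
static int
retry_on_prepare_conflict(read_attempt_fn attempt, void *arg, int max_attempts)
{
    int ret = WT_PREPARE_CONFLICT;

    for (int i = 0; i < max_attempts; ++i) {
        ret = attempt(arg);
        if (ret != WT_PREPARE_CONFLICT)
            break;
        /* A real application would probably yield or back off here. */
    }
    return ret;
}

/* Simulated read: reports a conflict a fixed number of times, then succeeds. */
struct simulated_read {
    int conflicts_left;
    int calls;
};

static int
simulated_read_attempt(void *arg)
{
    struct simulated_read *sim = arg;

    ++sim->calls;
    if (sim->conflicts_left > 0) {
        --sim->conflicts_left;
        return WT_PREPARE_CONFLICT;
    }
    return 0;
}
```

In an application, the callback would presumably wrap a cursor read such as a search on an open cursor. Bounding the number of attempts keeps a long-lived prepared transaction from stalling the reader indefinitely.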

Both a commit_timestamp and a durable_timestamp must be specified when committing a prepared transaction. The durable_timestamp allows a prepared transaction to be predictably included in or excluded from a checkpoint. For non-prepared transactions, the commit timestamp controls both update visibility and durability. For prepared transactions, the durable timestamp controls durability separately: a checkpoint uses the prepared transaction's durable timestamp, rather than its commit timestamp, to decide whether to persist the transaction's updates.

/*
 * Prepare a transaction, which guarantees that a subsequent commit will succeed. Only commit
 * and rollback are allowed on a transaction after it has been prepared.
 */
error_check(session->open_cursor(session, "table:mytable", NULL, NULL, &cursor));
error_check(session->begin_transaction(session, NULL));
cursor->set_key(cursor, "key");
cursor->set_value(cursor, "value");
/* Apply the update inside the transaction; set_key and set_value alone write nothing. */
error_check(cursor->insert(cursor));
/* Timestamps in configuration strings are hexadecimal. */
error_check(session->prepare_transaction(session, "prepare_timestamp=2a"));
error_check(
  session->commit_transaction(session, "commit_timestamp=2b,durable_timestamp=2b"));

Prepared transactions are limited to a single commit timestamp, which can only be set after the transaction has successfully prepared. The prepare timestamp can be set at any point in the transaction's lifecycle prior to preparing it; doing so does not itself prepare the transaction but does oblige the application to prepare it before committing.

The durable timestamp can only be set after the transaction has been prepared and a commit timestamp set, or as part of transaction commit. The durable timestamp provides input into the system's all_durable timestamp.

MongoDB specifies different commit and durable timestamps because prepared transactions are higher-level MongoDB operations, requiring cluster-level consensus on visibility. Applications without similar requirements for prepared transactions should set the durable and commit timestamps to the same time.
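As a rough sketch of the advice above, the helper below builds the configuration string for committing a prepared transaction. The function name format_prepared_commit_config is hypothetical, not part of WiredTiger. Timestamps are written in hexadecimal to match the "2a" and "2b" values in the earlier example, and rejecting a durable timestamp earlier than the commit timestamp is an assumption made for this illustration.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a WiredTiger API): write a commit configuration
 * string for a prepared transaction into buf. Returns 0 on success, or -1 if
 * the timestamps are rejected or the buffer is too small.
 */
static int
format_prepared_commit_config(char *buf, size_t size, uint64_t commit_ts, uint64_t durable_ts)
{
    int n;

    /* Assumption: a durable timestamp earlier than the commit timestamp is never wanted. */
    if (durable_ts < commit_ts)
        return -1;

    n = snprintf(buf, size, "commit_timestamp=%" PRIx64 ",durable_timestamp=%" PRIx64,
      commit_ts, durable_ts);

    /* snprintf reports the length it needed; treat truncation as failure. */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

An application without cluster-level requirements would pass the same value for both timestamps and hand the resulting string to WT_SESSION::commit_transaction.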

Warning
When a transaction has a durable timestamp later than its commit timestamp, reading its writes in a second transaction and then committing other writes such that the second transaction becomes durable before the first can produce data inconsistency.
In this scenario the second transaction depends on the first: it must be rolled back if the first transaction is rolled back, and so it must not become durable before the first transaction. Applications that create gaps between their commit timestamps and durable timestamps are responsible for either not reading in those gaps, or establishing an ordering for the durable timestamps of their commits to make sure that this scenario cannot occur. (Note that for the purposes of this issue the commit timestamp of a non-prepared transaction is also its durable timestamp, and committing with no timestamp is roughly comparable to committing at the current stable timestamp.)
This scenario is not currently detected by WiredTiger; applications are responsible for avoiding it. In future versions such transactions might fail.
Similarly, if a transaction has a durable timestamp later than its commit timestamp, and a checkpoint is taken while the global stable timestamp is between these points, the transaction may or may not be visible when the checkpoint is opened with a checkpoint cursor; the behavior is unspecified. Applications should avoid this situation.
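Since WiredTiger does not detect these situations, an application might guard against them itself. The following is a minimal sketch, assuming the application tracks its own timestamps as 64-bit integers. Both function names are hypothetical, and the exact boundaries of the gap (commit timestamp inclusive, durable timestamp exclusive) are an assumption, since the text only says "between these points".

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: true if stable_ts lies in the gap between a transaction's
 * commit and durable timestamps, where checkpoint visibility is unspecified.
 * Assumed boundaries: commit_ts <= stable_ts < durable_ts.
 */
static bool
stable_in_commit_durable_gap(uint64_t commit_ts, uint64_t durable_ts, uint64_t stable_ts)
{
    return commit_ts <= stable_ts && stable_ts < durable_ts;
}

/*
 * Hypothetical helper: choose a durable timestamp for a transaction that read
 * the writes of another transaction, so that the reader never becomes durable
 * before the transaction it depends on.
 */
static uint64_t
dependent_durable_ts(uint64_t wanted_durable_ts, uint64_t read_from_durable_ts)
{
    return wanted_durable_ts < read_from_durable_ts ? read_from_durable_ts : wanted_durable_ts;
}
```

An application could consult the first function before advancing the stable timestamp and taking a checkpoint, and use the second when committing a transaction that read inside another transaction's gap.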

Configuring ignore_prepare

The WT_SESSION::begin_transaction method includes the ignore_prepare configuration. Setting ignore_prepare to true causes readers to ignore values from prepared transactions, that is, to return values as if the prepared transaction did not exist. Readers then never see the WT_PREPARE_CONFLICT error; they instead receive the data as it was before the transaction was prepared. For this reason, applications using ignore_prepare cannot rely on repeatable reads, as the same read after the prepared transaction is resolved could return a different value. Setting ignore_prepare to true also makes the transaction read-only, and attempts to update items in the transaction will fail.
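The loss of repeatable reads can be illustrated with a toy model of a single key. This is not how WiredTiger implements visibility; every name below is hypothetical, timestamps are left out entirely, and the WT_PREPARE_CONFLICT definition is a placeholder so the sketch compiles without the library.

```c
/* Placeholder; real code should take this from <wiredtiger.h>. */
#ifndef WT_PREPARE_CONFLICT
#define WT_PREPARE_CONFLICT (-31808)
#endif

/* Toy model of one key: a committed value plus an optional unresolved prepared update. */
struct toy_key {
    int committed_value;
    int prepared_value;
    int has_prepared; /* nonzero while a prepared update is unresolved */
};

/*
 * Read the key as the text describes: a conflict by default, or the
 * pre-prepare value when ignore_prepare is set.
 */
static int
toy_read(const struct toy_key *key, int ignore_prepare, int *valuep)
{
    if (key->has_prepared && !ignore_prepare)
        return WT_PREPARE_CONFLICT;
    *valuep = key->committed_value;
    return 0;
}

/* Resolve the prepared update by committing it. */
static void
toy_commit_prepared(struct toy_key *key)
{
    if (key->has_prepared) {
        key->committed_value = key->prepared_value;
        key->has_prepared = 0;
    }
}
```

In this model, a reader using ignore_prepare gets the old value while the update is prepared and the new value once it is committed, so two identical reads disagree.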

Warning
The ignore_prepare configuration can also be set to force, which not only causes readers to ignore prepared transactions, but also allows the transaction to make updates. This can cause data inconsistency problems with the commit or rollback of the prepared transaction, or the disappearance of a prepared update by overwriting it.

Checkpoints taken while a transaction is prepared but not committed will not include the prepared transaction; reading from the checkpoint with a checkpoint cursor will behave as if the prepared transaction did not exist. This is comparable to the ignore_prepare behavior and carries the same consequences: reading the checkpoint and reading the live database at the checkpoint's time after the prepared transaction is resolved may produce different values. This situation can only arise if the stable timestamp is advanced (and a checkpoint then taken) while a transaction is prepared and still unresolved. Applications wishing to rule out this situation can avoid advancing the stable timestamp while any prepared transaction remains unresolved.
