Version 11.1.0
Rollback to Stable (RTS)
Source Location: src/txn/

Caution: the Architecture Guide is not updated in lockstep with the code base and is not necessarily correct or complete for any specific release.

RTS is an operation that retains only the modifications that are considered stable. A modification is considered unstable if it has a durable timestamp greater than the stable timestamp or its transaction id is not committed according to the recovery checkpoint snapshot. The recovery checkpoint snapshot corresponds to the checkpoint transaction snapshot details saved at the end of every checkpoint.
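The stability rule above can be sketched as a small predicate. This is an illustrative model only, not the code under src/txn/: the `CheckpointSnapshot` class, its fields and the function name are hypothetical, and the snapshot is assumed to be describable by a minimum id, a maximum id and the set of transaction ids that were still running between them.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class CheckpointSnapshot:
    """Hypothetical model of the recovery checkpoint snapshot."""
    snap_min: int                            # ids below this are treated as committed
    snap_max: int                            # ids at or above this are not committed
    in_flight: FrozenSet[int] = frozenset()  # ids in [snap_min, snap_max) still running

    def committed(self, txn_id: int) -> bool:
        if txn_id < self.snap_min:
            return True
        if txn_id >= self.snap_max:
            return False
        return txn_id not in self.in_flight


def is_unstable(durable_ts: int, txn_id: int, stable_ts: int,
                snapshot: CheckpointSnapshot) -> bool:
    """A modification is unstable if either condition from the text holds."""
    return durable_ts > stable_ts or not snapshot.committed(txn_id)
```

Note that a durable timestamp equal to the stable timestamp counts as stable in this sketch, since the text only treats timestamps greater than the stable timestamp as unstable.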

RTS works for all storage types except for LSM trees.

Overview of RTS

RTS scans tables in the database to remove any unstable modifications. A table is selected by RTS if it meets one of the following conditions:

  1. It is modified.
  2. It has prepared updates.
  3. It has updates from transactions greater than the recovery checkpoint snapshot (this is applicable only during the restart phase).
  4. One of the checkpoint durable timestamps (start/stop) is greater than the stable timestamp.

On the other hand, a table is skipped by RTS if it does not meet any of the previous criteria but does meet one of the following:

  1. It is empty.
  2. It has timestamped updates but there is no stable timestamp set.
  3. The associated files are missing.
  4. The associated files are corrupted.
  5. It is dedicated to metadata.
  6. It is a logged table. RTS is not necessary for tables with commit-level durability.
  7. It has no aggregated time window. Tables with no aggregated time window are only created by versions older than MongoDB 4.4.
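The two lists above can be read as a two-stage decision, sketched below. Everything here is a hypothetical model: `TableInfo`, its field names and `classify_table` are illustrative and do not correspond to real identifiers. Two assumptions are made: an unset stable timestamp is represented by `None`, and the checkpoint durable timestamp comparison is treated as not applicable in that case. The text does not say what happens to a table that matches neither list, so the sketch returns `None` for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableInfo:
    """Hypothetical per-table facts; field names are illustrative only."""
    modified: bool = False
    has_prepared_updates: bool = False
    has_txn_newer_than_snapshot: bool = False
    in_recovery: bool = False            # restart phase
    ckpt_durable_start_ts: int = 0
    ckpt_durable_stop_ts: int = 0
    empty: bool = False
    has_timestamped_updates: bool = False
    files_missing: bool = False
    files_corrupted: bool = False
    is_metadata: bool = False
    logged: bool = False
    has_aggregated_time_window: bool = True


def classify_table(t: TableInfo, stable_ts: Optional[int]) -> Optional[str]:
    """Return "selected", "skipped", or None when the text covers neither case."""
    newest_ckpt_ts = max(t.ckpt_durable_start_ts, t.ckpt_durable_stop_ts)
    selected = (
        t.modified
        or t.has_prepared_updates
        or (t.in_recovery and t.has_txn_newer_than_snapshot)
        or (stable_ts is not None and newest_ckpt_ts > stable_ts)
    )
    if selected:
        return "selected"

    # Skip criteria only matter once none of the selection criteria matched.
    skipped = (
        t.empty
        or (t.has_timestamped_updates and stable_ts is None)
        or t.files_missing
        or t.files_corrupted
        or t.is_metadata
        or t.logged
        or not t.has_aggregated_time_window
    )
    return "skipped" if skipped else None
```

For instance, under these assumptions a table that is both modified and empty is still selected, because the skip criteria are only consulted when no selection criterion applies.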

For each selected table, its pages are read into the cache. The unstable in-memory updates and the unstable historical versions in the history store are removed. The unstable on-disk version is replaced by the latest stable in-memory update if there is one, otherwise by the latest stable version from the history store. If there is neither a stable in-memory update nor a stable version in the history store, the data store entry is removed completely. If the table has no timestamped updates, all the historical versions related to this table are removed from the history store as they are no longer required, and the on-disk value is preserved.

Finally, RTS is triggered on startup, on shutdown, or when the application initiates it explicitly. RTS needs exclusive access to the database, so no concurrent transactions are allowed.

How RTS aborts unstable updates

For in-memory updates, RTS traverses the key's update list from the newest update to the oldest and aborts every update until a stable one is found. To remove a key from the table, RTS adds a globally visible tombstone to the key's update list; the key is then removed later during reconciliation.
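A minimal sketch of this traversal follows, assuming the update list is ordered newest first and that stability is decided by the durable timestamp alone (the transaction id check is left out for brevity). The `Update` class and both function names are hypothetical, and the tombstone is modelled as an update with no value and a timestamp of 0.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Update:
    value: Optional[str]   # None models a tombstone
    durable_ts: int
    aborted: bool = False


def abort_unstable_updates(update_list: List[Update], stable_ts: int) -> Optional[Update]:
    """Abort updates newest-first until a stable one is found; return it, or None."""
    for upd in update_list:
        if upd.aborted:
            continue                   # already aborted, nothing to do
        if upd.durable_ts <= stable_ts:
            return upd                 # first stable update: stop here
        upd.aborted = True
    return None


def remove_key(update_list: List[Update]) -> None:
    """Prepend a globally visible tombstone; reconciliation removes the key later."""
    update_list.insert(0, Update(value=None, durable_ts=0))
```

Applied to an update list of U3 (30) -> U2 (20) -> U1 (10) with a stable timestamp of 10, the sketch aborts U3 and U2 and returns U1.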

For on-disk values, if the start time point is not stable and the update list does not contain any stable update, RTS searches the history store to find a stable update to replace the unstable on-disk version with. If the history store does not contain any stable update, the on-disk key is removed. If the stop time point exists and it is not stable, the on-disk update is restored into the update list.
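The on-disk cases can be summarised as a decision function. This is a simplified model under stated assumptions, not the actual implementation: `Version`, `resolve_on_disk` and the action strings are hypothetical, the history store is represented as a list ordered newest first, and only timestamps are compared.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Version:
    value: str
    start_ts: int
    stop_ts: Optional[int] = None   # None means no stop time point


def resolve_on_disk(on_disk: Version, history: List[Version], stable_ts: int,
                    has_stable_update: bool
                    ) -> Tuple[str, Optional[Version], List[Version]]:
    """Return (action, version to add to the update list, remaining history)."""
    if on_disk.start_ts > stable_ts:
        if has_stable_update:
            # Reconciliation will write the stable in-memory update instead;
            # only the unstable history versions are dropped.
            return ("use_update_list", None,
                    [v for v in history if v.start_ts <= stable_ts])
        for i, version in enumerate(history):
            if version.start_ts <= stable_ts:
                # Newest stable historical version replaces the on-disk one.
                return "restore_from_history", version, list(history[i + 1:])
        return "remove_key", None, []
    if on_disk.stop_ts is not None and on_disk.stop_ts > stable_ts:
        # The value is stable but its removal is not: restore it without the stop.
        return "restore_on_disk", Version(on_disk.value, on_disk.start_ts), list(history)
    return "keep", None, list(history)
```

With an on-disk value U3 (30), a history store of U2 (20) -> U1 (10) and a stable timestamp of 20, the sketch returns U2 as the replacement and leaves U1 in the history store.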

Skipping reading unnecessary pages into memory

RTS only loads pages that may contain unstable updates; pages with nothing to remove are skipped. During the tree walk, this check compares each page's aggregated time window values with the stable timestamp or the recovery checkpoint snapshot.
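A sketch of such a check is shown below. The `TimeAggregate` fields and the function name are illustrative assumptions rather than the real structure, and the snapshot comparison is reduced to a single maximum transaction id that only applies during the restart phase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeAggregate:
    """Hypothetical summary of the newest time points found under a page."""
    newest_start_durable_ts: int
    newest_stop_durable_ts: int
    newest_txn: int


def page_needs_rollback(ta: TimeAggregate, stable_ts: int,
                        snap_max: int, in_recovery: bool) -> bool:
    """Return True if the page may hold unstable updates and must be read."""
    newest_durable_ts = max(ta.newest_start_durable_ts, ta.newest_stop_durable_ts)
    if newest_durable_ts > stable_ts:
        return True
    # Transaction ids are only compared against the snapshot during restart.
    if in_recovery and ta.newest_txn >= snap_max:
        return True
    return False
```

Because the aggregate summarises everything below the page, a single comparison is enough to skip the whole subtree without reading it into the cache.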

Example 1

In the following scenario, the update list holds three updates, there is an on-disk value, the history store is empty and the stable timestamp is 10:

Update list: U3 (30) -> U2 (20) -> U1 (10)
On-disk: U2 (20)
History store: Empty
Stable timestamp: 10

RTS will mark all the unstable updates in the update list as aborted, and the on-disk value will be replaced by the most recent stable update after reconciliation:

Update list: U3 (30) Aborted -> U2 (20) Aborted -> U1 (10)
On-disk: U1 (10)
History store: Empty
Stable timestamp: 10

Example 2

Let's consider a scenario where the history store is not empty:

Update list: U5 (50) -> U4 (40)
On-disk: U3 (30)
History store: U2 (20) -> U1 (10)
Stable timestamp: 20

In this case, RTS will again abort the unstable updates in the update list, but it will also move the latest stable update from the history store to the update list:

Update list: U2 (20) -> U5 (50) Aborted -> U4 (40) Aborted
On-disk: U2 (20)
History store: U1 (10)
Stable timestamp: 20

Example 3

The last scenario covers the situation when the on-disk update is restored into the update list because the stop time point exists and it is not stable:

Update list: Empty
On-disk: U3 (30) with a stop timestamp (40)
History store: U2 (20) -> U1 (10)
Stable timestamp: 30

After RTS, we end up with the following state:

Update list: U3 (30)
On-disk: U3 (30)
History store: U2 (20) -> U1 (10)
Stable timestamp: 30