Version 12.0.0
All Classes Functions Variables Typedefs Enumerations Enumerator Modules Pages
Pre-fetch
Data StructuresSource Location
WT_PREFETCH
WT_PREFETCH_QUEUE_ENTRY
WT_REF
src/btree/bt_prefetch.c
src/conn/conn_prefetch.c
src/session/session_prefetch.c

Caution: the Architecture Guide is not updated in lockstep with the code base and is not necessarily correct or complete for any specific release.

What is pre-fetching? When is pre-fetching useful?

Database records must be in memory before WiredTiger can operate on them, but if that record is initially on disk, WiredTiger needs to perform an expensive I/O operation to fetch the record. Instead of performing I/O on demand, WiredTiger can be configured to predict which records are about to be accessed and speculatively load (pre-fetch) the corresponding page, hiding the I/O delays from the user.

Applications access the database in different ways, and so it is not necessarily ideal to always enable pre-fetching. Pre-fetching is beneficial when an operation has locality of reference and it is reasonably likely that the application will access data in a sequential way. When future operations are likely to access data that is near the data currently being accessed, WiredTiger can predict which pages will be needed soon and pre-emptively read them into the cache.

Pre-fetch Algorithm

  1. The main application thread finds candidate pages for pre-fetching as it traverses the B-tree.
  2. It populates the pre-fetch queue with these candidate pages.
  3. Pre-fetch worker threads will pop off pages from the pre-fetch queue and read pages into the cache.

Finding Candidate Pages

Pre-fetch looks for potential candidate pages in two places: (1) while walking the B-tree (__tree_walk_internal) and (2) when verifying a tree (__verify_tree). The __wt_session_prefetch_check function contains a few checks to decide if pre-fetching should be performed for a given ref. Pre-fetching does not operate on internal pages, and should not be performed until we have triggered at least two leaf page reads from sequential read requests. Pre-fetching is also disabled for tiered tables, and special operations with the exception of verify.

Populating the Pre-fetch Queue

The pre-fetch queue is a first-in-first-out (FIFO) data structure containing the page references of potential pre-fetch candidates. The __wti_btree_prefetch function is responsible for pushing candidate pages onto the queue. Given a ref, it will queue the next set of pages if they are leaf pages, not already in the cache, and not already on the pre-fetch queue. To prevent overloading the pre-fetch queue, there is a maximum number of candidate pages allowed in the queue (WT_MAX_PREFETCH_QUEUE), and a limit on the maximum number of pages we can add to the queue during a pre-fetch check (WT_PREFETCH_QUEUE_PER_TRIGGER). Pages will not be added to the queue if either of these conditions are met.

Pre-fetching Content Into the Cache

Pre-fetch worker threads are responsible for the work of reading page contents into the cache. The __wt_prefetch_thread_run function is the entry point for a pre-fetch thread, and threads will continue to run while the pre-fetch queue is not empty. When there are no more pre-fetch entries to process, pre-fetch worker threads wait on a condition variable that signals when an entry has been pushed onto the pre-fetch queue. Pre-fetch worker threads invoke the __wt_page_in function which reads the page from disk and builds an in-memory representation.

Pre-fetch API

Connection-Level Configuration

wiredtiger_open("prefetch=(available=true,default=false)")

The available setting determines whether the pre-fetching functionality can be enabled for the given database, and starts up the pool of pre-fetching threads if it is turned on. The default setting determines whether pre-fetching is turned on or off for all sessions belonging to the connection.

Session-Level Configuration

WT_CONNECTION.open_session("prefetch=(enabled=true)")

The session-level configuration can be used to override the connection-level configuration for specific sessions if desired.

Interaction With Other Components

Considering Current Cache Busyness

Cache busyness increases in proportion to the eviction workload and the rate of pages being read into the cache. It is important to consider how pre-fetch impacts cache utilization as it may leave the cache in a state that increases operational latency. Pre-fetch stops adding page references to the queue as soon as it detects that we are close to hitting the eviction clean trigger. It will also skip processing page references that are being retrieved from the queue until space in the cache clears up.

Ensuring References on the Pre-fetch Queue Remain Valid

Page references on the pre-fetch queue may become invalid if other operations that modify page references are happening in parallel. Eviction is not allowed to operate on any page references that are on the pre-fetch queue, because it may split pages and free the underlying ref/s. Since eviction has no way to coordinate with the queue to indicate references are no longer valid and should be removed, we take a conservative approach by preventing eviction on these refs. This condition is checked with the WT_REF_FLAG_PREFETCH flag. On top of this, the pre-fetch flag also guarantees that the corresponding parent page cannot be evicted until pre-fetch has processed the leaf page.

Concurrency

Pre-fetch is a highly concurrent module and we rely on a few mechanisms to ensure that it runs smoothly with the rest of the system.

  • A connection level lock prefetch_lock is used for the pre-fetch queue, and is needed when adding and removing WT_PREFETCH_QUEUE_ENTRY items from it.
  • Whenever we modify a ref structure, e.g. when adding it to the pre-fetch queue and setting the corresponding WT_REF_FLAG_PREFETCH flag, it must be in the WT_REF_LOCKED state.
  • There are scenarios which fall outside of the typical eviction flow, where pages can be forcibly removed from the cache if eviction has exclusive eviction access to that file (see __wt_evict_file_exclusive_on). When this occurs, we need to remove all page references associated with that file's tree from the pre-fetch queue because the system may evict internal pages which still have child pages present on the queue. The prefetch_busy variable on the WT_BTREE data structure keeps track of pre-fetch threads operating on that tree. When clearing pages from the pre-fetch queue belonging to a certain tree, we wait for any other concurrent pre-fetch activity to drain to prevent invalid ref accesses.