WiredTiger has a lot of domain specific nomenclature - this page attempts to decode it. This is intended for those navigating the WiredTiger source tree - it describes terms internal to the storage engine.
The definitions list how the terms are used in WiredTiger, which may not match how they are used in other settings.
Term | Category | Definition |
---|---|---|
address cookie | block manager | an opaque set of bytes returned by the block manager to reference a block in a Btree file, it includes an offset, size, checksum, and object id. |
atomicity | transactions | a guarantee provided within the context of a transaction that all operations made within that transaction either succeed or fail. |
barrier | general | a call that forces the CPU and compiler to enforce ordering constraints on memory operations, used in multithreaded code. |
block | block manager | the smallest granularity of data that is read from or written to a Btree file. A btree page, when it appears on disk, may consist of one or more blocks. |
block compression | btree | after pages of btrees are encoded to be written, they may be compressed according to a configured algorithm. Compressors may be added to WiredTiger via the extension mechanism. |
block manager | block manager | an internal API that abstracts writing, reading and allocating blocks to and from files. A block manager is associated with a Btree. |
Bloom filter | lsm | a data structure that identifies that a key cannot be present. Used by LSM. |
btree | btree | a data structure on disk or in memory that stores keys, it with its associated value. In WiredTiger, all btrees are in fact B+trees, meaning that adjacent keys and values are grouped into pages. |
bulk insert | btree | the capability to rapidly insert a set of ordered key value pairs into a Btree. In WiredTiger, a Btree's dhandle must be obtained exclusively, guaranteeing that the inserts can be done single threaded. |
cell | file format | a key or value packed as it appears on disk for a Btree. |
checkpoint | checkpoint | a stable point that bounds how much work recovery must do. Data committed before a checkpoint is guaranteed to reside in data files. Thus, recovery must replay log files starting at the last completed checkpoint. |
chunk | lsm | a single file within an LSM tree. |
collator | general | a comparison function that determines how keys will be ordered in a Btree. The default collator is byte oriented. A custom collator may be packaged as an extension and attached to a Btree. |
column | schema | a single field in a record, which must be unpacked from a raw key or value. |
column group | schema | a set of column store Btrees, whose record ids logically correspond. Together, a column group represents a table of data, with its columns being the union of columns in the individual Btrees. |
column store | btree | a Btrees that has as its key a record id |
commit timestamp | transactions | values in this transaction become visible at this timestamp. |
compare and swap (CAS) | general | a CPU instruction that can be used to coordinate the activity of multiple threads. Specifically, the instruction atomically changes a memory location to a new value only if the existing value matches a second value. |
compression | general | one of several techniques to reduce the size of data on disk and memory. See block compression, run length encoding, Huffman encoding, key prefix compression. |
connection | general | an encapsulation of information stored for an application's use of a WiredTiger instance attached to a specific home directory. This encapsulation is kept in a handle that is used in the API. |
cursor | general | a handle in the WiredTiger API that encapsulates a position in a Btree, and may hold the key and value associated with that position. |
data handle | general | an abstraction of a named file, table or other data source in WiredTiger |
data source | general | an abstraction that extends the uri name space. A custom data source defines APIs that are invoked when objects are created in, or cursors are opened on, the new name space. |
dhandle | general | see data handle |
diagnostic build | general | a build of WiredTiger that includes extra code to check invariants, and abort if they fail. |
durability | transactions | the property that guarantees that transactions that have been committed will survive permanently. |
encryption | btree | in WiredTiger, encryption is applied to data files, log files and metadata. Encryption algorithms may be added to WiredTiger via the extension mechanism. |
encryptor | btree | an encryption algorithm packaged as an extension. |
eviction | general | the process of freeing memory in the cache, which often includes reconciling and writing dirty pages. This term generally includes the process of deciding which pages should be evicted. |
extension | general | a module that uses the WiredTiger extension interface, allowing it to be added in an encapsulated way, without altering the core of the WiredTiger library. Extensions are typically compiled into shared libraries. |
extent list | block manager | a list of contiguous sets of blocks in a file. Extent lists are used to track available blocks and/or used blocks. |
fast truncate | btree | a truncate that spans a key range over more than one leaf page may be performed quickly by deleting entire data pages at a time. |
frag | verify | in verify, a part of a file that is the minimum allocation size and aligned the same. The smallest blocks may be the same as a frag, other blocks may be composed of multiple contiguous frags. |
hazard pointer | general | a data structure that assists in lock free algorithms, used by WiredTiger Btrees |
history store | general | storage in WiredTiger (currently implemented as a Btree) that keeps previous values associated with a key. |
home directory | general | the directory containing all the data for a WiredTiger connection. The connection is attached to this directory via a call to wiredtiger_open. |
Huffman encoding | btree | data in a btree may optionally be encoded using a Huffman encoding, based on either English letter frequencies or a custom encoding. See Huffman Encoding for details. |
index | schema | a Btree with records having values that are keys to another Btree. An index (or set of indices) become associated with a primary Btree via the table construct. |
insert | btree | an in-memory structure holding a key and a set of updates for a Btree. |
internal page | btree | either in-memory or on disk, a page in a Btree that has a set of keys that reference pages beneath it. |
isolation | transactions | determines which versions of data are visible to a running transaction, and whether it can see changes made by concurrently running transactions. |
key prefix compression | btree | a technique to store identical key prefixes only once per page. See File formats and compression |
kv or k/v | btree | a key/value pair in a Btree. |
leaf page | btree | either in-memory or on disk, a page in a Btree that has a pairs of keys and values, and no pages beneath it. |
log structured merge tree (or LSM tree) | lsm | a data structure that stores keys and values (as a Btree), but composed of potentially many trees (Btrees in WiredTiger). |
LSM | lsm | see log structured merge tree |
merge | btree | the process of making a single page out of two adjacent smaller pages in a Btree. This can occur with leaf pages or internal pages. |
metadata | general | a set of data that is used to help manage files, tables, indices and column groups in WiredTiger. |
metadata tracking | general | a technique to track a set of changes to the metadata, so they can be "rolled back" when an error occurs. |
mutex | general | a locking algorithm that waits, allowing other threads to execute, when the lock cannot be immediately acquired, and is woken to retry when the lock is available. Although more heavyweight than a spin lock, it consumes fewer resources while waiting. Appropriate for resources that may be in contention and may be held for longer periods. |
object id | tiered storage | an integer that indicates an individual file that is a part of a Btree. |
oldest timestamp | transactions | a connection-wide value, specifies the oldest timestamp that must be tracked by the system. Values with stop timestamps older than this value do not have to be retained. |
on disk | general | a shorthand meaning "in the file system", and may or may not be stored on a physical disk. |
overflow page | btree | either in-memory or on disk, a page in a Btree that has a key or value that is too large to appear on a leaf or internal page. |
pack | general | convert in-memory representations of keys or values to a compact format for file storage |
page | btree | one logical node of a Btree, in memory, or on disk, that may store keys and/or values. See internal page, leaf page, overflow page. |
page reference | btree | an indirect reference to a key on a page in a Btree. The reference struct (WT_REF) acts as the indirection and may include a pointer to an in memory object or an address cookie to find the object on disk. "ref" for short. |
page split | btree | the process of breaking up a large page in a Btree into multiple smaller pages. This may occur with either leaf pages or internal pages. |
pthread | general | a thread implementation used on POSIX systems |
raw | schema | data as it appears on-disk, typically in a packed format. |
read timestamp | transactions | a timestamp that specifies what data should be visible. |
recno | general | a 64 bit integer. When used as a key, the Btree is said to be a column store, and the keys are assigned incrementally by WiredTiger when new records are inserted. |
reconcile | general | convert the in-memory version of a page to a more compact form for file storage. |
recovery | general | a procedure used on opening WiredTiger whereby, starting at the last checkpoint written, log records are played back to bring the database up to date. |
ref | btree | see page reference |
RLE | general | see run length encoding |
run length encoding | general | also known as "RLE". A technique to save space whereby repeated items can be represented by a count followed by the item. For example the string of letters "AAAAAAABBBCDDDD" might be encoded as "7A3BC4D" using "out of band" digits. |
salvage | general | functionality to repair Btree disk files. |
schema | schema | the algorithms that use metadata to map the raw keys and values in Btrees to higher level abstractions like tables with typed columns. |
session | general | an encapsulation of information stored for single thread of control in the application. This encapsulation is kept in a handle that is used in the API. Sessions manage transactions and performs the transactional operations. |
skip list | general | a variation of a sorted linked list that allows for faster (O(log(N))) searching. It is used in WiredTiger as a lock-free data structure for keys inserted on a page. |
spin lock | general | a simple locking algorithm that continually attempts to acquire a lock, appropriate for resources that are likely to be available and may be held for a short time. This lock is CPU intensive while it is waiting. Contrast with mutex |
split | btree | see page split |
stable timestamp | transactions | a connection-wide value, checkpoints will not include commits newer than this value. |
table | schema | an abstraction that allows one or more btrees to be used in a coordinated way, which is specified by the schema. The table coordinates indices and column groups. |
thread | general | a thread of control. WiredTiger uses pthreads for POSIX systems. |
thread group | general | a group of threads that perform a common task. Eviction is performed by a thread group. |
timestamp | transactions | refers to one of several types of 64 bit values used to control transactional visibility of data in WiredTiger. See oldest timestamp, stable timestamp, commit timestamp, read timestamp. |
transaction | transactions | a unit of work performed in WiredTiger that is atomic and consistent with the specified isolation level and durability level. |
truncate | btree | an operation to remove a range of key/values from a Btree. See also fast truncate |
update | btree | a value associated with a key in a Btree. A key may have many updates, these correspond to values written to the key previously. |
update chain | btree | a list of updates. |
verify | general | functionality to check integrity of Btree disk files. |