Caution: the Architecture Guide is not updated in lockstep with the code base and is not necessarily correct or complete for any specific release.
Metadata in WiredTiger captures important information about the user's database. It tracks essential information such as the files and tables present in the database, their associated configurations, and their latest checkpoints. The checkpoint information tells WiredTiger where to find the root page and all other parts of a file's tree when accessing the file.
The main metadata in WiredTiger is stored in the WiredTiger.wt file.
The keys in the WiredTiger.wt table are uri strings. For example, keys can include "table:" and "file:" prefixed URIs, representing typical row-store WiredTiger tables. A non-exhaustive list of other example keys includes "index:", "colgroup:" and "system:" prefixed URIs, representing indexes, column groups and special system values respectively.
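
As an illustration of how the key prefixes partition the metadata, the following minimal sketch classifies a metadata key by its URI prefix. The enum meta_key_kind and the function classify_meta_key are hypothetical names invented for this example; they are not part of the WiredTiger API, and the prefix list covers only the prefixes named above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical categories for this sketch; these are not WiredTiger types. */
typedef enum {
    META_KEY_TABLE,
    META_KEY_FILE,
    META_KEY_INDEX,
    META_KEY_COLGROUP,
    META_KEY_SYSTEM,
    META_KEY_OTHER
} meta_key_kind;

/* Return the category of a metadata key based on its URI prefix. */
meta_key_kind
classify_meta_key(const char *uri)
{
    /* Only the prefixes mentioned in the text; the real list is longer. */
    static const struct {
        const char *prefix;
        meta_key_kind kind;
    } prefixes[] = {
        {"table:", META_KEY_TABLE},
        {"file:", META_KEY_FILE},
        {"index:", META_KEY_INDEX},
        {"colgroup:", META_KEY_COLGROUP},
        {"system:", META_KEY_SYSTEM},
    };
    size_t i;

    assert(uri != NULL);
    for (i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++)
        if (strncmp(uri, prefixes[i].prefix, strlen(prefixes[i].prefix)) == 0)
            return (prefixes[i].kind);
    return (META_KEY_OTHER);
}
```

A caller iterating over metadata keys could use such a function to decide which set of configuration entries to expect in the corresponding value.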
The value corresponding to a given uri key in the metadata table is its associated configuration string. The configuration value itself is a list of key/value pairs in string form. The key/value pairs in a configuration string depend on the type of uri. Thus, a metadata entry with a uri key beginning with "table:" will have a configuration string with entries such as key_format and value_format to describe the data encoding for the uri. A metadata key beginning with "file:" will have a different set of configuration entries associated with it. See Configuration Strings for more details about WiredTiger configuration strings.
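
To make the "list of key/value pairs in string form" concrete, here is a minimal sketch that looks up one top-level entry in such a string. It is not WiredTiger's configuration parser: the function name config_lookup is hypothetical, the sketch assumes comma-separated key=value entries, treats commas inside parentheses as part of a value, and does not handle quoting. The sample string in the usage note is illustrative rather than a real metadata value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the value of a top-level key from a "key=value,key=value" string
 * into buf. Returns 0 on success, or -1 if the key is not found or the
 * value does not fit in buf. Quoted values are not handled in this sketch.
 */
int
config_lookup(const char *config, const char *key, char *buf, size_t buflen)
{
    const char *p, *end, *val;
    size_t keylen, vallen;
    int depth;

    assert(config != NULL && key != NULL && buf != NULL);
    keylen = strlen(key);

    for (p = config; *p != '\0'; p = (*end == ',') ? end + 1 : end) {
        /* Find the end of this entry: the next comma at nesting depth zero. */
        depth = 0;
        for (end = p; *end != '\0'; end++) {
            if (*end == '(')
                depth++;
            else if (*end == ')' && depth > 0)
                depth--;
            else if (*end == ',' && depth == 0)
                break;
        }

        /* The entry matches only if it starts with exactly "key=". */
        if ((size_t)(end - p) > keylen && strncmp(p, key, keylen) == 0 &&
          p[keylen] == '=') {
            val = p + keylen + 1;
            vallen = (size_t)(end - val);
            if (vallen >= buflen)
                return (-1);
            memcpy(buf, val, vallen);
            buf[vallen] = '\0';
            return (0);
        }
    }
    return (-1);
}
```

For example, given the string "columns=(a,b),key_format=S,value_format=u", looking up key_format with this function yields "S" and looking up columns yields "(a,b)".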
The latest checkpoint information for each btree in the system is stored in its metadata entry. The checkpoint metadata provides a known point in time from which WiredTiger can recover a btree's data after a restart. The checkpoint metadata for a given btree is stored within the configuration string under its associated uri key within the WiredTiger.wt table. Metadata configuration entries relevant to capturing a given btree's checkpoint state include:
checkpoint: The file's checkpoint entries, including their timestamp, transaction ID and write generation. Most importantly, this information also tracks an encoded address cookie that describes the blocks that make up the checkpoint's data.
checkpoint_lsn: The LSN (log sequence number) of the last checkpoint.
checkpoint_backup_info: Incremental backup information.
The metadata file WiredTiger.wt is a regular WiredTiger table itself. As such, the metadata table has a btree that also needs to be checkpointed. Since a new metadata entry needs to be created for each btree at the time of checkpointing, the metadata table is the last file to be checkpointed in the process. Checkpointing the metadata table last is important for capturing a consistent state of the btrees in the database at the time of the last checkpoint.
Due to this ordering, to successfully restore the checkpoint of the WiredTiger.wt file, WiredTiger captures the most recent checkpoint information of the metadata table in a separate special turtle file called "WiredTiger.turtle". The turtle file itself is metadata for the WiredTiger metadata. When recovering the WiredTiger.wt table, WiredTiger reads the table's latest checkpoint information from the turtle file. Recovering the checkpoint of the WiredTiger.wt table in turn restores a consistent view of the checkpoint information for the other btrees in the system.
The contents of the metadata table are ASCII printable. The URIs and configuration values are all printable strings.
To read the metadata, a caller can open a cursor, via WT_SESSION::open_cursor, with a uri equal to "metadata:". Changes to the metadata cannot be written through the metadata cursor interface; that is, the metadata cursor is a read-only interface. Metadata is changed only by other API calls.
Additionally, to read only the strings relating to the creation configurations of the various uri keys present, a cursor can be opened with a uri equal to "metadata:create". This type of cursor causes WiredTiger to strip out internal metadata when querying.
Lastly, the metadata in the WiredTiger.wt table can be dumped using the wt utility. A command option for specifically printing the metadata is "wt list -v". This will display the various uri key strings with their associated configuration strings.
When table A is created (without named column groups), there are three entries in the metadata file, having these keys:
"table:A"
"colgroup:A"
"file:A.wt"
For brevity, just the keys are shown; the values in the metadata btree are configuration strings that would show information about key/value format and other creation options, checkpoints associated with the table, etc.
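
The three keys above follow a simple naming pattern, which the sketch below reproduces for a given table name. It covers only the case described here, a table created without named column groups; the struct simple_table_keys, the function simple_table_keys_init, and the fixed 64-byte buffers are assumptions made for this example and do not exist in WiredTiger.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the three metadata keys of a simple table. */
typedef struct {
    char table[64];
    char colgroup[64];
    char file[64];
} simple_table_keys;

/*
 * Fill in the metadata keys expected for a table created without named
 * column groups. Returns 0 on success, or -1 if the name is empty or a
 * key would not fit in its buffer.
 */
int
simple_table_keys_init(simple_table_keys *keys, const char *name)
{
    int n;

    assert(keys != NULL && name != NULL);
    if (strlen(name) == 0)
        return (-1);

    n = snprintf(keys->table, sizeof(keys->table), "table:%s", name);
    if (n < 0 || (size_t)n >= sizeof(keys->table))
        return (-1);

    n = snprintf(keys->colgroup, sizeof(keys->colgroup), "colgroup:%s", name);
    if (n < 0 || (size_t)n >= sizeof(keys->colgroup))
        return (-1);

    /* The btree for the single default column group lives in <name>.wt. */
    n = snprintf(keys->file, sizeof(keys->file), "file:%s.wt", name);
    if (n < 0 || (size_t)n >= sizeof(keys->file))
        return (-1);

    return (0);
}
```

Calling simple_table_keys_init with the name "A" fills the struct with "table:A", "colgroup:A" and "file:A.wt", matching the entries listed above.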
The "table:A" entry represents the table interface that the API caller uses, and the associated data structure in memory for the table is WT_TABLE. If there were column groups or indices, there may be multiple btrees associated with the table, and listed in the metadata, but in this example there is just one.
Every table has one or more "colgroup:" entries. The column group entry's metadata references the "file:" entry that stores the btree; in this case it would reference "file:A.wt". Column groups do not have an explicit data structure associated with them, but their associated btrees are referenced via an array in the WT_TABLE.
The "file:" entry represents the btree stored on disk, as well as parts cached in memory. The data structure implied by this entry is a WT_BTREE.
In summary, the relationship between prefixes and data structures is as follows:
URI prefix | struct type | has dhandle? | notes |
---|---|---|---|
table: | WT_TABLE | yes | the dhandle is cast to (WT_TABLE *) |
file: | WT_BTREE | yes | dhandle->handle is a (WT_BTREE *) |
colgroup: | (none) | no | stored in an array in the WT_TABLE |
See Data Handles for more information about data handles (dhandles).