Data Structures | Source Location | |
---|---|---|
|
|
Caution: the Architecture Guide is not updated in lockstep with the code base and is not necessarily correct or complete for any specific release.
WiredTiger's use of configuration strings in its API provides flexibility for callers, but when they are used in high performance code paths, they can introduce bottlenecks. This is typically because, in the worst case, internal WiredTiger code repeatedly checks for keys by scanning the entire configuration string. The longer the configuration string, the greater the scanning time. Compiled configuration strings provide a binary representation of the string, allowing for quick access to keys and their values.
The old configuration framework is known as config, where the new framework is conf. (We hope that a shorter name implies faster processing). The conf framework builds upon and uses elements of config.
As the configuration compilation subsystem (conf) was introduced after the WiredTiger code and API reached stability, there is a set of performance improvement goals and two compatibility goals.
On the performance side, the new conf processing should greatly reduce the impact of these actions, which take the bulk of the time in traditional configuration processing:
api_data.py
.The first compatibility goal is on the calling side, to limit the API explosion that has otherwise happens when we provide a "fast" version of an API. One casualty of adding alternate APIs is that they are lightly tested in C, and often never tested in our Python tests. Our approach is to introduce a new API that can take a configuration string and returns another "compiled" string, that is actually a reference to the binary representation of the configuration string. That string can be used in any API that supports compiled configuration. This means that as APIs are adapted over time to allow a compiled configuration argument, no new changes or structures are needed for the API. Callers can continue to use regular configuration strings, or can pre-compile any configuration string that would be used in a performance path.
A second compatibility goal is with respect to the internal implementation of the API within WiredTiger. These API(s) need to be changed to allow the compiled strings to be used, as well as allow non-compiled strings to be used. These changes should be easy to do and not prone to error.
The primary structure used by conf is WT_CONF
. This provides information about both a single configuration string or a set of configuration strings. That is, in WiredTiger code, it replaces variables or arguments such is const char *config
as well as arguments like const char *cfg[]
.
To allow for fast checking of keys, we give an integer value to each key used in the entire set of configurations. This is done automatically by the api_config.py
script in the dist
directory. When WT code asks to get a configuration value for a key, it is now able to use an integer key rather than a string. Previously, code used "read_timestamp" to look up a key, now it uses the integer WT_CONF_ID_checkpoint_read_timestamp
. Some of this is hidden from view via macros, so that code that previously did:
can now do:
Note that both of these code snippets return cval
, which is a WT_CONFIG_ITEM
. So any code following this that uses the cval
is unchanged. This has allowed us to make mostly mechanical changes on the WiredTiger side, reducing the possibility of errors.
Having one of these WT_CONF_ID_*
(or "conf id") integer keys allows the conf functions to access needed information quickly. The WT_CONF
struct (which corresponds to conf
in the second code snippet above) contains a table that is indexed by this WT_CONF_ID_*. That is here:
The value of WT_CONF->value_map
[conf_id] for a given conf_id
gives us another index, which is called conf_value_index
in the code. This value is either 0
, meaning the key is not in this WT_CONF
, or it is an offset into an array of WT_CONF_VALUE
entries. Typically, a WT_CONF_VALUE has an embedded WT_CONFIG_ITEM, which is what we want. Thus, returning a value from a WT_CONF
usually means two indirect references, and then we have a populated WT_CONFIG_ITEM
to return.
If we were looking for greater performance, we could have modified this to use one table access by skipping having a value_map
. Since there are currently over 300 possible keys in the system, this would have meant having a larger chunk of memory for each compiled string, 12K for starters (300 * ~40 bytes for each WT_CONF_VALUE). But supporting composite keys means that most compiled strings would need some multiple of that value. For now, we use a value_map
, this could be reassessed.
A WT_CONF_VALUE
is a simple struct that has a union along with a discriminant tag. The tag indicates how the union should be interpreted. The union is one of:
WT_CONFIG_ITEM
. This is the typical case, it contains a value ready to returnWT_CONF
in the WT_CONF
array to use for subordinate keys.WT_CONF_BIND_DESC
. This struct contains an offset into WT_SESSION::conf_bindings.values table, that's where the bound value will be found. It also contains info to perform runtime checks on the bound value when we get it.The WT_CONF
struct is meant to be position independent. A position independent struct can be copied or moved to another memory location without "fixing up" any internal pointers, or allocating new structs. The only pointers in WT_CONF
are to items that are immutable and will never change locations, like strings. For example, when we do a compilation, we make a copy of the source string. That copy will always exist for as long as we have the connection open. The WT_CONFIG_ITEM
entries that are referenced by WT_CONF
have pointers to parts of that immutable string. But the references in WT_CONF
to WT_CONFIG_ITEM
entries, though, are not pointers. The reference is achieved by using an offset to an adjacent piece of memory.
What if the WT_CONF_VALUE
array was not adjacent to the WT_CONF? They could be allocated separately, but then the WT_CONF
would need a pointer to where the array was allocated. Then, copying the WT_CONF
would probably need a new WT_CONF_VALUE
array to be allocated and copied, and the pointer adjusted in the copy. And copies are needed to make changes to a base configuration. When we compile a configuration string, we want to take an (already compiled) WT_CONF for all the default values for that configuration and copy it, and then overwrite it with the values in the configuration string. To do that, we need a fast copy without extra allocations and fix-ups.
Why do we need compilation to be fast anyway? After all, an efficient program will do pre-compiling in advance. The reason is to also efficiently handle any uses of the API that don't use pre-compiled strings. For those uses, we go ahead and pre-compile the given configuration string. This has several advantages:
The total size of the WT_CONF
struct and its associated WT_CONF_VALUE
structs can be totally determined at compile time. This is also true for WT_CONF
structs that have subordinate WT_CONF
structs, which is used to support composite keys. Subordinate WT_CONF
structs appear after the parent WT_CONF
struct and before the WT_CONF_VALUE
array. Since they each have their own fixed size, but each size may be different, we have, within the WT_CONF
struct, an offset of the WT_CONF_VALUE
array. The WT_CONF_VALUE_TABLE_ENTRY
macro is used to access the WT_CONF_VALUE
entries, using the offset.
Suppose we have a fictional key "blah"
, and we want a slick way to get its associated ID (WT_CONF_ID_blah) while using the syntax:
Well, we can certainly use identifier pasting (the ##
operator) via the C pre-processor to accomplish this. That's fine, but...
WiredTiger also makes heavy use of composite keys to reach down into sub-configurations (or "categories"). For example, suppose our fictional configuration string looks like "blah=7,foo=(bar=123,hello=456)"
and we want to get the value for foo.bar
. How do we identify these multiple parts to our access functions? And note that we have depths of sub-configuration greater than two. Ideally, we want to take code that currently looks like this:
and change it to:
Our solution is to create a const struct in conf.h
that contains something like this:
This is pretty close to our ideal, we need to use capitalization for walking through subordinate structures, like so:
The capitalization was an unfortunate side effect of having an unregulated, but flexible namespace. For example, "checkpoint"
is both a key and a category name.
As you might guess __wt_conf_gets_def
is a macro that changes its third argument Foo.bar
to WT_CONF_ID_STRUCTURE.Foo.bar
. And that in turn is a const
value, it takes no space and can be fully evaluated at compile time to be WT_CONF_ID_foo | (WT_CONF_ID_bar << 16)
.
That allows the function that will be called (by the macro) to take a uint64_t arg, whose layers can be peeled off as we are evaluating composite queries. It also means that we have a depth limit of four for subcategories.
A WT_CONF
and all its sub-configurations fit contiguously in memory. The struct WT_CONF
for the configuration in question appears first, followed by the WT_CONF
for each sub-configurations, and additional WT_CONF
structs for any other subordinate configuration. These effective form a WT_CONF
array. Following this array, all the WT_CONF_VALUE structs that are used appear in a single array. To avoid extra calculations during processing, each WT_CONF
contains an offset to the slice of the WT_CONF_VALUE
array that it uses.
As an example, consider compiling the default configuration string for WT_SESSION::begin_transaction
. Default configuration strings for every API (compiled or not) are listed in config_def.c
. The string for begin_transaction
lists all the legal keys and their default value:
Notice that this has a sub-configuration:
that will have its own WT_CONF
struct. In total, we have two WT_CONF structs. There are a total of eight top level keys, and two keys associated with roundup_timestamps
. So we'll have an array of ten WT_CONF_VALUE structs. This looks like:
You can see that value_map
is a sparse array; only a small number of its 300 some entries are non-zero. Both value_map arrays are actually the same size, the second one is much more sparse, so the zeroed gaps are larger than they appear in this picture. Each entry in the map is a byte, which currently limits the number of WT_CONF_VALUE array entries to 256.
Let's look at how we would look up read_timestamp
in this WT_CONF
. We'll always start at the top WT_CONF
. WT_CONF_ID_read_timestamp
is defined as 4
, which means we'll look at value_map
[4]. That is the entry that has 7
. We then use 7
along with the offset to the associated WT_CONF_VALUE
array using a 1-based lookup. And that leads us to the WT_CONF_VALUE
for read_timestamp
. A WT_CONF_VALUE
is a union, and its tag indicates that the WT_CONFIG_ITEM
part of the union is used. That WT_CONFIG_ITEM
is filled in according to the value indicated in the default string, that is, "read_timestamp="
. That WT_CONFIG_ITEM
is what is returned by the look up.
For a composite key, let's examine looking up roundup_timestamps.read
. The definition of WT_CONF_ID_roundup_timestamps
is 157
, the top level value_map
[157] has 8
, that takes us to the eighth WT_CONF_VALUE
. Its union tag indicates it has sub-configuration at offset 1. That is, the second WT_CONF
in the conf array. WT_CONF_ID_read
is 159
, so now we use the value_map
in the second WT_CONF
, that takes us to entry 2. We use the offset from the second WT_CONF
to finally find the matching WT_CONF_VALUE
for roundup_timestamps.read
. This is tagged as a WT_CONFIG_ITEM
, so that's what we'll return.
When we compile any configuration string for a given API, we always know the total size of the WT_CONF
and all of its parts, even without looking at the configuration string. That is because a configuration string constructed by a calling program is always used in conjunction with the set of defaults. So any WT_CONF
for compiled string includes the values given by the input string, as well as the values given by the defaults. Using this approach means checking and getting configuration values involves looking at only one data structure. The cost is that any compiled WT_CONF
struct maybe large. However, we expect that most programs will only have a small number of pre-compiled strings.
Up to now, we've been talking about fixed configuration strings. A more typical case is a configuration string where one or more parts of it may vary from call to call. For example, with begin_transaction, we might want to give each transaction a read_timestamp, and we only know that when we are ready to do the call. The compilation system allows unbound parameters that can later be bound. Currently, there are two kinds of bindings:
These special values indicate a deferred binding. For example, if we compile the string:
we'll need a way to mark these unbound parameters in the WT_CONF
struct (discussed below). Before using the compiled string, the API user will call WT_SESSION::bind_configuration with the actual values. Using the above compiled string, the caller might say:
The binding values are first checked for validity if needed, just as a configuration string checks its values. The values are then put into the session in an array of WT_CONFIG_ITEM
. The ordering of values given in the bind_configuration
must match the order of the ''
entries given in the configuration string.
We said we need a way to mark the unbound parameters in the WT_CONF
during compilation. That is done in the associated WT_CONF_VALUE
struct. It has a tag indicating a binding, and thus contains a WT_CONF_BIND_DESC
struct. This struct has the offset into the session's array of bound values. Also, an ordered list of pointers to these WT_CONF_BIND_DESC structs makes it easy to find the right binding during the bind_configuration
call.
This is one of the few times we allow pointers in WT_CONF
. It is safe because unbound parameters are not used internally by WiredTiger. That is, a WT_CONF
built for a default configuration has no unbound parameters, and thus has no pointers. So it can be freely copied when we create a WT_CONF
as a basis for a user's request to compile a configuration string.
The WT_SESSION::bind_configuration call will be used to fill entries in an array
encapsulated in the session with the WT_CONF_BINDINGS
struct. Each element of this array has two entries, a pointer to the WT_CONF_BIND_DESC
descriptor used and the WT_CONFIG_ITEM
.
The WT_SESSION::bind_configuration does the following steps.
WT_CONF
struct we are binding to. See The Compiled String.WT_CONF
, get the list of unbound parameters WT_CONF->binding_descriptions
.WT_CONF_BIND_DESC
pointer from the WT_CONF->binding_descriptions array. – advance the position pointer in the session to the next array entry for parameter bindings.At runtime, if we look up the key in question, in the WT_CONF_VALUE union, we would find a WT_CONF_BIND_DESC
, and know what offset in the session array to find the needed WT_CONFIG_ITEM
. For sanity, we can check that the WT_CONF_BIND_DESC
pointer in the array matches the union entry we are on.
In dist
, API methods that support compilation are marked in api_data.py
. api_config.py
contains functions that use the data in api_data.py
to create or modify files in src/include
, including config.h
and conf.h
. Knowledge of the configuration strings is used to set up initialized data structures that are used for compilation, argument checking, and default values.
For any API method that supports compilation, we do pre-compilation when a connection is opened. That is, we compile the configuration string that has all the default values listed, and store the result. The gives us a ready made "default" WT_CONF
, that is stored in the connection. This is used for a common case when the API is called with no configuration string. It also is a basis for when we compile a configuration string.
Compiling a configuration string starts by knowing the size of the WT_CONF struct. We know the size of the default WT_CONF
, derived from how many sub-configurations it has, and how many total keys it has. Our compiled configuration is the same size, so we allocate storage for it, and do a memory copy of the default WT_CONF
. Then we can fill any parts that change.
A call to WT_SESSION::compile_configuration compiles a configuration string and returns a const char *
that can be used by the caller in place of a compilation string. When WiredTiger compiles the string a we've described, we end up with a WT_CONF
struct, and we'd like to return that to the caller. For debugging purposes, we'd like what we return to look like a string.
Our current implementation returns a special string that is an offset into a dummy array associated with the WT_CONNECTION. That is, we create an string of size 1024 and put it into WT_CONNECTION->conf_dummy
. When the application calls WT_SESSION::compile_configuration to compile a string the Nth time, we put a pointer to the compiled WT_CONF
into entry N of the WT_CONNECTION->conf_array
struct, and return &WT_CONNECTION->conf_dummy[N]
. When the application uses that string later, we can easily recognize that it is an offset into the conf_dummy
array, and find the needed WT_CONF
.
Above, we state that we have a separate identifier for every key used in the configuration system, that's more than 300 keys. Data structures could be smaller, and possibly made faster, if we had a smaller number of keys in play. Since each API might have only a small fraction of keys, perhaps a dozen or so, might it make sense to scope the identifiers. For example, instead of having:
could we not have something like:
These scoped numbers would only need to be unique within their own API, so they could be a lot smaller.
This approach could work in some cases, but there are places in WiredTiger where certain sets of configuration keys are known to multiple APIs. There are then shared utility functions that manage the configuration parsing of these sets of keys. These utility functions currently take configuration string arguments, and would be converted over to using WT_CONF
arguments. But such utility functions aren't necessarily scoped, for example, they may be used by multiple APIs. This is a case where we would want values between multiple APIs to be the same. There are workarounds for this, but it rapidly becomes messy. For simplicity, we went with identifiers that have no scope.