Applications can implement their own custom data sources underneath WiredTiger using the WT_DATA_SOURCE interface. Each data source supports a set of methods for its own URI type (for example, in the same way WiredTiger supports the built-in type "file:", an application data source might support the type "dsrc:").
The WiredTiger distribution includes an example of a complete custom data source implementation (based on Oracle's Berkeley DB database engine), in the file test/format/kvs_bdb.c. This example implementation is public domain software; please feel free to use this code as a prototype implementation of other custom data sources.
Applications register their WT_DATA_SOURCE interface implementations with WiredTiger using the WT_CONNECTION::add_data_source method:
Any of the WT_DATA_SOURCE methods may be initialized to NULL. Calling an uninitialized WT_DATA_SOURCE method through a WiredTiger API results in an "operation not supported" error. For example, an underlying data source that does not support compaction should set the compact field of its WT_DATA_SOURCE structure to NULL.
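A minimal standalone sketch of the NULL-method convention follows. It does not include wiredtiger.h: struct my_dsrc_methods is a hypothetical two-slot stand-in for WT_DATA_SOURCE, call_compact models the check WiredTiger performs before dispatching, and using ENOTSUP as the "operation not supported" value is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for two WT_DATA_SOURCE method slots; the real
 * structure is declared in wiredtiger.h and has more members.
 */
struct my_dsrc_methods {
    int (*create)(const char *uri);
    int (*compact)(const char *uri);
};

static int my_create(const char *uri)
{
    (void)uri;
    return (0); /* Pretend the object was created. */
}

/* This data source cannot compact, so its compact slot is left NULL. */
static const struct my_dsrc_methods my_methods = { my_create, NULL };

/*
 * Models the caller's side: a NULL slot is reported as "operation not
 * supported" instead of being called.
 */
static int call_compact(const struct my_dsrc_methods *m, const char *uri)
{
    assert(m != NULL);
    if (m->compact == NULL)
        return (ENOTSUP);
    return (m->compact(uri));
}
```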
When implementing a custom data source, consider whether to support the 'r' key format and the 't' value format.
The 'r' key format indicates that keys are uint64_t typed record numbers. In this case, cursor methods will be passed and/or return a record number in the recno field of the WT_CURSOR structure.
Cursor methods for data sources supporting fixed-length bit field column-stores (that is, a store with an 'r' type key and 't' type value) should accept and/or return a single byte in the value field of the WT_CURSOR structure, where between 1 and 8 of the low-order bits of that byte contain the bit field's value. For example, if the value format is "3t", the value is stored in the bottom 3 bits of the byte.
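The masking described above can be sketched as a small helper. bitfield_value is a hypothetical name, and the bit count would come from the value format (3 for "3t").

```c
#include <assert.h>
#include <stdint.h>

/*
 * Keep only the low-order `bits` bits of a fixed-length bit-field byte,
 * for example bits == 3 for a value format of "3t".
 */
static uint8_t bitfield_value(uint8_t byte, unsigned bits)
{
    assert(bits >= 1 && bits <= 8);
    return (uint8_t)(byte & ((1u << bits) - 1u));
}
```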
The WT_DATA_SOURCE::open_cursor method is responsible for allocating and returning a WT_CURSOR structure. WT_DATA_SOURCE::open_cursor should initialize only WT_CURSOR method fields, and only for the methods the underlying data source supports.
The following methods of the WT_CURSOR structure must be initialized:
No other methods of the WT_CURSOR structure should be initialized; for example, initializing WT_CURSOR::get_key or WT_CURSOR::set_key will have no effect.
Incorrectly configuring the WT_CURSOR methods will likely result in a core dump.
The data source's WT_CURSOR::close method is responsible for discarding the allocated WT_CURSOR structure returned by WT_DATA_SOURCE::open_cursor.
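One common C pattern for meeting this allocation contract is to embed the public cursor structure as the first member of a private cursor structure, so that close can convert the pointer back and free the whole allocation. The sketch is standalone: struct base_cursor is a hypothetical stand-in for WT_CURSOR, and my_open_cursor and my_cursor_close are illustrative names, not WiredTiger functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the public WT_CURSOR structure. */
struct base_cursor {
    int (*close)(struct base_cursor *cursor);
};

/*
 * Data-source-private cursor: the public structure is the first member,
 * so a pointer to it can be converted back to the private structure.
 */
struct my_cursor {
    struct base_cursor iface;
    int open_handle; /* Placeholder for underlying store state. */
};

static int my_cursor_close(struct base_cursor *cursor)
{
    struct my_cursor *c = (struct my_cursor *)cursor;

    assert(c != NULL);
    c->open_handle = 0; /* Release underlying resources here. */
    free(c);            /* close owns the allocation made by open. */
    return (0);
}

static int my_open_cursor(struct base_cursor **cursorp)
{
    struct my_cursor *c;

    assert(cursorp != NULL);
    if ((c = calloc(1, sizeof(*c))) == NULL)
        return (ENOMEM);
    c->iface.close = my_cursor_close; /* Initialize only supported methods. */
    c->open_handle = 1;
    *cursorp = &c->iface;
    return (0);
}
```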
Custom data sources supporting column-stores (that is, a store with an 'r' type key) should consider the append configuration string optionally specified for the WT_DATA_SOURCE::open_cursor method. The append string configures WT_CURSOR::insert to allocate and return a new record number key.
Custom data sources should consider the overwrite configuration string optionally specified for the WT_DATA_SOURCE::open_cursor method. If the overwrite configuration is true, WT_CURSOR::insert, WT_CURSOR::remove and WT_CURSOR::update should ignore the current state of the record, and these methods will succeed regardless of whether or not the record already exists.
When an application configures overwrite to false, WT_CURSOR::insert should fail with WT_DUPLICATE_KEY if the record already exists, and WT_CURSOR::update and WT_CURSOR::remove should fail with WT_NOTFOUND if the record does not exist.
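The overwrite rules can be modeled with a toy store indexed by small integer keys. MY_DUPLICATE_KEY and MY_NOTFOUND are placeholders for WT_DUPLICATE_KEY and WT_NOTFOUND so the sketch compiles without wiredtiger.h. Letting update create a missing record when overwrite is true is an assumption of the sketch, chosen to match the "ignore the current state of the record" rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholders: real code returns WT_DUPLICATE_KEY and WT_NOTFOUND. */
#define MY_DUPLICATE_KEY (-1001)
#define MY_NOTFOUND (-1002)

#define NKEYS 16

/* Hypothetical toy store: key k exists when present[k] is true. */
struct toy_store {
    bool present[NKEYS];
    int value[NKEYS];
};

static int toy_insert(struct toy_store *s, size_t k, int v, bool overwrite)
{
    assert(s != NULL && k < NKEYS);
    if (!overwrite && s->present[k])
        return (MY_DUPLICATE_KEY);
    s->present[k] = true;
    s->value[k] = v;
    return (0);
}

static int toy_update(struct toy_store *s, size_t k, int v, bool overwrite)
{
    assert(s != NULL && k < NKEYS);
    if (!overwrite && !s->present[k])
        return (MY_NOTFOUND);
    s->present[k] = true; /* With overwrite, update may create the record. */
    s->value[k] = v;
    return (0);
}

static int toy_remove(struct toy_store *s, size_t k, bool overwrite)
{
    assert(s != NULL && k < NKEYS);
    if (!overwrite && !s->present[k])
        return (MY_NOTFOUND);
    s->present[k] = false;
    return (0);
}
```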
Custom data sources supporting fixed-length bit field column-stores (that is, a store with an 'r' type key and 't' type value) may, but are not required to, support the semantic that inserting a new record after the current maximum record in a store implicitly creates the missing records as records with a value of 0.
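If a data source chooses to implement that optional semantic, the gap filling can be sketched as follows. The sketch assumes a hypothetical in-memory store that keeps record n in bytes[n - 1]. fixed_store, fixed_insert and the fixed capacity are illustrative only, and returning ERANGE for out-of-range record numbers is the sketch's own choice.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MAX_RECORDS 64

/* Hypothetical fixed-length bit-field store: record n lives in bytes[n - 1]. */
struct fixed_store {
    uint64_t nrecords; /* Current maximum record number. */
    uint8_t bytes[MAX_RECORDS];
};

/*
 * Insert at 1-based record number recno. Records between the old maximum
 * and recno are implicitly created with a value of 0.
 */
static int fixed_insert(struct fixed_store *s, uint64_t recno, uint8_t v)
{
    assert(s != NULL);
    if (recno == 0 || recno > MAX_RECORDS)
        return (ERANGE);
    if (recno > s->nrecords) {
        /* Zero-fill records nrecords + 1 through recno - 1. */
        memset(&s->bytes[s->nrecords], 0, (size_t)(recno - 1 - s->nrecords));
        s->nrecords = recno;
    }
    s->bytes[recno - 1] = v;
    return (0);
}
```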
Custom data source cursor methods are expected to use the recno, key and value fields of the WT_CURSOR handle.
The recno field is of type uint64_t and holds the record number set or retrieved using the cursor when the data source is configured for record number keys.
The key and value fields are WT_ITEM structures. The key.data, key.size, value.data and value.size fields are read by cursor methods that store items in the underlying data source, and set by cursor methods that retrieve items from it.
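The read/set contract for these fields can be sketched with a hypothetical struct my_item that mirrors only the data and size fields of WT_ITEM, plus a one-slot store. The key field would follow the same pattern. The sketch makes its own lifetime choice: the pointer returned by store_get stays valid only until the next store_put.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the data and size fields of WT_ITEM. */
struct my_item {
    const void *data;
    size_t size;
};

/* Hypothetical single-slot store holding one value. */
struct slot_store {
    void *buf;
    size_t len;
};

/* Storing: read value->data and value->size and copy the bytes. */
static int store_put(struct slot_store *s, const struct my_item *value)
{
    void *copy;

    assert(s != NULL && value != NULL);
    if ((copy = malloc(value->size == 0 ? 1 : value->size)) == NULL)
        return (ENOMEM);
    if (value->size != 0)
        memcpy(copy, value->data, value->size);
    free(s->buf);
    s->buf = copy;
    s->len = value->size;
    return (0);
}

/* Retrieving: set value->data and value->size to reference stored bytes. */
static void store_get(const struct slot_store *s, struct my_item *value)
{
    assert(s != NULL && value != NULL);
    value->data = s->buf;
    value->size = s->len;
}
```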
Some specific errors should be mapped to WiredTiger errors:
Otherwise, successful return from custom data source methods should be indicated by a return value of zero; an error may be indicated by any non-zero value, including the WiredTiger errors described in Error handling. A simple approach is to always return either a system error value (that is, errno) or WT_ERROR. Error messages can be displayed using the WT_SESSION::msg_printf method.
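A data source layered over another storage engine typically translates that engine's status codes in one place. The sketch assumes a hypothetical backend_status enumeration and uses MY_DUPLICATE_KEY, MY_NOTFOUND and MY_ERROR as placeholders for WT_DUPLICATE_KEY, WT_NOTFOUND and WT_ERROR. Passing ENOMEM through follows the system-error approach described above.

```c
#include <assert.h>
#include <errno.h>

/* Placeholders: real code returns WT_DUPLICATE_KEY, WT_NOTFOUND, WT_ERROR. */
#define MY_DUPLICATE_KEY (-1001)
#define MY_NOTFOUND (-1002)
#define MY_ERROR (-1003)

/* Hypothetical status codes from the underlying storage engine. */
enum backend_status {
    BACKEND_OK,
    BACKEND_KEY_EXISTS,
    BACKEND_NO_SUCH_KEY,
    BACKEND_OUT_OF_MEMORY,
    BACKEND_CORRUPT
};

/* Translate a backend status into a value a data source method can return. */
static int map_backend_error(enum backend_status st)
{
    switch (st) {
    case BACKEND_OK:
        return (0);
    case BACKEND_KEY_EXISTS:
        return (MY_DUPLICATE_KEY); /* Insert of an existing key. */
    case BACKEND_NO_SUCH_KEY:
        return (MY_NOTFOUND);      /* Key/value pair not found. */
    case BACKEND_OUT_OF_MEMORY:
        return (ENOMEM);           /* A system error value is acceptable. */
    default:
        return (MY_ERROR);         /* Anything else: generic error. */
    }
}
```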
Configuration information is provided to each WT_DATA_SOURCE method as an argument. This configuration information can be parsed using the WT_EXTENSION_API::config method, and is returned in a WT_CONFIG_ITEM structure.
For example, the append, overwrite, key_format and value_format configuration strings may be required for the WT_DATA_SOURCE::open_cursor method to configure itself correctly.
A key_format value of "r" indicates the data source is being configured to use record numbers as keys. In this case, initialized WT_CURSOR methods must take their key value from the cursor's recno field, not the cursor's key field. (A custom data source is not required to support record numbers; the WT_DATA_SOURCE::open_cursor method can return an error if an application attempts to configure a data source with a key_format of "r".)
The WT_DATA_SOURCE::open_cursor method might retrieve the boolean or integer value of a configuration string as follows:
The WT_DATA_SOURCE::open_cursor method might retrieve the string value of a configuration string as follows:
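The retrieval pattern can be sketched without the real extension API. Here struct my_config_item mirrors only the str, len and val fields of WT_CONFIG_ITEM, and toy_config_get is a hypothetical flat "key=value,..." parser standing in for WiredTiger's. The sketch shows how open_cursor might read a boolean (append) through val and a string (key_format) through str and len, and reject "r" when record numbers are unsupported. As with WT_CONFIG_ITEM, the string value is not NUL-terminated, so it is always compared using len.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the str, len and val fields of WT_CONFIG_ITEM. */
struct my_config_item {
    const char *str; /* Not NUL-terminated: always use len. */
    size_t len;
    int64_t val;     /* Numeric value; booleans are 1 (true) or 0 (false). */
};

/*
 * Hypothetical lookup over a flat "k1=v1,k2=v2" string (no nesting or
 * quoting). Returns 0 and fills *item on success, ENOENT if key is absent.
 */
static int toy_config_get(const char *cfg, const char *key, struct my_config_item *item)
{
    size_t klen = strlen(key);
    const char *p = cfg;

    assert(cfg != NULL && key != NULL && item != NULL);
    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t n = (end == NULL) ? strlen(p) : (size_t)(end - p);

        if (n > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
            item->str = p + klen + 1;
            item->len = n - klen - 1;
            if (item->len == 4 && strncmp(item->str, "true", 4) == 0)
                item->val = 1;
            else if (item->len == 5 && strncmp(item->str, "false", 5) == 0)
                item->val = 0;
            else
                item->val = strtoll(item->str, NULL, 10); /* Stops at ','. */
            return (0);
        }
        if (end == NULL)
            break;
        p = end + 1;
    }
    return (ENOENT);
}

/*
 * Read "append" (boolean) and "key_format" (string) as open_cursor might,
 * failing with ENOTSUP if record numbers are requested but unsupported.
 */
static int my_cursor_config(const char *cfg, bool recno_supported, bool *appendp, bool *recnop)
{
    struct my_config_item item;

    *appendp = false;
    if (toy_config_get(cfg, "append", &item) == 0)
        *appendp = item.val != 0;

    *recnop = false;
    if (toy_config_get(cfg, "key_format", &item) == 0)
        *recnop = item.len == 1 && item.str[0] == 'r';
    if (*recnop && !recno_supported)
        return (ENOTSUP);
    return (0);
}
```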
Applications can add their own configuration strings to WiredTiger methods using WT_CONNECTION::configure_method.
WT_CONNECTION::configure_method takes the following arguments:
- the name of the method being extended: "WT_SESSION.create" would add new configuration strings to the WT_SESSION::create method, and "WT_SESSION.open_cursor" would add the configuration string to the WT_SESSION::open_cursor method;
- the object type being extended: "table:" would extend the configuration arguments for table objects, and "my_data:" could be used to extend the configuration arguments for a data source with URIs beginning with the "my_data" prefix; a NULL value for the object type implies all object types;
- the new configuration string and its default value: for a configuration string named "device", specifying either "device=/path" or "device=35" would configure the default values;
- the type of the configuration value, which is one of "boolean", "int", "list" or "string".
For example, an application might add new boolean, integer, list or string type configuration strings as follows:
Once these additional configuration calls have returned, application calls to the WT_SESSION::open_cursor method could then include configuration strings such as my_boolean=false, my_integer=37 or my_source=/home.
Additional checking information can be provided for int, list or string type configuration strings.
For integers, either or both of a maximum or minimum value can be provided, so an error will result if the application attempts to set the value outside of the acceptable range:
For lists and strings, a set of valid choices can also be provided, so an error will result if the application attempts to set the value to a string not listed as a valid choice:
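To make the effect of these checks concrete, the sketch below models them as two hypothetical validation functions. WiredTiger applies the real checks itself; returning EINVAL on failure is an assumption of the sketch, and the example bounds and choices are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of an integer check: the value must lie within [min, max]. */
static int check_int(int64_t v, int64_t min, int64_t max)
{
    assert(min <= max);
    return (v < min || v > max ? EINVAL : 0);
}

/* Model of a choices check: the value must match one listed choice. */
static int check_choice(const char *v, const char *const *choices, size_t nchoices)
{
    size_t i;

    assert(v != NULL && choices != NULL);
    for (i = 0; i < nchoices; i++)
        if (strcmp(v, choices[i]) == 0)
            return (0);
    return (EINVAL);
}
```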
Custom data sources do not support custom ordering of records, and attempting to create a custom data source with a collator configured will fail.
WiredTiger does not serialize calls to the WT_DATA_SOURCE methods or to cursor methods configured by WT_DATA_SOURCE::open_cursor, and the methods may be called from multiple threads concurrently. It is the responsibility of the implementation to protect any shared data. For example, object operations such as WT_DATA_SOURCE::drop might not be permitted while there are open cursors for the WT_DATA_SOURCE object.
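A minimal sketch of that kind of protection, assuming POSIX threads: a hypothetical per-object structure counts open cursors under a mutex, and object_drop refuses with EBUSY while any cursor is open. The names and the choice of EBUSY are illustrative, not part of the WiredTiger API.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical per-object state shared by all threads using the data source. */
struct my_object {
    pthread_mutex_t lock;  /* Protects open_cursors. */
    unsigned open_cursors;
};

static void object_cursor_opened(struct my_object *obj)
{
    pthread_mutex_lock(&obj->lock);
    obj->open_cursors++;
    pthread_mutex_unlock(&obj->lock);
}

static void object_cursor_closed(struct my_object *obj)
{
    pthread_mutex_lock(&obj->lock);
    assert(obj->open_cursors > 0);
    obj->open_cursors--;
    pthread_mutex_unlock(&obj->lock);
}

/* Refuse to drop the object while any cursor on it is open. */
static int object_drop(struct my_object *obj)
{
    int ret = 0;

    pthread_mutex_lock(&obj->lock);
    if (obj->open_cursors > 0)
        ret = EBUSY;
    else {
        /* ...remove the underlying storage while still holding the lock... */
    }
    pthread_mutex_unlock(&obj->lock);
    return (ret);
}
```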