Version 3.0.0
Schema, Columns, Column Groups, Indices and Projections

While many tables have simple key/value pairs for records, WiredTiger also supports more complex data patterns.

Tables, rows and columns

A table is a logical representation of data consisting of cells in rows and columns. For example, a database might have a simple table including an employee identifier, last name and first name, and a salary:

Employee IDLast NameFirst NameSalary
1SmithJoe40000
2JonesMary50000
3JohnsonCathy44000

A row-oriented database would store all of the values for the first employee in the first row, then the next employee's values in the next row, and so on:

      1,Smith,Joe,40000
      2,Jones,Mary,50000
      3,Johnson,Cathy,44000

A column-oriented database would store all of the values of a column together, then the values of the next column, and so on:

      1,2,3
      Smith,Jones,Johnson
      Joe,Mary,Cathy
      40000,50000,44000

WiredTiger supports both storage formats, and can mix and match the storage of columns within a logical table.

A table in WiredTiger consist of one or more "column groups" that together hold all of the columns in primary key order; and zero or more indices that enable fast lookup of records by columns in orders other than the primary key.

Applications describe the format of their data by supplying a schema to WT_SESSION::create. This specifies how the application's data can be split into fields and mapped onto rows and columns.

Column types

By default, WiredTiger works as a traditional key/value store, where the keys and values are raw byte arrays accessed using a WT_ITEM structure. Key and value types may also be chosen from a list, or composed of multiple columns with any combination of types. Keys and values may be up to (4GB - 512B) bytes in size.

See Key/Value pairs for more details on raw key / value items.

Format types

WiredTiger's uses format strings similar to those specified in the Python struct module to describe the types of columns in a table: http://docs.python.org/library/struct

FormatC TypeJava typePython typeNotes
x N/AN/AN/Apad byte, no associated value
b int8_t byte int signed byte
B uint8_t byte int unsigned byte
h int16_t short int signed 16-bit
H uint16_t short int unsigned 16-bit
i int32_t int int signed 32-bit
I uint32_t int int unsigned 32-bit
l int32_t int int signed 32-bit
L uint32_t int int unsigned 32-bit
q int64_t long int signed 64-bit
Q uint64_t long int unsigned 64-bit
r uint64_t long int record number
s char[]String str fixed-length string
S char[]String str NUL-terminated string
t uint8_t byte int fixed-length bit field
u WT_ITEM *byte[]str raw byte array

The 'r' type is used for record number keys in column stores. It is otherwise identical to the 'Q' type.

The 's' type is used for fixed-length strings. If it is preceded by a size, that indicates the number of bytes to store; the default is a length of 1 byte.

The 'S' type is encoded as a C language string terminated by a NUL character.

The 't' type is used for fixed-length bit field values. If it is preceded by a size, that indicates the number of bits to store, between 1 and 8. That number of low-order bits will be stored in the table. The default is a size of 1 bit: that is, a boolean. C applications must always use a uint8_t type (or equivalently, unsigned char) for calls to WT_CURSOR::set_value, and a pointer to the same for calls to WT_CURSOR::get_value. If a bit field value is combined with other types in a packing format, it is equivalent to 'B', and a full byte is used to store it.

When referenced by a record number (that is, a key format of 'r'), the 't' type will be stored in a fixed-length column-store, and will not have an out-of-band value to indicate the record does not exist. In this case, a 0 byte value is used to indicate the record does not exist. This means removing a record with WT_CURSOR::remove is equivalent to storing a value of 0 in the record with WT_CURSOR::update (and storing a value of 0 in the record will cause cursor scans to skip the record). Additionally, creating a record past the end of an object implies the creation of any missing intermediate records, with byte values of 0.

The 'u' type is for raw byte arrays: if it appears at the end of a format string (including in the default "u" format for untyped tables), the size is not stored explicitly. When 'u' appears within a format string, the size is stored as a 32-bit integer in the same byte order as the rest of the format string, followed by the data.

There is a default collator that gives lexicographic (byte-wise) comparisons, and the default encoding is designed so that lexicographic ordering of encoded keys is usually the expected ordering. For example, the variable-length encoding of integers is designed so that they have the natural integer ordering under the default collator.

See Packing and Unpacking Data for details of WiredTiger's packing format.

WiredTiger can also be extended with custom collators by implementing the WT_COLLATOR interface.

Key and value formats

Every table has a key format and a value format as describe in Column types. These types are configured when the table is created by passing key_format and value_format keys to WT_SESSION::create.

For example, a simple row-store table with strings as both keys and values would be created as follows:

error_check(session->create(session,
"table:mytable", "key_format=S,value_format=S"));

A simple column-store table with strings for values would be created as follows:

error_check(session->create(session,
"table:mytable", "key_format=r,value_format=S"));
error_check(session->alter(session,
"table:mytable", "access_pattern_hint=random"));

Cursor formats

Cursors for a table have the same key format as the table itself. The key columns of a cursor are set with WT_CURSOR::set_key and accessed with WT_CURSOR::get_key. WT_CURSOR::set_key is analogous to printf, and takes a list of value in the order the key columns are configured in key_format.

For example, setting the key for a row-store table with strings as keys would be done as follows:

/* Set the cursor's string key. */
const char *key = "another key";
cursor->set_key(cursor, key);

For example, setting the key for a column-store table would be done as follows:

uint64_t recno = 37; /* Set the cursor's record number key. */
cursor->set_key(cursor, recno);

A more complex example, setting a composite key for a row-store table where the key_format was "SiH", would be done as follows:

/* Set the cursor's "SiH" format composite key. */
cursor->set_key(cursor, "first", (int32_t)5, (uint16_t)7);

The key's values are accessed with WT_CURSOR::get_key, which is analogous to scanf, and takes a list of pointers to values in the same order:

const char *key; /* Get the cursor's string key. */
error_check(cursor->get_key(cursor, &key));
uint64_t recno; /* Get the cursor's record number key. */
error_check(cursor->get_key(cursor, &recno));
/* Get the cursor's "SiH" format composite key. */
const char *first;
int32_t second;
uint16_t third;
error_check(cursor->get_key(cursor, &first, &second, &third));

Cursors for a table have the same value format as the table, unless a projection is configured with WT_SESSION::open_cursor. See Projections for more information.

WT_CURSOR::set_value is used to set value columns, and WT_CURSOR::get_value is used to get value columns, in the same way as described for WT_CURSOR::set_key and WT_CURSOR::get_key.

Columns

The columns in a table can be assigned names by passing a columns key to WT_SESSION::create. The column names are assigned first to the columns in the key_format, and then to the columns in value_format. There must be a name for every column, and no column names may be repeated.

For example, a column-store table with an employee ID as the key and three columns (department, salary and first year of employment), might be created as follows:

/*
* Create a table with columns: keys are record numbers, values are
* (string, signed 32-bit integer, unsigned 16-bit integer).
*/
error_check(session->create(session, "table:mytable",
"key_format=r,value_format=SiH,"
"columns=(id,department,salary,year-started)"));

In this example, the key's column name is id, and the value's column names are department, salary, and year-started (where id maps to the column format r, department maps to the column value format S, salary maps to the value format i and year-started maps to the value format H).

Once the table is created, there is no need to call WT_SESSION::create during subsequent runs of the application. However, it's worthwhile making the call anyway as it both verifies the table exists and the table schema matches the schema expected by the application.

Column groups

Once column names are assigned, they can be used to configure column groups. Column groups are primarily used to define storage in order to tune cache behavior, as each column group is stored in a separate file.

There are two steps involved in setting up column groups: first, pass a list of names for the column groups in the colgroups configuration key to WT_SESSION::create. Then make a call to WT_SESSION::create for each column group, using the URI colgroup:<table>:<colgroup name> and a columns key in the configuration. Every column must appear in at least one column group; columns can be listed in multiple column groups, causing the column to be stored in multiple files.

For example, consider the following data being stored in a WiredTiger table:

/* The C struct for the data we are storing in a WiredTiger table. */
typedef struct {
char country[5];
uint16_t year;
uint64_t population;
} POP_RECORD;
static POP_RECORD pop_data[] = {
{ "AU", 1900, 4000000 },
{ "AU", 1950, 8267337 },
{ "AU", 2000, 19053186 },
{ "CAN", 1900, 5500000 },
{ "CAN", 1950, 14011422 },
{ "CAN", 2000, 31099561 },
{ "UK", 1900, 369000000 },
{ "UK", 1950, 50127000 },
{ "UK", 2000, 59522468 },
{ "USA", 1900, 76212168 },
{ "USA", 1950, 150697361 },
{ "USA", 2000, 301279593 },
{ "", 0, 0 }
};

If we primarily wanted to access the population information by itself, but still wanted population information included when accessing other information, we might store all of the columns in one file, and store an additional copy of the population column in another file:

/*
* Create the population table.
* Keys are record numbers, the format for values is (5-byte string,
* uint16_t, uint64_t).
* See ::wiredtiger_struct_pack for details of the format strings.
*/
error_check(session->create(session, "table:poptable",
"key_format=r,"
"value_format=5sHQ,"
"columns=(id,country,year,population),"
"colgroups=(main,population)"));
/*
* Create two column groups: a primary column group with the country
* code, year and population (named "main"), and a population column
* group with the population by itself (named "population").
*/
error_check(session->create(session,
"colgroup:poptable:main", "columns=(country,year,population)"));
error_check(session->create(session,
"colgroup:poptable:population", "columns=(population)"));

Column groups always have the same key as the table. This is particularly useful for column stores, because record numbers are not stored explicitly on disk, so there is no repetition of keys across multiple files. Keys will be replicated in multiple files in the case of row-store column groups.

A cursor can be opened on a column group by passing the column group's URI to the WT_SESSION::open_cursor method. For example, the population can be retrieved from both of the column groups we created:

/*
* Open a cursor on the main column group, and return the information
* for a particular country.
*/
error_check(session->open_cursor(
session, "colgroup:poptable:main", NULL, NULL, &cursor));
cursor->set_key(cursor, 2);
error_check(cursor->search(cursor));
error_check(cursor->get_value(cursor, &country, &year, &population));
printf(
"ID 2: country %s, year %" PRIu16 ", population %" PRIu64 "\n",
country, year, population);
/*
* Open a cursor on the population column group, and return the
* population of a particular country.
*/
error_check(session->open_cursor(session,
"colgroup:poptable:population", NULL, NULL, &cursor));
cursor->set_key(cursor, 2);
error_check(cursor->search(cursor));
error_check(cursor->get_value(cursor, &population));
printf("ID 2: population %" PRIu64 "\n", population);

Key columns may not be included in the list of columns for a column group. Because column groups always have the same key as the table, key columns for column groups are retrieved using WT_CURSOR::get_key, not WT_CURSOR::get_value.

Indices

Columns are also used to create and configure indices on tables.

Table indices are automatically updated whenever the table is modified.

Table index cursors are read-only and cannot be used for update operations.

To create a table index, call WT_SESSION::create using the URI index:<table>:<index name>, listing a column in the configuration.

Continuing the example, we might open an index on the country column:

/* Create an index with a simple key. */
error_check(session->create(session,
"index:poptable:country", "columns=(country)"));

Cursors are opened on indices by passing the index's URI to the WT_SESSION::open_cursor method.

Index cursors use the specified index key columns for WT_CURSOR::get_key and WT_CURSOR::set_key. For example, we can retrieve information from the country index as follows:

/* Search in a simple index. */
error_check(session->open_cursor(session,
"index:poptable:country", NULL, NULL, &cursor));
cursor->set_key(cursor, "AU\0\0\0");
error_check(cursor->search(cursor));
error_check(cursor->get_value(cursor, &country, &year, &population));
printf("AU: country %s, year %" PRIu16 ", population %" PRIu64 "\n",
country, year, population);

To create an index with a composite key, specify more than one column to the WT_SESSION::create call:

/* Create an index with a composite key (country,year). */
error_check(session->create(session,
"index:poptable:country_plus_year", "columns=(country,year)"));

To retrieve information from a composite index requires a more complicated WT_CURSOR::set_key call, but is otherwise the same:

/* Search in a composite index. */
error_check(session->open_cursor(session,
"index:poptable:country_plus_year", NULL, NULL, &cursor));
cursor->set_key(cursor, "USA\0\0", (uint16_t)1900);
error_check(cursor->search(cursor));
error_check(cursor->get_value(cursor, &country, &year, &population));
printf(
"US 1900: country %s, year %" PRIu16 ", population %" PRIu64 "\n",
country, year, population);

Immutable indices

It is possible to create an index with the immutable configuration setting enabled. This setting tells WiredTiger that the index keys for a record do not change when records are updated. This is an optimization that it saves a remove and insert into the index whenever a value in the primary table is updated.

If immutable is configured when updates should alter the content of the index it is possible to corrupt data.

An example of using an immutable index is:

/* Create an immutable index. */
error_check(session->create(session,
"index:poptable:immutable_year", "columns=(year),immutable"));

Index cursor projections

By default, index cursors return all of the table's value columns from WT_CURSOR::get_value. The application can specify that a subset of the usual columns should be returned in calls to WT_CURSOR::get_value by appending a list of columns to the uri parameter of the WT_SESSION::open_cursor call. This is called a projection, see Projections for more details.

In the case of index cursors, a projection can be used to avoid lookups in column groups that do not hold columns relevant to the operation.

The following example will return just the table's primary key (a record number, in this case) from the index:

/*
* Use a projection to return just the table's record number key
* from an index.
*/
error_check(session->open_cursor(session,
"index:poptable:country_plus_year(id)", NULL, NULL, &cursor));
while ((ret = cursor->next(cursor)) == 0) {
error_check(cursor->get_key(cursor, &country, &year));
error_check(cursor->get_value(cursor, &recno));
printf("row ID %" PRIu64 ": country %s, year %" PRIu16 "\n",
recno, country, year);
}
scan_end_check(ret == WT_NOTFOUND);

Here is an example of a projection that returns a subset of columns from the index:

/*
* Use a projection to return just the population column from an
* index.
*/
error_check(session->open_cursor(session,
"index:poptable:country_plus_year(population)",
NULL, NULL, &cursor));
while ((ret = cursor->next(cursor)) == 0) {
error_check(cursor->get_key(cursor, &country, &year));
error_check(cursor->get_value(cursor, &population));
printf("population %" PRIu64 ": country %s, year %" PRIu16 "\n",
population, country, year);
}
scan_end_check(ret == WT_NOTFOUND);

For performance reasons, it may be desirable to include all columns for a performance-critical operation in an index, so that it is possible to perform index-only lookups where no column group from the table is accessed. In this case, all of the "hot" columns should be included in the index (always list the "real" index key columns first, so they will determine the sort order). Then, open a cursor on the index that doesn't return any value columns, and no column group will be accessed.

/*
* Use a projection to avoid accessing any other column groups when
* using an index: supply an empty list of value columns.
*/
error_check(session->open_cursor(session,
"index:poptable:country_plus_year()", NULL, NULL, &cursor));
while ((ret = cursor->next(cursor)) == 0) {
error_check(cursor->get_key(cursor, &country, &year));
printf("country %s, year %" PRIu16 "\n", country, year);
}
scan_end_check(ret == WT_NOTFOUND);

Index cursors for column-store objects may not be created using the record number as the index key (there is no use for a secondary index on a column-store where the index key is the record number).

Code samples

The code included above was taken from the complete example program ex_schema.c.

Here is another example program, ex_call_center.c.

/*
* In SQL, the tables are described as follows:
*
* CREATE TABLE Customers(id INTEGER PRIMARY KEY,
* name VARCHAR(30), address VARCHAR(50), phone VARCHAR(15))
* CREATE INDEX CustomersPhone ON Customers(phone)
*
* CREATE TABLE Calls(id INTEGER PRIMARY KEY, call_date DATE,
* cust_id INTEGER, emp_id INTEGER, call_type VARCHAR(12),
* notes VARCHAR(25))
* CREATE INDEX CallsCustDate ON Calls(cust_id, call_date)
*
* In this example, both tables will use record numbers for their IDs, which
* will be the key. The C structs for the records are as follows.
*/
/* Customer records. */
typedef struct {
uint64_t id;
const char *name;
const char *address;
const char *phone;
} CUSTOMER;
/* Call records. */
typedef struct {
uint64_t id;
uint64_t call_date;
uint64_t cust_id;
uint64_t emp_id;
const char *call_type;
const char *notes;
} CALL;
error_check(conn->open_session(conn, NULL, NULL, &session));
/*
* Create the customers table, give names and types to the columns.
* The columns will be stored in two groups: "main" and "address",
* created below.
*/
error_check(session->create(session, "table:customers",
"key_format=r,"
"value_format=SSS,"
"columns=(id,name,address,phone),"
"colgroups=(main,address)"));
/* Create the main column group with value columns except address. */
error_check(session->create(session,
"colgroup:customers:main", "columns=(name,phone)"));
/* Create the address column group with just the address. */
error_check(session->create(session,
"colgroup:customers:address", "columns=(address)"));
/* Create an index on the customer table by phone number. */
error_check(session->create(session,
"index:customers:phone", "columns=(phone)"));
/* Populate the customers table with some data. */
error_check(session->open_cursor(
session, "table:customers", NULL, "append", &cursor));
for (custp = cust_sample; custp->name != NULL; custp++) {
cursor->set_value(cursor,
custp->name, custp->address, custp->phone);
error_check(cursor->insert(cursor));
}
error_check(cursor->close(cursor));
/*
* Create the calls table, give names and types to the columns. All the
* columns will be stored together, so no column groups are declared.
*/
error_check(session->create(session, "table:calls",
"key_format=r,"
"value_format=qrrSS,"
"columns=(id,call_date,cust_id,emp_id,call_type,notes)"));
/*
* Create an index on the calls table with a composite key of cust_id
* and call_date.
*/
error_check(session->create(session,
"index:calls:cust_date", "columns=(cust_id,call_date)"));
/* Populate the calls table with some data. */
error_check(session->open_cursor(
session, "table:calls", NULL, "append", &cursor));
for (callp = call_sample; callp->call_type != NULL; callp++) {
cursor->set_value(cursor, callp->call_date, callp->cust_id,
callp->emp_id, callp->call_type, callp->notes);
error_check(cursor->insert(cursor));
}
error_check(cursor->close(cursor));
/*
* First query: a call arrives. In SQL:
*
* SELECT id, name FROM Customers WHERE phone=?
*
* Use the cust_phone index, lookup by phone number to fill the
* customer record. The cursor will have a key format of "S" for a
* string because the cust_phone index has a single column ("phone"),
* which is of type "S".
*
* Specify the columns we want: the customer ID and the name. This
* means the cursor's value format will be "rS".
*/
error_check(session->open_cursor(session,
"index:customers:phone(id,name)", NULL, NULL, &cursor));
cursor->set_key(cursor, "123-456-7890");
error_check(cursor->search(cursor));
error_check(cursor->get_value(cursor, &cust.id, &cust.name));
printf("Read customer record for %s (ID %" PRIu64 ")\n",
cust.name, cust.id);
error_check(cursor->close(cursor));
/*
* Next query: get the recent order history. In SQL:
*
* SELECT * FROM Calls WHERE cust_id=? ORDER BY call_date DESC LIMIT 3
*
* Use the call_cust_date index to find the matching calls. Since it is
* is in increasing order by date for a given customer, we want to start
* with the last record for the customer and work backwards.
*
* Specify a subset of columns to be returned. (Note that if these were
* all covered by the index, the primary would not have to be accessed.)
* Stop after getting 3 records.
*/
error_check(session->open_cursor(session,
"index:calls:cust_date(cust_id,call_type,notes)",
NULL, NULL, &cursor));
/*
* The keys in the index are (cust_id,call_date) -- we want the largest
* call date for a given cust_id. Search for (cust_id+1,0), then work
* backwards.
*/
cust.id = 1;
cursor->set_key(cursor, cust.id + 1, 0);
error_check(cursor->search_near(cursor, &exact));
/*
* If the table is empty, search_near will return WT_NOTFOUND, else the
* cursor will be positioned on a matching key if one exists, or an
* adjacent key if one does not. If the positioned key is equal to or
* larger than the search key, go back one.
*/
if (exact >= 0)
error_check(cursor->prev(cursor));
for (count = 0; count < 3; ++count) {
error_check(cursor->get_value(cursor,
&call.cust_id, &call.call_type, &call.notes));
if (call.cust_id != cust.id)
break;
printf("Call record: customer %" PRIu64 " (%s: %s)\n",
call.cust_id, call.call_type, call.notes);
error_check(cursor->prev(cursor));
}