Version 1.3.0
Schemas

While many tables have simple key/value pairs for records, WiredTiger also supports more complex data patterns.

Tables, rows and columns

A table is a logical representation of data consisting of cells in rows and columns. For example, a database might have this table.

EmpIdLastnameFirstnameSalary
1SmithJoe40000
2JonesMary50000
3JohnsonCathy44000

This simple table includes an employee identifier, last name and first name, and a salary.

A row-oriented database would store all of the values in a row together, then the values in the next row, and so on:

      1,Smith,Joe,40000;
      2,Jones,Mary,50000;
      3,Johnson,Cathy,44000;

A column-oriented database stores all of the values of a column together, then the values of the next column, and so on:

      1,2,3;
      Smith,Jones,Johnson;
      Joe,Mary,Cathy;
      40000,50000,44000;

WiredTiger supports both storage formats, and can mix and match the storage of columns within a logical table.

Applications describe the format of their data by supplying a schema to WT_SESSION::create. This specifies how the application's data can be split into fields and mapped onto rows and columns.

Column types

By default, WiredTiger works as a traditional key/value store, where the keys and values are raw byte arrays accessed using a WT_ITEM structure. Keys and values may be up to (4GB - 512B) bytes in size, but depending on how internal_item_max or leaf_item_max are configured, large key and value items will be stored on overflow pages.

See Key/Value pairs for more details on raw key / value items.

The schema layer allows key and value types to be chosen from a list, or composite keys or values made up of columns with any combination of types. The size (4GB - 512B) byte limit on keys and values still applies.

WiredTiger's uses format strings similar to those specified in the Python struct module to describe the types of columns in a table: http://docs.python.org/library/struct

Format types

FormatC TypeJava typePython typeStandard size
xpad byteN/AN/A1
bsigned charbyteinteger1
Bunsigned charbyteinteger1
hshortshortinteger2
Hunsigned shortshortinteger2
iintintinteger4
Iunsigned intintinteger4
llongintinteger4
Lunsigned longintinteger4
qlong longlonginteger8
Qunsigned long longlonginteger8
ruint64_tlonginteger8
schar[]Stringstringfixed length
Schar[]Stringstringvariable
tunsigned charbyteintegerfixed bit length
uWT_ITEM *byte[]stringvariable

The 'r' type is used for record number keys in column stores.

The 'S' type is encoded as a C language string terminated by a NUL character.

The 't' type is used for fixed-length bit field values. If it is preceded by a size, that indicates the number of bits to store, between 1 and 8. That number of low-order bits will be stored in the table. The default is a size of 1 bit: that is, a boolean. The application must always use an unsigned char type (or equivalently, uint8_t) for calls to WT_CURSOR::set_value, and a pointer to the same for calls to WT_CURSOR::get_value. If a bit field value is combined with other types in a packing format, it is equivalent to 'B', and a full byte is used to store it.

When referenced by a record number (that is, a key format of 'r'), the 't' type will be stored in a fixed-length column-store, and will not have a out-of-band value to indicate the record does not exist. In this case, a 0 byte value is used to indicate the record does not exist. This means removing a record with WT_CURSOR::remove is equivalent to storing a value of 0 in the record with WT_CURSOR::update (and storing a value of 0 in the record will cause cursor scans to skip the record). Additionally, creating a record past the end of an object implies the creation of any missing intermediate records, with byte values of 0.

The 'u' type is for raw byte arrays: if it appears at the end of a format string (including in the default "u" format for untyped tables), the size is not stored explicitly. When 'u' appears within a format string, the size is stored as a 32-bit integer in the same byte order as the rest of the format string, followed by the data.

There is a default collator that gives lexicographic (byte-wise) comparisons, and the default encoding is designed so that lexicographic ordering of encoded keys is usually the expected ordering. For example, the variable-length encoding of integers is designed so that they have the natural integer ordering under the default collator.

See Packing and Unpacking Data for details of WiredTiger's packing format.

WiredTiger can also be extended with WT_COLLATOR.

Columns in key and values

Every table has a key format and a value format as describe in Column types. These types are configured when the table is created by passing key_format and value_format keys to WT_SESSION::create.

Cursors for a table have the same key format as the table itself. The key columns of a cursor are set with WT_CURSOR::set_key and accessed with WT_CURSOR::get_key. WT_CURSOR::set_key is analogous to printf, and takes a list of values in the order the key columns are configured in key_format. The columns values are accessed with WT_CURSOR::get_key, which is analogous to scanf, and takes a list of pointers to values in the same order.

Cursors for a table have the same value format as the table, unless a projection is specified to WT_SESSION::open_cursor. WT_CURSOR::set_value is used to set value columns, and WT_CURSOR::get_value is used to get value columns.

Describing columns

The columns in a table can be assigned names by passing a columns key to WT_SESSION::create. The column names are assigned first to the columns in the key_format, and then to the columns in value_format. There must be a name for every column, and no column names may be repeated.

Storing groups of columns together

Once column names are assigned, they can be used to configure column groups, where groups of columns are stored in separate files.

There are two steps involved in setting up column groups: first, pass a list of names for the column groups in the colgroups configuration key to WT_SESSION::create. Then make a call to WT_SESSION::create for each column group, using the URI colgroup:<table>:<colgroup name> and a columns key in the configuration. Columns can be stored in multiple column groups, but all value columns must appear in at least one column group.

Column groups have the same key as the table. This is particularly useful for column stores, because record numbers are not stored explicitly on disk, so there is no repetition of keys across multiple files. Also note that key columns cannot be stored in column group values: they can be retrieved with WT_CURSOR::get_key.

Adding an index

Schema columns can also be used to configure indices on tables. To create an index, call WT_SESSION::create using the URI index:<table>:<index name> and include a columns key in the configuration. WiredTiger updates all indices for a table whenever the table is modified.

A cursor can be opened on an index by passing the index URI to WT_SESSION::open_cursor. Index cursors use the specified index key columns for WT_CURSOR::get_key and WT_CURSOR::set_key, and by default return all of the table value columns in WT_CURSOR::get_value.

Index cursors are read-only: they cannot be used to perform updates.

Index cursors for column-store objects may not be created using the record number as the index key.

Code samples

The code below is taken from the complete example program ex_schema.c.

/* The C struct for the data we are storing with WiredTiger. */
typedef struct {
char country[5];
uint16_t year;
uint64_t population;
} POP_RECORD;
ret = conn->open_session(conn, NULL, NULL, &session);
/*
* Create the population table.
* Keys are record numbers, the format for values is
* (5-byte string, short, long).
* See ::wiredtiger_struct_pack for details of the format strings.
*
* If this program is run multiple times so the table already exists,
* this call will verify that the table exists. It is not required in
* that case, but is a safety check that the schema matches what the
* program expects.
*/
ret = session->create(session, "table:population",
"key_format=r,"
"value_format=5sHQ,"
"columns=(id,country,year,population),"
"colgroups=(main,population)");
/* Create the column groups to store population in its own file. */
ret = session->create(session, "colgroup:population:main",
"columns=(country,year)");
ret = session->create(session, "colgroup:population:population",
"columns=(population)");
/* Create an index with composite key (country,year). */
ret = session->create(session, "index:population:country_year",
"columns=(country,year)");
ret = session->open_cursor(session, "table:population",
NULL, "append", &cursor);
endp = pop_data + (sizeof (pop_data) / sizeof(pop_data[0]));
for (p = pop_data; p < endp; p++) {
cursor->set_value(cursor, p->country, p->year, p->population);
ret = cursor->insert(cursor);
}
ret = cursor->close(cursor);
/* Now just read through the countries we know about */
ret = session->open_cursor(session,
"index:population:country_year(id)",
NULL, NULL, &cursor);
while ((ret = cursor->next(cursor)) == 0) {
cursor->get_key(cursor, &country, &year);
cursor->get_value(cursor, &recno);
printf("Got country %s : row ID %d\n", country, (int)recno);
}
ret = conn->close(conn, NULL);

The code below is taken from the complete example program ex_call_center.c.

/*
* In SQL, the tables are described as follows:
*
* CREATE TABLE Customers(id INTEGER PRIMARY KEY,
* name VARCHAR(30), address VARCHAR(50), phone VARCHAR(15))
* CREATE INDEX CustomersPhone ON Customers(phone)
*
* CREATE TABLE Calls(id INTEGER PRIMARY KEY, call_date DATE,
* cust_id INTEGER, emp_id INTEGER, call_type VARCHAR(12),
* notes VARCHAR(25))
* CREATE INDEX CallsCustDate ON Calls(cust_id, call_date)
*
* In this example, both tables will use record numbers for their IDs, which
* will be the key. The C structs for the records are as follows.
*/
/* Customer records. */
typedef struct {
uint64_t id;
const char *name;
const char *address;
const char *phone;
} CUSTOMER;
/* Call records. */
typedef struct {
uint64_t id;
uint64_t call_date;
uint64_t cust_id;
uint64_t emp_id;
const char *call_type;
const char *notes;
} CALL;
ret = conn->open_session(conn, NULL, NULL, &session);
/*
* Create the customers table, give names and types to the columns.
* The columns will be stored in two groups: "main" and "address",
* created below.
*/
ret = session->create(session, "table:customers",
"key_format=r,"
"value_format=SSS,"
"columns=(id,name,address,phone),"
"colgroups=(main,address)");
/* Create the main column group with value columns except address. */
ret = session->create(session,
"colgroup:customers:main", "columns=(name,phone)");
/* Create the address column group with just the address. */
ret = session->create(session,
"colgroup:customers:address", "columns=(address)");
/* Create an index on the customer table by phone number. */
ret = session->create(session,
"index:customers:phone", "columns=(phone)");
/* Populate the customers table with some data. */
ret = session->open_cursor(
session, "table:customers", NULL, "append", &cursor);
for (custp = cust_sample; custp->name != NULL; custp++) {
cursor->set_value(cursor,
custp->name, custp->address, custp->phone);
ret = cursor->insert(cursor);
}
ret = cursor->close(cursor);
/*
* Create the calls table, give names and types to the columns. All the
* columns will be stored together, so no column groups are declared.
*/
ret = session->create(session, "table:calls",
"key_format=r,"
"value_format=qrrSS,"
"columns=(id,call_date,cust_id,emp_id,call_type,notes)");
/*
* Create an index on the calls table with a composite key of cust_id
* and call_date.
*/
ret = session->create(session, "index:calls:cust_date",
"columns=(cust_id,call_date)");
/* Populate the calls table with some data. */
ret = session->open_cursor(
session, "table:calls", NULL, "append", &cursor);
for (callp = call_sample; callp->call_type != NULL; callp++) {
cursor->set_value(cursor, callp->call_date, callp->cust_id,
callp->emp_id, callp->call_type, callp->notes);
ret = cursor->insert(cursor);
}
ret = cursor->close(cursor);
/*
* First query: a call arrives. In SQL:
*
* SELECT id, name FROM Customers WHERE phone=?
*
* Use the cust_phone index, lookup by phone number to fill the
* customer record. The cursor will have a key format of "S" for a
* string because the cust_phone index has a single column ("phone"),
* which is of type "S".
*
* Specify the columns we want: the customer ID and the name. This
* means the cursor's value format will be "rS".
*/
ret = session->open_cursor(session,
"index:customers:phone(id,name)", NULL, NULL, &cursor);
cursor->set_key(cursor, "123-456-7890");
ret = cursor->search(cursor);
if (ret == 0) {
ret = cursor->get_value(cursor, &cust.id, &cust.name);
printf("Read customer record for %s (ID %" PRIu64 ")\n",
cust.name, cust.id);
}
ret = cursor->close(cursor);
/*
* Next query: get the recent order history. In SQL:
*
* SELECT * FROM Calls WHERE cust_id=? ORDER BY call_date DESC LIMIT 3
*
* Use the call_cust_date index to find the matching calls. Since it is
* is in increasing order by date for a given customer, we want to start
* with the last record for the customer and work backwards.
*
* Specify a subset of columns to be returned. (Note that if these were
* all covered by the index, the primary would not have to be accessed.)
* Stop after getting 3 records.
*/
ret = session->open_cursor(session,
"index:calls:cust_date(cust_id,call_type,notes)",
NULL, NULL, &cursor);
/*
* The keys in the index are (cust_id,call_date) -- we want the largest
* call date for a given cust_id. Search for (cust_id+1,0), then work
* backwards.
*/
cust.id = 1;
cursor->set_key(cursor, cust.id + 1, 0);
ret = cursor->search_near(cursor, &exact);
/*
* If the table is empty, search_near will return WT_NOTFOUND, else the
* cursor will be positioned on a matching key if one exists, or an
* adjacent key if one does not. If the positioned key is equal to or
* larger than the search key, go back one.
*/
if (ret == 0 && exact >= 0)
ret = cursor->prev(cursor);
for (count = 0; ret == 0 && count < 3; ++count) {
ret = cursor->get_value(cursor,
&call.cust_id, &call.call_type, &call.notes);
if (call.cust_id != cust.id)
break;
printf("Call record: customer %" PRIu64 " (%s: %s)\n",
call.cust_id, call.call_type, call.notes);
ret = cursor->prev(cursor);
}