Version 11.3.0
Backups

WiredTiger cursors provide access to data from a variety of sources. One of these sources is the list of files required to perform a backup of the database. The list may be the files required by all of the objects in the database, or a subset of the objects in the database.

WiredTiger backups are "on-line" or "hot" backups, and applications may continue to read and write the databases while a snapshot is taken.

Backup from an application

  1. Open a cursor on the "backup:" data source, which begins the process of a backup.
  2. Copy each file returned by the WT_CURSOR::next method to the backup location, for example, a different directory. Do not reuse a backup location unless all files have first been removed from it; in other words, remove any previous backup information before reusing a backup location.
  3. Close the cursor; the cursor must not be closed until all of the files have been copied.

The directory into which the files are copied may subsequently be specified as a directory to the wiredtiger_open function and accessed as a WiredTiger database home.
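
For example, a minimal sketch of opening the backup copy as a database; the path and the conn variable below are placeholders, not part of the backup API.

WT_CONNECTION *conn;
/* Open the backup copy as an ordinary WiredTiger database home. */
error_check(wiredtiger_open("/path/database.backup", NULL, NULL, &conn));
error_check(conn->close(conn, NULL));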

Copying the database files for a backup does not require any special alignment or block size (specifically, Linux or Windows filesystems that do not support read/write isolation can be safely read for backups).

The database file may grow in size during the copy, and the file copy should not consider that an error. Blocks appended to the file after the copy starts can be safely ignored, that is, it is correct for the copy to determine an initial size of the file and then copy that many bytes, ignoring any bytes appended after the backup cursor was opened.
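
For example, the following illustrative helper (not part of the WiredTiger API) copies only the bytes that existed when the copy started, ignoring anything appended afterward; it assumes the usual POSIX headers and the error_check macro used in the examples below.

/*
 * Copy at most the number of bytes the source file contained when the copy
 * began; bytes appended after that point are ignored.
 */
static void
copy_initial_size(const char *src, const char *dst)
{
    struct stat st;
    char buf[64 * 1024];
    off_t remaining;
    size_t chunk;
    ssize_t nr;
    int rfd, wfd;

    error_check(stat(src, &st)); /* Size at the start of the copy. */
    remaining = st.st_size;

    rfd = open(src, O_RDONLY);
    wfd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    while (remaining > 0) {
        chunk = remaining < (off_t)sizeof(buf) ? (size_t)remaining : sizeof(buf);
        if ((nr = read(rfd, buf, chunk)) <= 0)
            break;
        (void)write(wfd, buf, (size_t)nr); /* Partial-write handling omitted. */
        remaining -= nr;
    }

    (void)close(rfd);
    (void)close(wfd);
}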

The cursor must not be closed until all of the files have been copied; however, there is no requirement that the files be copied in any particular order or in any relationship to the WT_CURSOR::next calls, only that all files be copied before the cursor is closed. For example, applications might aggregate the file names from the cursor and then list the file names as arguments to a file archiver such as the system tar utility.
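
For example, the following sketch aggregates the names into a list file and hands it to the system tar utility before the cursor is closed; the paths and tar options are illustrative, and the cursor, filename, and ret variables are declared as in the programmatic example below.

FILE *fp;
/* Write the list of files returned by the backup cursor to a list file. */
fp = fopen("/path/database.backup.list", "w"); /* NULL check omitted for brevity. */
while ((ret = cursor->next(cursor)) == 0) {
    error_check(cursor->get_key(cursor, &filename));
    fprintf(fp, "%s\n", filename);
}
scan_end_check(ret == WT_NOTFOUND);
(void)fclose(fp);
/* Archive the files; the backup cursor stays open until tar has finished. */
testutil_system("cd /path/database && tar cf /path/database.backup.tar -T /path/database.backup.list");
error_check(cursor->close(cursor));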

During the period the backup cursor is open, database checkpoints can be created, but checkpoints created prior to the backup cursor cannot be deleted. Additionally, while the backup cursor is open, automatic log file removal, even if enabled, will not reclaim any log files.

Additionally, if a crash occurs while the backup cursor is open and logging is disabled (in other words, when the application relies on checkpoints for durability), the system will be restored to the most recent checkpoint taken prior to opening the backup cursor, even if later database checkpoints were completed. Note this exception to WiredTiger's checkpoint durability guarantees.

The following is a programmatic example of creating a backup:

WT_CURSOR *cursor;
const char *filename;
int ret;

/* Create the backup directory. */
error_check(mkdir("/path/database.backup", 0755));
/* Open the backup data source. */
error_check(session->open_cursor(session, "backup:", NULL, NULL, &cursor));
/* Copy the list of files. */
while ((ret = cursor->next(cursor)) == 0) {
    error_check(cursor->get_key(cursor, &filename));
    testutil_system("cp /path/database/%s /path/database.backup/%s", filename, filename);
}
scan_end_check(ret == WT_NOTFOUND);
error_check(cursor->close(cursor));

When logging is enabled, opening the backup cursor forces a log file switch. This ensures that, when the log files are included in the list of files, the backup contains only data that was committed and visible at the time the backup cursor was opened. WiredTiger also offers a mechanism to gather additional log files that may be created during the backup.

Duplicate backup cursors

Since backups can take a long time, it may be desirable to copy the log files again at the end of a backup so that operations that occurred while the backup was in progress can be recovered. WiredTiger provides the ability to open a duplicate backup cursor with the configuration target=("log:"). This secondary backup cursor returns the names of all log files via dup_cursor->get_key(); these names will overlap with the log file names returned by the original cursor. The application only needs to copy the file names that are new, but it is not an error to copy all of the log file names returned. The duplicate cursor must be closed explicitly prior to closing the parent backup cursor.

/* Open the backup data source. */
error_check(session->open_cursor(session, "backup:", NULL, NULL, &cursor));
/* Open a duplicate cursor for additional log files. */
error_check(session->open_cursor(session, NULL, cursor, "target=(\"log:\")", &dup_cursor));
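
Continuing that snippet, a minimal sketch of copying the additional log files and closing the cursors in the required order; the paths and the cp command are illustrative, and the filename and ret variables are declared as in the earlier full-backup example.

/* Copy the additional log files returned by the duplicate cursor. */
while ((ret = dup_cursor->next(dup_cursor)) == 0) {
    error_check(dup_cursor->get_key(dup_cursor, &filename));
    testutil_system("cp /path/database/%s /path/database.backup/%s", filename, filename);
}
scan_end_check(ret == WT_NOTFOUND);
/* The duplicate cursor must be closed before the parent backup cursor. */
error_check(dup_cursor->close(dup_cursor));
error_check(cursor->close(cursor));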

Backup from the command line

The wt backup command may also be used to create backups:

rm -rf /path/database.backup &&
mkdir /path/database.backup &&
wt -h /path/database.source backup /path/database.backup

Block-based Incremental backup

Once a full backup has been done, it can be rolled forward incrementally by copying only modified blocks and new files to the backup copy directory. The application is responsible for removing files that are no longer part of the backup when later incremental backups no longer return their names. This is especially important for WiredTiger log files that are no longer needed and must be removed before recovery is run.

Block-based incremental backup can be performed after a bulk load, without an intervening full backup.

The following is the procedure for incrementally backing up a database using block modifications:

  1. Perform a full backup of the database (as described above), with the additional configuration incremental=(enabled=true,this_id="ID1"). The identifier specified in this_id starts block tracking and that identifier can be used in the future as the source of an incremental backup. Identifiers can be any text string, but should be unique.
  2. Begin the incremental backup by opening a backup cursor with the backup: URI and config string of incremental=(src_id="ID1",this_id="ID2"). Call this backup_cursor. Like a normal full backup cursor, this cursor will return the filename as the key. There is no associated value. The information returned will be based on blocks tracked since the time of the previous backup designated with "ID1". New block tracking will be started as "ID2" as well. WiredTiger will maintain modifications from two IDs, the current and the most recently completed one. Note that all backup identifiers are subject to the same naming restrictions as other configuration naming. See Introduction for details.
  3. For each file returned by backup_cursor->next(), open a duplicate backup cursor to do the incremental backup on that file. The list returned will also include log files (prefixed by WiredTigerLog) that need to be copied. Configure that duplicate cursor with "incremental=(file=name)". The name comes from the string returned from backup_cursor->get_key(). Call this incr_cursor.
  4. The key format for the duplicate backup cursor, incr_cursor, is qqq, representing a file offset and size pair plus a type indicator for the range given. There is no associated value. The type indicator will be one of WT_BACKUP_FILE or WT_BACKUP_RANGE. For WT_BACKUP_RANGE, read the block from the source database file indicated by the file offset and size pair and write the block to the same offset in the backup database file, replacing the portion of the file represented by the offset/size pair.
    Warning
    It is not an error for an offset/size pair to extend past the current end of the source file, and any missing file data should be ignored. This is because there can be a long time between incremental backups and the file can expand and shrink a lot in the meantime.
    For WT_BACKUP_FILE, the user can choose to copy the entire file in any way they choose, or to use the offset/size pair which will indicate the expected size WiredTiger knew at the time of the call.
  5. Close the duplicate backup cursor, incr_cursor.
  6. Repeat steps 3-5 as many times as necessary while backup_cursor->next() returns files to copy.
  7. Close the backup cursor, backup_cursor.
  8. Repeat steps 2-7 as often as desired.

Full and incremental backups may be repeated as long as the backup database directory has not been opened and recovery run. Once recovery has run in a backup directory, you can no longer back up to that database directory.

Block-based incremental backup does not work with LSM trees. An error will be returned in that case.

An example of opening the backup data source for block-based incremental backup:

/* Open the backup data source for block-based incremental backup. */
error_check(session->open_cursor(
  session, "backup:", NULL, "incremental=(enabled,src_id=ID0,this_id=ID1)", &cursor));

The URI backup:query_id can be used to return existing block incremental identifier strings. It operates like a backup cursor but will return the identifier strings as the keys of the cursor. There are no values. As with all backup cursors, there can only be one backup cursor of any type open at a time.

An example of opening the backup data source to query incremental identifiers:

error_check(session->open_cursor(session, "backup:query_id", NULL, NULL, &backup_cur));
while ((ret = backup_cur->next(backup_cur)) == 0) {
    error_check(backup_cur->get_key(backup_cur, &idstr));
    printf("Existing incremental ID string: %s\n", idstr);
}
error_check(backup_cur->close(backup_cur));

Log-based Incremental backup

Once a backup has been done, it can be rolled forward incrementally by adding log files to the backup copy. Adding log files to the copy decreases potential data loss from switching to the copy, but increases the recovery time necessary to switch to the copy. To reset the recovery time necessary to switch to the copy, perform a full backup of the database. For example, an application might do a full backup of the database once a week during a quiet period, and then incrementally copy new log files into the backup directory for the rest of the week. Incremental backups may also save time when the tables are very large.

Bulk loads are not commit-level durable, that is, the creation and bulk-load of an object will not appear in the database log files. For this reason, applications doing incremental backups after a full backup should repeat the full backup step after doing a bulk load to make the bulk load appear in the backup. In addition, incremental backups after a bulk load (without an intervening full backup) can cause recovery to report errors because there are log records that apply to data files which do not appear in the backup.

By default, WiredTiger automatically removes log files no longer required for recovery. Applications wanting to use log files for incremental backup must first disable automatic log file removal using the "log=(remove=false)" configuration to wiredtiger_open.
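
For example, a minimal sketch of opening the database with log removal disabled; the home path and conn variable are placeholders.

WT_CONNECTION *conn;
/* Keep log files available for log-based incremental backup. */
error_check(wiredtiger_open(
  "/path/database", NULL, "create,log=(enabled=true,remove=false)", &conn));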

The following is the procedure for incrementally backing up a database and removing log files from the original database home:

  1. Perform a full backup of the database (as described above).
  2. Open a cursor on the "backup:" data source, configured with the "target=(\"log:\")" target specified, which begins the process of an incremental backup.
  3. Copy each log file returned by the WT_CURSOR::next method to the backup directory. It is not an error to copy a log file which has been copied before, but care should be taken to ensure each log file is completely copied as the most recent log file may grow in size while being copied.
  4. If all log files have been successfully copied, remove the log files by calling the WT_SESSION::truncate method with the URI log: and specifying the backup cursor as the start cursor to that method. (Note there is no requirement backups be coordinated with database checkpoints, however, an incremental backup will repeatedly copy the same files, and will not make additional log files available for removal, unless there was a checkpoint after the previous incremental backup.)
  5. Close the backup cursor.

Steps 2-5 can be repeated any number of times before step 1 is repeated. Full and incremental backups may be repeated as long as the backup database directory has not been opened and recovery run. Once recovery has run in a backup directory, you can no longer back up to that database directory.

An example of opening the backup data source for log-based incremental backup:

/* Open the backup data source for log-based incremental backup. */
error_check(session->open_cursor(session, "backup:", NULL, "target=(\"log:\")", &cursor));
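
Continuing from that cursor, the following sketch illustrates steps 3-5 above; the paths, the cp command, and the filename and ret variables (declared as in the earlier full-backup example) are illustrative.

/* Copy each log file; re-copying a previously copied log file is not an error. */
while ((ret = cursor->next(cursor)) == 0) {
    error_check(cursor->get_key(cursor, &filename));
    testutil_system("cp /path/database/%s /path/database.backup/%s", filename, filename);
}
scan_end_check(ret == WT_NOTFOUND);
/* All log files were copied successfully: remove them from the database home. */
error_check(session->truncate(session, "log:", cursor, NULL, NULL));
error_check(cursor->close(cursor));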

Export tables using backup cursor

The URI backup:export can be used to generate WiredTiger.export - a text file that contains metadata for all objects in the database. The file can be specified as the value for metadata_file to import a table; see WT_SESSION::create for more details. The cursor operates like a normal backup cursor: it can be used to iterate over all files needed for a backup. The main difference is that wiredtiger_open ignores WiredTiger.export; it will neither try to start the system using the file nor delete it. As with all backup cursors, there can only be one backup cursor of any type open at a time.
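
For example, a minimal sketch of generating WiredTiger.export; the backup_cur variable is a placeholder.

WT_CURSOR *backup_cur;
/* Opening the backup:export data source writes WiredTiger.export to the database home. */
error_check(session->open_cursor(session, "backup:export", NULL, NULL, &backup_cur));
/* ... iterate and copy files as with a normal backup cursor ... */
error_check(backup_cur->close(backup_cur));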

Backup and O_DIRECT

Many Linux systems do not support mixing O_DIRECT and memory mapping or normal I/O to the same file. If O_DIRECT is configured for data or log files on Linux systems (using the wiredtiger_open direct_io configuration), any program used to copy files during backup should also specify O_DIRECT when configuring its file access. Likewise, when O_DIRECT is not configured by the database application, programs copying files should not configure O_DIRECT.
