Caution: the Architecture Guide is not updated in lockstep with the code base and is not necessarily correct or complete for any specific release.
The WiredTiger extension API allows control over how files are stored via the storage source abstraction layer. More information about customizing storage sources can be found in Custom Tiered Storage sources. Three storage sources extend WiredTiger's capability to store data files in cloud storage through the Azure, Google and AWS providers. The cloud storage extensions provide a means to construct, configure and control a custom file system linked to a cloud bucket. All three storage source implementations interact with the cloud storage internally through their provider's C++ SDK.
All storage sources implement the key functionality defined by the storage source interface, including the creation of a custom WT_FILE_SYSTEM. The file system has a one-to-one connection to a bucket and requires an access token to perform operations on the bucket.
All extensions have a connection class which represents an active connection to a bucket. It encapsulates and implements all the operations the file system requires. Instantiating the class authenticates with the cloud provider and validates the existence of, and access to, the configured bucket. The connection class exposes APIs to list the bucket contents filtered by a directory and a prefix, check for an object's existence in the bucket, put an object in the cloud, and get an object from the cloud. Though not required for the file system's implementation, the class also provides the means to delete objects, which is used to clean up artifacts from internal unit testing.
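The connection class operations described above can be sketched roughly as follows. This is a minimal illustration, not the extensions' actual code: `FakeBucket`, `BucketConnection` and all method names are hypothetical, and an in-memory map stands in for the provider SDK calls and the authentication step.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a cloud bucket: object name -> object contents.
using FakeBucket = std::map<std::string, std::string>;

// Hypothetical connection class. A real one would call the provider's C++ SDK.
class BucketConnection {
public:
    // Construction validates access to the bucket. In this sketch, "access"
    // only means that a bucket was supplied at all.
    explicit BucketConnection(FakeBucket *bucket) : bucket_(bucket) {
        if (bucket_ == nullptr)
            throw std::invalid_argument("bucket does not exist or is not accessible");
    }

    // List the names of objects that start with directory + prefix.
    std::vector<std::string> list_objects(
        const std::string &directory, const std::string &prefix) const {
        const std::string match = directory + prefix;
        std::vector<std::string> names;
        for (const auto &entry : *bucket_)
            if (entry.first.compare(0, match.size(), match) == 0)
                names.push_back(entry.first);
        return names;
    }

    // Check whether an object exists in the bucket.
    bool object_exists(const std::string &name) const {
        return bucket_->count(name) != 0;
    }

    // Put an object in the bucket.
    void put_object(const std::string &name, const std::string &contents) {
        (*bucket_)[name] = contents;
    }

    // Get an object from the bucket. Returns false if the object is missing.
    bool get_object(const std::string &name, std::string &contents) const {
        auto it = bucket_->find(name);
        if (it == bucket_->end())
            return false;
        contents = it->second;
        return true;
    }

    // Not needed by the file system; only used to clean up after tests.
    void delete_object(const std::string &name) { bucket_->erase(name); }

private:
    FakeBucket *bucket_;
};
```

Keeping every bucket operation behind one class means the file system code never touches the SDK directly, which is the encapsulation the paragraph above describes.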
WiredTiger supports a database over a custom file system by exposing a WT_FILE_SYSTEM interface for the end-user to implement. The store itself provides a means to configure as many file systems as needed, each linked to a bucket. These file systems (GCP_FILE_SYSTEM, AZURE_FILE_SYSTEM and S3_FILE_SYSTEM) are custom implementations provided by the storage source. Each file system contains an instance of the connection class, so an active connection to the bucket is embedded in the file system instance.
The cloud file system uses the functionality provided by the connection class to simulate a write-once-read-many file system. The file system can list directory contents, check for a file's presence, and access a file by creating a file handle.
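As a rough illustration of write-once-read-many semantics, the sketch below lets an object be created exactly once and read any number of times. The class name, the method names, and the choice of `EEXIST` and `ENOENT` as error codes are assumptions made for this example, not the behavior of the real file systems.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>

// Hypothetical write-once-read-many store backed by an in-memory map.
class WormFileSystem {
public:
    // Create an object. Returns 0 on success, or EEXIST if an object with
    // this name was already written (existing objects are never modified).
    int put(const std::string &name, const std::string &contents) {
        auto result = objects_.emplace(name, contents);
        return result.second ? 0 : EEXIST;
    }

    // Check for a file's presence.
    bool exists(const std::string &name) const {
        return objects_.count(name) != 0;
    }

    // Read an object. Returns 0 on success, or ENOENT if it does not exist.
    int read(const std::string &name, std::string &contents) const {
        auto it = objects_.find(name);
        if (it == objects_.end())
            return ENOENT;
        contents = it->second;
        return 0;
    }

private:
    std::map<std::string, std::string> objects_;
};
```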
The storage sources use the file system's put-object functionality to flush files into the cloud. The Tier Manager inside WiredTiger directs this flush. Only the S3 storage source also copies the files it pushes to the cloud into the local cache. See Tiered Storage.
Each storage source also implements the WT_FILE_HANDLE interface to access the files on the file system. Since the object store is read-only, all the write access methods are disabled.
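One way to picture a handle whose write methods are disabled is a table of function pointers in which only the read path is populated. The sketch below is a simplified, hypothetical structure and not the real WT_FILE_HANDLE definition; representing a disabled method as a null pointer is an assumption of this example.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical, simplified file handle. Every method starts out disabled.
struct ObjectFileHandle {
    std::string contents;
    int (*read)(ObjectFileHandle *, std::size_t offset, std::size_t len, void *buf) = nullptr;
    int (*write)(ObjectFileHandle *, std::size_t offset, std::size_t len, const void *buf) = nullptr;
    int (*truncate)(ObjectFileHandle *, std::size_t len) = nullptr;
};

// Copy len bytes starting at offset into buf; EINVAL if the range is out of bounds.
static int object_read(ObjectFileHandle *fh, std::size_t offset, std::size_t len, void *buf) {
    if (offset > fh->contents.size() || len > fh->contents.size() - offset)
        return EINVAL;
    std::memcpy(buf, fh->contents.data() + offset, len);
    return 0;
}

// Open a read-only handle: only the read method is set, so callers can
// detect that write and truncate are unsupported by checking for null.
ObjectFileHandle open_object(const std::string &contents) {
    ObjectFileHandle fh;
    fh.contents = contents;
    fh.read = object_read;
    return fh;
}
```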
The cloud provider SDKs use the term logging differently from WiredTiger. In the SDKs, logging refers to the error and informational messages generated when performing calls to the SDK. The SDKs also provide their own log levels, which are translated into WiredTiger messaging levels.
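The level translation can be thought of as a small mapping function. The sketch below uses two made-up enumerations, `SdkLogLevel` and `MessageLevel`; neither matches the exact levels of any particular SDK or of WiredTiger, and the chosen mapping is only one plausible arrangement.

```cpp
#include <cassert>

// Hypothetical SDK log levels, from most to least severe.
enum class SdkLogLevel { Fatal, Error, Warn, Info, Debug, Trace };

// Hypothetical messaging levels on the database side.
enum class MessageLevel { Error, Warning, Info, Debug };

// Translate an SDK log level into a messaging level. Several SDK levels
// collapse into one messaging level because the two scales differ in size.
MessageLevel to_message_level(SdkLogLevel level) {
    switch (level) {
    case SdkLogLevel::Fatal:
    case SdkLogLevel::Error:
        return MessageLevel::Error;
    case SdkLogLevel::Warn:
        return MessageLevel::Warning;
    case SdkLogLevel::Info:
        return MessageLevel::Info;
    case SdkLogLevel::Debug:
    case SdkLogLevel::Trace:
        return MessageLevel::Debug;
    }
    // Unreachable for valid enumerators; treat anything else as an error.
    return MessageLevel::Error;
}
```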
All of the cloud storage sources have their own messaging system and statistics to monitor the state of the SDKs and the storage sources. The messaging system captures error and informational messages from both the SDK and the custom storage source, which are redirected into the WiredTiger messaging system. The statistics are printed when the WiredTiger verbosity is set to WT_VERBOSE or higher on the storage source.
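Gating the statistics on a verbosity threshold might look like the following sketch. The counter names, the integer verbosity scale and the output format are all assumptions made for illustration; the real storage sources track their own set of statistics.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical per-storage-source counters.
struct StoreStatistics {
    std::uint64_t puts = 0;
    std::uint64_t gets = 0;
    std::uint64_t lists = 0;
};

// Build the statistics report, or return an empty string when the
// configured verbosity is below the threshold for printing statistics.
std::string report_statistics(const StoreStatistics &stats, int verbosity, int threshold) {
    if (verbosity < threshold)
        return "";
    std::ostringstream out;
    out << "puts=" << stats.puts << " gets=" << stats.gets << " lists=" << stats.lists;
    return out.str();
}
```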
The S3 store uses the WT_FILE_HANDLE implementation native to WiredTiger. Hence, a WT_FILE_HANDLE is kept encapsulated in the S3 file handle.
The Azure and Google Cloud Platform (GCP) stores take their credentials from the environment: GCP requires the GOOGLE_APPLICATION_CREDENTIALS environment variable to be set to the path of an authentication file, and Azure requires the AZURE_STORAGE_CONNECTION_STRING environment variable to be set.
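A storage source could check for these variables before attempting to connect. The sketch below is a hypothetical pre-flight check using only the standard library: the `Provider` enumeration and function names are invented for this example, and treating an empty value as "not set" is an assumption.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical tag for the two providers that read credentials from the environment.
enum class Provider { Gcp, Azure };

// Name of the environment variable each provider requires.
const char *required_env(Provider provider) {
    return provider == Provider::Gcp ? "GOOGLE_APPLICATION_CREDENTIALS"
                                     : "AZURE_STORAGE_CONNECTION_STRING";
}

// True when the named variable is set to a non-empty value.
bool env_is_set(const char *name) {
    const char *value = std::getenv(name);
    return value != nullptr && value[0] != '\0';
}

// True when the provider's required variable is present in the environment.
bool credentials_configured(Provider provider) {
    return env_is_set(required_env(provider));
}
```

Failing early with a clear message when `credentials_configured` returns false is usually easier to diagnose than an authentication error raised later from inside the SDK.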