Version 2.5.3
WiredTiger Change Log

WiredTiger release 2.5.3, 2015-04-22

The WiredTiger 2.5.3 release contains important bug fixes.

API and behavior changes:

  • Update configuration string parsing to always be case sensitive. See upgrading documentation for more information.
  • Change the statistics cursor WT_CURSOR.reset method to re-load statistics values. See upgrading documentation for more information. refs WT-1533
  • Only align buffers on Linux if direct I/O is configured. See upgrading documentation for more information.
  • Fixes to how and when idle handles are closed. refs WT-1808, WT-1811, WT-1814
  • Add some new statistics related to cache usage.
  • Add new configuration strings to control how often handles are reviewed for closure and how long a handle must be idle before it is closed. The options are set via the wiredtiger_open API: file_manager=(close_idle_time=30,close_scan_interval=10)
  • Add support for running WiredTiger command line utilities without logging. refs WT-1732
  • Update the async configuration API to allow a minimum of 1. That is, async=(ops_max=X) now accepts a minimum value of 1; the old minimum was 10.
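
The two configuration strings mentioned in this list can be assembled programmatically. The following is a minimal sketch in Python; build_config is a hypothetical helper written for this illustration and is not part of the WiredTiger API. Only the file_manager and async keys come from the entries above.

```python
def build_config(options):
    """Render a nested dict as a WiredTiger-style configuration string.

    Hypothetical helper: nested dicts become parenthesized groups,
    everything else is rendered as key=value.
    """
    parts = []
    for key, value in options.items():
        if isinstance(value, dict):
            parts.append(f"{key}=({build_config(value)})")
        else:
            parts.append(f"{key}={value}")
    return ",".join(parts)


# Keys and values taken from the change log entries above.
config = build_config({
    "file_manager": {"close_idle_time": 30, "close_scan_interval": 10},
    "async": {"ops_max": 1},
})
print(config)
```

The resulting string would be passed as the configuration argument to wiredtiger_open. Because configuration parsing is now case sensitive, the keys must be spelled exactly as shown.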

Bug fixes and other significant changes:

  • Fixes and improvements to Windows support.
  • Fix several bugs that prevent page eviction. refs SERVER-16662, SERVER-17382, WT-1777
  • Fix a race when stopping eviction workers on shutdown. refs WT-1698
  • Fix a bug where if the system crashes during create the base configuration file could be left in an invalid state. refs WT-1775, WT-1776, SERVER-17571
  • Fix cases where WT_SESSION::truncate could return EBUSY when a schema level operation is running, for example a checkpoint. refs WT-1404, WT-1643
  • Fix a bug in logging - where we could fail to update the end of the log when there is a gap in the log records. refs WT-1766, SERVER-17569, SERVER-17613
  • Fix how we account for space used in the cache to be more accurate. refs SERVER-17424
  • Fix a bug where we could leak memory if opening a statistics cursor failed. refs WT-1760
  • Fix a bug where a single page could consume a large portion of the cache, leading to cases where a small cache size could result in a hang. refs WT-1759
  • Fix a bug in the eviction server that could cause a WT_PANIC. The issue was encountered when the number of open handles exceeded the configured number of hazard pointers. refs SERVER-17551
  • Fix a bug parsing huffman configuration options that could lead to a segfault.
  • Fix accounting in btree statistics gathering, so page tracking is accurate. refs WT-1733
  • Fix a memory leak in cache management, where a race during page split could leave a key structure allocated. refs WT-1582, WT-1747
  • Enhance checkpoint tracking code to allow eviction in files once the checkpoint has finished processing them. This helps reduce the impact of checkpoints on workloads with cache pressure. refs WT-1745
  • Fix when aggregation is set on statistics fields. Fixes problems with visualising statistics via wtstats graphs. refs WT-1742
  • Change how checkpoints use empty blocks in on-disk files. Use a first-fit algorithm.
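
For readers unfamiliar with the term, first-fit allocation takes the first free region that is large enough, scanning from the start of the file. The sketch below shows the general technique in Python; it is not WiredTiger's block manager code, and the extent representation (a list of (offset, length) tuples sorted by offset) is an assumption made for the example.

```python
def first_fit(free_extents, size):
    """Allocate `size` bytes from the first free extent that is large enough.

    free_extents is a list of (offset, length) tuples sorted by offset and
    is updated in place. Returns the allocated offset, or None if no extent
    is large enough.
    """
    for i, (offset, length) in enumerate(free_extents):
        if length >= size:
            if length == size:
                # The extent is consumed entirely.
                del free_extents[i]
            else:
                # Keep the unused tail of the extent on the free list.
                free_extents[i] = (offset + size, length - size)
            return offset
    return None


free = [(0, 4096), (16384, 32768), (65536, 8192)]
first = first_fit(free, 8192)   # skips the 4096-byte extent at offset 0
second = first_fit(free, 4096)  # exactly fills the extent at offset 0
```

First-fit favors low offsets, which tends to leave free space toward the end of the file.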

WiredTiger release 2.5.2, 2015-03-23

The WiredTiger 2.5.2 release contains important bug fixes.

API changes:

  • Allow memory_page_max to be at most a quarter of the cache size, not half. This avoids operations getting stalled due to the cache being filled with one or two pages.
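
The arithmetic behind the new limit can be sketched as follows. The function name is hypothetical, and the sketch assumes the limit is applied by clamping the requested value; the change log does not say whether an oversized setting is clamped or rejected.

```python
MB = 1024 * 1024


def effective_memory_page_max(cache_size, requested):
    """Return the page size limit after applying the quarter-of-cache cap.

    Assumption: an oversized request is clamped rather than rejected.
    """
    return min(requested, cache_size // 4)


# With a 100MB cache, a 50MB request is reduced to 25MB,
# so at least four maximum-size pages fit in the cache.
limit = effective_memory_page_max(100 * MB, 50 * MB)
```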

Bug fixes and other important changes:

  • When skipping a dirty page during a checkpoint, make sure the tree is marked dirty. refs SUPPORT-1248, SERVER-17319, SERVER-17506, #1404, #1643, #1721, #1735
  • Fix a bug in range truncate where we could remove the wrong records. refs SERVER-17345
  • Fix a bug in LSM management where we could let the cache get full, leading to operations being blocked. refs #1720
  • Fix several bugs in the checkpoint implementation that could lead to a tree being marked clean when it had updates in memory. If shutdown occurred at a specific time those updates would be discarded without being written. refs SUPPORT-1248
  • Fix some bugs in logging - where system crashes could leave empty files that would stop recovery working on re-start. refs #1717, #1719, SERVER-17451
  • Fix a bug in recovery. Force recovery instead of returning an error if the LSN given doesn't exist. refs #1700, #1704
  • Move writing into log worker thread to avoid latency in application threads. refs #1683
  • Fix a bug in the reconfigure API related to adhering to shared cache quotas. refs #1712, #1713
  • Fix a bug in WiredTiger statistics where we weren't recording overflow record statistics. refs #1520, #1703, #1711
  • Several enhancements to eviction of large pages including:
    • Don't do forced eviction of a page if it is the current walk point.
    • Don't update the read generation on page in if it's set to oldest.
    • Clear the walk positions before the eviction server sleeps.
    • Reverse the direction of the LRU walk regularly.
    • Add all pages that would block to the eviction queue.
    • If evicting dirty pages use the worker threads not the server. refs #1706
  • Use raw mode when dumping indices. refs #1709
  • Fix a bug where we could race opening files while a WT_CONNECTION::close is in progress. refs SERVER-17319
  • Fix a bug in LSM where snapshot transaction updates could have the wrong visibility check applied, leading to invalid updates. refs #1641, #1701, #1702
  • Fix a bug in checkpoint where it could get an EBUSY return unnecessarily. refs #1404, #1589, #1705
  • Fix a bug when writing a page from memory to disk (reconciling). We could overwrite the end of a temporary buffer in some cases. refs #1697, #1699
  • Sometimes we would choose a sub-optimal layout for on disk pages when writing them out from memory. refs #1699
  • Improve the performance of in-memory lookups by making the content of the page structure more cache friendly.

WiredTiger release 2.5.1, 2015-03-07

The WiredTiger 2.5.1 release contains new features, minor API changes and many bug fixes.

New features and API changes:

  • Add a new "log=(recover=on)" option to wiredtiger_open. The default value is "on". If set to "error", recovery won't be run on startup, and an error will be returned if recovery is needed. This option is mainly to support the WiredTiger command line utility. refs #1651
  • Add a new WT_CURSOR::equals method that returns whether two cursors are equal, intended as a fast-path for cursor comparison.
  • Change how statistics work when there are checkpoints for an object. We used to aggregate statistics for all open checkpoints and the current handle. We now only return statistics for the object being queried. See upgrading documentation for further information.
  • Add a mode to LSM that allows us to limit the size of a tree by dropping old chunks. Configured via "lsm=(chunk_count_limit=0)"; the default is 0, which disables the functionality. Note that enabling this feature discards old data automatically. refs #1652
  • Update the WiredTiger printlog command line utility to generate JSON that can be parsed by third party tools. refs #1438
  • Change how we track memory allocation overhead. We used to apply a fixed size for each allocation (which was difficult to track). The overhead can be specified via a new configuration option to wiredtiger_open using "cache_overhead=8". The value is a percentage and the default is 8. refs #1564 #1565
  • Major enhancements to the tool that translates WiredTiger statistics into an HTML graph. The tool now generates interactive graphs and no longer requires third party Python libraries to be installed.
  • Add a new WT_CURSOR::reconfigure method for cursor configuration. See API documentation for more information. refs #1381
  • Add a new WT_SESSION.strerror method, a thread-safe alternative to wiredtiger_strerror. refs #1516
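
The effect of the LSM chunk_count_limit setting described above can be pictured with a toy model. This is a sketch of the general idea only, not WiredTiger's LSM code; the ChunkList class is hypothetical, and chunks are represented by plain strings.

```python
from collections import deque


class ChunkList:
    """Toy model of an LSM tree's chunk list with an optional count limit."""

    def __init__(self, chunk_count_limit=0):
        # A limit of 0 disables the feature, matching the documented default.
        maxlen = chunk_count_limit if chunk_count_limit > 0 else None
        self.chunks = deque(maxlen=maxlen)

    def add_chunk(self, chunk):
        # With a limit set, appending to a full deque drops the oldest chunk.
        self.chunks.append(chunk)


limited = ChunkList(chunk_count_limit=3)
unlimited = ChunkList()
for i in range(5):
    limited.add_chunk(f"chunk-{i}")
    unlimited.add_chunk(f"chunk-{i}")

print(list(limited.chunks))
print(len(unlimited.chunks))
```

In the limited list the two oldest chunks are gone, which is why the entry warns that enabling the feature discards old data.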

Bug fixes for bugs that could cause data inconsistency:

  • Fix a bug in recovery where we could lose track of file identifiers and apply updates to the wrong file. refs SERVER-17142 SERVER-17131
  • Fix several bugs in data consistency that could cause corruption when restarting after a hard crash, including:
    • A bug in table create that could cause recovery to fail. refs SERVER-17204

Other significant changes:

  • Significant tuning enhancements for the WiredTiger cache, including:
    • Avoiding stalls due to evicting large pages
    • Better algorithms for getting application threads to help with cache management without interfering with operation latencies.
    • Better algorithms for maintaining the cache when there are a few very hot pages (e.g., append workloads). refs SERVER-17344
    • Freeing obsolete references more aggressively, which saves space and reduces traversal overhead when there are lots of updates or deletes. refs SERVER-17195, #1647
    • Fix a bug that could cause data loss if we split a large page into multiple smaller pages and attempted to evict the page at the same time. refs #1583 #1563 SERVER-16868
  • Significant improvements to the cursor truncate implementation, especially for workloads that periodically truncate from the start of the file. refs SERVER-17141
  • Fix a bug in eviction where reconfiguring the number of eviction threads could result in a segfault. refs SERVER-17293
  • Significant bug fixes and performance enhancements to the logging subsystem including:
    • Avoid yielding excessively to avoid CPU overhead when there are many active sessions. refs #1610
    • The log close thread needs to wait for outstanding writes. #1571
    • Create a new utility thread to close and fsync log files. #1560
    • Ensure that log files are closed in sequence. #1555
    • Ensure that log file create appears atomic. #1482
    • Fix a race between connection close and switching log files that could lead to a new log file being in an undefined state. #1480
  • Added support for advanced options to the WiredTiger verify command line tool.
  • Fix a bug in our conflict detection algorithm, where we were failing to detect some write-write conflicts in no-overwrite cursors. refs SERVER-16351
  • Significant bug fixes when writing pages to disk, including:
    • Stop double counting the on-disk header when choosing split points. refs #1655
    • Fill the first and second pages as much as possible when splitting. refs #1282
    • Improve the algorithm for fitting large items onto pages when splitting. refs #1630 #1631
    • Fix a bug when using raw compression where we could overflow allocated memory. refs SERVER-16664
  • Fix several cases where WT_SESSION::verify and WT_SESSION::salvage could return EBUSY unnecessarily. refs #1404 SERVER-16457
  • Fix a bug where racing between discarding and updating a tree returned an error to the application. refs #1618 SERVER-17048
  • Fix a bug where opening a statistics cursor in parallel with a checkpoint could lead to a deadlock. refs #1575 SERVER-16738
  • Change the shared cache implementation to use cache read pressure rather than write pressure to determine how best to share memory (as checkpoints skew write pressure as a metric). refs #1569
  • Fix a bug where a deadlock could occur if a checkpoint starts in parallel to compact operations. refs #1589 SERVER-16967
  • Fix a bug in how we parse Huffman-encoding configuration settings during WT_SESSION::create. refs #1417 #1536
  • Fix a bug where the custom extractor terminate callback was being made twice. refs #1503
  • Add a new mode to the eviction server where it writes out some pages even though the eviction triggers have not yet been reached.
  • Fix several issues reported by the Coverity static analysis tool.

WiredTiger release 2.5.0, 2014-12-24

The WiredTiger 2.5.0 release contains significant new features, API changes and many bug fixes.

Now that WiredTiger is part of MongoDB, we are tracking issues related to MongoDB usage of WiredTiger in the MongoDB JIRA system. Some entries in the changelog now reference JIRA tickets that can be found at:

New features and API changes:

  • Add support for storing large values on-page in a btree rather than in an overflow item. This is useful for workloads that want to keep large items in cache - since WiredTiger overflow items are never cached. It is configured via new leaf_value_max configuration setting. This enhancement led to the deprecation of internal_item_max and leaf_item_max configuration settings, see upgrading documentation for further information. [#1282]
  • Add support for compressing log files. When configured, each compressible log record will be compressed. This is configured with the log=(compressor=X) configuration setting. See upgrading documentation for further information. [#1359]
  • No longer return EBUSY when opening a bulk cursor, verifying or salvaging a database if a checkpoint is currently running. This allows applications to do these exclusive operations without shutting down the checkpoint server thread. [#1397] [#1404] SERVER-16236 SERVER-16457
  • Add support for immutable indexes. [#1344]
  • Added several new statistics, improved accuracy for some statistics tracking and simplified mechanism for querying a particular statistic. [#1505]
  • Have the eviction server write out unnecessary pages prior to the cache reaching the configured eviction trigger size. This can reduce the amount of eviction application threads do when configured with a large cache.
  • Several enhancements to managing how long we keep files open
  • Revert a change in the 2.4.1 release that caused the WT_ROLLBACK (and deprecated WT_DEADLOCK) error return to map to different numeric values. Applications should ensure they are compiling against the same version of the wiredtiger.h header file as the library they link against, otherwise odd behavior will be experienced.
  • Support setting configuration strings to "none" as being equivalent to an empty string in most cases. [#1417]
  • Enhance the hot backup implementation to allow recovery to be run between incremental backups. [#1183]

Other significant changes:

  • Improve performance of cursor open when there are many tables in a database. [#1391] [#1443]
  • Reduce the impact checkpoints have on concurrent operations. This was done by changing how we lock tables. [#1391] [#1392]
  • Improve performance when scanning a table that has many deleted items. SERVER-16247
  • Fix a bug in checkpoint, where the metadata (turtle) file wasn't being synced on checkpoint. [#1383]
  • Fix a bug where WiredTiger could accumulate memory during page splits and never free it. SERVER-16546
  • Many enhancements and bug fixes for Windows.
  • Fix a bug where a custom extractor terminate was being called twice. [#1503]
  • Fix a bug where a race between closing a handle and checkpointing could lead to errors. [#1495] [#1497]
  • Validate the block header checksum before we clear it - if the checksum field had been corrupted, we didn't notice. SERVER-16457
  • Fix a bug in write conflict detection. Cursors configured with no-overwrite could sometimes not see update conflicts for deleted records. SERVER-16351
  • Several bug fixes and performance improvements in LSM including:
    • Add support for custom collators in LSM trees. [#1361]
    • Fix a bug in LSM search_near, where it returned a deleted item. BF-694, BF-700
    • Improve background maintenance operations so that the cache does not get full unnecessarily.
    • Fix a bug that could lead to updates being written into old chunks. [#1432] [#1418]
    • Fix a bug in background merge that could skip updates. SERVER-16123
  • Fix a bug when maintaining the cache, that could cause checkpoints to skip writing an update that should have been included. [#1419] SERVER-16336
  • Fix a bug in WT_SESSION::drop, where failures generated error output even when force was specified. [#1436]
  • Fix a bug in WiredTiger integer packing code when figuring out how much space is required for a value (it can shrink as numbers grow). SERVER-16118
  • Several enhancements to the tool that generates graphs from standard WiredTiger statistics logs. [#1365]
  • Fix a bug on OS X where fsync isn't sufficient to flush a file, use fcntl(F_FULLFSYNC) instead.
  • Work around a bug in clang 3.5.0 compare and swap primitive. The __sync_bool_compare_and_swap version of the API in clang produces bad code for us with -O3 optimization enabled.
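
The OS X flush fix above follows a common pattern: prefer fcntl(F_FULLFSYNC) where the platform defines it and fall back to fsync elsewhere. The sketch below shows that pattern in Python rather than in WiredTiger's C code; durable_flush is a hypothetical name, and the fallback on error is an assumption of this example.

```python
import fcntl
import os
import tempfile


def durable_flush(fd):
    """Flush a file descriptor, preferring F_FULLFSYNC where available.

    Returns the name of the method that was used.
    """
    full_fsync = getattr(fcntl, "F_FULLFSYNC", None)  # defined on OS X only
    if full_fsync is not None:
        try:
            fcntl.fcntl(fd, full_fsync)
            return "F_FULLFSYNC"
        except OSError:
            # Some filesystems do not support the request; fall back.
            pass
    os.fsync(fd)
    return "fsync"


with tempfile.NamedTemporaryFile() as handle:
    handle.write(b"checkpoint data")
    handle.flush()  # push Python's buffer to the OS before syncing
    method = durable_flush(handle.fileno())
```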

WiredTiger release 2.4.1, 2014-11-06

The WiredTiger 2.4.1 release contains several new features, many bug fixes and performance enhancements.

New features and API changes:

  • Add new custom extractor functionality to WiredTiger indexes. Allowing an application to define mutated and/or multiple keys for indexes. [#1199]
  • Add a new WT_SESSION::transaction_pinned_range method that allows users to identify when a session is keeping a transaction ID pinned for a long time. [#1314]
  • Enhance statistics output so that keys are more clearly categorized. [#1313]
  • Rename WT_DEADLOCK error return to WT_ROLLBACK. WiredTiger uses the return in cases other than traditional application deadlock. The old value is retained as an alias to maintain backward compatibility. [#1204]
  • Increase the maximum configurable cache size to 100GB.

Other significant changes:

  • Improve support for building on Windows platforms. [#1342]
  • Fix a bug where WiredTiger could race closing handles. [#1336]
  • Enhance performance when hot pages in cache are growing rapidly. [#1317]
  • Fix a bug in recovery, where log files that are zero extended could result in some log records being skipped. [#1334]
  • Updates to the Java API to improve documentation and exception handling. [#1295]
  • Improve support for building on the Oracle Solaris platform. [#1329]
  • Fix a bug where closing a handle could leave the tree in an inconsistent state on failure. [#1316]
  • Several bug fixes and improvements to LSM including:
    • Improving algorithm for switching the in-memory chunk.
    • Fixing a bug related to dropping obsolete chunks. [#1304]
  • Fix a bug in schema level operations (table create, drop, etc). If an explicit transaction was running when the operation was performed and was subsequently rolled back, the object could be left in an inconsistent state.
  • Several enhancements to cache management when there are long running transactions present.

WiredTiger release 2.4.0, 2014-10-15

The WiredTiger 2.4.0 release contains significant new features, API changes and many bug fixes.

New features and API changes:

  • Cursors keep their position across transaction boundaries. That is WT_SESSION::begin_transaction and WT_SESSION::commit_transaction no longer reset cursors. [#1181]
  • Change cursor behavior so that when an operation returns WT_NOTFOUND, the cursor is now left pointing to the original key/value pair. [#1209]
  • Initial support for building WiredTiger on Windows.
  • Add ability to customize a collator for specific data sources or with application managed metadata. See upgrading documentation for more information. [#1165]
  • Enhance extension mechanism in WiredTiger to support loading extensions from the application binary - not just a separate library. [#1174]
  • Replace WT_SESSION::create "lsm=(merge_threads)" configuration option with wiredtiger_open "lsm_manager=(worker_thread_max)". See upgrading documentation for more information.
  • Enhancements to the WiredTiger Python API build process. [#1188]
  • Add ability to dump and load WiredTiger databases in JSON format. [#1154]
  • Add ability to automatically checkpoint based on the volume of log records generated since the last checkpoint. This is enabled using the wiredtiger_open configuration option "checkpoint=(log_size=size)" [#1170]
  • Enhance functionality allowing users to write content into the WiredTiger transaction log. [#1171][#1175]
  • Enhance the WiredTiger HyperLevelDB implementation to support log replay. [#1106][#1155]

Other significant changes:

  • Fix several bugs in the shared cache implementation. [#1180][#1176]
  • Fix a bug where the public URI field in a cursor did not match the string passed to WT_SESSION::open_cursor. [#1235]
  • Fix several bugs in salvage. [#1222][#1169]
  • Several bug fixes and enhancements for WT_CONNECTION::reconfigure. [#1214][#1172]
  • Fix several bugs in raw compression implementation, particularly for data that compresses extremely well. [#1191]
  • Several bug fixes and enhancements to WiredTiger LevelDB interface.
  • Switch default build from using adaptive pthread mutexes to default pthread mutexes.

WiredTiger release 2.3.1, 2014-08-14

The WiredTiger 2.3.1 release contains mainly performance enhancements and bug fixes.

Changes to the WiredTiger API:

  • Fix a bug in WT_CURSOR::set_value that could lead to undefined behavior with some value formats.
  • Make the asynchronous API generally available [#1139]
  • Add log cursors for replay and verification. Make generated log record and operation types public. [#1106]
  • Allow eviction worker threads to be started and stopped dynamically. Applications that use the eviction_workers configuration should see the upgrading documentation on how to use this feature. [#1116, #1143, #1158]

Other significant changes:

  • Improve performance and reduce latency during checkpoints and LSM merges. Remove uses of the checkpoint lock other than serializing checkpoints. Compact holds the schema lock, so it doesn't need to hold the checkpoint lock; the new WT_BTREE handle close lock prevents checkpoints from colliding with handle close, so LSM doesn't need the checkpoint lock either.
  • Some minor cleanups, setting the internal session's name in a few places. [#1073]
  • Grab the live lock when loading a checkpoint in diagnostic mode: that could race with a read. [#1102]
  • Instead of keeping a list of file URIs for checkpoint to flush, open a handle and stash it. [#1114]
  • Add a new OS-layer function __wt_fsync_async to flush a file without waiting for the results, call it from the Btree flush-leaves code so pages start flushing while we're working the rest of the checkpoint. [#1136, #1152]
  • Wait for the handle flush lock when writing the leaf pages instead of returning EBUSY. [#1136]
  • Add a wtperf page to the documentation, describe how to simulate workloads and view statistics. [#1147]
  • Flag new structures not listed in PREDEFINE. [#1148]
  • Return EBUSY if no async handles available and fix ex_async to look for it. [#1153]
  • Fix some problems with navigation in the reference guide.
  • Bump the number of slots for internal sessions: we have a lot more than 2 now. Add a test for session_max settings, make sure we add enough to account for at least the default internal sessions.
  • Remove tcbench: we're no longer maintaining it.

WiredTiger release 2.3.0, 2014-07-29

The WiredTiger 2.3.0 release contains significant new features, performance enhancements and bug fixes. Significant changes are described below.

Changes to the WiredTiger API (see upgrading documentation for details):

  • Add a LevelDB API implementation for WiredTiger. This includes support for stock LevelDB as well as Basho, HyperLevelDB and RocksDB versions of the API. To build the LevelDB API include --enable-leveldb in the configure command; to specify compatibility with an alternative LevelDB API use --enable-leveldb=[basho,hyper,rocksdb]. [#1028]
  • Add ability to build some common extensions into the WiredTiger library. This means that the libraries for those extensions don't need to be dynamically loaded at runtime. Currently supported extensions are Snappy compression and zlib compression. The option can be enabled by passing --with-builtins=[snappy,zlib] to the configure command line.
  • Add a new configuration to wiredtiger_open: statistics_log=(on_close=true), that causes a set of statistics to be logged on WT_CONNECTION::close. [#1086]
  • Add a new configuration to wiredtiger_open: exclusive, that causes the open to fail if the database already exists.

Other significant changes:

  • Performance improvement for high throughput workloads using multiple eviction threads. Performance of some workloads improves by over 15% [#1087]
  • Significant performance optimizations for queries, giving up to 20% throughput improvement for in-memory query workloads.
  • Fix an off-by-one bug that could lead to ENOMEM during commit with logging. [#1104][#1121]
  • Allow bulk loads to multiple files to complete in parallel. [#1114][#1126]

WiredTiger release 2.2.1, 2014-06-24

The WiredTiger 2.2.1 release contains mainly performance enhancements and bug fixes. Significant changes are described below.

Changes to the WiredTiger API (see upgrading documentation for details):

  • Change the order in which configuration setting mechanisms are applied by wiredtiger_open. [#1010][#1034]
  • Split the global transaction_sync configuration into two parts: a sync method (dsync, fsync or none), and an enabled flag (false by default). [#1074]
  • Add ability to sync with per transaction granularity. [#1074]
  • Update WiredTiger Java API to throw WiredTigerException consistently. [#1011]
  • Add ability to dump and load databases using JSON format. [#740][#1049]

Other significant changes:

  • Various performance improvements to the main cursor search routine including reductions in how often we need to copy data and profiling based optimizations for tight search loops. [#1050][#1070]
  • Fix a bug in recovery with missing files (e.g., after a hotbackup that raced with file creation). [#1042][#1045]
  • Several bug fixes and performance enhancements related to LSM trees and snapshot isolation transactions. [#1057][#1060][#1075]
  • Several performance tuning enhancements to LSM trees around locking, throttling and switching chunks. [#1051]
  • Algorithmic improvements to LSM tree compact operation. It is now faster and more reliable. [#1063]
  • Create a separate thread to manage open file handles - which means that:
    • Application threads are less likely to be responsible for closing handles
    • Multi threaded workloads don't open/close handles more often than necessary [#1018]

WiredTiger release 2.2.0, 2014-05-21

The WiredTiger 2.2.0 release contains new features, performance enhancements and bug fixes. Significant changes include:

Changes relevant for upgrading applications:

Update the table create API to disable prefix compression by default. Applications generally see better performance without prefix compression, choosing space saving over performance is up to the application. [#981]

Change the default leaf_page_max setting from 1MB to 32KB. Choosing a large default leaf_page_max led to poor performance in out of cache workloads.

Remove the --enable-debug option to configure. It is more standard to set the CFLAGS="-g" variable instead.

Save the wiredtiger_open configuration when a database is created, so that settings like cache size, extensions and logging are set consistently by all subsequent users of the database.

Add an --enable-verbose option to configure. In order to access the verbose message functionality available as part of the wiredtiger_open and WT_CONNECTION::reconfigure APIs, it is necessary to pass the --enable-verbose option to configure.

Enhance the metadata cursor implementation (i.e., cursors created with a "metadata:" prefix) so that metadata cursors can be used to inspect metadata for internal tables and now support altering the metadata. Add a new "read_only" flag to cursor configuration that defaults to false for metadata cursors.

Fix several bugs in raw compression, including one that could cause data corruption and some that triggered poor performance. [#984][#991][#1007][#1008][#1013]

Improve the performance of recovery - we no longer need to scan all log files looking for the last checkpoint.

Improve performance of read-only transactions, by deferring the allocation of transaction IDs. [#978]

Several bug fixes in hot backup related to log files, including:

  • Always choose the right metadata version in the backup [#972]
  • Don't require that hot backup copies log files in order [#976]
  • Always copy log files before data files [#976]
  • Fix a bug where recovery returned an error if the last log record was incomplete [#994]

Speed up checkpoints by doing a better job of skipping pages that can't contain changes that need to be included. [#954][#963][#1001]

Add ability to store zero length data items into LSM trees. [#540]

Add an asynchronous data access/manipulation API to WiredTiger. [#933]

Add the ability to configure multiple eviction server threads, to help with keeping space available in the cache. [#918]

Add the ability to reconfigure the checkpoint and statistics log servers. [#997][#1004]

Improve the performance of retrieving data for in cache workloads. [#970]

Improve the structure of the in-memory tree we are generating, by allowing internal pages to be split. This significantly improves query performance in some workloads. [#876]

Work around a bug in posix_fallocate on Linux, where it could corrupt already written data.

Add the ability to leak memory on connection close via new leak_memory option to WT_CONNECTION::close API. This allows for faster shutdown if a process is going to exit when the WiredTiger connection is closing.

Allow salvage to run on any table type.

WiredTiger release 2.1.2, 2014-03-27

The WiredTiger 2.1.2 release contains performance enhancements and bug fixes. Significant changes include:

Update the configuration settings for shared_cache to make the distinction between cache_size and shared_cache less confusing. See upgrading documentation for more information.

Various performance enhancements to improve the performance of checkpoints.

Fix a bug that could cause a hang with small caches under heavy load. [#894]

WiredTiger release 2.1.1, 2014-03-04

The WiredTiger 2.1.1 release contains new features, performance enhancements and bug fixes. Significant changes include:

Fix a bug where a page could be marked clean when it contained uncommitted changes. This bug could cause undefined behavior in transaction rollback under load.

Fix a bug with shared caches when rebalancing between connections.

Add a new public API to WiredTiger that provides the ability to parse WiredTiger compatible configuration strings. See the upgrading documentation for further information. [#873]

A number of performance enhancements to the LSM implementation, particularly for long running workloads.

A number of performance enhancements and bug fixes to cache eviction code.

Add an option to use direct I/O when reading from checkpoints. To enable the functionality add "direct_io=[checkpoint]" to your wiredtiger_open configuration string. [#847]

WiredTiger release 2.1.0, 2014-02-04

The WiredTiger 2.1.0 release contains new features, performance enhancements and bug fixes. Significant changes include:

The WT_ITEM structure was changed so that the size field is a size_t rather than a uint32_t. See upgrading documentation for details.

A change to the compress_raw interface around repeating the call with more records. See upgrading documentation for details.

In LSM trees, the memory_page_max setting is ignored. The effective setting is double the chunk size. [#861][#859]

Add support for zlib compression. [#855] [#865]

Various enhancements to how WiredTiger generates tree structures in memory to help maintain consistent performance as table size grows. [#851]

Add support for Levyx Inc Helium as an external data source in WiredTiger [#849][#850]

Improve insert performance when a table contains many identical overflow items.

Various performance enhancements to btree searches. [#838][#839][#840]

Add support for newer versions of autoconf up to 1.14. [#599][#841]

Improve multi-threaded throughput of durable log writes, including changing the default wiredtiger_open transaction_sync configuration from dsync to fsync, see the upgrading documentation for further information. [#831][#832]

In the Python and Java APIs, automatically close handles to prevent invalid accesses by applications. [#649][#800][#830]

Various enhancements to the LSM merge algorithm, including improvements to how files are selected for merging, and throttling based on whether merges are keeping up (to limit write amplification). Made the minimum number of chunks chosen to merge configurable. [#817][#819][#822]

WiredTiger release 2.0.1, 2013-12-12

The WiredTiger 2.0.1 release contains major new features, numerous performance enhancements and bug fixes.

Significant changes include:

  • WiredTiger now supports fine-grained durability via Write Ahead Logging (WAL). Logging is enabled with the "log=(enabled)" configuration string to wiredtiger_open. If the connection is not shut down cleanly and logging is enabled, WiredTiger will automatically run recovery the next time it is opened, rolling forward changes in the log until the last commit. [#605]
  • Many enhancements to the LSM implementation to improve the throughput and reduce maximum operation latency including:
    • Algorithmic improvements when multiple merge threads are configured.
    • Improvements to bloom filter lookup speed.
    • Enhancements to internal cursor management, to reduce search overhead.
    • Prioritize switching to a new level 0 chunk in utility threads, to avoid application thread pauses.
    • More advanced logic in choosing when to create bloom filters.
  • LSM-specific WT_SESSION::create configuration option enhancements, including:
    • Move existing options into their own group, and strip leading lsm_ prefix.
    • Add a new merge_max configuration option.
    • Update the default chunk_size to be 10MB.
    • Increase the default bloom filter bit and hash counts.
    • Clean up files left after interrupted merges. See the upgrading documentation for details. [#784, #785, #786, #802]
  • WT_SESSION::compact can now be used to merge LSM trees into a small number of chunks on disk. [#792]
  • Enhanced the Java API, so that when WiredTiger automatically closes a handle, the handle is automatically invalidated for the Java application. [#485]
  • Add a script that can create an interactive web page to view statistics from a WiredTiger statistics log. Based on D3:
  • Enhancements to the wtperf performance testing tool to add new features.

WiredTiger release 1.6.6, 2013-11-19

The WiredTiger 1.6.6 release is a bugfix and performance tuning release.

This release of WiredTiger contains a database format change. Database files from previous releases will need to be upgraded.

A special note: the WiredTiger code base is now being regularly reviewed using the Coverity Static Analysis Verification Engine. We'd like to thank Coverity for their on-going support of Open Source projects like WiredTiger!

Significant changes include:

  • Performance changes include: limiting operations done inside update serialization primitives, removing unnecessary memory barriers, replacing spinlocks with atomic instructions, padding structures to avoid false cache sharing, switching from per-file mutexes to per-page mutexes, pre-allocating structures to avoid memory allocation while holding mutexes, and using adaptive mutexes where available. [#707, #718, #719]
  • A number of LSM stability and performance improvements: changes include better merge algorithms, reduced locking, and higher concurrency.
  • A number of table compaction performance improvements: compaction no longer reads unnecessary file blocks into the cache, requires fewer passes over the file, and supports concurrent checkpoints and eviction. This change required an underlying file format change; see the upgrading documentation for details. [#756, #761]
  • WiredTiger statistics have been significantly improved:

    Statistics logging has been changed to aggregate information from all open handles. [#709, #717]

    For performance reasons, statistics are now disabled by default; see the upgrading documentation for details. [#715]

    Statistics configuration has been changed so the connection and cursor configuration are consistent, with matching changes to the "wt stat" command-line utility; see the upgrading documentation for details.

  • Update WT_EVENT_HANDLER interface to contain a new "handle close" interface and to pass a WT_SESSION handle into all callbacks; see the upgrading documentation for details. [#649]

    Add timestamp, process ID and thread ID to messages generated via WT_EVENT_HANDLER interface. [#753]

  • WiredTiger eviction improvements, supporting larger data-to-cache size ratios. [#754]
  • Various fixes for handling overflow records. [#726, #743]
  • Overflow records are no longer tracked during bulk-loads, significantly increasing bulk-load performance for some data sets.

WiredTiger release 1.6.5, 2013-10-09

This is primarily a bugfix and performance tuning release. The main changes are:

  • Change the default statistics_fast configuration from false to true.
  • Change WT_CURSOR::insert to not hold a position. [#673]
  • Disallow WT_SESSION::compact operations on LSM trees.
  • The 'sync' setting to wiredtiger_open has been renamed 'checkpoint_sync'.
  • Add a "metadata:" cursor type. [#660]
  • Fix race in the cache's dirty byte tracking. [#635, #699]
  • Fix a bug scanning through a memory-mapped file with overflow items. [#701]
  • Use hardware checksum instructions when available. [#582, #702]
  • Several bug fixes related to tracking active transaction IDs and detection of obsolete updates with high concurrency workloads. [#639, #643, #657, #683]
  • Fix several bugs in LSM including races on shutdown and Bloom filter creation. [#686, #687, #688]
  • Fix a bug in LSM where we were not including Bloom filter files in backups. [#684]
  • Optimize the LSM throttle and merge algorithms. [#676]
  • Make hot backups work concurrently with files being bulk-loaded. [#570, #653]
  • Add full support for snapshot isolation to LSM: only switch LSM chunks if all changes are globally visible and detect conflicts between transactions across file switches. [#629]

WiredTiger release 1.6.4, 2013-08-20

This is primarily a bugfix and performance tuning release. The main changes are:

  • Make prefix compression of keys conditional on the amount of space saved. A database format change was required for this enhancement. See upgrading documentation for details. [#624]
  • The default behavior of the wt utility's load command has been changed to overwrite existing data.
  • Add a WT_SESSION.create prefix_compression_min configuration option with a default value of 4. [#624]
  • Fix "make install" of Python API. [#598]
  • Require platform support for atomic read/write of 64 bit values. [#553]
  • Support transaction semantics for custom data source implementations. Enhance Memrata data source to support transactions.
  • Changes to the wtperf testing tool related to how configuration options are specified.
  • Enhance cursor key/value memory management to be more efficient, consistent, and have stricter checking of inputs and outputs.
  • Increase the likelihood of being able to evict hot pages. [#604]
  • Reference on-page keys instead of copying them to allocated memory. This saves space in the cache and overhead when reading pages into cache. [#592] and [#600]
  • Add a btree search optimization that skips matching prefixes. [#595]
  • Turn off Huffman encoding for keys on row-store internal pages. [#592]
  • Add concurrent logging infrastructure that will be used to support write ahead logging in a future release.
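
The conditional prefix compression listed above can be illustrated with a small sketch. This is not WiredTiger's on-disk format or code: the function names are hypothetical, and it assumes prefix_compression_min means "the minimum number of shared leading bytes before a prefix is worth compressing".

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    """Count the leading bytes that two keys share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def compress_keys(keys, prefix_compression_min=4):
    """Encode sorted keys as (shared_len, suffix) pairs.

    A prefix is only elided when it saves at least
    prefix_compression_min bytes (assumed semantics);
    otherwise the full key is stored with shared_len == 0.
    """
    out = []
    prev = b""
    for key in keys:
        shared = common_prefix_len(prev, key)
        if shared < prefix_compression_min:
            shared = 0  # not enough savings, store the key whole
        out.append((shared, key[shared:]))
        prev = key
    return out


def decompress_keys(entries):
    """Rebuild the full keys by rolling forward from the previous key."""
    keys = []
    prev = b""
    for shared, suffix in entries:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys
```

Keys that share only a short prefix with their predecessor are stored whole, so reading them back needs no roll-forward from earlier keys.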

WiredTiger release 1.6.3, 2013-07-12

This is a bugfix and performance tuning release. The main changes are:

  • Change the default cursor overwrite configuration so that it is consistent across all data sources. This change may alter the behavior of existing applications without triggering any compilation or runtime warnings. See the upgrade documentation for details. [#512]
  • Require platform support for 64 bit atomic operations. [#553]

WiredTiger release 1.6.2, 2013-06-18

This is a bugfix and performance tuning release. The main changes are:

  • Fix a race in the WiredTiger pseudo random number generator that was leading to poor distribution of numbers.
  • Change the default compression configuration to "uncompressed".
  • Fix a race between checkpoints and LSM that could result in a crash. [#543]
  • Add an option to output version information at runtime. Configure by including "verbose=[version]" in the wiredtiger_open connection configuration string. [#564]
  • Add a configurable prefix to error messages. [#527]
  • Add two new extension APIs, one to return a transaction ID, one to return if a transaction ID is visible to the current transaction.
  • Add standard metadata functions to the extension API and make extension data sources responsible for their own metadata entries.
  • Add a new extension function __wt_ext_config_strget that returns the configuration value from a single string.

WiredTiger release 1.6.1, 2013-05-31

This is a bugfix and performance tuning release. The main changes are:

  • Fix the compress_raw API so that it uses platform independent types. See the upgrade guide for further information. [#561]
  • Add an explicit enable setting to shared_cache configuration. See the upgrade guide for further information.
  • Fix several bugs in hot backup, including race conditions between backup and table drop (and other schema level operations). [#556] [#557]
  • Allow any data source type for indices as well as column groups. [#545]
  • Preload btree internal pages into file system cache when opening a table.
  • Change the default allocation size to 4KB so that DIRECT_IO with 4KB blocks works. [#547]
  • Fix some bugs related to tracking the oldest active transaction. [#552]
  • Fix a bug in the extension API when using multiple databases.
  • Disallow named checkpoints on LSM trees - they aren't supported. [#546]
  • Fix support for custom collators with LSM trees. [#544]
  • Build fixes for gcc 4.1.2.

See the upgrade documentation for details of API changes that may require altering existing applications.

WiredTiger release 1.6.0, 2013-05-16

This release contains new features, bug fixes and performance improvements. The significant changes are highlighted below:

  • Fix a bug where configuring direct I/O could cause checksum errors at runtime. NOTE: database file format change. [#526]
  • Fix a race that allowed checkpoints to be deleted while hot backups are running. [#515]
  • Scale to events per second in graphs generated from statistics log output. [#518]
  • Changes to reduce the latency of LSM operations.
  • Add a new terminate callback to extension interfaces that is called when the WiredTiger connection is closed. [#530]
  • Various optimizations and bug fixes to cache management and eviction code.
  • Update various statistics.
  • Fix a bug where using a combination of read-committed and snapshot transactions could result in inconsistent values being returned. [#539]
  • Fix a bug where using LSM trees with compression enabled could result in an invalid system call. [#535]
  • Enhance statistics logging so that it can dump "lsm:" statistics.

See the upgrade documentation for information about database format changes in this release.

WiredTiger release 1.5.3, 2013-04-26

This release contains some major new features along with numerous bug fixes and performance improvements. The significant changes are highlighted below:

  • Enhance the extension data source API to facilitate implementation of new data stores in WiredTiger.
  • Add support for the STEC / Memrata KVS data source.
  • Add a Berkeley DB data source via the WiredTiger extension API.
  • Various enhancements to cache eviction management, mostly to avoid stalls in application threads.
  • Fixes to shared cache pool implementation, so resources are more aggressively reallocated.
  • Add new statistics.
  • Implement automatic insert throttling in LSM - enabled by default.
  • Configuration strings are now case sensitive.
  • Enhance LSM merge algorithms to be more efficient as trees grow very large.

See the upgrade documentation for details of API changes that may require altering existing applications.

WiredTiger release 1.5.2, 2013-03-28

This is a bugfix release. The main changes are:

[#493] Fix get_key/value in the Java API for complex cursors.

  • Fix a leak in eviction detected by valgrind.
  • Stop trying to cache the oldest reader: we only use it for eviction and only update it when required.
  • Track cursor creation in the statistics (creating a cursor per operation isn't a good idea).

WiredTiger release 1.5.1, 2013-03-25

This is a bugfix and performance tuning release. The main changes are:

  • Fix several bugs in LSM:
    • the logic for setting the "no eviction" flag on LSM chunks was reversed, causing unnecessary eviction once the cache became full;
    • calling session.checkpoint while writing to an LSM tree could confuse the logic around switching to new chunks; and
    • fix a possible NULL pointer indirection when switching chunks.
  • Make WT_ASSERT a no-op when not in DIAGNOSTIC mode.
  • Panic if we find a block on the wrong list, that's not something we can recover from.
  • If a page is reconciled (causing its on-disk blocks to be freed and potentially recycled), and then a subsequent collapse of a stack of split-merge pages replaces that page with a page that has not yet been reconciled, we can potentially free the same blocks twice. The fix is to clear the page's WT_REF.addr field at the time we free the blocks, so future reconciliations will ignore the original disk blocks.
  • Fix a bug in the dump utility that allowed index URIs.
  • Tweak merge to build better trees with random insert workloads.
  • Don't use a stale value for the oldest reader transaction ID.
  • Track the size of the WT_REF array in internal pages (including WT_ADDRs). Also add an estimate of per-allocation overhead.
  • Fix a bug where URIs containing absolute paths were not being parsed correctly.
  • Add a RMW insert mode to wtbench.

[#427] Improve cleanup after a failed wiredtiger_open call.

[#484] Don't allow true/false values in config strings where integers are expected.

[#486] Move the cache full check for autocommit transactions out of the rollback path (since we don't reset cursors there), to after we close a cursor.

[#488] Fix an assertion failure if we try to do eviction without ever having done an update.

WiredTiger release 1.5.0, 2013-03-14

This release contains some major new features along with numerous bug fixes and performance improvements. The significant changes are highlighted below:

  • Add a Java API.
  • Create a thread to do automatic checkpoints, configured by passing "checkpoint=(wait=X)" to wiredtiger_open.
  • Add support for periodically logging statistics to a file and a tool to generate graphs based on those logs. Configured by passing "statistics_log=(wait=X)" to wiredtiger_open.
  • Several changes to minimize the impact of checkpoints on other threads.
  • When reading from checkpoints, use mmap by default.
  • Enhance eviction so that internal pages take up less space.
  • Add maximum filesystem buffer cache settings to wiredtiger_open called "os_cache_max" and "os_cache_dirty_max". After doing the specified amount of reads or writes, WiredTiger will call fadvise and/or sync_file_range to drop pages from the filesystem cache. This is an alternative to direct I/O with less impact on performance.
  • Make run-time statistics optional, defaulted to "off".
  • Change how we detect if shared cache is used. It used to rely on a name, now it will be used if the shared_cache configuration option is included.
  • Add the ability to specify a per-connection reserved size for cache pools. Ensure cache pool reconfiguration is honoured quickly.
  • Rework hazard pointer coupling during cursor walks to be more efficient.
  • Add a cache_eviction_walk statistic to track the pages we walk and a cache_eviction_force statistic to track the count of pages queued for forced eviction.
  • Fixes to reduce the number of operations on shared data that were causing bottlenecks in read only workloads.
  • Add streaming pack / unpack to the API.
  • Add some basic reconciliation stats to the connection stats.
  • In LSM, keep trying to switch if there is an error: it may be transient.
  • Minor clean up and enhancement for the reconciliation statistics, add a set of compression statistics, both to the data-source statistics.
  • Compaction cannot run at the same time as a checkpoint: the problem is that compaction reviews page reconciliation information and checkpoints update page reconciliation information. Lock out checkpoints while compaction is running.

WiredTiger release 1.4.2, 2013-01-14

[#387] Fast-path "S" and "u" formats in cursor.get_key and cursor.get_value.

[#407] Allow non-conflicting updates to complete concurrently.

[#418] Add code to prioritize eviction of pages that are larger than a certain threshold. This avoids taking a performance hit when a huge page needs to be reconciled. Add a new memory_page_max configuration option.

[#419] If a page splits, it potentially creates a merge-split internal page and we potentially walk that page during fast-delete. The WT_REF.addr field doesn't point to a cell in that case and we'll drop core.

[#424] Add clarification wording for boolean configuration strings.

[#425] Perform checkpoints in the calling thread, don't block eviction: when evicting in a file that is being checkpointed, only evict clean pages. Also do compaction in the calling thread instead of interrupting the eviction thread to do the work.

[#426] Fixes for automake 1.3.x. Allow examples to run in parallel: give each a unique home directory.

Make the tree build without HAVE_VERBOSE.

Fix some issues with LSM rename and add a Python test.

Track when cursors refer to memory returned by WiredTiger, copy it if required before dropping hazard pointers that might be protecting it.

Verify shouldn't ever modify the file – don't bother checking for dirty pages, just discard everything.

When rolling forward to resolve key prefix compression, don't copy the key, we only need a reference to it, should speed up tables with lots of key prefix compression.

Requested changes for the WT_COMPRESSOR::compress_raw method API: pass in the configured object's page size as a convenience, and if WT_COMPRESSOR::pre_size is set, use it to determine the size of the destination buffer, rather than using the object's page size as the maximum needed.

WiredTiger release 1.4.1, 2012-12-12

This is a bugfix, cleanup and performance tuning release. The significant changes are highlighted below:

[#215] Add a __wt_panic function that shuts down all of the WiredTiger APIs. Also add a new error return WT_PANIC which means there has been an error in the WiredTiger engine, and it should be restarted.

[#409] Fix a bug populating column groups with complex schema. Also allow empty column lists in projection cursors.

[#150] Add description of how to do index-only searches to the documentation.

[#392] Move examples/c/ex_test_perf.c to bench/wtperf.

[#322] Add support for statistics on schema-level objects, i.e., tables, column groups, indices.

  • Enhance statistics, including changing the name of some statistics.
  • Fix a bug in the eviction server that could cause it to abort, leaving the system unusable.

WiredTiger release 1.4.0, 2012-12-03

This release adds several major new features, a number of performance improvements and bug fixes. The significant changes are outlined below:

New features and API changes:

[#242] Track the percentage of cache that is dirty, trigger eviction to bound it. This can be used to bound how much data checkpoints write.

[#324] Add support for WT_COMPRESSOR::compress_raw, which lets the compression routine select how many rows are included in each disk block.

[#381] Add statistics to track read and write amplification (application data size versus I/O size).

Bug fixes:

  • Fix build issues on Solaris.
  • Fix a bug calculating the generation of an LSM merge.
  • Fix WiredTiger dump and load for tables.
  • Fix a memory leak in checkpoints.
  • Improve accuracy of cache memory tracking with overflow items.

WiredTiger release 1.3.8, 2012-11-22

This release improves the performance of LSM trees, changes how statistics are reported and adds a shared cache implementation:

New features and API changes:

[#232] Add a "size of checkpoint" statistic.

  • Add a shared cache pool implementation. Manages a single cache among multiple databases within a process.
  • Merge statistics from file and LSM sources into a "data source" statistic structure. Rename and regroup some shared statistics. Add a helper to the Python API to lookup in a cursor in a simple expression.
  • Add support for sub groups of options in configuration strings.

Performance tuning for LSM trees:

  • Don't try to merge with a chunk that is much larger than a small chunk.
  • After an LSM merge, fault in some pages before the new tree goes live to avoid stalling application threads.
  • Don't automatically fail inserts if the write generation check fails: compare keys instead.
  • Switch the LSM tree lock to a read/write lock, so cursors can read the state of the tree in parallel.

Bug fixes:

  • Fix a bug where we could write past the end of a buffer after it was grown.

WiredTiger release 1.3.7, 2012-11-09

This release fixes a bug and improves performance with Bloom filters:

  • Drop any old Bloom filter before creating a new one – we may have been interrupted in between creating it and updating the metadata. Write the metadata after creating missing Bloom filters.
  • Use a separate thread for creation of Bloom filters for the newest, unmerged LSM chunks.
  • Changes to the ex_test_perf example: change the default configuration to 4KB pages and disable prefix compression. Change the "-i" command line option to be a simple count of records to insert. Clean up error handling and add option to populate using multiple threads.
  • Clarify the docs for the default buffer_alignment setting.

WiredTiger release 1.3.6, 2012-11-06

This is a bugfix and performance tuning release. The changes are as follows:

  • Rename the WiredTiger installed modules to libwiredtiger_XXX. Don't install the nop and reverse collator modules.
  • Replace test/format's bzip configuration string with compression, which can take one of four arguments (none, bzip, ext, snappy), change format to run snappy compression if the library is available.
  • Rename the builtin block compressor names from "bzip2_compress" to "bzip2", and from "snappy_compress" to "snappy".
  • Support multiple LSM merge threads with the "lsm_merge_threads" config key. Use IDs rather than array index to mark the start chunk in a merge, in case we race with another thread.
  • Cache the hash values used for Bloom filter lookups, rather than hashing for each Bloom filter in an LSM tree.
  • Only switch trees in an LSM cursor if the primary chunk is on disk.
  • Add a per-btree cache priority, currently only used to make it more likely for Bloom filter pages to stay in cache.
  • Only evict pages with read generations in the bottom quarter of the range we see. Fix a 32-bit wrapping bug in assigning read generations.
  • For update-only LSM cursors, only open a cursor in the primary chunk.
  • LSM: Report errors from the checkpoint thread.
  • LSM: only save a Bloom URI in the metadata after it is successfully created.
  • LSM: Create missing Bloom filters when reading from an LSM tree if "lsm_bloom_newest" is set.
  • LSM: Include all of the chosen chunks in a merge. Only pin the current chunk in an LSM cursor if it is writeable.
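
The Bloom filter hash caching listed above (hash the key once, then reuse the hash values for every chunk's filter) can be sketched as follows. This is an illustration only: the names key_hashes, BloomFilter and chunks_to_search are hypothetical, and the use of SHA-256 with double hashing is an assumption, not WiredTiger's actual hash scheme.

```python
import hashlib


def key_hashes(key: bytes):
    """Hash the key once; both values are reused for every filter."""
    digest = hashlib.sha256(key).digest()
    h1 = int.from_bytes(digest[:8], "little")
    h2 = int.from_bytes(digest[8:16], "little") | 1  # keep the stride non-zero
    return h1, h2


class BloomFilter:
    def __init__(self, nbits: int, nhashes: int):
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = bytearray((nbits + 7) // 8)

    def _positions(self, hashes):
        # Double hashing: derive nhashes bit positions from two hash values.
        h1, h2 = hashes
        return [(h1 + i * h2) % self.nbits for i in range(self.nhashes)]

    def insert_hashed(self, hashes):
        for pos in self._positions(hashes):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain_hashed(self, hashes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(hashes)
        )


def chunks_to_search(filters, key: bytes):
    """Return indexes of chunks whose filter does not rule out the key."""
    hashes = key_hashes(key)  # computed once, not once per chunk
    return [i for i, f in enumerate(filters) if f.might_contain_hashed(hashes)]
```

With many chunks in a tree, the per-lookup hashing cost stays constant while only the cheap bit probes scale with the number of filters.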

WiredTiger release 1.3.5, 2012-10-26

This is a bugfix and performance tuning release. The changes are as follows:

[#370] Document that applications are responsible for figuring out their upgrade path if they might swap out compression engines.

[#371] When a single session was used to reconcile multiple btrees, one of which had dictionaries configured and one of which didn't, we failed to clear the dictionary when starting page reconciliation. Be consistent, never use anything other than the btree handle's configuration to decide if we're using a dictionary in a reconciliation run.

[#372] Fix several potential integer overflow bugs.

[#373] Fix a bug where calls that performed an operation on multiple objects (such as creating a table that implicitly creates a column group) could leave the metadata incomplete if a process exited without calling WT_CONNECTION::close. Hold the schema lock while opening tables. Fixes the "cannot be opened until all column groups are created" error message when create calls race with open_cursor.

[#374] Fix a race that caused crashes when using the Python API with multi-threaded code.

[#375] Fix a bug in __wt_cond_wait so that it returns after the timeout expires.

  • Protect the list of LSM trees with the schema lock to avoid races during create.
  • Update ex_test_perf to output statistics during populate and improve timing accuracy.
  • Skew eviction in favor of leaf pages - which improves read-only performance for large LSM trees.
  • Hold the LSM tree lock while gathering statistics.
  • Fix a bug in bulk load of bitmap files.
  • Fix a related bug in the bloom code that uses bitmap stores.
  • Don't attempt to drop the first chunk of an LSM tree before creating it.
  • Instead of entering a fake key cell after the last cell on the page just in case the page ends with a key cell which has no value, use the end of the page to detect that case.
  • Cache cursor key/value formats in Python, to save a native call from every get_key/value.
  • Don't sync the directory after open if the global "sync" flag is false.
  • Fix a race for LSM trees that could happen if two threads race to open a cursor and drop the LSM tree.

WiredTiger release 1.3.4, 2012-10-19

This release includes several important new features, including:

  • support for online compaction of files;
  • support for tables, column groups and indices that use LSM trees for storage; and
  • improved statistics and configuration for LSM trees and Bloom filters.

In addition, there are some significant performance improvements and bug fixes. The full list of changes is:

[#248] Add support for online compaction.

[#310] Fixed a bug where overflow blocks could be accessed by a long-running reader after they had been freed in a checkpoint.

[#358] Allocate checkpoint blocks from the live system's list of available blocks rather than always extending the file.

[#361] Sync the directory after creating a file: this is apparently required for durability on Linux, according to the Linux fsync man page.

[#362] Don't check if a page is on the avail or discard lists if we're salvaging the file, that is okay.

[#363] Remove obsolete code dealing with forced eviction.

[#366] Fake checkpoints may have the delete flag set, ignore them when rolling checkpoints forward.

[#367] All metadata reads should ignore the application's transactional context.

[#369] Support LSM as a data source for tables, column groups and indices.

  • Add tuning options for LSM bloom filters, including controlling whether the oldest level in the tree has a Bloom filter, whether newly-created (level 0) files have Bloom filters, and passing arbitrary file configuration for Bloom filters.
  • Add a merge generation to LSM chunks. Add a statistic that reports the highest merge generation in a tree.
  • Add a new LSM statistic tracking searches that could benefit from bloom filters.
  • Enable LSM statistics in the "wt stat" utility.
  • Interrupt LSM merge operations, rather than waiting on close.
  • Wait for a while before looking for LSM major merges, in case merges catch up with inserts.
  • Fix LSM index searches. The main issue was LSM search_near was not always returning the closest key to the search key, which calling code expects. It now tries hard to find the smallest cursor larger than the search key, and only if no larger record exists does it return the largest record smaller than the search key.
  • Reset any old cursor position before an LSM search. This limits hazard references in an LSM search to a single chunk.
  • Fix a memory leak in an error path in Bloom filters.
  • Tweak the search loops in hazard_{set,clear} in favor of last-in-first-out ordering.
  • If there are many files open, some hotter than others, walk more files looking for pages to evict.
  • Don't stop evicting until we reach the target, have eviction wake up periodically regardless of whether the application signals it. This latter requires a "timed condition wait" operation.
  • Tweaks to file handle flags for out-of-cache read performance on Linux (disable readahead and access time updates).
  • Replace the WT_SESSION::dumpfile method with configuration strings to WT_SESSION::verify.
  • Fix a bug where we weren't skipping unnecessary default checkpoints because we weren't handling the generational number included in the internal checkpoint name.
  • Add a "force" configuration flag to WT_SESSION::checkpoint, object compaction needs it because the work it wants done is done by the block manager.
  • Make compact and checkpoint operate on a table's indices.
  • When doing a page truncate, lock down the page before we unpack the on-page cell – it's possible the page could be instantiated, modified and reconciled while we're sleeping, in which case the WT_REF.addr field would no longer point on-page.
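
The search_near behavior described in the LSM index search fix above can be sketched over a plain sorted list. This is a minimal illustration, not WiredTiger's cursor code: the function name and the (key, sign) return convention are assumptions made for the example, with the sign indicating whether the returned key is equal to (0), larger than (1) or smaller than (-1) the search key.

```python
from bisect import bisect_left


def search_near(sorted_keys, search_key):
    """Return (key, sign) for the record nearest to search_key.

    Prefer an exact match, then the smallest key larger than the
    search key; fall back to the largest smaller key only when no
    larger record exists. Returns None for an empty list.
    """
    i = bisect_left(sorted_keys, search_key)
    if i < len(sorted_keys):
        found = sorted_keys[i]
        return found, (0 if found == search_key else 1)
    if sorted_keys:
        return sorted_keys[-1], -1
    return None
```

For example, with keys 10, 20 and 30, searching for 25 yields 30, and searching for 35 yields 30 with a negative sign because nothing larger exists.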

WiredTiger release 1.3.3, 2012-10-11

This is a bugfix and performance tuning release, primarily related to LSM trees. The changes are as follows:

[#350] Checkpoint the metadata after successful schema-level operations. Otherwise, if the process exits without closing the connection or running a checkpoint, created objects exist but there is no record in the metadata.

[#351] Don't put checkpoint extent blocks on the available list, blocks on it are considered for truncation; they have to go on the "checkpoint available" list.

  • Choose LSM merges based on a measure of efficiency (levels collapsed per record), rather than simply choosing a minor or a major merge. Tweak the merge heuristic so we don't end up with runs of smaller chunks in the middle of the tree.
  • Add a connection-wide flag to disable LSM merges.
  • Don't create Bloom filters for the oldest chunk in the system. Add the ability to disable Bloom filters entirely.
  • Fix fast-path for bit values in WT_CURSOR::set_value.
  • Clean up allocation of LSM chunk IDs.
  • Update bloom_get so that it doesn't hold a cursor position.
  • Respect the page size for fixed-length column stores, remembering there are 8 bits per byte.
  • Support bulk loading a bitmap into a fixed-length column store, update Bloom filter code to use this.
  • Add an example program, ex_test_perf, to demonstrate basic LSM usage.
  • Add a new statistics cursor type "statistics:lsm". Update ex_stat.c to demonstrate usage.
  • Add a statistics_fast flag to file statistics cursors. Update LSM statistics so that they aggregate some cache statistics. Add ability to open a statistics cursor on a checkpoint.
  • Walk a constant number of pages for LRU eviction.
  • Move the cache full check to after an update operation completes, when it is no longer holding hazard references. This improves behavior with small caches.

WiredTiger release 1.3.2, 2012-10-03

This is a bugfix and performance tuning release, primarily related to LSM trees. The changes are as follows:

  • Implement minor merges for LSM trees, prefer them to major merges.
  • Update hazard references, so the active array grows as needed. Change the default hazard_max to 1000.
  • Abort transactions if the cache is so full that they cannot make progress.
  • Fix a bug where verify could crash if an empty checkpoint exists.
  • Make the maximum number of chunks for merges configurable, rather than deriving a value from the number of hazard references available.
  • Switch to an atomic add to allocate transaction IDs. This fixes a subtle race where two threads could temporarily have the same ID in the global state table. If one of the threads timed out and the other thread committed its transaction with that ID, the commit would not become visible immediately. This could lead to deadlock errors in workloads that are logically conflict-free.
  • Have auto-commit transactions retry deadlocks. This requires that we keep the user's key and value in the cursor.
  • Simplify the code handling updated records in variable-length column-store reconciliation.
  • Never wait for eviction when holding the schema lock. This avoids deadlocks between opening a column store file and taking a checkpoint.
  • Take care with the loop termination when walking files for eviction. We were making one extra call into __wt_tree_walk, which would leave a leaf page in the WT_REF_EVICT_WALK state, unable to be evicted. In some workloads, including LSM loads, we could end up with many files all consisting of a single leaf page, none of which could be evicted.
  • Pause updates when the cache is full.
  • In files marked as "out of cache", don't wait for eviction when reading a page.
  • Fix the record count calculation for minor merges. This was leading to no Bloom filter being created for minor merges after running for some time, leading to merges taking increasingly long to complete.
  • Only sleep in the LSM checkpoint thread if no work is done.
  • Add sanity check of cache size to LSM open.
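
The atomic-add allocation of transaction IDs mentioned in the list above can be sketched with C11 atomics. This is a minimal illustration, not WiredTiger's code: the counter and function names are hypothetical, and WiredTiger may use different primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global counter: the next transaction ID to hand out. */
static _Atomic uint64_t next_txn_id = 1;

/*
 * Allocate a transaction ID with a single atomic fetch-and-add. Each
 * caller receives the value held before its own increment, so two
 * threads can never be handed the same ID.
 */
static uint64_t
txn_id_alloc(void)
{
	uint64_t id;

	id = atomic_fetch_add(&next_txn_id, 1);
	assert(id != UINT64_MAX);	/* The counter must not wrap. */
	return (id);
}
```

A separate read followed by a write of the counter would leave a window in which two threads read the same value; the single fetch-and-add closes that window.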

[#338] Create fake checkpoints until an object is modified, so that a checkpoint between the cursor create and the bulk load doesn't make it impossible to do a bulk-load on the cursor.

WiredTiger release 1.3.1, 2012-09-25

This is a bugfix release, primarily related to LSM trees. The changes are as follows:

[#309] Implement auto-commit of transactions at the API. As well as ensuring the atomicity of complex operations, this change simplified code that simulated auto-commit internally and fixed a number of bugs.

[#321] Bulk-cursors no longer block checkpoints. We can't write files that are being bulk-loaded, so change checkpoint to create checkpoints in the metadata that, if accessed, look like empty files.

Tighten the requirements for bulk-load: the only thing that can be bulk-loaded now is a newly created tree, not any empty file.

[#329] Add dictionary support to variable-length column store objects. Support large row-store reconciliation dictionaries: add a skiplist as the indexing mechanism.

[#333] Fix a leak of the in-memory transaction log structure and the LSM data source handle.

[#334] Fix a memory leak where a page's replacement address wasn't being freed.

  • Check that LSM trees are not configured as column stores.
  • Fix a race when starting the LSM worker thread. It was possible for the thread to exit immediately if it started fast enough.
  • Two fixes for LSM: one ensures that cursors read from a checkpoint if one is available; the other reduces the number of empty chunks that can be created initially.
  • Fix a bug that disabled Bloom filters.
  • The configure script checks for Python support in SWIG.
  • If a drop operation fails to acquire all of the handle locks it needs, make sure it releases the primary handle lock.
  • Fix a number of other minor bugs and memory leaks.

WiredTiger release 1.3.0, 2012-09-17

This release contains a number of major new features, including:

  • support for LSM trees with Bloom filters;
  • support for hot backups; and
  • support for fast truncation of files.

In addition, there are some critical bug fixes. We recommend that all users upgrade. Here is the full list of changes:

[#143] Implement random record lookups.

[#168] Add support for LSM trees.

[#168] Add support for Bloom filters in LSM trees.

[#198] Handle page-generation wraparound.

[#236] Implement hot backups.

[#244] Index cursors for column-store objects may not be created using the record number as the index key.

[#247] Add a fast-path for WT_SESSION::truncate that avoids reading most data to be deleted.

[#259] Performance hack for cursor open: don't parse the configuration strings for a default value if the application didn't specify a configuration string.

[#262] Disable dump on child cursors: only the top-level cursor is wrapped in a dump cursor.

[#266] Deal with new / dropped indices in __wt_schema_open_index.

[#269] Checkpoint handles must not be open when they are overwritten.

[#271] Add support for a reserved checkpoint name "WiredTigerCheckpoint" that opens the object's last checkpoint.

[#271] Add the ability to access unnamed checkpoints.

[#274] Change cursor.equals to return a standard error value and store the cursor equality result in a separate argument.
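
The calling convention described in [#274] can be sketched as follows. The structure and function here are hypothetical stand-ins, not the WiredTiger API: the point is only that the return value carries success or an error code, while the comparison result travels through a separate output argument.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a cursor: an object name and a position. */
struct ex_cursor {
	const char *uri;	/* Object the cursor was opened on. */
	int positioned;		/* Non-zero if the cursor has a position. */
	long recno;		/* Current position. */
};

/*
 * Return 0 on success or an error code on failure. The comparison
 * result is stored through equalp, so "not equal" can never be
 * mistaken for an error return.
 */
static int
ex_cursor_equals(const struct ex_cursor *a, const struct ex_cursor *b,
    int *equalp)
{
	if (a == NULL || b == NULL || equalp == NULL)
		return (EINVAL);
	if (!a->positioned || !b->positioned)
		return (EINVAL);
	*equalp = strcmp(a->uri, b->uri) == 0 && a->recno == b->recno;
	return (0);
}
```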

[#275] If an exclusive handle is required for an operation and it is not available, fail immediately: don't block.

[#276] Fix methods that return integer parameters from Python. This includes cursor.equals and cursor.search_near.

[#277] Acquire the schema lock when creating the metadata file. We're single-threaded, so it isn't protecting against anything, but the handle management code expects to have the schema lock.

[#279] Some optimizations for __wt_config_gets_defno. Specifically, if we're dealing with a simple stack of config strings, just parse the application string rather than the full list of defaults.

[#279] Split the description string into a set of structures, to reduce the number of string comparisons and manipulation that's required.

[#282] Remove the cursor.reconfigure method, and replace it with documentation showing how to "reconfigure" cursors using the session.open_cursor method to duplicate them with different configuration strings.

[#284] Fix a hazard reference race where page eviction races with the creation of the hazard reference: we have to check the pointer itself as well as the state of the pointer.

[#285] We can clear the tree's modified flag on checkpoint, as long as the checkpoint writes all modifications. Clear the tree's modified flag before we start the checkpoint, but reset it as necessary if reconciliation is unable to write all of the changes in a page.

[#287] Fix __wt_config_check to handle overlapping config values correctly.

[#289] Add support for read-committed isolation, make it the default. Add a session-level "isolation" setting.

[#294] Document that if txn_commit fails, the transaction was rolled back.

[#295] Expand the documentation on using cursors without explicit transactions.

[#300] Include all changes whenever closing a file, don't check for visibility. If updates are skipped while evicting a page, give up.

[#305] Have "wt dump" fail more gracefully if the object doesn't exist.

[#310] When freeing a tracked address in reconciliation, clear it to avoid freeing the same address again on error.

[#314] Replace cursor.equals with

[#319] Clear the bulk_load_ok flag when closing handles.

  • Add an "ancient transaction" statistic so we can find out if they're actually occurring in the field.
  • Add a "was object ever modified" flag to the btree handle, and use it to avoid writing read-only objects during internal checkpoints, issue
  • Add per-connection statistics counters for transaction checkpoint, begin, commit and rollback. Add per-btree statistics counters for update conflicts.
  • Another fixed-length column-store implicit record fix: if the earliest row in the object is row 10, and it's on an append list, we still must return rows 1-9, they've been implicitly created.
  • Bulk cursors: disallow cursor.{equals,next,prev,reset,search,search_near,update,remove}; only close and insert are supported.
  • Change session.truncate to support any cursor position for range truncation, not just keys that are known to exist.
  • Checkpoint has to flush the metadata file, but only after it's flushed all of the other files.
  • Discard obsolete WT_UPDATE structures during updates.
  • Document that duplicated cursors are positioned at the same point as the cursor that was duplicated.
  • Fix a (very unlikely) deadlock at startup, if an application issues a checkpoint before the eviction server has managed to open its session.
  • Fix a core dump if we verify a file that's corrupted such that we are unable to load any checkpoints at all, and the per-checkpoint bit map is never set.
  • If a page selected for eviction cannot be freed because it has some recent updates, try instead to free memory by trimming old updates.
  • If a thread fails to evict a page, try to bump its snapshot. This avoids the common case of read-committed threads getting stuck because one thread falls behind (e.g., because we can't evict during a checkpoint).
  • If an exclusive table create fails, return EEXIST.
  • If we try to remove a file that doesn't exist, don't complain, return success.
  • If we're repeatedly taking a checkpoint with the same name, skip the work for read-only objects.
  • Instead of flagging the empty tree's leaf page empty as part of creating an empty tree in memory, set the page as modified (to force reconciliation); if the leaf page is still empty at that time, we'll figure it out during reconciliation. This fixes a memory leak where the leaf page of an empty tree wasn't being freed.
  • It's not unreasonable to open a cursor on a non-existent table, don't complain, just return not-found.
  • Move dist/RELEASE to the top level of the tree.
  • Optimization: don't repeatedly look up btree handles for schema operations.
  • Return keys from all operations: don't keep pointing to the application's key.
  • Update btree usage of the 64-bit bitstring implementation, so it's cleaner.
  • Update the bitstring implementation to use 64-bit length strings.
  • Updates performed without an active transaction should become visible with the current transaction ID.
  • Upgrade to doxygen 1.8.x
  • Use a real snapshot transaction for checkpoints. Otherwise, the snapshot can be updated in between checkpointing multiple files (when updating the metadata).

WiredTiger release 1.2.2, 2012-06-20

This is a bugfix release. The changes are as follows:

  • Defer making free pages available until the end of a checkpoint, in case there is a failure after processing some files.
  • When checking the value of the "isolation" key, don't assume it is NUL terminated. This bug could cause transactions to run with incorrect isolation.
  • Fix two bugs with snapshot isolation:
    1. reset the isolation level when the transaction completes;
    2. when checking visibility, check item's ID against the maximum snapshot ID (not the transaction's ID).
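
The second snapshot-isolation fix above concerns which bound an item's ID is compared against. The sketch below shows a generic textbook visibility rule for snapshot isolation, with the upper-bound check made against the snapshot's maximum ID. All names are hypothetical and this is not WiredTiger's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot taken when the reader's transaction began. */
struct ex_snapshot {
	uint64_t self_id;		/* The reader's own transaction ID. */
	uint64_t snap_min;		/* Smallest ID running at snapshot time. */
	uint64_t snap_max;		/* First ID not yet allocated at snapshot time. */
	const uint64_t *running;	/* IDs running at snapshot time. */
	size_t running_count;
};

/* Return true if an item written by item_id is visible to the reader. */
static bool
ex_visible(const struct ex_snapshot *s, uint64_t item_id)
{
	size_t i;

	if (item_id == s->self_id)	/* Own changes are always visible. */
		return (true);
	if (item_id >= s->snap_max)	/* Started after the snapshot. */
		return (false);
	if (item_id < s->snap_min)	/* Resolved before the snapshot. */
		return (true);
	for (i = 0; i < s->running_count; ++i)
		if (s->running[i] == item_id)
			return (false);	/* Still running at snapshot time. */
	return (true);
}
```

The upper bound is a property of the snapshot, not of the reader: comparing against the reader's own ID instead would treat IDs between the reader's ID and the snapshot maximum incorrectly.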

WiredTiger release 1.2.1, 2012-06-15

This is a bugfix release. The changes are as follows:

  • Avoid a deadlock between eviction and checkpoint on the connection spinlock.
  • Allocate "desc" buffers in heap memory so that they are correctly aligned (fixes direct_io support on Linux).
  • Initialize the snapshot-avail list after cleaning it out, else we'll try and print a NULL pointer in VERBOSE mode.
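
Direct I/O on Linux typically requires buffer addresses aligned to the device block size, which a stack buffer does not guarantee. The heap-allocation approach in the list above can be sketched with C11 aligned_alloc. The helper name and the 512-byte alignment used in the usage note are assumptions for illustration, not WiredTiger's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: allocate a zeroed heap buffer whose address is a
 * multiple of align (which must be a power of two). C11 aligned_alloc
 * expects the size to be a multiple of the alignment, so the size is
 * rounded up first. Release the buffer with free().
 */
static void *
ex_alloc_aligned(size_t size, size_t align)
{
	size_t rounded;
	void *p;

	assert(align != 0 && (align & (align - 1)) == 0);
	rounded = (size + align - 1) & ~(align - 1);
	if ((p = aligned_alloc(align, rounded)) != NULL)
		memset(p, 0, rounded);
	return (p);
}
```

For example, ex_alloc_aligned(100, 512) returns a 512-byte buffer whose address is a multiple of 512, or NULL if the allocation fails.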

WiredTiger release 1.2.0, 2012-06-04

This release contains many bugfixes and improvements. The major changes are:

[#138] Add support for transactions with coarse-grained durability. Transactions provide atomicity guarantees and rollback, and uncommitted changes are never written to disk. There is no on-disk log, so committed changes only become durable when the next checkpoint completes. Checkpoints are implemented by creating transactionally-consistent snapshots within data files.

[#156] Fully support operations that make schema changes with multiple sessions open concurrently.

[#159] Disable internal page key suffix compression if a custom collator is configured. This avoids issues with collators that require complete keys.

[#167] Add support for durable snapshots within files. While a snapshot is active, the pages used by the snapshot will not be overwritten. If a file is accessed after a crash or application exit without calling WT_CONNECTION::close, any changes made after the last snapshot will be silently ignored.

[#214, #216] Fixes for forcing eviction with small caches.

WiredTiger release 1.1.5, 2012-04-26

Don't update a WT_REF after it has been unlocked.

Add an operation to set a flag atomically, use it to avoid racing on page flags.
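
An atomic flag operation of this kind can be sketched with C11 atomics. A plain "flags |= mask" is a read-modify-write sequence, so two threads updating different bits can lose one of the updates; an atomic OR or AND does the whole update in one step. The flag names and helpers below are hypothetical, and WiredTiger may use different primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define	EX_PAGE_FLAG_A	0x01u	/* Hypothetical flag names. */
#define	EX_PAGE_FLAG_B	0x02u

/* Atomically set the bits in mask. */
static void
ex_flag_set(_Atomic uint32_t *flags, uint32_t mask)
{
	atomic_fetch_or(flags, mask);
}

/* Atomically clear the bits in mask. */
static void
ex_flag_clear(_Atomic uint32_t *flags, uint32_t mask)
{
	atomic_fetch_and(flags, ~mask);
}

/* Return non-zero if any bit in mask is currently set. */
static int
ex_flag_isset(_Atomic uint32_t *flags, uint32_t mask)
{
	return ((atomic_load(flags) & mask) != 0);
}
```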

Fix a race between sync and reading that could cause a segfault.

WiredTiger release 1.1.4, 2012-04-16

Check the versions of autoconf, automake and libtool to avoid failures when trying to build from the github tree with versions that are too old.

[#191] Create the schema table as part of creating the environment so that application threads don't race trying to create it later.

[#193] Split-merge pages have to be reconciled to mark their parents dirty.

[#194] The dump utility should only output configuration that can be passed to WT_SESSION::create.

Eviction fixes for out-of-cache update workloads:

  • Fix an unlikely bug where the EVICT_LRU flag was cleared when a page in the LRU queue was overwritten with itself during a walk. This led to an assertion failure when the page was later evicted.
  • Clear all unused eviction queue entries while holding the lru_lock.
  • Split WT_PAGE->flags so that there is no possibility of racing: (1) Move WT_PAGE_REC_* flags into WT_PAGE_MODIFY; (2) Use atomic operations to set and clear the remaining two page flags.

Move the test/format threads setting into the CONFIG file.

WiredTiger release 1.1.3, 2012-04-04

Fix the "exclusive" config for WT_SESSION::create. [#181]

  1. Make it work for files within a single session.
  2. Make it work for files across sessions.
  3. Make other data sources consistent with files.

Fix an eviction bug introduced in 1.1.2: when evicting a page with children, remove the children from the LRU eviction queue. Reduce the impact of clearing a page from the LRU queue by marking pages on the queue with a flag (WT_PAGE_EVICT_LRU).

During an eviction walk, pin pages up to the root so there is no need to spin when attempting to lock a parent page. Use the EVICT_LRU page flag to avoid putting a page on the LRU queue multiple times.

Layer dump cursors on top of any cursor type.

Add a section on replacing the default system memory allocator to the tuning page.

Fix a typo in the usage method for "wt write".

Don't report range errors for config values that aren't well-formed integers.

WiredTiger release 1.1.2, 2012-03-20

Add public-domain copyright notices to the extension code.

test/format can now run multi-threaded. Fixed two bugs it found: (1) When iterating backwards through a skiplist, we could race with an insert. (2) If eviction fails for a page, we have to assume that eviction has unlocked the reference.

Scan row-store leaf pages twice when reading to reduce the overhead of the index array.

Eviction race fixes: (1) Call __rec_review with WT_REFs: don't look at the page until we've checked the state. (2) Clear the eviction point if we hit it when discarding a child page, not just the parent.

Eviction tuning changes, particularly for read-only, out-of-cache workloads.

Only notify the eviction server if an application thread doesn't find any pages to evict, and then only once.

Only spin on the LRU lock if there might be pages in the LRU queue to evict.

Keep the current eviction point in memory and make the eviction walk run concurrent with LRU eviction.

Every test now has err/out captured, and it is checked to ensure it is empty at the end of every test.

WiredTiger release 1.1.1, 2012-03-12

Default to a verbose build (that can be switched off by running configure --enable-silent-rules).

Account for all memory allocated when reading a page into cache. Total memory usage is now much closer to the cache size when using many small keys and values.

Have application threads trigger a retry of forced page eviction rather than blocking eviction. This allows rec_evict.c to simply set the WT_REF state to WT_REF_MEM after all failures, and fixes a bug where pages on the forced eviction queue would end up with state WT_REF_MEM, meaning they could be chosen for eviction multiple times.

Grow existing scratch buffers in preference to allocating new ones.

Fix a race between threads reading in and then modifying a page.

Get rid of the pinned flag: it is no longer used.

Fix a race where btree files weren't completely closed before they could be re-opened. This behavior can be triggered by using a new session on every operation (see the new -S flag to the test/thread program). [#178]

When connections are closed, create a session and discard the btree handles. This fixes a long-standing bug in closing a connection: if for any reason there are btree handles still open, we need a real session handle to close them.

Really close btree handles: otherwise we can't safely remove or rename them. Fixes test failures in test_base02 (among others).

Wait for application threads in LRU eviction to drain before walking a file.

Fix a buffer size calculation when updating the root address of a file.

Documentation fix: 10% of 1MB is 100KB.

WiredTiger release 1.1.0, 2012-02-28

Add checks to the session.truncate method to ensure the start/stop cursors reference the same object and have been initialized.

Implement cursor duplication via WT_SESSION::open_cursor. [#161]

Switch to quiet builds by default.

Fix builds with automake versions < 1.11: use foreign mode so that fewer top-level files are required.

If a session or connection method is about to return WT_NOTFOUND (some underlying object was not found), map it to ENOENT: only cursor methods return WT_NOTFOUND. [#163]
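
The error mapping in [#163] can be sketched as a small wrapper applied to a method's return value. The constant value, the function name and the is_cursor_method argument are all hypothetical; the value used here is an arbitrary placeholder, not the library's actual WT_NOTFOUND.

```c
#include <assert.h>
#include <errno.h>

/* Arbitrary placeholder for illustration, not the library's constant. */
#define	EX_NOTFOUND	(-1000)

/*
 * Hypothetical filter for API return values: session and connection
 * methods report a missing object as ENOENT, while cursor methods keep
 * the library-specific not-found code. All other values pass through.
 */
static int
ex_api_return(int ret, int is_cursor_method)
{
	if (ret == EX_NOTFOUND && !is_cursor_method)
		return (ENOENT);
	return (ret);
}
```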

Save and restore session->btree in schema ops to simplify calling code. [#164]

Note the wiredtiger_open config string "multiprocess" is not yet supported.

Move "root:F" and "version:F" entries for files into the value for "file:F", so there is only a single record per file. [NOTE: SCHEMA CHANGE]

When parsing config strings, continue to the end of the string in case of repeated keys. [#124]
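
The repeated-key behavior in [#124] can be sketched with a lookup that never stops at the first match. This is a simplified illustration that assumes a flat "key=value,key=value" string with no nesting or quoting, and that the last occurrence of a key takes precedence; the function name is hypothetical and this is not WiredTiger's parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical lookup over a flat "key=value,key=value" string. The
 * scan always runs to the end of the string, so when a key is repeated
 * the last occurrence wins. On success, *valp points into config (the
 * value is not NUL-terminated) and *lenp is the value's length.
 * Returns 0 if the key was found, -1 otherwise.
 */
static int
ex_config_get(const char *config, const char *key,
    const char **valp, size_t *lenp)
{
	const char *end, *eq, *p;
	size_t klen;
	int found;

	klen = strlen(key);
	found = -1;
	p = config;
	while (*p != '\0') {
		/* Find the end of this comma-separated entry. */
		end = p + strcspn(p, ",");
		eq = memchr(p, '=', (size_t)(end - p));
		if (eq != NULL && (size_t)(eq - p) == klen &&
		    memcmp(p, key, klen) == 0) {
			*valp = eq + 1;
			*lenp = (size_t)(end - (eq + 1));
			found = 0;	/* Keep scanning: a later entry overrides. */
		}
		/* Step past the comma, or stop at the terminating NUL. */
		p = (*end == ',') ? end + 1 : end;
	}
	return (found);
}
```

Returning at the first match would be slightly faster, but it would silently ignore a later entry that repeats the key with a different value.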

Don't require shared libraries unless Python is configured.

Add support for direct I/O, with the config "direct_io=(data,log)". Build with _GNU_SOURCE on Linux to enable O_DIRECT.

Don't keep the last page of column stores pinned: it prevented eviction of large trees created from scratch.

Allow application threads to evict pages from any tree: maintain a count of threads doing LRU in each tree and wait for activity to drain when closing.