Dolt v1.57.1
Name Modified Size
dolt-windows-amd64.msi 2025-07-22 44.0 MB
install.sh 2025-07-22 3.2 kB
dolt-windows-amd64.7z 2025-07-22 22.0 MB
dolt-windows-amd64.zip 2025-07-22 38.4 MB
dolt-darwin-arm64.tar.gz 2025-07-22 40.4 MB
dolt-darwin-amd64.tar.gz 2025-07-22 42.3 MB
dolt-linux-arm64.tar.gz 2025-07-22 39.7 MB
dolt-linux-amd64.tar.gz 2025-07-22 42.3 MB
1.57.1 source code.tar.gz 2025-07-22 12.3 MB
1.57.1 source code.zip 2025-07-22 14.1 MB
README.md 2025-07-22 19.7 kB
Totals: 11 items, 295.5 MB

Merged PRs

dolt

  • 9557: go: statspro: Ensure we Close() the DoltDB used for stats storage when we rotate storage. Dolt previously leaked a file descriptor on certain platforms every time stats was garbage collected. By closing the DoltDB, we no longer leak the descriptor. Closing is currently best effort, since certain in-flight file operations can actually cause closing to fail.
  • 9547: Fix off-by-one error in deciding whether to match keys or prefixes. A -1 was causing us to try to match full keys when only a prefix was available, which led to a panic when comparing tuples. The panic was exposed by dolthub/go-mysql-server#3099 because that change caused us to pick lookup joins more frequently.
  • 9542: Dolt ci view: Adds dolt ci view. Right now it displays the YAML file for the associated workflow when used via dolt ci view <workflow name>. It can also display individual jobs via the --job option.
  • 9528: proto/third_party,go.mod: Bump grpc-go, protobuf and protobuf-go. Regenerate go/gen/proto.
  • 9514: [#9508] - Fix dolt_log table function to support bind variables in prepared statements. Fixes [#9508]. Implemented deferred argument parsing to allow proper bind variable handling during the prepare phase.
  • 9490: Unskip feature version bats test that now passes. It looks like someone has fixed this feature version issue, and Codex identified it: https://github.com/dolthub/dolt/issues/6303
  • 9484: Fix sql-diff skipped tests
  • 9463: Fixed skipped bats test for fetching a tag ref
  • 9460: Implement the @@read_only system variable. This fixes [#9176]. The system variable is set when the engine is created; there is still no way to change the @@read_only flag while the server is running.
  • 9300: Implement tree-based merging
## High Level Concept
We want to optimize merges by taking into account the shape of the prolly tree. In many cases we can assemble a merged tree out of nodes from either branch and avoid unnecessarily recursing into the leaf nodes. But for that to work, the diff and merge operations need to be able to operate on internal tree nodes, not just on leaf key-value pairs. This project contains several independent-but-related changes to make that possible. The three-way merge process has three distinct stages:
  • The Two-Way Differ: compares two trees and produces an ordered stream of diffs describing the changes between them.
  • The Three-Way Differ: Takes two two-way differs, each of which compares one of the branches with their common ancestor. Produces an ordered stream of patches, describing changes that must be applied to the ours tree in order to produce the merged tree.
  • The Three-Way Merger: Applies the patches from the previous stage.
This PR modifies each of these stages to make tree-based diffing possible.
## Part 1: Two-Way Range Diffs
Relevant commit: [10e6cb]
If an internal tree node is added on one branch and has no conflicting changes on the other branch, that node will appear in the merged tree. We want to be able to treat that entire node as a single change, represented as a single Diff object. This diff object is a new type of Diff called a RangeDiff: a diff that represents a change to a range of keys. In addition to the Key field that all diffs have, a range diff also has a PreviousKey field, which contains the key that immediately precedes the beginning of the range. The diff thus represents changes on the range (PreviousKey, Key], where the lower bound is open and the upper bound is closed. The To value of this diff is the hash address of the added tree node, or nil if all keys in that range were deleted. Note that if the address is not nil, it is not possible to determine whether the diff contains an addition, a modification, or a deletion without loading the referenced node; in fact, a range diff may contain all of the above within its range. But in many cases loading the node is not necessary for the merge. Range diffs make it possible for a differ to produce diffs based on an intermediate level of the tree without loading the levels beneath it. For non-leaf levels of the tree, all produced diffs have a DiffType of RangeDiff. By default, differs do not produce range diffs. The newly added function RangeDifferFromRoots returns a differ that can return range diffs. The user of this "Range Differ" can call two different methods to get the next diff:
  • Next(): returns the next diff, which may be at the same tree level as the previous diff, or a higher level.
  • split(): recurses one level lower into the tree, returning a diff whose range is within the range of the previous diff. (If this causes the differ to reach level 0, it will start producing the standard AddedDiff, ModifiedDiff, and RemovedDiff values.)
When a call to Next() hits a chunk boundary, it returns to the previous tree level. This results in the following invariant:
  • If you list every node returned by the differ except for the nodes that preceded a call to split(), then none of those nodes have overlapping ranges, they appear in order, and they collectively describe every change between the two trees.
As a requirement for implementing this, I had to make some changes to the AddrDiff and JsonDiff types. These were built on top of differs, but directly accessed Differ internals and made assumptions about their behavior. Changing the internal behavior broke these types, so I changed them to no longer rely on internal differ state for correctness.
## Part 2: Producing Three-Way Diffs (Range Patches)
Relevant commit: [02a242]
Part 1 only describes two-way diffing. We want to use it to achieve optimal three-way diffing and merging. Dolt currently has two different algorithms for performing three-way merging:
  • The simpler ThreeWayMerge method, which takes two trees, and diffs them both against a common base tree. We use this for merging simple internal tree-based structures like artifact maps and commit closures.
  • The more complicated ThreeWayDiffer type, which is an iterator that produces ThreeWayDiff values. ThreeWayDiffs contain additional information that is necessary to detect constraint violations, update artifact maps, etc. We use this for merging table data.
As of https://github.com/dolthub/dolt/pull/9229, both approaches write a stream of Patch values to a channel, where a Patch is a key-value pair describing a change that must be made to the left branch in order to produce the merged tree. The ThreeWayDiffer type does extra work before it produces these patches, but in both cases the patches are consumed and applied the same way. In order to merge optimally, we need to allow Patches to represent a range, just as Part 1 allows diffs to represent a range. I accomplished this by allowing a patch to wrap a diff. If the table being merged has constraints, we have to look at every modified row to ensure that the merge doesn't violate them. So for now, we only allow range patches when there are no constraints to violate, in which case we can safely use the simpler ThreeWayMerge approach. We also currently only do this if there are no secondary indexes on the table, although we should be able to relax this in the future. The new version of ThreeWayMerge uses the new range differs to produce the smallest set of diffs that describe the merged tree: if a range diff produced by one branch does not overlap with any of the diffs produced by the other branch, we can pass that entire range as a single Patch. If the two branches have overlapping range diffs, we must call split() on the differs to produce smaller range diffs that don't overlap.
## Part 3: Apply Range Patches via the Chunker
Relevant commits:
  • [96889a]
  • [ffdce3]
Once we produce the stream of patches, we need to apply them to the tree. This is the most straightforward part of the PR: we change the chunker API to allow it to take a node address and write all of that node's rows into the new table. If the chunker is currently at a chunk boundary, it can write the address directly into the new tree without loading the node. Otherwise, it loads the node and recursively writes its children into the new tree.
## Putting it all together during Merge
Relevant commit: [083fdb]
This commit contains the changes to the table merger, built on top of all the previous changes. It checks whether the table merge meets the current limitations for tree-based merging, picks which algorithm to use to produce the stream of patches, and then uses ApplyMutations to apply those patches to the ours tree, producing the merged tree.
## Impact
The impact is best seen when the two branches make changes to completely separate regions of the key space, for example if every key modified by branch A is less than every key modified by branch B. This isn't an unreasonable use case: imagine a bulk import job getting merged back into a main branch, where all the imported keys are contiguous. I created a benchmark using a table with a single int column:
  • Ancestor table: empty
  • Left branch table: contains the values 0 <= pk < 1,000,000
  • Right branch table: contains the values 1,000,000 <= pk < 2,000,000
The SerialMessage flatbuffer uses 8 bytes to store each row, so each branch contains around 8 MB of new data. Prior to this PR, calling dolt_merge took 1.38 seconds on my laptop, regardless of the direction of the merge. After this PR, it took 0.012 seconds, again regardless of direction. That's roughly a 100x speedup for this example. The real gain is asymptotic: previously, merge time was O(N+M), where N and M are the number of changed rows on each branch.
The new time complexity is harder to measure, since it depends on the number of contiguous regions that are modified by one branch but not the other, and also on the height of the tree, but it's approximately O(log(N+M)) in the best case, and no worse than the original algorithm even in the worst case. I believe that we should always prefer this approach over the original implementation. The original implementation cares about the "direction" of the merge, which can affect performance, while this implementation should have the same performance characteristics regardless of merge direction. This is not immediately obvious given that both approaches pick one side of the merge and apply "patches" from the other, but it's still true: the tree-aware approach to merging should achieve the theoretical minimum number of loads and comparisons to complete the merge, regardless of merge direction. In the worst case, this approach should have performance comparable to the original, although I'll be doing more benchmarks before submitting to verify this. The most likely way this could cause a worst-case performance regression is that the objects we use to encode a tree patch are now larger, and these objects get sent through channels, which may have observable performance implications: see https://www.jtolio.com/2016/03/go-channels-are-bad-and-you-should-feel-bad/ for more details. If this turns out to be the case, we can optimize the Patch struct before submitting.
## Consequences / Caveats
One unintended consequence of this change is that it may not be possible to accurately calculate merge stats using this method. Merging a table produces a MergeStats struct that contains a count of the number of rows added/modified/deleted. But this information can only be computed by visiting every row that has been changed, which this approach no longer does.
If we skip recursing into a node because we see that node has only changed on one branch, we can't know the exact number of modifications. However, if you try the above command line example, you'll notice that it correctly reports the number of added rows. You'll also notice that it's significantly slower than calling DOLT_MERGE. This is because the MergeStats produced by merging the table is actually ignored. Instead, the dolt merge CLI makes a subsequent select from the dolt_diff_stats system table function in order to get the number of modified rows, and this system table function does a conventional diff. Thus, the fact that we're not calculating correct merge stats during the merge itself may not matter. We do have some tests for it, which are breaking. I'm going to discuss with Neil how we want to handle this.
## Limitations
Currently, this PR only produces RangeDiffs under the following narrow circumstances:
  • The table is not keyless
  • The table does not have secondary indexes
  • The table does not have any constraints
  • The merge does not introduce schema changes
We should be able to relax these constraints if needed. For example:
  • It should be possible to merge secondary indexes using the same algorithm we use here to merge primary indexes.
  • We need to ensure that the merge doesn't produce a table that violates constraints, but this doesn't necessarily require examining every leaf node: depending on the exact constraints, which branch adds a constraint, and which branch adds rows that potentially violate constraints, we may be able to validate the table without having to process every modified row.
## Room for Improvement
The most obvious room for improvement is relaxing the limitations under which we can safely produce and apply range diffs. Another possible improvement: the current design for how the three-way differ uses the underlying two-way differs is interactive. After getting a two-way diff, the three-way differ has to decide whether to call split() to recurse into a lower level of the tree, or call Next() to accept the provided diff and move on. This design is inherently non-parallelizable. An alternate approach may allow each two-way differ to run in its own goroutine and produce diffs in parallel, and then choose which ones to split separately.
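The (PreviousKey, Key] convention from Part 1 of #9300 and the overlap rule from Part 2 can be made concrete with a small Go sketch. This is not Dolt's code: `span`, `decide`, and `orderedAndDisjoint` are hypothetical, and keys are plain ints rather than the tuple keys Dolt actually compares. It shows when a range from one branch could pass through as a single patch and when both differs would have to split().

```go
package main

import "fmt"

// span is a hypothetical stand-in for the key range of a RangeDiff:
// it covers (PreviousKey, Key], open below and closed above.
// Integer keys are an illustrative simplification.
type span struct {
	PreviousKey int
	Key         int
}

// contains reports whether k lies in (PreviousKey, Key].
func (s span) contains(k int) bool {
	return s.PreviousKey < k && k <= s.Key
}

// overlaps reports whether two spans share at least one key.
// (a, b] and (c, d] intersect exactly when a < d and c < b.
func (s span) overlaps(o span) bool {
	return s.PreviousKey < o.Key && o.PreviousKey < s.Key
}

// decide mirrors the rule described for the new ThreeWayMerge: a range diff
// that overlaps nothing on the other branch can be emitted as one patch.
func decide(s span, other []span) string {
	for _, o := range other {
		if s.overlaps(o) {
			return "split"
		}
	}
	return "single patch"
}

// orderedAndDisjoint checks the Range Differ invariant on a list of spans:
// each span starts at or after the end of the one before it.
func orderedAndDisjoint(spans []span) bool {
	for i := 1; i < len(spans); i++ {
		if spans[i].PreviousKey < spans[i-1].Key {
			return false
		}
	}
	return true
}

func main() {
	// Key layout from the Impact benchmark, each branch treated as one span:
	// left adds 0..999,999 and right adds 1,000,000..1,999,999.
	// PreviousKey -1 stands in for "no previous key".
	left := span{PreviousKey: -1, Key: 999_999}
	right := span{PreviousKey: 999_999, Key: 1_999_999}

	fmt.Println(left.contains(0), left.contains(1_000_000))                    // true false
	fmt.Println(decide(left, []span{right}))                                   // single patch
	fmt.Println(decide(span{PreviousKey: 500, Key: 1_500_000}, []span{right})) // split
	fmt.Println(orderedAndDisjoint([]span{left, right}))                       // true
}
```

Because the lower bound is open, adjacent ranges such as the benchmark's two halves touch without overlapping, which is exactly the case where no split() is needed.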
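The chunker rule from Part 3 of #9300 (reuse a node's address when at a chunk boundary, otherwise load it and write its contents recursively) can be sketched as follows. The `node` and `builder` types are hypothetical toys, not Dolt's chunker API, and the builder uses fixed-size chunks where Dolt's prolly trees use content-defined boundaries.

```go
package main

import "fmt"

// node is a hypothetical tree node: leaves hold rows, internal nodes hold children.
type node struct {
	addr     string
	rows     []int
	children []*node
}

// builder is a hypothetical stand-in for the chunker. It emits either a
// reference to an existing node (reused without loading it) or a freshly
// built chunk of rows.
type builder struct {
	chunkSize int      // fixed-size boundaries; prolly trees use content-defined ones
	pending   []int    // rows written since the last chunk boundary
	out       []string // emitted chunks, in key order
	loads     int      // nodes whose contents had to be read
}

func (b *builder) atBoundary() bool { return len(b.pending) == 0 }

// addRow buffers one row and closes a chunk once chunkSize rows are pending.
func (b *builder) addRow(r int) {
	b.pending = append(b.pending, r)
	if len(b.pending) == b.chunkSize {
		b.out = append(b.out, fmt.Sprint(b.pending))
		b.pending = nil
	}
}

// addNode writes every row under n. At a chunk boundary the node is reused by
// address; otherwise it is loaded and its children (or rows) are written.
func (b *builder) addNode(n *node) {
	if b.atBoundary() {
		b.out = append(b.out, "ref:"+n.addr)
		return
	}
	b.loads++
	if len(n.children) == 0 {
		for _, r := range n.rows {
			b.addRow(r)
		}
		return
	}
	for _, c := range n.children {
		b.addNode(c)
	}
}

func main() {
	// Each leaf holds exactly chunkSize rows, so a reused leaf also ends on a boundary.
	leafA := &node{addr: "a", rows: []int{1, 2, 3}}
	leafB := &node{addr: "b", rows: []int{4, 5, 6}}
	parent := &node{addr: "p", children: []*node{leafA, leafB}}

	aligned := &builder{chunkSize: 3}
	aligned.addNode(parent)
	fmt.Println(aligned.out, aligned.loads) // [ref:p] 0

	shifted := &builder{chunkSize: 3}
	shifted.addRow(0) // one extra row leaves the builder mid-chunk
	shifted.addNode(parent)
	fmt.Println(shifted.out, shifted.pending, shifted.loads) // [[0 1 2] [3 4 5]] [6] 3
}
```

The aligned case writes the whole subtree without loading anything, while the shifted case has to read all three nodes. That difference is the source of the speedup in the benchmark above.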

go-mysql-server

  • 3110: Use child column IDs for union when assigning exec indexes. Fixes [#9516]. SetOp is a TableIdNode, so ColumnIds were assigned based on SetOp.cols, which is a ColSet. However, ColSet stores ColumnIds as a bitmap, not in the order they appear, and iterates over them in increasing numerical order. This causes a problem when the ColumnIds are not arranged in increasing order. For example, in (select 'parent' as tbl, id, words from t union select 'child' as tbl, id, words from t2) as combined, the ColumnIds are 3, 1, 2, but ColSet iterates over them as 1, 2, 3. As a result, combined.id is wrongly assigned the field index 0, the wrong column is compared in the filter, and an empty result is returned. This was fixed by adding a case for SetOp where ColumnIds are assigned based on the left child (a Project node in the above example). Added TODOs:
  • It may not be necessary for SetOp to be a TableIdNode. It seems kinda hacky that it is.
  • ColumnIds for TableIdNode probably shouldn't be assigned based on ColSet.ForEach (increasing order) since that might not reflect the actual order they are in.
  • create table as select... currently failing in Doltgres (dolthub/doltgresql#1669)
  • 3107: change Compute method signature to return error separately for window functions
  • 3106: adding skipped tests for time types and foreign keys. Similar to decimal types, MySQL allows time types with differing precisions to have foreign key constraints, but zero padding prevents anything from being inserted. Additionally, timestamps and datetimes can have foreign key constraints referencing each other.
  • 3105: support set returning function cases in table function wrapper
  • 3104: [#9519] - Fix SQL syntax error for mixed named columns and * in SELECT Fixes [#9519]
  • Validates SELECT expressions in the existing processing loop
  • Rejects named expressions before unqualified *
  • Allows qualified table.* in any position
  • Allows expressions after * (e.g., SELECT *, column)
  • 3102: [#9494] - Clean up mixed string foreign key logic. Fixes [#9494]. Added mixed string type support: modified the else branch in foreignKeyComparableTypes to check for compatible string types instead of just returning false.
  • 3101: [#9472] - Fix SET column foreign key constraints, fix specific errors. Fixes [#9472]
  • Enabled previously skipped SET foreign key tests
  • Fix SET type compatibility checking in foreign key validation
  • Handle SET conversion errors appropriately during foreign key checks
  • 3100: add skipped tests for auto_increment and max integer. auto_increment breaks integer limits on Dolt: instead of throwing an error when attempting to insert a value greater than the max of that type, a column with the auto_increment constraint just inserts the max value. It also results in incorrect values in the AUTO_INCREMENT= table option when running show create table ...
  • 3097: test aggregates over indexes with false filter
  • 3096: Allow empty strings in SET string conversions. Part of [#9468]; skips a test case related to [#9510].
  • 3094: [#9496] - Fix DECIMAL foreign key constraint validation to match MySQL behavior. Fixes [#9496]. Allow DECIMAL foreign key creation with different precision/scale, but enforce strict constraint validation: MySQL allows DECIMAL foreign keys with different precision/scale but rejects constraint violations based on exact scale matching.
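The #3110 bug comes down to a bitmap forgetting insertion order. In this Go sketch, `colSet` is a hypothetical toy that, like the ColSet described above, stores column IDs as bits and iterates them in increasing order; `indexesFromSet` reproduces the faulty index assignment and `indexesFromChild` the fix of following the left child's column order. The IDs 3, 1, 2 come from the PR's example, assumed to belong to tbl, id, and words in that order.

```go
package main

import (
	"fmt"
	"math/bits"
)

// colSet is a hypothetical bitmap of column IDs (IDs must be < 64 here).
// Like the ColSet described in #3110, it records membership, not order.
type colSet uint64

func (s *colSet) add(id uint) { *s |= 1 << id }

// forEach visits IDs in increasing numerical order.
func (s colSet) forEach(f func(id uint)) {
	for s != 0 {
		f(uint(bits.TrailingZeros64(uint64(s))))
		s &= s - 1 // clear the lowest set bit
	}
}

// indexesFromSet numbers columns in bitmap order: the buggy behavior.
func indexesFromSet(ids []uint) map[uint]int {
	var s colSet
	for _, id := range ids {
		s.add(id)
	}
	idx := make(map[uint]int)
	next := 0
	s.forEach(func(id uint) {
		idx[id] = next
		next++
	})
	return idx
}

// indexesFromChild numbers columns in the left child's projection order: the fix.
func indexesFromChild(ids []uint) map[uint]int {
	idx := make(map[uint]int)
	for i, id := range ids {
		idx[id] = i
	}
	return idx
}

func main() {
	// Column IDs for (tbl, id, words), as given in the PR example.
	ids := []uint{3, 1, 2}
	fmt.Println(indexesFromSet(ids)[1])   // 0: id wrongly gets tbl's position
	fmt.Println(indexesFromChild(ids)[1]) // 1: id is the second field
}
```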

Closed Issues

  • 9516: Multi-db union can't be filtered
  • 9508: Can't prepare dolt_log table function statement
  • 9519: throw syntax error when * is combined with named columns in select clause
  • 9494: Mixed String Type Foreign Keys
  • 9472: Foreign Key constraints over SET columns behavior differs from MySQL
  • 9176: @@read_only not true when server started with --read-only flag.
  • 9496: DECIMALs with foreign key behavior differs from MySQL
Source: README.md, updated 2025-07-22