coopy Code

Brought to you by: eshuy

Tree [66e048] master /

File	Date	Author	Commit
bindings	2012-01-21	Paul Fitzpatrick	[593e80] general cleanup
conf	2012-03-10	Paul Fitzpatrick	[4bf310] version bump
doc	2012-02-29	Paul Fitzpatrick	[30e766] no need to match all rows in first pass, just t...
packaging	2011-11-10	Paul Fitzpatrick	[4617a9] doc update
scripts	2012-03-06	Paul Fitzpatrick	[cb4356] hilite diff: test column reordering
src	2012-03-10	Paul Fitzpatrick	[66e048] fix js harness to report row mode
tests	2012-03-06	Paul Fitzpatrick	[cb4356] hilite diff: test column reordering
BUILD.txt	2012-03-01	Paul Fitzpatrick	[1e84b4] update mingw cross-compile instructions
CMakeLists.txt	2012-02-13	Paul Fitzpatrick	[9919f9] factor out options
COPYING.txt	2011-05-07	Paul Fitzpatrick	[f1070b] follow quoting spec
ChangeLog	2012-02-13	Paul Fitzpatrick	[622a6a] bump version to 0.6.0
CoopyGuide.pdf	2012-03-01	Paul Fitzpatrick	[50f795] update the guide to 0.6.2
GPL.txt	2010-09-18	Paul Fitzpatrick	[42878c] Merge branch 'master' of ssh://coopy.git.source...
README.md	2011-05-22	Paul Fitzpatrick	[a44af7] rename readme for github
SERVE.txt	2011-05-07	Paul Fitzpatrick	[f1070b] follow quoting spec
autogen.sh	2010-09-30	Paul Fitzpatrick	[d1f4c8] shuffle gui code around

Read Me

The COOPY toolbox

Diffing, patching, merging, and revision control for spreadsheets and
databases. Focused on keeping data in sync across different
technologies (e.g. a MySQL table and an Excel spreadsheet).

See BUILD.txt for information on building the programs.
Summary: CMake
See SERVE.txt for server-side information.
Summary: fossil
See COPYING.txt for copyright and license information.
Summary: GPL. Relicensing of library core planned for version 1.0.

Example uses

Enumerating differences between any pairwise combination of CSV files,
database tables, or spreadsheets.
Applying changes to a database or spreadsheet, without losing
meta-data (formatting of spreadsheet, indexing/type information for
database). Particularly useful for applying changes made in an
exported CSV file back to the original source.
Editing a MySQL/SQLite database in Gnumeric/OpenOffice/Excel/...
Distributed editing of a spreadsheet/database using a DVCS.
Benefits: revision history, offline editing in tool of choice,
self-hosting possible.

The main programs

ssdiff - generate diffs for spreadsheets and databases.
sspatch - apply patches to spreadsheets and databases.
ssmerge - merge tables with a common ancestor.
ssfossil - the fossil DVCS, modified to use tabular diffs
rather than line-based diffs.
coopy - a graphical interface to ssfossil.
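
To make "merge tables with a common ancestor" concrete, here is a minimal Python sketch of a cell-level three-way merge. It is an illustration only, not the ssmerge implementation: the function names and the dict-based table layout are hypothetical, and it assumes rows are already matched by key and that no rows or columns were added or removed.

```python
def merge_cell(ancestor, ours, theirs):
    """Three-way merge of one cell; returns (value, conflict_flag)."""
    if ours == theirs:
        return ours, False      # both sides agree (or neither changed)
    if ours == ancestor:
        return theirs, False    # only "theirs" changed
    if theirs == ancestor:
        return ours, False      # only "ours" changed
    return ours, True           # both changed differently: conflict


def merge_tables(ancestor, ours, theirs):
    """Merge tables given as {key: {column: value}} dicts.

    Assumption for this sketch: all three tables share the same
    keys and columns, so only cell values can differ.
    """
    merged, conflicts = {}, []
    for key, base_row in ancestor.items():
        merged[key] = {}
        for col, base in base_row.items():
            value, conflict = merge_cell(base, ours[key][col], theirs[key][col])
            merged[key][col] = value
            if conflict:
                conflicts.append((key, col))
    return merged, conflicts


ancestor = {1: {"name": "Ann", "city": "Oslo"}}
ours = {1: {"name": "Anne", "city": "Oslo"}}
theirs = {1: {"name": "Ann", "city": "Bergen"}}
merged, conflicts = merge_tables(ancestor, ours, theirs)
```

Each side changed a different cell, so both edits survive and no conflict is reported. A cell changed differently on both sides would be listed in `conflicts`.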

Supported data formats

CSV (comma separated values)
SSV (semicolon separated values)
TSV (tab separated values)
Excel formats (via gnumeric's libspreadsheet)
Other spreadsheet formats (via gnumeric's libspreadsheet)
SQLite
MySQL
Microsoft Access format (via mdbtools - READ ONLY)
A JSON representation of tables.
A custom "CSVS" format that is a minimal extension of CSV
to handle multiple sheets in a single file, allow
for unambiguous header rows, and have a clear representation
of NULL.
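
The three delimiter-separated formats above differ only in the separator character. As a rough illustration (this uses Python's standard csv module, not COOPY's own readers, and the `read_table` helper is hypothetical), one reader can cover all three:

```python
import csv
import io

# Separator for each delimiter-separated format named in the list above.
DELIMITERS = {"csv": ",", "ssv": ";", "tsv": "\t"}


def read_table(text, fmt):
    """Parse delimiter-separated text into a list of rows (lists of strings)."""
    reader = csv.reader(io.StringIO(text), delimiter=DELIMITERS[fmt])
    return [row for row in reader]


rows = read_table("id;name\n1;Ann\n", "ssv")
```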

Supported diff formats

TDIFF (format developed with Joe Panico of diffkit.org)
DTBL (CSV-compatible format, COOPY specific, may be dropped)
SQL (SQLite flavor)

Features

By default, when comparing tables, no initial assumption is
made about schema similarity. Column names are not required
to exist, or to be preserved between tables. The number and
order of columns may also differ.
If schema changes are not expected, COOPY can be directed
to use certain columns as a trusted identity for rows (a key).
Respects row order for table representations for which row
order is meaningful (spreadsheets, CSV).
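
When certain columns are trusted as a key, comparison reduces to a lookup by key. The Python sketch below shows that idea under simplifying assumptions: a single key column, rows held as dicts, and no schema changes. The `keyed_diff` name and the return layout are hypothetical and do not reflect COOPY's API or its diff formats.

```python
def keyed_diff(old_rows, new_rows, key):
    """Compare two lists of dict rows, trusting `key` to identify rows."""
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}

    inserted = [new[k] for k in new if k not in old]
    deleted = [old[k] for k in old if k not in new]

    updated = []
    for k in old:
        if k in new and old[k] != new[k]:
            # Record (old value, new value) for every column that differs.
            changes = {
                col: (old[k].get(col), new[k].get(col))
                for col in set(old[k]) | set(new[k])
                if old[k].get(col) != new[k].get(col)
            }
            updated.append((k, changes))
    return inserted, deleted, updated


old_rows = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
new_rows = [{"id": 1, "name": "Anne"}, {"id": 3, "name": "Cy"}]
inserted, deleted, updated = keyed_diff(old_rows, new_rows, "id")
```

Without a trusted key, this shortcut is not available, and rows have to be matched by content instead.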

Algorithm

The core of the COOPY toolbox is a 3-way comparison between an
ancestor and two descendants. First, rows are compared using bags of
substrings drawn from across all columns. Once corresponding rows are
known, columns are compared, again using bags of substrings. Row and
column assignments are optimized and ordered using a Viterbi lattice.
Once the pairwise relationships between each descendant and its
ancestor are known, differences are computed, and a good merged
ordering is determined (again using the Viterbi algorithm).
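
The first step, comparing rows with bags of substrings, can be sketched in Python as follows. This is a simplified illustration, not COOPY's code: the substring length of 3, the overlap score, and the 0.5 threshold are assumptions, and a greedy best-first assignment stands in for the Viterbi lattice.

```python
from collections import Counter


def bag_of_substrings(row, n=3):
    """Count every length-n substring of every cell in the row."""
    bag = Counter()
    for cell in row:
        text = str(cell)
        if not text:
            continue
        # Cells shorter than n contribute themselves as a single entry.
        for i in range(max(len(text) - n + 1, 1)):
            bag[text[i:i + n]] += 1
    return bag


def similarity(bag_a, bag_b):
    """Multiset overlap, normalised by the size of the larger bag."""
    shared = sum((bag_a & bag_b).values())
    total = max(sum(bag_a.values()), sum(bag_b.values()))
    return shared / total if total else 0.0


def match_rows(rows_a, rows_b, threshold=0.5):
    """Greedy best-first matching; returns {index_in_a: index_in_b}."""
    bags_a = [bag_of_substrings(r) for r in rows_a]
    bags_b = [bag_of_substrings(r) for r in rows_b]
    scored = sorted(
        ((similarity(a, b), i, j)
         for i, a in enumerate(bags_a)
         for j, b in enumerate(bags_b)),
        key=lambda t: -t[0],
    )
    matches, used_b = {}, set()
    for score, i, j in scored:
        if score < threshold:
            break  # remaining pairs are too dissimilar to match
        if i not in matches and j not in used_b:
            matches[i] = j
            used_b.add(j)
    return matches


rows_a = [["Paul", "Boston", "1975"], ["Maria", "Lisbon", "1982"]]
rows_b = [["Lisbon", "Maria", "1982"], ["Boston", "Paul F.", "1975"]]
matches = match_rows(rows_a, rows_b)
```

Because each bag pools substrings from all cells of a row, the rows still match even though the columns of `rows_b` are reordered and one cell was edited.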

Status

COOPY targets a stable, fully-documented release at version 1.0. At
the time of writing, the version number is just beyond 0.5, so the
project is about halfway there.