[Assorted-commits] SF.net SVN: assorted:[1867] sandbox/trunk/src/one-off-scripts
From: <yan...@us...> - 2013-08-26 06:54:33
Revision: 1867
          http://sourceforge.net/p/assorted/svn/1867
Author:   yangzhang
Date:     2013-08-26 06:54:31 +0000 (Mon, 26 Aug 2013)
Log Message:
-----------
Add deduper for bitrot.db

Added Paths:
-----------
    sandbox/trunk/src/one-off-scripts/deduper-for-bitrot-db/
    sandbox/trunk/src/one-off-scripts/deduper-for-bitrot-db/README
    sandbox/trunk/src/one-off-scripts/deduper-for-bitrot-db/dups.py

Added: sandbox/trunk/src/one-off-scripts/deduper-for-bitrot-db/README
===================================================================
--- sandbox/trunk/src/one-off-scripts/deduper-for-bitrot-db/README	                        (rev 0)
+++ sandbox/trunk/src/one-off-scripts/deduper-for-bitrot-db/README	2013-08-26 06:54:31 UTC (rev 1867)
@@ -0,0 +1 @@
+(Incomplete) a file deduper leveraging the DB files from `bitrot`.

Added: sandbox/trunk/src/one-off-scripts/deduper-for-bitrot-db/dups.py
===================================================================
--- sandbox/trunk/src/one-off-scripts/deduper-for-bitrot-db/dups.py	                        (rev 0)
+++ sandbox/trunk/src/one-off-scripts/deduper-for-bitrot-db/dups.py	2013-08-26 06:54:31 UTC (rev 1867)
@@ -0,0 +1,14 @@
+import sqlalchemy as sa, itertools as itr, collections as col
+
+eng = sa.create_engine('sqlite:////tmp/bitrot.db')
+rows = eng.execute("select * from bitrot where path not like './.Trash/%' and path not like './Library/%';").fetchall()
+duphashes = set(k for k,v in col.Counter(row['hash'] for row in rows).iteritems() if v > 1)
+dups = [row for row in rows if row['hash'] in duphashes]
+groups = [list(members) for group, members
+  in itr.groupby(sorted(dups, key=lambda x: x['hash']), lambda x: x['hash'])]
+#print groups[:10]
+
+# TODO get file sizes
+# TODO compare actual contents
+# TODO look for close matches (esp. in larger files)
+# TODO package for windows
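
The committed dups.py targets Python 2 (`iteritems`, the `print` statement) and takes two passes, a `Counter` followed by `sort` plus `groupby`, to collect rows that share a hash. As a rough illustration only, and not the author's code, the same grouping could be done in one pass on Python 3 with the standard-library `sqlite3` module. The function names `load_rows` and `group_duplicates` are hypothetical; the `bitrot` table, its `path` and `hash` columns, and the two excluded prefixes are taken from the query in the diff above.

```python
import sqlite3
from collections import defaultdict

# Same filter as the query in dups.py; only the two columns the script uses.
QUERY = (
    "select path, hash from bitrot "
    "where path not like './.Trash/%' and path not like './Library/%'"
)

def load_rows(conn):
    """Return (path, hash) pairs from the bitrot table, minus excluded paths."""
    return conn.execute(QUERY).fetchall()

def group_duplicates(rows):
    """Map each hash seen more than once to the sorted list of its paths."""
    by_hash = defaultdict(list)
    for path, digest in rows:
        by_hash[digest].append(path)
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}
```

Assuming the database lives at the path used in the commit, calling `group_duplicates(load_rows(sqlite3.connect('/tmp/bitrot.db')))` would yield the candidate groups. Equal hashes only suggest equal contents, so the TODO about comparing actual contents still applies.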