I haven't gotten to using it for actual deduplication yet, but have already encountered trivial bugs in the interface and data loading.
About 90% of the tests in testData.py seem to be asserting return types. Such assertions are of little use in Python, because all Python code cares about is that objects support the correct method calls (duck typing). The problems you want to avoid are logic errors that cause crashes and wrong results, for example corner cases such as filenames containing spaces when you know the code is passing the filenames on to other programs.
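To illustrate the kind of test I mean, here is a minimal sketch. `build_command` is a hypothetical helper I made up for the example, not something from this project, and `wc -c` just stands in for whatever external program gets called. The first test only checks the return type and would pass even if quoting were broken; the second checks that a filename with spaces survives as a single argument.

```python
import shlex
import unittest


def build_command(filename):
    """Hypothetical helper: build a shell command line that hands filename to another program."""
    # shlex.quote keeps spaces and quotes in the name from splitting the argument
    return "wc -c " + shlex.quote(filename)


class TestBuildCommand(unittest.TestCase):
    def test_return_type(self):
        # Type-only check: passes even if the quoting is wrong
        self.assertIsInstance(build_command("a b.txt"), str)

    def test_awkward_filenames(self):
        # Behavioural check: splitting the command the way a POSIX shell would
        # must give back exactly one filename argument, unchanged
        for name in ["plain.txt", "with space.txt", "it's odd.txt"]:
            self.assertEqual(shlex.split(build_command(name)), ["wc", "-c", name])
```

If the helper simply concatenated the strings without quoting, the type test would still pass while the second test would fail on "with space.txt", which is the bug you actually care about.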
At least testIndex.py seems to have more tests of correctness on a sample test-case, but I wonder how many edge- and corner-cases are being tested.