I'm currently running into memory problems when de-duplicating a 160 MB file with 1M records. The problem seems to be the design of the classifier classes: the classify method returns three sets (match, non-match, and possible match), which in effect doubles the number of rows held in memory. A more memory-efficient solution would not hold all the data in memory but would instead iterate over the weight-vector file; the three returned datasets make that approach somewhat awkward. Has anyone encountered a similar problem? Is there any way to work around it without getting my hands messy?
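For what it's worth, here is a rough sketch of the streaming approach I have in mind. Everything here is hypothetical (the function name, the thresholds, and the `(pair_id, weights)` record shape are my own invention, not the library's API): instead of a `classify` method that materializes three sets, a generator yields one classification at a time while iterating over the weight vectors, so memory stays constant regardless of file size.

```python
from typing import Iterable, Iterator, Sequence, Tuple

def classify_stream(
    weight_vectors: Iterable[Tuple[str, Sequence[float]]],
    upper: float = 10.0,   # hypothetical match threshold
    lower: float = 0.0,    # hypothetical non-match threshold
) -> Iterator[Tuple[str, str]]:
    """Yield (pair_id, label) one record at a time.

    Unlike returning three full sets, this never holds more than
    one weight vector in memory, so the caller can stream results
    straight to disk or aggregate counts on the fly.
    """
    for pair_id, weights in weight_vectors:
        total = sum(weights)  # simple summed-weight rule for illustration
        if total >= upper:
            yield (pair_id, "match")
        elif total <= lower:
            yield (pair_id, "non-match")
        else:
            yield (pair_id, "possible")

# Example: consume lazily, e.g. writing each label out as it arrives,
# rather than collecting three sets first.
pairs = [("a", [12.0]), ("b", [-3.0]), ("c", [5.0])]
labels = dict(classify_stream(pairs))
```

The caller decides what to accumulate; if only the matches are needed, the non-matches are never stored at all.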