From: Gaudenz S. <ga...@so...> - 2009-05-13 22:41:21
Hi,

I'll send 4 patches for febrl in separate messages following this one. I first tried to send the patches in one mail, but it was too big for the mailing list software and never got approved.

I used febrl to standardise a very large set of email addresses as part of a research project on the Debian GNU/Linux distribution. I also did some experiments with deduplication of the dataset, but ultimately ended up doing most of the deduplication with a manually supervised script which used the lists produced by febrl as input. The main task was the deduplication of all the mail addresses and associated real names and nicknames of all the people who reported bugs to the Debian bug database. My patches therefore add some improvements for this kind of data.

Please tell me if you need further information about any of the patches or if you would like them in another form (e.g. split up into smaller patches, ...). I would be happy if you could integrate some or all of these patches into the next release of febrl, but I leave it to your own judgement whether they are good enough to be part of a release. If you have any suggestions for improvements, I am certainly willing to update the patches.

Gaudenz

--
Ever tried. Ever failed. No matter. Try again. Fail again. Fail better.
~ Samuel Beckett ~