From: Robin G. <ro...@in...> - 2011-03-28 01:01:27
Hi Albert-Jan,

I would imagine "String Truncate" might be the most appropriate similarity measure for abbreviations.

Robin

On 24 March 2011 09:05, Albert-Jan Roskam <fo...@ya...> wrote:

> Hello,
>
> I'm using Febrl 0.3 to match two datasets, using school names (among
> others) as linkage variables. Dataset A has long versions of school names,
> and dataset B has short(er) versions (e.g., partially abbreviated) of the
> school names. What is the best string similarity measure to use in such
> a case? Many similarity measures seem to be designed for typos,
> not for cases such as this.
>
> Cheers!!
> Albert-Jan
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> All right, but apart from the sanitation, the medicine, education, wine,
> public order, irrigation, roads, a fresh water system, and public health,
> what have the Romans ever done for us?
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--
Robin Gower
http://infonomics.ltd.uk
0791 255 3187
Lies, damn lies, and your evidence base: http://clients.infonomics.ltd.uk/?q=statisticslie
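To illustrate the idea behind a truncated-string comparison, here is a minimal standalone sketch in plain Python. It does not use Febrl's own comparator classes; the function name `truncate_similarity`, the `num_chars` parameter, and the example school names are all made up for illustration. The assumption is that abbreviations mostly shorten the end of a name, so agreement on the first few characters is treated as a match.

```python
def truncate_similarity(a: str, b: str, num_chars: int = 8) -> float:
    """Return 1.0 if the first num_chars characters of both strings agree,
    else 0.0. Hypothetical helper, not part of Febrl."""
    # Normalise case and surrounding whitespace before comparing.
    a_norm = a.strip().lower()
    b_norm = b.strip().lower()
    # Treat empty values as non-matching rather than trivially equal.
    if not a_norm or not b_norm:
        return 0.0
    return 1.0 if a_norm[:num_chars] == b_norm[:num_chars] else 0.0


# Made-up example values: the tail of the name is abbreviated,
# so the leading characters still agree.
print(truncate_similarity("Riverside Secondary School",
                          "Riverside Sec. Sch.", num_chars=10))

# Here the abbreviation is at the start of the name ("Saint" vs "St"),
# so a truncated comparison does not help.
print(truncate_similarity("Saint Mary's Primary School",
                          "St Mary's Prim.", num_chars=10))
```

As the second call suggests, this kind of comparison only helps when the abbreviation affects the end of the name. Names abbreviated at the front would probably need some standardisation (for example, expanding "St" to "Saint") before comparison.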