Re: [Febrl-list] what string similarity measure?


Hi Albert-Jan,

I would imagine "String Truncate" might be the most appropriate similarity
measure for abbreviations.
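
To make the idea concrete, here is a rough Python sketch of two comparisons that tolerate abbreviation better than typo-oriented measures do. It is not Febrl's own code: the function names (truncate_similarity, token_prefix_similarity), the num_chars parameter, and the example school names are all made up for illustration. The first function only checks whether the leading characters agree, which is the general idea behind a truncated comparison; the second treats each word of the short name as a possible prefix of a word in the long name.

```python
def truncate_similarity(a, b, num_chars=4):
    """Return 1.0 if the first num_chars characters of both strings agree, else 0.0.

    Hypothetical helper: only the leading characters are compared, so
    differences further along the strings are ignored.
    """
    a_norm = a.strip().lower()
    b_norm = b.strip().lower()
    if not a_norm or not b_norm:
        return 0.0
    return 1.0 if a_norm[:num_chars] == b_norm[:num_chars] else 0.0


def token_prefix_similarity(long_name, short_name):
    """Return the fraction of words in short_name that abbreviate a word in long_name.

    Hypothetical helper: a short word counts as matched when some later,
    not yet used word of long_name starts with it (order is preserved).
    """
    long_tokens = long_name.lower().replace(".", " ").split()
    short_tokens = short_name.lower().replace(".", " ").split()
    if not short_tokens:
        return 0.0
    matched = 0
    pos = 0  # index of the next unused word in long_tokens
    for tok in short_tokens:
        for i in range(pos, len(long_tokens)):
            if long_tokens[i].startswith(tok):
                matched += 1
                pos = i + 1
                break
    return matched / len(short_tokens)


if __name__ == "__main__":
    long_name = "Lincoln Elementary School"   # invented example, dataset A style
    short_name = "Lincoln Elem. Sch."         # invented example, dataset B style
    print(truncate_similarity(long_name, short_name, num_chars=7))
    print(token_prefix_similarity(long_name, short_name))
```

Note that plain prefix matching will miss contractions such as "St" for "Saint", so a small lookup table of known abbreviations may still be needed before comparing.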

Robin

On 24 March 2011 09:05, Albert-Jan Roskam <fo...@ya...> wrote:

> Hello,
>
> I'm using Febrl 0.3 to match two datasets, using school names (among
> others) as linkage variables. Dataset A has long versions of school names,
> and dataset B has short(er) versions (e.g., partially abbreviated) of the
> school names. What is the best string similarity measure to use in such
> a case? Many similarity measures seem to be designed for typos,
> not for cases such as this.
>
> Cheers!!
> Albert-Jan
>
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> All right, but apart from the sanitation, the medicine, education, wine,
> public order, irrigation, roads, a fresh water system, and public health,
> what have the Romans ever done for us?
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>
> _______________________________________________
> Febrl-list mailing list
> Feb...@li...
> https://lists.sourceforge.net/lists/listinfo/febrl-list
>
>

-- 
Robin Gower
http://infonomics.ltd.uk
0791 255 3187

Lies, damn lies, and your evidence base:
http://clients.infonomics.ltd.uk/?q=statisticslie