Please help me find the best metric/hybrid

  • steve labar

    I'd like to thank you all in advance for any help you can give me to get a better match on my data. Here is the current situation:

    I have client names and URLs that are entered into two different applications. Because of this, the names get abbreviated and are entered slightly differently from person to person. My job is to map these names to the names in application 1.

    Here is a small sample of the names I'm trying to match.
    Application 1 | Application 2

    AITSMM Technology | AITSMMTechnologyInc
    CareGivingmark Rx, Inc. (CVS CareGivingmark Corporation)| Caremark Rx, Inc.
    BrinkmanJones Financial Corporation | BrinkmanJonesFinancial | CitySearch
    (etrade) E*TRADE Financial Corp. | etrade
    eLiftIT (First American) | eLiftIT
    First American Equity Services (ELS) (formerly Lenders Advantage) | First American Equity Loan Services
    Open Technology Solutions, LLC (OTS LLC) | OTS LLC

    I'm looking for the best metric or hybrid to help me out.

    Right now I loop through the data, starting at a similarity threshold of .90. I call the algorithm and test whether there was a single match; if not, I decrement the threshold by .05 and try again until I get a single match. I return nothing if the threshold drops past .60. I'm currently using MongeElkan and it is not doing a very good job.
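
    To make the loop concrete, here is a rough Python sketch of it. The function names are made up for this post, and difflib's ratio is only a stand-in for the MongeElkan call, so treat it as an illustration of the loop and not my exact code.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in for the real metric (MongeElkan in my case); returns 0.0..1.0.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_single_match(name, candidates, start=0.90, floor=0.60, step=0.05):
    """Lower the threshold until exactly one candidate passes, or give up."""
    # Score each candidate once; only the threshold changes between passes.
    scores = [(similarity(name, c), c) for c in candidates]
    # Work in whole hundredths so repeated subtraction has no float drift.
    threshold = round(start * 100)
    floor_pct = round(floor * 100)
    step_pct = round(step * 100)
    while threshold >= floor_pct:
        hits = [c for s, c in scores if s * 100 >= threshold]
        if len(hits) == 1:
            return hits[0]
        threshold -= step_pct
    # Nothing matched uniquely between start and floor.
    return None
```

    It would be called once per application 2 name, with the full list of application 1 names as candidates.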

    As an example, this - Quarterly

    matched against this list does not match ? QA
    CVS Vendor CRM App -dropped

    Any help on how to be more effective with my string matches would be great. Also, is that decrementing loop a bad idea?