I'd like to thank you all in advance for any help you can give me to get a better match on my data. Here is the current situation:
I have client names and URLs that are entered into two different applications. Because of this, the names are abbreviated or entered slightly differently from person to person. My job is to map the names in application 2 to the names in application 1.
Here is a small sample of the names I'm trying to match.
Application 1 | Application 2
AITSMM Technology | AITSMMTechnologyInc
CareGivingmark Rx, Inc. (CVS CareGivingmark Corporation) | Caremark Rx, Inc.
BrinkmanJones Financial Corporation | BrinkmanJonesFinancial
Citysearch.com | CitySearch
(etrade) E*TRADE Financial Corp. | etrade
eLiftIT (First American) | eLiftIT
First American Equity Services (ELS) (formerly Lenders Advantage) | First American Equity Loan Services
Open Technology Solutions, LLC (OTS LLC) | OTS LLC
I'm looking for the best similarity metric, or hybrid of metrics, to help me out.
Right now I loop through the data starting at a similarity threshold of 0.90: I call the algorithm and test whether there was exactly one match. If not, I decrement the threshold by 0.05 and try again until I get a single match. I return nothing if the threshold drops below 0.60. I'm currently using MongeElkan, and it is not doing a very good job.
As an example, this: www.MirenaSupport.com - Quarterly
CVS Vendor CRM App
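For reference, the threshold loop I described works roughly like this Python sketch. The function names are made up for illustration, and `difflib.SequenceMatcher` only stands in for the similarity metric; it is not MongeElkan.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in metric (difflib ratio), NOT MongeElkan; swap the real metric in here.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_single_match(name, candidates, start=0.90, step=0.05, floor=0.60):
    """Lower the threshold until exactly one candidate clears it, else return None."""
    # Score every candidate once instead of re-running the metric per threshold.
    scores = [(c, similarity(name, c)) for c in candidates]
    # Work in integer hundredths so repeated subtraction does not drift.
    threshold = round(start * 100)
    step_i = round(step * 100)
    floor_i = round(floor * 100)
    while threshold >= floor_i:
        hits = [c for c, s in scores if s * 100 >= threshold]
        if len(hits) == 1:
            return hits[0]
        if len(hits) > 1:
            # Lowering the threshold can only add more hits, so give up here.
            return None
        threshold -= step_i
    return None
```

Called as `find_single_match("CitySearch", names_from_application_1)`, it returns either one name or `None`.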
Any help on how to make my string matching more effective would be great. Also, is the decrementing loop a bad idea?