<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Recent changes to Home</title><link>https://sourceforge.net/p/deduper/wiki/Home/</link><description>Recent changes to Home</description><atom:link href="https://sourceforge.net/p/deduper/wiki/Home/feed" rel="self"/><language>en</language><lastBuildDate>Tue, 30 Oct 2012 14:00:56 -0000</lastBuildDate><atom:link href="https://sourceforge.net/p/deduper/wiki/Home/feed" rel="self" type="application/rss+xml"/><item><title>WikiPage Home modified by Venki</title><link>https://sourceforge.net/p/deduper/wiki/Home/</link><description>&lt;pre&gt;--- v1
+++ v2
@@ -1,8 +1,61 @@
-Welcome to your wiki!
+Deduper
+=======
 
-This is the default page, edit it as you see fit. To add a new page simply reference it within brackets, e.g.: [SamplePage].
+Deduper is a simple command-line Java tool for removing duplicate customer records. Duplicate customer records can be tricky to detect. They suffer from problems such
+as abbreviated addresses, typos, and multiple possible representations of the same address or name.
 
-The wiki uses [Markdown](/p/deduper/wiki/markdown_syntax/) syntax.
+For example:
 
-[[project_admins]]
+1. John Street 23
+2. John st. 23
+
+Both refer to the same address.
+
+Similarly, both entries in the example below refer to the same name, but one contains a typo and an abbreviation:
+
+1. Alphan Majar
+2. Alp. Major
+
+Even with powerful computers, it is difficult to identify these duplicates. Deduper uses a modified blocking, nearest-neighbor-based clustering approach to identify possible duplicates.
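
The page does not spell out Deduper's algorithm, so the following is only an illustrative sketch of the general idea of blocking followed by a distance threshold. It is written in Python rather than Java, and everything in it is an assumption: the function names, the plain prefix-based block key (not Deduper's modified scheme), and the use of Levenshtein edit distance as the meaning of `radius`.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def candidate_pairs(addresses, blocksize=6, radius=5):
    """Yield index pairs that share a block and differ by at most `radius` edits."""
    blocks = defaultdict(list)
    for idx, addr in enumerate(addresses):
        # Assumed block key: first `blocksize` characters, lowercased, spaces removed.
        key = addr.lower().replace(" ", "")[:blocksize]
        blocks[key].append(idx)
    for members in blocks.values():
        # Only records inside the same block are compared with each other.
        for pos, i in enumerate(members):
            for j in members[pos + 1:]:
                if radius >= edit_distance(addresses[i].lower(), addresses[j].lower()):
                    yield (i, j)


pairs = list(candidate_pairs(["John Street 23", "John st. 23", "Elm Road 5"]))
print(pairs)
```

With the default values the two "John" addresses land in the same block and are 4 edits apart, so they are reported as a candidate pair. A plain prefix key like this one would miss pairs whose first characters differ, which is one reason a real tool needs a more careful blocking scheme.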
+
+Usage
+-----
+
+
+     venki@venki-Studio-1535:~/javaworkspace/deduper/build$ java -jar deduper.jar 
+     Deduper
+     ===================================
+     USAGE : java -Xmx2G -jar deduper.jar &lt;customer_csv&gt; &lt;blocksize&gt; &lt;radius&gt;
+
+     customer_csv : csv file, format: customer_id|postalCode|concatenatedAddress
+     blocksize    : used for bucketing, default: 6
+     radius       : allowed differences between addresses, default: 5
+
+
+The customer_csv file must contain three fields, separated by a pipe character ('|'):
+
+*customer_id|postalCode|concatenatedAddress*
+
+**customer_id:** current unique identifier of the customer record
+
+**postalCode:** postal code of the customer address
+
+**concatenatedAddress:** concatenated address fields; for readability, use "\\t" to separate the address fields.
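
A minimal sketch of reading one line in this format, written in Python for illustration. The names `CustomerRecord` and `parse_customer_line` are hypothetical and not part of Deduper, and the sketch assumes the address separator is the literal two characters backslash + t rather than a real tab character.

```python
from typing import NamedTuple


class CustomerRecord(NamedTuple):
    customer_id: str
    postal_code: str
    address_parts: list


def parse_customer_line(line, address_sep="\\t"):
    """Split one customer_id|postalCode|concatenatedAddress line.

    address_sep is an assumption: the literal two characters backslash + t.
    """
    fields = line.rstrip("\n").split("|")
    if len(fields) != 3:
        raise ValueError("expected 3 pipe-separated fields, got %d" % len(fields))
    customer_id, postal_code, address = fields
    parts = [p.strip() for p in address.split(address_sep)]
    return CustomerRecord(customer_id, postal_code, parts)


# Made-up sample line, for illustration only.
record = parse_customer_line("1001|12345|John Street 23\\tSpringfield")
print(record)
```

Rejecting lines that do not have exactly three fields catches addresses that themselves contain a pipe character before they reach the tool.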
+
+Example
+-------
+
+     java -jar deduper.jar customer_merge.csv 6 4
+
+
+Output
+------
+The output contains only possible duplicates and is saved in a file named clusters_&lt;input_file&gt;, for example clusters_customer_merge.csv.
+
+The fields of the output file are **new_customer_id|old_customer_id|address**.
+
+**new_customer_id:** possible duplicates are assigned the same new_customer_id.
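
To review the result, the output lines can be grouped by their first field. The sketch below is in Python, uses a hypothetical helper name `group_clusters`, and assumes the file has no header row. The sample lines are made up for illustration and are not real Deduper output.

```python
from collections import defaultdict


def group_clusters(lines):
    """Map each new_customer_id to the old_customer_id values that share it."""
    clusters = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split at most twice so a pipe inside the address does not break parsing.
        new_id, old_id, _address = line.split("|", 2)
        clusters[new_id].append(old_id)
    return dict(clusters)


# Made-up lines in the new_customer_id|old_customer_id|address layout.
sample = [
    "1|1001|John Street 23",
    "1|1002|John st. 23",
    "2|2001|Elm Road 5",
    "2|2002|Elm Rd. 5",
]
print(group_clusters(sample))
```

Each key in the returned dictionary is one group of possible duplicates, so every list is a set of old customer IDs to check by hand.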
+
+
+
 [[download_button]]
&lt;/pre&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Venki</dc:creator><pubDate>Tue, 30 Oct 2012 14:00:56 -0000</pubDate><guid>https://sourceforge.netfece42fac3513a86d9a4b0bc43d189935ad73839</guid></item><item><title>WikiPage Home modified by Venki</title><link>https://sourceforge.net/p/deduper/wiki/Home/</link><description>Welcome to your wiki!

This is the default page, edit it as you see fit. To add a new page simply reference it within brackets, e.g.: [SamplePage].

The wiki uses [Markdown](/p/deduper/wiki/markdown_syntax/) syntax.

[[project_admins]]
[[download_button]]
</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Venki</dc:creator><pubDate>Mon, 29 Oct 2012 21:04:36 -0000</pubDate><guid>https://sourceforge.neta22ddeafe0d395a18e0ed58faed482fe06463a2d</guid></item></channel></rss>