Find duplicates in text? Can this be done?

Red Dwarf
2009-07-12
2013-05-18
  • Red Dwarf

    Red Dwarf - 2009-07-12

    I have a long list (3000+ lines) of IP addresses and usernames, with the IP address followed by the username on each line. I extracted it from other forum data using regular-expression replace in Notepad++. The list is sorted, and I want to find any duplicate IP addresses in it and highlight or mark them in some way so they can easily be searched for.

    Can Notepad++ do this and if so how?

    This is for forum moderation reasons.

     
    • Michel Merlin

      Michel Merlin - 2009-07-19

      Thanks, CChris. The most efficient approach in the original poster's case seems to be:

      Select the string to be found (an IP address in the OP's case), press Ctrl+F, then choose "Find all in current document"; the result pane (at the bottom of the Notepad++ window) can be copied.

      I think this is what I (vaguely) remembered having used in the past. A sort of simplified (e, f)grep.
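
      For anyone who wants the same grep-style result outside the editor, here is a rough sketch in Python. It is not a Notepad++ feature; the function name `find_all` and the sample lines are made up for illustration.

```python
def find_all(lines, needle):
    """Return (line_number, line) for every line containing `needle`.

    Plain substring match, so "10.0.0.1" would also match "10.0.0.10".
    """
    return [(n, ln) for n, ln in enumerate(lines, start=1) if needle in ln]


# Made-up sample data in "<ip> <username>" form.
sample = [
    "10.0.0.1 alice",
    "10.0.0.2 bob",
    "10.0.0.1 carol",
]
for number, line in find_all(sample, "10.0.0.1"):
    print(f"Line {number}: {line}")
```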

      Versailles, Sun 19 Jul 2009 13:49:35 +0200

       
    • cchris

      cchris - 2009-07-13

      I can see something a little convoluted:
      1/ make two extra copies of your file, assuming the master file is sorted by IP
      2/ remove user names from both
      3/ remove the first line from the second file
      4/ use the compare plugin to mark changed lines, disabling move detection. Lines not marked as changed are duplicate IPs, and their line numbers will match those in the master file.

      There may be a simpler way using SimpleScript, but I don't have it at hand, because it is ANSI only <sigh/>

      Using sed looks much, much easier. Or AwkPlugin, but again it is ANSI only.
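
      As an illustration of the scripted route, here is a minimal sketch in Python (used here in place of sed or awk, which is my substitution). It assumes the IP address is the first whitespace-separated field on each line; `mark_duplicate_ips`, the "DUP " marker and the sample lines are all hypothetical. It counts IPs rather than comparing neighbouring lines, so the input does not have to be sorted.

```python
from collections import Counter


def mark_duplicate_ips(lines, marker="DUP "):
    """Return the non-blank lines, prefixing those whose IP occurs more than once.

    Assumption: the IP address is the first whitespace-separated field.
    """
    nonblank = [ln for ln in lines if ln.strip()]
    counts = Counter(ln.split()[0] for ln in nonblank)
    return [marker + ln if counts[ln.split()[0]] > 1 else ln for ln in nonblank]


# Made-up sample data in "<ip> <username>" form.
sample = [
    "10.0.0.1 alice",
    "10.0.0.1 bob",
    "10.0.0.2 carol",
    "10.0.0.3 dave",
    "10.0.0.3 erin",
]
for line in mark_duplicate_ips(sample):
    print(line)
```

      Searching the output for the marker then jumps between the duplicated addresses.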

      CChris

       
    • Michel Merlin

      Michel Merlin - 2009-07-17

      I can't recall exactly where it was, but I remember that Notepad++ has a button or command that returns search results "à la" grep; in other words, as in the grep family, a list of all the entire lines that contain the searched string. Thanks in advance to whoever can point out the tool involved.

      Versailles, Fri 17 Jul 2009 13:00:00 +0200

       
    • cchris

      cchris - 2009-07-17

      Find all after checking Mark lines, and then Copy bookmarked lines, paste somewhere for further processing. Or use the various Find all in ... commands, and then copy/paste the text in the result window.

      Ah, and TextFX Viz -> Hide lines without (Clipboard), Select All, then Copy visible selection.

      Or is this something else yet?

      CChris

       
  • Michal84

    Michal84 - 2010-01-19

    Hi reddwarfer,

    is there any chance you found a solution to this problem? I also have a file chock full of names and IPs and would like to find out which are duplicates. Sorting them alphabetically doesn't cut it as the IP address is towards the middle of the lines.

     
  • cchris

    cchris - 2010-01-19

    You need some way to split your lines into fields, and then sort according to one of these fields. I'd use gawk.exe to do this - you can drive it through Notepad++.
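
    To show the idea of picking out one field and grouping on it, here is a rough sketch in Python rather than gawk (my substitution). It assumes each line contains one IPv4 address somewhere in it; the names `IP_RE`, `group_by_ip` and `duplicates`, and the sample lines, are hypothetical.

```python
import re
from collections import defaultdict

# Loose IPv4 pattern: four dot-separated groups of 1-3 digits.
# It does not check that each group is in the 0-255 range.
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def group_by_ip(lines):
    """Map the first IP found in each line to the lines containing it."""
    groups = defaultdict(list)
    for ln in lines:
        match = IP_RE.search(ln)
        if match:
            groups[match.group()].append(ln)
    return groups


def duplicates(lines):
    """Keep only the IPs that occur on more than one line."""
    return {ip: lns for ip, lns in group_by_ip(lines).items() if len(lns) > 1}


# Made-up sample data with the IP in the middle of each line.
sample = [
    "alice 10.0.0.1 2010-01-02",
    "bob 10.0.0.2 2010-01-03",
    "carol 10.0.0.1 2010-01-05",
]
for ip, lns in sorted(duplicates(sample).items()):
    print(ip)
    for ln in lns:
        print("    " + ln)
```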

    CChris

     
  • mahdionnet

    mahdionnet - 2013-05-18

    In the name of God
    Hi there,
    What about doing this:
    1- make a new copy of your file and open it
    2- in the search box, select Replace
    3- select "Extended" mode
    4- if your IPs start, for example, with 192, put 192 in the Find field and \n192 in the Replace field
    5- now push Replace All so that your names and IPs end up on separate lines
    6- save this file in .csv format
    7- open the .csv file with Excel
    8- select the whole data (select the first cell of your data and press Ctrl+Shift+End on the keyboard)
    9- now in the "Data" menu you can select "Remove Duplicates"
    Is that it?