Regular expression for IP addresses

2. Help
Mike
2011-10-27
2013-04-03
  • Mike
    2011-10-27

    I'm trying to work out a regular expression that picks out all dotted-decimal IP addresses in a file, using Notepad++, so that I can replace them with something else.

    I've tried various expressions from other examples and forums but just can't seem to get any of them to work.

    Can anyone help?

    Thanks Mike

     
  • cchris
    2011-10-29

    Doesn't \d\d?\d?\.\d\d?\d?\.\d\d?\d?\.\d\d?\d? work?
    I assume you are parsing IPv4 addresses only.
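
To try this pattern outside Notepad++, here is a minimal sketch using Python's standard `re` module (an assumption for illustration only, the thread itself is about Notepad++; `find_simple` is a made-up helper name). It also shows the pattern's limit: it checks the shape of an address, not the range of each number.

```python
import re

# The pattern from the post above, unchanged.
SIMPLE = re.compile(r"\d\d?\d?\.\d\d?\d?\.\d\d?\d?\.\d\d?\d?")

def find_simple(text):
    """Return every run of four dot-separated groups of 1 to 3 digits."""
    return SIMPLE.findall(text)

# Shape is checked, range is not: 999.999.999.999 is accepted too.
print(find_simple("gw 192.168.0.1, bad 999.999.999.999"))
```

So it is a reasonable first pass when the file is known to contain only well-formed addresses.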

    CChris

     
  • THEVENOT Guy
    2012-10-21

    Hello login365

    I've only been registered on the Notepad Forums since 08-15-12, so I assume that you
      have managed to solve your problem since your post!

    But with the new PCRE regular expressions, available from Notepad++ 6.0 Unicode onwards, there is
      a general solution that I would like to talk about!

     
    First, select the 'Regular expressions' radio button, and type:

     
    FIND           :  (?(DEFINE)(25|2\d|1\d\d|?\d))\b(?1)(\.(?1)){3}\b

    REPLACE  :  Whatever you like !

    Some explanations:

      This regex uses a special form of the conditional IF statement:  (?(DEFINE)(……..))

      The FIRST part of the regex (?(DEFINE)(25|2\d|1\d\d|?\d)) DEFINES, without matching
        anything by itself, a VALID INDIVIDUAL number of an IPv4 address:

        25     represents a number between 250 and 255
        2\d    represents a number between 200 and 249
        1\d\d        represents a number between 100 and 199
        ?\d    represents a number between 10 and 99 or a number between 0 and 9,
                            WITHOUT ANY LEADING ZERO

      The VALID INDIVIDUAL number is defined as an INDEPENDENT sub-pattern ( group 1 ), which can be called anywhere with (?1)

      Then, a COMPLETE IPv4 address can be seen as a WORD, formed of FOUR valid numbers, as described above,
        separated by DECIMAL points

      So, the SECOND part of the regex can be  \b(?1)\.(?1)\.(?1)\.(?1)\b

      IMPORTANT : Note that EVERY reference (?1) is INDEPENDENT of the others,
                                  and JUST means 25|2\d|1\d\d|?\d

      As the text \.(?1) is repeated THREE times, we can shorten it to \b(?1)(\.(?1)){3}\b

    This regex ignores ALL INVALID IPv4 addresses, like:

      201.45.257.300  ( Numbers GREATER than 255 )

      123.099.04.200  ( LEADING ZERO in a number )

      100.200.3            ( FEWER than FOUR blocks of numbers )

      bar1.1.1.1foo      ( IPv4 address GLUED to the surrounding text )

      12.34. 56.78       ( a character that is NEITHER a DIGIT NOR a DECIMAL POINT inside the address )

    But ALL the VALID addresses below are detected:

      201.45.255.255

      123.99.4.200

      100.200.3.0

      bar 127.0.0.1 foo

      12.34.56.78

      and also :

      0.0.0.0

      1.1.1.1

      1.10.100.200

      127.0.0.1

      255.255.255.255

      200.100.10.0

    Remark: We could also use a named group, such as 'Byte'. The SEARCH regex then becomes:

                       (?(DEFINE)(?<Byte>25|2\d|1\d\d|?\d))\b(?&Byte)(\.(?&Byte)){3}\b

    I've written a TUTORIAL about the PCRE Regular Expressions ( Perl Compatible Regular Expressions )
      used in Notepad++ from version 6.0 onwards.

    As I'm French, the whole manual is written in French, but you can still pick up some tricks and
      explanations from the lists and examples throughout the tutorial.

    Christian Cuvier ( cchris ), a very well-known contributor, allowed me to host my tutorial
      on his personal site.

    So, you can download this TUTORIAL, in 3 versions (.txt, .pdf, .html), at the address below:

           http://oedoc.free.fr/Regex/TutorielRegex.zip

    I hope it'll be useful to you

    Cheers !

    guy038

    P.S. :  You can find some documentation about the new PCRE Regular Expressions used by N++ at the
                  two addresses below

         http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

         http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

         The FIRST one concerns the syntax of regular expressions in the SEARCH part

         The SECOND one concerns the syntax of regular expressions in the REPLACEMENT part

     
  • THEVENOT Guy
    2013-04-03

    Hi login365,

    Since SourceForge started using Markdown syntax for rich text, my previous post is obsolete and, more importantly, wrong!

    So, see below the same post, this time written with "Markdown" features, which is correct!


    I've only been registered on the Notepad Forums since 08-15-12, so I assume that you have managed to solve your problem since your post!

    But with the new PCRE regular expressions, available from Notepad++ 6.0 Unicode onwards, there is a general solution!

    First, select the 'Regular expressions' radio button, and type in the FIND field:

    (?(DEFINE)(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d))\b(?1)(\.(?1)){3}\b

    and whatever you like in the REPLACE field!
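
For anyone who wants to script the same find-and-replace outside Notepad++, here is a rough sketch in Python. It rests on two assumptions: Python's standard `re` module has no `(?(DEFINE)...)` or `(?1)`, so the number sub-pattern is pasted into the expression as a string instead, and `mask_ips` and the `<IP>` placeholder are made-up names for illustration.

```python
import re

# The alternation for one number of the address, taken from the FIND expression.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"

# No subroutine calls in Python's re, so the sub-pattern text is interpolated.
# {{3}} inside an f-string produces the literal quantifier {3}.
IPV4 = re.compile(rf"\b{OCTET}(?:\.{OCTET}){{3}}\b")

def mask_ips(text, placeholder="<IP>"):
    """Replace every valid dotted-decimal IPv4 address with the placeholder."""
    return IPV4.sub(placeholder, text)

print(mask_ips("ping 10.0.0.1 from 192.168.1.20"))
```

The placeholder plays the role of the REPLACE field, so any other text can be passed in its place.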


    Some explanations:

    This regex uses a special form of the conditional IF statement:

    (?(DEFINE)(........))

    The FIRST part of the regex

    (?(DEFINE)(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d))
    DEFINES, without matching anything by itself, a VALID INDIVIDUAL number of an IPv4 address:

    25[0-5]      represents a number between 250 and 255
    2[0-4]\d    represents a number between 200 and 249
    1\d\d          represents a number between 100 and 199
    [1-9]?\d    represents a number between 10 and 99 or a number between 0 and 9,
    without ANY LEADING ZERO
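
The four alternatives can be checked exhaustively. This is a small sketch in Python, assuming its standard `re` module treats this simple alternation the same way as the Notepad++ engine does (`is_octet` is a hypothetical helper name, not something from Notepad++):

```python
import re

# One number of the address: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
OCTET = re.compile(r"25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d")

def is_octet(s):
    """True when the WHOLE string is a valid number of an IPv4 address."""
    return OCTET.fullmatch(s) is not None

# Try every integer from 0 to 999 and keep those the pattern accepts.
accepted = [n for n in range(1000) if is_octet(str(n))]
print(accepted[0], accepted[-1], len(accepted))

# Leading zeros are refused.
print(is_octet("04"), is_octet("099"))
```

`fullmatch` is used so that a longer string such as `2550` cannot pass just because it starts with a valid number.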

    The VALID INDIVIDUAL number is defined as an INDEPENDENT sub-pattern ( group 1 ), which can be called anywhere with (?1)

    Then, a COMPLETE IPv4 address can be seen as a WORD, formed of FOUR valid numbers, as described above, separated by DECIMAL points

    So, the SECOND part of the regex can be   \b(?1)\.(?1)\.(?1)\.(?1)\b

    IMPORTANT : Note that EVERY reference (?1) is INDEPENDENT of the others,
    and JUST means   25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d

    As the text \.(?1) is repeated THREE times, we can shorten it to

    \b(?1)(\.(?1)){3}\b
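
To convince yourself that the long and the short form are interchangeable, here is a quick comparison sketched in Python. Python's standard `re` module has no `(?1)`, so each call is replaced by the sub-pattern text itself (an assumption of this sketch, as are the names `LONG_FORM`, `SHORT_FORM` and `same_matches`):

```python
import re

# Text that each (?1) call stands for, wrapped in a non-capturing group.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"

# \b(?1)\.(?1)\.(?1)\.(?1)\b written out in full.
LONG_FORM = re.compile(rf"\b{OCTET}\.{OCTET}\.{OCTET}\.{OCTET}\b")
# \b(?1)(\.(?1)){3}\b with the repeated part factored out.
SHORT_FORM = re.compile(rf"\b{OCTET}(?:\.{OCTET}){{3}}\b")

def same_matches(text):
    """True when both forms find exactly the same addresses in the text."""
    return LONG_FORM.findall(text) == SHORT_FORM.findall(text)

print(same_matches("a 10.1.2.3 b 256.1.1.1 c 192.168.0.255"))
```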


    This regex ignores ALL INVALID IPv4 addresses, like:

    201.45.257.300  ( Numbers GREATER than 255 )

    123.099.04.200  ( LEADING ZERO in a number )

    100.200.3          ( FEWER than FOUR blocks of numbers )

    bar1.1.1.1foo     ( IPv4 address GLUED to the surrounding text )

    12.34. 56.78      ( a character that is NEITHER a DIGIT NOR a DECIMAL POINT inside the address )


    But ALL the VALID addresses below are detected:

    201.45.255.255

    123.99.4.200

    100.200.3.0

    bar 127.0.0.1 foo

    12.34.56.78

    and also :

    0.0.0.0

    1.1.1.1

    1.10.100.200

    127.0.0.1

    255.255.255.255

    200.100.10.0
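
Both lists can be replayed mechanically. The sketch below uses Python's standard `re` module as a stand-in for the Notepad++ engine, which is an assumption: `re` lacks `(?(DEFINE)...)`, so the number sub-pattern is interpolated as text, and `found` is a made-up helper name.

```python
import re

OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4 = re.compile(rf"\b{OCTET}(?:\.{OCTET}){{3}}\b")

# Lines the post says must be ignored.
REJECTED = ["201.45.257.300", "123.099.04.200", "100.200.3",
            "bar1.1.1.1foo", "12.34. 56.78"]

# Lines the post says must be detected.
DETECTED = ["201.45.255.255", "123.99.4.200", "100.200.3.0",
            "bar 127.0.0.1 foo", "12.34.56.78", "0.0.0.0", "1.1.1.1",
            "1.10.100.200", "127.0.0.1", "255.255.255.255", "200.100.10.0"]

def found(line):
    """True when the line contains at least one valid IPv4 address."""
    return IPV4.search(line) is not None

for line in REJECTED + DETECTED:
    print(f"{line!r:22} {'match' if found(line) else 'no match'}")
```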


    Remark: We could also use a named group, such as 'Byte'. The SEARCH regex then becomes:

    (?(DEFINE)(?<Byte>25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d))\b(?&Byte)(\.(?&Byte)){3}\b


    I've written a TUTORIAL about the PCRE Regular Expressions ( Perl Compatible Regular Expressions ) used in Notepad++ from version 6.0 onwards.

    As I'm French, the whole manual is written in French, but you can still pick up some tricks and explanations from the lists and examples throughout the tutorial.

    Christian Cuvier ( cchris ), a very well-known contributor, allowed me to host my tutorial on his personal site.

    So, you can download this TUTORIAL, in 3 versions (.txt, .pdf, .html), at the address below:

    http://oedoc.free.fr/Regex/TutorielRegex.zip

    I hope it'll be useful to you

    Cheers,

    guy038


    P.S. You can find some documentation about the new PCRE Regular Expressions used by N++ at the two addresses below:

    http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

    http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

    The FIRST one concerns the syntax of regular expressions in the SEARCH part

    The SECOND one concerns the syntax of regular expressions in the REPLACEMENT part

     
    Last edit: THEVENOT Guy 2013-04-03