
if (substr($1,1,3) == "123")...

Help
reggie27
2011-05-30
2012-07-26
  • reggie27

    reggie27 - 2011-05-30

    I just downloaded the gawk utility today to run commands in DOS on my PC. I
    can get a SUBSTR for numeric values to work in an IF statement, but I can't
    get alpha characters to work. When I run the following command:

    gawk "{if (substr($1,1,3) == "123") print $1}" baseball_test_file.txt > baseball_test_file1.txt

    I get the correct data:

    123

    123xxx

    123yyy

    However, when I run this command:

    gawk "{if (substr($1,1,3) == "Sun") print $1}" baseball_test_file.txt > baseball_test_file2.txt

    I get an empty file as a result. Below is my input data. I tried a bunch of
    different things, but I can't figure it out. Little help?

    123

    123xxx

    Fri, 4/1 at Rays W 4-1 1-0 Guthrie (1-0) Price (0-1)

    Sun, 4/3 at Rays W 5-1 3-0 Britton (1-0) Davis (0-1)

    Fri, 4/8 Rangers Postponed 5-1

    Tue, 5/17 at Red Sox Postponed 19-21

    Sat, 5/28 at Athletics L 2-4 24-26 Outman (1-0) Bergesen (1-6)

    123yyy

     
  • Jay Satiro

    Jay Satiro - 2011-05-30

    You need to use the backslash character to escape the inner double quotes, so
    that they are passed through to gawk as part of the program.

    gawk "{if (substr($1,1,3) == \"Sun\") print $0}" abc.txt
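    For comparison, in a POSIX shell (such as the MSYS bash mentioned later in this
    thread) the awk program can be wrapped in single quotes, and the inner double
    quotes then need no escaping at all. A minimal sketch, assuming a POSIX shell
    with an awk on the PATH (substitute gawk if that is the name on your system);
    the function name sun_games is hypothetical:

```shell
# Hypothetical helper (not from the thread): print every input line whose
# first field starts with "Sun". Any file names given are passed to awk.
sun_games() {
    # Single quotes keep the shell away from the awk program, so the
    # string constant "Sun" reaches awk with its double quotes intact.
    awk '{ if (substr($1, 1, 3) == "Sun") print $0 }' "$@"
}

# Usage with two lines of the sample data from the question
printf '%s\n' \
    'Fri, 4/1 at Rays W 4-1 1-0 Guthrie (1-0) Price (0-1)' \
    'Sun, 4/3 at Rays W 5-1 3-0 Britton (1-0) Davis (0-1)' | sun_games
# prints: Sun, 4/3 at Rays W 5-1 3-0 Britton (1-0) Davis (0-1)
```

    The "Fri," line is skipped because substr("Fri,", 1, 3) is "Fri".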

     
  • reggie27

    reggie27 - 2011-05-30

    Thanks again, Ray. I'm going to get one of those Buddha statues, write
    "raysatiro" in magic marker on it, and light a candle in front of it. I read
    about backslashes being used like this, but I wasn't sure what "escape"
    meant in this context. I kept thinking, what am I escaping from? If it's not
    too much bother, why don't I need the escape backslashes for numerics?

     
  • Allan

    Allan - 2011-05-30

    <quote>I kept thinking, what am I escaping from? If it's not too much bother,
    why don't I need the escape backslashes for numerics?</quote>

    As the manual explains, numeric constants and string constants are treated
    differently. If you want 123 to be treated as a character string, it must be
    enclosed in double quotes in the awk program (awk string constants use double
    quotes only; single quotes are not string delimiters in awk). Awk treats 123
    without quotes as a decimal number.

     
  • Jay Satiro

    Jay Satiro - 2011-05-31

    Reggie, what you first need to understand is how your command-line arguments
    are seen by a gcc-compiled C program after the command line is parsed. I
    wrote a very simple program to demonstrate this. These examples were made
    using the Windows command interpreter.

    C:\Temp>getargs "{if (substr($1,1,3) == "123") print $0}"
    argv[ 0 ]: getargs
    argv[ 1 ]: {if (substr($1,1,3) == 123) print $0}
    
    C:\Temp>getargs "{if (substr($1,1,3) == \"123\") print $0}"
    argv[ 0 ]: getargs
    argv[ 1 ]: {if (substr($1,1,3) == "123") print $0}
    
    C:\Temp>getargs "{if (substr($1,1,3) == "Sun") print $0}"
    argv[ 0 ]: getargs
    argv[ 1 ]: {if (substr($1,1,3) == Sun) print $0}
    
    C:\Temp>getargs "{if (substr($1,1,3) == \"Sun\") print $0}"
    argv[ 0 ]: getargs
    argv[ 1 ]: {if (substr($1,1,3) == "Sun") print $0}
    
    C:\Temp>getargs "apple "orange" pear"
    argv[ 0 ]: getargs
    argv[ 1 ]: apple orange pear
    
    C:\Temp>getargs "apple " orange " pear"
    argv[ 0 ]: getargs
    argv[ 1 ]: apple 
    argv[ 2 ]: orange
    argv[ 3 ]:  pear
    
    C:\Temp>getargs "apple \" orange \" pear"
    argv[ 0 ]: getargs
    argv[ 1 ]: apple " orange " pear
    
    C:\Temp>getargs "apple""orange"
    argv[ 0 ]: getargs
    argv[ 1 ]: apple"orange
    
    C:\Temp>getargs "apple "" orange"
    argv[ 0 ]: getargs
    argv[ 1 ]: apple "
    argv[ 2 ]: orange
    
    C:\Temp>getargs "apple"
    argv[ 0 ]: getargs
    argv[ 1 ]: apple
    
    C:\Temp>getargs \"apple\"
    argv[ 0 ]: getargs
    argv[ 1 ]: "apple"
    

    Arguments are separated by spaces, and each argument can be quoted. If an
    argument contains spaces, it must be quoted. If an argument contains special
    characters, in most cases it should (or must) be quoted. If an argument itself
    contains quotes, in most cases you will want to escape them with a backslash.
    There is also the case of "apple""orange", which is parsed as the single
    argument apple"orange, but that behavior is undocumented. For a more thorough
    explanation of command-line parsing, see:

    http://www.autohotkey.net/~deleyd/parameters/parameters.htm#WIN

    Now, as to your question. You can see above that in your first command you may
    have intended to compare against the string "123", but because you didn't
    escape the quotes you were actually comparing against the number 123. gawk's
    type comparison rules, which are the same as POSIX's, allow that: substr()
    always returns a string, and when a string is compared with a number, the
    number is converted to a string and the two are compared as strings. So 123
    becomes "123" and the comparison still succeeds.
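    The string comparison can be seen directly, and it also shows where it stops
    matching what you might expect. In this minimal sketch (a POSIX shell with an
    awk on the PATH; the helper name cmp_demo is hypothetical), the first n
    characters of a field are compared with the numeric constant 123, once as-is
    and once after adding 0 to force a numeric comparison:

```shell
# Hypothetical helper: take the first n characters of the first field with
# substr(), then compare them with the numeric constant 123 in two ways.
cmp_demo() {
    printf '%s\n' "$1" | awk -v n="$2" '{
        s = substr($1, 1, n)    # substr() always returns a string
        r1 = (s == 123)         # string vs number: 123 becomes "123", string comparison
        r2 = (s + 0 == 123)     # adding 0 forces a numeric comparison
        print r1, r2
    }'
}

cmp_demo 123xxx 3    # prints: 1 1
cmp_demo 0123xx 4    # prints: 0 1
```

    For "123xxx" both tests succeed, which is why the unescaped numeric command
    appeared to work. For "0123" the string test fails, because "0123" and "123"
    are different strings even though they have the same numeric value.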

    Also, while you intended to compare against the string "Sun", you were
    actually comparing against a variable named Sun, because you didn't escape the
    quotes. Unless an unquoted operand in that if statement is a number (+2, 234,
    etc.), gawk interprets it as a variable name. In this case it is an undefined
    variable, because there is no variable named Sun (unless you define one, and
    why would you, except to make things really confusing!). An uninitialized
    variable has the empty string as its value, so the test compared "Sun" with ""
    and was never true, which is why your output file was empty.
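    A minimal sketch of that behavior, assuming a POSIX shell with an awk on the
    PATH; the helper name unquoted_sun is hypothetical. It runs the unescaped test
    from the question, first with Sun left undefined and then with Sun defined
    through awk's -v option:

```shell
# Hypothetical helper: the unescaped test from the question, in which Sun
# (without quotes) is an awk variable rather than a string constant.
# Any extra arguments, such as -v Sun=Sun, are passed through to awk.
unquoted_sun() {
    printf '%s\n' 'Sun, 4/3 at Rays' | awk "$@" '{
        r = (substr($1, 1, 3) == Sun)   # compares "Sun" with the value of the variable Sun
        print r, "[" Sun "]"            # show what the variable actually holds
    }'
}

unquoted_sun               # prints: 0 []     (Sun is uninitialized, so its value is "")
unquoted_sun -v Sun=Sun    # prints: 1 [Sun]  (Sun now holds the string "Sun")
```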

    If you feel up to it, review gawk's documentation on String Type Versus Numeric
    Type:

    http://www.gnu.org/software/gawk/manual/html_node/Variable-Typing.html

    One of the most confusing issues for GnuWin32 users reading GNU documentation
    is that its usage examples often assume a POSIX-compliant shell. The Windows
    command interpreter (what most users use) is certainly not a POSIX-compliant
    shell, so you won't be able to run the examples at the link above in the
    Windows command interpreter as they are quoted there. You must use double
    quotes instead. Take this example:

    echo ' +3.14' | gawk '{ print $1 == "+3.14" }'
    

    In the Windows command interpreter you must write it like this:

    c:\gnuwin32\bin\echo " +3.14" | gawk "{ print $1 == \"+3.14\" }"
    

    Although I focused on the Windows command interpreter, there are POSIX shells
    for Windows that you can use instead.
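    In a POSIX shell, everything inside single quotes is passed to the program
    literally, so the inner double quotes survive without backslashes. The sketch
    below is a hypothetical shell-function stand-in for getargs (it is not the
    program linked later in this thread) that prints argv[ 1 ] onward in the same
    format, assuming any POSIX sh:

```shell
# Hypothetical POSIX-shell stand-in for getargs: print each argument the
# function receives, after the shell has finished its own quote processing.
getargs_sh() {
    i=1
    for arg in "$@"; do
        printf 'argv[ %d ]: %s\n' "$i" "$arg"
        i=$((i + 1))
    done
}

# Single quotes pass the awk program through unchanged, inner double quotes included
getargs_sh '{if (substr($1,1,3) == "Sun") print $0}'
# prints: argv[ 1 ]: {if (substr($1,1,3) == "Sun") print $0}
```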

    If you want POSIX compliance, Microsoft offers it for certain versions of
    Windows:

    http://en.wikipedia.org/wiki/Windows_Services_for_UNIX

    Personally, I alternate between the POSIX bash shell that came with my MSYS
    installation and the Windows command interpreter. I don't use SFU.

    As your gawk code becomes more complex, it will be easier to move it to a
    script file, where none of these quoting problems arise, POSIX shell or not:

    gawk -f yourscript in.txt > out.txt
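    A minimal sketch of the script-file approach, assuming a POSIX shell with an
    awk on the PATH; the file name sun.awk is hypothetical. It uses an awk
    pattern-action rule, which is equivalent to the if statement used earlier in
    the thread:

```shell
# Write the awk program to a script file (the name sun.awk is hypothetical).
# awk reads the program from the file directly, so neither cmd.exe nor a
# POSIX shell gets a chance to strip its double quotes.
printf '%s\n' \
    '# Print every record whose first field begins with "Sun"' \
    'substr($1, 1, 3) == "Sun" { print $0 }' > sun.awk

# Run it; sample input is supplied inline here instead of baseball_test_file.txt
printf '%s\n' 'Sun, 4/3 at Rays W 5-1' '123xxx' | awk -f sun.awk
# prints: Sun, 4/3 at Rays W 5-1
```

    Under the Windows command interpreter you would create sun.awk with a text
    editor instead of printf, then run it the same way, for example
    gawk -f sun.awk baseball_test_file.txt > baseball_test_file2.txt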

    The program getargs and its source are available here:

    https://sourceforge.net/projects/getgnuwin32/files/getgnuwin32/other/getargs/

     
  • reggie27

    reggie27 - 2011-05-31

    Whoo, lordy. I'll have to set aside some time to digest this. But thank you,
    again. I appreciated it.

     