I just downloaded the gawk utility today to run commands in DOS on my PC. I
can get a SUBSTR for numeric values to work in an IF statement, but I can't
get alpha characters to work. When I run the following command:
gawk "{if (substr($1,1,3) == "123") print $1}" baseball_test_file.txt > baseball_test_file1.txt
I get the correct data:
123
123xxx
123yyy
However, when I run this command:
gawk "{if (substr($1,1,3) == "Sun") print $1}" baseball_test_file.txt > baseball_test_file2.txt
I get an empty file as a result. Below is my input data. I tried a bunch of
different things, but I can't figure it out. Little help?
123
123xxx
Fri, 4/1 at Rays W 4-1 1-0 Guthrie (1-0) Price (0-1)
Sun, 4/3 at Rays W 5-1 3-0 Britton (1-0) Davis (0-1)
Fri, 4/8 Rangers Postponed 5-1
Tue, 5/17 at Red Sox Postponed 19-21
Sat, 5/28 at Athletics L 2-4 24-26 Outman (1-0) Bergesen (1-6)
123yyy
You need to use the backslash character to escape the quotes.
gawk "{if (substr($1,1,3) == \"Sun\") print $0}" abc.txt
Thanks again, Ray. I'm going to get one of those Buddha statues, write
"raysatiro" in magic marker on it, and light a candle in front of it. I read
about backslashes being used like this, but I wasn't sure what "escape"
meant in this context. I kept thinking, What am I escaping from? If it's not
too much bother, Why don't I need the escape backslashes for numerics?
<quote>I kept thinking, What am I escaping from? If it's not too much bother,
Why don't I need the escape back slashes for numerics?</quote>
As the manual explains, numeric constants and string constants are treated
differently. If you want 123 to be treated as a character string, it should
be enclosed in double quotes in the awk program. Awk assumes that 123
without quotes is a decimal number.
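To make the difference concrete, here is a small sketch, assuming a POSIX shell (such as the MSYS bash mentioned later in this thread) so that the program can sit in single quotes. Plain awk is used, and gawk should behave the same way for these comparisons. The variable names a, b, c and result are only for illustration.

```shell
# Compare numeric and string constants inside awk.
result=$(awk 'BEGIN {
    a = (123 == "123")   # number vs. string constant: compared as strings, equal
    b = ("123" < "9")    # two strings: compared character by character
    c = (123 < 9)        # two numbers: compared numerically
    print a, b, c
}')
echo "$result"   # prints: 1 1 0
```

The middle result is the one that surprises people: as strings, "123" sorts before "9", even though as numbers 123 is larger than 9.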
Reggie, what you first need to understand is how your command line arguments
will be seen by a gcc-compiled C program after the command line is parsed. I
wrote a very simple program to demonstrate this. These examples were made
using the Windows command interpreter.
Arguments are separated by spaces, and each argument can be quoted. If an
argument contains spaces, it must be quoted. If an argument contains special
characters, in most cases it should or must be quoted. If an argument itself
contains quotes, in most cases you would want to escape them. There is
also the case of "apple""orange", which is parsed as the single argument
apple"orange, but that is undocumented. If you want a more thorough
understanding of command line parsing, go here:
http://www.autohotkey.net/~deleyd/parameters/parameters.htm#WIN
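The getargs output is not reproduced in this thread, so here is a rough analogue, assuming a POSIX shell rather than the Windows command interpreter. The function show_args is a hypothetical stand-in for getargs, not the actual program. The detailed rules differ between the two environments (the "apple""orange" case above, for example, comes out differently in a POSIX shell), but for the command in the question the effect is the same: unescaped inner quotes are consumed as delimiters and never reach the program.

```shell
# show_args is a hypothetical stand-in for getargs:
# it prints each argument it receives, wrapped in brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Inner quotes not escaped: they are treated as quote delimiters and removed.
unescaped=$(show_args "{ if (x == "Sun") print }")
# Inner quotes escaped: they survive as part of the argument.
escaped=$(show_args "{ if (x == \"Sun\") print }")

echo "$unescaped"   # prints: [{ if (x == Sun) print }]
echo "$escaped"     # prints: [{ if (x == "Sun") print }]
```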
Now as to your question. As the argument parsing above shows, in your first
command you may have intended to compare against the string "123", but
because you didn't escape the quotes you were actually comparing against the
number 123. gawk's comparison rules, which are the same as POSIX's, allow
that. It gets kind of complicated, but basically it's okay to compare what
you get from substr() to a number.
Also, while you intended to compare to the string "Sun", you were actually
comparing to a variable named Sun because you didn't escape the quotes. Unless
the unquoted operand in that if statement is a number (+2, 234, etc.), gawk
interprets it as a variable name. In this case it is an undefined variable,
because there is no variable named Sun... unless you define it, and why would
you do that except to make things really confusing!
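The three cases can be tried side by side. This is a sketch assuming a POSIX shell, where single quotes let the awk program through untouched, so each program below is written the way awk actually ends up seeing it. The sample lines come from the input file in the question. The shell variable names are only for illustration, and plain awk is used where the thread uses gawk.

```shell
# A few lines from the input file in the question.
data='123
123xxx
Fri, 4/1 at Rays W 4-1 1-0 Guthrie (1-0) Price (0-1)
Sun, 4/3 at Rays W 5-1 3-0 Britton (1-0) Davis (0-1)
123yyy'

# What awk received in the first command: the numeric constant 123.
numeric=$(printf '%s\n' "$data" | awk '{ if (substr($1,1,3) == 123) print $1 }')
# What awk received in the second command: an undefined variable named Sun.
unquoted=$(printf '%s\n' "$data" | awk '{ if (substr($1,1,3) == Sun) print $1 }')
# What was intended: the string constant "Sun".
quoted=$(printf '%s\n' "$data" | awk '{ if (substr($1,1,3) == "Sun") print $1 }')

echo "numeric:  $numeric"
echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

The first program prints 123, 123xxx and 123yyy. The second prints nothing, because an undefined variable compares as an empty string, which matches the empty output file in the question. The third prints Sun, (the first field of the matching line).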
If you feel up to it, review gawk's documentation on String Type Versus Numeric
Type:
http://www.gnu.org/software/gawk/manual/html_node/Variable-Typing.html#Variable-Typing
One of the most confusing issues for gnuwin32 users reading GNU documentation
is that in many cases it assumes a POSIX-compliant shell when it gives usage
examples. The Windows command interpreter (what most users use) is certainly
not a POSIX-compliant shell. So you won't be able to run the examples in the
link above in a Windows command interpreter, because of the way they're
quoted. You must use double quotes. So take this example:
echo ' +3.14' | gawk '{ print $1 == "+3.14" }'
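In the Windows command interpreter you must do it like this, with double
quotes around the program and escaped quotes inside it:
echo  +3.14 | gawk "{ print $1 == \"+3.14\" }"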
Although I focused on the Windows command interpreter, there are POSIX shells
for Windows that you can use instead of the command interpreter.
If you want POSIX compliance, Microsoft offers that for certain versions of
Windows:
http://en.wikipedia.org/wiki/Windows_Services_for_UNIX
Personally I alternate between the POSIX bash shell that came with my MSYS
installation and the Windows command interpreter. I don't use SFU.
As your gawk code becomes more complex it will be easier to just move it to a
script file where you won't have any of these problems, POSIX shell or not.
gawk -f yourscript in.txt > out.txt
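As a sketch of what that looks like for the "Sun" test from this thread: the program goes in its own file, where "Sun" needs no backslashes at all. The file names sun.awk, in.txt and out.txt and the temporary directory are hypothetical. The commands assume a POSIX shell and use plain awk, and on Windows the same script file would be passed to gawk -f in the same way.

```shell
# Work in a throwaway directory; all file names here are hypothetical.
workdir=$(mktemp -d)

# The awk program lives in its own file, so no command line escaping is needed.
cat > "$workdir/sun.awk" <<'EOF'
# Print every line whose first field starts with "Sun".
substr($1, 1, 3) == "Sun" { print $0 }
EOF

# A small input file using lines from the question.
printf '%s\n' \
    '123' \
    'Sun, 4/3 at Rays W 5-1 3-0 Britton (1-0) Davis (0-1)' \
    '123yyy' > "$workdir/in.txt"

awk -f "$workdir/sun.awk" "$workdir/in.txt" > "$workdir/out.txt"
out=$(cat "$workdir/out.txt")
echo "$out"
rm -r "$workdir"
```

Only the line beginning with Sun ends up in out.txt.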
The program getargs and its source are available here:
https://sourceforge.net/projects/getgnuwin32/files/getgnuwin32/other/getargs/
Whoo, lordy. I'll have to set aside some time to digest this. But thank you,
again. I appreciate it.