gawk treats pipe FS-delimiters inconsistently

Help
trids
2010-05-13
2012-07-26
  • trids

    trids - 2010-05-13

    Hi there

    I'm using gawk 3.1.6 to parse some output from isql (Sybase) that looks
    like the following:

    |data_type|type_name       |precision  |length     |scale |
    |---------|----------------|-----------|-----------|------|
    |        4|int             |         10|          4|     0|
    |        1|char            |          7|          7|  NULL|
    |       12|varchar         |         50|         50|  NULL|
    |       12|varchar         |         35|         35|  NULL|
    |       12|varchar         |         35|         35|  NULL|
    |       12|varchar         |         35|         35|  NULL|
    |       12|varchar         |         35|         35|  NULL|
    |       12|varchar         |         15|         15|  NULL|
    |       12|varchar         |         25|         25|  NULL|
    |        1|char            |          2|          2|  NULL|
    |       12|varchar         |         30|         30|  NULL|
    |        1|char            |          1|          1|  NULL|
    

    To do so, I am using a regexp FS as follows:

    BEGIN { FS=" *| *";     OFS = "\t" }
    { 
        print $1, $2, $3, $4, $5    
    }
    

    The result is that the fields are indeed split on the pipes, but field values
    all INCLUDE the pipes as well! Whereas if the pipes are, say, colons and I use
    FS=" : ", then I get the expected results: the colons are NOT included
    with the field values.

    I have tried escaping the pipe (\|) .. but gawk reports that it is not
    necessary to do so:

    awk: awk\readpipes.awk:7: warning: escape sequence `\|' treated as plain `|'

    Am I doing something wrong, or is this a bug/problem that I need to find a
    workaround for?

    TIA

     
  • lowella

    lowella - 2010-05-14

    Yes, you're doing something wrong. The pipe symbol '|' is the regex
    alternation operator, so you are telling gawk that the field separator is
    either 'a bunch of blanks' or 'a bunch of blanks'. gawk does exactly what
    you ask: it splits on the blanks and leaves the pipes in the fields.
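
To see this for yourself, here is a minimal sketch that feeds one row from the question through the original FS. The shell wrapper and the variable names `row` and `out` are only illustrative, and it assumes an `awk` on the PATH that behaves like gawk for this regex.

```shell
#!/bin/sh
# One sample row in the layout shown in the question.
row='|        4|int             |         10|          4|     0|'

# FS from the question: the unescaped | is alternation ("blanks OR blanks"),
# so only the blanks act as separators and the pipes stay in the data.
out=$(printf '%s\n' "$row" | awk '
BEGIN { FS = " *| *"; OFS = "\t" }
{ print $1, $2, $3, $4, $5 }
')
printf '%s\n' "$out"
```

The printed fields still contain the pipe characters, which is the symptom described in the question.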

    Try FS=" *[|] *" and you should see a difference; [] is used to create a
    character list, inside which | is a literal pipe. However, because each line
    of your data starts with a pipe and the regex FS matches it, field $1 will
    be null, so your code won't print the column with NULLs in it. You probably
    want to use print $2, $3, $4, $5, $6.
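
A minimal sketch of splitting on a literal pipe, using the bracket expression `[|]` and printing fields 2 through 6. The shell wrapper and the names `rows`, `out` and `out2` are only illustrative. The second run shows what should be an equivalent spelling: inside an awk string the backslash has to be doubled ("\\|") so that the regex itself receives `\|`, which is why a single backslash triggered the warning.

```shell
#!/bin/sh
# Two sample rows in the layout shown in the question.
rows='|        4|int             |         10|          4|     0|
|        1|char            |          7|          7|  NULL|'

# Bracket expression: [|] matches a literal pipe, not alternation.
out=$(printf '%s\n' "$rows" | awk '
BEGIN { FS = " *[|] *"; OFS = "\t" }
{ print $2, $3, $4, $5, $6 }   # $1 is the empty text before the first pipe
')

# Same split with an escaped pipe; the string "\\|" reaches the regex as \|.
out2=$(printf '%s\n' "$rows" | awk '
BEGIN { FS = " *\\| *"; OFS = "\t" }
{ print $2, $3, $4, $5, $6 }
')

printf '%s\n' "$out"
```

Both runs should print the five columns tab-separated, with no pipes and no padding blanks. The header and dashed rows of the real isql output would pass through the same way unless they are filtered out separately.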

    HTH

     
  • trids

    trids - 2010-05-14

    Awesome - Thank you both for two excellent answers!

    It's good to know the right way (raysatiro), but I love the lateral solution
    too (lowella).

    Thanks again!

    =oD

     
