parsing entire file (dhcpd.conf)

2006-04-20
2013-05-14
  • Tim Edwards

    Tim Edwards - 2006-04-20

    Hello all,

    First of all, Paul, thanks for writing pyparsing.  I have a feeling I will use it for parsing just about anything from now on.

    Being new to pyparsing, I am having a problem whose solution I am sure will seem simple afterwards.  I am writing a DNS and DHCP management system, and need to parse existing config files to populate a database.  I would like to parse the entire file, bailing out if something in the file is malformed.  I would also like to parse the file in a way that lets me insert data into the database as the parser finds relevant information.  So I have defined some of the grammar for the specific declarations I want to parse.  The problem is that the parser just stops when it reaches a line that does not match a grammar definition.  Below is my code and the test file I am using.  I put a bad line in the test file on purpose.

    The code is kind of long, I know, but I want it to handle things specifically instead of grabbing tokens in general.

    Thanks in advance for any help
    --
    Tim

    --Code--
    from pyparsing import *

    dhcpdConf='docs/dhcpd.conf'
    ##punctuation
    colon  = Literal(":")
    lbrace = Literal("{")
    rbrace = Literal("}")
    lbrack = Literal("[")
    rbrack = Literal("]")
    lparen = Literal("(")
    rparen = Literal(")")
    equals = Literal("=")
    comma  = Literal(",")
    dot    = Literal(".")
    slash  = Literal("/")
    bslash = Literal("\\")
    star   = Literal("*")
    semi   = Literal(";")
    langle = Literal("<")
    rangle = Literal(">")

    ##Suppressed Grammar
    sDot=dot.suppress()
    sColon=colon.suppress()
    sLbrace=lbrace.suppress()
    sRbrace=rbrace.suppress()
    sColon=colon.suppress()
    sSemi=semi.suppress()

    ##Comment
    comment="#"+restOfLine

    ##Line markers
    bol=LineStart().suppress()
    eol=LineEnd().suppress()

    ##Non Terminators
    nonTerms=CharsNotIn(''';}''',min=1)
    nonSpecials=CharsNotIn('''{;}''',min=1)
    ##Ip Address
    ipOctet=Word(nums,min=1,max=3)
    ipAddr=Combine(ipOctet+dot+ipOctet+dot+ipOctet+dot+ipOctet).setResultsName('ip')

    ##Mac Address
    macOctet=Word(nums+'abcdef',exact=2)
    macAddr=Combine(macOctet+sColon+macOctet+sColon+macOctet+sColon+macOctet+sColon+macOctet+sColon+macOctet).setResultsName('mac')

    ##Domain Name
    domain=Word(alphanums+'-_',min=1)
    fqDomainName=Combine(domain + OneOrMore(dot|domain)).setResultsName('hostname')

    ##Option Directive
    option=Combine(Literal('option')+OneOrMore(nonTerms)+sSemi)
    options=ZeroOrMore(option)

    ##Generic Options
    genOption=Combine(bol+OneOrMore(~bol+nonSpecials)+sSemi).setName('Generic').setDebug()
    genOptions=ZeroOrMore(genOption)

    ##Options
    algorithm=Group(Literal('algorithm').suppress()+Word(alphanums+'-_')+sSemi).setResultsName('algorithm')
    secret=Group(Literal('secret').suppress()+(Literal('"')|Literal("'")).suppress()+CharsNotIn(''' '"; ''')+(Literal('"')|Literal("'")).suppress()+sSemi).setResultsName('secret')

    ## The big declarations need to come at the end
    ##Host Declaration
    hostDeclaration= Group(Literal('host').suppress() + fqDomainName + sLbrace + Literal('hardware ethernet').suppress() +  macAddr + sSemi + Literal('fixed-address').suppress() + ipAddr + sSemi + sRbrace)
    hostDeclarations=ZeroOrMore(hostDeclaration)

    ##Key Declaration
    keyDeclaration=Group(Literal('key').suppress()+fqDomainName+sLbrace+(algorithm|secret)+(secret|algorithm)+sRbrace).setName('KeyDec')

    ##Zone Declaration
    zoneDeclaration=Group(Literal('zone').suppress()+fqDomainName+sLbrace+sRbrace)

    ##Subnet Declaration

    opt=option | genOption
    decl=hostDeclaration | keyDeclaration | zoneDeclaration
    expr=decl | opt
    parser=OneOrMore(expr)
    parser.ignore(comment)
    parser.ignore(bol+eol)

    #for each in testString.split('\n'):
    try:
        foo=parser.parseFile('docs/testfile.txt')
        for each in foo:
            print '--',each
    except ParseException, error:
        print "Parse Error on :", error.line
        print '%s at   lineno(%s) col(%s)' % (error.msg,error.lineno,error.column)

    --End Code--

    --Test file --
    option PXE.mtftp-delay          code 5 = unsigned integer 8;
    option PXE.discovery-control            code 6 = unsigned integer 8;
    option PXE.discovery-mcast-addr         code 7 = unsigned integer 8;
    asdf bomb here
    authoritative;
    ddns-update-style interim;
    ddns-ttl 30;

    key ns3.domain.edu. {
            algorithm hmac-md5;
            secret "538Xs4vDzqcd1wG7obHNsGcQSJpA+Ym/Q82TTxDXIIpKv5cPH72zg==";
    }

    zone 80.46.10.in-addr.arpa {
            key ns3.domain.edu.;
            primary 10.34.33.1;
    }

    zone 96.46.10.in-addr.arpa {
            key ns3.domain.edu.;
            primary 10.34.33.1;
    }

    zone 112.46.10.in-addr.arpa {
            key ns3.domain.edu.;
            primary 10.34.33.1;
    }

    host hostname-1.domain.edu {
                    hardware ethernet               00:12:d9:6e:6f:9d;
                    fixed-address                   10.46.17.71;
            }

            host hostname-2.domain.edu {
                    hardware ethernet               00:12:d9:6e:6f:8d;
                    fixed-address                   10.46.17.72;
            }
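The behaviour described in this post can be reproduced with a much smaller grammar. The sketch below is not Tim's grammar: `statement` is a hypothetical stand-in (one bare word plus a semicolon) and the input is a shortened version of the test file. It assumes a pyparsing release that still provides the camelCase names used in this thread.

```python
from pyparsing import OneOrMore, Suppress, Word, alphanums

# Hypothetical toy statement: one bare word followed by a semicolon.
statement = Word(alphanums + "-") + Suppress(";")
parser = OneOrMore(statement)

text = "authoritative;\nasdf bomb here\nddns-ttl;\n"

# parseString matches as much as it can from the start of the input and
# then returns. The unparsable second line is not reported as an error,
# and the valid third line is never reached.
tokens = parser.parseString(text)
print(tokens.asList())
```

Only the first statement comes back, and no ParseException is raised, which matches the "parser just stops" symptom.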

     
    • Larry Maccherone

      Try scanString.  To do this, you'll need to first read the file into a string.  You can use something like this:
          file = open(fullname, mode='rU')
          s = file.read()

      scanString also behaves a bit differently from parseString/parseFile: it returns a lazy iterator that yields the results one match at a time.  So I actually use the code below.  It lets me set a debug flag that uses scanString while I'm debugging; once the full parser is working, I switch it to parseString.  I'm not really sure I ever need parseString, except that it fails when it comes across something in the source that it doesn't know how to parse.

          try:
              if debug:
                  tokens = statement_scanner.scanString(s)
                  for t in tokens:
                      pass
              else:
                  tokens = full_parser.parseString(w.s)
          except ParseException, err:
              print " "*err.loc + "^\n" + err.msg
              print err
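As a rough illustration of what scanString yields, here is a minimal sketch using a hypothetical toy `statement` expression (not the grammar from this thread). Each item produced by the generator is a `(tokens, start, end)` triple, and text that does not match is skipped silently.

```python
from pyparsing import Suppress, Word, alphanums

# Hypothetical toy statement: one bare word followed by a semicolon.
statement = Word(alphanums + "-") + Suppress(";")
text = "authoritative;\nasdf bomb here\nddns-ttl;\n"

# scanString is a generator; the list comprehension drains it so the
# matches can be inspected afterwards.
matches = [(tokens.asList(), start, end)
           for tokens, start, end in statement.scanString(text)]

for tokens, start, end in matches:
    # text[start:end] is the exact slice of the input that matched.
    print(tokens, repr(text[start:end]))
```

The bad line in the middle produces no match and no exception, so both valid statements are returned.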

       
    • Larry Maccherone

      the line should read:
      tokens = full_parser.parseString(s)

      you should drop the "w."

       
    • Tim Edwards

      Tim Edwards - 2006-04-20

      Larry, thanks for the suggestion.

      I must have overlooked the part of the documentation where it stated that scanString was handy for parsing entire files without an exhaustive grammar.

      So based on your suggestion I made the following changes:

      def scanFile(myParser, file):
          try:
              file_contents=file.read()
          except AttributeError:
              f=open(file,'rb')
              file_contents=f.read()
              f.close()
          return myParser.scanString(file_contents)

      if __name__=='__main__':
          debug=True
          testFileName='docs/testfile.txt'
          #for each in testString.split('\n'):
          try:
              if debug:
                  tokens=scanFile(parser,testFileName)
              else:
                  tokens=parser.parseFile(testFileName)
              for aToken in tokens:
                  print '--',aToken
          except ParseException, error:
              print "Parse Error on :", error.line
              print '%s at   lineno(%s) col(%s)' % (error.msg,error.lineno,error.column)

      If others find the scanFile functionality handy, maybe it could go into the next version.  It is definitely useful for parsing the entire file, but it does not fail when the input contains something the grammar does not define.
      So I guess I am in the position of needing an "exhaustive grammar", because I need to know if there is something in the config files that my parser is not acting on.  Otherwise, information may be lost without warning.

       
    • Paul McGuire

      Paul McGuire - 2006-04-21

      Thanks for the compliments on pyparsing.  This is a pretty full-featured grammar for a first pyparsing attempt, congratulations!

      Thanks to Larry for pointing you towards scanString; this looks like a workable approach.

      Look over the helpers and built-ins that come with pyparsing.  Using dblQuotedString could simplify your definition of secret, for instance.

      Best of luck in your parsing endeavors!
      -- Paul
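One way the dblQuotedString suggestion might look is sketched below. This is an assumption about how the helper could be applied, not code from the thread; `quoted` is a name introduced here, and `removeQuotes` is the pyparsing parse action that strips the surrounding quote characters.

```python
from pyparsing import Group, Keyword, Suppress, dblQuotedString, removeQuotes

# Copy the built-in so the shared dblQuotedString object is left unmodified,
# then strip the surrounding quotes from the matched token.
quoted = dblQuotedString.copy().setParseAction(removeQuotes)
secret = Group(Suppress(Keyword("secret")) + quoted + Suppress(";"))

result = secret.parseString(
    'secret "538Xs4vDzqcd1wG7obHNsGcQSJpA+Ym/Q82TTxDXIIpKv5cPH72zg==";'
)
print(result.asList())
```

Compared with the hand-written CharsNotIn version, the quoted-string helper also accepts spaces inside the quotes.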

       
    • Tim Edwards

      Tim Edwards - 2006-04-21

      Paul,

      The problem I have with the scanString approach is that it doesn't let me identify errors in the dhcpd.conf file.

      I am currently looking at two possible solutions:
      1) Parsing or scanning the file twice: once for errors, and once for matches, possibly scanning for the negation of the default parser.

      2) Parsing or scanning while requiring each line or set of lines to match one of the expressions, possibly by subclassing ParserElement or ParseElementEnhance to make a OneOfElements class.

      I would prefer the latter of the approaches, but either will suffice. 

      Thanks for the coding compliment, but most of it came from looking at the given examples, and examples online.

      --
      Tim

       
    • Paul McGuire

      Paul McGuire - 2006-04-21

      No need to create your own OneOfElements class, Or or MatchFirst should be sufficient.  You've already done this in your definition of expr, so if you just call expr.scanString(), this should give you every decl or option in turn.  scanString will give you start and end locations for each match, so, whitespace aside, each start location should be one after the previous end location.  Or take a shot at scanning for ~expr, and see if this gives you any better results.

      -- Paul
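The start/end idea above can be sketched as a small gap finder. `find_gaps` and the toy `statement` expression are hypothetical names introduced for illustration. The sketch relies on the end location from scanString being the index just past the match, so any non-whitespace text between one match's end and the next match's start was skipped by the scanner.

```python
from pyparsing import Suppress, Word, alphanums

# Hypothetical toy statement: one bare word followed by a semicolon.
statement = Word(alphanums + "-") + Suppress(";")
text = "authoritative;\nasdf bomb here\nddns-ttl;\n"

def find_gaps(expr, source):
    """Return the non-whitespace text that expr.scanString skipped over."""
    gaps = []
    previous_end = 0
    for _tokens, start, end in expr.scanString(source):
        skipped = source[previous_end:start].strip()
        if skipped:
            gaps.append(skipped)
        previous_end = end
    # Anything left after the last match is also unrecognized.
    trailing = source[previous_end:].strip()
    if trailing:
        gaps.append(trailing)
    return gaps

gaps = find_gaps(statement, text)
print(gaps)
```

In this sketch comments would also show up as gaps unless the expression ignores them, and a bad statement that happens to end in a valid one is only partly reported.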

       
    • Tim Edwards

      Tim Edwards - 2006-04-21

      So like I said in the beginning, the solution was simple.  I knew I needed to parse until the end of the file, it just didn't click that stringEnd was the key ;)

      Below I have posted the code to parse a dhcpd.conf file.  The only part of the grammar I don't like is the '\n'  in the genOptions expr.

      Also, I had trouble getting psyco to play nice.  It bombs out with an 'Illegal Instruction' error when parseFile tries to open the file, and the whole script dies.  I haven't had time to test it on my PC, so I don't know whether the problem is specific to my MacBook.  Without psyco, the parser parses a 17000+ dhcpd.conf file in about 24 seconds.

      Once again, thanks Paul for such a cool tool.

      --code--

      from pyparsing import *
      import psyco
      from datetime import datetime
      now=datetime.now

      ##punctuation
      colon  = Literal(":")
      lbrace = Literal("{")
      rbrace = Literal("}")
      lbrack = Literal("[")
      rbrack = Literal("]")
      lparen = Literal("(")
      rparen = Literal(")")
      equals = Literal("=")
      comma  = Literal(",")
      dot    = Literal(".")
      slash  = Literal("/")
      bslash = Literal("\\")
      star   = Literal("*")
      semi   = Literal(";")
      langle = Literal("<")
      rangle = Literal(">")

      ##Suppressed Grammar
      sDot=dot.suppress()
      sColon=colon.suppress()
      sLbrace=lbrace.suppress()
      sRbrace=rbrace.suppress()
      sColon=colon.suppress()
      sSemi=semi.suppress()

      ##Comment
      comment="#"+restOfLine

      ##Line markers
      bol=LineStart().suppress()
      eol=LineEnd().suppress()

      ##Non Terminators
      nonTerms=CharsNotIn(''';}''',min=1)
      nonSpecials=CharsNotIn('''{;}\n''',min=1)
      ##Ip Address
      ipOctet=Word(nums,min=1,max=3)
      ipAddr=Combine(ipOctet+dot+ipOctet+dot+ipOctet+dot+ipOctet).setResultsName('ip').setName('ip')

      ##Mac Address
      macOctet=Word(nums+'abcdefABCDEF',exact=2)
      macAddr=Combine(macOctet+sColon+macOctet+sColon+macOctet+sColon+macOctet+sColon+macOctet+sColon+macOctet).setResultsName('mac').setName('mac')

      ##Domain Name
      domain=Word(alphanums+'-_',min=1)
      fqDomainName=Combine(domain + OneOrMore(dot|domain)).setResultsName('hostname').setName('FQDomain')

      ##Option Directive
      option=Combine(CaselessKeyword('option')+OneOrMore(nonTerms)+sSemi).setName('Option')

      ##Generic Options
      genOption=Combine(OneOrMore(nonSpecials)+sSemi).setName('Generic')#.setDebug()

      ##Options
      algorithm=Group(CaselessKeyword('algorithm').suppress()+Word(alphanums+'-_')+sSemi).setResultsName('algorithm').setName('algorithm')
      secret=Group(CaselessKeyword('secret')+quotedString+sSemi).setResultsName('secret').setName('secret')
      range=Group(CaselessKeyword('range')+ipAddr+ipAddr+sSemi).setResultsName('range').setName('range')

      ## The big declarations need to come at the end
      ##Host Declaration
      hostDeclaration= Group(CaselessKeyword('host').suppress() + fqDomainName + sLbrace + CaselessKeyword('hardware ethernet').suppress() +  macAddr + sSemi + CaselessKeyword('fixed-address').suppress() + ipAddr + sSemi + sRbrace).setName('hostDec')#.setDebug()

      ##Key Declaration
      keyDeclaration=Group(CaselessKeyword('key').suppress()+fqDomainName+sLbrace+(algorithm & secret)+sRbrace).setName('KeyDec')

      ##Zone Declaration
      zoneDeclaration=Group(CaselessKeyword('zone').suppress()+fqDomainName+sLbrace+OneOrMore(genOption)+sRbrace).setName('zoneDec')

      ##Subnet Declaration
      subnetDeclaration=Group(CaselessKeyword('subnet').suppress()+ipAddr.setName('subnet')+CaselessKeyword('netmask').suppress()+ipAddr.setName('netmask')+sLbrace+OneOrMore(option|genOption|range|hostDeclaration)+sRbrace).setName('subnetDec')#.setDebug()

      ##Vlan Declaration
      vlanDeclaration=Group(CaselessKeyword('shared-network').suppress()+Word(alphanums).setName('vlan')+sLbrace+OneOrMore(option|genOption|subnetDeclaration|hostDeclaration)+sRbrace)

      ##Main Declarations
      opts=option | genOption
      decl=hostDeclaration | keyDeclaration | zoneDeclaration | subnetDeclaration | vlanDeclaration
      stmt=Forward()
      expr=decl| opts
      stmt<< (expr+OneOrMore(expr))+stringEnd
      parser=stmt.setName('parser')#.setDebug()
      parser.ignore(comment)

      if __name__=='__main__':
          debug=False
          start=now()
          try:
              testFileName='docs/dhcpd.conf'
              tokens=parser.parseFile(testFileName)
              finish=now()
              for aToken in tokens:
                  print '--',aToken
          except ParseException, error:
              # record the end time here too, so the timing line below still works
              finish=now()
              print "Parse Error on :", error.line
              if error.msg=='Expected stringEnd' and not debug:
                  print "Unrecognized Statement at lineno(%s)" % error.lineno
              else:
                  print '%s at   lineno(%s) col(%s)' % (error.msg,error.lineno,error.column)

          print "File parsed in %s" % (finish-start)
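The stringEnd technique from this last post can be seen in isolation with a toy grammar. The sketch below uses a hypothetical `statement` expression rather than the dhcpd.conf grammar. Appending stringEnd means the repetition is only accepted if it consumed the whole input, so the first unrecognized statement raises a ParseException instead of ending the parse quietly.

```python
from pyparsing import (OneOrMore, ParseException, Suppress, Word,
                       alphanums, stringEnd)

# Hypothetical toy statement: one bare word followed by a semicolon.
statement = Word(alphanums + "-") + Suppress(";")
strict_parser = OneOrMore(statement) + stringEnd

text = "authoritative;\nasdf bomb here\nddns-ttl;\n"

try:
    strict_parser.parseString(text)
    error_line = None
except ParseException as error:
    # lineno and line locate the first statement the grammar rejected.
    error_line = error.lineno
    print("Unrecognized statement at line %d: %r" % (error.lineno, error.line))
```

The sketch does not compare `error.msg` against a fixed string, because the wording of the stringEnd message differs between pyparsing versions. Later pyparsing releases also accept `parseAll=True` on parseString and parseFile, which has a similar effect without adding stringEnd to the grammar.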

       
