Re: [Pyparsing] How do fix this?
From: Paul M. <pt...@au...> - 2015-10-31 15:11:45
First of all, it is not necessary (and probably not even helpful) to define every element of your grammar as an instance variable, preceded by 'self.'. In nearly all cases, when I define a grammar within a parsing class, I'll do something like this:

    from pyparsing import Literal, oneOf

    class BobParser(object):
        def __init__(self):
            expr1 = Literal("Bob's your")
            expr2 = oneOf("uncle aunt brother sister father mother")
            self.parser = expr1 + expr2('relation')

        def who_is_it(self, string):
            return self.parser.parseString(string).relation

All the sub-expressions can just be local variables, and save the "self." business for just the top-level parser.

This really looks like you just started at the beginning of your input text and started writing pyparsing expressions to it. Many times there's nothing wrong with that, but in your case, there is a lot of complexity and structure to your input. And more importantly, there are many internal patterns that are repeated - these can be defined once and then reused by name over and over.

I often encourage people to write a BNF for complex data like this. Or at *least* look at the input at an overall level for which bits are common and can be reused. There are many small pieces that can be defined and reused in larger parts, and these will help simplify your grammar.

For instance, there are many places where you use this:

    Combine(self.plain_number + 'x' + self.plain_number, adjacent=False).setResultsName('something')

which really is challenging to the eye to see what is the parser and what is the meta-information (adjacency, results name). And you repeat this whenever you need a "0 x 0" or "640 x 480", so it makes things very messy. If you define for yourself this reusable expression:

    number_x_number = Combine(plain_number + 'x' + plain_number, adjacent=False)

Then you can use it in multiple places as:

    number_x_number("clean_aperature")
    number_x_number("Dimensions")

etc.

I also strongly recommend using the short-cut version of expr.setResultsName('xyz'), as expr('xyz').
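
To make the reuse idea concrete, here is a minimal sketch. It assumes plain_number is nothing more than Word(nums), and the input line is made up for illustration, so adjust both to match your real data:

```python
from pyparsing import Combine, Word, nums

# Assumption: plain_number is a simple run of digits.
plain_number = Word(nums)

# Define the repeated "N x N" pattern once...
number_x_number = Combine(plain_number + 'x' + plain_number, adjacent=False)

# ...then reuse it under different results names.
# The call form and setResultsName() do the same thing.
dimensions = 'Dimensions:' + number_x_number('Dimensions')
clean_aperture = 'CleanAperture:' + number_x_number.setResultsName('CleanAperture')

line_parser = dimensions + clean_aperture

# Made-up input line, not taken from your file.
result = line_parser.parseString('Dimensions: 640 x 480 CleanAperture: 0 x 0')
print(result.Dimensions)     # 640x480
print(result.CleanAperture)  # 0x0
```

Both spellings attach the same results name; expr('xyz') is just the shorter way to write it.
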
This notation can really clean up your grammar definition, and make it easier to see the overall parser without being distracted by all the function calls.

Finally, you make heavy use of "Combine(something + something + something, adjacent=False)". Please consider using Group instead. It's clearer to follow, it implicitly allows whitespace (so adjacent=False is not necessary), and it allows you to define results names within the group, making for a useful substructure (as in sample_time below):

    number_x_number = Group(plain_number + 'x' + plain_number)
    number_sl_number = Group(plain_number + '/' + plain_number)
    word_sl_word = Group(word + '/' + word)
    timestamp = Regex(r'\d\d:\d\d:\d\d\.\d\d\d')
    sample_time = Group(number_sl_number('sample') + timestamp('time'))

So just going through and stripping out all the "self." stuff and using some of these repeated sub-expressions might make things easier to follow, and your overall intent and structure will be clearer. Here is one area of your parser with these changes:

    audio_dimensions = 'Dimensions:' + number_x_number('Dimensions')
    audio_track_matrix = 'Track Matrix:' + restOfLine('track_matrix')
    audio_track_dimensions = audio_dimensions + audio_track_matrix

    subtitle_dimensions = 'Dimensions:' + number_x_number('Dimensions')
    subtitle_track_matrix = 'Track Matrix:' + restOfLine('track_matrix')
    subtitle_track_dimensions = subtitle_dimensions + subtitle_track_matrix

    video_dimensions = 'Dimensions:' + number_x_number('Dimensions')
    video_clean_aperture = 'CleanAperture: ' + number_x_number('CleanAperture')
    video_production_aperture = 'ProductionAperture:' + number_x_number('ProductionAperture')
    video_encoded_pixels = 'EncodedPixels:' + number_x_number('EncodedPixels')

Now you can answer some other questions for yourself, like "why do I repeat the 'Dimensions:' expression?", and further simplify your grammar.

Good luck,
-- Paul
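
P.S. On the Combine versus Group point, here is a minimal sketch of the difference in what you get back. As before, plain_number is assumed to be Word(nums), and the sample text is invented for the example rather than taken from your input:

```python
from pyparsing import Combine, Group, Regex, Word, nums

# Assumption: plain_number is a simple run of digits.
plain_number = Word(nums)

# Combine glues the matched pieces into one string.
combined = Combine(plain_number + '/' + plain_number, adjacent=False)
print(combined.parseString('12 / 300')[0])  # 12/300

# Group keeps the pieces separate and can carry nested results names.
number_sl_number = Group(plain_number + '/' + plain_number)
timestamp = Regex(r'\d\d:\d\d:\d\d\.\d\d\d')
sample_time = Group(number_sl_number('sample') + timestamp('time'))

# Made-up sample text; the first token is the outer group.
st = sample_time.parseString('12 / 300 00:00:01.250')[0]
print(st.sample.asList())  # ['12', '/', '300']
print(st.time)             # 00:00:01.250
```

With Group you can still reach the individual numbers afterwards, whereas Combine leaves you with a single string that you would have to split again yourself.
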