[Lxr-dev] [ lxr-Bugs-3204266 ] Fragment recognition too rudimentary (parsing)
From: SourceForge.net <no...@so...> - 2011-03-09 14:52:45
Bugs item #3204266, was opened at 2011-03-09 15:45
Message generated for change (Settings changed) made by ajlittoz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=390117&aid=3204266&group_id=27350

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Lang support
Group: None
Status: Open
Resolution: Works For Me
>Priority: 8
Private: No
Submitted By: Andre-Littoz (ajlittoz)
Assigned to: Nobody/Anonymous (nobody)
Summary: Fragment recognition too rudimentary (parsing)

Initial Comment:
LXR uses a very simple parser to isolate homogeneous fragments of a source file to ease recognition of keywords and variables. It tries to wipe out strings, comments and include constructs, so that what is left is composed only of operators and variables.

In some circumstances, the parser leaves a string before seeing its real end, e.g. in "A string with \" inside". The parser thinks "A string with \" is the string; the word inside may then be taken for a variable, and a new string is opened out of sync.

The cause is a simplistic parse based on pattern-matching with opening and closing delimiters for context. There is a need for another kind of object which keeps the parser in the same state when detected. The data matched by the regexp associated with these objects is "swallowed" and nothing else happens, because they cannot be classified as opening or closing delimiters (provided that is possible, i.e. they do not also appear as these delimiters).

For that, 'spec' of generic.conf is modified to have 4 parameters instead of 3:

'spec' =>
    [ context_name, open_pattern, close_pattern, stay_pattern,
    ...
    ]
(which, by the way, means a huge rewrite of generic.conf)

In the case of C strings, we can now have (extract only):

    'string', '"', '"', '\\\\"',

But it is not enough; we must take care that this stay_pattern is considered BEFORE close_pattern, because the latter is a prefix of the former and the expected match length of stay_pattern is longer than that of close_pattern.

Unfortunately, there is no way to guarantee that all patterns of a 'spec' can be sorted such that they will match from the most specific to the least specific. The proposed patch tries to solve that, but there will always be situations where it fails. It is a consequence of pattern-matching parsing. Using a finite state automaton only to find candidate identifiers is not really worth it.

Another, loosely related bug:
When the line is split according to all patterns of a spec, it must be determined whether an element is an opening delimiter. The answer is positive if there is an exact match, that is, if the pattern extends from the start ^ to the end $. These anchors are added to the ends of the merged regexp containing all the patterns as a sequence of alternatives. It means that only the first delimiter is constrained to start at the beginning of the string and only the last delimiter to end at the end of the string. This leads to false detection of short delimiters in the middle of strings. This is also addressed in the proposed patch.

!-37 110307 (after line 37 in sub init, add:)
!my @stay;    # Fragment maintaining current context
!-49 110307 (after line 49, add:)
!    @stay = ();
!-58,58 110307 (replace line 58 with:)
!    while (@_ = splice(@blksep, 0, 4)) {
!-61 110307 (after line 61, add:)
!    push(@stay, $_[3]);
!-74 110307 (after line 74, add:)
!
!    foreach (@stay) {
!        next if $_ eq '';
!        $split = "$_|" . $split;
!    }
------------------- end of patch -------------

in sub init & nextfrag

nextfrag sometimes erroneously matches delimiters because ^ and $ surround the merged pattern of all alternatives.
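The anchoring problem can be reproduced outside LXR. The following is only a minimal sketch: the delimiter list and the helper is_open are made up for illustration and are not taken from generic.conf or from the LXR sources.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical opening delimiters, for illustration only
# (not taken from generic.conf).
my @delims = ('/\*', '"', '//');

# Anchors around the merged alternation: ^ binds only to the first
# alternative and $ only to the last one.
my $merged = '^' . join('|', @delims) . '$';

# Anchors around each alternative: every delimiter must match as a whole.
my $each = join('|', map { "^($_)\$" } @delims);

# Returns 1 if $frag is recognised as an opening delimiter by $re, else 0.
sub is_open {
    my ($frag, $re) = @_;
    return $frag =~ /$re/ ? 1 : 0;
}

my $frag = 'say "hi';    # an ordinary fragment, not a delimiter
printf "merged: %d\n", is_open($frag, $merged);    # merged: 1 (false detection)
printf "each:   %d\n", is_open($frag, $each);      # each:   0
printf "quote:  %d\n", is_open('"', $each);        # quote:  1
```

With the merged form, the middle alternative carries no anchor at all, so a lone quote anywhere inside the fragment is enough to produce a match.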
^ and $ should be put around each alternative to ensure that the delimiter matches as a whole and not as a part of the fragment.

!-68,68 110307 init
!    $open .= "^($_)\$|";
!-138,138 110307 nextfrag
!    if ($frags[0] =~ /$open/) {
!-150,150 110307 nextfrag
!    if (defined($frag) && (@_ = $frag =~ /$open/)) {
------------------------ end of patch ---------------------------

----------------------------------------------------------------------
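As a footnote to the ordering concern raised for stay_pattern in the initial comment, here is a minimal sketch of why the stay alternative has to come first when the closing delimiter is a true prefix of it. It assumes a hypothetical language that escapes a quote by doubling it (close is ' and stay is ''); the helper scan_string is invented for this illustration and is not LXR code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical string context with doubled-quote escaping (assumption,
# not a generic.conf entry): the closing delimiter is a prefix of the
# stay pattern.
my $close = "'";
my $stay  = "''";

my $close_first = "$close|$stay";
my $stay_first  = "$stay|$close";    # same order as the patch, which prepends stay

# Returns the string literal found at the start of $src, or nothing
# if the string is never closed. $split is tried left to right.
sub scan_string {
    my ($src, $split, $close_re) = @_;
    pos($src) = 1;                   # start just after the opening quote
    while ($src =~ /($split)/g) {
        my $frag = $1;
        # A fragment that is exactly the closing delimiter ends the string;
        # anything else is a stay fragment and is simply swallowed.
        return substr($src, 0, pos($src)) if $frag =~ /^(?:$close_re)$/;
    }
    return;
}

my $src = q{'it''s' + x};
print "close first: ", scan_string($src, $close_first, $close), "\n";    # 'it'
print "stay first:  ", scan_string($src, $stay_first,  $close), "\n";    # 'it''s'
```

Perl tries alternatives from left to right at each position, so with the closing delimiter listed first the longer stay pattern never gets a chance to match.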