The query [pos="PP$"] matches the Penn tag PP$, but the queries [pos="PP$"%c] and [pos="PP$|PP$"] match PP instead.
The reason, of course, is that the first query is matched as a literal string rather than a regexp, so $ isn't interpreted as a metacharacter anchoring the regexp at end-of-string. CQP heuristically checks for metacharacters in do_flagged_string() (in cqp/parse_actions.c), but the list doesn't include the “useless” metacharacters $ and ^. This raises three questions:
1. Should we change the behaviour to make the three queries consistent? This might break existing applications (and users) that have unwittingly relied on the current inconsistent behaviour.
2. If we do, perhaps the current list "[](){}.*+|?\\" has further gaps?
3. Is do_flagged_string() the only place where this test is run, or do we need to patch other functions as well?
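
To make the gap concrete, here is a minimal sketch of the kind of heuristic described above. It is not the actual code from do_flagged_string(); the function name looks_like_regex() and both macro names are hypothetical, and the sketch assumes the quoted list is a C string literal (so \\ stands for a single backslash).

```c
#include <stdbool.h>
#include <string.h>

/* Metacharacter list as quoted in this ticket. Assumption: it is a C string
 * literal, so "\\" denotes one backslash character. */
#define METACHARS_QUOTED "[](){}.*+|?\\"

/* Hypothetical extended list that also covers the anchors ^ and $. */
#define METACHARS_WITH_ANCHORS METACHARS_QUOTED "^$"

/* Returns true if s contains at least one character from metachars, i.e. if
 * a heuristic of this kind would treat s as a regexp instead of a literal. */
bool looks_like_regex(const char *s, const char *metachars)
{
    return strpbrk(s, metachars) != NULL;
}
```

With the quoted list, looks_like_regex("PP$", METACHARS_QUOTED) is false, so the string would be compared literally, whereas "PP$|PP$" contains | and would be handled as a regexp. With the extended list, "PP$" is classified as a regexp too, which is the consistency asked about in question 1.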
Fixed in r1705. The macro CL_REGEX_METACHARACTERS is defined in cl/cl.h (and its documentation explains that, despite the name, it is not a list of all "unsafe" characters).