-
You could, but I should think that this would be far simpler in straight Python, something like:
with open(htmlsource) as infile:
    src = infile.read()
# keep only characters that fit in a single byte (code points 0-255)
filtered = [c for c in src if ord(c) < 256]
with open(outfile, "w") as out:
    out.write("".join(filtered))
No?
2009-10-03 22:48:27 UTC in Python parsing module
-
ptmcg committed revision 186 to the Python parsing module SVN repository, changing 2 files.
2009-09-24 07:37:59 UTC in Python parsing module
-
Oops, should be:
name_line = restOfLine("name") + NL
2009-09-24 07:28:51 UTC in Python parsing module
-
Congratulations! You've helped me find and fix a bug in pyparsing!
There is a bug in originalTextFor, a very subtle one that causes the parser to read past trailing whitespace or newlines before matching the next element. In the case of name_line, the next element is an expected newline, so the parse fails (because originalTextFor already ate the newline). If you want to patch your version...
2009-09-24 07:16:42 UTC in Python parsing module
-
This now becomes more of a Python issue than a pyparsing one. Look up some of the string functions like encode and decode to output your strings in human-readable form.
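As a rough illustration of what encode and decode do, here is a minimal sketch in Python 3 syntax. The sample string and variable names are made up for the example, and the right codec for your data is an assumption you will need to check:

```python
# Hypothetical sample text containing non-ASCII characters.
text = "caf\u00e9 \u2603"

# encode: text -> bytes in a chosen encoding.
utf8_bytes = text.encode("utf-8")

# decode: bytes -> text again, using the same encoding.
round_trip = utf8_bytes.decode("utf-8")

# For ASCII-only output, replace anything non-ASCII with a backslash escape.
ascii_safe = text.encode("ascii", "backslashreplace").decode("ascii")

print(ascii_safe)
```

The "backslashreplace" error handler is only one option; "ignore" and "replace" are the other common choices when the output has to stay plain ASCII.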
2009-09-15 23:17:26 UTC in Python parsing module
-
Ah! I forgot one other change. Remove Group from:
record = Group((first_addr_line + ZeroOrMore(subsq_addr_line))("address") +
When you call searchString, it implicitly groups your tokens.
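To see the implicit grouping, here is a small sketch. It assumes pyparsing is installed, and the `pair` expression and sample data are stand-ins invented for the example, not your actual grammar:

```python
from pyparsing import Group, Word, alphas, nums

# Hypothetical stand-in for a record: a word followed by a number.
pair = Word(alphas) + Word(nums)
data = "ab 12 -- cd 34"

# searchString returns one sub-list per match, so matches arrive already grouped.
flat = pair.searchString(data).asList()

# Wrapping the expression in Group adds a second, redundant level of nesting.
nested = Group(pair).searchString(data).asList()

print(flat)
print(nested)
```

With the extra Group, every record has to be reached through one more index, which is why dropping it keeps the results easier to work with.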
2009-09-15 12:54:08 UTC in Python parsing module
-
(This is a follow-up to [this question][1] on Stack Overflow.)
The simplest is to change this line:
records = OneOrMore(record).parseString(data)
to this:
records = record.searchString(data)
This is a little risky, since it will read any '.' anywhere as the beginning of first_addr_line. So to make sure you only match '.' at the start of a line, change first_addr_line...
2009-09-15 04:26:24 UTC in Python parsing module
-
Well, oddly enough, when I step through your submitted code using the debugger, your unexpected results are actually the correct ones. You have found an unusual case in which a parser defined as "A & B" behaves differently from "B & A".
The basic implementation of Each works like this (I've left out special handling if there are OneOrMore or ZeroOrMore...
2009-09-14 02:28:50 UTC in Python parsing module
-
If you send me a patch file, I'd be happy to test your change and let you know the outcome. The Each class does get a bit touchy around Optionals and repetitive classes like OneOrMore and ZeroOrMore.
Yes, there is a collection of unit tests, but I've not published them because some of them test equipment data files that are proprietary in format. This has come up often enough, though, that...
2009-09-13 13:04:18 UTC in Python parsing module
-
Catherine -
Thanks for reporting this. It is a known bug, and I have a fix in the works, courtesy of Alex Martelli.
-- Paul.
2009-09-08 07:41:51 UTC in Python parsing module