-
You could, but I should think that this would be far simpler in straight Python, something like:
with open(htmlsource) as infile:
    src = infile.read()
# keep only characters whose code point is below 256
filtered = [c for c in src if ord(c) < 256]
with open(outfile, "w") as out:
    out.write("".join(filtered))
No?
2009-10-03 22:48:27 UTC by ptmcg
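The same idea can be sketched for Python 3, where strings are already Unicode. This is only an illustration: strip_above and its limit parameter are hypothetical names, not part of pyparsing or of the reply above.

```python
def strip_above(text, limit=256):
    """Return text without characters whose code point is >= limit.

    Hypothetical helper for illustration; limit=128 keeps ASCII only.
    """
    return "".join(c for c in text if ord(c) < limit)


sample = "price \u25c4 10"               # contains U+25C4, as in the question
cleaned = strip_above(sample)            # U+25C4 is dropped, both spaces remain
ascii_only = strip_above("caf\u00e9", limit=128)  # the accented e is dropped
```

Passing limit=128 is the stricter choice when the output device only handles ASCII.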
-
Hi,
Can I use pyparsing to remove unwanted characters from a string? For example, I have an XML message that I need to parse that contains characters like u'\u25c4'. I can't seem to print these characters, so can I remove them? Or should I use something other than pyparsing?
2009-10-03 09:03:23 UTC by paulb123
-
Hi Paul, thanks! I patched pyparsing.py but the result is still the same. I only got 3 matches in the data set:
data = """
. BAGONG SILANG BRGY. I RABAGO
REVENA, BARANGAY I (POB.),
RABAGO, REVENA
F
06/15/1925
SAMSON, JAMES HUBILLA
1111-0001A-F1567GHA2
1
. BAGONG SILANG BRGY. I RABAGO
REVENA, BARANGAY I (POB.),
RABAGO...
2009-09-24 15:56:42 UTC by francisv
-
ptmcg committed revision 186 to the Python parsing module SVN repository, changing 2 files.
2009-09-24 07:37:59 UTC by ptmcg
-
Oops, should be:
name_line = restOfLine("name") + NL
2009-09-24 07:28:51 UTC by ptmcg
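A minimal sketch of how the corrected name_line might combine with the NL and gender definitions quoted elsewhere in this thread. It assumes pyparsing is installed, and the two-line record and the "gender" results name are invented for illustration; the poster's full grammar is not shown in the thread.

```python
from pyparsing import LineEnd, oneOf, restOfLine

NL = LineEnd().suppress()             # match a newline, keep it out of the results
gender = oneOf("M F")
name_line = restOfLine("name") + NL   # everything up to the newline, named "name"

# Hypothetical two-line record: a gender line followed by a name line.
record = gender("gender") + NL + name_line

result = record.parseString("F\nSAMSON, JAMES HUBILLA\n")
print(result["gender"], "|", result["name"])
```

Because restOfLine stops at the newline without consuming it, the explicit NL after it is what moves the parser on to the next line.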
-
Congratulations! You've helped me find and fix a bug in pyparsing!
There is a bug in originalTextFor, a very subtle one that causes the parser to read past trailing whitespace or newlines before matching the next element. In the case of name_line, the next element is an expected newline, so the parse fails (because originalTextFor already ate the newline). If you want to patch your version...
2009-09-24 07:16:42 UTC by ptmcg
-
I was able to write it back to unicode :) However, it seems that it's not matching this block:
. BAGONG SILANG BRGY. I ALAMINOS
RAMONES, BARANGAY I (POB.),
RAMONES, TREVOR
F
02/10/1960
ZAMORA, GRACE SAMSON
3401-2221A-F1111GHA2
1
Here's a snippet of the code I'm using.
NL = LineEnd().suppress()
gender = oneOf("M F")
integer =...
2009-09-23 23:51:17 UTC by francisv
-
This now becomes more of a Python issue than a pyparsing one. Look up some of the string functions like encode and decode to output your strings in human-readable form.
2009-09-15 23:17:26 UTC by ptmcg
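As a rough illustration of the encode and decode calls mentioned above, here is a Python 3 sketch (the thread itself predates it, so treat this as an assumption about a modern setup). The sample string is invented.

```python
text = "left \u25c4 arrow"

# Escape anything outside ASCII so it can be printed on any terminal.
escaped = text.encode("ascii", "backslashreplace").decode("ascii")

# Or substitute a question mark for each character that cannot be encoded.
replaced = text.encode("ascii", "replace").decode("ascii")

# UTF-8 can represent every character, so this round trip loses nothing.
round_trip = text.encode("utf-8").decode("utf-8")

print(escaped)
print(replaced)
```

The error-handler argument ("backslashreplace", "replace", or "ignore") decides what happens to characters the target encoding cannot represent.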
-
I submitted a fix for this in https://sourceforge.net/tracker/?func=detail&aid=2859467&group_id=97203&atid=617313
When a ParserElement contains a group, .transformString returns an incorrect result that has a 'stringified list' contained within it.
For instance, try this code:
from pyparsing import *
# function for use with setParseAction, so we can see what transformString is doing...
2009-09-15 18:19:16 UTC by barnabas79
-
When a ParserElement contains a group, .transformString returns an incorrect result that has a 'stringified list' contained within it.
For instance, try this code:
from pyparsing import *
# function for use with setParseAction, so we can see what transformString is doing
def lowercaseResults(results):
    if isinstance(results, (ParseResults, list, tuple)):
        return...
2009-09-15 18:16:05 UTC by barnabas79