#3 Parser crashes anytime I search for something with an ampersand

open
5
2007-12-08
2007-12-08
Anonymous
No

Repro:
import sys
import yahoo.search.web

srch = yahoo.search.web.SpellingSuggestion(app_id)
srch.query='Boonock Saints & test'
#srch.results = 10

dom = srch.get_results()
results = srch.parse_results(dom)

for res in results:
    print res

>python -u "testYahoo.py"
Traceback (most recent call last):
  File "testYahoo.py", line 15, in <module>
    dom = srch.get_results()
  File "/usr/lib/python2.5/site-packages/yahoo/search/__init__.py", line 743, in get_results
    res = xml_parser(stream)
  File "/usr/lib/python2.5/site-packages/_xmlplus/dom/minidom.py", line 1915, in parse
    return expatbuilder.parse(file)
  File "/usr/lib/python2.5/site-packages/_xmlplus/dom/expatbuilder.py", line 930, in parse
    result = builder.parseFile(file)
  File "/usr/lib/python2.5/site-packages/_xmlplus/dom/expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 2, column 227
>Exit code: 1

My email address is bsrour@gmail.com
I am using the 3.1 release.

Discussion

  • Leif Hedstrom

    Leif Hedstrom - 2007-12-08

    Logged In: YES
    user_id=480913
    Originator: NO

    I suspect this is a problem with the Y! web service: it ought to return the XML with the '&' encoded as '&amp;', but if you look at the XML, you'll see it's just a bare '&'. Let me check with Jason on this one, and see if he's got any input.

    Thanks for the report!

    -- leif

    Example:

    curl 'http://search.yahooapis.com/WebSearchService/V1/spellingSuggestion?query=Boonock+Saints+%26+test&appid=foo'

    <?xml version="1.0" encoding="UTF-8"?>
    <ResultSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:yahoo:srch" xsi:schemaLocation="urn:yahoo:srch http://api.search.yahoo.com/WebSearchService/V1/WebSearchSpellingResponse.xsd">
    <Result>Boondock Saints & test</Result>
    </ResultSet>
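
The failure can be reproduced without the web service by feeding the same kind of response straight to the standard library parser. The snippet below is a minimal sketch, not code from the library: the response is a simplified copy of the one above (namespace attributes dropped), and `try_parse` is a hypothetical helper introduced only for illustration. It is written to run on both Python 2 and Python 3.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Simplified copy of the response body shown above (assumption: the
# namespace attributes are irrelevant to the failure and are omitted).
RAW = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<ResultSet>\n'
    '<Result>Boondock Saints & test</Result>\n'
    '</ResultSet>'
)


def try_parse(text):
    """Hypothetical helper: return the first <Result> text, or None if
    the document is not well-formed XML."""
    try:
        dom = minidom.parseString(text)
    except ExpatError:
        return None
    return dom.getElementsByTagName("Result")[0].firstChild.data


# The bare '&' is rejected by expat, so parsing fails.
malformed = try_parse(RAW)

# With the ampersand written as an entity reference, the same document parses.
wellformed = try_parse(RAW.replace(" & ", " &amp; "))

print(malformed)
print(wellformed)
```

The first call fails for the same reason as the traceback in the report; the second shows that the parser itself is fine once the ampersand is encoded.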

     
  • Nobody/Anonymous

    Logged In: NO

    I suspect this is a "bug" in the Y! web service. I've asked them to look into this; they really should use a character entity (&amp;) and not just '&' in the response.

     
  • Leif Hedstrom

    Leif Hedstrom - 2008-08-24

    Logged In: YES
    user_id=480913
    Originator: NO

    Fwiw, I've reported this bug twice to the Y! search developers. Still no solution afaik.
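
Until the service is fixed, one possible client-side workaround is to escape stray ampersands in the response text before handing it to the XML parser. The following is only a sketch under stated assumptions: `escape_bare_ampersands` and `BARE_AMP` are hypothetical names, not part of the yahoo.search package, and the sample response is shortened. The regular expression is a heuristic; it would also rewrite a literal '&' inside a CDATA section, which this response does not appear to use.

```python
import re
from xml.dom import minidom

# Matches an '&' that does NOT begin a named (&amp;), decimal (&#38;)
# or hexadecimal (&#x26;) reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(xml_text):
    """Hypothetical helper: replace stray '&' with '&amp;' and leave
    already-valid references untouched."""
    return BARE_AMP.sub("&amp;", xml_text)


# Shortened sample response: one bare '&' and one already-escaped '&amp;'.
raw = "<ResultSet><Result>Boondock Saints & test &amp; more</Result></ResultSet>"

fixed = escape_bare_ampersands(raw)
dom = minidom.parseString(fixed)
suggestion = dom.getElementsByTagName("Result")[0].firstChild.data

print(fixed)
print(suggestion)
```

In the reporter's script this kind of cleanup would have to run on the raw response before `get_results()` parses it, so it would need to be applied inside the library or by fetching the URL directly.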

     
