
Problems with parsing an HTML string

mironcaius
2009-10-20
2013-01-03
  • mironcaius

    mironcaius - 2009-10-20

    Hello,
    I have to say it is the best Java HTML parser, even better than Mozilla's mozilla2java.

    I am using the SWT Browser and trying to display your structure in a JTree.
    I have the HTML string, but how can I tell the Source to parse it? You only have constructors for:
    Source(final CharSequence text)
    Source(final EncodingDetector encodingDetector)
    Source(final Reader reader, final String encoding)
    Source(final CharSequence sourceText, final StreamedParseText streame…
    If I pass the URL, it works great.

     
  • Martin Jericho

    Martin Jericho - 2009-10-20

    Use the Source(CharSequence) constructor.
    Cheers
    Martin
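
A minimal sketch of that suggestion, assuming the Jericho HTML Parser jar is on the classpath. A String is a CharSequence, so it can be passed to the Source(CharSequence) constructor directly. The class name HtmlTreeSketch, the helper names buildTree and toNode, and the "document" root label are illustrative assumptions, not part of the library. The resulting node can be handed to a JTree.

```java
import javax.swing.tree.DefaultMutableTreeNode;

import net.htmlparser.jericho.Element;
import net.htmlparser.jericho.Source;

class HtmlTreeSketch {

    // Parses an in-memory HTML string and mirrors its element hierarchy
    // as tree nodes labelled with the element names.
    public static DefaultMutableTreeNode buildTree(String htmlText) {
        Source source = new Source(htmlText); // String implements CharSequence
        DefaultMutableTreeNode root = new DefaultMutableTreeNode("document"); // hypothetical root label
        for (Element child : source.getChildElements()) {
            root.add(toNode(child));
        }
        return root;
    }

    // Recursively converts one element and its child elements into tree nodes.
    public static DefaultMutableTreeNode toNode(Element element) {
        DefaultMutableTreeNode node = new DefaultMutableTreeNode(element.getName());
        for (Element child : element.getChildElements()) {
            node.add(toNode(child));
        }
        return node;
    }

    public static void main(String[] args) {
        DefaultMutableTreeNode root = buildTree("<html><body><p>Hello</p></body></html>");
        System.out.println("Top-level elements: " + root.getChildCount());
    }
}
```

To display the result, something like new JTree(HtmlTreeSketch.buildTree(htmlText)) should be enough for a first look.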

     
  • mironcaius

    mironcaius - 2009-10-20

    I tried Source source = new Source(htmlText); where htmlText comes from
    org.eclipse.swt.browser.Browser.getText():
    String org.eclipse.swt.browser.Browser.getText()
    Returns a string with HTML that represents the content of the current page.

    It does not work. I have tested with ebay.com and google.com. Both work if I get the content directly from the URL using
    Source(final URL url), but when I try to get it from the string, it fails.

    Thanks Martin.

     
  • Martin Jericho

    Martin Jericho - 2009-10-20

    I don't see a getText() method in the Browser class.

    What do you mean by "it doesn't work"?  What error are you getting?

     
  • mironcaius

    mironcaius - 2009-10-20

    This method exists, but it is not correctly implemented:
    https://bugs.eclipse.org/bugs/show_bug.cgi?id=107142
    It is converting some > to < and &gt;. I will have to see how to fix this problem.
    A short question: do you thread Tbody tags? I ask because I did not see any such tag in my testing of the valid display.
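
To narrow down where the string returned by the browser diverges from the text fetched from the URL, a small comparison helper can be used. This is only a diagnostic sketch using the standard library. The class name FirstDifference, its method names, and the two sample strings in main are made up for illustration and are not taken from the bug report.

```java
class FirstDifference {

    // Returns the index of the first differing character, or -1 if both are equal.
    // If one text is a prefix of the other, the length of the shorter one is returned.
    public static int firstDifference(CharSequence a, CharSequence b) {
        int shared = Math.min(a.length(), b.length());
        for (int i = 0; i < shared; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        return a.length() == b.length() ? -1 : shared;
    }

    // Returns up to `radius` characters on each side of `index` for display.
    public static String context(CharSequence text, int index, int radius) {
        int start = Math.max(0, index - radius);
        int end = Math.min(text.length(), index + radius);
        return text.subSequence(start, end).toString();
    }

    public static void main(String[] args) {
        // Hypothetical inputs: the second one has a '>' replaced by '<'.
        String fromUrl = "<p class=\"a\">1 &gt; 0</p>";
        String fromBrowser = "<p class=\"a\"<1 &gt; 0</p>";
        int i = firstDifference(fromUrl, fromBrowser);
        System.out.println("First difference at index " + i);
        System.out.println("URL text:     " + context(fromUrl, i, 10));
        System.out.println("Browser text: " + context(fromBrowser, i, 10));
    }
}
```

Printing the surrounding context for both inputs makes it easier to see which characters were altered before the text reaches the parser.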

     
  • Martin Jericho

    Martin Jericho - 2009-10-22

    I'm not sure what you mean by threading Tbody tags.

     
  • mironcaius

    mironcaius - 2009-10-23

    Sorry about that. I meant to ask whether you also parse Tbody tags, because in my testing you skipped over those.
    Thanks.
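
One way to check this is to count the tbody elements the parser reports for a small table, again assuming the Jericho jar is on the classpath. The sketch relies on the assumption that the parser reports only tags that are literally present in the source text and does not add the implied tbody that a browser inserts into its DOM. If that holds, a table written without an explicit tbody tag would show none. The class name TbodyCheck and the method countTbody are illustrative.

```java
import java.util.List;

import net.htmlparser.jericho.Element;
import net.htmlparser.jericho.HTMLElementName;
import net.htmlparser.jericho.Source;

class TbodyCheck {

    // Counts the tbody elements found in the given HTML text.
    public static int countTbody(String htmlText) {
        Source source = new Source(htmlText);
        List<Element> bodies = source.getAllElements(HTMLElementName.TBODY);
        return bodies.size();
    }

    public static void main(String[] args) {
        String explicit = "<table><tbody><tr><td>x</td></tr></tbody></table>";
        String implied = "<table><tr><td>x</td></tr></table>";
        System.out.println("explicit tbody: " + countTbody(explicit));
        System.out.println("implied tbody:  " + countTbody(implied));
    }
}
```

Running both inputs side by side shows whether the missing tags come from the parser or from the markup that was fed to it.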

     
