
#3 multi-thread support

open
None
5
2010-06-01
2010-05-30
No

Could multi-thread support be added to NekoHTML?

Discussion

  • Jacob Kjome

    Jacob Kjome - 2010-05-30

    Can you expand on what you mean and how it might be beneficial?

     
  • shirley wilder

    shirley wilder - 2010-05-31

    I want to test my site with HtmlUnit using nearly 1000 threads. But it can only parse HTML with one thread using NekoHTML, so it is slow. I think it would be wonderful if NekoHTML supported multi-threading.
    Thanks!

     
  • Marc Guillemot

    Marc Guillemot - 2010-05-31

    Can you explain a bit what you want to do? For me it doesn't make sense to use more than one thread to parse a single document, but x threads can be used to parse x documents.
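
One way to picture "x threads for x documents" is a pool in which every task builds its own parser, so that no parser state is shared between threads. The sketch below is only an illustration under stated assumptions: it uses the JDK's built-in XML parser as a stand-in for the NekoHTML parser so that it runs without extra jars, and `ParallelParseSketch`, `parseRootName`, and `parseAll` are hypothetical names that belong to neither NekoHTML nor HtmlUnit.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: the JDK XML parser stands in for the NekoHTML parser.
class ParallelParseSketch {

    // Parses one document with a parser created for this call only,
    // so no parser instance is ever shared between threads.
    static String parseRootName(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(markup)));
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Submits one task per document to a fixed-size pool and
    // returns the root element names in input order.
    static List<String> parseAll(List<String> documents, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String markup : documents) {
                Callable<String> task = () -> parseRootName(markup);
                futures.add(pool.submit(task));
            }
            List<String> roots = new ArrayList<>();
            for (Future<String> future : futures) {
                roots.add(future.get()); // waits for that task to finish
            }
            return roots;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The same shape should carry over to NekoHTML, assuming each task creates its own parser instance rather than sharing one.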

     
  • Marc Guillemot

    Marc Guillemot - 2010-05-31
    • status: open --> pending
     
  • shirley wilder

    shirley wilder - 2010-05-31

    I use one thread to parse one HTML document, and it takes 5 s.
    What I want is to parse 1000 HTML documents using 1000 threads, with the total time also being 5 s or a little more.
    Thanks!
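
Whether 1000 threads can finish in about the time of one depends on what the 5 s consists of, which the thread does not say. As a rough worked example, assuming the work is purely CPU-bound and every document costs the same, the tasks can only run in "waves" of at most one per core, which gives a lower bound on the total time. `ThroughputEstimate` and `minSeconds` are hypothetical names used only for this arithmetic.

```java
// Hypothetical helper: a lower bound for equal-cost, CPU-bound tasks.
class ThroughputEstimate {

    // At most `cores` tasks make progress at once, so the tasks run in
    // ceil(tasks / cores) waves, each lasting `secondsPerTask`.
    static long minSeconds(int tasks, int cores, long secondsPerTask) {
        long waves = (tasks + cores - 1L) / cores; // ceiling division
        return waves * secondsPerTask;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores=" + cores
                + " minSeconds=" + minSeconds(1000, cores, 5));
    }
}
```

Under these assumptions, 1000 documents at 5 s each on 8 cores need at least 625 s, however many threads are started. If most of the 5 s is spent waiting (for example on network I/O), the bound does not apply and more threads than cores can help.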

     
  • Marc Guillemot

    Marc Guillemot - 2010-05-31

    Sorry, but I still don't understand what the problem is. Can you explain more precisely?

     
  • shirley wilder

    shirley wilder - 2010-05-31
    • status: pending --> open
     
  • shirley wilder

    shirley wilder - 2010-05-31

    Err.
    I have changed the HTML parser class to be thread-safe and converted all static methods to instance methods, but it still blocks. Sometimes one thread takes several minutes to parse the HTML.
    I have traced the code and found that the problem is in NekoHTML.
    I haven't read the NekoHTML source code, but I think NekoHTML or one of its dependencies uses a synchronized static method that takes a very long time, so many threads block there.

     
  • Marc Guillemot

    Marc Guillemot - 2010-06-01

    Can you provide more information? Which static methods are synchronized and causing problems?

    Looking at the source code of NekoHTML, I could only find one synchronized block (in ObjectFactory) and I'm quite sure that it can't be the cause of performance problems.
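
One way to answer the question of which lock is involved, rather than guessing, is to ask the JVM which threads are in the BLOCKED state and what they are waiting for. The sketch below uses the standard `java.lang.management` API; `BlockedThreadReport` and `blockedThreads` are hypothetical names, and the sketch assumes it is called from inside the test process while the slowdown is happening. A thread dump taken with the JDK's `jstack` tool gives similar information from outside the process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of NekoHTML or HtmlUnit.
class BlockedThreadReport {

    // Returns one line per thread that is currently BLOCKED on a monitor:
    // the monitor it waits for, the thread holding it, and the top stack frame.
    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            String top = stack.length > 0 ? stack[0].toString() : "(no stack)";
            lines.add(info.getThreadName()
                    + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName()
                    + " at " + top);
        }
        return lines;
    }
}
```

If many threads report the same lock name and the same top frame, that frame identifies the synchronized code to look at. An empty list would suggest the threads are slow for another reason, such as CPU saturation or waiting on I/O.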

     
  • Marc Guillemot

    Marc Guillemot - 2010-06-01
    • status: open --> pending
     
  • shirley wilder

    shirley wilder - 2010-06-01

    I have modified the code of HTMLParser.java so that it supports multi-threaded access.
    When I test it with 1000 threads, it blocks here:

    long start=System.currentTimeMillis();
    super.parse(inputSource);
    System.out.println("parse time:"+(System.currentTimeMillis()-start));

    The super.parse call comes from NekoHTML, so I think that is where the problem is.
    But I haven't read the code, so I really don't know the cause.
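
A wall-clock measurement like the one above cannot tell a thread that is parsing from one that is waiting for a lock or for a free CPU. A possible refinement, sketched here with the standard `ThreadMXBean` API, records the CPU time the calling thread consumed alongside the elapsed time. `ParseTiming`, `Sample`, and `measure` are hypothetical names, and the `Runnable` is assumed to wrap the `super.parse(inputSource)` call.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical timing helper, not part of NekoHTML or HtmlUnit.
class ParseTiming {

    // Wall-clock and CPU time in nanoseconds for one measured call.
    // cpuNanos is -1 when the JVM cannot report per-thread CPU time.
    static final class Sample {
        final long wallNanos;
        final long cpuNanos;

        Sample(long wallNanos, long cpuNanos) {
            this.wallNanos = wallNanos;
            this.cpuNanos = cpuNanos;
        }
    }

    // Runs `work` on the calling thread and measures elapsed time and the
    // CPU time consumed by this thread while it ran.
    static Sample measure(Runnable work) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        boolean cpuOk = mx.isCurrentThreadCpuTimeSupported()
                && mx.isThreadCpuTimeEnabled();
        long cpuStart = cpuOk ? mx.getCurrentThreadCpuTime() : 0L;
        long wallStart = System.nanoTime();
        work.run();
        long wall = System.nanoTime() - wallStart;
        long cpu = cpuOk ? mx.getCurrentThreadCpuTime() - cpuStart : -1L;
        return new Sample(wall, cpu);
    }
}
```

If the wall-clock time is much larger than the CPU time, the thread spent most of the call waiting (on a lock, on I/O, or for a core) rather than parsing. If the two are close, the parse itself is the cost and adding threads beyond the number of cores is unlikely to help.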

     
  • shirley wilder

    shirley wilder - 2010-06-01
    • status: pending --> open
     
  • shirley wilder

    shirley wilder - 2010-06-01

    HTMLParser.java

     
  • Marc Guillemot

    Marc Guillemot - 2010-06-01

    You seem to be talking about HtmlUnit, not NekoHTML. If you're able to fix your problem by changing HtmlUnit, then it should probably be changed in HtmlUnit, not in NekoHTML.

    I'm a bit tired of trying to guess what you want and therefore I'm closing this issue. Please reopen it when you're able to provide *precise* information.

     
  • Marc Guillemot

    Marc Guillemot - 2010-06-01
    • assigned_to: nobody --> mguillem
    • status: open --> closed
     
  • shirley wilder

    shirley wilder - 2010-06-01

    I'm sorry to trouble you so much.
    I don't think I explained what I really want to do.
    I have to test my site against more than ten thousand data items with HtmlUnit. If I test them one by one, one complete test run takes several hours, so I have to use a multi-threaded model to save time.
    But even with 1000 threads, it still takes several hours. I traced the code and found that all threads block at:

    long start=System.currentTimeMillis();
    super.parse(inputSource);
    System.out.println("parse time:"+(System.currentTimeMillis()-start));

    This method comes from NekoHTML, so I think this is the problem.
    What I really want is for every thread to be able to parse independently. If so, I could finish one complete test run in several minutes.

    Thanks.

     
  • shirley wilder

    shirley wilder - 2010-06-01
    • status: closed --> open
     
  • Marc Guillemot

    Marc Guillemot - 2010-06-01

    I think that this should be discussed on the HtmlUnit mailing lists until it is clear that it is a NekoHTML issue, because otherwise we end up discussing HtmlUnit classes here, for instance: do you use one WebClient per thread?

     
  • shirley wilder

    shirley wilder - 2010-06-01

    Yes, I use one WebClient per thread.
    OK, I'll post a thread on the HtmlUnit mailing list.
    Thanks again!

     
