Can you expand on what you mean and how it might be beneficial?
I want to test my site with HtmlUnit using nearly 1000 threads, but it seems NekoHTML can only parse HTML with one thread at a time, so it is slow. It would be wonderful if NekoHTML supported multi-threading.
Thanks!
Can you explain a bit what you want to do? For me it doesn't make sense to use more than one thread to parse a document, but x threads can be used to parse x documents.
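To make the "x threads for x documents" idea concrete, here is a minimal sketch. It is only an illustration under assumptions: the JDK's built-in XML parser stands in for the NekoHTML parser (so the inputs must be well-formed markup), and the names ParallelParseSketch, parseRootName and parseAll are hypothetical. The point it shows is that each task creates its own parser instance, so no parser state is shared between threads.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: the JDK XML parser is a stand-in for an HTML parser.
class ParallelParseSketch {

    // Parses one document with a parser created for this call only,
    // so nothing is shared between threads. Returns the root tag name.
    static String parseRootName(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(markup)));
        return doc.getDocumentElement().getTagName();
    }

    // Parses every document on a fixed-size pool; results keep the input order.
    static List<String> parseAll(List<String> documents, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String d : documents) {
                tasks.add(() -> parseRootName(d));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : pool.invokeAll(tasks)) {
                results.add(f.get()); // rethrows a parse failure, if any
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = Arrays.asList(
                "<html><body><p>one</p></body></html>",
                "<html><body><p>two</p></body></html>");
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println(parseAll(docs, threads));
    }
}
```

The pool size here follows the number of available processors rather than the number of documents. Parsing is CPU-bound work, so a pool far larger than the core count would probably not shorten the total time, although that is a general assumption and not something measured in this thread.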
I use one thread to parse an HTML page, and it takes 5s.
What I want to do is parse 1000 HTML pages using 1000 threads, with the total time also being 5s or a little more.
Thanks!
Sorry, but I still don't understand what the problem is. Can you explain more precisely?
Err.
And I have changed the htmlparse class to be thread-safe and turned all static methods into instance methods, but it still blocks. Sometimes one thread takes several minutes to parse the HTML.
I have traced the code and found that the problem is in NekoHTML.
I haven't read the NekoHTML source code, but I think NekoHTML or a library it depends on uses a synchronized static method that takes a very long time, so many threads block there.
Can you provide more information? Which static methods are synchronized and causing problems?
Looking at the source code of NekoHTML, I could only find one synchronized block (in ObjectFactory) and I'm quite sure that it can't be the cause of performance problems.
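One way to answer the question of which lock is involved, without guessing, is to ask the JVM which threads are in the BLOCKED state and what they are waiting for. The following is a minimal sketch using the standard java.lang.management API; the class name BlockedThreadReport and the method blockedThreads are hypothetical, and it would have to be called from inside the test program while the slowdown is happening.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists threads that are blocked on a monitor.
class BlockedThreadReport {

    // Returns one line per BLOCKED thread, naming the lock it waits for,
    // the thread that holds it, and the top stack frame of the waiter.
    static List<String> blockedThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            String top = stack.length > 0 ? stack[0].toString() : "(no frames)";
            report.add(info.getThreadName() + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName() + " at " + top);
        }
        return report;
    }

    public static void main(String[] args) {
        // With no contention in this tiny program the report is normally empty.
        for (String line : blockedThreads()) {
            System.out.println(line);
        }
    }
}
```

If the report stays empty while the parse calls are slow, the threads are probably not waiting on a synchronized method at all, and the time is being spent elsewhere (for example in CPU contention between the 1000 threads). That reading is an assumption; the thread itself does not establish the cause.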
I have modified the code of HTMLParser.java so that it supports multi-threaded access.
When I test it with 1000 threads, it blocks here:
long start=System.currentTimeMillis();
super.parse(inputSource);
System.out.println("parse time:"+(System.currentTimeMillis()-start));
The super.parse call comes from NekoHTML, so I think that is the problem.
But I haven't read the code, so I really don't know the cause.
HTMLParser.java
You seem to be talking about HtmlUnit, not NekoHTML. If you're able to fix your problem by changing HtmlUnit, then it should probably be changed in HtmlUnit, not in NekoHTML.
I'm a bit tired of trying to guess what you want and therefore I'm closing this issue. Please reopen it when you're able to provide *precise* information.
I'm sorry to trouble you so much.
I think I didn't explain what I really want to do.
I have to test my site with more than ten thousand data records using HtmlUnit. If I test them one by one, one complete test run takes several hours, so I have to use a multi-threaded model, which should save me a lot of time.
But even with 1000 threads, it still takes several hours. I traced the code and found that all threads block at:
long start=System.currentTimeMillis();
super.parse(inputSource);
System.out.println("parse time:"+(System.currentTimeMillis()-start));
This method comes from NekoHTML, so I think this is the problem.
What I really want is for every thread to be able to parse independently. If so, I could finish one complete test run in several minutes.
Thanks.
I think that this should be discussed on the HtmlUnit mailing lists until it is clear that it is a NekoHTML issue, because otherwise we end up discussing HtmlUnit classes here, for instance: do you use a WebClient per thread?
Yes, I use a WebClient per thread.
OK, I'll post a thread on the HtmlUnit mailing list.
Thanks again!