Heritrix: Internet Archive Web Crawler

Tracker: Feature Requests

5 Figure what profiler to use - ID: 1002336
Last Update: Comment added ( karl-ia )

I got the Eclipse profiler --
http://eclipsecolorer.sourceforge.net/index_profiler.html
-- to work locally. It's dog slow and buggy. It keeps
OOME'ing even though I upped my memory allocation.

Going against a remote machine, ops03, every time
it connects it seems to crap out immediately with the
exception below.

java.net.SocketException: Connection reset
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:78)
    at ru.nlmk.eclipse.plugins.profiler.io.ThinDataOutputStream.writeChar(ThinDataOutputStream.java:19)
    at ru.nlmk.eclipse.plugins.profiler.io.ThinDataOutputStream.writeString(ThinDataOutputStream.java:41)
    at ru.nlmk.eclipse.plugins.profiler.trace.TraceTreeInfo.writeStatistics(TraceTreeInfo.java:87)
    at ru.nlmk.eclipse.plugins.profiler.trace.TraceTreeInfo.writeStatistics(TraceTreeInfo.java:94)
    at ru.nlmk.eclipse.plugins.profiler.trace.TraceTreeInfo.writeStatistics(TraceTreeInfo.java:94)
    at ru.nlmk.eclipse.plugins.profiler.trace.TraceTreeInfo.writeStatistics(TraceTreeInfo.java:94)
    at ru.nlmk.eclipse.plugins.profiler.trace.TraceTreeInfo.writeStatistics(TraceTreeInfo.java:94)
    at ru.nlmk.eclipse.plugins.profiler.trace.TraceTreeInfo.writeStatistics(TraceTreeInfo.java:94)
    at ru.nlmk.eclipse.plugins.profiler.trace.TraceTreeInfo.writeStatistics(TraceTreeInfo.java:94)
    at ru.nlmk.eclipse.plugins.profiler.trace.TraceTreeInfo.writeStatistics(TraceTreeInfo.java:94)
    at ru.nlmk.eclipse.plugins.profiler.trace.TraceTreeInfo.writeStatistics(TraceTreeInfo.java:94)
    at ru.nlmk.eclipse.plugins.profiler.trace.TraceTreeInfo.writeStatistics(TraceTreeInfo.java:94)
    at ru.nlmk.eclipse.plugins.profiler.trace.TraceTreeInfo.writeStatistics(TraceTreeInfo.java:94)
    at ru.nlmk.eclipse.plugins.profiler.trace.TraceTreeInfo.writeStatistics(TraceTreeInfo.java:94)
    at ru.nlmk.eclipse.plugins.profiler.trace.TraceThreadInfo.write(TraceThreadInfo.java:312)
    at ru.nlmk.eclipse.plugins.profiler.trace.Trace.writeInstrumentationStatistics(Trace.java:434)
    at ru.nlmk.eclipse.plugins.profiler.trace.Trace.writeStatistictToFrontEnd(Trace.java:290)
    at ru.nlmk.eclipse.plugins.profiler.trace.Trace$SenderThread.run(Trace.java:147)




Here is the launch script I used:


stack@ops03:~$ more profiler.sh
#!/bin/sh
LIB=$HOME/lib/
export LD_LIBRARY_PATH=$LIB:$LD_LIBRARY_PATH
export JAVA_OPTS="-Xmx1024m -XrunProfilerDLL:1
-Xbootclasspath/a:$LIB/commons-lang.jar:$LIB/jakarta-regexp.jar:$LIB/profiler_trace.jar
-D__PROFILER_PACKAGE_FILTER=__A__org.archive.crawler.Heritrix:__M__sun.:__M__com.sun.:__M__java.:__M__javax.
-D__PROFILER_TIMING_METHOD=1"
$HOME/heritrix/bin/foreground_heritrix


I had to build the native profiling shared lib and
put it into the JDK i386 lib dir so it could be found.

Here is the script I used to build the shared lib:


stack@ops03:~/lib$ more m
#!/bin/sh
g++ -O0 -DLINUX -shared -Wall -I. \
    -I${JAVA_HOME}/include -I${JAVA_HOME}/include/linux \
    ProfilerDLL.cpp -o libProfilerDLL.so
cp libProfilerDLL.so ${JAVA_HOME}/jre/lib/i386/libProfilerDLL.so


Submitted: Michael Stack ( stack-sf ) - 2004-08-02 23:37
Priority: 5
Status: Closed
Resolution: None
Assigned: Nobody/Anonymous
Category: Performance
Group: None
Visibility: Public


Comments ( 11 )

Date: 2007-03-14 01:32
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-812 -- please add further
comments at that location.


Date: 2004-08-23 19:20
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

JProfiler gave us licenses.


Date: 2004-08-04 17:11
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Ok. This JProfiler looks pretty good. The profiled
application will crash after a while under the heavier
profiling configs (i.e. Full Instrumentation) but runs long
enough to get a picture.

The value of the profiler will come with familiarity and
frequent use; we'll be able to look at the graphs and see if
a change made things better or worse, but we need to be used
to how the profiler presents things to be able to detect change.

I'll write to JProfiler to see if they'll give us a copy for
free or a price break.

Meantime, spending time trying to make sense of the info.


Date: 2004-08-03 23:09
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Can't get another Optimizeit eval license without making up
some address somewhere. I wanted to confirm it does a
better job. Besides, it says the requirements are Red Hat 7.3.
Tried starting it on Debian and it gave out a message about
installing rpms.

Was going to give up on JProfiler for getting a better
picture of CPU use, and then I used sampling rather than
recording of exact use and now I get a good picture of CPU
use (Don't know whether I should believe it or not, but it
looks good). Will attach it to the performance wiki page
soon as I can get it out (First attempt crashed). Picture
is of a broad crawl with 200 threads crawling 50 porn seeds.
It's pretty.
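
For readers unfamiliar with the sampling approach mentioned above, here is a rough sketch of the idea in Java: instead of recording every method entry and exit, periodically snapshot every thread's stack and count which method is on top. This is only an illustration and not how JProfiler or YJP implement it. The class name SamplingSketch, its methods, and the round and interval values are all made up for the example. It also counts waiting threads, which a real sampler would filter out by thread state.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of stack sampling; not taken from any profiler. */
final class SamplingSketch {

    /** Counts how often each "Class#method" frame is on top of a sampled stack. */
    static Map<String, Integer> countTopFrames(Iterable<StackTraceElement[]> stacks) {
        Map<String, Integer> hits = new HashMap<>();
        for (StackTraceElement[] stack : stacks) {
            if (stack.length == 0) {
                continue; // thread had no Java frames at sample time
            }
            StackTraceElement top = stack[0];
            hits.merge(top.getClassName() + "#" + top.getMethodName(), 1, Integer::sum);
        }
        return hits;
    }

    /** Takes `rounds` snapshots of all live threads, `intervalMillis` apart. */
    static Map<String, Integer> sample(int rounds, long intervalMillis)
            throws InterruptedException {
        Map<String, Integer> total = new HashMap<>();
        for (int i = 0; i < rounds; i++) {
            Map<String, Integer> round =
                    countTopFrames(Thread.getAllStackTraces().values());
            round.forEach((frame, n) -> total.merge(frame, n, Integer::sum));
            Thread.sleep(intervalMillis);
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        // 20 rounds, 10 ms apart: arbitrary values chosen for the example.
        sample(20, 10).forEach((frame, n) -> System.out.println(n + "\t" + frame));
    }
}
```

Because the cost is one stack walk per interval rather than a hook on every call, the profiled app is disturbed far less, at the price of statistical rather than exact numbers.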






Date: 2004-08-03 20:12
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Tried JProfiler to see if I could get better reporting on
CPU usage. $500. Very nice -- fancy wizards, beautiful
views -- but the remote app crashes when I try to disable the
sun and apache filters (The CPU usage shows ToeThread#run at
90% but not what it's doing in #run. In YJP I could infer it
calling processCrawlURI by looking at percentages). Remote
app is using jdk1.5.0 because it's the ops03 machine with a 2.6
kernel and NPTL.

Trying on a 1.4.2 crawl machine, still can't get any more
granularity than ToeThread#run. Nothing in the help or
configuration seems to speak to this. Giving up on this
profiler.

Here is the start script:

stack@ops03:~$ more jprofiler.sh
#!/bin/sh
JAVA_HOME=/usr/local/jdk1.5.0/
PATH=$JAVA_HOME/bin:$PATH
JPROFILER_HOME=$HOME/jprofiler3/
LIB=$JPROFILER_HOME/bin/linux-x86
export LD_LIBRARY_PATH=$LIB:$LD_LIBRARY_PATH
export JAVA_OPTS="-Xmx1024m -Xint -Xshare:off
-Xrunjprofiler:port=8849
-Xbootclasspath/a:$JPROFILER_HOME/bin/agent.jar"
$HOME/heritrix/bin/foreground_heritrix


Date: 2004-08-03 17:19
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

YJP is annoying in that I can't export the reports as HTML.

Here is the start script I used on the remote side to get
profiling to work:

stack@ops03:~$ more yjp.sh
#!/bin/sh
LIB=$HOME/yjp-2.5.2-build316/bin/
export LD_LIBRARY_PATH=$LIB:$LD_LIBRARY_PATH
export JAVA_OPTS="-Xmx1024m -Xrunyjpagent:cpu=sampling "
$HOME/heritrix/bin/foreground_heritrix



Date: 2004-08-03 16:00
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

The YourKit profiler is very basic. Takes CPU and memory
snapshots. You have to copy them from the remote machine to
local and load them into the viewing tool. The CPU
rendering lists percentage usage by package down to class
level and then to method, but it does not make the connection
between classes, so you'll see that the HTTPFetcher consumes
80% of CPU in its innerProcess method but it won't say which
of the methods called by innerProcess are responsible; you
have to infer by looking at other places where 80% or
sub-80% is consumed and assume that's where innerProcess
went next (Other profilers will follow the usage
till you can drill down no more).
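
To make the difference concrete, here is a rough Java sketch of the drill-down view: sampled stacks are folded into a call tree, so each callee's share is shown under the caller that reached it rather than in a flat per-method list. This is an illustration only, not YourKit's or any other profiler's code. CallTreeSketch, Node, and the frame names other than innerProcess and run are made up for the example, and the sample counts are invented.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a call-tree profile; not taken from any profiler. */
final class CallTreeSketch {

    static final class Node {
        final String name;
        int samples; // inclusive: samples whose stack passes through this node
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String name) {
            this.name = name;
        }

        Node child(String childName) {
            return children.computeIfAbsent(childName, Node::new);
        }
    }

    /** Each stack lists the outermost caller first, e.g. [run, innerProcess, readBody]. */
    static Node build(List<List<String>> stacks) {
        Node root = new Node("<root>");
        for (List<String> stack : stacks) {
            Node current = root;
            current.samples++;
            for (String frame : stack) {
                current = current.child(frame);
                current.samples++;
            }
        }
        return root;
    }

    /** Prints each node with its inclusive share of all samples, indented by depth. */
    static void print(Node node, int total, String indent) {
        System.out.printf("%s%s %d%%%n", indent, node.name, 100 * node.samples / total);
        for (Node c : node.children.values()) {
            print(c, total, indent + "  ");
        }
    }

    public static void main(String[] args) {
        // Invented samples: readBody, parse and other are made-up frame names.
        List<List<String>> stacks = Arrays.asList(
                Arrays.asList("run", "innerProcess", "readBody"),
                Arrays.asList("run", "innerProcess", "readBody"),
                Arrays.asList("run", "innerProcess", "readBody"),
                Arrays.asList("run", "innerProcess", "parse"),
                Arrays.asList("run", "other"));
        Node root = build(stacks);
        print(root, root.samples, "");
    }
}
```

With a tree like this, the share attributed to innerProcess is split visibly among its children, so no guessing from matching percentages elsewhere in the report is needed.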


Date: 2004-08-03 01:51
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Looking at the YourKit profiler. It's 300 euros.
http://www.yourkit.com/download/index.jsp.




Date: 2004-08-03 00:57
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

jMechanic is no longer actively supported. Its site suggests using
http://www.eclipse.org/hyades/ instead. I tried this but it
has explicit libc version requirements. I ain't up for
moving my libc version back a bunch of versions to make this
profiler work.

JProfiler is 500 dollars per single-user license.

ejp seems to have been dead since January.

JMP pages won't load. Server must be down (The Debian
packages for it are broken; can't resolve dependencies. One
of the packages is missing).




Date: 2004-08-03 00:22
Sender: stack-sf (Project Admin)

Logged In: YES
user_id=924942

Running locally profiling CPU, soon after startup it says
frontier#next is responsible for up to 90% of CPU time but
tells me no more than that. I try to set it up to debug
threads and it looks like it sees the two PoolThreads only
and all of the other thread events are dumped out on the console.

Giving up on this profiler.


Date: 2004-08-03 00:15
Sender: nobody

Logged In: NO

I looked at commercial profilers: Optimizeit is 700 dollars
plus for single user and 2400 for the enterprise edition.


Attached File

No Files Currently Attached

Changes ( 2 )

Field       Old Value  Date              By
status_id   Open       2004-08-23 19:20  stack-sf
close_date  -          2004-08-23 19:20  stack-sf