Heritrix: Internet Archive Web Crawler

Tracker: Feature Requests

5 garbage hot spot: SerialBinding & FastOutputStream.bump() - ID: 1208770
Last Update: Comment added ( karl-ia )

Allocation sites profiling showed a BDB util class
method, FastOutputStream.bump(), as a top garbage
creator (oftentimes, the top of all allocation sites).
Turns out that for our ~1K serialized CrawlURIs, a
buffer was being grown and copied to the larger version
multiple times per serialization.

Sleepycat reports others have run into this and the
initial-size/grow will be better behaved and tunable in
BDBJE 2.0.

Slimming our CrawlURI serialization should help a lot
too. (See #1208747,
http://sourceforge.net/tracker/index.php?func=detail&aid=1208747&group_id=73833&atid=539102 ).

However, we can also refine SerialBinding slightly in
the meantime to reuse a single FastOutputStream per
thread, making the number of allocations here
negligible. (Each thread's stream would quickly grow
its buffer to a sufficiently spacious size, eliminating
almost all initial allocations and per-serialization
grows.)
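
A minimal sketch of the per-thread reuse idea, for illustration only. It is
not the Heritrix code: it uses java.io.ByteArrayOutputStream as a stand-in
for BDB JE's FastOutputStream, and the class and method names
(RecyclingSerializerSketch, threadBuffer, serialize) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical illustration; ByteArrayOutputStream stands in for FastOutputStream.
class RecyclingSerializerSketch {
    // One buffer per thread. Its backing array grows as needed during the
    // first few serializations and is then kept for later calls.
    private static final ThreadLocal<ByteArrayOutputStream> BUFFER =
            ThreadLocal.withInitial(() -> new ByteArrayOutputStream(128));

    static ByteArrayOutputStream threadBuffer() {
        return BUFFER.get();
    }

    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream buf = threadBuffer();
        // reset() discards the previous content but keeps the grown backing array.
        buf.reset();
        // Closing a ByteArrayOutputStream has no effect, so the buffer stays usable.
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(obj);
        }
        // This sketch still makes one exact-size copy per call.
        return buf.toByteArray();
    }
}
```

Once a thread has serialized its largest typical object, later calls on that
thread should no longer need to grow the buffer. Note that the sketch still
allocates a new ObjectOutputStream and a result array on every call; only the
repeated grow-and-copy of the buffer is avoided.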



Gordon Mohr ( gojomo ) - 2005-05-25 21:29

5

Closed

None

Gordon Mohr

None

1.6.0

Public


Comments ( 3 )

Date: 2007-03-14 01:42
Sender: karl-ia


This issue is now discussed in the new JIRA tracker at
http://webteam.archive.org/jira/browse/HER-941 -- please add further
comments at that location.


Date: 2005-08-04 22:20
Sender: gojomo (Project Admin)

Closing. Will open new issue for any new hot spots found in
future.


Date: 2005-05-25 21:33
Sender: gojomo (Project Admin)

Addressed. Commit comment:

Fix for [ 1208770 ] garbage hot spot: SerialBinding &
FastOutputStream.bump()
* RecyclingSerialBinding.java
  subclass of SerialBinding which reuses a single
  FastOutputStream per thread (via ThreadLocal); should drive
  total allocations to a negligible number
* BdbMultipleWorkQueues.java
  use RecyclingSerialBinding
  (also: commented-out instrumentation used to estimate
  CrawlURI serialization sizes, left in while changes there
  are new)

Leaving open pending new allocation-profiling data.



Changes ( 3 )

Field              Old Value  Date              By
artifact_group_id  None       2005-09-23 21:08  gojomo
close_date         -          2005-08-04 22:20  gojomo
status_id          Open       2005-08-04 22:20  gojomo