opjitconv fails with "Floating point exception" when it tries to convert a JIT dump file created by the oprofile Java agent, libjvmti_oprofile.so.
Running opjitconv in debug mode (-d) under gdb reveals the problem:
...
overlap idx=554, name=LsomeJavaSymbol;#0, start=7f60345be6e0, end=7f60345be840, life_start=1347859492, life_end=1347859492, lifetime=0
overlap idx=555, name=LsomeJavaSymbol;#0, start=7f60345be6e0, end=7f60345be840, life_start=1347859492, life_end=1347859492, lifetime=0
invalidate_entry: addr=7f60345be6e0, name=LsomeJavaSymbol;#0
Program received signal SIGFPE, Arithmetic exception.
0x0000000000404c44 in handle_overlap_region (start_idx=<value optimized out>, end_idx=<value optimized out>) at jitsymbol.c:400
400 j = (e->life_end - e->life_start) * 100 / totaltime;
(gdb) bt
#0 0x0000000000404c44 in handle_overlap_region (start_idx=<value optimized out>, end_idx=<value optimized out>) at jitsymbol.c:400
#1 0x00000000004051c5 in scan_overlaps (start_time=<value optimized out>) at jitsymbol.c:475
#2 resolve_overlaps (start_time=<value optimized out>) at jitsymbol.c:504
#3 0x0000000000403ffb in op_jit_convert (file_info=..., elffile=0x6c7340 "/tmp/oprofile.KECCPG/11792.jo", start_time=1347777318, end_time=<value optimized out>)
at conversion.c:61
#4 0x00000000004039e3 in process_jit_dumpfile (session_dir=<value optimized out>, start_time=1347777318, end_time=1347863714) at opjitconv.c:402
#5 op_process_jit_dumpfiles (session_dir=<value optimized out>, start_time=1347777318, end_time=1347863714) at opjitconv.c:656
#6 0x0000000000403ed2 in main (argc=<value optimized out>, argv=<value optimized out>) at opjitconv.c:812
This points to the handle_overlap_region function in jitsymbol.c:
> cnt = 1;
> j = (e->life_end - e->life_start) * 100 / totaltime;
> while ((j = j/10))
> cnt++;
Since 'totaltime' is zero, the division raises a divide-by-zero exception. As far as I understand, it is normal for some symbols to have a lifetime of 0, for very fast invocations within a single time tick (life_start == life_end). So I suggest the following modification, which I think should be applied anyway for robustness.
--- oprofile-0.9.8/opjitconv/jitsymbol.c.orig 2012-09-17 18:01:53.000000000 +1100
+++ oprofile-0.9.8/opjitconv/jitsymbol.c 2012-09-19 17:09:00.000000000 +1100
@@ -375,7 +375,7 @@
int cnt;
char * name;
int i, j;
- unsigned long long totaltime;
+ unsigned long long totaltime, pct;
if (debug) {
for (i = start_idx; i <= end_idx; i++) {
@@ -396,16 +396,18 @@
}
e = entries_address_ascending[idx];
+
+ pct = (totaltime == 0) ? 100 : (e->life_end - e->life_start) * 100 / totaltime;
+
cnt = 1;
- j = (e->life_end - e->life_start) * 100 / totaltime;
+ j = pct;
while ((j = j/10))
cnt++;
// Mark symbol name with a %% to indicate the overlap.
cnt += strlen(e->symbol_name) + 2 + 1;
name = xmalloc(cnt);
- snprintf(name, cnt, "%s%%%llu", e->symbol_name,
- (e->life_end - e->life_start) * 100 / totaltime);
+ snprintf(name, cnt, "%s%%%llu", e->symbol_name, pct);
if (e->sym_name_malloced)
free(e->symbol_name);
e->symbol_name = name;
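For anyone who wants to check the guarded calculation outside of opjitconv, here is a minimal standalone sketch of the same logic. The helper names overlap_pct and overlap_name are hypothetical and do not exist in oprofile; plain malloc stands in for xmalloc, and the buffer is sized exactly (digits, one '%', terminating NUL) rather than with the extra byte the original code reserves. Treating a zero totaltime as 100% simply follows the choice made in the patch above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of oprofile): percentage of totaltime
 * covered by one entry's lifetime. As in the patch, a zero totaltime
 * yields 100 instead of triggering a division by zero. */
static unsigned long long overlap_pct(unsigned long long life_start,
                                      unsigned long long life_end,
                                      unsigned long long totaltime)
{
    if (totaltime == 0)
        return 100;
    return (life_end - life_start) * 100 / totaltime;
}

/* Hypothetical helper: build "<symbol>%<pct>" in a freshly allocated
 * buffer. Returns NULL if allocation fails; the caller frees the result. */
static char *overlap_name(const char *symbol_name, unsigned long long pct)
{
    size_t cnt = 1;                 /* pct has at least one decimal digit */
    unsigned long long j = pct;
    char *name;

    assert(symbol_name != NULL);

    /* Count the remaining decimal digits of pct. */
    while ((j = j / 10))
        cnt++;

    /* symbol + '%' separator + terminating NUL */
    cnt += strlen(symbol_name) + 1 + 1;

    name = malloc(cnt);
    if (name == NULL)
        return NULL;

    /* "%%" in the format string emits a single literal '%'. */
    snprintf(name, cnt, "%s%%%llu", symbol_name, pct);
    return name;
}
```

With the values from the debug output above (life_start == life_end and a totaltime of 0), overlap_pct returns 100 and overlap_name produces the symbol name followed by "%100", without ever reaching the division.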
Daniel, can you please take a look at this bug and proposed fix? Thanks.
Daniel, will you be able to find time to fix this?
Hi Maynard,
sure.
In the meantime I had a bigger change in my working area.
I just forgot to work on this bug. Sorry.
I think I can complete the work on this bug by next week.
Kind regards...
Hi,
the patch is good enough to be pushed.
ACK from myself.
Kind regards...
A properly formatted patch would have been nice. I hand-patched the jitsymbol.c file and committed it. This bug is now resolved.