From: William K. V. <wk...@us...> - 2005-06-04 08:33:15
|
Hello, While perusing the source code I noticed that the usage of libxml2 xmlGetProp was incorrect, the result of the call should be freed via xmlFree. I've made numerous fixes to parsexml.c and free.c to fix the oversight. While testing those changes however I noticed that memory appeared to be leaking at a value equal to the size of the report. After much pondering and stepping through things with gdb I've located the cause however fixing it is non-trivial (removing the memory leak fixes for libxml2 usage might help). What is happening is this, if a part only has one report a call is made to rlib_evaulate_single_report_variables, which in turn calls rlib_resolve_report_fields. Later in the make_report routine a call is made to rlib_layout_part_tr which again will trigger a call to rlib_resolve_report_fields. Thus stepping on the results of the first call and leaking about a reports worth of memory. I'm not certain on how to proceed from here, perhaps coding an equivalent of rlib_report_free that only destroys the "dynamically" allocated portions of the report, or perhaps add look before allocating to the rlib_resolve_report_fields routine. Anyway since "parts" are the newly implemented feature it would probably be best for me to punt and let the designer think about what the rlib_evaulate_single_report_variables routine is to accomplish and how best to do so. If someone would like the patches I made to parsexml.c and free.c let me know. Kind regards, William. |
From: Bob D. <bd...@si...> - 2005-06-05 17:11:54
|
On Sat, 2005-06-04 at 02:31 -0600, William K. Volkman wrote: > Hello, > While perusing the source code I noticed that the > usage of libxml2 xmlGetProp was incorrect, the result of the > call should be freed via xmlFree. I've made numerous fixes > to parsexml.c and free.c to fix the oversight. Hymm.. your right. Any chance you'll sign this http://rlib.sicompos.com/copyright_form.pdf and send in a patch? > While testing > those changes however I noticed that memory appeared to > be leaking at a value equal to the size of the report. After > much pondering and stepping through things with gdb I've > located the cause however fixing it is non-trivial (removing > the memory leak fixes for libxml2 usage might help). What > is happening is this, if a part only has one report a call > is made to rlib_evaulate_single_report_variables, which > in turn calls rlib_resolve_report_fields. Later in the > make_report routine a call is made to rlib_layout_part_tr > which again will trigger a call to rlib_resolve_report_fields. > Thus stepping on the results of the first call and leaking > about a reports worth of memory. > I'm not certain on how > to proceed from here, perhaps coding an equivalent of > rlib_report_free that only destroys the "dynamically" > allocated portions of the report, or perhaps add look before > allocating to the rlib_resolve_report_fields routine. Anyway > since "parts" are the newly implemented feature it would > probably be best for me to punt and let the designer think > about what the rlib_evaulate_single_report_variables routine > is to accomplish and how best to do so. If someone would like > the patches I made to parsexml.c and free.c let me know. To fix this double free probably in rlib_layout_part_tr check "report->is_the_only_report" and don't call it again if it is TRUE Thanks for pointing these out. Will you be able to send in a patch fixing all of this? - Bob |
From: Shannon W. <we...@ro...> - 2005-06-05 17:55:30
|
Bob Doan wrote: > On Sat, 2005-06-04 at 02:31 -0600, William K. Volkman wrote: > >>Hello, >> While perusing the source code I noticed that the >>usage of libxml2 xmlGetProp was incorrect, the result of the >>call should be freed via xmlFree. I've made numerous fixes >>to parsexml.c and free.c to fix the oversight. > > > Hymm.. your right. > .. > > To fix this double free probably in rlib_layout_part_tr check > "report->is_the_only_report" and don't call it again if it is TRUE > > Thanks for pointing these out. > > Will you be able to send in a patch fixing all of this? > > - Bob > FWIW, this may be related to the segfaults I'm seeing in Apache (which only happen after several reloads). Definitely seems like a memory issue. Bob, any headway on the HTML/trim_links output issues? Thanks, Shannon |
From: Bob D. <bd...@si...> - 2005-06-05 18:08:08
|
> FWIW, this may be related to the segfaults I'm seeing in Apache (which > only happen after several reloads). Definitely seems like a memory issue. I'm not sure that not freeing something would cause a segfault... I think the #1 cause for a segfault in RLIB is bad query/datasource and not checking the return codes... We use RLIB in 2 production environments so we stomp out any crashes we see. I have not fixed any in the past month or so.... > > Bob, any headway on the HTML/trim_links output issues? Err no... I've been out of town/ busy catching up with other things. Hopefully this week I'll have some time to poke @ it. -- Bob Doan <bd...@si...> |
From: Shannon W. <we...@ro...> - 2005-06-06 13:48:08
|
Bob Doan wrote: > I'm not sure that not freeing something would cause a segfault... > True, may be a stretch. > I think the #1 cause for a segfault in RLIB is bad query/datasource and > not checking the return codes... > Well, the segfaults do only happen when it should be showing an error message (ie, a failed sql query or an unknown pcode). But like I said, at any one time, I can get at least 5 or 6 error messages out of it ok (which show up in apache error log) before it starts segfaulting (instead of showing error messages - it just says the apache child died in the error log). When I restart apache, it works again for another 5 or 6. So something is starting out ok and getting corrupted along the way. It's been hard for me to track down because running it in gdb (apache single process) isn't giving much information and I don't have a way to reproduce it without apache. I'm willing to try any suggestions you guys have. You say the #1 cause is a bad query/datasouce -- why should that ever actually segfault, as opposed to issuing an error message and exiting cleanly? Thanks, Shannon |
From: Bob D. <bd...@si...> - 2005-06-06 14:10:38
|
> Well, the segfaults do only happen when it should be showing an error > message (ie, a failed sql query or an unknown pcode). But like I said, > at any one time, I can get at least 5 or 6 error messages out of it ok > (which show up in apache error log) before it starts segfaulting > (instead of showing error messages - it just says the apache child died > in the error log). When I restart apache, it works again for another 5 > or 6. So something is starting out ok and getting corrupted along the way. > > It's been hard for me to track down because running it in gdb (apache > single process) isn't giving much information and I don't have a way to > reproduce it without apache. I'm willing to try any suggestions you guys > have. Are you getting a "*** NUTS WE CRASHED??" in the log A few times I have tracked down a crash by going it to util.c add a sleep(180) to myFaultHandler make, make install then when it crashes quickly starting trying to grab the pids of apache processes w/ gdb gdb /usr/sbin/httpd xxxx where xxx is one of the apache process ID's once you find the pid that crashed but is in limbo issue a "bt" to see whats going on. This kinda sucks but it works. > > You say the #1 cause is a bad query/datasouce -- why should that ever > actually segfault, as opposed to issuing an error message and exiting > cleanly? It does issue errors messages but you need to check the results of data sources and queries before you call execute. I think most of the crashes have been fixed, there might be one left w/ data sources. - bob |
From: William K. V. <wk...@us...> - 2005-06-06 18:23:58
|
Hello Shannon, Help me out a bit if you will by giving me a little more information, unless your only interested in Bob looking into it. On Mon, 2005-06-06 at 07:47, Shannon Weyrick wrote: > It's been hard for me to track down because running it in gdb (apache > single process) isn't giving much information and I don't have a way to > reproduce it without apache. I'm willing to try any suggestions you guys > have. > > You say the #1 cause is a bad query/datasouce -- why should that ever > actually segfault, as opposed to issuing an error message and exiting > cleanly? What language are you using (PHP)? Which data source(s) do you use? I've checked out fields, literals, variables, and images. Do you use graphs? Callback functions? Thanks, William. |
From: Shannon W. <we...@ro...> - 2005-06-06 19:34:40
|
Hi William, William K. Volkman wrote: > Hello Shannon, > Help me out a bit if you will by giving me a little more > information, unless your only interested in Bob looking into it. > Nope, I'm not picky about who solves my problems :) > On Mon, 2005-06-06 at 07:47, Shannon Weyrick wrote: > >>It's been hard for me to track down because running it in gdb (apache >>single process) isn't giving much information and I don't have a way to >>reproduce it without apache. I'm willing to try any suggestions you guys >>have. >> >>You say the #1 cause is a bad query/datasouce -- why should that ever >>actually segfault, as opposed to issuing an error message and exiting >>cleanly? > > > What language are you using (PHP)? > Which data source(s) do you use? > I've checked out fields, literals, variables, and images. Do you use > graphs? Callback functions? > > PHP4 and MySQL. I use all of the above except images, but I believe I can duplicate the segfault with even a small, basic report. Unfortunately I'm in the middle of crunch time on a project that doesn't use rlib so it's bit tough to work on this at the moment. When I get a chance I'll try to scrape together a little more info and maybe even a reproduction script/report. I also have yet to try it on multiple systems, I'm not sure I can duplicate it on anything but my own devel machine. Thanks, Shannon |
From: William K. V. <wk...@us...> - 2005-06-05 19:41:22
|
On Sun, 2005-06-05 at 11:55, Shannon Weyrick wrote: > FWIW, this may be related to the segfaults I'm seeing in Apache (which > only happen after several reloads). Definitely seems like a memory issue. This wouldn't be caused by memory leaks and I've not noticed any memory overruns yet (I've not exercised all of the functionality, I.E. graphs). Do you have a small test case or can you elaborate on what rlib elements you are using? Thanks, William. |
From: William K. V. <wk...@us...> - 2005-06-05 18:07:07
|
Hello Bob, On Sun, 2005-06-05 at 11:11, Bob Doan wrote: > On Sat, 2005-06-04 at 02:31 -0600, William K. Volkman wrote: > > While perusing the source code I noticed that the > > usage of libxml2 xmlGetProp was incorrect, the result of the > > call should be freed via xmlFree. I've made numerous fixes > > to parsexml.c and free.c to fix the oversight. > > Hymm.. your right. > > Any chance you'll sign this http://rlib.sicompos.com/copyright_form.pdf > and send in a patch? Can do. > > > > While testing > > those changes however I noticed that memory appeared to > > be leaking at a value equal to the size of the report. After > > much pondering and stepping through things with gdb I've > > located the cause however fixing it is non-trivial (removing > > the memory leak fixes for libxml2 usage might help). What > > is happening is this, if a part only has one report a call > > is made to rlib_evaulate_single_report_variables, which > > in turn calls rlib_resolve_report_fields. Later in the > > make_report routine a call is made to rlib_layout_part_tr > > which again will trigger a call to rlib_resolve_report_fields. > > Thus stepping on the results of the first call and leaking > > about a reports worth of memory. > > > > I'm not certain on how > > to proceed from here, perhaps coding an equivalent of > > rlib_report_free that only destroys the "dynamically" > > allocated portions of the report, or perhaps add look before > > allocating to the rlib_resolve_report_fields routine. Anyway > > since "parts" are the newly implemented feature it would > > probably be best for me to punt and let the designer think > > about what the rlib_evaulate_single_report_variables routine > > is to accomplish and how best to do so. If someone would like > > the patches I made to parsexml.c and free.c let me know. > > To fix this double free probably in rlib_layout_part_tr check > "report->is_the_only_report" and don't call it again if it is TRUE Well that seems to be an elegantly simple fix, I'll try it and let you know. > Thanks for pointing these out. > > Will you be able to send in a patch fixing all of this? I'll be happy to I normally use "diff -u" format, is that OK? Cheers, William. |
From: Bob D. <bd...@si...> - 2005-06-05 18:12:11
|
> > > > Any chance you'll sign this http://rlib.sicompos.com/copyright_form.pdf > > and send in a patch? > > Can do. > Great. Let me know if you Fax or Mail it so I can be looking for it. > > I'll be happy to I normally use "diff -u" format, is that OK? Perfect - Bob |
From: William K. V. <wk...@us...> - 2005-06-05 23:07:42
Attachments:
wkv-memleak-20050605.patch
|
Hello, I've finished making changes, except for a leaklet in rpdf.c (not sure what's going on there). There were a couple of places that I was not certain what the design intent was so I left code in "#if 0"ed out, if that's unacceptable let me know. It still runs my smallish report just fine, YMMV. If it breaks anything let me know. On Sun, 2005-06-05 at 11:11, Bob Doan wrote: > To fix this double free probably in rlib_layout_part_tr check > "report->is_the_only_report" and don't call it again if it is TRUE > > Thanks for pointing these out. > > Will you be able to send in a patch fixing all of this? Attached. Cheers, William. |
From: Bob D. <bd...@si...> - 2005-06-06 00:29:05
|
Hi, Patch looks good. But there are two problems. In resolution.c the strduping needs to stay. There is no guarantee that the pointer returned from data sources is going to stay where its at and it since RLIB manipulates some of that data it would be rude to change it on the datasource. But there is a serious leak there. Put the strduping back but in pcode.c line 738: Change: return rlib_value_new(rval, RLIB_VALUE_STRING, FALSE, rlib_resolve_field_value(r, o->value)); To: return rlib_value_new(rval, RLIB_VALUE_STRING, TRUE, rlib_resolve_field_value(r, o->value)); So it knows to free the string when the RVAL is freed. Second is rlib_value_dup_contents removal of the strdup. There is going to be a double free if it's not duped Follow "color" in layout.c: extra_data[i].rval_color = line_rval_color; // <--- Really a memset here rlib_value_dup_contents(&extra_data[i].rval_color); . . . rlib_value_free(&line_rval_color); . . . . and later the extra data is freed. The correct thing to do is probably is set the RVAL to free: struct rlib_value * rlib_value_dup_contents(struct rlib_value *rval) { if(rval->type == RLIB_VALUE_STRING) { rval->string_value = g_strdup(rval->string_value); rval->free = TRUE; } return rval; } Other then that great work! How did you track these down? Typically I use valgrind to find these sorta things. - bob |
From: William K. V. <wk...@us...> - 2005-06-06 02:32:19
Attachments:
wkv-memleak-20050605a.patch
|
On Sun, 2005-06-05 at 18:28, Bob Doan wrote: > Patch looks good. But there are two problems. > > In resolution.c the strduping needs to stay. There is no guarantee that > the pointer returned from data sources is going to stay where its at and > it since RLIB manipulates some of that data it would be rude to change > it on the datasource. > > But there is a serious leak there. Put the strduping back but in > pcode.c line 738: > > Change: > > return rlib_value_new(rval, RLIB_VALUE_STRING, FALSE, > rlib_resolve_field_value(r, o->value)); > > To: > > return rlib_value_new(rval, RLIB_VALUE_STRING, TRUE, > rlib_resolve_field_value(r, o->value)); > > So it knows to free the string when the RVAL is freed. Did that and verified that it still doesn't leak. > Second is rlib_value_dup_contents removal of the strdup. > > There is going to be a double free if it's not duped > > Follow "color" in layout.c: > > extra_data[i].rval_color = line_rval_color; // <--- Really a memset here > rlib_value_dup_contents(&extra_data[i].rval_color); > . > . > . > rlib_value_free(&line_rval_color); I had run test cases, and just verified again that the example case doesn't fail, even for color (from a previous message): Well I commented it out, ran some test cases like: <Line bgcolor="'0xefefef'"> <literal bgcolor="'0xffffee'" width="95" align="right">Total:</literal> <field value="v.subtotal" format="'!$%(n'" width="11" align="right"/> </Line> And did not see any problems, no references to freed memory or things like that. That said however you are the designer so know what you intended so I tried your other correction. > . > and later the extra data is freed. > > The correct thing to do is probably is set the RVAL to free: > > struct rlib_value * rlib_value_dup_contents(struct rlib_value *rval) { > if(rval->type == RLIB_VALUE_STRING) { > rval->string_value = g_strdup(rval->string_value); > rval->free = TRUE; > } > return rval; > } > Tried this and it also worked, the leaking was fixed, this is now the patch. > How did you track these down? Typically I use valgrind to find these > sorta things. I first noticed the leaking from the python side and it's garbage collector. It's a check I always do when I find out SWIG has been used (every time I find major memory leaks, some of them even SWIG related :-) I then perused the code and, due to having just poked my nose into a python-libxml2 interface, spotted the usage of GetProp without the frees. The leaking was better however still there from python, code reading found the resolve_fields case. Today I bit the bullet so to speak and downloaded and fired up valgrind, generated supressions for python, and verified and fixed almost all remaining issues. One left however is in rpdf.c, the call to rpdf_text is leaving allocations on the slist: ==14555== 13627 bytes in 32 blocks are still reachable in loss record 28 of 32 ==14555== at 0x1B904A80: malloc (vg_replace_malloc.c:131) ==14555== by 0xAA7A56: g_malloc (in /usr/lib/libglib-2.0.so.0.400.8) ==14555== by 0xAA8C03: g_mem_chunk_alloc (in /usr/lib/libglib-2.0.so.0.400.8) ==14555== by 0xAB675C: (within /usr/lib/libglib-2.0.so.0.400.8) ==14555== by 0xAB5CB1: g_slist_append (in /usr/lib/libglib-2.0.so.0.400.8) ==14555== by 0x1B945607: rpdf_stream_append (rpdf.c:142) ==14555== by 0x1B94764B: rpdf_text (rpdf.c:798) ==14555== by 0x1BD74545: pdf_print_text (pdf.c:113) ==14555== by 0x1BD650EF: rlib_layout_text_string (layout.c:323) ==14555== by 0x1BD66C9C: rlib_layout_report_output_array (layout.c:820) ==14555== by 0x1BD672C7: rlib_layout_report_outputs_across_pages (layout.c:951) ==14555== by 0x1BD67302: rlib_layout_report_output (layout.c:961) ==14555== by 0x1BD63A73: rlib_layout_report (reportgen.c:571) ==14555== by 0x1BD63F1D: rlib_layout_part_td (reportgen.c:668) ==14555== by 0x1BD6416A: rlib_layout_part_tr (reportgen.c:715) ==14555== by 0x1BD64503: rlib_make_report (reportgen.c:800) Since I'm not a glib guru (actually I've not coded in C since 1997) I'm not certain what's going wrong here, I added the call to g_slist_free in rpdf_free however it didn't fix the problem. Suggestions welcome, otherwise I'll come back to it after I get the python interface re-done. btw. Updated patch with your corrections attached. Cheers, William. |
From: Bob D. <bd...@si...> - 2005-06-06 14:30:52
|
Patch applied! > One left however is in rpdf.c, > the call to rpdf_text is leaving allocations on the slist: > > ==14555== 13627 bytes in 32 blocks are still reachable in loss record 28 > of 32 > ==14555== at 0x1B904A80: malloc (vg_replace_malloc.c:131) > ==14555== by 0xAA7A56: g_malloc (in /usr/lib/libglib-2.0.so.0.400.8) > ==14555== by 0xAA8C03: g_mem_chunk_alloc (in > /usr/lib/libglib-2.0.so.0.400.8) > ==14555== by 0xAB675C: (within /usr/lib/libglib-2.0.so.0.400.8) > ==14555== by 0xAB5CB1: g_slist_append (in > /usr/lib/libglib-2.0.so.0.400.8) > ==14555== by 0x1B945607: rpdf_stream_append (rpdf.c:142) > ==14555== by 0x1B94764B: rpdf_text (rpdf.c:798) > ==14555== by 0x1BD74545: pdf_print_text (pdf.c:113) > ==14555== by 0x1BD650EF: rlib_layout_text_string (layout.c:323) > ==14555== by 0x1BD66C9C: rlib_layout_report_output_array > (layout.c:820) > ==14555== by 0x1BD672C7: rlib_layout_report_outputs_across_pages > (layout.c:951) > ==14555== by 0x1BD67302: rlib_layout_report_output (layout.c:961) > ==14555== by 0x1BD63A73: rlib_layout_report (reportgen.c:571) > ==14555== by 0x1BD63F1D: rlib_layout_part_td (reportgen.c:668) > ==14555== by 0x1BD6416A: rlib_layout_part_tr (reportgen.c:715) > ==14555== by 0x1BD64503: rlib_make_report (reportgen.c:800) > > Since I'm not a glib guru (actually I've not coded in C since 1997) > I'm not certain what's going wrong here, I added the call > to g_slist_free in rpdf_free however it didn't fix the problem. Hym.... I see in your new patch you freed the page_contents in RPDF which looks like the right thing to do to fix the problem. I'm going to go w/ that patch you have now since it fixes a lot of things. I'll poke it a bit further and see why valgrind still complains. GLIB doesn't always sys free on a glib free call and valgrind doesn't know about this so its harder to find mem leaks.. let me poke @ it. - bob |
From: William K. V. <wk...@us...> - 2005-06-06 18:19:37
|
Hello Bob, On Mon, 2005-06-06 at 08:30, Bob Doan wrote: > Patch applied! > > > One left however is in rpdf.c, > > the call to rpdf_text is leaving allocations on the slist: > > > > ==14555== 13627 bytes in 32 blocks are still reachable in loss record 28 > > of 32 > > ==14555== at 0x1B904A80: malloc (vg_replace_malloc.c:131) > > ==14555== by 0xAA7A56: g_malloc (in /usr/lib/libglib-2.0.so.0.400.8) > > ==14555== by 0xAA8C03: g_mem_chunk_alloc (in > > /usr/lib/libglib-2.0.so.0.400.8) > > ==14555== by 0xAB675C: (within /usr/lib/libglib-2.0.so.0.400.8) > > ==14555== by 0xAB5CB1: g_slist_append (in > > /usr/lib/libglib-2.0.so.0.400.8) > > ==14555== by 0x1B945607: rpdf_stream_append (rpdf.c:142) <snip> > > Since I'm not a glib guru (actually I've not coded in C since 1997) > > I'm not certain what's going wrong here, I added the call > > to g_slist_free in rpdf_free however it didn't fix the problem. > > I see in your new patch you freed the page_contents in RPDF which looks > like the right thing to do to fix the problem. I'm going to go w/ that > patch you have now since it fixes a lot of things. I'll poke it a bit > further and see why valgrind still complains. GLIB doesn't always sys > free on a glib free call and valgrind doesn't know about this so its > harder to find mem leaks.. let me poke @ it. Actually reading the glib docs closely you hit on it: g_slist_free () void g_slist_free (GSList *list); Frees all of the memory used by a GSList. The freed elements are added to the GListAllocator free list. I had also seen this entry which I already figured out could not be fixed: ==17376== 40800 bytes in 2 blocks are still reachable in loss record 30 of 32 ==17376== at 0x1B904A80: malloc (vg_replace_malloc.c:131) ==17376== by 0x73795B: __gconv_open (in /lib/tls/libc-2.3.3.so) ==17376== by 0x7374E4: iconv_open (in /lib/tls/libc-2.3.3.so) ==17376== by 0xA8D835: (within /usr/lib/libglib-2.0.so.0.400.8) ==17376== by 0xA8D8E2: g_iconv_open (in /usr/lib/libglib-2.0.so.0.400.8) ==17376== by 0xA8DEEE: (within /usr/lib/libglib-2.0.so.0.400.8) ==17376== by 0xA8E1C0: g_convert (in /usr/lib/libglib-2.0.so.0.400.8) ==17376== by 0xA8EC81: g_locale_from_utf8 (in /usr/lib/libglib-2.0.so.0.400.8) ==17376== by 0xA94EEE: g_date_strftime (in /usr/lib/libglib-2.0.so.0.400.8) ==17376== by 0x1BD7FFC0: rlib_datetime_format_date (datetime.c:119) Since it is now the normal work week my time becomes a bit more limited. Hopefully everything goes well. Take care, William. |