|
From: Alec M. <ma...@ma...> - 2010-02-25 02:47:40
|
Hello,

I am trying to debug a leaking TCP server written in Python, running on Linux x86_64. I am a novice at using Valgrind.

I recompiled Python with all debug options, and I am using all the suppressions that come with Python. I suspect the problem has nothing to do with Python, though.

When I start the server, the RSS memory size of the process is about 25MB (or about 75MB when I start it under Valgrind). Then I cause the memory leak, and the RSS size rises to >350MB. When I exit the process, I get:

==17483== ERROR SUMMARY: 28 errors from 3 contexts (suppressed: 134 from 1)
==17483== malloc/free: in use at exit: 4,396,013 bytes in 25,917 blocks.
==17483== malloc/free: 291,998 allocs, 266,081 frees, 479,044,608 bytes allocated.
==17483== For counts of detected errors, rerun with: -v
==17483== searching for pointers to 25,917 not-freed blocks.
==17483== checked 5,025,576 bytes.
==17483==
==17483== LEAK SUMMARY:
==17483==    definitely lost: 0 bytes in 0 blocks.
==17483==      possibly lost: 75,288 bytes in 156 blocks.
==17483==    still reachable: 4,320,725 bytes in 25,761 blocks.
==17483==         suppressed: 0 bytes in 0 blocks.
==17483== Rerun with --leak-check=full to see details of leaked memory.

From this output, it seems Valgrind says that the memory held at exit was only 4,396,013 bytes, not the 350MB that I can see with ps -orss -p pid (or with the "top" command).

I guess I do not understand something basic here about the two-orders-of-magnitude mismatch between the memory size that the OS sees and the one that Valgrind reports at exit.

Thank you for your advice.
|
From: Kaz K. <kky...@gm...> - 2010-02-25 03:21:22
|
On Wed, Feb 24, 2010 at 4:00 PM, Alec Matusis <ma...@ma...> wrote:
> I guess I do not understand something basic here, about the two order of
> magnitude mismatch between the memory size numbers that the OS sees and that
> valgrind reports at exit.

Hi Alec,

It's possible that this is fragmentation, though it's an awful lot of fragmentation.

I wonder whether you are not running into an issue with glibc's malloc.

http://sourceware.org/bugzilla/show_bug.cgi?id=1128
|
From: Alec M. <ma...@ma...> - 2010-02-25 04:03:22
|
Hi Kaz,
I've since run my program using --tool=massif, and massif correctly
reported all memory:
[Massif ASCII graph: y-axis in MB, peaking at 265.7 MB; x-axis in Gi (instructions executed), from 0 to 1.584 Gi; heap usage climbs to its peak at the last detailed snapshot]
Number of snapshots: 83
Detailed snapshots: [4, 7, 9, 21, 29, 32, 45, 49, 59, 62, 72, 81 (peak)]
--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 73  1,566,070,529      256,292,664      254,821,441     1,471,223            0
 74  1,581,387,012      259,160,256      257,679,135     1,481,121            0
 75  1,596,703,495      262,027,848      260,536,829     1,491,019            0
 76  1,612,019,978      264,895,440      263,394,523     1,500,917            0
 77  1,627,336,461      267,763,032      266,252,217     1,510,815            0
 78  1,642,652,944      270,630,624      269,109,911     1,520,713            0
 79  1,657,969,427      273,498,216      271,967,605     1,530,611            0
 80  1,673,285,910      276,365,808      274,825,299     1,540,509            0
 81  1,685,522,232      278,653,400      277,104,989     1,548,411            0
....................
--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 82  1,700,827,952        7,258,056        6,800,229       457,827            0
This probably means that when I kill the program with Ctrl+C, it first
releases most of the memory after the signal is caught, and only then exits,
with a very small memory footprint at the very last moment, which memcheck
correctly reports (corresponding to snapshot 82 in massif).
I wonder if I could see what takes all that memory in snapshots 73-81, since
from my standpoint those are memory leaks. Massif only shows me which line
in the Python interpreter code did the allocations, which is too generic; I
would like to know what has actually been allocated.
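
One way to see what has actually been allocated at the Python level, rather than which interpreter line called malloc, is to count live objects by type from inside the process. The following is only a minimal sketch: `Connection`, `live_object_counts` and `growth` are hypothetical names invented for illustration, and `gc.get_objects()` only lists objects tracked by the cyclic collector (containers and class instances, not plain strings or integers).

```python
import gc
from collections import Counter


def live_object_counts():
    """Count GC-tracked live objects, grouped by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after):
    """Return {type_name: increase} for every type whose count went up."""
    return {name: n - before.get(name, 0)
            for name, n in after.items()
            if n > before.get(name, 0)}


class Connection(object):  # hypothetical stand-in for a per-client object
    def __init__(self):
        self.buffer = []


baseline = live_object_counts()
retained = [Connection() for _ in range(1000)]  # simulate unwanted retention
delta = growth(baseline, live_object_counts())
print(delta.get("Connection"))
```

In a real server the two snapshots would be taken before and after the workload that makes the RSS grow, and the types with the largest increase are the first candidates to inspect.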
> -----Original Message-----
> From: Kaz Kylheku [mailto:kky...@gm...]
> Sent: Wednesday, February 24, 2010 7:21 PM
> To: Alec Matusis
> Cc: val...@li...
> Subject: Re: [Valgrind-users] Basic question about memcheck
>
> On Wed, Feb 24, 2010 at 4:00 PM, Alec Matusis <ma...@ma...>
> wrote:
> > I guess I do not understand something basic here, about the two order of
> > magnitude mismatch between the memory size numbers that the OS sees
> and that
> > valgrind reports at exit.
>
> Hi Alec,
>
> It's possible that this is fragmentation, though it's an awful lot of
> fragmentation.
>
> I wonder whether you are not running into an issue with glibc's malloc.
>
> http://sourceware.org/bugzilla/show_bug.cgi?id=1128
|
|
From: Kaz K. <kky...@gm...> - 2010-02-25 04:40:17
|
On Feb 24, 2010 7:36pm, Alec Matusis <ma...@ma...> wrote:
> Hi Kaz,
[...]
> This probably means that when I kill the program with Ctrl+C, it first
> releases most of the memory after the signal is caught, and only then exits,
> with a very small memory footprint at the very last moment, which memcheck
> correctly reports (corresponding to snapshot 82 in massif).

Okay, so what you have is not a leak or fragmentation, but unwanted retention.

It might be worth looking at the logic in the Python code to see where it might be hanging on to object references longer than necessary.
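
To illustrate what "hanging on to object references" can look like, here is a small sketch using the standard `gc.get_referrers()` call to ask which containers still point at an object. The `Session` class, the `registry` dict and the helper names are assumptions made up for this example, not anything taken from the server being discussed.

```python
import gc


class Session(object):  # hypothetical per-connection object
    def __init__(self, peer):
        self.peer = peer


registry = {}  # hypothetical long-lived dict that is never pruned


def handle(peer):
    """Create a session and (by mistake) leave it registered forever."""
    session = Session(peer)
    registry[peer] = session
    return session


def is_held_by(obj, container):
    """True if `container` is among the GC-tracked objects referring to obj."""
    return any(r is container for r in gc.get_referrers(obj))


suspect = handle("10.0.0.1:5000")
held_while_registered = is_held_by(suspect, registry)
del registry["10.0.0.1:5000"]        # drop the stale reference
held_after_cleanup = is_held_by(suspect, registry)
print(held_while_registered, held_after_cleanup)
```

Walking the referrers of a suspicious object in this way usually points at the list, dict or attribute that keeps it alive after the connection is gone.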
|
From: Douglas L. <dou...@so...> - 2010-02-25 09:55:10
|
Alec Matusis wrote:
> Hello
>
> I am trying to debug a leaking TCP server written in Python running on Linux
> x86_64
>

I think your problem is right there - 'Python'.

Python is a garbage-collected language, so it's impossible to really leak memory. All that you can do is forget where it's being referenced.

Either:
a) You've got references sticking around which you don't know about.
b) You've got cycles of objects with __del__() methods (which the garbage collector can't free).
c) The garbage collector has decided not to run yet.

Python objects can't be leaked (at the C/valgrind level), because the Python run-time will keep references to them.

--
Douglas Leeder

Sophos Plc, The Pentagon, Abingdon Science Park, Abingdon, OX14 3YP, United Kingdom. Company Reg No 2096520. VAT Reg No GB 348 3873 20.
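
Cases (b) and (c) can be checked from inside the interpreter with the standard `gc` module. The sketch below uses a hypothetical `Node` class: it builds a reference cycle, shows that the objects stay alive until a collection runs, and then forces one with `gc.collect()`. Objects the collector finds but cannot free are placed in `gc.garbage`; on the Python versions current when this thread was written that included cycles with `__del__()` methods, whereas since Python 3.4 such cycles are normally collected.

```python
import gc
import weakref


class Node(object):  # hypothetical object that takes part in a cycle
    def __init__(self):
        self.other = None


def make_cycle():
    """Build an a <-> b cycle and return only a weak reference to `a`."""
    a, b = Node(), Node()
    a.other, b.other = b, a
    return weakref.ref(a)   # a weak reference does not keep `a` alive


gc.disable()                          # keep the collector from running by itself
probe = make_cycle()
alive_before = probe() is not None    # the cycle keeps both nodes alive
unreachable = gc.collect()            # force a full collection
alive_after = probe() is not None
gc.enable()
print(alive_before, unreachable, alive_after, gc.garbage)
```

If memory drops after an explicit `gc.collect()`, the collector simply had not run yet; if `gc.garbage` is non-empty, the objects listed there are the ones it could not free.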
|
From: tom f. <tf...@al...> - 2010-02-27 19:10:32
|
Douglas Leeder <dou...@so...> writes:
> Alec Matusis wrote:
> > I am trying to debug a leaking TCP server written in Python running
> > on Linux x86_64
>
> I think you're problem is right there - 'Python'.
>
> Python objects can't be leaked (at the C/valgrind level), because the
> Python run-time will keep references to them.

Python has to be built specially to be valgrindable. Some #define needs to be set, maybe a configuration option. There is documentation available in the Python distribution.

-tom