From: Philippe W. <phi...@sk...> - 2011-03-03 22:47:29
With the recently checked-in changes, helgrind 3.7.0 SVN runs faster and uses less memory
(when using --track-lockorders=no).
So, I have looked at the memory used by the laog data structure when --track-lockorders=yes.
I have (temporarily) added a gdbserver monitor command that outputs (on demand)
the state of the arenas and dumps the various laog data structures in full.
While the arena statistics show that univ_laog takes about
1.4 GB of memory, dumping the data structures in a printable layout
takes only about 25 MB.
I have copied below the output of the various data structures
(the big ones are truncated).
I do not understand where the memory is used.
Any hint about what is missing to obtain a printout matching the 1.4 GB of memory?
All_Data_Structures (caller = "mon cmd") {
admin_threads (4 records) {
Thread 0x23EE3EF8 {
locksetA 67201
locksetW 67201
}
Thread 0x23F18E48 {
locksetA 67544
locksetW 67544
}
Thread 0x17AC51A0 {
locksetA 0
locksetW 0
}
Thread 0x42C68E0 {
locksetA 94609
locksetW 94609
}
}
map_threads (4 entries) {
coretid 1 -> Thread 0x42C68E0
coretid 2 -> Thread 0x17AC51A0
coretid 3 -> Thread 0x23F18E48
coretid 4 -> Thread 0x23EE3EF8
}
admin_locks (135689 records) {
Lock 0x23D57A10 (ga 0x26ef3544) {
unique 135798
kind nonRec
heldW no
heldBy 0x0
}
Lock 0x22FB3690 (ga 0x26ef66e4) {
unique 135797
kind nonRec
heldW no
heldBy 0x0
}
Lock 0x229A8038 (ga 0x26ef4fac) {
unique 135796
kind nonRec
heldW no
heldBy 0x0
}
......... (plenty of records removed here)
Lock 0x434A518 (ga 0x11651524) {
unique 1
kind mbRec
heldW no
heldBy 0x0
}
Lock 0x42C4FA8 (ga 0x381fcf80) {
unique 0
kind nonRec
heldW no
heldBy 0x0
}
}
map_locks (135689 entries) {
guest 0x441B240 -> Lock 0x4386108
guest 0x7EC1CA0 -> Lock 0x4352528
.... plenty of lines removed here
guest 0x26EF51D4 -> Lock 0x23E61140
guest 0x26EF66E4 -> Lock 0x22FB3690
guest 0x381FCF80 -> Lock 0x42C4FA8
guest 0xFE8AA04C -> Lock 0x22A8CA00
}
}
(In the data structure below, there are 27,431 nodes, 29,091 'inn'
entries and 29,091 'out' entries.)
laog (requested by mon cmd) {
node 0x434A518:
inn 0x4360458
inn 0x438bf60
inn 0x438c848
inn 0x438c8e8
inn 0x43919e8
inn 0x4395a18
inn 0x4397c50
inn 0x4398330
inn 0x43983d0
inn 0x4399fc8
inn 0x439a388
inn 0x439a4c8
inn 0x439d290
inn 0x43c4200
inn 0x20a2ae88
........plenty of lines removed here
node 0x23E4D158:
inn 0x4395a18
inn 0x4399fc8
inn 0x439a4c8
inn 0x439d290
inn 0x23140890
out 0x43866f0
out 0x4386a00
out 0x14651c98
out 0x151a9180
out 0x151a9250
out 0x1789e1b0
out 0x178a5ce8
out 0x178a5e88
out 0x178ac550
out 0x178bb9c0
out 0x178c1410
out 0x178c2190
out 0x178c29b8
out 0x17a02188
out 0x17a03998
out 0x17a09468
out 0x2277f448
out 0x229ac6e8
out 0x2376a3d8
out 0x23eb24f0
node 0x23E4F828:
inn 0x2277f448
node 0x23E8AAD0:
inn 0x2277f448
node 0x23EB24F0:
inn 0x4395a18
inn 0x4399fc8
inn 0x439a4c8
inn 0x439d290
inn 0x2277f448
inn 0x23140890
inn 0x2376a3d8
inn 0x23d5bc00
inn 0x23e4d158
out 0x178c2190
node 0x23EDD7D0:
inn 0x43918a8
inn 0x4395a18
inn 0x4399fc8
inn 0x439a248
inn 0x439a4c8
inn 0x439d290
inn 0x22efa3f0
inn 0x2301ee70
inn 0x23620840
node 0x2402A5E8:
inn 0x4395a18
node 0x24E7E5E0:
}
WordSet "univ_tsets":
addTo 0 (0 uncached)
delFrom 0 (0 uncached)
union 0
intersect 0 (0 uncached) [nb. incl isSubsetOf]
minus 0 (0 uncached)
elem 0
doubleton 0
isEmpty 0
isSingleton 0
anyElementOf 0
isSubsetOf 0
WordSet "univ_lsets":
addTo 1716289 (769109 uncached)
delFrom 1712012 (767474 uncached)
union 0
intersect 13 (0 uncached) [nb. incl isSubsetOf]
minus 0 (0 uncached)
elem 1427451
doubleton 0
isEmpty 856013
isSingleton 0
anyElementOf 0
isSubsetOf 13
WordSet "univ_laog":
addTo 378757 (81292 uncached)
delFrom 28 (26 uncached)
union 0
intersect 0 (0 uncached) [nb. incl isSubsetOf]
minus 0 (0 uncached)
elem 61364
doubleton 27431
isEmpty 0
isSingleton 0
anyElementOf 0
isSubsetOf 0
locksets: 94,610 unique lock sets
threadsets: 1 unique thread sets
univ_laog: 30,048 unique lock sets
LockN-to-P map: 0 queries (0 map size)
string table map: 0 queries (0 map size)
LAOG: 27,431 map size
LAOG exposition: 29,105 map size
locks: 858,346 acquires, 858,339 releases
sanity checks: 6
-------- Arena "core": 1048576 mmap'd, 325568/309160 max/curr --------
64 in 4: stacks.rs.1
72 in 1: main.mpclo.3
368 in 27: errormgr.losf.4
1,008 in 46: errormgr.losf.3
5,056 in 158: errormgr.losf.1
5,864 in 157: errormgr.losf.2
34,584 in 22: gdbsrv
262,144 in 4: di.syswrap-x86.azxG.1
-------- Arena "tool": 1564475392 mmap'd, 1521820168/1521820168 max/curr --------
40 in 1: libhb.event_map_init.1 (RCEC groups)
40 in 1: libhb.event_map_init.3 (OldRef groups)
48 in 2: commandline.sua.3
64 in 2: libhb.Thr__new.4
80 in 2: hg.pSfs.1
104 in 4: libhb.Thr__new.1
104 in 2: commandline.sua.2
112 in 3: hashtable.Hc.1
152 in 1: libhb.event_map_init.4 (oldref tree)
168 in 4: hg.mk_Thread.1
176 in 4: libhb.Thr__new.3 (local_Kws_and_stacks)
184 in 1: initimg-linux.sce.5
304 in 2: hg.mstSs.1
368 in 4: hg.mpttT.1
608 in 37: hg.mctCloa.1
664 in 82: hg.lae.1
1,272 in 38: hg.mctCI.1
1,992 in 5: hg.ids.3
2,000 in 1: hg.ids.1
2,272 in 21: hg.lNaw.1
6,160 in 2: hashtable.Hc.2
8,976 in 120: errormgr.mre.2
49,152 in 4: libhb.Thr__new.2
196,632 in 1: hashtable.resize.1
251,632 in 27,349: hg.lae.2
491,456 in 29,105: hg.lae.3
514,264 in 21,266: libhb.zsm_init.1 (map_shmem)
697,584 in 27,432: hg.laog__init.1
730,384 in 29,106: hg.laog__init.2
786,456 in 1: libhb.event_map_init.2 (context table)
857,352 in 34,256: hg.new_MallocMeta.1
2,097,168 in 1: libhb.libhb_init.1
2,171,912 in 135,729: libhb.SO__Alloc.1
2,488,576 in 2: libhb.vts_tab_init.1
3,257,104 in 135,690: hg.ids.2
3,724,280 in 145,520: libhb.vts_set_init.1
5,970,696 in 283,832: hg.ids.4
6,513,272 in 135,689: hg.mk_Lock.1
7,084,728 in 145,519: libhb.vts_set_focaa.1
26,923,376 in 996: libhb.aFfw.1 (LineF storage)
1,456,988,256 in 90,146: hg.ids.5 (univ_laog)
-------- Arena "dinfo": 102596608 mmap'd, 100668584/100668584 max/curr --------
496 in 27: di.redi.1
1,056 in 63: redir.rnnD.4
1,736 in 107: redir.rnnD.2
1,904 in 107: redir.rnnD.3
2,544 in 71: redir.ri.1
3,120 in 63: di.debuginfo.aDI.2
3,424 in 107: redir.rnnD.1
23,712 in 63: di.debuginfo.aDI.1
211,216 in 4: di.ccCt.2
15,660,000 in 63: di.storage.addSym.1
22,952,000 in 24: di.storage.addLoc.1
26,151,440 in 45: di.storage.addDiCfSI.1
35,655,936 in 544: di.storage.addStr.1
-------- Arena "client": 16777216 mmap'd, 10953648/10947472 max/curr --------
10,947,472 in 34,256: replacemalloc.cm.1
-------- Arena "demangle": 65536 mmap'd, 256/0 max/curr --------
-------- Arena "exectxt": 12062720 mmap'd, 6966704/6812528 max/curr --------
1,572,968 in 1: execontext.reh1
5,239,560 in 215,720: execontext.rEw2.2
-------- Arena "errors": 65536 mmap'd, 4800/4800 max/curr --------
4,800 in 120: errormgr.mre.1
-------- Arena "ttaux": 524288 mmap'd, 466576/466352 max/curr --------
466,352 in 1,015: transtab.aECN.1
From: Julian S. <js...@ac...> - 2011-03-05 00:23:14
|
> 1,456,988,256 in 90,146: hg.ids.5 (univ_laog)

Urr, that's terrible. I had no idea that the laog mechanism could
use so much memory.

Search for the string "hg.ids.5" -- this is the allocation tag used
for allocating the word sets. The LAOG implementation is my first
attempt to implement a lock order graph checker, and could probably
use some improvement, or "rm -rf" followed by a new implementation.

These are sets of words -- 90,146 sets, each set is a set of words,
and each word is a Lock*. So in short these are sets of Lock*s.

I don't remember exactly how the LAOG works -- I would need to study
the code. Basically it must build a directed graph of Lock*s and
then check that later acquisitions of locks do not violate the
ordering recorded in the graph.

The scaling problem that worries me is this: the application creates
thousands of (e.g.) C++ objects, and each one has a lock (e.g. a
pthread_mutex_t). That means Helgrind has to allocate the same number
(thousands) of its own Lock structures, and so the laog gets very big.

Having said that .. if the application does pthread_mutex_destroy on
a lock before deleting the object, then H should free up the
associated Lock structure. Maybe the problem is that the WordSets
(which are sets of Lock*s) are never GC'd. (Just guessing.)

In this situation it is useful to make a small test case which shows
the problem. Maybe you can do that?

J
From: Philippe W. <phi...@sk...> - 2011-03-07 00:07:20
|
> These are sets of words -- 90146 sets, each set is a set of words,
> and each word is a Lock*. So in short these are sets of Lock*s.
My understanding is the following:
there are two data structures maintaining Locksets:
univ_lsets
and
univ_laog
univ_lsets holds the ordered locksets (maintaining the history of all
locksets held by each thread).
univ_laog holds the locksets that have been locked "before"
and "after" a given lock.
In my case, there are about 27,431 entries in univ_laog.
Each of these entries is the successor of a set of "inn" locks
and is the predecessor of a set of "out" locks.
> The scaling problem that worries me is like this: the application
> creates thousands of (eg) C++ objects, and each one has a lock
> (eg, a pthread_mutex_t). That means Helgrind has to allocate
> the same number (thousands) of its own Lock structures, and so
> the laog gets very big.
I do not believe that in my case there are so many locks that
it would explain 1.4 GB of memory.
I rather believe that there is a quadratic (or exponential) effect
of some operations on the memory used by laog.
At least the following is quadratic:
the application has an array of e.g. 1000 LockX,
a lock LockD,
and an array of e.g. 1000 LockY.
If the application does:
for i in 1 .. 1000
lock LockX[i]
lock LockD
lock LockY
unlock LockX/LockD/LockY
end loop
destroy LockD
Then at the destroy of LockD, laog will create 1000 * 1000 lock sets
to represent all the "possible" orders between the LockXs and LockYs.
I thought this was the cause of the huge memory, but when I commented out
the quadratic logic in laog, the memory was still exhausted.
So, there must be something else that I have not seen.
>
> Having said that .. if the application does pthread_mutex_destroy
> on a lock before deleting the object, then H should free up the
> associated Lock structure. Maybe the problem is that the WordSets
> (which are sets of Lock*'s) are never GC'd. (Just guessing.)
Indeed, I believe lock sets in univ_* are never cleaned up
(but I understood this is by design), and that is why they are
called "univ_*" ("universe", I am guessing).
>
> In this situation it is useful to make a small test case which
> shows the problem. Maybe you can do that?
I will take a look at how I can make a small test case.
At least for the 1000x1000 case, the small test case is easy, but
I believe it does not match the problem I have in the big
application.
Philippe
From: Julian S. <js...@ac...> - 2011-03-07 17:49:18
|
> If the application does:
> for i in 1 .. 1000
> lock LockX[i]
> lock LockD
> lock LockY
> unlock LockX/LockD/LockY
> end loop
> destroy LockD
AFAICS this would only create 1000 locksets: {LockX[i], LockD, LockY}
for i in 1 .. 1000. (But I am not sure of my memory of the laog
logic in H, so ..)
Do you mean to have 2 nested loops, like this:
for i in 1 .. 1000
for j in 1 .. 1000
lock LockX[i]
lock LockD
lock LockY[j]
unlock LockX[i]/LockD/LockY[j]
end loop
end loop
destroy LockD
Would that maybe create the 1 million locksets?
> I will take a look at how I can make a small test case.
Yes .. without that we can't really analyze this properly.
J
From: Philippe W. <phi...@sk...> - 2011-03-07 23:21:25
> The scaling problem that worries me is like this: the application
> creates thousands of (eg) C++ objects, and each one has a lock
> (eg, a pthread_mutex_t). That means Helgrind has to allocate
> the same number (thousands) of its own Lock structures, and so
> the laog gets very big.

I think that in general, an application having many locks can always
produce a gigantic set of various lock acquisition orders. However,
there is something quadratic in the laog data structures even for a
single sequence of locks.

> In this situation it is useful to make a small test case which
> shows the problem. Maybe you can do that?

Small test case attached to:
https://bugs.kde.org/show_bug.cgi?id=267925

Philippe