[Dpcl-develop] Re: some DPCL questions
From: Steve C. <sl...@sg...> - 2004-03-05 23:13:49
Many thanks again to DaveW for his list of helpful hints
for analyzing the 'limit of 45 callbacks' problem we've been
seeing on our ia64 box. These hints and some sleuthing by
myself and Bill Hachfeld-SGI (mostly Bill!!) combined to
come up with what we think are some <potential> bug fixes
for the current DPCL. These fixes do NOT seem to be 64-bit
specific. Next week I will send along a list of 64-bit
specific <potential> changes we would probably need to run
DPCL on our ia64 box. I'm sure Dave has already found most
of these 64-bit specific problems, but I'll send them along
for his comment, just in case. Again, the 64-bit specific
changes I have accumulated in the past few months will be
coming NEXT week. The following is Bill Hachfeld's analysis
and suggested fixes for the 'shared memory' problems we were
seeing (aka the 'limit of 45 callbacks').
Thanks again to DaveW and JamesW for their continued
support!
SteveC - SGI Compilers/Tools
***************************************************************************
==>
==> Proposed Fixes: diff is from JamesW's hybrid source of 121503.
==>
File: ~dpcl/src/daemon_RT/src/os/linux/ShmManager.C
***************************************************************************
237,238c237,238
< unsigned int * obj_tail =
< (unsigned int *) ((unsigned long) obj_header + true_obj_size -
---
> int * obj_tail =
> (int *) ((unsigned long) obj_header + true_obj_size -
552c552
< page->object_size,
---
> 0,
580,581c580,581
< unsigned int * free_object_tail =
< (unsigned int *) ((unsigned long) free_object + page->object_size +
---
> int * free_object_tail =
> (int *) ((unsigned long) free_object + page->object_size +
583d582
<
643,644c642
<
< freeFObjectH ** p_free_object = (freeFObjectH **) object_holder;
---
>
660c658
< page->object_size,
---
> 0,
685a684
> freeFObjectH ** p_free_object = (freeFObjectH **) object_holder;
698c697
< p_free_object = (freeFObjectH **) object_holder;
---
> freeFObjectH ** p_free_object = (freeFObjectH **) object_holder;
701,702c700,701
< unsigned int * free_object_tail =
< (unsigned int *) ((unsigned long) p_free_object [i] + page->object_size +
---
> int * free_object_tail =
> (int *) ((unsigned long) p_free_object [i] + page->object_size +
705d703
<
758,759c756,757
< unsigned int * free_object_tail =
< (unsigned int *) ((unsigned long) free_object + page->object_size +
---
> int * free_object_tail =
> (int *) ((unsigned long) free_object + page->object_size +
814,815c812,813
< unsigned int * free_object_tail =
< (unsigned int *) ((unsigned long) free_object + page->object_size +
---
> int * free_object_tail =
> (int *) ((unsigned long) free_object + page->object_size +
***************************************************************************
--------------------------------------------------------------
Change #1: Change int* --> unsigned int*
--------------------------------------------------------------
==>
==> Analysis (courtesy of Bill Hachfeld, SGI):
==>
ShmManager.C, Line #237-8
The "mask" field is defined as an "unsigned int" in the structures
freeFObjectH, freeVObjectH, freeVObjectT, and allocVObject. While I
don't believe this change made a material difference, it should be
changed for the sake of understandability and consistency.
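
A minimal sketch of why the signedness probably made no material
difference, assuming a 32-bit int and a mask that is compared against
0xDEADBEEF. The function names below are hypothetical and are not taken
from ShmManager.C. Read through a signed int the mask is negative, but
the comparison converts it back to unsigned int before comparing, so
the match still succeeds.

```cpp
#include <cassert>
#include <cstring>

// Assumption: the mask value checked by the allocator is 0xDEADBEEF.
static const unsigned int kMagic = 0xDEADBEEFu;

// Read the stored mask as a signed int, as an "int *" tail pointer would.
int read_signed(const void *tail) {
    int value;
    std::memcpy(&value, tail, sizeof value);
    return value;
}

// Read the stored mask as an unsigned int, matching the struct field type.
unsigned int read_unsigned(const void *tail) {
    unsigned int value;
    std::memcpy(&value, tail, sizeof value);
    return value;
}

// "signed_value == 0xDEADBEEF" converts the int operand to unsigned int
// before comparing; the cast below spells that conversion out.
bool tail_matches_signed(const void *tail) {
    return static_cast<unsigned int>(read_signed(tail)) == kMagic;
}

bool tail_matches_unsigned(const void *tail) {
    return read_unsigned(tail) == kMagic;
}
```

Both checks accept the same tail word; only the declared type differs,
which is why the change is mainly about consistency with the "mask"
field.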
--------------------------------------------------------------
Change #2: Change 0 --> page->object_size
--------------------------------------------------------------
==>
==> Analysis (courtesy of Bill Hachfeld, SGI):
==>
The first page allocated for message queue headers and buffers is
properly requested as a page containing fixed-sized objects. This is
done in Ais_msgInit() by calling shm_blockAlloc(). Subsequent calls
to shm_processObjectAlloc() and shm_processObjectAllocV(), however,
extend the page list by calling shm_blockAlloc() in a way that
requests pages containing variable-sized objects.
At both source locations we are requesting the allocation of a new
page to be added to a list of previously allocated pages holding
fixed-sized objects. By passing "0" as the object size, we ask the
allocator to give us a new page containing variable-sized objects.
Subsequent code in the *ObjectAlloc*() functions treats this page as
if it were another fixed-size object page, looking for 0xDEADBEEF
magic symbols at object header/tailer locations where none exist.
My change simply ensures that subsequent pages are allocated as
fixed-size object pages with the object size being the same as the
previous page.
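
The mismatch can be illustrated with a toy model. This is only a
sketch: Page, block_alloc() and count_valid_slots() are hypothetical
stand-ins, sizes are counted in 32-bit words for simplicity, and the
real page layout in ShmManager.C is not reproduced here. The one
property carried over from the analysis is that an object size of 0
requests a variable-sized page, which has no per-object magic markers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

static const unsigned int kMagic = 0xDEADBEEFu;

// Hypothetical page: object_size == 0 means "variable-sized objects".
struct Page {
    std::size_t object_size;          // payload size in words
    std::vector<unsigned int> words;  // page storage
};

// Hypothetical stand-in for shm_blockAlloc(). Fixed-size pages get a
// magic word at the head and tail of every slot; variable-sized pages
// (object_size == 0) get none.
Page block_alloc(std::size_t object_size, std::size_t page_words) {
    Page page;
    page.object_size = object_size;
    page.words.assign(page_words, 0u);
    if (object_size == 0) {
        return page;
    }
    const std::size_t slot = object_size + 2;  // header + payload + tail
    for (std::size_t base = 0; base + slot <= page_words; base += slot) {
        page.words[base] = kMagic;
        page.words[base + slot - 1] = kMagic;
    }
    return page;
}

// What a fixed-size consumer expects: intact markers around every slot
// of the given object size. Returns how many slots pass that check.
std::size_t count_valid_slots(const Page &page, std::size_t object_size) {
    const std::size_t slot = object_size + 2;
    std::size_t valid = 0;
    for (std::size_t base = 0; base + slot <= page.words.size(); base += slot) {
        if (page.words[base] == kMagic && page.words[base + slot - 1] == kMagic) {
            ++valid;
        }
    }
    return valid;
}
```

In this model, extending the list with block_alloc(0, ...) yields a
page on which the fixed-size check finds no valid slots, while passing
the previous page's object_size yields a page the consumer can use.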
--------------------------------------------------------------
Change #3: Move declaration of loop variable
--------------------------------------------------------------
==>
==> Analysis (courtesy of Bill Hachfeld, SGI):
==>
This bug is even more insidious than the one above. It occurs only
when attempting to allocate variable-length arrays of fixed-sized
objects, where the objects are allocated from more than one page.
In shm_processObjectAllocV() we are allocating multiple fixed-size
objects and placing pointers to them into an array passed in by the
caller. We start by allocating as many as possible from the current
page in the loop at line #684. When we run out of objects within that
page, we break out to the outer loop that begins at line #644. Here
we allocate a new page and begin filling in objects again at line
#684. Unfortunately we reset our array pointer back to the beginning
of the user-provided array. Previously allocated objects are
overwritten and the remaining objects are allocated but never
properly returned to the user.
My change simply moves the declaration of the "current" pointer into
the user-passed array outside the outer loop beginning at line #644.
This ensures that we fill in the entire array properly.
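
A simplified model of the two loops, assuming each page can supply
only per_page objects. alloc_many() and its parameters are
hypothetical; objects are represented by integer ids and 0 marks an
array slot that was never filled. The reset_each_page flag switches
between the original behavior (array position reset for every new
page) and the proposed one (position declared once, outside the outer
loop).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fill a caller-style array of `count` entries from pages that each
// hold `per_page` objects. Returns the array so the result is visible.
std::vector<int> alloc_many(std::size_t count, std::size_t per_page,
                            bool reset_each_page) {
    std::vector<int> holder(count, 0);  // 0 means "never filled"
    int next_id = 1;                    // id of the next object handed out
    std::size_t remaining = count;
    std::size_t cursor = 0;             // position in the caller's array

    while (remaining > 0) {             // outer loop: one pass per page
        if (reset_each_page) {
            cursor = 0;                 // the bug: start over at the front
        }
        // Inner loop: take as many objects as this page can supply.
        for (std::size_t i = 0; i < per_page && remaining > 0; ++i) {
            holder[cursor++] = next_id++;
            --remaining;
        }
    }
    return holder;
}
```

With five objects and two per page, the resetting variant hands out
all five objects but leaves only two of them reachable through the
array, with the rest of the array unfilled; the other variant fills
every entry in order.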
-------------------------------------------------------------------
William Hachfeld EMail: wd...@sg...
SGI Debugger, Object, and Performance Tools Phone: 651-683-3103