Menu

#133 Better VMemPool free list handling

open
nobody
Repository (26)
5
2007-03-22
2007-03-22
No

We've recently started to have evidence that VMemPool::allocate is taking a long time in production servers. To investigate further I started adding some simple data recording to VMemPool in this branch:

/vesta/vestasys.org/vesta/repos/86.VMemPool_stats

Having looked at some data from that, it seems clear that the longer the repository runs without weeding the longer the VMemPool free list becomes, the more free blocks get considered and rejected on each allocation before finding one that can be used, and the longer each allocation takes. Since the VMemPool free list handling is currently simplistic, it shouldn't be too hard to improve the situation.

One thing we can do is partition the free list by block size. This will make it easier to avoid spending time considering free blocks which are too small to satisfy the current allocation request. I'm not sure yet exactly how we should partition the free list, and I would like to gather additional data before making a decision. A common method in other allocators is to use powers of 2 (4 bytes, 8 bytes, 16 bytes, etc.), but that might be too coarse-grained for VMemPool which is designed to use tightly packed data structures. Several of the VMemPool allocations are for fixed block sizes and we may want to keep separate free lists for blocks of those sizes.

Another thing we could do is improve the ability to coalesce free space to reduce fragmentation in the VMemPool heap. The only coalescing that happens currently is that a freed block will absorb following free bytes. It would be easy to combine a freed block with another free block that immediately follows it, but to combine with free bytes or free blocks which are before the freed block we will need additional data structures. I think probably the best way to do this would be to keep a table that maps from the end of each free region to the beginning of each free region. It would be efficient both to keep such a table up to date and to use it for coalescing when a block is freed.

Discussion

  • Kenneth C. Schalk

    Logged In: YES
    user_id=304837
    Originator: YES

    Dave Andrews from Intel had an interesting suggestion for
    how to implement better free space coalescing without adding
    an additional data structure: have a background thread that
    periodically wakes up and looks for opportunities to
    coalesce blocks on the free list. This would make it
    possible to simply coalesce with free space following blocks
    on the free list. Eventually such a thread would find any
    free block that precedes another.

    This is worth considering as it could be a bit easier to
    implement. The difficult part seems to be balancing having
    such a thread do a useful amount of work vs. having it
    starve out threads doing real work by holding the VMemPool
    lock. It would have to use the existing freedMemCursor as
    the point to start working each time it begins. If it did
    too little work before giving up, some parts of the free
    list might never get checked.

    I should also mention that I have some data now on
    allocation block size popularity. In a recent test-vesta
    run the top 10 block sizes with their associated counts
    were:

    16 : 412269
    512 : 51693
    32 : 8342
    40 : 7205
    24 : 3739
    56 : 2665
    336 : 2589
    88 : 1447
    120 : 1260
    376 : 1241

    16 is now the minimum allocation size due to the increase in
    the minimum free block size. It's probably the most popular
    because of VDirEvaluator, but there are undoubtedly other
    kinds of blocks that fall into that size as well.

    512 is the size of block allocated by VDirChangeable when a
    new block for more directory entries is needed, so it's not
    surprising that it's in second place.

    Before I make a decision about free list partitioning I'm
    planning on gathering similar data from a production
    repository.

     
  • Kenneth C. Schalk

    Logged In: YES
    user_id=304837
    Originator: YES

    It occurred to me today that there's an even simpler
    solution for free block coalescing than having a background
    thread do it: When considering a block from the free list
    for an allocation request, attempt to coalesce it with any
    following free space. This will get the job done just in
    time to satisfy an allocation request and minimizes the
    amount of extra work we expend on coalescing. This would
    not be quite as good as coalescing with preceding free space
    when each block is freed, but it could get pretty close.

     
  • Kenneth C. Schalk

    Logged In: YES
    user_id=304837
    Originator: YES

    The free list handling is much improved with the changes I
    just checked in as:

    /vesta/vestasys.org/vesta/repos/89

    However, data from production repositories at Intel still
    shows a noticeable number of blocks from the free list
    considered and rejected per allocation. This suggests that
    we can still squeeze out more performance with a dynamic
    free list partitioning rather than the static free list
    partitioning we're using now.

    For this reason I'm going to leave this tracker entry open
    until we can get that implemented.

     

Log in to post a comment.

MongoDB Logo MongoDB