loading/unloading stxxl::vector<>

Edgar
2011-03-28
2013-04-25
  • Edgar

    Edgar - 2011-03-28

    Hi,

I've extended the vector class with two methods (load and unload) for completely unloading the vector's data to disks managed by the block_manager instance and loading it again at a later stage of the process. I would appreciate it if someone would integrate them into the project. A test exists.

    e.g.:

    vector_type v(0x200000);
    init(v);
    sentinel = v.unload();

    // ... do something else

    v.load(sentinel);
    // continue working on v

    Edgar

     
  • Johannes Singler

    Would it be hard to generalize this to setting the number of pages currently used?  Setting it to 0 would effectively unload the vector, and increasing it  would allow access again.
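If it helps the discussion, such an interface could look roughly like the following toy sketch. It is self-contained and every name in it (`paged_cache`, `set_num_pages`) is made up for illustration, not existing stxxl API: setting the page count to 0 drops the cache, and a larger value re-enables access.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a paged container whose in-memory cache size is
// controlled by a single call (hypothetical names, not stxxl API).
class paged_cache {
public:
    explicit paged_cache(std::size_t pages) : pages_(pages) {
        frames_.resize(pages);
    }

    // Setting 0 pages effectively unloads the cache; a larger value
    // allows access again by allocating page frames anew.
    void set_num_pages(std::size_t n) {
        frames_.resize(n);  // drop or allocate page frames
        pages_ = n;
    }

    bool loaded() const { return pages_ > 0; }
    std::size_t num_pages() const { return pages_; }

private:
    std::size_t pages_;
    std::vector<std::vector<char>> frames_;  // one frame per cached page
};
```

Under this model, "unload" and "load" become just the two extreme calls of one knob: `set_num_pages(0)` and `set_num_pages(k)`.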

    Johannes

     
  • Edgar

    Edgar - 2011-04-11

No, in principle it is a good idea, but

it is unclear to me how to interface it, i.e.:

v.numpages(0, &sentinel);  // unload
v.numpages(8, &sentinel);  // load

    or

sentinel = v.unload();
v.load(sentinel, 8);

    ?

Apart from that, I've added a constructor which takes the sentinel as an additional argument, to construct the vector from a sentinel, i.e.:

    vector_type v(sentinel, n_pages);

    Edgar

     
  • Johannes Singler

    So far, there is no sentinel for vector, and we should not make it mandatory.  Why have you introduced it?

    Johannes

     
  • Edgar

    Edgar - 2011-04-12

Hmmh.., I think we are pursuing different goals.

First of all, the main procedure for
a) unloading the vector to disk is:
- writing the vector's bid-vector as a compact tree,
- clearing the bid-vector, and
- clearing the cache status (keeping the allocated memory);
b) loading the vector from disk is:
- reading the vector's compact tree back into the vector's bid-vector,
- setting up the cache status (reusing the allocated memory).
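The two steps above can be sketched in a self-contained toy model (this is an illustration of the idea only, not the actual stxxl code; `toy_vector`, `bid_t` and the `disk` map are all made-up stand-ins): unload serializes the bid-vector to "disk" and clears the in-memory state, load reads it back while the cache memory stays allocated.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using bid_t = std::uint64_t;

std::map<bid_t, std::vector<bid_t>> disk;  // stands in for the block_manager's disks
bid_t next_root = 1;

struct toy_vector {
    std::vector<bid_t> bids;  // the vector's bid-vector
    std::vector<char> cache;  // page cache (kept allocated across unload)

    // a) unload: write the bid-vector out, clear it, reset cache status.
    bid_t unload() {
        bid_t root = next_root++;
        disk[root] = bids;  // the "compact tree" collapses to one block here
        bids.clear();       // the vector no longer tracks its blocks in memory
        // the cache memory itself stays allocated for later reuse
        return root;
    }

    // b) load: read the bid-vector back and reuse the allocated cache.
    void load(bid_t root) {
        bids = disk.at(root);
        disk.erase(root);
    }
};
```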

Currently the sentinel consists of the disk vector's root BID, the vector's size, and a disk queue map that packs the 16-byte BID into an 8-byte BID by storing a disk queue id in the 8 low-order bits of the offset (the map could be removed if stxxl provided such a mapping).
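The packing described here can be illustrated with plain bit operations. This is a sketch of the idea only (hypothetical helper names, and note Andreas's objection below that such a mapping is not generally possible); it assumes block offsets are multiples of 256, so the 8 low-order bits are free to hold a queue id.

```cpp
#include <cassert>
#include <cstdint>

// Sketch: pack a (queue id, block offset) pair into one 64-bit word,
// with the disk queue id in the 8 low-order bits of the offset.
// Assumes the offset's low 8 bits are zero and there are <= 256 queues.
std::uint64_t pack_bid(std::uint8_t queue_id, std::uint64_t offset) {
    return offset | queue_id;
}

std::uint8_t unpack_queue(std::uint64_t packed) {
    return static_cast<std::uint8_t>(packed & 0xFF);
}

std::uint64_t unpack_offset(std::uint64_t packed) {
    return packed & ~std::uint64_t(0xFF);
}
```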

Introducing a sentinel frees one from reallocating the vector's cache in situations where allocating large chunks of memory is a concern (i.e. not guaranteed to succeed, due to memory fragmentation). So my primary goal is to avoid memory contention.

I'm using the sentinel to handle hundreds of external vectors in a multiuser environment, swapping them to disk when they are displaced and loading them into memory when they are demanded. In this scenario the sentinels work perfectly.

    Edgar

     
  • Andreas Beckmann

> Hmmh.., I think we've spotted different goals. First of all, the main procedure of a) unloading the vector to disk is: - writing the vector's bid-vector as a compact tree - clearing the bid-vector and - clearing the cache status (keeping the allocated memory).

    So you dump the internal state (the BIDs) into a (chained list of) typed_block<block_size, bid_type, 1, size_type>, returning the first BID in the list …

> b) loading the vector from disk is: - reading the vector's compact tree into the vector's bid-vector - setting up the cache status (reusing the allocated memory).

    and reload the vector from that BID

> Currently sentinel …

That's not a sentinel. A sentinel (in computer science) is a special value that marks the end of a data structure, e.g. a NULL pointer at the end of a list, or some min/max value as we use it in the priority queue to fill missing elements at the end of a block without needing to check an exact element count every time.
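The classic meaning can be shown in a few lines: planting the search key as a sentinel at the end of an array lets a linear search drop the explicit bounds check in its inner loop (a self-contained textbook illustration, unrelated to stxxl itself).

```cpp
#include <cassert>

// Classic sentinel-based linear search: place the key in the reserved
// slot a[n], so the inner loop needs no separate bounds check; the
// sentinel guarantees termination.
int find_with_sentinel(int* a, int n, int key) {
    int saved = a[n];       // slot n is reserved for the sentinel
    a[n] = key;             // plant the sentinel
    int i = 0;
    while (a[i] != key) ++i;
    a[n] = saved;           // restore the reserved slot
    return i < n ? i : -1;  // -1: only the sentinel matched
}
```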

> consists of the disk vector's root bid, the vector's size and a disk queue map to pack the 16-byte BID to an 8-byte BID using a disk queue id in the 8 low-order bits of the offset (the map could be removed if stxxl can provide such a mapping).

NO WAY! Just store plain BIDs, don't try to save memory here. You can't map a BID to a disk_queue_id. And there can be more than 256 files …

> Introducing a sentinel frees somebody from reallocating the vector's cache, where allocating large chunks of memory is a concern (i.e: not guaranteed due to memory fragmentation). So my primary goal is to avoid memory contention. I'm using the sentinel to handle hundreds of extended vectors in a multiuser environment, swapping them to disk, when they are displaced and loading them into memory, when they are demanded. In this scenario the sentinels work perfect.

Why don't you use vectors that are bound to files? That way you could have them persist over several program runs, too.
And then eventually enhance vector::{,de}allocate_page_cache to improve memory management in the way you need it.

Do you really need a large vector cache? What does your access pattern look like? Random access is a Bad Thing!

    Andreas

     
  • Edgar

    Edgar - 2011-04-12

    > So you dump the internal state (the BIDs) into a (chained list of)
    > typed_block<block_size, bid_type, 1, size_type>, returning the first BID in
    > the list …
Yes, something like that (a tree of typed BID blocks to optimize disk I/O).

    > and reload the vector from that BID
    Right.

> consists of the disk vector's root bid, the vector's size and a disk
> queue map to pack the 16-byte BID to an 8-byte BID using a disk queue id in
> the 8 low-order bits of the offset (the map could be removed if stxxl can
> provide such a mapping).
>
> NO WAY! Just store plain BIDs, don't try to save memory here. You can't map
> a BID to a disk_queue_id. And there can be more then 256 files …
It is not a problem to revert it.

    > Why don't you use vectors that are bound to files? That way you could have
    > them persistent over several program runs, too.
    > And then eventually enhance vector::{,de}allocate_page_cache to inprove
    > memory management in a way you need it.
    Yes, this is an alternative.
The decision not to use file-bound vectors was rooted in the impact of existing
systems (antivirus) on file creation. Another argument for using the block manager
for a vector's cache is that the block manager's disk files (created by createdisk)
are mostly unfragmented, whereas files created on demand might be spread
over the disk (depending on the OS).
    > Do you really need a large vector cache? How does your access pattern look
    > like?
    > Random Access is a Bad Thing!
I am using external vectors as result sets on the server side, where client buffers
are filled with those results. So I need a single random access to position the
client's cursor and multiple bidirectional accesses to advance within the results.
Since this happens in a multiuser environment, it is not unusual for the vector
to be displaced to disk between two client requests.

    Edgar

     
  • Andreas Beckmann

    > Why don't you use vectors that are bound to files?

> Yes, this is an alternative. The decision not to use file bound vector's was routed by the impact of existing systems (antivirus) on file creation. Another pro for me using the block manager for a vectors cache is, that the block manager's disk files (created by createdisk) are mostly defragmented, where files created on demand might be spread over the disk (depending on the os's relevance).

You could achieve similar things with a vector bound to a file:
* Put the file on a separate partition/hard disk - should fix the fragmentation issue.
* Do not delete the file at the end, just reuse and overwrite it the next time - should avoid the file creation penalty. Also don't vector.clear() (which would release the blocks); just vector.resize(0) at the beginning of your program and vector.resize(vector.capacity()) at the end to keep all allocated blocks in the file.

    Andreas

     
