Best Practices Help

  • Brian Frutchey
    2009-12-18

    Guys, I haven't been able to test the Windows-specific mods, but I am still seeing the byte array grow until OOM even when I do not use compression.  I just ran another test where I removed all properties except the disabling of transactions, and still hit OOM.  I will try the Windows-specific settings next, then dig into the heap dump on a run.  The byte array seems much smaller this time (only 78 MB at 4.6M inserts), so the OOM could be related to other objects not being released.  One funny thing: the size of the array in bytes always ends with '99983', like 78199983 at 4.6M inserts or 74799983 at 4.4M inserts - why might that be?

     
  • Brian Frutchey
    2009-12-18

    I should probably also mention that I am setting -Xmx to 1024M.  I can avoid OOM by increasing the memory limit enough to support the number of inserts I am doing - for example, 20M inserts needs -Xmx6000M.

     
  • Kevin Day
    2009-12-18

    This is different from my test results.  I show that after 7 minutes or so, the heap usage goes flat and stays that way (granted, I didn't let it run for hours, but I did let it run for another 30 minutes and saw no heap growth at all).  I'll put together a simplified test that demonstrates this without needing 3 or 4 source files.

     
  • Kevin Day
    2009-12-18

    Here's a test:

        package jdbm.btree;
       
        import java.io.File;
        import java.util.Properties;
       
        import jdbm.RecordManager;
        import jdbm.RecordManagerFactory;
        import jdbm.RecordManagerOptions;
        import jdbm.helper.StringComparator;
        import jdbm.helper.StringSerializer;
        import jdbm.helper.compression.LeadingValueCompressionProvider;
       
       
        /**
         * @author kevin
         */
        public class StringInsertionTest {
       
            public StringInsertionTest() {
                // TODO Auto-generated constructor stub
            }
       
            public static void main(String[] args) throws Exception {
                String dbName = "testrecman";
                new File(dbName + ".db").delete();
                new File(dbName + ".long").delete();
               
               
                Properties props = new Properties();
                props.setProperty(RecordManagerOptions.DISABLE_TRANSACTIONS, "TRUE");
                RecordManager recman = RecordManagerFactory.createRecordManager(dbName, props);
                BTree newTree = BTree.createInstance(recman, StringComparator.INSTANCE, StringSerializer.INSTANCE, StringSerializer.INSTANCE);
                newTree.setKeyCompressionProvider(LeadingValueCompressionProvider.STRING);
       
                // insert random 10-character keys and 25-character values forever, committing every 1000 inserts
                int recordsSoFar = 0;
                while (true){
                    newTree.insert(buildRandString(10), buildRandString(25), true);
                    recordsSoFar++;
                    if (recordsSoFar % 1000 == 0){
                        recman.commit();
                        System.out.println("Commit");
                    }
                }
               
                // recman.close(); // unreachable b/c of while(true) above
            }
           
            // builds a random string of the given length from ASCII characters in the range 'A' (65) to 'y' (121)
            public static String buildRandString(int size) {
                StringBuilder sb = new StringBuilder();
                for(int i=0; i<size; i++)
                    sb.append((char)(57*Math.random()+65));
                return sb.toString();
            }
       
        }

    I run this with VisualVM, and I see a very well behaved heap.  The biggest memory usage comes from byte[], and the number of byte[] objects bounces nicely between ~200 and ~1600 (this is with a total of 30 million allocations).  I just don't see evidence of a memory leak here.

    I think that you'll need to figure out where your allocation is coming from, and get a handle on that.  I did notice in your original test that you were committing every 50K inserts.  Bear in mind that by default, jdbm caches 10 transactions worth of changes, so you should probably spend some time figuring out how many bytes are involved with 50K worth of insertions in your model.
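
    For a rough sense of scale (a back-of-envelope estimate on my part, not a measurement): even with the small random 10-character keys and 25-character values from the test above, 50K insertions is a couple of megabytes of raw character data, and once you add per-object overhead, cached BTree pages, and up to 10 transactions' worth of buffered changes, the live heap between commits can easily reach tens of megabytes - more with larger values like yours.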

    I would also suggest that you check the overall performance impact of how frequently you commit.  For certain, there is a big hit if you commit every single insert.  But if you commit every 1000 inserts, I doubt that the impact will be much different than if you commit every 10K inserts.  This is something that only you can decide (it's pretty much a trade-off between speed and memory, with asymptotic benefit as you throw more memory at it).
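
    If it helps, here is a minimal sketch (untested, my own, reusing buildRandString from the test above) of how you could measure that trade-off by running the same insert load at a few different commit intervals and timing each run:

        package jdbm.btree;

        import java.io.File;
        import java.util.Properties;

        import jdbm.RecordManager;
        import jdbm.RecordManagerFactory;
        import jdbm.RecordManagerOptions;
        import jdbm.helper.StringComparator;
        import jdbm.helper.StringSerializer;

        public class CommitIntervalTimingTest {

            public static void main(String[] args) throws Exception {
                int totalInserts = 100000;
                int[] commitIntervals = {1, 1000, 10000};

                for (int interval : commitIntervals) {
                    String dbName = "timingtest_" + interval;
                    // clean up any files left over from a previous run
                    new File(dbName + ".db").delete();
                    new File(dbName + ".long").delete();

                    Properties props = new Properties();
                    props.setProperty(RecordManagerOptions.DISABLE_TRANSACTIONS, "TRUE");
                    RecordManager recman = RecordManagerFactory.createRecordManager(dbName, props);
                    BTree tree = BTree.createInstance(recman, StringComparator.INSTANCE,
                            StringSerializer.INSTANCE, StringSerializer.INSTANCE);

                    // same load every time; only the commit interval changes
                    long start = System.currentTimeMillis();
                    for (int i = 1; i <= totalInserts; i++) {
                        tree.insert(StringInsertionTest.buildRandString(10),
                                StringInsertionTest.buildRandString(25), true);
                        if (i % interval == 0) {
                            recman.commit();
                        }
                    }
                    recman.commit();
                    recman.close();
                    System.out.println("commit every " + interval + " inserts: "
                            + (System.currentTimeMillis() - start) + " ms");
                }
            }
        }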

     
  • Brian Frutchey
    2009-12-19

    It would seem the cause of the ever-growing byte array is my serializer.  If all I do is serialize a single string for the key and value, even if those strings produce the same number of bytes as the average object my serializer handles, the byte array seems to stay the same size.  My serializer, which handles longs, single bytes, and sets of strings, seems to be the problem - any ideas?

     
  • Kevin Day
    2009-12-20

    Memory leaks are generally the result of state accumulating in member variables of objects - that's where you should start.  Serializers are usually best implemented as stateless, immutable objects, so check your implementation and make sure that is the case.

    My recommendation is to create a small application that just runs your serializer in a tight loop, and monitor with a profiler.  In other words:

        Serializer mySerializer = ….
        while(true){
          mySerializer.serialize(somePayload);
        }
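
    To make that concrete, here is a minimal self-contained sketch of the kind of harness I mean (mine, untested; SampleValue and SampleValueSerializer are made-up stand-ins for your value class and serializer, and it assumes jdbm's jdbm.helper.Serializer interface - serialize an Object to a byte[], deserialize from a byte[]).  The important part is that the serializer is stateless: no member variables, nothing accumulated across calls.  All three classes can live in one SerializerProfilingTest.java file.

        import java.io.ByteArrayOutputStream;
        import java.io.DataOutputStream;
        import java.io.IOException;
        import java.util.Set;
        import java.util.TreeSet;

        import jdbm.helper.Serializer;

        /** Made-up payload with roughly the shape you described: a long, a byte, and a set of strings. */
        class SampleValue {
            long id;
            byte flag;
            Set<String> tags = new TreeSet<String>();
        }

        /** Stateless, immutable serializer for SampleValue - no member variables at all. */
        class SampleValueSerializer implements Serializer {

            public byte[] serialize(Object obj) throws IOException {
                SampleValue v = (SampleValue) obj;
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                DataOutputStream out = new DataOutputStream(baos);
                out.writeLong(v.id);
                out.writeByte(v.flag);
                out.writeInt(v.tags.size());
                for (String tag : v.tags) {
                    out.writeUTF(tag);
                }
                out.flush();
                return baos.toByteArray();
            }

            public Object deserialize(byte[] serialized) throws IOException {
                // not needed for this profiling harness - we only exercise serialize()
                throw new UnsupportedOperationException();
            }
        }

        /** Run the serializer in a tight loop and watch the heap in VisualVM or another profiler. */
        public class SerializerProfilingTest {

            public static void main(String[] args) throws Exception {
                SampleValueSerializer serializer = new SampleValueSerializer();
                SampleValue payload = new SampleValue();
                payload.id = 42L;
                payload.flag = 1;
                payload.tags.add("alpha");
                payload.tags.add("beta");

                // if the serializer (or the payload) holds on to state, heap usage will climb;
                // if it is truly stateless, byte[] usage should stay flat under GC
                while (true) {
                    serializer.serialize(payload);
                }
            }
        }

    The second half of the exercise is to swap the made-up serializer for your real one and compare the heap profiles.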

    If I recall correctly, you had some odd recursion going on in some of your methods (instead of using a retry loop) - that's always asking for problems.  If your coding style uses a lot of recursion, take a look at that as well.

    The biggest thing that you need to do at this point is to start simple, and gradually add complexity back in, monitoring heap behavior as you add each thing in.

     
  • Brian Frutchey
    2009-12-24

    Merry Christmas!  So I realized how foolish I have been - I am serializing a collection of strings as a single value, and that collection grows with every insert.  This is the source of my OOM.  I fixed that, and now I don't hit OOM even with compression enabled.  I feel like an idiot…  Thanks for all the help, guys!

    To stop using a collection as a value, though, I had to waste storage space (the value is now logically useless, but I still have to set it, and even 1 byte per record adds up) and overload the keys of the BTree…  It would be really nice if we allowed duplicate keys in the BTree.
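
    For anyone hitting the same limitation, here is a rough sketch of the workaround (not my exact code - the class and key names are made up, and it assumes jdbm's browse()/TupleBrowser API for the prefix scan): fold a unique suffix into each logical key so that duplicates become distinct BTree keys, store a throwaway value, and turn point lookups into prefix scans.

        package jdbm.btree;

        import java.io.File;
        import java.util.Properties;

        import jdbm.RecordManager;
        import jdbm.RecordManagerFactory;
        import jdbm.RecordManagerOptions;
        import jdbm.helper.StringComparator;
        import jdbm.helper.StringSerializer;
        import jdbm.helper.Tuple;
        import jdbm.helper.TupleBrowser;

        public class DuplicateKeyWorkaroundTest {

            public static void main(String[] args) throws Exception {
                String dbName = "duptest";
                new File(dbName + ".db").delete();
                new File(dbName + ".long").delete();

                Properties props = new Properties();
                props.setProperty(RecordManagerOptions.DISABLE_TRANSACTIONS, "TRUE");
                RecordManager recman = RecordManagerFactory.createRecordManager(dbName, props);
                BTree tree = BTree.createInstance(recman, StringComparator.INSTANCE,
                        StringSerializer.INSTANCE, StringSerializer.INSTANCE);

                // the BTree rejects duplicate keys, so fold a unique suffix into each
                // logical key; the value is logically useless but still has to be set
                // (assumes '#' never appears in the logical key itself)
                String logicalKey = "user:42";
                String prefix = logicalKey + "#";
                for (long seq = 0; seq < 3; seq++) {
                    tree.insert(prefix + seq, " ", true);
                }
                recman.commit();

                // a lookup for the logical key becomes a prefix scan instead of a find()
                TupleBrowser browser = tree.browse(prefix);
                Tuple tuple = new Tuple();
                while (browser.getNext(tuple) && ((String) tuple.getKey()).startsWith(prefix)) {
                    System.out.println("entry: " + tuple.getKey());
                }

                recman.close();
            }
        }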

    I would still like an explanation of why more frequent commits make the on-disk DB file dramatically larger than less frequent commits.  For example, committing every 1K inserts results in a 40 GB file after 1.5M inserts, while committing every 200K inserts results in only a 70 MB file after the same 1.5M inserts.  Shouldn't the two files be the same size?

    Thanks again!

     
  • Kevin Day
    2009-12-28

    Brian - I've got a question for you towards the bottom of this post - please be sure to read!

    No, they shouldn't be the same size (although I am surprised that the difference is as large as you indicate).  The byte arrays that actually consume the pages in the file don't get constructed until you commit.  So if an object changes 1000 times between commits, it only gets serialized once, and only consumes space in a page once.

    If you commit more frequently, the same object can be written to a page multiple times.  When you write an object to a page and it is bigger than the record it previously occupied, it winds up allocated to a new record, and the old record is left behind (it actually gets added to a free list).  So if you have a ton of records that are continuously growing in size, I would expect the size of the db file to grow.

    But I wouldn't expect it to grow by orders of magnitude, which is what you are indicating.

    Brian - I seem to remember that you were playing with the free record allocator - any chance that something you did could have resulted in this behavior?

    A useful unit test here would be to open two db files and insert into both in a tight loop, committing to one 10 times more frequently than the other, and checking the db file size of each.  If the keys and values inserted are all roughly the same size, we shouldn't see a massive divergence between the two.
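
    Something like this (an untested sketch, reusing buildRandString from the earlier test) is what I have in mind:

        package jdbm.btree;

        import java.io.File;
        import java.util.Properties;

        import jdbm.RecordManager;
        import jdbm.RecordManagerFactory;
        import jdbm.RecordManagerOptions;
        import jdbm.helper.StringComparator;
        import jdbm.helper.StringSerializer;

        public class CommitFrequencyFileSizeTest {

            public static void main(String[] args) throws Exception {
                RecordManager recmanA = open("freqcommit");    // commits every 1K inserts
                RecordManager recmanB = open("rarecommit");    // commits every 10K inserts
                BTree treeA = BTree.createInstance(recmanA, StringComparator.INSTANCE,
                        StringSerializer.INSTANCE, StringSerializer.INSTANCE);
                BTree treeB = BTree.createInstance(recmanB, StringComparator.INSTANCE,
                        StringSerializer.INSTANCE, StringSerializer.INSTANCE);

                // identical keys and values go into both trees; only the commit frequency differs
                for (int i = 1; i <= 200000; i++) {
                    String key = StringInsertionTest.buildRandString(10);
                    String value = StringInsertionTest.buildRandString(25);
                    treeA.insert(key, value, true);
                    treeB.insert(key, value, true);
                    if (i % 1000 == 0) {
                        recmanA.commit();
                    }
                    if (i % 10000 == 0) {
                        recmanB.commit();
                    }
                }
                recmanA.commit();
                recmanA.close();
                recmanB.commit();
                recmanB.close();

                System.out.println("freqcommit.db: " + new File("freqcommit.db").length() + " bytes");
                System.out.println("rarecommit.db: " + new File("rarecommit.db").length() + " bytes");
            }

            // fresh record manager with the same options as the earlier test
            private static RecordManager open(String dbName) throws Exception {
                new File(dbName + ".db").delete();
                new File(dbName + ".long").delete();
                Properties props = new Properties();
                props.setProperty(RecordManagerOptions.DISABLE_TRANSACTIONS, "TRUE");
                return RecordManagerFactory.createRecordManager(dbName, props);
            }
        }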

     