
Smyle v. XCollections

Help
Curt Cox
2004-12-15
2013-04-08
  • Curt Cox

    Curt Cox - 2004-12-15

    Stefan,

    I'm the author of a disk-based collections package that meets a need similar to the one Smyle addresses.
    https://xcollections.dev.java.net/

    Unlike Smyle, it doesn't make any real attempts at persistence.  XCollections attempts to provide a relatively painless way to replace RAM-based collections from java.util with disk-based collections.  That way, developers can write their application using standard collections and switch to XCollections if they need to handle more data than they have available RAM.
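    Here is a minimal sketch of the "write against standard collections, switch later" idea. Code that depends only on the java.util.Set interface can get its implementation from a single factory. The countDistinct helper is hypothetical. HashSet is plain java.util. Swapping in a disk-based Set, such as the XCollections.xSet(...) call used in the benchmark in this post, is assumed to work the same way but is not exercised here.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

public class SwappableCollections {
    // Hypothetical application logic: it sees only the java.util.Set
    // interface, never a concrete collection class.
    static int countDistinct(Set<String> target, String[] words) {
        for (String w : words) {
            target.add(w);
        }
        return target.size();
    }

    public static void main(String[] args) {
        // The one place that picks an implementation. HashSet is RAM-based;
        // a disk-based Set could be supplied here instead without changing
        // countDistinct (not exercised in this sketch).
        Supplier<Set<String>> factory = HashSet::new;
        String[] words = {"a", "b", "a", "c"};
        System.out.println(countDistinct(factory.get(), words)); // prints 3
    }
}
```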

    I'm writing a benchmark analogous to drjava.smyle.tests.SmyleBench to give users some idea of what kind of performance they can expect.  I've included the source to my benchmark, so that you can object to anything that you find unfair.  I want to drum up users and developers for XCollections, but I also want users to have realistic expectations of what they are going to get.

    My test uncovered a few XCollections glitches, so you can't run it against the version currently available for download.  I'll fix that soon.

    Running SmyleBench with n = 1000000 generated a java.lang.OutOfMemoryError.  Is this a bug, or a usage restriction that I should be aware of?

    I'm also interested in adding facilities to XCollections to make migrating between XCollections and Smyle as painless as possible.  The first step is probably adding a DataIO for Marshallable, so that things like PERSON_IO in my benchmark aren't necessary.  After that, I want to add factory methods to XCollections like xPersistentMap(key,value) that would use Smyle for persistence.  Any suggestions you have for implementing either of these are very welcome.

    - Curt  

    /*
    This source file is part of Smyle, a database library.
    For up-to-date information, see http://www.drjava.de/smyle
    Copyright (C) 2001 Stefan Reich (doc@drjava.de)

    This library is free software; you can redistribute it and/or
    modify it under the terms of the GNU Lesser General Public
    License as published by the Free Software Foundation; either
    version 2.1 of the License, or (at your option) any later version.

    This library is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
    Lesser General Public License for more details.

    You should have received a copy of the GNU Lesser General Public
    License along with this library; if not, write to the Free Software
    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

    For full license text, see doc/license/lgpl.txt in this distribution
    */

    package drjava.smyle.tests;

    import java.util.*;
    import java.io.*;

    import drjava.util.*;
    import drjava.smyle.*;
    import drjava.smyle.testtypes.*;

    import lgpl.io.*;
    import lgpl.xcollections.*;

    public class XCollectionBench extends Benchmark {
      static final int n = 1000000;
      Set table;
     
      public XCollectionBench() {
        setDescription("Adding, indexing and removing "+n+" Person records");
      }
     
      void add() {
        for (int i = 0; i < n; i++) {
          table.add(new Person(String.valueOf(i), i));
        }
        done("added "+n+" records");
      }

      void remove() {
        table.clear();
        done("removed "+n+" records");
      }
           
      protected void action() {
        table = XCollections.xSet(Person.class);
        add();
        remove();
      }
     
      public static void main(String[] args) {
        for (int i = 0; i < 1; i++) {
          XCollectionBench bench = new XCollectionBench();
          bench.runAndPrint();
          System.out.println("Records/s: "+(long) (n/(bench.totalTime()*0.001)));
        }
        System.gc(); // request a GC so a memory profiler sees only what is still reachable
      }
     
      public static final DataIO.Variable PERSON_IO = new DataIO.Variable() {
        {
          register(this, Person.class);
        }

        public Object read(DataInputStream in) throws IOException {
          String name = (String) DataIO.STRING.read(in);
          int age = in.readInt();
          return new Person(name, age);
        }

        public void write(DataOutputStream out, Object o) throws IOException {
          Person person = (Person) o;
          DataIO.STRING.write(out, person.name);
          out.writeInt(person.age);
        }

        public Iterator nominateNulls() {
          return Collections.singletonList(new Person("", -1)).iterator();
        }

        public int getSize(Object o) {
          Person person = (Person) o;
          return DataIO.STRING.getSize(person.name) + DataIO.INTEGER.getSize();
        }
      };

    }
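    To sanity-check a codec shaped like PERSON_IO outside XCollections, here is a stdlib-only sketch of the same contract. It writes the name and then the age, reads them back in the same order, and computes the encoded size. It uses DataOutputStream.writeUTF in place of DataIO.STRING, whose byte format isn't shown in this thread, so the sizes describe this stand-in encoding, not XCollections'. The nested Person class is a hypothetical stand-in for drjava.smyle.testtypes.Person that has only the two fields PERSON_IO touches.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class PersonCodec {
    // Hypothetical stand-in for drjava.smyle.testtypes.Person.
    static final class Person {
        final String name;
        final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
    }

    // Same field order as PERSON_IO: name first, then age.
    static void write(DataOutputStream out, Person p) throws IOException {
        out.writeUTF(p.name); // 2-byte length prefix + modified UTF-8 bytes
        out.writeInt(p.age);  // always 4 bytes
    }

    static Person read(DataInputStream in) throws IOException {
        String name = in.readUTF();
        int age = in.readInt();
        return new Person(name, age);
    }

    // Encoded size; exact only when every name character is in
    // U+0001..U+007F, which modified UTF-8 stores in one byte each.
    static int getSize(Person p) {
        return 2 + p.name.length() + 4;
    }

    static byte[] encode(Person p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            write(out, p);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Person decode(byte[] data) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Person original = new Person("42", 42);
        byte[] data = encode(original);
        Person copy = decode(data);
        System.out.println(copy.name + " " + copy.age + " " + data.length); // prints "42 42 8"
    }
}
```

    Checking that encode(p).length equals getSize(p) is a cheap way to catch a getSize that disagrees with write, which is the kind of mismatch a disk-based collection would otherwise only reveal as corrupted records.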

     
    • Curt Cox

      Curt Cox - 2005-01-06

      I'm reposting Stefan's reply here:

      OK, I looked at it again... Frankly, the thing is that I kind of dropped Smyle development. It works (for example, as the DB engine for Superversion), but I don't plan to do much work on it. I'm not too happy with the "upper layers" of Smyle anymore (the IDL compiler, table handling and so on). I feel that I aimed for too much and delivered too little. For example, queries are way too restricted, and automagic indexing doesn't work perfectly (although for practical purposes, it works quite well).

      If people ask me about working on Smyle (they sometimes do), I now try to talk them into using the lower Smyle layers - the raw bit management - and building a new engine on top of that. So far, unfortunately, nobody has taken up the challenge.

      The lower layers basically consist of these classes:

      FileSystemDisk
      DefaultChunkManager
      Handles
      PersistentBTree

      These give you all you need to store raw byte arrays. With a little help from your side, garbage collection (with compaction) is performed automatically. Atomic transactions are guaranteed.

      Your part is to convert your data and index structures into byte arrays - and back, of course.
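      As a sketch of what converting an index structure into a byte array and back can look like, this example flattens a sorted index (key to record handle) into bytes with the java.io Data streams and then rebuilds it. The long values standing in for record handles and the count-prefixed layout are assumptions for illustration. They are not the format used by Smyle's Handles or PersistentBTree.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class IndexBytes {
    // Layout (assumed): entry count, then (key, handle) pairs in key order.
    static byte[] toBytes(SortedMap<String, Long> index) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(index.size());
            for (Map.Entry<String, Long> e : index.entrySet()) {
                out.writeUTF(e.getKey());    // key
                out.writeLong(e.getValue()); // hypothetical record handle
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the same layout back into a fresh TreeMap.
    static SortedMap<String, Long> fromBytes(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int count = in.readInt();
            SortedMap<String, Long> index = new TreeMap<>();
            for (int i = 0; i < count; i++) {
                String key = in.readUTF();
                index.put(key, in.readLong());
            }
            return index;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SortedMap<String, Long> index = new TreeMap<>();
        index.put("alice", 17L);
        index.put("bob", 42L);
        byte[] data = toBytes(index);
        System.out.println(data.length + " " + fromBytes(data).equals(index)); // prints "32 true"
    }
}
```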

      I am very satisfied with the lower layers of Smyle; I think they have an extremely simple interface and they work reliably and efficiently.

      Maybe it would make sense to bridge XCollections to these layers?
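      One hypothetical shape for such a bridge is a java.util.Set implemented on top of a store that only deals in byte arrays. ByteStore and MemoryStore are invented stand-ins for the role that FileSystemDisk, DefaultChunkManager, Handles and PersistentBTree would play, and they do not reflect those classes' actual APIs. AbstractSet already provides contains, remove and clear through the iterator, so only add, size and iterator need to be written.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ByteBackedSet extends AbstractSet<String> {
    // Hypothetical raw-bytes layer; not Smyle's API.
    interface ByteStore {
        long put(byte[] data);
        byte[] get(long handle);
        void delete(long handle);
        Iterable<Long> handles();
        int count();
    }

    // In-memory implementation so the sketch runs without Smyle.
    static final class MemoryStore implements ByteStore {
        private final Map<Long, byte[]> chunks = new LinkedHashMap<>();
        private long next = 1;
        public long put(byte[] data) { chunks.put(next, data.clone()); return next++; }
        public byte[] get(long handle) { return chunks.get(handle).clone(); }
        public void delete(long handle) { chunks.remove(handle); }
        public Iterable<Long> handles() { return chunks.keySet(); }
        public int count() { return chunks.size(); }
    }

    private final ByteStore store;

    ByteBackedSet(ByteStore store) { this.store = store; }

    @Override
    public boolean add(String s) {
        if (contains(s)) return false; // linear scan through the store
        store.put(s.getBytes(StandardCharsets.UTF_8));
        return true;
    }

    @Override
    public int size() { return store.count(); }

    @Override
    public Iterator<String> iterator() {
        // Snapshot the handles so deletions through remove() don't disturb iteration.
        List<Long> snapshot = new ArrayList<>();
        for (long h : store.handles()) snapshot.add(h);
        Iterator<Long> it = snapshot.iterator();
        return new Iterator<String>() {
            private Long last; // handle returned by the most recent next()
            public boolean hasNext() { return it.hasNext(); }
            public String next() {
                last = it.next();
                return new String(store.get(last), StandardCharsets.UTF_8);
            }
            public void remove() {
                if (last == null) throw new IllegalStateException();
                store.delete(last);
                last = null;
            }
        };
    }

    public static void main(String[] args) {
        Set<String> set = new ByteBackedSet(new MemoryStore());
        set.add("x");
        set.add("y");
        set.add("x");                   // duplicate, ignored
        System.out.println(set.size()); // prints 2
        set.remove("x");                // AbstractCollection.remove goes through iterator().remove()
        System.out.println(set.contains("x") + " " + set.size()); // prints "false 1"
    }
}
```

      The linear scan in add is the obvious weak spot. A real bridge would presumably keep an index (for example, in a PersistentBTree) that maps element bytes to handles.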

      Cheers,
      -Stefan

       
