
Extremely slow with large number of files

Help
Created by Ben Levine, 2015-05-01 · Updated 2015-05-05
  • Ben Levine

    Ben Levine - 2015-05-01

    Hello, I'm having a problem with execution speed when running a simple "SELECT MAX" query against a large number (200-500) of relatively small .mdb files (less than 7MB each) with UCanAccess. It takes 10+ minutes to get all the data, with each read taking about 1-2 seconds. The connection string I'm using is:

    conn = DriverManager.getConnection(
            "jdbc:ucanaccess://" + filePath + ";memory=false;SingleConnection=true;"
            + "SkipIndexes=true;");
    s = conn.createStatement();
    rs = s.executeQuery(query);

    I've tried varying degrees of concurrency, but no matter how many threads I have attempting to execute, they all seem to run their UCanAccess queries sequentially. Does anyone have any suggestions on how to accomplish this faster? My gut feeling is that opening 500 database connections is just going to be inherently slow, but I'm relatively new to JDBC, so maybe there's something I'm missing.

     
  • Marco Amadei

    Marco Amadei - 2015-05-02

    Yes, you're likely missing something (or making some mistake), but a test case would be needed in order to help you. I think the time can be dramatically reduced.
    NOTICE THAT the query time is irrelevant in your case; it's only a matter of connection time.
    You may send along the critical code and just one of the databases (to be copied 200-500 times to simulate the real situation), after having verified that the issue is reproducible.

    NEVERTHELESS

    First, I would try this:
    1. change memory=false to memory=true
    2. ensure the connection to each db is closed after having executed the query and gotten the data
    3. ensure no more than 20 databases are open at a time (--->more cycles, and no more than 20 concurrent threads at a time)
    4. check the -Xmx parameter; it should be at least -Xmx512M, ideally -Xmx1G, and that heap space should be available to your process. If not, reduce the concurrent threads.
    5. please let me know your findings with these settings:
    this should cut the total time by 99%.
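
    Points 2 and 3 above amount to a bounded thread pool where each task opens, queries, and closes one database. A minimal sketch of that pattern (the per-file query is simulated here with a hypothetical queryMax method; no real UCanAccess/JDBC work is done):

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class BoundedPoolDemo {

        // Stand-in for the per-file UCanAccess "SELECT MAX" query
        // (hypothetical; a real task would open and close one connection here).
        static double queryMax(int fileIndex) {
            return fileIndex * 1.5;
        }

        // Runs one task per file, but never more than maxThreads at once,
        // mirroring "no more than 20 databases open at a time".
        public static double computeOverallMax(int files, int maxThreads) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
            try {
                List<Future<Double>> futures = new ArrayList<>();
                for (int i = 0; i < files; i++) {
                    final int idx = i;
                    futures.add(pool.submit(() -> queryMax(idx)));
                }
                double overallMax = Double.NEGATIVE_INFINITY;
                for (Future<Double> f : futures) {
                    overallMax = Math.max(overallMax, f.get()); // blocks until that task finishes
                }
                return overallMax;
            } finally {
                pool.shutdown();
            }
        }

        public static void main(String[] args) throws Exception {
            System.out.println(computeOverallMax(100, 20));
        }
    }
    ```

    With this shape, the pool size alone caps how many databases are open concurrently, regardless of how many files are queued.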

     

    Last edit: Marco Amadei 2015-05-02
  • Gord Thompson

    Gord Thompson - 2015-05-02

    For a single read-only operation on a large number of database files you may wish to avoid the overhead of setting up a UCanAccess connection for each one and just use the Jackcess API.

    I created 10 copies of a ~9MB .accdb file that contains a single table with 100,000 rows. The following code, which uses UCanAccess, consistently took about 12.5 seconds to execute:

    public static void main(String[] args) {
        StopWatch sw = new StopWatch();
        sw.start();
        for (int i = 0; i < 10; i++) {
            String connStr = String.format("jdbc:ucanaccess://C:/Users/Public/test/100kRows%d.accdb;memory=true;singleConnection=true;skipIndexes=true", i);
            System.out.println(connStr);
            try (
                    Connection conn = DriverManager.getConnection(connStr);
                    Statement s = conn.createStatement();) {
                try (ResultSet rs = s.executeQuery("SELECT MAX(DoubleField) AS m FROM TestData")) {
                    rs.next();
    //                System.out.println(rs.getDouble(1));
                }
            } catch (Exception e) {
                e.printStackTrace(System.err);
            }
        }
        sw.stop();
        System.out.println(sw.toString());
    }
    

     
    By contrast, the following code, which uses Jackcess directly, took just over 4 seconds to run:

    public static void main(String[] args) {
        StopWatch sw = new StopWatch();
        sw.start();
        for (int i = 0; i < 10; i++) {
            String dbFileSpec = String.format("C:/Users/Public/test/100kRows%d.accdb", i);
            System.out.println(dbFileSpec);
            try (Database db = DatabaseBuilder.open(new File(dbFileSpec))) {
                Double dmax = Double.NEGATIVE_INFINITY;
                Table t = db.getTable("TestData");
                for (Row r : t) {
                    Double d = r.getDouble("DoubleField");
                    if (d > dmax) {
                        dmax = d;
                    }
                }
    //            System.out.println(dmax);
            } catch (Exception e) {
                e.printStackTrace(System.err);
            }
        }
        sw.stop();
        System.out.println(sw.toString());
    }
    
     
  • Marco Amadei

    Marco Amadei - 2015-05-02

    @Gord Yes, I absolutely agree, even if it may depend on the real complexity of the SELECT MAX and the grouping (in other words, on whether a SQL approach is truly needed).

     
  • Ben Levine

    Ben Levine - 2015-05-04

    Thanks both of you!

    My Xms and Xmx are both set to 1g, both in the IDE and manually when the program runs, and I've ensured that the database connections (and related objects) are closed after each connection. I also have my max thread pool set to 20, and I may need to take a closer look at that code to make sure they're working as I intended.

    I'll try changing memory back to true, that "memory=false" was something I had added when I was running out of heap space. Then I'll try using Jackcess directly to see if that improves execution time further.

    Thanks again!

     
    • Gord Thompson

      Gord Thompson - 2015-05-04

      You're welcome.

      When the time comes for you to try using Jackcess be sure to take advantage of any index(es) on the column(s) you're using to find the MAX() value you need. My Jackcess code sample above does a full table scan because I really just wanted to test the effect of avoiding the JDBC/HSQLDB overhead. If there had been an index on "DoubleField" and I had used a Jackcess IndexCursor then the execution time would have been much faster.
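
      The win from an index can be sketched with a stand-in sorted structure (a TreeMap here, not the actual Jackcess IndexCursor API): because an index keeps key values sorted, the maximum is read from the end in one step rather than by scanning every row.

      ```java
      import java.util.TreeMap;

      public class IndexMaxDemo {

          // A TreeMap stands in for an index on "DoubleField": keys are kept
          // sorted, so the maximum is simply the last key -- no full table scan.
          public static double maxViaIndex(double[] rows) {
              TreeMap<Double, Integer> index = new TreeMap<>();
              for (int i = 0; i < rows.length; i++) {
                  index.put(rows[i], i); // key = column value, value = row number
              }
              // Analogous to positioning an index cursor at the last entry.
              return index.lastKey();
          }

          public static void main(String[] args) {
              double[] rows = {3.2, 9.7, 1.1, 9.6};
              System.out.println(maxViaIndex(rows)); // prints 9.7
          }
      }
      ```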

       
  • Ben Levine

    Ben Levine - 2015-05-04

    Just an update--I tried setting memory=true and once again ended up getting heap size errors, so I went with tweaking the Jackcess code Gord gave me, and it works excellently. Thanks! These tables are only about 200 records, so an IndexCursor probably wouldn't save me a whole lot of time.

    For future reference, this was the JDBC code I was using--is there anything horribly wrong with it that would cause it to use over a gig of heap space with multiple reads without freeing any up? I'm completely new to JDBC, I've done some ADO.NET before with MSSQL but that's the extent of my database experience thus far.

    @Override
    protected MinMax<Date> getSingleFileMinMaxDates(String filename) throws SQLException {
        Connection conn = null;
        Statement s = null;
        ResultSet rs = null;
        String query = "SELECT MIN(BgDateTime), MAX(BgDateTime) FROM [BG_Log]";
        try {
            conn = DriverManager.getConnection(
                "jdbc:ucanaccess://" + filename + ";memory=false;"
                        + "SkipIndexes=true;"); //I tried this to reduce the memory consumption
            s = conn.createStatement();
            rs = s.executeQuery(query);
            Date minDate = null;
            Date maxDate = null;
            while (rs.next()) {
                minDate = rs.getDate(1);
                maxDate = rs.getDate(2);
            }
            return new MinMax<>(minDate, maxDate); //wrapper that does exactly what it sounds like
        } catch (SQLException e) {
            System.err.println("SQL read failed! Query: \"" + query + "\"\n"
                + "Path supplied: \"" + filename + "\"");
            e.printStackTrace();
            throw new RuntimeException(e);
        } finally {
            System.out.println("Read file " + filename);
            if (rs != null) {
                System.out.println("Closing ResultSet...");
                rs.close();
            }
            if (s != null) {
                System.out.println("Closing Statement...");
                s.close();
            }
            if (conn != null) {
                System.out.println("Closing Connection...");
                conn.close();
            }
        }
    }
    
     
  • Marco Amadei

    Marco Amadei - 2015-05-05

    Nothing bad in your JDBC code. Maybe something didn't work as expected in your thread management. Cheers, Marco

     
