From: Bryan T. <tho...@us...> - 2007-02-19 01:05:50
Update of /cvsroot/cweb/bigdata-rdf/src/java/com/bigdata/rdf
In directory sc8-pr-cvs4.sourceforge.net:/tmp/cvs-serv8025/src/java/com/bigdata/rdf

Modified Files:
    TripleStore.java
Log Message:
Working on transaction processing support.

Index: TripleStore.java
===================================================================
RCS file: /cvsroot/cweb/bigdata-rdf/src/java/com/bigdata/rdf/TripleStore.java,v
retrieving revision 1.17
retrieving revision 1.18
diff -C2 -d -r1.17 -r1.18
*** TripleStore.java 17 Feb 2007 23:15:27 -0000 1.17
--- TripleStore.java 19 Feb 2007 01:05:47 -0000 1.18
***************
*** 67,70 ****
--- 67,71 ----
  import com.bigdata.journal.IJournal;
  import com.bigdata.journal.Journal;
+ import com.bigdata.journal.Tx;
  import com.bigdata.objndx.BTree;
  import com.bigdata.objndx.IIndex;
***************
*** 93,96 ****
--- 94,137 ----
   * A triple store based on the <em>bigdata</em> architecture.
   *
+  * @todo Refactor to support transactions and concurrent load/query
+  *       <p>
+  *       Conflicts arise in the bigdata-RDF store when concurrent transactions
+  *       attempt to define the same term. The problem arises because one index is
+  *       used to map the term to a unique identifier and another to map the
+  *       identifiers back to terms. Further, the statement indices use term
+  *       identifiers directly in their keys. Therefore, resolving concurrent
+  *       definition of the same term requires that we either do NOT isolate the
+  *       writes on the term indices (which is probably an acceptable strategy)
+  *       or that we let the application order the pass over the isolated indices
+  *       and give the conflict resolver access to the {@link Tx} so that it can
+  *       update the dependent indices if a conflict is discovered on the terms
+  *       index.
+  *       <p>
+  *       The simplest approach appears to be NOT isolating the terms and ids
+  *       indices. As long as the logic resides at the index, e.g., a lambda
+  *       expression/method, to assign the identifier and create the entry in the
+  *       ids index we can get by with less isolation. If concurrent processes
+  *       attempt to define the same term, then one or the other will wind up
+  *       executing first (writes on indices are single threaded) and the result
+  *       will be coherent as long as the write is committed before the ids are
+  *       returned to the application. It simply does not matter which process
+  *       defines the term since all that we care about is atomic, consistent,
+  *       and durable. This is a case where group commit would work well (updates
+  *       are blocked together on the server automatically to improve
+  *       throughput).
+  *       <p>
+  *       Concurrent assertions of the same statement cause write-write
+  *       conflicts, but they are trivially resolved -- we simply ignore the
+  *       write-write conflict since both transactions agree on the statement
+  *       data. Unlike the term indices, isolation is important for statements
+  *       since we want to guarantee that a set of statements either is or is not
+  *       asserted atomically. (With the terms index, we could not care less as
+  *       long as the indices are coherent.)
+  *       <p>
+  *       The only concern with the statement indices occurs when one transaction
+  *       asserts a statement and a concurrent transaction deletes a statement. I
+  *       need to go back and think this one through some more and figure out
+  *       whether or not we need to abort a transaction in this case.
+  *
   * @todo Refactor to use a delegation mechanism so that we can run with or
   *       without partitioned indices? (All you have to do now is change the
***************
*** 98,104 ****
   *       handle some different initialization properties.)
   *
!  * @todo Play with the branching factor again. Now that we are using overflow
!  *       to evict data onto index segments we can use a higher branching factor
!  *       and simply evict more often. Is this worth it? We might want a lower
   *       branching factor on the journal since we can not tell how large any
   *       given write will be and then use larger branching factors on the index
--- 139,145 ----
   *       handle some different initialization properties.)
   *
!  * @todo Play with the branching factor again. Now that we are using overflow to
!  *       evict data onto index segments we can use a higher branching factor and
!  *       simply evict more often. Is this worth it? We might want a lower
   *       branching factor on the journal since we can not tell how large any
   *       given write will be and then use larger branching factors on the index