This list is closed; nobody may subscribe to it.
| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010 | – | 19 | 8 | 25 | 16 | 77 | 131 | 76 | 30 | 7 | 3 | – |
| 2011 | – | – | – | – | 2 | 2 | 16 | 3 | 1 | – | 7 | 7 |
| 2012 | 10 | 1 | 8 | 6 | 1 | 3 | 1 | – | 1 | – | 8 | 2 |
| 2013 | 5 | 12 | 2 | 1 | 1 | 1 | 22 | 50 | 31 | 64 | 83 | 28 |
| 2014 | 31 | 18 | 27 | 39 | 45 | 15 | 6 | 27 | 6 | 67 | 70 | 1 |
| 2015 | 3 | 18 | 22 | 121 | 42 | 17 | 8 | 11 | 26 | 15 | 66 | 38 |
| 2016 | 14 | 59 | 28 | 44 | 21 | 12 | 9 | 11 | 4 | 2 | 1 | – |
| 2017 | 20 | 7 | 4 | 18 | 7 | 3 | 13 | 2 | 4 | 9 | 2 | 5 |
| 2018 | – | – | – | 2 | – | – | – | – | – | – | – | – |
| 2019 | – | – | 1 | – | – | – | – | – | – | – | – | – |
From: Martyn C. <ma...@sy...> - 2010-02-23 16:15:29
How about SCPO? Essentially partitioning the subject by context. But we really need to see the shape and use of the data to have anything real to say here.

- Martyn

Bryan Thompson wrote:
> I would like to solicit some input on the question of whether the primary index for the quad store should be SPOC (it is today) or CSPO. There has been some discussion on this issue in the past. I am raising the issue again in the light of discussions where an entire context corresponding to a relatively large collection of statements is to be dropped, e.g., wikipedia when mapped onto a single context, and when eventual consistency is being used for the secondary indices (that is, we handle conflict resolution on the primary statement index, e.g., SPOC, and then have a restart safe protocol guaranteeing eventual updates on the secondary statement indices).
>
> I have come around to the opinion that mapping that much data onto a single context is generally wrong. The information would be more readily managed by mapping it onto a set of contexts corresponding to individual wikipedia entries, each of which was then associated with the source using statements about that context.
>
> Thoughts?
>
> Bryan
> ------------------------------------------------------------------------------
> Download Intel® Parallel Studio Eval
> Try the new software tools for yourself. Speed compiling, find bugs
> proactively, and fine-tune applications for parallel performance.
> See why Intel Parallel Studio got high marks during beta.
> http://p.sf.net/sfu/intel-sw-dev
> _______________________________________________
> Bigdata-developers mailing list
> Big...@li...
> https://lists.sourceforge.net/lists/listinfo/bigdata-developers
From: Bryan T. <br...@sy...> - 2010-02-23 15:15:18
The current dev branch is:

https://bigdata.svn.sourceforge.net/svnroot/bigdata/branches/DEV_BRANCH_27_OCT_2009

Thanks,

Bryan
From: Matthew R. <mr...@ca...> - 2010-02-23 00:42:34
Coming from a system where the Context is the main unit of management for statements, CSPO feels like the correct primary index. One question would be what effect the choice of primary index has on addition/deletion efficiency. More specifically, if within a transaction additions/deletions usually occur with a high number of statements per context, does the proximity of the changed statements within the primary index help performance?

Matt

On 2/22/2010 6:38 PM, Bryan Thompson wrote:
> I would like to solicit some input on the question of whether the primary index for the quad store should be SPOC (it is today) or CSPO. There has been some discussion on this issue in the past. I am raising the issue again in the light of discussions where an entire context corresponding to a relatively large collection of statements is to be dropped, e.g., wikipedia when mapped onto a single context, and when eventual consistency is being used for the secondary indices (that is, we handle conflict resolution on the primary statement index, e.g., SPOC, and then have a restart safe protocol guaranteeing eventual updates on the secondary statement indices).
>
> I have come around to the opinion that mapping that much data onto a single context is generally wrong. The information would be more readily managed by mapping it onto a set of contexts corresponding to individual wikipedia entries, each of which was then associated with the source using statements about that context.
>
> Thoughts?
>
> Bryan
From: <and...@no...> - 2010-02-22 23:44:08
I have always believed in a CSPO, but then -- as you say -- I have probably been misusing the 4th position and a better data model would have saved me. That said, there is a lot of that type of misuse out there.

-- andrew

________________________________
From: ext Bryan Thompson [br...@sy...]
Sent: Monday, February 22, 2010 6:38 PM
To: big...@li...
Subject: [Bigdata-developers] CSPO or SPOC?

I would like to solicit some input on the question of whether the primary index for the quad store should be SPOC (it is today) or CSPO. There has been some discussion on this issue in the past. I am raising the issue again in the light of discussions where an entire context corresponding to a relatively large collection of statements is to be dropped, e.g., wikipedia when mapped onto a single context, and when eventual consistency is being used for the secondary indices (that is, we handle conflict resolution on the primary statement index, e.g., SPOC, and then have a restart safe protocol guaranteeing eventual updates on the secondary statement indices).

I have come around to the opinion that mapping that much data onto a single context is generally wrong. The information would be more readily managed by mapping it onto a set of contexts corresponding to individual wikipedia entries, each of which was then associated with the source using statements about that context.

Thoughts?

Bryan
From: Bryan T. <br...@sy...> - 2010-02-22 23:38:36
I would like to solicit some input on the question of whether the primary index for the quad store should be SPOC (it is today) or CSPO. There has been some discussion on this issue in the past. I am raising the issue again in the light of discussions where an entire context corresponding to a relatively large collection of statements is to be dropped, e.g., wikipedia when mapped onto a single context, and when eventual consistency is being used for the secondary indices (that is, we handle conflict resolution on the primary statement index, e.g., SPOC, and then have a restart safe protocol guaranteeing eventual updates on the secondary statement indices).

I have come around to the opinion that mapping that much data onto a single context is generally wrong. The information would be more readily managed by mapping it onto a set of contexts corresponding to individual wikipedia entries, each of which was then associated with the source using statements about that context.

Thoughts?

Bryan
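To make the trade-off in this thread concrete, here is a small sketch of why a CSPO key order turns "drop a context" into a single contiguous range delete. The string-encoded keys and the `dropContext` helper are purely illustrative assumptions; the real indices use unsigned byte[] keys, not strings.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

/** Sketch: under CSPO ordering all statements of one context are contiguous,
 *  so dropping a context is one range scan + delete on the primary index. */
public class IndexOrderDemo {

    /** Removes every quad in the given context; returns how many were removed. */
    static int dropContext(NavigableSet<String> cspoIndex, String ctx) {
        // All keys with the prefix "ctx|" sort into one contiguous range.
        NavigableSet<String> range =
                cspoIndex.subSet(ctx + "|", true, ctx + "|\uffff", false);
        int n = range.size();
        range.clear(); // subSet is a view: this deletes from the backing index
        return n;
    }
}
```

Under SPOC the same context's statements are scattered throughout the key space, so the same drop requires touching the whole index (or a secondary index), which is the operational concern Bryan raises.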
From: Bryan T. <br...@sy...> - 2010-02-22 21:35:36
I've added some language on HA for the standalone versions of bigdata:

Standalone

In addition to the clustered database, there are two standalone bigdata versions. One version is based on the WORM store and is suitable for between one and two billion triples on write-once data sets. The other version is based on the RW store and is suitable for larger data sets and read-write workloads. The standalone versions have less overhead for full transactions and have no overhead for RMI. However, they have less total capacity and significantly less potential write and query throughput.

Highly available standalone systems utilize the same write replication and master election mechanisms. Load balancing for query is possible, but must be done at the application layer, e.g., by round robin of different SPARQL queries to the different nodes in the failover chain. Like scale-out, the master for standalone is dynamically determined and writes must be directed to the master.

Bryan

> -----Original Message-----
> From: Bryan Thompson
> Sent: Monday, February 22, 2010 1:26 PM
> To: 'big...@li...'
> Subject: Bigdata HA architecture design document (RFC)
>
> All,
>
> Please find attached a design document for the bigdata HA
> architecture. I believe that this captures most of the
> inputs from various people over the last several months.
>
> Comments are hereby solicited.
>
> Thanks,
>
> Bryan
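The application-layer round robin mentioned for query load balancing could be as simple as the following sketch. The class name and endpoint strings are invented for illustration; only read-only SPARQL queries would rotate this way, since writes must still go to the master.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: thread-safe round-robin selection over a failover chain's nodes. */
public class RoundRobin {
    private final List<String> endpoints;
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobin(List<String> endpoints) {
        this.endpoints = endpoints;
    }

    /** Picks the next endpoint; floorMod keeps the index valid on overflow. */
    public String pick() {
        int i = Math.floorMod(next.getAndIncrement(), endpoints.size());
        return endpoints.get(i);
    }
}
```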
From: Bryan T. <br...@sy...> - 2010-02-22 18:41:50
All,

Please find attached a design document for the bigdata HA architecture. I believe that this captures most of the inputs from various people over the last several months.

Comments are hereby solicited.

Thanks,

Bryan
From: Bryan T. <br...@sy...> - 2010-02-21 12:47:48
FYI,

Here is an interesting thread from concurrency-interest. I believe that [1] is the AtomicMarkableReference suggestion referenced below. It is worth thinking about whether this might be a tool to solve some of our bottlenecks.

I have not dug into the archives further, but I suspect that the context was striped queues drained by different threads sharing AtomicMarkableReference wrappers in order to achieve non-blocking takes. E.g., there is no need to synchronize across the queues for each thread since the mark (or timestamp) is atomically updated, so you just take() in a loop until you can CAS the bit or timestamp and then you own that reference. However, I must say that I have observed high contention on CAS in some contexts. I expect that such designs will do less and less well as we move into many-core computing.

Bryan

[1] http://cs.oswego.edu/pipermail/concurrency-interest/2006-September/003214.html

-----Original Message-----
From: con...@cs... [mailto:con...@cs...] On Behalf Of David Walend
Sent: Friday, February 19, 2010 11:20 PM
To: con...@cs...
Subject: Re: [concurrency-interest] Use of j.u.c. constructs in open source projects

> From: kedar mhaswade <ked...@gm...>
>
> Someone asked me if I knew open source projects where j.u.c. constructs are used heavily. Her intent was to check out (study thoroughly) how these constructs are put to (good) use.
>
> Does anyone on the list have any recommendation for such a project (open source)?

SomnifugiJMS makes heavy use of BlockingQueues, a few Locks and Conditions, and this diabolical use of AtomicMarkableReferences that Tim Peierls suggested -- it does JMS message selectors without a database. I'm not sure I'd call it "good," but the code has been kicked around for about ten years. New bug reports have gotten pretty rare.

https://somnifugijms.dev.java.net/

Hope that helps,

Dave

_______________________________________________
Concurrency-interest mailing list
Con...@cs...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest
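The claim idiom described above ("take() in a loop until you can CAS the bit ... then you own that reference") can be sketched with java.util.concurrent.atomic.AtomicMarkableReference. The `tryClaim` helper name is an invention for this sketch:

```java
import java.util.concurrent.atomic.AtomicMarkableReference;

/** Sketch of the non-blocking claim idiom: a thread owns the wrapped value
 *  only if it atomically flips the mark from false (unclaimed) to true. */
public class ClaimDemo {

    static <T> boolean tryClaim(AtomicMarkableReference<T> ref) {
        T value = ref.getReference();
        // CAS the mark false -> true without changing the reference. Exactly
        // one racing thread succeeds and owns the value; losers move on.
        return ref.compareAndSet(value, value, false, true);
    }
}
```

In the striped-queue scenario, each drainer would call tryClaim on items it take()s and simply skip items another thread has already marked, avoiding any cross-queue synchronization.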
From: Bryan T. <br...@sy...> - 2010-02-21 12:47:47
We are currently working on three hot spots:
1. A high throughput non-blocking cache. We have been collaborating with the infinispan project here, but I think that we will wind up doing our own implementation so we can manage the amount of RAM consumed by the cache. I've written up the likely design below.
2. Node#getChild(). I am wondering whether we could eliminate the WeakReference[] on each Node, replacing it with a B+Tree global weak value hash map embedded in a Memoizer pattern for getChild(). The Memoizer pattern would make this a non-blocking code path. Since it requires a backing hash map cache, we might consider moving all of the childRef[] data into that cache. One alternative is a per-Node cache where we clear the entries for cleared references when they are discovered to have been cleared.
3. Low-level disk read/write. These operations are currently serialized by a mutex due to a JVM bug which can corrupt data when the file size is changed concurrently with a read or write. We are addressing this with a read/write lock where the read lock is used for read/write operations and the write lock is used for file size changes.
Bryan
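For item 2, the Memoizer pattern mentioned (after Goetz's Java Concurrency in Practice) could look roughly like the sketch below. The Supplier-based API is an assumption for illustration, not bigdata's actual getChild() signature:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.FutureTask;
import java.util.function.Supplier;

/** Sketch: concurrent callers for the same key share one FutureTask, so the
 *  value is computed exactly once without holding a lock during the load. */
public class Memoizer<K, V> {
    private final ConcurrentMap<K, FutureTask<V>> cache = new ConcurrentHashMap<>();

    public V compute(K key, Supplier<V> loader) {
        FutureTask<V> f = cache.get(key);
        if (f == null) {
            FutureTask<V> ft = new FutureTask<>(loader::get);
            f = cache.putIfAbsent(key, ft); // only one task wins the race
            if (f == null) { f = ft; ft.run(); }
        }
        try {
            return f.get(); // losers wait only for the winner's computation
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```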
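Item 3's scheme can be sketched with ReentrantReadWriteLock. Note the deliberately inverted usage: the shared "read" lock covers both disk reads and writes (which may run concurrently), while the exclusive "write" lock serializes file-size changes against them. Class and method names are invented, and the actual FileChannel I/O is replaced by a size field so the sketch is self-contained:

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch: shared lock for reads AND writes, exclusive lock for extension,
 *  working around the JVM bug where a concurrent size change corrupts data. */
public class FileLockDemo {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private long fileSize = 0;

    /** Many threads may read or write concurrently under the shared lock. */
    long readOrWrite() {
        lock.readLock().lock();
        try {
            return fileSize; // placeholder for channel.read()/channel.write()
        } finally {
            lock.readLock().unlock();
        }
    }

    /** File-size changes take the exclusive lock, excluding all I/O. */
    long extend(long delta) {
        lock.writeLock().lock();
        try {
            return fileSize += delta;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```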
BCHM cache design
/**
* A mostly non-blocking cache based on a {@link ConcurrentHashMap} and batched
* updates to its access policy. This approach encapsulates:
* <ul>
* <li>
* (a) an unmodified ConcurrentHashMap (CHM); combined with</li>
* <li>
* (b) non-thread-safe thread-local buffers (TLB) for touches, managed by an
* inner CHM<ThreadId,TLB> instance. The reason for this inner map is to allow
 * the TLB instances to be discarded by clear(). The TLBs get batched onto;</li>
* <li>
 * (c) a shared non-thread-safe access policy (LIRS, LRU) built on double-linked
* nodes (DLN) stored in the inner CHM. Updates are deferred until holding the
* lock (d). The DLN reference to the cached value is final. The (prior, next,
* delete) fields are only read or written while holding the lock (d). Other
* fields could be defined by subclassing a newDLN() method to support LIRS,
* etc. The access policy will need [head, tail] or similar fields, which would
* also be guarded by the lock (d);</li>
* <li>
* (d) a single lock guarding mutation on the access policy. Since there is only
* one lock, there can be no lock ordering problems. Both batching touches onto
* (c) and eviction (per the access policy) require access to the lock, but that
* is the only lock. If the access policy batches evictions, then lock requests
* will be rare and the whole cache will be non-blocking, wait free, and not
* spinning on CAS locks 99% of the time; and</li>
* <li>
 * (e) explicit management of the threads used to access the cache, e.g., by
* queuing accepted requests and servicing them out of a thread pool, which has
* the benefit of managing the workload imposed by the clients.</li>
* </ul>
* <p>
* This should have the best possible performance and the simplest
 * implementation. The TLB (b) could be a DLN[] or other simple data structure.
* The access policy (c) is composed from linking DLN instances together while
* holding the lock.
* <ul>
* <li>
* A get() on the outer class looks up the DLN on the inner CHM and places it
* into the TLB (if found).</li>
* <li>
* A put() or putIfAbsent() on the outer class creates a new DLN and either
* unconditionally or conditionally puts it into the inner CHM. The new DLN is
* added to the TLB IFF it was added to the inner CHM. The access order is NOT
* updated at this time.</li>
* <li>
* A remove() on the outer class acquires the lock (d), looks up the DLN in the
* cache, and synchronously unlinks the DLN if found and sets its [deleted]
* flag. I would recommend that the clients do not call remove() directly, or
* that an outer remove() method exists which only removes the DLN from the
* inner CHM and queues up remove requests to be processed the next time any
* thread batches its touches through the lock. The inner remove() method would
* synchronously update the DLNs.</li>
* <li>
 * A clear() clears the ConcurrentHashMap<Key,DLN<Val>> map. It would also clear
* the inner ConcurrentHashMap<ThreadId,TLB> map, which would cause the existing
* TLB instances to be discarded. It would have to obtain the lock in order to
* clear the [head,tail] or related fields for the access policy.</li>
* </ul>
* When batching touches through the lock, only the access order is updated by
* the appropriate updates of the DLN nodes. If the [deleted] flag is set, then
* the DLN has been removed from the cache and its access order is NOT updated.
* If the cache is over its defined maximums, then evictions are batched while
* holding the lock. Evictions are only processed when batching touches through
* the lock.
*/
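The design above might be reduced to a compilable sketch along the following lines. This is an illustration only: it uses a plain LRU access policy instead of LIRS, the class and field names are invented, and eviction is checked only when a thread batches its touches through the single lock.

```java
import java.util.ArrayDeque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Sketch of the BCHM design: CHM (a) + thread-local touch buffers (b) +
 *  an LRU list of double-linked nodes (c) guarded by one lock (d). */
public class BatchedLruCache<K, V> {
    static final class DLN<K, V> {                 // double-linked node (c)
        final K key; final V value;                // value reference is final
        DLN<K, V> prev, next; boolean deleted;     // guarded by `lock`
        DLN(K k, V v) { key = k; value = v; }
    }

    private final ConcurrentHashMap<K, DLN<K, V>> map = new ConcurrentHashMap<>(); // (a)
    private final ThreadLocal<ArrayDeque<DLN<K, V>>> tlb =                         // (b)
            ThreadLocal.withInitial(ArrayDeque::new);
    private final ReentrantLock lock = new ReentrantLock();                        // (d)
    private DLN<K, V> head, tail;                  // access order, guarded by lock
    private final int capacity, batchSize;

    public BatchedLruCache(int capacity, int batchSize) {
        this.capacity = capacity; this.batchSize = batchSize;
    }

    public V get(K key) {
        DLN<K, V> n = map.get(key);
        if (n != null) touch(n);
        return n == null ? null : n.value;
    }

    public void put(K key, V value) {
        DLN<K, V> n = new DLN<>(key, value);
        if (map.putIfAbsent(key, n) == null) touch(n); // buffer IFF added
    }

    public int size() { return map.size(); }

    /** Lock-free until the thread-local buffer fills; then batch through (d). */
    private void touch(DLN<K, V> n) {
        ArrayDeque<DLN<K, V>> buf = tlb.get();
        buf.add(n);
        if (buf.size() >= batchSize) {
            lock.lock();
            try { drain(buf); } finally { lock.unlock(); }
        }
    }

    /** Holding the lock: apply buffered touches to the LRU order, then evict. */
    private void drain(ArrayDeque<DLN<K, V>> buf) {
        for (DLN<K, V> n : buf) if (!n.deleted) { unlink(n); linkAtHead(n); }
        buf.clear();
        while (map.size() > capacity && tail != null) { // batched eviction
            DLN<K, V> victim = tail;
            victim.deleted = true;                      // stale TLB refs skip it
            unlink(victim);
            map.remove(victim.key, victim);
        }
    }

    private void unlink(DLN<K, V> n) {
        if (n.prev != null) n.prev.next = n.next; else if (head == n) head = n.next;
        if (n.next != null) n.next.prev = n.prev; else if (tail == n) tail = n.prev;
        n.prev = n.next = null;
    }

    private void linkAtHead(DLN<K, V> n) {
        n.next = head;
        if (head != null) head.prev = n;
        head = n;
        if (tail == null) tail = n;
    }
}
```

As the note intends, the hot paths (get, put) touch only the CHM and a thread-local buffer; the lock is taken roughly once per batchSize touches, so the cache is non-blocking the vast majority of the time.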
From: Bryan T. <br...@sy...> - 2010-02-14 18:48:51