[Sop-svn] SF.net SVN: sop:[41] trunk/sopf
From: <lab...@us...> - 2009-07-08 09:56:26
Revision: 41
http://sop.svn.sourceforge.net/sop/?rev=41&view=rev
Author: labiknight
Date: 2009-07-08 09:56:21 +0000 (Wed, 08 Jul 2009)
Log Message:
-----------
Complete - SOPF Cache Requirements
Updated - SOPF Model Requirements
Labi Oyapero
Modified Paths:
--------------
trunk/sopf/cache/src/site/apt/cache.apt
trunk/sopf/model/src/site/apt/phase1_requirements.apt
Modified: trunk/sopf/cache/src/site/apt/cache.apt
===================================================================
--- trunk/sopf/cache/src/site/apt/cache.apt 2009-06-25 01:56:16 UTC (rev 40)
+++ trunk/sopf/cache/src/site/apt/cache.apt 2009-07-08 09:56:21 UTC (rev 41)
@@ -26,12 +26,21 @@
* Global-Partitioned (gp): allocated between nodes
* Global-Unpartitioned (gu): all nodes have a copy
* Local : specific to one node
+
+Assumptions
+ * Neural structure will exceed 500KB in most real-life applications
-Requirements
+Goal Derived Requirements
+ * Cache must support arbitrarily sized neural-networks (even terabytes if desired)
+ * Cache may be distributed in case of large-networks
+ * Local and backup persistence must be as efficient as possible.
+ * Cycle execution must support maximum throughput
+
+Technical Requirements
* Cache Types:
* LateralCache (LC)
* Cache nodes that execute cycles are lateralCaches, referred to as siblings.
- * All siblings must have the same centralNode & neural-network and cyclestamp prior-to/after a cycle.
+ * All siblings must have the same centralCache & neural-network and cyclestamp prior-to/after a cycle.
* Must be pausable, although the pause signal must be broadcast to all siblings
* Must have an alive msg
* list sibling lateralCaches
@@ -42,10 +51,9 @@
* data-obj must be binary-sorted internally on the basis of key (id)
* tasks must be stored on a FIFO queue and persisted with order preserved
* cache could use two ports to transfer data
- * {LC-internals} may be composed of queues and metadata:
+ * {LC-internals} may be composed of queues (e.g. task-queue) and metadata:
* local-data
- * Task-queues {myTaskQueue, [sibling-taskQueues]*} (FIFO queue)
- * Task-queues consist of tasks belonging to the LC's kernel.
+ * Dynamic size (keep that in mind)
* allocation-info
* gu-data that are relevant to execution in kernel.
* gp-data: Data-obj must be sorted by type then key (id), collection? (TreeSet)
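The gp-data ordering above ("sorted by type then key (id), collection? (TreeSet)") could be sketched with a TreeSet and a composite comparator. `DataObj` and its field names are illustrative stand-ins, not SOPF types:

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hedged sketch of the gp-data ordering: data-objs held in a TreeSet,
// sorted by data-type first, then by key (id). DataObj is a placeholder.
public class GpDataOrdering {
    public record DataObj(int typeId, long id) {}

    public static TreeSet<DataObj> newGpDataSet() {
        return new TreeSet<>(Comparator
                .comparingInt(DataObj::typeId)     // sort by type first...
                .thenComparingLong(DataObj::id));  // ...then by key (id)
    }
}
```

A TreeSet gives ordered iteration and log-time lookups, which matches the binary-sorted requirement above.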
@@ -65,18 +73,20 @@
* Must allocate only unallocated-gp-data in-case of LC-joins.
* reallocation-conditions,variables?
* New data obj should be added directly to the backup data (only) with a change in designVersion variable
- * Must be able to send reallocate event to listeners.
- * If dead-CC reawake and all LCs are wake & session-synced, sync gu-data& backup (no rollback).
+ * Must be able to send reallocate event to listeners (LCs).
+ * If a dead CC reawakens and all LCs are awake & cyclestamp-synced, sync gu-data & backup (no rollback).
* Data:
* Data-Types
- * Global-Partioned (gp): must be IMutable?
- * Global-Unpartitioned (gu):
- * must be deltable
- * must be synced at the end of each cycle
- * Local
- * Dynamic size (keep that in mind)
- * Holds tasks
+ * Global-Partitioned (gp)
+ * Global-Unpartitioned (gu):
+ * must be deltable
+ * must be synced at the end of each cycle
+ * Local
+ * PrecycleTaskQueue
+ * CycleTaskQueues {nodeCycleTaskQueue, [sibling-CycleTaskQueues]*} (FIFO queue)
+ * nodeCycleTaskQueue consists of tasks belonging to the LC's kernel.
+ * PostcycleTaskQueue
* Does not contain the cycle count, that is tracked by the cache
* All global-data should be managed by the cache.
* All global-data must have an identifier.
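The cycle task queues above (FIFO, order preserved, one queue per sibling plus the node's own) could be sketched with an ArrayDeque. Class and method names are assumptions; a task is a plain String here for illustration:

```java
import java.util.ArrayDeque;

// Minimal sketch of a FIFO cycle task queue: tasks leave in the order
// they arrived, matching the "order preserved" requirement above.
public class CycleTaskQueue {
    private final ArrayDeque<String> tasks = new ArrayDeque<>();

    public void enqueue(String task) { tasks.addLast(task); }      // add at tail
    public String dequeue()          { return tasks.pollFirst(); } // take from head (FIFO)
    public int size()                { return tasks.size(); }
}
```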
@@ -84,103 +94,213 @@
* data-type must uniquely identify a type of data
* data-typeId must uniquely identify a data-type
* Some properties in the data-obj such as conns, are not changed by the task
- * Persistence
- * For the cache file, the first three elements must be
- * maxLocalData: max slots for local data, value: maxDataCount * 400. ((40/obj, 10-obj/data)
- * maxDataTypeHeaderCount: max-num data-types supported, default:500 (configurable)
- * maxDataCount: the max-num of components supported, default: 100000(configurable)
+ * File Structure
+ * For the cache file, the first 100 bytes form a fixed header with the following fields
+ * neuralNetId (long)
+ * cyclestamp (long)
+ * designVersion (int)
+ * maxLocalData (long): max slots for local data, value: maxDataCount * 400 (40 bytes/obj, 10 obj/data)
+ * maxDataTypeHeaderCount (int): max-num data-types supported, default:500 (configurable)
+ * maxDataCount (long): the max-num of components supported, default: 100000(configurable)
+ * allocationInfoSize (short): num of slots for allocating-info
+ * status (byte): [ DESIGN (1) | NEW (3) | RESUME (5) | STOP (7) | QUIT (9) | RUNNING (11) ]
+ * reserved (57-bytes): reserved for future use (possibly include header-start-bits and end-bits)
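As a sanity check on the layout above, the typed fields occupy 43 bytes (8+8+4+8+4+8+2+1), leaving exactly the 57 reserved bytes of the 100. A hedged sketch of writing and reading this header with java.nio; the method names are assumptions:

```java
import java.nio.ByteBuffer;

// Sketch of the 100-byte cache-file header. Field order and types follow
// the list above; the 8 typed fields use 43 bytes, then 57 reserved bytes.
public class CacheFileHeader {
    public static final int HEADER_SIZE = 100;

    public static byte[] write(long neuralNetId, long cyclestamp, int designVersion,
                               long maxLocalData, int maxDataTypeHeaderCount,
                               long maxDataCount, short allocationInfoSize, byte status) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE);
        buf.putLong(neuralNetId);           // 8 bytes
        buf.putLong(cyclestamp);            // 8 bytes
        buf.putInt(designVersion);          // 4 bytes
        buf.putLong(maxLocalData);          // 8 bytes
        buf.putInt(maxDataTypeHeaderCount); // 4 bytes
        buf.putLong(maxDataCount);          // 8 bytes
        buf.putShort(allocationInfoSize);   // 2 bytes
        buf.put(status);                    // 1 byte -> 43 used, 57 reserved
        return buf.array();
    }

    public static long readCyclestamp(byte[] header) {
        return ByteBuffer.wrap(header).getLong(8); // cyclestamp starts at offset 8
    }
}
```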
+ * Local-data
+ * Dynamic size (keep that in mind, see maxLocalData)
+ * Allocation-info
+ * ccip, ccport [, siblingIp, siblingPort, siblingAllocation]*
+ * gu-data that are relevant to execution in kernel.
+ * gp-data: Data-obj must be sorted by type then key (id), collection? (TreeSet)
* data-type-headers must begin the global-data section
- * data-type-header-format: [data-type-header-start-bits, data-type-id, version, [propId,size]* ]
+ * format: [data-type-header-start-bits, data-type-id, version, [propId,size]*, data-type-header-end-bits ]
+ * size: byte , short , short , short*, byte
* There should be a fixed number of data-type-header-slots, value is maxDataTypeHeaderCount * 150 (assumes average of 50 fields)
- * data-header must have maxDataCount*16 slots
+ * data-header must have maxDataCount*16 slots and must be sorted
+ * format: data-header-section-start-bits,[data-id, location]*, data-header-section-end-bits
+ * size: byte , long , long , byte
* data-format
- [data-starts-bits,data-type-id,[propId, propValue]*, data-end-bits]
- * The data-bytes-serializer & data-bytes-parser should be generated automatically from class.
+ [data-type-id,data-id,[propId, propValue]*]
+ * Use negative values as indices/values except where logic overrules; this provides a wider range.
+ * The data-bytes-serializer & data-bytes-parser should be generated automatically from class.
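The data-format above ([data-type-id, data-id, [propId, propValue]*]) could be serialized as sketched below. Since per-prop sizes actually come from the data-type-header, the fixed 8-byte propValue here is purely an illustrative assumption, as are the names:

```java
import java.nio.ByteBuffer;

// Hedged sketch of the data record layout: a 2-byte data-type-id, an
// 8-byte data-id, then (propId, propValue) pairs. Fixed 10-byte pairs
// are an assumption made for illustration only.
public class DataRecordCodec {
    public static byte[] serialize(short typeId, long dataId, short[] propIds, long[] propValues) {
        ByteBuffer buf = ByteBuffer.allocate(2 + 8 + propIds.length * 10);
        buf.putShort(typeId);
        buf.putLong(dataId);
        for (int i = 0; i < propIds.length; i++) {
            buf.putShort(propIds[i]);   // 2-byte propId
            buf.putLong(propValues[i]); // assumed 8-byte propValue
        }
        return buf.array();
    }

    public static long parseDataId(byte[] record) {
        return ByteBuffer.wrap(record).getLong(2); // data-id follows the 2-byte type-id
    }
}
```

A generated serializer/parser (as the line above suggests) would emit exactly this kind of put/get sequence from the class definition.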
+ * Persistence
+ * I suggest using memory maps, since our global-data will often exceed 100KB in real applications (see Assumptions).
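The memory-map suggestion above could look like this with java.nio's FileChannel.map, which lets the OS page large global-data instead of loading it on the heap. File names, offsets, and helper names are assumptions:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch of memory-mapped access to a cache file: read/write a long at an
// arbitrary offset without streaming the whole file through the heap.
public class MappedCacheFile {
    public static Path newTempFile() {
        try { return Files.createTempFile("sopf-cache", ".bin"); }
        catch (IOException e) { throw new UncheckedIOException(e); }
    }

    public static void writeLongAt(Path file, long offset, long value) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, offset, Long.BYTES);
            map.putLong(value);
            map.force(); // flush the mapped page to disk
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    public static long readLongAt(Path file, long offset) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, offset, Long.BYTES).getLong();
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }
}
```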
Dependencies
- * Relies on the availability of file-cache for persistence.
- * Consider using jcs indexed cache as the underlying data-persister.
+ * Task must inherently identify data-id
Activity Sequence
-* START_SESSION:
- * CentralNode:
- * Listen for broadcast from lateralNodes, record their service-address, act & am & timestamp.
- * LateralNode:
- * broadcast to a CentralNode at a given-port, provide available cpu time (act) & available memory (am).
-* RE-ALLOCATE:
- * CentralNode:
- * Determine data-size, assign selector-value-range among lateralNode based on their act-am.
- * Broadcast selector-value-ranges
- * Send objects to lateralNodes based on the selector-affinity.
- * LateralNode:
- * Receive the allocations
- * Receive the gp-data for this node
- * CentralNode:
- * Do CACHE_SYNC.
-* CACHE_SYNC: (on the commencement of a new-cycle at all times)
- * CentralNode:
- * send cache-update signal & summated-gu-data delta & persistence-interval (for all global-data)
+* Assumptions and Terminology
+ * 'LCs' without qualification means member-LCs
+* START_NODES
+ * CentralCache:
+ * startup-params: data-file, max-mem, port, maxWait4Ready, maxWait4PauseResume, maxWait4DeadResume, saveInterval, backupInterval, maxSkippedCycles
+ * ? Listen for msg from lateralCaches
+ * if status equals RUNNING|PAUSE, do CC_RESUME
+ * if status equals STOP, do JOIN
+ * LateralCache:
+ * startup-params: data-file (may be empty), max-mem, port, ccIp, ccPort, nni?, cyclestamp?, dataPerGd?, sizePerLd?, pingCCInterval. Optional variables are omitted if data-file is not empty.
+ * if status equals RUNNING|PAUSE, do LC_RESUME
+ * if status equals STOP, do JOIN
+* JOIN
+ * CentralCache:
+ * Ignore non-JOIN requests from non-members.
+ * Listen for JOIN from lateralCaches, record their ip, port, act & am & timestamp.
+ * Send JOIN-ACCEPT back
+ * do ALLOCATE_TO_NODE for that LC
+ * if all gp-data is not allocated, send AWAIT_JOIN signal to that LC
+ * repeat until all gp-data has been allocated
+ * LateralCache:
+ * Send JOIN-req to a CentralCache at a given-port, provide available cpu time (act) & available memory (am).
+ * If response is JOIN-ACCEPT, accept response and do ALLOCATE_TO_NODE
+ * If response is JOIN-REJECT, record then send notification to user/UI and halt.
+* ALLOCATE_TO_NODE
+ * CentralCache:
+ * Determine data-size, assign selector-value-range (id) for the lateralCache based on their act-am.
+ * Send cyclestamp, allocation-info to LC
+ * Send global-data to the LC based on the selector-affinity (id).
+ * LateralCache:
+ * Receive the cyclestamp, allocation-info, maxWait4Ready, maxWait4PauseResume, maxWait4DeadResume, saveInterval,backupInterval
+ * Receive the gu-data & gp-data for this node
+ * update data file and load-data-file
+ * Do CACHE_READY.
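The act-am-based range assignment in ALLOCATE_TO_NODE might be sketched as a proportional split of the id space into contiguous selector-value-ranges. The weight formula and all names are assumptions, since the text only says "based on their act-am":

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: split [0, totalIds) into contiguous ranges sized in
// proportion to each LC's weight (e.g. some function of act and am).
public class SelectorRanges {
    public record Range(long fromId, long toId) {} // inclusive from, exclusive to

    public static List<Range> assign(long totalIds, double[] weights) {
        double sum = 0;
        for (double w : weights) sum += w;
        List<Range> ranges = new ArrayList<>();
        long start = 0;
        double acc = 0;
        for (int i = 0; i < weights.length; i++) {
            acc += weights[i];
            long end = (i == weights.length - 1)
                    ? totalIds                            // last LC takes the remainder
                    : Math.round(totalIds * acc / sum);   // proportional boundary
            ranges.add(new Range(start, end));
            start = end;
        }
        return ranges;
    }
}
```

A data-obj with a given id then has selector-affinity with whichever LC owns the range containing that id.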
+* CACHE_READY
+ * CentralCache:
+ * Receive CACHE_READY signal from all member LCs.
+ * If any member does not send CACHE_READY signal within maxWait4Ready time
+ * remove that member (send JOIN-REJECT to it) and do JOIN
+ * send CACHE_SYNC-signal with all-allocation-info to all lateralCaches (LCs)
+ * LateralCache:
+ * Send CACHE_READY signal to CC
+* CC_RESUME (DEAD/PAUSE)
+ * CentralCache:
+ * send RESUME_PAUSE/RESUME_DEAD signal with most variables and persisted-cycle-stamp (except allocation-info) to all LCs
+ * If response is received from ALL LCs within maxWait4PauseResume/maxWait4DeadResume (depends on context)
+ * If initialStatus is RUNNING (RESUME_DEAD), request gu-data delta
+ * If persisted data-cycle-num is less than current-data-cycle-num, do REQ_DATA
+ * do CACHE_SYNC
+ * Else do ROLL_BACK
+ * LateralCache:
+ * If receive RESUME_DEAD/PAUSE signal
+ * send RESUME signal to CC
+ * if receive RESUME_DEAD & persisted-cyclestamp < current-cyclestamp
+ * await REQ_DATA
+ * Otherwise wait
+* LC_RESUME
+ * CentralCache:
+ * accept RESUME_PAUSE/DEAD signal from LC
+ * if RESUME_DEAD
+ * if (cycle-stamp - current-cycle-stamp > maxSkippedCycles)
+ * do ROLL_BACK
+ * LateralCache:
+ * send RESUME_PAUSE/RESUME_DEAD signal with cycle-stamp & nni
+* CACHE_SYNC (on the commencement of a new-cycle at all times)
+ * CentralCache:
+ * send cache-sync signal & summated-gu-data delta & persistence-interval (for all global-data)
(if you want to set/change it).
- LateralNode:
- * receive cache-update signal & summated-gu-data-delta
- * if last-save-cycle matches save-interval, do CACHE_SAVE
+ LateralCache:
+ * receive cache-sync signal & summated-gu-data-delta
* send cache-ready signal to listeners (executor: then executor will send cycle-start signal)
* listen to cycle-start signal, track cycle-count
-* CACHE_SAVE:
- * LateralNode:
- * send cycle-count & gp-data to centralNode
-* END_CYCLE: (At the end of each cycle)
- * LateralNode:
+* END_CYCLE (At the end of each cycle)
+ * CentralCache:
+ * receive cycle-end signal from each LC
+ * if last-save-cycle matches backupInterval, do REQ_DATA
+ * global-field delta provided by each lateralCache is used to track alive lateralCaches.
+ * if any lateralCache is dead, go-to DEAD_CACHE
+ * else do CACHE_SYNC
+ * LateralCache:
* receive cycle-end signal from its CycleEventSource (e.g. executors), track completed-cycles
- * send gu-data-delta to the centralNode, even if it is 0.
- * CentralNode:
- * global-field delta provided by each lateralNode is used to track alive lateralNodes.
- * if any lateralNode is dead, go-to DEAD_CACHE
- * else do CACHE_SYNC
-* CACHE_QUIT: (When a lateralNode decides to quit)
- * LateralNode:
+ * if last-save-cycle matches saveInterval, do CACHE_SAVE
+ * send end-cycle signal with gu-data-delta to the centralCache, even if it is 0.
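The end-cycle delta flow above (each LC sends a gu-data-delta, even if it is 0, and the CC summates them into the delta broadcast at the next CACHE_SYNC) reduces to a summation. A real gu-data delta would be structured (see IDeltable in the Definitions section); a long per LC is an illustrative assumption:

```java
// Hedged sketch: the CC folds the per-LC gu-data deltas, including zero
// deltas (which double as liveness signals), into one summated delta.
public class GuDeltaSum {
    public static long summate(long[] lcDeltas) {
        long total = 0;
        for (long d : lcDeltas) total += d;
        return total;
    }
}
```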
+* CACHE_SAVE
+ * CentralCache:
+ * send SAVE signal to LCs
+ * LateralCache:
+ * receive SAVE signal from CC
+ * local persistence of state
+* REQ_DATA (data persistence mechanism)
+ * CentralCache:
+ * broadcast data-request-signal with a specific cycle-num to all lateralCaches.
+ * send start-transfer signal to one LC and get all data from it
+ * repeat with next LC until all LC have sent data
+ * store received gp-data & gu-data in temp, store gu-data in memory
+ * if data is received from all allocated lateralCaches, migrate temp to permanent.
+ * else roll-back (clear temp).
+ * LateralCache:
+ * receive REQ_DATA signal
+ * if current-cycle does not match req-data-cycle-num, send no-match-data-signal to the centralCache
+ * receive start-transfer, then send all gp-data upon receiving request
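The temp-then-migrate commit in REQ_DATA (stage received data in temp, migrate to permanent only once every allocated LC has answered, otherwise clear temp) could be sketched with an atomic move. The staging helper and file handling are assumptions:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hedged sketch of REQ_DATA's commit step: either promote the staged temp
// file to the permanent location, or discard it (roll-back).
public class TempThenMigrate {
    public static Path stageTemp(String content) {
        try {
            Path t = Files.createTempFile("reqdata", ".tmp");
            Files.writeString(t, content);
            return t;
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }

    public static void commitOrRollback(Path temp, Path permanent, boolean allLcsAnswered) {
        try {
            if (allLcsAnswered) {
                // migrate temp to permanent, atomically where the FS supports it
                Files.move(temp, permanent,
                        StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
            } else {
                Files.deleteIfExists(temp); // roll-back: clear temp
            }
        } catch (IOException e) { throw new UncheckedIOException(e); }
    }
}
```

Keeping temp and permanent on the same filesystem makes the final move atomic, so a crash mid-commit never leaves a half-written permanent file.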
+* CACHE_PAUSE
+ * LateralCache:
+ * optionally send PAUSE signal to CC
+ * receive pause
+ * wait for resume signal from CC
+ * CentralCache:
+ * optionally, receive PAUSE signal and
+ * request gu-data delta
+ * do REQ_DATA
+ * send PAUSE signal
+* QUIT_LC (When a lateralCache decides to quit)
+ * LateralCache: (Quitting-LC)
* receive quit-signal from QuitEventSource
- * send cache-quit signal and last-completed cycle to centralNode, with gu-data-delta (if current-cycle = completed-cycles)
- * if completed-cycles != last-save-cycle, send all altered-gp-data to centralNode
+ * send cache-quit signal and last-completed cycle to centralCache, with gu-data-delta (if current-cycle = completed-cycles)
+ * await data-request
* cleanly shutdown, closing ports and send cache-shutdown event to listeners
- * CentralNode:
- * receive cache-quit signal and last-completed cycle to centralNode, with gu-data-delta
- * store the gu-data of the quit-lateralNode in temp-place
- * store the gp-data of the quit-lateralNode in temp-place.
+ * CentralCache:
+ * receive cache-quit signal and last-completed cycle, with gu-data-delta
+ * store the gu-data of the quit-lateralCache in temp-place
+ * if completed-cyclestamp != persisted-cyclestamp, do REQ_DATA starting with quitting-LC
+ * store the gp-data of the quit-lateralCache in temp-place.
+ * send AWAIT_JOIN signal to LCs
+ * do JOIN
+* QUIT_CC
+ * CentralCache:
+ * receive quit-signal from QuitEventSource
* do REQ_DATA
- * do REQ_ACT_AM
- * do RE-ALLOCATE
-* DEAD_CACHE:
- * do ROLL_BACK
- * goto RE-ALLOCATE
-* ROLL_BACK:
- * CentralNode: rollback gu-data & gp-data to the last persisted-state
-* REQ_DATA: (data persistence mechanism)
- * CentralNode:
- * broadcast data-request-signal with a specific cycle-num to all lateralNodes.
- * LateralNode:
- * if current-cycle matches req-data-cycle-num, send gu-data upon receiving request,
- * else send no-match-data-signal to the lateralNode.
- * CentralNode:
- * store received gp-data in temp, store gu-data in memory
- * if data is received from all allocated lateralNodes, migrate temp to permanent.
- * else roll-back (clear temp).
-* REQ_ACT_AM: (request for act and am)
- * CentralNode:
+ * send quit-cc signal to LCs
+ * cleanly shutdown, closing ports and send cache-shutdown event to listeners
+ * LateralCache:
+ * receive quit-cc signal then halt, notify user
+ * await reconfiguration for a new CC
+ * continuously ping CC to see if it awakes at pingCCInterval
+* DEAD_LC (LC abruptly terminates)
+ * CentralCache:
+ * send AWAIT_JOIN signal to LCs
+ * if RESUME_DEAD signal received within maxWait4DeadResume
+ * do LC_RESUME
+ * else
+ * if (persisted-cyclestamp < current-cyclestamp), do ROLL_BACK
+ * do JOIN
+* DEAD_CC (CC abruptly terminates)
+ * LateralCache:
+ * continuously ping CC to see if it awakes at pingCCInterval
+* ROLL_BACK
+ * CentralCache:
+ * send ROLLBACK signal to LCs
+ * do REQ_ACT_AM
+ * do ALLOCATE_TO_NODE
+* REQ_ACT_AM (request for act and am)
+ * CentralCache:
* Broadcast request for act-am of expired act-am based on record timestamp
- * LateralNode:
- * Send act-am to centralNode
- * CentralNode:
- * Record lateralNode's act & am.
+ * LateralCache:
+ * Send act-am to centralCache
+ * CentralCache:
+ * Record lateralCache's act & am.
Definitions
* CACHE-EVENTS:
- * CacheReAllocate
- * CacheStarted
- * CacheReady
- * CacheSyncing
- * CacheSync
- * CacheShutdown
+ * Allocate
+ * Started
+ * Ready
+ * Syncing
+ * Sync
+ * Saving
+ * BackingUp
+ * Rollback
+ * Stop
* Messages:
* LateralCache to LateralCache
* DistTaskMsg - for distributing task to appropriate cache
@@ -198,7 +318,7 @@
* GU-Data
* implements IDeltable
CentralCacheProxy
- * public List<LateralCacheProxy> listNodes();
+ * public List<LateralCacheProxy> listLateralCaches();
* public void addListeners(CacheEventListener cel);
LateralCache
Modified: trunk/sopf/model/src/site/apt/phase1_requirements.apt
===================================================================
--- trunk/sopf/model/src/site/apt/phase1_requirements.apt 2009-06-25 01:56:16 UTC (rev 40)
+++ trunk/sopf/model/src/site/apt/phase1_requirements.apt 2009-07-08 09:56:21 UTC (rev 41)
@@ -12,8 +12,8 @@
* 3. Define associations for the various components.
* 4. Components do not have to play a runtime role; components may be useful for network definition.
- * 4. Must implement appropriate interface that is required by infrastructure (cache, kernel, design,
+ * 5. Must implement appropriate interface that is required by infrastructure (cache, kernel, design, analysis)
- * 5. Components must support design-plugin requirements e.g e.g wiring-functions
- * 6. Components must support model-state monitoring
+ * 6. Components must support design-plugin requirements e.g. wiring-functions
+ * 7. Components must support model-state monitoring
This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.