We demonstrate some of the features of Roomy by giving three possible solutions to the pancake sorting problem. Pancake sorting refers to sorting a sequence of elements using a prefix reversal operation. For example, the sequence [3 2 1 4 5]
can be sorted by reversing the first three elements, yielding [1 2 3 4 5]
.
Here, we wish to answer the following questions: what is the minimum number of reversals sufficient to sort any sequence of N items. For example, any sequence of 11 items can be sorted with 13 or fewer reversals. We will solve this problem by performing a breadth-first search of the pancake graph, starting with the sorted sequence. Two sequences are connected if they are related by a single reversal.
For more details on the problem, see the Wikipedia article on pancake sorting.
RoomyList
s for Breadth-first Search_The complete code for the following example: pancakeList.c. _
The following is a simple Roomy program to perform a breadth-first search of the pancake graph. The following function produces all of the permutations that can be reached using a single reversal from a given permutation.
// Generate all states that can be reached in one prefix reversal // from the given state. void genAllPerm(Perm p, Perm nbrs[]) { int i; for(i = 2; i <= PERM_LEN; i++) { revPrefix(p, i, nbrs[i-2]); } }
where revPrefix(p, i, nbrs[j]
reverses the first i
elements of permutation p
and places the result in the array nbrs[j]
.
Next, we define three RoomyList
data structures. These three lists are in the global namespace. They will store all of the elements seen so far, the current level of the search, and the next level.
RoomyList* allLevList; // Contains all permutations discoverd so far. RoomyList* curLevList; // Contains permutations at the 'current' level. RoomyList* nextLevList; // Contains permutations at the 'next' level.
The following is a function that will be mapped over the curLevelList
to generate the next level, using the RoomyList_map
method. The function will be applied to each element of the list, and will store all neighboring elements in the nextLevList
.
// To be mapped over the current level to produce the next level. void genNextLevFromList(void* val) { Perm nbrs[PERM_LEN-1]; genAllPerm(val, nbrs); int i; for(i=0; i<PERM_LEN-1; i++) { RoomyList_add(nextLevList, &(nbrs[i])); } }
The RoomyList_add
method is used to add each new element to the nextLevList
. This Roomy operation is a delayed operation. That means the addition of the element does not happen until RoomyList_sync
is called on nextLevList
. This sync
operation will be see in the code below.
Next, we define the primary search method.
// Top level function for breadth-first search of pancake graph. void pancakeListBFS() { Roomy_log("BEGINNING %i PANCAKE BFS\n", PERM_LEN); // Create lists. allLevList = RoomyList_make("pancakeAllLevList", sizeof(Perm)); curLevList = RoomyList_make("pancakeLev0", sizeof(Perm)); nextLevList = RoomyList_make("pancakeLev1", sizeof(Perm)); Roomy_log("Initial RoomyLists constructed\n");
The method begins by printing some status information using the Roomy_log
method. Because the Roomy program is operating in parallel, the use of a simple printf
would cause each parallel process to output the information. Roomy_log
avoids this, but otherwise works like printf
.
Next, the three RoomyList
s are created. The constructor takes a list name and the size of the elements.
// Set identity element. Perm ident; getIdentPerm(ident); RoomyList_add(allLevList, &ident); RoomyList_add(curLevList, &ident); Roomy_sync();
This code creates the identity permutation (the solved element) and adds it to allLevList
and curLevList
. Notice that the method Roomy_sync
must be called to finalize these delayed operations. You could also sync
individually using RoomyList_sync
.
Now, the main loop of the breadth-first search.
int curLevel = 0; // The current level number. uint64 numThisLevel = 1; // Number of elements at current level. Roomy_log("Level %i done: %lli elements\n", curLevel, numThisLevel); // Continue generating new levels until the current level is empty. while(numThisLevel) { // Generate next level from current. RoomyList_map(curLevList, genNextLevFromList); RoomyList_sync(nextLevList);
The first step is to generate the next level from the current level. The RoomyList_map
method will apply the function genNextLevFromList
(defined above) to every element of curLevList
. Neighboring elements are added to nextLevList
, and then that list is sync
'ed.
// Detect duplicates. RoomyList_removeDupes(nextLevList); RoomyList_removeAll(nextLevList, allLevList); RoomyList_sync(nextLevList);
Then, duplicates are removed. First, duplicates within nextLevList
are removed using RoomyList_removeDupes
, then any elements seen at previous levels are removed using RoomyList_removeAll
.
// Record new non-duplicate elements. RoomyList_addAll(allLevList, nextLevList); RoomyList_sync(allLevList);
The non-duplicate elements are then added to the list of all previously seen element.
// Create new next level, delete current level, rotate pointers. curLevel++; RoomyList_destroy(curLevList); curLevList = nextLevList; char levName[1024]; sprintf(levName, "pancakeLev%i", curLevel+1); nextLevList = RoomyList_make(levName, sizeof(Perm)); // Get the number of new states at current level. numThisLevel = RoomyList_size(curLevList); Roomy_log("Level %i done: %lli elements\n", curLevel, numThisLevel); }
Finally, the levels are rotated and the loop repeats. The levels are rotated by destroying curLevList
, assigning nextLevList
to the curLevList
pointer, and creating a new, empty nextLevList
.
This process repeats until the size of curLevList
is zero. Then, we conclude the method by destroying all lists.
RoomyList_destroy(allLevList); RoomyList_destroy(curLevList); RoomyList_destroy(nextLevList); Roomy_log("PANCAKE BFS DONE\n"); }
And, the main method is in charge calling our search method, as well as initializing and finalizing Roomy (similar to how MPI works).
int main(int argc, char **argv) { Roomy_init(&argc, &argv); pancakeListBFS(); Roomy_finalize(); return 0; }
The output of running this program for the 11-pancake problem is:
Thu Dec 3 02:21:10 2009: BEGINNING 11 PANCAKE BFS Thu Dec 3 02:21:10 2009: Initial RoomyLists constructed Thu Dec 3 02:21:11 2009: Level 0 done: 1 elements Thu Dec 3 02:21:15 2009: Level 1 done: 10 elements Thu Dec 3 02:21:19 2009: Level 2 done: 90 elements Thu Dec 3 02:21:24 2009: Level 3 done: 809 elements Thu Dec 3 02:21:28 2009: Level 4 done: 6429 elements Thu Dec 3 02:21:32 2009: Level 5 done: 43891 elements Thu Dec 3 02:21:37 2009: Level 6 done: 252737 elements Thu Dec 3 02:21:45 2009: Level 7 done: 1174766 elements Thu Dec 3 02:22:09 2009: Level 8 done: 4126515 elements Thu Dec 3 02:23:27 2009: Level 9 done: 9981073 elements Thu Dec 3 02:26:48 2009: Level 10 done: 14250471 elements Thu Dec 3 02:31:39 2009: Level 11 done: 9123648 elements Thu Dec 3 02:35:07 2009: Level 12 done: 956354 elements Thu Dec 3 02:36:18 2009: Level 13 done: 6 elements Thu Dec 3 02:37:11 2009: Level 14 done: 0 elements Thu Dec 3 02:37:12 2009: PANCAKE BFS DONE
_The complete code for the above example: pancakeList.c. _
RoomyArray
for Breadth-first Search_The complete code for the following example: pancakeArray.c. _
The following is another Roomy program to solve the pancake sorting problem using breadth-first search. This time, a single RoomyArray
is used to track which states have been discovered so far. This array uses only one bit per element, instead of the 11 bytes per element in the RoomyList
version above (for the 11-pancake problem). Also, the RoomyArray
-based solution does not require the elements to be sorted, where the RoomyList
version must sort all generated elements to allow for duplicate detection. Because of this, the RoomyArray
solution is more than 10 times faster than the RoomyList
solution.
To facilitate the use of an indexed array, we first must have a method for efficiently mapping permutations of N
elements to integers in the range [0, N)
. The following is an implementation of the method given in "Ranking and unranking permutations in linear time", by Myrvold and Ruskey.
// Invert a permutation. void invertPerm(Perm p, Perm out) { int i; for(i=0; i<PERM_LEN; i++) out[p[i]] = i; } // Map a permutation to a 64-bit integer. uint64 rankRecur(Elt n, Perm p, Perm r) { if(n == 1) return 0; Elt s = p[n-1]; swap(p, n-1, r[n-1]); swap(r, s, n-1); return s + n * rankRecur(n-1, p, r); } uint64 rank(Perm p) { Perm p2; copyPerm(p2, p); Perm r; invertPerm(p2, r); return rankRecur(PERM_LEN, p2, r); } // Map a 64-bit integer to a permutation. void unrankRecur(Elt n, uint64 r, Perm p) { if(n > 0) { swap(p, n-1, r%n); unrankRecur(n-1, r/n, p); } } void unrank(uint64 r, Perm out) { getIdentPerm(out); unrankRecur(PERM_LEN, r, out); }
Given the above rank
and unrank
methods, we can generate all of the all neighbors of a given state using the following method.
// Generate all states that can be reached in one reversal from the given // ranked state void genAll(uint64 state, uint64 nbrs[]) { Perm p; unrank(state, p); int i; for(i=2; i<=PERM_LEN; i++) { Perm rev; revPrefix(p, i, rev); nbrs[i-2] = rank(rev); } }
The following code defines a RoomyArray
in the global namespace, along with a method that is applied to a given element of the array to generate neighboring elements.
RoomyArray* dupe; // Update function: if the element hasn't been seen yet, sets the bit and generates // neighboring states. void generate(uint64 state, void* oldVal, void* updateVal, void* newVal) { uint8 seen = *(uint8*)oldVal; if(!seen) { uint64 nbrs[PERM_LEN-1]; genAll(state, nbrs); int i; for(i=0; i<PERM_LEN-1; i++) { RoomyArray_update(dupe, nbrs[i], NULL, generate); // don't need to pass // update val } } *(uint8*)newVal = 1; }
The method generate
will be passed to the RoomyArray_update
method. Execution of the generate
method will be delayed until RoomyArray_sync
is called. The arguments given to generate
are as follows:
uint64 state
: The index of the RoomyArray
element being updated. This integer corresponds to a given permutation in the pancake sorting problem. void* oldVal
: The old value at this index of the RoomyArray
. Because this RoomyArray
is storing 1-bit elements, the value is of type uint8
, and will be either 0
or 1
. void* updateVal
: a user-defined value passed to RoomyArray_update
. In this case, there is no update value. void* newVal
: the new value to store at this index of the RoomyArray
. Again, it is of type uint8
and must be 0
or 1
. Before giving the main search routine, we define one more method that we will attach to the RoomyArray
as a predicate. A predicate is a function that returns 0
or 1
when given an element of the RoomyArray
. In this case, the predicate simply counts the number of elements in the array that equal 1
.
// A predicate counting the number of non-zero elements. uint8 isDupe(uint64 i, void* value) { return *(uint8*)value == 1; }
Now, the main breadth-first search routine. First, we create the RoomyArray
, register the update function generate
, and attach the predicate isDupe
.
// Top level function for method using three 1-bit arrays void pancakeOneBitBFS() { Roomy_log("BEGINNING %i PANCAKE BFS\n", PERM_LEN); // Initialize bit array dupe = RoomyArray_makeBits("pancakeDupes", 1, N_STATES); RoomyArray_registerUpdateFunc(dupe, generate, 0); RoomyArray_attachPredicate(dupe, isDupe);
Next, we update the element in the RoomyArray
that represents the identity (or solved) permutation. The arguments to RoomyArray_update
are: dupe
, the RoomyArray
to update; identPos
, the index of the element to update; the update value to pass to the update function, in this there is none, so NULL
is given; and generate
the user-defined update function to execute (defined above).
// set identity element Perm ident; getIdentPerm(ident); uint64 identPos = rank(ident); RoomyArray_update(dupe, identPos, NULL, generate);
Finally, the main search loop is defined.
// generate levels int curLevel = 0; uint64 lastTotal = 0, newTotal = 0, numThisLevel = 1; while(!RoomyArray_isSynced(dupe)) { // process existing updates, generate updates for next level RoomyArray_sync(dupe); newTotal = RoomyArray_predicateCount(dupe, isDupe); numThisLevel = newTotal - lastTotal; lastTotal = newTotal; Roomy_log("Level %i done: %lli elements\n", curLevel, numThisLevel); // next level curLevel++; } RoomyArray_destroy(dupe); Roomy_log("ONE-BIT PANCAKE BFS DONE\n"); }
The loop continues as long as RoomyArray_isSynced
is false, meaning there are delayed updates waiting to be processed. Each time RoomyArray_sync
is called, the generate
method is applied to all elements at the current level. Processing these delayed updates in turn generated more delayed update operations. We call these cascading delayed operations. The following rule is applied when processing delayed operations for a RoomyArray
:
Cascading delayed operations: if the processing of delayed operations for a RoomyArray
creates additional delayed operations for that same RoomyArray
, the new operations are not processed until the next call to RoomyArray_sync
.
(and the same rule applied for all other Roomy data structures).
So, each call to RoomyArray_sync
in the code above sets the bits for the current level to 1
and generates delayed updates for the next level. When all of the next level updates are duplicates, the search terminates.
The predicate isDupe
is used to count the number of elements seen so far using the RoomyArray_predicateCount
method.
The output of running this program for the 11-pancake problem is:
Sun Dec 6 22:12:49 2009: BEGINNING 11 PANCAKE BFS Sun Dec 6 22:12:49 2009: Initial RoomyLists constructed Sun Dec 6 22:12:51 2009: Level 0 done: 1 elements Sun Dec 6 22:12:54 2009: Level 1 done: 10 elements Sun Dec 6 22:12:58 2009: Level 2 done: 90 elements Sun Dec 6 22:13:02 2009: Level 3 done: 809 elements Sun Dec 6 22:13:06 2009: Level 4 done: 6429 elements Sun Dec 6 22:13:10 2009: Level 5 done: 43891 elements Sun Dec 6 22:13:14 2009: Level 6 done: 252737 elements Sun Dec 6 22:13:22 2009: Level 7 done: 1174766 elements Sun Dec 6 22:13:46 2009: Level 8 done: 4126515 elements Sun Dec 6 22:15:03 2009: Level 9 done: 9981073 elements Sun Dec 6 22:18:22 2009: Level 10 done: 14250471 elements Sun Dec 6 22:23:07 2009: Level 11 done: 9123648 elements Sun Dec 6 22:26:32 2009: Level 12 done: 956354 elements Sun Dec 6 22:27:42 2009: Level 13 done: 6 elements Sun Dec 6 22:28:35 2009: Level 14 done: 0 elements Sun Dec 6 22:28:35 2009: PANCAKE BFS DONE
_The complete code for the above example: pancakeArray.c. _
RoomyHashTable
for Breadth-first Search_The complete code for the following example: pancakeHashTable.c. _
The third solution to the pancake sorting problem uses a RoomyHashTable
. It is similar to the previous solutions, but introduces a new Roomy concept, a reduction. A reduction is used to extract a small amount of information from a Roomy data structure. In the following example, it is used to count the number of elements in a RoomyHashTable
. Other possible uses could be: finding the 10 largest elements; calculating the sum of all elements; or taking a random sampling of elements.
This solution uses the permutation ranking and unranking methods given above for the RoomyArray
-based solution.
First, we define three RoomyHashTable
s, which will store: all elements seen so far; the current level; and the next level (similar to the RoomyList
-based solution above).
// Three hash tables: all elts seen, cur level, next level RoomyHashTable* allRHT; RoomyHashTable* curRHT; RoomyHashTable* nextRHT; // The current level uint8 CUR_LEVEL;
The global variable CUR_LEVEL
will store the number of the depth of the current level.
The following three methods will be used to: generate the next level of the search from the current one; remove duplicates from the next level; and record the new non-duplicate elements as "seen".
// Function to be mapped over curRHT to generate next level in nextRHT. void generateNextLevel(void* key, void* value) { uint64* state = (uint64*)key; uint8* level = (uint8*)value; if(*level == CUR_LEVEL) { uint64 nbrs[PERM_LEN-1]; genAll(*state, nbrs); uint8 nextLevel = CUR_LEVEL + 1; int i; for(i=0; i<PERM_LEN-1; i++) { RoomyHashTable_insert(nextRHT, &(nbrs[i]), &nextLevel); } } } // To be mapped over allRHT to remove all seen elements from nextRHT. void removeDupes(void* key, void* value) { RoomyHashTable_remove(nextRHT, key); } // To be mapped over nextRHT to add all new elements to allRHT. void recordNewElts(void* key, void* value) { RoomyHashTable_insert(allRHT, key, value); }
The next two methods will be used to "reduce" the current level to a single integer, counting the number of elements in the hash table. This can be done more efficiently by simply using the RoomyHashTable_size
method. The reduction is shown simply for demonstration purposes.
// Reduction functions that count the number of elements. void addOneToAns(void* ansInOut, void* key, void* value) { uint64* count = (uint64*)ansInOut; *count = *count + 1; } void sumAns(void* ansInOut, void* ansIn) { uint64* count = (uint64*)ansInOut; *count = *count + *(uint64*)ansIn; }
The reduction is done in two stages. The first stage reduces all of the elements on a single compute node to a single value, using the first method, addOneToAns
. The second stage reduces the partial answers to get the final answer, using sumAns
. Because the order of the reduction is not guaranteed, the reduction functions must be commutative and associative.
Now, the main search routine. First, the three RoomyHashTable
s are created, and the identity permutation is inserted into allRHT
and curRHT
.
// Top level function for method using one RoomyHashTable void pancakeHashTable() { Roomy_log("BEGINNING %i PANCAKE BFS: RoomyHashTable version\n", PERM_LEN); // Set element counters (assuming level 0 has been completed). uint64 levelSize = 1; // number of elements in curRHT uint64 totalElts = 1; // total number of elements see so far // Create hash tables CUR_LEVEL = 0; allRHT = RoomyHashTable_make("allRHT", sizeof(uint64), sizeof(uint8), N_STATES); curRHT = RoomyHashTable_make("lev0RHT", sizeof(uint64), sizeof(uint8), levelSize); nextRHT = RoomyHashTable_make("lev1RHT", sizeof(uint64), sizeof(uint8), levelSize * PERM_LEN); // Set identity element Perm ident; getIdentPerm(ident); uint64 identPos = rank(ident); RoomyHashTable_insert(allRHT, &identPos, &CUR_LEVEL); RoomyHashTable_sync(allRHT); RoomyHashTable_insert(curRHT, &identPos, &CUR_LEVEL); RoomyHashTable_sync(curRHT); Roomy_log("Level %i done: %lli elements\n", CUR_LEVEL, levelSize);
The main loop will: generate the next level; remove duplicates; and record the non-duplicate elements.
// While current level is not empty while (levelSize > 0) { // map over cur level, add neighbors to next level RoomyHashTable_map(curRHT, generateNextLevel); RoomyHashTable_sync(nextRHT); // remove all elts in seen from next (by mapping over seen and calling // remove) RoomyHashTable_map(allRHT, removeDupes); RoomyHashTable_sync(nextRHT); // add all elts in next to seen (by mapping over next and calling update) RoomyHashTable_map(nextRHT, recordNewElts); RoomyHashTable_sync(allRHT);
Next, the levels are rotated: the next level becomes the current level; and a new next level is created. If the current level has M
elements, the hash table for the next level will be created with an initial capacity of M * B
, where B
is the branching factor. (In the case of the 11-pancake problem, B = 10
.) Accurately sizing the initial capacity of the RoomyHashTable
is important for efficiency. If the initial capacity is too small, the hash table will dynamically double in size as elements are inserted, requiring the entire table to be read and re-written. If the capacity it too large, excess disk space will be used, and more time will be spent reading empty hash table slots.
// rotate levels RoomyHashTable_destroy(curRHT); curRHT = nextRHT; levelSize = RoomyHashTable_size(curRHT); CUR_LEVEL++; char levName[1024]; sprintf(levName, "lev%iRHT", CUR_LEVEL + 1); // New table size is 30% bigger than the next level, to avoid doubling // of hash table capacity. But, it doesn't need to be bigger than the // entire search space. (PERM_LEN-1 is the branching factor.) uint64 tableSize = levelSize * (PERM_LEN-1) * 13 / 10; if (tableSize > N_STATES) { tableSize = N_STATES; } nextRHT = RoomyHashTable_make(levName, sizeof(uint64), sizeof(uint8), tableSize);
Finally, the size of the current level is calculated using a reduction, and that value is compared to the size returned by RoomyHashTable_size
.
// Confirm size reported by reduction matches level size. uint64 numEltsFromReduce = 0; RoomyHashTable_reduce(curRHT, &numEltsFromReduce, sizeof(uint64), addOneToAns, sumAns); assert(numEltsFromReduce == levelSize); totalElts = RoomyHashTable_size(allRHT); Roomy_log("Level %i done: %lli elements\n", CUR_LEVEL, levelSize); } Roomy_log("PANCAKE BFS DONE\n"); }
The output of running this program for the 11-pancake problem is:
Sun Dec 6 23:40:40 2009: BEGINNING 11 PANCAKE BFS: RoomyHashTable version Sun Dec 6 23:40:41 2009: Level 0 done: 1 elements Sun Dec 6 23:40:42 2009: Level 1 done: 10 elements Sun Dec 6 23:40:43 2009: Level 2 done: 90 elements Sun Dec 6 23:40:44 2009: Level 3 done: 809 elements Sun Dec 6 23:40:45 2009: Level 4 done: 6429 elements Sun Dec 6 23:40:46 2009: Level 5 done: 43891 elements Sun Dec 6 23:40:48 2009: Level 6 done: 252737 elements Sun Dec 6 23:40:50 2009: Level 7 done: 1174766 elements Sun Dec 6 23:40:56 2009: Level 8 done: 4126515 elements Sun Dec 6 23:41:12 2009: Level 9 done: 9981073 elements Sun Dec 6 23:41:42 2009: Level 10 done: 14250471 elements Sun Dec 6 23:42:31 2009: Level 11 done: 9123648 elements Sun Dec 6 23:42:58 2009: Level 12 done: 956354 elements Sun Dec 6 23:43:06 2009: Level 13 done: 6 elements Sun Dec 6 23:43:09 2009: Level 14 done: 0 elements Sun Dec 6 23:43:09 2009: PANCAKE BFS DONE
_The complete code for the above example: pancakeHashTable.c. _