Menu

RoomyUserGuide

Daniel Kunkle

Roomy Tutorial

The Pancake Sorting Problem

We demonstrate some of the features of Roomy by giving three possible solutions to the pancake sorting problem. Pancake sorting refers to sorting a sequence of elements using a prefix reversal operation. For example, the sequence [3 2 1 4 5] can be sorted by reversing the first three elements, yielding [1 2 3 4 5].

Here, we wish to answer the following questions: what is the minimum number of reversals sufficient to sort any sequence of N items. For example, any sequence of 11 items can be sorted with 13 or fewer reversals. We will solve this problem by performing a breadth-first search of the pancake graph, starting with the sorted sequence. Two sequences are connected if they are related by a single reversal.

For more details on the problem, see the Wikipedia article on pancake sorting.

_The complete code for the following example: pancakeList.c. _

The following is a simple Roomy program to perform a breadth-first search of the pancake graph. The following function produces all of the permutations that can be reached using a single reversal from a given permutation.

// Generate all states that can be reached in one prefix reversal 
// from the given state.
void genAllPerm(Perm p, Perm nbrs[]) {
    int i;
    for(i = 2; i <= PERM_LEN; i++) {
        revPrefix(p, i, nbrs[i-2]);
    }
}

where revPrefix(p, i, nbrs[j] reverses the first i elements of permutation p and places the result in the array nbrs[j].

Next, we define three RoomyList data structures. These three lists are in the global namespace. They will store all of the elements seen so far, the current level of the search, and the next level.

RoomyList* allLevList;   // Contains all permutations discoverd so far.
RoomyList* curLevList;   // Contains permutations at the 'current' level.
RoomyList* nextLevList;  // Contains permutations at the 'next' level.

The following is a function that will be mapped over the curLevelList to generate the next level, using the RoomyList_map method. The function will be applied to each element of the list, and will store all neighboring elements in the nextLevList.

// To be mapped over the current level to produce the next level.
void genNextLevFromList(void* val) {
    Perm nbrs[PERM_LEN-1];
    genAllPerm(val, nbrs);
    int i;
    for(i=0; i<PERM_LEN-1; i++) {
        RoomyList_add(nextLevList, &(nbrs[i]));
    }
}

The RoomyList_add method is used to add each new element to the nextLevList. This Roomy operation is a delayed operation. That means the addition of the element does not happen until RoomyList_sync is called on nextLevList. This sync operation will be see in the code below.

Next, we define the primary search method.

// Top level function for breadth-first search of pancake graph.
void pancakeListBFS() {
    Roomy_log("BEGINNING %i PANCAKE BFS\n", PERM_LEN);

    // Create lists.
    allLevList = RoomyList_make("pancakeAllLevList", sizeof(Perm));
    curLevList = RoomyList_make("pancakeLev0", sizeof(Perm));
    nextLevList = RoomyList_make("pancakeLev1", sizeof(Perm));

    Roomy_log("Initial RoomyLists constructed\n");

The method begins by printing some status information using the Roomy_log method. Because the Roomy program is operating in parallel, the use of a simple printf would cause each parallel process to output the information. Roomy_log avoids this, but otherwise works like printf.

Next, the three RoomyLists are created. The constructor takes a list name and the size of the elements.

    // Set identity element.
    Perm ident;
    getIdentPerm(ident);
    RoomyList_add(allLevList, &ident);
    RoomyList_add(curLevList, &ident);
    Roomy_sync();

This code creates the identity permutation (the solved element) and adds it to allLevList and curLevList. Notice that the method Roomy_sync must be called to finalize these delayed operations. You could also sync individually using RoomyList_sync.

Now, the main loop of the breadth-first search.

    int curLevel = 0;         // The current level number.
    uint64 numThisLevel = 1;  // Number of elements at current level.

    Roomy_log("Level %i done: %lli elements\n", curLevel, numThisLevel);

    // Continue generating new levels until the current level is empty.
    while(numThisLevel) {
        // Generate next level from current.
        RoomyList_map(curLevList, genNextLevFromList);
        RoomyList_sync(nextLevList);

The first step is to generate the next level from the current level. The RoomyList_map method will apply the function genNextLevFromList (defined above) to every element of curLevList. Neighboring elements are added to nextLevList, and then that list is sync'ed.

        // Detect duplicates.
        RoomyList_removeDupes(nextLevList);
        RoomyList_removeAll(nextLevList, allLevList);
        RoomyList_sync(nextLevList);

Then, duplicates are removed. First, duplicates within nextLevList are removed using RoomyList_removeDupes, then any elements seen at previous levels are removed using RoomyList_removeAll.

        // Record new non-duplicate elements.
        RoomyList_addAll(allLevList, nextLevList);
        RoomyList_sync(allLevList);

The non-duplicate elements are then added to the list of all previously seen element.

        // Create new next level, delete current level, rotate pointers.
        curLevel++;
        RoomyList_destroy(curLevList);
        curLevList = nextLevList;
        char levName[1024];
        sprintf(levName, "pancakeLev%i", curLevel+1);
        nextLevList = RoomyList_make(levName, sizeof(Perm));

        // Get the number of new states at current level.
        numThisLevel = RoomyList_size(curLevList);

        Roomy_log("Level %i done: %lli elements\n", curLevel, numThisLevel);
    }

Finally, the levels are rotated and the loop repeats. The levels are rotated by destroying curLevList, assigning nextLevList to the curLevList pointer, and creating a new, empty nextLevList.

This process repeats until the size of curLevList is zero. Then, we conclude the method by destroying all lists.

    RoomyList_destroy(allLevList);
    RoomyList_destroy(curLevList);
    RoomyList_destroy(nextLevList);

    Roomy_log("PANCAKE BFS DONE\n");
}

And, the main method is in charge calling our search method, as well as initializing and finalizing Roomy (similar to how MPI works).

int main(int argc, char **argv) {
    Roomy_init(&argc, &argv);
    pancakeListBFS();
    Roomy_finalize();
    return 0;
}

The output of running this program for the 11-pancake problem is:

Thu Dec  3 02:21:10 2009: BEGINNING 11 PANCAKE BFS
Thu Dec  3 02:21:10 2009: Initial RoomyLists constructed
Thu Dec  3 02:21:11 2009: Level 0 done: 1 elements
Thu Dec  3 02:21:15 2009: Level 1 done: 10 elements
Thu Dec  3 02:21:19 2009: Level 2 done: 90 elements
Thu Dec  3 02:21:24 2009: Level 3 done: 809 elements
Thu Dec  3 02:21:28 2009: Level 4 done: 6429 elements
Thu Dec  3 02:21:32 2009: Level 5 done: 43891 elements
Thu Dec  3 02:21:37 2009: Level 6 done: 252737 elements
Thu Dec  3 02:21:45 2009: Level 7 done: 1174766 elements
Thu Dec  3 02:22:09 2009: Level 8 done: 4126515 elements
Thu Dec  3 02:23:27 2009: Level 9 done: 9981073 elements
Thu Dec  3 02:26:48 2009: Level 10 done: 14250471 elements
Thu Dec  3 02:31:39 2009: Level 11 done: 9123648 elements
Thu Dec  3 02:35:07 2009: Level 12 done: 956354 elements
Thu Dec  3 02:36:18 2009: Level 13 done: 6 elements
Thu Dec  3 02:37:11 2009: Level 14 done: 0 elements
Thu Dec  3 02:37:12 2009: PANCAKE BFS DONE

_The complete code for the above example: pancakeList.c. _

_The complete code for the following example: pancakeArray.c. _

The following is another Roomy program to solve the pancake sorting problem using breadth-first search. This time, a single RoomyArray is used to track which states have been discovered so far. This array uses only one bit per element, instead of the 11 bytes per element in the RoomyList version above (for the 11-pancake problem). Also, the RoomyArray-based solution does not require the elements to be sorted, where the RoomyList version must sort all generated elements to allow for duplicate detection. Because of this, the RoomyArray solution is more than 10 times faster than the RoomyList solution.

To facilitate the use of an indexed array, we first must have a method for efficiently mapping permutations of N elements to integers in the range [0, N). The following is an implementation of the method given in "Ranking and unranking permutations in linear time", by Myrvold and Ruskey.

// Invert a permutation.
void invertPerm(Perm p, Perm out) {
    int i;
    for(i=0; i<PERM_LEN; i++)
        out[p[i]] = i;
}

// Map a permutation to a 64-bit integer.
uint64 rankRecur(Elt n, Perm p, Perm r) {
    if(n == 1) return 0;
    Elt s = p[n-1];
    swap(p, n-1, r[n-1]);
    swap(r, s, n-1);
    return s + n * rankRecur(n-1, p, r);
}
uint64 rank(Perm p) {
    Perm p2;
    copyPerm(p2, p);
    Perm r;
    invertPerm(p2, r);
    return rankRecur(PERM_LEN, p2, r);
}

// Map a 64-bit integer to a permutation.
void unrankRecur(Elt n, uint64 r, Perm p) {
    if(n > 0) {
        swap(p, n-1, r%n);
        unrankRecur(n-1, r/n, p);
    }
}
void unrank(uint64 r, Perm out) {
    getIdentPerm(out);
    unrankRecur(PERM_LEN, r, out);
}

Given the above rank and unrank methods, we can generate all of the all neighbors of a given state using the following method.

// Generate all states that can be reached in one reversal from the given
// ranked state
void genAll(uint64 state, uint64 nbrs[]) {
    Perm p;
    unrank(state, p);
    int i;
    for(i=2; i<=PERM_LEN; i++) {
        Perm rev;
        revPrefix(p, i, rev);
        nbrs[i-2] = rank(rev);
    }
}

The following code defines a RoomyArray in the global namespace, along with a method that is applied to a given element of the array to generate neighboring elements.

RoomyArray* dupe;

// Update function: if the element hasn't been seen yet, sets the bit and generates
// neighboring states.
void generate(uint64 state, void* oldVal, void* updateVal, void* newVal) {
    uint8 seen = *(uint8*)oldVal;
    if(!seen) {
        uint64 nbrs[PERM_LEN-1];
        genAll(state, nbrs);
        int i;
        for(i=0; i<PERM_LEN-1; i++) {
            RoomyArray_update(dupe, nbrs[i], NULL, generate); // don't need to pass
                                                              // update val
        }
    }
    *(uint8*)newVal = 1;
}

The method generate will be passed to the RoomyArray_update method. Execution of the generate method will be delayed until RoomyArray_sync is called. The arguments given to generate are as follows:

  • uint64 state: The index of the RoomyArray element being updated. This integer corresponds to a given permutation in the pancake sorting problem.
  • void* oldVal: The old value at this index of the RoomyArray. Because this RoomyArray is storing 1-bit elements, the value is of type uint8, and will be either 0 or 1.
  • void* updateVal: a user-defined value passed to RoomyArray_update. In this case, there is no update value.
  • void* newVal: the new value to store at this index of the RoomyArray. Again, it is of type uint8 and must be 0 or 1.

Before giving the main search routine, we define one more method that we will attach to the RoomyArray as a predicate. A predicate is a function that returns 0 or 1 when given an element of the RoomyArray. In this case, the predicate simply counts the number of elements in the array that equal 1.

// A predicate counting the number of non-zero elements.
uint8 isDupe(uint64 i, void* value) {
    return *(uint8*)value == 1;
}

Now, the main breadth-first search routine. First, we create the RoomyArray, register the update function generate, and attach the predicate isDupe.

// Top level function for method using three 1-bit arrays
void pancakeOneBitBFS() {
    Roomy_log("BEGINNING %i PANCAKE BFS\n", PERM_LEN);

    // Initialize bit array
    dupe = RoomyArray_makeBits("pancakeDupes", 1, N_STATES);
    RoomyArray_registerUpdateFunc(dupe, generate, 0);
    RoomyArray_attachPredicate(dupe, isDupe);

Next, we update the element in the RoomyArray that represents the identity (or solved) permutation. The arguments to RoomyArray_update are: dupe, the RoomyArray to update; identPos, the index of the element to update; the update value to pass to the update function, in this there is none, so NULL is given; and generate the user-defined update function to execute (defined above).

    // set identity element
    Perm ident;
    getIdentPerm(ident);
    uint64 identPos = rank(ident);
    RoomyArray_update(dupe, identPos, NULL, generate);

Finally, the main search loop is defined.

    // generate levels
    int curLevel = 0;
    uint64 lastTotal = 0, newTotal = 0, numThisLevel = 1;

    while(!RoomyArray_isSynced(dupe)) {
        // process existing updates, generate updates for next level
        RoomyArray_sync(dupe);
        newTotal = RoomyArray_predicateCount(dupe, isDupe);
        numThisLevel = newTotal - lastTotal;
        lastTotal = newTotal;

        Roomy_log("Level %i done: %lli elements\n", curLevel, numThisLevel);

        // next level
        curLevel++;
    }

    RoomyArray_destroy(dupe);

    Roomy_log("ONE-BIT PANCAKE BFS DONE\n");
}

The loop continues as long as RoomyArray_isSynced is false, meaning there are delayed updates waiting to be processed. Each time RoomyArray_sync is called, the generate method is applied to all elements at the current level. Processing these delayed updates in turn generated more delayed update operations. We call these cascading delayed operations. The following rule is applied when processing delayed operations for a RoomyArray:

Cascading delayed operations: if the processing of delayed operations for a RoomyArray creates additional delayed operations for that same RoomyArray, the new operations are not processed until the next call to RoomyArray_sync.

(and the same rule applied for all other Roomy data structures).

So, each call to RoomyArray_sync in the code above sets the bits for the current level to 1 and generates delayed updates for the next level. When all of the next level updates are duplicates, the search terminates.

The predicate isDupe is used to count the number of elements seen so far using the RoomyArray_predicateCount method.

The output of running this program for the 11-pancake problem is:

Sun Dec  6 22:12:49 2009: BEGINNING 11 PANCAKE BFS
Sun Dec  6 22:12:49 2009: Initial RoomyLists constructed
Sun Dec  6 22:12:51 2009: Level 0 done: 1 elements
Sun Dec  6 22:12:54 2009: Level 1 done: 10 elements
Sun Dec  6 22:12:58 2009: Level 2 done: 90 elements
Sun Dec  6 22:13:02 2009: Level 3 done: 809 elements
Sun Dec  6 22:13:06 2009: Level 4 done: 6429 elements
Sun Dec  6 22:13:10 2009: Level 5 done: 43891 elements
Sun Dec  6 22:13:14 2009: Level 6 done: 252737 elements
Sun Dec  6 22:13:22 2009: Level 7 done: 1174766 elements
Sun Dec  6 22:13:46 2009: Level 8 done: 4126515 elements
Sun Dec  6 22:15:03 2009: Level 9 done: 9981073 elements
Sun Dec  6 22:18:22 2009: Level 10 done: 14250471 elements
Sun Dec  6 22:23:07 2009: Level 11 done: 9123648 elements
Sun Dec  6 22:26:32 2009: Level 12 done: 956354 elements
Sun Dec  6 22:27:42 2009: Level 13 done: 6 elements
Sun Dec  6 22:28:35 2009: Level 14 done: 0 elements
Sun Dec  6 22:28:35 2009: PANCAKE BFS DONE

_The complete code for the above example: pancakeArray.c. _

_The complete code for the following example: pancakeHashTable.c. _

The third solution to the pancake sorting problem uses a RoomyHashTable. It is similar to the previous solutions, but introduces a new Roomy concept, a reduction. A reduction is used to extract a small amount of information from a Roomy data structure. In the following example, it is used to count the number of elements in a RoomyHashTable. Other possible uses could be: finding the 10 largest elements; calculating the sum of all elements; or taking a random sampling of elements.

This solution uses the permutation ranking and unranking methods given above for the RoomyArray-based solution.

First, we define three RoomyHashTables, which will store: all elements seen so far; the current level; and the next level (similar to the RoomyList-based solution above).

// Three hash tables: all elts seen, cur level, next level
RoomyHashTable* allRHT;
RoomyHashTable* curRHT;
RoomyHashTable* nextRHT;

// The current level
uint8 CUR_LEVEL;

The global variable CUR_LEVEL will store the number of the depth of the current level.

The following three methods will be used to: generate the next level of the search from the current one; remove duplicates from the next level; and record the new non-duplicate elements as "seen".

// Function to be mapped over curRHT to generate next level in nextRHT.
void generateNextLevel(void* key, void* value) {
    uint64* state = (uint64*)key;
    uint8* level = (uint8*)value;
    if(*level == CUR_LEVEL) {
        uint64 nbrs[PERM_LEN-1];
        genAll(*state, nbrs);
        uint8 nextLevel = CUR_LEVEL + 1;
        int i;
        for(i=0; i<PERM_LEN-1; i++) {
            RoomyHashTable_insert(nextRHT, &(nbrs[i]), &nextLevel);
        }
    }
}

// To be mapped over allRHT to remove all seen elements from nextRHT.
void removeDupes(void* key, void* value) {
    RoomyHashTable_remove(nextRHT, key);
}

// To be mapped over nextRHT to add all new elements to allRHT.
void recordNewElts(void* key, void* value) {
    RoomyHashTable_insert(allRHT, key, value);
}

The next two methods will be used to "reduce" the current level to a single integer, counting the number of elements in the hash table. This can be done more efficiently by simply using the RoomyHashTable_size method. The reduction is shown simply for demonstration purposes.

// Reduction functions that count the number of elements.
void addOneToAns(void* ansInOut, void* key, void* value) {
    uint64* count = (uint64*)ansInOut;
    *count = *count + 1;
}
void sumAns(void* ansInOut, void* ansIn) {
    uint64* count = (uint64*)ansInOut;
    *count = *count + *(uint64*)ansIn;
}

The reduction is done in two stages. The first stage reduces all of the elements on a single compute node to a single value, using the first method, addOneToAns. The second stage reduces the partial answers to get the final answer, using sumAns. Because the order of the reduction is not guaranteed, the reduction functions must be commutative and associative.

Now, the main search routine. First, the three RoomyHashTables are created, and the identity permutation is inserted into allRHT and curRHT.

// Top level function for method using one RoomyHashTable
void pancakeHashTable() {
    Roomy_log("BEGINNING %i PANCAKE BFS: RoomyHashTable version\n", PERM_LEN);

    // Set element counters (assuming level 0 has been completed).
    uint64 levelSize = 1;  // number of elements in curRHT
    uint64 totalElts = 1;  // total number of elements see so far

    // Create hash tables
    CUR_LEVEL = 0;
    allRHT = RoomyHashTable_make("allRHT",
                                  sizeof(uint64),
                                  sizeof(uint8),
                                  N_STATES);
    curRHT = RoomyHashTable_make("lev0RHT",
                                  sizeof(uint64),
                                  sizeof(uint8),
                                  levelSize);
    nextRHT = RoomyHashTable_make("lev1RHT",
                                   sizeof(uint64),
                                   sizeof(uint8),
                                   levelSize * PERM_LEN);

    // Set identity element
    Perm ident;
    getIdentPerm(ident);
    uint64 identPos = rank(ident);
    RoomyHashTable_insert(allRHT, &identPos, &CUR_LEVEL);
    RoomyHashTable_sync(allRHT);
    RoomyHashTable_insert(curRHT, &identPos, &CUR_LEVEL);
    RoomyHashTable_sync(curRHT);

    Roomy_log("Level %i done: %lli elements\n", CUR_LEVEL, levelSize);

The main loop will: generate the next level; remove duplicates; and record the non-duplicate elements.

    // While current level is not empty
    while (levelSize > 0) {
        // map over cur level, add neighbors to next level
        RoomyHashTable_map(curRHT, generateNextLevel);
        RoomyHashTable_sync(nextRHT);

        // remove all elts in seen from next (by mapping over seen and calling
        // remove)
        RoomyHashTable_map(allRHT, removeDupes);
        RoomyHashTable_sync(nextRHT);

        // add all elts in next to seen (by mapping over next and calling update)
        RoomyHashTable_map(nextRHT, recordNewElts);
        RoomyHashTable_sync(allRHT);

Next, the levels are rotated: the next level becomes the current level; and a new next level is created. If the current level has M elements, the hash table for the next level will be created with an initial capacity of M * B, where B is the branching factor. (In the case of the 11-pancake problem, B = 10.) Accurately sizing the initial capacity of the RoomyHashTable is important for efficiency. If the initial capacity is too small, the hash table will dynamically double in size as elements are inserted, requiring the entire table to be read and re-written. If the capacity it too large, excess disk space will be used, and more time will be spent reading empty hash table slots.

        // rotate levels
        RoomyHashTable_destroy(curRHT);
        curRHT = nextRHT;
        levelSize = RoomyHashTable_size(curRHT);
        CUR_LEVEL++;
        char levName[1024];
        sprintf(levName, "lev%iRHT", CUR_LEVEL + 1);

        // New table size is 30% bigger than the next level, to avoid doubling
        // of hash table capacity. But, it doesn't need to be bigger than the
        // entire search space. (PERM_LEN-1 is the branching factor.)
        uint64 tableSize = levelSize * (PERM_LEN-1) * 13 / 10;
        if (tableSize > N_STATES) {
            tableSize = N_STATES;
        }

        nextRHT = RoomyHashTable_make(levName,
                                      sizeof(uint64),
                                      sizeof(uint8),
                                      tableSize);

Finally, the size of the current level is calculated using a reduction, and that value is compared to the size returned by RoomyHashTable_size.

        // Confirm size reported by reduction matches level size.
        uint64 numEltsFromReduce = 0;
        RoomyHashTable_reduce(curRHT, &numEltsFromReduce, sizeof(uint64),
                              addOneToAns, sumAns);
        assert(numEltsFromReduce == levelSize);

        totalElts = RoomyHashTable_size(allRHT);

        Roomy_log("Level %i done: %lli elements\n", CUR_LEVEL, levelSize);
    }

    Roomy_log("PANCAKE BFS DONE\n");
}

The output of running this program for the 11-pancake problem is:

Sun Dec  6 23:40:40 2009: BEGINNING 11 PANCAKE BFS: RoomyHashTable version
Sun Dec  6 23:40:41 2009: Level 0 done: 1 elements
Sun Dec  6 23:40:42 2009: Level 1 done: 10 elements
Sun Dec  6 23:40:43 2009: Level 2 done: 90 elements
Sun Dec  6 23:40:44 2009: Level 3 done: 809 elements
Sun Dec  6 23:40:45 2009: Level 4 done: 6429 elements
Sun Dec  6 23:40:46 2009: Level 5 done: 43891 elements
Sun Dec  6 23:40:48 2009: Level 6 done: 252737 elements
Sun Dec  6 23:40:50 2009: Level 7 done: 1174766 elements
Sun Dec  6 23:40:56 2009: Level 8 done: 4126515 elements
Sun Dec  6 23:41:12 2009: Level 9 done: 9981073 elements
Sun Dec  6 23:41:42 2009: Level 10 done: 14250471 elements
Sun Dec  6 23:42:31 2009: Level 11 done: 9123648 elements
Sun Dec  6 23:42:58 2009: Level 12 done: 956354 elements
Sun Dec  6 23:43:06 2009: Level 13 done: 6 elements
Sun Dec  6 23:43:09 2009: Level 14 done: 0 elements
Sun Dec  6 23:43:09 2009: PANCAKE BFS DONE

_The complete code for the above example: pancakeHashTable.c. _


Related

Wiki: Home
Wiki: RoomyInstall

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.