4tH comes with a wide variety of sorting algorithms. I have to admit, that's not just to offer the user a large choice, I find it also fun to write and test them, so I'm always looking at new ones.
One of the more obscure ones I found was one by M.O. Oyelami, titled "A Modified Diminishing Increment Sort for Overcoming the Search for Best Sequence of Increment for Shellsort" and published in 2008 in the "Journal of Applied Sciences Research".
It works like this: Oyelami does one iteration of switching the first element with the last element, then the second with the second last element and so on until he reaches the middle. Then he does an ordinary Insertion sort.
Although I was unable to replicate his Shell sort speeds, it got me thinking. What if I made it a recursive algorithm, splitting the array in two equal parts and then repeat the procedure until the resulting array was only one element long?
Well, it worked a bit, but I didn't get a sorted array after one iteration. So I did another one and another one. That worked for arrays that were a power of two, but those who weren't were left unsorted.
It was the middle element that stayed stationary, spoiling all the fun, so I decided to add another step, which was to check the middle element and push it out of its comfort zone if required. That did it. And it did it fast, much faster than Oyelami sort. I decided to give it a name: Circlesort, because when you see the algorithm in a diagram there are lots of concentric circles.
The final step was to add it to Rosetta code as a "draft task". The implementers of the various languages quickly found out that both pointers met in the middle. When I optimized the 4tH version accordingly, I discovered that you could do without the extra code when you simply used the pointers as indexes for the next iteration. Sure, that destroyed the parallelization for arrays that are not a power of two, but the code was much more elegant that way.
Now the arduous task of working out its characteristics came. In short, it is an unstable, recursive, parallelizable, in place sorting algorithm, that shares many characteristics with Merge sort, having a typical performance of O(n log n log n). An array can be considered sorted when no swaps have been performed during an iteration.
It's one of the fastest ways to sort an inverted array, although when even one single element is out of place, this advantage disappears almost completely. Another interesting variation is to sort it up to 0.5 log n iterations and subsequently perform a Binary Insertion sort - it's much faster.
Note that although the highest and lowest values are known after each iteration, there is little to be gained by tightening up the loops. First, it decrements linearly, while the number of iterations decrements logarithmically. Second, test runs shows it is actually slower. The algorithm itself is deviously simple, just two loops and two branches:
int InnerCircle (int* a, int* b)
{
int* sta = a; /* calculate start address */
int* end = b; /* calculate end address */
int s = 0; /* number of swaps */
int t; /* temporary value for swapping */
if (sta == end) return (0); /* if array size is larger than 1 */
/* then perform the sort */
while (sta < end) /* sort as long as begin address */
{ /* is smaller than end address */
if (*sta > *end) /* if first element is larger than end */
{ /* then swap 'em */
t = *sta; *sta = *end; *end = t; s++;
} /* and increment no. swaps */
sta++; end--; /* now adjust addresses for next */
} /* iteration */
s += InnerCircle (a, end);
s += InnerCircle (sta, b);
return (s); /* do nothing, just pass through */
}
void CircleSort (int* a, int n)
{
while (InnerCircle (a, a + n - 1));
}
In this video you can see vanilla Circlesort, vanilla Circlesort + plain Insertionsort and finally good old Quicksort at work - on random data, that is. You'll see that plain Circlesort wastes much time on getting the final items in their proper position, so using Insertion sort (or an optimized Gnomesort) to clean up the mess is not too bad an idea.
For those who prefer a diagram to study its inner workings, here it is:
Yes, we previously stated it is a recursive algorithm, but it's not hard to turn it into an iterative algorithm by using a stack. I'm a Forth programmer, so I know my stacks. It's quite small, but still vastly oversized for all current architectures.
int CircleSort (int* myarray, int n)
{
typedef int* ip;
ip a;
ip b;
ip sta; /* calculate start address */
ip end; /* calculate end address */
int s; /* temporary value for swapping */
ip stack [128];
ip* sp = stack;
do {
*sp = myarray; sp++; *sp = myarray + n - 1; sp++;
s = 0;
while (sp != stack)
{
sp--; b = end = *sp;
sp--; a = sta = *sp;
while (sta < end) /* sort as long as begin address */
{ /* is smaller than end address */
if (Compare (sta, end)) /* if first element is larger than end */
{ /* then swap 'em */
Swap (sta, end); s++;
} /* and increment no. swaps */
sta++; end--; /* now adjust addresses for next */
} /* iteration */
if (sta < b) { *sp = sta; sp++; *sp = b; sp++; }
if (a < end) { *sp = a; sp++; *sp = end; sp++; }
}
} while (s);
}
If you need a little more "oompf" you can change the inner loop accordingly:
while (sta < end) /* sort as long as begin address */
{ /* is smaller than end address */
if (Compare (sta, end)) /* if first element is larger than end */
{ /* then swap 'em */
Swap (sta, end); s++;
if ((sta + 1) < (end - 1))
{
if (Compare (sta, sta+1)) { Swap (sta, sta + 1); s++; }
if (Compare (end-1, end)) { Swap (end - 1, end); s++; }
}
} /* and increment no. swaps */
sta++; end--; /* now adjust addresses for next */
} /* iteration */
This will diminish both the number of of swaps and comparisons by significantly reducing the number of iterations on large arrays (> 100,000 elements). The price is a small increase in the number of swaps and comparisons on smaller arrays. Under the right conditions it is almost twice as fast.
Don't think that "what's worth doing is worth overdoing" - if you further increase the number of elements tested it will simply run slower. As you can see by the demonstration below, the distribution of the array is much smoother after the first iteration compared to the vanilla algorithm.
The power of Circlesort is that it is a very simple algorithm - and easy to remember. Note that some computer science faculties don't want to cover Bubblesort anymore, because they think it's such a braindead, crippled algorithm. Circlesort is a good substitute, because it is just as simple but really delivers a punch. Although it doesn't reach the raw speed of Quicksort on random data, its average performance is more predictable (Best: O(n log n), Worst: O(n log n log n)).
A paper on the subject can be found here.