
Algorithm of the Week

Gregory M Lyon, Jay Jay Billings, Dasha

Our mean, scary, big-moustached boss has us investigate "code stuff" once per week. We travelled to many countries and had to go undercover to find the source of truth behind these algorithms.

~ Greg and Forest

Said mean, scary, big-moustached boss is still reviewing the page. If you find an error on it, please contact the team and let us know so that the guys can figure out how to fix it!

~ Jay

2011-12-12 Recursive Quicksort

A pivot-based divide & conquer algorithm for sorting lists.

It first decides on a pivot, then puts numbers greater than the pivot on the right and lesser numbers on the left.

It works down to the bottom, then works its way back up the tree it has created.
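The recursion described above can be sketched as a minimal in-place version (the last-element pivot choice below is one common convention, not the only one):

```java
import java.util.Arrays;

public class Quicksort {
    // Sorts a[lo..hi] in place by recursing on both sides of the pivot.
    static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) return;                  // empty or single element: done
        int pivot = a[hi];                     // last element as the pivot
        int i = lo;                            // boundary of the "lesser" region
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {                // lesser values go to the left
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++;
            }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t; // put the pivot in its final spot
        sort(a, lo, i - 1);                    // work down the left subtree
        sort(a, i + 1, hi);                    // then the right
    }

    public static void main(String[] args) {
        int[] a = {5, 3, 8, 1, 9, 2};
        sort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a)); // [1, 2, 3, 5, 8, 9]
    }
}
```

Each recursive call here is exactly the overhead discussed below: a saved return address plus fresh autovariables (pivot, i, j, t) on the stack.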

Due to the recursive nature of this sort, memory and speed are negatively impacted. Because the method repeatedly calls back into itself, the stack and the per-call overhead quickly become an issue.

Overhead: saving the return address, creating & destroying autovariables.

Stack Overflow: creating too many stack frames will exhaust the stack space and cause a crash, or worse.

When to use: It will always be better to use an iterative quicksort. - Why would you never use it? It's a perfectly good algorithm. ~JJB

Fastest: O(n log n)

Average: O(n log n)

Slowest: O(n^2)

2011-12-19 Non-Recursive Quicksort

Iterative quicksort.

The Quicksort algorithm itself remains the same, while the recursion is eliminated with either a loop (tail recursion) or an explicit stack of pointers (body/head recursion). Iteration allows control of memory management, greatly improving performance and avoiding crashes. (With recursion, the data structures and algorithms chosen will make a difference in performance, but it will never be better to use recursion.)

Iteration: declare a stack holding what each recursive call would have saved (autovariables, the return address, the return value); autovariables are replaced with fields, and a while loop replaces the recursive call (the pending ranges are pushed onto and popped off the stack). Compilers often eliminate tail recursion automatically because it is so easy to transform.
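A minimal sketch of that explicit-stack approach, using java.util.ArrayDeque to hold the sub-ranges that recursion would otherwise have placed on the call stack:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class IterativeQuicksort {
    // Same partition step as the recursive version; the call stack is
    // replaced by an explicit stack of [lo, hi] ranges still to be sorted.
    static void sort(int[] a) {
        Deque<int[]> stack = new ArrayDeque<>();
        stack.push(new int[]{0, a.length - 1});
        while (!stack.isEmpty()) {
            int[] range = stack.pop();
            int lo = range[0], hi = range[1];
            if (lo >= hi) continue;            // nothing to sort in this range
            int pivot = a[hi], i = lo;
            for (int j = lo; j < hi; j++) {
                if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
            }
            int t = a[i]; a[i] = a[hi]; a[hi] = t;
            stack.push(new int[]{lo, i - 1});  // pushed ranges replace recursive calls
            stack.push(new int[]{i + 1, hi});
        }
    }

    public static void main(String[] args) {
        int[] a = {4, 7, 1, 9, 3};
        sort(a);
        System.out.println(Arrays.toString(a)); // [1, 3, 4, 7, 9]
    }
}
```

The stack here grows on the heap rather than the call stack, so a deep partition tree degrades performance instead of crashing the program.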

When to use: For small or moderately sized lists; for very large lists one may use a binary sort instead. A binary sort has a slower start-up, but over time compensates with faster sorting: the fastest case in a binary sort is O(n), while in quicksort it is O(n log n). - What about for large lists? ~JJB

Binary Search

A divide & conquer algorithm that can search for a value in a sorted list.

This dichotomic algorithm is much faster than a linear search, and much more complex.

It is slower, and much less complex, than a hash search. It should not be used on a linked list, because it needs direct access to an array to be even as efficient as a linear search. Otherwise it is acceptable for rather large lists: even approaching billions of entries (whatever no longer fits in the main memory of the computer), it will still operate efficiently.

It is efficient only for large lists, though, because of the initial set-up time for preparing the data structures. After it begins running, it appears linear on a log graph, meaning it takes less and less additional time as the list grows larger.
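The search itself is short. A minimal sketch on a sorted array (the direct indexing of a[mid] below is exactly what a linked list cannot provide efficiently):

```java
public class BinarySearch {
    // Returns the index of key in the sorted array a, or -1 if absent.
    static int search(int[] a, int key) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;   // avoids overflow of (lo + hi)
            if (a[mid] == key) return mid;
            if (a[mid] < key) lo = mid + 1; // key must be in the upper half
            else hi = mid - 1;              // key must be in the lower half
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] sorted = {2, 5, 8, 12, 16, 23};
        System.out.println(search(sorted, 12)); // 3
        System.out.println(search(sorted, 7));  // -1
    }
}
```

Each comparison halves the remaining range, which is where the dichotomic (two-way split) behavior comes from.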

When to use: For large lists.

Fastest: O(1)

Average: O(log n)

Slowest: O(log n)

2012-01-24 Linear Congruential Generator

_A generator that uses a recurrence relation to create pseudo-random numbers._

Linear Congruential Generator (LCG): This is a PRNG. There are three broad classes of random number generator:

True random number generators (TRNG's). This type uses a physical source of randomness to provide truly unpredictable numbers. TRNG's are mainly used for cryptography, because they are too slow for simulation purposes.

Quasirandom number generators (QRNG's). These generators attempt to evenly fill an n-dimensional space with points, without clustering or grouping of points.

Pseudorandom number generators (PRNG's). The most common type of random number generator, PRNG's are designed to look as random as a TRNG, but can be implemented in deterministic software because the state and transition function can be predicted completely.

LCG's are fast and require minimal memory (typically 32 or 64 bits) to retain state, which makes them valuable for simulating multiple independent streams. However, LCG's should not be used for applications where high-quality randomness is critical, since both a long period and good statistical quality are desired there.

The Java Random class uses this algorithm. One of its flaws is that the random numbers it generates "fall mainly in the planes": picking series of numbers from Random, or from any generator using the LCG algorithm, will always introduce artifacts into the combinations of numbers produced. For example, generating strings of a certain length with successive calls to nextInt() will always produce strings with a subtle correlation between the characters in the string.

Another serious drawback: in the numbers produced, randomness depends on bit position. That is, the lower bits will be less random than the upper bits.
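Both properties can be seen in a minimal LCG. The sketch below uses the same constants as java.util.Random (multiplier 0x5DEECE66D, addend 0xB, modulus 2^48), which is also why Random returns the high bits of its state rather than the low ones:

```java
public class Lcg {
    private static final long A = 0x5DEECE66DL;      // multiplier used by java.util.Random
    private static final long C = 0xBL;              // addend
    private static final long MASK = (1L << 48) - 1; // modulus m = 2^48, applied as a bit mask
    private long x;                                  // current state

    public Lcg(long seed) { this.x = seed & MASK; }

    // One step of the recurrence x_{n+1} = (a*x_n + c) mod m.
    public int nextInt() {
        x = (A * x + C) & MASK;  // 64-bit wrap-around plus the mask implements mod 2^48
        return (int) (x >>> 16); // return the top 32 of 48 bits: the lower bits are weaker
    }
}
```

With an odd multiplier and addend, the lowest state bit simply alternates, so discarding the low 16 bits is not optional if the output is to look random at all.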

  1. The Mersenne Twister has a period of 2^19937 - 1 and extremely good statistical quality. However, it presents problems because it has a large state that must be updated serially. Thus each thread must have an individual state in global RAM and make multiple accesses per generated number. In combination with the relatively large amount of computation per generated number, this requirement makes the generator too slow, except in cases where the ultimate in quality is needed.

  2. The XORShift algorithm is a vast improvement on LCG generators. However, it still has at least a couple of weaknesses:

When used "raw", it will never produce the value 0; when used to produce long values as a sequence of calls, there will always be one possible value that won't be produced.

The generator will occasionally go through cycles in which few bits are set in the numbers generated.

XORShift has a period of 2^64 - 1. LCG's, by contrast, use a transition function of the form x_{n+1} = (a * x_n + c) mod m. The maximum period of the generator is m (assuming the triple (a, c, m) has certain properties), but this means that with a 32-bit integer the period can be at most 2^32, which is far too low. LCG's also have known statistical flaws, making them unsuitable for modern simulations. Note that "long period" doesn't necessarily mean "high quality generator"; memory usage plays a large role as well.

Example:

    private long x = 8682522807148012L; // state; the seed must be non-zero

    public long randomLong() {
        x ^= (x << 21);
        x ^= (x >>> 35);
        x ^= (x << 4);
        return x;
    }
  3. Multiply-with-carry: George Marsaglia's algorithm, which combines the efficiency of XORShift with a higher period than the Mersenne Twister. This is the best overall PRNG currently available.
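A sketch of the multiply-with-carry idea with base b = 2^32. The multiplier and seed below are illustrative assumptions: in practice the multiplier a is chosen so that a * 2^32 - 1 is a safe prime, which is what yields the long period.

```java
public class Mwc {
    // Assumed multiplier for illustration; a real one is picked so that
    // A * 2^32 - 1 is a safe prime.
    private static final long A = 4294957665L;
    // State packs the current value (low 32 bits) and the carry (high 32 bits).
    private long state = 0x853C49E6748FEA9BL; // arbitrary non-zero seed

    public int nextInt() {
        long x = state & 0xFFFFFFFFL; // current value
        long c = state >>> 32;        // current carry
        state = A * x + c;            // new value in the low word, new carry in the high word
                                      // (Java's wrap-around matches unsigned 64-bit arithmetic)
        return (int) state;           // emit the low 32 bits
    }
}
```

The whole step is one multiply and one add on a 64-bit word, which is where the XORShift-like efficiency comes from.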

When to use: When statistical quality and security matter less than speed in your operation.

2012-03-15 Smart Pointer

_A C++ pointer that deletes/deallocates itself when it is no longer needed._

A pointer is a simple, more concrete implementation of the more abstract reference data type. It "points to" another value stored in the computer memory using its address.

A smart pointer is an abstract data type that simulates a pointer while providing additional features, such as automatic garbage collection or bounds checking. It is intended to reduce bugs caused by the misuse of pointers while retaining efficiency. All references in Java are essentially intrinsically smart pointers.

The types of smart pointers are:

shared_ptr<T>: Generic and most versatile; uses a reference count to determine when the object is no longer needed.

scoped_ptr<T>: A pointer automatically deleted when it goes out of scope. No assignment is possible, but there is no performance penalty compared to "raw" pointers.

intrusive_ptr<T>: A reference-counting pointer providing better performance than shared_ptr, but requiring the type T to provide its own reference-counting mechanism.

weak_ptr<T>: A weak pointer, working in conjunction with shared_ptr to avoid circular references.
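The reference-counting idea behind shared_ptr (and intrusive_ptr) can be sketched in Java. The class below is a hypothetical illustration of the mechanism, not an API from any library:

```java
// Hypothetical sketch of the reference count behind shared_ptr<T>:
// each copy bumps the count; each release decrements it, and the
// resource is "freed" once the count reaches zero.
public class SharedRef<T> {
    private final T value;
    private final int[] count; // one counter shared by every copy

    public SharedRef(T value) { this.value = value; this.count = new int[]{1}; }

    private SharedRef(T value, int[] count) { this.value = value; this.count = count; }

    // Equivalent of copying a shared_ptr: same object, count incremented.
    public SharedRef<T> copy() {
        count[0]++;
        return new SharedRef<>(value, count);
    }

    // Equivalent of a shared_ptr going out of scope.
    public void release() {
        if (--count[0] == 0) {
            System.out.println("freeing " + value); // stand-in for delete
        }
    }

    public int useCount() { return count[0]; }
}
```

In C++ the copy and release steps run implicitly in the copy constructor and destructor, which is exactly what makes the deallocation automatic.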

When to use: The negligibly small extra overhead, versus the safety and efficiency gained, makes the smart pointer always better to use than a standard pointer.

