CUDALucas - Browse /CUDA Libs at SourceForge.net

The interactive file manager requires Javascript. Please enable it or use sftp or scp.
You may still browse the files here.
Name	Modified	Size	InfoDownloads / Week
Parent folder
CUDA-10.1-Libs-Windows.7z	2019-05-20	83.6 MB	0
CUDA-9.2-Libs-Windows.7z	2019-05-20	51.5 MB	0
CUDA-7.0-Libs-Windows.7z	2019-05-20	32.1 MB	0
CUDA-8.0-Libs-Windows.7z	2019-05-20	92.0 MB	0
CUDA-7.5-Libs-Windows.7z	2019-05-20	66.2 MB	0
CUDA-3.2-Libs-Windows.7z	2017-04-19	3.9 MB	0
CUDA-4.1-Libs-Windows.7z	2017-04-19	4.9 MB	0
CUDA-4.0-Libs-Windows.7z	2017-04-19	7.4 MB	0
CUDA-4.2-Libs-Windows.7z	2017-04-19	6.7 MB	0
CUDA-6.5-lib64-linux.tar.bz2	2015-02-27	29.6 MB	0
CUDA-6.0-lib64-linux.tar.bz2	2015-02-27	29.0 MB	0
CUDA-5.5-lib64-linux.tar.bz2	2015-02-27	20.5 MB	0
CUDA-5.0-lib64-linux.tar.bz2	2015-02-27	7.9 MB	0
CUDA-4.2-lib64-linux.tar.bz2	2015-02-27	5.4 MB	0
CUDA-6.5-Libs-Windows.7z	2014-10-17	42.3 MB	0
CUDA-6.0-Libs-Windows.7z	2014-04-28	53.2 MB	0
CUDA-5.5-Libs-Windows.7z	2014-01-12	26.8 MB	0
CUDA-5.0-Libs-Windows.7z	2014-01-12	10.2 MB	0
CUDA-Libs-Linux.tar.gz	2012-08-03	5.9 MB	1
Totals: 19 Items		579.3 MB	1
####################
# CUDALucas README #
####################

CUDALucas v2.06

Content

0    What is CUDALucas?
1    Supported Hardware
2    Compilation
 2.1 Compilation (Linux)
 2.2 Compilation (Windows)
3    Running CUDALucas
 3.1 Running CUDALucas (Windows)
4    How to get work and report results from/to GIMPS
6    Tuning
7    FAQ
8    To do list

################
# Info request #
################

If you have contributed to CUDALucas, please contact flashjh and owftheevil
on mersenneforum.org so we can add you to the list of developers. Please
include your contribution to CUDAlucas so it can be added to this README.

################
# Contributors #
################

If you would like to add or change this info, please send the updates to
flashjh and owftheevil

Non-inclusive list:

msft: Mr. Yamada, original and primary developer of the mathematics code
aspen:
Brain:
TheJudger:
monst:
Bdot:
Ethan (EO):
Dubslow:
Prime95:
owftheevil:
flashjh: Windows compiling and miscellaneous updates and testing

#######################
# 0 What is CUDALucas #
#######################

(See https://sourceforge.net/p/cudalucas/wiki/Home/ for a version of this
information that includes some links.)

CUDALucas is a program implementing the Lucas-Lehmer primality test for
Mersenne numbers using the Fast Fourier Transform implemented by nVidia's
cuFFT library. You need a CUDA-capable nVidia card with compute
compatibility >= 1.3.

Mersenne numbers are numbers of the form 2^p - 1. It is possible that some
of these numbers are prime. For instance, 2^7-1 is prime, 2^127-1 is prime,
and 2^57,885,161-1 is prime (and is also the largest known prime number in
the world). For various reasons explained in the Wikipedia article,
throughout almost all known history, the largest known prime number has
been a Mersenne prime.

Most CUDALucas users that the developers are aware of use this program to
help search for Mersenne primes in coordination with the Great Internet
Mersenne Prime Search. It is one of the internet's first distributed
computing projects, started in 1996. Since that year GIMPS has found all of
the largest known prime numbers. GIMPS searches for primes by doing some
"trial factoring" to find a small factor of a Mersenne number. If that
fails, then GIMPS performs the Lucas-Lehmer test to determine once and for
all if a Mersenne number is prime. You can participate without needing to
be aware of the mathematics involved. All you need to do is download and
run the free program GIMPS provides called Prime95.

However, Prime95 is optimized for CPUs. In the last few years, volunteer
developers from the GIMPS community have ported various parts of Prime95's
functionality to GPUs. Shoichiro Yamada took a CPU-based Lucas-Lehmer
testing program (written in generic C, as opposed to the x86-specific
assembly of Prime95) and ported it to CUDA. This is now known as CUDALucas.

The other GPU programs are mfaktc, a program using CUDA GPUs to perform the
"trial factoring" mentioned above, and mfakto which is a port of mfaktc to
OpenCL, supporting AMD/ATI graphics cards. (mfakto's developer maintains a
GitHub page for it. Both programs are free-and-open-source software under
the GPL.)

To participate in GIMPS yourself using CUDALucas (assuming you have the
necessary CUDA hardware listed above), all you need to do is go read
section 4. (and of course follow the directions given there.)


########################
# 1 Supported Hardware #
########################

CUDALucas should run on all CUDA capable Nvidia GPUs with compute
capability >= 1.3. Unfortunately this obviously excludes AMD/ATI GPUs, but
there are other programs for such cards that you can use to help GIMPS
(see section 0). 1.x is depreciated in CUDA 6.5 and may be removed from
newer versions of CUDA. 1.x is removed from CUDA >=7.0. 2.x is depreciated
in CUDA 8.0 and 2.x is removed from CUDA >=9.0.


#################
# 2 Compilation #
#################

You must have the CUDA Toolkit installed, as well as a C compiler.
gcc or MSVC will do in a pinch. The CUDA Toolkit includes nVidia's CUDA
compiler, as well as some necessary library and include files.
http://developer.nvidia.com/cuda-toolkit

Use the latest toolkit plus the defaults in the Makefiles, as required.

There are some different 'make' commands you can run, but at the moment
none of them does anything particularly interesting.


###########################
# 2.1 Compilation (Linux) #
###########################

compilation is not necessary, get a binary file from
https://sourceforge.net/projects/cudalucas/files/

To compile you will need the source files: CUDULucas.cu, cuda_safecalls.cu,
parse.h, and parse.c, and the Makefile. In the Makefile, adjust the CUDA
path to  point to your cuda installation (default: /usr/local/cuda).
You can produce a smaller executable by including only the architectures
you need to support. For example, if you only intend to run CUDALucas on a
GTX570, The CUFLAGS line could look like this:

CUFLAGS = -O$(OptLevel)  --generate-code arch=compute_20,code=sm_20
          --compiler-options=-Wall -I$(CUINC)

Now just run make from the folder where the source files are located.

#############################
# 2.2 Compilation (Windows) #
#############################

compilation is not necessary, get a binary file from
https://sourceforge.net/projects/cudalucas/files/

MSVS:
MSVS can make debug and non-debug versions from CUDA 4.0 and up.  To CUDA
4.0 thru 6.5 you need to have the applicable CUDA toolkit installed and
MSVS 2012.  CUDA 6.5 requires the toolkit and MSVS 2012 (last verified
with CUDA 5.5)
---------------------------------------------------------------------------
Detailed instructions for CUDA 4.0 to 5.0 are around on the internet
(or PM flashjh on mersenneforum for info)
---------------------------------------------------------------------------

How to create MSVS2012 solution from the latest Cudalucas 2.06, CUDA 6.5:

** You must have MSVS2012, the CUDA 5.5 Toolkit and current drivers
installed first **

1. Make new cuda project using project wizard
2. Delete kernel.cu that were created by default

Copy the following files to the project folder, then add them into
project (drag/drop into the MSVS GUI solution explorer)
3. Add cuda_safecalls.cu to project
4. Add cudalucas.cu to project
5. Add parse.c to project
6. Add parse.h to project

Project properties (These steps must be done for debug and release, Win32
and x64 options, as applicable!):

7. Linker|Input|Additional Dependencies, add cufft.lib after cudart.lib,
   if not already there
8. Linker|Debugging|Generate Debug Info, 'No' for Release, 'Yes' for Debug
9. CUDA C/C++|Common, change target machine platform to 64-bit or 32-bit
10. CUDA C/C++|Device, change code generation to: compute_13,sm_13;
    compute_20,sm_20;compute_30,sm_30;compute_35,sm_35;compute_50,sm_50
11. C/C++|Code Generation|Runtime Library, change to Multi-threaded (/MT)
    (release) or Multi-threaded Debug (/MTd) (debug)
12. Build Events|Post-Build Events|Command Line, add:

echo copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
copy "$(CudaToolkitBinDir)\cudart*.dll" "$(OutDir)"
echo copy "$(CudaToolkitBinDir)\cufft32*.dll" "$(OutDir)"
copy "$(CudaToolkitBinDir)\cufft32*.dll" "$(OutDir)"
echo copy "$(CudaToolkitBinDir)\cufft64*.dll" "$(OutDir)"
copy "$(CudaToolkitBinDir)\cufft64*.dll" "$(OutDir)"

13. Configuration Properties|General (as desired,these are just examples)
OUTPUT DIRECTORY: ..\..\..\Test\
DEBUG NAME: debug_$(ProjectName)-$(CudaToolkitVersion)-$(Platform)_r##
RELEASE NAME: (ProjectName)-$(CudaToolkitVersion)-$(Platform)_r##

WINDOWS MAKE:
This can make non-debug versions from CUDA 4.0 and up.  To compile CUDA 4.0
thru 6.5 you need to have the applicable CUDA toolkit installed and MSVS
2012.  CUDA 6.5 requires the toolkit and MSVS 2012.

** You must have the correct MSVS, the applicable CUDA Toolkit and current
drivers installed first **

1. Obtain make.exe from here:
   http://www.equation.com/servlet/equation.cmd?fa=make
2. Place make.exe into the folder with the sourcefiles and makefile.win
3. Use MUST use the 'command shortcut' included with the appropriate
   version of MSVS
   If you want x86, use the x86 and if you want x64, use x64.
   Use the shortcut from MSVS 2010 for CUDA 4.0 to CUDA 5.0 and MSVS 2012
   for CUDA 5.5/6.5
4. Open makefile.win and set your desired bit level, cuda and version and
   location of MSVS then save
5. Type: make -f makefile.win
6. When complete type make -f makefile.win clean
7. The executable is placed one directory up from your source files


##############################
# 3 Running CUDALucas        #
##############################

CUDALucas is designed to be primarily driven with the information in
CUDALucas.ini. (If you don't have a copy of that file, go to
https://sourceforge.net/projects/cudalucas/files/ to get the latest
version.) You can run CUDALucas from the command line without any
arguments, and it should read CUDALucas.ini (it should be in the same
directory) and start crunching.

CUDALucas reads what numbers to test from a "work file". The default work
file is "worktodo.txt", however you can change the name of that file in
CUDALucas.ini. The information in the work file should like something like:
Test=25613431 or DoubleCheck=25613431

CUDALucas will interpret this to mean "Test 2^25613431-1 to see if it's a
prime number." See section 4 on how to get numbers to test that haven't
been tested before. (This is done through GIMPS.) CUDALucas will keep
crunching numbers as long as there are assignments in your work file; it
will terminate if the file is empty. Alternately, you can just pass in a
single exponent as a command line argument, and CUDALucas will then test
2^arg-1 and exit.

When it's done testing a number, CUDALucas will output the results to a
"results file", which defaults to "results.txt"; again, however, you can
change that using the .ini file. We highly encourage you to report your
results to GIMPS (see section 4). You can keep track of your results if
you create an account with GIMPS.

You can modify a number of options that change how CUDALucas behaves;
again, see CUDALucas.ini. You can also specify any of those options from
the command line; try running "./CUDALucas -h" from a terminal to see what
options you can use. Also note that there is a self test mode and a
benchmark mode that can only be specified from the command line.

Note that you need library files to run CUDALucas; these are "cudart.dll"
and "cufft.dll" for Windows, and "cudart.so" and "cufft.so" for Linux. In
Windows, it's sufficient to put the .dll files into the same directory as
the executable; in Linux, you have to set the LD_LIBRARY_PATH environment
varibale to include the directory where the .so's are located.

It is safe to kill CUDALucas with a Ctrl+C (or by most any other method) at
any time. It will write a save file and exit. When you next run CUDALucas,
it will detect that there is a save file and resume.

Please feel free to ask for help on SourceForge or MersenneForum if this
isn't clear.


###################################
# 3.1 Running CUDALucas (Windows) #
###################################

Read the section above first.

Though CUDALucas is called from the command line, you can modify its
behavior with CUDALucas.ini, which means you don't need to pass arguments
on the command line. What this means for Windows users is that you can
right click on the executable and create a shortcut. Double clicking on the
shortcut should launch CUDALucas in a terminal where you can watch it
crunch. (The drawback to this is that if CUDALucas exits with an error, the
terminal will automatically close and you won't see the error message.)

############################
# 3.2 Command line options #
############################

-h                     prints a help message listing most of the command
                       line options and exits.

-v                     prints the program version number and exits.

-info                  causes current device info to be printed to the
                       screen at the beginning of the first test.

-k                     enables keyboard input during test, see ini file
                       description.

-polite n              sets the polite iteration interval to n, or disables
                       polite option if n = 0.

-d n                   sets CUDALucas to run on device d (default is 0)

-c n                   sets checkpoint iteration value. Checkpoints will be
                       written every n iterations.

-x n                   sets report iteration value. Screen reports will be
                       written every n iterations.

-f n<k|K|m|M>          sets fft length to n, n * 1024 (if k or K specified)
                       or n * 1048576 (if m or M specified). Values of n
                       that are not mutiples of 1024, or do not end in k,
                       K, m, or M will be rejected.

-threads m s           sets thread values for the multiplication and
                       splicing kernels. m and s should be powers of two
                       between 32 and 1024.

-i filename            sets the name of the file that the initialzation
                       information is to be obtained from. Default is
                       CUDALucas.ini.

-s <folder>            saves all checkpoint files to subdirectory
                       specified by "folder". Default folder is "savefiles"

-r n                   runs the short (n = 0) or long (n = 1) version of
                       the selftest.

-cufftbench s e i      s = lower test boundry (start)
                       e = upper test boundry (end)
                       i = number of iterations

                       Description: test cufft lengths i repetitions of a
                       50 LL iteration loop, for all reasonable fft lengths
                       between s * 1024 and e * 1024, then writes the
                       fastest fft lengths in the file <gpu> fft.txt.
                       Reasonable lengths are n * 1024 where the largest
                       prime factor of n is 7.

-threadbench s e i m   s = lower test boundry (start)
                       e = upper test boundry (end)
                       i = number of iterations
                       m = binary value for test (0 thru 16) (see below)

                       times i repetitions of a 50 LL iteration loop, for
                       certain ffts lengths between s * 1024 and e * 1024.
                       Each tested fft length gets combined with different
                       thread values for the multiplication and splicing
                       kernels. The fastest thread values for each fft are
                       written in the file <gpu> threads.txt. The
                       parameter m gives some control over which fft
                       lengths are tested, which thread values are tested,
                       and screen output:
                       bit 0: if set, only fft values from <gpu> fft.txt
                              will be tested, otherwise, all reasonable fft
                              lengths will be tested.
                       bit 1: if set, skips thread value 32.
                       bit 2: if set, skips thread value 1024.
                       bit 3: if set, supresses intermediate output: only
                              the optimal thread values for each fft will
                              be printed to the screen.


-memtest s i           s = # of chunks of memory
                       i = number of iterations

                       tests s 25MB chunks of memory doing i repetitions of
                       a 100,000 iteration loop on each of 5 different LL
                       test related sets of data. Each iteration consists
                       of copying a 25MB chunk of data, then re-reading
                       and comparing that copy to the original.

(see CUDALucas.ini for more info)

###################################################################
# 4 How to get work and report results from/to the GIMPS server   #
###################################################################

You can get numbers to test from the GIMPS server, which is called
PrimeNet. It is located at http://www.mersenne.org/. You can get and
work anonymously, however to track what numbers you've tested and track
your credit, you must create an account. You don't even need to enter your
email for an account, though of course it's easier if you ever lose your
login information :)

Getting work:
    Step 1) go to http://www.mersenne.org/ and (optionally) login with your
            username and password
    Step 2) on the menu on the left click "Manual Testing" and then
            "Assignments"
    Step 3) Choose the number of assignments you want. Note that even the
            smallest assignment will take a few days to complete, so we
            recommend you start with just one and come back for more when
            you know how fast you can complete work.
    Step 4) Choose your preferred work type. There are a variety of choices
            here; you can choose the default "World record tests", which
            means if your number is prime, it would be a world record.
            "Smallest available first time tests" might be
            not-World-record, though practically it's exactly the same as
            "World record tests". "100 million digits" is way beyond what's
            currently feasible, and will take months or years to complete
            one test. This isn't recommended. Finally, "Double Check tests"
            is where you get numbers that have been tested once, but
            haven't been double checked. Though Double Checking sounds less
            glamorous, we currently recommend this work type. Not only are
            the assignments shorter, but at the moment, two matching
            CUDALucas tests will not mark an number as "Double Checked".
            (This is for safety reasons, and hopefully we'll add more
            functionality in the future to remove this restriction.)
            What this means is that it's safer for CUDALucas to test
            numbers that have been tested once with Prime95, though some
            people do first time tests anyways.
    Step 5) Click the button "Get Assignments"
    Step 6) Copy and paste the "Test=..." (or "DoubleCheck=...") lines
            directly into your work file (default "worktodo.txt") in your
            CUDALucas directory.

Now you're all set. :) Just launch CUDALucas and watch it crunch :)

Once CUDALucas has finished a test, report the result to PrimeNet:
    Step 1) go to http://www.mersenne.org/ and (optionally) login with
            your username and password. (Again, if you want to track your
            credit, logging in is necessary.)

    Step 2) On the menu on the left click "Manual Testing" and then
            "Results"
    Step 3) Upload the results file (default "result.txt") generated by
            CUDALucas by using the "Search" and "Upload" buttons.
    Step 4) Once PrimeNet responds with a verification message, you can
            either delete your results file or move the data to a different
            file.

Advanced usage (set the FFT length):
    Using the cufftbench and the threadbench options above, CUDALucas is
    cabable of selecting the best FFT length for the exponent being tested.
    If you desire, you can specify the FFT length by adding a field to the
    "Test=..." assignment line in the work file. (e.g.) To use a 1440K
    length for a test, the line should look like
    "Test=<assignment key>,<exponent>,1440K".
    Note that no space is allowed between the number (1440) and the K. You
    must have a K or M (e.g. "...,<exponent>,3M" for a 3M length) for the
    program to recognize the field as an FFT length.

    The FFTLength ini option and the -f command line are depreciated and
    may be removed from future releases.



##################
# 5 Known issues #
##################

- The user interface is not hardened against malformed input. There are
  some checks but when you really try you should be able to screw it up.

- The GUI of your OS might be very laggy while running CUDALucas

  If you're experiencing this problem, try setting "Polite" to 1 and
  "PoliteValue" to 50 in CUDALucas.ini. (decreasing PoliteValue increases
  the gpu wait time)

- For those experiencing driver stops: This is an nVidia driver issue.
  Here is some info and some workarounds:

  The problem started in driver version 310.70 and is reported to only
  effect compute version 2.0 (5xx series) cards.  Here is (known) info
  about the bug:

     It hangs during a cufft call.
     It is specific to compute 2.0 cards.
     It is most likely not a problem with cufft:
       cuftt4.2 with Nvidia driver <=310.70 works,
       cufft4.2 with driver >310.70 shows the bug.
     In Linux, the driver is reset inside CUDALucas, no user action is
     necessary. In Windows, the devices are deactivated after the timeout,
     so CUDALucas needs to be restarted.

  Workarounds (no fix because nVidia hasn't fixed the driver)
     - Downgrade to nVidia driver version <=306.97 and run applicable CUDA
       version
     - Use a batch file to restart CUDALucas after it stops. This batch
       file will open CUDALucas and count restarts (copy and paste in a
       .bat file)

        @echo off
        Set count=0
        Set program=CUDALucas
        :loop
        TITLE %program% Current Reset Count = %count%
        Set /A count+=1
        rem echo %count% >> log.txt
        rem echo %count%
        %program%.exe
        GOTO loop

     - Go into the registry and modify (or add) the TdrDelay DWORD.  Make
       it 128 (dec). Restart the system and try again. Use the batch file
       to track driver stops. Modify the
       "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\
       GraphicsDrivers" TdrDelay (DWORD)

     - Buy a new GPU :)

   The driver stop issue does not seem to cause bad results.  Many
   successful tests were done with 2.06 Beta, CUDA 4.0 up to 6.5, 32 and 64
   bit involving many stops, restarts, and forced FFT size changes. If
   you're experiencing the issue, just get a workaround going and you'll
   be ok.

- Overclocking is very bad for CUDALucas.  CUDALucas relies on perfect
  computation of every single iteration for each test result.  Through
  extensive testing, overclocking has been shown to cause bad results.  If
  you must overclock, you should use the memtest option extensively to
  verify your memory is good and stable before trying to produce 'real'
  results. Until your system is stable, it is recommended you run several
  doublecheck tests as to be able to verify your reliability before going
  at 1st time LL tests.

############
# 6 Tuning #
############

Read CUDALucas.ini (you should have already read it in any case). You can
also activate extra error checking, as well as an option to save all
checkpoint files instead of just the most recent ones.You can also
activate as an option to save all checkpoint files instead of just the most
recent ones.

Suggested tests:

 CUDALucas -cufftbench 1024 8192 5
 CUDALucas -threadbench 1024 8192 5 0
 CUDALucas -r 1
 CUDALucas 6972593

A new card should have some integrity checks run on it. Options -r 0 or
-r 1 will run self tests which check residues after 10000 iterations for
various exponents. -r 0 runs a short test, testing only a few known
Mersenne primes. -r 1 is a more thourough test. Any residue mismatches in
these tests usually indicate memory problems with the card. For a more
complete memory test, use -memtest n i. Choose n and i so that the test
runs for at least a few hours. If memory errors are detected, decrease the
memory clock until the errors go away. An additional 1-2% decrease from the
initial stability point is recommended.

cufftbench
 To optimize fft selection for your card, run -cufftbench s e i
 All reasonable fft lengths between s * 1024 and e * 1024 will be tested i
 times, where reasonable is defined as 7 smooth multiples of 1024. Cards
 driving a display usually require higher values of i. The results of the
 test are written in <gpu> fft.txt. Any old version of the fft.txt file is
 saved with a time stamp added to the file name. The fft.txt file consists
 of the fastest fft lengths for the particular card, listed in increasing
 order. Each line starts with the fft length (as a multiple of 1024) and
 also includes an estimate of the largest exponent that fft length can be
 used with and the iteration time. The fft.txt can be edited, but it
 requires the fft length to be the first entry on any line, and that the
 ffts are listed in increasing order. Any line without an initial numerical
 entry is ignored.

threadbench
 The option -threadbench s e i m times different threads settings for two
 kernels, the kernel that does the pointwise squaring and the carry
 splicing kernel. fft lengths from s * 1024 to e * 1024 are tested, using
 values from 32 to 1024 for the threads settings. Just as with the
 cufftbench option, i iterations are done at each setting. Larger values of
 i are needed for cards driving a display. The fastest times and associated
 threads values are appended to <gpu> threads.txt. Only the most recent
 results are used. This file can also be edited manually. Each line should
 start with an fft length (as a multiple of 1024), followed by the threads
 value for the squaring kernel and then the threads value for the splicing
 kernel. The threads values must be powers of 2 between 32 and 1024.

Error check interval
 If set to n, will check the roundoff error once every n iterations.
 Slowest, but most accurate is with ErrorIterations=1. With any larger
 value, the reported roundoff error is most likely smaller than the largest
 roundoff error. Error thresholds are accordingly reduced for such values.
 For example, if the error threshold is set to 45 and ErrorIterations is 1,
 then any roundoff error <= .45 is ignored. Any roundoff error > .45
 triggers the error handling routines. But for ErrorIterations set to 100,
 any roundoff errors > .35 will trigger the error handling routines.

Screen report interval
 Screen report iterations involve extra memory writing on the device, as
 well as a memory transfer from device to host, together with some minimal
 host processing. Very frequent screen reports (once every few seconds)
 result in a noticeable slowdown and increased cpu utilization.

Checkpoint interval
 Checkpoint iterations involve a significant amount of cpu processing
 preparing the checkpoint file, besides writing the checkpoint file to the
 disk and backing up the old checkpoint file. Because of this, checkpoint
 iterations take significantly longer than non-checkpoint iterations.
 Less frequent checkpoints mitigate this delay, but risk losing time in
 case of power outage or other unexpected termination of the program.

Error Reset
 At each screen report, a roundoff error is computed. This roundoff error
 is either the the largest roundoff error encountered since the last report
 or a percentage of the last reported roundoff error, whichever is larger.
 The percent is given by the variable ErrorReset. Recording a new roundoff
 error is a slow process. Larger values of the error reset variable skip
 recording a new value more often, speeding up the iteration times. Small
 values report smaller roundoff errors that larger values ignore.

Polite
 The polite option can be used to introduce some idle time to the gpu.
 If Polite=1 and PoliteValue=n, then once every n iterations the gpu is
 synchonized, preventing any new work from being scheduled until all
 previouly assigned work is completed. n=50 is default in .ini

#########
# 7 FAQ #
#########

Q. What's new in 2.06?

A. - RCB
   - On-the-fly FFT selection.  Keyboard driven or automatic if error level
     exceeds threshold
   - Included GPU memtest and tools to automate FFT finetuning and thread
     selection
   - Bit shift to prevent errors from producing similar results

Q. Does CUDALucas support multiple GPUs?

A. Yes, with the exception that a single instance of CUDALucas can only
   use one GPU. For each GPU you want to run CUDALucas on you need (at
   least) one instance of CUDALucas. For each instance of CUDALucas you can
   use the commandline option "-d <GPU number>" to specify which GPU to use
   for each specific CUDALucas instance. Please read the next question,
   too.

Q. Can I run multiple instances of CUDALucas on the same computer?

A. Yes! You can even run more than one instance from the same dir. Use the
   "CUDALucas -i <ini filename>" command line option to specify an INI file
   other than the default "CUDALucas.ini". Each instance must have its own
   work file, however it is safe for all instances to print to the same
   results file. It is NOT safe for two instances to test the same exponent
   -- they will clobber each others' save files.

Q. Can I continue (load a checkpoint) from a 32bit version of CUDALucas
   with a 64bit version of CUDALucas (and vice versa)?

A. Yes!

Q. Can I continue from an old version of CUDALucas to a new version without
   losing my spot in the check?

A. No. Version 2.06 uses a different checkpoint file format from previous
   versions, as well as a crc to ensure integrity of the data in the
   checkpoint file. Running CUDALucas in a directory with an old version
   of the checkpoint file will cause the test to restart at iteration 0,
   eventually replacing the old checkpoint files with new ones. Checkpoint
   files for versions 1.6x -- 2.04 should be interchangable. Complete your
   check on the old version and then switch to 2.06

Q. Version numbers

A. Release numbers are X.XX, where the first number has now reached 2, and
   the other two just go up by one with each release. If you get a version
   with "Alpha" or "Beta" in it, then it probably doesn't work right and
   you shouldn't use it for "production" work.


Q. What files should I download from SourceForge?

A. There are various executables for both Windows and Linux, some text
   files, and some necessary library files. Pick whichever executable is
   best for you, then download all the text files including this README and
   CUDALucas.ini. Next, if you don't already have them, pick the library
   archive for your operating system. Place all these files into your
   CUDALucas folder, and then read the rest of this README.

   For Windows versions, Win32 is slightly faster than x64 (for most FFT
   lengths).
   CUDA 5.5 is faster than previous versions for Linux and Windows

Q. I get errors about "lib cudart not found" or "lib cufft not found".
   What can I do?

A. Download the applicable lib files from SourceForge.  The files must
   match the CUDA version CUDALucas was compiled for as listed in the name


###########
# 8 To do #
###########

2.07:
- Add log capability

- Much of the code is placed in the wrong functions, so the interface
  between functions is often extremely awkward. I'd like to fix this,
  especially since it will go a long way to adding log functionality for
  2.07.

- Automate BigCarry? (or just make it mandatory?)

- Add a command line switch to automate initial setup and burn-in (-setup)

- automatic primenet interaction (Eric Christenson is working on this for
  mfaktc)
                             ^ specification draft exists
   **For now, use MISFIT-CULU by Scott Lemieux
   http://www.mersenneforum.org/misfit/

- The security module that would be used would be closed source, to
  maintain integrity of PrimeNet's data. GPL v3 does not allow to have
  parts of the program to be closed source. Solution: We'll re-release
  under another license. This is NOT the end of the GPL v3 version! We'll
  release future versions of CUDALucas under GPL v3! We want CUDALucas
  being open source! The only differences of the closed version will be the
  security module and the license information.
Source: README, updated 2019-05-21
CUDALucas Files

A program that uses CUDA to accelerate the Lucas Lehmer test.