Using Snappy in Picard

Alec Wysoker

Snappy is an open-source compression library that emphasizes fast compression over maximum compression. Snappy-java is a Java JNI wrapper for Snappy. If the Snappy-java library is installed on a system and made available to the JVM, Picard programs that sort large amounts of data will use it to compress temporary data before writing it to disk. This is a performance optimization only and is entirely optional. Snappy can reduce file I/O, and in some cases the savings in file I/O more than compensate for the time spent compressing and decompressing, resulting in faster execution times. We have seen sort times reduced by nearly 25%, although your mileage may vary.

Requirements for using Snappy

In order for Picard to use Snappy, two requirements must be met: 1) the Snappy-java classes must be available on Java's classpath; and 2) the Snappy-java DLL (dynamic-link library) must be available to Snappy-java.
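The first requirement can be checked from Java before running a long sort. The following is a minimal diagnostic sketch, not part of Picard: the class name SnappyRequirementsCheck and its helper method are made up for illustration, while org.xerial.snappy.Snappy is the main class of the Snappy-java library. It only reports whether the class is visible on the classpath and what the org.xerial.snappy.lib.path property is set to; it does not try to load the DLL.

```java
// Diagnostic sketch (hypothetical helper, not part of Picard).
class SnappyRequirementsCheck {

    // Main class of the Snappy-java library.
    static final String SNAPPY_CLASS = "org.xerial.snappy.Snappy";

    // Returns true if the named class can be found on the classpath.
    // initialize=false means no static initializer runs, so no native
    // library is loaded by this check.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false,
                    SnappyRequirementsCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Snappy-java classes on classpath: "
                + isClassAvailable(SNAPPY_CLASS));
        // Prints "null" if the property was not passed with -D.
        System.out.println("org.xerial.snappy.lib.path = "
                + System.getProperty("org.xerial.snappy.lib.path"));
    }
}
```

Run it with the same -classpath and -D arguments you intend to pass to Picard, so that it sees the same environment.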

Making Snappy-java classes available

On Linux only, if you use one of the executable jars in the Picard bundle (e.g. SortSam.jar), the Snappy-java classes are included in the executable jar, so you need do nothing additional. However, if you are using sam-<version#>.jar, or if you are on a platform other than Linux, then you must also add the Snappy-java jar to the classpath. The jar can be found in the picard-tools-<version#>.zip file.

Obtaining the Snappy-java DLL

The Snappy-java jar file contains Snappy-java DLLs for the most popular platforms, so you can extract the one for your platform from there. Note that the Snappy-java DLL is not the same as the Snappy DLL, because it contains JNI (Java Native Interface) code in addition to the Snappy code itself.
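To see which DLLs a given Snappy-java jar contains, you can list its entries, since a jar is an ordinary zip archive. The sketch below is an illustration only: the class name ListNativeLibraries is hypothetical, and the file extensions it looks for (.so, .dll, .dylib, .jnilib) are an assumption about how native libraries are usually named, not something specific to Snappy-java.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: lists entries in a jar that look like native libraries.
class ListNativeLibraries {

    // Assumption: native libraries use one of these common extensions.
    static boolean looksLikeNativeLibrary(String entryName) {
        String lower = entryName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".so") || lower.endsWith(".dll")
                || lower.endsWith(".dylib") || lower.endsWith(".jnilib");
    }

    // Returns the names of all non-directory entries that match.
    static List<String> nativeLibraryEntries(String jarPath) throws IOException {
        List<String> found = new ArrayList<>();
        try (ZipFile jar = new ZipFile(jarPath)) {
            Enumeration<? extends ZipEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory() && looksLikeNativeLibrary(entry.getName())) {
                    found.add(entry.getName());
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ListNativeLibraries <path to Snappy-java jar>");
            return;
        }
        for (String name : nativeLibraryEntries(args[0])) {
            System.out.println(name);
        }
    }
}
```

Once you know the entry name for your platform, any zip tool (or the JDK's jar tool) can extract that single file into a directory of your choice.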

Making the Snappy DLL available

If the Snappy-java jar is on your classpath, Snappy-java can find the Snappy-java DLL automatically. The Linux version of the Snappy-java DLL is included in the executable jars. On other platforms, you must make the Snappy-java DLL available for Snappy-java to load. There are several ways this can be done:

  • Set the org.xerial.snappy.lib.path system property to point to the directory containing the Snappy-java DLL. E.g. add an argument like the following to your Java command line: -Dorg.xerial.snappy.lib.path=<directory containing Snappy DLL>.
  • Put the Snappy-java jar on your classpath, e.g. add an argument like the following to your Java command line: -classpath snappy-java<version#>.jar.

Controlling Snappy in Picard

To check whether Picard has found Snappy, pass -Dsnappy.loader.verbosity=true on the Java command line. When SortingCollection first tries to sort something, a message will be printed to stderr indicating whether or not Snappy-java and the Snappy-java DLL have been found.

To prevent Picard from using Snappy, pass -Dsnappy.disable=true on the Java command line.
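Both flags are ordinary Java system properties, so a typo in the property name is silently ignored by the JVM. If a flag seems to have no effect, a small sketch like the following can confirm that the JVM actually received it. This is not Picard's code, and the class name SnappyFlags is hypothetical; how Picard itself interprets the values is not shown here.

```java
// Hypothetical helper: shows how -D flags appear inside the JVM.
class SnappyFlags {

    // Boolean.getBoolean returns true only if the system property exists
    // and its value equals "true", ignoring case.
    static boolean isSet(String propertyName) {
        return Boolean.getBoolean(propertyName);
    }

    public static void main(String[] args) {
        System.out.println("snappy.disable = " + isSet("snappy.disable"));
        System.out.println("snappy.loader.verbosity = "
                + isSet("snappy.loader.verbosity"));
    }
}
```

Run it with the same -D arguments you pass to Picard; a property that was not supplied is reported as false.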

