From: Lu, M. <mir...@ne...> - 2014-03-28 04:00:02
Hello everyone, I am a Ph.D. student working on pine trees. Recently I have been trying to use "CreateSequenceDictionary" from picardtools/1.107 to build a "dictionary" file for my reference FASTA file, for use with GATK. Our reference is 23 Gb and currently contains 144 million sequences. I have tried the commands below on our server:

1) picard-tools CreateSequenceDictionary R=ptaeda.v1.01.scaffolds.fasta O=ptaeda.v1.01.scaffolds.dict

2) picard-tools CreateSequenceDictionary R=ptaeda.v1.01.scaffolds.fasta O=ptaeda.v1.01.scaffolds.dict MAX_RECORDS_IN_RAM=5000000

3) java -Xms16g -Xmx16g -XX:MaxPermSize=16g -jar /share/apps/picard-tools-1.107/CreateSequenceDictionary.jar MAX_RECORDS_IN_RAM=500000000 R=ptaeda.v1.01.scaffolds.fasta O=ptaeda.v1.01.scaffolds.dict

By the way, I submitted the job with

sbatch -N 1 -n 2 jobname.sh

(our server runs Slurm; -N is the number of nodes and -n the number of CPUs). But I keep getting this error:
____________________________________________________________________________________________
Module BUILD 1.6 Loaded.
Module slurm/2.6.2 loaded
Module picardtools/1.107 loaded
[Thu Mar 27 09:55:50 PDT 2014] net.sf.picard.sam.CreateSequenceDictionary REFERENCE=ptaeda.v1.01.scaffolds.fasta OUTPUT=ptaeda.v1.01.scaffolds.dict MAX_RECORDS_IN_RAM=500000000 TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 CREATE_INDEX=false CREATE_MD5_FILE=false
[Thu Mar 27 09:55:50 PDT 2014] Executing as mmlu@bigmem1 on Linux 3.8.0-35-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_25-b30; Picard version: 1.107(1667) IntelDeflater
[Thu Mar 27 16:35:01 PDT 2014] net.sf.picard.sam.CreateSequenceDictionary done. Elapsed time: 399.17 minutes.
Runtime.totalMemory()=16337993728
To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp
Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
        at java.util.Arrays.copyOf(Arrays.java:2367)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
        at java.lang.StringBuffer.append(StringBuffer.java:322)
        at java.io.StringWriter.write(StringWriter.java:94)
        at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:129)
        at java.io.BufferedWriter.write(BufferedWriter.java:230)
        at java.io.Writer.write(Writer.java:157)
        at java.io.Writer.append(Writer.java:227)
        at net.sf.samtools.SAMTextHeaderCodec.println(SAMTextHeaderCodec.java:389)
        at net.sf.samtools.SAMTextHeaderCodec.writeSQLine(SAMTextHeaderCodec.java:444)
        at net.sf.samtools.SAMTextHeaderCodec.encode(SAMTextHeaderCodec.java:368)
        at net.sf.samtools.SAMTextHeaderCodec.encode(SAMTextHeaderCodec.java:353)
        at net.sf.samtools.SAMFileWriterImpl.setHeader(SAMFileWriterImpl.java:126)
        at net.sf.samtools.SAMFileWriterFactory.makeSAMWriter(SAMFileWriterFactory.java:200)
        at net.sf.picard.sam.CreateSequenceDictionary.doWork(CreateSequenceDictionary.java:120)
        at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:179)
        at net.sf.picard.sam.CreateSequenceDictionary.main(CreateSequenceDictionary.java:93)
______________________________________________________________________________________________

Could you please advise me on this problem? I am worried it is caused by our huge reference. Could I split the reference sequences into several parts, build .dict files separately, and then merge them at the end?

Thanks very much!

Best,
Mengmeng Lu
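P.S. To make the "merge" idea concrete, here is roughly what I have in mind. As I understand it, a .dict file is just a SAM-style text header (one @HD line followed by one @SQ line per sequence), so merging per-part dictionaries should mostly be concatenation, as long as the @SQ lines end up in the same order as the sequences in the original FASTA. This is only a sketch: the part*.dict files below are tiny made-up stand-ins, not real picard output.

```shell
# Tiny made-up per-part dictionaries standing in for real picard output.
# A .dict file is a SAM-style text header: one @HD line, then @SQ lines.
printf '@HD\tVN:1.4\n@SQ\tSN:scaffold1\tLN:1000\n@SQ\tSN:scaffold2\tLN:2000\n' > part1.dict
printf '@HD\tVN:1.4\n@SQ\tSN:scaffold3\tLN:1500\n' > part2.dict

# Merge: keep a single @HD line, then append every @SQ line in the same
# order as the sequences appear in the original FASTA.
head -n 1 part1.dict > merged.dict
grep -h '^@SQ' part1.dict part2.dict >> merged.dict

cat merged.dict   # one @HD line followed by all three @SQ lines
```

(I realize real picard output also carries M5 and UR tags on each @SQ line; I assume the per-sequence M5 checksums would carry over unchanged, though the UR would point at the split files rather than the full reference.)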