From: Baas, B. <Bra...@ep...> - 2012-02-29 16:49:33
Hello,

I am trying to get the formatting right for a refFlat.txt.gz file to use as input to Picard Tools. I am currently working with the Ensembl build of Macaca mulatta (macaque), but I have more than a dozen Ensembl genomes that I need to do this for.

I know that chromosome naming is inconsistent between the Ensembl build and Picard Tools, so I took a refFlat.txt.gz and ran:

    $ zcat refFlat.txt.gz | awk '{ OFS = "\t"; sub( /.*/ , "chr&" , $3); print $0; }' > refFlat.new.txt

This fixed the chromosome naming to include "chr" in front of the chromosome numbers, but it also introduced several aberrant tabs, which I then removed. I then compressed the file to make refFlat.new.txt.gz:

    $ gzip refFlat.new.txt

I then executed:

    $ java -Xmx2g -jar ~/picard_tools_1.60/picard_tools_1.60/CollectRnaSeqMetrics.jar \
        REF_FLAT=refFlat.new.txt.gz \
        CHART_OUTPUT=coverage.pdf \
        INPUT=accepted_hits.bam \
        OUTPUT=CollectRnaSeqMetrics.txt \
        STRAND_SPECIFICITY=FIRST_READ_TRANSCRIPTION_STRAND \
        REFERENCE_SEQUENCE=monkey.fa \
        VALIDATION_STRINGENCY=SILENT

I never get coverage.pdf, and CollectRnaSeqMetrics.txt is missing the majority of the metrics. When I follow the same process with the refFlat.txt.gz file from a UCSC-built genome, Picard works wonderfully.

Is there a database of refFlat.txt.gz files, formatted correctly for Picard, for Ensembl-built genomes?

Thanks so much!
Brad
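
For the chromosome-renaming step described above, a tab-aware variant might look like the sketch below. It is only an illustration, not a confirmed fix for the missing metrics. It assumes the refFlat file is tab-delimited with the chromosome name in column 3 (as the original awk command implies), and the function name add_chr_prefix is hypothetical. Setting both FS and OFS to a tab keeps awk from splitting on spaces or dropping empty fields, which may be where the stray tabs came from.

```shell
# Hypothetical helper: prefix the chromosome column (column 3) of a
# tab-delimited refFlat stream with "chr". Reads stdin, writes stdout.
# Assumption: column 3 holds the chromosome name, as in the post's awk command.
add_chr_prefix() {
    awk 'BEGIN { FS = OFS = "\t" }       # split and rejoin on tabs only
         $3 !~ /^chr/ { $3 = "chr" $3 }  # leave names that already start with chr
         { print }'
}

# Example usage (file names taken from the post):
#   zcat refFlat.txt.gz | add_chr_prefix | gzip > refFlat.new.txt.gz
```

Whether the prefix helps at all presumably depends on the sequence names used in accepted_hits.bam and monkey.fa. If those carry bare Ensembl names, the refFlat names would need to match them instead.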