### IntParser - code for parsing and subsetting UKBB binary format data

WARNING: this code is largely UNTESTED.

Usage: 
 parser -g genotype_map.csv [ -t -e ] [ -r rsids.txt ] [ -a affyids.txt ] [ -o outfile ] input.int

Arguments:
 -g genotype_map.csv  (if you don't know what this is, you are probably in the wrong place)
 -t/-e (-t transposes the data to output per-SNP data; -e outputs Evoker binary format data. If neither is given, per-individual data is output)
 -r rsids.txt/-a affyids.txt (pick a subset of SNPs using a file of rsids or affy ids)
 -o outfile (write output to outfile. If not given, output is written to stdout)
 
 
### EXAMPLES 


## Compile the code
> g++ -o parser parser.cpp


## output intensity data one individual per-line to stdout
> parser -g genotype_map.csv input.int > output.txt

## output intensity data one individual per-line to a file for a subset of SNPs specified by rsid
> parser -g genotype_map.csv -r rsids.txt -o output_rsids.txt input.int

NOTE: use -a affyids.txt instead of -r to select SNPs by affy ID.

## output intensity data one SNP per line to stdout
> parser -g genotype_map.csv -a affyids.txt -t input.int > output_affy_transposed.txt

WARNING: this reads data for every SNP into memory, and requires 16*Nsnp*Nind bytes of RAM, where Nind is the number of individuals. So if you read in 100,000 SNPs for 500,000 individuals, you will need 800 GB. If you run out of memory, try subsetting using -r or -a and then concatenating the outputs.

## Generate binary Evoker output (untested; there is about a 10% chance that this works. The same memory warning as above applies):
> parser -g genotype_map.csv -e -o output_evoker.bnt input.int

(You will need to create your own binary PLINK genotype data to run Evoker with. Be careful of allele order!)

For bug reports or questions contact Luke Jostins <lj4@well.ox.ac.uk>
Code is published under an MIT license 
Source: README.txt, updated 2016-05-05