| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| parser.cpp | 2016-05-06 | 10.9 kB | |
| LICENSE | 2016-05-05 | 1.1 kB | |
| README.txt | 2016-05-05 | 1.8 kB | |
| Totals: 3 Items | | 13.8 kB | 0 |
### IntParser - code for parsing and subsetting UKBB binary format data

WARNING: this code is largely UNTESTED.

Usage:

```
parser -g genotype_map.csv [ -t -e ] [ -r rsids.txt ] [ -a affyids.txt ] [ -o outfile ] input.int
```

Arguments:

- `-g genotype_map.csv`: the genotype map file (if you don't know what this is, you are probably in the wrong place).
- `-t` / `-e`: transpose the data to output per-SNP data (`-t`) or Evoker binary format data (`-e`). If neither is given, per-individual data is output.
- `-r rsids.txt` / `-a affyids.txt`: pick a subset of SNPs using a file of rsIDs (`-r`) or Affymetrix IDs (`-a`).
- `-o outfile`: write the output to this file. If not given, output is written to stdout.

### EXAMPLES

Compile the code:

```
g++ -o parser parser.cpp
```

Output intensity data, one individual per line, to stdout:

```
parser -g genotype_map.csv input.int > output.txt
```

Output intensity data, one individual per line, to a file, for a subset of SNPs specified by rsID:

```
parser -g genotype_map.csv -r rsids.txt -o output_rsids.txt input.int
```

NOTE: use `-a file.txt` for a list of Affymetrix IDs.

Output intensity data, one SNP per line, to stdout:

```
parser -g genotype_map.csv -a affyids.txt -t input.int > output_affy_transposed.txt
```

WARNING: this mode reads the data for every SNP into memory and requires 16 × Nsnp × Nind bytes of RAM, where Nsnp is the number of SNPs and Nind is the number of individuals. So if you read in 100,000 SNPs for 500,000 individuals, you will need 16 × 100,000 × 500,000 bytes = 800 GB. If you run out of memory, try subsetting using `-r` or `-a` and then concatenating the outputs.

Generate binary Evoker output (untested: there is about a 10% chance that this works, and the same memory warning as above applies):

```
parser -g genotype_map.csv -e -o output_evoker.bnt input.int
```

You will need to make your own binary PLINK data to run Evoker with. Be careful of allele order!

For bug reports or questions, contact Luke Jostins <lj4@well.ox.ac.uk>.

The code is published under the MIT license.
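The memory warning for the transposed modes (`-t` and `-e`) is simple arithmetic, so it can be used to plan how to split a SNP list into `-r`/`-a` subset files. The sketch below, in C++ to match `parser.cpp`, takes the README's rule of thumb of 16 bytes per SNP per individual at face value; it does not derive that figure from the file format. The helper names (`required_bytes`, `max_snps_per_chunk`, `chunks_needed`) and the RAM budget are hypothetical and are not part of `parser.cpp`.

```cpp
#include <cassert>
#include <cstdint>

// README rule of thumb (taken as given, not derived here):
// transposing needs 16 bytes of RAM per SNP per individual.
constexpr std::uint64_t kBytesPerCell = 16;

// Total RAM (in bytes) needed to transpose n_snp SNPs for n_ind individuals.
std::uint64_t required_bytes(std::uint64_t n_snp, std::uint64_t n_ind) {
    return kBytesPerCell * n_snp * n_ind;
}

// Hypothetical planning helper: the largest number of SNPs per subset file
// whose transposed data fits within ram_bytes.
std::uint64_t max_snps_per_chunk(std::uint64_t ram_bytes, std::uint64_t n_ind) {
    assert(n_ind > 0 && "need at least one individual");
    return ram_bytes / (kBytesPerCell * n_ind);
}

// Hypothetical planning helper: how many subset files (and runs of the
// parser) are needed to cover n_snp SNPs within the RAM budget.
std::uint64_t chunks_needed(std::uint64_t n_snp, std::uint64_t ram_bytes,
                            std::uint64_t n_ind) {
    const std::uint64_t per_chunk = max_snps_per_chunk(ram_bytes, n_ind);
    assert(per_chunk > 0 && "budget too small for even one SNP");
    return (n_snp + per_chunk - 1) / per_chunk;  // ceiling division
}
```

For the README's example, `required_bytes(100000, 500000)` is 800,000,000,000 bytes (800 GB). On a machine with an assumed 64 GB budget, `max_snps_per_chunk(64000000000, 500000)` gives 8,000 SNPs per subset file, so `chunks_needed` reports 13 runs whose per-SNP outputs can then be concatenated. In practice you would leave headroom below the physical RAM for the rest of the process.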