
VDJFasta

VDJFasta was developed for the analysis of antibody repertoires by high-throughput sequencing. It was published as part of the methods in

"Precise determination of the diversity of a combinatorial antibody library gives insight into the human immunoglobulin repertoire" - Glanville, Zhai, Berka et al., 2009, PNAS

http://www.pnas.org/content/early/2009/10/28/0909775106.short

The software was released as part of the methods of that publication, and is made available to the community for non-commercial purposes. The software is provided "as is," with no implied support. For those interested in contributing code to the package, the author can be reached at Jacob <dot> Glanville <at> pfizer <dot> com.

Dependencies

VDJFasta requires NCBI's blast toolkit, HMMER's hmm toolkit, and Perl.
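A quick way to see whether the dependencies are reachable is sketched below. It assumes the tools are on your PATH under the executable names used in the constructor paths in the Installation section (blastall, hmmsearch, hmmalign). VDJFasta itself is configured with absolute paths, so a tool reported as missing here may still be usable if you know where it is installed. The function name check_tools is illustrative and not part of the package.

```shell
# Report which of the named executables cannot be found on PATH.
# Prints one line per missing tool and returns non-zero if any is absent.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Executable names are assumed from the constructor paths in the Installation section.
check_tools perl blastall hmmsearch hmmalign || echo "some dependencies were not found on PATH"
```

For any tool that is found, `command -v blastall` (for example) prints its full location, which is the value the VDJFasta.pm constructor needs.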

Installation

NOTE: The current installation is easy but crude. Any assistance in grooming the package to improve the installation experience would be greatly appreciated.

Unpack the package with:

$ tar -xzvf vdjfasta-1.0.tgz

This will create the following directory structure:

vdjfasta/bin/			<-- bin directory with executable perl scripts
        /db/			<-- blast and hmm reference databases
        /lib/			<-- perl module libraries
        /test/			<-- example sequences to test the software
        /README.txt		<-- this readme file

Once unpacked, a number of paths need to be set in the constructor of vdjfasta/lib/VDJFasta.pm. Adjust the paths to your current BLAST and HMMER installations:

  $self->{blast}        = "/tools/apps/NCBI/blast-2.2.17/bin/blastall";
  $self->{hmmsearch}    = "/tools/apps/HMMER/hmmsearch";
  $self->{hmmalign}     = "/tools/apps/HMMER/hmmalign";
and to the current location of your vdjfasta installation:
  $self->{dnaVsegdb}    = "/tools/vdjfasta/db/imgt.VDJ.dna.nr.fa";
  $self->{dnaJsegdb}    = "/tools/vdjfasta/db/imgt.J.dna.fa";
  $self->{dnaDsegdb}    = "/tools/vdjfasta/db/imgt.HD.dna.nr.fa";
  $self->{dnaCsegdb}    = "/tools/vdjfasta/db/imgt.CH1.dna.nr.fa";

  $self->{VhVkHMM}      = "/tools/vdjfasta/db/Vh-linker-Vk.hmm";
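Before running anything, it can help to confirm that the reference files named in the constructor are actually present. The sketch below checks for the five file names listed above under a given installation prefix. The function name check_vdjfasta_db is hypothetical, and the check assumes the files live in the db/ subdirectory as in the example paths.

```shell
# Check that the reference databases named in the constructor exist and are
# readable under <prefix>/db. Prints each file that is not found and returns
# non-zero if any is missing.
check_vdjfasta_db() {
    status=0
    for f in imgt.VDJ.dna.nr.fa imgt.J.dna.fa imgt.HD.dna.nr.fa \
             imgt.CH1.dna.nr.fa Vh-linker-Vk.hmm; do
        if [ ! -r "$1/db/$f" ]; then
            echo "not found: $1/db/$f"
            status=1
        fi
    done
    return "$status"
}
```

With the example prefix used above, the call would be `check_vdjfasta_db /tools/vdjfasta`. No output means every file was found.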

Finally, add vdjfasta/bin to your path.
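In a Bourne-style shell this last step might look like the sketch below. The /tools/vdjfasta prefix mirrors the example paths above and is an assumption, as is the VDJFASTA_HOME variable name; substitute the directory where you unpacked the package.

```shell
# Assumed install prefix, taken from the example constructor paths;
# set VDJFASTA_HOME beforehand to override it.
VDJFASTA_HOME="${VDJFASTA_HOME:-/tools/vdjfasta}"

# Append the scripts directory to PATH for the current shell session.
PATH="$PATH:$VDJFASTA_HOME/bin"
export PATH
```

To make the change permanent, the same lines can go in your shell startup file.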

Description

The scripts available in the package are all located in vdjfasta/bin. Terse usage descriptions are provided if the scripts are executed without arguments.

 bin: scripts available in the package
     fasta-vdj-pipeline.pl    complete analysis of antibody nucleotide sequences
     fasta-dna2ig.pl          identify and translate antibody coding frames
     fasta-getGermline.pl     identify segment composition
     fasta-scFv-align.pl      align antibody sequences
     fasta-vdj-sim.pl         perform antibody vdj simulations
     msa-restack.pl           perform CDR-restacking on scFv output alignment

Output

The fasta-vdj-pipeline.pl script carries DNA sequences from initial screening through full segment characterization, translation, and amino acid identification of CDRs. An example of input and output is shown below.

Input is a FASTA file containing between 1 and 800,000 sequences (the upper limit is due to memory constraints in the HMMER software).
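Since the sequence limit is easy to exceed with high-throughput data, a pre-flight count can save a failed run. The sketch below counts FASTA records by their ">" header lines and compares the total with the 800,000 figure stated above. The function name check_fasta_size is illustrative and not part of the package.

```shell
MAX_RECORDS=800000   # upper limit on input sequences stated above

# Count FASTA records (lines starting with ">") and compare with MAX_RECORDS.
# Returns 0 if the file is within the limit, 1 if it is too large,
# and 2 if the file cannot be read.
check_fasta_size() {
    if [ ! -r "$1" ]; then
        echo "$1: cannot read file"
        return 2
    fi
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    n=$(grep -c '^>' "$1" || true)
    if [ "$n" -gt "$MAX_RECORDS" ]; then
        echo "$1: $n sequences, exceeds $MAX_RECORDS; split the file first"
        return 1
    fi
    echo "$1: $n sequences"
}
```

For a file named reads.fa (a placeholder name), the call would be `check_fasta_size reads.fa`.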

>54_100
NNNNNNNNNNNNGGGCGATTGNTTTAGCGGCCGCGAATTCGCCCTTTGAAACACCTGTGGTTCTTCCTCCTCCTGGTGGC
AGCTCCCAGATGGGTCCTGTCTCAGGTGCAGCTGCAGGAGTCGGGCCCAGGACTGGTGAAGCCTTCGGGGACCCTGTCCC
TCACCTGCGCTGTCTCTGGTGGCTCCATCAGCAGTAGTAACTGGTGGAGTTGGGTCCGCCAGCCCCCAGGGAAGGGGCTG
GAGTGGATTGGGGAAATCTATCATAGTGGGAGCACCAACTACAACCCGTCCCTCAAGAGTCGAGTCACCATATCAGTAGA
CAAGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCGCGGACACGGCCGTGTATTACTGTGCGAGAAAGC
TGGGGATTAAGTATGCTTTTGATATCTGGGGCCAAGGGACAATGGTCACCGTCTCTTCAGAGAGTCAGTCCTCCCCAACT
GTCTTCCCCCTCGTCTCCTGCGAGAGCCCCCTGTCTGATGAGAATTTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCT
GCCCAGCTCCATTTCCTTCTCCTGGAACTACCAGAACAACACTGAAGTCATGCAGGGTGTCAGAACCTTCCCAACACTGA
GGACAGGGGACAAATACACAGCTACCTCGCAGGTGTTACTGTCCGCCAAAAATGTCCTTGAAGGTTCAGATGAATACTTG
GTATGCAAAATCCACCATGGCAACAAAAATAAAGATCTGCATGTGCCGATTCCAGCTGTCGTTGAGATGAACCCCAATGT
GAGTGTGTTCATTCCACCACGTGATGCCTTCTCTGGCCCTGCACCCCGCAAGTCCAGACTCATCTGCGAGGCCACCAACT
TCAGTCCCAAACAGATCACAGTATCCTGGCTACAGGATGGGAAGCCTGTGAAATCTGGCTTCACNNCAGAGCCAGTGACT
GTCGANGCCNAANGNATCCAGACCCCAAANCTANNNGTCATNANCNNNCTGACCATCACTGAAAGNAGGGCNANTCNTTT
NANCNGCNGGACTAGTCCTTTANTGAGGGNNNNTGAGCTGNCGTANCATGNCATAGCTNNTCNGGNNNGNANTNNNNNTC
CNNNTCNNNANNNNNNANNNNNNNNNNNNNNNNNNNNNNNNCNGGGGNNCTANNNNNGNNCTNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNANNNTCNNNNNNNNNNNNN
>70_100
NNNNNNNNNNANNGGGCGATTGATTTAGCGGCCGCGAATTCGCCCTTTCCACGCTCCTGCTGCTGACCATCCCTTCATGG
GTCTTGTCCCAGATCACCTTGAAGGAGTCTGGTCCTACGCTGGTGAAACCCACACAGACCCTCACGCTGACCTGCACCTT
CTCTGGGTTCTCACTCAGCACTAGTGGAGTGGGTGTGGGCTGGATCCGTCAGCCCCCAGGAAAGGCCCTGGAGTGGCTTG
CACTCATTTATTGGAATGATGATAAGCGCTACAGCCCATCTCTGAAGAGCAGGCTCACCATCACCAAGGACACCTCCGAA
AACCAGGTGGTCCTTACAATGACCAACATGGACCCTGTGGACACAGCCACATATTACTGTGCACACGGATACAGCTATGG
TTACTTCCAGCACTGGGGCCAGGGCACCCTGGTCACCGTCTCCTCAGAGAGTCAGTCCTCCCCAACTGTCTTCCCCCTCG
TCTCCTGCGAGAGCCCCCTGTCTGATGAGAATTTGGTGGCCATGGGCTGCCTGGCCCGGGACTTCCTGCCCAGCTCCATT
TCCTTCTCCTGGAACTACCAGAACAACACTGAAGTCATGCAGGGTGTCAGAACCTTCCCAACACTGAGGACAGGGGACAA
ATACACAGCTACCTCGCAGGTGTTACTGTCCGCCAAAAATGTCCTTGAAGGTTCAGATGAATACTTGGTATGCAAAATCC
ACCATGGCAACAAAAACAAAGATCTGCATGTGCCGATTCCAGCTGTCGTTGAGATGAACCCCAATGTGAGTGTGTTCATT
CCACCACGTGATGCCTTCTCTGGCCCTGCACCCCGCAAGTCCAGACTCATCTGCGAGGCCACCAACTTCAGTCCCAAACA
GATCACAGTATCCTGGCTACAGGATGGGAAGCCTGTGAAATCTGGCTTCACCACAGANCCAGTGACTGTCGANGCCAAAG
GATCCAGACCCCNAACCTACNNGGTCATAAGCACACTGACCATCACTGAAAGCANGGCNANNCGTTNNAANNTGCAGGAC
TAGTCCCTTTNNTGAGGTNATNNNNANCTNNNTANCATGNCATAGCTGTTNCTGGNNNGAAATGTNNCNGCTNNNNNNCN
NNNANNNNNNNNNNNNNNNNNNNNAAANNCNGGGNNGNCNANGNNNNNNTAANNCNNNNANNNGNNNNNNNNNNNNNNNN
NCNNNNNGGNNNNNNNNGCNNNNNNNCNNTAANGNANNNGGNNNNN

Output is a fasta file containing translated variable domains and headers describing the segment and CDR composition.

>54_100_frame_0;IGHV4-4 294 0;;IGHJ3 48 0;CARKLGIKYAFDIW;;IGHM 22 1
QVQLQESGPGLVKPSGTLSLTCAVSGGSISS.sNWWSWVRQPPGKGLEWIGEIYHSGSTNYNPSLKSRVTISVDKSKNQFSLKLSSVTAADTAVYYCARKLgi.kyaFDIWGQGTMVTVSS
>70_100_frame_2;IGHV2-5 297 1;IGHD5-5/5-18 18 0;IGHJ1 46 0;CAHGYSYGYFQHW;;IGHM 22 1
QITLKESGPTLVKPTQTLTLTCTFSGFSLSTsgVGVGWIRQPPGKALEWLALIYWNDDKRYSPSLKSRLTITKDTSENQVVLTMTNMDPVDTATYYCAHGYsy..gyFQHWGQGTLVTVSS
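The semicolon-delimited headers can be turned into a table for downstream analysis. The sketch below is based only on the two example headers above. It assumes the fields appear in the order identifier, V segment, D segment, J segment, CDR3 sequence, with the segment name as the first word of its field. The numbers that follow each segment name and the remaining fields are not interpreted here. The function name vdj_header_table is hypothetical.

```shell
# Print one tab-separated line per record: id, V, D, J, CDR3.
# Field order is assumed from the example headers; a segment that was not
# assigned (empty field) is shown as "-".
vdj_header_table() {
    awk -F';' '
        /^>/ {
            id = substr($1, 2)      # drop the leading ">"
            split($2, v, " ")       # first word of each field is the segment name
            split($3, d, " ")
            split($4, j, " ")
            printf "%s\t%s\t%s\t%s\t%s\n", id,
                   (v[1] == "" ? "-" : v[1]),
                   (d[1] == "" ? "-" : d[1]),
                   (j[1] == "" ? "-" : j[1]),
                   $5
        }
    ' "$1"
}
```

Applied to the example output above, the first record has no D segment assignment, so its third column is "-".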
