The Chemistry Development Kit (CDK) is a scientific, LGPL-ed library for bio- and cheminformatics and computational chemistry written in Java. The main home page of the CDK is now at: http://cdk.github.io
Comet open source tandem mass spectrometry (MS/MS) search engine.
Comet is an open source tandem mass spectrometry (MS/MS) sequence database search engine. It identifies peptides by searching MS/MS spectra against sequences present in protein sequence databases. Comet currently exists as a simple Windows or Linux command line binary that only does MS/MS database search. Supported input formats are mzXML, mzML, and ms2 files. Supported output formats are .out, SQT, and pepXML. Documentation and project website: http://comet-ms.sourceforge.net
Lush is a Lisp dialect with extensions for object-oriented and array-oriented programming. Lush is intended for prototyping numerically intensive applications and is designed for easy integration of existing C/C++/Fortran codes.
The Elvira project is a suite of tools to perform high-throughput genomic assemblies of repetitive, structured samples such as viruses or targeted regions of larger genomes.
Java code developed by the Australian ICGC team for operating on next-generation sequencing data. This code is currently being maintained and expanded by the QIMR Berghofer Genome Informatics team (http://www.qimrberghofer.edu.au/lab/genome-informatics/). More details and documentation can be found on the wiki: http://sourceforge.net/p/adamajava/wiki/Home/
SAM (Sequence Alignment/Map) is a flexible generic format for storing nucleotide sequence alignments. SAMtools provides efficient utilities for manipulating alignments in the SAM format. The main samtools source code repository moved to GitHub in March 2012. For ongoing development since then, see http://github.com/samtools/samtools
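To make the SAM format concrete, here is a minimal Python sketch that splits one alignment line into its 11 mandatory tab-separated fields plus optional tags. The names `SamRecord`, `parse_sam_line`, and `is_unmapped` are hypothetical helpers written for this illustration; they are not part of SAMtools, which should be preferred for real work.

```python
from typing import NamedTuple, Tuple


class SamRecord(NamedTuple):
    """The 11 mandatory SAM fields, plus any optional TAG:TYPE:VALUE fields."""
    qname: str   # query (read) name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # reference name of the mate/next read
    pnext: int   # position of the mate/next read
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (ASCII, Phred+33)
    tags: Tuple[str, ...]


def parse_sam_line(line: str) -> SamRecord:
    """Parse a single alignment line (not a header line starting with '@')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], tuple(fields[11:]),
    )


def is_unmapped(record: SamRecord) -> bool:
    """Flag bit 0x4 marks a read that is unmapped."""
    return bool(record.flag & 0x4)


record = parse_sam_line("read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0")
print(record.rname, record.pos, is_unmapped(record))
```

The sketch does no validation beyond the field count, so malformed numeric fields raise a plain `ValueError` from `int()`.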
Scanning probe microscopy data visualisation and analysis
A data visualization and processing tool for scanning probe microscopy (SPM, i.e. AFM, STM, MFM, SNOM/NSOM, ...) and profilometry data, useful also for general image and 2D data analysis.
Framework for Systems Biology
The Systems Biology Workbench (SBW) is a framework for application intercommunication. It uses a broker-based, distributed, message-passing architecture, supports many languages including Java, C++, Perl, and Python, and runs under Linux, OS X, and Win32. It comes with a large number of modules that cover the whole modeling cycle: creating computational models, simulating and analyzing them, and visualizing the results in order to improve the models. All modules use community standards such as SED-ML, SBML, and MIRIAM.
City of Hope CpG Island Analysis Pipeline
COHCAP (City of Hope CpG Island Analysis Pipeline) is an algorithm to analyze single-nucleotide resolution methylation data (Illumina 450k methylation array, targeted BS-Seq, etc.). It provides QC metrics, differential methylation for CpG Sites, differential methylation for CpG Islands, integration with gene expression data, and visualization of methylation values. Please note: 1) The standalone version of COHCAP is no longer being updated. Please see the Bioconductor version: http://bioconductor.org/packages/release/bioc/html/COHCAP.html 2) In addition to the original NAR paper, please see the following links: Benchmarks: http://www.nature.com/protocolexchange/protocols/2965#/introduction Protocol Exchange Files: http://sourceforge.net/projects/cohcap/files/Protocol_Exchange_Example.zip 3) Bioconductor Custom Annotation Files (including EPIC Array): https://sourceforge.net/projects/cohcap/files/additional_Bioconductor_annotations.zip/download
Synteny Block ExpLoration tool
Sibelia is a comparative genomics tool. It assists biologists in analysing the genomic variations that correlate with pathogens, or the genomic changes that help microorganisms adapt to different environments. Sibelia is also helpful for evolutionary and genome rearrangement studies of multiple strains of microorganisms.
Tools for integration and analysis of heterogeneous immunological data
This site hosts the source code for the C++ version of the SBW Broker, the NOM module, the advanced simulation suite, analysis applications, and model editors.
Virtual Screening software for Computational Drug Discovery
PyRx is Virtual Screening software for Computational Drug Discovery that can be used to screen libraries of compounds against potential drug targets. PyRx enables Medicinal Chemists to run Virtual Screening from any platform and helps users in every step of this process, from data preparation to job submission and analysis of the results. While it is true that there is no magic button in the drug discovery process, PyRx includes a docking wizard with an easy-to-use user interface, which makes it a valuable tool for Computer-Aided Drug Design. PyRx also includes chemical spreadsheet-like functionality and a powerful visualization engine that are essential for Rational Drug Design. Please visit the PyRx home page to learn more about PyRx and watch videos on how to use it.
Somatic fusion-genes finder for RNA-seq data
FusionCatcher searches for novel and known somatic fusion genes, translocations, and chimeras in RNA-seq data (paired-end reads from Illumina NGS platforms such as Solexa and HiSeq) from diseased samples. The aims of FusionCatcher are: a very good detection rate for finding candidate fusion genes; ease of use (i.e. no a priori knowledge of databases and bioinformatics is needed in order to run FusionCatcher); very good detection of challenging fusion genes, for example IGH, CIC, DUX4, CRLF2, and TCF3 fusions; and to be as automatic as possible (i.e. FusionCatcher automatically chooses the best parameters for finding candidate fusion genes, e.g. finding the adapters and building the exon-exon junctions based on the length of the input reads) while providing the best possible detection rate for finding fusion genes.
BBMap short read aligner, and other bioinformatic tools.
This package includes BBMap, a short read aligner, as well as various other bioinformatic tools. It is written in pure Java, can run on any platform, and has no dependencies other than Java being installed (compiled for Java 6 and higher). All tools are efficient and multithreaded. BBMap: Short read aligner for DNA and RNA-seq data. Capable of handling arbitrarily large genomes with millions of scaffolds. Handles Illumina, PacBio, 454, and other reads; very high sensitivity and tolerant of errors and numerous large indels. Very fast. BBNorm: Kmer-based error-correction and normalization tool. Dedupe: Simplifies assemblies by removing duplicate or contained subsequences that share a target percent identity. Reformat: Reformats reads between fasta/fastq/scarf/fasta+qual/sam, interleaved/paired, and ASCII-33/64, at over 500 MB/s. BBDuk: Filters, trims, or masks reads with kmer matches to an artifact/contaminant file. ...and more!
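The ASCII-33/64 conversion mentioned for Reformat refers to the two common FASTQ quality encodings, in which a Phred score is stored as a character with an offset of 33 or 64. The following Python sketch illustrates the idea only; `phred64_to_phred33` is a hypothetical helper, not BBMap code, and it assumes non-negative Phred scores (older Solexa-scaled files, which allow negative scores, are rejected).

```python
def phred64_to_phred33(quality: str) -> str:
    """Re-encode a FASTQ quality string from ASCII offset 64 to offset 33."""
    converted = []
    for ch in quality:
        score = ord(ch) - 64  # recover the numeric quality score
        if score < 0:
            # Assumption of this sketch: only non-negative scores are accepted.
            raise ValueError(f"character {ch!r} is not valid in offset-64 encoding")
        converted.append(chr(score + 33))  # write it back with offset 33
    return "".join(converted)


print(phred64_to_phred33("hhhh"))  # score 40 at every position
```

Because the conversion is a fixed shift of 31 code points per character, the same loop could be replaced by `str.translate` with a precomputed table when throughput matters.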
BioModels Database is a data resource that allows biologists to store, search, and retrieve published mathematical models of biological interest. The models are annotated and linked to relevant data resources, and are available in various formats.
ParseCNV is CNV call association software. It takes CNV calls as input, creates SNP-based statistics for CNV occurrence in the cases and controls of a population study, and then calls CNVRs based on neighboring SNPs of similar significance.
The BioPro project has moved to http://www.bioera.net/
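The idea of grouping neighboring SNPs of similar significance into regions can be sketched as below. This is not ParseCNV's actual algorithm: the `Snp` type, the `call_regions` function, and all three thresholds (`max_p`, `max_gap`, `max_log_diff`) are assumptions chosen for illustration, and p-values are assumed to be strictly greater than zero.

```python
import math
from typing import List, NamedTuple, Optional, Tuple


class Snp(NamedTuple):
    chrom: str
    pos: int
    p_value: float  # association p-value for CNV occurrence at this SNP


def call_regions(
    snps: List[Snp],
    max_p: float = 0.05,         # hypothetical significance cutoff
    max_gap: int = 1_000_000,    # hypothetical maximum distance between neighbors
    max_log_diff: float = 1.0,   # hypothetical tolerance in log10(p) between neighbors
) -> List[Tuple[str, int, int]]:
    """Merge runs of adjacent significant SNPs into (chrom, start, end) regions."""
    regions: List[Tuple[str, int, int]] = []
    previous: Optional[Snp] = None
    for snp in sorted(snps, key=lambda s: (s.chrom, s.pos)):
        if snp.p_value > max_p:
            previous = None  # a non-significant SNP ends the current run
            continue
        extends = (
            previous is not None
            and snp.chrom == previous.chrom
            and snp.pos - previous.pos <= max_gap
            and abs(math.log10(snp.p_value) - math.log10(previous.p_value)) <= max_log_diff
        )
        if extends:
            chrom, start, _ = regions[-1]
            regions[-1] = (chrom, start, snp.pos)  # stretch the open region
        else:
            regions.append((snp.chrom, snp.pos, snp.pos))  # open a new region
        previous = snp
    return regions
```

In this sketch a region is extended only while consecutive significant SNPs stay on the same chromosome, lie within `max_gap` bases, and have p-values within one order of magnitude of each other.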
The Sashimi project hosts the Trans-Proteomic Pipeline (TPP), a mature suite of tools for mass-spec (MS, MS/MS) based proteomics: statistical validation, quantitation, visualization, and converters from raw MS data to the open mzML/mzXML formats.
Easy manipulation of sdf molecular data files.
sdsorter provides convenient routines for manipulating the contents of sdf molecular data files based on the embedded sd tags.
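As an illustration of what manipulating an SD file by its embedded tags involves, the Python sketch below splits a file into records at the `$$$$` delimiter, reads the `> <TAG>` data items, and sorts records by a numeric tag. The function names are hypothetical and do not reflect sdsorter's own interface or implementation.

```python
import re
from typing import Dict, List

# A data item header looks like "> <TAG>"; the tag name sits between < and >.
TAG_HEADER = re.compile(r"^>.*<([^>]+)>")


def split_records(text: str) -> List[List[str]]:
    """Split SD file text into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        current.append(line)
        if line.strip() == "$$$$":
            records.append(current)
            current = []
    return records


def read_tags(record: List[str]) -> Dict[str, str]:
    """Collect tag values: the lines after a header, up to the next blank line."""
    tags: Dict[str, str] = {}
    i = 0
    while i < len(record):
        match = TAG_HEADER.match(record[i])
        if not match:
            i += 1
            continue
        i += 1
        values = []
        while i < len(record) and record[i].strip() not in ("", "$$$$"):
            values.append(record[i])
            i += 1
        tags[match.group(1)] = "\n".join(values)
    return tags


def sort_by_tag(text: str, tag: str, reverse: bool = False) -> str:
    """Return the SD text with records ordered by the numeric value of `tag`."""
    records = split_records(text)
    records.sort(key=lambda rec: float(read_tags(rec)[tag]), reverse=reverse)
    return "\n".join("\n".join(rec) for rec in records) + "\n"
```

A record that lacks the requested tag raises `KeyError` here; a real tool would need a policy for such records, for example placing them last.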
Genetic variants discovery tool
Bioinformatics pipeline for discovery of genetic variants from NGS reads.
An open source framework for LC-MS based proteomics and metabolomics. OpenMS offers data structures and algorithms for the processing of mass spectrometry data. The library is written in C++.
Software for storing and analysing bacterial sequence data
The Bacterial Isolate Genome Sequence Database (BIGSdb) is a scalable, web-accessible database system designed to store and analyse linked phenotypic and genotypic information in a computationally efficient manner. Sequence data can range from single sequence reads to multiple contigs generated by whole genome sequencing technologies. The system incorporates the capacity to define and identify any number of loci and genetic variants at those loci within the stored nucleotide sequences. These loci can be further organised into schemes for isolate characterisation or for evolutionary or functional analyses. See Jolley and Maiden 2010, BMC Bioinformatics 11:595 (http://www.biomedcentral.com/1471-2105/11/595). You can report bugs or make enhancement requests using the issues tracker at https://github.com/kjolley/BIGSdb. The source code is also mirrored there.
Pascal Units for Medical Applications
The PUMA Repository is a collection of Pascal units for medical informatics. It contains reusable source code for a wide field of health-care application development. The code includes conversion functions for measurement units and an HL7 engine. PUMA is compatible with Lazarus and Free Pascal. Some of the units also support other Pascal implementations, including Delphi, winsoft Pocket Studio, and other compilers.
Convert genome coordinates between assemblies
CrossMap is a program for convenient conversion of genome coordinates and genome annotation files between assemblies (e.g. lifting from GRCh36/hg18 to GRCh37/hg19 or vice versa). It supports files in BAM, SAM, BED, Wiggle, BigWig, GFF, and GTF formats.
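Coordinate conversion between assemblies rests on a table of aligned blocks that says which interval in one assembly corresponds to which interval in the other. The Python sketch below shows that lookup in a deliberately simplified form: `Block` and `lift_position` are hypothetical names, coordinates are assumed to be 0-based and half-open, blocks are assumed sorted, non-overlapping, on one chromosome, and on the same strand. It is not CrossMap's implementation.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    src_start: int  # start in the source assembly (0-based, inclusive)
    src_end: int    # end in the source assembly (exclusive)
    dst_start: int  # start of the matching interval in the target assembly


def lift_position(blocks: List[Block], pos: int) -> Optional[int]:
    """Map a source position to the target assembly, or None if it is unaligned."""
    starts = [block.src_start for block in blocks]
    index = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if index < 0:
        return None
    block = blocks[index]
    if pos >= block.src_end:
        return None  # pos falls in a gap between aligned blocks
    return block.dst_start + (pos - block.src_start)


blocks = [Block(0, 100, 0), Block(150, 300, 120)]
print(lift_position(blocks, 160))  # inside the second block
print(lift_position(blocks, 120))  # in the gap between blocks
```

Positions that fall in a gap return `None` rather than a guess, which mirrors the usual practice of reporting unmappable features separately.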