Bowtie is an ultrafast, memory-efficient aligner for short DNA sequences (reads) from next-generation sequencers. Please cite: Langmead B, et al. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25.
A cross-platform statistical package for econometric analysis
gretl is a cross-platform software package for econometric analysis, written in the C programming language.
SciDAVis is a user-friendly data analysis and visualization program primarily aimed at high-quality plotting of scientific data. It strives to combine an intuitive, easy-to-use graphical user interface with powerful features such as Python scriptability.
An easy to use Java program that allows you to digitize data points off of scanned plots, scaled drawings, or orthographic photographs. Includes an automatic digitization feature that can automatically digitize many types of functional data.
Quantitative Content Analysis or Text Mining
KH Coder is free software for quantitative content analysis or text data mining. It is also used in computational linguistics. You can analyze Japanese, English, French, German, Italian, Portuguese and Spanish text with KH Coder. Catalan, Chinese (simplified), Korean, Russian and Slovenian language data can also be analyzed with the latest alpha release (Version 3). KH Coder provides various kinds of search and statistical analysis functions using back-end tools such as Stanford POS Tagger, FreeLing, Snowball stemmer, MySQL and R.
Open source Health IT for the planet
OpenMRS is a community-developed, open source, enterprise electronic medical record system. Our mission is to improve health care delivery in resource-constrained environments by coordinating a global community to create and support this software.
GATE (General Architecture for Text Engineering) is an architecture, framework and development environment for developing, evaluating and embedding Human Language Technology. See http://gate.ac.uk for full details.
FormScanner - Free OMR Software
FormScanner is OMR (Optical Mark Recognition) software that automatically marks multiple-choice papers. FormScanner does not bind you to a default form template; instead, it lets you use a custom template created from a simple scan of a blank form. The forms can be scanned as images with an ordinary scanner and processed with FormScanner. All the collected information can be easily exported to a spreadsheet.
Data quality analysis, profiling, cleansing, duplicate detection +more
DataCleaner is a data quality analysis application and a solution platform for DQ solutions. Its core is a strong data profiling engine, which is extensible and thereby adds data cleansing, transformations, enrichment, deduplication, matching and merging. Website: http://datacleaner.github.io
Data analysis and visualization for Excel, for free
XL Toolbox is a free Excel statistics add-in that helps you analyze and present data: smart custom error bars, chart design, chart export to TIFF; formula builder, transpose wizard, analysis of variance (ANOVA); automatic backups, workbook management and more.
IT++ is a C++ library of mathematical, signal processing and communication classes and functions. Its main use is in simulation of communication systems and for performing research in the area of communications.
GridLAB-D is a new power system simulation tool that provides valuable information to users who design and operate electric power transmission and distribution systems, and to utilities that wish to take advantage of the latest smart grid technology. It incorporates advanced modeling techniques with high-performance algorithms to deliver the latest in end-use load modeling technology integrated with three-phase unbalanced power flow, and retail market systems. Historically, the inability to effectively model and evaluate smart grid technologies has been a barrier to adoption; GridLAB-D is designed to address this problem. User documentation can be found at: http://gridlab-d.shoutwiki.com/wiki/Quick_links The source code is available from GitHub. See https://github.com/gridlab-d/gridlab-d. Issue tracking is handled by GitHub. See https://github.com/gridlab-d/gridlab-d/issues.
QtiPlot is a user-friendly, platform independent data analysis and visualization application similar to the non-free Windows program Origin.
PriEsT is a decision making tool for Analytic Hierarchy Process (AHP).
Priority Estimation Tool (PriEsT) is a decision analysis tool. You can use it to rank the options you have or, alternatively, for resource allocation (budgeting) problems. In PriEsT, you enter a list of available options and then define your criteria for prioritization. After defining the criteria, PriEsT lets you enter your judgements against each criterion, which are then used to calculate the final ranking (or weights). Please cite this if you find it useful: Siraj, S., Mikhailov, L. and Keane, J. A. (2015), "PriEsT: an interactive decision support tool to estimate priorities from pairwise comparison judgments". International Transactions in Operational Research. 22: 217–235. doi:10.1111/itor.12054
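As a rough illustration of how pairwise comparison judgements can be turned into weights, here is a minimal Python sketch using the row geometric mean method, one common AHP prioritization technique. It is not taken from PriEsT and may differ from the methods PriEsT implements; the function name `ahp_priorities` and the sample matrix are hypothetical.

```python
import math


def ahp_priorities(matrix):
    """Estimate priority weights from a pairwise comparison matrix.

    matrix[i][j] says how strongly option i is preferred over option j,
    with matrix[j][i] == 1 / matrix[i][j]. Uses the row geometric mean
    method (an assumption for this sketch, not PriEsT's documented method).
    """
    n = len(matrix)
    # Geometric mean of each row.
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    # Normalize so the weights sum to 1.
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical, perfectly consistent judgements for three options:
# A is twice as good as B, and B is twice as good as C.
judgements = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_priorities(judgements)
print([round(w, 3) for w in weights])  # [0.571, 0.286, 0.143]
```

For a consistent matrix like this one, the weights simply recover the underlying ratios (4/7, 2/7, 1/7); with inconsistent judgements, different prioritization methods can give different rankings.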
IRAMUTEQ: Interface de R pour les Analyses Multidimensionnelles de Textes et de Questionnaires (R interface for multidimensional analyses of texts and questionnaires). Data processing software for text corpora or for individuals/characters-type data. In particular, it can perform "ALCESTE"-type analyses.
Map your path to clean data with an open source data profiling tool.
Map your path to clean data with Open Studio for Data Quality, the leading open source data profiling tool. Open Studio for Data Quality easily connects to hundreds of data sources and generates analysis to help define the next steps to clean data. Evaluate data quality against custom-defined thresholds, and measure conformance to internal standards such as SKU or external standards such as postal codes. Find out how to connect data with fuzzy matching or correlation analytics. Millions of downloads and a full range of robust, open source integration software tools have made Talend the open source leader in cloud and big data integration.
Java Modelling Tools is a suite for performance evaluation and modelling. Queuing Network models are solved with analytical, asymptotic and simulation methods; workload is characterized using clustering techniques.
Arbitrary-precision CRC calculator and algorithm finder
CRC RevEng is a portable, arbitrary-precision CRC calculator and algorithm finder. It calculates CRCs using any of the 102 preset algorithms, or a user-specified algorithm to any width. It calculates reversed CRCs to give the bit pattern that produces a desired forward CRC. CRC RevEng also reverse-engineers any CRC algorithm from sufficient correctly formatted message-CRC pairs and optional known parameters. It comprises powerful input interpretation options. Compliant with Ross Williams' Rocksoft(tm) model of parametrised CRC algorithms.
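To make the parameter model concrete, the following Python sketch computes a CRC bit by bit from the usual Rocksoft-style parameters (width, polynomial, initial value, input/output reflection, final XOR). It is an independent illustration, not CRC RevEng's code or command-line interface; the function names `reflect` and `crc` are hypothetical. The parameter sets used in the example are the widely published ones for CRC-32 and CRC-16/CCITT-FALSE.

```python
def reflect(value: int, width: int) -> int:
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def crc(data: bytes, width: int, poly: int, init: int,
        refin: bool, refout: bool, xorout: int) -> int:
    """Bit-at-a-time CRC over `data` for an arbitrary register width."""
    mask = (1 << width) - 1
    top_bit = 1 << (width - 1)
    reg = init & mask
    for byte in data:
        if refin:
            byte = reflect(byte, 8)  # feed each byte least significant bit first
        for i in range(7, -1, -1):
            bit_in = (byte >> i) & 1
            top = 1 if reg & top_bit else 0
            reg = (reg << 1) & mask
            if top ^ bit_in:
                reg ^= poly
    if refout:
        reg = reflect(reg, width)
    return (reg ^ xorout) & mask


# Standard check string "123456789" with two well-known parameter sets.
crc32 = crc(b"123456789", 32, 0x04C11DB7, 0xFFFFFFFF, True, True, 0xFFFFFFFF)
crc16 = crc(b"123456789", 16, 0x1021, 0xFFFF, False, False, 0x0000)
print(hex(crc32), hex(crc16))  # 0xcbf43926 0x29b1
```

Processing one bit at a time is slow compared with table-driven code, but it keeps the sketch valid for any width, including widths that are not a multiple of 8.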
easyrec is a recommender system that aims for easy integration of recommendations into web applications. It has a web based admin tool, and its recommendation engine is accessible through a REST API, providing methods like 'other users also bought'
World's first open source data quality & data preparation project
This project is dedicated to open source data quality and data preparation solutions. Data quality includes profiling, filtering, governance, similarity checks, data enrichment, alteration, real-time alerting, basket analysis, bubble charts, warehouse validation, single customer view, etc., as defined by Strategy. The tool is developing into a high-performance integrated data management platform that will seamlessly handle Data Integration, Data Profiling, Data Quality, Data Preparation, Dummy Data Creation, Metadata Discovery, Anomaly Discovery, Data Cleansing, Reporting and Analytics. It also has Hadoop (big data) support to move files to/from a Hadoop grid and to create, load and profile Hive tables. This project is also known as "Aggregate Profiler". A RESTful API for this project is being built (beta version) at https://sourceforge.net/projects/restful-api-for-osdq/ and Apache Spark based data quality is being built at https://sourceforge.net/projects/apache-spark-osdq/
Unicode-XML-TEI text/corpus analysis platform
TXM is a free and open-source cross-platform Unicode & XML based text/corpus analysis environment and graphical client, supporting Windows, Linux and Mac OS X. It can also be used online as a J2EE standard compliant web portal (GWT based) with access control built in. Download the latest version of TXM: http://textometrie.ens-lyon.fr/spip.php?rubrique61&lang=en TXM offers a comprehensive range of analysis tools (concordances, collocate search, frequency lists, etc.) based on the powerful CQP full text search engine (http://cwb.sourceforge.net) and a range of statistical functions (factorial analysis, classification, co-occurrence analysis, etc.) based on R packages (http://www.r-project.org). Read the scientific background at the Textométrie project web site http://textometrie.ens-lyon.fr/?lang=en. Read a full description at the TEI Tools wiki http://wiki.tei-c.org/index.php/TXM.
Framework for text mining, data integration and data analysis. Keywords: ontology and graph alignment, relation mining, warehouse, semantic database integration, bioinformatics, systems biology, microarray, Java.
libcrn is a document image processing library written in C++11 for Linux, Windows, Mac OS X and Google Android. It is a toolbox for easily creating software such as OCR engines and layout analysis tools.
OSRA is a utility designed to convert graphical representations of chemical structures and reactions, as they appear in journal articles, patent documents, textbooks, trade magazines etc., into SMILES or SD file format, a computer-recognizable representation of molecular structure. You can find links to the binary executables here: https://sourceforge.net/p/osra/wiki/Download/
Seasonal/Sequential (Instants/Durations, Even or not) Time Series
Objects to manipulate sequential and seasonal time series. Sequential time series based on time instants and time durations are handled. Both can be regularly or unevenly spaced (overlapping durations are allowed). Only POSIX* formats are used for dates and times. The following classes are provided: POSIXcti, POSIXctp, TimeIntervalDataFrame, TimeInstantDataFrame, SubtimeDataFrame. Methods to switch from one class to another and to modify the time support of a series (hourly time series to daily time series, for instance) are also defined. The tools provided can be used, for instance, to handle environmental monitoring data (not always produced on a regular time base).