World's first open source data quality & data preparation project
This project is dedicated to open source data quality and data preparation solutions. Data quality includes profiling, filtering, governance, similarity checks, data enrichment and alteration, real-time alerting, basket analysis, bubble charts, warehouse validation, single customer view, etc., as defined by strategy. The project is developing a high-performance integrated data management platform that will seamlessly handle data integration, data profiling, data quality, data preparation, dummy data creation, metadata discovery, anomaly discovery, data cleansing, reporting and analytics. It also has Hadoop (big data) support to move files to and from a Hadoop grid and to create, load, and profile Hive tables. This project is also known as "Aggregate Profiler". A RESTful API for this project is being built (beta version) at https://sourceforge.net/projects/restful-api-for-osdq/, and Apache Spark based data quality is being built at https://sourceforge.net/projects/apache-spark-osdq/.
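Profiling is the first capability listed above. The following is a minimal sketch of what a single-column profile computes. It is not osDQ's own API. It assumes a column arrives as a list of strings and treats both `null` and the empty string as missing. The class name `ColumnProfile` is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical single-column profile: row count, null/empty count, distinct non-empty values. */
final class ColumnProfile {
    final int rows;
    final int nulls;
    final int distinct;

    private ColumnProfile(int rows, int nulls, int distinct) {
        this.rows = rows;
        this.nulls = nulls;
        this.distinct = distinct;
    }

    /** Treats both null and "" as missing; every other value counts toward the distinct set. */
    static ColumnProfile profile(List<String> column) {
        int nulls = 0;
        Set<String> seen = new HashSet<>();
        for (String value : column) {
            if (value == null || value.isEmpty()) {
                nulls++;
            } else {
                seen.add(value);
            }
        }
        return new ColumnProfile(column.size(), nulls, seen.size());
    }

    public static void main(String[] args) {
        List<String> city = Arrays.asList("Paris", "Oslo", null, "Paris", "");
        ColumnProfile p = profile(city);
        // prints "5 rows, 2 null/empty, 2 distinct"
        System.out.println(p.rows + " rows, " + p.nulls + " null/empty, " + p.distinct + " distinct");
    }
}
```

A real profiler would add further metrics per column, such as min/max, pattern frequencies, and value-length histograms. This sketch keeps only the counts that almost every profile reports.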
ETL engine based on Groovy
P.S. Dear friends: the repository has migrated to https://github.com/ascrus/getl . You can download the jar file from this site or from Maven. GETL is a Groovy-based package that automates the work of loading and transforming data. Its name is an acronym for «Groovy ETL». GETL is a set of libraries of pre-built classes and objects that can be used to solve data unpacking, transformation, and loading problems in programs written in Groovy or Java, as well as from any software that can work with Java classes. The development of GETL took the following ideas and requirements into account:
1. The simpler the class hierarchy, the easier the solution;
2. Data structures tend to change over time or may not be known in advance, and working with them must still be supported;
3. All routine ETL work should be automated wherever possible;
4. Compiling code on the fly provides speed and leaves headroom for optimization;
5. A well-thought-out class hierarchy guarantees easy connection of other open source solutions.
osDQ project dedicated to creating Apache Spark based data quality
This is an offshoot of the open source data quality (osDQ) project (https://sourceforge.net/projects/dataquality/). This subproject will create Apache Spark based data quality and data preparation features for big data. It uses the Java API of Apache Spark.
Java tools for decoding and manipulating BER encoded ASN.1 Files
A simple Java ASN.1 BER decoder and profiler: a tool for easy manipulation of BER encoded files. An "awk" for ASN.1 BER (for Unix people), or maybe a "notepad" for ASN.1 BER (for Windows people). JBerd (Java BER decoder) is a lightweight BER decoder with associated tools for interpreting and processing BER encoded ASN.1 files. The following facilities are provided:
• JBerd Profiler. A tool for profiling the contents of BER encoded files.
• JBerd Flattener. A tool for converting BER encoded files to flat files for processing by other facilities.
• JBerd Decoder objects. A set of Java facilities for writing BER applications that require BER decoding.
Go to the "Files" section (link at the top of this page) to download a PDF of the detailed documentation. Andrew Forsyth
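JBerd's own decoder objects are not shown in this description. The sketch below is an independent illustration of the core work a BER profiler performs: walking tag-length-value (TLV) triples as defined in ITU-T X.690 and descending into constructed encodings. Some limitations are deliberate simplifications. Indefinite lengths are rejected. Truncated input surfaces as an `ArrayIndexOutOfBoundsException`. The names `BerWalker` and `Tlv` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal BER walker: lists every TLV header, recursing into constructed values. */
final class BerWalker {
    static final class Tlv {
        final int depth;
        final int tagClass;        // 0 universal, 1 application, 2 context-specific, 3 private
        final boolean constructed; // bit 6 (0x20) of the first identifier octet
        final int tagNumber;
        final int length;          // definite length of the value, in octets
        final int valueOffset;     // index of the first value octet in the input

        Tlv(int depth, int tagClass, boolean constructed, int tagNumber, int length, int valueOffset) {
            this.depth = depth;
            this.tagClass = tagClass;
            this.constructed = constructed;
            this.tagNumber = tagNumber;
            this.length = length;
            this.valueOffset = valueOffset;
        }

        @Override
        public String toString() {
            return "  ".repeat(depth) + "[" + tagClass + (constructed ? " cons " : " prim ")
                    + tagNumber + "] len=" + length;
        }
    }

    /** Decodes data[from, to) as a sequence of TLVs and appends their headers to out. */
    static void walk(byte[] data, int from, int to, int depth, List<Tlv> out) {
        int pos = from;
        while (pos < to) {
            int first = data[pos++] & 0xFF;
            int tagClass = first >>> 6;
            boolean constructed = (first & 0x20) != 0;
            int tagNumber = first & 0x1F;
            if (tagNumber == 0x1F) { // high-tag-number form: base-128 octets, high bit = "more follow"
                tagNumber = 0;
                int b;
                do {
                    b = data[pos++] & 0xFF;
                    tagNumber = (tagNumber << 7) | (b & 0x7F);
                } while ((b & 0x80) != 0);
            }
            int lenByte = data[pos++] & 0xFF;
            int length;
            if (lenByte < 0x80) {           // short form: length fits in 7 bits
                length = lenByte;
            } else if (lenByte == 0x80) {
                throw new IllegalArgumentException("indefinite length not supported in this sketch");
            } else {                         // long form: low 7 bits give the number of length octets
                int n = lenByte & 0x7F;
                if (n > 4) throw new IllegalArgumentException("length too large for this sketch");
                length = 0;
                for (int i = 0; i < n; i++) {
                    length = (length << 8) | (data[pos++] & 0xFF);
                }
                if (length < 0) throw new IllegalArgumentException("length overflow");
            }
            if (length > to - pos) throw new IllegalArgumentException("TLV overruns its container");
            out.add(new Tlv(depth, tagClass, constructed, tagNumber, length, pos));
            if (constructed) {
                walk(data, pos, pos + length, depth + 1, out);
            }
            pos += length;
        }
    }

    public static void main(String[] args) {
        // SEQUENCE { INTEGER 5, OCTET STRING "hi" }
        byte[] ber = {0x30, 0x07, 0x02, 0x01, 0x05, 0x04, 0x02, 0x68, 0x69};
        List<Tlv> tlvs = new ArrayList<>();
        walk(ber, 0, ber.length, 0, tlvs);
        tlvs.forEach(System.out::println);
    }
}
```

Counting the resulting headers by tag and depth gives a crude profile of a file. Writing one row per primitive TLV is roughly what a flattener would do. Both are illustrations of the idea, not descriptions of how JBerd implements them.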
ETL Converter is a migration tool that builds open source ETL projects from existing projects made with proprietary software. The first version converts DataStage projects into Talend Open Studio projects. Other sources/targets will be available later.
Simple and easy ETL tool useful for small data warehouse projects. Written in Java.
Trauma registry suite: a data collection application and server scripts to build a trauma data warehouse and perform web-based analysis and reporting. Cross-platform compatible with Windows, Apple, Unix, and Linux.
Diffs, patches, and revision control for CSV files, spreadsheets, and databases.
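Row-level diffing is the basic operation behind CSV diffs and patches. The following is a minimal sketch, not this project's algorithm. It makes two assumptions: the first column is a unique key, and fields contain no quoted commas. The split is deliberately naive and not RFC 4180 parsing. If keys repeat, the last row with a given key wins. The class name `CsvKeyDiff` and the `+`/`-`/`~` output markers are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical keyed CSV diff: first column is the key, rows compared as whole lines. */
final class CsvKeyDiff {
    /** Maps key -> full row, preserving input order. Naive split: no quoted-field handling. */
    static Map<String, String> index(List<String> lines) {
        Map<String, String> rows = new LinkedHashMap<>();
        for (String line : lines) {
            rows.put(line.split(",", -1)[0], line);
        }
        return rows;
    }

    /** Emits "- row" for removed keys, "~ old -> new" for changed rows, "+ row" for added keys. */
    static List<String> diff(List<String> before, List<String> after) {
        Map<String, String> a = index(before);
        Map<String, String> b = index(after);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : a.entrySet()) {
            String newRow = b.get(e.getKey());
            if (newRow == null) {
                out.add("- " + e.getValue());
            } else if (!newRow.equals(e.getValue())) {
                out.add("~ " + e.getValue() + " -> " + newRow);
            }
        }
        for (Map.Entry<String, String> e : b.entrySet()) {
            if (!a.containsKey(e.getKey())) {
                out.add("+ " + e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> v1 = List.of("id,name,qty", "1,apple,3", "2,pear,5");
        List<String> v2 = List.of("id,name,qty", "1,apple,4", "3,plum,2");
        // prints "~ 1,apple,3 -> 1,apple,4", then "- 2,pear,5", then "+ 3,plum,2"
        diff(v1, v2).forEach(System.out::println);
    }
}
```

Keying on a column, instead of diffing line by line, keeps the diff stable when rows are reordered. That is the usual reason tabular diff tools take this approach.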
The Aspen Content Management System (ACMS) implements a web-based centralized source for dissemination and collection of digital audio content. It features a cash-in and cash-out system using the PayPal Mass Pay API to buy, sell, and trade audio content.
Inventory, manufacturing, and sales (POS) automation suite consisting of software written in Java 6 and custom-developed hardware.
Danatomy is a framework for setting up processes for the extraction, analysis, quality assurance, cleansing and transformation of data in RDBMSs. The processes may be interactive, batch or mixed, and their output may be reports or data written back to the RDBMS.
Farmers Helper helps farmers organize their livestock, feed, and employees.
data organising system for arbitrary files
Uses alternate data streams to provide arbitrary tagging and searching of files within NTFS and other modern file systems that support alternate data streams. A customisable vocabulary provides a searchable, standardised tagging system for file associations. Unlike most other file archiving systems, no additional database is required, and the system is robust: file attributes persist through renaming, moving, copying, modifying, etc. I would appreciate your opinions, suggestions, and bug reports.
IdeoReport is a Java-based set of packages that allows report generation in a variety of output formats, including xls, pdf, jpeg, xml, csv and html. It can be integrated into existing applications (Java and non-Java) via different connectors.
JBelt :: link your design to the business
JBelt is a collection of procedures for creating a web-based PLM system that connects CAD applications to the ERP database. It runs on JBoss and PostgreSQL and is developed in Java on the JBoss Seam framework.
The goal of this project is to provide java based libraries for core data mining algorithms. Most of the free implementations on the web are not robust/mature/scalable. This project aims at providing robust code that scales well for huge data sets.
Kaku is an enterprise resource planning system with client base, product management and invoice processing. The standard language is German.
Kommerce is software for small businesses that want to manage their sales and stock. It is designed to be ready-to-use software and follows a "keep it easy" approach.
LIM helps a company administer data about media (such as CDs), software products, and licenses. To do so, one can store data about machines, licensing models, etc. Furthermore, LIM supports people dealing with hardware/software leasing administration. Us
Monitors web pages for changes and emails the differences to subscribers. Supports user accounts and registration. PHP/MySQL.
Meter Data Management written in Erlang -- This is an experiment based on use cases that I know well. Erlang should work well for real-time metering.
NERPA is a clusterable/distributable/scatterable open ERP/CRM/project management software, targeted at public application services, warehouse centers, and frontend/backend corporate IT infrastructure.
OSTL - Open Source Transformation and Load: tools for data transformation and loading into a data warehouse (or other data repository), using Oracle technologies and XML.
Open source reporting server and client interface. It can combine multiple data sources (e.g. databases, web services, ...), and the result can be rendered in nearly every output format (office/pdf/...).
Storage terminal software, aimed at developing a warehouse control system. The software is licensed under the CDDL, an Open Source Initiative (OSI) approved license, and is developed following open standards.