Free open source ETL software for data integration anywhere.
Expand your open source stack with a free open source ETL tool for data integration and data transformation anywhere. Work with the latest cloud applications and platforms or traditional databases and applications using Open Studio for Data Integration to design and deploy quickly with graphical tools, native code generation, and 100s of pre-built components and connectors. Open Studio for Data Integration is fully open source, so you can see the code and work with it. Embed existing Java code libraries, create your own components or leverage community components and code to extend your project. Millions of downloads and a full range of robust, open source integration software tools have made Talend the open source leader in cloud and big data integration.
A high performance, open source, data replication engine for MySQL
Tungsten Replicator is a high performance, free and open source replication engine that supports a variety of extractor and applier modules. Data can be extracted from MySQL, Oracle and Amazon RDS, and applied to numerous transactional stores and datawarehouse stores (MySQL, Oracle, and Amazon RDS; NoSQL stores such as MongoDB; Vertica, Hadoop, and Amazon RDS). Tungsten Replicator helps technically focused users solve a host of problems and offers features that surpass those of most other open source replicators. During replication, Tungsten Replicator allows data to be exchanged between different databases and database versions, information can be filtered and modified, and deployments can span on-premises and cloud-based databases. It supports parallel replication and advanced topologies such as fan-in and multi-master. It can also be used efficiently in cross-site deployments.
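The extract, filter and apply flow described above, combined with a fan-in topology, can be sketched in a few lines of Python. This is only a conceptual illustration and not Tungsten Replicator's API or configuration: the `Event` class, the function names, the site names and the in-memory `target` dictionary are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One replicated row change (hypothetical structure)."""
    source: str
    table: str
    row: dict


def extract(source, rows):
    """Turn (table, row) pairs from one source into events."""
    for table, row in rows:
        yield Event(source, table, row)


def drop_tables(events, ignored):
    """Filter stage: skip events for tables that should not be replicated."""
    for event in events:
        if event.table not in ignored:
            yield event


def mask_column(events, column):
    """Modify stage: replace the value of one column before applying."""
    for event in events:
        if column in event.row:
            yield Event(event.source, event.table, {**event.row, column: "***"})
        else:
            yield event


def apply(events, target):
    """Applier stage: append each row to the target, tagged with its origin."""
    for event in events:
        target.setdefault(event.table, []).append({**event.row, "_source": event.source})


# Fan-in: two hypothetical source sites feed a single target store.
sources = {
    "site_a": [("orders", {"id": 1, "email": "a@example.com"}), ("audit_log", {"id": 9})],
    "site_b": [("orders", {"id": 2, "email": "b@example.com"})],
}
target = {}
for name, rows in sources.items():
    pipeline = mask_column(drop_tables(extract(name, rows), {"audit_log"}), "email")
    apply(pipeline, target)

print(target)
```

Each stage is a generator, so filters can be chained in any order between the extractor and the applier without either end knowing about them.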
Alfresco Audit Analysis and Reporting (A.A.A.R.) is a solution to extract, store and query audit data together with document and folder information at a very detailed level, with the goal of making that data easy for the end-user to work with. To reach that goal, the data are published in reports in well-known formats (pdf, Microsoft Excel, csv, etc.) and stored directly in Alfresco as static documents that are organized in folders, versioned, authorized and published. On top of the A.A.A.R. solution, A.A.A.R. Analytics is a set of powerful tools to analyze data in an interactive and customizable way, with a user console composed of dashboards, reports and free analysis.
Java utility that reads the metadata from table(s)
Dbmetadata is a Java utility that reads the metadata from table(s) in a specified database and creates the Informatica XML to import into the repository. I created this utility when we were migrating to a new platform and needed a quick way to create flatfile and relational sources and targets that matched the DDL of the table. I also needed to use shortcuts. If you use the import table list, it will create one XML file with all of the tables and shortcuts (if a shortcut folder is specified) for the requested output type and database/file type.
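As a rough illustration of the idea (reading column metadata from a table and emitting an XML description of it), here is a minimal Python sketch. It is not Dbmetadata's code: it assumes an in-memory SQLite database instead of JDBC, and the `SOURCE` and `FIELD` element names and their attributes are hypothetical placeholders, not the actual Informatica import format.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_metadata_to_xml(conn, table):
    """Read column metadata for one table and return it as an XML string."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    if not rows:
        raise ValueError(f"no such table: {table}")
    source = ET.Element("SOURCE", NAME=table)  # hypothetical element names
    for _cid, name, col_type, notnull, _default, pk in rows:
        ET.SubElement(
            source,
            "FIELD",
            NAME=name,
            DATATYPE=col_type,
            NULLABLE="NOTNULL" if notnull else "NULL",
            KEYTYPE="PRIMARY KEY" if pk else "NOT A KEY",
        )
    return ET.tostring(source, encoding="unicode")


# Demo table; in practice the connection would point at an existing database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)
xml_text = table_metadata_to_xml(conn, "customer")
print(xml_text)
```

Looping over a list of table names and appending each `SOURCE` element to a common root would give the "one XML file with all of the tables" behaviour described above.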
Data Vault loading automation using Pentaho Data Integration.
A metadata-driven 'tool' to automate loading a designed Data Vault. It consists of a set of Pentaho Data Integration and database objects. The Virtual Machine (VMware) is a 64 bit Ubuntu Server 14.04, with MySQL (Percona Server) and PostgreSQL 9.4 as the database flavours and PDI version 5.2 CE. NB: Directory version_2.4 contains the most recent Virtual Machine. The readme.txt contains info about that VM.
webStraktor is a programmable World Wide Web data extraction client. Its purpose is to scrape HTML-based content via the HTTP protocol and extract relevant information. webStraktor features a scripting language to facilitate the collection, extraction and storage of information available on the web, including images. The scripting language uses elements of the Regular Expression and XPath syntax. The webStraktor scripting language has a small instruction set and its syntax is easy to master. The standard webStraktor output format is XML-based, encoded in ASCII, UTF-8 or ISO-8859-1 (Latin1). webStraktor relies on the Apache HttpClient for retrieving content via the HTTP protocol. It adheres to the Robots Exclusion Protocol and it can be configured to operate anonymously by connecting through the predominant types of web proxy servers. webStraktor extends the functionality of web crawlers, spiders or bots by integrating scraping and crawling capabilities.
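The combination of regular-expression extraction, Robots Exclusion Protocol checks and XML output can be sketched with the Python standard library. This is not webStraktor's scripting language or implementation; the sample page, the robots.txt rules, the `demo-bot` agent name and the `links`/`link` output elements are assumptions made for the example, and the HTML is supplied as a string so that nothing is fetched over the network.

```python
import re
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt and page content, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

HTML = """\
<html><body>
<h1>Catalogue</h1>
<a href="/items/1">First item</a>
<a href="/private/2">Hidden item</a>
<img src="/img/logo.png">
</body></html>
"""


def allowed_links(html, robots_txt, base="http://example.com", agent="demo-bot"):
    """Extract (href, text) pairs with a regex, keeping only links robots.txt allows."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    links = re.findall(r'<a\s+href="([^"]+)"[^>]*>([^<]*)</a>', html)
    return [(href, text) for href, text in links if rules.can_fetch(agent, base + href)]


def to_xml(links):
    """Serialize the extracted links as a small XML document."""
    root = ET.Element("links")
    for href, text in links:
        ET.SubElement(root, "link", href=href).text = text
    return ET.tostring(root, encoding="unicode")


result = allowed_links(HTML, ROBOTS_TXT)
print(to_xml(result))
```

A regular expression is enough for this tiny fixed snippet; for real pages an HTML parser or XPath engine is usually more robust than pattern matching on markup.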
Applications for data management
"Information is data in action", and, consequently, having good quality data is essential. The AESTEL package contains two highly configurable applications for data management: a data loader and a reporting application, named DataLoader and AEREA, respectively. The data loader application applies user-defined instructions to validate, process and load data. The reporting application provides a query builder and spreadsheet template designer. Both applications work with any relational data model (Postgres and Oracle have been tested). The two applications were initially developed for small molecule drug discovery research. However, they can be extended for use in other data domains.
A financial math library and financial market data database
This project aims to combine a financial mathematics library with an underlying financial market database (and a set of other tools). It could be used by financial institutions for their financial market data needs as well as by students for research work.
NERPA is a clusterable/distributable/scatterable open ERP/CRM/ProjectManagement software, targeted at public application services, warehouse centers and frontend/backend corporate IT infrastructure.
This software integrates CIFS shares with a Microsoft Windows Server domain. It is intended for system administrators who have to manage user permissions on CIFS shares with an automount script at user logon.
SmartStock is a student project that aims to improve the process of lending and managing the devices that the ECE lends to students for their projects. It is a plugin for Booked (https://sourceforge.net/projects/phpscheduleit/). It lets managers use QR codes to identify every device, to prevent loss and to ease the lending process.
GPU Analytic Database
A SQL-based analytic engine running on an NVIDIA GPU for exceptional performance. We see a performance increase of over 700x compared with a well-known database on the same machine.