Page 3 | big data free download

Showing 110 open source projects for "big data"

1

Snowplow Analytics

Enterprise-strength marketing and product analytics platform

Snowplow is ideal for data teams who want to manage the collection and warehousing of data across all their platforms and products.

Downloads: 0 This Week

Last Update: 2022-01-31
See Project
2

LogicalSets

Integrated Comprehensive Data Architecture & Methodology

This is an advanced data architecture and methodology: a comprehensive Enterprise Resource Management System built on a re-usable database with rules for customization. While being a data-driven transaction processing engine, this system has very advanced reporting capabilities. This design eliminates up to 90% of business logic due to the way the data is structured. It uses a concept called Table Sets and has a compound key that tells the programmer what tableset, which record which applet...

Downloads: 0 This Week

Last Update: 2021-12-06
See Project
3

TensorBase

TensorBase is a new big data warehouse built with modern efforts

...TensorBase is clearly opposed to forking communities, reinventing wheels, or hacking traffic for so-called reputation (like GitHub stars). After some thought, we decided to temporarily leave the general data warehousing field. If you want to learn how a database system can be built, how to apply modern Rust to the high-performance field, or how to embed a lightweight data analysis system into your own bigger one, you can still try, ask about, or contribute to TensorBase. The committers are still around the community. We will help you with all kinds of interesting things pursued in the project by us and maybe you. ...

Downloads: 0 This Week

Last Update: 2022-07-25
See Project
4

Learn Julia the Hard Way

Learn Julia the hard way

The Julia base package is pretty big, although at the same time, there are lots of other packages around to expand it with. The result is that on the whole, it is impossible to give a thorough overview of all that Julia can do in just a few brief exercises. Therefore, I had to adopt a little 'bias', or 'slant' if you please, in deciding what to focus on and what to ignore. Julia is a technical computing language, although it does have the capabilities of any general-purpose language and...

Downloads: 0 This Week

Last Update: 2023-11-04
See Project
5

SZT-bigdata

SZT‑bigdata is an open source project

SZT‑bigdata is an open-source project analyzing real Shenzhen metro (subway) card usage data using big‑data frameworks like Spark, Hadoop, Hive, Kafka, Flink, ClickHouse, HBase, and Elasticsearch. It aims to explore transit passenger flow patterns and system optimization using a variety of Scala-based technologies.

Downloads: 0 This Week

Last Update: 2025-08-04
See Project
6

Open Source Data Quality and Profiling

World's first open source data quality & data preparation project

...It also had Hadoop (big data) support to move files to/from a Hadoop grid and to create, load, and profile Hive tables. This project is also known as "Aggregate Profiler". A RESTful API for this project is being built (beta version) at https://sourceforge.net/projects/restful-api-for-osdq/ and Apache Spark based data quality is being built at https://sourceforge.net/projects/apache-spark-osdq/

8 Reviews

Downloads: 1 This Week

Last Update: 2021-01-20
See Project
7

MyCAT

Active, high-performance open source database middleware

...Regarded as an enterprise-grade MySQL cluster, MyCAT can take the place of an expensive Oracle cluster. MyCAT is also a new type of database, which resembles a SQL server integrated with memory cache technology, NoSQL technology and HDFS big data. As a modern enterprise database product, MyCAT combines the traditional database with the new distributed data warehouse. In a word, MyCAT is a fresh new database middleware. MyCAT's objective is to smoothly migrate current stand-alone databases and applications to the cloud at low cost and to solve the bottleneck caused by the rapid growth of data storage and business scale.

Downloads: 2 This Week

Last Update: 2021-06-28
See Project
8

geometry-api-java

The Esri Geometry API for Java enables developers to write apps

The Esri Geometry API for Java can be used to enable spatial data processing in 3rd-party data-processing solutions. Developers of custom MapReduce-based applications for Hadoop can use this API for spatial processing of data in the Hadoop system. The API is also used by the Hive UDFs and could be used by developers building geometry functions for 3rd-party applications such as Cassandra, HBase, Storm and many other Java-based “big data” applications.

Downloads: 0 This Week

Last Update: 2023-06-12
See Project
9

wzd

Powerful storage server, designed for big data storage systems

wZD is a server written in Go language that uses a modified version of the BoltDB database as a backend for saving and distributing any number of small and large files, NoSQL keys/values, in a compact form inside micro Bolt databases (archives), with distribution of files and values in BoltDB databases depending on the number of directories or subdirectories and the general structure of the directories. Using wZD can permanently solve the problem of a large number of files on any POSIX...

Downloads: 0 This Week

Last Update: 2020-05-19
See Project
10

Custom Apache Big data Distribution

A Custom Apache Distribution including Spark and Hadoop, for Windows.

This distribution has been customized to work out of the box. Just download it and unzip it, then add the bin folders to the Path variable and set HADOOP_HOME, SPARK_HOME, and JAVA_HOME. That's it! You can then use Hadoop and Spark natively on Windows.

Downloads: 0 This Week

Last Update: 2020-03-11
See Project
11

inMap

Rich layers, better user experience, big data geographic visualization

inMap is a big data visualization library based on Baidu Map. It focuses on displaying scatter plots, heat maps, grids, and aggregations for big data. It is committed to making big data visualization easy to use.

Downloads: 0 This Week

Last Update: 2022-07-12
See Project
12

FastoRedis

Cross-platform open source Redis DB management tool

FastoRedis (a fork of FastoNoSQL) is a cross-platform open source Redis management tool (i.e., an admin GUI). It uses the same engine that powers Redis's redis-cli shell: everything you can write in the redis-cli shell, you can write in FastoRedis. The program runs on most Linux systems, as well as on Windows, Mac OS X, FreeBSD and Android, on desktops and embedded devices.

Downloads: 3 This Week

Last Update: 2019-10-25
See Project
13

An introduction to Data Analysis in R

A guide for learning the basic tools of data analysis with R

An Introduction to Data Analysis in R [Book] A guide for learning the basic tools of data analysis: process, visualize and learn from your data using R programming. This repository holds the necessary data sets for the book "An introduction to Data Analysis in R", to be published in the Springer series Use R!. The book can be purchased in XXX. The book is meant as an introductory guide to manipulating data sets in the Big Data paradigm. ...

Downloads: 0 This Week

Last Update: 2020-02-08
See Project
14

FastoNoSQL

FastoNoSQL is a GUI platform for NoSQL databases.

GUI management admin tool for: Redis, Memcached, SSDB, LevelDB, RocksDB, UnQLite, LMDB, UpscaleDB, ForestDB

Downloads: 10 This Week

Last Update: 2019-06-19
See Project
15

OME-3DR

3D cell reconstruction and quantitative analysis based on OME data

For big OME data analysis, we integrate commonly used quantitative methods, describe our novel strategies to quantify and analyze biological markers related to the cell or organelle spatial-coordinate model, and present open-source OME-3-Dimensional Reconstruction (OME-3DR), a flexible, programmable and batch-oriented tool based on OME data, for reconstructing 3-dimensional (3D) spatial conformations and conducting further analyses, such as the identification, counting, localization and tracking of bio-imaging markers, calculation of model-contour parameters for association analyses, establishing spatial-coordinate system, 3D co-localization analyses and so on. ...

Downloads: 0 This Week

Last Update: 2019-07-04
See Project
16

OCW Test - Out of Commerce Works

Program for out of commerce works detection

The OCW Test program is designed to assist in detecting out-of-commerce works, taking as reference a list of works from a specific bibliographic catalog. In this first version, the program operates on the identifiers of books in the library of the Complutense University of Madrid. However, the program can be adapted to work on any bibliographic catalog.

Downloads: 0 This Week

Last Update: 2019-03-24
See Project
17

apache spark data pipeline osDQ

osDQ project dedicated to creating an Apache Spark based data pipeline using JSON

This is an offshoot of the open source data quality (osDQ) project (https://sourceforge.net/projects/dataquality/). This sub-project will create an Apache Spark based data pipeline in which a JSON-based metadata file is used to run data processing, data pipeline, data quality, data preparation, and data modeling features for big data. It uses the Java API of Apache Spark.

Downloads: 0 This Week

Last Update: 2019-01-20
See Project
18

MarDRe

MapReduce-based tool to remove duplicate DNA reads

MarDRe is a de novo MapReduce-based parallel tool to remove duplicate and near-duplicate DNA reads through the clustering of single-end and paired-end sequences from FASTQ/FASTA datasets. This tool allows bioinformaticians to avoid analyzing unnecessary reads, reducing the time of subsequent procedures with the dataset. MarDRe is the Big Data counterpart of ParDRe (link above), which employs HPC technologies (i.e., hybrid MPI/multithreading) to reduce runtime on multicore systems. Instead, MarDRe takes advantage of the MapReduce programming model to significantly improve ParDRe performance on distributed systems, especially on cloud-based infrastructures. Written in pure Java to maximize cross-platform compatibility, MarDRe is built upon the open-source Apache Hadoop project, the most popular distributed computing framework for Big Data processing.

Downloads: 0 This Week

Last Update: 2019-01-23
See Project
19

HSRA

Hadoop spliced read aligner for RNA-seq data

...This tool allows bioinformatics researchers to efficiently distribute their mapping tasks over the nodes of a cluster by combining a fast multithreaded spliced aligner (HISAT2) with Apache Hadoop, which is a distributed computing framework for scalable Big Data processing. HSRA currently supports single-end and paired-end read alignments from FASTQ/FASTA datasets. Moreover, our tool uses the Hadoop Sequence Parser (HSP) library (link above) to efficiently read the input datasets stored on the Hadoop Distributed File System (HDFS), being able to process datasets compressed with Gzip and BZip2 codecs.

Downloads: 0 This Week

Last Update: 2019-01-23
See Project
20

X10

Performance and Productivity at Scale

...Both its modern, type-safe sequential core and simple programming model for concurrency and distribution contribute to making X10 a high-productivity language in the HPC and Big Data spaces. User productivity is further enhanced by providing tools such as an Eclipse-based IDE (X10DT). Implementations of X10 are available for a wide variety of hardware and software platforms ranging from laptops, to commodity clusters, to supercomputers.

Downloads: 3 This Week

Last Update: 2019-01-07
See Project
21

fooltrader

Quant framework for stock

Build a standard data schema, and then implement various connectors to import systems you are familiar with for analysis. fooltrader is a quantitative analysis trading system designed using big data technology, including data capture, cleaning, structuring, calculation, display, backtesting and trading. Its goal is to provide a unified framework for the whole market (stock, futures, bonds, foreign exchange, digital currency, macroeconomics, etc.) for research, backtesting, forecasting, and trading. ...

Downloads: 0 This Week

Last Update: 2022-06-02
See Project
22

Redis Desktop Manager

Cross-platform GUI management tool for Redis

Redis Desktop Manager is a fast, open source Redis database management application based on Qt 5. It's available for Windows, Linux and MacOS and offers an easy-to-use GUI to access your Redis DB. With Redis Desktop Manager you can perform some basic operations such as view keys as a tree, CRUD keys and execute commands via shell. It also supports SSL/TLS encryption, SSH tunnels and cloud Redis instances, such as: Amazon ElastiCache, Microsoft Azure Redis Cache and Redis Labs.

1 Review

Downloads: 0 This Week

Last Update: 2018-10-11
See Project
23

paralline

Big Data tool

Paralline executes a Python function (or lambda function) or a script over each line of huge text files in parallel processes, and aggregates the results into a list.

Downloads: 0 This Week

Last Update: 2018-09-04
See Project
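
The idea in the paralline description (apply a function to every line of a large text file in parallel processes and gather the results in a list) can be illustrated with the Python standard library. This is a minimal sketch and not paralline's own API: `map_lines` and `count_words` are hypothetical names chosen for this example, and the worker count and chunk size are arbitrary assumptions.

```python
from multiprocessing import Pool


def count_words(line):
    """Example per-line function: number of whitespace-separated tokens."""
    return len(line.split())


def map_lines(path, func, processes=2, chunksize=1000):
    """Apply func to every line of a text file and return the results as a list.

    With processes <= 1 the file is processed serially in this process;
    otherwise lines are streamed to a pool of worker processes.
    func must be picklable (e.g. a module-level function) for the pool path.
    """
    with open(path, encoding="utf-8") as handle:
        if processes <= 1:
            return [func(line) for line in handle]
        with Pool(processes) as pool:
            # imap reads the file lazily in chunks and preserves line order.
            return list(pool.imap(func, handle, chunksize=chunksize))
```

When run as a script on platforms that start workers by spawning, the call to `map_lines` should sit under an `if __name__ == "__main__":` guard, as with any `multiprocessing` code.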
24

Cosmos DB Spark

Apache Spark Connector for Azure Cosmos DB

...It also allows you to easily create a lambda architecture for batch-processing, stream-processing, and a serving layer while being globally replicated and minimizing the latency involved in working with big data.

Downloads: 0 This Week

Last Update: 2023-12-21
See Project
25

Vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python

Data science solutions, insights, dashboards, machine learning, deployment. We start at 100GB. Vaex is a high-performance Python library for lazy Out-of-Core data frames (similar to Pandas), to visualize and explore big tabular datasets. It calculates statistics such as mean, sum, count, standard deviation etc, on an N-dimensional grid for more than a billion (10^9) samples/rows per second.

Downloads: 0 This Week

Last Update: 2023-07-31
See Project
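
To make "statistics on a grid" concrete, the sketch below computes a per-bin mean on a one-dimensional grid in a single streaming pass, keeping only one sum and one count per bin in memory. It is plain Python written for illustration and is not Vaex's API or implementation; `binned_mean` and its parameters are hypothetical names.

```python
def binned_mean(xs, values, lower, upper, bins):
    """Mean of `values` in each of `bins` equal-width bins over [lower, upper).

    xs and values are consumed once, so they can be generators that read
    rows from disk; bins with no samples yield None.
    """
    sums = [0.0] * bins
    counts = [0] * bins
    width = (upper - lower) / bins
    for x, v in zip(xs, values):
        if lower <= x < upper:
            # Clamp to the last bin to guard against rounding at the edge.
            i = min(int((x - lower) / width), bins - 1)
            sums[i] += v
            counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Because only the accumulators are held in memory, the same pattern extends to more dimensions by indexing the accumulators with a tuple of bin indices.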