Showing 110 open source projects for "big data"

  • 1

    Snowplow Analytics

    Enterprise-strength marketing and product analytics platform

    Snowplow is ideal for data teams who want to manage the collection and warehousing of data across all their platforms and products.
    Downloads: 0 This Week
  • 2

    LogicalSets

    Integrated Comprehensive Data Architecture & Methodology

    This is an advanced data architecture and methodology: a comprehensive Enterprise Resource Management System built on a reusable database with rules for customization. While being a data-driven transaction processing engine, the system also has very advanced reporting capabilities. The design eliminates up to 90% of business logic because of the way the data is structured. It uses a concept called Table Sets, with a compound key that tells the programmer which table set, which record, which applet...
    Downloads: 0 This Week
  • 3

    TensorBase

    TensorBase is a new big data warehouse built with modern techniques

    ...TensorBase is firmly opposed to forking communities, reinventing wheels, or gaming traffic for so-called reputation (such as GitHub stars). After some thought, we decided to temporarily leave the general data warehousing field. People who want to learn how a database system is built, how to apply modern Rust to high-performance computing, or how to embed a lightweight data analysis system into a larger one can still try, ask about, or contribute to TensorBase. The committers are still active in the community and will help with the interesting things pursued in the project. ...
    Downloads: 0 This Week
  • 4

    Learn Julia the Hard Way

    Learn Julia the hard way

    The Julia base package is pretty big, although at the same time, there are lots of other packages around to expand it with. The result is that on the whole, it is impossible to give a thorough overview of all that Julia can do in just a few brief exercises. Therefore, I had to adopt a little 'bias', or 'slant' if you please, in deciding what to focus on and what to ignore. Julia is a technical computing language, although it does have the capabilities of any general-purpose language and...
    Downloads: 0 This Week
  • 5

    SZT-bigdata

    SZT‑bigdata is an open source project

    SZT‑bigdata is an open-source project that analyzes real Shenzhen metro (subway) card usage data using big data frameworks such as Spark, Hadoop, Hive, Kafka, Flink, ClickHouse, HBase, and Elasticsearch. It aims to explore transit passenger flow patterns and system optimization using a variety of Scala-based technologies.
    Downloads: 0 This Week
  • 6

    Open Source Data Quality and Profiling

    World's first open source data quality & data preparation project

    ...It also had Hadoop (big data) support to move files to and from a Hadoop grid and to create, load, and profile Hive tables. This project is also known as "Aggregate Profiler". A RESTful API for this project is being built (beta version) at https://sourceforge.net/projects/restful-api-for-osdq/ and an Apache Spark based data quality project is being built at https://sourceforge.net/projects/apache-spark-osdq/
    Downloads: 1 This Week
  • 7

    MyCAT

    Active, high-performance open source database middleware

    ...Regarded as a MySQL cluster for enterprise databases, MyCAT can take the place of an expensive Oracle cluster. MyCAT is also a new type of database, which resembles a SQL Server integrated with memory cache technology, NoSQL technology, and HDFS big data. As a modern enterprise database product, MyCAT combines the traditional database with the new distributed data warehouse. In short, MyCAT is a fresh new database middleware. MyCAT's objective is to smoothly migrate current stand-alone databases and applications to the cloud at low cost and to solve the bottleneck caused by the rapid growth of data storage and business scale.
    Downloads: 2 This Week
  • 8

    geometry-api-java

    The Esri Geometry API for Java enables developers to write apps

    The Esri Geometry API for Java can be used to enable spatial data processing in 3rd-party data-processing solutions. Developers of custom MapReduce-based applications for Hadoop can use this API for spatial processing of data in the Hadoop system. The API is also used by the Hive UDFs and could be used by developers building geometry functions for 3rd-party applications such as Cassandra, HBase, Storm, and many other Java-based "big data" applications.
    Downloads: 0 This Week
  • 9

    wzd

    Powerful storage server, designed for big data storage systems

    wZD is a server written in Go that uses a modified version of the BoltDB database as a backend for saving and distributing any number of small and large files and NoSQL keys/values in a compact form inside micro Bolt databases (archives). Files and values are distributed across BoltDB databases according to the number of directories or subdirectories and the general structure of the directories. Using wZD can permanently solve the problem of a large number of files on any POSIX...
    Downloads: 0 This Week
  • 10

    Custom Apache Big data Distribution

    A Custom Apache Distribution including Spark and Hadoop, for Windows.

    This distribution has been customized to work out of the box. Just download it, unzip it, and set the Path variables for the bin folders, HADOOP_HOME, SPARK_HOME, and JAVA_HOME. That's it: you can now use Hadoop and Spark natively on Windows.
    Downloads: 0 This Week
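The setup step described for this distribution can be sanity-checked with a short script. This is only a minimal sketch and is not part of the distribution: the helper names `missing_home_vars` and `bin_dirs_not_on_path` are hypothetical, and the check assumes that each of HADOOP_HOME, SPARK_HOME, and JAVA_HOME has a `bin` folder that should appear on the Path.

```python
import os

# Variables named in the project description.
REQUIRED_VARS = ("HADOOP_HOME", "SPARK_HOME", "JAVA_HOME")


def missing_home_vars(env):
    """Return the required variables that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def bin_dirs_not_on_path(env):
    """Return the <HOME>/bin folders that are set but absent from PATH."""
    path_entries = {
        os.path.normcase(os.path.normpath(entry))
        for entry in env.get("PATH", "").split(os.pathsep)
        if entry
    }
    missing = []
    for name in REQUIRED_VARS:
        home = env.get(name)
        if not home:
            continue  # already reported by missing_home_vars
        bin_dir = os.path.normcase(os.path.normpath(os.path.join(home, "bin")))
        if bin_dir not in path_entries:
            missing.append(bin_dir)
    return missing


if __name__ == "__main__":
    print("Unset variables:", missing_home_vars(os.environ))
    print("bin folders missing from PATH:", bin_dirs_not_on_path(os.environ))
```

Both helpers take a plain mapping, so they can be tried against a test dictionary before being pointed at `os.environ`.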
  • 11

    inMap

    Rich layers, better user experience, big data geographic visualization

    inMap is a big data visualization library based on Baidu Map. It focuses on displaying scatter plots, heat maps, grids, and aggregations for big data, and it is committed to making big data visualization easy to use.
    Downloads: 0 This Week
  • 12

    FastoRedis

    Cross-platform open source Redis DB management tool

    FastoRedis (a fork of FastoNoSQL) is a cross-platform open source Redis management tool (i.e. an admin GUI). It uses the same engine that powers Redis's redis-cli shell, so everything you can write in the redis-cli shell you can also write in FastoRedis. The program works on most Linux systems, as well as on Windows, Mac OS X, FreeBSD, and Android, on desktops and embedded devices.
    Downloads: 3 This Week
  • 13

    An introduction to Data Analysis in R

    A guide for learning the basic tools of data analysis with R

    An Introduction to Data Analysis in R [Book]. A guide for learning the basic tools of data analysis: process, visualize, and learn from your data using R programming. This repository holds the necessary data sets for the book "An introduction to Data Analysis in R", to be published in the Springer series Use R!. The book can be purchased in XXX. The book is meant as an introductory guide to manipulating data sets in the Big Data paradigm. ...
    Downloads: 0 This Week
  • 14

    FastoNoSQL

    FastoNoSQL is a GUI platform for NoSQL databases.

    GUI management admin tool for: Redis, Memcached, SSDB, LevelDB, RocksDB, UnQLite, LMDB, UpscaleDB, ForestDB
    Downloads: 10 This Week
  • 15

    OME-3DR

    3D cell reconstruction and quantitative analysis based on OME data

    For big OME data analysis, we integrate commonly used quantitative methods, describe our novel strategies to quantify and analyze biological markers related to the cell or organelle spatial-coordinate model, and present the open-source OME-3-Dimensional Reconstruction (OME-3DR) tool. OME-3DR is a flexible, programmable, and batch-oriented tool based on OME data for reconstructing 3-dimensional (3D) spatial conformations and conducting further analyses, such as the identification, counting, localization, and tracking of bio-imaging markers; calculation of model-contour parameters for association analyses; establishing a spatial-coordinate system; and 3D co-localization analyses. ...
    Downloads: 0 This Week
  • 16

    OCW Test - Out of Commerce Works

    Program for out of commerce works detection

    The OCW Test program is designed to assist in detecting out-of-commerce works, taking as reference a list of works from a specific bibliographic catalog. In this first version, the program operates on the book identifiers of the library of the Complutense University of Madrid. However, the program can be adapted to work with any bibliographic catalog.
    Downloads: 0 This Week
  • 17

    apache spark data pipeline osDQ

    osDQ project dedicated to creating an Apache Spark based data pipeline using JSON

    This is an offshoot of the open source data quality (osDQ) project https://sourceforge.net/projects/dataquality/ This sub-project will create an Apache Spark based data pipeline in which JSON-based metadata (a file) is used to run data processing, data pipeline, data quality, data preparation, and data modeling features for big data. It uses the Java API of Apache Spark.
    Downloads: 0 This Week
  • 18

    MarDRe

    MapReduce-based tool to remove duplicate DNA reads

    MarDRe is a de novo MapReduce-based parallel tool to remove duplicate and near-duplicate DNA reads through the clustering of single-end and paired-end sequences from FASTQ/FASTA datasets. This tool allows bioinformaticians to avoid analyzing unnecessary reads, reducing the time of subsequent procedures with the dataset. MarDRe is the Big Data counterpart of ParDRe (link above), which employs HPC technologies (i.e., hybrid MPI/multithreading) to reduce runtime on multicore systems. Instead, MarDRe takes advantage of the MapReduce programming model to significantly improve ParDRe performance on distributed systems, especially on cloud-based infrastructures. Written in pure Java to maximize cross-platform compatibility, MarDRe is built upon the open-source Apache Hadoop project, the most popular distributed computing framework for Big Data processing.
    Downloads: 0 This Week
  • 19

    HSRA

    Hadoop spliced read aligner for RNA-seq data

    ...This tool allows bioinformatics researchers to efficiently distribute their mapping tasks over the nodes of a cluster by combining a fast multithreaded spliced aligner (HISAT2) with Apache Hadoop, which is a distributed computing framework for scalable Big Data processing. HSRA currently supports single-end and paired-end read alignments from FASTQ/FASTA datasets. Moreover, our tool uses the Hadoop Sequence Parser (HSP) library (link above) to efficiently read the input datasets stored on the Hadoop Distributed File System (HDFS), and it can process datasets compressed with the Gzip and BZip2 codecs.
    Downloads: 0 This Week
  • 20

    X10

    Performance and Productivity at Scale

    ...Both its modern, type-safe sequential core and simple programming model for concurrency and distribution contribute to making X10 a high-productivity language in the HPC and Big Data spaces. User productivity is further enhanced by providing tools such as an Eclipse-based IDE (X10DT). Implementations of X10 are available for a wide variety of hardware and software platforms ranging from laptops, to commodity clusters, to supercomputers.
    Downloads: 3 This Week
  • 21

    fooltrader

    Quant framework for stock

    Build a standard data schema, and then implement various connectors to import systems you are familiar with for analysis. fooltrader is a quantitative analysis trading system designed using big data technology, including data capture, cleaning, structuring, calculation, display, backtesting and trading. Its goal is to provide a unified framework for the whole market (stock, futures, bonds, foreign exchange, digital currency, macroeconomics, etc.) for research, backtesting, forecasting, and trading. ...
    Downloads: 0 This Week
  • 22

    Redis Desktop Manager

    Cross-platform GUI management tool for Redis

    Redis Desktop Manager is a fast, open source Redis database management application based on Qt 5. It is available for Windows, Linux, and macOS and offers an easy-to-use GUI to access your Redis DB. With Redis Desktop Manager you can perform basic operations such as viewing keys as a tree, CRUD operations on keys, and executing commands via a shell. It also supports SSL/TLS encryption, SSH tunnels, and cloud Redis instances such as Amazon ElastiCache, Microsoft Azure Redis Cache, and Redis Labs.
    Downloads: 0 This Week
  • 23

    paralline

    Big Data tool

    Paralline executes a Python function (or lambda function) or a script over each line of huge text files in parallel processes and aggregates the results into a list.
    Downloads: 0 This Week
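The general idea behind the description above (apply a function to every line of a large text file in parallel processes and collect the results in a list) can be sketched with Python's standard library. This is not paralline's code or API: `map_lines` and `line_length` are hypothetical names, and the sketch assumes the per-line function is defined at module level.

```python
from multiprocessing import Pool


def line_length(line):
    """Example per-line function: length of the line without its newline."""
    return len(line.rstrip("\n"))


def map_lines(func, path, processes=2, chunksize=1000):
    """Apply func to every line of the file at path; return the results as a list.

    With processes <= 1 the file is processed serially in this process.
    Otherwise lines are streamed to a pool of worker processes.
    """
    with open(path, encoding="utf-8") as handle:
        if processes <= 1:
            return [func(line) for line in handle]
        with Pool(processes) as pool:
            # imap reads the file lazily and yields results in input order.
            return list(pool.imap(func, handle, chunksize=chunksize))
```

A call such as `map_lines(line_length, "huge.txt", processes=4)` should be made under an `if __name__ == "__main__":` guard, because some platforms start worker processes by re-importing the main module. The standard `multiprocessing` module cannot pickle lambda functions, so unlike the tool described above, this sketch needs a named module-level function when more than one process is used.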
  • 24

    Cosmos DB Spark

    Apache Spark Connector for Azure Cosmos DB

    ...It also allows you to easily create a lambda architecture for batch-processing, stream-processing, and a serving layer while being globally replicated and minimizing the latency involved in working with big data.
    Downloads: 0 This Week
  • 25

    Vaex

    Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python

    Data science solutions, insights, dashboards, machine learning, deployment. We start at 100GB. Vaex is a high-performance Python library for lazy out-of-core data frames (similar to Pandas), used to visualize and explore big tabular datasets. It calculates statistics such as mean, sum, count, and standard deviation on an N-dimensional grid at more than a billion (10^9) samples/rows per second.
    Downloads: 0 This Week
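To illustrate what "statistics on an N-dimensional grid" means, the following pure-Python sketch computes a binned mean in one streaming pass. It does not use Vaex and makes no claim about how Vaex implements this (Vaex works on memory-mapped, vectorized columns). The function name `grid_mean` and its parameters are hypothetical.

```python
from collections import defaultdict


def grid_mean(rows, limits, shape):
    """Mean of the last value in each row, binned on a regular grid.

    rows   -- iterable of (coord_1, ..., coord_n, value), read one at a time
    limits -- [(low, high), ...] per coordinate; rows outside are skipped
    shape  -- number of bins per coordinate
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for *coords, value in rows:
        cell = []
        for x, (low, high), bins in zip(coords, limits, shape):
            if not (low <= x < high):
                break  # row falls outside the grid
            index = int((x - low) / (high - low) * bins)
            cell.append(min(index, bins - 1))  # guard against float rounding
        else:
            key = tuple(cell)
            sums[key] += value
            counts[key] += 1
    # Only cells that received at least one row appear in the result.
    return {key: sums[key] / counts[key] for key in counts}
```

Because only one running sum and one count are kept per grid cell, memory use depends on the grid shape rather than on the number of rows, which is the property that makes this kind of statistic suitable for out-of-core data.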