Showing 29 open source projects for "big data"

  • 1
    Big-AGI

    AI suite powered by state-of-the-art models and providing advanced AI

    ...The workspace includes advanced features like Beam, which enables multi-model consensus and comparative responses to improve reliability and reduce hallucination, and robust persona management to tailor responses to specific roles or workflows. Big-AGI can be self-hosted or deployed in cloud environments, giving users full control over data and model access limits and avoiding vendor lock-in.
    Downloads: 4 This Week
  • 2
    DATA SCIENCE ROADMAP

    Data Science Roadmap from A to Z

    DATA SCIENCE ROADMAP is an educational repository designed to guide learners through the process of becoming proficient in data science and machine learning. The project presents a structured roadmap that outlines the knowledge and skills required for different stages of a data science career. Topics typically include programming with Python, statistics, mathematics, machine learning algorithms, data visualization, and big data technologies. ...
    Downloads: 0 This Week
  • 3
    FinMind

    Open data: more than 50 kinds of financial data

    In the era of big data, data is the foundation of everything. We collect more than 50 kinds of Taiwan stock-related information and provide download, online analysis, and backtesting. Whatever program you use, you can download data through the API provided by FinMind, or directly from the website. Once the data is available, you can perform statistical analysis, regression analysis, time series analysis, machine learning, and deep learning. ...
    Downloads: 1 This Week
  • 4
    Vespa

    The open big data serving engine

    Make AI-driven decisions using your data, in real-time. At any scale, with unbeatable performance. Vespa is a full-featured text search engine and supports both regular text search and fast approximate vector search (ANN). This makes it easy to create high-performing search applications at any scale, whether you want to use traditional techniques or a modern vector-based approach. You can even combine both approaches efficiently in the same query, something no other engine can do....
    Downloads: 7 This Week
  • 5
    LlamaIndex

    Central interface to connect your LLMs with external data

    LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLMs with external data. LlamaIndex is a simple, flexible interface between your external data and LLMs. It provides the following tools in an easy-to-use fashion. Provides indices over your unstructured and structured data for use with LLMs. These indices help to abstract away common boilerplate and pain points for in-context learning. Dealing with prompt limitations (e.g. 4096 tokens for Davinci) when the context is too big. ...
    Downloads: 12 This Week
  • 6
    marimo

    A reactive notebook for Python

    marimo is an open-source reactive notebook for Python, reproducible, git-friendly, executable as a script, and shareable as an app. marimo notebooks are reproducible, extremely interactive, designed for collaboration (git-friendly!), deployable as scripts or apps, and fit for modern Pythonista. Run one cell and marimo reacts by automatically running affected cells, eliminating the error-prone chore of managing the notebook state. marimo's reactive UI elements, like data frame GUIs and plots,...
    Downloads: 5 This Week
  • 7
    .NET for Apache Spark

    A free, open-source, and cross-platform big data analytics framework

    .NET for Apache Spark provides high-performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular Dataframe and SparkSQL aspects of Apache Spark, for working with structured data, and Spark Structured Streaming, for working with streaming data. .NET for Apache Spark is compliant with .NET Standard - a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write...
    Downloads: 0 This Week
  • 8
    ROOT

    Analyzing, storing and visualizing big data, scientifically

    ROOT is a unified software package for the storage, processing, and analysis of scientific data: from its acquisition to the final visualization in the form of highly customizable, publication-ready plots. It is reliable, performant and well supported, easy to use and obtain, and strives to maximize the quantity and impact of scientific results obtained per unit cost, both of human effort and computing resources. ROOT provides a very efficient storage system for data models, that...
    Downloads: 6 This Week
  • 9
    HASH

    The best way to use and work with blocks

    ...You can read more about our big-picture vision at hash.dev
    Downloads: 2 This Week
  • 10
    ChatGLM2-6B

    ChatGLM2-6B: An Open Bilingual Chat LLM

    ChatGLM2-6B is the second-gen Chinese-English conversational LLM from ZhipuAI/Tsinghua. It upgrades the base model with GLM’s hybrid pretraining objective, 1.4 TB bilingual data, and preference alignment, delivering big gains on MMLU, CEval, GSM8K, and BBH. The context window extends up to 32K (FlashAttention), and Multi-Query Attention improves speed and memory use. The repo includes Python APIs, CLI and web demos, OpenAI-style/FastAPI servers, and quantized checkpoints for lightweight local deployment on GPUs or CPU/MPS.
    Downloads: 2 This Week
  • 11
    Sail

    A drop-in Apache Spark replacement written in Rust

    Sail is an open-source distributed computation framework designed to unify batch processing, stream processing, and AI workloads into a single, high-performance engine. It is built entirely in Rust, eliminating JVM overhead and enabling predictable performance, fast startup times, and improved memory safety compared to traditional big data frameworks. Sail is compatible with the Spark Connect protocol, which means existing Spark SQL and DataFrame workloads can run without code changes, making adoption seamless for teams already using Spark-based pipelines. The framework is designed to operate across a variety of environments, including local machines, Kubernetes clusters, and cloud deployments, allowing flexible scaling based on workload requirements. ...
    Downloads: 0 This Week
  • 12
    NeuroMatch Academy (NMA)

    NMA Computational Neuroscience course

    NMA Computational Neuroscience course. We have curated a curriculum that spans most areas of computational neuroscience (a hard task in an increasingly big field!). We will expose you to both theoretical modeling and more data-driven analyses. The Neuro Video Series is a series of 12 videos that covers basic neuroscience concepts and neuroscience methods. These videos are completely optional and do not need to be watched in a fixed order so you can pick and choose which videos will help you brush up on your knowledge. ...
    Downloads: 1 This Week
  • 13
    AlphaTree

    DNN && GAN && NLP && BIG DATA

    AlphaTree is an educational repository that provides a visual roadmap of deep learning models and related artificial intelligence technologies. The project focuses on explaining the historical development and relationships between major neural network architectures used in modern machine learning. It presents diagrams and documentation describing the evolution of models such as LeNet, AlexNet, VGG, ResNet, DenseNet, and Inception networks. The repository organizes these architectures into a...
    Downloads: 0 This Week
  • 14
    Angel

    A Flexible and Powerful Parameter Server for large-scale ML

    Angel is a high-performance distributed machine learning and graph computing platform based on the philosophy of Parameter Server. It is tuned for performance with big data from Tencent and has a wide range of applicability and stability, demonstrating an increasing advantage in handling higher-dimension models. Angel is jointly developed by Tencent and Peking University, taking account of both high availability in industry and innovation in academia. With a model-centered core design concept, Angel partitions the parameters of complex models into multiple parameter-server nodes and implements a variety of machine learning algorithms and graph algorithms using efficient model-updating interfaces and functions, as well as a flexible consistency model for synchronization. ...
    Downloads: 0 This Week
  • 15
    MOA - Massive Online Analysis

    Big Data Stream Analytics Framework.

    A framework for learning from a continuous supply of examples, a data stream. Includes classification, regression, clustering, outlier detection and recommender systems. Related to the WEKA project and also written in Java, it scales to adaptive large-scale machine learning.
    Downloads: 25 This Week
  • 16
    airda

    airda (Air Data Agent)

    airda (Air Data Agent) is a multi-agent system for data analysis. It can understand data development and data analysis requirements, understand data, and generate SQL and Python code for tasks such as data queries, data visualization, and machine learning.
    Downloads: 0 This Week
  • 17
    Alink

    Alink is the Machine Learning algorithm platform based on Flink

    Alink is Alibaba’s scalable machine learning algorithm platform built on Apache Flink, designed for batch and stream data processing. It provides a wide variety of ready-to-use ML algorithms for tasks like classification, regression, clustering, recommendation, and more. Written in Java and Scala, Alink is suitable for enterprise-grade big data applications where performance and scalability are crucial. It supports model training, evaluation, and deployment in real-time environments and integrates seamlessly into Alibaba’s cloud ecosystem.
    Downloads: 0 This Week
  • 18
    SentimentAnalysis-Rick&Morty

    Rick & Morty Sentiment Analysis - End-of-Degree Project - UNIR

    The remarkable progress in the field of Big Data has driven the development of new technologies in natural language processing and data analysis. Text mining is a fascinating application of data analysis that extracts relevant information from related writings in different linguistic contexts. Within natural language processing, sentiment analysis and classification therefore stand out as key applications supported by text mining.
    Downloads: 0 This Week
  • 19
    SparrowRecSys

    A Deep Learning Recommender System

    SparrowRecSys is an open-source deep learning recommendation system framework designed to demonstrate the architecture and implementation of modern industrial-scale recommender systems. The project integrates multiple machine learning models and data processing pipelines to simulate how real-world recommendation platforms operate. It includes components for offline data processing, feature engineering, model training, real-time data updates, and online recommendation services. SparrowRecSys...
    Downloads: 0 This Week
  • 20
    Sulla

    JavaScript WhatsApp API library for chatbots

    ...Sulla will remember the session so there is no need to authenticate every time. By default, the QR code will appear in the terminal. Decryption is done as fast as possible (outruns native methods). Supports big files!
    Downloads: 0 This Week
  • 21
    surpriver

    Find big moving stocks before they move using machine learning

    surpriver is a machine learning project designed to identify unusual stock market activity that may precede large price movements. The system analyzes historical stock price and volume data to detect anomalies that could indicate potential trading opportunities. By applying machine learning techniques to market indicators, the tool attempts to identify patterns in trading behavior that deviate significantly from normal market activity. These anomalies are interpreted as signals that a stock...
    Downloads: 0 This Week
  • 22
    Olivia

    Your new best friend powered by an artificial neural network

    Olivia is an open-source chatbot built in Golang using machine learning technologies. Its goal is to provide a free and open-source alternative to big services like DialogFlow. You can chat with her by speaking (STT) or writing; she replies with a text message, but you can enable her voice (TTS). You can get Olivia to listen by saying “Hey Olivia” or clicking the central button. She speaks her replies unless you've disabled her voice. Olivia respects your privacy. All the data used by Olivia is saved in your client. ...
    Downloads: 0 This Week
  • 23

    PanoramaServer

    Open source panorama server for free virtual tours with 360-degree views

    Ideal for creating virtual tours from panoramic views of all sorts: property exhibitions for brokers at real estate agencies and property agents, tour guides for indoor and outdoor venues, information on public and private facilities for curators, travel journals for tourists, backdrops for storytelling, treasure-hunt-style games, big data mining for patterns through computer vision in artificial intelligence, etc. It is like creating your own Google Maps Street View. All the user needs is photos in equirectangular (panorama) format, taken with the 3D cameras commonly used on site. PanoramaServer can reference these images to create virtual tours with 360-degree views, in which viewers can navigate to different locations, view information, etc. ...
    Downloads: 0 This Week
  • 24
    Easy Machine Learning

    Easy Machine Learning is a general-purpose dataflow-based system

    Machine learning algorithms have become the key components in many big data applications. However, the full potential of machine learning is still far from being realized because using machine learning algorithms is hard, especially on distributed platforms such as Hadoop and Spark. The key barriers come from not only the implementation of the algorithms themselves but also the processing for applying them to real applications which often involve multiple steps and different algorithms. ...
    Downloads: 0 This Week
  • 25
    H2O-3

    H2O is an Open Source, Distributed, Fast & Scalable Machine Learning

    ...H2O-3 integrates with big data technologies such as Hadoop and Apache Spark, enabling organizations to run machine learning workflows on large-scale data infrastructure. The platform also includes a web-based interface called Flow that allows users to build models interactively through notebooks and visual tools.
    Downloads: 0 This Week