Showing 311 open source projects for "big data"

  • 1
    data.table

    Extends base R’s data.frame for high-performance data manipulation

    data.table is an R package that extends base R’s data.frame for high-performance data manipulation. It offers concise syntax, blazing speed, and memory-efficient operations. It supports fast file reading/writing, joins, grouping, reshaping, and updates by reference. It is heavily used in large-data workflows and production pipelines in R. Grouping, aggregation, and summarization are extremely efficient, and it can handle very large datasets (hundreds of millions to billions of rows) in memory, if enough memory is available. ...
    Downloads: 5 This Week
  • 2
    Genie

    Distributed Big Data Orchestration Service

    Genie is a completely open source distributed job orchestration engine developed by Netflix. Genie provides RESTful APIs to run a variety of big data jobs such as Hadoop, Pig, Hive, Spark, Presto, Sqoop, and more. It also provides APIs for managing the metadata of many distributed processing clusters and the commands and applications that run on them.
    Downloads: 0 This Week
  • 3
    Querybook

    Big Data Querying UI, combining collocated table metadata

    Querybook is Pinterest’s open-source big data IDE with a notebook interface. Querybook’s core focus is to make composing queries, creating analyses, and collaborating with others as simple as possible. Organize rich text, queries, and charts into a notebook to easily document your analyses. Work collaboratively with others in a DataDoc and get real-time updates. The Query Editor is aware of your tables and their columns; as such, it provides autocompletion, syntax highlighting, and the ability to hover over or click on a table to view its information. ...
    Downloads: 0 This Week
  • 4
    pandas

    Fast, flexible and powerful Python data analysis toolkit

    pandas is a Python data analysis library that provides high-performance, user-friendly data structures and data analysis tools for the Python programming language. It enables you to carry out entire data analysis workflows in Python without having to switch to a more domain-specific language. With pandas, performance, productivity, and collaboration in doing data analysis in Python can significantly increase. pandas is continuously being developed to be a fundamental high-level building...
    Downloads: 33 This Week
  • 5
    FinMind

    Open data: more than 50 kinds of financial data

    In the era of big data, data is the foundation of everything. We collect more than 50 kinds of Taiwan stock-related information and provide download, online analysis, and backtesting. Regardless of the program you use, you can download data through the API provided by FinMind, or you can download data directly from the website. Once the data is available, statistical analysis, regression analysis, time series analysis, machine learning, and deep learning can be performed. ...
    Downloads: 1 This Week
  • 6
    XCharts

    A charting and data visualization library for Unity

    A powerful, easy-to-use, parameter-configurable UGUI charting and data visualization plugin for Unity. It offers visual configuration of parameters, real-time preview of effects, and pure-code drawing without additional resources. It supports ten built-in charts such as line chart, column chart, pie chart, radar...
    Downloads: 5 This Week
  • 7
    Apache Hudi

    Upserts, Deletes And Incremental Processing on Big Data

    Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage). Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. Hudi reimagines slow old-school batch data processing with a powerful new incremental processing framework for low latency minute-level analytics. Hudi provides...
    Downloads: 2 This Week
  • 8
    Apache HBase

    Get random, realtime read/write access to your Big Data

    Use Apache HBase™ when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables, billions of rows X millions of columns, atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS. ...
    Downloads: 0 This Week
  • 9
    Logan

    Logan is a lightweight case logging system for mobile platforms

    ...To put it simply, the traditional idea is to piece together the problems that appear in the logs of each system, but the new idea is to aggregate and analyze all the logs generated by the user to find the scenes with problems. In the future, we will provide a data platform based on Logan big data, including advanced functions such as machine learning, troubleshooting log solution, and big data feature analysis.
    Downloads: 0 This Week
  • 10
    HugeGraph

    A graph database that supports 100+ billion data items

    ...HugeGraph supports fast import for graphs with more than 10 billion vertices and edges, offers millisecond-level OLTP query capability, and can be integrated into big data platforms like Hadoop or Spark for OLAP analysis. The main scenarios of HugeGraph include correlation search, fraud detection, and knowledge graphs. It not only supports the Gremlin graph query language and a RESTful API but also provides commonly used graph algorithm APIs. To help users easily implement various queries and analyses, HugeGraph has a full range of accessory tools and supports distributed storage, data replication, horizontal scaling, and many built-in storage engine backends.
    Downloads: 0 This Week
  • 11
    GridDB

    GridDB is a next-generation open source database

    A cyber-physical system is a system that collects a variety of data in physical space (the real world), analyzes and converts it into knowledge in cyberspace, and feeds the knowledge back to the real world to revitalize industry and solve social problems. GridDB is an open database that enables real-time processing of vast amounts of time-series data in physical space, which is necessary to realize a cyber-physical system. Multi-model architecture capable of supporting various data stores...
    Downloads: 0 This Week
  • 12
    Apache InLong

    Apache InLong - a one-stop integration framework for massive data

    ...InLong was originally built at Tencent, where it has served online businesses for more than 8 years, supporting massive data reporting services (more than 80 trillion pieces of data per day) in big data scenarios. The entire platform integrates 5 modules: Ingestion, Convergence, Caching, Sorting, and Management, so that the business only needs to provide data sources, data service quality, data landing clusters, and data landing formats.
    Downloads: 0 This Week
  • 13
    Apache Bigtop

    Bigtop is an Apache Foundation project for Infrastructure Engineers

    Apache Bigtop is a project focused on building and packaging the Hadoop ecosystem and related big data components. It provides a consistent framework for testing, packaging, and deploying Hadoop distributions, including tools like HDFS, YARN, Spark, Hive, HBase, and more. By maintaining cross-platform builds (RPMs, DEBs, Docker images, and Kubernetes support), Bigtop makes it easier for organizations to deploy big data stacks in different environments. ...
    Downloads: 0 This Week
  • 14
    Apache Doris

    MPP-based interactive SQL data warehousing for reporting and analysis

    Apache Doris is a modern MPP analytical database product. It can provide sub-second queries and efficient real-time data analysis. With its distributed architecture, datasets up to the 10 PB level are well supported and easy to operate. Apache Doris can meet various data analysis demands, including historical data reports, real-time data analysis, interactive data analysis, and exploratory data analysis. Make your data analysis easier! It supports standard SQL, compatible with MySQL...
    Downloads: 0 This Week
  • 15
    Fluid

    Fluid, elastic data abstraction and acceleration for BigData/AI apps

    Fluid provides elastic data abstraction and acceleration for BigData/AI applications in the cloud. It provides a DataSet abstraction for underlying heterogeneous data sources with multidimensional management in a cloud environment. It enables dataset warmup and acceleration for data-intensive applications by using a distributed cache in Kubernetes with observability, portability, and scalability. Taking characteristics of application and data into consideration for cloud application/dataset scheduling to...
    Downloads: 0 This Week
  • 16
    OnlineStats.jl

    Single-pass algorithms for statistics

    OnlineStats does statistics and data visualization for big/streaming data via online algorithms. It offers high-performance single-pass algorithms for statistics and data viz. Statistics are updated one observation at a time, and the algorithms use O(1) memory.
    Downloads: 0 This Week
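
To make the "single-pass, O(1) memory" idea concrete, here is a minimal sketch in Python of a running mean and variance using Welford's method. It is not OnlineStats.jl's API: the `RunningStats` class, its `fit` method, and the `summarize` helper are hypothetical names chosen for this illustration only.

```python
class RunningStats:
    """Running mean and sample variance via Welford's method.

    Hypothetical illustration class (not part of OnlineStats.jl).
    State is three numbers, so memory use is O(1) regardless of
    how many observations are seen.
    """

    def __init__(self):
        self.n = 0        # number of observations seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def fit(self, x):
        """Update the statistics with a single observation."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the updated mean; this is what keeps the update stable.
        self._m2 += delta * (x - self.mean)
        return self

    @property
    def variance(self):
        """Sample variance (n - 1 denominator); NaN with fewer than 2 points."""
        if self.n < 2:
            return float("nan")
        return self._m2 / (self.n - 1)


def summarize(stream):
    """Consume any iterable exactly once and return its running statistics."""
    stats = RunningStats()
    for x in stream:
        stats.fit(x)
    return stats
```

Because `summarize` accepts any iterable, it works with a generator that reads values from a file or socket, and the data never needs to be held in memory at once.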
  • 17
    Apache RocketMQ

    Distributed messaging and streaming platform with low latency

    ...A variety of cross-language clients, such as Java, C/C++, Python, and Go. Pluggable transport protocols, such as TCP, SSL, and AIO. Built-in message tracing capability, with OpenTracing support. Versatile big-data and streaming ecosystem integration. Message retroactivity by time or offset. Reliable FIFO and strictly ordered messaging in the same queue. Efficient pull and push consumption model. Million-level message accumulation capacity in a single queue. Multiple messaging protocols like JMS and OpenMessaging. Flexible distributed scale-out deployment architecture. ...
    Downloads: 4 This Week
  • 18
    Arroyo

    Distributed stream processing engine in Rust

    Arroyo is a distributed stream processing engine written in Rust, designed to efficiently perform stateful computations on streams of data. Unlike traditional batch processing, streaming engines can operate on both bounded and unbounded sources, emitting results as soon as they are available.
    Downloads: 0 This Week
  • 19
    ODD Platform

    First open-source data discovery and observability platform

    Unlock the power of big data with OpenDataDiscovery Platform. Experience seamless end-to-end insights, powered by unprecedented observability and trust - from ingestion to production - while building your ideal tech stack! Democratize data and accelerate insights. Find data that fits your use case and discover hints left by your peers to leverage existing knowledge.
    Downloads: 0 This Week
  • 20
    Blue Whale Configuration Platform

    Blue Whale smart cloud configuration platform

    The platform draws on experience supporting hundreds of Tencent businesses and is compatible with a variety of complex system architectures. Born out of operations and maintenance, it covers the full cycle of business operations assurance: from configuration management to job execution, task scheduling, and monitoring with self-healing, through to big data analysis of operations and maintenance data that assists operational decision-making. The open PaaS has a powerful development framework and scheduling engine, as well as a complete operations and maintenance development training system, which helps the rapid transformation and upgrading of operations and maintenance. ...
    Downloads: 0 This Week
  • 21
    Nebula Graph

    A distributed, fast open-source graph database

    The graph database built for super large-scale graphs with milliseconds of latency. Optimized SUBGRAPH and FIND PATH for better performance. Optimized query paths to reduce redundant paths and time complexity. Optimized the method to get properties for better performance of MATCH statements. Nebula Graph adopts the Apache 2.0 license, one of the most permissive free software licenses in the world. Free as in freedom, because, under the Apache 2.0 license, you can use, copy, modify and...
    Downloads: 2 This Week
  • 22
    JuiceFS

    JuiceFS is a distributed POSIX file system built on top of Redis

    ...Whether it's a public cloud, private cloud, or hybrid cloud, JuiceFS is available on any cloud of your choice and delivers flexibility, availability, scalability, and strong consistency for your data-intensive applications. Purpose-built to serve big data scenarios such as self-driving model training, recommendation engines, and next-generation gene sequencing, JuiceFS specializes in high performance and easier management of tens of billions of files. We bring JuiceFS to developers with the hope that it will be easy to use, reliable, high-performance, and solve all your file storage problems in a cloud environment.
    Downloads: 0 This Week
  • 23
    Vespa

    The open big data serving engine

    Make AI-driven decisions using your data, in real-time. At any scale, with unbeatable performance. Vespa is a full-featured text search engine and supports both regular text search and fast approximate vector search (ANN). This makes it easy to create high-performing search applications at any scale, whether you want to use traditional techniques or a modern vector-based approach. You can even combine both approaches efficiently in the same query, something no other engine can do....
    Downloads: 0 This Week
  • 24
    .NET for Apache Spark

    A free, open-source, and cross-platform big data analytics framework

    .NET for Apache Spark provides high-performance APIs for using Apache Spark from C# and F#. With these .NET APIs, you can access the most popular DataFrame and Spark SQL aspects of Apache Spark, for working with structured data, and Spark Structured Streaming, for working with streaming data. .NET for Apache Spark is compliant with .NET Standard, a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write...
    Downloads: 0 This Week
  • 25
    Brave for iOS

    Brave iOS Browser

    ...By removing rare ads and trackers, there will be fewer (visible or hidden) things on every website you visit. That means pages will load faster, saving battery and even mobile data.
    Downloads: 93 This Week