20 projects for "big data" with 2 filters applied:

  • 1
    testng

    TestNG testing framework

    TestNG is a testing framework inspired by JUnit and NUnit that introduces new functionality to make it more powerful and easier to use. Run your tests in arbitrarily big thread pools with various policies available (all methods in their own thread, one thread per test class, etc.).
    Downloads: 4 This Week
  • 2
    Bacalhau

    Community-driven, simple, yet powerful framework

    Bacalhau is a decentralized compute platform for running jobs on data stored across distributed networks, like IPFS or Filecoin, without moving the data to centralized cloud environments. It allows developers to run containerized workloads close to where the data lives, reducing latency, cost, and privacy risks. Bacalhau supports various runtime environments and is designed to make decentralized data processing as accessible as traditional cloud computing. It’s especially useful for...
    Downloads: 0 This Week
  • 3
    Apache Spark

    A unified analytics engine for large-scale data processing

    ...With Spark Streaming (microbatches) and Structured Streaming, it delivers low-latency event processing suitable for real-time analytics. The built-in MLlib library provides scalable machine learning algorithms, while GraphX enables graph computations integrated with data pipelines. Spark supports multiple languages—Scala, Java, Python, R—and connects with many storage systems like HDFS, S3, Cassandra, and streaming platforms like Kafka, making it a versatile choice for big data workloads in analytics, ETL, and data science.
    Downloads: 1 This Week
  • 4
    huihut interview

    A summary of C/C++ technical interview basics

    ...It’s organized to be approachable whether you’re a student preparing for your first internship or an experienced engineer brushing up on fundamentals before a big interview round.
    Downloads: 0 This Week
  • 5
    zpdf

    Zero-copy PDF text extraction library written in Zig

    zpdf is a high-performance PDF text extraction library written in Zig that focuses on speed, low overhead, and modern parsing techniques. It leans heavily on memory-mapped file reading and zero-copy patterns where possible, so it can scan large PDFs without repeatedly copying data around in memory. The library supports streaming extraction using efficient arena allocation, making it well suited for workloads that need to process big documents quickly or in batches. It implements multiple PDF decompression filters and handles common font encoding pathways, which are essential for turning raw PDF content streams into readable text. ...
    Downloads: 0 This Week
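
    The memory-mapped, zero-copy reading described above can be illustrated with a minimal sketch. It is written in Python rather than Zig and does not use zpdf's API; the `scan` helper and the sample file contents are hypothetical, and no PDF parsing or text extraction is attempted.

```python
import mmap
import os
import tempfile


def scan(path):
    """Return the 8-byte file header and the offset of the last %%EOF marker."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # rfind searches the mapped pages in place; the file is never read
        # into a separate Python bytes object.
        eof_offset = mm.rfind(b"%%EOF")
        # Slicing a memoryview yields a window onto the same memory, not a copy.
        with memoryview(mm) as view, view[:8] as window:
            header = bytes(window)  # the only copy made: 8 bytes
    return header, eof_offset


# Hypothetical stand-in for a large document; not a valid PDF.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"%PDF-1.7\n" + b"x" * 1000 + b"\n%%EOF\n")
    sample_path = tmp.name

header, eof_offset = scan(sample_path)
os.remove(sample_path)
```

    The memoryview windows are released before the mapping is closed, since Python refuses to close a mapping that still has exported buffers.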
  • 6
    Nano Events

    Simple and tiny (107 bytes) event emitter library for JavaScript

    Nano Events is a minimalistic, high-performance event emitter library for JavaScript. Its goal is to provide the simplest possible API to add pub/sub capabilities (emitters and listeners) to any JS object or application, while keeping overhead and bundle size extremely small. Rather than offering many complex features, nanoevents focuses on the core primitives: creating an emitter, subscribing to named events, emitting events with arbitrary data, and unsubscribing. Because of its minimal API...
    Downloads: 0 This Week
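
    The four primitives named above (create an emitter, subscribe, emit, unsubscribe) can be sketched as follows. This is a Python illustration of the pattern, not the nanoevents JavaScript API; the `Emitter` class and its method names are assumptions made for the example.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class Emitter:
    """Tiny pub/sub emitter: subscribe, emit, unsubscribe."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Callable[..., Any]]] = defaultdict(list)

    def on(self, event: str, callback: Callable[..., Any]) -> Callable[[], None]:
        """Register callback for event and return a function that removes it."""
        self._listeners[event].append(callback)

        def unbind() -> None:
            if callback in self._listeners[event]:
                self._listeners[event].remove(callback)

        return unbind

    def emit(self, event: str, *args: Any) -> None:
        """Call every listener registered for event with the given arguments."""
        # Iterate over a copy so listeners may unsubscribe while being called.
        for callback in list(self._listeners[event]):
            callback(*args)


received = []
emitter = Emitter()
unbind = emitter.on("tick", received.append)
emitter.emit("tick", 1)
unbind()
emitter.emit("tick", 2)  # no listener is left, so nothing is recorded
```

    Returning the unsubscribe function from `on` keeps the surface to two methods, which is one way to keep such an API small.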
  • 7
    MOA - Massive Online Analysis

    Big Data Stream Analytics Framework.

    A framework for learning from a continuous supply of examples, known as a data stream. It includes classification, regression, clustering, outlier detection, and recommender systems. MOA is related to the WEKA project and is also written in Java, while scaling to adaptive, large-scale machine learning.
    Downloads: 25 This Week
  • 8
    applied-ml

    Papers & tech blogs by companies sharing their work on data science

    ...For someone designing—or planning to build—a production ML system, this repo provides patterns, precedents, and lessons learned from firms that operate at big scale.
    Downloads: 0 This Week
  • 9
    Guide to Technical Interviews

    Guided collection and roadmap for preparing technical interviews

    This repository is a guided collection and roadmap for preparing technical interviews, covering the gamut from algorithmic challenges and data structures to system design and behavioral preparation. It consolidates resources like interview question lists, practice platforms, mock interview sites, and recommended books or blogs. For individuals targeting big-tech or rigorous interview processes, this acts as a structured study guide rather than a random list of links. The README breaks down preparation into categories — coding problems, system design, mock interview sites — so you can identify gap areas and allocate study time accordingly. ...
    Downloads: 0 This Week
  • 10
    jQuery json-viewer

    jQuery plugin for displaying JSON data

    json-viewer is a jQuery plugin for easily displaying JSON objects by transforming them into HTML.
    Downloads: 0 This Week
  • 11
    Svelte forms lib

    A lightweight library for managing forms in Svelte

    Svelte Forms lib is a Formik-inspired library for building forms easily in a Svelte project. Forms often play a big part in modern web applications: we use them to log in, place orders, book flights, and perform other data-entry tasks. When developing a form, it's important to create a flow that guides the user efficiently and effectively through the workflow. This library helps you build forms by exposing an easy API for form creation, validation, and submission.
    Downloads: 0 This Week
  • 12
    Big List of Naughty Strings

    List of strings which have a high probability of causing issues

    The Big List of Naughty Strings is a community-maintained catalog of “gotcha” inputs that commonly break software, from unusual Unicode to SQL and script injection payloads. It exists so developers and QA engineers can easily test edge cases that normal test data would miss, such as zero-width characters, right-to-left marks, emojis, foreign alphabets, and long or malformed strings.
    Downloads: 0 This Week
  • 13
    data-science-ipython-notebooks

    Data science Python notebooks: Deep learning

    Data Science IPython Notebooks is a broad, curated set of Jupyter notebooks covering Python, data wrangling, visualization, machine learning, deep learning, and big data tools. It aims to be a practical map of the ecosystem, showing hands-on examples with libraries such as NumPy, pandas, matplotlib, scikit-learn, and others. Many notebooks introduce concepts step by step, then apply them to real datasets so readers can see techniques in action.
    Downloads: 0 This Week
  • 14
    Skill Map

    A visualization of programmer skill maps

    Skill-Map is an open-source, collaborative project originating from Geekbang that offers a structured visualization of programmer skill maps across domains such as AI, big data, front-end, backend, architecture, DevOps, and testing. It serves as a navigable resource for organizing learning paths and essential knowledge areas. The project encourages community collaboration and feedback via GitHub Issues and is open to content updates and additions as a living reference.
    Downloads: 0 This Week
  • 15

    Big Sack

    Big Sack: A lightweight Java Key/Value store with undo and disk cache.

    Big Sack is a Java persistence mechanism that allows storage of key/value pairs following popular Big Data paradigms. It's a very simple and straightforward way to bridge the gap between in-memory data structures and long-term storage. It offers the convenience of the Java SDK TreeMap and TreeSet classes and is used in the same easy way, but it includes rollback through undo logging to checkpoint data, so that the data does not wind up in an unknown state regardless of failures. ...
    Downloads: 0 This Week
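
    Rollback through undo logging, as mentioned above, can be sketched in a few lines. This is an in-memory Python illustration of the general technique, not Big Sack's Java API or its on-disk format; the `UndoMap` class and its methods are hypothetical.

```python
from typing import Any, Dict, List, Tuple

_MISSING = object()  # marks "key did not exist before the write"


class UndoMap:
    """In-memory key/value map with commit and rollback."""

    def __init__(self) -> None:
        self._data: Dict[Any, Any] = {}
        self._undo: List[Tuple[Any, Any]] = []  # (key, previous value)

    def put(self, key: Any, value: Any) -> None:
        # Record the old value first so the write can be reversed later.
        self._undo.append((key, self._data.get(key, _MISSING)))
        self._data[key] = value

    def get(self, key: Any, default: Any = None) -> Any:
        return self._data.get(key, default)

    def commit(self) -> None:
        """Accept all writes since the last commit (the checkpoint)."""
        self._undo.clear()

    def rollback(self) -> None:
        """Reverse all writes since the last commit, newest first."""
        while self._undo:
            key, previous = self._undo.pop()
            if previous is _MISSING:
                self._data.pop(key, None)
            else:
                self._data[key] = previous


store = UndoMap()
store.put("a", 1)
store.commit()
store.put("a", 2)
store.put("b", 3)
store.rollback()  # "a" returns to 1 and "b" disappears
```

    A durable store would also write the undo records to disk before applying each change; that part is left out of this sketch.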
  • 16
    microhex [discontinued]

    Cross-platform hex-editing software based on Python and Qt

    This project is no longer supported. Use it at your own risk (or do not use it at all). Microhex is an intuitive hex-editing application that lets you view and manipulate the binary data of any file on your computer. Microhex displays an integer column and a character column, and lets you add new columns and delete existing ones. Each column can be assigned an unlimited number of linked address bars.
    Downloads: 8 This Week
  • 17
    pyIRDG

    IMDb Relational Dataset Generator

    pyIRDG is a Python program that generates relational datasets in Prolog format. It uses data from the Internet Movie Database, with IMDbPY as the backend. A graphical user interface written in PyQt lets the user link multiple entities together as a model for the generation process. The four main entities are Title, Person, Company, and Character. Many attributes can be selected for inclusion in the output .pl file.
    Downloads: 0 This Week
  • 18
    A generic, SQL-driven data audit tool for detecting differences between any JDBC-accessible database tables and other data sources. Platform independent. It's a Unix-like diff for databases. It outputs the key values together with the name and data of each differing column.
    Downloads: 1 This Week
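
    The idea of a diff that reports key values with the differing column name and data can be sketched as below. This is a Python illustration that uses the standard-library `sqlite3` module as a stand-in for a JDBC connection; the `diff_tables` function, the table names, and the sample rows are all hypothetical and are not taken from the tool.

```python
import sqlite3
from typing import List, Tuple


def diff_tables(conn: sqlite3.Connection, left: str, right: str,
                key: str, columns: List[str]) -> List[Tuple]:
    """Return (key, column, left_value, right_value) for each differing cell.

    Table and column names are interpolated into SQL, so they must come
    from trusted code, never from user input.
    """
    cols = ", ".join(columns)
    left_rows = {r[0]: r[1:] for r in conn.execute(f"SELECT {key}, {cols} FROM {left}")}
    right_rows = {r[0]: r[1:] for r in conn.execute(f"SELECT {key}, {cols} FROM {right}")}
    diffs = []
    # Only keys present on both sides are compared in this sketch.
    for k in sorted(left_rows.keys() & right_rows.keys()):
        for name, lv, rv in zip(columns, left_rows[k], right_rows[k]):
            if lv != rv:
                diffs.append((k, name, lv, rv))
    return diffs


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE dst (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO src VALUES (1, 'Ann', 'Oslo'), (2, 'Bob', 'Rome');
    INSERT INTO dst VALUES (1, 'Ann', 'Oslo'), (2, 'Bob', 'Riga');
""")
result = diff_tables(conn, "src", "dst", "id", ["name", "city"])
```

    Rows that exist on only one side are ignored here; a fuller audit would report them as inserts or deletes.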
  • 19
    Protodata
    Protodata is a language for manually creating binary data files without the use of a hex editor, with the original purpose of prototyping new file formats.
    Downloads: 0 This Week
  • 20
    pXw4Pa (poor XML wrapper for PHP arrays) is a pair of simple PHP functions, written with PHP 4.3.7, that read and write a PHP array from and to an XML file. It can be used to store data in XML files simply and quickly, without relying on other, heavier tools.
    Downloads: 0 This Week
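
    The read/write pair described above can be illustrated with a minimal sketch. It is written in Python with the standard-library `xml.etree.ElementTree` module rather than PHP, handles only a flat mapping of strings, and does not reproduce pXw4Pa's functions or its XML layout; the function names and the `<item key="...">` format are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def dict_to_xml(data: Dict[str, str], root_tag: str = "array") -> str:
    """Serialize a flat dict of strings as <array><item key="k">v</item>...</array>."""
    root = ET.Element(root_tag)
    for key, value in data.items():
        item = ET.SubElement(root, "item", {"key": key})
        item.text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_dict(xml_text: str) -> Dict[str, str]:
    """Rebuild the dict from the XML produced by dict_to_xml."""
    root = ET.fromstring(xml_text)
    # An empty element parses with text None, so map that back to "".
    return {item.get("key"): item.text or "" for item in root.findall("item")}


settings = {"host": "localhost", "port": "8080"}
xml_text = dict_to_xml(settings)
restored = xml_to_dict(xml_text)
```

    Writing `xml_text` to a file and reading it back would complete the round trip; nested arrays would need a recursive variant.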