Showing 8 open source projects for "python data analysis"

View related business solutions
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 1
    Deequ

    Deequ

    Deequ is a library built on top of Apache Spark

    ...It also includes a little domain-specific language called DQDL (Data Quality Definition Language) which allows declarative specification of quality rules. Users typically run Deequ before feeding data downstream (to ML pipelines, analytics, or production systems), enabling early detection and isolation of data errors. There is also a Python wrapper, PyDeequ, for users who prefer working from Python environments.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 2
    Apache Spark

    Apache Spark

    A unified analytics engine for large-scale data processing

    ...With Spark Streaming (microbatches) and Structured Streaming, it delivers low-latency event processing suitable for real-time analytics. The built-in MLlib library provides scalable machine learning algorithms, while GraphX enables graph computations integrated with data pipelines. Spark supports multiple languages—Scala, Java, Python, R—and connects with many storage systems like HDFS, S3, Cassandra, and streaming platforms like Kafka, making it a versatile choice for big data workloads in analytics, ETL, and data science.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 3
    X's Recommendation Algorithm

    X's Recommendation Algorithm

    Source code for the X Recommendation Algorithm

    The Algorithm is Twitter’s open source release of the core ranking system that powers the platform’s home timeline. It provides transparency into how tweets are selected, prioritized, and surfaced to users, reflecting Twitter’s move toward openness in recommendation algorithms. The repository contains the recommendation pipeline, which incorporates signals such as engagement, relevance, and content features, and demonstrates how they combine to form ranked outputs. Written primarily in...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 4
    Synapse Machine Learning

    Synapse Machine Learning

    Simple and distributed Machine Learning

    SynapseML (previously MMLSpark) is an open source library to simplify the creation of scalable machine learning pipelines. SynapseML builds on Apache Spark and SparkML to enable new kinds of machine learning, analytics, and model deployment workflows. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with the Open Neural Network Exchange (ONNX), LightGBM, The Cognitive Services, Vowpal Wabbit,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    More flexibility. More control.

    Generate interest, access liquidity without selling, and execute trades seamlessly. All in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • 5
    CoolplaySpark

    CoolplaySpark

    Spark Cool Play: Spark source code analysis, Spark class library, etc.

    CoolplaySpark is a learning and practice repository designed to help users understand and work with Apache Spark. It serves as a companion resource for the book 深入理解Spark核心思想与源码分析 (In-Depth Understanding of Spark’s Core Concepts and Source Code Analysis). The project contains annotated examples, explanations, and exercises that guide learners through Spark’s architecture, execution model, and source code internals. It is particularly valuable for developers who want to strengthen their understanding of Spark by not only using it as a data processing engine but also exploring how its internals function. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 6

    Waterloo

    Java-based scientific graphics

    Java-based scientific graphics with support for Java, Groovy, MATLAB, Python, the R statistical environment, Scala and SciLab.
    Leader badge
    Downloads: 4 This Week
    Last Update:
    See Project
  • 7
    Cane is a Data Manipulation Interface(DMI) based on code behaviour analysis. Cane enable developers manipulate data in the way close to natural logical. Developers can finish their job in one continuous operation supplied by Cane's chain operation
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Java OpenCL Process Virtual Machine. Spring IoC based framework for complex data analysis with OpenCL computing.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB