Search Results for "python data analysis" - Page 7

Showing 5155 open source projects for "python data analysis"

  • 1
    FiftyOne

    The open-source tool for building high-quality datasets

    ... to boost the performance of your model. FiftyOne provides the building blocks for optimizing your dataset analysis pipeline. Use it to get hands-on with your data, including visualizing complex labels, evaluating your models, exploring scenarios of interest, identifying failure modes, finding annotation mistakes, and much more! Surveys show that machine learning engineers spend over half of their time wrangling data, but it doesn't have to be that way.
    Downloads: 3 This Week
  • 2
    TURF

    A modular geospatial engine written in JavaScript

    Turf is a JavaScript library for spatial analysis. It includes traditional spatial operations, helper functions for creating GeoJSON data, and data classification and statistics tools. Turf can be added to your website as a client-side plugin, or you can run Turf server-side with Node.js. Modular, simple-to-understand JavaScript functions that speak GeoJSON. Turf is a collection of small modules; you only need to take what you want to use. Takes advantage of the newest algorithms and doesn't...
    Downloads: 6 This Week
  • 3
    MarkItDown

    Python tool for converting files and office documents to Markdown

    MarkItDown is a lightweight Python utility developed by Microsoft for converting various files and office documents to Markdown format. It is particularly useful for preparing documents for use with large language models and related text analysis pipelines.
    Downloads: 4 This Week
  • 4
    Rerun

    Visualize streams of multimodal data

    Rerun is an open-source tool that helps developers visualize real-time multimodal data streams, such as images, point clouds, and tensors, for debugging and understanding ML and robotics systems. Designed for use with Python and Rust, it captures logged data and renders it through an interactive desktop interface, making it easier to understand how complex systems behave over time.
    Downloads: 5 This Week
  • 5
    Prophet

    Tool for producing high quality forecasts for time series data

    Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well. Prophet is used in many applications across Facebook for producing reliable forecasts for planning and goal setting...
    Downloads: 6 This Week
  • 6
    CellTypist

    A tool for semi-automatic cell type classification, harmonization

    ... and accurate prediction. Scalable and flexible. Python-based implementation is easy to integrate into existing pipelines. A community-driven encyclopedia for cell types.
    Downloads: 5 This Week
  • 7
    Trame

    Weave various components and technologies into a Web App

    ... under Apache License Version 2.0 which allows users to create open source or commercial applications without any licensing worries. By relying simply on Python and HTML, trame focuses on one's data and associated analysis and visualizations while hiding the complications of web development.
    Downloads: 3 This Week
  • 8
    SageMaker Spark Container

    Docker image used to run data processing workloads

    Apache Spark™ is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing. The SageMaker Spark Container is a Docker image used to run batch data...
    Downloads: 2 This Week
  • 9
    TensorFlow

    TensorFlow is an open source library for machine learning

    Originally developed by Google for internal use, TensorFlow is an open source platform for machine learning. Available across all common operating systems (desktop, server and mobile), TensorFlow provides stable APIs for Python and C as well as APIs that are not guaranteed to be backwards compatible or are 3rd party for a variety of other languages. The platform can be easily deployed on multiple CPUs, GPUs and Google's proprietary chip, the tensor processing unit (TPU). TensorFlow...
    Downloads: 7 This Week
  • 10
    Pyodide

    Pyodide is a Python distribution for the browser and Node.js

    Pyodide brings the Python runtime to the browser by compiling Python and its scientific libraries to WebAssembly. It allows developers to run Python code directly in web browsers without a server, supporting packages like NumPy, Pandas, and Matplotlib. Pyodide opens up new possibilities for interactive data analysis, scientific computing, and educational tools in web environments, all while integrating seamlessly with JavaScript.
    Downloads: 2 This Week
  • 11
    Dagster

    An orchestration platform for the development, production

    Dagster is an orchestration platform for the development, production, and observation of data assets. Dagster as a productivity platform: With Dagster, you can focus on running tasks, or you can identify the key assets you need to create using a declarative approach. Embrace CI/CD best practices from the get-go: build reusable components, spot data quality issues, and flag bugs early. Dagster as a robust orchestration engine: Put your pipelines into production with a robust multi-tenant...
    Downloads: 5 This Week
  • 12
    JumpServer

    Manage assets on different clouds at the same time

    ... and reuse. Prevent internal misuse and permission abuse. Management of people and assets. Retrospective safeguards and basis for accident analysis.
    Downloads: 5 This Week
  • 13
    PandasAI

    PandasAI is a Python library that integrates generative AI

    PandasAI is a Python library that adds generative AI capabilities to pandas, the popular data analysis and manipulation tool. It is designed to be used in conjunction with pandas, and is not a replacement for it. PandasAI makes pandas (and the most widely used data analysis libraries) conversational, allowing you to ask questions of your data in natural language. For example, you can ask PandasAI to find all the rows in a DataFrame where the value of a column is greater than 5, and it will return...
    Downloads: 2 This Week
  • 14
    DVC

    Data Version Control | Git for Data & Models

    DVC is built to make ML models shareable and reproducible. It is designed to handle large files, data sets, machine learning models, and metrics as well as code. Version control machine learning models, data sets and intermediate files. DVC connects them with code and uses Amazon S3, Microsoft Azure Blob Storage, Google Drive, Google Cloud Storage, Aliyun OSS, SSH/SFTP, HDFS, HTTP, network-attached storage, or disc to store file contents. Version control machine learning models, data sets...
    Downloads: 6 This Week
  • 15
    tslearn

    The machine learning toolkit for time series analysis in Python

    The machine learning toolkit for time series analysis in Python. tslearn expects a time series dataset to be formatted as a 3D numpy array. The three dimensions correspond to the number of time series, the number of measurements per time series and the number of dimensions respectively (n_ts, max_sz, d). In order to get the data in the right format.
    Downloads: 1 This Week
  • 16
    LlamaIndex

    Central interface to connect your LLMs with external data

    LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLMs with external data. LlamaIndex is a simple, flexible interface between your external data and LLMs. It provides the following tools in an easy-to-use fashion.
    Downloads: 5 This Week
  • 17
    TorchRL

    A modular, primitive-first, python-first PyTorch library

    TorchRL is an open-source Reinforcement Learning (RL) library for PyTorch. TorchRL provides PyTorch and python-first, low and high-level abstractions for RL that are intended to be efficient, modular, documented, and properly tested. The code is aimed at supporting research in RL. Most of it is written in Python in a highly modular way, such that researchers can easily swap components, transform them, or write new ones with little effort.
    Downloads: 6 This Week
  • 18
    Streamlink

    Streamlink is a CLI utility which pipes video streams

    Streamlink is a command-line utility that pipes video streams from various services into a video player, such as VLC. The main purpose of Streamlink is to avoid resource-heavy and unoptimized websites, while still allowing the user to enjoy various streamed content. There is also an API available for developers who want access to the stream data. Streamlink is built upon a plugin system that allows support for new services to be easily added. Most of the big streaming services are supported...
    Downloads: 7 This Week
  • 19
    Watchman

    A file watching service that records when files change

    Watchman's purpose is to watch files and record when they change. It can trigger actions (rebuilding assets, for example) when matching files change. The watchman executable contains both the client and the server components of the watchman service. When run, watchman will attempt to communicate with your existing server instance (each user has their own persistent process), and will attempt to start it if it doesn’t exist. There are some options that affect how watchman...
    Downloads: 7 This Week
  • 20
    Logfire MCP

    The Logfire MCP Server is here

    The Logfire MCP Server is a Model Context Protocol server that allows AI applications to access OpenTelemetry traces and metrics sent to Logfire. It enables retrieval and analysis of telemetry data, enhancing debugging and observability workflows.
    Downloads: 1 This Week
  • 21
    Pacu

    The AWS exploitation framework, designed for testing security

    Pacu (named after a type of Piranha in the Amazon) is a comprehensive AWS security-testing toolkit designed for offensive security practitioners. While several AWS security scanners currently serve as the proverbial “Nessus” of the cloud, Pacu is designed to be the Metasploit equivalent. Written in Python 3 with a modular architecture, Pacu has tools for every step of the pen testing process, covering the full cyber kill chain. Pacu is the aggregation of all of the exploitation experience...
    Downloads: 6 This Week
  • 22
    DOLMA

    Data and tools for generating and inspecting OLMo pre-training data

    DOLMA (Data to feed OLMo's Appetite) is a framework designed to manage large-scale datasets for training and fine-tuning language models efficiently.
    Downloads: 5 This Week
  • 23
    DoWhy

    DoWhy is a Python library for causal inference

    DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks. Much like machine learning libraries have done for prediction, DoWhy is a Python library that aims to spark causal thinking and analysis. DoWhy provides a wide variety of algorithms for effect estimation, causal structure learning, diagnosis of causal...
    Downloads: 4 This Week
  • 24
    Perspective

    A data visualization and analytics component

    Perspective is a high-performance data visualization library for building real-time, interactive analytics dashboards. Developed by FINOS, it supports WebAssembly-powered pivot tables and can handle large streaming datasets with speed and flexibility. Perspective is ideal for fintech, trading, and IoT applications where insights from live data need to be visualized, sliced, and explored quickly in a browser.
    Downloads: 5 This Week
  • 25
    Scanpy

    Single-cell analysis in Python

    Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells.
    Downloads: 1 This Week
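
Note on the tslearn entry (item 15): the description says a time series dataset is a 3D numpy array of shape (n_ts, max_sz, d). The following is a minimal NumPy-only sketch of what that layout could look like for series of unequal length. The helper name pad_to_3d is hypothetical and is not part of tslearn, and padding shorter series with NaN is an assumption of this sketch rather than something the listing states.

```python
import numpy as np

def pad_to_3d(series_list):
    """Hypothetical helper: stack series into shape (n_ts, max_sz, d).

    Shorter series are padded with NaN (an assumption of this sketch).
    """
    arrays = [np.asarray(s, dtype=float) for s in series_list]
    # Treat a 1D series as univariate: shape (sz,) becomes (sz, 1).
    arrays = [a.reshape(-1, 1) if a.ndim == 1 else a for a in arrays]
    n_ts = len(arrays)                        # number of time series
    max_sz = max(a.shape[0] for a in arrays)  # longest series length
    d = arrays[0].shape[1]                    # dimensions per measurement
    out = np.full((n_ts, max_sz, d), np.nan)
    for i, a in enumerate(arrays):
        out[i, : a.shape[0], :] = a
    return out

dataset = pad_to_3d([[1, 2, 3, 4], [5, 6]])
print(dataset.shape)
```

Here two univariate series of lengths 4 and 2 give n_ts = 2, max_sz = 4 and d = 1; the second series occupies the first two time steps and the rest is NaN.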