Showing 1153 open source projects for "python data analysis"

View related business solutions
  • AI-powered service management for IT and enterprise teams Icon
    AI-powered service management for IT and enterprise teams

    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
    Try it Free
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 1
    Orange Data Mining

    Orange Data Mining

    Orange: Interactive data analysis

    Open source machine learning and data visualization. Build data analysis workflows visually, with a large, diverse toolbox. Perform simple data analysis with clever data visualization. Explore statistical distributions, box plots and scatter plots, or dive deeper with decision trees, hierarchical clustering, heatmaps, MDS and linear projections. Even your multidimensional data can become sensible in 2D, especially with clever attribute ranking and selections. Interactive data exploration...
    Downloads: 38 This Week
    Last Update:
    See Project
  • 2
    Rust Data Analysis

    Rust Data Analysis

    Rust for data analysis encyclopedia (WIP)

    Welcome to the Rust Data Analysis repository! This collection of Jupyter notebooks provides a comprehensive exploration of data analysis using Rust. Powered by a Rust kernel, these notebooks allow you to dive deep into the realm of data analysis, leveraging the capabilities of the Rust programming language. With the help of various Rust libraries, such as ndarray, plotters, and more, you'll be able to extract valuable insights from different datasets with ease.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Cookiecutter Data Science

    Cookiecutter Data Science

    Project structure for doing and sharing data science work

    A logical, reasonably standardized, but flexible project structure for doing and sharing data science work. When we think about data analysis, we often think just about the resulting reports, insights, or visualizations. While these end products are generally the main event, it's easy to focus on making the products look nice and ignore the quality of the code that generates them. Because these end products are created programmatically, code quality is still important! And we're not talking...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    data-diff

    data-diff

    Efficiently diff rows across two different databases

    We're excited to announce the launch of a new open-source product, data-diff that makes comparing datasets across databases fast at any scale. data-diff automates data quality checks for data replication and migration. In modern data platforms, data is constantly moving between systems, and at the modern data volume and complexity, systems go out of sync all the time. Until now, there has not been any tooling to ensure that when the data is correctly copied. Replicating data at scale, across...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Build Securely on Azure with Proven Frameworks Icon
    Build Securely on Azure with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 5
    AWS Data Wrangler

    AWS Data Wrangler

    Pandas on AWS, easy integration with Athena, Glue, Redshift, etc.

    An AWS Professional Service open-source python initiative that extends the power of Pandas library to AWS connecting DataFrames and AWS data-related services. Easy integration with Athena, Glue, Redshift, Timestream, OpenSearch, Neptune, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON, and EXCEL). Built on top of other open-source projects like Pandas, Apache Arrow and Boto3, it offers abstracted functions to execute usual...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Best-of Python

    Best-of Python

    A ranked list of awesome Python open-source libraries

    This curated list contains 390 awesome open-source projects with a total of 1.4M stars grouped into 28 categories. All projects are ranked by a project-quality score, which is calculated based on various metrics automatically collected from GitHub and different package managers. If you like to add or update projects, feel free to open an issue, submit a pull request, or directly edit the projects.yaml. Contributions are very welcome! Ranked list of awesome python libraries for web development...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    dbt-re-data

    dbt-re-data

    re_data - fix data issues before your users & CEO would discover them

    re_data is an open-source data reliability framework for the modern data stack. Currently, re_data focuses on observing the dbt project (together with underlaying data warehouse - Postgres, BigQuery, Snowflake, Redshift). Data transformations in re_data are implemented and exposed as models & macros in this dbt package. Gather all relevant outputs about your data in one place using our cloud. Invite your team and debug it easily from there. Go back in time, and see your past metadata. Set up...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    pandas

    pandas

    Fast, flexible and powerful Python data analysis toolkit

    pandas is a Python data analysis library that provides high-performance, user friendly data structures and data analysis tools for the Python programming language. It enables you to carry out entire data analysis workflows in Python without having to switch to a more domain specific language. With pandas, performance, productivity and collaboration in doing data analysis in Python can significantly increase. pandas is continuously being developed to be a fundamental high-level building...
    Downloads: 121 This Week
    Last Update:
    See Project
  • 9
    scikit-learn

    scikit-learn

    Machine learning in Python

    scikit-learn is an open source Python module for machine learning built on NumPy, SciPy and matplotlib. It offers simple and efficient tools for predictive data analysis and is reusable in various contexts.
    Downloads: 9 This Week
    Last Update:
    See Project
  • Secure remote access solution to your private network, in the cloud or on-prem. Icon
    Secure remote access solution to your private network, in the cloud or on-prem.

    Deliver secure remote access with OpenVPN.

    OpenVPN is here to bring simple, flexible, and cost-effective secure remote access to companies of all sizes, regardless of where their resources are located.
    Get started — no credit card required.
  • 10
    DuckDB

    DuckDB

    DuckDB is an in-process SQL OLAP Database Management System

    ... data analysis, e.g. Joining & aggregate multiple large tables. Concurrent large changes, to multiple large tables, e.g. appending rows, adding/removing/updating columns. Large result set transfer to client. For development, DuckDB requires CMake, Python3 and a C++11 compliant compiler. Run make in the root directory to compile the sources. For development, use make debug to build a non-optimized debug version.
    Downloads: 23 This Week
    Last Update:
    See Project
  • 11
    MinerU

    MinerU

    A high-quality tool for convert PDF to Markdown and JSON

    MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 12
    Quadratic

    Quadratic

    Data science spreadsheet with Python & SQL

    Quadratic enables your team to work together on data analysis to deliver better results, faster. You already know how to use a spreadsheet, but you’ve never had this much power before. Quadratic is a Web-based spreadsheet application that runs in the browser and as a native app (via Electron). Our goal is to build a spreadsheet that enables you to pull your data from its source (SaaS, Database, CSV, API, etc) and then work with that data using the most popular data science tools today (Python...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 13
    Bdash

    Bdash

    Simple SQL Client for lightweight data analysis

    Simple SQL Client for lightweight data analysis. You can share the result with gist. Supports MySQL, PostgreSQL (Amazon Redshift), SQLite3, Google BigQuery, Treasure Data, Amazon Athena. You can download and install from Web Site or Releases.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 14
    folium

    folium

    Python data, Leaflet.js maps

    folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the leaflet.js library. Manipulate your data in Python, then visualize it in on a Leaflet map via folium. folium makes it easy to visualize data that’s been manipulated in Python on an interactive leaflet map. It enables both the binding of data to a map for choropleth visualizations as well as passing rich vector/raster/HTML visualizations as markers on the map. The library has a number of built...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 15
    Airbyte

    Airbyte

    Data integration platform for ELT pipelines from APIs, databases

    We believe that only an open-source solution to data movement can cover the long tail of data sources while empowering data engineers to customize existing connectors. Our ultimate vision is to help you move data from any source to any destination. Airbyte already provides the largest catalog of 300+ connectors for APIs, databases, data warehouses, and data lakes. Moving critical data with Airbyte is as easy and reliable as flipping on a switch. Our teams process more than 300 billion rows...
    Downloads: 15 This Week
    Last Update:
    See Project
  • 16
    CyberChef

    CyberChef

    A web app for encryption, encoding, compression and data analysis

    CyberChef, developed by GCHQ, is a versatile web application dubbed the "Cyber Swiss Army Knife." It enables users to perform a wide array of operations on data, including encryption, encoding, compression, and analysis, all within a browser interface.​
    Downloads: 12 This Week
    Last Update:
    See Project
  • 17
    seaborn

    seaborn

    Statistical data visualization in Python

    Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. Seaborn helps you explore and understand your data. Its plotting functions operate on dataframes and arrays containing whole datasets and internally perform the necessary semantic mapping and statistical aggregation to produce informative plots. Its dataset-oriented, declarative API lets you focus on what the different elements...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 18
    The PyPlot module for Julia

    The PyPlot module for Julia

    Plotting for Julia based on matplotlib.pyplot

    This module provides a Julia interface to the Matplotlib plotting library from Python, and specifically to the matplotlib.pyplot module. PyPlot uses the Julia PyCall package to call Matplotlib directly from Julia with little or no overhead (arrays are passed without making a copy). (See also PythonPlot.jl for a version of PyPlot.jl using the alternative PythonCall.jl package.) This package takes advantage of Julia's multimedia I/O API to display plots in any Julia graphical backend, including...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 19
    reticulate

    reticulate

    R Interface to Python

    reticulate is an R package from Posit that creates seamless interoperability between R and Python. It lets you call Python modules, classes, and functions from within R, automatically translating between R and Python data structures. Useful for combining Python tooling with R projects, data analysis, and RMarkdown reports.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 20
    DataEase

    DataEase

    Data visualization analysis tool

    An open source data visualization analysis tool available to everyone. DataEase is an open-source data visualization analysis tool that helps users quickly analyze data and gain insight into business trends, so as to achieve business improvement and optimization. DataEase supports rich data source connections, can quickly create charts by dragging and dropping, and can easily share with others. Supports rich chart types (Apache ECharts / AntV), supports drag-and-drop method to quickly create...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 21
    Graphs.jl

    Graphs.jl

    An optimized graphs package for the Julia programming language

    The goal of Graphs.jl is to offer a performant platform for network and graph analysis in Julia, following the example of libraries such as NetworkX in Python. Offers a set of simple, concrete graph implementations – SimpleGraph (for undirected graphs) and SimpleDiGraph (for directed graphs), an API for the development of more sophisticated graph implementations under the AbstractGraph type, and a large collection of graph algorithms with the same requirements as this API.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 22
    Crawl4AI

    Crawl4AI

    Open-source LLM Friendly Web Crawler & Scraper

    Crawl4AI is a high-performance, AI‑ready web crawler tailored for LLM data ingestion and RAG pipelines. It supports adaptive crawling heuristics (stopping when enough info is gathered), structured markdown output, and high-speed parallel execution. Designed to operate at scale with optional Docker deployment and framework integrations.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 23
    Dagster

    Dagster

    An orchestration platform for the development, production

    Dagster is an orchestration platform for the development, production, and observation of data assets. Dagster as a productivity platform: With Dagster, you can focus on running tasks, or you can identify the key assets you need to create using a declarative approach. Embrace CI/CD best practices from the get-go: build reusable components, spot data quality issues, and flag bugs early. Dagster as a robust orchestration engine: Put your pipelines into production with a robust multi-tenant...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 24
    electricityMap

    electricityMap

    A real-time visualisation of the CO2 emissions of electricity

    Real-time visualization of the Greenhouse Gas (in terms of CO2 equivalent) footprint of electricity consumption built with d3.js and mapbox GL. Real-time data is defined as a data source with an hourly (or better) frequency, delayed by less than 2hrs. It should provide a breakdown by generation type. Often fossil fuel generation (coal/gas/oil) is combined under a single heading like 'thermal' or 'conventional', this is not a problem. Citizens should not be responsible for the emissions...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 25
    Kibana

    Kibana

    Your window into the Elastic Stack

    Kibana is a analytics and search dashboard for Elasticsearch that allows you to visualize Elasticsearch data and efficiently navigate the Elastic Stack. With Kibana you can visualize and shape your data simply and intuitively, share visualizations for greater collaboration, organize dashboards and visualizations, and so much more.
    Downloads: 6 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.