Showing 105 open source projects for "sql tools python"

View related business solutions
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • Context for your AI agents Icon
    Context for your AI agents

    Crawl websites, sync to vector databases, and power RAG applications. Pre-built integrations for LLM pipelines and AI assistants.

    Build data pipelines that feed your AI models and agents without managing infrastructure. Crawl any website, transform content, and push directly to your preferred vector store. Use 10,000+ tools for RAG applications, AI assistants, and real-time knowledge bases. Monitor site changes, trigger workflows on new data, and keep your AIs fed with fresh, structured information. Cloud-native, API-first, and free to start until you need to scale.
    Try for free
  • 1
    pandas

    pandas

    Fast, flexible and powerful Python data analysis toolkit

    pandas is a Python data analysis library that provides high-performance, user friendly data structures and data analysis tools for the Python programming language. It enables you to carry out entire data analysis workflows in Python without having to switch to a more domain specific language. With pandas, performance, productivity and collaboration in doing data analysis in Python can significantly increase.
    Downloads: 124 This Week
    Last Update:
    See Project
  • 2
    Mage.ai

    Mage.ai

    Build, run, and manage data pipelines for integrating data

    ...No more DAGs with spaghetti code. Start developing locally with a single command or launch a dev environment in your cloud using Terraform. Write code in Python, SQL, or R in the same data pipeline for ultimate flexibility.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    scikit-learn

    scikit-learn

    Machine learning in Python

    scikit-learn is an open source Python module for machine learning built on NumPy, SciPy and matplotlib. It offers simple and efficient tools for predictive data analysis and is reusable in various contexts.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 4
    SageMaker Spark Container

    SageMaker Spark Container

    Docker image used to run data processing workloads

    Apache Spark™ is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing. The SageMaker Spark Container is a Docker image used to run batch data processing workloads on Amazon SageMaker using the Apache Spark framework. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • G-P - Global EOR Solution Icon
    G-P - Global EOR Solution

    Companies searching for an Employer of Record solution to mitigate risk and manage compliance, taxes, benefits, and payroll anywhere in the world

    With G-P's industry-leading Employer of Record (EOR) and Contractor solutions, you can hire, onboard and manage teams in 180+ countries — quickly and compliantly — without setting up entities.
    Learn More
  • 5
    Pathway

    Pathway

    Python ETL framework for stream processing, real-time analytics, LLM

    ...Unlike traditional batch processing frameworks, Pathway continuously updates the results of your data logic as new events arrive, functioning more like a database that reacts in real-time. It supports Python, integrates with modern data tools, and offers a deterministic dataflow model to ensure reproducibility and correctness.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    Dask

    Dask

    Parallel computing with task scheduling

    Dask is a Python library for parallel and distributed computing, designed to scale analytics workloads from single machines to large clusters. It integrates with familiar tools like NumPy, Pandas, and scikit-learn while enabling execution across cores or nodes with minimal code changes. Dask excels at handling large datasets that don’t fit into memory and is widely used in data science, machine learning, and big data pipelines.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Astropy

    Astropy

    Repository for the Astropy core package

    ...The Anaconda Python Distribution includes Astropy and is the recommended way to install both Python and the Astropy package. The astropy package contains key functionality and common tools needed for performing astronomy and astrophysics with Python. It is at the core of the Astropy Project, which aims to enable the community to develop a robust ecosystem of affiliated packages covering a broad range of needs for astronomical research, data processing, and data analysis.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    AWS SDK for pandas

    AWS SDK for pandas

    Easy integration with Athena, Glue, Redshift, Timestream, Neptune

    ...The library abstracts efficient patterns like partitioning, compression, and vectorized I/O so you get performant data lake operations without hand-rolling boilerplate. It also supports Redshift, OpenSearch, and other services, enabling ETL tasks that blend SQL engines and Python transformations. Operational helpers handle IAM, sessions, and concurrency while exposing knobs for encryption, versioning, and catalog consistency. The result is a productive workflow that keeps your analytics in Python while leveraging AWS-native storage and query engines at scale.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    Luigi

    Luigi

    Python module that helps you build complex pipelines of batch jobs

    Luigi is a Python (3.6, 3.7, 3.8, 3.9 tested) package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, handling failures, command line integration, and much more. The purpose of Luigi is to address all the plumbing typically associated with long-running batch processes. You want to chain many tasks, automate them, and failures will happen. These tasks can be anything, but are typically long running things like Hadoop...
    Downloads: 3 This Week
    Last Update:
    See Project
  • BoldTrail Real Estate CRM Icon
    BoldTrail Real Estate CRM

    A first-of-its-kind homeownership solution that puts YOU at the center of the coveted lifetime consumer relationship.

    BoldTrail, the #1 rated real estate platform, is built to power your entire brokerage with next-generation technology your agents will use and love. Showcase your unique brand with customizable websites for your company, offices, and every agent. Maximize lead capture with a modern, portal-like consumer search experience and intelligent behavior tracking. Hyper-local area pages, home valuation pages and options for rich lifestyle data keep customers searching with your brokerage as the local experts. The most robust lead gen tools on the market help your brokerage, teams & agents effectively drive new business - no matter their budget. Empower your agents to generate free leads instantly with our simple to use landing pages & IDX squeeze pages. Drive more leads with higher quality and lower cost through in-house tools built within the platform. Diversify lead sources with our automated social media posting, integrated Google and Facebook advertising, custom text codes and more.
    Learn More
  • 10
    Datumaro

    Datumaro

    Dataset Management Framework, a Python library and a CLI tool to build

    Datumaro is a flexible Python-based dataset management framework and command-line tool for building, analyzing, transforming, and converting computer vision datasets in many popular formats. It supports importing and exporting annotations and images across a wide variety of standards like COCO, PASCAL VOC, YOLO, ImageNet, Cityscapes, and many more, enabling easy integration with different training pipelines and tools.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    geemap

    geemap

    A Python package for interactive geospaital analysis and visualization

    A Python package for interactive geospatial analysis and visualization with Google Earth Engine. Geemap is a Python package for geospatial analysis and visualization with Google Earth Engine (GEE), which is a cloud computing platform with a multi-petabyte catalog of satellite imagery and geospatial datasets. During the past few years, GEE has become very popular in the geospatial community and it has empowered numerous environmental applications at local, regional, and global scales. GEE...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    marimo

    marimo

    A reactive notebook for Python

    ...Version with git, run as Python scripts, import symbols from a notebook into other notebooks or Python files, and lint or format with your favorite tools. You'll always be able to reproduce your collaborators' results. Notebooks are executed in a deterministic order, with no hidden state, delete a cell and marimo deletes its variables while updating affected cells.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    CKAN

    CKAN

    CKAN is an open-source DMS for powering data hubs

    CKAN is the world’s leading open-source data portal platform. CKAN makes it easy to publish, share and work with data. It's a data management system that provides a powerful platform for cataloging, storing and accessing datasets with a rich front-end, full API (for both data and catalog), visualization tools and more.CKAN is used by national and regional government organizations throughout the European Union, the Americas, Asia, and Oceania to power a variety of official and community data...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 14
    gusty

    gusty

    Making DAG construction easier

    gusty allows you to control your Airflow DAGs, Task Groups, and Tasks with greater ease. gusty manages collections of tasks, represented as any number of YAML, Python, SQL, Jupyter Notebook, or R Markdown files. A directory of task files is instantly rendered into a DAG by passing a file path to gusty's create_dag function. gusty also manages dependencies (within one DAG) and external dependencies (dependencies on tasks in other DAGs) for each task file you define. All you have to do is provide a list of dependencies or external_dependencies inside of a task file, and gusty will automatically set each task's dependencies and create external task sensors for any external dependencies listed. gusty works with both Airflow 1.x and Airflow 2.x, and has even more features, all of which aim to make the creation, management, and iteration of DAGs more fluid, so that you can intuitively design your DAG and build your tasks.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 15
    HyperTools

    HyperTools

    A Python toolbox for gaining geometric insights

    HyperTools is a library for visualizing and manipulating high-dimensional data in Python. It is built on top of matplotlib (for plotting), seaborn (for plot styling), and scikit-learn (for data manipulation). Functions for plotting high-dimensional datasets in 2/3D. Static and animated plots. Simple API for customizing plot styles. Set of powerful data manipulation tools including hyperalignment, k-means clustering, normalizing and more.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Neuroglancer

    Neuroglancer

    WebGL-based viewer for volumetric data

    Neuroglancer is a WebGL-based visualization tool designed for exploring large-scale volumetric and neuroimaging datasets directly in the browser. It allows users to interactively view arbitrary 2D and 3D cross-sections of volumetric data alongside 3D meshes and skeleton models, enabling precise examination of neural structures and biological imaging results. Its multi-pane interface synchronizes multiple orthogonal views with a central 3D viewport, making it ideal for analyzing complex brain...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 17
    leafmap

    leafmap

    A Python package for interactive mapping and geospatial analysis

    A Python package for geospatial analysis and interactive mapping in a Jupyter environment. Leafmap is a Python package for interactive mapping and geospatial analysis with minimal coding in a Jupyter environment. It is a spin-off project of the geemap Python package, which was designed specifically to work with Google Earth Engine (GEE). However, not everyone in the geospatial community has access to the GEE cloud computing platform. Leafmap is designed to fill this gap for non-GEE users. It...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    PySR

    PySR

    High-Performance Symbolic Regression in Python and Julia

    PySR is an open-source tool for Symbolic Regression: a machine learning task where the goal is to find an interpretable symbolic expression that optimizes some objective. Over a period of several years, PySR has been engineered from the ground up to be (1) as high-performance as possible, (2) as configurable as possible, and (3) easy to use. PySR is developed alongside the Julia library SymbolicRegression.jl, which forms the powerful search engine of PySR. The details of these algorithms are...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 19
    Positron

    Positron

    Positron, a next-generation data science IDE

    Positron is a next-generation integrated development environment (IDE) created by Posit PBC (formerly RStudio Inc) specifically tailored for data science workflows in Python, R, and multi-language ecosystems. It aims to unify exploratory data analysis, production code, and data-app authoring in a single environment so that data scientists move from “question → insight → application” without switching tools. Built on the open-source Code-OSS foundation, Positron provides a familiar coding experience along with specialized panes and tooling for variable inspection, data-frame viewing, plotting previews, and interactive consoles designed for analytical work. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    FiftyOne

    FiftyOne

    The open-source tool for building high-quality datasets

    The open-source tool for building high-quality datasets and computer vision models. Nothing hinders the success of machine learning systems more than poor-quality data. And without the right tools, improving a model can be time-consuming and inefficient. FiftyOne supercharges your machine learning workflows by enabling you to visualize datasets and interpret models faster and more effectively. Improving data quality and understanding your model’s failure modes are the most impactful ways to...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 21
    Arize Phoenix

    Arize Phoenix

    Uncover insights, surface problems, monitor, and fine tune your LLM

    Phoenix provides ML insights at lightning speed with zero-config observability for model drift, performance, and data quality. Phoenix is an Open Source ML Observability library designed for the Notebook. The toolset is designed to ingest model inference data for LLMs, CV, NLP and tabular datasets. It allows Data Scientists to quickly visualize their model data, monitor performance, track down issues & insights, and easily export to improve. Deep Learning Models (CV, LLM, and Generative)...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 22
    PyVista

    PyVista

    3D plotting and mesh analysis through a streamlined interface

    3D plotting and mesh analysis through a streamlined interface for the Visualization Toolkit (VTK). PyVista is a helper module for the Visualization Toolkit (VTK) that takes a different approach on interfacing with VTK through NumPy and direct array access. This package provides a Pythonic, well-documented interface exposing VTK’s powerful visualization backend to facilitate rapid prototyping, analysis, and visual integration of spatially referenced datasets. This module can be used for...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    Diffgram

    Diffgram

    Training data (data labeling, annotation, workflow) for all data types

    From ingesting data to exploring it, annotating it, and managing workflows. Diffgram is a single application that will improve your data labeling and bring all aspects of training data under a single roof. Diffgram is world’s first truly open source training data platform that focuses on giving its users an unlimited experience. This is aimed to reduce your data labeling bills and increase your Training Data Quality. Training Data is the art of supervising machines through data. This...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 24
    airda

    airda

    airda(Air Data Agent

    airda(Air Data Agent) is a multi-smart body for data analysis, capable of understanding data development and data analysis needs, understanding data, generating data-oriented queries, data visualization, machine learning and other tasks of SQL and Python codes.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    PySyft

    PySyft

    Data science on data without acquiring a copy

    Most software libraries let you compute over the information you own and see inside of machines you control. However, this means that you cannot compute on information without first obtaining (at least partial) ownership of that information. It also means that you cannot compute using machines without first obtaining control over those machines. This is very limiting to human collaboration and systematically drives the centralization of data, because you cannot work with a bunch of data...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next