Showing 75 open source projects for "sql data generator"

  • 1
    DeerFlow

    Deep Research framework, combining language models with tools

    DeerFlow is an open-source, community-driven “deep research” framework / multi-agent orchestration platform developed by ByteDance. It aims to combine the reasoning power of large language models (LLMs) with automated tool-use — such as web search, web crawling, Python execution, and data processing — to enable complex, end-to-end research workflows. Instead of a monolithic AI assistant, DeerFlow defines multiple specialized agents (e.g. “planner,” “searcher,” “coder,” “report generator”) that collaborate in a structured workflow, allowing tasks like literature reviews, data gathering, data analysis, code execution, and final report generation to be largely automated. ...
    Downloads: 56 This Week
  • 2
    OceanBase seekdb

    The AI-Native Search Database

    seekdb is an AI-native search database from OceanBase that unifies vector, full-text, relational, JSON, and GIS data into a single query engine. The system is designed to support hybrid search workloads and in-database AI workflows without requiring multiple specialized databases. It enables developers to perform semantic search, keyword search, and structured SQL queries within the same platform, simplifying modern AI application stacks. seekdb also embeds AI capabilities directly in the database layer, including embedding generation, reranking, and LLM inference for end-to-end RAG pipelines. ...
    Downloads: 22 This Week
  • 3
    MCP Snowflake Server

    A Model Context Protocol (MCP) server implementation

    An MCP server implementation that facilitates database interactions with Snowflake, allowing execution of SQL queries and presentation of data insights as resources.
    Downloads: 0 This Week
  • 4
    Dataherald

    Interact with your SQL database, Natural Language to SQL using LLMs

    Dataherald is a platform that allows users to query structured databases using natural language, automatically converting plain English into SQL. It is designed to enable real-time, self-service analytics without needing technical knowledge of databases, making business data easily accessible to non-technical users. Dataherald focuses on speed, accuracy, and scalability for enterprise settings.
    Downloads: 0 This Week
  • 5
    DataProfiler

    Extract schema, statistics and entities from datasets

    DataProfiler is an AI-powered tool for automatic data analysis and profiling, designed to detect patterns, anomalies, and schema inconsistencies in structured and unstructured datasets. It is a Python library designed to make data analysis, monitoring, and sensitive data detection easy. With a single command, the library automatically formats and loads files into a DataFrame. When profiling the data, it identifies the schema, statistics, entities (PII / NPI), and...
    Downloads: 4 This Week
  • 6
    Logfire MCP

    The Logfire MCP Server is here

    The Logfire MCP Server is a Model Context Protocol server that allows AI applications to access OpenTelemetry traces and metrics sent to Logfire. It enables retrieval and analysis of telemetry data, enhancing debugging and observability workflows.
    Downloads: 4 This Week
  • 7
    LlamaIndex

    Central interface to connect your LLMs with external data

    LlamaIndex (GPT Index) is a project that provides a simple, flexible central interface between your LLMs and your external data. It provides the following tools in an easy-to-use fashion. It provides indices over your unstructured and structured data for use with LLMs. These indices help to abstract away common boilerplate and pain points for in-context learning. Dealing with prompt limitations (e.g. 4096 tokens for Davinci) when...
    Downloads: 10 This Week
  • 8
    Preswald

    Python tool for browser-based interactive data apps in one file

    Preswald is an open source Python-based framework and static-site generator designed for building interactive data applications that run entirely in the browser. It packages application logic, data processing, and user interface components into a single self-contained output, enabling easy sharing and deployment without requiring local dependencies. Preswald leverages a WebAssembly runtime along with technologies like Pyodide and DuckDB to execute Python code directly in the browser environment. ...
    Downloads: 4 This Week
  • 9
    pgai

    A suite of tools to develop RAG, semantic search, and other AI apps

    pgai is a suite of PostgreSQL extensions developed by Timescale to empower developers in building AI applications directly within their databases. It integrates tools for vector storage, advanced indexing, and AI model interactions, facilitating the development of applications like semantic search and Retrieval-Augmented Generation (RAG) without leaving the SQL environment.
    Downloads: 4 This Week
  • 10
    MyScaleDB

    A ClickHouse fork that supports high-performance vector search

    MyScaleDB is an open-source SQL vector database designed for building large-scale AI and machine learning applications that require both analytical queries and semantic vector search. The system is built on top of the ClickHouse database engine and extends it with specialized indexing and search capabilities optimized for vector embeddings. This design allows developers to store structured data, unstructured text, and high-dimensional vector embeddings within a single database platform. ...
    Downloads: 0 This Week
  • 11
    PostgresML

    The GPU-powered AI application database

    ...Leverage multiple types of natural language processing and machine learning models such as vector search and personalization with embeddings to improve search results. Leverage your data with time series forecasting to garner key business insights. Build statistical and predictive models with the full power of SQL and dozens of regression algorithms. Return results and detect fraud faster with ML at the database layer. PostgresML abstracts the data management overhead from the ML/AI lifecycle by enabling users to run ML/LLM models directly on a Postgres database.
    Downloads: 8 This Week
  • 12
    airda

    airda (Air Data Agent)

    airda (Air Data Agent) is a multi-agent system for data analysis. It can understand data development and data analysis requirements, understand the data itself, and generate SQL and Python code for data queries, data visualization, machine learning, and other tasks.
    Downloads: 1 This Week
  • 13
    Memori

    SQL-native memory layer enabling persistent context for AI agents

    ...It extracts structured information such as facts, preferences, rules, and summaries from interactions and stores them in standard SQL databases for later retrieval. By recalling relevant context during future model calls, Memori helps AI agents produce more consistent and context-aware responses while reducing the need to repeatedly provide background information. Memori is designed to work with multiple LLM providers, data stores, and AI frameworks, allowing it to integrate into existing software architectures without requiring major changes.
    Downloads: 2 This Week
  • 14
    JimuReport

    Open source drag-and-drop reporting and dashboard builder platform

    ...JimuReport supports traditional report generation, print templates, and modern dashboard visualizations for business intelligence scenarios. JimuReport also includes components for building interactive charts, data tables, and analytical displays that can be used in enterprise applications. It can connect to multiple data sources and retrieve data through SQL queries, APIs, or other structured formats. It can be embedded into Java applications using Spring Boot integration modules.
    Downloads: 6 This Week
  • 15
    Open Gauss

    Project-scoped Lean workflow orchestrator from Math, Inc.

    Open Gauss is an enterprise-grade open-source relational database management system designed to handle large-scale data processing with high performance, reliability, and security. It is based on the PostgreSQL ecosystem but significantly extends its capabilities through architectural optimizations, AI-driven features, and enterprise-level enhancements. The database organizes data using the relational model, storing structured information in tables composed of rows and columns while supporting standard SQL for querying and management. ...
    Downloads: 4 This Week
  • 16
    dataline

    AI data analysis and visualization on CSV, Postgres, MySQL, Snowflake

    dataline is an open-source AI data analysis and visualization platform that allows users to interact with datasets using natural language. The system enables both technical and non-technical users to explore data by asking questions conversationally, which the platform translates into database queries and analytical operations. It supports connections to multiple structured data sources such as PostgreSQL, MySQL, Snowflake, SQLite, Excel files, CSV datasets, and other database systems. Once...
    Downloads: 1 This Week
  • 17
    WanGP

    AI video generator optimized for low-VRAM and older GPUs

    Wan2GP is an open source AI video generation toolkit designed to make modern generative models accessible on consumer-grade hardware with limited GPU memory. It acts as a unified interface for running multiple video, image, and audio generation models, including Wan-based models as well as other systems like Hunyuan Video, Flux, and Qwen. A key focus of the project is reducing VRAM requirements, enabling some workflows to run on as little as 6 GB while still supporting older Nvidia and...
    Downloads: 36 This Week
  • 18
    civitai

    Open platform for sharing and discovering Stable Diffusion models

    Civitai is an open source project that provides the codebase for a platform designed to share and manage generative AI models used for image generation. It focuses primarily on models compatible with Stable Diffusion and related technologies, allowing creators to upload, organize, and distribute custom AI models and related resources. These resources can include textual inversions, hypernetworks, aesthetic gradients, and variational autoencoders that modify or extend the capabilities of...
    Downloads: 12 This Week
  • 19
    Learning Interpretability Tool

    Interactively analyze ML models to understand their behavior

    The Learning Interpretability Tool (LIT, formerly known as the Language Interpretability Tool) is a visual, interactive ML model-understanding tool that supports text, image, and tabular data. It can be run as a standalone server, or inside of notebook environments such as Colab, Jupyter, and Google Cloud Vertex AI notebooks.
    Downloads: 7 This Week
  • 20
    Memobase

    Fast backend for long-term AI user memory via structured profiles

    Memobase is an open source backend system that enables long-term user memory functionality for AI applications by capturing and structuring information about users across interactions. Its design centers on creating user profiles and recording event timelines, allowing AI systems to remember, understand, and evolve in their behaviour toward individual users over time. Instead of relying purely on traditional embedding-based retrieval or RAG systems, Memobase uses profile and timeline...
    Downloads: 7 This Week
  • 21
    HeavyDB

    HeavyDB (formerly MapD/OmniSciDB)

    HeavyDB is an open-source GPU-accelerated analytical database designed to perform extremely fast queries on large datasets. The system is built as a SQL-based relational columnar database engine that leverages modern hardware parallelism, including GPUs and multicore CPUs. Its architecture allows users to query datasets containing billions of rows in milliseconds without requiring traditional indexing, pre-aggregation, or sampling techniques. HeavyDB was originally developed as part of the OmniSci platform (formerly MapD) and is commonly used for large-scale analytics and geospatial data processing. ...
    Downloads: 0 This Week
  • 22
    WebGLM

    An Efficient Web-enhanced Question Answering System

    ...The system is based on the General Language Model architecture and was designed to enable language models to interact directly with web information during the question-answering process. Instead of relying solely on knowledge stored in the model’s training data, the system retrieves relevant web content and integrates it into the reasoning process. WebGLM introduces several components that coordinate this process, including a retrieval module that selects relevant web documents, a generator that produces answers, and a scoring system that evaluates the quality of generated responses. The architecture aims to improve the reliability and usefulness of AI systems that answer questions about current or external knowledge sources.
    Downloads: 0 This Week
  • 23
    Pydantic Logfire

    Python observability platform for tracing apps, metrics, and logs

    Pydantic Logfire is an observability platform designed to help developers monitor, analyze, and understand the behavior of their applications in real time. It is built by the team behind Pydantic and follows a philosophy of combining powerful capabilities with ease of use, making it accessible to entire engineering teams. Pydantic Logfire provides deep visibility into application performance by capturing traces, metrics, and logs through an OpenTelemetry-based architecture. It is...
    Downloads: 5 This Week
  • 24
    'Lightweight' GAN

    Implementation of 'lightweight' GAN, proposed in ICLR 2021

    Implementation of 'lightweight' GAN, proposed in ICLR 2021, in PyTorch. The main contribution of the paper is a skip-layer excitation in the generator, paired with autoencoding self-supervised learning in the discriminator. Quoting the one-line summary: "converge on single gpu with few hours' training, on 1024 resolution sub-hundred images". Augmentation is essential for Lightweight GAN to work effectively in a low-data setting. If you use augmentation, you can test and see how your images will be augmented before they pass into the neural network. ...
    Downloads: 1 This Week
  • 25
    Korvus

    Korvus is a search SDK that unifies the entire RAG pipeline

    Korvus is an open-source retrieval-augmented generation (RAG) pipeline designed to run entirely inside PostgreSQL, allowing developers to build AI search and knowledge systems directly within a database environment. The project consolidates the typical steps of a RAG pipeline—including embedding generation, document retrieval, reranking, and text generation—into a single query executed within the Postgres ecosystem. By leveraging PostgresML and vector extensions such as pgvector, Korvus...
    Downloads: 0 This Week