35 projects for "sql data generator" with 2 filters applied:

  • 1
    Synthetic Data Generator

    SDG is a specialized framework

    Synthetic Data Generator is an open-source framework designed to generate high-quality synthetic tabular datasets that replicate the statistical characteristics of real data while avoiding privacy risks. The platform enables developers and data scientists to create artificial datasets that preserve important relationships between variables without containing sensitive personal information.
    Downloads: 13 This Week
  • 2
    Dash Data Agent

    Self-learning data agent that grounds its answers in layers of context

    Dash is a self-learning data agent built by the Agno AI community that generates grounded answers to English queries over structured data by synthesizing SQL and reasoning based on six layers of context, improving automatically with each run. It sidesteps common limitations of simple text-to-SQL agents by incorporating multiple context layers — including schema structure, human annotations, known query patterns, institutional knowledge from docs, machine-discovered error patterns, and live runtime context — to generate SQL queries that are both technically correct and semantically meaningful. ...
    Downloads: 0 This Week
  • 3
    Data-Science-Interview-Questions-Answers

    Curated list of data science interview questions and answers

    ...The repository focuses on core data science fundamentals rather than acting as a software framework, which makes it especially useful as a study and revision resource. Its content is organized into subject-specific documents that cover machine learning, deep learning, statistics, probability, Python, SQL and databases, and resume-based interview questions. That structure makes it practical for users who want to study by topic, strengthen weak areas, or simulate the range of questions they may encounter in interviews.
    Downloads: 0 This Week
  • 4
    Data Science Interviews

    Data science interview questions and answers

    Data Science Interviews is an open-source repository that collects common data science interview questions along with community-provided answers and explanations. The project serves as a preparation resource for students, job seekers, and professionals who want to review the technical knowledge required for data science roles. The repository organizes questions into different categories including theoretical machine learning concepts, technical programming questions, and probability or statistics problems. ...
    Downloads: 0 This Week
  • 5
    cracking-the-data-science-interview

    A Collection of Cheatsheets, Books, Questions, and Portfolio

    Cracking the Data Science Interview is an open educational repository that collects study materials, resources, and reference links for preparing for data science interviews. The project organizes content across many fundamental areas of data science, including statistics, probability, SQL, machine learning, and deep learning. It includes cheat sheets that summarize important technical concepts commonly discussed during technical interviews.
    Downloads: 0 This Week
  • 6
    Rill

    Fast SQL-based BI tool for real-time dashboards and analytics

    Rill is an operational BI tool that turns raw datasets into fast, interactive dashboards using SQL and a code-first approach. It helps data teams move from data lake to insight quickly, without the complexity of traditional BI systems. With an embedded in-memory database powered by DuckDB or ClickHouse, queries run in milliseconds, enabling real-time exploration and analysis. Rill supports local and remote data sources such as CSV, Parquet, S3, and GCS, making it flexible across environments. ...
    Downloads: 7 This Week
  • 7
    Aix-DB

    Based on the LangChain/LangGraph framework

    ...The platform supports multiple types of data sources and provides an end-to-end pipeline that includes intent recognition, SQL generation, database execution, and visual presentation of results. Its architecture includes multiple layers such as a web interface, API gateway, AI service layer, and data storage layer that support relational databases, vector stores, graph databases, and file systems.
    Downloads: 3 This Week
  • 8
    Vanna 2.0

    Chat with your SQL database

    Vanna is an open-source Python framework that enables natural language interaction with databases by converting user questions into executable SQL queries using large language models. The framework uses a retrieval-augmented generation architecture that learns from database schemas, documentation, and past query examples to generate accurate queries tailored to a specific dataset. Vanna can be integrated into many environments, including notebooks, web applications, messaging platforms, and data dashboards, making it flexible for analytics and data exploration workflows. ...
    Downloads: 5 This Week
  • 9
    Deepnote

    Deepnote is a drop-in replacement for Jupyter

    ...The system supports programming languages such as Python, R, and SQL and allows users to execute and analyze data directly within interactive notebooks. Deepnote emphasizes team-based data science by enabling real-time collaboration similar to shared document editors, allowing multiple users to work simultaneously on the same notebook environment.
    Downloads: 5 This Week
  • 10
    LlamaIndex

    Central interface to connect your LLMs with external data

    LlamaIndex (GPT Index) provides a central interface to connect your LLMs with external data: a simple, flexible layer between your data and LLMs. It provides indices over your unstructured and structured data for use with LLMs. These indices help to abstract away common boilerplate and pain points for in-context learning. Dealing with prompt limitations (e.g. 4096 tokens for Davinci) when...
    Downloads: 10 This Week
  • 11
    Preswald

    Python tool for browser-based interactive data apps in one file

    Preswald is an open source Python-based framework and static-site generator designed for building interactive data applications that run entirely in the browser. It packages application logic, data processing, and user interface components into a single self-contained output, enabling easy sharing and deployment without requiring local dependencies. Preswald leverages a WebAssembly runtime along with technologies like Pyodide and DuckDB to execute Python code directly in the browser environment. ...
    Downloads: 4 This Week
  • 12
    MyScaleDB

    A ClickHouse fork that supports high-performance vector search

    MyScaleDB is an open-source SQL vector database designed for building large-scale AI and machine learning applications that require both analytical queries and semantic vector search. The system is built on top of the ClickHouse database engine and extends it with specialized indexing and search capabilities optimized for vector embeddings. This design allows developers to store structured data, unstructured text, and high-dimensional vector embeddings within a single database platform. ...
    Downloads: 0 This Week
  • 13
    Memori

    SQL-native memory layer enabling persistent context for AI agents

    ...It extracts structured information such as facts, preferences, rules, and summaries from interactions and stores them in standard SQL databases for later retrieval. By recalling relevant context during future model calls, Memori helps AI agents produce more consistent and context-aware responses while reducing the need to repeatedly provide background information. Memori is designed to work with multiple LLM providers, data stores, and AI frameworks, allowing it to integrate into existing software architectures without requiring major changes.
    Downloads: 2 This Week
  • 14
    JimuReport

    Open source drag-and-drop reporting and dashboard builder platform

    ...JimuReport supports traditional report generation, print templates, and modern dashboard visualizations for business intelligence scenarios. JimuReport also includes components for building interactive charts, data tables, and analytical displays that can be used in enterprise applications. It can connect to multiple data sources and retrieve data through SQL queries, APIs, or other structured formats. It can be embedded into Java applications using Spring Boot integration modules.
    Downloads: 6 This Week
  • 15
    Open Gauss

    Project-scoped Lean workflow orchestrator from Math, Inc.

    Open Gauss is an enterprise-grade open-source relational database management system designed to handle large-scale data processing with high performance, reliability, and security. It is based on the PostgreSQL ecosystem but significantly extends its capabilities through architectural optimizations, AI-driven features, and enterprise-level enhancements. The database organizes data using the relational model, storing structured information in tables composed of rows and columns while supporting standard SQL for querying and management. ...
    Downloads: 4 This Week
  • 16
    WanGP

    AI video generator optimized for low VRAM and older GPUs

    Wan2GP is an open source AI video generation toolkit designed to make modern generative models accessible on consumer-grade hardware with limited GPU memory. It acts as a unified interface for running multiple video, image, and audio generation models, including Wan-based models as well as other systems like Hunyuan Video, Flux, and Qwen. A key focus of the project is reducing VRAM requirements, enabling some workflows to run on as little as 6 GB while still supporting older Nvidia and...
    Downloads: 36 This Week
  • 17
    civitai

    Open platform for sharing and discovering Stable Diffusion models

    Civitai is an open source project that provides the codebase for a platform designed to share and manage generative AI models used for image generation. It focuses primarily on models compatible with Stable Diffusion and related technologies, allowing creators to upload, organize, and distribute custom AI models and related resources. These resources can include textual inversions, hypernetworks, aesthetic gradients, and variational autoencoders that modify or extend the capabilities of...
    Downloads: 12 This Week
  • 18
    Memobase

    Fast backend for long-term AI user memory via structured profiles

    Memobase is an open source backend system that enables long-term user memory functionality for AI applications by capturing and structuring information about users across interactions. Its design centers on creating user profiles and recording event timelines, allowing AI systems to remember, understand, and evolve in their behaviour toward individual users over time. Instead of relying purely on traditional embedding-based retrieval or RAG systems, Memobase uses profile and timeline...
    Downloads: 7 This Week
  • 19
    WebGLM

    An Efficient Web-enhanced Question Answering System

    ...The system is based on the General Language Model architecture and was designed to enable language models to interact directly with web information during the question-answering process. Instead of relying solely on knowledge stored in the model’s training data, the system retrieves relevant web content and integrates it into the reasoning process. WebGLM introduces several components that coordinate this process, including a retrieval module that selects relevant web documents, a generator that produces answers, and a scoring system that evaluates the quality of generated responses. The architecture aims to improve the reliability and usefulness of AI systems that answer questions about current or external knowledge sources.
    Downloads: 0 This Week
  • 20
    HeavyDB

    HeavyDB (formerly MapD/OmniSciDB)

    HeavyDB is an open-source GPU-accelerated analytical database designed to perform extremely fast queries on large datasets. The system is built as a SQL-based relational columnar database engine that leverages modern hardware parallelism, including GPUs and multicore CPUs. Its architecture allows users to query datasets containing billions of rows in milliseconds without requiring traditional indexing, pre-aggregation, or sampling techniques. HeavyDB was originally developed as part of the OmniSci platform (formerly MapD) and is commonly used for large-scale analytics and geospatial data processing. ...
    Downloads: 0 This Week
  • 21
    Pydantic Logfire

    Python observability platform for tracing apps, metrics, and logs

    Pydantic Logfire is an observability platform designed to help developers monitor, analyze, and understand the behavior of their applications in real time. It is built by the team behind Pydantic and follows a philosophy of combining powerful capabilities with ease of use, making it accessible to entire engineering teams. Pydantic Logfire provides deep visibility into application performance by capturing traces, metrics, and logs through an OpenTelemetry-based architecture. It is...
    Downloads: 5 This Week
  • 22
    Korvus

    Korvus is a search SDK that unifies the entire RAG pipeline

    Korvus is an open-source retrieval-augmented generation (RAG) pipeline designed to run entirely inside PostgreSQL, allowing developers to build AI search and knowledge systems directly within a database environment. The project consolidates the typical steps of a RAG pipeline—including embedding generation, document retrieval, reranking, and text generation—into a single query executed within the Postgres ecosystem. By leveraging PostgresML and vector extensions such as pgvector, Korvus...
    Downloads: 0 This Week
  • 23
    WhisperSpeech

    An Open Source text-to-speech system built by inverting Whisper

    WhisperSpeech is an open-source text-to-speech system created by “inverting” OpenAI’s Whisper, reusing its strengths as a semantic audio model to generate speech instead of only transcribing it. The project aims to be for speech what Stable Diffusion is for images: powerful, hackable, and safe for commercial use, with code under Apache-2.0/MIT and models trained only on properly licensed data. Its architecture follows a token-based, multi-stage pipeline inspired by AudioLM and SPEAR-TTS:...
    Downloads: 2 This Week
  • 24
    DB-GPT-Hub

    A repository that contains models, datasets, and fine-tuning

    DB-GPT-Hub is an open-source repository that provides datasets, models, and training tools designed to improve large language models for database interaction tasks, particularly Text-to-SQL. The project serves as a specialized extension of the broader DB-GPT ecosystem, focusing on the preparation and evaluation of models capable of translating natural language questions into structured database queries. It offers a modular framework that supports data preparation, model fine-tuning, benchmarking, and inference for Text-to-SQL systems. ...
    Downloads: 4 This Week
  • 25
    Common Resource Grep - crgrep

    Common Resource Grep

    CRGREP searches for matching text in databases, various document formats, archives, and other difficult-to-access resources. It is a command-line tool for name and content text matching in database tables, plain files, MS Office documents, PDF, archives, MP3 audio, image metadata, scanned documents, Maven dependencies, and web resources. CRGREP will search resources nested within other resources in any combination and to any depth, for example text within a document within a zip archive. Here you...
    Downloads: 0 This Week