Showing 55 open source projects for "data json"

  • 1
    Google Workspace CLI

    Command-line tool for Drive, Gmail, Calendar, Sheets, Docs, Chat, etc.

    Google Workspace CLI (gws) is a command-line tool designed to interact with Google Workspace services such as Drive, Gmail, Calendar, Sheets, and more from a single interface. It dynamically generates its command structure using Google’s Discovery Service, allowing it to automatically support new API endpoints as they become available. The tool eliminates the need for manual REST API calls by providing structured commands and built-in help for each resource and method. It outputs structured...
    Downloads: 16 This Week
  • 2
    ConfiChat

    Lightweight, standalone, multi-platform, and privacy-focused local LLM

    ...The tool supports local models such as Ollama and llama.cpp for fully offline operation, while also allowing integration with cloud APIs like OpenAI and Anthropic for access to more advanced capabilities. A key differentiator is its optional encryption of chat history and assets, ensuring that sensitive data can remain secure even when stored locally. Conversations are managed as local JSON files, giving users transparency and direct control over their data. Overall, ConfiChat is designed for users who prioritize privacy, flexibility, and independence from complex infrastructure while still maintaining access.
    Downloads: 0 This Week
  • 3
    Qwen-2.5-VL

    Qwen2.5-VL is the multimodal large language model series

    ...Trained on a comprehensive dataset of up to 18 trillion tokens, Qwen2.5 models exhibit significant improvements in instruction following, long-text generation (exceeding 8,000 tokens), and structured data comprehension, such as tables and JSON formats. They support context lengths up to 128,000 tokens and offer multilingual capabilities in over 29 languages, including Chinese, English, French, Spanish, and more. The models are open-source under the Apache 2.0 license, with resources and documentation available on platforms like Hugging Face and ModelScope.
    Downloads: 18 This Week
  • 4
    Skyvern

    Automate browser-based workflows with LLMs and Computer Vision

    ...Skyvern understands how to solve CAPTCHAs to complete complicated workflows. Support for authenticating into user accounts, including support for 2FA/TOTP. Extract data from workflows in any schema of your choice including CSV or JSON. Automate procurement pipelines, breeze through government forms, and complete workflows in any language.
    Downloads: 4 This Week
  • 5
    A2UI

    A Protocol for Agent-Driven Interfaces

    ...A key design principle of A2UI is security, as it avoids executing arbitrary code generated by models and instead restricts output to structured data that maps to a predefined catalog of trusted UI components. The system also supports incremental updates, allowing agents to progressively modify the interface as a conversation evolves.
    Downloads: 0 This Week
  • 6
    TTRL

    Test-Time Reinforcement Learning

    TTRL is an open-source framework for test-time reinforcement learning in large language models, with a particular focus on reasoning tasks where ground-truth labels are not available during inference. The project addresses the problem of how to generate useful reward signals from unlabeled test-time data, and its central insight is that common test-time scaling practices such as majority voting can be repurposed into reward estimates for online reinforcement learning. This makes the...
    Downloads: 0 This Week
  • 7
    DocStrange

    Extract and convert data from any document, images, PDFs, Word doc

    DocStrange is an open-source document understanding and extraction library designed to convert complex files into structured, LLM-ready outputs such as Markdown, JSON, CSV, and HTML. Developed by Nanonets, the project combines OCR, layout detection, table understanding, and structured extraction into one end-to-end pipeline, which reduces the need to stitch together multiple separate services. It is built for developers who need high-quality parsing from scans, photos, PDFs, office files,...
    Downloads: 0 This Week
  • 8
    Easy DataSet

    A powerful tool for creating datasets for LLM fine-tuning

    Easy DataSet is a comprehensive open-source tool designed to make creating high-quality datasets for large language model fine-tuning, retrieval-augmented generation (RAG), and evaluation as easy and automated as possible by providing intuitive interfaces and powerful parsing, segmentation, and labeling tools. It supports ingesting domain-specific documents in a wide range of formats — including PDF, Markdown, DOCX, EPUB, and plain text — and can intelligently segment, clean, and structure...
    Downloads: 9 This Week
  • 9
    Swirl

    Swirl queries any number of data sources with APIs

    ...It's intended for use by developers and data scientists who want to solve multi-silo search problems, from enterprise search to new monitoring & alerting solutions that push information to users continuously. Built on the Python/Django/RabbitMQ stack, SWIRL includes connectors to Apache Solr, ChatGPT, Elastic, OpenSearch, PostgreSQL, and Google BigQuery, plus generic HTTP/GET/JSON with configurations for premium services.
    Downloads: 3 This Week
  • 10
    MadelineProto

    Async PHP client/server API for the Telegram MTProto protocol

    This library can be used to easily interact with Telegram without the bot API, just like the official apps. It can log in with a phone number (MTProto API), or with a bot token (MTProto API, no bot API involved!). Internal peer management: you can provide a simple bot API chat id or a username to send a message or to call other MTProto methods! You can easily log in as a user (2FA is supported) or as a bot! Simple error handling! It is highly customizable with a lot of different settings! Bot...
    Downloads: 2 This Week
  • 11
    TONL

    TONL (Token-Optimized Notation Language)

    TONL is a cutting-edge data platform built around a production-ready serialization format designed to be both compact and powerful, combining human readability with performance features that make it suitable for large-scale applications and AI workflows. It provides a serialization format that significantly reduces token usage compared with traditional JSON, which can result in lower costs and more efficient prompt size utilization in LLM-driven systems.
    Downloads: 0 This Week
  • 12
    ENScan Go

    ENScan_GO is an enterprise information reconnaissance tool

    ENScan_GO is an enterprise information reconnaissance tool focused on Chinese corporate data sources. It aggregates official and third-party APIs to pull records like ICP filings, affiliated/holding companies, apps, mini-programs, and WeChat official accounts, then exports merged results for analysis. The tool targets analysts who need one-click collection and normalized output to reduce manual lookups across registries and platforms. Recent releases added a reworked task model with...
    Downloads: 4 This Week
  • 13
    ChatGPT Retrieval Plugin

    The ChatGPT Retrieval Plugin lets you easily find personal documents

    The chatgpt-retrieval-plugin repository implements a semantic retrieval backend that lets ChatGPT (or GPT-powered tools) access private or organizational documents in natural language by combining vector search, embedding models, and plugin infrastructure. It can serve as a custom GPT plugin or function-calling backend so that a chat session can “look up” relevant documents based on user queries, inject those results into context, and respond more knowledgeably about a private knowledge...
    Downloads: 0 This Week
  • 14
    HunyuanOCR

    OCR expert VLM powered by Hunyuan's native multimodal architecture

    HunyuanOCR is an open-source, end-to-end OCR (optical character recognition) Vision-Language Model (VLM) developed by Tencent‑Hunyuan. It’s designed to unify the entire OCR pipeline, detection, recognition, layout parsing, information extraction, translation, and even subtitle or structured output generation, into a single model inference instead of a cascade of separate tools. Despite being fairly lightweight (about 1 billion parameters), it delivers state-of-the-art performance across a...
    Downloads: 0 This Week
  • 15
    LangChain Extract

    Did you say you like data?

    LangChain Extract is an open-source reference application designed to demonstrate how large language models can be used to extract structured data from unstructured text and document files. The project implements a lightweight web service that allows developers to define extraction schemas and apply them to various sources such as plain text, HTML, or PDF documents. Built using FastAPI and the LangChain framework, the application exposes a REST API that can process documents and return structured outputs that match user-defined JSON schemas. ...
    Downloads: 1 This Week
  • 16
    Qwen2.5

    Open source large language model by Alibaba

    ...Trained on a comprehensive dataset of up to 18 trillion tokens, Qwen2.5 models exhibit significant improvements in instruction following, long-text generation (exceeding 8,000 tokens), and structured data comprehension, such as tables and JSON formats. They support context lengths up to 128,000 tokens and offer multilingual capabilities in over 29 languages, including Chinese, English, French, Spanish, and more. The models are open-source under the Apache 2.0 license, with resources and documentation available on platforms like Hugging Face and ModelScope. ...
    Downloads: 33 This Week
  • 17
    PromptSniffer

    View, extract & remove AI generation metadata with a right click

    ...Core Functionality
      • Read EXIF/Metadata: Extract and display comprehensive metadata from images
      • AI Metadata Detection: Automatically identify and highlight AI generation metadata
      • Metadata Removal: Strip AI generation metadata while preserving image quality
      • Batch Processing: Handle multiple files with wildcard patterns
      • Cross-Platform: Works on Windows, macOS, and Linux

    AI Tool Support
      • ComfyUI: Detects and extracts workflow JSON data
      • Stable Diffusion: Identifies prompts, parameters, and generation settings
      • SwarmUI/StableSwarmUI: Handles JSON-formatted metadata
      • Midjourney, DALL-E, NovelAI: Recognizes generation signatures
      • Automatic1111, InvokeAI: Extracts generation parameters
    Downloads: 3 This Week
  • 18
    Whishper

    Transcribe any audio to text, translate and edit subtitles 100% locally

    Open-source, local-first audio transcription and subtitling suite with a simple web UI. Thanks to open-source technologies, Whishper can run 100% offline. Your data never leaves your computer. Whishper allows you to translate your transcriptions to and from more than 60 languages thanks to Argos Translate and LibreTranslate. Download the transcriptions in many formats (json, txt, vtt, srt). Easily edit your subtitles right in the Web-UI.
    Downloads: 10 This Week
  • 19
    KoboldCpp

    Run GGUF models easily with a UI or API. One File. Zero Install.

    KoboldCpp is an easy-to-use AI text-generation software for GGML and GGUF models, inspired by the original KoboldAI. It's a single self-contained distributable that builds off llama.cpp and adds many additional powerful features.
    Downloads: 1,191 This Week
  • 20
    DocWire SDK

    Award-winning modern data processing SDK in C++20

    DocWire SDK, a standout C++20 AI-driven data processing tool, has received an award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, empowering efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to...
    Downloads: 4 This Week
  • 21
    LayoutParser

    A Unified Toolkit for Deep Learning Based Document Image Analysis

    With the help of state-of-the-art deep learning models, Layout Parser enables extracting complicated document structures using only several lines of code. This method is also more robust and generalizable as no sophisticated rules are involved in this process. A complete instruction for installing the main Layout Parser library and auxiliary components. Learn how to load DL Layout models and use them for layout detection. The full list of layout models currently available in Layout Parser....
    Downloads: 4 This Week
  • 22
    igel

    Machine learning tool that allows you to train and test models

    A delightful machine learning tool that allows you to train/fit, test, and use models without writing code. The goal of the project is to provide machine learning for everyone, both technical and non-technical users. I sometimes needed a tool that I could use to quickly create a machine learning prototype. Whether to build a proof of concept, create a quick draft model to prove a point, or use auto ML, I often find myself stuck writing boilerplate code and thinking too much about...
    Downloads: 3 This Week
  • 23
    Aquila DB

    An easy to use Neural Search Engine

    Aquila DB is a neural search engine. In other words, it is a database that indexes latent vectors generated by ML models, along with JSON metadata, to perform k-NN retrieval. It is dead simple to set up, language-agnostic, and a drop-in addition to your machine learning applications. With its current features, Aquila DB is a ready solution for machine learning engineers and data scientists to build neural information retrieval applications out of the box with minimal dependencies.
    Downloads: 0 This Week
  • 24
    ReinventCommunity

    Jupyter Notebook tutorials for REINVENT 3.2

    This repository is a collection of useful Jupyter notebooks, code snippets, and example JSON files illustrating the use of REINVENT 3.2.
    Downloads: 0 This Week
  • 25
    Semantic Segmentation in PyTorch

    Semantic segmentation models, datasets & losses implemented in PyTorch

    Semantic segmentation models, datasets and losses implemented in PyTorch. PyTorch and Torchvision need to be installed before running the scripts, together with PIL and OpenCV for data preprocessing and tqdm for showing the training progress. PyTorch v1.1 is supported (using the newly supported TensorBoard); it can work with earlier versions, but use tensorboardX instead of TensorBoard. Poly learning rate, where the learning rate is scaled down linearly from the starting value down to zero...
    Downloads: 0 This Week