Search Results for "structured text" - Page 2

Showing 332 open source projects for "structured text"

  • 1
    kg-gen

    Knowledge Graph Generation from Any Text

    kg-gen is an open-source framework developed by the STAIR Lab that automatically generates knowledge graphs from unstructured text using large language models. The system is designed to transform plain text sources such as documents, articles, or conversation transcripts into structured graphs composed of entities and relationships. Instead of relying on traditional rule-based extraction techniques, kg-gen uses language models to identify entities and their relationships, producing higher-quality graph structures from raw text.
    Downloads: 2 This Week
  • 2
    Sparrow

    Structured data extraction and instruction calling with ML, LLM

    Sparrow is an open-source platform designed to extract structured information from documents, images, and other unstructured data sources using machine learning and large language models. The system focuses on transforming complex documents such as invoices, receipts, forms, and scanned pages into structured formats like JSON that can be processed by downstream applications. It combines several components, including OCR pipelines, vision-language models, and LLM-based reasoning modules to...
    Downloads: 2 This Week
  • 3
    Beads

    A memory upgrade for your coding agent

    Beads is an open-source project providing a distributed, structured memory system for AI coding agents, replacing ad-hoc text plans with a git-backed graph that represents tasks, dependencies, and progress in a persistent, queryable format. Instead of storing plans as unstructured Markdown or ephemeral notes, Beads organizes agent state, task artifacts, and relationships as nodes and edges in a version-controlled graph so that long-horizon projects don’t lose context or coherence as the agent proceeds. ...
    Downloads: 13 This Week
  • 4
    LangExtract

    A Python library for extracting structured information

    LangExtract is a Python library developed by Google that leverages large language models (LLMs) to extract structured information from unstructured text—such as clinical notes, research papers, or literary works—based on user-defined instructions. It is designed to transform free-form text into reliable, schema-constrained data while maintaining traceability back to the source material. Each extracted entity is precisely grounded in its original context, allowing visual inspection and validation via automatically generated interactive HTML visualizations. ...
    Downloads: 0 This Week
  • 5
    BlogWizard

    Generate blog articles from video or audio

    BlogWizard is a demo/utility project built on top of Groq’s LLM infrastructure that converts video or audio content into well-structured blog posts, enabling creators to repurpose multimedia content into text — useful for SEO, accessibility, or reaching audiences that prefer reading. The tool uses transcription (e.g. via Whisper) to extract text from audio/video, then runs an LLM-based generation pipeline to transform that content into coherent, readable blog-format posts — with sections, formatting, and possibly metadata. ...
    Downloads: 0 This Week
  • 6
    Chandra

    OCR model for complex documents with layout-aware structured outputs

    Chandra is an advanced OCR model designed to extract and structure information from complex documents such as tables, forms, handwritten notes, and mathematical content. It focuses on preserving full document layout, meaning that extracted text is accompanied by positional metadata like bounding boxes for each element. Chandra supports multiple output formats including Markdown, HTML, and JSON, making it suitable for downstream processing and integration into data pipelines. It is capable of...
    Downloads: 2 This Week
  • 7
    Hiring Agent

    AI agent to evaluate and score resumes

    Hiring Agent is an AI-powered resume evaluation pipeline for screening technical candidates. It reads a resume PDF and converts the content into Markdown-like text. It then uses a local or hosted language model to extract structured candidate information into sectioned JSON. The system can enrich that resume data with GitHub profile and repository signals when a profile is available. After the data is collected, it produces an explainable evaluation with category scores, supporting evidence, bonus points, and deductions. ...
    Downloads: 4 This Week
  • 8
    xAI Python SDK

    The official Python SDK for the xAI API

    ...It is a gRPC-based SDK designed for Python 3.10 and above, with both synchronous and asynchronous clients for different application styles. Developers can use it to generate text, images, videos, and structured outputs through xAI’s model services. The package is built for direct integration into Python projects, making it useful for backend apps, automation scripts, AI tools, research prototypes, and production workflows. It uses xAI’s native gRPC interface, which is intended for high-performance communication with the API. ...
    Downloads: 4 This Week
  • 9
    Logrus

    Structured, pluggable logging for Go

    Logrus is a structured, pluggable logger for Go (golang) that is completely API-compatible with the standard library logger. It encourages careful, structured logging through discoverable logging fields rather than long, unparseable error messages, which makes log output more useful. Logrus is currently in maintenance mode, which means that new features will no longer be introduced. This does not mean, however, that it is dead. It continues to be maintained for...
    Downloads: 0 This Week
  • 10
    HeartMuLa

    A Family of Open Sourced Music Foundation Models

    ...The project also includes HeartCodec, a music codec optimized for high reconstruction fidelity, enabling efficient tokenization and reconstruction workflows that are critical for training and generation pipelines. For text extraction from audio, it provides HeartTranscriptor, a Whisper-based model tuned specifically for lyrics transcription, which helps bridge generated or recorded audio back into structured text. It also introduces HeartCLAP, which aligns audio and text into a shared embedding space.
    Downloads: 14 This Week
  • 11
    BrowserAI

    Run local LLMs like Llama, DeepSeek, Kokoro, etc. inside your browser

    ...The platform provides a developer-friendly SDK with pre-configured popular models, and it allows for seamless switching between MLC and Transformer engines. Additionally, it supports features such as speech recognition, text-to-speech, structured output generation, and Web Worker support for non-blocking UI performance.
    Downloads: 0 This Week
  • 12
    Open Semantic Search

    Open source semantic search and text analytics for large document sets

    ...It integrates text mining and analytics capabilities that allow users to examine relationships, topics, and structured data within document collections.
    Downloads: 12 This Week
  • 13
    Audiblez

    Generate audiobooks from e-books

    Audiblez is a tool for generating high-quality .m4b audiobooks directly from .epub e-books using the Kokoro-82M neural text-to-speech model. It focuses on making audiobook creation easy and fast: from a single command, the tool splits an e-book into chapters, synthesizes audio for each section, and then merges the results into a structured audiobook with chapter-based WAV files and a final .m4b container. The Kokoro-82M model it uses is compact (82M parameters) yet natural sounding, trained on under 100 hours of audio, and supports multiple languages, including English (US/UK), Spanish, French, Hindi, Italian, Japanese, Brazilian Portuguese, and Mandarin Chinese. ...
    Downloads: 30 This Week
  • 14
    node-llama-cpp

    Run AI models locally on your machine with node.js bindings for llama

    ...The system automatically detects the available hardware on a machine and selects the most appropriate compute backend, including CPU or GPU acceleration. Developers can use the library to perform tasks such as text generation, conversational chat, embedding generation, and structured output generation. Because it runs models locally, the platform is particularly useful for privacy-sensitive environments or offline AI deployments.
    Downloads: 9 This Week
  • 15
    Tolaria

    Desktop app to manage markdown knowledge bases

    Tolaria is a platform designed to help developers understand, refactor, and improve codebases through structured analysis and transformation workflows. It focuses on breaking down complex systems into manageable components, making it easier to identify technical debt and architectural issues. The project emphasizes clarity, maintainability, and iterative improvement of software systems. It provides tools and patterns for analyzing dependencies, restructuring modules, and improving code...
    Downloads: 6 This Week
  • 16
    Search-Index

    A persistent, network resilient, full text search library

    Search-Index is a lightweight and fast JavaScript-based search engine that enables full-text search indexing and retrieval for web applications.
    Downloads: 0 This Week
  • 17
    NarratoAI

    Using AI models to automatically provide commentary and edit videos

    NarratoAI is an open-source platform designed to automate the generation of narrative content using artificial intelligence. The system combines large language models with media processing capabilities to create scripts, stories, and structured narrative outputs from user inputs. NarratoAI supports workflows where users provide prompts, themes, or source materials, and the software organizes them into coherent narrative structures suitable for articles, scripts, or multimedia storytelling. The project integrates multiple AI components such as text generation models, content structuring pipelines, and automated editing tools to streamline content creation. ...
    Downloads: 3 This Week
  • 18
    OpenMed

    Open source healthcare AI

    OpenMed is an open-source healthcare AI and medical NLP toolkit designed to turn clinical text into structured insights using transformer-based models and production-oriented interfaces. Its core purpose is to provide specialized medical entity extraction, PII detection and de-identification, assertion-aware analysis, and related healthcare text processing capabilities without locking users into a proprietary platform. The project includes a curated registry of more than a dozen medical NER models focused on areas such as diseases, drugs, anatomy, genes, and protected health information, and it is built to support both research and deployment scenarios. ...
    Downloads: 20 This Week
  • 19
    NeMo Retriever Library

    Document content and metadata extraction microservice

    NeMo Retriever Library is a scalable microservice framework designed for extracting, structuring, and enriching content from documents to support downstream generative AI applications. It processes various document types by splitting them into components such as text, tables, charts, and images, and then applies OCR and contextual analysis to convert them into structured data formats. The system is built on NVIDIA NIM microservices, enabling high-performance parallel processing and efficient handling of large datasets. It supports multiple extraction strategies for different document formats, balancing accuracy and throughput depending on the use case. ...
    Downloads: 1 This Week
  • 20
    OpenLess

    AI-polished text appears at your cursor in any app

    OpenLess is an open-source voice input application for macOS and Windows that turns spoken ideas into polished text at the current cursor position. Users press a global hotkey, speak naturally, and release the key to receive cleaned-up text inside apps such as ChatGPT, Claude, Cursor, Notion, email clients, or chat boxes. Unlike basic dictation tools, it is designed to restructure loose speech into more useful writing, especially AI prompts with clearer context and constraints. The app...
    Downloads: 4 This Week
  • 21
    ChatGPT Exporter

    Export and Share your ChatGPT conversation history

    ChatGPT Exporter is a browser-based userscript tool designed to export ChatGPT conversations into multiple structured and shareable formats, enabling users to preserve, analyze, and reuse AI-generated content outside the ChatGPT interface. It integrates directly into the ChatGPT web environment, typically via tools like Tampermonkey, and adds export functionality without requiring backend services or complex setup. The tool supports a wide range of output formats including plain text, HTML, Markdown, JSON, and even image-based exports, making it suitable for documentation, knowledge management, and data analysis workflows. ...
    Downloads: 3 This Week
  • 22
    SemTools

    Semantic search and document parsing tools for the command line

    ...The project focuses on enabling developers and AI agents to process large document collections and extract meaningful semantic representations that can be searched efficiently. Built with Rust for performance and reliability, the toolchain provides fast processing of text and structured documents while maintaining low system overhead. SemTools can parse documents, build semantic embeddings, and perform similarity searches across datasets, making it useful for research, knowledge management, and AI-assisted coding workflows. The toolkit is designed to work well with modern AI pipelines, particularly those involving large language models that require structured knowledge retrieval.
    Downloads: 3 This Week
  • 23
    repo2txt

    Web-based tool converts GitHub repository contents

    repo2txt is an open-source developer tool that converts the contents of a code repository into a single structured text file that can be easily consumed by large language models. The tool is designed to address the challenge of analyzing entire codebases with AI assistants, where code is normally distributed across many files and directories. By collecting repository contents and formatting them into a single text document, repo2txt allows developers to feed complete projects into AI systems for analysis, documentation, or code explanation tasks. ...
    Downloads: 0 This Week
  • 24
    o1-engineer

    o1-engineer is a command-line tool designed to assist developers

    o1-engineer is a command-line development assistant powered by OpenAI’s API. It helps developers interact with projects through commands for code generation, file editing, project planning, and code review. The tool can add, edit, and manage both files and folders directly from the terminal. Its planning command can create structured project plans that can then guide systematic file and directory generation. It also keeps conversation history and allows users to save or reset context as...
    Downloads: 5 This Week
  • 25
    Memory OS

    A 7-layer memory operating system for Hermes Agent

    Memory OS is a local memory operating system for Hermes Agent. It is designed to help an AI agent retain project context, decisions, structured facts, reasoning patterns, and prior conversations across sessions. The system uses seven memory layers that combine flat files, SQLite, full-text search, structured facts, semantic recall, Qdrant vector storage, and a self-curating wiki pipeline. It injects only relevant context back into the agent so memory remains useful without wasting tokens. ...
    Downloads: 1 This Week