Showing 38 open source projects for "structured text"

View related business solutions
  • Compliant and Reliable File Transfers Backed by Top Security Certifications Icon
    Compliant and Reliable File Transfers Backed by Top Security Certifications

    Cerberus FTP Server delivers SOC 2 Type II certified security and FIPS 140-2 validated encryption.

    Stop relying on non-certified, legacy file transfer tools that creak under the weight of modern security demands. Get full audit trails, advanced access controls and more supported by an award-winning team of experts. Start your free 25-day trial today.
    Start Free Trial
  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • 1
    text-extract-api

    text-extract-api

    Document (PDF, Word, PPTX ...) extraction and parse API

    text-extract-api is an open-source service designed to extract readable text from a wide variety of document formats through a simple API interface. The project focuses on converting complex files such as PDFs, images, scanned documents, and office files into structured plain text that can be processed by downstream applications or language models.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    MarkPDFDown

    MarkPDFDown

    A high-quality PDF to Markdown tool based on large language model

    ...By producing Markdown rather than raw text, the tool makes it easier to integrate documents into knowledge bases, documentation systems, or language model pipelines that rely on structured input. The software is particularly useful for developers working with technical documents, academic papers, or reports that need to be indexed, summarized, or processed by downstream AI systems.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    kg-gen

    kg-gen

    Knowledge Graph Generation from Any Text

    kg-gen is an open-source framework developed by the STAIR Lab that automatically generates knowledge graphs from unstructured text using large language models. The system is designed to transform plain text sources such as documents, articles, or conversation transcripts into structured graphs composed of entities and relationships. Instead of relying on traditional rule-based extraction techniques, KG-Gen uses language models to identify entities and their relationships, producing higher-quality graph structures from raw text.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4
    Sparrow

    Sparrow

    Structured data extraction and instruction calling with ML, LLM

    Sparrow is an open-source platform designed to extract structured information from documents, images, and other unstructured data sources using machine learning and large language models. The system focuses on transforming complex documents such as invoices, receipts, forms, and scanned pages into structured formats like JSON that can be processed by downstream applications. It combines several components, including OCR pipelines, vision-language models, and LLM-based reasoning modules to...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Enterprise-grade ITSM, for every business Icon
    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

    Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.
    Try it Free
  • 5
    node-llama-cpp

    node-llama-cpp

    Run AI models locally on your machine with node.js bindings for llama

    ...The system automatically detects the available hardware on a machine and selects the most appropriate compute backend, including CPU or GPU acceleration. Developers can use the library to perform tasks such as text generation, conversational chat, embedding generation, and structured output generation. Because it runs models locally, the platform is particularly useful for privacy-sensitive environments or offline AI deployments.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 6
    NarratoAI

    NarratoAI

    Using AI models to automatically provide commentary and edit videos

    NarratoAI is an open-source platform designed to automate the generation of narrative content using artificial intelligence. The system combines large language models with media processing capabilities to create scripts, stories, and structured narrative outputs from user inputs. NarratoAI supports workflows where users provide prompts, themes, or source materials, and the software organizes them into coherent narrative structures suitable for articles, scripts, or multimedia storytelling. The project integrates multiple AI components such as text generation models, content structuring pipelines, and automated editing tools to streamline content creation. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 7
    BAML

    BAML

    The AI framework that adds the engineering to prompt engineering

    BAML is an open-source framework and domain-specific language designed to bring structured engineering practices to prompt development for large language model applications. Instead of treating prompts as unstructured text, BAML introduces a schema-driven approach where prompts are defined as typed functions with explicit inputs and outputs. This design allows developers to treat language model interactions as predictable software components rather than ad-hoc prompt strings. ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 8
    repo2txt

    repo2txt

    Web-based tool converts GitHub repository contents

    repo2txt is an open-source developer tool that converts the contents of a code repository into a single structured text file that can be easily consumed by large language models. The tool is designed to address the challenge of analyzing entire codebases with AI assistants, where code is normally distributed across many files and directories. By collecting repository contents and formatting them into a single text document, repo2txt allows developers to feed complete projects into AI systems for analysis, documentation, or code explanation tasks. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Engram

    Engram

    A New Axis of Sparsity for Large Language Models

    Engram is a high-performance embedding and similarity search library focused on making retrieval-augmented workflows efficient, scalable, and easy to adopt by developers building search, recommendation, or semantic matching systems. It provides utilities to generate embeddings from text or other structured data, index them using efficient approximate nearest neighbor algorithms, and perform real-time similarity queries even on large corpora. Engineered with speed and memory efficiency in mind, Engram supports batched indexing, incremental updates, and custom distance metrics so developers can tailor search behaviors to their domain’s needs. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Ship Agents Faster Icon
    Ship Agents Faster

    Transform your applications and workflows into powerful agentic systems at global scale.

    Gemini Enterprise Agent Platform lets you rapidly build, scale, govern and optimize production-ready agents grounded in your organization's data. The platform enables developers to build custom or pre-built agents for virtually any use case. New customers get $300 in free credits.
    Get Started Free
  • 10
    POML

    POML

    Prompt Orchestration Markup Language

    POML, or Prompt Orchestration Markup Language, is a structured markup language created to improve the organization and maintainability of prompts used in large language model applications. Traditional prompt engineering often relies on unstructured text, which can become difficult to manage as prompts grow more complex and incorporate dynamic data sources. POML addresses this issue by introducing an HTML-like syntax that allows developers to organize prompts into structured components such as roles, tasks, and examples. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Kor

    Kor

    LLM

    This is a half-baked prototype that “helps” you extract structured data from text using LLMs. Specify the schema of what should be extracted and provide some examples. Kor will generate a prompt, send it to the specified LLM and parse out the output. You might even get results back.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Hollama

    Hollama

    A minimal LLM chat app that runs entirely in your browser

    ...Because the application runs as a static web interface, it does not require complex backend infrastructure and can be easily deployed or self-hosted. Hollama supports both text-based and multimodal interactions, allowing users to work with models that process images as well as text. The interface includes features for editing prompts, retrying responses, copying generated code snippets, and storing conversation history locally within the browser. Mathematical expressions can be rendered using KaTeX, and Markdown formatting allows code blocks and structured outputs to appear clearly within conversations.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 13
    GraphRAG

    GraphRAG

    A modular graph-based Retrieval-Augmented Generation (RAG) system

    The GraphRAG project is a data pipeline and transformation suite that is designed to extract meaningful, structured data from unstructured text using the power of LLMs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Mirascope

    Mirascope

    LLM abstractions that aren't obstructions

    Mirascope is a powerful, flexible, and user-friendly library that simplifies the process of working with LLMs through a unified interface that works across various supported providers, including OpenAI, Anthropic, Mistral, Gemini, Groq, Cohere, LiteLLM, Azure AI, Vertex AI, and Bedrock. Whether you're generating text, extracting structured information, or developing complex AI-driven agent systems, Mirascope provides the tools you need to streamline your development process and create powerful, robust applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    files-to-prompt

    files-to-prompt

    Concatenate a directory full of files into a single prompt

    ...It includes rich filtering controls, letting you limit by extension, include or skip hidden files, and ignore paths that match glob patterns or .gitignore rules. The output format is flexible: you can emit plain text, Markdown with fenced code blocks, or a Claude-XML style format designed for structured multi-file prompts. It can read file paths from stdin (including NUL-separated paths), which makes it easy to combine with find, rg, or other shell tools.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    DocETL

    DocETL

    A system for agentic LLM-powered data processing and ETL

    DocETL is an open-source system designed to build and execute data processing pipelines powered by large language models, particularly for analyzing complex collections of documents and unstructured datasets. The platform allows developers and researchers to construct structured workflows that extract, transform, and organize information from sources such as reports, transcripts, legal documents, and other text-heavy data. Instead of relying on single prompts or ad-hoc scripts, DocETL provides a declarative pipeline framework that breaks complex document analysis tasks into manageable operations that can be optimized and orchestrated automatically. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    LlamaIndex

    LlamaIndex

    Central interface to connect your LLM's with external data

    LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLM's with external data. LlamaIndex is a simple, flexible interface between your external data and LLMs. It provides the following tools in an easy-to-use fashion. Provides indices over your unstructured and structured data for use with LLM's. These indices help to abstract away common boilerplate and pain points for in-context learning. Dealing with prompt limitations (e.g. 4096 tokens for Davinci) when...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    RAPTOR

    RAPTOR

    The official implementation of RAPTOR

    RAPTOR is a retrieval architecture designed to improve retrieval-augmented generation systems by organizing documents into hierarchical structures that enable more effective context retrieval. Traditional RAG systems typically retrieve small text chunks independently, which can limit a model’s ability to understand broader document context. RAPTOR addresses this limitation by recursively embedding, clustering, and summarizing documents to create a tree-structured hierarchy of information. Each level of the tree represents summaries at different levels of abstraction, allowing retrieval to operate at both detailed and high-level conceptual layers. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    LandPPT

    LandPPT

    An LLM-based presentation generation platform

    ...The system allows users to create complete PowerPoint presentations simply by entering a topic or uploading source documents such as PDFs, Word files, or Markdown notes. Using natural language processing and structured content generation, the platform produces presentation outlines and converts them into fully formatted slide decks. The application integrates multiple AI models from providers such as OpenAI, Anthropic, Google, and locally hosted models to generate text, images, and structured presentation layouts. It also includes template systems and style options that allow presentations to be customized for different industries, visual themes, or storytelling formats.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Qwen-2.5-VL

    Qwen-2.5-VL

    Qwen2.5-VL is the multimodal large language model series

    ...The models are available in various sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B parameters, catering to diverse computational requirements. Trained on a comprehensive dataset of up to 18 trillion tokens, Qwen2.5 models exhibit significant improvements in instruction following, long-text generation (exceeding 8,000 tokens), and structured data comprehension, such as tables and JSON formats. They support context lengths up to 128,000 tokens and offer multilingual capabilities in over 29 languages, including Chinese, English, French, Spanish, and more. The models are open-source under the Apache 2.0 license, with resources and documentation available on platforms like Hugging Face and ModelScope.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 21
    StarVector

    StarVector

    StarVector is a foundation model for SVG generation

    ...Its architecture combines computer vision techniques with language modeling capabilities so it can understand visual inputs and textual prompts simultaneously. The model converts raster images or text instructions into structured vector representations, enabling high-quality vectorization and design generation. This approach allows StarVector to create scalable graphics that maintain visual quality regardless of resolution, which is especially useful for design tools and illustration workflows. Because the model produces SVG code rather than pixel images, the output can be edited programmatically or integrated directly into web and design environments.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    LOTUS

    LOTUS

    AI-Powered Data Processing: Use LOTUS to process all of your datasets

    LOTUS is an open-source framework and query engine designed to enable efficient processing of structured and unstructured datasets using large language models. The system provides a declarative programming model that allows developers to express complex AI data operations using high-level commands rather than manually orchestrating model calls. It offers a Python interface with a Pandas-like API, making it familiar for data scientists and engineers already working with data analysis libraries. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    MyScaleDB

    MyScaleDB

    A @ClickHouse fork that supports high-performance vector search

    ...The system is built on top of the ClickHouse database engine and extends it with specialized indexing and search capabilities optimized for vector embeddings. This design allows developers to store structured data, unstructured text, and high-dimensional vector embeddings within a single database platform. MyScaleDB enables developers to perform vector similarity searches using standard SQL syntax, eliminating the need to learn specialized vector database query languages. The database is optimized for high performance and scalability, allowing it to handle extremely large datasets and high query loads typical of production AI applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Aix-DB

    Aix-DB

    Based on the LangChain/LangGraph framework

    Aix-DB is an open-source intelligent data analysis platform that combines large language models with database technologies to enable conversational data exploration. The system is designed as a ChatBI solution that allows users to query datasets using natural language and receive structured insights, charts, and visualizations automatically. Built on frameworks such as LangChain and LangGraph, Aix-DB integrates retrieval-augmented generation and Text-to-SQL capabilities to convert user questions into executable database queries. The platform supports multiple types of data sources and provides an end-to-end pipeline that includes intent recognition, SQL generation, database execution, and visual presentation of results. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    PaperBanana

    PaperBanana

    Extension of Google Research’s PaperBanana

    PaperBanana is an open-source agentic framework designed to automatically generate publication-quality academic diagrams and statistical plots directly from text descriptions. The project focuses on helping researchers, educators, and data scientists transform conceptual descriptions of figures into structured visual outputs suitable for research papers, presentations, and technical reports. Instead of manually designing charts or diagrams using traditional visualization tools, users can describe the desired figure in natural language and allow the system to generate the visual representation automatically. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
Auth0 Logo