Page 2 | semantic free download

Showing 158 open source projects for "semantic"


Filters: Artificial Intelligence, Python

1

BCEmbedding

Netease Youdao's open-source embedding and reranker models

...BCEmbedding also provides integrations for popular RAG frameworks, making it easier to add semantic search and reranking to AI applications.

Downloads: 0 This Week

Last Update: 2026-05-28
See Project
2

pix2pixHD

Synthesizing and manipulating 2048x1024 images with conditional GANs

...It also supports interactive editing, allowing users to modify semantic regions and regenerate images with realistic adjustments.

Downloads: 0 This Week

Last Update: 2026-03-18
See Project
3

DeepSeek-OCR 2

Visual Causal Flow

DeepSeek-OCR-2 is the second-generation optical character recognition system developed to improve document understanding by introducing a “visual causal flow” mechanism, enabling the encoder to reorder visual tokens in a way that better reflects semantic structure rather than strict raster scan order. It is designed to handle complex layouts and noisy documents by giving the model causal reasoning capabilities that mimic human visual scanning behavior, enhancing OCR performance on documents with rich spatial structure. The repository provides model code and inference scripts that let researchers and developers run and benchmark the system on both images and PDFs, with support for batch evaluation and optimized pipelines leveraging vLLM and transformers.

Downloads: 8 This Week

Last Update: 2026-02-03
See Project
4

MemU

MemU is an open-source memory framework for AI companions

MemU is an agentic memory layer for LLM applications, specifically designed for AI companions. Transform your memory into an intelligent file system that automatically organizes, connects, and evolves with your memories. Simple, fast, and reliable memory infrastructure for AI applications. Powerful tools and dedicated support to scale your AI applications with confidence. Full proprietary features, commercial usage rights, and white-labeling options for your enterprise needs. SSO/RBAC...

Downloads: 0 This Week

Last Update: 2026-03-23
See Project
5

ModernBERT

Bringing BERT into modernity via both architecture changes and scaling

...The goal of the project is to bring BERT-style models up to date with the capabilities of modern large language models while preserving the strengths of bidirectional encoder architectures used for tasks such as classification, retrieval, and semantic search. ModernBERT introduces architectural improvements that enhance both training efficiency and inference performance, making the model more suitable for modern large-scale machine learning pipelines. The repository also includes FlexBERT, a modular framework that allows developers to experiment with different encoder building blocks and configurations when constructing new models.

Downloads: 1 This Week

Last Update: 2026-03-06
See Project
6

docext

An on-premises, OCR-free unstructured data extraction

...Unlike traditional document processing pipelines that rely heavily on optical character recognition, docext leverages multimodal AI models capable of understanding both visual and textual information directly from document images. This allows the system to detect and extract structured elements such as tables, signatures, key fields, and layout information while maintaining semantic understanding of the document content. The toolkit can also convert complex documents into structured markdown representations that preserve formatting and contextual relationships.

Downloads: 5 This Week

Last Update: 2026-03-12
See Project
7

SAG

SQL-Driven RAG Engine

...The engine integrates semantic vector similarity with traditional full-text search to improve both recall and precision. Because the knowledge graph is generated dynamically, the system can adapt to new information without requiring manual graph maintenance.
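
The blend of vector similarity and full-text matching described above can be illustrated with a minimal sketch. This is not SAG's implementation: the function names, the `alpha` weight, the term-overlap score standing in for full-text search, and the hand-written two-dimensional vectors are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query, text):
    """Fraction of query terms found in the text (a crude full-text stand-in)."""
    query_terms = query.lower().split()
    text_terms = set(text.lower().split())
    if not query_terms:
        return 0.0
    return sum(1 for term in query_terms if term in text_terms) / len(query_terms)


def hybrid_rank(query, query_vec, docs, alpha=0.5):
    """Rank (text, vector) pairs by a weighted blend of both scores.

    alpha is a hypothetical weight: 1.0 means vectors only, 0.0 means keywords only.
    """
    scored = []
    for text, vec in docs:
        score = alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, text)
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]


# Toy data: the vectors are hand-written, not produced by a real embedding model.
docs = [
    ("cooking recipes", [0.0, 1.0]),
    ("sql query engine", [1.0, 0.0]),
]
ranking = hybrid_rank("sql engine", [1.0, 0.0], docs)
```

A document that matches on both signals rises to the top, while one that matches on only one signal still receives partial credit, which is the usual motivation for combining the two.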

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
8

WFGY 3.0

A tension reasoning engine over 131 S-class problems

WFGY is an experimental open-source reasoning framework designed to improve the reliability and interpretability of large language model outputs through structured reasoning layers. The project introduces a conceptual reasoning engine that analyzes complex problems by identifying semantic compression errors and residual assumptions within a system’s reasoning process. Its architecture treats reasoning failures as measurable signals that can be detected and analyzed rather than simply observed as incorrect answers. Different versions of the framework, including WFGY 1.0, 2.0, and 3.0, represent stages of development where early conceptual ideas evolved into more structured reasoning engines and diagnostic tools. ...

Downloads: 0 This Week

Last Update: 2026-05-11
See Project
9

Paperless-AI

AI-powered document analysis and tagging for Paperless-ngx

...It integrates with multiple OpenAI-compatible services as well as local models, giving users flexibility in how document intelligence is handled. A key capability is its use of retrieval-augmented generation, which enables semantic search and natural language interaction across an entire document archive. Users can ask contextual questions about their files and receive precise answers based on full document understanding rather than simple keyword matching. Paperless-AI also includes a web interface for manual review and tagging, allowing greater control when handling sensitive or complex documents.

Downloads: 7 This Week

Last Update: 2026-03-17
See Project
10

MemPalace

The highest-scoring AI memory system ever benchmarked

...Instead of relying on summarization or selective extraction like most memory tools, it takes a radically different approach by storing conversations in their entirety and making them retrievable through structured organization and semantic search. The system is inspired by the classical “memory palace” mnemonic technique, organizing information into hierarchical spaces such as wings, rooms, and halls, which allows AI agents to navigate past knowledge in a more contextual and intuitive way. It operates fully locally using tools like ChromaDB, meaning it requires no API keys, cloud services, or external dependencies once installed. ...

Downloads: 6 This Week

Last Update: 2 days ago
See Project
11

WeKnora

LLM framework for document understanding and semantic retrieval

WeKnora is an open source framework developed for deep document understanding and semantic information retrieval using large language models. It focuses on analyzing complex and heterogeneous documents by combining multiple processing stages such as multimodal document parsing, vector indexing, and intelligent retrieval. It follows the Retrieval-Augmented Generation (RAG) paradigm, where relevant document segments are retrieved and used by language models to generate accurate, context-aware responses. ...

Downloads: 6 This Week

Last Update: 2026-06-10
See Project
12

memsearch

A Markdown-first memory system, a standalone library for any AI agent

memsearch is a markdown-first memory system designed to provide long-term memory capabilities for AI agents through structured storage and semantic retrieval. It enables agents to store, organize, and retrieve information using embeddings and hybrid search techniques, ensuring that relevant context is always available. The system supports advanced features such as reranking and progressive disclosure, which help prioritize the most useful information for a given query. It integrates with vector databases like Milvus, enabling scalable storage and retrieval of large datasets. ...

Downloads: 3 This Week

Last Update: 1 day ago
See Project
13

kg-gen

Knowledge Graph Generation from Any Text

...The framework addresses common problems in automatic knowledge graph construction, particularly sparsity and duplication of entities, by applying a clustering and entity-resolution process that merges semantically similar nodes. This allows the generated graphs to be denser, more coherent, and easier to use for downstream tasks such as retrieval-augmented generation, semantic search, and reasoning systems.
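
As a rough illustration of merging near-duplicate nodes, the sketch below groups entity names greedily and rewrites edges to use one canonical name per group. It is not kg-gen's algorithm or API: it swaps in plain string similarity from Python's `difflib` where the project description refers to semantic similarity, and the function names and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.8):
    """String-similarity stand-in for a semantic similarity check."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def merge_entities(names, threshold=0.8):
    """Greedily cluster names; the first name seen in a cluster is canonical."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if similar(name, cluster[0], threshold):
                cluster.append(name)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([name])
    return {name: cluster[0] for cluster in clusters for name in cluster}


def rewrite_edges(edges, mapping):
    """Map both endpoints to canonical names and drop duplicate edges."""
    return sorted({(mapping[s], rel, mapping[o]) for s, rel, o in edges})


names = ["New York City", "new york city", "Paris", "USA"]
mapping = merge_entities(names)
edges = [
    ("New York City", "located_in", "USA"),
    ("new york city", "located_in", "USA"),
]
merged_edges = rewrite_edges(edges, mapping)
```

After merging, the two spellings of the same city share one node, so the duplicate edge collapses into a single edge and the graph becomes denser around the canonical entity.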

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
14

OpenRecall

OpenRecall is a fully open-source, privacy-first alternative

OpenRecall is an open-source, privacy-first system designed to capture, index, and make searchable a user’s entire digital activity history, effectively acting as a personal memory layer for computing environments. It works by taking periodic screenshots of a user’s screen and applying local AI processing, including OCR and semantic analysis, to extract and structure information from both text and images. This data is then indexed into a searchable database, allowing users to retrieve past information quickly using natural language queries. Unlike proprietary alternatives, OpenRecall operates entirely locally, ensuring that all captured data remains on the user’s device and is never transmitted to external servers. ...

Downloads: 1 This Week

Last Update: 2026-03-18
See Project
15

AutoResearchClaw

Autonomous research from idea to paper. Chat an Idea. Get a Paper 🦞

...Built in Python, it orchestrates a multi-stage research pipeline that gathers literature, formulates hypotheses, runs experiments, analyzes results, and writes the final paper. The system retrieves real academic references from sources such as arXiv and Semantic Scholar to ensure credible citations. It can automatically generate code for experiments, run them in a sandbox environment, and analyze the results with statistical methods. The platform also uses multi-agent debate and automated peer review processes to refine research findings and improve paper quality. By combining literature discovery, experimentation, and writing automation, AutoResearchClaw aims to turn research ideas into conference-ready papers with minimal human intervention.

Downloads: 6 This Week

Last Update: 2026-05-20
See Project
16

DeepWiki Open

AI-Powered Wiki Generator for GitHub/GitLab/Bitbucket Repositories

DeepWiki Open is an open-source, AI-powered wiki generator that automatically creates fully navigable, richly structured wiki documentation for GitHub, GitLab, or Bitbucket repositories by combining code analysis, vector embeddings, retrieval-augmented generation (RAG), and visualization tools. Users can enter a repository URL and the system will clone the project, build semantic embeddings of its codebase, extract architecture and relationships, generate human-readable documentation, and produce visual diagrams to help explain complex code structure. DeepWiki’s output turns raw repositories into interactive, web-style wikis complete with navigable sections, diagrams, and contextual explanations, making it easier for developers and collaborators to understand unfamiliar code. ...

Downloads: 2 This Week

Last Update: 2026-06-03
See Project
17

VibeVoice

Open-source multi-speaker long-form text-to-speech model

...Unlike traditional TTS systems, it excels in scalability, speaker consistency, and natural turn-taking for up to 90 minutes of continuous speech with as many as four distinct speakers. A key innovation is its use of continuous acoustic and semantic speech tokenizers operating at an ultra-low frame rate of 7.5 Hz, enabling high audio fidelity with efficient processing of long sequences. The model integrates a Qwen2.5-based large language model with a diffusion head to produce realistic acoustic details and capture conversational context. Training involved curriculum learning with increasing sequence lengths up to 65K tokens, allowing VibeVoice to handle very long dialogues effectively. ...
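
To put the 7.5 Hz frame rate in perspective, the short calculation below counts how many tokenizer frames 90 minutes of audio would produce. Only the two figures from the description are used; the function name is hypothetical, and how frames map onto the model's token budget is not stated in the listing.

```python
FRAME_RATE_HZ = 7.5  # frames per second, from the description above
MAX_MINUTES = 90     # longest continuous speech mentioned above


def frames_for(minutes, frame_rate_hz=FRAME_RATE_HZ):
    """Number of tokenizer frames for a given duration in minutes."""
    return round(minutes * 60 * frame_rate_hz)


total_frames = frames_for(MAX_MINUTES)
print(total_frames)  # 40500
```

At this rate a 90-minute session stays in the tens of thousands of frames, which is the same order of magnitude as the 65K-token sequence length cited for training.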

Downloads: 8 This Week

Last Update: 2026-05-06
See Project
18

Controllable-RAG-Agent

This repository provides an advanced RAG

Controllable-RAG-Agent is an advanced Retrieval-Augmented Generation (RAG) system designed specifically for complex, multi-step question answering over your own documents. Instead of relying solely on simple semantic search, it builds a deterministic control graph that acts as the “brain” of the agent, orchestrating planning, retrieval, reasoning, and verification across many steps. The pipeline ingests PDFs, splits them into chapters, cleans and preprocesses text, then constructs vector stores for fine-grained chunks, chapter summaries, and book quotes to support nuanced queries. ...

Downloads: 0 This Week

Last Update: 2026-06-04
See Project
19

UForm

Multi-Modal Neural Networks for Semantic Search, based on Mid-Fusion

UForm is a Multi-Modal Inference package designed to encode Multi-Lingual Texts, Images, and, soon, Audio, Video, and Documents into a shared vector space. It comes with a set of homonymous pre-trained networks available on the Hugging Face portal and extends the transformers package to support Mid-fusion Models. Late-fusion models encode each modality independently, but into one shared vector space. Due to independent encoding, late-fusion models are good at capturing coarse-grained...

Downloads: 0 This Week

Last Update: 2025-10-30
See Project
20

OpenMemory

Local long-term memory engine for AI apps with persistent storage

...It enables developers to give otherwise stateless models a structured memory layer that can store, retrieve, and manage contextual information over time. OpenMemory is built around a hierarchical memory architecture that organizes data into semantic sectors and connects them through a graph-based structure for efficient retrieval. It supports multiple embedding strategies, including synthetic and semantic embeddings, allowing developers to balance speed and accuracy depending on their use case. OpenMemory integrates with various AI tools and environments, offering SDKs and APIs that simplify adding memory capabilities to applications. ...

Downloads: 2 This Week

Last Update: 2026-03-18
See Project
21

uqlm

UQLM (Uncertainty Quantification for Language Models) is a Python package

UQLM is a Python library developed to detect hallucinations and quantify uncertainty in the outputs of large language models. The system implements a variety of uncertainty quantification techniques that assign confidence scores to model responses. These scores help developers determine how likely a generated answer is to contain errors or fabricated information. The library includes both black-box and white-box approaches to uncertainty estimation. Black-box methods evaluate model outputs...

Downloads: 2 This Week

Last Update: 2026-06-08
See Project
22

Basic Memory

Persistent AI memory using local Markdown knowledge graphs

...Instead of losing context after each chat, it stores information as simple Markdown files on your device, allowing both you and AI to read and write to the same knowledge base. It uses the Model Context Protocol (MCP) so compatible AI tools can access, update, and build on your notes across sessions. Basic Memory creates a semantic knowledge graph by linking related ideas, making it easier to retrieve, expand, and connect information over time. With a local-first design, your data stays private and portable, while optional cloud sync enables cross-device access. It combines simplicity with powerful indexing and search, giving you a flexible way to build long-term memory for projects, research, and workflows.

Downloads: 3 This Week

Last Update: 4 days ago
See Project
23

MinerU

A high-quality tool for converting PDF to Markdown and JSON

MinerU is an open-source, high-quality document extraction toolkit focused on converting PDFs (and other document formats) into structured Markdown and JSON. It leverages OCR and layout analysis to preserve semantic structure and metadata, ideal for research and data science workflows.

Downloads: 14 This Week

Last Update: 6 days ago
See Project
24

GEO Content Writer

Backlog-row-first content production system for teams

GEO Content Writer is an AI-based content generation tool designed to create optimized content tailored for geographic and semantic search contexts. It focuses on producing articles, pages, and structured content that align with both traditional SEO requirements and emerging AI search patterns. The system leverages language models to generate content that is context-aware, location-specific, and optimized for discoverability. It supports automated workflows for generating large volumes of content while maintaining consistency and relevance. ...

Downloads: 0 This Week

Last Update: 2026-04-21
See Project
25

The Hypersim Dataset

Photorealistic Synthetic Dataset for Holistic Indoor Scene

Hypersim is a large-scale, photorealistic synthetic dataset and tooling suite for indoor scene understanding research. It provides richly annotated renderings—RGB, depth, surface normals, instance and semantic segmentations, and material/lighting metadata—produced from high-fidelity virtual environments. The dataset spans diverse furniture layouts, room types, and camera trajectories, enabling robust training for geometry, segmentation, and SLAM-adjacent tasks. Rendering pipelines and utilities allow researchers to reproduce sequences, generate novel views, or extract task-specific supervision. ...

Downloads: 0 This Week

Last Update: 2026-01-09
See Project