Showing 17 open source projects for "text processing"

  • 1
    Text Embeddings Inference

    High-performance inference server for text embedding models

    Text Embeddings Inference is a high-performance server designed to serve text embedding models efficiently in production environments. It focuses on delivering fast and scalable embedding generation by leveraging optimized inference techniques and modern hardware acceleration. It is built to support transformer-based embedding models, making it suitable for tasks such as semantic search, clustering, and retrieval-augmented systems. It provides an API interface that allows developers to...
    Downloads: 2 This Week
  • 2
    Lingua-RS

    The most accurate natural language detection library for Rust

    Lingua-RS is a language detection library implemented in Rust that accurately identifies which language a given text is written in. This is useful as a preprocessing step for linguistic data in natural language processing applications such as text classification and spell checking. Another use case is routing e-mails to the right geographically located customer service department based on the e-mails' languages.
    Downloads: 0 This Week
  • 3
    PostgresML

    The GPU-powered AI application database

    PostgresML is a complete platform packaged as a PostgreSQL extension. Build simpler, faster, and more scalable models right inside your database. Explore the SDK and test open source models in our hosted database. Combine and automate the entire workflow, from embedding generation to indexing and querying, for the simplest (and fastest) knowledge-based chatbot implementation. Leverage multiple types of natural language processing and machine learning models such as vector search and personalization...
    Downloads: 0 This Week
  • 4
    Databend

    Cloud-native open source data warehouse for analytics and AI queries

    Databend is an open source cloud-native data warehouse designed for large-scale analytics and modern data workloads. Built in Rust, the system focuses on high performance, scalability, and efficient data processing for analytical queries. It is designed with a separation of compute and storage, allowing compute nodes to scale independently while storing data in object storage systems. This architecture enables cost-efficient storage and elastic scaling for workloads that involve large datasets and complex queries. Databend provides a unified engine capable of handling analytics, vector search, and full-text search within a single platform. ...
    Downloads: 0 This Week
  • 5
    SemTools

    Semantic search and document parsing tools for the command line

    ...The project focuses on enabling developers and AI agents to process large document collections and extract meaningful semantic representations that can be searched efficiently. Built with Rust for performance and reliability, the toolchain provides fast processing of text and structured documents while maintaining low system overhead. SemTools can parse documents, build semantic embeddings, and perform similarity searches across datasets, making it useful for research, knowledge management, and AI-assisted coding workflows. The toolkit is designed to work well with modern AI pipelines, particularly those involving large language models that require structured knowledge retrieval.
    Downloads: 1 This Week
  • 6
    Ollama-rs

    A simple and easy-to-use library for interacting with the Ollama API

    Ollama-rs is a Rust library designed to provide a simple and efficient interface for interacting with the Ollama API, enabling developers to integrate local large language models into Rust applications. It follows the official Ollama API closely, ensuring compatibility while offering an idiomatic Rust experience with strong typing and asynchronous execution. The library supports a wide range of operations, including text generation, chat interactions, embeddings, and model management, making...
    Downloads: 1 This Week
  • 7
    Scanopy

    Clean network diagrams. One-time setup, zero upkeep.

    Scanopy is a powerful multi-modal data capture and analysis toolkit that enables users to collect, process, and visualize structured and unstructured information from a variety of sources in a flexible pipeline. It is built to handle complex scanning tasks — such as OCR, document analysis, audio transcription, network data capture, and image extraction — while providing unified APIs and workflows that make managing heterogeneous data sources seamless. Developers can compose custom pipelines...
    Downloads: 10 This Week
  • 8
    Tokenizers

    Fast State-of-the-Art Tokenizers optimized for Research and Production

    ...Train new vocabularies and tokenize using today’s most used tokenizers. Extremely fast (both training and tokenization), thanks to the Rust implementation: it takes less than 20 seconds to tokenize a GB of text on a server’s CPU. Easy to use, but also extremely versatile. Designed for both research and production. Full alignment tracking: even with destructive normalization, it’s always possible to get the part of the original sentence that corresponds to any token. Does all the pre-processing: truncation, padding, and adding the special tokens your model needs.
    Downloads: 0 This Week
  • 9
    Biome

    A toolchain for web projects, aimed at providing functionality to maintain them

    Biome formats and lints your code in a fraction of a second. Biome supports JavaScript, TypeScript, JSON, and CSS, and aims to support all major languages of modern web development. Biome has sane defaults and requires minimal configuration. It helps you as much as possible by displaying detailed, contextualized diagnostics. Biome unifies functionality that previously required separate tools. Building upon a shared base allows us to provide a cohesive experience for processing code,...
    Downloads: 0 This Week
  • 10
    rust-bert

    Rust native ready-to-use NLP pipelines and transformer-based models

    rust-bert is a Rust-based implementation of transformer-based natural language processing models that provides ready-to-use pipelines for tasks such as text classification, summarization, and question answering. The project ports many capabilities of the Hugging Face Transformers ecosystem into the Rust programming language. It allows developers to run state-of-the-art NLP models like BERT, GPT-2, and DistilBERT directly within Rust applications while maintaining high performance and memory efficiency. ...
    Downloads: 0 This Week
  • 11
    ReductStore

    The fastest time series object store for Edge AI

    Storage and management of historical data (images, vibration data, text, labels, and more) in one place with the highest performance. Merge blob and time series functionality, reducing the need for multiple databases. Customize real-time data retention policies and replication strategies. Store billions of time-stamped blobs with AI labels and access them with low latency. Outperform other databases with a customized solution for time-series object data. Capture and access blob data as time...
    Downloads: 0 This Week
  • 12
    Downloads: 0 This Week
  • 13
    Mindwtr

    A complete Getting Things Done (GTD) productivity system for desktop a

    Mindwtr: The Privacy-First GTD System. Mindwtr is a Getting Things Done (GTD) productivity tool designed for "Mind Like Water." It runs completely offline: no accounts, no tracking, and no subscriptions. The core GTD workflow covers the following steps. Capture: instantly offload thoughts to your Inbox. Clarify: process tasks rapidly with the built-in "2-Minute Rule" timer. Organize: sort tasks by Contexts (@work, @home), Areas, and Projects. Reflect: keep your system trustworthy with a guided Weekly...
    Downloads: 9 This Week
  • 14
    Rome formatter

    Unified developer tools for JavaScript, TypeScript, and the web

    ...Rome is designed to replace Babel, ESLint, webpack, Prettier, Jest, and others. Rome unifies functionality that previously required separate tools. Building upon a shared base allows us to provide a cohesive experience for processing code, displaying errors, parallelizing work, caching, and configuration. Rome has strong conventions and aims to have minimal configuration. Rome is written in Rust. Rome has first-class IDE support, with a sophisticated parser that represents the source text in full fidelity and top-notch error recovery. ...
    Downloads: 1 This Week
  • 15
    Whatlang-RS

    Natural language detection library for Rust

    Whatlang-RS is a Rust-based language detection library optimized for speed and accuracy, supporting a wide range of languages with probabilistic models.
    Downloads: 0 This Week
  • 16
    hora

    Efficient approximate nearest neighbor search algorithm collections

    hora is an open-source high-performance vector similarity search library designed for large-scale machine learning and information retrieval systems. The project focuses on approximate nearest neighbor search, a fundamental technique used in modern AI applications such as recommendation systems, image search, and semantic search engines. Hora implements multiple efficient indexing algorithms that allow systems to rapidly search through high-dimensional vectors produced by machine learning...
    Downloads: 0 This Week
  • 17
    Xi Editor

    A modern editor with a backend written in Rust

    xi Editor (often styled “Xi”) is a project to build a modern, high-performance text editor designed for large files, extensibility, and native UI integration. Its architecture splits a thin UI layer from a high-performance core engine (written in Rust) that handles buffer editing, syntax highlighting, undo/redo, searching, and background processing asynchronously. This separation lets the core focus on speed and concurrency, while multiple frontends (macOS, GTK, web) handle rendering, input, and platform bridging. ...
    Downloads: 0 This Week
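Text Embeddings Inference and SemTools both turn text into embedding vectors for semantic search. As a minimal sketch of how such vectors are commonly compared (not code from either project), the Rust functions below rank documents by cosine similarity to a query vector. They assume the embeddings are already available as plain `f32` slices of equal dimension; `cosine_similarity` and `rank_by_similarity` are hypothetical helper names.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Score every document against the query and return (index, score) pairs,
/// most similar first.
fn rank_by_similarity(query: &[f32], docs: &[Vec<f32>]) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, cosine_similarity(query, d)))
        .collect();
    // total_cmp gives a total order on f32, so sorting never panics.
    scored.sort_by(|x, y| y.1.total_cmp(&x.1));
    scored
}
```

For example, with a query of `[1.0, 0.1]` and documents `[1.0, 0.0]`, `[0.0, 1.0]`, and `[0.7, 0.7]`, `rank_by_similarity` orders them as indices 0, 2, 1.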
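Lingua-RS and Whatlang-RS are both language detectors built on statistical models of character sequences. The toy sketch below illustrates the general idea with character trigram profiles; it does not use or reproduce either library's API or models. Each language profile is built from caller-supplied sample text (an assumption of this sketch; real detectors are trained on large corpora), and `detect` picks the profile that shares the most trigrams with the input. `trigram_counts`, `overlap_score`, and `detect` are hypothetical names.

```rust
use std::collections::HashMap;

/// Count character trigrams in lowercased text, padded with a space at each
/// end so that word boundaries also produce trigrams.
fn trigram_counts(text: &str) -> HashMap<String, usize> {
    let padded: Vec<char> = format!(" {} ", text.to_lowercase()).chars().collect();
    let mut counts = HashMap::new();
    for window in padded.windows(3) {
        let gram: String = window.iter().collect();
        *counts.entry(gram).or_insert(0) += 1;
    }
    counts
}

/// Score: how many trigram occurrences in the input also appear in the profile.
fn overlap_score(input: &HashMap<String, usize>, profile: &HashMap<String, usize>) -> usize {
    input
        .iter()
        .filter(|(gram, _)| profile.contains_key(*gram))
        .map(|(_, count)| *count)
        .sum()
}

/// Return the name of the best-matching profile, or None if nothing overlaps.
fn detect<'a>(text: &str, profiles: &'a [(&'a str, HashMap<String, usize>)]) -> Option<&'a str> {
    let input = trigram_counts(text);
    let (name, profile) = profiles
        .iter()
        .max_by_key(|(_, profile)| overlap_score(&input, profile))?;
    if overlap_score(&input, profile) == 0 {
        None
    } else {
        Some(*name)
    }
}
```

The score here is deliberately the simplest possible one (shared trigram occurrences); production detectors use more careful probabilistic scoring.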
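The Tokenizers entry mentions full alignment tracking: even after destructive normalization, each token can be mapped back to its span in the original sentence. The sketch below shows the idea in miniature and is not the Tokenizers API: it splits on whitespace, lowercases each piece as a stand-in for normalization, and keeps the byte offsets of the original slice. `Token` and `tokenize_with_offsets` are hypothetical names.

```rust
/// A normalized token plus the byte range it came from in the original input.
#[derive(Debug)]
struct Token {
    text: String,
    offsets: (usize, usize), // [start, end) byte offsets into the original input
}

/// Split `input` on whitespace, lowercase each piece, and record its offsets.
fn tokenize_with_offsets(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut start: Option<usize> = None;
    for (i, ch) in input.char_indices() {
        if ch.is_whitespace() {
            // A whitespace character ends the current token, if any.
            if let Some(s) = start.take() {
                tokens.push(Token { text: input[s..i].to_lowercase(), offsets: (s, i) });
            }
        } else if start.is_none() {
            start = Some(i);
        }
    }
    // Flush a trailing token that runs to the end of the input.
    if let Some(s) = start {
        tokens.push(Token { text: input[s..].to_lowercase(), offsets: (s, input.len()) });
    }
    tokens
}
```

Because offsets are recorded before normalization, `&input[start..end]` always recovers the original text of a token, even when the normalized form differs.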
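ReductStore is described as storing time-stamped blobs with customizable data retention policies. The in-memory sketch below illustrates that data model only; it is not ReductStore's API or storage engine. Blobs are keyed by timestamp in a `BTreeMap` so that time-range queries come back in order, and a hypothetical size quota evicts the oldest records first. `TimeSeriesBlobStore`, `write`, and `query` are illustrative names.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory store: timestamp -> blob bytes.
struct TimeSeriesBlobStore {
    records: BTreeMap<u64, Vec<u8>>,
    quota_bytes: usize, // assumed retention policy: cap on total payload size
    used_bytes: usize,
}

impl TimeSeriesBlobStore {
    fn new(quota_bytes: usize) -> Self {
        Self { records: BTreeMap::new(), quota_bytes, used_bytes: 0 }
    }

    /// Insert a blob, then evict the oldest records until usage is within quota.
    /// A single blob larger than the whole quota ends up evicted as well.
    fn write(&mut self, timestamp: u64, blob: Vec<u8>) {
        self.used_bytes += blob.len();
        if let Some(old) = self.records.insert(timestamp, blob) {
            self.used_bytes -= old.len(); // replaced an existing record
        }
        while self.used_bytes > self.quota_bytes {
            match self.records.pop_first() {
                Some((_, evicted)) => self.used_bytes -= evicted.len(),
                None => break,
            }
        }
    }

    /// Return blobs with start <= timestamp < end, in time order.
    /// Panics if start > end, as BTreeMap::range does.
    fn query(&self, start: u64, end: u64) -> Vec<(u64, &[u8])> {
        self.records
            .range(start..end)
            .map(|(ts, blob)| (*ts, blob.as_slice()))
            .collect()
    }
}
```

Oldest-first eviction is just one plausible retention choice; a time-based policy (drop everything older than some age) would instead remove entries from the front of the map up to a cutoff timestamp.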
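hora implements approximate nearest neighbor indexes. To make concrete what such an index approximates, the sketch below performs exact brute-force k-nearest-neighbor search by Euclidean distance. It is a baseline for comparison, not one of hora's algorithms. Approximate indexes like the ones hora provides exist to avoid the full scan over every vector that this exact search requires. `squared_l2` and `knn_brute_force` are hypothetical names.

```rust
/// Squared Euclidean distance. It is monotonic in the true distance,
/// so it ranks neighbors identically without needing a square root.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Return the indices of the `k` vectors in `data` closest to `query`,
/// nearest first. Scans every vector: O(n * d) work per query.
fn knn_brute_force(data: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut dists: Vec<(usize, f32)> = data
        .iter()
        .enumerate()
        .map(|(i, v)| (i, squared_l2(v, query)))
        .collect();
    dists.sort_by(|a, b| a.1.total_cmp(&b.1));
    dists.into_iter().take(k).map(|(i, _)| i).collect()
}
```

Exact results like these are also what recall is usually measured against when evaluating an approximate index.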