Search Results for "text processing" - Page 7

Showing 1237 open source projects for "text processing"

1

led-text-editor

A simple, easy-to-use, yet powerful line-oriented text editor.

led is a simple, easy-to-use, yet powerful line-oriented text editor. It is written in Urn Lisp and compiled to Lua, so it runs on any platform where Lua (version 5.1 or higher) is available; however, some special features are available only with Lua 5.1 (or LuaJIT) on AmigaOS, MorphOS, AROS and UNIX with XTerm. The latest release (18-Mar-2021) now also supports scripts.

Downloads: 0 This Week

Last Update: 2025-01-31
2

clip-js

Online video editor built with Next.js, Remotion and FFmpeg

clip-js is a browser-based video editor built with modern web technologies such as Next.js and Remotion, designed to provide real-time editing and rendering directly in the browser. It enables users to create and edit video compositions using a timeline interface, combining video, audio, images, and text layers into a single project. The system uses a WebAssembly port of FFmpeg to perform high-quality rendering, allowing export of videos without relying on server-side processing. It includes interactive controls for trimming, splitting, and arranging media elements with precise timing. The editor supports dynamic adjustments such as opacity, positioning, and layering to fine-tune compositions. ...

Downloads: 0 This Week

Last Update: 2026-04-27
3

Ollama-rs

A simple and easy-to-use library for interacting with the Ollama API

Ollama-rs is a Rust library designed to provide a simple and efficient interface for interacting with the Ollama API, enabling developers to integrate local large language models into Rust applications. It follows the official Ollama API closely, ensuring compatibility while offering an idiomatic Rust experience with strong typing and asynchronous execution. The library supports a wide range of operations, including text generation, chat interactions, embeddings, and model management, making...

Downloads: 0 This Week

Last Update: 2026-04-20
4

ASNmap

CLI tool for mapping organization network ranges using ASN data

...This capability makes it particularly useful for security researchers, penetration testers, and reconnaissance workflows that require identifying network infrastructure owned by a target organization. asnmap retrieves ASN-related data and returns structured results that can be easily integrated into automated pipelines. Output can be generated in multiple formats including plain text, JSON, and CSV, enabling flexible data processing and analysis. asnmap also supports reading input from standard input and piping its results directly into other command line tools for chained workflows.

Downloads: 0 This Week

Last Update: 2026-03-08
5

NeMo Curator

Scalable data preprocessing and curation toolkit for LLMs

NeMo Curator is a Python library specifically designed for fast and scalable dataset preparation and curation for large language model (LLM) use cases such as foundation model pretraining, domain-adaptive pretraining (DAPT), supervised fine-tuning (SFT) and parameter-efficient fine-tuning (PEFT). It greatly accelerates data curation by leveraging GPUs with Dask and RAPIDS, resulting in significant time savings. The library provides a customizable and modular interface, simplifying pipeline...

Downloads: 0 This Week

Last Update: 2026-05-12
6

Tokenizers

Fast State-of-the-Art Tokenizers optimized for Research and Production

...Train new vocabularies and tokenize, using today’s most used tokenizers. Extremely fast (both training and tokenization), thanks to the Rust implementation. Takes less than 20 seconds to tokenize a GB of text on a server’s CPU. Easy to use, but also extremely versatile. Designed for both research and production. Full alignment tracking: even with destructive normalization, it’s always possible to get the part of the original sentence that corresponds to any token. Handles all the pre-processing: truncation, padding, and adding the special tokens your model needs.

Downloads: 0 This Week

Last Update: 2026-04-27
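
The "full alignment tracking" idea in the Tokenizers entry above can be illustrated with a small sketch. This is not the Tokenizers library's API; it is a standard-library-only illustration, and the function names `normalize_with_alignment` and `tokenize_with_offsets` are hypothetical. It assumes a simple destructive normalization (lowercasing and accent stripping) and shows how each token can still be mapped back to a span of the original text.

```python
import re
import unicodedata


def normalize_with_alignment(text):
    """Lowercase and strip accents, remembering where each output char came from.

    Hypothetical helper for illustration only. Returns the normalized string
    and a list giving, for every normalized character, the index of the
    original character that produced it.
    """
    chars, origins = [], []
    for index, original_char in enumerate(text):
        # Decompose so that accents become separate combining marks.
        for piece in unicodedata.normalize("NFD", original_char.lower()):
            if unicodedata.combining(piece):
                continue  # destructive step: the accent mark is dropped
            chars.append(piece)
            origins.append(index)
    return "".join(chars), origins


def tokenize_with_offsets(text):
    """Return (token, (start, end)) pairs; offsets index into the original text."""
    normalized, origins = normalize_with_alignment(text)
    results = []
    for match in re.finditer(r"\w+", normalized):
        # Map the span in the normalized string back to the original string.
        start = origins[match.start()]
        end = origins[match.end() - 1] + 1
        results.append((match.group(), (start, end)))
    return results


if __name__ == "__main__":
    sentence = "Caf\u00e9 D\u00e9j\u00e0-Vu"
    for token, (start, end) in tokenize_with_offsets(sentence):
        # Each normalized token is shown next to the original slice it came from.
        print(token, (start, end), repr(sentence[start:end]))
```

The design choice here is to carry a per-character origin list through normalization, so that the normalized text can change freely while offsets into the original text remain recoverable.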
7

FlexLLMGen

Running large language models on a single GPU

FlexLLMGen is an open-source inference engine designed to run large language models efficiently on limited hardware resources such as a single GPU. The system focuses on high-throughput generation workloads where large batches of text must be processed quickly, such as large-scale data extraction or document analysis tasks. Instead of requiring expensive multi-GPU systems, the framework uses techniques such as memory offloading, compression, and optimized batching to run large models on...

Downloads: 0 This Week

Last Update: 2026-03-10
8

video-use

Edit videos with Claude Code

...The system intelligently analyzes audio transcripts and visual cues to make precise, context-aware editing decisions. It supports a wide range of content types, including interviews, tutorials, montages, and talking-head videos. By combining structured text representations with on-demand visual previews, it minimizes processing overhead while maintaining high-quality results. Overall, Video Use reimagines video editing as an AI-driven, conversational workflow.

Downloads: 16 This Week

Last Update: 2026-05-15
9

Raycast Ollama

Raycast extension for Ollama

Raycast Ollama is an extension for Raycast that integrates Ollama-based large language models directly into the macOS productivity launcher environment. It allows users to interact with local AI models through Raycast commands, enabling quick access to chat, text generation, and other AI-powered tasks without leaving their workflow. The extension is designed to be lightweight and fast, aligning with Raycast’s philosophy of keyboard-driven productivity. It provides a seamless interface for...

Downloads: 3 This Week

Last Update: 6 days ago
10

Feign

Make writing Java http clients easier

...Inspired by previous projects Retrofit, JAXRS-2.0 and WebSocket, Feign was designed to reduce the complexity that is often involved in binding the Denominator uniformly to HTTP APIs, no matter the ReSTfulness. Feign works by processing annotations into a templatized request, to which arguments are applied in a straightforward manner before output. While it may only support text-based APIs, it simplifies system aspects dramatically and makes it much easier to unit test your conversions. Feign makes use of great tools like Jersey and CXF for writing Java clients for ReST or SOAP services. ...

Downloads: 3 This Week

Last Update: 2026-04-17
11

Weaviate

Weaviate is a cloud-native, modular, real-time vector search engine

Weaviate in a nutshell: Weaviate is a vector search engine and vector database. Weaviate uses machine learning to vectorize and store data, and to find answers to natural language queries. With Weaviate you can also bring your custom ML models to production scale. Weaviate in detail: Weaviate is a low-latency vector search engine with out-of-the-box support for different media types (text, images, etc.). It offers Semantic Search, Question-Answer-Extraction, Classification, Customizable...

Downloads: 2 This Week

Last Update: 2 days ago
12

HunyuanVideo

HunyuanVideo: A Systematic Framework For Large Video Generation Model

HunyuanVideo is a cutting-edge framework designed for large-scale video generation, leveraging advanced AI techniques to synthesize videos from various inputs. It is implemented in PyTorch, providing pre-trained model weights and inference code for efficient deployment. The framework aims to push the boundaries of video generation quality, incorporating multiple innovative approaches to improve the realism and coherence of the generated content. Release of FP8 model weights to reduce GPU...

1 Review

Downloads: 5 This Week

Last Update: 2025-09-23
13

spaCy models

Models for the spaCy Natural Language Processing (NLP) library

spaCy is designed to help you do real work, to build real products, or gather real insights. The library respects your time, and tries to avoid wasting it. It's easy to install, and its API is simple and productive. spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. If your application needs to process entire web dumps, spaCy is the library you want to be using. Since its release in 2015, spaCy has become an industry...

Downloads: 5 This Week

Last Update: 2026-03-18
14

BibDesk

Bibliography manager for Mac OS X

BibDesk is a graphical BibTeX bibliography manager for Mac OS X.

21 Reviews

Downloads: 2,744 This Week

Last Update: 3 hours ago
15

Qwen

The official repo of Qwen chat & pretrained large language model

Qwen is a series of large language models developed by Alibaba Cloud, consisting of various pretrained versions like Qwen-1.8B, Qwen-7B, Qwen-14B, and Qwen-72B. These models, which range from smaller to larger configurations, are designed for a wide range of natural language processing tasks. They are openly available for research and commercial use, with Qwen's code and model weights shared on GitHub. Qwen's capabilities include text generation, comprehension, and conversation, making it a versatile tool for developers looking to integrate advanced AI functionalities into their applications.

1 Review

Downloads: 12 This Week

Last Update: 2026-03-05
16

Scriberr

Self-hosted AI audio transcription

Scriberr is a self-hosted AI-powered transcription platform designed to convert audio and video into highly accurate text while prioritizing privacy and local processing. Unlike cloud-based transcription services, Scriberr runs entirely on the user’s machine, ensuring that sensitive recordings are never sent to third-party servers and remain fully under user control. It leverages modern speech recognition models such as Whisper and other advanced architectures to deliver precise transcripts with word-level timing and speaker identification. ...

Downloads: 4 This Week

Last Update: 2026-03-19
17

LandPPT

An LLM-based presentation generation platform

...The system allows users to create complete PowerPoint presentations simply by entering a topic or uploading source documents such as PDFs, Word files, or Markdown notes. Using natural language processing and structured content generation, the platform produces presentation outlines and converts them into fully formatted slide decks. The application integrates multiple AI models from providers such as OpenAI, Anthropic, Google, and locally hosted models to generate text, images, and structured presentation layouts. It also includes template systems and style options that allow presentations to be customized for different industries, visual themes, or storytelling formats.

Downloads: 4 This Week

Last Update: 2026-05-15
18

MiniCPM4

Ultra-Efficient LLMs on End Device

MiniCPM4 is part of the MiniCPM family of ultra-efficient large language models designed specifically for high performance on edge devices and resource-constrained environments. Unlike traditional large-scale models that require extensive computational resources, MiniCPM4 focuses on delivering competitive reasoning and language capabilities while maintaining significantly lower latency and higher efficiency. It achieves this through optimized architectures, scalable training strategies, and...

Downloads: 0 This Week

Last Update: 2026-04-13
19

Step3-VL-10B

Multimodal model achieving SOTA performance

...It achieves this efficiency and strong performance through unified pre-training on a massive 1.2 trillion-token multimodal corpus that jointly optimizes a language-aligned perception encoder with a powerful decoder, creating deep synergy between image processing and text understanding.

Downloads: 0 This Week

Last Update: 2026-01-22
20

LangExtract

A Python library for extracting structured information

LangExtract is a Python library developed by Google that leverages large language models (LLMs) to extract structured information from unstructured text—such as clinical notes, research papers, or literary works—based on user-defined instructions. It is designed to transform free-form text into reliable, schema-constrained data while maintaining traceability back to the source material. Each extracted entity is precisely grounded in its original context, allowing visual inspection and validation via automatically generated interactive HTML visualizations. ...

Downloads: 0 This Week

Last Update: 4 days ago
21

amrlib

A Python library that makes AMR parsing, generation and visualization simple

amrlib is a Python module designed to make processing for Abstract Meaning Representation (AMR) simple by providing the following functions: Sentence to Graph (StoG) parsing to create AMR graphs from English sentences; Graph to Sentence (GtoS) generation for turning AMR graphs into English sentences; a Qt-based GUI to facilitate the conversion of sentences to graphs and back to sentences; methods to plot AMR graphs...

Downloads: 0 This Week

Last Update: 2026-03-07
22

Kimi-Audio

Audio foundation model excelling in audio understanding

Kimi-Audio is an ambitious open-source audio foundation model designed to unify a wide array of audio processing tasks — from speech recognition and audio understanding to generative conversation and sound event classification — within a single cohesive architecture. Instead of fragmenting work across specialized models, Kimi-Audio handles automatic speech recognition (ASR), audio question answering, automatic audio captioning, speech emotion recognition, and audio-to-text chat in one system, enabling developers to build rich, multimodal audio applications without stitching together disparate components. ...

Downloads: 1 This Week

Last Update: 2026-01-27
23

SuperSocket

Extensible socket server application framework for .NET

...The framework supports multiple protocols, including TCP, UDP, and WebSocket, and gives developers a modular architecture for protocol parsing, session handling, command processing, and server hosting. It is suitable for chat servers, game servers, IoT gateways, telemetry services, industrial systems, and other real-time network applications. SuperSocket is designed to be flexible enough for custom binary or text protocols while still offering reusable abstractions for common server patterns. It is most useful for .NET teams that need robust networking infrastructure with room for domain-specific protocol logic.

Downloads: 3 This Week

Last Update: 6 days ago
24

Scanopy

Clean network diagrams, one-time setup, zero upkeep

Scanopy is a powerful multi-modal data capture and analysis toolkit that enables users to collect, process, and visualize structured and unstructured information from a variety of sources in a flexible pipeline. It is built to handle complex scanning tasks — such as OCR, document analysis, audio transcription, network data capture, and image extraction — while providing unified APIs and workflows that make managing heterogeneous data sources seamless. Developers can compose custom pipelines...

Downloads: 3 This Week

Last Update: 2026-04-28
25

LibPDF

A modern PDF library for TypeScript

...The library offers full read and write manipulation, including support for encryption with RC4 and modern AES cipher suites, form filling and flattening, digital signature creation and verification, page merging/splitting, rich text extraction with layout information, and font embedding with subsetting.

Downloads: 1 This Week

Last Update: 6 days ago