Search Results for "python text parser" - Page 3

Showing 524 open source projects for "python text parser"

1

Lingua-Py

The most accurate natural language detection library for Python

Its task is simple: It tells you which language some text is written in. This is very useful as a preprocessing step for linguistic data in natural language processing applications such as text classification and spell checking. Other use cases, for instance, might include routing e-mails to the right geographically located customer service department, based on the e-mails' languages. Language detection is often done as part of large machine learning frameworks or natural language processing...

Downloads: 1 This Week

Last Update: 2025-05-27
See Project
2

Hazm

Persian NLP Toolkit

Hazm is a natural language processing (NLP) library for Persian text, offering various tools for text preprocessing, tokenization, part-of-speech tagging, and more.

Downloads: 0 This Week

Last Update: 2025-01-24
See Project
3

DocTR

Library for OCR-related tasks powered by Deep Learning

DocTR provides an easy and powerful way to extract valuable information from your documents. Seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents. Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters. User-friendly, 3 lines of code to load a document and extract text with a predictor. State-of-the-art performance on public document...

Downloads: 4 This Week

Last Update: 2025-07-09
See Project
4

Stanza

Stanford NLP Python library for many human languages

Stanza is a collection of accurate and efficient tools for the linguistic analysis of many human languages. Starting from raw text to syntactic analysis and entity recognition, Stanza brings state-of-the-art NLP models to languages of your choosing. Stanza is a Python natural language analysis package. It contains tools, which can be used in a pipeline, to convert a string containing human language text into lists of sentences and words, to generate base forms of those words, their parts of speech and morphological features, to give a syntactic structure dependency parse, and to recognize named entities. ...

Downloads: 2 This Week

Last Update: 2025-10-05
See Project
5

Parlant

The behavior guidance framework for customer-facing LLM agents

Parlant is a framework for guiding the behavior of customer-facing LLM agents.

Downloads: 0 This Week

Last Update: 2025-11-18
See Project
6

MTEB

MTEB: Massive Text Embedding Benchmark

Text embeddings are commonly evaluated on a small set of datasets from a single task not covering their possible applications to other tasks. It is unclear whether state-of-the-art embeddings on semantic textual similarity (STS) can be equally well applied to other tasks like clustering or reranking. This makes progress in the field difficult to track, as various models are constantly being proposed without proper evaluation. To solve this problem, we introduce the Massive Text Embedding...

Downloads: 2 This Week

Last Update: 1 day ago
See Project
7

OpenAI-Compatible Edge-TTS API

Free, high-quality text-to-speech API endpoint to replace OpenAI

OpenAI-Compatible Edge-TTS API is a local, OpenAI-compatible text-to-speech API that uses edge-tts—Microsoft Edge’s online TTS service—as the backend. The project emulates the /v1/audio/speech endpoint used by OpenAI, so any client that can talk to the OpenAI TTS API can be redirected to this service with minimal changes. It exposes parameters for input text, voice selection, audio format, and playback speed, mirroring the OpenAI interface while mapping popular OpenAI voice names to...

Downloads: 3 This Week

Last Update: 6 days ago
See Project
8

Label Sleuth

Open source no-code system for text annotation and building of text

An open-source no-code system for text annotation and building text classifiers. No AI knowledge needed. From task definition to working model in just a few hours! While domain experts label their data, Label Sleuth automatically trains appropriate machine learning models in the background. To avoid wasted labeling effort, Label Sleuth employs active learning techniques to guide the user on what to label next. Domain experts can quickly start labeling their data through an...

Downloads: 0 This Week

Last Update: 2024-06-17
See Project
9

ChatTTS webUI & API

A simple native web interface that uses ChatTTS to synthesize text

ChatTTS-ui is a local web interface and API wrapper around the ChatTTS speech synthesis system, designed to make advanced TTS models easy to use from a browser. It runs a small backend server (Python + Torch + ffmpeg) and exposes a simple webpage where you can type text, adjust parameters, and generate audio. The project supports Chinese, English, and mixed text with digits and control symbols, making it suitable for bilingual content and numerically heavy text like announcements or prompts. From version 0.96 onward, ffmpeg installation is required for deployment, and previous CSV/PT voice tables are no longer valid, so users instead work with updated “voice value” parameters. ...

Downloads: 1 This Week

Last Update: 6 days ago
See Project
10

StyleTTS 2

Towards Human-Level Text-to-Speech through Style Diffusion

StyleTTS2 is a state-of-the-art text-to-speech system that aims for human-level naturalness by combining style diffusion, adversarial training, and large speech language models. It extends the original StyleTTS idea by introducing a style diffusion model that can sample rich, realistic speaking styles conditioned on reference speech, allowing highly expressive and diverse prosody. The architecture uses a two-stage training process and leverages an auxiliary speech language model to guide...

Downloads: 3 This Week

Last Update: 6 days ago
See Project
11

Label Studio

Label Studio is a multi-type data labeling and annotation tool

The most flexible data annotation tool. Quickly installable. Build custom UIs or use pre-built labeling templates. Detect objects in images; bounding boxes, polygons, circles, and keypoints are supported. Partition images into multiple segments. Use ML models to pre-label and optimize the process. Label Studio is an open-source data labeling tool. It lets you label data types like audio, text, images, videos, and time series with a simple and straightforward UI and export to various model formats. It can...

Downloads: 16 This Week

Last Update: 2025-09-30
See Project
12

Rasa

Open source machine learning framework to automate text conversations

...Rasa uses Poetry for packaging and dependency management. If you want to build it from the source, you have to install Poetry first. By default, Poetry will try to use the currently activated Python version to create the virtual environment for the current project automatically.

Downloads: 19 This Week

Last Update: 2025-01-14
See Project
13

langrocks

Tools like web browser, computer access and code runner for LLMs

Langrocks is a collection of tools for LLMs, such as a web browser, computer access, and a code runner.

Downloads: 0 This Week

Last Update: 2024-11-21
See Project
14

Qwen-Image

Qwen-Image is a powerful image generation foundation model

Qwen-Image is a powerful 20-billion parameter foundation model designed for advanced image generation and precise editing, with a particular strength in complex text rendering across diverse languages, especially Chinese. Built on the MMDiT architecture, it achieves remarkable fidelity in integrating text seamlessly into images while preserving typographic details and layout coherence. The model excels not only in text rendering but also in a wide range of artistic styles, including...

1 Review

Downloads: 14 This Week

Last Update: 2025-11-11
See Project
15

LangKit

An open-source toolkit for monitoring Large Language Models (LLMs)

LangKit is an open-source text metrics toolkit for monitoring language models. It offers an array of methods for extracting relevant signals from the input and/or output text, which are compatible with the open-source data logging library whylogs. Productionizing language models, including LLMs, comes with a range of risks due to the infinite number of input combinations, which can elicit an infinite number of outputs. The unstructured nature of text poses a challenge in the ML observability...

Downloads: 0 This Week

Last Update: 2024-11-06
See Project
16

Underthesea

Underthesea - Vietnamese NLP Toolkit

Underthesea is a Vietnamese NLP toolkit providing various text processing capabilities, including word segmentation, part-of-speech tagging, and named entity recognition.

Downloads: 0 This Week

Last Update: 2025-10-02
See Project
17

WhisperLive

A nearly-live implementation of OpenAI's Whisper

WhisperLive is a “nearly live” implementation of OpenAI’s Whisper model focused on real-time transcription. It runs as a server–client system in which the server hosts a Whisper backend and clients stream audio to be transcribed with very low delay. The project supports multiple inference backends, including Faster-Whisper, NVIDIA TensorRT, and OpenVINO, allowing you to target GPUs and different CPU architectures efficiently. It can handle microphone input, pre-recorded audio files, and...

Downloads: 7 This Week

Last Update: 6 days ago
See Project
18

NVIDIA NeMo

Toolkit for conversational AI

NVIDIA NeMo, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new conversational AI model architectures. Conversational AI...

Downloads: 4 This Week

Last Update: 1 day ago
See Project
19

Qwen3-Omni

Qwen3-Omni is a natively end-to-end, omni-modal LLM

Qwen3-Omni is a natively end-to-end multilingual omni-modal foundation model that processes text, images, audio, and video and delivers real-time streaming responses in text and natural speech. It uses a Thinker-Talker architecture with a Mixture-of-Experts (MoE) design, early text-first pretraining, and mixed multimodal training to support strong performance across all modalities without sacrificing text or image quality. The model supports 119 text languages, 19 speech input languages, and...

Downloads: 3 This Week

Last Update: 2025-09-27
See Project
20

Qwen2.5-Omni

Capable of understanding text, audio, vision, video

Qwen2.5-Omni is an end-to-end multimodal flagship model in the Qwen series by Alibaba Cloud, designed to process multiple modalities (text, images, audio, video) and generate responses both as text and natural speech in real-time streaming. It uses a “Thinker-Talker” architecture and introduces innovations for aligning modalities over time (for example synchronizing video/audio), robust speech generation, and low-VRAM/quantized versions to make usage more accessible. It holds...

Downloads: 7 This Week

Last Update: 2025-09-23
See Project
21

GPT-2 Output Dataset

Dataset of GPT-2 outputs for research in detection, biases, and more

The GPT-2 Output Dataset is a large collection of model-generated text, released by OpenAI alongside the GPT-2 research paper to study the behaviors and limitations of large language models. It contains 250,000 samples of GPT-2 outputs, generated with different sampling strategies such as top-k truncation, to highlight the diversity and quality of model completions. The dataset also includes corresponding human-written text for comparison, enabling researchers to explore methods for...

Downloads: 1 This Week

Last Update: 11 hours ago
See Project
22

IMS Toucan

Controllable and fast Text-to-Speech for over 7000 languages

IMS-Toucan is a toolkit for training, using, and teaching state-of-the-art text-to-speech systems, built at the Institute for Natural Language Processing (IMS), University of Stuttgart. It is the official home of ToucanTTS, a massively multilingual TTS system designed to support over 7,000 languages with a single unified framework. The toolkit focuses on being fast and controllable while not requiring huge amounts of compute, making it practical for research labs and smaller teams. It...

Downloads: 3 This Week

Last Update: 6 days ago
See Project
23

DeepSeek VL2

Mixture-of-Experts Vision-Language Models for Advanced Multimodal

DeepSeek-VL2 is DeepSeek’s vision + language multimodal model—essentially the next-gen successor to their first vision-language models. It combines image and text inputs into a unified embedding / reasoning space so that you can query with text and image jointly (e.g. “What’s going on in this scene?” or “Generate a caption appropriate to context”). The model supports both image understanding (vision tasks) and multimodal reasoning, and is likely used as a component in agent systems to...

Downloads: 3 This Week

Last Update: 2025-10-03
See Project
24

shuyuan

Reading book source

shuyuan is a project oriented around reading and knowledge consumption, especially targeting large-scale text content such as books, articles, or educational material. The name suggests “academy” or “study hall,” and the tool aims to help users ingest, organize, and manage reading content — possibly offering features like text parsing, annotation, metadata generation, translation, or storage for later reference. The repository is set up to support document ingestion, indexing, and maybe some...

Downloads: 0 This Week

Last Update: 6 days ago
See Project
25

IndexTTS2

Industrial-level controllable zero-shot text-to-speech system

IndexTTS is a modern, zero-shot text-to-speech (TTS) system engineered to deliver high-quality, natural-sounding speech synthesis with few requirements and strong voice-cloning capabilities. It builds on state-of-the-art models such as XTTS and other modern neural TTS backbones, improving them with a conformer-based speech conditional encoder and upgrading the decoder to a high-quality vocoder (BigVGAN2), leading to clearer and more natural audio output. The system supports zero-shot voice...

Downloads: 6 This Week

Last Update: 7 days ago
See Project




© 2025 Slashdot Media. All Rights Reserved.
