Search Results for "python text parser" - Page 4

Sort By:

Relevance

Clear All Filters

Windows 381
Linux 343
Mac 334
More...
BSD 77
ChromeOS 74
Mobile Operating Systems 9
Desktop Operating Systems 5
Server Operating Systems 1

Showing 381 open source projects for "python text parser"

View related business solutions

Artificial Intelligence Windows Clear Filters & Widen Search

Gen AI apps are built with MongoDB Atlas
Build gen AI apps with an all-in-one modern database: MongoDB Atlas

MongoDB Atlas provides built-in vector search and a flexible document model so developers can build, scale, and run gen AI apps without stitching together multiple databases. From LLM integration to semantic search, Atlas simplifies your AI architecture—and it’s free to get started.

Start Free
Photo and Video Editing APIs and SDKs
Trusted by 150 million+ creators and businesses globally

Unlock Picsart's full editing suite by embedding our Editor SDK directly into your platform. Offer your users the power of a full design suite without leaving your site.

Learn More
1

GraphRAG

A modular graph-based Retrieval-Augmented Generation (RAG) system

The GraphRAG project is a data pipeline and transformation suite that is designed to extract meaningful, structured data from unstructured text using the power of LLMs.

Downloads: 3 This Week

Last Update: 5 days ago
See Project
2

Shap-E

Generate 3D objects conditioned on text or images

The shap-e repository provides the official code and model release for Shap-E, a conditional generative model designed to produce 3D assets (implicit functions, meshes, neural radiance fields) from text or image prompts. The model is built with a two-stage architecture: first an encoder that maps existing 3D assets into parameterizations of implicit functions, and then a conditional diffusion model trained on those parameterizations to generate new assets. Because it works at the level...

Downloads: 3 This Week

Last Update: 2025-10-02
See Project
3

HunyuanCustom

Multimodal-Driven Architecture for Customized Video Generation

HunyuanCustom is a multimodal video customization framework by Tencent Hunyuan, aimed at generating customized videos featuring particular subjects (people, characters) under flexible conditions, while maintaining subject/identity consistency. It supports conditioning via image, audio, video, and text, and can perform subject replacement in videos, generate avatars speaking given audio, or combine multiple subject images. The architecture builds on HunyuanVideo, with added modules for identity...

Downloads: 3 This Week

Last Update: 2025-09-23
See Project
4

Qwen2.5-Omni

Capable of understanding text, audio, vision, video

Qwen2.5-Omni is an end-to-end multimodal flagship model in the Qwen series by Alibaba Cloud, designed to process multiple modalities (text, images, audio, video) and generate responses both as text and natural speech in streaming real-time. It supports “Thinker-Talker” architecture, and introduces innovations for aligning modalities over time (for example synchronizing video/audio), robust speech generation, and low-VRAM/quantized versions to make usage more accessible. It holds state...

Downloads: 3 This Week

Last Update: 2025-09-23
See Project
Deliver secure remote access with OpenVPN.
Trusted by nearly 20,000 customers worldwide, and all major cloud providers.

OpenVPN's products provide scalable, secure remote access — giving complete freedom to your employees to work outside the office while securely accessing SaaS, the internet, and company resources.

Get started — no credit card required.
5

Autolabel

Label, clean and enrich text datasets with LLMs

Autolabel is a Python library to label, clean and enrich datasets with Large Language Models (LLMs). Autolabel data for NLP tasks such as classification, question-answering and named entity recognition, entity matching and more. Seamlessly use commercial and open-source LLMs from providers such as OpenAI, Anthropic, HuggingFace, Google and more.

Downloads: 3 This Week

Last Update: 2023-10-16
See Project
6

Guardrails

Adding guardrails to large language models

Guardrails is a Python package that lets a user add structure, type and quality guarantees to the outputs of large language models (LLMs). At the heart of Guardrails is the rail spec. rail is intended to be a language-agnostic, human-readable format for specifying structure and type information, validators and corrective actions over LLM outputs. We create a RAIL spec to describe the expected structure and types of the LLM output, the quality criteria for the output to be considered valid...

Downloads: 3 This Week

Last Update: 2025-09-22
See Project
7

CogVideo

text and image to video generation: CogVideoX (2024) and CogVideo

CogVideo is an open source text-/image-/video-to-video generation project that hosts the CogVideoX family of diffusion-transformer models and end-to-end tooling. The repo includes SAT and Diffusers implementations, turnkey demos, and fine-tuning pipelines (including LoRA) designed to run across a wide range of NVIDIA GPUs, from desktop cards (e.g., RTX 3060) to data-center hardware (A100/H100). Current releases cover CogVideoX-2B, CogVideoX-5B, and the upgraded CogVideoX1.5-5B variants, plus...

Downloads: 3 This Week

Last Update: 2025-10-04
See Project
8

MLJAR Studio

Python package for AutoML on Tabular Data with Feature Engineering

We are working on new way for visual programming. We developed a desktop application called MLJAR Studio. It is a notebook-based development environment with interactive code recipes and a managed Python environment. All running locally on your machine. We are waiting for your feedback. The mljar-supervised is an Automated Machine Learning Python package that works with tabular data. It is designed to save time for a data scientist. It abstracts the common way to preprocess the data, construct...

Downloads: 3 This Week

Last Update: 2025-07-07
See Project
9

GPT-2

Code for the paper Language Models are Unsupervised Multitask Learners

This repository contains the code and model weights for GPT-2, a large-scale unsupervised language model described in the OpenAI paper “Language Models are Unsupervised Multitask Learners.” The intent is to provide a starting point for researchers and engineers to experiment with GPT-2: generate text, fine‐tune on custom datasets, explore model behavior, or study its internal phenomena. The repository includes scripts for sampling, training, downloading pre-trained models, and utilities...

Downloads: 3 This Week

Last Update: 2025-09-26
See Project
No-Nonsense Code-to-Cloud Security for Devs | Aikido
Connect your GitHub, GitLab, Bitbucket, or Azure DevOps account to start scanning your repos for free.

Aikido provides a unified security platform for developers, combining 12 powerful scans like SAST, DAST, and CSPM. AI-driven AutoFix and AutoTriage streamline vulnerability management, while runtime protection blocks attacks.

Start for Free
10

Phi-3-MLX

Phi-3.5 for Mac: Locally-run Vision and Language Models

Phi-3-Vision-MLX is an Apple MLX (machine learning on Apple silicon) implementation of Phi-3 Vision, a lightweight multi-modal model designed for vision and language tasks. It focuses on running vision-language AI efficiently on Apple hardware like M1 and M2 chips.

Downloads: 3 This Week

Last Update: 2025-03-13
See Project
11

Qwen2-Audio

Repo of Qwen2-Audio chat & pretrained large audio language model

Qwen2-Audio is a large audio-language model by Alibaba Cloud, part of the Qwen series. It is trained to accept various audio signal inputs (including speech, sounds, etc.) and perform both voice chat and audio analysis, producing textual responses. It supports two major modes: Voice Chat (interactive voice only input) and Audio Analysis (audio + text instructions), with both base and instruction-tuned models. It is evaluated on many benchmarks (speech recognition, translation, sound...

Downloads: 3 This Week

Last Update: 2025-09-23
See Project
12

Argilla

The open-source data curation platform for LLMs

Argilla is a production-ready framework for building and improving datasets for NLP projects. Deploy your own Argilla Server on Spaces with a few clicks. Use embeddings to find the most similar records with the UI. This feature uses vector search combined with traditional search (keyword and filter based). Argilla is free, open-source, and 100% compatible with major NLP libraries (Hugging Face transformers, spaCy, Stanford Stanza, Flair, etc.). In fact, you can use and combine your preferred...

Downloads: 3 This Week

Last Update: 2025-03-10
See Project
13

Dream Textures

Stable Diffusion built-in to Blender

Create textures, concept art, background assets, and more with a simple text prompt. Use the 'Seamless' option to create textures that tile perfectly with no visible seam. Texture entire scenes with 'Project Dream Texture' and depth to image. Re-style animations with the Cycles render pass. Run the models on your machine to iterate without slowdowns from a service. Create textures, concept art, and more with text prompts. Learn how to use the various configuration options to get exactly what...

Downloads: 3 This Week

Last Update: 2024-08-26
See Project
14

OpenCLIP

An open source implementation of CLIP

The goal of this repository is to enable training models with contrastive image-text supervision and to investigate their properties such as robustness to distribution shift. Our starting point is an implementation of CLIP that matches the accuracy of the original CLIP models when trained on the same dataset. Specifically, a ResNet-50 model trained with our codebase on OpenAI's 15 million image subset of YFCC achieves 32.7% top-1 accuracy on ImageNet. OpenAI's CLIP model reaches 31.3% when...

Downloads: 3 This Week

Last Update: 2025-09-21
See Project
15

Qwen-2.5-VL

Qwen2.5-VL is the multimodal large language model series

Qwen2.5 is a series of large language models developed by the Qwen team at Alibaba Cloud, designed to enhance natural language understanding and generation across multiple languages. The models are available in various sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B parameters, catering to diverse computational requirements. Trained on a comprehensive dataset of up to 18 trillion tokens, Qwen2.5 models exhibit significant improvements in instruction following, long-text generation...

Downloads: 3 This Week

Last Update: 2025-09-23
See Project
16

Underthesea

Underthesea - Vietnamese NLP Toolkit

Underthesea is a Vietnamese NLP toolkit providing various text processing capabilities, including word segmentation, part-of-speech tagging, and named entity recognition.

Downloads: 2 This Week

Last Update: 2025-09-28
See Project
17

Aider

Aider is AI pair programming in your terminal

...-driven change is committed with clear messages, giving developers full transparency and control. Beyond text prompts, Aider accepts images, web pages, and even voice commands to guide code changes. With growing adoption, community support, and active development, Aider is widely regarded as one of the most capable free AI coding assistants available today.

Downloads: 3 This Week

Last Update: 2025-08-09
See Project
18

towhee

Framework that is dedicated to making neural data processing

Towhee is an open-source machine-learning pipeline that helps you encode your unstructured data into embeddings. You can use our Python API to build a prototype of your pipeline and use Towhee to automatically optimize it for production-ready environments. From images to text to 3D molecular structures, Towhee supports data transformation for nearly 20 different unstructured data modalities. We provide end-to-end pipeline optimizations, covering everything from data decoding/encoding, to model...

Downloads: 3 This Week

Last Update: 2023-12-05
See Project
19

DeepSpeed MII

MII makes low-latency and high-throughput inference possible

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. The Deep Learning (DL) open-source community has seen tremendous growth in the last few months. Incredibly powerful text generation models such as the Bloom 176B, or image generation model such as Stable Diffusion are now available to anyone with access to a handful or even a single GPU through platforms such as Hugging Face. While open-sourcing has democratized access to AI capabilities, their application...

Downloads: 3 This Week

Last Update: 2025-03-25
See Project
20

Label Sleuth

Open source no-code system for text annotation and building of text

An open-source no-code system for text annotation and building text classifiers. No AI knowledge needed. From task definition to working model in just a few hours! While domain experts label their data, Label Sleuth automatically trains in the background-appropriate machine learning models. To avoid wasted labeling effort, Label Sleuth employs active learning techniques to guide the user in what they should be labeled next. Domain experts can quickly start labeling their data through...

Downloads: 2 This Week

Last Update: 2024-06-17
See Project
21

Large Concept Model

Language modeling in a sentence representation space

... large image–text or weakly supervised corpora. It includes utilities to build concept vocabularies, map supervision signals to those vocabularies, and measure zero-shot or few-shot generalization. Probing tools help diagnose what the model knows—e.g., attribute recognition, relation understanding, or compositionality—so you can iterate on data and objectives. The design is modular, making it straightforward to swap backbones, change objectives, or integrate retrieval components.

Downloads: 3 This Week

Last Update: 6 days ago
See Project
22

HumanEval

Code for the paper "Evaluating Large Language Models Trained on Code"

human-eval is a benchmark dataset and evaluation framework created by OpenAI for measuring the ability of language models to generate correct code. It consists of hand-written programming problems with unit tests, designed to assess functional correctness rather than superficial metrics like text similarity. Each task includes a natural language prompt and a function signature, requiring the model to generate an implementation that passes all provided tests. The benchmark has become a standard...

Downloads: 3 This Week

Last Update: 2025-10-03
See Project
23

HunyuanVideo-Foley

Multimodal Diffusion with Representation Alignment

HunyuanVideo-Foley is a multimodal diffusion model from Tencent Hunyuan for high-fidelity Foley (sound effects) audio generation synchronized to video scenes. It is designed to generate audio that matches both visual content and textual semantic cues, for use in video production, film, advertising, games, etc. The model architecture aligns audio, video, and text representations to produce realistic synchronized soundtracks. Produces high-quality 48 kHz audio output suitable for professional use...

Downloads: 3 This Week

Last Update: 2025-09-23
See Project
24

Obsei

Obsei is a low code AI powered automation tool

Obsei is an automated no-code/low-code AI-powered text observation and analysis framework, designed for extracting insights from unstructured text data such as social media, reviews, and logs.

Downloads: 2 This Week

Last Update: 2025-01-24
See Project
25

SciSpaCy

A full spaCy pipeline and models for scientific/biomedical documents

ScispaCy is a spaCy extension optimized for processing biomedical and scientific text, providing domain-specific NLP models for tasks like named entity recognition (NER) and dependency parsing.

Downloads: 2 This Week

Last Update: 2025-10-01
See Project