Showing 291 open source projects for "processing"

View related business solutions
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • Cloud tools for web scraping and data extraction Icon
    Cloud tools for web scraping and data extraction

    Deploy pre-built tools that crawl websites, extract structured data, and feed your applications. Reliable web data without maintaining scrapers.

    Automate web data collection with cloud tools that handle anti-bot measures, browser rendering, and data transformation out of the box. Extract content from any website, push to vector databases for RAG workflows, or pipe directly into your apps via API. Schedule runs, set up webhooks, and connect to your existing stack. Free tier available, then scale as you need to.
    Explore 10,000+ tools
  • 1
    TensorFlow

    TensorFlow

    TensorFlow is an open source library for machine learning

    ...Available across all common operating systems (desktop, server and mobile), TensorFlow provides stable APIs for Python and C as well as APIs that are not guaranteed to be backwards compatible or are 3rd party for a variety of other languages. The platform can be easily deployed on multiple CPUs, GPUs and Google's proprietary chip, the tensor processing unit (TPU). TensorFlow expresses its computations as dataflow graphs, with each node in the graph representing an operation. Nodes take tensors—multidimensional arrays—as input and produce tensors as output. The framework allows for these algorithms to be run in C++ for better performance, while the multiple levels of APIs let the user determine how high or low they wish the level of abstraction to be in the models produced. ...
    Downloads: 26 This Week
    Last Update:
    See Project
  • 2
    DeerFlow

    DeerFlow

    Deep Research framework, combining language models with tools

    DeerFlow is an open-source, community-driven “deep research” framework / multi-agent orchestration platform developed by ByteDance. It aims to combine the reasoning power of large language models (LLMs) with automated tool-use — such as web search, web crawling, Python execution, and data processing — to enable complex, end-to-end research workflows. Instead of a monolithic AI assistant, DeerFlow defines multiple specialized agents (e.g. “planner,” “searcher,” “coder,” “report generator”) that collaborate in a structured workflow, allowing tasks like literature reviews, data gathering, data analysis, code execution, and final report generation to be largely automated. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    Adapters

    Adapters

    A Unified Library for Parameter-Efficient Learning

    Adapters is an add-on library to HuggingFace's Transformers, integrating 10+ adapter methods into 20+ state-of-the-art Transformer models with minimal coding overhead for training and inference. Adapters provide a unified interface for efficient fine-tuning and modular transfer learning, supporting a myriad of features like full-precision or quantized training (e.g. Q-LoRA, Q-Bottleneck Adapters, or Q-PrefixTuning), adapter merging via task arithmetics or the composition of multiple adapters...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    TorchDistill

    TorchDistill

    A coding-free framework built on PyTorch

    torchdistill (formerly kdkit) offers various state-of-the-art knowledge distillation methods and enables you to design (new) experiments simply by editing a declarative yaml config file instead of Python code. Even when you need to extract intermediate representations in teacher/student models, you will NOT need to reimplement the models, which often change the interface of the forward, but instead specify the module path(s) in the yaml file. In addition to knowledge distillation, this...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Free and Open Source HR Software Icon
    Free and Open Source HR Software

    OrangeHRM provides a world-class HRIS experience and offers everything you and your team need to be that HR hero you know that you are.

    Give your HR team the tools they need to streamline administrative tasks, support employees, and make informed decisions with the OrangeHRM free and open source HR software.
    Learn More
  • 5
    flair

    flair

    A very simple framework for state-of-the-art NLP

    A very simple framework for state-of-the-art NLP. Developed by Humboldt University of Berlin and friends. A powerful NLP library. Flair allows you to apply our state-of-the-art natural language processing (NLP) models to your text, such as named entity recognition (NER), sentiment analysis, part-of-speech tagging (PoS), special support for biomedical texts, sense disambiguation and classification, with support for a rapidly growing number of languages. A text embedding library. Flair has simple interfaces that allow you to use and combine different word and document embeddings, including our proposed Flair embeddings and various transformers. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    edge-tts

    edge-tts

    Use Microsoft Edge's online text-to-speech service from Python

    edge-tts is a Python module and command-line tool that gives you direct access to Microsoft Edge’s online text-to-speech service without needing the Edge browser, Windows, or any API key. It wraps the same cloud voices used by Edge, exposing them through a simple CLI (edge-tts, edge-playback) and a Python API, so you can script high-quality speech generation in your own applications. The tool lets you list available voices, specify locale and voice name, and generate audio files in common...
    Downloads: 38 This Week
    Last Update:
    See Project
  • 7
    spaCy models

    spaCy models

    Models for the spaCy Natural Language Processing (NLP) library

    spaCy is designed to help you do real work, to build real products, or gather real insights. The library respects your time, and tries to avoid wasting it. It's easy to install, and its API is simple and productive. spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. If your application needs to process entire web dumps, spaCy is the library you want to be using. Since its release in 2015, spaCy has become an industry...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 8
    BitNet

    BitNet

    Inference framework for 1-bit LLMs

    ...The framework offers significant speedups and energy reductions, achieving up to 6.17x faster performance on x86 CPUs and 70% energy savings, allowing the running of models such as the BitNet b1.58 100B with impressive efficiency. With support for lossless inference and enhanced processing power, BitNet enables faster AI applications while minimizing resource usage. It is a crucial tool for developers looking to implement LLMs on local systems, offering quick execution without sacrificing performance or energy efficiency.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    txtai

    txtai

    Build AI-powered semantic search applications

    txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications. Traditional search systems use keywords to find data. Semantic search applications have an understanding of natural language and identify results that have the same meaning, not necessarily the same keywords. Backed by state-of-the-art machine learning models, data is transformed into vector representations for search (also known as embeddings). Innovation is happening at a rapid...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Download the most trusted enterprise browser Icon
    Download the most trusted enterprise browser

    Chrome Enterprise brings enterprise controls and easy integrations to the browser users already know and love.

    Chrome Enterprise is ideal for businesses of all sizes, IT professionals, and organizations looking for a secure, scalable, and easily managed browser solution that supports remote work, data protection, and streamlined enterprise operations.
    Learn More
  • 10
    POT

    POT

    Python Optimal Transport

    This open source Python library provides several solvers for optimization problems related to Optimal Transport for signal, image processing and machine learning.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    GPTme

    GPTme

    Your agent in your terminal, equipped with local tools

    GPTMe is a personal AI chatbot designed for self-reflection, journaling, and productivity, using GPT models to generate personalized insights and responses.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 12
    clone-voice

    clone-voice

    A sound cloning tool with a web interface, using your voice

    Clone-voice is a local voice-cloning tool that lets you synthesize speech in any target voice or convert one recording into another voice using the same timbre. It is built around Coqui’s XTTS-v2 model, so it inherits multilingual support and modern neural TTS quality while wrapping it in a user-friendly desktop workflow. The app is designed to be very easy to use: you download a precompiled package, double-click app.exe, and it launches a browser-based web interface where you control...
    Downloads: 22 This Week
    Last Update:
    See Project
  • 13
    tika-python

    tika-python

    Python binding to the Apache Tika™ REST services

    A Python port of the Apache Tika library that makes Tika available using the Tika REST Server. This makes Apache Tika available as a Python library, installable via Setuptools, Pip and easy to install. To use this library, you need to have Java 7+ installed on your system as tika-python starts up the Tika REST server in the background. To get this working in a disconnected environment, download a tika server file (both tika-server.jar and tika-server.jar.md5, which can be found here) and set...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    abogen

    abogen

    Generate audiobooks from EPUBs, PDFs and text with captions

    abogen is a tool designed to generate audiobooks (or speech narrations) from textual sources such as EPUBs, PDFs, or plain text, with synchronized captions. In other words, it automates the pipeline of reading a digital book (or document), converting its text into speech via a TTS engine, and packaging the result into an audiobook format — likely along with timestamped captions or subtitles that align with the spoken audio. This can be very useful for accessibility, content consumption on...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 15
    Megatron

    Megatron

    Ongoing research training transformer models at scale

    ...We developed efficient, model-parallel (tensor, sequence, and pipeline), and multi-node pre-training of transformer based models such as GPT, BERT, and T5 using mixed precision. Megatron is also used in NeMo Megatron, a framework to help enterprises overcome the challenges of building and training sophisticated natural language processing models with billions and trillions of parameters. Copyright (c) 2022, NVIDIA CORPORATION. All rights reserved.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    torchtext

    torchtext

    Data loaders and abstractions for text and NLP

    We recommend Anaconda as a Python package management system. Please refer to pytorch.org for the details of PyTorch installation. LTS versions are distributed through a different channel than the other versioned releases. Alternatively, you might want to use the Moses tokenizer port in SacreMoses (split from NLTK). You have to install SacreMoses. To build torchtext from source, you need git, CMake and C++11 compiler such as g++. When building from source, make sure that you have the same C++...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Qwen

    Qwen

    The official repo of Qwen chat & pretrained large language model

    Qwen is a series of large language models developed by Alibaba Cloud, consisting of various pretrained versions like Qwen-1.8B, Qwen-7B, Qwen-14B, and Qwen-72B. These models, which range from smaller to larger configurations, are designed for a wide range of natural language processing tasks. They are openly available for research and commercial use, with Qwen's code and model weights shared on GitHub. Qwen's capabilities include text generation, comprehension, and conversation, making it a versatile tool for developers looking to integrate advanced AI functionalities into their applications.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 18
    HunyuanDiT

    HunyuanDiT

    Diffusion Transformer with Fine-Grained Chinese Understanding

    HunyuanDiT is a high-capability text-to-image diffusion transformer with bilingual (Chinese/English) understanding and multi-turn dialogue capability. It trains a diffusion model in latent space using a transformer backbone and integrates a Multimodal Large Language Model (MLLM) to refine captions and support conversational image generation. It supports adapters like ControlNet, IP-Adapter, LoRA, and can run under constrained VRAM via distillation versions. LoRA, ControlNet (pose, depth,...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 19
    DataChain

    DataChain

    AI-data warehouse to enrich, transform and analyze unstructured data

    ...Datachain is especially helpful if batch operations can be optimized – for instance, when synchronous API calls can be parallelized or where an LLM API offers batch processing.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 20
    HunyuanVideo

    HunyuanVideo

    HunyuanVideo: A Systematic Framework For Large Video Generation Model

    HunyuanVideo is a cutting-edge framework designed for large-scale video generation, leveraging advanced AI techniques to synthesize videos from various inputs. It is implemented in PyTorch, providing pre-trained model weights and inference code for efficient deployment. The framework aims to push the boundaries of video generation quality, incorporating multiple innovative approaches to improve the realism and coherence of the generated content. Release of FP8 model weights to reduce GPU...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 21
    OpenAdapt

    OpenAdapt

    Open Source Generative Process Automation

    OpenAdapt is the open source software adapter between Large Multimodal Models (LMMs) and traditional desktop and web Graphical User Interfaces (GUIs). OpenAdapt learns to automate your desktop and web workflows by observing your demonstrations. Spend less time on repetitive tasks and more on work that truly matters. Boost team productivity in HR operations. Automate candidate sourcing using LinkedIn Recruiter, LinkedIn Talent Solutions, GetProspect, Reply.io, outreach.io, Gmail/Outlook, and...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 22
    Matrix

    Matrix

    Multi-Agent daTa geneRation Infra and eXperimentation framework

    Matrix is a distributed, large-scale engine for multi-agent synthetic data generation and experiments: it provides the infrastructure to run thousands of “agentic” workflows concurrently (e.g. multiple LLMs interacting, reasoning, generating content, data-processing pipelines) by leveraging distributed computing (like Ray + cluster management). The idea is to treat data generation as a “data-to-data” transformation: each input item defines a task, and the runtime orchestrates asynchronous, peer-to-peer agent workflows, avoiding global synchronization bottlenecks. That design makes Matrix particularly well-suited for large-batch inference, model benchmarking, data curation, augmentation, or generation — whether for language, code, dialogue, or multimodal tasks. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    npcpy

    npcpy

    The AI toolkit for the AI developer

    npcpy is a Python-based agent framework and command-line toolkit (the NPC Shell) for developers to build, test, and integrate AI agents into their workflows, including both command-line and GUI interfaces via NPC Studio. Welcome to npcpy, the core library of the NPC Toolkit that supercharges natural language processing pipelines and agent tooling. npcpy is a flexible framework for building state-of-the-art applications and conducting novel research with LLMs. The structure of npcpy also allows one to pass an npc to get_llm_response in addition to using the NPC's wrapped method, allowing you to be flexible in your implementation and testing.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 24
    gensim

    gensim

    Topic Modelling for Humans

    Gensim is a Python library for topic modeling, document indexing, and similarity retrieval with large corpora. The target audience is the natural language processing (NLP) and information retrieval (IR) community.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 25
    Norfair

    Norfair

    Lightweight Python library for adding real-time multi-object tracking

    ...Any detector expressing its detections as a series of (x, y) coordinates can be used with Norfair. This includes detectors performing tasks such as object or keypoint detection. It can easily be inserted into complex video processing pipelines to add tracking to existing projects. At the same time, it is possible to build a video inference loop from scratch using just Norfair and a detector. Supports moving camera, re-identification with appearance embeddings, and n-dimensional object tracking. Norfair provides several predefined distance functions to compare tracked objects and detections. ...
    Downloads: 0 This Week
    Last Update:
    See Project