Showing 324 open source projects for "python text parser"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Build Securely on Azure with Proven Frameworks Icon
    Build Securely on Azure with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 1
    crun

    crun

    A fast and lightweight fully featured OCI runtime and C library

    A fast and low-memory footprint OCI Container Runtime fully written in C. While most of the tools used in the Linux containers ecosystem are written in Go, I believe C is a better fit for a lower-level tool like container runtime. runc, the most used implementation of the OCI runtime specs written in Go, re-execs itself and uses a module written in C for setting up the environment before the container process starts. crun aims to be also usable as a library that can be easily included in...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 2
    txtai

    txtai

    Build AI-powered semantic search applications

    ..., models can understand concepts in documents, audio, images and more. Machine-learning pipelines to run extractive question-answering, zero-shot labeling, transcription, translation, summarization and text extraction. Cloud-native architecture that scales out with container orchestration systems (e.g. Kubernetes). Applications range from similarity search to complex NLP-driven data extractions to generate structured databases. The following applications are powered by txtai.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 3
    GitSavvy

    GitSavvy

    Full git and GitHub integration with Sublime Text

    Sublime Text plugin providing probably all git has to offer. Sublime Text 2 is not supported. Also, GitSavvy takes advantage of modern features of Sublime Text (like annotations). For the best experience, use the latest Sublime Text dev build. The documentation is probably outdated. Yeah it's sad but you can contribute and I will eventually get onto it but every special view has help available, just press ?. GitSavvy requires Git versions at or greater than 2.18.0. basic Git functionality; init...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 4
    goldmark

    goldmark

    A markdown parser written in Go. Easy to extend, standard, compliant

    A markdown parser is written in Go. Easy to extend, standard(CommonMark) compliant, well structured.golang-commonmark may be a good choice, but it seems to be a copy of markdown-it. blackfriday.v2 is a fast and widely-used implementation, but is not CommonMark-compliant and cannot be extended from outside of the package, since its AST uses structs instead of interfaces. Furthermore, its behavior differs from other implementations in some cases, especially regarding lists: Deep nested lists...
    Downloads: 4 This Week
    Last Update:
    See Project
  • Build Securely on AWS with Proven Frameworks Icon
    Build Securely on AWS with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 5
    Fabric.js

    Fabric.js

    Javascript Canvas Library and SVG-to-Canvas Parser

    Fabric.js is a simple yet powerful Javascript HTML5 canvas library that allows you to easily work with HTML5 canvas element in various ways. It is also an SVG-to-canvas (and vice versa) parser. Fabric provides an interactive object model on top of canvas element, so you can create and populate objects on canvas; manipulate the size, position and rotation of these objects; modify properties such as color, transparency and more. You could also group these objects together with just a simple...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 6
    Stanza

    Stanza

    Stanford NLP Python library for many human languages

    Stanza is a collection of accurate and efficient tools for the linguistic analysis of many human languages. Starting from raw text to syntactic analysis and entity recognition, Stanza brings state-of-the-art NLP models to languages of your choosing. Stanza is a Python natural language analysis package. It contains tools, which can be used in a pipeline, to convert a string containing human language text into lists of sentences and words, to generate base forms of those words, their parts...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 7
    Google Open Source Project Style Guide

    Google Open Source Project Style Guide

    Chinese version of Google open source project style guide

    .... If the project you are modifying originates from Google, you may be directed to the English version of the project page to understand the style used by the project. The Chinese version of the project uses reStructuredText plain text markup syntax, and uses Sphinx to generate document formats such as HTML / CHM / PDF.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 8
    Doctrine Annotations

    Doctrine Annotations

    Annotations docblock parser

    Doctrine Annotations allows to implement custom annotation functionality for PHP classes. Annotations aren't implemented in PHP itself which is why this component offers a way to use the PHP doc-blocks as a place for the well known annotation syntax using the @ char. Annotations in Doctrine are used for the ORM configuration to build the class mapping, but it can be used in other projects for other purposes too. You can install the Annotation component with composer. The access to the...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 9
    borb

    borb

    borb is a library for reading, creating and manipulating PDF files

    borb is a library for creating and manipulating PDF files in python. borb is a pure python library to read, write, and manipulate PDF documents. It represents a PDF document as a JSON-like data structure of nested lists, dictionaries and primitives (numbers, string, booleans, etc) This is currently a one-man project, so the focus will always be to support those use-cases that are more common in favor of those that are rare.
    Downloads: 4 This Week
    Last Update:
    See Project
  • Photo and Video Editing APIs and SDKs Icon
    Photo and Video Editing APIs and SDKs

    Trusted by 150 million+ creators and businesses globally

    Unlock Picsart's full editing suite by embedding our Editor SDK directly into your platform. Offer your users the power of a full design suite without leaving your site.
    Learn More
  • 10
    Gretel Synthetics

    Gretel Synthetics

    Synthetic data generators for structured and unstructured text

    Unlock unlimited possibilities with synthetic data. Share, create, and augment data with cutting-edge generative AI. Generate unlimited data in minutes with synthetic data delivered as-a-service. Synthesize data that are as good or better than your original dataset, and maintain relationships and statistical insights. Customize privacy settings so that data is always safe while remaining useful for downstream workflows. Ensure data accuracy and privacy confidently with expert-grade reports....
    Downloads: 5 This Week
    Last Update:
    See Project
  • 11
    Imagen - Pytorch

    Imagen - Pytorch

    Implementation of Imagen, Google's Text-to-Image Neural Network

    Implementation of Imagen, Google's Text-to-Image Neural Network that beats DALL-E2, in Pytorch. It is the new SOTA for text-to-image synthesis. Architecturally, it is actually much simpler than DALL-E2. It consists of a cascading DDPM conditioned on text embeddings from a large pre-trained T5 model (attention network). It also contains dynamic clipping for improved classifier-free guidance, noise level conditioning, and a memory-efficient unit design. It appears neither CLIP nor prior network...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 12
    Django Two-Factor Authentication

    Django Two-Factor Authentication

    Complete Two-Factor Authentication for Django

    Complete Two-Factor Authentication for Django. Built on top of the one-time password framework django-otp and Django's built-in authentication framework django.contrib.auth for providing the easiest integration into most Django projects. Inspired by the user experience of Google's Two-Step Authentication, allowing users to authenticate through call, text messages (SMS), by using a token generator app like Google Authenticator or a YubiKey hardware token generator (optional). If you run...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 13
    GenAI Processors

    GenAI Processors

    GenAI Processors is a lightweight Python library

    GenAI Processors is a lightweight Python library for building modular, asynchronous, and composable AI pipelines around Gemini. Its central abstraction is the Processor, a unit of work that consumes an asynchronous stream of parts (text, images, audio, JSON) and produces another stream, making it natural to chain operations and keep everything streaming end-to-end. Processors can be composed sequentially (to build multi-step flows) or in parallel (to fan-out work and merge results), which makes...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 14
    Modeltranslation

    Modeltranslation

    Translates Django models using a registration approach

    The modeltranslation application is used to translate dynamic content of existing Django models to an arbitrary number of languages without having to change the original model classes. It uses a registration approach (comparable to Django's admin app) to be able to add translations to existing or new projects and is fully integrated into the Django admin backend. The advantage of a registration approach is the ability to add translations to models on a per-app basis. You can use the same app...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 15
    towhee

    towhee

    Framework that is dedicated to making neural data processing

    Towhee is an open-source machine-learning pipeline that helps you encode your unstructured data into embeddings. You can use our Python API to build a prototype of your pipeline and use Towhee to automatically optimize it for production-ready environments. From images to text to 3D molecular structures, Towhee supports data transformation for nearly 20 different unstructured data modalities. We provide end-to-end pipeline optimizations, covering everything from data decoding/encoding, to model...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 16
    Llama Cloud Services

    Llama Cloud Services

    Knowledge Agents and Management in the Cloud

    Llama Cloud Services is a suite of tools designed to facilitate the integration of large language models (LLMs) into applications. It offers components for parsing, extracting, and reporting on complex documents, streamlining the process of preparing data for LLM consumption.​
    Downloads: 2 This Week
    Last Update:
    See Project
  • 17
    Multimodal

    Multimodal

    TorchMultimodal is a PyTorch library

    This project, also known as TorchMultimodal, is a PyTorch library for building, training, and experimenting with multimodal, multi-task models at scale. The library provides modular building blocks such as encoders, fusion modules, loss functions, and transformations that support combining modalities (vision, text, audio, etc.) in unified architectures. It includes a collection of ready model classes—like ALBEF, CLIP, BLIP-2, COCA, FLAVA, MDETR, and Omnivore—that serve as reference...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 18
    Transformers4Rec

    Transformers4Rec

    Transformers4Rec is a flexible and efficient library

    Transformers4Rec is an advanced recommendation system library that leverages Transformer models for sequential and session-based recommendations. The library works as a bridge between natural language processing (NLP) and recommender systems (RecSys) by integrating with one of the most popular NLP frameworks, Hugging Face Transformers (HF). Transformers4Rec makes state-of-the-art transformer architectures available for RecSys researchers and industry practitioners. Traditional recommendation...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 19
    mXparser

    mXparser

    Math Parser: Java, C#, C++, Kotlin, Android, and all .NET platforms

    Math Parser: Java, C#, C++, Kotlin, Android, and all .NET platforms (Nuget, Maven, CMake). Supports .NET Framework, .NET Core, .NET Standard, Xamarin, and more. Features: rich built-in library of math functions, operators, constants. Flexible in user-defined arguments, and functions. Expressions are provided as plain text. Easy to use. Well documented. MathParser.org-mXparser is an open-source mathematical expression parser and evaluator for Java and .NET, supporting complex calculations...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    Toot

    Toot

    toot - Mastodon CLI & TUI

    Toot is a CLI and TUI tool for interacting with Mastodon instances from the command line.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 21
    LangExtract

    LangExtract

    A Python library for extracting structured information

    LangExtract is a Python library developed by Google that leverages large language models (LLMs) to extract structured information from unstructured text—such as clinical notes, research papers, or literary works—based on user-defined instructions. It is designed to transform free-form text into reliable, schema-constrained data while maintaining traceability back to the source material. Each extracted entity is precisely grounded in its original context, allowing visual inspection...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    tslab

    tslab

    Interactive JavaScript and TypeScript programming with Jupyter

    tslab is an interactive programming environment and REPL with Jupyter for JavaScript and TypeScript users. You can write and execute JavaScript and TypeScript interactively on browsers and save results as Jupyter notebooks.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 23
    spacy-transformers

    spacy-transformers

    Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy

    spaCy supports a number of transfer and multi-task learning workflows that can often help improve your pipeline’s efficiency or accuracy. Transfer learning refers to techniques such as word vector tables and language model pretraining. These techniques can be used to import knowledge from raw text into your pipeline, so that your models are able to generalize better from your annotated examples. You can convert word vectors from popular tools like FastText and Gensim, or you can load in any pre...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 24
    kapture

    kapture

    Tools for manipulating datasets

    Kapture is a pivot file format, based on text and binary files, used to describe SfM (Structure From Motion) and more generally sensor-acquired data.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 25
    Textual

    Textual

    Textual is a TUI (Text User Interface) framework for Python

    Textual is a Python framework for creating interactive applications that run in your terminal. Textual adds interactivity to Rich with a Python API inspired by modern web development. On modern terminal software (installed by default on most systems), Textual apps can use 16.7 million colors with mouse support and smooth flicker-free animation. A powerful layout engine and re-usable components makes it possible to build apps that rival the desktop and web experience. Textual runs on Linux...
    Downloads: 1 This Week
    Last Update:
    See Project
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.