Showing 386 open source projects for "python text parser"

  • 1
    DevDocs

    API Documentation Browser

    The devdocs repository powers the DevDocs web application, a fast, offline-friendly documentation browser for many programming languages, libraries, and APIs. It aggregates documentation from multiple sources (e.g., MDN, Python, Ruby, Git, etc.), converts them into a uniform format, and indexes them for instant text searching. The codebase includes a backend that handles ingestion, parsing, and transformation of documentation sources into a static site structure, as well as the client side UI...
    Downloads: 7 This Week
  • 2
    SentenceTransformers

    Multilingual sentence & image embeddings with BERT

    SentenceTransformers is a Python framework for state-of-the-art sentence, text and image embeddings. The initial work is described in our paper Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. You can use this framework to compute sentence / text embeddings for more than 100 languages. These embeddings can then be compared, e.g. with cosine similarity, to find sentences with a similar meaning. This can be useful for semantic textual similarity, semantic search, or paraphrase mining...
    Downloads: 6 This Week
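
    The comparison step mentioned in this description can be sketched in plain Python. This is a minimal illustration only: the function name `cosine_similarity` and the toy 3-dimensional vectors are assumptions standing in for real embeddings, and it does not use or reproduce the SentenceTransformers API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors standing in for sentence embeddings (assumed values).
query = [1.0, 0.0, 1.0]
candidates = {"near": [2.0, 0.0, 2.0], "far": [0.0, 1.0, 0.0]}

# Rank candidate "sentences" from most to least similar to the query.
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(query, candidates[name]),
    reverse=True,
)
```

    Because cosine similarity ignores vector length, the "near" vector ranks first even though it is twice as long as the query.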
  • 3
    NetworkX

    Network analysis in Python

    NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. Data structures for graphs, digraphs, and multigraphs. Many standard graph algorithms. Network structure and analysis measures. Generators for classic graphs, random graphs, and synthetic networks. Nodes can be "anything" (e.g., text, images, XML records). Edges can hold arbitrary data (e.g., weights, time-series). Open source 3-clause BSD license. Well tested...
    Downloads: 6 This Week
  • 4
    xhtml2pdf

    A library for converting HTML into PDFs using ReportLab

    xhtml2pdf enables users to generate PDF documents from HTML content easily, with automated flow control such as pagination and keeping text together. The Python module can be used in any Python environment, including Django. The command-line tool is a stand-alone program.
    Downloads: 5 This Week
  • 5
    crun

    A fast and lightweight fully featured OCI runtime and C library

    A fast, low-memory-footprint OCI container runtime written entirely in C. While most of the tools used in the Linux containers ecosystem are written in Go, I believe C is a better fit for a lower-level tool like a container runtime. runc, the most used implementation of the OCI runtime specs written in Go, re-execs itself and uses a module written in C for setting up the environment before the container process starts. crun also aims to be usable as a library that can be easily included in...
    Downloads: 5 This Week
  • 6
    txtai

    Build AI-powered semantic search applications

    ..., models can understand concepts in documents, audio, images and more. Machine-learning pipelines to run extractive question-answering, zero-shot labeling, transcription, translation, summarization and text extraction. Cloud-native architecture that scales out with container orchestration systems (e.g. Kubernetes). Applications range from similarity search to complex NLP-driven data extractions to generate structured databases. The following applications are powered by txtai.
    Downloads: 6 This Week
  • 7
    GitSavvy

    Full git and GitHub integration with Sublime Text

    Sublime Text plugin providing probably all git has to offer. Sublime Text 2 is not supported. Also, GitSavvy takes advantage of modern features of Sublime Text (like annotations). For the best experience, use the latest Sublime Text dev build. The documentation is probably outdated. Yeah it's sad but you can contribute and I will eventually get onto it but every special view has help available, just press ?. GitSavvy requires Git versions at or greater than 2.18.0. basic Git functionality; init...
    Downloads: 5 This Week
  • 8
    goldmark

    A markdown parser written in Go. Easy to extend, standards compliant

    A markdown parser written in Go. Easy to extend, standard (CommonMark) compliant, well structured. golang-commonmark may be a good choice, but it seems to be a copy of markdown-it. blackfriday.v2 is a fast and widely-used implementation, but is not CommonMark-compliant and cannot be extended from outside of the package, since its AST uses structs instead of interfaces. Furthermore, its behavior differs from other implementations in some cases, especially regarding lists: Deep nested lists...
    Downloads: 4 This Week
  • 9
    Fabric.js

    JavaScript Canvas Library and SVG-to-Canvas Parser

    Fabric.js is a simple yet powerful JavaScript HTML5 canvas library that lets you work with the HTML5 canvas element in various ways. It is also an SVG-to-canvas (and vice versa) parser. Fabric provides an interactive object model on top of the canvas element, so you can create and populate objects on canvas; manipulate the size, position and rotation of these objects; modify properties such as color, transparency and more. You could also group these objects together with just a simple...
    Downloads: 4 This Week
  • 10
    Markdig

    A fast, powerful, CommonMark compliant, extensible Markdown processor

    A fast, powerful, CommonMark compliant, extensible Markdown processor for .NET. Very fast parser and HTML renderer (no regular expressions), very lightweight in terms of GC pressure. Abstract Syntax Tree with precise source code locations, useful when building a Markdown editor. Check out MarkdownEditor for Visual Studio powered by Markdig! Even the core Markdown/CommonMark parsing is pluggable, so you can disable built-in Markdown/CommonMark parsing (e.g., disable HTML parsing) or change...
    Downloads: 4 This Week
  • 11
    Stanza

    Stanford NLP Python library for many human languages

    Stanza is a collection of accurate and efficient tools for the linguistic analysis of many human languages. Starting from raw text to syntactic analysis and entity recognition, Stanza brings state-of-the-art NLP models to languages of your choosing. Stanza is a Python natural language analysis package. It contains tools, which can be used in a pipeline, to convert a string containing human language text into lists of sentences and words, to generate base forms of those words, their parts...
    Downloads: 4 This Week
  • 12
    Google Open Source Project Style Guide

    Chinese version of Google open source project style guide

    .... If the project you are modifying originates from Google, you may be directed to the English version of the project page to understand the style used by the project. The Chinese version of the project uses reStructuredText plain text markup syntax, and uses Sphinx to generate document formats such as HTML / CHM / PDF.
    Downloads: 5 This Week
  • 13
    Doctrine Annotations

    Annotations docblock parser

    Doctrine Annotations lets you implement custom annotation functionality for PHP classes. Annotations aren't implemented in PHP itself, which is why this component offers a way to use PHP doc-blocks as a place for the well-known annotation syntax using the @ character. Annotations in Doctrine are used for the ORM configuration to build the class mapping, but they can be used in other projects for other purposes too. You can install the Annotation component with composer. The access to the...
    Downloads: 4 This Week
  • 14
    borb

    borb is a library for reading, creating and manipulating PDF files

    borb is a pure Python library to read, write, and manipulate PDF documents. It represents a PDF document as a JSON-like data structure of nested lists, dictionaries and primitives (numbers, strings, booleans, etc.). This is currently a one-man project, so the focus will always be on supporting common use cases over rare ones.
    Downloads: 4 This Week
  • 15
    Gretel Synthetics

    Synthetic data generators for structured and unstructured text

    Unlock unlimited possibilities with synthetic data. Share, create, and augment data with cutting-edge generative AI. Generate unlimited data in minutes with synthetic data delivered as-a-service. Synthesize data that is as good as or better than your original dataset, and maintain relationships and statistical insights. Customize privacy settings so that data is always safe while remaining useful for downstream workflows. Ensure data accuracy and privacy confidently with expert-grade reports....
    Downloads: 5 This Week
  • 16
    Imagen - Pytorch

    Implementation of Imagen, Google's Text-to-Image Neural Network

    Implementation of Imagen, Google's Text-to-Image Neural Network that beats DALL-E2, in Pytorch. It is the new SOTA for text-to-image synthesis. Architecturally, it is actually much simpler than DALL-E2. It consists of a cascading DDPM conditioned on text embeddings from a large pre-trained T5 model (attention network). It also contains dynamic clipping for improved classifier-free guidance, noise level conditioning, and a memory-efficient unit design. It appears neither CLIP nor prior network...
    Downloads: 3 This Week
  • 17
    Django Two-Factor Authentication

    Complete Two-Factor Authentication for Django

    Complete Two-Factor Authentication for Django. Built on top of the one-time password framework django-otp and Django's built-in authentication framework django.contrib.auth for providing the easiest integration into most Django projects. Inspired by the user experience of Google's Two-Step Authentication, allowing users to authenticate through call, text messages (SMS), by using a token generator app like Google Authenticator or a YubiKey hardware token generator (optional). If you run...
    Downloads: 4 This Week
  • 18
    GenAI Processors

    GenAI Processors is a lightweight Python library

    GenAI Processors is a lightweight Python library for building modular, asynchronous, and composable AI pipelines around Gemini. Its central abstraction is the Processor, a unit of work that consumes an asynchronous stream of parts (text, images, audio, JSON) and produces another stream, making it natural to chain operations and keep everything streaming end-to-end. Processors can be composed sequentially (to build multi-step flows) or in parallel (to fan-out work and merge results), which makes...
    Downloads: 3 This Week
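
    The streaming composition idea in this description can be sketched with standard-library async generators. This is a rough illustration under stated assumptions: `source`, `upper`, `exclaim`, `chain`, and `collect` are hypothetical names invented for this sketch, parts are plain strings, and none of it is the GenAI Processors API.

```python
import asyncio


async def source(parts):
    """Turn a list into an asynchronous stream of parts (hypothetical helper)."""
    for part in parts:
        yield part


async def upper(stream):
    """A processor: consumes a stream of text parts, yields them upper-cased."""
    async for part in stream:
        yield part.upper()


async def exclaim(stream):
    """A second processor: appends '!' to each part as it arrives."""
    async for part in stream:
        yield part + "!"


def chain(*processors):
    """Compose processors sequentially; each one wraps the previous stream."""
    def combined(stream):
        for processor in processors:
            stream = processor(stream)
        return stream
    return combined


async def collect(stream):
    """Drain a stream into a list so the result can be inspected."""
    return [part async for part in stream]


pipeline = chain(upper, exclaim)
result = asyncio.run(collect(pipeline(source(["a", "b"]))))
```

    Each part flows through both steps before the next part is pulled from the source, so nothing is buffered between stages. Parallel fan-out and merging are not covered by this sketch.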
  • 19
    Modeltranslation

    Translates Django models using a registration approach

    The modeltranslation application is used to translate dynamic content of existing Django models to an arbitrary number of languages without having to change the original model classes. It uses a registration approach (comparable to Django's admin app) to be able to add translations to existing or new projects and is fully integrated into the Django admin backend. The advantage of a registration approach is the ability to add translations to models on a per-app basis. You can use the same app...
    Downloads: 3 This Week
  • 20
    towhee

    Framework that is dedicated to making neural data processing

    Towhee is an open-source machine-learning pipeline that helps you encode your unstructured data into embeddings. You can use our Python API to build a prototype of your pipeline and use Towhee to automatically optimize it for production-ready environments. From images to text to 3D molecular structures, Towhee supports data transformation for nearly 20 different unstructured data modalities. We provide end-to-end pipeline optimizations, covering everything from data decoding/encoding, to model...
    Downloads: 3 This Week
  • 21
    Llama Cloud Services

    Knowledge Agents and Management in the Cloud

    Llama Cloud Services is a suite of tools designed to facilitate the integration of large language models (LLMs) into applications. It offers components for parsing, extracting, and reporting on complex documents, streamlining the process of preparing data for LLM consumption.
    Downloads: 2 This Week
  • 22
    Multimodal

    TorchMultimodal is a PyTorch library

    This project, also known as TorchMultimodal, is a PyTorch library for building, training, and experimenting with multimodal, multi-task models at scale. The library provides modular building blocks such as encoders, fusion modules, loss functions, and transformations that support combining modalities (vision, text, audio, etc.) in unified architectures. It includes a collection of ready model classes—like ALBEF, CLIP, BLIP-2, COCA, FLAVA, MDETR, and Omnivore—that serve as reference...
    Downloads: 3 This Week
  • 23
    Transformers4Rec

    Transformers4Rec is a flexible and efficient library

    Transformers4Rec is an advanced recommendation system library that leverages Transformer models for sequential and session-based recommendations. The library works as a bridge between natural language processing (NLP) and recommender systems (RecSys) by integrating with one of the most popular NLP frameworks, Hugging Face Transformers (HF). Transformers4Rec makes state-of-the-art transformer architectures available for RecSys researchers and industry practitioners. Traditional recommendation...
    Downloads: 3 This Week
  • 24
    mXparser

    Math Parser: Java, C#, C++, Kotlin, Android, and all .NET platforms

    Math Parser: Java, C#, C++, Kotlin, Android, and all .NET platforms (Nuget, Maven, CMake). Supports .NET Framework, .NET Core, .NET Standard, Xamarin, and more. Features: rich built-in library of math functions, operators, constants. Flexible in user-defined arguments, and functions. Expressions are provided as plain text. Easy to use. Well documented. MathParser.org-mXparser is an open-source mathematical expression parser and evaluator for Java and .NET, supporting complex calculations...
    Downloads: 1 This Week
  • 25
    Toot

    toot - Mastodon CLI & TUI

    Toot is a CLI and TUI tool for interacting with Mastodon instances from the command line.
    Downloads: 2 This Week