Showing 176 open source projects for "python text parser"

  • 1
    Selectolax

    Python binding to Modest and Lexbor engines

    A fast HTML5 parser with CSS selectors using Modest and Lexbor engines. Selectolax supports two backends: Modest and Lexbor. By default, all examples use the Modest backend. The two backends offer almost identical features, but there are still some differences. Currently, the Lexbor backend is in beta and is missing some features. To use Lexbor, just import its parser and use it in a similar way to HTMLParser.
    Downloads: 0 This Week
  • 2
    Trafilatura

    Python & command-line tool to gather text on the Web

    Trafilatura is a Python package and command-line tool designed to gather text on the Web. It includes discovery, extraction and text-processing components. Its main applications are web crawling, downloads, scraping, and extraction of main texts, metadata and comments. It aims to stay handy and modular: no database is required, and the output can be converted to various commonly used formats.
    Downloads: 0 This Week
  • 3
    Powerline

    Statusline plugin for vim with prompts for several other applications

    Powerline is a statusline plugin for vim, and provides statuslines and prompts for several other applications, including zsh, bash, tmux, IPython, Awesome, i3 and Qtile. Powerline was completely rewritten in Python to get rid of as much vimscript as possible. This has allowed much better extensibility, leaner and better config files, and a structured, object-oriented codebase with no mandatory third-party dependencies other than a Python interpreter. Using Python has allowed unit testing of...
    Downloads: 0 This Week
  • 4
    Webdis

    A Redis HTTP interface with JSON output

    Webdis is a lightweight web server that exposes Redis through an HTTP interface. It receives HTTP requests, forwards the corresponding commands to Redis, and returns the response in a web-friendly format such as JSON or plain text. This makes it useful when applications, scripts, dashboards, or browser-based tools need controlled access to Redis without using a native Redis client. Webdis is written in C and built with libraries such as hiredis, jansson, libevent, and http-parser. It can be deployed next to Redis in containerized environments and supports simple URL-based command execution. ...
    Downloads: 1 This Week
  • 5
    GitGutter

    A Sublime Text 2/3 plugin to see git diff in gutter

    A Sublime Text plug-in to show information about files in a git repository: gutter icons indicating inserted, modified or deleted lines; a diff popup with details about modified lines; and status bar text with information about the file and repository. It also provides commands such as Goto Change to navigate between modified lines, Copy from Commit to copy the original content from the commit, and Revert to Commit to revert a modified hunk to its original state in a commit. The diff popup shows the original...
    Downloads: 0 This Week
  • 6
    Certbot

    Get free HTTPS certificates forever from Let's Encrypt

    Certbot is a fully-featured, easy-to-use, extensible client for the Let's Encrypt CA. It fetches a digital certificate from Let’s Encrypt, an open certificate authority launched by the EFF, Mozilla, and others. This certificate then lets browsers verify the identity of web servers and ensures secure communication over the Web. Obtaining and maintaining a certificate is usually such a hassle, but with Certbot and Let’s Encrypt it becomes automated and hassle-free. With just a few simple...
    Downloads: 102 This Week
  • 7
    newspaper4k

    Python library for scraping and analyzing online news articles easily

    Newspaper4k is a Python library designed for extracting, processing, and analyzing news articles from websites. It is a continuation and active fork of the original newspaper3k library, which had stopped receiving updates, with the goal of keeping the ecosystem maintained while adding improvements and bug fixes. It provides developers with tools to automatically download web pages, extract the main article content, and collect associated metadata such as titles, authors, images, and publication dates. ...
    Downloads: 0 This Week
  • 8
    Ungoogled Chromium

    A lightweight approach to removing Google web service dependency

    In descending order of significance (i.e. most important objective first): ungoogled-chromium is Google Chromium, sans dependency on Google web services. ungoogled-chromium retains the default Chromium experience as closely as possible. Unlike other Chromium forks that have their own visions of a web browser, ungoogled-chromium is essentially a drop-in replacement for Chromium. ungoogled-chromium features tweaks to enhance privacy, control, and transparency. However, almost all of these...
    Downloads: 29 This Week
  • 9
    news-please

    Python tool for crawling and extracting structured data from news sites

    ...Developers can use the software either as a standalone command line application or integrate it into their own Python applications through its library interface. Extracted article data can be stored in different formats and systems, including JSON files or database-backed storage solutions.
    Downloads: 6 This Week
  • 10
    Weibo Crawler

    Python crawler for collecting and downloading Sina Weibo user data

    weibo-crawler is a Python-based data collection tool designed to retrieve information from Sina Weibo user accounts. It automates the process of gathering posts, user profile details, and engagement metrics from one or more target accounts. weibo-crawler can extract comprehensive information about users, including profile attributes such as nickname, follower count, following count, and account metadata. It also captures detailed data about each post, including the content, publishing time,...
    Downloads: 5 This Week
  • 11
    changedetection.io

    The best free open source website change detection and restock service

    Loved by smart shoppers, data journalists, research engineers, data scientists, security researchers, and more. From simply monitoring website pages that have a change (such as watching prices, and restocking notifications), to deep inspection such as PDF text support, JSON and XML monitoring, and extensive text triggers. Monitor out-of-stock products and get alerts when those products are back in stock, get restock alerts via Discord, Slack, email, and many other platforms. Using the...
    Downloads: 2 This Week
  • 12
    Scrapling

    An adaptive Web Scraping framework

    Scrapling is an adaptive web scraping framework designed to handle everything from a single HTTP request to large-scale, concurrent crawls. Built for modern websites, it intelligently adapts to structural changes by automatically relocating elements when page layouts update. The framework includes advanced fetchers capable of bypassing anti-bot protections such as Cloudflare Turnstile using stealth and browser automation techniques. Its powerful spider system supports multi-session crawling,...
    Downloads: 2 This Week
  • 13
    Toot

    toot - Mastodon CLI & TUI

    Toot is a CLI and TUI tool for interacting with Mastodon instances from the command line.
    Downloads: 0 This Week
  • 14
    LinkChecker

    Check links in web documents or full websites

    LinkChecker is a free, GPL-licensed website validator. LinkChecker checks links in web documents or full websites. It runs on Python 3 systems, requiring Python 3.8 or later. The version in the pip repository may be old; to find out how to get the latest code, plus platform-specific information and other advice, see doc/install.txt in the source code archive. If you do not want to install any additional libraries/dependencies you can use the Docker image which is published on GitHub...
    Downloads: 0 This Week
  • 15
    jsoup

    Java library for working with real-world HTML

    jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do. jsoup is designed to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup; jsoup will create a sensible parse tree. The parser will make...
    Downloads: 4 This Week
  • 16
    vim-plug

    Minimalist Vim Plugin Manager

    vim-plug is a minimalist open source Vim plugin manager that's easy to set up and easy to use. It's got a concise, intuitive syntax and a single file, no need for boilerplate code. vim-plug is able to do a number of things. Firstly, it can perform parallel installation/update (with any of +job, +python, +python3, +ruby, or Neovim) extremely fast. It can create shallow clones to consume the least amount of disk space and download time. It can review and rollback updates, and is capable of...
    Downloads: 0 This Week
  • 17
    DjangoBlog

    A blog system based on Python 3.8 and Django 3.0

    Articles, pages, categories, tags (add, delete, edit), etc. Articles and pages support Markdown and highlighting. Articles support full-text search. Complete comment feature, including reply comments, email notification, and Markdown support. Sidebar feature: new articles, most-read articles, tags, etc. OAuth login supported, including Google, GitHub, Facebook, Weibo, QQ. Memcache supported, with automatic cache refresh. Simple SEO features: notify Google and Baidu when there is a new article...
    Downloads: 0 This Week
  • 18
    apache-logs-to-mysql

    Apache Log Parser and Data Normalization Application

    Python handles file processing and MySQL handles data processing. ApacheLogs2MySQL consists of two Python modules and one MySQL schema to automate importing access and error files and normalizing the data into a database designed for reports and data analysis. Runs on Windows, Linux and macOS; tested with MySQL versions 8.0.39, 8.4.3, 9.0.0 and 9.1.0. Four LogFormats and two ErrorLogFormats can be loaded, and five MySQL stored procedures can be processed, in a single execution of the Python `ProcessLogs` function. ...
    Downloads: 0 This Week
  • 19
    EditPlus

    Text editor for Windows with built-in FTP, FTPS and SFTP

    EditPlus is a lightweight text editor designed for Windows that caters to programmers, web developers, and anyone working with code or text. It offers powerful features like syntax highlighting, code folding, and a customizable interface, making it an excellent alternative to more complex Integrated Development Environments (IDEs). EditPlus supports a wide range of programming languages, including HTML, CSS, PHP, JavaScript, C++, and more. It also integrates tools for FTP, SFTP, and...
    Downloads: 5 This Week
  • 20
    Web Spider, Web Crawler, Email Extractor

    Free tool that extracts emails, phone numbers and custom text from the Web using Java regex

    ...Extracts information from the Web by parsing millions of pages. Data is stored in a Derby database and is not lost if the spider is force-closed.
    - Free web spider, parser, extractor and crawler
    - Extraction of emails, phone numbers and custom text from the Web
    - Export to Excel file
    - Data saved into Derby and MySQL databases
    - Written in Java, cross-platform
    Also see the free email sender: https://sourceforge.net/projects/gitst-free-email-ender/
    Please install Microsoft OpenJDK to start the application: https://www.microsoft.com/openjdk
    Downloads: 6 This Week
  • 21
    ciwiki

    Personal or family wiki with low resource requirements.

    Personal lightweight wiki based on DidiWiki. Extended to support text and highlight colors, image resizing and embedded video (YouTube, Dailymotion...). Written in C, it doesn't require a lot of RAM. Works fine on Raspbian (Raspberry Pi). Example of Ciwiki running on Raspberry Pi B+ (700MHz, 512MB): http://inphilly.dyn.dhs.org
    Downloads: 3 This Week
  • 22
    PingChecker

    Ping Multiple Targets in Sequence

    PingChecker is a tool I wrote to help myself with pinging multiple hostnames or IP addresses for the purpose of determining patterns. You can enter targets yourself, or read names stored in a file, and ping all of them in sequence. The results are saved in both plain text and CSV format for easy viewing. I digitally sign some files in my releases. If you'd like to verify those signatures, you can find my PGP/GPG keys at: https://marcusadams.me/keys.html If you'd like to donate...
    Downloads: 1 This Week
  • 23
    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    Dude is a very simple framework for writing web scrapers using Python decorators. The design, inspired by Flask, aims to make it easy to build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in Pre-Alpha. Please expect breaking changes. You can run your scraper from the terminal/shell/command line by supplying URLs, the output filename of your choice and the paths to your Python scripts to the dude scrape command.
    Downloads: 0 This Week
  • 24
    TOMUSS

    TOMUSS: The Online Multi User Simple Spreadsheet

    TOMUSS is an interactive web application (groupware) allowing multiple concurrent users to edit data tables. Its primary goal is the management of student grades.
    Downloads: 0 This Week
  • 25
    PyNuker

    A stress testing tool written in Python.

    PyNuker is a network stress testing tool written in Python. Because it is written in Python, it should run equally well on any system that has Python version 3.x installed. It continuously (until stopped) sends a string of text via a UDP packet to a target computer or network device in an effort to flood the target with so much useless traffic that it stops responding to valid requests.
    Downloads: 0 This Week
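
The Webdis entry (4) in this listing mentions simple URL-based command execution with JSON output. As a rough illustration of what that can look like from Python, here is a minimal sketch using only the standard library. It assumes the command and its arguments are sent as URL path segments and that the JSON reply is keyed by the command name; the base address, the port, and the helper names `webdis_url` and `parse_reply` are hypothetical and not taken from the listing.

```python
import json
from urllib.parse import quote

# Assumption: a Webdis instance reachable at this address; adjust for your setup.
WEBDIS_BASE = "http://127.0.0.1:7379"


def webdis_url(command, *args, base=WEBDIS_BASE):
    """Build a URL of the form /COMMAND/arg1/arg2, percent-encoding each part."""
    parts = [command.upper(), *map(str, args)]
    return base.rstrip("/") + "/" + "/".join(quote(p, safe="") for p in parts)


def parse_reply(body, command):
    """Extract the value for `command` from a JSON reply shaped like {"GET": "..."}.

    The reply shape is an assumption made for this sketch.
    """
    return json.loads(body)[command.upper()]


print(webdis_url("set", "greeting", "hello world"))
print(parse_reply('{"GET": "hello world"}', "get"))
```

The sketch only builds the request URL and decodes a sample reply; sending the request (for example with `urllib.request.urlopen`) is left out so that the example runs without a server.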