Showing 106 open source projects for "python text parser"

View related business solutions
  • Gen AI apps are built with MongoDB Atlas Icon
    Gen AI apps are built with MongoDB Atlas

    Build gen AI apps with an all-in-one modern database: MongoDB Atlas

    MongoDB Atlas provides built-in vector search and a flexible document model so developers can build, scale, and run gen AI apps without stitching together multiple databases. From LLM integration to semantic search, Atlas simplifies your AI architecture—and it’s free to get started.
    Start Free
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 1
    Selectolax

    Selectolax

    Python binding to Modest and Lexbor engines

    A fast HTML5 parser with CSS selectors using Modest and Lexbor engines. Selectolax supports two backends: Modest and Lexbor. By default, all examples use the Modest backend. Most of the features between backends are almost identical, but there are still some differences. Currently, the Lexbor backend is in beta and missing some of the features. To use lexbor, just import the parser and use it in the similar way to the HTMLParser.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    Trafilatura

    Trafilatura

    Python & command-line tool to gather text on the Web

    Trafilatura is a Python package and command-line tool designed to gather text on the Web. It includes discovery, extraction and text-processing components. Its main applications are web crawling, downloads, scraping, and extraction of main texts, metadata and comments. It aims at staying handy and modular: no database is required, the output can be converted to various commonly used formats.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Powerline

    Powerline

    Statusline plugin for vim with prompts for several other applications

    Powerline is a statusline plugin for vim, and provides statuslines and prompts for several other applications, including zsh, bash, tmux, IPython, Awesome, i3 and Qtile. Powerline was completely rewritten in Python to get rid of as much vimscript as possible. This has allowed much better extensibility, leaner and better config files, and a structured, object-oriented codebase with no mandatory third-party dependencies other than a Python interpreter. Using Python has allowed unit testing of...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Certbot

    Certbot

    Get free HTTPS certificates forever from Let's Encrypt

    Certbot is a fully-featured, easy-to-use, extensible client for the Let's Encrypt CA. It fetches a digital certificate from Let’s Encrypt, an open certificate authority launched by the EFF, Mozilla, and others. This certificate then lets browsers verify the identity of web servers and ensures secure communication over the Web. Obtaining and maintaining a certificate is usually such a hassle, but with Certbot and Let’s Encrypt it becomes automated and hassle-free. With just a few simple...
    Downloads: 129 This Week
    Last Update:
    See Project
  • Keep company data safe with Chrome Enterprise Icon
    Keep company data safe with Chrome Enterprise

    Protect your business with AI policies and data loss prevention in the browser

    Make AI work your way with Chrome Enterprise. Block unapproved sites and set custom data controls that align with your company's policies.
    Download Chrome
  • 5
    ungoogled-chromium

    ungoogled-chromium

    A lightweight approach to removing Google web service dependency

    In descending order of significance (i.e. most important objective first), ungoogled-chromium is Google Chromium, sans dependency on Google web services, ungoogled-chromium retains the default Chromium experience as closely as possible. Unlike other Chromium forks that have their own visions of a web browser, ungoogled-chromium is essentially a drop-in replacement for Chromium. ungoogled-chromium features tweaks to enhance privacy, control, and transparency. However, almost all of these...
    Downloads: 29 This Week
    Last Update:
    See Project
  • 6
    Toot

    Toot

    toot - Mastodon CLI & TUI

    Toot is a CLI and TUI tool for interacting with Mastodon instances from the command line.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    LinkChecker

    LinkChecker

    Check links in web documents or full websites

    LinkChecker is a free, GPL licensed website validator. LinkChecker checks links in web documents or full websites. It runs on Python 3 systems, requiring Python 3.8 or later. The version in the pip repository may be old, to find out how to get the latest code, plus platform-specific information and other advice see doc/install.txt in the source code archive. If you do not want to install any additional libraries/dependencies you can use the Docker image which is published on GitHub...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    dude uncomplicated data extraction

    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    Dude is a very simple framework for writing web scrapers using Python decorators. The design, inspired by Flask, was to easily build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in Pre-Alpha. Please expect breaking changes. You can run your scraper from terminal/shell/command-line by supplying URLs, the output filename of your choice and the paths to your python scripts to dude scrape command.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    jsoup

    jsoup

    Java library for working with real-world HTML

    jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do. jsoup is designed to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup; jsoup will create a sensible parse tree. The parser will make...
    Downloads: 1 This Week
    Last Update:
    See Project
  • The Best MRP Software for Small Businesses Icon
    The Best MRP Software for Small Businesses

    Gain control of your inventory and production operations with Katana MRP system

    Katana is the #1 modern manufacturing & inventory software for scaling businesses. Automate your workflows with Katana’s visual interface and smart auto-booking engine, which allow you to prioritize orders and see the availability of raw materials & finished goods in real-time.
    Try for Free
  • 10

    apache-logs-to-mysql

    Apache Log Parser and Data Normalization Application

    Apache Log Parser and Data Normalization Application Python handles File Processing & MySQL handles Data Processing ApacheLogs2MySQL consists of two Python Modules & one MySQL Schema to automate importing Access & Error files and normalizing data into database designed for reports & data analysis. Runs on Windows, Linux and MacOS & tested with MySQL versions 8.0.39, 8.4.3, 9.0.0 & 9.1.0. 4 LogFormats & 2 ErrorLogFormats can be loaded and 5 MySQL Stored Procedures can be processed in a single Python `ProcessLogs function` execution. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    PyNuker

    PyNuker

    A stress testing tool written in python.

    PyNuker is a network stress testing tool written in python. Because it is written in python it should run equally well on any system that has Python version 3.x installed. It infinitely(until stopped) sends a string of text via a UDP packet to a target computer or network device in an effort to flood the target with so much useless traffic that it stops responding to valid requests.
    Downloads: 20 This Week
    Last Update:
    See Project
  • 12
    Web Spider, Web Crawler, Email Extractor

    Web Spider, Web Crawler, Email Extractor

    Free Extracts Emails, Phones and custom text from Web using JAVA Regex

    ...Extracts Information from Web by parsing millions of pages. Store data into Derby Database and data are not being lost after force closing the spider. - Free Web Spider , Parser, Extractor, Crawler - Extraction of Emails , Phones and Custom Text from Web - Export to Excel File - Data Saved into Derby and MySQL Database - Written in Java Cross Platform Also See Free email Sender : https://sourceforge.net/projects/gitst-free-email-ender/ Please install Microsoft OpenJDK to start the application https://www.microsoft.com/openjdk
    Downloads: 22 This Week
    Last Update:
    See Project
  • 13
    elFinder
    elFinder is a file manager for web similar to that you use on your computer. Written in JavaScript using jQuery UI, it just work's in any modern browser. Its creation is inspired by simplicity and convenience of Finder.app program used in Mac OS X.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 14

    uweb browser: unlimited power

    minimal suckless android web browser with unlimited power

    - AI bot as search engine; append file content as input for complex query. - Powerful: html5 enhancement; any urls to host a website; javascript and shell scripting for general processing; and more with Termux. - Customizable: user-defined menus, (new) buttons and gestures for user agents, bookmarklets, url services, shell commands, internal functionality links and text processing etc. - Convenient: book/dictionary/txt/command line/app can be search engine. - Tiny: less than 200k -...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 15
    Buku

    Buku

    Powerful command-line bookmark manager. Your mini web!

    buku is a powerful bookmark manager written in Python3 and SQLite3. buku fetches the title of a bookmarked web page and stores it along with any additional comments and tags. You can use your favourite editor to compose and update bookmarks. With multiple search options, including regex and a deep scan mode (particularly for URLs), it can find any bookmark instantly. Multiple search results can be opened in the browser at once. Though a terminal utility, it's possible to add bookmarks...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    CerberusCMS5

    CerberusCMS5

    Cerberus Content Management System

    Cerberus Content Management System is a dynamic, secure and infinitely expandable CMS designed after a Unix-Like model. It is a custom written Web Application Framework ( W.A.F. ) with a consistent and custom written Pre-Hyper-Text-Post-Processor Programming Code Framework ( P.C.F. ). This Web Application Software Project' aim is to be the fastest and most secure Web Application Framework, Web Application Programming Code Framework, Text, Voice and Video Communications Platform and Content...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    DjangoBlog

    DjangoBlog

    A blog system based on python3.8 and Django3.0

    Articles, pages, categories, tags (add, delete, edit), etc. Articles and pages support Markdown and highlighting. Articles support full-text search. Complete comment feature, include posting reply comment and email notification. Markdown supporting. Sidebar feature, new articles, most readings, tags, etc. OAuth Login supported, including Google, GitHub, Facebook, Weibo, QQ. Memcache supported, with cache auto refresh. Simple SEO Features, notify Google and Baidu when there was a new article...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    The CSS Parser is implemented as a package of Java classes, that inputs Cascading Style Sheets source text and outputs a Document Object Model Level 2 Style tree. Alternatively, applications can use SAC: The Simple API for CSS. Its purpose is to allow developers working with Java to incorporate Cascading Style Sheet information, primarily in conjunction with XML application developments.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 19
    Web Spider, Web Crawler, Email Extractor

    Web Spider, Web Crawler, Email Extractor

    Free Extracts Emails, Phones and custom text from Web using JAVA Regex

    ...Extracts Information from Web by parsing millions of pages. Store data into Derby OR MySQL Database and data are not being lost after force closing the spider. - Free Web Spider , Parser, Extractor, Crawler - Extraction of Emails , Phones and Custom Text from Web - Export to Excel File - Data Saved into Derby Database - Written in Java Cross Platform See also Free Email Sender in this link: https://sourceforge.net/projects/gitst-free-email-ender/ Please install Microsoft OpenJDK to start the application https://www.microsoft.com/openjdk
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    AutoScraper

    AutoScraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

    This project is made for automatic web scraping to make scraping easy. It gets a URL or the HTML content of a web page and a list of sample data that we want to scrape from that page. This data can be text, URL or any HTML tag value of that page. It learns the scraping rules and returns similar elements. Then you can use this learned object with new URLs to get similar content or the exact same element of those new pages.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    OOoPy is a library in Python for inspecting, creating or modifying OpenOffice.org documents. It uses the existing ElementTree XML library by Fredrik Lundh for manipulation of the OOo XML.
    Leader badge
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    JSSoup

    JSSoup

    JavaScript + BeautifulSoup = JSSoup

    I'm a fan of Python library BeautifulSoup. It's feature-rich and very easy to use. But when I am working on a small react-native project, and I tried to find a HTML parser library like BeautifulSoup, I failed. So I want to write a HTML parser library that can be so easy to use just like BeautifulSoup in Javascript. JSSoup uses tautologistics/node-htmlparser as HTML dom parser, and creates a series of BeautifulSoup like API on top of it.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    JDynamiTe, Dynamic Template in Java

    JDynamiTe, Dynamic Template in Java

    Dynamically generate documents from templates

    JDynamiTe is a tool which allows you to dynamically create documents in any format from "template" documents. And very few lines of code (or no line at all!) are needed to do that. Some typical usage domains of JDynamiTe are: - dynamic Web pages creation, - text document generation, - source code generation... In fact, it can be useful in any case where pre-defined documents (templates) have to be dynamically populated with data. The main benefit of JDynamiTe is to allow a true...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    googler

    googler

    Google from the terminal

    googler is a power tool to Google (web, news, videos and site search) from the command line. It shows the title, URL and abstract for each result, which can be directly opened in a browser from the terminal. Results are fetched in pages (with page navigation). Supports sequential searches in a single googler instance. googler was initially written to cater to headless servers without X. You can integrate it with a text-based browser. However, it has grown into a very handy and flexible...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    googler

    googler

    Google Search, Google Site Search, Google News from the terminal

    googler is a power tool to Google (Web & News) and Google Site Search from the command-line. It shows the title, URL and abstract for each result, which can be directly opened in a browser from the terminal. Results are fetched in pages (with page navigation). Supports sequential searches in a single googler instance. googler was initially written to cater to headless servers without X. You can integrate it with a text-based browser. However, it has grown into a very handy and flexible...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next