Showing 29 open source projects for "html source extractor"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 1
    xhtml2pdf

    xhtml2pdf

    A library for converting HTML into PDFs using ReportLab

    xhtml2pdf enables users to generate PDF documents from HTML content easily and with automated flow control such as pagination and keeping text together. The Python module can be used in any Python environment, including Django. The Command line tool is a stand-alone program that can be executed from the command line.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 2
    Pretty Jupyter

    Pretty Jupyter

    Creates dynamic html report from jupyter notebook.

    Pretty Jupyter is an easy-to-use package that allows to create beautiful & dynamic HTML reports. Most of the features require little to no work to get working and greatly improve the quality of the output report, or even the developer’s comfort when creating the report. For example, tabs make some visualizations much more comfortable. The features are integrated directly into the output page, therefore there is no need to have an interpreter running in the backend. This makes the HTML easily...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 3
    lxml

    lxml

    The lxml XML toolkit for Python

    A Python library for efficient XML and HTML processing, known for speed and compatibility. The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt. It is unique in that it combines the speed and XML feature completeness of these libraries with the simplicity of a native Python API, mostly compatible but superior to the well-known ElementTree API. The latest release works with all CPython versions from 3.6 to 3.12. See the introduction for more information about the...
    Downloads: 33 This Week
    Last Update:
    See Project
  • 4
    plotly.py

    plotly.py

    The interactive graphing library for Python

    ...Graphs made with plotly.py can be viewed in Jupyter notebooks, standalone HTML files, or hosted online using Chart Studio Cloud.
    Downloads: 10 This Week
    Last Update:
    See Project
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 5
    claude-code-transcripts

    claude-code-transcripts

    Tools for publishing transcripts for Claude Code sessions

    claude-code-transcripts is a command-line utility that takes session files exported from Claude Code (in JSON or JSONL format) and turns them into clean, navigable HTML transcripts that can be viewed in any modern web browser. It is designed to make the often dense and verbose outputs from AI coding sessions easier to read, share, and archive by breaking conversations into paginated, annotated pages with navigable timelines of prompts and responses. Users can run this tool locally or fetch...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 6
    LangExtract

    LangExtract

    A Python library for extracting structured information

    LangExtract is a Python library developed by Google that leverages large language models (LLMs) to extract structured information from unstructured text—such as clinical notes, research papers, or literary works—based on user-defined instructions. It is designed to transform free-form text into reliable, schema-constrained data while maintaining traceability back to the source material. Each extracted entity is precisely grounded in its original context, allowing visual inspection and validation via automatically generated interactive HTML visualizations. LangExtract supports a wide range of models, including Google Gemini, OpenAI GPT, and local LLMs via Ollama, making it adaptable to different deployment environments and compliance needs. ...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 7
    Jupyter Notebook Tools for Sphinx

    Jupyter Notebook Tools for Sphinx

    Sphinx source parser for Jupyter notebooks

    nbsphinx is a Sphinx extension that provides a source parser for *.ipynb files. Custom Sphinx directives are used to show Jupyter Notebook code cells (and of course their results) in both HTML and LaTeX output. Un-evaluated notebooks – i.e. notebooks without stored output cells – will be automatically executed during the Sphinx build process.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Parsera

    Parsera

    Lightweight library for scraping web-sites with LLMs

    Scrape data from any website with only a link and column descriptions. Parsera is a tool designed to scrape web content, specifically handling poorly structured or messy websites.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 9
    notebooker

    notebooker

    Productionise & schedule your Jupyter Notebooks

    Productionise and schedule your Jupyter Notebooks, just as interactively as you wrote them. Notebooker is a webapp which can execute and parametrise Jupyter Notebooks as soon as they have been committed to git. The results are stored in MongoDB and searchable via the web interface, essentially turning your Jupyter Notebook into a production-style web-based report in a few clicks.
    Downloads: 7 This Week
    Last Update:
    See Project
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 10
    WeasyPrint

    WeasyPrint

    The awesome document factory

    WeasyPrint is a smart solution helping people to create PDF documents. You can generate gorgeous statistical reports, invoices, tickets, and anything you want as long as you have some webdesign skills! Design your documents just as you design your websites! WeasyPrint follows the widely used HTML and CSS specifications from the W3C. You can use your usual web tools, languages and frameworks, but for print. Creating high-quality digital documents requires features that you love to use as...
    Downloads: 20 This Week
    Last Update:
    See Project
  • 11
    Best-of Web Development with Python

    Best-of Web Development with Python

    A ranked list of awesome python libraries for web development

    This curated list contains 570 awesome open-source projects with a total of 2.4M stars grouped into 26 categories. All projects are ranked by a project-quality score, which is calculated based on various metrics automatically collected from Github and different package managers. If you like to add or update projects, feel free to open an issue, submit a pull request, or directly edit the projects.yaml. Contributions are very welcome! A ranked list of awesome python libraries for web...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 12
    Pysheeet

    Pysheeet

    Python Cheat Sheet

    Pysheeet is a community-driven collection of Python code snippets covering common patterns and tasks like sockets, file I/O, data structures, and more. Each snippet is concise and battle-tested, designed to save coding time and reduce boilerplate. With documentation hosted on Read the Docs and an active GitHub repo, it’s a go-to resource for Python developers.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    Python-Spider

    Python-Spider

    Python3 web crawler practice

    Python-Spider is a repository intended to teach or provide examples for writing web spiders / crawlers in Python — part of a broader learning and resource collection by its author. The code and documentation are oriented toward beginners or intermediate learners who want to learn how to fetch, parse, and extract data from websites programmatically. As part of the author’s public learning-path repositories, python-spider likely includes examples of HTTP requests, HTML parsing, maybe...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    Selenium-python Helium

    Selenium-python Helium

    Selenium-python but lighter: Helium is the best Python library

    Under the hood, Helium forwards each call to Selenium. The difference is that Helium's API is much more high-level. In Selenium, you need to use HTML IDs, XPaths and CSS selectors to identify web page elements. Helium on the other hand lets you refer to elements by user-visible labels. As a result, Helium scripts are typically 30-50% shorter than similar Selenium scripts. What's more, they are easier to read and more stable with respect to changes in the underlying web page. Selenium-python...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Dominate

    Dominate

    Dominate is a Python library for creating and manipulating HTML docs

    Dominate is a Python library for creating and manipulating HTML documents using an elegant DOM API. It allows you to write HTML pages in pure Python very concisely, which eliminates the need to learn another template language, and lets you take advantage of the more powerful features of Python. Dominate can also use keyword arguments to append attributes onto your tags. Most of the attributes are a direct copy from the HTML spec with a few variations. Through the use of the += operator and...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 16
    Graphtage

    Graphtage

    A semantic diff utility and library for tree-like files such as JSON

    Graphtage is a command-line utility and underlying library for semantically comparing and merging tree-like structures, such as JSON, XML, HTML, YAML, plist, and CSS files. Its name is a portmanteau of “graph” and “graftage”, the latter being the horticultural practice of joining two trees together such that they grow as one. Graphtage performs an analysis on an intermediate representation of the trees that is divorced from the filetypes of the input files. This means, for example, that you...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 17
    Amazon Braket Strawberry Fields Plugin

    Amazon Braket Strawberry Fields Plugin

    An open source framework for using Amazon Braket devices

    An open-source framework for using Amazon Braket devices with the Strawberry Fields photonic device programming library. This plugin provides a BraketEngine class for running photonic quantum circuits created in Strawberry Fields on the Amazon Braket service. The Amazon Braket Python SDK is an open source library that provides a framework to interact with quantum computing hardware devices and simulators through Amazon Braket. This plugin provides the classes BraketEngine for submitting...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    StarsAndClown

    StarsAndClown

    Github Star Gathering Treatment List

    StarsAndClown is a repository by the same maintainer that seems intended as a lighthearted “ranking / listing” project, possibly gathering interesting or amusing GitHub repositories, trending topics, or community “stars” — perhaps with a humorous or satirical twist given the name. The concept suggests a curated (or semi-automated) list of GitHub repos worth noting: whether because of popularity, novelty, or community interest — giving “people who eat grapes” (i.e. spectators) a way to enjoy...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    tkColorPicker

    tkColorPicker

    Color picker dialog for Tkinter, alternative to tkinter.colorchooser

    The tkcolorpicker module contains a `ColorPicker` class which implements the color picker dialog and an `askcolor` function that displays the color picker and returns the chosen color in RGB and HTML formats.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20

    EasyHTML

    A python package for building DOM of the HTML documents

    A python package that provides an easy access to elements of HTML and XHTML documents through the Document Object Model.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21

    mds-utils

    General purpose utilities for C++ and Python developers

    This library is intended to become a collection of several C++ utilities. At present it contains: 1. a tool for detecting machine endianity. 2. utilities for the Boost uBLAS library. Amongst them, some type traits for detecting different uBLAS matrix types. 3. some useful classes that allow to treat the old C FILE pointer as a C++ stream. 4. C++ wrappers of the main Python objects, independent of those in Boost Python. Wrappers are provided also for NumPy arrays. 5. C++...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Robot Framework JMeter Library

    Robot Framework JMeter Library

    Robot Framework and JMeter integration

    SOURCE CODE MOVED TO https://github.com/kowalpy/Robot-Framework-JMeter-Library . NEW RELEASES WILL APPEAR ONLY AT GITHUB AND PYPI. The Robot Framework library which can be used for starting JMeter and/or analysing and converting JMeter log files into HTML and SQLite format.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23

    HTML DOM Parser

    HTML parser which can be used for screen-scraping applications

    htmldom parses the HTML file and provides methods for iterating and searching the parse tree in a similar way as Jquery. To report bugs please mail me at bhimsen.pes@gmail.com
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    PyLibrary

    PyLibrary

    Libraries for Python developers.

    Development in Python (be it website or an App development or implementation of an automation framework) always involves certain operations like handling db queries, operations on web, development of data structures, windows operations (handing services, registries), logging and many more... What, if you have these libraries handy with you all the time? Just import and start using them.. In comes PyLibrary.. PyLibrary is a collection of infrastructure libraries that aid faster...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    A python3 lib rendering html.parser into stack and callback models.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB