Showing 12 open source projects for "data formats html/xhtml tidy"

View related business solutions
  • Passwordless authentication enables a secure and frictionless experience for your users | Auth0 Icon
    Over two-thirds of people reuse passwords across sites, resulting in an increasingly insecure e-commerce ecosystem. Learn how passwordless can not only mitigate these issues but make the authentication experience delightful. Implement Auth0 in any application in just five minutes
  • Discover Multiview ERP: The Financial Management Revolution Icon
    Discover Multiview ERP: The Financial Management Revolution

    Reclaim precious moments with loved ones while our robust cloud accounting software streamlines your financial processes.

    Built for growing businesses and well-established enterprises alike, Multiview is a highly scalable and robust ERP.
  • 1
    jsoup

    jsoup

    Java library for working with real-world HTML

    jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do. jsoup is designed to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup; jsoup will create a sensible parse tree. The parser will make every...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 2
    Scrapy

    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from these websites. Portable and written in Python, it can run on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications. It can be used for data mining, monitoring...
    Downloads: 41 This Week
    Last Update:
    See Project
  • 3
    miniblink49

    miniblink49

    Lighter, faster browser kernel of blink to integrate HTML UI in apps

    miniblink is an open source, one file, small browser widget based on chromium. By using C interface, you can create a browser with just some line code. miniblink is an open source, single-file, and currently the smallest known chromium-based browser control. Through its exported pure C interface, a browser control can be created in a few lines of code. C++, C#, Delphi and other language calls (support C++, C#, Delphi language to call). Embedded Nodejs, support electron (with Nodejs, can run...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    Trafilatura

    Trafilatura

    Python & command-line tool to gather text on the Web

    Trafilatura is a Python package and command-line tool designed to gather text on the Web. It includes discovery, extraction and text-processing components. Its main applications are web crawling, downloads, scraping, and extraction of main texts, metadata and comments. It aims at staying handy and modular: no database is required, the output can be converted to various commonly used formats. Going from raw HTML to essential parts can alleviate many problems related to text quality, first...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Small Business HR Management Software Icon
    Small Business HR Management Software

    Get a unified timekeeping, scheduling, payroll, HR and benefits portal with WorkforceHub.

    WorkforceHub is the instantly useful, delightfully simple to use, small business solution for tracking time, scheduling and hiring. It scales as your business grows while delivering the mission-critical features an organization needs. It is tailored to, built for, and priced for small business employers.
  • 5
    changedetection.io

    changedetection.io

    The best free open source website change detection and restock service

    Loved by smart shoppers, data journalists, research engineers, data scientists, security researchers, and more. From simply monitoring website pages that have a change (such as watching prices, and restocking notifications), to deep inspection such as PDF text support, JSON and XML monitoring, and extensive text triggers. Monitor out-of-stock products and get alerts when those products are back in stock, get restock alerts via Discord, Slack, email, and many other platforms. Using the browser...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    XRichText

    XRichText

    An Android rich text class library that supports graphic & text mixing

    An Android-rich text class library that supports graphic and text mixing, supports editing and previewing and supports inserting and deleting pictures. Use ScrollView as the outermost layout containing LineaLayout, filled with TextView and ImageView. When deleting, delete the TextView and ImageView according to the position of the cursor, and the text will be automatically merged. The generated data is a list collection, and the data format can be customized. Version V1.4 opens the image click...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    crawler4j

    crawler4j

    Open source web crawler for Java

    ... ics domain. visit function is called after the content of a URL is downloaded successfully. You can easily get the url, text, links, html, and unique id of the downloaded page. You should also implement a controller class which specifies the seeds of the crawl, the folder in which intermediate crawl data should be stored and the number of concurrent threads.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well proved XML and text processing techologies in order to easely extract useful data from arbitrary web pages.
    Downloads: 20 This Week
    Last Update:
    See Project
  • 9
    Simple-Scrape is a simple web-scraping library that allows for programmatic access to HTML code. No further techniques are needed and the library is very compact and thus easy to use.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Cyber Risk Assessment and Management Platform Icon
    Cyber Risk Assessment and Management Platform

    ConnectWise Identify is a powerful cybersecurity risk assessment platform offering strategic cybersecurity assessments and recommendations.

    When it comes to cybersecurity, what your clients don’t know can really hurt them. And believe it or not, keep them safe starts with asking questions. With ConnectWise Identify Assessment, get access to risk assessment backed by the NIST Cybersecurity Framework to uncover risks across your client’s entire business, not just their networks. With a clearly defined, easy-to-read risk report in hand, you can start having meaningful security conversations that can get you on the path of keeping your clients protected from every angle. Choose from two assessment levels to cover every client’s need, from the Essentials to cover the basics to our Comprehensive Assessment to dive deeper to uncover additional risks. Our intuitive heat map shows you your client’s overall risk level and priority to address risks based on probability and financial impact. Each report includes remediation recommendations to help you create a revenue-generating action plan.
  • 10
    This is an advanced web scraper with user friendly GUI which let the user define rules and web addresses to extract data from one time or periodically and a target database filed that the data should be saved in.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    datalus
    PHP web API designed to simplify object handling(loading, saving, querying, displaying, and editing), abstract the data from its display structure, and layout and allow the target data to be delivered to any supported format without special logic.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    HtmlClient provides an SGML/HTML/XHTML parser and connection client making web-spidering as easy for developers as actually surfing the web with a premade browser. Based on Apache's HttpClient.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next