Showing 318 open source projects for "html source extractor"

  • 1
    cpDetector is a proxy for codepage detection of documents. It delegates to multiple instances that try to detect the codepage using different techniques. A command-line executable is included that sorts documents by codepage.
    Downloads: 2 This Week
  • 2
    JuniCoder is a Java project that uses Unicode as a base for decoding and encoding formats that invented workarounds to express characters not covered by ASCII. Decoders translate those inventions to Unicode. Encoders translate Unicode back into those inventions.
    Downloads: 0 This Week
  • 3
    Lanproxy

    Intranet penetration tool that proxies local area network computers

    Lanproxy is an intranet penetration tool that proxies local area network personal computers and servers to the public network. It supports TCP traffic forwarding and any protocol layered on TCP (access to intranet websites, local payment interface debugging, SSH access, remote desktop, HTTP proxy, HTTPS proxy, SOCKS5 proxy...). It offers the same basic penetration functions as the open source version, with high performance, and can support tens of thousands of penetration connections at the same time. Support...
    Downloads: 2 This Week
  • 4
    The MangaStream Downloader is an open source application written in Java for managing and downloading manga from the sites mangastream.com and mangafox.me. It is released under the GNU GPL and uses an open source HTML parser, TagSoup. Follow the project page on Facebook for updates: https://www.facebook.com/MangastreamDownloader
    Downloads: 1 This Week
  • 5
    Gecco

    Lightweight Java web crawler framework with jQuery-style extraction

    Gecco is a lightweight web crawler framework written in Java that simplifies the process of building web scraping applications. It is designed to make crawler development straightforward by allowing developers to extract page elements using jQuery-style selectors rather than complex parsing logic. It integrates several well-known Java libraries and frameworks, including tools for HTTP requests, HTML parsing, JSON processing, and application development. Through its annotation-based design,...
    Downloads: 2 This Week
  • 6

    webmongo

    Access server-side MongoDB through a client JavaScript API.

    Access server-side MongoDB through a client JavaScript API. This project is a branch of dbcloud. You can invoke almost any MongoDB operation through the JavaScript API in the browser. The client JavaScript API supports IE 6.0+, Chrome, Firefox, and WeChat.
    Downloads: 1 This Week
  • 7
    Save For Offline

    Android app for saving webpages for offline reading

    Save For Offline is an Android app for saving full web pages for offline reading, with lots of features and options. In your web browser, select 'Share', and then 'Save For Offline'. Saves real HTML files which can be opened in other apps/devices. Download & save entire web pages with all assets for offline reading & viewing. Save HTML files in a custom directory. Save in the background, no need to wait for it to finish saving. Night mode,...
    Downloads: 1 This Week
  • 8

    Java-WebTTS

    API that reads static web pages aloud with no coding

    This Java API helps create Java web applications in which static HTML pages can be read out to the viewer. It helps people who are partially visually impaired. It is also helpful to general users and children, and might work well on educational sites. It is fully customized and does not require the developer to write a single line of code. All you need to do is assign a specific id to the DOM element whose innerHTML you want read out. Next release of the API will deal with many more...
    Downloads: 1 This Week
  • 9
    Simple-Scrape is a simple web-scraping library that allows for programmatic access to HTML code. No further techniques are needed and the library is very compact and thus easy to use.
    Downloads: 0 This Week
  • 10

    eXtensible Text Framework (XTF)

    Framework for search and display of heterogeneous document collections.

    ...Please visit https://github.com/cdlib/xtf for the latest updates. Obsolete Description: The eXtensible Text Framework (XTF) is an architecture that supports searching across collections of heterogeneous textual data (XML, PDF, HTML, text, and more), and the presentation of results and documents in a highly configurable manner. Includes highly customized versions of the proven open-source components Lucene and Saxon.
    Downloads: 3 This Week
  • 11
    ItsNat is a component-based AJAX Java web application framework. No XML programming, no mixed view/code, no custom JavaScript. Only pure HTML, pure Java, and server-centric Swing-like programming with W3C standards where "The Browser is The Server".
    Downloads: 0 This Week
  • 12
    Jericho HTML Parser is a Java library for analyzing and manipulating parts of an HTML document, including server-side tags, while reproducing verbatim any unrecognised or invalid HTML.
    Downloads: 1 This Week
  • 13
    CleanCode

    .NET, PowerShell, SQL, Java, Perl, and JavaScript developer libraries

    Develop clean code with our .NET components (plus PowerShell, SQL, Java, Perl, and JavaScript components as well!). CleanCode highlights include user controls, a validation engine, a diagnostic system, an XML/HTML pre-processor, and a variety of articles on code design.
    Downloads: 8 This Week
  • 14
    Chunk, an HTML Template Engine for Java

    Clean, powerful templates for Java

    A powerful Java Template Engine, great for building HTML or XML docs. Chunk can handle many other needs and situations as well. In-tag filters & default values, multiple snippets per file, layered themes, macros, conditional includes, localization & more.
    Downloads: 0 This Week
  • 15

    wikihtml

    Converts wikitext documents into HTML documents

    This project is an application that converts wikitext documents into HTML documents. Wiki markup, or wikitext, is a markup language for writing documents in wiki-based systems, such as websites powered by MediaWiki.
    Downloads: 0 This Week
  • 16
    xowa

    A free, open-source, offline Wikipedia application

    XOWA is a desktop application for reading and editing Wikipedia offline (XOWA has moved to http://gnosygnu.github.io/xowa/download.html)
    Downloads: 5 This Week
  • 17
    NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces.
    Downloads: 0 This Week
  • 18
    wiki2xhtml converts wiki syntax into (X)HTML code and styles the page with CSS. It makes it easy to create good-looking pages without much know-how, and advanced users can use their own code. The program can be run either in the console or with a GUI.
    Downloads: 0 This Week
  • 19
    Site monitoring

    Monitoring of websites with a spider and email notifications

    Free website monitoring software that is easy to set up and use. It is a web application written in Java. You can monitor HTML pages, JSON and XML, pages in a sitemap, and even your whole website using a spider. You can check multiple websites, HTTP result codes, and even the contents of the checked pages. Website checking is done periodically using a built-in cron mechanism. In case of a check failure, application will automatically...
    Downloads: 0 This Week
  • 20
    Aspose Java for Liferay

    Provides export options for blogs, journals and dynamic lists

    This is a Liferay CMS / Portal plugin released by Aspose Pty Ltd. Aspose.Total Java for Liferay (hook plugin app) provides options for exporting web content and blogs created in HTML to MS Word, MS Excel, and PDF file formats using Aspose.Total Java APIs (Aspose.Words, Aspose.Cells, and Aspose.PDF). The plugin also provides options for exporting Dynamic Data Lists to MS Word, MS Excel, and PDF file formats using Aspose.Total Java APIs. (Aspose.Words,...
    Downloads: 0 This Week
  • 21
    Linkcrawler

    Crawls a site and returns a report of all its links

    Java desktop application that crawls a site and returns a report on the status of every link on the page, then moves to another internal page, and so on. Linkcrawler provides an easy-to-read HTML5 report with information on all links per web page. This tool is useful for web QA testers.
    Downloads: 3 This Week
  • 22

    waterlooFX

    Scientific Charting with JavaFX

    waterlooFX provides a library for scientific charting using JavaFX
    Downloads: 0 This Week
  • 23
    Intelligent Keyword Miner

    Intelligent SEO keyword miner and predicting tool

    THIS IS A NETBEANS 8.02 PROJECT, ENGLISH ONLY. This program was made to help me with patent research. It generates search keywords based on your upvotes or downvotes of the input parameters. It accepts text or a URL (text takes precedence over the URL). If you input a URL, it goes to the page and learns its text from the HTML. This program is intelligent in that it predicts what you may want to search next, based on your personal trends. After searching the...
    Downloads: 0 This Week
  • 24
    This is an Apache v2.0 authentication module based on HTML form authentication and cookie authentication sessions. Cookie sessions are stored in a memcache daemon. It can be used as a simple "Single Sign-On" (SSO). All the source code and the bug tracking have migrated to GitHub: https://github.com/ZenProjects/Apache-Authmemcookie-Module All the documentation is here: https://zenprojects.github.io/Apache-Authmemcookie-Module/
    Downloads: 0 This Week
  • 25
    java Image Album

    Java Image Album (jIA) is a wizard-style photo album application.

    java Image Album (jIA) is a free, open source, easy-to-use, wizard-style Java™ application that generates HTML photo albums. It automatically resizes your images and produces a set of HTML pages, including index pages with thumbnails and detailed caption pages for each photo. Publishing a new photo album is as simple as copying a directory of images to your web directory. Java Image Album is released under the Mozilla Public License 1.1.
    Downloads: 0 This Week