Showing 128 open source projects for "html source extractor"

  • 1
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.
    Downloads: 3 This Week
  • 2
    X-RAY

    The next web scraper, see through the <html> noise

    Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing. The API is entirely composable, giving you great flexibility in how you scrape each page. Paginate through websites, scraping each page. X-ray also supports a request delay and a pagination limit. Scraped pages can be streamed to a file, so if there's an error on one page, you won't...
    Downloads: 0 This Week
  • 3
    Free Manga Downloader

    Forked from https://sf.net/p/fmd/

    The Free Manga Downloader (FMD) is an open source application written in Object Pascal for managing and downloading manga from various websites. This is a mirror of the main repository on GitHub. For feedback and bug reports, visit https://github.com/riderkick/FMD
    Downloads: 313 This Week
  • 4
    OpenSearchServer Search Engine

    An open source search engine with RESTful API and crawlers

    OpenSearchServer is a powerful, enterprise-class search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST/API, Ruby, Rails, Node.js, PHP, Perl), you can quickly and easily integrate advanced full-text search capabilities into your application: full-text search with basic semantics, join queries, boolean queries, facets and filters, document (PDF, Office, etc.) indexing, web scraping, etc. OpenSearchServer runs on...
    Downloads: 11 This Week
  • 5
    JuniCoder is a Java project that uses Unicode as a base for decoding and encoding formats that invented workarounds to express characters not covered by ASCII. Decoders translate those inventions to Unicode. Encoders translate Unicode back to these inventions.
    Downloads: 0 This Week
  • 6
    cpDetector is a proxy for codepage detection of documents. It delegates to multiple instances that try to detect the codepage by different techniques. A command line executable is shipped that sorts documents by codepage.
    Downloads: 1 This Week
  • 7
    The MangaStream Downloader is an open source application written in Java for managing and downloading manga from the sites mangastream.com and mangafox.me. It is licensed under the GNU GPL and uses the open source HTML parser TagSoup. Follow the project page on Facebook for updates: https://www.facebook.com/MangastreamDownloader
    Downloads: 0 This Week
  • 8
    SSEP - Site Search Engine PHP-Ajax

    A free site search engine script built with PHP and Ajax.

    A site search engine script that uses MySQL to store your website's indexed pages and add search functionality to your web site. It is built with PHP and JavaScript, and the search results are loaded via Ajax. The search system combines MySQL full-text search with SQL regexp, and weights words according to their location in the HTML elements, to determine the relevance of the search results. It can be included in any web site.
    Downloads: 2 This Week
  • 9

    eXtensible Text Framework (XTF)

    Framework for search and display of heterogeneous document collections.

    ...Please visit https://github.com/cdlib/xtf for the latest updates. Obsolete Description: The eXtensible Text Framework (XTF) is an architecture that supports searching across collections of heterogeneous textual data (XML, PDF, HTML, text, and more), and the presentation of results and documents in a highly configurable manner. Includes highly customized versions of the proven open-source components Lucene and Saxon.
    Downloads: 1 This Week
  • 10
    HyperSQL is like a doxygen plus javadoc for SQL, hypermapping SQL views, packages, procedures, and functions to HTML source code listings and showing all code locations where these are used.
    Downloads: 0 This Week
  • 11
    Meta Tag Generator. Allows you to research SEO keywords, generate compliant meta tags, and output them to an HTML or text file for insertion into a finished web project.
    Downloads: 0 This Week
  • 12
    NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces.
    Downloads: 0 This Week
  • 13
    Torrtux

    A terminal-program for downloading torrents from PirateBay

    ...It also allows you to get the details of your torrent (the author, the date, the type, the size, etc.), just like being on the TPB site. Moreover, it retrieves subtitles from www.opensubtitles.org. It retrieves information from the source code of the TPB page and parses it with regexp and the library html-parser. In the config file ~/.torrtuxrc, you can choose your display, subtitle and comment preferences, your torrent manager, and a proxy if needed. Thanks for reporting any bugs you find!
    Downloads: 0 This Week
  • 14
    Wap Auto Index Advance

    Auto Index wap is Advance of Download Portal (Multi Language)

    Djamolwap 13v - Advance Auto Index With Web Admin Panel + Multi Language + Themes. New Updates: - Multi Language Website 1) English 2) Urdu 3) Gujarati 4) Russian - User/Visitor manual change language website - Multi Language Plugin On/Off - Added Function in Admin Panel - Automatic All Mp3 Tag Setting Added. Official Website : http://ai.djamol.com Demo...
    Downloads: 0 This Week
  • 15
    Geoportal Server
    Geoportal Server is a standards-based, open source product that enables discovery and use of geospatial resources including data and services.
    Downloads: 2 This Week
  • 16
    Regain is a Java search engine based on Jakarta Lucene. It indexes and searches files in many formats (HTML, XML, doc(x), xls(x), ppt(x), oo, PDF, RTF, mp3, mp4, Java). A TagLibrary eases integrating search results into your JSP-based web page.
    Downloads: 7 This Week
  • 17
    mindCMS

    Small, fast and flexible Content Management System for PHP / MySQL

    A very small, fast, compact and flexible Content Management System (CMS) for PHP web servers, using a reasonable number of functions. Easily maintain your web pages and online files in any web browser.
    Downloads: 0 This Week
  • 18
    Zoozle Search & Download Suchmaschine

    Zoozle 2008 - 2010 Webpage, Tools and SQL Files

    Download search engine and directory with Rapidshare and Torrent - zoozle Download Suchmaschine. All the files that ran the world-leading German download search engine in 2010, with 500 000 unique visitors a day: all the tools you need to set up a clone. Code contains: - PHP files for zoozle - Perl crawler for gathering new content to the database, and all the other cool tools I have...
    Downloads: 0 This Week
  • 19

    Link Finder

    Perl script to extract links from any HTML page

    Perl script (with source code) to extract links from any HTML page. The script has no requirements or dependencies.
    Downloads: 0 This Week
  • 20
    Perstem
    Perstem is a Persian (Farsi) stemmer, morphological analyzer, transliterator, and partial part-of-speech tagger. Inflexional morphemes are separated or removed from their stems. Perstem can also tokenize and transliterate between various character set encodings and romanizations.
    Downloads: 0 This Week
  • 21
    Hypermail is a program that takes a file of mail messages in UNIX mailbox format and generates a set of cross-referenced HTML documents. Development of Hypermail now continues on GitHub: https://github.com/hypermail-project/hypermail
    Downloads: 1 This Week
  • 22

    php-url-extractor

    List all URLs present at the requested URL in absolute format

    This PHP program extracts all URLs found on the page at the requested URL and outputs them as absolute URLs.
    Downloads: 0 This Week
  • 23
    navTango - Local

    navTango - Local is a link and document management application.

    "navTango - Local" is a web-based application that lets you manage documents and links on your PC. navTango comes with a search engine to index documents that live in its repository. The search engine will index PDF, HTML, Word, PowerPoint, Text, Excel and many other types of documents. navTango - Local works with IE, Firefox, Opera, Safari, and Chrome. This is an alpha version, so you are on the bleeding edge. Use at your own risk.
    Downloads: 0 This Week
  • 24
    ZetaBoards topic fetcher
    Fetches topics with new posts from ZetaBoards forums and does something with the URLs, like opening them in a browser. Configurations can be stored and manipulated for quicker fetching. Development, translations, bug reports, etc. are handled at Launchpad: https://launchpad.net/zb-fetcher SourceForge is used to host released files.
    Downloads: 0 This Week
  • 25
    WebExtractor360 is a free and open source web data extractor. It uses Regular Expressions to find, extract and scrape internet data quickly and easily.
    Downloads: 0 This Week
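
Several of the projects listed above (Link Finder, php-url-extractor) describe the same basic task: pulling the links out of an HTML page and reporting them as absolute URLs. As a rough illustration of that idea only, and not the code of any listed project, here is a minimal Python sketch using the standard library. The names `LinkCollector` and `extract_links` are hypothetical, and the sketch assumes the HTML has already been fetched into a string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values of <a> tags as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags are considered in this sketch.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all <a href> targets in `html`, resolved against `base_url`."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = '<a href="/a">one</a> <a href="c.html">two</a>'
    for link in extract_links(sample, "http://example.com/dir/page.html"):
        print(link)
```

A parser-based approach is used here rather than regular expressions because it copes better with attribute order and quoting; tools such as WebExtractor360 above state that they use regular expressions instead.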