Showing 51 open source projects for "java html parser"

View related business solutions
  • Brilliantly simple email signature management. Icon
    Brilliantly simple email signature management.

    No matter the size or needs of your business, Exclaimer helps you amplify every email with optimized, effective email signatures

    Exclaimer provides the perfect solutions to create and control multiple organizational signatures from a centralized location. Our award-winning solutions gives users a dynamic and professional email signature when sent from any device, including Macs and mobiles, and create large cost savings by reducing the load on current IT personnel.
  • Foxhunt Security Awareness Platform for Businesses Icon
    Foxhunt Security Awareness Platform for Businesses

    For enterprises requiring a cybersecurity training platform to train their employees and reduce potential risks and threats

    Awareness isn’t enough. Hoxhunt uses interactive, bite-sized trainings that employees love to dramatically increase engagement, ensure compliance, and (measurably) reduce risky behaviors.
  • 1
    Web Spider, Web Crawler, Email Extractor

    Web Spider, Web Crawler, Email Extractor

    Free Extracts Emails, Phones and custom text from Web using JAVA Regex

    In Files there is WebCrawlerMySQL.jar which supports MySql Connection Free Web Spider & Crawler. Extracts Information from Web by parsing millions of pages. Store data into Derby Database and data are not being lost after force closing the spider. - Free Web Spider , Parser, Extractor, Crawler - Extraction of Emails , Phones and Custom Text from Web - Export to Excel File - Data Saved into Derby and MySQL Database - Written in Java Cross Platform Also See Free email Sender...
    Leader badge
    Downloads: 93 This Week
    Last Update:
    See Project
  • 2
    Web Spider, Web Crawler, Email Extractor

    Web Spider, Web Crawler, Email Extractor

    Free Extracts Emails, Phones and custom text from Web using JAVA Regex

    In Files there is WebCrawlerMySQL.jar which supports MySql Connection Please follow this link to get latest version https://sourceforge.net/projects/web-spider-web-crawler-extract/ Free Web Spider & Crawler. Extracts Information from Web by parsing millions of pages. Store data into Derby OR MySQL Database and data are not being lost after force closing the spider. - Free Web Spider , Parser, Extractor, Crawler - Extraction of Emails , Phones and Custom Text from Web - Export...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 3
    Free Manga Downloader

    Free Manga Downloader

    Forked from https://sf.net/p/fmd/

    The Free Manga Downloader (FMD) is an open source application written in Object-Pascal for managing and downloading manga from various websites. This is a mirror of main repository on GitHub. For feedback/bug report visit https://github.com/riderkick/FMD
    Leader badge
    Downloads: 547 This Week
    Last Update:
    See Project
  • 4
    OpenSearchServer Search Engine

    OpenSearchServer Search Engine

    An open source search engine with RESTFul API and crawlers

    OpenSearchServer is a powerful, enterprise-class, search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST/API , Ruby, Rails, Node.js, PHP, Perl) you will be able to integrate quickly and easily advanced full-text search capabilities in your application: Full-text with basic semantic, join queries, boolean queries, facet and filter, document (PDF, Office, etc.) indexation, web scrapping,etc. OpenSearchServer runs on...
    Downloads: 19 This Week
    Last Update:
    See Project
  • Seamlessly Connect Your Entire Supply Chain with Top-Rated EDI Software Icon
    Seamlessly Connect Your Entire Supply Chain with Top-Rated EDI Software

    TrueCommerce is the only electronic data interchange (EDI) provider that offers a true one-stop shopping solution for EDI compliance.

    TrueCommerce delivers the most complete and scalable EDI solution that makes it easier and more cost-effective to do business with your trading partners.
  • 5
    cpDetector is a proxy for codepage detection of documents. It delegates to multiple instances that try to detect the codepage by different techinques. A command line executeable is shipped that allows to sort documents by codepage.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 6
    JuniCoder is a Java project that uses unicode as a base for decoding and encoding formats that invented workarounds to express characters not covered by ASCII. Decoders translate those inventions to unicode. Encoders encode to these inventions.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    The MangaStream Downloader is an open source application written in Java for managing and downloading manga from the site mangastream.com and mangafox.me. It is written under the GNU-GPL license and uses an open source HTML parser - TagSoup. Follow the project page on Facebook for updates: https://www.facebook.com/MangastreamDownloader
    Downloads: 5 This Week
    Last Update:
    See Project
  • 8
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well proved XML and text processing techologies in order to easely extract useful data from arbitrary web pages.
    Downloads: 20 This Week
    Last Update:
    See Project
  • 9

    eXtensible Text Framework (XTF)

    Framework for search and display of heterogenous document collections.

    NOTICE: This code repository is deprecated. Please visit https://github.com/cdlib/xtf for the latest updates. Obsolete Description: The eXtensible Text Framework (XTF) is an architecture that supports searching across collections of heterogeneous textual data (XML, PDF, HTML, text, and more), and the presentation of results and documents in a highly configurable manner. Includes highly customized versions of the proven open-source components Lucene and Saxon.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Workable Hiring Software - Hire The Best People, Fast Icon
    Workable Hiring Software - Hire The Best People, Fast

    Find the best candidates with the best recruitment software

    Workable is the preferred software for today's recruiting industry and HR teams, trusted by over 6,000 companies to streamline their hiring processes. Finding the right person for the job has never been easier—users now possess the ability to manage multiple hiring pipelines at once, from posting a job to sourcing candidates. Workable is also seamlessly integrated between desktop and mobile, allowing admins full control and flexibility all in the ATS without needing additional software.
  • 10
    NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces.
    Downloads: 22 This Week
    Last Update:
    See Project
  • 11
    Torrtux

    Torrtux

    A terminal-program for downloading torrents from PirateBay

    ..., etc., just like being on the TPB site ! Moreover, it retrieves subs from www.opensubtitles.org. It retrieves informations in the source code of the TPB page and parses it with regexp and the library html-parser. In the config file ~/.torrtuxrc, you can chose your display, subs, comments preferences, your torrent-manager and a proxy if needed ! Thanks for reporting all bugs you find !
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    ChanDown - 4chan Image Downloader

    ChanDown - 4chan Image Downloader

    Auto Rescanning - Search Terms - Regularly Updated With New Features

    ========== NOTE: (AS OF 11/05/2015) 4chan html structure has changed, full images are downloaded as well as the thumbnail. Fix coming shortly (after my exams are over) to stop the thumbnails from downloading. ========== This is the first release of my 4chan image downloader. This downloader packs loads of great features such as the search ability. Check the features section and be sure to let me know if you want a feature added. Coming Soon: - Wiki, explaining in depth how to use...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Geoportal Server
    Geoportal Server is a standards-based, open source product that enables discovery and use of geospatial resources including data and services.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 14
    Regain is a Java search engine based on Jakarta Lucene. It provides indexing and searching files for plenty of formats (HTML,XML,doc(x),xls(x),ppt(x),oo,PDF,RTF,mp3,mp4,Java). A TagLibrary eases integrating search results in your JSP based web page.
    Downloads: 37 This Week
    Last Update:
    See Project
  • 15
    webStraktor is a programmable World Wide Web data extraction client. Its purpose is to scrape HTML based content via the HTTP protocol and extract relevant information. webStraktor features a scripting language to facilitate the collection, the extraction and the storage of information available on the web, including images. The scripting language uses elements of the Regular Expression and xPath syntax. The webStraktor scripting language has a small instruction set and its syntax is easy...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16

    ssSearchEngine

    keyword search engine for semi-structured data (Tables, lists,...)

    This application implement an approach for doing keyword based search over semi-structured data available in HTML pages.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    TestEl is a Java-based learning analyzer for HTML (and possibly other) structured documents. It can be trained to detect structures in such documents and renders hits in XML.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18

    HXPath

    XPath HTML parser

    HXPath is a command line tool useful to extract data from HTML documents. HXPath can select sub trees, like the standard xpath tool, but is also able to read contents and attributes and output them in a bash friendly format. HTML Tidy and HTTP/HTTPS get are built in too.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    A simple repository utilite that makes a html-page with the list of the directory.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    ImageCrawler Application to extract Images from Websites. A Thumbnail view is provided. Based on Spring.NET and the HTML Agility Pack
    Downloads: 2 This Week
    Last Update:
    See Project
  • 21
    JavaPub is a one-click install BibTex-publications portal based on a simple java codebase. It features a drag-and-drop uploader module to upload BibTex files and a module that generates the html-index and entry-pages for publication listings.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Web-as-corpus tools in Java. * Simple Crawler (and also integration with Nutch and Heritrix) * HTML cleaner to remove boiler plate code * Language recognition * Corpus builder
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Webstats Solr is an attempt to make Apache Access log easier to Data Mine. By adding a powerful Search Engine (SOLR) as a Backend and using Java Script and HTML and maybe PHP I hope to out date AWStats.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    EasyGIS simplifies GIS data management, sharing, and publishing. REST interfaces (json, html views). Lucene based FTS searches. Thematic maps, business cartography. Integration with external GIS data providers - Google, OSM.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    HttpFinder is web content searching tool. It enables look for text content that matches given regular expression in html pages/scripts etc. All navigation is performed with use of other regexp which describes links to visit.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next