Showing 40 open source projects for "java-ml"

View related business solutions
  • Enterprise AI Search, Intranet, and Wiki in one platform. Icon
    Enterprise AI Search, Intranet, and Wiki in one platform.

    Your company’s all-in-one solution for trusted information

    Cut through the noise and end information overload with Guru, an all-in-one wiki, intranet, and knowledge base that serves as your company's single source of truth.
  • Claims Processing solution for healthcare practitioners. Icon
    Claims Processing solution for healthcare practitioners.

    Very easy to use for medical, dental and therapy offices.

    Speedy Claims became the top CMS-1500 Software by providing the best customer service imaginable to our thousands of clients all over America. Medical billing isn't the kind of thing most people get excited about - it is just a tedious task you have to do. But while it will never be a fun task, it doesn't have to be as difficult or time consumimg as it is now. With Speedy Claims CMS-1500 software you can get the job done quickly and easily, allowing you to focus on the things you love about your job, like helping patients. With a simple interface, powerful features to eliminate repetitive work, and unrivaled customer support, it's simply the best HCFA 1500 software available on the market. A powerful built-in error checking helps ensure your HCFA 1500 form is complete and correctly filled out, preventing CMS-1500 claims from being denied.
  • 1
    jsoup

    jsoup

    Java library for working with real-world HTML

    jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. jsoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do. jsoup is designed to deal with all varieties of HTML found in the wild; from pristine and validating, to invalid tag-soup; jsoup will create a sensible parse tree. The parser will make every...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    ACHE Focused Crawler

    ACHE Focused Crawler

    ACHE is a web crawler for domain-specific search

    ACHE is a focused web crawler. It collects web pages that satisfy some specific criteria, e.g., pages that belong to a given domain or that contain a user-specified pattern. ACHE differs from generic crawlers in sense that it uses page classifiers to distinguish between relevant and irrelevant pages in a given domain. A page classifier can be defined as a simple regular expression (e.g., that matches every page that contains a specific word) or a machine-learning-based classification model....
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    WebMagic

    WebMagic

    A scalable web crawler framework for Java

    WebMagic is a scalable crawler framework. It covers the whole lifecycle of crawler, downloading, url management, content extraction and persistent. It can simplify the development of a specific crawler. WebMagic is a simple but scalable crawler framework. You can develop a crawler easily based on it. WebMagic has a simple core with high flexibility, a simple API for html extracting. It also provides annotation with POJO to customize a crawler, and no configuration is needed. Some other...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Crawlab

    Crawlab

    Distributed web crawler admin platform for spiders management

    Golang-based distributed web crawler management platform, supporting various languages including Python, NodeJS, Go, Java, PHP and various web crawler frameworks including Scrapy, Puppeteer, Selenium. Please use docker-compose to one-click to start up. By doing so, you don't even have to configure MongoDB database. The frontend app interacts with the master node, which communicates with other components such as MongoDB, SeaweedFS and worker nodes. Master node and worker nodes communicate...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Field Service Management Software | BlueFolder Icon
    Field Service Management Software | BlueFolder

    Maximize technician productivity with intuitive field service software

    Track all your service data in one easy-to-use system, enabling your team to move faster and generate more revenue for your bottom line.
  • 5
    WFDownloader App

    WFDownloader App

    Free batch downloader for image, wallpaper, video, audio, document,

    ... by one. Go to the Help menu or check out website to get started. Note that this cross-platform version requires Java (minimum version Java 8) to be installed on your Operating System. For non-java required OS specific versions, check app's website.
    Leader badge
    Downloads: 142 This Week
    Last Update:
    See Project
  • 6
    Web Spider, Web Crawler, Email Extractor

    Web Spider, Web Crawler, Email Extractor

    Free Extracts Emails, Phones and custom text from Web using JAVA Regex

    In Files there is WebCrawlerMySQL.jar which supports MySql Connection Free Web Spider & Crawler. Extracts Information from Web by parsing millions of pages. Store data into Derby Database and data are not being lost after force closing the spider. - Free Web Spider , Parser, Extractor, Crawler - Extraction of Emails , Phones and Custom Text from Web - Export to Excel File - Data Saved into Derby and MySQL Database - Written in Java Cross Platform Also See Free email Sender...
    Leader badge
    Downloads: 94 This Week
    Last Update:
    See Project
  • 7
    Web Spider, Web Crawler, Email Extractor

    Web Spider, Web Crawler, Email Extractor

    Free Extracts Emails, Phones and custom text from Web using JAVA Regex

    ... to Excel File - Data Saved into Derby Database - Written in Java Cross Platform See also Free Email Sender in this link: https://sourceforge.net/projects/gitst-free-email-ender/ Please install Microsoft OpenJDK to start the application https://www.microsoft.com/openjdk
    Downloads: 7 This Week
    Last Update:
    See Project
  • 8
    crawler4j

    crawler4j

    Open source web crawler for Java

    crawler4j is an open source web crawler for Java which provides a simple interface for crawling the Web. Using it, you can setup a multi-threaded web crawler in few minutes. You need to create a crawler class that extends WebCrawler. This class decides which URLs should be crawled and handles the downloaded page. shouldVisit function decides whether the given URL should be crawled or not. In the above example, this example is not allowing .css, .js and media files and only allows pages within...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 9
    GitGet

    GitGet

    Ever wanted to download only a part of a Git repository.

    Ever wanted to download only a part of a Git repository. Just paste the URL of the repo you want to download and sit back and enjoy. This simple java application makes use of Web Scraping and downloads only those files you need, thus helping you save your precious bandwidth and space.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Eptura Workplace Software Icon
    Eptura Workplace Software

    From desk booking and visitor management, to space planning and office utilization data, Eptura Workplace helps your entire organization work smarter.

    With the world of work changed forever, it’s essential to manage your workplace and assets together to effectively create a high-performing environment. The Eptura experience combines the power of workplace management software with asset management, enabling you to effectively operate your building and facilitate hybrid work.
  • 10
    Save For Offline

    Save For Offline

    Android app for saving webpages for offline reading

    Android app for saving webpages for offline reading. Save For Offline is an Android app for saving full web pages for offline reading, with lots of features and options. In you web browser selects 'Share', and then 'Save For Offline'. Saves real HTML files which can be opened in other apps/devices. Download & save entire web pages with all assets for offline reading & viewing. Save HTML files in a custom directory. Save in the background, no need to wait for it to finish saving. Night mode,...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 11
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well proved XML and text processing techologies in order to easely extract useful data from arbitrary web pages.
    Downloads: 19 This Week
    Last Update:
    See Project
  • 12
    phoneutria
    A Java Web crawler: multi-threaded, scalable, with high performance, extensible and polite. It can be used to crawl and index any web or enterprise domain and is configurable through a XML configuration file.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Simple-Scrape is a simple web-scraping library that allows for programmatic access to HTML code. No further techniques are needed and the library is very compact and thus easy to use.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14

    WebCollector

    WebCollector is an open source web crawler framework based on Java.

    WebCollector is an open source web crawler framework based on Java.It provides some simple interfaces for crawling the Web,you can setup a multi-threaded web crawler in less than 5 minutes. Github: https://github.com/CrawlScript/WebCollector Demo: https://github.com/CrawlScript/WebCollector/blob/master/YahooCrawler.java
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    webStraktor is a programmable World Wide Web data extraction client. Its purpose is to scrape HTML based content via the HTTP protocol and extract relevant information. webStraktor features a scripting language to facilitate the collection, the extraction and the storage of information available on the web, including images. The scripting language uses elements of the Regular Expression and xPath syntax. The webStraktor scripting language has a small instruction set and its syntax is easy...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Constellio Enterprise Search engine

    Constellio Enterprise Search engine

    Open source Search Engine and Enterprise Search

    Constellio is an enterprise search engine that allows companies to search all their organization's information through a single interface (Web, CRM, ERP, ECM, Mail etc.). Constellio is Based on Apache Solr and Google Search Appliance's connector. Constellio has a powerful web crawler.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Yet another web crawler? Yes, but this ones uses the full power of regular expressions to accept or reject, examine or ignore, save or refuse pages. You also use MIME types to do all this. Powerful and flexible.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    The archive-crawler project is building Heritrix: a flexible, extensible, robust, and scalable web crawler capable of fetching, archiving, and analyzing the full diversity and breadth of internet-accesible content.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 19
    a minimal Java web crawler
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Spider web scritto in java che consente un utilizzo sia come applicazione stand alone, sia come core di altre applicazioni che sfruttino le sue funzionalità.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    ItSucks
    This project is a java web spider (web crawler) with the ability to download (and resume) files. It is also highly customizable with regular expressions and download templates. All backend functionalities are also available in a separate library.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 22
    Folksonomy Web Crawler
    A Web crawler prototype designed to index pages of certain resource sharing platforms based on folksonomy tags. The results are displayed in an Excel spreadsheet.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    A simple to set up web scraper written in Java. It uses modified regEx to quickly write complex patterns to parse data out of a website. It contains a GUI tool for testing your configuration scripts and is fully automated through the command line
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Other spiders has a limited link depth, follows links not randomized or are combined with heavy indexing machines. This spider will has not link depth limits, randomize next url, that will be checked for new urls.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    The DeDuplicator is an add-on module (plug-in) for the web crawler Heritrix. It offers a means to reduce the amount of duplicate data collected in a series of snapshot crawls.
    Downloads: 6 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next