Showing 16 open source projects for "command line search text"

View related business solutions
  • Red Hat Enterprise Linux on Microsoft Azure Icon
    Red Hat Enterprise Linux on Microsoft Azure

    Deploy Red Hat Enterprise Linux on Microsoft Azure for a secure, reliable, and scalable cloud environment, fully integrated with Microsoft services.

    Red Hat Enterprise Linux (RHEL) on Microsoft Azure provides a secure, reliable, and flexible foundation for your cloud infrastructure. Red Hat Enterprise Linux on Microsoft Azure is ideal for enterprises seeking to enhance their cloud environment with seamless integration, consistent performance, and comprehensive support.
    Learn More
  • Expo Pass Event Management Software Icon
    Expo Pass Event Management Software

    Your next event just got next level.

    We make all kinds of events, all kinds of easy. No matter what you got planned, we got the technology stuff to make it even better.
    Learn More
  • 1
    Trafilatura

    Trafilatura

    Python & command-line tool to gather text on the Web

    Trafilatura is a Python package and command-line tool designed to gather text on the Web. It includes discovery, extraction and text-processing components. Its main applications are web crawling, downloads, scraping, and extraction of main texts, metadata and comments. It aims at staying handy and modular: no database is required, the output can be converted to various commonly used formats. Going from raw HTML to essential parts can alleviate many problems related to text quality, first...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    dude uncomplicated data extraction

    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    Dude is a very simple framework for writing web scrapers using Python decorators. The design, inspired by Flask, was to easily build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in Pre-Alpha. Please expect breaking changes. You can run your scraper from terminal/shell/command-line by supplying URLs, the output filename of your choice and the paths to your python scripts to dude scrape command.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Easyspider - Distributed Web Crawler

    Easyspider - Distributed Web Crawler

    Easy Spider is a distributed Perl Web Crawler Project from 2006

    .../ https://www.artikelschreiben.com/ https://buzzerstar.com/ https://easyperlspider.sourceforge.io/ http://artikelschreiber.net/ http://sebastianenger.com/ http://unaique.de/ http://unaique.org/ It is fun to look at some code that is few years ago and to see how one has improved himself. If you want to write text automatically try https://www.artikelschreiber.com/en/ or https://www.unaique.net/en/!
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Yet another web crawler? Yes, but this ones uses the full power of regular expressions to accept or reject, examine or ignore, save or refuse pages. You also use MIME types to do all this. Powerful and flexible.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Discover the full potential of site search Icon
    Discover the full potential of site search

    AddSearch provides lightning-fast, effortless, and customizable site search for any website or web application.

    Increase conversions, reduce helpdesk costs and make your customers happy. Let us show you how.
    Learn More
  • 5
    A simple to set up web scraper written in Java. It uses modified regEx to quickly write complex patterns to parse data out of a website. It contains a GUI tool for testing your configuration scripts and is fully automated through the command line
    Downloads: 1 This Week
    Last Update:
    See Project
  • 6
    arachnode.net is an open source Web crawler for downloading, indexing and storing Internet content including e-mail addresses, files, hyperlinks, images, and Web pages and is written in C# using SQL Server 2008. See http://arachnode.net for the LATEST.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Other spiders has a limited link depth, follows links not randomized or are combined with heavy indexing machines. This spider will has not link depth limits, randomize next url, that will be checked for new urls.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    bee-rain is a web crawler that harvest and index file over the network. You can see result by bee-rain website : http://bee-rain.internetcollaboratif.info/
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Methanol is a scriptable multi-purpose web crawling system with an extensible configuration system and speed-optimized architectural design. Methabot is the web crawler of Methanol.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Redefine the way your organization pursues opportunity and manages risk. Icon
    Make the right business decisions with an easy to use solution that provides a comprehensive integrated approach to governance, risk and compliance.
    Learn More
  • 10
    elk is a powerful open-source python based command-line web crawler that can recursively search for files and text on websites.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Web Crawler & indexer project, for university
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    WebNews Crawler is a specific web crawler (spider, fetcher) designed to acquire and clean news articles from RSS and HTML pages. It can do a site specific extraction to extract the actual news content only, filtering out the advertising and other cruft.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    This plug-in for Google Desktop is a simple web spider (Könguló is Icelandic for spider) that crawls websites you specify, e.g. intranet websites, and dumps them into Google Desktop. You must install Google Desktop prior to installing the plug-in.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Robust featureful multi-threaded CLI web spider using apache commons httpclient v3.0 written in java. ASpider downloads any files matching your given mime-types from a website. Tries to reg.exp. match emails by default, logging all results using log4j.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    FWebSpider is a web crawler application written on Perl. It performs chosen site crawl, featuring response cache, URL storage, URL exclusion rules and more. It is developed to function as a local/global site search engine core.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Webhunter is a distributed, multi-threaded web crawler designed for both general indexing and crawling the web for focused content.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next