Showing 32 open source projects for "open any file"

  • 1
    miniblink49

    Lighter, faster browser kernel of blink to integrate HTML UI in apps

    miniblink is an open source, single-file browser widget based on Chromium, and currently the smallest known Chromium-based browser control. Through its exported pure C interface, a browser control can be created in a few lines of code, and it can be called from C++, C#, Delphi, and other languages. Node.js is embedded, with support for Electron (with Nodejs, can run...
    Downloads: 2 This Week
    See Project
  • 2
    Roach

    The complete web scraping toolkit for PHP

    ... on its own or install one of the framework-specific adapters. Currently, there’s a first-party adapter available to use Roach in your Laravel projects, with more coming. Roach is built from the ground up with extensibility in mind; in fact, most of Roach’s built-in behavior works exactly the same way as any custom extension or middleware.
    Downloads: 3 This Week
    See Project
  • 3
    jsoup

    Java library for working with real-world HTML

    ... attempt to create a clean parse from the HTML you provide, regardless of whether the HTML is well-formed or not. You have HTML in a Java String, and you want to parse that HTML to get at its contents, or to make sure it's well formed, or to modify it. The String may have come from user input, a file, or from the web.
    Downloads: 2 This Week
    See Project
  • 4
    BrowserBox

    Remote isolated browser API for security

    ... resources in a way that's both more sandboxed than, and less restricted than, traditional web <iframe> elements. Build applications that need cross-origin access, while delivering complex user stories that benefit from an encapsulated browser abstraction. Since the whole stack is written in JavaScript you can easily extend it to suit your needs. The technology that puts unrestricted browser capabilities within reach of a web app has never before existed in the open.
    Downloads: 2 This Week
    See Project
  • 5
    Firecrawl

    Turn entire websites into LLM-ready markdown or structured data

    Crawl and convert any website into LLM-ready markdown or structured data. Built by Mendable.ai and the Firecrawl community. Includes powerful scraping, crawling, and data extraction capabilities. Firecrawl is an API service that takes a URL, crawls it, and converts it into clean markdown or structured data. We crawl all accessible subpages and give you clean data for each. No sitemap is required (see the usage sketch below).
    Downloads: 1 This Week
    See Project
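    A minimal sketch of calling the hosted Firecrawl API from Python. The endpoint path, payload fields, and response shape are assumptions based on the description above, not confirmed API details; check the project's documentation before relying on them.

    ```python
    # Hypothetical Firecrawl call: submit a URL, get LLM-ready markdown back.
    # Endpoint, payload fields, and response shape are assumptions -- see the docs.
    import requests

    API_KEY = "fc-your-api-key"  # placeholder

    resp = requests.post(
        "https://api.firecrawl.dev/v1/scrape",  # assumed endpoint path
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"url": "https://example.com", "formats": ["markdown"]},
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["data"]["markdown"])  # assumed response shape
    ```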
  • 6
    Goutte

    Goutte, a simple PHP Web Scraper

    Goutte is a screen scraping and web crawling library for PHP. Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses. Goutte depends on PHP 7.1+. Add fabpot/goutte as a require dependency in your composer.json file. Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\HttpBrowser). Make requests with the request() method. The method returns a Crawler object (Symfony\Component\DomCrawler\Crawler). To use your own HTTP settings, you may...
    Downloads: 1 This Week
    See Project
  • 7
    ScrapydWeb

    Web app for Scrapyd cluster management

    Web app for Scrapyd cluster management, with support for Scrapy log analysis & visualization. Make sure that Scrapyd has been installed and started on all of your hosts. Start ScrapydWeb via the command scrapydweb (a config file is generated on first startup for customizing settings). Add your Scrapyd servers; both string and tuple formats are supported (see the config sketch below), and you can attach basic auth for accessing the Scrapyd server, as well as a string for grouping or labeling. You can select any number...
    Downloads: 0 This Week
    See Project
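    A sketch of the server list in the generated ScrapydWeb config file, showing the string and tuple formats mentioned above. The SCRAPYD_SERVERS option name follows the project's README; all hosts, credentials, and the group label are placeholders.

    ```python
    # Excerpt from a scrapydweb settings file (generated on first startup).
    # Hosts, credentials, and the group label below are placeholders.
    SCRAPYD_SERVERS = [
        "127.0.0.1:6800",                       # plain string
        "username:password@192.168.0.10:6800",  # string with basic auth
        ("username", "password", "192.168.0.11", "6800", "group-a"),  # tuple form
    ]
    ```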
  • 8
    Web Scraping for Laravel

    Laravel adapter for Roach, the complete web scraping toolkit for PHP

    This is the Laravel adapter for Roach, the complete web scraping toolkit for PHP. Easily integrate Roach into any Laravel application. The Laravel adapter mostly provides the necessary container bindings for the various services Roach uses, as well as making certain configuration options available via a config file. The Laravel adapter of Roach registers a few Artisan commands to make your development experience as pleasant as possible. Roach ships with an interactive shell (often called Read...
    Downloads: 0 This Week
    See Project
  • 9
    crwlr

    Library for Rapid (Web) Crawler and Scraper Development

    ... just load literally every link it finds (and is allowed to load according to the robots.txt file), it would eventually load the whole internet (provided the URL(s) it starts with are not a dead end). Alternatively, it can be restricted to load only links matching certain criteria (same domain/host, URL path starting with "/foo", ...) or only to a certain depth. A depth of 3 means 3 levels deep; links found on the initial URLs provided to the crawler are level 1, and so on.
    Downloads: 0 This Week
    See Project
  • 10
    Ulixee Hero

    The web browser built for scraping

    ... script as practically any browser.
    Downloads: 0 This Week
    See Project
  • 11
    Letterboxd Recommendations

    Scraping publicly-accessible Letterboxd data for movie recommendations

    Scraping publicly-accessible Letterboxd data and creating a movie recommendation model with it that can generate recommendations when provided with a Letterboxd username. A user's "star" ratings are scraped from their Letterboxd profile and assigned numerical ratings from 1 to 10 (accounting for half stars). Their ratings are then combined with a sample of ratings from the top 4000 most active users on the site to create a collaborative filtering recommender model (illustrated in the sketch below) using singular value...
    Downloads: 0 This Week
    See Project
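    The sketch below is illustrative only, not the project's actual code: it fits a collaborative filtering model on hypothetical 1-to-10 ratings using the scikit-surprise library's SVD implementation, matching the approach the description outlines.

    ```python
    # Illustrative only -- not the project's code. SVD-based collaborative
    # filtering on 1-10 star ratings, using the scikit-surprise library.
    import pandas as pd
    from surprise import SVD, Dataset, Reader

    # Hypothetical scraped ratings: (username, film, rating on a 1-10 scale).
    ratings = pd.DataFrame(
        [("alice", "parasite", 10), ("alice", "heat", 7),
         ("bob", "parasite", 9), ("bob", "heat", 8), ("bob", "alien", 10)],
        columns=["user", "item", "rating"],
    )

    reader = Reader(rating_scale=(1, 10))
    data = Dataset.load_from_df(ratings[["user", "item", "rating"]], reader)

    model = SVD()  # matrix factorization via singular value decomposition
    model.fit(data.build_full_trainset())

    # Predict how "alice" would rate a film she hasn't logged yet.
    print(model.predict("alice", "alien").est)
    ```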
  • 12
    crawley

    The unix-way web crawler

    Crawls web pages and prints any link it can find. Fast HTML SAX-parser (powered by golang.org/x/net/html). Small (below 1500 SLOC), idiomatic, 100% test-covered codebase. Grabs most useful resource URLs (pics, videos, audios, forms, etc.). Found URLs are streamed to stdout and guaranteed to be unique (with fragments omitted). Scan depth (limited by starting host and path, 0 by default) can be configured. Can crawl rules and sitemaps from robots.txt. Brute mode: scan HTML comments for URLs...
    Downloads: 0 This Week
    See Project
  • 13
    Ferret

    Declarative web scraping

    A web scraping system aiming to simplify data extraction from the web. Ferret has a declarative query language that makes it easy to focus on the data that you need to get. Ferret has the ability to scrape JS-rendered pages, handle all page events, and emulate user interactions. Ferret was designed as a library from the ground up; it can be easily embedded into any Go application. Ferret helps you to focus on the data you need using an easy-to-learn declarative language. Ferret uses Chrome...
    Downloads: 0 This Week
    See Project
  • 14
    JobFunnel

    Scrape job websites into a single spreadsheet with no duplicates.

    Scrape job websites into a single spreadsheet with no duplicates. Automated tool for scraping job postings into a .csv file. You can search for jobs with YAML configuration files or by passing command arguments. By performing regular scraping and reviewing, you can cut through the noise of even the busiest job markets. Run funnel with your settings YAML to populate your master CSV file with jobs from available providers. JobFunnel can be easily automated to run nightly with crontab. If you have...
    Downloads: 0 This Week
    See Project
  • 15
    Scrapy-Redis

    Redis-based components for Scrapy

    You can start multiple spider instances that share a single Redis queue, which is best suited for broad multi-domain crawls. Scraped items get pushed into a Redis queue, meaning that you can start as many post-processing processes as needed, all sharing the items queue. Provides a Scheduler + Duplication Filter, Item Pipeline, and Base Spiders (see the settings sketch below). The default request serializer is pickle, but it can be changed to any module with loads and dumps functions. Note that pickle is not compatible between Python versions. Version 0.3...
    Downloads: 0 This Week
    See Project
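    A minimal settings sketch wiring the components named above through Redis so that multiple spider processes share one queue. The setting names follow the project's README; the Redis URL, spider name, and selectors are placeholders.

    ```python
    # settings.py -- route scheduling and deduplication through Redis so that
    # multiple spider processes share a single queue (placeholder Redis URL).
    SCHEDULER = "scrapy_redis.scheduler.Scheduler"
    DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
    ITEM_PIPELINES = {"scrapy_redis.pipelines.RedisPipeline": 300}  # items -> Redis
    REDIS_URL = "redis://localhost:6379"

    # myspider.py -- a spider that pulls its start URLs from a Redis list.
    from scrapy_redis.spiders import RedisSpider

    class MySpider(RedisSpider):
        name = "myspider"  # reads start URLs from the "myspider:start_urls" key

        def parse(self, response):
            yield {"url": response.url, "title": response.css("title::text").get()}
    ```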
  • 16
    AutoScraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

    This project is made for automatic web scraping, to make scraping easy. It gets a URL or the HTML content of a web page and a list of sample data that we want to scrape from that page. This data can be text, a URL, or any HTML tag value of that page. It learns the scraping rules and returns similar elements. Then you can use this learned object with new URLs to get similar content or the exact same element of those new pages, as in the sketch below.
    Downloads: 0 This Week
    See Project
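    A short sketch of the learn-by-example flow described above, using AutoScraper's build and get_result_similar methods; the URLs and sample values are placeholders for real page content.

    ```python
    # Teach AutoScraper by example, then reuse the learned rules on similar pages.
    # URLs and wanted_list values are placeholders for real page content.
    from autoscraper import AutoScraper

    url = "https://example.com/products"
    wanted_list = ["Sample product title", "$19.99"]  # data visible on that page

    scraper = AutoScraper()
    scraper.build(url, wanted_list)  # learns scraping rules from the samples

    # Apply the learned rules to a structurally similar page.
    print(scraper.get_result_similar("https://example.com/products?page=2"))
    ```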
  • 17
    SecretAgent

    The web scraper that's nearly impossible to block

    SecretAgent is a headless browser that’s nearly impossible to detect. It achieves this by emulating real users. It also has powerful auto-replay functionality that lets you create and debug scripts in record-setting time.
    Downloads: 0 This Week
    See Project
  • 18
    WFDownloader App

    Free batch downloader for image, wallpaper, video, audio, document,

    Use as a bulk downloader for images, wallpapers, audio/music, videos, documents, and other media from supported websites. Also use it to download sequential website URLs that have a certain pattern (e.g. image01.png to image100.png), or use the app's built-in site crawler for advanced link search or extraction. There is also special support for forum media and open directory downloading. It's a programmable downloader and also works with password-protected sites. Say goodbye to downloading one...
    Downloads: 136 This Week
    See Project
  • 19
    Easyspider - Distributed Web Crawler

    Easy Spider is a distributed Perl Web Crawler Project from 2006

    Easy Spider is a distributed Perl web crawler project from 2006. It features code for crawling webpages, distributing the crawled data to a server, and generating XML files from it. The client side can be any computer (Windows or Linux), and the server stores all data. Websites that use EasySpider crawling for article writing software: https://www.artikelschreiber.com/en/ https://www.unaique.net/en/ https://www.unaique.com/ https://www.artikelschreiber.com/marketing/ https://www.paraphrasingtool1.com...
    Downloads: 0 This Week
    See Project
  • 20
    X-RAY

    The next web scraper, see through the <html> noise

    Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing. The API is entirely composable, giving you great flexibility in how you scrape each page. Paginate through websites, scraping each page. X-ray also supports a request delay and a pagination limit. Scraped pages can be streamed to a file, so if there's an error on one page, you won't lose...
    Downloads: 1 This Week
    See Project
  • 21
    GOPA

    GOPA, a spider written in Golang, for Elasticsearch

    GOPA, a spider written in Golang, for Elasticsearch. Lightweight and low-footprint; memory requirement should be under 100MB. Easy to deploy, no runtime or dependency required. Easy to use, no programming or scripting ability needed, out-of-the-box features. First of all, get it, two options: download the pre-built package or compile it yourself. Besides Elasticsearch, Gopa doesn't require any other dependencies; simply run ./gopa to start the program. It's safe to press ctrl+c to stop the current...
    Downloads: 0 This Week
    See Project
  • 22
    Twitter Intelligence

    Twitter Intelligence OSINT project performs tracking and analysis

    A project written in Python for Twitter tracking and analysis without using the Twitter API. This project is a Python 3.x application. The package dependencies are listed in the file requirements.txt; install them before running the application. SQLite is used as the database. Tweet data is stored in the Tweet, User, Location, Hashtag, and HashtagTweet tables. The database is created automatically. analysis.py performs analysis processing; user, hashtag, and location analyses are performed. You must write...
    Downloads: 0 This Week
    See Project
  • 23
    phoneutria
    A Java Web crawler: multi-threaded, scalable, high-performance, extensible, and polite. It can be used to crawl and index any web or enterprise domain and is configurable through an XML configuration file.
    Downloads: 0 This Week
    See Project
  • 24

    PGBuild

    Compile your mobile web pages into mobile apps via build.phonegap.com

    PGBuild is a PhoneGap development system that automates the development process by connecting your CMS/web server with the online service [Phonegap Build](http://build.phonegap.com). PGBuild is essentially a web spider that makes off-line versions of web pages. The off-line version is zipped and sent to the PhoneGap Build service. The spider is controlled by a project file that sets the rules for the spider and the options for the PhoneGap Build service. You may create and manage your...
    Downloads: 0 This Week
    See Project
  • 25
    Heritrix

    The archive-crawler project is building Heritrix: a flexible, extensible, robust, and scalable web crawler capable of fetching, archiving, and analyzing the full diversity and breadth of internet-accessible content.
    Downloads: 8 This Week
    See Project