crawley

Crawls web pages and prints any link it can find. Fast HTML SAX-parser (powered by golang.org/x/net/html) Small (below 1500 SLOC), idiomatic, 100% test-covered codebase. Grabs most of useful resources URLs (pics, videos, audios, forms, etc...) Found URLs are streamed to stdout and guaranteed to be unique (with fragments omitted) Scan depth (limited by starting host and path, by default - 0) can be configured. Can crawl rules and sitemaps from robots.txt. Brute mode - scan HTML comments for URLs (this can lead to bogus results) Make use of HTTP_PROXY / HTTPS_PROXY environment values + handle proxy auth. Directory-only scan mode (aka fast-scan)

Features

Idiomatic, 100% test covered codebase
Below 1500 SLOC
Grabs most of useful resources URLs
Directory-only scan mode
Allows to ignore URLs with matched substrings from crawling
Extract API endpoints from JS files

Project Samples

Project Activity

See All Activity >

License

MIT License

Follow crawley

crawley Web Site

Other Useful Business Software

$300 Free Credits for Your Google Cloud Projects

Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.

Start Free Trial

Rate This Project

User Reviews

Be the first to post a review of crawley!

Additional Project Details

Operating Systems

Mac, Windows

Programming Language

Related Categories

Go Web Scrapers

Registered

2023-04-12

Similar Business Software

Gaffa

Gaffa is a web scraping and browser automation API that gives developers full, real-browser control with a single API call no headless browsers, proxies, CAPTCHA handling, or scaling infrastructure to manage. JavaScript rendering is handled by default, so pages load exactly as they would for a...

See Software
NetNut

Get ready to experience unmatched control and insights with our user-friendly dashboard tailored to your needs. Monitor and adjust your proxies with just a few clicks. Track your usage and performance with detailed statistics. Our team is devoted to providing customers with proxy solutions...

See Software
Apify

Apify is a full-stack web scraping and automation platform helping anyone get value from the web. At its core is Apify Store, a marketplace with over 10,000 Actors where developers build, publish, and monetize automation tools. Actors are serverless cloud programs that extract data, automate...

See Software
Oxylabs

Oxylabs is a market leader in web intelligence with enterprise-grade, ethical, and compliant solutions. Its proxy infrastructure spans one of the largest global networks, offering residential, ISP, mobile, datacenter, & dedicated datacenter proxies, along with Web Unblocker – an AI-driven...

See Software
Bright Data

Bright Data is the world's #1 web data, proxies, & data scraping solutions platform. Fortune 500 companies, academic institutions and small businesses all rely on Bright Data's products, network and solutions to retrieve crucial public web data in the most efficient, reliable and flexible...

See Software
Price2Spy

Price2Spy makes automatic price adjustments easy to perform saving your most valuable resource - time, allowing your pricing team to focus on strategic planning and management. Since 2010, we have provided pricing intelligence for retailers and brands in 40+ countries, helping them smoothly...

See Software