X-ray's schema supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, so you can pull the data out in the structure of your choosing. The API is entirely composable, giving you great flexibility in how you scrape each page. X-ray can paginate through websites, scraping each page, and it supports a request delay and a pagination limit. Scraped pages can be streamed to a file, so if there's an error on one page, you won't lose what you've already scraped. When crawling, you can start on one page and move to the next easily. The flow is predictable, following a breadth-first crawl through each of the pages. X-ray supports concurrency, throttles, delays, timeouts, and limits to help you scrape any page responsibly. You can swap in different scrapers depending on your needs. X-ray currently supports HTTP and PhantomJS drivers. In the future, I'd like to see a Tor driver for requesting pages through the Tor network.
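
To make the breadth-first flow and the page limit described above concrete, here is a minimal sketch in plain JavaScript. It is not X-ray's implementation and does not use X-ray's API: the names crawl, fetchPage, and site are hypothetical, and an in-memory object stands in for a real website so the example runs without a network.

```javascript
// Illustrative breadth-first crawl with a page limit.
// Assumption: fetchPage(url) returns { title, links } or undefined on failure.
// `crawl`, `fetchPage`, and `site` are hypothetical names, not X-ray APIs.
function crawl(startUrl, fetchPage, limit) {
  const queue = [startUrl];          // FIFO queue gives breadth-first order
  const seen = new Set([startUrl]);  // never enqueue the same page twice
  const results = [];

  while (queue.length > 0 && results.length < limit) {
    const url = queue.shift();
    const page = fetchPage(url);
    if (!page) continue;             // skip pages that fail to load
    results.push({ url, title: page.title });

    for (const link of page.links) {
      if (!seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return results;
}

// In-memory stand-in for a website (hypothetical data).
const site = {
  '/': { title: 'Home', links: ['/a', '/b'] },
  '/a': { title: 'A', links: ['/c', '/'] },
  '/b': { title: 'B', links: ['/c'] },
  '/c': { title: 'C', links: [] },
};

const pages = crawl('/', (url) => site[url], 3);
console.log(pages.map((p) => p.url)); // [ '/', '/a', '/b' ]
```

Because the queue is first-in, first-out, every page linked from the start page is visited before any page one level deeper, which is what makes the order predictable. The limit stops the crawl after a fixed number of pages, regardless of how many links remain queued.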

Features

  • Flexible schema
  • Composable
  • Pagination support
  • Crawler support
  • Responsible
  • Pluggable drivers

License

MIT License

Additional Project Details

Programming Language

JavaScript

Related Categories

JavaScript Search Engines, JavaScript Web Scrapers

Registered

2021-10-05