LLM Scraper is a TypeScript library for extracting structured data from webpages using large language models. Instead of relying on fragile HTML selectors or hand-written parsing rules, it has a language model interpret the page content and return data that conforms to a schema you define. Schemas can be written with Zod or JSON Schema, so the extracted information arrives as typed objects. The library uses Playwright for browser automation: it loads the page and processes its content before sending it to the model. Several content formats are supported, including raw HTML, cleaned HTML, Markdown, extracted text, screenshots, and custom inputs, which makes it adaptable to a wide range of scraping scenarios. It also offers streaming output and code generation to help developers build reusable scraping workflows.
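To make the idea of schema-based extraction concrete, here is a minimal, dependency-free sketch of checking a model's JSON output against a schema. It does not use LLM Scraper's own API: the `ObjectSchema` type, the `storySchema` example, and the `validate` and `parseModelOutput` helpers are all hypothetical names invented for illustration, and the schema format is only a small subset of what JSON Schema or Zod can express.

```typescript
// Hypothetical, simplified schema shape: a flat object with primitive fields.
// This is an illustrative subset, not real JSON Schema or Zod.
type FieldType = "string" | "number" | "boolean";

interface ObjectSchema {
  type: "object";
  properties: Record<string, { type: FieldType }>;
  required: string[];
}

// Example schema for one story on a news listing page (field names are assumptions).
const storySchema: ObjectSchema = {
  type: "object",
  properties: {
    title: { type: "string" },
    points: { type: "number" },
    url: { type: "string" },
  },
  required: ["title", "url"],
};

// Returns a list of validation errors; an empty list means the value conforms.
function validate(schema: ObjectSchema, value: unknown): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["expected an object"];
  }
  const record = value as Record<string, unknown>;
  const errors: string[] = [];
  for (const key of schema.required) {
    if (!(key in record)) errors.push(`missing required field "${key}"`);
  }
  for (const [key, spec] of Object.entries(schema.properties)) {
    if (key in record && typeof record[key] !== spec.type) {
      errors.push(`field "${key}" should be ${spec.type}`);
    }
  }
  return errors;
}

interface ParseResult {
  ok: boolean;
  errors: string[];
  data?: Record<string, unknown>;
}

// Parses the raw text returned by a model and validates it against the schema.
function parseModelOutput(schema: ObjectSchema, raw: string): ParseResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, errors: ["invalid JSON"] };
  }
  const errors = validate(schema, parsed);
  return errors.length === 0
    ? { ok: true, errors, data: parsed as Record<string, unknown> }
    : { ok: false, errors };
}
```

In a real project the schema would more likely be a Zod object or a full JSON Schema document, with validation handled by the corresponding library rather than by hand.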

Features

  • Extracts structured data from webpages using large language models
  • Supports multiple LLM families, including GPT, Gemini, Llama, Qwen, and Sonnet
  • Schema-based data extraction using Zod or JSON Schema
  • Built on Playwright for automated webpage loading and interaction
  • Streaming mode for receiving partial structured outputs in real time
  • Code generation for creating reusable Playwright scraping scripts
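
The streaming feature listed above can be pictured as a sequence of progressively more complete objects. The sketch below simulates that pattern with an async generator; it is not LLM Scraper's streaming API. The `Story` type, `simulatedPartialStream`, `applyPartial`, and `collect` are hypothetical names, and the yielded values are made-up sample data.

```typescript
// Shape of the final object (illustrative field names).
interface Story {
  title: string;
  points: number;
  url: string;
}

// Simulated stream: each chunk is a partial snapshot of the final object.
// A real stream would be produced by the model as it generates output.
async function* simulatedPartialStream(): AsyncGenerator<Partial<Story>> {
  yield { title: "Example story" };
  yield { title: "Example story", points: 42 };
  yield { title: "Example story", points: 42, url: "https://example.com" };
}

// Merges a newly received partial into the latest snapshot (later fields win).
function applyPartial<T>(latest: Partial<T>, partial: Partial<T>): Partial<T> {
  return { ...latest, ...partial };
}

// Consumes a partial stream, reports every intermediate snapshot,
// and resolves with the last one.
async function collect<T>(
  stream: AsyncIterable<Partial<T>>,
  onUpdate: (snapshot: Partial<T>) => void
): Promise<Partial<T>> {
  let latest: Partial<T> = {};
  for await (const partial of stream) {
    latest = applyPartial(latest, partial);
    onUpdate(latest);
  }
  return latest;
}
```

A caller might pass an `onUpdate` callback that re-renders a UI or logs progress, so that fields appear as soon as they are available instead of after the whole response has finished.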

License

MIT License

Additional Project Details

Programming Language

TypeScript

Related Categories

TypeScript Artificial Intelligence Software

Registered

2026-03-17