news-please is an open-source news crawler and information extraction tool that collects and structures articles from online news websites. Its integrated pipeline crawls news sites, retrieves article pages, and extracts structured information such as headlines, authors, publication dates, and article text. Given only the root URL of a site, news-please can recursively follow internal links and read RSS feeds to gather both recent and archived articles from that outlet. It builds on several established crawling and content-extraction libraries, which helps it process a wide range of news sources. Developers can run it as a standalone command-line application or integrate it into their own Python applications through its library interface. Extracted article data can be stored in several ways, including JSON files and database-backed storage.
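As a rough illustration of the library interface mentioned above, the sketch below wraps a single-article extraction. It assumes the package is installed with `pip install news-please` and relies on `NewsPlease.from_url` and the `title`, `authors`, and `date_publish` attributes of the returned article. The helper names `extract_article` and `summarize` are hypothetical and exist only for this example.

```python
def extract_article(url):
    """Download one article page and return the extracted article object."""
    # Imported lazily so this module still loads where news-please is absent.
    from newsplease import NewsPlease  # assumption: pip install news-please
    return NewsPlease.from_url(url)


def summarize(article):
    """Build a one-line summary from an extracted article's metadata.

    `article` only needs `title`, `authors`, and `date_publish` attributes,
    so any object with those fields works (hypothetical helper).
    """
    authors = ", ".join(article.authors or []) or "unknown author"
    published = article.date_publish or "undated"
    return f"{article.title} ({authors}, {published})"
```

Calling `summarize(extract_article(url))` with the URL of a real article page requires network access. Extraction quality varies by site, so fields such as `authors` or `date_publish` may be empty, which is why the helper falls back to placeholder text.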

Features

  • Crawls news websites and extracts structured article information
  • Recursively follows internal links and RSS feeds to discover articles
  • Extracts metadata such as headline, authors, language, images, and dates
  • Supports command line usage and integration as a Python library
  • Can retrieve and process large news archives from Common Crawl datasets
  • Stores extracted data in formats such as JSON or database backends
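The JSON storage feature can be approximated by hand when news-please is used as a library. The following is a minimal sketch, not the tool's own storage pipeline: it assumes article objects expose attributes named `title`, `authors`, `language`, `image_url`, `date_publish`, and `maintext`, and it writes one JSON object per line. The record layout and the helper names `article_to_record` and `append_jsonl` are assumptions made for illustration.

```python
import json
from datetime import date, datetime

# Assumed attribute names on an extracted article object.
FIELDS = ("title", "authors", "language", "image_url", "date_publish", "maintext")


def article_to_record(article):
    """Copy selected attributes into a JSON-serializable dict."""
    record = {}
    for name in FIELDS:
        value = getattr(article, name, None)  # missing fields become None
        if isinstance(value, (date, datetime)):
            value = value.isoformat()  # dates are not JSON-serializable as-is
        record[name] = value
    return record


def append_jsonl(article, path):
    """Append one article as a single line of JSON to the file at `path`."""
    with open(path, "a", encoding="utf-8") as handle:
        line = json.dumps(article_to_record(article), ensure_ascii=False)
        handle.write(line + "\n")
```

One record per line keeps the output appendable during a long crawl and lets downstream tools read articles one at a time instead of loading a single large JSON array.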

Categories

Web Scrapers

License

Apache License V2.0

Additional Project Details

Programming Language

Python

Related Categories

Python Web Scrapers

Registered

2026-03-10