Newspaper4k is a Python library for extracting, processing, and analyzing news articles from websites. It is an actively maintained fork of the original newspaper3k library, which had stopped receiving updates, and aims to keep the ecosystem maintained while adding improvements and bug fixes. It gives developers tools to download web pages, extract the main article content, and collect associated metadata such as titles, authors, images, and publication dates. Its natural language processing features can generate summaries and identify keywords from the extracted text. The library supports both single-article extraction and full news site processing: users can build a source representing an entire publication and iterate through its articles. It stays compatible with the original project, so existing code written for newspaper3k can continue working with minimal changes.
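
As a rough illustration of the single-article workflow described above, the sketch below downloads one page, parses it, and collects the metadata fields mentioned. It assumes the package is installed under the import name newspaper and relies on the Article interface carried over from newspaper3k; the helper names article_record and fetch_article and the URL in the usage note are hypothetical, not part of the library.

```python
def article_record(article):
    """Collect the main fields from an already parsed article-like object."""
    return {
        "title": article.title,
        "authors": list(article.authors),
        "publish_date": article.publish_date,  # may be None if no date was found
        "top_image": article.top_image,
        "text": article.text,
    }


def fetch_article(url):
    """Download and parse a single article, then return its fields as a dict."""
    # Imported inside the function so this module still loads where the
    # library is not installed (an illustrative choice, not a requirement).
    from newspaper import Article

    article = Article(url)
    article.download()  # fetch the HTML
    article.parse()     # extract text and metadata
    return article_record(article)
```

Calling fetch_article("https://example.com/news/some-story") (a placeholder URL) would return a plain dictionary that can be stored or serialized without keeping the Article object around.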

Features

  • Extracts full article text, titles, authors, and publication dates
  • Retrieves images, videos, and other metadata from news pages
  • Supports keyword extraction and article summarization using NLP
  • Processes individual articles or entire news websites as sources
  • Provides a Python API and command-line interface for scraping tasks
  • Maintains compatibility with the original newspaper3k library
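
The source-level and NLP features listed above might be combined along these lines. This is a minimal sketch, assuming the newspaper.build entry point and the Article.nlp method behave as they did in newspaper3k; the helper names nlp_record and scan_source and the max_articles parameter are hypothetical, and the nlp step may need NLTK tokenizer data to be available.

```python
def nlp_record(article):
    """Summarize an article-like object on which nlp() has already been run."""
    return {
        "url": article.url,
        "keywords": list(article.keywords),
        "summary": article.summary,
    }


def scan_source(site_url, max_articles=3):
    """Build a source for a news site and process its first few articles."""
    # Imported inside the function so this module still loads where the
    # library is not installed (an illustrative choice, not a requirement).
    import newspaper

    source = newspaper.build(site_url)  # discovers article URLs on the site
    records = []
    for article in source.articles[:max_articles]:
        article.download()
        article.parse()
        article.nlp()  # fills in keywords and summary
        records.append(nlp_record(article))
    return records
```

Capping the loop with max_articles keeps a first run small, since a full news site can expose a large number of article URLs.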

Categories

Web Scrapers

License

MIT License

Additional Project Details

Programming Language

Python, Unix Shell

Related Categories

Unix Shell Web Scrapers, Python Web Scrapers

Registered

2 days ago