Newspaper4k is a Python library for extracting, processing, and analyzing news articles from websites. It is an actively maintained fork of newspaper3k, which stopped receiving updates, and it adds improvements and bug fixes while keeping that ecosystem alive. The library downloads web pages, extracts the main article content, and collects associated metadata such as titles, authors, images, and publication dates. Its natural language processing features can generate summaries and identify keywords from the extracted text. Newspaper4k handles both single articles and whole news sites: users can build a source that represents an entire publication and iterate through its articles. It remains compatible with the original project, so existing newspaper3k code can continue to work with minimal changes.
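
The extraction step described above can be illustrated with a stdlib-only sketch. This is not how newspaper4k works internally; it only shows the general idea of reading metadata from `<meta>` tags and collecting paragraph text. The class `MetaExtractor`, the function `extract`, and the tag names it looks for (`og:title`, `author`, `article:published_time`) are assumptions chosen for illustration. In the library itself, the workflow inherited from newspaper3k is to create an `Article(url)`, call `download()` and `parse()`, and then read attributes such as `title`, `authors`, `publish_date`, and `text`.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects a few metadata fields and paragraph text from an HTML page.

    Simplified illustration only: the meta-tag names below are common
    publishing conventions assumed for this sketch, not newspaper4k's rules.
    """

    # Maps an (attribute, value) pair on a <meta> tag to an output field name.
    META_FIELDS = {
        ("property", "og:title"): "title",
        ("name", "author"): "author",
        ("property", "article:published_time"): "published",
    }

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Record the content of any meta tag we recognize.
            for (attr, value), field in self.META_FIELDS.items():
                if attrs.get(attr) == value and "content" in attrs:
                    self.metadata[field] = attrs["content"]
        elif tag == "p":
            # Start collecting text for a new paragraph.
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            # Join text chunks (including nested inline tags) and collapse whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


def extract(html):
    """Return (metadata dict, body text with paragraphs separated by blank lines)."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return parser.metadata, "\n\n".join(parser.paragraphs)


sample = """<html><head>
<meta property="og:title" content="Example headline">
<meta name="author" content="Jane Doe">
<meta property="article:published_time" content="2024-01-15">
</head><body><p>First paragraph.</p><p>Second <b>bold</b> paragraph.</p></body></html>"""

meta, body = extract(sample)
print(meta)
print(body)
```

Real news pages rarely keep the whole story in clean `<p>` tags next to ads, navigation, and comments, which is why a dedicated library with its own extraction heuristics is useful in practice.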

Features

  • Extracts full article text, titles, authors, and publication dates
  • Retrieves images, videos, and other metadata from news pages
  • Supports keyword extraction and article summarization using NLP
  • Processes individual articles or entire news websites as sources
  • Provides a Python API and command-line interface for scraping tasks
  • Maintains compatibility with the original newspaper3k library
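
To make the keyword and summary features listed above more concrete, the sketch below shows one simple frequency-based approach using only the standard library: it counts non-stopword tokens and keeps the sentences that mention the most frequent ones. It is a simplified illustration, not newspaper4k's actual NLP code; the function names `top_keywords` and `summarize`, the tiny stopword list, and the scoring rule are all assumptions. In the library, these results come from calling `nlp()` on a parsed `Article` and reading its `keywords` and `summary` attributes.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a much larger one.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "has",
    "in", "is", "it", "its", "of", "on", "that", "the", "to", "was", "were",
    "will", "with",
}


def tokenize(text):
    """Lowercase words made of letters, digits, or apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def top_keywords(text, n=5):
    """Return the n most frequent non-stopword tokens (ties keep first-seen order)."""
    counts = Counter(w for w in tokenize(text) if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]


def summarize(text, n_sentences=2, n_keywords=5):
    """Pick the sentences with the most keyword hits, returned in original order."""
    keywords = set(top_keywords(text, n_keywords))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = [
        (sum(1 for w in tokenize(s) if w in keywords), i)
        for i, s in enumerate(sentences)
    ]
    # Highest score first; ties broken by earlier position in the text.
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:n_sentences]
    return " ".join(sentences[i] for _, i in sorted(best, key=lambda pair: pair[1]))


text = (
    "The city council approved the new transit plan on Monday. "
    "The transit plan adds bus routes and extends rail service. "
    "Critics said the council moved too quickly. "
    "Supporters expect the plan to cut commute times."
)
print(top_keywords(text, 3))  # ['plan', 'council', 'transit']
print(summarize(text))
```

Scoring sentences by keyword hits is only one possible design; production summarizers typically also weigh features such as sentence position and similarity to the title.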

Categories

Web Scrapers

License

MIT License

Additional Project Details

Programming Language

Python, Unix Shell

Related Categories

Unix Shell Web Scrapers, Python Web Scrapers

Registered

2026-03-11