Newspaper4k is a Python library designed for extracting, processing, and analyzing news articles from websites. It is a continuation and active fork of the original newspaper3k library, which had stopped receiving updates, with the goal of keeping the ecosystem maintained while adding improvements and bug fixes. It provides developers with tools to automatically download web pages, extract the main article content, and collect associated metadata such as titles, authors, images, and publication dates. Newspaper4k also includes natural language processing capabilities that can generate summaries and identify keywords from extracted article text. Newspaper4k supports both single-article extraction and full news site processing, allowing users to build sources representing entire publications and iterate through their articles. It maintains compatibility with the original project so that existing code written for newspaper3k can continue working with minimal changes.

Features

  • Extracts full article text, titles, authors, and publication dates
  • Retrieves images, videos, and other metadata from news pages
  • Supports keyword extraction and article summarization using NLP
  • Processes individual articles or entire news websites as sources
  • Provides a Python API and command-line interface for scraping tasks
  • Maintains compatibility with the original newspaper3k library

Project Samples

Project Activity

See All Activity >

Categories

Web Scrapers

License

MIT License

Follow newspaper4k

newspaper4k Web Site

Other Useful Business Software
Stop Storing Third-Party Tokens in Your Database Icon
Stop Storing Third-Party Tokens in Your Database

Auth0 Token Vault handles secure token storage, exchange, and refresh for external providers so you don't have to build it yourself.

Rolling your own OAuth token storage can be a security liability. Token Vault securely stores access and refresh tokens from federated providers and handles exchange and renewal automatically. Connected accounts, refresh exchange, and privileged worker flows included.
Try Auth0 for Free
Rate This Project
Login To Rate This Project

User Reviews

Be the first to post a review of newspaper4k!

Additional Project Details

Programming Language

Python, Unix Shell

Related Categories

Unix Shell Web Scrapers, Python Web Scrapers

Registered

13 hours ago