Newspaper is a Python library for extracting and curating articles, inspired by requests for its simplicity and powered by lxml for its speed. It delivers Instapaper-style article extraction and works in 10+ languages, including English, Chinese, German, and Arabic. If you are certain that an entire news source is in one language, go ahead and use the same API for the whole source.

Newspaper is a Python 3 library. On Python 3 you must install newspaper3k, not newspaper; the package named newspaper is the Python 2 library. Installing with pip is simple, but you will run into fixable issues if you are installing on Ubuntu.

Source objects are an abstraction of online news media websites like CNN or ESPN. You can initialize them in two different ways. Building a Source will extract its categories, feeds, articles, brand, and description for you. You may also provide configuration parameters such as language and browser_user_agent.
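The Source workflow described above might look roughly like the following sketch. It assumes newspaper3k is installed (pip install newspaper3k) and relies on its newspaper.build function and Config class. The helper names build_source and describe_source, the example URL, and the user-agent string are made up for illustration.

```python
def build_source(url, language="en", user_agent=None, dry=False):
    """Create a newspaper Source with an explicit configuration.

    Hypothetical helper, not part of the library. With dry=True the
    Source object is created but nothing is downloaded.
    """
    # Imported inside the function so this module still loads where
    # newspaper3k is not installed.
    import newspaper
    from newspaper import Config

    config = Config()
    config.language = language          # two-letter code, e.g. "en" or "zh"
    config.memoize_articles = False     # do not skip articles seen on earlier runs
    if user_agent is not None:
        config.browser_user_agent = user_agent
    # Pass config by keyword: the second positional parameter of build() is `dry`.
    return newspaper.build(url, dry=dry, config=config)


def describe_source(source):
    """Collect what building a Source extracts into a plain dict."""
    return {
        "brand": source.brand,
        "description": source.description,
        "categories": source.category_urls(),
        "feeds": source.feed_urls(),
        "article_count": source.size(),
    }


if __name__ == "__main__":
    # Performs real network requests; the URL is only an example.
    paper = build_source("http://cnn.com", user_agent="my-crawler/0.1")
    print(describe_source(paper))
```

Keeping the configuration in one Config object makes it easy to reuse the same language and user-agent settings across several sources.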

Features

  • Multi-threaded article download framework
  • News url identification
  • Text extraction from html
  • Top image extraction from html
  • All image extraction from html
  • Keyword extraction from text
  • Summary extraction from text
  • Author extraction from text
  • Google trending terms extraction
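
As a rough illustration of the per-article features listed above, the sketch below downloads, parses, and summarizes a single article with newspaper3k's Article class. It assumes newspaper3k is installed and that the NLTK "punkt" data needed by nlp() is available. The helper names parse_article and article_fields and the example URL are hypothetical.

```python
def parse_article(url, language="en", html=None):
    """Download, parse, and run NLP on one article.

    Hypothetical helper. If `html` is supplied it is used in place of
    a network download.
    """
    # Imported inside the function so this module still loads where
    # newspaper3k is not installed.
    from newspaper import Article

    article = Article(url, language=language)
    article.download(input_html=html)
    article.parse()   # fills title, text, authors, top_image, images
    article.nlp()     # fills keywords and summary; needs NLTK punkt data
    return article


def article_fields(article):
    """Copy the extracted fields of a parsed article into a plain dict."""
    return {
        "title": article.title,
        "authors": list(article.authors),
        "top_image": article.top_image,
        "images": sorted(article.images),
        "keywords": list(article.keywords),
        "summary": article.summary,
        "text": article.text,
    }


if __name__ == "__main__":
    # Performs a real network request; replace the URL with a real article.
    parsed = parse_article("http://example.com/some-news-story")
    fields = article_fields(parsed)
    print(fields["title"], fields["authors"], fields["keywords"])
```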

License

MIT License

Additional Project Details

Operating Systems

Mac

Programming Language

Python

Related Categories

Python MARC and Book Library Metadata, Python Metadata Editors

Registered

2021-05-26