DrQA is an open-domain question answering system that reads large text corpora—famously Wikipedia—to answer natural language questions with extractive spans. It follows a two-stage pipeline: a fast document retriever first narrows down candidate articles, and a neural machine reader then predicts the exact answer span from those passages. The retriever relies on classic IR features (like TF-IDF and n-gram statistics) to remain lightweight and scalable to millions of documents. The reader is a neural model trained on supervised QA data to estimate start and end positions within a paragraph, and it can be adapted to new domains through fine-tuning or distant supervision. The repository includes scripts to build the Wikipedia index, train the reader, and evaluate end-to-end performance. DrQA popularized a practical recipe for combining IR and neural reading, and it remains a strong baseline for open-domain QA research and production prototypes.

Features

  • Scalable TF-IDF–based retriever over large corpora
  • Neural span extractor trained for precise start/end predictions
  • End-to-end pipeline from indexing to answering questions
  • Tools for distant supervision and domain adaptation
  • Reproducible training and evaluation scripts for standard datasets
  • Modular components enabling IR or reader swaps and custom corpora

Project Samples

Project Activity

See All Activity >

License

BSD License

Follow DrQA

DrQA Web Site

Other Useful Business Software
MongoDB Atlas runs apps anywhere Icon
MongoDB Atlas runs apps anywhere

Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
Start Free
Rate This Project
Login To Rate This Project

User Reviews

Be the first to post a review of DrQA!

Additional Project Details

Operating Systems

Linux, Mac

Programming Language

Python

Related Categories

Python Artificial Intelligence Software

Registered

2025-10-07